I have a use case where I query Elasticsearch with a spatial intersection query using a MultiPolygon. I tried two approaches: in the first, documents are stored in a flat structure where one polygon corresponds to one document; in the second, I aggregate documents before ingesting, so one polygon now represents n documents via a nested type.

When I run the spatial intersection query, I get very similar query times in both setups, an average difference of ± 2 s, even though the second approach has roughly 5x fewer documents: about 3 million compared to the previous 13 million.

I want to understand what I can optimise in my spatial query or index settings. My goal is for all spatial queries to complete in around 1 s, irrespective of polygon complexity. The intent of the query is to get the unique values of a metadata key (here, areaName) for everything that spatially intersects the input shape. I'm aware that polygons of different complexity will give different response times, but comparing a 13M-doc index against a 3M-doc one, I expected a considerable improvement. I'm running everything through the Python Elasticsearch API.
Checking with the Kibana Search Profiler, I can see that IndexOrDocValuesQuery takes the longest, and within it build_scorer accounts for roughly 90% of the time.
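To double-check the Kibana numbers from the Python side, I also sum the build_scorer share out of a profiled search (run with "profile": True in the request body). The response below is a trimmed, illustrative sample with made-up timings, not real output from my cluster:

```python
# Trimmed, illustrative Profile API response (timings are made up).
sample_response = {
    "profile": {
        "shards": [
            {
                "searches": [
                    {
                        "query": [
                            {
                                "type": "IndexOrDocValuesQuery",
                                "time_in_nanos": 2_000_000_000,
                                "breakdown": {
                                    "build_scorer": 1_800_000_000,
                                    "next_doc": 150_000_000,
                                    "advance": 50_000_000,
                                },
                            }
                        ]
                    }
                ]
            }
        ]
    }
}

def build_scorer_share(response):
    """Fraction of total query time spent in build_scorer, summed over shards."""
    total = scorer = 0
    for shard in response["profile"]["shards"]:
        for search in shard["searches"]:
            for query in search["query"]:
                total += query["time_in_nanos"]
                scorer += query["breakdown"].get("build_scorer", 0)
    return scorer / total

print(round(build_scorer_share(sample_response), 2))  # 0.9 with the sample numbers above
```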
Initially, I tried to ingest my documents in a flat structure with this kind of mapping:
{"mappings": {"properties": {"areaName": {"type": "keyword" },"location": {"type": "geo_shape" } } }}Here's a sample of docs for this index:
{"_index": "catalog.a","_id": "71c0fca6-45a0-4c15-942a-a31a674f5d0d","_score": 0.0025355585,"_source": {"location": "POLYGON ((-76.4891423210711707 37.0441858681596372, -76.4891423210711707 37.0503207513172370))","areaName": "Dataset A" },"_index": "catalog.a","_id": "71c0fca6-45a0-4c15-942a-a31a674f5d0d","_score": 0.0015355585,"_source": {"location": "POLYGON ((-76.4891423210711707 37.0441858681596372, -76.4891423210711707 37.0503207513172370))","areaName": "Dataset B" },"_index": "catalog.a","_id": "71c0fca6-45a0-4c15-942a-a31a674f5d0d","_score": 0.002355585,"_source": {"location": "POLYGON ((-76.4891423210711707 37.0441858681596372, -76.4891423210711707 37.0503207513172370))","areaName": "Dataset C" }}This index is around 80GB in size and has 3 primary shards and has around 13 Million such docs. I'm testing using this query for performance:
response = es.search(
    index="catalog.a",
    body={
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_shape": {
                            "location": {
                                "shape": {
                                    "type": "MultiPolygon",
                                    "coordinates": country_variable
                                },
                                "relation": "intersects"
                            }
                        }
                    }
                ]
            }
        },
        "aggs": {
            "unique_soilProperty": {
                "terms": {"field": "areaName", "size": 15}
            }
        }
    },
    request_cache=False
)

Next, I changed my mapping to this:
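For reference, this is roughly how I read the unique values out of the terms aggregation; the response dict below is a trimmed, illustrative sample (bucket keys and counts are made up):

```python
# Trimmed, illustrative response for the flat-index query:
# unique areaName values come back as terms-aggregation buckets.
sample_response = {
    "took": 2100,
    "aggregations": {
        "unique_soilProperty": {
            "buckets": [
                {"key": "Dataset A", "doc_count": 120000},
                {"key": "Dataset B", "doc_count": 95000},
            ]
        }
    }
}

buckets = sample_response["aggregations"]["unique_soilProperty"]["buckets"]
unique_area_names = [b["key"] for b in buckets]
print(unique_area_names)  # ['Dataset A', 'Dataset B']
```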
{"mappings": {"location": {"type": "geo_shape" }"items": {"type": "nested","properties": {"areaName": {"type": "keyword" }, } }}My documents now look like this:
{"_index": "catalog.b","_id": "71c0fca6-45a0-4c15-942a-a31a674f5d0d","_score": 0.0025355585,"_source": {"location": "POLYGON ((-76.4891423210711707 37.0441858681596372, -76.4891423210711707 37.0503207513172370))","items":[{ "areaName": "Dataset A" }] },"_index": "catalog.b","_id": "71c0fca6-45a0-4c15-942a-a31a674f5d0d","_score": 0.0015355585,"_source": {"location": "POLYGON ((-76.4891423210711707 37.0441858681596372, -76.4891423210711707 37.0503207513172370))","items":[{ "areaName": "Dataset B" }] }}This index is around 25GB in size and has 1 primary shard and has around 3 Million such docs. I'm testing using this query for performance:
response = es.search(
    index="catalog.b",
    body={
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {
                        "geo_shape": {
                            "location": {
                                "shape": {
                                    "type": "MultiPolygon",
                                    "coordinates": country_variable
                                },
                                "relation": "intersects"
                            }
                        }
                    }
                ]
            }
        },
        "aggs": {
            "nested_items": {
                "nested": {"path": "items"},
                "aggs": {
                    "unique_dataSources": {
                        "terms": {"field": "items.areaName", "size": 15}
                    }
                }
            }
        }
    },
    request_cache=False
)

I get very similar query performance even with this setup. What further optimisations could I make to speed things up?
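For completeness, this is how I read the unique values in the nested setup, where the terms buckets sit one level down under the nested aggregation. The response dict is again a trimmed, illustrative sample with made-up values:

```python
# Trimmed, illustrative response for the nested-index query:
# the terms buckets are nested under the "nested_items" aggregation.
sample_response = {
    "took": 2050,
    "aggregations": {
        "nested_items": {
            "doc_count": 410000,
            "unique_dataSources": {
                "buckets": [
                    {"key": "Dataset A", "doc_count": 120000},
                    {"key": "Dataset B", "doc_count": 95000},
                ]
            }
        }
    }
}

buckets = sample_response["aggregations"]["nested_items"]["unique_dataSources"]["buckets"]
unique_area_names = [b["key"] for b in buckets]
print(unique_area_names)  # ['Dataset A', 'Dataset B']
```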