This repository has been archived by the owner on Jan 20, 2025. It is now read-only.
Within Elasticsearch, the raw data for PSC uses the `ukpsc_company_records` and `ukpsc_roe_company_records2` indexes. In each of these, documents have a top-level `data` mapping with the `nested` field type.
I'm not convinced that this is an optimal schema for this data, since it prevents inner object flattening and makes `nested` queries necessary for searching the fields contained within the inner object. Whilst there are valid use cases for `nested` field types, a brief look through the data suggests that `data` doesn't hold an array of objects or similar, only a single object that itself contains other objects (which isn't directly relevant here, since those can have their own types).
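To illustrate the difference in query shape, here is a minimal sketch that builds the request bodies as plain Python dicts. The field name `data.name` and the search value are hypothetical placeholders, not taken from the actual PSC documents, and the mapping bodies assume the typeless mapping format of Elasticsearch 7 or later.

```python
import json

# Mapping as described in this issue: a top-level "data" field of type nested.
nested_mapping = {"mappings": {"properties": {"data": {"type": "nested"}}}}

# Possible alternative: a plain object field, whose inner fields are flattened
# into the parent document and can be searched by dotted path.
object_mapping = {"mappings": {"properties": {"data": {"type": "object"}}}}


def match_query(field, value):
    """Plain match query; sufficient when "data" is an object field."""
    return {"query": {"match": {field: value}}}


def nested_match_query(path, field, value):
    """The same match wrapped in a nested query, required for nested fields."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {field: value}},
            }
        }
    }


if __name__ == "__main__":
    # "data.name" is a hypothetical field used purely for illustration.
    print(json.dumps(match_query("data.name", "example"), indent=2))
    print(json.dumps(nested_match_query("data", "data.name", "example"), indent=2))
```

With the `nested` mapping, a plain `match` on a dotted path under `data` would not return hits, so every search has to go through the wrapped form.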
Note that this differs from the raw data for DK, which uses the `dk_deltagerperson_records` index without a top-level `data` key or top-level `nested` field type. It also differs from the raw data for SK, which uses the `sk_records` index without a top-level `data` key or top-level `nested` field type (but with numerous `nested` field types lower down, which are similarly likely to be non-optimal, in contrast to the DK schema).
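One way to compare the schemas across these indexes would be to walk each index's mapping and list every field path typed as `nested`. The sketch below assumes the `properties` dict has already been extracted from a mapping response; the sample mapping in it is invented for illustration and does not reflect the real PSC, DK, or SK mappings.

```python
def nested_paths(properties, prefix=""):
    """Return dotted paths of all fields with type "nested", in mapping order."""
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse into sub-fields, whether the parent is nested or a plain object.
        paths.extend(nested_paths(spec.get("properties", {}), prefix=path + "."))
    return paths


# Hypothetical mapping, for illustration only.
sample_properties = {
    "data": {
        "type": "nested",
        "properties": {
            "name": {"type": "text"},
            "address": {"properties": {"lines": {"type": "nested"}}},
        },
    },
    "id": {"type": "keyword"},
}

if __name__ == "__main__":
    for path in nested_paths(sample_properties):
        print(path)
```

An index whose mapping yields an empty list here has no `nested` fields at all, which is what the DK index is described as above.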
It's not immediately clear to me why this schema was chosen, or whether it has any advantages or works around any issues. But it should likely be investigated, since it prevents certain types of queries and more efficient exploration of the data, and it might well simply be an oversight.
References #173, the investigation during which this was found.