This repository has been archived by the owner on Jan 20, 2025. It is now read-only.

Elasticsearch schema Raw PSC data nested type #225

Open
tiredpixel opened this issue Nov 23, 2023 · 0 comments

tiredpixel commented Nov 23, 2023

Within Elasticsearch, the raw data for PSC is stored in the ukpsc_company_records and ukpsc_roe_company_records2 indexes. In each of these, documents have a top-level data field mapped with the nested field type.
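To make the shape being described concrete, here is a minimal sketch of such a mapping as a Python dict. Only the top-level data key and its nested type come from the description above; the inner field name example_field and the helper top_level_field_type are hypothetical, added purely for illustration.

```python
# Hypothetical, heavily reduced mapping. Only the top-level "data" key and its
# "nested" type reflect the issue text; "example_field" is an invented name.
nested_mapping = {
    "mappings": {
        "properties": {
            "data": {
                "type": "nested",
                "properties": {
                    "example_field": {"type": "keyword"},
                },
            }
        }
    }
}


def top_level_field_type(mapping, field):
    """Return the declared type of a top-level field.

    Elasticsearch treats a field that has "properties" but no explicit
    "type" as an object field, so "object" is used as the default here.
    """
    properties = mapping["mappings"]["properties"]
    return properties[field].get("type", "object")


print(top_level_field_type(nested_mapping, "data"))
```

Dropping the "type": "nested" entry from this sketch would leave data as a plain object field, which is the alternative discussed in this issue.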

I'm not convinced that this is an optimal schema for this data: it prevents inner object flattening, and it makes nested queries necessary for searching the fields contained within the inner object. Whilst there are valid use cases for nested field types, a brief look through the data seems to indicate that data doesn't hold an array of objects or similar, only a single object, itself containing other objects (which isn't directly relevant here, since those can have their own types).
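As a rough illustration of the query difference, the sketch below builds the two search bodies as plain Python dicts. The field name example_field and both helper functions are hypothetical; only the nested and term query shapes are standard Elasticsearch query DSL.

```python
import json


def nested_term_query(path, field, value):
    """Search body needed while `path` is mapped with the nested type."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"term": {f"{path}.{field}": value}},
            }
        }
    }


def object_term_query(path, field, value):
    """Search body that suffices if `path` is a plain object field."""
    return {"query": {"term": {f"{path}.{field}": value}}}


# "example_field" and "example-value" are placeholders, not real PSC data.
print(json.dumps(nested_term_query("data", "example_field", "example-value"), indent=2))
print(json.dumps(object_term_query("data", "example_field", "example-value"), indent=2))
```

With the nested type, every search on an inner field has to be wrapped in this way; with an object field, the dotted path can be queried directly.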

Note that this differs from the raw data for DK, which uses the dk_deltagerperson_records index without a top-level data key or top-level nested field type. It also differs from the raw data for SK, which uses the sk_records index, again without a top-level data key or top-level nested field type, but with numerous nested field types below the top level (which are likely similarly non-optimal, in contrast to the DK schema).
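To compare the indexes in this respect, one option is to walk the properties of each mapping and list every path declared as nested. The following is a minimal sketch under that assumption; find_nested_paths and the example mapping are hypothetical and do not reflect the real PSC, DK, or SK schemas.

```python
def find_nested_paths(properties, prefix=""):
    """Return dotted paths of all fields declared with type "nested".

    `properties` is the "properties" dict of an Elasticsearch mapping.
    """
    paths = []
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse into sub-fields of object and nested fields alike.
        paths.extend(find_nested_paths(spec.get("properties", {}), prefix=f"{path}."))
    return paths


# Invented mapping fragment, for illustration only.
example_properties = {
    "data": {
        "type": "nested",
        "properties": {
            "inner": {"properties": {"items": {"type": "nested"}}},
        },
    },
    "other": {"type": "keyword"},
}

print(find_nested_paths(example_properties))  # ['data', 'data.inner.items']
```

Running such a helper over the mappings returned for each index would give a quick list of where nested types occur at the top level and below it.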

It's not immediately clear to me why this schema was chosen, or whether it has any advantages or works around any issues. But it should likely be investigated, since it prevents certain types of queries and more efficient exploration of the data, and it might well simply be an oversight.
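One possible starting point for that investigation is to check whether data ever actually holds an array of objects, since that is the main case the nested type exists for. The sketch below assumes the documents are available as plain dicts (for example, the _source of each hit); field_holds_array and the sample documents are hypothetical.

```python
def field_holds_array(documents, field="data"):
    """Return True if any document stores a list under `field`."""
    return any(isinstance(doc.get(field), list) for doc in documents)


# Invented sample documents, standing in for the _source of real records.
sample_documents = [
    {"data": {"example_field": "a", "inner": {"x": 1}}},
    {"data": {"example_field": "b"}},
]

print(field_holds_array(sample_documents))  # False
```

If this stays False over a representative sample, that would support the impression that the nested type is not needed for the top-level data field.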

References #173, during the investigation of which this was found.
