This repository has been archived by the owner on Jan 20, 2025. It is now read-only.
Similar to #225, indexes such as bods_v2_psc_prod100 use the nested field type for publicationDetails. This makes exploring the data more difficult and complicates queries, since it prevents inner object flattening. I can't see a reason why nested field types are used in this case; this needs a little investigation.
I only just realised that publicationDetails.publicationDate gets set when statements are republished. This is admittedly in accordance with BODS 0.2. Perhaps I should have spotted this sooner, but I didn't, because publicationDate is buried within publicationDetails as a nested object, and even though I now know it's there, it's still hard to use it for data exploration or debugging, because of the field type.
I suggest evaluating every use of nested field types within the BODS indexes in Elasticsearch, to see whether each one is in fact necessary or desirable. There may well be good reasons for some of them: identifiers comes to mind, for which a nested field type is not only desirable but critical to correct results being returned. But others, particularly those not modelled as arrays of objects, would benefit from being re-evaluated. (Arrays of objects require special treatment in Elasticsearch, since there is no dedicated array field type.)
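To illustrate the distinction, here is a minimal sketch in plain Python (no Elasticsearch client involved) that imitates how the default object type flattens inner objects into multi-valued dotted fields, compared with matching each object separately as a nested type does. The helper names and the sample values are made up for illustration; only the field names come from this issue.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, list[Any]]:
    """Collapse a document into dotted paths with multi-valued leaves,
    roughly how the default object type indexes inner objects."""
    out: dict[str, list[Any]] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            for p, vals in flatten(child, path).items():
                out.setdefault(p, []).extend(vals)
    elif isinstance(value, list):
        # Array items share the same path, so their values are merged.
        for item in value:
            for p, vals in flatten(item, prefix).items():
                out.setdefault(p, []).extend(vals)
    else:
        out.setdefault(prefix, []).append(value)
    return out


def object_match(doc: dict, terms: dict[str, Any]) -> bool:
    """All terms must appear somewhere in the flattened document."""
    flat = flatten(doc)
    return all(value in flat.get(field, []) for field, value in terms.items())


def nested_match(doc: dict, path: str, terms: dict[str, Any]) -> bool:
    """All terms must match within one and the same object under `path`."""
    items = doc.get(path, [])
    if isinstance(items, dict):
        items = [items]
    return any(object_match({path: item}, terms) for item in items)


# Hypothetical statement fragment; the values are invented.
doc = {
    "identifiers": [
        {"scheme": "GB-COH", "id": "111"},
        {"scheme": "DK-CVR", "id": "222"},
    ],
    "publicationDetails": {"publicationDate": "2023-01-01", "bodsVersion": "0.2"},
}

# Scheme taken from the first identifier, id from the second.
cross_query = {"identifiers.scheme": "GB-COH", "identifiers.id": "222"}
single_query = {
    "publicationDetails.publicationDate": "2023-01-01",
    "publicationDetails.bodsVersion": "0.2",
}

print(object_match(doc, cross_query))                 # flattened: false positive
print(nested_match(doc, "identifiers", cross_query))  # per-object: no match
print(object_match(doc, single_query))
print(nested_match(doc, "publicationDetails", single_query))
```

For the array of identifiers, flattening loses the association between scheme and id, which is why nested matters there. For a single object such as publicationDetails there is nothing to mix up, so both approaches give the same answer and the nested type only adds query complexity.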
Fields to check
addresses
annotations
identifiers (almost certainly correct, as noted above)
incorporatedInJurisdiction
interestedParty
interestedParty.unspecified
interests
interests.share
names
nationalities
pepStatusDetails
pepStatusDetails.source
pepStatusDetails.source.assertedBy
placeOfBirth
placeOfResidence
publicationDetails (likely incorrect, as noted above)
publicationDetails.publisher
source
source.assertedBy
subject
taxResidencies
unspecifiedEntityDetails
unspecifiedPersonDetails
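A list like the one above could be regenerated from a live mapping rather than maintained by hand. The following is a rough sketch, assuming the mapping has already been fetched (for example from the get-mapping API) and unwrapped to its "properties" level; the function name and the sample mapping are hypothetical and much smaller than the real BODS mapping.

```python
def nested_paths(mapping: dict, prefix: str = "") -> list[str]:
    """Return the dotted path of every field declared with type 'nested'."""
    paths: list[str] = []
    for name, spec in mapping.get("properties", {}).items():
        path = f"{prefix}.{name}" if prefix else name
        if spec.get("type") == "nested":
            paths.append(path)
        # Recurse regardless of type, since object fields can contain nested ones.
        paths.extend(nested_paths(spec, path))
    return paths


# Hypothetical, heavily trimmed mapping for illustration only.
sample_mapping = {
    "properties": {
        "statementID": {"type": "keyword"},
        "publicationDetails": {
            "type": "nested",
            "properties": {
                "publicationDate": {"type": "date"},
                "publisher": {
                    "type": "nested",
                    "properties": {"name": {"type": "text"}},
                },
            },
        },
        "identifiers": {
            "type": "nested",
            "properties": {
                "scheme": {"type": "keyword"},
                "id": {"type": "keyword"},
            },
        },
    }
}

found = nested_paths(sample_mapping)
for path in found:
    print(path)
```

Running the same function against each of the affected indexes would also confirm whether their mappings really are identical.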
Indexes to migrate
bods_v2_psc_prod100
bods_v2_dk_prod100
bods_v2_sk_prod100
bods_v2_am_prod100
Index templates
Given the number of affected indexes, which all contain the same mappings, this is likely a good time to consider using Elasticsearch index templates instead. This would enable mappings to be defined centrally and applied automatically to every matching index when it is created. Doing so would also eliminate the need to run multiple 'create indexes' steps within the various transformers.
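As a rough sketch of what such a template could look like, the snippet below builds a request body in the composable index template format (the `_index_template` API, available from Elasticsearch 7.8; the legacy template API uses a different shape). The pattern `bods_v2_*`, the helper name, and the trimmed mappings are assumptions for illustration, not the project's actual configuration.

```python
import json
from fnmatch import fnmatch


def build_template(pattern: str, mappings: dict) -> dict:
    """Assemble a composable index template body for one index pattern."""
    return {
        "index_patterns": [pattern],
        "template": {"mappings": mappings},
    }


# Hypothetical mappings: publicationDetails as a plain object (no "type"
# means the default object type), identifiers kept as nested.
mappings = {
    "properties": {
        "publicationDetails": {
            "properties": {"publicationDate": {"type": "date"}}
        },
        "identifiers": {
            "type": "nested",
            "properties": {
                "scheme": {"type": "keyword"},
                "id": {"type": "keyword"},
            },
        },
    }
}

pattern = "bods_v2_*"  # assumed pattern, chosen to cover the indexes listed above
template = build_template(pattern, mappings)

indexes = [
    "bods_v2_psc_prod100",
    "bods_v2_dk_prod100",
    "bods_v2_sk_prod100",
    "bods_v2_am_prod100",
]
# Local check that the wildcard covers every index named in this issue.
covered = [name for name in indexes if fnmatch(name, pattern)]

print(json.dumps(template, indent=2))
print(covered)
```

Note that a template only takes effect when an index is created, and an existing field cannot be switched from nested to object in place, so the indexes listed above would still need to be recreated and reindexed once the template exists.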
References #189, during which this was re-discovered.