Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CAPI-571 Fix SolrJ Convert Child Docs Correctly #62

Open
wants to merge 1 commit into
base: bw_branch_8_11_2
Choose a base branch
from

Conversation

yfernando-bw
Copy link

The current solrj Document DTO to SolrInputDocument binder treat all child docs as anonymous rather than prefix them with the given path. This will result in the document being indexed without the _nest_path_ field populated which is required for the ChildDocumentTransformer (fl=[child]) to work.

In order to fix this if the Document DTO contain any fields which are also a DTO or a List/Array of DTOs, they should be converted as SolrInputDocuments on its own and nested as sub document(s) within the parent document.

The current solrj Document DTO to SolrInputDocument binder
treat all child docs as anonymous rather than prefix them with
the given path. This will result in the document being indexed
without the `_nest_path_` field populated which is required
for the ChildDocumentTransformer (`fl=[child]`) to work.

In order to fix this if the Document DTO contain any fields
which are also a DTO or a List/Array of DTOs, they should be
converted as SolrInputDocuments on its own and nested as sub
document(s) within the parent document.
Copy link

@johnmbw johnmbw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this something that's fixed in later versions (e.g. 9)?
Or are we meant to submit the child docs as their own objects - at the same time as the parent?

@johnmbw
Copy link

johnmbw commented Feb 5, 2024

Or should our calling code just convert from the DTOs to SolrInputDocument for the children itself?
From a performance POV I've often wondered if we should ditch the DTO in the storage apps, as it's an extra intermediate between avro + solr input docs.

@yfernando-bw
Copy link
Author

Is this something that's fixed in later versions (e.g. 9)? Or are we meant to submit the child docs as their own objects - at the same time as the parent?

No its still a bug that exist on the HEAD - this class hasnt seen any changes for last couple of years, even that its just a formatting change. Probably this is a very underused code path in Solr, hence hadnt had much attention and care..

And yes if we want child docs to be indexed with the correct nested fields (and we do!), we need to submit child docs as their on SolrInputDocs embeded within the parent..

@yfernando-bw
Copy link
Author

Or should our calling code just convert from the DTOs to SolrInputDocument for the children itself?

I think we should either make binder to convert DTO and generate correct/valid SolrInputDocument for everything OR we convert everything manually without the DTO binder. Ideally not mix and match and get DTO binder to convert some and we convert others to SolrInputDocs manually.

From performance POV yes it could be seen as just an extra intermediary step we could get rid, but dont think its a bottleneck as such as.. I guess biggest time sink is the indexing time as oppose to these conversions..

@yfernando-bw
Copy link
Author

Option 2 to address this issue: #63 - this feels a better non breaking approach than this PR..

@timatbw
Copy link

timatbw commented Feb 6, 2024

fwiw we do have a direct Avro -> SolrInputDocument converter, auto-generated in our code.. MentionToDocumentConverter.mentionToSolrInputDocument I think I'd be more in favour of trying to move our storage app away from creating a DTO (which then relies on SolrJ's bean mapper thing) and create a SolrInputDocument builder for the enricher to do its thing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants