Use IndexInput#prefetch in Exact search #2423

Open
shatejas opened this issue Jan 23, 2025 · 2 comments

Comments

@shatejas
Collaborator

Description

Exact search evaluates vectors linearly, one after another. Using IndexInput#prefetch to load the next vector into memory ahead of time could reduce the read cost at query time and thereby lower latencies. Prefetch issues an madvise system call with the WILLNEED advice to the kernel, which may use this hint to read the requested bytes asynchronously.

We need to benchmark and see if this yields improvements.
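A minimal sketch of the idea, assuming vectors are stored contiguously (dim float32 values each, so ordinal i starts at byte i * dim * 4): while scoring candidate i, the loop hints that candidate i + 1 will be read next. VectorStore, InMemoryStore and bestMatch are hypothetical names, not Lucene or k-NN APIs; the prefetch(offset, length) shape is only assumed to resemble IndexInput#prefetch, and the in-memory store merely records the hints so the access pattern can be inspected.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: exact (brute-force) scan that prefetches the next
// candidate's bytes before scoring the current one. Not a Lucene/k-NN API.
class PrefetchingExactSearch {

    // Hypothetical storage abstraction over a flat vector file.
    interface VectorStore {
        int dimension();
        // Hint that [offset, offset + length) will be read soon (may be a no-op).
        void prefetch(long offset, long length);
        float[] readVector(int ord);
    }

    // In-memory store that records prefetch hints instead of touching the OS.
    static final class InMemoryStore implements VectorStore {
        private final float[][] vectors;
        final List<long[]> prefetchCalls = new ArrayList<>();

        InMemoryStore(float[][] vectors) {
            this.vectors = vectors;
        }

        public int dimension() {
            return vectors[0].length;
        }

        public void prefetch(long offset, long length) {
            prefetchCalls.add(new long[] {offset, length});
        }

        public float[] readVector(int ord) {
            return vectors[ord];
        }
    }

    // Byte offset of an ordinal, assuming a contiguous float32 layout.
    static long byteOffset(int ord, int dim) {
        return (long) ord * dim * Float.BYTES;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scores candidates in order and returns the best ordinal by dot product.
    static int bestMatch(VectorStore store, int[] candidates, float[] query) {
        int dim = store.dimension();
        long vectorBytes = (long) dim * Float.BYTES;
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int i = 0; i < candidates.length; i++) {
            // Hint the next candidate before doing CPU work on the current one.
            if (i + 1 < candidates.length) {
                store.prefetch(byteOffset(candidates[i + 1], dim), vectorBytes);
            }
            float score = dot(query, store.readVector(candidates[i]));
            if (score > bestScore) {
                bestScore = score;
                best = candidates[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] data = { {1f, 0f}, {0f, 1f}, {0.6f, 0.8f}, {-1f, 0f} };
        InMemoryStore store = new InMemoryStore(data);
        int best = bestMatch(store, new int[] {0, 2, 3}, new float[] {0f, 1f});
        System.out.println("best ordinal = " + best);
        System.out.println("prefetch calls = " + store.prefetchCalls.size());
    }
}
```

A one-vector lookahead is the simplest variant; whether a larger lookahead distance helps is exactly what the benchmarks would need to show.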

Prerequisites

  • Lucene 10.x: the prefetch API is only available in Lucene 10.x
  • Lucene changes to support prefetch in FloatVectorValues: this is not currently supported and requires a contribution to Lucene

This could help speed up filtered queries, rescoring, and exact search scripting.

@sohami

sohami commented Jan 23, 2025

A similar mechanism is being addressed here for searchable snapshots in core, where read-ahead of blocks can be performed based on file type. For exact search, if we are using flat vector files, access to those files could be implicitly backed by read-ahead to help in sequential-access cases. This would tie in well with a prefetch interface later, where the accessor can indicate specifically when to perform read-ahead and when not to (random access).
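To make the sequential-versus-random distinction concrete, here is a hypothetical policy in which the caller passes an access hint and the storage layer decides how many bytes to request. AccessHint, rangeToFetch and the 64 KiB window are illustrative assumptions, not part of OpenSearch core or Lucene.

```java
// Hypothetical sketch: pick a read-ahead range from a caller-supplied access hint.
// Neither the enum nor the method is an existing OpenSearch or Lucene API.
class ReadAheadPolicy {

    enum AccessHint { SEQUENTIAL, RANDOM }

    // Returns {offset, length} of the byte range to request from storage.
    static long[] rangeToFetch(long offset, long length, AccessHint hint,
                               long readAheadBytes, long fileLength) {
        long end = offset + length;
        if (hint == AccessHint.SEQUENTIAL) {
            // Sequential scan (e.g. exact search over a flat vector file):
            // request extra bytes so upcoming vectors are already resident.
            end = Math.max(end, offset + readAheadBytes);
        }
        // Random access (e.g. graph traversal): only the requested bytes.
        end = Math.min(end, fileLength); // never read past the end of the file
        return new long[] {offset, end - offset};
    }

    public static void main(String[] args) {
        long vectorBytes = 128L * Float.BYTES; // one 128-dim float vector = 512 bytes
        long window = 64L * 1024;              // assumed 64 KiB read-ahead window
        long fileLength = 1_000_000L;
        long[] seq = rangeToFetch(0, vectorBytes, AccessHint.SEQUENTIAL, window, fileLength);
        long[] rnd = rangeToFetch(0, vectorBytes, AccessHint.RANDOM, window, fileLength);
        System.out.println("sequential: " + seq[1] + " bytes, random: " + rnd[1] + " bytes");
    }
}
```

Running main prints "sequential: 65536 bytes, random: 512 bytes". The point of the hint is that the same read becomes cheap for a linear scan without inflating I/O for random lookups.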

@shatejas
Collaborator Author

@sohami Thanks for the reference. I would be interested in the low-level RFC/implementation. Currently there are only specific cases where we want prefetch, since it affects search latencies for the Lucene engine (and, with partial loading, it might affect the Faiss engine as well). It's easy to add a prefetch API to FloatVectorValues that uses IndexInput#prefetch, and then call prefetch based on how many vectors are needed instead of a predefined block of data.
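A rough sketch of the count-based approach, assuming a flat layout of dim float32 values per vector. As the issue notes, FloatVectorValues does not support prefetch today, so FlatVectors, prefetch(firstOrd, count) and prefetchOrdinals are hypothetical names; the sketch records byte ranges where a real implementation would forward them to IndexInput#prefetch. Grouping the needed ordinals into consecutive runs is an illustrative choice, not something the issue specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: count-based prefetch on a flat float-vector accessor.
// FlatVectors and prefetchOrdinals are illustrative names, not Lucene APIs.
class CountBasedPrefetch {

    static final class FlatVectors {
        final int dimension;
        final int size;                                // number of vectors stored
        final List<long[]> issued = new ArrayList<>(); // recorded {offset, length} hints

        FlatVectors(int dimension, int size) {
            this.dimension = dimension;
            this.size = size;
        }

        long vectorBytes() {
            return (long) dimension * Float.BYTES;
        }

        // Hint that vectors [firstOrd, firstOrd + count) will be read soon.
        // A real implementation would pass this byte range to IndexInput#prefetch;
        // this sketch only records it.
        void prefetch(int firstOrd, int count) {
            int n = Math.min(count, size - firstOrd); // clamp to the last vector
            if (n <= 0) {
                return;
            }
            issued.add(new long[] {firstOrd * vectorBytes(), n * vectorBytes()});
        }
    }

    // Prefetches exactly the ordinals a caller needs (e.g. rescoring candidates),
    // issuing one hint per run of consecutive ordinals instead of a fixed-size block.
    static void prefetchOrdinals(FlatVectors vectors, int[] ordinals) {
        int[] sorted = ordinals.clone();
        Arrays.sort(sorted);
        int i = 0;
        while (i < sorted.length) {
            int start = sorted[i];
            int j = i + 1;
            // Extend the run while ordinals are consecutive (duplicates are absorbed).
            while (j < sorted.length && sorted[j] <= sorted[j - 1] + 1) {
                j++;
            }
            vectors.prefetch(start, sorted[j - 1] - start + 1);
            i = j;
        }
    }

    public static void main(String[] args) {
        FlatVectors vectors = new FlatVectors(4, 100); // 4-dim vectors, 16 bytes each
        prefetchOrdinals(vectors, new int[] {42, 7, 8, 9, 43, 90});
        for (long[] range : vectors.issued) {
            System.out.println("offset=" + range[0] + " length=" + range[1]);
        }
    }
}
```

For the ordinals in main, this issues three hints covering vectors 7 to 9, 42 to 43, and 90, so the requested bytes scale with the number of vectors actually needed rather than with a predefined block size.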
