[FEA] Strongly filtered IVF methods #481

Open
achirkin opened this issue Nov 20, 2024 · 0 comments
Labels
feature request New feature or request

Comments

@achirkin
Contributor

IVF-Flat and IVF-PQ have been observed to yield low recall when the ratio of filtered-out values is high. The most likely reason is the fixed n_probes parameter: neither method can return more valid elements than are available in the probed clusters.

One obvious workaround from the user side is to set a very large n_probes parameter when a high filtering ratio is anticipated. A rule of thumb could be: n_probes = C * k * (n_lists / n_rows) / (1 - filtered_out_ratio), where C is a constant reflecting the expected number of processed dataset rows per candidate.
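
A minimal sketch of the rule of thumb above, for illustration only. The function name `suggest_n_probes`, the default value of `c`, and the clamping to the range [1, n_lists] are assumptions added here and are not part of any existing API.

```python
import math


def suggest_n_probes(k, n_lists, n_rows, filtered_out_ratio, c=10.0):
    """Heuristic n_probes for a strongly filtered IVF search.

    Implements n_probes = C * k * (n_lists / n_rows) / (1 - filtered_out_ratio).
    The default c=10.0 is a placeholder (assumption), not a tuned value.
    """
    if not 0.0 <= filtered_out_ratio < 1.0:
        raise ValueError("filtered_out_ratio must be in [0, 1)")
    raw = c * k * (n_lists / n_rows) / (1.0 - filtered_out_ratio)
    # Assumption: probe at least one cluster and never more than exist.
    return max(1, min(n_lists, math.ceil(raw)))
```

For example, with k=10, n_lists=1000, n_rows=1_000_000 and C=100, the estimate doubles each time the share of surviving rows halves, until it saturates at n_lists.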

Alternatively, we can change the behavior of our IVF methods to adjust n_probes based on the number of found candidates.

  1. Rather than selecting n_probes clusters during the coarse search, sort all clusters by their distance to each query.
  2. Change the loop condition in the fine search so that it can stop based on the number of top-k sort iterations performed (an indirect indication of the number of rows processed).
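
The two steps above can be illustrated with a toy, CPU-only sketch in pure Python. Everything here is hypothetical: the function `adaptive_ivf_search`, its parameters `min_probes` and `max_rows`, and the list-of-lists index layout are made up for illustration and do not reflect the actual IVF-Flat or IVF-PQ kernels. In particular, the stop condition counts valid candidates and processed rows directly, as a stand-in for counting top-k sort iterations.

```python
import heapq
import math


def adaptive_ivf_search(query, centroids, lists, data, allowed, k,
                        min_probes=1, max_rows=None):
    """Probe clusters nearest-first until k valid rows are found (toy sketch)."""
    # Step 1: rank *all* clusters by distance to the query,
    # instead of keeping only a fixed n_probes of them.
    order = sorted(range(len(centroids)),
                   key=lambda c: math.dist(query, centroids[c]))

    best = []       # max-heap via negated distances: (-distance, row_id)
    rows_seen = 0   # rows processed, including filtered-out ones
    probes = 0
    for c in order:
        probes += 1
        for row in lists[c]:
            rows_seen += 1
            if not allowed[row]:
                continue  # filtered-out rows cost work but yield no candidate
            d = math.dist(query, data[row])
            if len(best) < k:
                heapq.heappush(best, (-d, row))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, row))
        # Step 2: stop on the number of candidates found (or on a row budget),
        # not on a fixed probe count.
        if probes >= min_probes and len(best) >= k:
            break
        if max_rows is not None and rows_seen >= max_rows:
            break

    neighbors = sorted((-neg_d, row) for neg_d, row in best)
    return neighbors, probes
```

Stopping as soon as k valid rows are collected can still miss closer rows in clusters that were not yet probed, so `min_probes` acts as a floor comparable to the current fixed n_probes, while `max_rows` bounds the work when almost everything is filtered out.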