[FEA] Strongly filtered IVF methods #481

achirkin · 2024-11-20T08:16:51Z

IVF-Flat and IVF-PQ have been observed to yield low recall when the ratio of filtered-out values is high. The most likely reason is the fixed n_probes parameter: neither method can return more valid elements than are available in the probed clusters.

One obvious workaround on the user side is to set a very large n_probes when a high filtering ratio is anticipated. A rule of thumb could be: n_probes = C * k * (n_lists / n_rows) / (1 - filtered_out_ratio), where C is a constant reflecting the expected number of processed dataset rows per candidate.
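The rule of thumb above can be sketched as a small helper. This is only an illustration: the function name `suggest_n_probes`, the default `c=10.0`, and the clamping to the range [1, n_lists] are assumptions, not part of any existing API.

```python
import math


def suggest_n_probes(k, n_rows, n_lists, filtered_out_ratio, c=10.0):
    """Rule-of-thumb n_probes for a filtered IVF search (hypothetical helper).

    Implements n_probes = C * k * (n_lists / n_rows) / (1 - filtered_out_ratio).
    The default value of c is an arbitrary assumption and should be tuned.
    """
    if not 0.0 <= filtered_out_ratio < 1.0:
        raise ValueError("filtered_out_ratio must be in [0, 1)")
    # Average number of rows per cluster that pass the filter.
    valid_rows_per_list = (n_rows / n_lists) * (1.0 - filtered_out_ratio)
    # Probe enough clusters to process about c valid rows per requested neighbor.
    n_probes = math.ceil(c * k / valid_rows_per_list)
    # Assumption: clamp to a usable range, since there are only n_lists clusters.
    return max(1, min(n_lists, n_probes))
```

For example, `suggest_n_probes(k=100, n_rows=1_000_000, n_lists=1000, filtered_out_ratio=0.75)` gives 4. The estimate assumes that rows and filtered-out values are spread evenly across clusters, which may not hold in practice.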

Alternatively, we can change the behavior of our IVF methods to adjust n_probes based on the number of found candidates.

For this, rather than selecting n_probes clusters during the coarse search, we can simply sort all clusters by their distance to each query.
We can then change the loop condition in the fine search so that it stops based on the number of top-k sort iterations performed (an indirect indication of the number of rows processed).
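The idea can be illustrated with a minimal single-query CPU sketch in plain Python. It is not the actual GPU implementation: the names `adaptive_ivf_search`, `keep`, `min_probes`, and `rows_per_candidate` are hypothetical, and the stopping rule counts valid rows directly rather than top-k sort iterations.

```python
import heapq


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_ivf_search(query, centroids, lists, keep, k,
                        min_probes=1, rows_per_candidate=4):
    """Filtered IVF search that probes clusters until enough valid rows are seen.

    lists[i] holds (row_id, vector) pairs of cluster i; keep(row_id) is the filter.
    Returns (sorted list of (distance, row_id), number of clusters probed).
    """
    # Coarse search: rank ALL clusters by distance instead of keeping n_probes.
    order = sorted(range(len(centroids)),
                   key=lambda i: sq_dist(query, centroids[i]))
    target = rows_per_candidate * k  # valid rows to process before stopping
    heap = []  # max-heap of the current top-k, stored as (-distance, row_id)
    n_valid = 0
    n_probed = 0
    for i in order:
        # Stop condition is checked between clusters, not in the middle of one.
        if n_probed >= min_probes and n_valid >= target:
            break
        for row_id, vec in lists[i]:
            if not keep(row_id):
                continue  # filtered-out rows do not count toward the target
            n_valid += 1
            d = sq_dist(query, vec)
            if len(heap) < k:
                heapq.heappush(heap, (-d, row_id))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, row_id))
        n_probed += 1
    return sorted((-nd, row_id) for nd, row_id in heap), n_probed
```

When the filter removes every row of the nearest cluster, this loop keeps probing farther clusters until it has processed `rows_per_candidate * k` valid rows or has run out of clusters, whereas a fixed n_probes = 1 search would return nothing.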

achirkin added the feature request label Nov 20, 2024

achirkin added this to VS/ML/DM Primitives Release Board Nov 20, 2024

achirkin moved this to Todo in VS/ML/DM Primitives Release Board Nov 20, 2024
