IVF-Flat and IVF-PQ have been observed to yield low recall when the ratio of filtered-out values is high. The most likely reason is the fixed `n_probes` parameter: neither method can return more valid elements than are available in the probed clusters.
An obvious workaround on the user side is to set a much larger `n_probes` when a high filtering ratio is anticipated. Since each cluster holds on average `n_rows / n_lists` rows, of which only a fraction `(1 - filtered_out_ratio)` passes the filter, a rule of thumb could be: `n_probes = C * k * (n_lists / n_rows) / (1 - filtered_out_ratio)`, where `C` is a constant reflecting the expected number of processed dataset rows per candidate.
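The rule of thumb can be wrapped in a small helper. This is only a sketch: `estimate_n_probes` is a hypothetical name, not part of any existing IVF API, and the constant `c` still has to be chosen by the user. The estimate is rounded up and clamped to `[1, n_lists]`, since probing fewer than one or more than all clusters is meaningless.

```python
import math


def estimate_n_probes(k, n_lists, n_rows, filtered_out_ratio, c):
    """Rule-of-thumb n_probes for a filtered IVF search (hypothetical helper).

    k: number of neighbors requested
    n_lists: number of IVF clusters
    n_rows: number of rows in the dataset
    filtered_out_ratio: fraction of rows rejected by the filter, in [0, 1)
    c: expected number of processed dataset rows per candidate (user-chosen)
    """
    if not 0.0 <= filtered_out_ratio < 1.0:
        raise ValueError("filtered_out_ratio must be in [0, 1)")
    # Average rows per cluster is n_rows / n_lists; only a fraction
    # (1 - filtered_out_ratio) of them pass the filter.
    estimate = c * k * (n_lists / n_rows) / (1.0 - filtered_out_ratio)
    # Round up and clamp: at least one cluster, at most all of them.
    return max(1, min(n_lists, math.ceil(estimate)))


print(estimate_n_probes(k=64, n_lists=1024, n_rows=1_048_576,
                        filtered_out_ratio=0.9375, c=8))  # prints 8
```

With these (illustrative) numbers, filtering out 93.75% of the rows multiplies the required number of probes by 16 compared with an unfiltered search.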
Alternatively, we can change the behavior of our IVF methods to adjust `n_probes` based on the number of candidates found. For this:

1. Rather than selecting `n_probes` clusters during the coarse search, sort all clusters by their distance to the query.
2. Change the loop condition in the fine search to allow stopping based on the number of top-k sort iterations performed (an indirect indication of the number of rows processed).
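The two steps above can be illustrated with a minimal, CPU-only sketch of a filtered IVF-Flat-style search in pure Python. All names here are hypothetical and the sketch is not the actual GPU implementation. As a simplification, the stopping rule counts rows that passed the filter directly (`min_valid_candidates`) rather than counting top-k sort iterations as proposed above.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adaptive_ivf_search(query, centroids, lists, data, is_valid, k, min_valid_candidates):
    """Filtered IVF-Flat-style search with an adaptive number of probes (sketch).

    centroids: one vector per cluster
    lists: lists[c] holds the row ids assigned to cluster c
    data: data[row] is the vector for that row id
    is_valid: filter predicate, is_valid(row) -> bool
    min_valid_candidates: stop probing once this many rows passed the filter
    Returns (neighbor row ids sorted by distance, number of clusters probed).
    """
    if min_valid_candidates < k:
        raise ValueError("min_valid_candidates should be at least k")
    # Coarse search: sort *all* clusters by distance instead of keeping n_probes.
    order = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))
    heap = []  # top-k kept as a max-heap of (-distance, row)
    n_valid = 0
    n_probed = 0
    for cluster in order:
        # This stopping rule replaces the fixed n_probes bound.
        if n_valid >= min_valid_candidates:
            break
        n_probed += 1
        for row in lists[cluster]:
            if not is_valid(row):
                continue  # filtered-out rows do not count toward the budget
            n_valid += 1
            d = squared_l2(query, data[row])
            if len(heap) < k:
                heapq.heappush(heap, (-d, row))
            elif d < -heap[0][0]:
                heapq.heapreplace(heap, (-d, row))
    ranked = sorted((-neg_d, row) for neg_d, row in heap)
    return [row for _, row in ranked], n_probed
```

For example, if the filter only accepts rows stored in the cluster farthest from the query, a fixed `n_probes = 1` would return no results, whereas this loop keeps probing clusters in order of distance until enough valid rows have been scanned.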