-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Filtered CAGRA to automatically use pre-filtered brute-force when the filter ratio reaches a particular theshold. #252
Comments
Does the issue mentioned here depend on the specific dataset possibly? |
No specific dataset, but the core use-case is hybrid search, where Lucene or Milvus might be accepting a filter that has been constructed based on having done a prior structured search. For example, a user searches for the nearest vectors within a specific geographic region that have certain attributes (Eg above a certain age, or work at a certain store). The results returned from the structured search are often very small in comparison to the total number of elements in the index and so it results in a filter that only includes maybe 1% (sometimes maybe slightly more). |
Got it! Thank you for the clarification! |
Now that we are quite far into implementation, I think more and more that this idea could backfire on us.
|
Here is the latest benchmark that can be referred to make this decision: #378 (comment) |
@achirkin, according to @enp1s0 and @anaruse, significant recall drop at high filtering levels is a fundamental limitation of the algorithm and happens as a result of the hash table filling up during search. I agree that we want to investigate to know with absolute certainty that this is what is happening, but I've been given a fairly strong impression that this is what's happening, and it's not just a limitation of CAGRA itself because many of the CPU-based algorithms struggle with this problem as well. The most common use-case for filtering is for hybrid search, where 99% of the vectors need to be excluded from the index. Pre-filtered brute-force works well in these cases, and we've been asked by users to fall back to brute-force so that we can return correct results. If it turns out we find a better way to enable CAGRA to do this while maintaining high recall and performance, then I'll be the first to suggest that we remove the automatic fallback to pre-filtered brute-force. This isn't a specialized or niche use-case. Most users we have encountered who are doing generative AI and RAG in production today need to be able to do hybrid search. Even if this is a stopgap, it's an opportunity to get them using the GPU for search. |
If the filling up of hash table is really causing a significant recall drop, I think we may be able expose the hashmap size (bitlen) parameter or improve the heuristic there. |
Oh, I didn't realize that was with no filtering at all! Yes, that's concerning for sure and should definitely create an issue for it. |
If we really want to compose the search algorithms, maybe we should better introduce dedicated indices for that? Or, at the very least, one could add an explicit search parameter in CAGRA (disabled by default), which would allow automatic switching between CAGRA and bruteforce. This would at least leave the control to the user and avoid any surprises. |
The goal here is intentionally to make this work for users without them having to change their code. Users aren't asking for heavy filtering w/ IVF, mostly because that's nearly impossible to do with high recall and do efficiently. They are using graph-based algorithms today, and don't want to have to do special things in their code to support the use-cases they need. I don't see this as a surprise in any negative way- they are getting high recall for heavy filters. I have not yet encountered a user that would prefer to have garbage recall over a loss in performance if it meant getting the correct results. We will be introducing a tiered index in a future version and maybe that's ultimately a route for users to have more composability going forward. But Milvus is among the users asking for this today and automatically switching to bfknn for 99%+ filters is not an unreasable solution in the meantime (and it'll enable a very large number of important users without any code changes to Milvus). |
|
@cjnolet ok, so to summarize our conversation about my three concerns so far: (2) you're ok with the current solution in terms of design, (3) from the experiments provided by @rhdong the slowdown level is acceptable. Point (1) is largely unanswered in my opinion.
I assume you mean HNSW here? I suspect it may have problem with the 'hierarchical' aspect: if the graph is structured so that it resembles a tree, the search process may likely struggle with going between the branches. Much like our IVF methods surely struggle with high filtering ratios because they only go through a fixed number of clusters. |
While CAGRA offers the capability to specify a pre-filter, it is limited in that heavy filters can end up reducing recall significantly after a particular point. For this reason, we recommend to use pre-filtered brute-force in the cases where 90%+ of the vectors are being filtered out, since it'll conly compute the distances for the vectors that are not being filtered out.
Many users are getting unpleasant surprises when they attempt to filter out 99% of the vectors from a CAGRA index and find that the recall is close to 0. We also have CAGRA integrated into several different places atm and instead of having each integrator manually switch over to pre-filtered brute-force in these cases, we should do the switch in CAGRA itself so everyone benefits from this feature automatically.
The text was updated successfully, but these errors were encountered: