[BUG] The recall rate significantly drops by 20% when using CAGRA with a full 1s prefilter (nothing filtered out). #472

Open
rhdong opened this issue Nov 18, 2024 · 2 comments
Labels
bug Something isn't working

Comments

rhdong commented Nov 18, 2024

Describe the bug
When using CAGRA with an all-ones prefilter (nothing filtered out), the recall rate drops significantly, by about 20%.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations        GPU    Latency     Recall end_to_end items_per_second      itopk          k  n_queries total_queries
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
raft_cagra.dim32/0/0/process_time/real_time       1.17 ms         1.17 ms        11941   1.16104m   1.17158m     0.9385    13.9899        853.548/s        128         10          1       11.941k algo="single_cta"
raft_cagra.dim32/1/0/process_time/real_time       2.14 ms         2.14 ms         6554   2.12471m   2.13548m   **0.789747**    13.9959        468.279/s        256         10          1        6.554k algo="single_cta"
raft_cagra.dim32/0/1/process_time/real_time       2.42 ms         2.42 ms         5768   2.41063m   2.42168m     0.9385    13.9682       412.937k/s        128         10       1000        5.768M algo="single_cta"
raft_cagra.dim32/1/1/process_time/real_time       4.59 ms         4.59 ms         3045   4.58215m   4.59372m    **0.79002**    13.9879       217.689k/s        256         10       1000        3.045M algo="single_cta"

Steps/Code to reproduce bug
CUVS_CAGRA_ANN_BENCH reproduce branch

Expected behavior
The recall rate should be very close to the no prefilter case.

Environment details (please complete the following information):

  • Any benchmark environment.


enp1s0 commented Nov 21, 2024

@rhdong Thank you for the bug report. I have reproduced the problem when the itopk size is equal to or larger than 256. I'll investigate the cause, so please wait a while.

enp1s0 commented Nov 23, 2024

@rhdong Can you apply this change and check the recall again? #489

rapids-bot bot pushed a commit that referenced this issue Dec 4, 2024
Ref : #472

## The cause of the bug
A bitonic sort was applied to an array whose length was not a power of two. In the current search implementation, the bitonic sort is used to move invalid elements to the end of the buffer:
https://github.com/rapidsai/cuvs/blob/5062594138a40231475299c7bac61083b0669fd1/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L758-L763
https://github.com/rapidsai/cuvs/blob/5062594138a40231475299c7bac61083b0669fd1/cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh#L644-L649

The problem is that the (maximum) array length (=`MAX_ITOPK + MAX_CANDIDATES`) is not always a power of two.
These bitonic sorts run even when no elements are filtered out, unless `cuvs::neighbors::filtering::none_sample_filter` is specified as the filter, which is why #472 occurs.
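
To illustrate why the array length matters, here is a minimal host-side C++ sketch. It is not the cuVS kernel: `next_pow2`, `bitonic_network`, `bitonic_sort_naive`, and `bitonic_sort_padded` are hypothetical names introduced for this example, and the way out-of-range partners are skipped is an assumption about how a naive implementation might handle a short array. It runs a textbook bitonic sorting network on a 3-element array, once as-is and once padded to the next power of two with a sentinel.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Smallest power of two that is >= n (for n >= 1).
inline std::size_t next_pow2(std::size_t n) {
  std::size_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Textbook bitonic sorting network sized for `network_size` slots
// (a power of two), applied to `a`. Compare-exchange pairs whose
// partner index falls outside `a` are silently skipped.
inline void bitonic_network(std::vector<int>& a, std::size_t network_size) {
  const std::size_t n = a.size();
  for (std::size_t k = 2; k <= network_size; k <<= 1) {
    for (std::size_t j = k >> 1; j > 0; j >>= 1) {
      for (std::size_t i = 0; i < n; ++i) {
        const std::size_t l = i ^ j;      // partner index
        if (l <= i || l >= n) continue;   // handle each pair once; skip missing partners
        const bool ascending = (i & k) == 0;
        if (ascending ? (a[i] > a[l]) : (a[i] < a[l])) std::swap(a[i], a[l]);
      }
    }
  }
}

// Naive use: run the network directly on an array of arbitrary length.
inline void bitonic_sort_naive(std::vector<int>& a) {
  if (a.size() > 1) bitonic_network(a, next_pow2(a.size()));
}

// Safe use: pad with a "sorts last" sentinel up to a power of two,
// run the network, then drop the padding again.
inline void bitonic_sort_padded(std::vector<int>& a) {
  const std::size_t n = a.size();
  if (n < 2) return;
  a.resize(next_pow2(n), std::numeric_limits<int>::max());
  bitonic_network(a, a.size());
  a.resize(n);
}
```

With the input `{2, 3, 1}`, the naive version ends with `{1, 3, 2}` because the compare-exchange steps that would have involved the missing fourth slot never happen, while the padded version yields `{1, 2, 3}`. For a power-of-two length the two versions behave identically.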

## Fix
This PR changes the filtering process so that the bitonic sort is not used to move the invalid elements to the end of the buffer.
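
One way to move invalid elements to the end without any sorting network is a stable partition, which works for any buffer length. The sketch below is a host-side illustration only and is not necessarily what #489 implements; the `Candidate` struct, the `kInvalidIndex` sentinel, and `move_invalid_to_end` are hypothetical names chosen for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical candidate entry: a distance plus a dataset index.
struct Candidate {
  float distance;
  std::uint32_t index;
};

// Hypothetical marker for an entry rejected by the filter.
constexpr std::uint32_t kInvalidIndex = 0xffffffffu;

// Moves all invalid entries to the end of `buf` and returns the number
// of valid entries. The relative order of the valid entries is kept,
// so a buffer already ordered by distance stays ordered.
inline std::size_t move_invalid_to_end(std::vector<Candidate>& buf) {
  auto first_invalid = std::stable_partition(
      buf.begin(), buf.end(),
      [](const Candidate& c) { return c.index != kInvalidIndex; });
  return static_cast<std::size_t>(first_invalid - buf.begin());
}
```

Because no compare-exchange network is involved, the buffer length does not need to be a power of two, and an all-ones filter leaves the buffer unchanged.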

Authors:
  - tsuki (https://github.com/enp1s0)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)

URL: #489