Forward-merge branch-25.02 into branch-25.04 #632

Merged
merged 19 commits into from
Jan 31, 2025

Conversation

bdice
Contributor

@bdice bdice commented Jan 31, 2025

Manual forward merge from 25.02 to 25.04. This PR should not be squashed.

jameslamb and others added 19 commits January 24, 2025 19:13
…apidsai#604)

Contributes to rapidsai/build-planning#138

Updates the pip devcontainers here to use UCX 1.18.

Also fixes some small `update-version.sh` issues, and updates references that were outdated as a result of those issues.

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Gil Forsyth (https://github.com/gforsyth)
  - https://github.com/jakirkham
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#604
`cuvs-cu{11,12}` wheels don't currently have a runtime dependency on `libcuvs-cu{11,12}`. They need one, for library-loading:

https://github.com/rapidsai/cuvs/blob/e9983e17408e6bec6f2558f9df49be97a7255417/python/cuvs/cuvs/__init__.py#L19-L25

This was missed in rapidsai#594. This PR adds it.

## Notes for Reviewers

Adding for searchability... this bug can result in issues like this at runtime when using `cuvs` installed from wheels:

> ImportError: libcuvs_c.so: cannot open shared object file: No such file or directory
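
A quick way to check for this class of bug is to inspect the runtime requirements an installed wheel declares. The following is a minimal sketch, not part of this PR: the helper name `declares_dependency` and its `requires` override (used so the check can be exercised without the wheel installed) are hypothetical.

```python
import re
from importlib import metadata


def _normalize(name):
    """Normalize a distribution name the way PEP 503 does."""
    return re.sub(r"[-_.]+", "-", name).lower()


def declares_dependency(dist_name, dep_name, requires=None):
    """Return True if `dist_name` declares a requirement on `dep_name`.

    `requires` is a hypothetical override holding requirement strings;
    when omitted, the installed distribution's metadata is read.
    """
    if requires is None:
        try:
            requires = metadata.requires(dist_name) or []
        except metadata.PackageNotFoundError:
            return False
    for req in requires:
        match = re.match(r"[A-Za-z0-9._-]+", req.strip())
        if match and _normalize(match.group(0)) == _normalize(dep_name):
            return True
    return False


# Requirement strings below are illustrative, not copied from a real wheel.
fixed = ["libcuvs-cu12==25.2.*", "numpy>=1.23"]
broken = ["numpy>=1.23"]
print(declares_dependency("cuvs-cu12", "libcuvs-cu12", requires=fixed))
print(declares_dependency("cuvs-cu12", "libcuvs-cu12", requires=broken))
```

Calling `declares_dependency("cuvs-cu12", "libcuvs-cu12")` with no override reads the metadata of whatever is installed in the current environment.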

Authors:
  - James Lamb (https://github.com/jameslamb)

Approvers:
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#615
Adds the CAGRA filtering feature to the C API, using a DLPack tensor as the blocklist.
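
To illustrate the general idea of a blocklist filter, here is a minimal pure-Python sketch. It does not use the cuVS C API or DLPack; the packed 32-bit word layout and the convention that a set bit means "blocked" are assumptions made for the example, and all function names are hypothetical.

```python
def pack_blocklist(blocked_ids, n_rows):
    """Pack blocked row ids into 32-bit words (assumed: set bit = blocked)."""
    words = [0] * ((n_rows + 31) // 32)
    for row_id in blocked_ids:
        words[row_id // 32] |= 1 << (row_id % 32)
    return words


def is_blocked(words, row_id):
    """Test the bit that corresponds to `row_id`."""
    return bool((words[row_id // 32] >> (row_id % 32)) & 1)


def filter_candidates(candidates, words, k):
    """Keep the first `k` candidates whose rows are not blocked."""
    return [c for c in candidates if not is_blocked(words, c)][:k]


words = pack_blocklist([1, 33], n_rows=40)
print(filter_candidates([1, 5, 33, 7, 2], words, k=2))
```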

Authors:
  - Ajit Mistry (https://github.com/ajit283)
  - Ben Frederickson (https://github.com/benfred)
  - Micka (https://github.com/lowener)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Micka (https://github.com/lowener)
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#452
This PR applies `pre-commit` hooks to normalize whitespace (trimming trailing whitespace and enforcing consistent end-of-file newlines).

These rules are already applied to most other RAPIDS repos, so this PR aligns with the norm in RAPIDS.

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#593
Currently, running `NEIGHBORS_ANN_CAGRA_TEST` takes:
- [0.96 hours on CUDA 11.8, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418022?pr=596#step:8:1718)
- [1.59 hours on CUDA 12.5, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418329?pr=596#step:8:492)
- [0.28 hours on CUDA 12.0, A100 (ARM)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418741?pr=596#step:8:1729)

Individual tests should be able to complete in less than an hour, and ideally in less than 10 minutes.

This PR proposes some changes to CAGRA tests:
- Each CAGRA type is now its own test executable (e.g. `NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST`)
- Some parameter combinations were trimmed by ~50%

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Artem M. Chirkin (https://github.com/achirkin)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#602
Renames `test` directories to `tests` for alignment with the rest of RAPIDS.

Closes rapidsai/build-planning#140.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Ray Douglass (https://github.com/raydouglass)
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#590
This PR uses CUDA 12.8.0 to build and test.

xref: rapidsai/build-planning#139

Authors:
  - Bradley Dice (https://github.com/bdice)
  - Vyas Ramasubramani (https://github.com/vyasr)

Approvers:
  - James Lamb (https://github.com/jameslamb)
  - Ben Frederickson (https://github.com/benfred)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Vyas Ramasubramani (https://github.com/vyasr)

URL: rapidsai#621
It has been reported that when the number of search results is large, for example 100, using the multi-CTA algorithm can cause a decrease in recall. This PR is intended to alleviate this low recall issue.

close rapidsai#208
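
For reference, recall here means the fraction of true nearest neighbors that the search returns. A minimal sketch of the usual recall@k computation follows; the function name and the toy id lists are illustrative and not taken from the PR.

```python
def recall_at_k(found, truth):
    """Fraction of ground-truth neighbors present in the returned results.

    `found` and `truth` each hold one list of neighbor ids per query.
    """
    hits = 0
    total = 0
    for found_ids, true_ids in zip(found, truth):
        hits += len(set(found_ids) & set(true_ids))
        total += len(true_ids)
    return hits / total


# Two queries with k = 3; the first query misses one true neighbor.
found = [[1, 2, 3], [4, 5, 6]]
truth = [[1, 2, 9], [4, 5, 6]]
print(recall_at_k(found, truth))
```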

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Artem M. Chirkin (https://github.com/achirkin)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - tsuki (https://github.com/enp1s0)
  - Artem M. Chirkin (https://github.com/achirkin)

URL: rapidsai#492
…sai#591)

After calling `build()`, the CAGRA index ideally contains both the dataset and the graph. But when there is not enough device memory, only the graph is returned. In that case, the dataset must be passed explicitly to the serialization routines.

For serialization in HNSW format, the dataset was not passed when the hierarchy is flat. This PR fixes the problem by adding an optional `dataset` argument to `cagra::serialize_to_hnswlib`.

Furthermore, to improve execution time, we change from writing a single element at a time to writing a single row of the graph and dataset at a time.

Additionally, debug messages for tracking data saving time are added.
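
The effect of writing whole rows instead of single elements can be sketched in a few lines. This is an illustration of the idea only, not the cuVS serialization code: the function names are hypothetical, and the little-endian `float32` layout is an assumption made for the example.

```python
import io
import struct


def write_elementwise(rows, out):
    """Issue one write call per element; returns the number of calls."""
    calls = 0
    for row in rows:
        for value in row:
            out.write(struct.pack("<f", value))
            calls += 1
    return calls


def write_rowwise(rows, out):
    """Issue one write call per row; returns the number of calls."""
    calls = 0
    for row in rows:
        out.write(struct.pack(f"<{len(row)}f", *row))
        calls += 1
    return calls


rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
a, b = io.BytesIO(), io.BytesIO()
print(write_elementwise(rows, a), write_rowwise(rows, b))
print(a.getvalue() == b.getvalue())
```

Both functions produce the same bytes; the row-wise version simply makes fewer write calls, which is where the time saving comes from.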

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#591
A Java API for cuVS for easy integration into Apache Lucene or other Java-based projects.

Try:
```
./build.sh libcuvs
./build.sh java
```

To generate docs, run `mvn javadoc:javadoc`.

Prerequisites:
* JDK 22
* Maven 3.9.6+

Todo:
* Generate Project Panama classes using jextract on every build
* Algorithms other than CAGRA
* Prefiltering in CAGRA

Authors:
  - Ishan Chattopadhyaya (https://github.com/chatman)
  - Vivek Narang (https://github.com/narangvivek10)
  - Chris Hegarty (https://github.com/ChrisHegarty)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Mike Sarahan (https://github.com/msarahan)

URL: rapidsai#450
rapidsai#620)

hnswlib uses an internal indexing system that assigns each point an ID atomically, in the order the points are added to the index. When points are added in parallel, that order is nondeterministic, so a point's internal ID may differ from its "label" (for us, the label is just the index of the row in the dataset).

The bug was that I was using the label itself to write out the CPU hierarchy, when I should have been using hnswlib's internal ID for the point associated with that label.
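
The mismatch can be reproduced with a small pure-Python simulation. This is a sketch of the idea only: it does not call hnswlib, and the function names and the toy per-point data are hypothetical.

```python
def assign_internal_ids(insertion_order):
    """Mimic an index that hands out internal ids in arrival order."""
    return {label: internal_id for internal_id, label in enumerate(insertion_order)}


def write_hierarchy(data_by_label, label_to_internal, use_internal_id):
    """Place each point's data in the output slot chosen by label or internal id."""
    out = [None] * len(data_by_label)
    for label, data in data_by_label.items():
        slot = label_to_internal[label] if use_internal_id else label
        out[slot] = data
    return out


# Points arrive out of order, as they can with parallel insertion.
label_to_internal = assign_internal_ids([2, 0, 1])
data_by_label = {0: "a", 1: "b", 2: "c"}

buggy = write_hierarchy(data_by_label, label_to_internal, use_internal_id=False)
fixed = write_hierarchy(data_by_label, label_to_internal, use_internal_id=True)
print(buggy)
print(fixed)
```

Indexing by label only matches the index's layout when points happen to be inserted in label order; indexing by internal ID matches it for any insertion order.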

Authors:
  - Divye Gala (https://github.com/divyegala)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#620
Includes several fixes and improvements to Vamana, primarily:
- Edge case and bug fixes for Vamana index build (details below)
- Documentation added for Vamana
- experimental namespace removed
- Reduce device memory usage by splitting reverse edge work into batches

The edge case fix adds padding to all shared memory size and offset calculations so any dataset dimension is supported (tests added that verify this). A bug in the L2 distance metric that caused incorrect results in some rare cases was also fixed.

This PR addresses the most pressing items in rapidsai#393 and stabilizes the index construction sufficiently to remove the experimental namespace.
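
The padding idea can be illustrated with a generic round-up helper. This is a sketch under stated assumptions, not the Vamana kernel code: the 16-byte alignment, the buffer names, and both function names are hypothetical.

```python
def align_up(nbytes, alignment=16):
    """Round `nbytes` up to the next multiple of `alignment`."""
    return (nbytes + alignment - 1) // alignment * alignment


def layout(buffers, alignment=16):
    """Return ({name: offset}, total_size) with every offset aligned."""
    offsets = {}
    cursor = 0
    for name, nbytes in buffers:
        offsets[name] = cursor
        cursor = align_up(cursor + nbytes, alignment)
    return offsets, cursor


# A 97-dimensional float32 vector occupies 388 bytes, which is not a
# multiple of 16, so the next buffer is padded to start at offset 400.
offsets, total = layout([("query", 97 * 4), ("dists", 100)])
print(offsets, total)
```

Without the round-up, a dimension such as 97 would leave the second buffer at an unaligned offset, which is the kind of edge case that padding guards against.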

Authors:
  - Ben Karsin (https://github.com/bkarsin)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#558
This PR adds a dedicated documentation page for filtering in the `Getting started` tab, and adds the `cuvs::neighbors::filtering` namespace to the C++ documentation.

Authors:
  - Micka (https://github.com/lowener)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#568
Adds functionality to the C API for adding additional vectors after build.

Authors:
  - Ajit Mistry (https://github.com/ajit283)
  - Corey J. Nolet (https://github.com/cjnolet)
  - Ben Frederickson (https://github.com/benfred)
  - Micka (https://github.com/lowener)

Approvers:
  - Ben Frederickson (https://github.com/benfred)

URL: rapidsai#276
This PR points the shared workflow branches back to the default 25.02
branches.

xref: rapidsai/build-planning#139
@bdice bdice requested a review from a team as a code owner January 31, 2025 16:50
@bdice bdice added the improvement Improves an existing functionality label Jan 31, 2025
@bdice bdice requested a review from a team as a code owner January 31, 2025 16:50
@bdice bdice added the non-breaking Introduces a non-breaking change label Jan 31, 2025
@bdice bdice requested review from a team as code owners January 31, 2025 16:50
@bdice bdice requested a review from gforsyth January 31, 2025 16:50

copy-pr-bot bot commented Jan 31, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@AyodeAwe AyodeAwe merged commit 9ecd282 into rapidsai:branch-25.04 Jan 31, 2025
6 checks passed
Labels
ci CMake cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change Python