Forward-merge branch-25.02 into branch-25.04 #614
…604) Contributes to rapidsai/build-planning#138 Updates the pip devcontainers to use UCX 1.18. Also fixes some small `update-version.sh` issues, and updates references that were outdated as a result of those issues. Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Gil Forsyth (https://github.com/gforsyth) - https://github.com/jakirkham - Corey J. Nolet (https://github.com/cjnolet) URL: #604
FAILURE - Unable to forward-merge due to an error, manual merge is necessary. Do not use the … IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the …
`cuvs-cu{11,12}` wheels don't currently have a runtime dependency on `libcuvs-cu{11,12}`. They need one, for library-loading: https://github.com/rapidsai/cuvs/blob/e9983e17408e6bec6f2558f9df49be97a7255417/python/cuvs/cuvs/__init__.py#L19-L25 This was missed in #594. This PR adds it. ## Notes for Reviewers Adding for searchability... this bug can result in issues like this at runtime when using `cuvs` installed from wheels: > ImportError: libcuvs_c.so: cannot open shared object file: No such file or directory Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #615
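For readers unfamiliar with why the runtime dependency matters, the library-loading pattern referenced above can be sketched roughly as follows. This is a minimal sketch, not the code in the linked `__init__.py`: the helper name `preload_native_library` is hypothetical, and it assumes the companion wheel exposes a `load_library()` function. If the companion wheel is not installed, nothing is preloaded, and a later import can fail with the `ImportError` quoted above unless the shared library is on a system path.

```python
import importlib


def preload_native_library(module_name: str) -> bool:
    """Ask a companion wheel to load its shared library, if it is installed.

    Returns True when the companion module was found and its loader ran,
    and False when the module is absent (a system-wide install is then assumed).
    `load_library()` on the companion module is an assumption of this sketch.
    """
    try:
        module = importlib.import_module(module_name)
    except ModuleNotFoundError:
        # Companion wheel not installed: rely on the dynamic linker's search path.
        return False
    module.load_library()
    return True
```

Declaring the `libcuvs-cu{11,12}` wheel as a runtime dependency is what guarantees the first branch is taken for wheel installs.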
Adds the CAGRA filtering feature to the C API using DLPack Tensor as blocklist Authors: - Ajit Mistry (https://github.com/ajit283) - Ben Frederickson (https://github.com/benfred) - Micka (https://github.com/lowener) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Micka (https://github.com/lowener) - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) URL: #452
This PR applies `pre-commit` hooks to normalize whitespace (trimming trailing whitespace and enforcing consistent end-of-file newlines). These rules are already applied to most other RAPIDS repos, so this PR aligns with the norm in RAPIDS. Authors: - Bradley Dice (https://github.com/bdice) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - James Lamb (https://github.com/jameslamb) - Corey J. Nolet (https://github.com/cjnolet) URL: #593
Authors: - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #617
Currently, running `NEIGHBORS_ANN_CAGRA_TEST` takes: [0.96 hours on CUDA 11.8, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418022?pr=596#step:8:1718) [1.59 hours on CUDA 12.5, V100 (x86)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418329?pr=596#step:8:492) [0.28 hours on CUDA 12.0, A100 (ARM)](https://github.com/rapidsai/cuvs/actions/runs/12913409417/job/36012418741?pr=596#step:8:1729) Individual tests should be able to complete in less than an hour. Ideally, less than 10 minutes. This PR proposes some changes to CAGRA tests: - Each CAGRA type is now its own test executable (e.g. `NEIGHBORS_ANN_CAGRA_FLOAT_UINT32_TEST`) - Some parameter combinations were trimmed by ~50% Authors: - Bradley Dice (https://github.com/bdice) - Tamas Bela Feher (https://github.com/tfeher) - Artem M. Chirkin (https://github.com/achirkin) Approvers: - Artem M. Chirkin (https://github.com/achirkin) - Divye Gala (https://github.com/divyegala) URL: #602
Renames `test` directories to `tests` for alignment with the rest of RAPIDS. Closes rapidsai/build-planning#140. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Ray Douglass (https://github.com/raydouglass) - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) URL: #590
Mostly adapted from rapidsai/raft#2026 Authors: - Tarang Jain (https://github.com/tarang-jain) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Artem M. Chirkin (https://github.com/achirkin) URL: #561
This PR uses CUDA 12.8.0 to build and test. xref: rapidsai/build-planning#139 Authors: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - James Lamb (https://github.com/jameslamb) - Ben Frederickson (https://github.com/benfred) - Corey J. Nolet (https://github.com/cjnolet) - Vyas Ramasubramani (https://github.com/vyasr) URL: #621
It has been reported that when the number of search results is large, for example 100, using the multi-CTA algorithm can cause a decrease in recall. This PR is intended to alleviate this low recall issue. close #208 Authors: - Akira Naruse (https://github.com/anaruse) - Tamas Bela Feher (https://github.com/tfeher) - Artem M. Chirkin (https://github.com/achirkin) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) - tsuki (https://github.com/enp1s0) - Artem M. Chirkin (https://github.com/achirkin) URL: #492
After calling `build()`, ideally the CAGRA index contains both the dataset and the graph. But when we do not have sufficient device memory, only the graph is returned. In such a case we need to pass the dataset explicitly to the serialization routines. For serialization in HNSW format with a flat hierarchy, the dataset was not passed. This PR fixes this problem by adding an optional `dataset` argument to `cagra::serialize_to_hnswlib`. Furthermore, to improve execution time, we change from writing a single element at a time to writing a single row of the graph and dataset at a time. Additionally, debug messages for tracking data saving time are added. Authors: - Tamas Bela Feher (https://github.com/tfeher) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Divye Gala (https://github.com/divyegala) URL: #591
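The element-versus-row write change described above can be illustrated with a small sketch. It is not the cuVS serializer: the `CountingBuffer` class, both writer functions, and the little-endian `float32` layout are assumptions made for illustration. The point is only that both strategies produce identical bytes while the row-wise version issues far fewer write calls.

```python
import io
import struct


class CountingBuffer(io.BytesIO):
    """In-memory stream that counts how many times write() is called."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, data):
        self.write_calls += 1
        return super().write(data)


def write_elementwise(rows, out):
    # One write call per scalar value.
    for row in rows:
        for value in row:
            out.write(struct.pack("<f", value))


def write_rowwise(rows, out):
    # One write call per row: pack the whole row into a single buffer first.
    for row in rows:
        out.write(struct.pack(f"<{len(row)}f", *row))


dataset = [[0.0, 1.5, 2.25], [3.0, 4.5, 5.75]]  # hypothetical 2 x 3 dataset
elementwise, rowwise = CountingBuffer(), CountingBuffer()
write_elementwise(dataset, elementwise)
write_rowwise(dataset, rowwise)
```

Here `elementwise.write_calls` is 6 and `rowwise.write_calls` is 2, with byte-identical output.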
A Java API for cuVS for easy integration into Apache Lucene or other Java based projects. Try:
```
./build.sh libcuvs
./build.sh java
```
For generating docs:
```
mvn javadoc:javadoc
```
Prerequisites:
* JDK 22
* Maven 3.9.6+

Todo:
* Generate project panama classes using jextract on every build
* Algorithms other than Cagra
* Prefiltering in cagra

Authors: - Ishan Chattopadhyaya (https://github.com/chatman) - Vivek Narang (https://github.com/narangvivek10) - Chris Hegarty (https://github.com/ChrisHegarty) Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Mike Sarahan (https://github.com/msarahan) URL: #450
#620) hnswlib uses an internal indexing system which assigns an ID to points, atomically, in the order that they are added to the index. When using parallelism to add points to the index, the internal ID may differ from the "label" of the point (label, for us, is just the index of the row in the dataset), because points are added in parallel in no deterministic order. The bug was that I was using the label itself to write out the CPU hierarchy, when I should have been using hnswlib's internal ID for the point associated with that label. Authors: - Divye Gala (https://github.com/divyegala) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #620
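The label-versus-internal-ID mix-up described above can be reproduced in a few lines. This sketch does not call hnswlib: the arrival order, the `slots` storage, and all function names are hypothetical stand-ins for an index that hands out internal IDs in insertion order.

```python
def assign_internal_ids(arrival_order):
    """Map label -> internal ID, with IDs handed out in arrival order."""
    return {label: internal_id for internal_id, label in enumerate(arrival_order)}


# Labels are dataset row indices; with parallel insertion they can arrive in any order.
arrival_order = [2, 0, 3, 1]  # one possible nondeterministic order (assumed)
label_to_internal = assign_internal_ids(arrival_order)

# Per-point data is stored by internal ID, not by label.
slots = [None] * len(arrival_order)
for label, internal_id in label_to_internal.items():
    slots[internal_id] = f"links-of-label-{label}"


def read_links_buggy(label):
    # Bug: treats the label as if it were the internal ID.
    return slots[label]


def read_links_fixed(label):
    # Fix: translate the label to its internal ID before indexing.
    return slots[label_to_internal[label]]
```

With this arrival order, `read_links_buggy(0)` returns the links stored for label 2, while `read_links_fixed(0)` returns the links for label 0. The two functions agree only when points happen to arrive in label order, which is why the bug can go unnoticed with single-threaded insertion.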
Includes several fixes and improvements to Vamana, primarily: - Edge case and bug fixes for Vamana index build (details below) - Documentation added for Vamana - experimental namespace removed - Reduce device memory usage by splitting reverse edge work into batches The edge case fix adds padding to all shared memory size and offset calculations so any dataset dimension is supported (tests added that verify this). A bug was also fixed with the L2 distance metric causing incorrect results in some rare cases. This PR addresses the most pressing items in #393 and stabilizes the index construction sufficiently to remove the experimental namespace. Authors: - Ben Karsin (https://github.com/bkarsin) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Tamas Bela Feher (https://github.com/tfeher) URL: #558
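The padding idea mentioned above (rounding sizes and offsets up so that any dataset dimension yields aligned regions) can be sketched as follows. This is an illustration only: the 16-byte alignment, the helper names, and the example region sizes are assumptions, not values taken from the Vamana kernels.

```python
def align_up(nbytes: int, alignment: int) -> int:
    """Round nbytes up to the next multiple of alignment."""
    if alignment <= 0:
        raise ValueError("alignment must be positive")
    return (nbytes + alignment - 1) // alignment * alignment


def padded_offsets(sizes, alignment=16):
    """Return (offsets, total) for regions laid out back to back with padding.

    Each region starts at a multiple of `alignment`, regardless of how many
    bytes the previous region actually used.
    """
    offsets = []
    cursor = 0
    for size in sizes:
        offsets.append(cursor)
        cursor += align_up(size, alignment)
    return offsets, cursor


# Hypothetical example: a 13-dimensional float32 vector needs 52 bytes,
# which is not a multiple of 16, so the next region would be misaligned
# without padding.
offsets, total = padded_offsets([52, 52, 40], alignment=16)
```

Here `offsets` is `[0, 64, 128]` and `total` is `176`. Without padding, the second region would start at byte 52.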
Authors: - rhdong (https://github.com/rhdong) - Corey J. Nolet (https://github.com/cjnolet) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #560
This PR adds a dedicated documentation page for filtering in the `Getting started` tab, and adds the `cuvs::neighbors::filtering` namespace to the C++ documentation. Authors: - Micka (https://github.com/lowener) Approvers: - Corey J. Nolet (https://github.com/cjnolet) URL: #568
Add functionality to add additional vectors after build to C API Authors: - Ajit Mistry (https://github.com/ajit283) - Corey J. Nolet (https://github.com/cjnolet) - Ben Frederickson (https://github.com/benfred) - Micka (https://github.com/lowener) Approvers: - Ben Frederickson (https://github.com/benfred) URL: #276
This PR points the shared workflow branches back to the default 25.02 branches. xref: rapidsai/build-planning#139
Forward-merge triggered by push to branch-25.02 that creates a PR to keep branch-25.04 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge. See forward-merger docs for more info.