Lucene CuVS Integration

This is an integration of CuVS, a GPU-accelerated vector search library from NVIDIA (formerly part of RAFT), into Apache Lucene.

Architecture

For this initial integration, CuVS is plugged in by extending Lucene's IndexSearcher. The project has two layers: (1) a Java/JNI layer in the lucene directory and (2) a CuVS/C++ layer in the cuda directory.
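The two-layer split can be pictured as a Java-side interface whose implementation crosses into native code. The sketch below is hypothetical: `VectorSearchBackend` and `BruteForceBackend` are names invented for illustration, not classes from this project, and the exact CPU search only stands in for the JNI call into the CuVS/C++ layer, which would build and query a GPU index instead.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.stream.IntStream;

public class LayeredSearchSketch {

    /** Hypothetical boundary between the Java layer and a native (JNI) backend. */
    interface VectorSearchBackend {
        /** Builds an index over the given vectors (on the GPU in the real project). */
        void index(float[][] vectors);

        /** Returns the ids of the topK nearest vectors to the query. */
        int[] search(float[] query, int topK);
    }

    /** Hypothetical CPU stand-in for the CuVS/C++ layer: exact search by squared L2 distance. */
    static final class BruteForceBackend implements VectorSearchBackend {
        private float[][] data = new float[0][];

        @Override
        public void index(float[][] vectors) {
            this.data = vectors; // a GPU backend would copy vectors to device memory here
        }

        @Override
        public int[] search(float[] query, int topK) {
            // Sort all ids by distance to the query and keep the closest topK.
            return IntStream.range(0, data.length)
                    .boxed()
                    .sorted(Comparator.comparingDouble(i -> squaredL2(data[i], query)))
                    .limit(topK)
                    .mapToInt(Integer::intValue)
                    .toArray();
        }

        private static double squaredL2(float[] a, float[] b) {
            double sum = 0;
            for (int d = 0; d < a.length; d++) {
                double diff = a[d] - b[d];
                sum += diff * diff;
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        VectorSearchBackend backend = new BruteForceBackend();
        backend.index(new float[][] {{0f, 0f}, {1f, 1f}, {5f, 5f}});
        System.out.println(Arrays.toString(backend.search(new float[] {0.9f, 0.9f}, 2))); // [1, 0]
    }
}
```

Swapping `BruteForceBackend` for a JNI-backed implementation would leave the calling code unchanged.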

As a working example, OpenAI's Wikipedia corpus (25k documents, each with a content vector) can be indexed, after which the provided sample query (query.txt) can be run against the index.
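Indexing the corpus comes down to pulling a vector out of one CSV column per row. A minimal sketch, assuming the vector column holds a bracketed, comma-separated list such as `[0.1, -0.2, ...]` (as in OpenAI's published file) and that each record fits on one line; `VectorColumnParser`, `splitCsvLine`, and `parseVector` are hypothetical names, and the project itself may use a CSV library instead. Unzipping the dataset is not shown.

```java
import java.util.ArrayList;
import java.util.List;

public class VectorColumnParser {

    /** Splits one single-line CSV record into fields, honoring double quotes and "" escapes. */
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);   // commas inside quotes are kept
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    /** Parses a field such as "[0.1, -0.2, 0.3]" into a float[] of the expected dimension. */
    static float[] parseVector(String field, int dimensions) {
        String body = field.trim();
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("Not a bracketed vector: " + field);
        }
        String[] parts = body.substring(1, body.length() - 1).split(",");
        if (parts.length != dimensions) {
            throw new IllegalArgumentException(
                    "Expected " + dimensions + " dimensions but found " + parts.length);
        }
        float[] vector = new float[dimensions];
        for (int d = 0; d < dimensions; d++) {
            vector[d] = Float.parseFloat(parts[d].trim());
        }
        return vector;
    }

    public static void main(String[] args) {
        String row = "1,https://example.org,Title,\"Some text, with a comma\",\"[0.5, 0.5]\",\"[0.1, -0.2, 0.3]\",1";
        float[] v = parseVector(splitCsvLine(row).get(5), 3); // column 5 holds the content vector here
        System.out.println(v[1]); // -0.2
    }
}
```

Checking the parsed length against the expected dimension catches a wrong column index early, before any vectors reach the index.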

⚠️ This is not production ready yet.

Running

Install RAFT (https://docs.rapids.ai/api/raft/stable/build/#installation)

Download the dataset file (the wget command below fetches it).

Set the correct path to RAFT in the cuda/CMakeLists.txt file, then run the following (Wikipedia OpenAI benchmark):

wget -c https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip
mvn package
java -jar lucene/target/cuvs-searcher-lucene-0.0.1-SNAPSHOT-jar-with-dependencies.jar <datasetfile> <vector_index_column> <name_of_vector_field> <numDocs> <dimensions> <queryFile>

# Example
java -jar lucene/target/cuvs-searcher-lucene-0.0.1-SNAPSHOT-jar-with-dependencies.jar vector_database_wikipedia_articles_embedded.zip 5 content_vector 25000 768 query.txt
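The six positional arguments map naturally onto a small value type. This sketch is hypothetical (`BenchmarkArgs` is not the jar's actual class) and assumes `<vector_index_column>` is 0-based, which appears to match the example, where column 5 is `content_vector` in OpenAI's CSV.

```java
import java.nio.file.Path;

public class BenchmarkArgsSketch {

    /** Hypothetical holder for the six positional arguments of the benchmark jar. */
    record BenchmarkArgs(Path datasetFile, int vectorColumnIndex, String vectorField,
                         int numDocs, int dimensions, Path queryFile) {

        static BenchmarkArgs parse(String[] args) {
            if (args.length != 6) {
                throw new IllegalArgumentException("Usage: <datasetfile> <vector_index_column> "
                        + "<name_of_vector_field> <numDocs> <dimensions> <queryFile>");
            }
            BenchmarkArgs parsed = new BenchmarkArgs(
                    Path.of(args[0]),
                    Integer.parseInt(args[1]),   // assumed 0-based column holding the vector
                    args[2],                     // Lucene field name for the vector
                    Integer.parseInt(args[3]),   // how many rows to index
                    Integer.parseInt(args[4]),   // expected vector length
                    Path.of(args[5]));
            if (parsed.numDocs() <= 0 || parsed.dimensions() <= 0 || parsed.vectorColumnIndex() < 0) {
                throw new IllegalArgumentException(
                        "numDocs and dimensions must be positive; column index must be >= 0");
            }
            return parsed;
        }
    }

    public static void main(String[] args) {
        BenchmarkArgs a = BenchmarkArgs.parse(new String[] {
            "vector_database_wikipedia_articles_embedded.zip", "5", "content_vector", "25000", "768", "query.txt"});
        System.out.println(a.vectorField() + " " + a.numDocs() + " " + a.dimensions()); // content_vector 25000 768
    }
}
```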

Benchmarks

Wikipedia (768 dimensions, 1M vectors):

Setup	Indexing time	Indexing speedup	Search latency	Search speedup
CuVS (RTX 4090, NN_DESCENT)	38.80 sec	25.6x	2 ms	4x
CuVS (RTX 2080 Ti, NN_DESCENT)	47.67 sec	20.8x	3 ms	2.7x
Lucene HNSW (Ryzen 7700X, single thread)	992.37 sec	-	8 ms	-

Wikipedia (2048 dimensions, 1M vectors):

Setup	Indexing time	Indexing speedup
CuVS (RTX 4090, NN_DESCENT)	55.84 sec	23.8x
Lucene HNSW (Ryzen 7950X, single thread)	1329.9 sec	-
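Each improvement (speedup) figure is the Lucene HNSW time divided by the CuVS time; for example, 992.37 / 38.80 ≈ 25.6. The helper below (a hypothetical name, not part of the project) reproduces the figures from the tables, rounded to one decimal place.

```java
public class SpeedupSketch {

    /** Speedup = baseline time / accelerated time, rounded to one decimal place. */
    static double speedup(double baselineTime, double acceleratedTime) {
        if (baselineTime <= 0 || acceleratedTime <= 0) {
            throw new IllegalArgumentException("Times must be positive");
        }
        return Math.round(baselineTime / acceleratedTime * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        // Indexing times in seconds and search latencies in ms, as listed in the benchmark tables.
        System.out.println(speedup(992.37, 38.80)); // 25.6  (RTX 4090, 768 dims, indexing)
        System.out.println(speedup(992.37, 47.67)); // 20.8  (RTX 2080 Ti, indexing)
        System.out.println(speedup(8, 3));          // 2.7   (RTX 2080 Ti, search)
        System.out.println(speedup(1329.9, 55.84)); // 23.8  (RTX 4090, 2048 dims, indexing)
    }
}
```

Note that both baselines are single-threaded Lucene HNSW runs on different CPUs (Ryzen 7700X and 7950X), so the two tables are not directly comparable with each other.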

Next steps

Instead of extending IndexSearcher, implement a KnnVectorsFormat with a corresponding KnnVectorsWriter and KnnVectorsReader for tighter integration.

Contributors

Vivek Narang, SearchScale
Ishan Chattopadhyaya, SearchScale & Committer, Apache Lucene & Solr
Kishore Angani, SearchScale
Noble Paul, SearchScale & Committer, Apache Lucene & Solr
