
How to do batches of vector search? #553

Open
Wongboo opened this issue Dec 30, 2024 · 4 comments
Labels
feature request New feature or request

Comments

Wongboo commented Dec 30, 2024

How do I do batches of vector search? Each batch contains multiple queries and multiple databases, and queries should only search the corresponding database in the batch. For example:

import cupy as cp
from cuvs.neighbors import cagra

batch_size = 16
n_samples = 5000
n_features = 50
n_queries = 1000
dataset = cp.random.random_sample((batch_size, n_samples, n_features),
                                  dtype=cp.float32)
# Build index
index = cagra.build(cagra.IndexParams(), dataset)
# Search using the built index
queries = cp.random.random_sample((batch_size, n_queries, n_features),
                                  dtype=cp.float32)
# do some indexing and searches; queries only search the corresponding database

I've searched the whole documentation and the issues. I'd really appreciate your answer!
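For reference, the batched semantics being asked for here (each query batch searching only its own database) can be pinned down with a brute-force NumPy sketch. This is only a stand-in to make the expected shapes concrete, not a cuVS API; the sizes are shrunk so it runs quickly.

```python
import numpy as np

# Shapes mirror the question's (batch_size, n_samples/n_queries, n_features).
rng = np.random.default_rng(0)
batch_size, n_samples, n_features, n_queries, k = 4, 100, 8, 5, 3
dataset = rng.random((batch_size, n_samples, n_features), dtype=np.float32)
queries = rng.random((batch_size, n_queries, n_features), dtype=np.float32)

# Pairwise squared L2 distances, computed independently per batch:
# query batch i is compared only against dataset batch i.
d2 = ((queries[:, :, None, :] - dataset[:, None, :, :]) ** 2).sum(-1)

neighbors = np.argsort(d2, axis=-1)[..., :k]            # (batch, n_queries, k)
distances = np.take_along_axis(d2, neighbors, axis=-1)  # (batch, n_queries, k)
```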

Wongboo added the feature request label on Dec 30, 2024
Wongboo changed the title from "Is it do batches of vector search?" to "How to do batches of vector search?" on Jan 5, 2025
cjnolet (Member) commented Jan 7, 2025

Hi @Wongboo thanks for your patience as most of the team was on holiday last week. We have separate API docs for the Python API. Does this help? https://docs.rapids.ai/api/cuvs/nightly/python_api/neighbors_cagra/#cuvs.neighbors.cagra.search
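The linked `search` API takes a single 2D `(n_queries, n_features)` array and returns `(n_queries, k)` results. As a shape-level illustration only, the same contract can be sketched with exact brute-force k-NN (no cuVS or GPU required; the function name `knn` is just for this sketch):

```python
import numpy as np

def knn(dataset, queries, k):
    """Brute-force k-NN with squared L2 distances.

    dataset: (n_samples, n_features); queries: (n_queries, n_features).
    Returns (distances, neighbors), each of shape (n_queries, k).
    """
    d2 = ((queries**2).sum(1, keepdims=True)
          - 2.0 * queries @ dataset.T
          + (dataset**2).sum(1))
    neighbors = np.argsort(d2, axis=1)[:, :k]
    distances = np.take_along_axis(d2, neighbors, axis=1)
    return distances, neighbors

rng = np.random.default_rng(0)
distances, neighbors = knn(rng.random((500, 16), dtype=np.float32),
                           rng.random((32, 16), dtype=np.float32), k=10)
```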

Wongboo (Author) commented Jan 8, 2025

> Hi @Wongboo thanks for your patience as most of the team was on holiday last week. We have separate API docs for the Python API. Does this help? https://docs.rapids.ai/api/cuvs/nightly/python_api/neighbors_cagra/#cuvs.neighbors.cagra.search

Thanks, but this has a significant difference: note that my example contains an extra batch-size dimension.

cjnolet (Member) commented Jan 8, 2025

@Wongboo, I'm not sure what you mean here, but you can pass a 2d array to the search method to query multiple vectors at a time. If you have multiple such arrays, they would need to be passed into multiple calls of search(). If you use different device_resources with different CUDA streams in the calls to search, you can overlap them across batches. You can also take a look at the persistent=True option if you'd like to improve overlap further across searches.

If you are saying that you need to have multiple indexes, you can certainly build them and search them concurrently as the call to search is asynchronous when a device_resources instance is passed in. In other words, you can have multiple different indexes on the same GPU at the same time and query them individually (and concurrently).
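To make that suggestion concrete, the "multiple indexes, one search per index" pattern can be sketched on CPU with brute-force stand-ins for `cagra.build`/`cagra.search` (the real cuVS calls, and any stream or resources handling, are not shown; with real cuVS the per-batch calls could overlap across CUDA streams as described above):

```python
import numpy as np

def build(dataset):
    # Stand-in for cagra.build: keep the raw data plus precomputed norms.
    return {"data": dataset, "norms": (dataset**2).sum(1)}

def search(index, queries, k):
    # Stand-in for cagra.search: exact k-NN by squared L2 distance.
    d2 = ((queries**2).sum(1, keepdims=True)
          - 2.0 * queries @ index["data"].T + index["norms"])
    nbrs = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, nbrs, axis=1), nbrs

rng = np.random.default_rng(1)
batch_size = 4
datasets = rng.random((batch_size, 200, 16), dtype=np.float32)
queries = rng.random((batch_size, 10, 16), dtype=np.float32)

# One index per batch; one search() call per index.
indexes = [build(datasets[i]) for i in range(batch_size)]
results = [search(indexes[i], queries[i], k=5) for i in range(batch_size)]
```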

Wongboo (Author) commented Jan 8, 2025

> @Wongboo, I'm not sure what you mean here, but you can pass a 2d array to the search method to query multiple vectors at a time. If you have multiple such arrays, they would need to be passed into multiple calls of search(). If you use different device_resources with different CUDA streams in the calls to search, you can overlap them across batches. You can also take a look at the persistent=True option if you'd like to improve overlap further across searches.
>
> If you are saying that you need to have multiple indexes, you can certainly build them and search them concurrently as the call to search is asynchronous when a device_resources instance is passed in. In other words, you can have multiple different indexes on the same GPU at the same time and query them individually (and concurrently).

Thanks for your kind and helpful response! I think I understand what you mean by "concurrently build and search". Is it similar to the pseudocode below? But it seems there is currently no Python API to get device_resources or to pass persistent=True.

import cupy as cp
from cuvs.neighbors import cagra

batch_size = 16
n_samples = 5000
n_features = 50
n_queries = 1000
dataset = cp.random.random_sample((batch_size, n_samples, n_features),
                                  dtype=cp.float32)
# Build one index per batch (`resources` is hypothetical here; see above)
index = [None] * batch_size
for i in range(batch_size):
    index[i] = cagra.build(cagra.IndexParams(), dataset[i])
resources.sync()
# Search using the built indexes
queries = cp.random.random_sample((batch_size, n_queries, n_features),
                                  dtype=cp.float32)
k = 10
search_params = cagra.SearchParams(
    max_queries=100,
    itopk_size=64
)
# Using a pooling allocator reduces overhead of temporary array
# creation during search. This is useful if multiple searches
# are performed with the same query size.
distances = [None] * batch_size
neighbors = [None] * batch_size
for i in range(batch_size):
    distances[i], neighbors[i] = cagra.search(search_params, index[i],
                                              queries[i], k,
                                              resources=resources)
resources.sync()
