
How to do batches of vector search? #553

Open
Wongboo opened this issue Dec 30, 2024 · 4 comments
Labels
feature request New feature or request

Comments

Wongboo commented Dec 30, 2024

How do I do batches of vector search? Each batch contains multiple queries and multiple databases, and queries should only search the corresponding database in the batch. For example:

import cupy as cp
from cuvs.neighbors import cagra

batch_size = 16
n_samples = 5000
n_features = 50
n_queries = 1000
dataset = cp.random.random_sample((batch_size, n_samples, n_features),
                                  dtype=cp.float32)
# Build index
index = cagra.build(cagra.IndexParams(), dataset)
# Search using the built index
queries = cp.random.random_sample((batch_size, n_queries, n_features),
                                  dtype=cp.float32)
# do some indexing and searches; queries only search the corresponding database

I've searched the whole documentation and the issues. I'd really appreciate your answer!
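For reference, the batched semantics being asked for here (each query batch searching only its own database) can be pinned down with a brute-force NumPy sketch. This is only a stand-in to make the expected shapes concrete, not a cuVS API; the sizes are shrunk so it runs quickly.

```python
import numpy as np

# Shapes mirror the question's (batch_size, n_samples/n_queries, n_features).
rng = np.random.default_rng(0)
batch_size, n_samples, n_features, n_queries, k = 4, 100, 8, 5, 3
dataset = rng.random((batch_size, n_samples, n_features), dtype=np.float32)
queries = rng.random((batch_size, n_queries, n_features), dtype=np.float32)

# Pairwise squared L2 distances, computed independently per batch:
# query batch i is compared only against dataset batch i.
d2 = ((queries[:, :, None, :] - dataset[:, None, :, :]) ** 2).sum(-1)

neighbors = np.argsort(d2, axis=-1)[..., :k]            # (batch, n_queries, k)
distances = np.take_along_axis(d2, neighbors, axis=-1)  # (batch, n_queries, k)
```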

Wongboo added the feature request label on Dec 30, 2024
Wongboo changed the title from "Is it do batches of vector search?" to "How to do batches of vector search?" on Jan 5, 2025
cjnolet (Member) commented Jan 7, 2025

Hi @Wongboo thanks for your patience as most of the team was on holiday last week. We have separate API docs for the Python API. Does this help? https://docs.rapids.ai/api/cuvs/nightly/python_api/neighbors_cagra/#cuvs.neighbors.cagra.search
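The linked `search` API takes a single 2D `(n_queries, n_features)` array and returns `(n_queries, k)` results. As a shape-level illustration only, the same contract can be sketched with exact brute-force k-NN (no cuVS or GPU required; the function name `knn` is just for this sketch):

```python
import numpy as np

def knn(dataset, queries, k):
    """Brute-force k-NN with squared L2 distances.

    dataset: (n_samples, n_features); queries: (n_queries, n_features).
    Returns (distances, neighbors), each of shape (n_queries, k).
    """
    d2 = ((queries**2).sum(1, keepdims=True)
          - 2.0 * queries @ dataset.T
          + (dataset**2).sum(1))
    neighbors = np.argsort(d2, axis=1)[:, :k]
    distances = np.take_along_axis(d2, neighbors, axis=1)
    return distances, neighbors

rng = np.random.default_rng(0)
distances, neighbors = knn(rng.random((500, 16), dtype=np.float32),
                           rng.random((32, 16), dtype=np.float32), k=10)
```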

Wongboo (Author) commented Jan 8, 2025

> Hi @Wongboo thanks for your patience as most of the team was on holiday last week. We have separate API docs for the Python API. Does this help? https://docs.rapids.ai/api/cuvs/nightly/python_api/neighbors_cagra/#cuvs.neighbors.cagra.search

Thanks, but this has a significant difference: note that my example contains an extra batch-size dimension.

cjnolet (Member) commented Jan 8, 2025

@Wongboo, I'm not sure what you mean here, but you can pass a 2d array to the search method to query multiple vectors at a time. If you have multiple such arrays, they would need to be passed into multiple calls of search(). If you use different device_resources with different CUDA streams in the calls to search, you can overlap them across batches. You can also take a look at the persistent=True option if you'd like to improve overlap further across searches.

If you are saying that you need to have multiple indexes, you can certainly build them and search them concurrently as the call to search is asynchronous when a device_resources instance is passed in. In other words, you can have multiple different indexes on the same GPU at the same time and query them individually (and concurrently).
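To make that suggestion concrete, the "multiple indexes, one search per index" pattern can be sketched on CPU with brute-force stand-ins for `cagra.build`/`cagra.search` (the real cuVS calls, and any stream or resources handling, are not shown; with real cuVS the per-batch calls could overlap across CUDA streams as described above):

```python
import numpy as np

def build(dataset):
    # Stand-in for cagra.build: keep the raw data plus precomputed norms.
    return {"data": dataset, "norms": (dataset**2).sum(1)}

def search(index, queries, k):
    # Stand-in for cagra.search: exact k-NN by squared L2 distance.
    d2 = ((queries**2).sum(1, keepdims=True)
          - 2.0 * queries @ index["data"].T + index["norms"])
    nbrs = np.argsort(d2, axis=1)[:, :k]
    return np.take_along_axis(d2, nbrs, axis=1), nbrs

rng = np.random.default_rng(1)
batch_size = 4
datasets = rng.random((batch_size, 200, 16), dtype=np.float32)
queries = rng.random((batch_size, 10, 16), dtype=np.float32)

# One index per batch; one search() call per index.
indexes = [build(datasets[i]) for i in range(batch_size)]
results = [search(indexes[i], queries[i], k=5) for i in range(batch_size)]
```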

Wongboo (Author) commented Jan 8, 2025

> @Wongboo, I'm not sure what you mean here, but you can pass a 2d array to the search method to query multiple vectors at a time. If you have multiple such arrays, they would need to be passed into multiple calls of search(). If you use different device_resources with different CUDA streams in the calls to search, you can overlap them across batches. You can also take a look at the persistent=True option if you'd like to improve overlap further across searches.
>
> If you are saying that you need to have multiple indexes, you can certainly build them and search them concurrently as the call to search is asynchronous when a device_resources instance is passed in. In other words, you can have multiple different indexes on the same GPU at the same time and query them individually (and concurrently).

Thanks for your kind and helpful response! I think I understand what you mean by "concurrently build and search". Is it similar to the pseudocode below? But it seems there is currently no Python API to get device_resources or to pass persistent=True.

import cupy as cp
from cuvs.neighbors import cagra

batch_size = 16
n_samples = 5000
n_features = 50
n_queries = 1000
dataset = cp.random.random_sample((batch_size, n_samples, n_features),
                                  dtype=cp.float32)
# Build one index per batch (`resources` is hypothetical here; see above)
index = [None] * batch_size
for i in range(batch_size):
    index[i] = cagra.build(cagra.IndexParams(), dataset[i])
resources.sync()
# Search using the built indexes
queries = cp.random.random_sample((batch_size, n_queries, n_features),
                                  dtype=cp.float32)
k = 10
search_params = cagra.SearchParams(
    max_queries=100,
    itopk_size=64
)
# Using a pooling allocator reduces overhead of temporary array
# creation during search. This is useful if multiple searches
# are performed with the same query size.
distances = [None] * batch_size
neighbors = [None] * batch_size
for i in range(batch_size):
    distances[i], neighbors[i] = cagra.search(search_params, index[i],
                                              queries[i], k,
                                              resources=resources)
resources.sync()
