Partial loading implementation for FAISS HNSW #2405

Open · wants to merge 8 commits into base: main
Conversation


@0ctopus13prime (Collaborator) commented Jan 17, 2025

Description

RFC : #2401

The OpenSearch KNN plugin supports three engines: NMSLIB, FAISS, and Lucene.
The first two, NMSLIB and FAISS, are native engines that require all vector-related data structures (such as HNSW graphs) to be loaded into memory for search operations.
For large workloads, this memory cost can quickly become substantial if quantization techniques are not applied.
Therefore, 'Partial Loading' must be enabled as an option in native engines to control the available memory for KNN search. The objective of partial loading is twofold:

To allow users to control the maximum memory available for KNN searching.
To enable native engines to partially load only the necessary data within the constraint.
If we look closely, an HNSW graph mainly consists of the following:

Full-precision 32-bit vectors.
Graph representations.
Metadata such as dimensions, number of vectors, space type, headers, etc.
Of the above items, most of the memory is consumed by the full-precision vectors: 4 bytes * the number of vectors * the number of dimensions.
FAISS stores these vectors in a Flat Index; during serialization and deserialization they are written to and read from the file and placed in main memory, which increases memory consumption.
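For example, an index of 1 million 768-dimensional float32 vectors needs roughly 4 bytes * 1,000,000 * 768 ≈ 3 GB for the vectors alone, before counting graph links and metadata.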

Related Issues

Resolves #[Issue number to be closed when this PR is merged]
#2401

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@0ctopus13prime 0ctopus13prime changed the title Partial loading implementation for FAISS HNSW> Partial loading implementation for FAISS HNSW Jan 18, 2025
@0ctopus13prime

Please note that I will make sure all System.out debugging statements are removed once the implementation is finalized, before merging.


package org.opensearch.knn.partialloading;

public class KdyPerfCheck {

This is a temporary class for tracking performance.
It will be removed before merging to main.

@@ -106,7 +106,7 @@ public void flush(int maxDoc, final Sorter.DocMap sortMap) throws IOException {
 final QuantizationState quantizationState = train(field.getFieldInfo(), knnVectorValuesSupplier, totalLiveDocs);
 // Check only after quantization state writer finish writing its state, since it is required
 // even if there are no graph files in segment, which will be later used by exact search
-if (shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {
+if (false /*TMP*/ && shouldSkipBuildingVectorDataStructure(totalLiveDocs)) {

This is temporary code. It will be reverted before merging.


0ctopus13prime commented Jan 22, 2025

Partial Loading Code Review Breakdown

1. Goal

This document provides a comprehensive overview of this large partial loading PR, to minimize the time reviewers need to complete a review.

2. Scope

Design Document : RFC

1. Supported Vector Types

  • Only float32 vectors are supported initially.
  • Binary and byte vector indices are not yet supported.

2. Supported Metrics

  • Dot product.
  • Euclidean distance.

3. Filtered Query

  • Partial loading supports filtered queries.

4. Nested Vectors

  • Supported for scenarios where parent documents contain multiple vectors.
  • Integer parent IDs are provided in KNNWeight.

5. Sparse Vector Documents

  • Supports cases where not all Lucene documents contain vectors.
  • Handles indexing documents without vectors.

3. Breakdown

The PR can be divided into two main parts, with the search part further split into five subparts:

  1. Index Loading
    1. Graceful resource cleanup.
  2. Searching
    1. Basic framework.
    2. Normal case: No filtering, no parent IDs, and all documents have indexed vectors.
    3. Filtering:
      1. Filtered queries.
      2. Handling deletions.
    4. Handling parent IDs.
    5. Sparse vector documents.

4. [Part 1] Index partial loading

  1. NativeMemoryLoadStrategy
    1. Fetches mapping configuration from settings to check if the current KNN field supports partial loading. If partial loading is disabled, it falls back to the default mode, loading everything into memory.
      1. Currently, retrieving this configuration from settings is not implemented and can be replaced with a placeholder for now.
  2. Partial Loading in FAISS
    1. Source : partialloading.faiss package.
    2. FaissIndex.partialLoad(InputStream input) is the entry point for partially loading a FAISS index by reading bytes from the provided InputStream. The main idea is to mark starting offsets and load bytes on demand (a simplified sketch follows this list).
      1. FaissIndex.partialLoad is a Java port of a corresponding function in FAISS.
        1. Please refer to FAISS C++ source code.
      2. Supported index types:
        1. IxMp - FaissIdMapIndex
          1. Contains a mapping from an internal vector id to a Lucene document id.
        2. IHNf - FaissHNSWFlatIndex
          1. Contains FaissHNSW
        3. IxF2 - FaissIndexFlat
          1. For Euclidean distance.
        4. IxFI - FaissIndexFlat
          1. For inner product distance.
  3. Resource Cleanup
    1. PartialLoadingContext may hold a non-null IndexInput reference, which is passed to a search thread for vector searches (e.g., HNSW graph search).
    2. Graceful resource cleanup is managed in NativeMemoryAllocation.IndexAllocation.close, which invokes PartialLoadingContext.close to release the IndexInput.

5. [Part 2] Search

2.1. Partial Loading Basic Framework

  1. The flow reaches KNNWeight.doANNSearch.
  2. Retrieves the configured partial loading mode from settings. [Not yet implemented]
  3. If partial loading is disabled, it falls back to the default search using C++ FAISS.
  4. If partial loading is enabled:
    1. Obtains PartialLoadingContext from IndexAllocation.
    2. Retrieves the search strategy based on the partial loading mode.
      1. Currently, the only available strategy is MemoryEfficientPartialLoadingSearchStrategy, which accesses and loads bytes on demand without caching.
    3. Copies IndexInput.
    4. Extracts the efSearch value from the query.
    5. Calls queryIndex of the selected search strategy.
    6. Invokes FaissIndex.search to perform the search (a simplified sketch of this flow follows the list of sources below).
  5. Sources
    1. KNNWeight
    2. MemoryEfficientPartialLoadingSearchStrategy
    3. PartialLoadingContext
    4. FaissIndex
    5. FaissIdMapIndex → FaissHNSWFlatIndex → FaissIndexFlat
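A minimal sketch of this flow follows. The types and the method signature are simplified stand-ins (PartialLoadingSearchStrategy below is illustrative, not the PR's actual interface); only the overall shape matches the steps above.

```java
import java.io.IOException;
import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.BitSet;

// Illustrative shape of the partial-loading branch in KNNWeight.doANNSearch.
final class PartialLoadingSearchFlowSketch {

    /** Stand-in for the PR's search strategy abstraction. */
    interface PartialLoadingSearchStrategy {
        long[] queryIndex(IndexInput input, float[] query, int k, int efSearch, BitSet filterIdsBitSet)
                throws IOException;
    }

    long[] doANNSearch(boolean partialLoadingEnabled,
                       IndexInput sharedInput,                 // held by the IndexAllocation's PartialLoadingContext
                       PartialLoadingSearchStrategy strategy,  // e.g. the memory-efficient, no-caching strategy
                       float[] query, int k, int efSearch,
                       BitSet filterIdsBitSet) throws IOException {
        if (!partialLoadingEnabled) {
            // Fall back to the default search through C++ FAISS.
            throw new UnsupportedOperationException("delegate to native FAISS");
        }
        // Clone the IndexInput so this search thread gets its own file pointer,
        // then let the strategy drive FaissIndex.search with the query's efSearch.
        IndexInput perThreadInput = sharedInput.clone();
        return strategy.queryIndex(perThreadInput, query, k, efSearch, filterIdsBitSet);
    }
}
```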

2.2. Normal Case — Happy Path

This is the straightforward case: no filtering IDs, no parent IDs, and all documents have indexed vectors.

  1. FaissIdMapIndex:
    1. Operates without a grouper or selector.
    2. Delegates the search directly to the nested index, FaissHNSWFlatIndex.
  2. FaissHNSWFlatIndex:
    1. Creates a max-heap based on the distance metric (see the sketch after this list).
    2. Passes the heap to FaissHNSW to initiate HNSW search.
  3. FaissHNSW:
    1. Executes the HNSW search.
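The max-heap mentioned above can be pictured with the generic top-k collector below. It is an illustration, not the PR's implementation; smaller distance is better here, as for Euclidean distance, and the ordering flips for inner product.

```java
import java.util.PriorityQueue;

// Generic top-k collector backed by a max-heap keyed on distance: the worst
// (largest-distance) candidate sits at the head and is evicted when a closer one is found.
final class TopKDistanceHeap {
    record Candidate(int docId, float distance) {}

    private final int k;
    private final PriorityQueue<Candidate> heap =
            new PriorityQueue<>((a, b) -> Float.compare(b.distance(), a.distance()));

    TopKDistanceHeap(int k) { this.k = k; }

    void collect(int docId, float distance) {
        if (heap.size() < k) {
            heap.add(new Candidate(docId, distance));
        } else if (distance < heap.peek().distance()) {
            heap.poll();
            heap.add(new Candidate(docId, distance));
        }
    }

    /** Current worst distance in the heap; usable as a pruning bound during HNSW traversal. */
    float worstDistance() {
        return heap.isEmpty() ? Float.POSITIVE_INFINITY : heap.peek().distance();
    }
}
```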

2.3. Having a Filtering

With Filtering:

  • If filtering is applied, filterIdsBitSet will have a non-null value in doANNSearch.
    • Live bits (representing "live" documents) are included in the bitset only when a filter is specified in the query.

No Integer List Conversion:

  • Unlike C++ FAISS, running a vector search with partial loading in the JVM does not require converting the bitset into an integer list.
  • The search can use the bitset directly as provided, as sketched below.
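A sketch of how the filter is applied (a hypothetical helper; the PR integrates this check into the traversal itself):

```java
import org.apache.lucene.util.BitSet;

// Hypothetical helper: the filter bitset from doANNSearch is consulted directly
// during traversal; no int[] of allowed ids is materialized, unlike the C++ path.
final class FilterCheck {
    /** filterIdsBitSet is null when no filter is specified; otherwise it already reflects live docs. */
    static boolean accept(BitSet filterIdsBitSet, int docId) {
        return filterIdsBitSet == null || filterIdsBitSet.get(docId);
    }
}
```

Each visited candidate passes through this check before being pushed into the result heap.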

2.4. Having Parent Ids

Parent IDs Handling:

  • Parent IDs are passed down to FaissIndex.

Conversion to BitSet:

  • The passed parent IDs are converted into a bitset.
    • Refer to the comments in BitSetParentIdGrouper for details.

Grouper Creation:

  • A grouper is created to map child document IDs to their corresponding parent document IDs.

Parent-Level BFS in HNSW:

  • During BFS in HNSW, the max heap based on distance considers only the parent IDs.
    • For implementation, see GroupedDistanceMaxHeap.
    • Example: Child IDs (1, 2, 3) with parent ID '4'. The max heap evaluates distances at the parent level only.
      • However, we still keep track of the best-scoring child for each parent ID (a simplified sketch follows below).
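One simplified way to picture the parent-level collection is sketched below. The PR's GroupedDistanceMaxHeap is more involved and maintains the top-k bound during traversal; the names and structure here are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative parent-level collection for nested vectors: keep only the best
// child per parent, then take the top-k parents by their best child's distance.
final class ParentGroupedTopK {
    record Entry(int parentId, int bestChildId, float bestDistance) {}

    private final int k;
    private final Map<Integer, Entry> bestPerParent = new HashMap<>();

    ParentGroupedTopK(int k) { this.k = k; }

    /** Remember the closest child seen so far for this parent. */
    void collect(int parentId, int childId, float distance) {
        Entry current = bestPerParent.get(parentId);
        if (current == null || distance < current.bestDistance()) {
            bestPerParent.put(parentId, new Entry(parentId, childId, distance));
        }
    }

    /** Top-k parents ranked by their best child's distance (smaller is better). */
    Entry[] topK() {
        PriorityQueue<Entry> maxHeap =
                new PriorityQueue<>((a, b) -> Float.compare(b.bestDistance(), a.bestDistance()));
        for (Entry e : bestPerParent.values()) {
            if (maxHeap.size() < k) {
                maxHeap.add(e);
            } else if (e.bestDistance() < maxHeap.peek().bestDistance()) {
                maxHeap.poll();
                maxHeap.add(e);
            }
        }
        return maxHeap.toArray(new Entry[0]);
    }
}
```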

2.5. Sparse Vector Documents

  1. Handling Sparse Vectors:
  • If some documents lack indexed vectors, vectorIdToDocIdMapping in FaissIdMapIndex will hold a non-null value (a sketch follows the example below).
    • Example: If only documents 1, 5, and 10 have vectors, the mapping will be:
      • 0 → 1
      • 1 → 5
      • 2 → 10
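A minimal sketch of this mapping; the field name vectorIdToDocIdMapping matches the PR, but the wrapper class around it is illustrative:

```java
// Illustrative wrapper around the sparse-vector mapping: vector ordinals are dense
// (0..n-1) while Lucene doc ids may have gaps, so results are remapped before returning.
final class VectorIdToDocId {
    private final int[] vectorIdToDocIdMapping; // e.g. {1, 5, 10} for the example above; null when dense

    VectorIdToDocId(int[] vectorIdToDocIdMapping) {
        this.vectorIdToDocIdMapping = vectorIdToDocIdMapping;
    }

    int docId(int internalVectorId) {
        return vectorIdToDocIdMapping == null
                ? internalVectorId                          // every document has a vector: identity mapping
                : vectorIdToDocIdMapping[internalVectorId]; // sparse case: remap ordinal to doc id
    }
}
```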
