Introduce Hybrid Search API using SQLite FTS5 + Vector search #1158

varshaprasad96 · 2025-02-19T21:41:49Z

🚀 Describe the new functionality needed

Currently, Llama-Stack supports optimized chunked writes (PR #1094) for efficient SQLite-based storage. However, there is no built-in Hybrid Search API that combines FTS5 and sqlite-vec to enable both lexical and semantic retrieval.

This issue proposes the addition of a Hybrid Search API that allows users to:

Store text documents with both a full-text index and vector embeddings.
Perform hybrid search that ranks results by combining BM25-based text relevance and vector similarity.
Utilize chunked writes (from PR #1094, "feat: Chunk sqlite-vec writes") to optimize insertions for large datasets.
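
To make the three points above concrete, here is a minimal sketch using only Python's standard `sqlite3` module. It assumes the SQLite build includes FTS5, and it stores embeddings as plain BLOBs in an ordinary table as a stand-in for a sqlite-vec virtual table. The table names, `build_store`, `keyword_search`, and the batch size are all hypothetical and not taken from Llama-Stack or PR #1094.

```python
import sqlite3
import struct
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def build_store(docs, batch_size=2):
    """Store (doc_id, text, embedding) triples in an in-memory database.

    Text goes into an FTS5 table; embeddings go into a plain table as
    packed float32 BLOBs (assumption: a stand-in for sqlite-vec).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs_fts USING fts5(doc_id UNINDEXED, content)"
    )
    conn.execute(
        "CREATE TABLE docs_vec (doc_id TEXT PRIMARY KEY, embedding BLOB)"
    )
    for batch in chunked(docs, batch_size):
        # One transaction per batch keeps each write small and bounded.
        with conn:
            conn.executemany(
                "INSERT INTO docs_fts (doc_id, content) VALUES (?, ?)",
                [(doc_id, text) for doc_id, text, _ in batch],
            )
            conn.executemany(
                "INSERT INTO docs_vec (doc_id, embedding) VALUES (?, ?)",
                [
                    (doc_id, struct.pack(f"{len(emb)}f", *emb))
                    for doc_id, _, emb in batch
                ],
            )
    return conn


def keyword_search(conn, query, k=5):
    """Return doc ids matching `query`, best BM25 match first.

    FTS5's bm25() returns lower values for better matches, so an
    ascending sort puts the most relevant rows first.
    """
    rows = conn.execute(
        "SELECT doc_id FROM docs_fts WHERE docs_fts MATCH ? "
        "ORDER BY bm25(docs_fts) LIMIT ?",
        (query, k),
    ).fetchall()
    return [row[0] for row in rows]
```

A vector query over `docs_vec` would produce a second ranked list of doc ids, and the two lists can then be fused.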

Ref: https://github.com/liamca/sqlite-hybrid-search/tree/main - The idea would be to apply Reciprocal Rank Fusion (RRF) to the FTS5 and vector-based search results, so that documents ranked highly in both lists are prioritized.
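
As a rough illustration of the fusion step, the sketch below implements Reciprocal Rank Fusion over any number of ranked lists of document ids. Each document scores the sum of `1 / (k + rank)` over the lists it appears in. The function name and the default `k=60` (a commonly used constant) are assumptions for illustration, not something taken from the referenced repository or from Llama-Stack.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one ranked list.

    `rankings` is an iterable of lists, each ordered best-first.
    Ranks are 1-based; ties in fused score are broken by doc id.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fts_results = ["a", "b", "c"]   # e.g. FTS5 / BM25 order
vec_results = ["b", "d", "a"]   # e.g. vector-similarity order
print(reciprocal_rank_fusion([fts_results, vec_results]))
# ['b', 'a', 'd', 'c']
```

Documents "a" and "b" appear in both lists, so they outrank "c" and "d", which each appear in only one.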

💡 Why is this needed? What if we don't build it?

Building Hybrid Search with RRF should yield more accurate and more relevant results from Llama-Stack's current SQLite vector DB implementation.

Other thoughts

No response

varshaprasad96 · 2025-02-19T21:44:14Z

cc: @franciscojavierarceo

franciscojavierarceo · 2025-02-19T22:55:57Z

We could probably make the choice between these configurable.

varshaprasad96 · 2025-02-20T18:31:49Z

/assign @varshaprasad96

franciscojavierarceo · 2025-02-26T21:19:09Z

@varshaprasad96 I've implemented an MVP of full text search with BM25 in @feast-dev that can be useful in understanding what an implementation could look like (see here: feast-dev/feast#5082).

There will be obvious differences and it'll require expanding the API to allow for passing an input of the raw string/query along with the embedding.

I'd also recommend adding full text search first alongside vector search and then exposing hybrid, as they all have nuances.

Given the variance in how other providers will handle their implementation, it may make sense to share a short RFC outlining the approach.
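
For discussion purposes, one possible shape for the expanded query input is sketched below: a request that carries the raw query string alongside the embedding, with a configurable search mode. Every name here (`SearchMode`, `ChunkQuery`, the field names) is hypothetical and does not reflect the actual Llama-Stack or Feast API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class SearchMode(str, Enum):
    VECTOR = "vector"    # embedding similarity only
    KEYWORD = "keyword"  # full-text (BM25) only
    HYBRID = "hybrid"    # both, fused afterwards


@dataclass
class ChunkQuery:
    """Hypothetical query request carrying both text and embedding."""

    mode: SearchMode
    query_text: Optional[str] = None
    embedding: Optional[List[float]] = None
    k: int = 5

    def __post_init__(self):
        # Keyword and hybrid search need the raw string.
        if self.mode in (SearchMode.KEYWORD, SearchMode.HYBRID) and not self.query_text:
            raise ValueError(f"mode={self.mode.value} requires query_text")
        # Vector and hybrid search need the embedding.
        if self.mode in (SearchMode.VECTOR, SearchMode.HYBRID) and self.embedding is None:
            raise ValueError(f"mode={self.mode.value} requires embedding")
```

Validating at construction time keeps each provider free to implement only the modes it supports while rejecting incomplete requests early.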

varshaprasad96 · 2025-02-26T23:52:14Z

Thanks @franciscojavierarceo! I've started implementing FTS in sqlite here.

+1, makes sense. I'll modify the implementation and create a PR to support FTS and vector search in parallel. I'll then create an RFC to introduce changes to the API so that it's easier to implement hybrid search with other DBs as well.

varshaprasad96 added the "enhancement" (New feature or request) label on Feb 19, 2025

franciscojavierarceo mentioned this issue Feb 20, 2025

Document and benchmark performance tradeoffs between sqlite-vec and FAISS inline VectorDB providers #1165

Open
