
feat: Updating retrieve online documents v2 to work for other fields for sq… #5082

Merged: 4 commits merged into master on Feb 26, 2025


@franciscojavierarceo (Member) commented Feb 22, 2025

What this PR does / why we need it:

This PR enables full-text search through the retrieve_online_documents endpoint for SQLite Vec. It also adds a new query_string parameter to the SDK method, which can be passed to perform keyword search. This approach has several limitations; for example, the top_k parameter can be misleading, as the example illustrates. Still, it is a good starting point for keyword search that reuses the existing vector retrieval endpoint. As a next step, enabling hybrid search would be beneficial.

It makes keyword search as simple as:

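```python
from feast import FeatureStore

# Assumes a feature repo in the current directory whose "document_embeddings"
# feature view has Embeddings, content, and title features. query_embedding
# must be a list of floats matching the Embeddings vector length.
store = FeatureStore(repo_path=".")

results = store.retrieve_online_documents_v2(
    features=[
        "document_embeddings:Embeddings",
        "document_embeddings:content",
        "document_embeddings:title",
    ],
    query=query_embedding,
    query_string="(content: 5) OR (title: 1) OR (title: 3)",
    top_k=3,
).to_dict()
print(results)
```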
  • feature_server.py:
    • Added optional query_string parameter to the GetOnlineFeaturesRequest class.
    • Updated retrieve_online_documents to support the query_string parameter.
  • feature_store.py:
    • Added optional query_string parameter to retrieve_online_documents_v2.
    • Updated related methods to handle query_string.
  • feature_view.py:
    • Added an assertion to ensure only one vector feature per feature view.
  • milvus.py:
    • Added optional query_string parameter to retrieve_online_documents_v2.
  • online_store.py:
    • Added optional query_string parameter to retrieve_online_documents_v2.
  • sqlite.py:
    • Extensive changes to support text search with BM25, including adding text_search_enabled configuration and handling query_string.
    • Updated SQL operations to support the new functionalities.
  • passthrough_provider.py and provider.py:
    • Updated retrieve_online_documents_v2 to support the query_string.
  • types.py:
    • Added FEAST_VECTOR_TYPES list for handling vector types.
  • example_feature_repo_1.py:
    • Added content and title fields to an example feature view.
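The query_string in the example reads like SQLite FTS5 query syntax (column filters such as content: 5 joined with OR), and the sqlite.py changes mention BM25. The standalone sketch below shows that style of keyword search with Python's built-in sqlite3 module, assuming the SQLite build includes FTS5. The table name docs, the keyword_search helper, and the sample rows are hypothetical and are not Feast's actual schema.

```python
import sqlite3


def keyword_search(conn: sqlite3.Connection, query_string: str, top_k: int):
    """Return up to top_k (doc_id, title, content, score) rows ranked by BM25.

    FTS5's bm25() gives better matches numerically smaller (more negative)
    scores, so sorting ascending puts the best match first.
    """
    return conn.execute(
        "SELECT doc_id, title, content, bm25(docs) FROM docs "
        "WHERE docs MATCH ? ORDER BY bm25(docs) LIMIT ?",
        (query_string, top_k),
    ).fetchall()


conn = sqlite3.connect(":memory:")
# Hypothetical schema: doc_id is stored but excluded from the full-text index.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(doc_id UNINDEXED, title, content)")
conn.executemany(
    "INSERT INTO docs (doc_id, title, content) VALUES (?, ?, ?)",
    [
        ("doc_1", "guide 1", "setup takes 5 minutes"),
        ("doc_2", "guide 2", "nothing relevant here"),
        ("doc_3", "guide 3", "a 5 step process"),
        ("doc_4", "guide 4", "more unrelated text"),
    ],
)

# Same query_string as the SDK example: column filters joined with OR.
results = keyword_search(conn, "(content: 5) OR (title: 1) OR (title: 3)", top_k=3)
for doc_id, title, content, score in results:
    print(doc_id, title, content, round(score, 3))
```

Only doc_1 and doc_3 match here, so the call returns two rows even though top_k=3. In this sketch top_k is only an upper bound on the number of results, which is one way it can look misleading.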

Which issue(s) this PR fixes:

#5081
#5073

Misc

@franciscojavierarceo changed the title from "Updating retrieve online documents v2 to work for other fields for sq…" to "feat: Updating retrieve online documents v2 to work for other fields for sq…" on Feb 22, 2025
@franciscojavierarceo (Member Author) commented:

@HaoXuAI

Review thread on types.py:

```diff
@@ -196,6 +196,17 @@ def __str__(self):
    UnixTimestamp: pyarrow.timestamp("us", tz=_utc_now().tzname()),
}

FEAST_VECTOR_TYPES: List[Union[ValueType, PrimitiveFeastType, ComplexFeastType]] = [
```
Collaborator commented:

Wonder if this is used somewhere? :)

Member Author replied:

```python
vector_bin = serialize_f32(
    val.float_list_val.val, config.online_store.vector_len
)  # type: ignore
if feature_type_dict[feature_name] in FEAST_VECTOR_TYPES:
```
Member Author commented:

@HaoXuAI see here!
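The thread above uses FEAST_VECTOR_TYPES to decide whether a feature value is serialized as a vector, and the PR also asserts that a feature view has at most one vector feature. The sketch below illustrates both ideas in isolation. DType, VECTOR_TYPES, Field, and split_fields are hypothetical stand-ins, not Feast classes; serialize_f32 is a hypothetical re-implementation that only mirrors the call shape seen in the snippet, and it assumes the vector is stored as packed little-endian float32 bytes.

```python
import struct
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional, Tuple


class DType(Enum):
    """Hypothetical stand-in for Feast's value types."""

    STRING = auto()
    FLOAT32_LIST = auto()
    FLOAT64_LIST = auto()


# Hypothetical analogue of FEAST_VECTOR_TYPES: dtypes treated as embeddings.
VECTOR_TYPES = {DType.FLOAT32_LIST, DType.FLOAT64_LIST}


@dataclass
class Field:
    name: str
    dtype: DType


def serialize_f32(vector: List[float], vector_length: int) -> bytes:
    """Pack a vector into little-endian float32 bytes (assumed storage layout)."""
    if len(vector) != vector_length:
        raise ValueError(f"expected {vector_length} values, got {len(vector)}")
    return struct.pack(f"<{vector_length}f", *vector)


def split_fields(fields: List[Field]) -> Tuple[Optional[Field], List[Field]]:
    """Separate the (at most one) vector field from the other fields."""
    vector_fields = [f for f in fields if f.dtype in VECTOR_TYPES]
    assert len(vector_fields) <= 1, "only one vector feature per feature view"
    others = [f for f in fields if f.dtype not in VECTOR_TYPES]
    return (vector_fields[0] if vector_fields else None), others


fields = [
    Field("Embeddings", DType.FLOAT32_LIST),
    Field("content", DType.STRING),
    Field("title", DType.STRING),
]
vector_field, text_fields = split_fields(fields)
blob = serialize_f32([0.1, 0.2, 0.3], vector_length=3)
```

Checking membership in a single collection of vector types keeps the routing decision in one place, so text fields such as content and title fall through to ordinary storage.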

@franciscojavierarceo marked this pull request as ready for review on February 26, 2025, 19:05
@franciscojavierarceo requested review from a team as code owners on February 26, 2025, 19:05
@franciscojavierarceo merged commit fc121c3 into master on Feb 26, 2025
22 of 23 checks passed
@franciscojavierarceo (Member Author) commented:

Technically, there is a flaw here that I still need to resolve: you have to pass in a query embedding even for pure text search, which is silly.
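One hypothetical way to lift that restriction is to make the embedding optional and validate that at least one input is present before choosing a search path. The choose_search_mode helper and its mode names are illustrative only and are not part of Feast; the sketch also assumes keyword search takes precedence when both inputs are given, since hybrid search is listed as a next step.

```python
from typing import List, Optional


def choose_search_mode(
    query: Optional[List[float]] = None,
    query_string: Optional[str] = None,
) -> str:
    """Decide how to search from whichever inputs the caller supplied.

    Hypothetical helper: keyword search no longer needs an embedding, and a
    call with neither input is rejected up front.
    """
    if query_string is not None and query_string.strip():
        # Assumed precedence: keyword search wins until hybrid search exists.
        return "keyword"
    if query is not None:
        return "vector"
    raise ValueError("provide a query embedding, a query_string, or both")


print(choose_search_mode(query_string="(title: 1)"))
print(choose_search_mode(query=[0.1, 0.2, 0.3]))
```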

4 participants