From 7c83a271e62dfa311d0e01db7e6eba84d361b09d Mon Sep 17 00:00:00 2001
From: cmuhao
Date: Tue, 30 Apr 2024 22:18:13 -0700
Subject: [PATCH 1/3] add vector database doc

Signed-off-by: cmuhao
---
 docs/reference/alpha-vector-database.md | 96 +++++++++++++++++++++++++
 1 file changed, 96 insertions(+)
 create mode 100644 docs/reference/alpha-vector-database.md

diff --git a/docs/reference/alpha-vector-database.md b/docs/reference/alpha-vector-database.md
new file mode 100644
index 00000000000..ae94d49b924
--- /dev/null
+++ b/docs/reference/alpha-vector-database.md
@@ -0,0 +1,96 @@
+# [Alpha] Vector Database
+**Warning**: This is an _experimental_ feature. To our knowledge, this is stable, but there are still rough edges in the experience. Contributions are welcome!
+
+## Overview
+A vector database allows users to store and retrieve embeddings. Feast provides general APIs to store and retrieve embeddings.
+
+## Integration
+Below are supported vector databases and implemented features:
+
+| Vector Database | Retrieval | Indexing |
+|-----------------|-----------|----------|
+| Pgvector        | [x]       | [ ]      |
+| Elasticsearch   | [ ]       | [ ]      |
+| Milvus          | [ ]       | [ ]      |
+| Faiss           | [ ]       | [ ]      |
+
+
+## Example
+
+See [https://github.com/feast-dev/feast-workshop/blob/rag/module_4_rag](https://github.com/feast-dev/feast-workshop/blob/rag/module_4_rag) for an example of how to use a vector database.
+
+### **Prepare offline embedding dataset**
+Run the following commands to prepare the embedding dataset:
+```shell
+python pull_states.py
+python batch_score_documents.py
+```
+The output will be stored in `data/city_wikipedia_summaries.csv`.
+
+### **Initialize Feast feature store and materialize the data to the online store**
+Use the feature_store.yaml file to initialize the feature store. This will use the data as offline store, and Pgvector as online store.
+
+```yaml
+project: feast_demo_local
+provider: local
+registry:
+  registry_type: sql
+  path: postgresql://@localhost:5432/feast
+online_store:
+  type: postgres
+  pgvector_enabled: true
+  vector_len: 384
+  host: 127.0.0.1
+  port: 5432
+  database: feast
+  user: ""
+  password: ""
+
+
+offline_store:
+  type: file
+entity_key_serialization_version: 2
+```
+Run the following command to apply the feature store configuration:
+
+```shell
+feast apply
+```
+
+Then run the following command to materialize the data to the online store:
+
+```shell
+!feast materialize 2024-04-01T00:00:00 2024-04-17T00:00:00
+```
+
+### **Prepare a query embedding**
+```python
+from batch_score_documents import run_model, TOKENIZER, MODEL
+from transformers import AutoTokenizer, AutoModel
+
+question = "the most populous city in the U.S. state of Texas?"
+
+tokenizer = AutoTokenizer.from_pretrained(TOKENIZER)
+model = AutoModel.from_pretrained(MODEL)
+query_embedding = run_model(question, tokenizer, model)
+query = query_embedding.detach().cpu().numpy().tolist()[0]
+```
+
+### **Retrieve the top 5 similar documents**
+First create a feature store instance, and use the `retrieve_online_documents` API to retrieve the top 5 similar documents to the specified query.
+
+```python
+from feast import FeatureStore
+store = FeatureStore(repo_path=".")
+features = store.retrieve_online_documents(
+    feature="city_embeddings:Embeddings",
+    query=query,
+    top_k=5
+).to_dict()
+
+def print_online_features(features):
+    for key, value in sorted(features.items()):
+        print(key, " : ", value)
+
+print_online_features(features)
+```
\ No newline at end of file

From 792b238948697adc12fa9e027a256d7a59b68b76 Mon Sep 17 00:00:00 2001
From: Hao Xu
Date: Thu, 2 May 2024 17:08:12 +0000
Subject: [PATCH 2/3] Update Doc

Signed-off-by: Hao Xu
---
 docs/reference/alpha-vector-database.md | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/docs/reference/alpha-vector-database.md b/docs/reference/alpha-vector-database.md
index ae94d49b924..9e63bc7e081 100644
--- a/docs/reference/alpha-vector-database.md
+++ b/docs/reference/alpha-vector-database.md
@@ -51,16 +51,31 @@ offline_store:
   type: file
 entity_key_serialization_version: 2
 ```
-Run the following command to apply the feature store configuration:
+Run the following command in terminal to apply the feature store configuration:
 
 ```shell
 feast apply
 ```
 
-Then run the following command to materialize the data to the online store:
+Note that when you run `feast apply` you are going to apply the following Feature View that we will use for retrieval later:
 
-```shell
-!feast materialize 2024-04-01T00:00:00 2024-04-17T00:00:00
+```python
+city_embeddings_feature_view = FeatureView(
+name="city_embeddings",
+entities=[item],
+schema=[
+Field(name="Embeddings", dtype=Array(Float32)),
+],
+source=source,
+ttl=timedelta(hours=2),
+)
+```
+
+Then run the following command in the terminal to materialize the data to the online store:
+
+```shell
+CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
+feast materialize-incremental $CURRENT_TIME
 ```
 
 ### **Prepare a query embedding**

From fbc59457905c4b2b3a6c1e4ae434a2d236cc9a4a Mon Sep 17 00:00:00 2001
From: cmuhao
Date: Sun, 5 May 2024 21:52:45 -0700
Subject: [PATCH 3/3] update format

Signed-off-by: cmuhao
---
 docs/reference/alpha-vector-database.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/reference/alpha-vector-database.md b/docs/reference/alpha-vector-database.md
index 9e63bc7e081..3b0c924d84b 100644
--- a/docs/reference/alpha-vector-database.md
+++ b/docs/reference/alpha-vector-database.md
@@ -61,13 +61,13 @@ Note that when you run `feast apply` you are going to apply the following Featur
 
 ```python
 city_embeddings_feature_view = FeatureView(
-name="city_embeddings",
-entities=[item],
-schema=[
-Field(name="Embeddings", dtype=Array(Float32)),
-],
-source=source,
-ttl=timedelta(hours=2),
+    name="city_embeddings",
+    entities=[item],
+    schema=[
+        Field(name="Embeddings", dtype=Array(Float32)),
+    ],
+    source=source,
+    ttl=timedelta(hours=2),
 )
 ```
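
Note on the retrieval step in the patched doc: `retrieve_online_documents` is described as returning the `top_k` documents most similar to the query embedding. A minimal sketch of that idea in plain Python may help readers see what the step does conceptually. It is not Feast's or Pgvector's implementation: the `cosine_similarity` and `top_k_similar` helpers, the toy three-dimensional vectors, and the choice of cosine similarity are all assumptions made for illustration, since the patch does not state which distance metric the online store uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, documents, top_k=5):
    """Hypothetical helper: rank {name: embedding} entries by similarity to query."""
    scored = [
        (name, cosine_similarity(query, embedding))
        for name, embedding in documents.items()
    ]
    # Highest similarity first, then keep at most top_k entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (made up); the config in the patch sets vector_len to 384.
documents = {
    "Houston": [0.9, 0.1, 0.0],
    "Austin": [0.7, 0.3, 0.1],
    "Boston": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]

results = top_k_similar(query, documents, top_k=2)
for name, score in results:
    print(name, round(score, 3))
```

In the documented flow, the ranking happens inside the online store rather than in client code; the sketch only shows why a query vector close to a document's embedding puts that document near the top of the result.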