# llm_rag_generate_embeddings_parallel
github-actions[bot] edited this page Dec 31, 2024
Generates embedding vectors for data chunks read from `chunks_source`.
`chunks_source` is expected to contain CSV files with two columns:
- "Chunk" - Chunk of text to be embedded
- "Metadata" - JSON object containing metadata for the chunk
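A minimal sketch of the expected input format, using only the Python standard library. The two column names come from the description above; the example chunk text and the `source` metadata key are hypothetical, and storing `Metadata` as a JSON string inside each CSV cell is an assumption about how the component expects it to be encoded.

```python
import csv
import io
import json

# Hypothetical example chunks; only the column names "Chunk" and "Metadata"
# come from the component description.
rows = [
    {"Chunk": "Pipelines run steps as jobs.", "Metadata": {"source": "intro.md"}},
    {"Chunk": "Components are reusable steps.", "Metadata": {"source": "components.md"}},
]


def write_chunks_csv(rows, out):
    """Write rows as CSV, JSON-encoding each Metadata dict into its cell."""
    writer = csv.DictWriter(out, fieldnames=["Chunk", "Metadata"])
    writer.writeheader()
    for row in rows:
        writer.writerow({"Chunk": row["Chunk"], "Metadata": json.dumps(row["Metadata"])})


def read_chunks_csv(inp):
    """Read rows back, decoding each Metadata cell from JSON."""
    return [
        {"Chunk": r["Chunk"], "Metadata": json.loads(r["Metadata"])}
        for r in csv.DictReader(inp)
    ]


buf = io.StringIO()
write_chunks_csv(rows, buf)
buf.seek(0)
print(read_chunks_csv(buf) == rows)  # True
```

When writing a real file instead of an in-memory buffer, open it with `newline=""` as the `csv` module recommends.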
If `previous_embeddings` is supplied, input chunks are compared to the existing chunks in the embeddings container, and only new or changed chunks are embedded; unchanged chunks are reused.
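The reuse behaviour can be illustrated with a small sketch. It assumes chunks are matched by a SHA-256 hash of the chunk text plus its metadata; the component's actual comparison logic is not documented on this page and may differ. `chunk_key`, `embed_incrementally`, and `fake_embed` are hypothetical names, and `fake_embed` only stands in for a real embedding model.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def chunk_key(chunk: str, metadata: dict) -> str:
    """Hash the chunk text plus its metadata (keys sorted so the hash is stable)."""
    payload = chunk + "\x00" + json.dumps(metadata, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def embed_incrementally(
    chunks: List[Tuple[str, dict]],
    previous: Dict[str, Vector],
    embed_fn: Callable[[List[str]], List[Vector]],
) -> Tuple[Dict[str, Vector], int]:
    """Reuse vectors for keys already in `previous`; embed only new/changed chunks.

    Returns the vectors for the current chunks and how many were newly embedded.
    Chunks that no longer appear in the input are dropped from the result.
    """
    result: Dict[str, Vector] = {}
    to_embed: List[Tuple[str, str]] = []
    for text, meta in chunks:
        key = chunk_key(text, meta)
        if key in previous:
            result[key] = previous[key]  # unchanged chunk: reuse existing vector
        else:
            to_embed.append((key, text))  # new or changed chunk
    if to_embed:
        vectors = embed_fn([text for _, text in to_embed])
        for (key, _), vector in zip(to_embed, vectors):
            result[key] = vector
    return result, len(to_embed)


def fake_embed(texts: List[str]) -> List[Vector]:
    """Stand-in for a real embedding model: one 'dimension' equal to text length."""
    return [[float(len(t))] for t in texts]


first, n1 = embed_incrementally([("a", {}), ("bb", {})], {}, fake_embed)
second, n2 = embed_incrementally([("a", {}), ("ccc", {})], first, fake_embed)
print(n1, n2)  # 2 1
```

On the second run only `"ccc"` is sent to the model; the vector for `"a"` is copied from the first run, and `"bb"` is dropped because it is no longer in the input.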
Version: 0.0.77
Preview
View in Studio: https://ml.azure.com/registries/azureml/components/llm_rag_generate_embeddings_parallel/version/0.0.77
## Inputs

Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
chunks_source | Folder containing chunks to be embedded. | uri_folder | | | |
### If adding to previously generated embeddings
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
embeddings_container | Folder containing previously generated embeddings. Should be the parent folder of the `embeddings` output path used for this component. Input data is compared to the existing embeddings, and only new or changed data is embedded; existing chunks are reused. | uri_folder | | True | |
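For incremental runs, the `embeddings` output is expected to sit directly under `embeddings_container`. A hypothetical pre-flight check for POSIX-style paths might look like this; `is_direct_child` is not part of the component, and the example paths are made up to follow the `/my/prev/embeddings/${name}` pattern used on this page.

```python
from pathlib import PurePosixPath


def is_direct_child(container: str, embeddings_output: str) -> bool:
    """True if embeddings_output sits directly inside container.

    Lexical check only: trailing slashes are ignored, but '..' and
    symlinks are not resolved.
    """
    return PurePosixPath(embeddings_output).parent == PurePosixPath(container)


print(is_direct_child("/my/prev/embeddings", "/my/prev/embeddings/run_002"))  # True
print(is_direct_child("/my/prev/embeddings", "/other/embeddings/run_002"))    # False
```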
### Embeddings settings
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
embeddings_model | The model to use to embed data, e.g. 'hugging_face://model/sentence-transformers/all-mpnet-base-v2' or 'azure_open_ai://deployment/{deployment_name}/model/{model_name}'. | string | hugging_face://model/sentence-transformers/all-mpnet-base-v2 | | |
deployment_validation | URI file containing information on whether the Azure OpenAI deployments, if used, have been validated. | uri_file | | True | |
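The two `embeddings_model` forms shown above can be split into their parts with a small parser. This is an illustrative sketch that only handles those two URI shapes; the component itself may accept other forms, and `parse_embeddings_model`, `EmbeddingsModelSpec`, and the deployment name `my-embeddings` are hypothetical. Hugging Face model ids such as `sentence-transformers/all-mpnet-base-v2` contain a `/`, so everything after the `model` segment is kept as the model name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmbeddingsModelSpec:
    kind: str                         # "hugging_face" or "azure_open_ai"
    model: str                        # model id or model name
    deployment: Optional[str] = None  # Azure OpenAI deployment, if any


def parse_embeddings_model(uri: str) -> EmbeddingsModelSpec:
    """Split an embeddings_model URI of one of the two documented shapes."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in {uri!r}")
    parts = rest.split("/")
    # hugging_face://model/{model_id}, where model_id may itself contain '/'
    if scheme == "hugging_face" and len(parts) >= 2 and parts[0] == "model":
        return EmbeddingsModelSpec(kind=scheme, model="/".join(parts[1:]))
    # azure_open_ai://deployment/{deployment_name}/model/{model_name}
    if (
        scheme == "azure_open_ai"
        and len(parts) >= 4
        and parts[0] == "deployment"
        and parts[2] == "model"
    ):
        return EmbeddingsModelSpec(
            kind=scheme, model="/".join(parts[3:]), deployment=parts[1]
        )
    raise ValueError(f"unrecognized embeddings_model URI: {uri!r}")


print(parse_embeddings_model(
    "hugging_face://model/sentence-transformers/all-mpnet-base-v2"
))
# EmbeddingsModelSpec(kind='hugging_face', model='sentence-transformers/all-mpnet-base-v2', deployment=None)
spec = parse_embeddings_model(
    "azure_open_ai://deployment/my-embeddings/model/text-embedding-ada-002"
)
print(spec.deployment, spec.model)  # my-embeddings text-embedding-ada-002
```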
## Outputs

Name | Description | Type |
---|---|---|
embeddings | Where to save the data with embeddings. If `embeddings_container` is supplied, this should be a subfolder of it, typically named using '${name}', e.g. /my/prev/embeddings/${name}. | uri_folder |
processed_file_names | Text file containing the names of the files that were processed. | uri_file |
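The `processed_file_names` output can be used to check which input files a run handled. This sketch assumes the file lists one name per line, which this page does not specify; `read_processed_names`, `unprocessed`, and the example file names are hypothetical.

```python
from typing import Iterable, List, Set


def read_processed_names(text: str) -> Set[str]:
    """Parse processed_file_names contents, assuming one file name per line."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def unprocessed(source_files: Iterable[str], processed: Set[str]) -> List[str]:
    """Return the source files missing from the processed set, sorted."""
    return sorted(name for name in source_files if name not in processed)


# Hypothetical file names for illustration.
processed = read_processed_names("a.csv\nc.csv\n")
print(unprocessed(["a.csv", "b.csv", "c.csv"], processed))  # ['b.csv']
```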