components llm_ingest_dbcopilot_faiss_e2e
github-actions[bot] edited this page Nov 7, 2024
Single-job pipeline that chunks data from an AzureML DB datastore and creates a FAISS embeddings index.
Version: 0.0.66
View in Studio: https://ml.azure.com/registries/azureml/components/llm_ingest_dbcopilot_faiss_e2e/version/0.0.66
Inputs
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
db_datastore | Database datastore URI, in the format 'azureml://datastores/{datastore_name}'. | string | | | |
sample_data | Sample data to be used for data ingestion. Format: 'azureml:samples-test:1'. | uri_folder | | True | |
path: "azureml:samples-test:1"
data ingest setting
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
embeddings_model | The model used to generate embeddings. Format: 'azure_open_ai://endpoint/{endpoint_name}/deployment/{deployment_name}/model/{model_name}'. | string | | | |
chat_aoai_deployment_name | The name of the chat AOAI deployment. | string | | True | |
embedding_aoai_deployment_name | The name of the embedding AOAI deployment. | string | | | |
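
The URI formats given for `db_datastore` and `embeddings_model` can be assembled with plain string formatting. The following is a minimal sketch: the helper names and the sample values are hypothetical, and only the format strings are taken from the descriptions above.

```python
def db_datastore_uri(datastore_name: str) -> str:
    """Build a db_datastore value: 'azureml://datastores/{datastore_name}'."""
    return f"azureml://datastores/{datastore_name}"


def embeddings_model_uri(endpoint_name: str, deployment_name: str, model_name: str) -> str:
    """Build an embeddings_model value in the documented format."""
    return (
        f"azure_open_ai://endpoint/{endpoint_name}"
        f"/deployment/{deployment_name}/model/{model_name}"
    )


# Hypothetical sample values, for illustration only.
print(db_datastore_uri("my_sql_datastore"))
print(embeddings_model_uri("my-aoai", "my-embedding-deployment", "my-embedding-model"))
```

Keeping the format strings in one place makes it harder to mistype a path segment when the values are passed to the component.
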
grounding settings
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
max_tables | | integer | | True | |
max_columns | | integer | | True | |
max_rows | | integer | | True | |
max_sampling_rows | | integer | | True | |
max_text_length | | integer | | True | |
max_knowledge_pieces | | integer | | True | |
selected_tables | The list of tables to be ingested. If not specified, all tables will be ingested. Format: ["table1","table2","table3"] | string | | True | |
column_settings | | string | | True | |
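
Because `selected_tables` is typed as a string, the list of table names has to be serialized before it is passed in. The sketch below assumes the value is read as a JSON array, which matches the documented format but is not confirmed by this page; `encode_selected_tables` is a hypothetical helper name.

```python
import json


def encode_selected_tables(tables: list[str]) -> str:
    """Serialize table names as a compact JSON array string."""
    if not all(isinstance(t, str) and t for t in tables):
        raise ValueError("every table name must be a non-empty string")
    # Compact separators reproduce the documented form: ["table1","table2","table3"]
    return json.dumps(tables, separators=(",", ":"))


print(encode_selected_tables(["table1", "table2", "table3"]))
# prints ["table1","table2","table3"]
```
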
copilot settings
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
tools | The name of the tools for dbcopilot. Supported tools: "tsql", "python". Format: ["tsql", "python"] | string | | True | |
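
The `tools` value can be checked against the supported names before it is serialized. This is a sketch only: the supported set comes from the description above, while `encode_tools` is a hypothetical helper and JSON parsing on the component side is an assumption.

```python
import json

# Supported tool names, taken from the tools description above.
SUPPORTED_TOOLS = ("tsql", "python")


def encode_tools(tools: list[str]) -> str:
    """Validate tool names and return them as a JSON array string."""
    unsupported = [t for t in tools if t not in SUPPORTED_TOOLS]
    if unsupported:
        raise ValueError(f"unsupported tools: {unsupported}")
    # Default separators reproduce the documented form: ["tsql", "python"]
    return json.dumps(tools)


print(encode_tools(["tsql", "python"]))
# prints ["tsql", "python"]
```
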
deploy settings
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
endpoint_name | The name of the endpoint | string | | | |
deployment_name | The name of the deployment | string | blue | | |
mir_environment | The name of the mir environment. Format: azureml://registries/{registry_name}/environments/llm-dbcopilot-mir | string | | | |
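
The deploy settings can be grouped in a small container that carries the documented default for `deployment_name`. In this sketch, `DeploySettings`, `mir_environment_uri`, and the sample endpoint and registry names are hypothetical; treating `endpoint_name` and `mir_environment` as required is an assumption based on their empty Optional cells.

```python
from dataclasses import asdict, dataclass


@dataclass
class DeploySettings:
    endpoint_name: str
    mir_environment: str
    deployment_name: str = "blue"  # default listed in the table above


def mir_environment_uri(registry_name: str) -> str:
    """Build a mir_environment value in the documented format."""
    return f"azureml://registries/{registry_name}/environments/llm-dbcopilot-mir"


# Hypothetical sample values, for illustration only.
settings = DeploySettings(
    endpoint_name="my-dbcopilot-endpoint",
    mir_environment=mir_environment_uri("my-registry"),
)
print(asdict(settings))
```
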
compute settings
Name | Description | Type | Default | Optional | Enum |
---|---|---|---|---|---|
serverless_instance_count | | integer | 1 | True | |
serverless_instance_type | | string | Standard_DS3_v2 | True | |
embedding_connection | Azure OpenAI workspace connection ARM ID for embeddings. | string | | True | |
llm_connection | Azure OpenAI workspace connection ARM ID for the LLM. | string | | True | |
temperature | | number | 0.0 | True | |
top_p | | number | 0.0 | True | |
include_builtin_examples | | boolean | True | True | |
knowledge_pieces | The list of knowledge pieces to be used for grounding. | string | | True | |
include_views | Whether to turn on views. | boolean | | True | |
instruct_template | The instruct template for the LLM. | string | | True | |
managed_identity_enabled | Whether to connect using managed identity. | boolean | False | True | |
egress_public_network_access | Whether the resource may send outbound traffic to the public Internet. Allowed values: disabled, enabled. | string | enabled | True | |
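
The defaults listed in the compute settings table can be captured in one place and merged with caller overrides. The following is a sketch under stated assumptions: `resolve_settings` is a hypothetical helper, not part of the component, and the only validation shown is the two-value choice documented for `egress_public_network_access`.

```python
# Defaults copied from the compute settings table above.
DEFAULTS = {
    "serverless_instance_count": 1,
    "serverless_instance_type": "Standard_DS3_v2",
    "temperature": 0.0,
    "top_p": 0.0,
    "include_builtin_examples": True,
    "managed_identity_enabled": False,
    "egress_public_network_access": "enabled",
}

# The two choices named in the egress_public_network_access description.
EGRESS_CHOICES = ("enabled", "disabled")


def resolve_settings(overrides: dict) -> dict:
    """Return the defaults with caller overrides applied on top."""
    settings = {**DEFAULTS, **overrides}
    if settings["egress_public_network_access"] not in EGRESS_CHOICES:
        raise ValueError(
            f"egress_public_network_access must be one of {EGRESS_CHOICES}"
        )
    return settings


resolved = resolve_settings({"serverless_instance_count": 2})
print(resolved["serverless_instance_count"], resolved["serverless_instance_type"])
# prints 2 Standard_DS3_v2
```

Optional inputs without a listed default, such as `embedding_connection`, pass through unchanged when supplied as overrides.
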
Outputs
Name | Description | Type |
---|---|---|
grounding_index | | uri_folder |
db_context | | uri_folder |