This connector extracts search documents from pages on SharePoint sites. It retrieves every page on every site that it has access to.
This connector uses a Microsoft Azure App Registration. Grant the app either the Microsoft Graph / Sites.Read.All permission, or configure Microsoft Graph / Sites.Selected appropriately to allow access to the desired site(s).
Configure the client ID, client secret, and tenant ID using the information from the registered App.
Additionally, this connector requires Azure OpenAI services or an OpenAI API key to generate embedding vectors for documents.
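Before running the connector, it can help to confirm that the client ID, client secret, and tenant ID from the App Registration are accepted. The sketch below is not part of the connector. It assumes the standard Microsoft identity platform client-credentials flow and uses only the Python standard library; `build_token_request` and `fetch_token` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Microsoft identity platform v2.0 token endpoint and the Graph default scope.
TOKEN_URL = "https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
GRAPH_SCOPE = "https://graph.microsoft.com/.default"


def build_token_request(client_id: str, client_secret: str, tenant_id: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth2 client-credentials token request."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": GRAPH_SCOPE,
        }
    ).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL.format(tenant_id=tenant_id), data=body, method="POST"
    )


def fetch_token(client_id: str, client_secret: str, tenant_id: str) -> str:
    """Send the request and return the access token.

    Raises urllib.error.HTTPError if the credentials are rejected.
    """
    request = build_token_request(client_id, client_secret, tenant_id)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

A successful `fetch_token(...)` call only shows that the credentials are valid. It does not prove that the Sites permissions have been granted, so the connector may still be unable to read a given site.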
Create a YAML config file based on the following template.
```yaml
sharepoint_client_id: <sharepoint_client_id>
sharepoint_client_secret: <sharepoint_client_secret>
sharepoint_tenant_id: <sharepoint_tenant_id>

embedding_model:
  azure_openai:
    key: <key>
    endpoint: <endpoint>
```
Note that an embedding model needs to be appropriately configured. This example shows how to configure an Azure OpenAI services model, but you can use other supported models.
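A quick pre-flight check can catch a missing key before the connector runs. The sketch below is illustrative only: `missing_keys` is a hypothetical helper, it operates on an already-parsed config (for instance, the dict returned by a YAML parser), and its list of required keys is taken from the template above rather than from the connector's own validation.

```python
# Top-level keys that appear in the config template above.
REQUIRED_KEYS = (
    "sharepoint_client_id",
    "sharepoint_client_secret",
    "sharepoint_tenant_id",
    "embedding_model",
)


def missing_keys(config: dict) -> list:
    """Return the required top-level keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Example: a parsed config that mirrors the template.
example_config = {
    "sharepoint_client_id": "<sharepoint_client_id>",
    "sharepoint_client_secret": "<sharepoint_client_secret>",
    "sharepoint_tenant_id": "<sharepoint_tenant_id>",
    "embedding_model": {
        "azure_openai": {"key": "<key>", "endpoint": "<endpoint>"},
    },
}

problems = missing_keys(example_config)
if problems:
    print("Missing config keys:", ", ".join(problems))
```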
The following settings are optional. Defaults are provided, so you don't have to configure them manually.
include_text specifies whether to include the original document text alongside the embedded content.
```yaml
embedding_model:  # in the same block as above
  azure_openai:
    version: <version>  # "2024-03-01-preview"
    deployment_name: <deployment_name>  # "Embedding_3_small"
    model: <model>  # "text-embedding-3-small"
  chunk_size: 512
  chunk_overlap: 50

include_text: <include_text>  # False
```
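To give a feel for how chunk_size and chunk_overlap interact, here is a simplified sliding-window splitter. It is an illustration under stated assumptions, not the connector's implementation: it takes a list of already-split tokens as input, whereas the connector's actual tokenizer and splitting rules are not described here. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens, chunk_size=512, chunk_overlap=50):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts (chunk_size - chunk_overlap) tokens after the
    previous one, so consecutive windows share chunk_overlap tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a window reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Example: 1000 dummy tokens with the default settings.
chunks = chunk_tokens(list(range(1000)))
print([len(chunk) for chunk in chunks])  # prints [512, 512, 76]
```

With the defaults, each chunk begins with the last 50 tokens of the previous one. A larger overlap preserves more context across chunk boundaries at the cost of embedding more tokens overall.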
See Output Config for more information on the optional output config.
Follow the installation instructions to install metaphor-connectors in your environment (or virtualenv). Make sure to include the sharepoint or all extra.
Run the following command to test the connector locally:
```shell
metaphor sharepoint <config_file>
```
Manually verify the output after the run finishes.