feat: allow to use customized GraphRAG settings.yaml #387

ronchengang · 2024-10-12T04:32:49Z

Description

GraphRAG index generation depends on settings.yaml. This file has many configurable items, and many of them are crucial to generating the GraphRAG index, such as the LLM-related settings, request_timeout, and concurrent_requests. However, it is hard to put all of these configurations in .env, because there are too many of them and they are not easy to manage there. The few configurable items in the current .env.example are enough for people who use OpenAI to build the GraphRAG index, but not for those who use private models such as Ollama and need more advanced configuration. Several posts in the current issue list ask for a more flexible way to use advanced configurations.

This change allows users to supply their own settings.yaml. To use it, users prepare the settings.yaml.example file themselves, put it in the root folder, and add a new environment variable named USE_CUSTOMIZED_GRAPHRAG_SETTING to the .env file. When its value is true, the user-provided settings are applied during the GraphRAG indexing process. In this way, users can run self-hosted models such as Ollama and customize other configurations.
Fixes:
[REQUEST] - Can we have a settings.yaml file on the GraphRAG indexing module? #299
Ollama Graph Embedding Fails in Local LLM Setup #283
[REQUEST] - Can graphrag support other free LLMs like qwen2? #245
[BUG] - Ollama OpenAI not working #224
How to set the graphRAG with local ollama #212
and more
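The USE_CUSTOMIZED_GRAPHRAG_SETTING toggle can be sketched roughly as follows. This is a minimal sketch, not the code merged in this PR: the function name apply_custom_settings, the extra truthy values, and the choice to copy the file to settings.yaml inside the GraphRAG project root are assumptions. It is meant to run after GraphRAG's own init step has created the project root, so the user's file overwrites the generated default.

```python
import os
import shutil
from pathlib import Path

# Values treated as "enabled". Assumption: the PR only mentions "true".
_TRUTHY = {"true", "1", "yes"}


def apply_custom_settings(repo_root: Path, graphrag_root: Path) -> bool:
    """Copy the user's settings.yaml.example into a GraphRAG project root.

    Hypothetical helper: returns True if the custom settings were applied,
    False if USE_CUSTOMIZED_GRAPHRAG_SETTING is unset or not truthy.
    """
    flag = os.getenv("USE_CUSTOMIZED_GRAPHRAG_SETTING", "false")
    if flag.strip().lower() not in _TRUTHY:
        return False

    source = repo_root / "settings.yaml.example"
    if not source.is_file():
        # Fail loudly instead of silently falling back to the defaults.
        raise FileNotFoundError(f"custom settings enabled but {source} is missing")

    graphrag_root.mkdir(parents=True, exist_ok=True)
    # Overwrite the settings.yaml that GraphRAG reads from its --root folder.
    shutil.copy(source, graphrag_root / "settings.yaml")
    return True
```

Raising when the flag is on but the file is missing is a design choice: silently indexing with the default OpenAI settings would reproduce exactly the Ollama failures listed above.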

Type of change

New features (non-breaking change).
Bug fix (non-breaking change).
Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

I have performed a self-review of my code.
I have added thorough tests if it is a core feature.
There is a reference to the original bug report and related work.
I have commented on my code, particularly in hard-to-understand areas.
The feature is well documented.

libs/ktem/ktem/index/file/graph/pipelines.py

settings.yaml.example

taprosoft · 2024-10-14T08:22:48Z

@ronchengang this works well enough for the Indexing process, but beware that the Retrieval process uses a different set of params to choose the Embedding model:

kotaemon/libs/ktem/ktem/index/file/graph/pipelines.py, line 227 in f0f3b4b:

    text_embedder = OpenAIEmbedding(

Please also synchronize this setting to make everything seamless.
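One way to keep retrieval consistent with indexing is to read the query-time embedding parameters from the same settings.yaml. A minimal sketch, assuming the file has already been parsed into a dict (for example with a YAML library) and follows the embeddings.llm layout of GraphRAG's default settings.yaml with model, api_base, and api_key keys; check this against your GraphRAG version. The EmbeddingSettings class, the function name, and the fallback model are hypothetical, and how the result is passed to OpenAIEmbedding in pipelines.py is not shown.

```python
import os
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical container for the embedding params used at query time."""
    model: str
    api_base: Optional[str]
    api_key: Optional[str]


def embedding_settings_from_graphrag(
    config: Mapping[str, Any],
    default_model: str = "text-embedding-3-small",  # assumed fallback
) -> EmbeddingSettings:
    """Extract embedding params from a parsed GraphRAG settings.yaml dict."""
    llm = (config.get("embeddings") or {}).get("llm") or {}

    def resolve(value: Any) -> Any:
        # settings.yaml often references env vars as ${VAR}; expand them.
        return os.path.expandvars(value) if isinstance(value, str) else value

    return EmbeddingSettings(
        model=resolve(llm.get("model", default_model)),
        api_base=resolve(llm.get("api_base")),
        api_key=resolve(llm.get("api_key")),
    )
```

Reading from the parsed config, rather than from separate .env variables, means indexing and retrieval cannot drift apart: whatever embedding model built the index is the one used to embed queries.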

zzll22 · 2024-10-14T11:30:07Z

@ronchengang

Hello, I used your code but got an error:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/__main__.py", line 85, in <module>
    index_cli(
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/cli.py", line 119, in index_cli
    _initialize_project_at(root_dir, progress_reporter)
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/cli.py", line 190, in _initialize_project_at
    raise ValueError(msg)
ValueError: Project already initialized at /Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/files/graphrag/0929df4a-e1eb-43dd-b96f-d17f5d1acee0
How can I solve this?


ronchengang · 2024-10-14T13:47:19Z

In reply to @taprosoft's comment above about synchronizing the Retrieval embedding settings (pipelines.py, line 227 in f0f3b4b):

@taprosoft done.

ronchengang · 2024-10-14T13:52:16Z

In reply to @zzll22's "ValueError: Project already initialized" report above:

@zzll22 Could you please make sure the "Force reindex file" option is checked and try again?
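The error comes from GraphRAG's project initialization step, which, as far as we can tell from the traceback and the GraphRAG versions we know, refuses to run when the project root already contains a settings.yaml. A minimal sketch of the guard that the "Force reindex file" option effectively provides; the function name prepare_graphrag_root and the decision to delete the whole folder are assumptions, not kotaemon's actual code.

```python
import shutil
from pathlib import Path


def prepare_graphrag_root(root: Path, force_reindex: bool) -> bool:
    """Return True when it is safe to run GraphRAG's init step on `root`.

    Hypothetical helper: assumes GraphRAG treats a root that already
    contains settings.yaml as an initialized project.
    """
    if force_reindex and root.exists():
        # Drop the previous project (settings, cache, output) entirely.
        shutil.rmtree(root)
    root.mkdir(parents=True, exist_ok=True)
    # Init is only safe if no settings.yaml is left over from a previous run.
    return not (root / "settings.yaml").exists()
```

Without a forced reindex the helper leaves an existing project alone and reports that init must be skipped, which is why re-running indexing on the same file fails until the option is checked.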

taprosoft

LGTM

taprosoft · 2024-10-14T14:17:33Z

Thanks for your contribution @ronchengang.

zzll22 · 2024-10-16T02:51:32Z

@ronchengang Does this let me use a transit (relay) API, for example an OpenAI-compatible key from a third-party provider, or can I only use an official OpenAI key? (OpenAI is too expensive for me.)

zzll22 · 2024-10-16T04:04:04Z

@ronchengang @taprosoft The latest branch raises an error when building a GraphRAG index with Ollama:
Traceback (most recent call last):
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 784, in stream
    file_id, docs = yield from pipeline.stream(
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 633, in stream
    yield from self.handle_docs(docs, file_id, file_name)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 375, in handle_docs
    self.handle_chunks_docstore(chunks, file_id)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 411, in handle_chunks_docstore
    self.vector_indexing.add_to_docstore(chunks)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/kotaemon/kotaemon/indices/vectorindex.py", line 86, in add_to_docstore
    self.doc_store.add(docs)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/kotaemon/kotaemon/storages/docstores/lancedb.py", line 53, in add
    document_collection.add(data)
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 1301, in add
    self.schema,
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 973, in schema
    return self._dataset.schema
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 955, in _dataset
    return self._ref.dataset
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 836, in dataset
    self._dataset = lance.dataset(
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lance/__init__.py", line 89, in dataset
    ds = LanceDataset(
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lance/dataset.py", line 168, in __init__
    self._ds = _Dataset(
ValueError: Dataset at path Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/docstore/index_1.lance was not found: Not found: Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/docstore/index_1.lance/_versions/1.manifest, /Users/runner/work/lance/lance/rust/lance-table/src/io/commit.rs:140:23, /Users/runner/work/lance/lance/rust/lance/src/dataset/builder.rs:310:35

allow to use customized GraphRAG settings.yaml (12c5461)

ronchengang changed the title from "feat:allow to use customized GraphRAG settings.yaml" to "feat: allow to use customized GraphRAG settings.yaml" on Oct 12, 2024

adjust import style (6ee9043)

ronchengang mentioned this pull request in [BUG] GraphRAG integration issue #367 (closed) on Oct 12, 2024

fix typo (02a9b79)

EvelynBai approved these changes Oct 14, 2024


taprosoft reviewed Oct 14, 2024


libs/ktem/ktem/index/file/graph/pipelines.py (outdated, resolved)

taprosoft reviewed Oct 14, 2024


libs/ktem/ktem/index/file/graph/pipelines.py (outdated, resolved)

taprosoft reviewed Oct 14, 2024


settings.yaml.example (resolved)

Added GraphRAG original documentation reference. (5ad6bca)

Merge branch 'Cinnamon:main' into graphrag_adaption (6906ab2)

feat: allow to use customized GraphRAG settings.yaml (046d24f)


taprosoft approved these changes Oct 14, 2024


taprosoft merged commit 8188760 into Cinnamon:main Oct 14, 2024
5 checks passed

ronchengang deleted the graphrag_adaption branch October 14, 2024 14:39
