feat: allow to use customized GraphRAG settings.yaml #387

Merged
merged 6 commits into from
Oct 14, 2024

Conversation

ronchengang
Contributor

@ronchengang ronchengang commented Oct 12, 2024

Description

  • Please include a summary of the changes and the related issue.
    GraphRAG index generation depends on settings.yaml. This file has many configurable items, several of which are crucial to index generation, such as the llm-related settings, request_timeout, and concurrent_requests. It is hard to put all of these into .env: there are too many, and they would be difficult to manage. The few items in the current .env.example are enough for people who build the GraphRAG index with OpenAI, but not for those who use private models such as Ollama and need more advanced configuration. Several posts in the current issue list ask for a more flexible way to set these options. This change lets users supply their own settings.yaml. To do so, prepare a settings.yaml.example file, put it in the root folder, and add a new environment variable named USE_CUSTOMIZED_GRAPHRAG_SETTING to the .env file. When its value is true, the user-provided settings.yaml is applied during the GraphRAG indexing process. In this way, users can use self-hosted models like Ollama and customize other configurations.

  • Fixes: [REQUEST] - Can we have a settings.yaml file on the GraphRAG indexing module? #299; Ollama Graph Embedding Fails in Local LLM Setup #283; [REQUEST] - Can graphrag support other free LLMs like qwen2? #245; [BUG] - Ollama OpenAI not working #224; How to set the graphRAG with local ollama #212; and more.
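
The mechanism described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the helper names `use_custom_settings` and `apply_custom_settings` and both directory arguments are hypothetical. Only the `USE_CUSTOMIZED_GRAPHRAG_SETTING` variable and the `settings.yaml.example` file name come from the description.

```python
import os
import shutil
from pathlib import Path


def use_custom_settings() -> bool:
    """Return True when USE_CUSTOMIZED_GRAPHRAG_SETTING is set to "true"."""
    value = os.environ.get("USE_CUSTOMIZED_GRAPHRAG_SETTING", "false")
    return value.strip().lower() == "true"


def apply_custom_settings(repo_root: Path, graphrag_dir: Path) -> bool:
    """Copy the user-provided settings file into a GraphRAG project folder.

    Hypothetical helper: returns True if the file was copied, False if the
    flag is off or settings.yaml.example is missing from repo_root.
    """
    source = repo_root / "settings.yaml.example"
    if not use_custom_settings() or not source.is_file():
        return False
    graphrag_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, graphrag_dir / "settings.yaml")
    return True
```

Returning a boolean lets the caller fall back to the default, .env-driven configuration when no custom file is applied.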

Type of change

  • New features (non-breaking change).
  • Bug fix (non-breaking change).
  • Breaking change (fix or feature that would cause existing functionality not to work as expected).

Checklist

  • I have performed a self-review of my code.
  • I have added thorough tests if it is a core feature.
  • There is a reference to the original bug report and related work.
  • I have commented on my code, particularly in hard-to-understand areas.
  • The feature is well documented.

@ronchengang ronchengang changed the title from "feat:allow to use customized GraphRAG settings.yaml" to "feat: allow to use customized GraphRAG settings.yaml" Oct 12, 2024
@taprosoft
Collaborator

@ronchengang this works well enough for the indexing process, but beware that the retrieval process uses a different set of params to choose the embedding model:

text_embedder = OpenAIEmbedding(

Please also synchronize this setting to make everything seamless.
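
One way to keep the two sides in sync is to derive the retrieval embedder's parameters from the same parsed settings that indexing uses. The sketch below is only an illustration under stated assumptions: the function name `embedding_params` is hypothetical, the nested `embeddings` → `llm` → `model` / `api_key` / `api_base` layout is assumed to match the user's settings.yaml, and how the resulting values are passed to `OpenAIEmbedding` is left out because that call is not shown in full here.

```python
from typing import Any, Optional


def embedding_params(
    settings: Optional[dict[str, Any]],
    default_model: str,
    default_api_key: str,
) -> dict[str, Any]:
    """Pick embedding parameters for retrieval.

    `settings` is the already-parsed settings.yaml content (or None when no
    custom file is used). Key names are assumptions about that file's layout.
    """
    llm = ((settings or {}).get("embeddings") or {}).get("llm") or {}
    params: dict[str, Any] = {
        "model": llm.get("model", default_model),
        "api_key": llm.get("api_key", default_api_key),
    }
    # Only forward api_base when the custom settings define one.
    if llm.get("api_base"):
        params["api_base"] = llm["api_base"]
    return params
```

With no custom settings the function returns the defaults unchanged, so the existing .env-based behavior is preserved.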

@zzll22

zzll22 commented Oct 14, 2024

@ronchengang

Hello, I used your code but got an error:
Traceback (most recent call last):
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/__main__.py", line 85, in <module>
    index_cli(
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/cli.py", line 119, in index_cli
    _initialize_project_at(root_dir, progress_reporter)
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/cli.py", line 190, in _initialize_project_at
    raise ValueError(msg)
ValueError: Project already initialized at /Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/files/graphrag/0929df4a-e1eb-43dd-b96f-d17f5d1acee0

How to solve it?

@ronchengang
Contributor Author

ronchengang commented Oct 14, 2024

> @ronchengang this works well enough for the indexing process, but beware that the retrieval process uses a different set of params to choose the embedding model:
>
> text_embedder = OpenAIEmbedding(
>
> Please also synchronize this setting to make everything seamless.

@taprosoft done.

@ronchengang
Contributor Author

> @ronchengang
>
> Hello, I used your code but got an error:
> Traceback (most recent call last):
>   File "/opt/anaconda3/envs/kotaemon/lib/python3.10/runpy.py", line 196, in _run_module_as_main
>     return _run_code(code, main_globals, None,
>   File "/opt/anaconda3/envs/kotaemon/lib/python3.10/runpy.py", line 86, in _run_code
>     exec(code, run_globals)
>   File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/__main__.py", line 85, in <module>
>     index_cli(
>   File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/cli.py", line 119, in index_cli
>     _initialize_project_at(root_dir, progress_reporter)
>   File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/graphrag/index/cli.py", line 190, in _initialize_project_at
>     raise ValueError(msg)
> ValueError: Project already initialized at /Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/files/graphrag/0929df4a-e1eb-43dd-b96f-d17f5d1acee0
>
> How to solve it?

@zzll22 Could you please make sure the "Force reindex file" option is checked and try again?
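
For context, the error above is what one would expect if project initialization is attempted on a folder that was already set up. A caller could guard against that with a check like the one below. This is a hedged sketch: the helper `needs_init` is hypothetical, and the idea that an existing `settings.yaml` in the project folder is what marks it as initialized is an assumption based on the traceback, not something confirmed in this thread. It does not reproduce what the "Force reindex file" option does internally.

```python
from pathlib import Path


def needs_init(project_dir: Path) -> bool:
    """Return True when the GraphRAG project folder has no settings.yaml yet.

    Assumption: the presence of settings.yaml is what makes initialization
    fail with "Project already initialized".
    """
    return not (project_dir / "settings.yaml").exists()
```

A caller would run the initialization step only when `needs_init(...)` returns True and go straight to indexing otherwise.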

Collaborator

@taprosoft taprosoft left a comment

LGTM

@taprosoft
Collaborator

Thanks for your contribution @ronchengang.

@taprosoft taprosoft merged commit 8188760 into Cinnamon:main Oct 14, 2024
5 checks passed
@ronchengang ronchengang deleted the graphrag_adaption branch October 14, 2024 14:39
@zzll22

zzll22 commented Oct 16, 2024

@ronchengang Does this change let me use a relay (proxy) API key, such as an openapi key, or can I only use an OpenAI key? (OpenAI is too expensive.)

@zzll22

zzll22 commented Oct 16, 2024

@ronchengang @taprosoft The latest branch reports an error using ollama+GraphRAG index:
Traceback (most recent call last):
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 784, in stream
    file_id, docs = yield from pipeline.stream(
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 633, in stream
    yield from self.handle_docs(docs, file_id, file_name)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 375, in handle_docs
    self.handle_chunks_docstore(chunks, file_id)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/ktem/ktem/index/file/pipelines.py", line 411, in handle_chunks_docstore
    self.vector_indexing.add_to_docstore(chunks)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/kotaemon/kotaemon/indices/vectorindex.py", line 86, in add_to_docstore
    self.doc_store.add(docs)
  File "/Users/zhaolong/Desktop/TS_Platform/kotaemon-main/libs/kotaemon/kotaemon/storages/docstores/lancedb.py", line 53, in add
    document_collection.add(data)
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 1301, in add
    self.schema,
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 973, in schema
    return self._dataset.schema
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 955, in _dataset
    return self._ref.dataset
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lancedb/table.py", line 836, in dataset
    self._dataset = lance.dataset(
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lance/__init__.py", line 89, in dataset
    ds = LanceDataset(
  File "/opt/anaconda3/envs/kotaemon/lib/python3.10/site-packages/lance/dataset.py", line 168, in __init__
    self._ds = _Dataset(
ValueError: Dataset at path Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/docstore/index_1.lance was not found: Not found: Users/zhaolong/Desktop/TS_Platform/kotaemon-main/ktem_app_data/user_data/docstore/index_1.lance/_versions/1.manifest, /Users/runner/work/lance/lance/rust/lance-table/src/io/commit.rs:140:23, /Users/runner/work/lance/lance/rust/lance/src/dataset/builder.rs:310:35

Development

Successfully merging this pull request may close these issues.

[REQUEST] - Can we have a settings.yaml file on the GraphRAG indexing module?
5 participants