generate_with_langchain_docs is broken #764
Comments
Same issue:

```
File "/Users/jshah/anaconda3/envs/hf/lib/python3.10/site-packages/ragas/llms/base.py", line 177, in agenerate_text
```
Hey @jayshah5696, this is a different issue; yours is addressed in #762.
Hey @rolandgvc, do you face this issue often? Otherwise, can you kill the run and try again?
Same here with the Azure API (Python 3.10 and ragas 0.1.4).
@floatcyc as mentioned, refer to #762 and don't wrap the Azure model with
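(For readers hitting the Azure variant: a minimal sketch of what #762 suggests, assuming the wrapper named in the truncated comment above is ragas's `LangchainLLMWrapper` and that `from_langchain` accepts raw LangChain objects, as the snippets later in this thread show. Deployment names are placeholders.)

```python
# Hedged sketch: pass the Azure LangChain objects directly to
# TestsetGenerator.from_langchain instead of pre-wrapping them
# (the wrapper in question is assumed to be ragas's LangchainLLMWrapper).
# Assumes AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY, and
# OPENAI_API_VERSION are set in the environment; deployment names
# below are placeholders.
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from ragas.testset.generator import TestsetGenerator

generator_llm = AzureChatOpenAI(azure_deployment="your-gpt35-deployment")
critic_llm = AzureChatOpenAI(azure_deployment="your-gpt4-deployment")
embeddings = AzureOpenAIEmbeddings(azure_deployment="your-embedding-deployment")

# from_langchain wraps these internally; wrapping them yourself is what
# #762 reports as the source of the error.
generator = TestsetGenerator.from_langchain(generator_llm, critic_llm, embeddings)
```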
I have the same issue with .py files... Ragas version: 0.1.4. Test.py:
What is strange to me is that if I run the same code above in a .ipynb file, it works like a charm. This issue happens every time I run the script in a .py file, never in a .ipynb file.
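(A guess at why scripts behave differently from notebooks, not a confirmed root cause: Jupyter already runs an asyncio event loop, while a plain script does not. A speculative sketch of two things worth trying in a `.py` file:)

```python
# Speculative workaround sketch for .py scripts, based on the observation
# above that notebooks work while scripts hang. Neither call is confirmed
# as the fix; both are plain nest_asyncio / standard idioms.
import nest_asyncio

nest_asyncio.apply()  # lets nested event loops run, the way a notebook would

if __name__ == "__main__":
    # keep the generation inside a main guard so any workers spawned by
    # ragas's executor don't re-import and re-run the whole script;
    # `generator` and `documents` are assumed to be built as in the
    # quickstart snippet further down this thread
    testset = generator.generate_with_langchain_docs(
        documents, test_size=10, is_async=False
    )
```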
I'm also having the same issue with Azure OpenAI. I've followed the manual (https://docs.ragas.io/en/latest/howtos/customisations/azure-openai.html#test-set-generation) and keep getting:
I've enabled the
It keeps running like this forever...
Same issue here. `Generating: 80%|████████████████████████████████████████████████████████▊ | 8/10 [00:08<00:01, 1.39it/s]` It always gets stuck at 80%.
I got the same issue. Any suggestions? @shahules786
I tried packaging the script in a FastAPI app and running it with uvicorn; it works like a charm. Maybe this is also helpful for you, because a .ipynb isn't useful if you want to integrate this somewhere. A sketch of that workaround follows below.
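(A minimal sketch of the FastAPI/uvicorn workaround described above; the endpoint name, PDF path, and quickstart-style generator setup are illustrative assumptions, not the commenter's actual code.)

```python
# Hedged sketch of the workaround above: run the generation inside a
# FastAPI app served by uvicorn instead of a bare script.
from fastapi import FastAPI
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

app = FastAPI()

generator = TestsetGenerator.from_langchain(
    ChatOpenAI(model="gpt-3.5-turbo-16k"),   # generator LLM
    ChatOpenAI(model="gpt-4-1106-preview"),  # critic LLM
    OpenAIEmbeddings(),
)

@app.post("/generate-testset")
def generate_testset():
    # load and split as in the quickstart snippet further down this thread;
    # the PDF path is a placeholder
    docs = PyPDFLoader("mini_uth.pdf").load()
    documents = RecursiveCharacterTextSplitter(
        chunk_size=250, chunk_overlap=40
    ).split_documents(docs)
    testset = generator.generate_with_langchain_docs(
        documents,
        test_size=10,
        distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    )
    return testset.to_pandas().to_dict(orient="records")
```

Run it with `uvicorn app:app`.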
I also came across the same problem, in both .ipynb and .py files.
Same issue...
Same problem. I was just following the instructions from the quickstart and it got stuck generating at 0%. It also ran up a big OpenAI bill, which isn't much fun.
Hey @JuliGTV, sorry for the trouble. We are aware of this issue, and we have in fact trained a smaller model for you to use for free. Please be patient until we can integrate it with ragas.
No worries. Fine-tuning a small model for this use case is a great idea, although I would still like to understand better what went wrong and whether I could have done things differently. I was just trying to follow the quickstart guide, and I kept getting OpenAI rate-limit errors, mostly during the embedding stage.

```python
from dotenv import load_dotenv

load_dotenv()

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("mini_uth.pdf")
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=250,
    chunk_overlap=40,
    length_function=len,
)
documents = text_splitter.split_documents(docs)

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
critic_llm = ChatOpenAI(model="gpt-4-1106-preview")
embeddings = OpenAIEmbeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# generate testset
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)
```

This time it completed the embedding and then got stuck at 0% at the generation stage. Afterwards I also tried it for just a single chunk of a small document, and it still got stuck at 0% generation.
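(Not from the original comment: since the rate-limit errors during embedding are what this report describes, one knob worth trying is ragas's `RunConfig`, which `generate_with_langchain_docs` accepts as `run_config` per its signature in the traceback later in this thread. The specific values here are guesses, not recommendations.)

```python
# Hedged sketch: throttle ragas's concurrency to reduce OpenAI rate-limit
# errors. The run_config parameter is visible in the traceback below;
# the values here are illustrative guesses.
from ragas.run_config import RunConfig

run_config = RunConfig(
    max_workers=4,    # fewer concurrent requests against the API
    max_retries=10,   # retry longer on 429s
    timeout=120,      # allow slow generations to finish
)

testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    run_config=run_config,
)
```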
Any progress? I also got a couple of runs in a .ipynb today that got stuck at 0% or 80% complete. I finally trimmed my document set down to two instances and managed to generate 3 test cases in 13 minutes. Something seems to go wrong under the hood, I assume.
I have the same issue: using AzureOpenAI, it gets stuck at 90% while generating.
This issue is also discussed in #662. It is replicable across various machines and LLM model types. As others have mentioned, the error seems to be threading-related. Here is a stack trace captured while the generation is stuck:

```
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
/tmp/ipykernel_30263/1853206858.py in <cell line: 1>()
----> 1 testset = generator.generate_with_langchain_docs(docs[:5],
      2     test_size=10,
      3     distributions={simple: 0.5, reasoning: 0.4, multi_context: 0.1},
      4     with_debugging_logs=True)

.../python3.8/site-packages/ragas/testset/generator.py in generate_with_langchain_docs(self, documents, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
    173         distributions = distributions or {}
    174         # chunk documents and add to docstore
--> 175         self.docstore.add_documents(
    176             [Document.from_langchain_document(doc) for doc in documents]
    177         )

.../python3.8/site-packages/ragas/testset/docstore.py in add_documents(self, docs, show_progress)
    213             for d in self.splitter.transform_documents(docs)
    214         ]
--> 215         self.add_nodes(nodes, show_progress=show_progress)
    216
    217     def add_nodes(self, nodes: t.Sequence[Node], show_progress=True):

.../python3.8/site-packages/ragas/testset/docstore.py in add_nodes(self, nodes, show_progress)
    250                     result_idx += 1
    251
--> 252         results = executor.results()
    253         if not results:
    254             raise ExceptionInRunner()

.../python3.8/site-packages/ragas/executor.py in results(self)
    130         executor_job.start()
    131         try:
--> 132             executor_job.join()
    133         finally:
    134             ...

.../python3.8/threading.py in join(self, timeout)
   1009
   1010         if timeout is None:
-> 1011             self._wait_for_tstate_lock()
   1012         else:
   1013             # the behavior of a negative timeout isn't documented, but

.../python3.8/threading.py in _wait_for_tstate_lock(self, block, timeout)
   1025         if lock is None:  # already determined that the C code is done
   1026             assert self._is_stopped
-> 1027         elif lock.acquire(block, timeout):
   1028             lock.release()
   1029             self._stop()

KeyboardInterrupt:
```
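(An editorial addition, not from the commenter: since the trace shows the main thread blocked in `executor_job.join()`, the standard library's `faulthandler` module can dump every thread's stack while the hang is in progress, which helps pin down what the worker thread is actually waiting on. This is plain stdlib usage, not a ragas feature.)

```python
# Diagnostic sketch: dump all thread stacks to stderr every 60 seconds
# so you can see where the ragas executor's worker thread is stuck.
import faulthandler

faulthandler.dump_traceback_later(60, repeat=True)

# `generator` and `docs` assumed set up as in the snippets in this thread
testset = generator.generate_with_langchain_docs(docs[:5], test_size=10)

faulthandler.cancel_dump_traceback_later()  # stop the watchdog once done
```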
Adding the parameter `is_async=False` worked for me on 0.1.7:

```python
generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
    is_async=False,
)
```

Edit: actually, this was a red herring. The key to getting it to work seems to have been running in debug and stepping through some of the code, which I presume somehow prevents the deadlock.
Any help on this? For me, even after adding `is_async=False`, it is stuck at `Generating: 0%|`. It would be helpful to get a solution to this. Thanks in advance.
I managed to make this work using OpenAI's gpt-3.5-turbo-16k. However, I'm trying to create the dataset using Llama 3 running on LM Studio, and I'm getting the same stuck error. Any advances on this?
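(Not from the commenter: a sketch of pointing the same LangChain client at LM Studio's OpenAI-compatible server. The base URL `http://localhost:1234/v1`, the dummy API key, and the model identifier are assumptions taken from common LM Studio defaults; check your server settings.)

```python
# Hedged sketch: drive ragas's generator with a local Llama 3 served by
# LM Studio via its OpenAI-compatible API. Base URL, dummy key, and model
# name below are assumptions, not verified values.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset.generator import TestsetGenerator

local_llm = ChatOpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default server address
    api_key="lm-studio",                  # LM Studio ignores the key's value
    model="llama-3-8b-instruct",          # placeholder; use your loaded model's id
)

# Embeddings still need a real provider unless LM Studio serves an
# embedding model; OpenAIEmbeddings here is just one option.
generator = TestsetGenerator.from_langchain(local_llm, local_llm, OpenAIEmbeddings())
```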
Also having the same problem. The code is getting stuck at

```python
from typing import List

from langchain_core.documents.base import Document
from langchain_google_vertexai import VertexAI, VertexAIEmbeddings
from ragas.testset.generator import TestsetGenerator
# this import was missing from the original snippet but is needed for
# the distribution keys below
from ragas.testset.evolutions import simple, reasoning, multi_context


def create_ragas_rag_benchmarking_dataset(
    llm_generator_model: VertexAI,
    llm_critic_model: VertexAI,
    embeddings_model: VertexAIEmbeddings,
    docs: List[Document],
):
    generator = TestsetGenerator.from_langchain(
        generator_llm=llm_generator_model,
        critic_llm=llm_critic_model,
        embeddings=embeddings_model,
    )
    # generate testset
    testset = generator.generate_with_langchain_docs(
        documents=docs,
        test_size=10,
        with_debugging_logs=True,
        is_async=False,
        distributions={
            simple: 0.5,
            reasoning: 0.25,
            multi_context: 0.25,
        },
    )
    return testset
```
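(A hypothetical usage sketch for the function above, not the commenter's code; the Vertex model names are placeholders, so substitute whatever your project has enabled.)

```python
# Hypothetical invocation of create_ragas_rag_benchmarking_dataset.
# Model names are placeholders, not recommendations.
from langchain_google_vertexai import VertexAI, VertexAIEmbeddings

testset = create_ragas_rag_benchmarking_dataset(
    llm_generator_model=VertexAI(model_name="gemini-1.0-pro"),
    llm_critic_model=VertexAI(model_name="gemini-1.0-pro"),
    embeddings_model=VertexAIEmbeddings(model_name="textembedding-gecko@003"),
    docs=docs,  # a List[Document] loaded elsewhere
)
print(testset.to_pandas().head())
```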
Has anyone fixed this problem?
[x] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
Running `generate_with_langchain_docs` gets stuck, showing:
Ragas version: 0.1.4
Python version: 3.9
Code to Reproduce
Error trace
Expected behavior
A synthetic dataset should be created.
Additional context
I'm trying to generate a synthetic dataset of questions based on Enron emails.