Testset Generation: Is going into continuous loop #662
Comments
Hey @Vtej98, can you try it with a limited number of documents first? Also, set the with_debugging argument to true so that we have better context on where it is getting stuck?
Hello @shahules786, the number of documents used is 1, and it contains barely 9 pages. Please find the debug log: the filename and doc_id are the same for all nodes. It freezes here and keeps draining OpenAI tokens. Python version: 3.8. I also tried Python 3.12; it's the same thing.
It's the same with the latest version of ragas as well. It just freezes after generating 1, 3, or 4 questions.
I'm facing the same problem too.
@Kelp710 did you do automatic language adaptation before using it with Japanese documents? https://docs.ragas.io/en/stable/howtos/applications/use_prompt_adaptation.html
@shahules786
Getting the same error (with llama-index, running both in a notebook and as a script); it keeps going even after setting the number of docs to 1:
As @asti009asti mentioned, this seems to be a threading issue. @shahules786, do you have any pointers as to what might be causing it? Happy to contribute.
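While the root cause is unclear, a hung generation call can at least be guarded with a timeout so it fails instead of blocking the interpreter indefinitely. A minimal sketch using only the standard library (`run_with_timeout` is a hypothetical helper, not part of ragas; note that the hung worker thread itself cannot be killed, only abandoned):

```python
import concurrent.futures

def run_with_timeout(fn, timeout_s, *args, **kwargs):
    # Run fn in a worker thread and give up after timeout_s seconds,
    # raising TimeoutError instead of hanging the main thread forever.
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(fn, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"call did not finish within {timeout_s}s")
    finally:
        # Do not wait for the (possibly stuck) worker to finish.
        pool.shutdown(wait=False)
```

Usage would be wrapping the generation call, e.g. `run_with_timeout(generator.generate_with_langchain_docs, 600, documents, test_size=10)`, so a stuck run raises after 10 minutes rather than spinning forever.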
Stuck at 50% during generation, and unable to stop the Python interpreter either. I tried installing from source as @omerfguzel mentioned, and running on just a single English markdown document. I tried several times with several different models, but every run got stuck at 50%. The document: https://github.com/awsdocs/aws-doc-sdk-examples/tree/main/python/cross_service/apigateway_covid-19_tracker
A couple of notes here, as it might help:
I am not sure whether the continuous loop is due to the local environment. When I switch to Google Colab, instead of running the Python file in Windows PowerShell, it works. Generation via the API is still slow, though.
Any update on this issue? I am also facing it with the latest ragas library; it always gets stuck at 90% of test generation.
I am trying to generate a testset by following the guide from Ragas, and it's stuck at 14% while embedding nodes. Any updates on this issue would be helpful.
Keeping this open for now, but we have made a lot of improvements to this in the latest releases - could you folks check those out? Also, we are doing a refactor in #1016 and will keep these points in mind. Thanks for reporting the issues 🙂 and apologies for the hard time, but we'll keep ironing these out 💪🏽
I still experience this issue on 1.14 and also on ragas-0.1.15 when using alternative LLMs or embeddings. For example, AzureChatOpenAI and CohereEmbeddings do not work.
Question
I am not sure what's happening. The testset data isn't being generated; it just goes into a continuous loop, exhausting my OpenAI tokens.
My Code
```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain.document_loaders import DirectoryLoader
import os

OPENAI_API_KEY = "sk-xxxxxx"
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

def load_docs(directory):
    loader = DirectoryLoader(directory)
    documents = loader.load()
    return documents

documents = load_docs("./source")
for document in documents:
    document.metadata["file_name"] = document.metadata["source"]

generator = TestsetGenerator.with_openai()
testset = generator.generate_with_langchain_docs(
    documents,
    test_size=10,
    distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25},
)

testset.to_pandas()
testset.to_pandas().to_excel("output_data.xlsx", index=False)
```
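One symptom reported above is that the filename and doc_id come out identical for every node, and the loop in the code copies the same `source` into `file_name` for each document. A hedged sketch of making the metadata unique per document (`ensure_unique_file_names` is a hypothetical helper; it only assumes LangChain-style objects with a `.metadata` dict):

```python
import os

def ensure_unique_file_names(documents):
    # Hypothetical helper: give each loaded document a distinct file_name
    # so nodes are not all collapsed under one identifier.
    for i, doc in enumerate(documents):
        source = doc.metadata.get("source", f"doc_{i}")
        doc.metadata["file_name"] = f"{os.path.basename(source)}#{i}"
    return documents
```

Whether duplicate identifiers actually cause the hang is unverified here; this only rules out one variable when reproducing the issue.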
Additional context
I did explore the code and found retry mechanisms: 15 retries with a wait time of 90 seconds. But even after waiting well past that, the generation never completed.