
Error while implementing AzureChatOpenAI() using the documented code provided. Missing all values of Answer relevancy and some values of other metrics #1349

Closed
koushik9121 opened this issue Sep 23, 2024 · 4 comments

@koushik9121

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug

I am getting two kinds of errors:
Exception raised in Job[49]: TimeoutError()
Exception raised in Job[29]: BadRequestError(Unsupported data type)
There are around 20 such exceptions in total, each in this format.

All values for answer relevancy are missing, and some values for the other metrics are missing as well.
Ragas version: 0.1.19
Python version: 3.11
OpenAI version: 1.33.0

Code to Reproduce
from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from datasets import load_dataset
from ragas import evaluate, RunConfig
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)

azure_model = AzureChatOpenAI(
    openai_api_version="2024-02-15-preview",
    azure_endpoint=base_url,
    azure_deployment="",
    model="gpt-4o",
    validate_base_url=False,
    api_key="",
)

azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2024-02-15-preview",
    azure_endpoint=base_url,
    azure_deployment="",
    model="textembeddingada002",
    api_key="",
)

metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
]

amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")
print(amnesty_qa)

run_config = RunConfig(timeout=300, log_tenacity=True)

result = evaluate(
    amnesty_qa["eval"],
    metrics=metrics,
    llm=azure_model,
    embeddings=azure_embeddings,
    run_config=run_config,
)
Error trace

Expected behavior
I expect this to run fully, since it is the sample code provided by the team. I want to test with my own data, but even with the provided sample code Ragas isn't working for me.

Additional context

@koushik9121 koushik9121 added the bug Something isn't working label Sep 23, 2024

dosubot bot commented Sep 23, 2024

Hey @koushik9121! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.

To resolve the TimeoutError and BadRequestError (Unsupported data type) exceptions when implementing AzureChatOpenAI and evaluating metrics like answer relevancy, you can follow these steps:

  1. Ensure Correct Configuration: Double-check that your Azure OpenAI configurations are correct, including the base_url, model_deployment, model_name, embedding_deployment, and embedding_name.

  2. Handle TimeoutError: Increase the timeout settings for the API calls if possible. This can be done by setting a higher timeout value in the AzureChatOpenAI and AzureOpenAIEmbeddings initialization.

  3. Handle BadRequestError (Unsupported data type): Ensure that the data types being passed to the API are supported. This might involve converting data to the correct format before making API calls.

Here is the updated code with these considerations:

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas import evaluate
import os

# Ensure your Azure OpenAI key is set in the environment
os.environ["AZURE_OPENAI_API_KEY"] = "your-azure-openai-api-key"

# Azure OpenAI configurations
azure_configs = {
    "base_url": "https://<your-endpoint>.openai.azure.com/",
    "model_deployment": "your-deployment-name",
    "model_name": "your-model-name",
    "embedding_deployment": "your-deployment-name",
    "embedding_name": "text-embedding-ada-002",  # most likely
}

# Initialize the chat model with a higher timeout
azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
    request_timeout=60  # Increase the timeout to 60 seconds
)

# Initialize the embeddings with a higher timeout
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
    request_timeout=60  # Increase the timeout to 60 seconds
)

# Load the dataset
from datasets import load_dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

# Import metrics
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# List of metrics to use
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    harmfulness,
]

# Evaluate the dataset
result = evaluate(
    amnesty_qa["eval"], metrics=metrics, llm=azure_model, embeddings=azure_embeddings
)

# Convert results to pandas DataFrame
df = result.to_pandas()
df.head()

By increasing the request_timeout parameter, you can mitigate TimeoutError issues. Ensure that the data passed to the API is in the correct format to avoid BadRequestError (Unsupported data type) [1].

Additionally, there are known issues or limitations with the ragas library version 0.1.19 that could affect the evaluation of metrics like answer relevancy. Specifically, the _calculate_score method has a potential issue where if all generated questions are empty, it logs a warning and sets the score to NaN. Similarly, the _ascore method has a check for None answers, which also results in a NaN score [2].
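
If you want to see exactly which rows were affected, one option (a minimal sketch, assuming the per-metric column names in the results DataFrame match the metric names used above) is to inspect the exported scores for NaN values:

# Export per-sample scores and count missing values per metric
df = result.to_pandas()
metric_cols = ["faithfulness", "answer_relevancy", "context_recall", "context_precision"]
print(df[metric_cols].isna().sum())

# Rows where answer relevancy could not be computed
print(df[df["answer_relevancy"].isna()])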

For additional logging and debugging, you can enable debug mode by setting the environment variable RAGAS_DEBUG to true and use the patch_logger function to enable debug logging for specific modules [3][4][5]:

# In your shell, before starting Python:
export RAGAS_DEBUG=true

# Then, in Python:
from ragas.utils import patch_logger
import logging

patch_logger('your_module_name', logging.DEBUG)

This setup will help you capture detailed information about the execution and diagnose issues with missing metric values.

To continue talking to Dosu, mention @dosu.

@koushik9121 koushik9121 changed the title from "Error while implementing AzureChatOpenAI() using the documented code provided missing all values of Answer relevancy and Other metrics" to "Error while implementing AzureChatOpenAI() using the documented code provided. Missing all values of Answer relevancy and some values of other metrics" Sep 23, 2024
@koushik9121
Author

koushik9121 commented Sep 23, 2024

I have checked the keys and the deployment names by making a test run with azure_model.invoke(), and the API returned a correct answer. But what you need to understand is that I am trying to run the code provided by the RAGAS team itself, and the errors are still there. Please help me with this issue.
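
For reference, a quick connectivity check along these lines (a sketch only, reusing the azure_model and azure_embeddings objects from the reproduction code above) might look like:

# Confirm that both the chat model and the embeddings respond before running a full evaluation
print(azure_model.invoke("Say hello in one word.").content)
print(len(azure_embeddings.embed_query("hello")))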


dosubot bot commented Sep 23, 2024

To resolve the TimeoutError and BadRequestError (Unsupported data type) exceptions when running the provided RAGAS team code using AzureChatOpenAI and evaluating metrics like answer relevancy, ensure the following:

  1. TimeoutError: Increase the timeout settings for the Azure OpenAI API calls. This can be done by setting the timeout parameter in the AzureChatOpenAI and AzureOpenAIEmbeddings instances.

  2. BadRequestError (Unsupported data type): Ensure that the data types being passed to the API are supported. This typically involves ensuring that the inputs are in the correct format (e.g., strings for text inputs).

Here is the updated code with the necessary changes:

from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from ragas import evaluate

# Configuration for Azure OpenAI
azure_configs = {
    "base_url": "https://<your-endpoint>.openai.azure.com/",
    "model_deployment": "your-deployment-name",
    "model_name": "your-model-name",
    "embedding_deployment": "your-deployment-name",
    "embedding_name": "text-embedding-ada-002",  # most likely
}

# Initialize the chat model with a timeout setting
azure_model = AzureChatOpenAI(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["model_deployment"],
    model=azure_configs["model_name"],
    validate_base_url=False,
    timeout=60  # Set a timeout of 60 seconds
)

# Initialize the embeddings with a timeout setting
azure_embeddings = AzureOpenAIEmbeddings(
    openai_api_version="2023-05-15",
    azure_endpoint=azure_configs["base_url"],
    azure_deployment=azure_configs["embedding_deployment"],
    model=azure_configs["embedding_name"],
    timeout=60  # Set a timeout of 60 seconds
)

# Load the dataset
from datasets import load_dataset
amnesty_qa = load_dataset("explodinggradients/amnesty_qa", "english_v2")

# Import metrics
from ragas.metrics import (
    context_precision,
    answer_relevancy,
    faithfulness,
    context_recall,
)
from ragas.metrics.critique import harmfulness

# List of metrics to use
metrics = [
    faithfulness,
    answer_relevancy,
    context_recall,
    context_precision,
    harmfulness,
]

# Initialize the metrics with LLMs and Embeddings
from ragas.run_config import RunConfig
from ragas.metrics.base import MetricWithLLM, MetricWithEmbeddings

def init_ragas_metrics(metrics, llm, embedding):
    for metric in metrics:
        if isinstance(metric, MetricWithLLM):
            metric.llm = llm
        if isinstance(metric, MetricWithEmbeddings):
            metric.embeddings = embedding
        run_config = RunConfig()
        metric.init(run_config)

# Wrappers for LLM and Embeddings
from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper

init_ragas_metrics(
    metrics,
    llm=LangchainLLMWrapper(azure_model),
    embedding=LangchainEmbeddingsWrapper(azure_embeddings),
)

# Evaluate the dataset
result = evaluate(
    amnesty_qa["eval"], metrics=metrics, llm=azure_model, embeddings=azure_embeddings
)

# Convert results to pandas DataFrame
df = result.to_pandas()
df.head()

Ensure that the azure_configs dictionary is correctly populated with your Azure OpenAI endpoint details. The timeout parameter is set to 60 seconds, but you can adjust this value based on your needs. Additionally, the init_ragas_metrics function ensures that the metrics are properly initialized with the LLM and embeddings [1].

To continue talking to Dosu, mention @dosu.

@koushik9121
Author

Serializing the data might help
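
For illustration, one way "serializing the data" could look (a sketch only; the column names are assumed from the amnesty_qa english_v2 split, and the surrounding objects come from the code above) is to rebuild the evaluation split from plain Python types before calling evaluate():

from datasets import Dataset

eval_ds = amnesty_qa["eval"]

# Rebuild the split so every field is a plain string (or a list of strings for contexts)
clean_ds = Dataset.from_dict({
    "question": [str(q) for q in eval_ds["question"]],
    "answer": [str(a) for a in eval_ds["answer"]],
    "contexts": [[str(c) for c in ctx] for ctx in eval_ds["contexts"]],
    "ground_truth": [str(g) for g in eval_ds["ground_truth"]],
})

result = evaluate(
    clean_ds, metrics=metrics, llm=azure_model, embeddings=azure_embeddings, run_config=run_config
)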
