
The context_precision_score always returns 0 when using Bedrock with Claude instant. #288

Closed
Pauldevillers opened this issue Nov 16, 2023 · 12 comments

@Pauldevillers
Contributor

Pauldevillers commented Nov 16, 2023

Describe the bug
The context_precision_score always returns 0 when using Bedrock with Claude instant.

I loaded the Amazon Bedrock documentation into OpenSearch as a vector store, and I am evaluating the retrieval pipeline with Ragas.

The context_precision metric evaluates each chunk retrieved from the vector search with a prompt that asks whether the information in the given context is useful for answering the question; the LLM is expected to answer with a plain yes/no.

The problem is that the LLM does not answer with only yes or no; it also includes its reasoning, as shown below:

' Yes\n\nThe context provided is the Amazon Bedrock User Guide which contains information about the supported regions for Amazon Bedrock on page 3. This would be useful in answering the question "Where Amazon bedrock is available?" since it directly lists out the available regions.'

The issue lies in the context_precision.py file, specifically in the following line:

response = [int("Yes" in resp) for resp in response]

This line checks whether the string "Yes" appears in each response in the grouped_responses list. In the example above, the score comes out as 0 even though the answer clearly contains a yes.

I have four items inside my grouped_responses, and the error occurs because the condition int("Yes" in resp) expects a strict "Yes" and does not cope with the verbose, free-form replies the model returns, which leads to the problem.

Python version and packages
Ragas version: 0.0.19
Python version: 3.9.6

Options to Resolve the Error:

  • Option 1: Add few-shot examples to the context_precision prompt so that the output follows a fixed schema.
  • Option 2: Instead of using [int("Yes" in resp) for resp in response], use a regex to detect the yes/no verdict in the reply (see the sketch after this list).

Please provide guidance on implementing one of the suggested options to resolve the error.
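
For reference, here is a minimal sketch of what Option 2 could look like; the parse_verdict helper is hypothetical and not part of Ragas, and it assumes the verdict appears at the start of the reply:

import re

def parse_verdict(resp: str) -> int:
    """Return 1 if the reply opens with an affirmative verdict, else 0."""
    # Accept "yes"/"no" in any capitalization, skip leading whitespace,
    # and tolerate any free-form reasoning that follows the verdict.
    match = re.match(r"\s*(yes|no)\b", resp, flags=re.IGNORECASE)
    return int(bool(match) and match.group(1).lower() == "yes")

# The verbose Claude Instant reply quoted above would now count as a yes:
# parse_verdict(' Yes\n\nThe context provided is the Amazon Bedrock User Guide ...') == 1

The list comprehension in context_precision.py would then become response = [parse_verdict(resp) for resp in response].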

Code to Reproduce

import boto3
from langchain.chains import RetrievalQA
from langchain.llms.bedrock import Bedrock
from ragas.langchain import RagasEvaluatorChain
from ragas.llms import LangchainLLM
from ragas.metrics import faithfulness, answer_relevancy, context_precision

# PROMPT, opensearch_vector_search_client and question are defined elsewhere.

bedrock_client = boto3.client("bedrock-runtime", region_name="us-east-1")

bedrock_llm = Bedrock(
    model_id="anthropic.claude-instant-v1",
    client=bedrock_client,
    model_kwargs={"temperature": 0},
)

# Set Bedrock as the LangChain wrapper so Ragas can call it
bedrock_wrapper = LangchainLLM(llm=bedrock_llm)

qa = RetrievalQA.from_chain_type(
    llm=bedrock_llm,
    chain_type="stuff",
    retriever=opensearch_vector_search_client.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": PROMPT, "verbose": True},
    verbose=True,
)

response = qa(question, return_only_outputs=False)

# Point the context_precision metric at the Bedrock LLM
metrics = [context_precision]
for m in metrics:
    m.llm = bedrock_wrapper

# Build the evaluation chains and print each score
eval_chains = {
    m.name: RagasEvaluatorChain(metric=m) for m in metrics
}

for name, eval_chain in eval_chains.items():
    score_name = f"{name}_score"
    print(f"{score_name}: {eval_chain(response)[score_name]}")

Expected behavior

context_precision_score should return a non-zero value.

Additional context

Here are the four elements inside `grouped_responses`:

' No\n\nThe context provided is the copyright information and terms of use for an Amazon Bedrock user guide. It does not contain any information about where Amazon bedrock is available. So this context is not useful in answering the given question.'

' Yes\n\nThe context provided is the Amazon Bedrock User Guide which contains information about the supported regions for Amazon Bedrock on page 3. This would be useful in answering the question "Where Amazon bedrock is available?" since it directly lists out the available regions.'

' Yes, the context contains information that is useful in answering the question "Where Amazon bedrock is available?". \n\nThe context talks about Amazon Bedrock and mentions that it is a fully managed service that makes base models from Amazon and third-party model providers accessible through an API. It also lists the supported regions on page 3, which would contain the answer to where Amazon Bedrock is available.\n\nSo the context provides relevant information to answer the given question, therefore the answer is Yes.'

' No\n\nThe given context does not contain any information about where Amazon bedrock is available. The context only mentions "Amazon Bedrock User Guide" but does not provide any details about locations. Hence the context is not useful in answering the given question about locations of Amazon bedrock availability.'
@shahules786
Member

Hi @Pauldevillers, this is a widespread issue across the different prompts in the framework. As a first step in tackling it, we are going to change the prompts so that the output follows a specific structure, such as JSON, that can be verified easily. Thanks for bringing this to our notice; you can expect a fix very soon :)
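
As a rough illustration of that direction (not the actual Ragas implementation), the verdict could be requested and parsed as JSON along these lines; the instruction text and the extract_verdict helper are made up for the example:

import json
import re

# Hypothetical instruction appended to the context_precision prompt.
JSON_INSTRUCTION = (
    'Respond strictly with JSON, either {"verdict": "yes"} or {"verdict": "no"}, '
    "and nothing else."
)

def extract_verdict(resp: str) -> int:
    """Pull the first JSON object out of the reply and read its verdict."""
    match = re.search(r"\{.*?\}", resp, flags=re.DOTALL)
    if match is None:
        return 0
    try:
        verdict = json.loads(match.group(0)).get("verdict", "")
    except json.JSONDecodeError:
        return 0
    return int(str(verdict).strip().lower() == "yes")

print(extract_verdict('{"verdict": "yes"}'))                        # 1
print(extract_verdict('Sure! {"verdict": "no"} Hope that helps.'))  # 0

Even if the model wraps the JSON in extra prose, the verdict can still be recovered, which is what makes a structured output easier to verify.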

@IgnacioPascale

I second this, @Pauldevillers. Using Llama-2-7b-chat-hf, my grouped_responses object looks as follows:

[[['  Yes. The information']],
 [['  Yes, the information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes. The information']],
 [['  Yes.\n\n']],
 [['  Yes. The information']],
 [['  Yes.\n\n']],
 [['  Yes. The information']]]

So I get 0 in all examples.

Thanks for looking into it @shahules786

@Pauldevillers
Contributor Author

Pauldevillers commented Nov 16, 2023

Hello @shahules786, I just submitted a PR to tackle this issue: #289

@shahules786
Member

Thank you @Pauldevillers :)

@ferdinandl007
Contributor

The same happens with Vertex AI text-bison.

@kandakji

Same issue here.

@mohade09

Now I am getting NaN.

@jjmachan jjmachan added this to the v0.1.0 milestone Nov 29, 2023
@jjmachan jjmachan added bug Something isn't working enhancement New feature or request labels Nov 29, 2023
@jjmachan
Member

This is tricky because it is both a bug and a long-term problem. I think we can start off with dedicated prompts for Bedrock and Vertex AI, maybe @shahules786?
It would be an extension of what @tinomaxthayil and Austin (from AWS) were working on, which should be faster to merge in.

@kishoreiitd

I am also getting the same error while using Claude v2 via Bedrock.

@shahules786
Member

Hi @kishoreiitd, can you share the Ragas version? This should be fixed by #364.

@kishoreiitd

Thanks @shahules786, this has been fixed. I was using version 0.0.21, but I had tried it before #364 was merged.

@jjmachan
Member

jjmachan commented Jan 8, 2024

Addressed with #364. Please raise a new issue if this still persists.

@jjmachan jjmachan closed this as completed Jan 8, 2024