
RAGAS without ground_truth #1379

Open
dasilvaviviane opened this issue Sep 29, 2024 · 12 comments
Labels
answered 🤖 The question has been answered. Will be closed automatically if no new comments
question Further information is requested

Comments

@dasilvaviviane

Hi, is it possible to use the library without having the ground_truth for any metrics?

dasilvaviviane added the question (Further information is requested) label Sep 29, 2024

dosubot bot commented Sep 29, 2024

Hey @dasilvaviviane! I'm here to help you with any issues or questions you have while waiting for a human maintainer. I'm a bot 🤖 that can assist you with bug fixes, answer questions, and guide you on how to become a contributor.

Yes, it is possible to use the RAGAS library without ground_truth for many of its metrics. Ragas can perform reference-free evaluations of various aspects of your RAG pipeline, meaning you don't need ground truths when running those evaluations and can run them on production traces that you've collected with Langfuse. The library supports both LLM-based and non-LLM-based metrics, allowing for flexible evaluation methods that do not necessarily require ground-truth data [1][2].
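For illustration, here is a minimal sketch (assuming the ragas 0.1.x API and an OpenAI key configured for the default LLM and embeddings) of a reference-free evaluation with faithfulness and answer_relevancy, neither of which needs a ground_truth column:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import faithfulness, answer_relevancy

# Only question, answer, and retrieved contexts are required; no ground_truth column.
dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["The capital of France is Paris."],
    "contexts": [["Paris is the capital and most populous city of France."]],
})

result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)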

To continue talking to Dosu, mention @dosu.

@veenuvinod

veenuvinod commented Sep 30, 2024

How do I evaluate RAGAS metrics if any key (user input, contexts, answer, or reference) is not present or is null?
@dosu

@jjmachan
Member

You can check out the General Purpose Metrics mentioned in the docs too (bottom of the page): https://docs.ragas.io/en/latest/concepts/metrics


dosubot bot commented Sep 30, 2024

If any key user input, context, answer, or reference is not present or is null, the _single_turn_ascore method will not be able to generate the necessary statements for evaluation, and the _ascore method will return np.nan as the score. Additionally, the evaluate function in the RAGAS framework will raise a ValueError if the dataset is not provided or if the columns required for the metrics are missing or of the wrong format.

# inside a metric's scoring path: missing answers produce NaN instead of a score
if answers is None:
    return np.nan

# inside evaluate(): a missing dataset raises immediately
if dataset is None:
    raise ValueError("Provide dataset!")

The evaluate function also includes a raise_exceptions parameter, which, if set to False, will return np.nan for the row that failed instead of raising an exception.

raise_exceptions: bool = False,

This ensures that the evaluation process can handle missing or null values gracefully by either returning np.nan or raising an appropriate exception based on the configuration [1][2].
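As a rough sketch (assuming the 0.1.x evaluate API), the flag is simply passed to evaluate; with it set to False, a row whose evaluation fails shows up as NaN in the results instead of stopping the whole run:

from ragas import evaluate
from ragas.metrics import faithfulness

# "dataset" is assumed to be a datasets.Dataset with question / answer / contexts columns.
result = evaluate(dataset, metrics=[faithfulness], raise_exceptions=False)

df = result.to_pandas()  # rows that failed to evaluate appear as NaN scores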

To continue talking to Dosu, mention @dosu.

@veenuvinod

Thanks @jjmachan, that is quite a good link.
I also have a scenario where my code is hosted and there is an app running. The client gave me a golden dataset with only two columns, question and answer. At runtime I have to get the answer from the LLM and the retrieved contexts for each of the client's questions, and then evaluate the RAGAS metrics on that. How do I carry this out? Any ideas?
@dosu

@veenuvinod

Also, what is the stable version of RAGAS? On PyPI we have 0.1.20, but in this version many functions will be deprecated.

@dasilvaviviane
Author

You can check out the General Purpose Metrics mentioned in the docs too (bottom of the page): https://docs.ragas.io/en/latest/concepts/metrics

Thanks. Following your suggestion, I tried to use https://docs.ragas.io/en/latest/concepts/metrics/available_metrics/answer_relevance/#response-relevancy, but I couldn't import SingleTurnSample. Do you have any idea why?


dosubot bot commented Oct 10, 2024

To evaluate RAGAS metrics using a golden dataset with only question and answer columns, while retrieving answers from an LLM and contexts at runtime, follow these steps:

  1. Prepare the Dataset: Ensure your dataset has a "question" column. The "answer" column will be used for comparison if needed.

  2. Set Up the Query Engine: Use a query engine to process the questions and retrieve answers and contexts at runtime. Tools like LlamaIndex can help build your RAG pipeline.

  3. Execute Queries: Submit each question to the query engine, retrieve the answers and contexts, and construct a new dataset with these results.

  4. Evaluate with RAGAS: Use the new dataset, which includes questions, retrieved answers, and contexts, to evaluate using RAGAS metrics.

Here's a code snippet to guide you:

from datasets import Dataset
from ragas.evaluation import evaluate as ragas_evaluate
from ragas.executor import Executor

def evaluate_ragas(query_engine, dataset, metrics, llm=None, embeddings=None, raise_exceptions=False):
    if dataset is None:
        raise ValueError("Provide dataset!")

    # Run every question through the (async) query engine concurrently.
    executor = Executor(
        desc="Running Query Engine",
        keep_progress_bar=True,
        raise_exceptions=raise_exceptions,
    )

    queries = dataset["question"]
    for i, q in enumerate(queries):
        executor.submit(query_engine.aquery, q, name=f"query-{i}")

    results = executor.results()
    if not results:
        raise RuntimeError("No results returned from the query engine")

    # Collect the generated answer and the retrieved context texts for each query.
    answers = [r.response for r in results]
    contexts = [[n.node.text for n in r.source_nodes] for r in results]

    # Build the dataset RAGAS expects: question, contexts, answer.
    hf_dataset = Dataset.from_dict(
        {
            "question": queries,
            "contexts": contexts,
            "answer": answers,
        }
    )

    return ragas_evaluate(
        dataset=hf_dataset,
        metrics=metrics,
        llm=llm,
        embeddings=embeddings,
        raise_exceptions=raise_exceptions,
    )

This function will handle the evaluation process, leveraging tools like LlamaIndex for building the query engine and RAGAS for evaluation [1].
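A hypothetical call, assuming a LlamaIndex query engine built elsewhere (e.g. index.as_query_engine()) and the client's golden questions loaded into a datasets.Dataset:

from datasets import Dataset
from ragas.metrics import faithfulness, answer_relevancy

# Only the client's questions are needed to drive the query engine;
# answers and contexts are collected at runtime by evaluate_ragas above.
golden = Dataset.from_dict({
    "question": ["What is the refund policy?", "How do I reset my password?"],
})

results = evaluate_ragas(
    query_engine=query_engine,  # assumed: your LlamaIndex query engine
    dataset=golden,
    metrics=[faithfulness, answer_relevancy],
)
print(results)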

To continue talking to Dosu, mention @dosu.

@jjmachan
Member

@veenuvinod the most useful metrics here would be

  • answer_correctness
  • context_recall

Others are also available, but these will help you get started.
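Both of these compare against a reference, so the client's golden answers would go into a ground_truth column. A sketch, assuming the 0.1.x column names:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness, context_recall

dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],  # generated at runtime
    "contexts": [["Paris is the capital and largest city of France."]],  # retrieved at runtime
    "ground_truth": ["Paris"],  # from the client's golden dataset
})

result = evaluate(dataset, metrics=[answer_correctness, context_recall])
print(result)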

@dasilvaviviane you're using the latest version; try the stable version instead.
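For context, SingleTurnSample is part of the newer v0.2-style API that the /en/latest/ docs describe, so the import fails on 0.1.20. A rough sketch of that usage (assumed imports and field names; the scorer also needs an evaluator LLM and embeddings, omitted here):

from ragas import SingleTurnSample
from ragas.metrics import ResponseRelevancy

sample = SingleTurnSample(
    user_input="When was the first Super Bowl?",
    response="The first Super Bowl was held on January 15, 1967.",
    retrieved_contexts=["The first AFL-NFL World Championship Game was played on January 15, 1967."],
)

# scorer = ResponseRelevancy(llm=evaluator_llm, embeddings=evaluator_embeddings)
# score = await scorer.single_turn_ascore(sample)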

@veenuvinod

veenuvinod commented Nov 11, 2024

Hi, how do I set verbose to True so that we can see how RAGAS actually evaluates the metrics? That would be very helpful for understanding the logic behind the metrics, and also, in the case of ascore, for seeing what's happening inside RAGAS.
@jjmachan @dosu

@jjmachan
Member

We have something like that here: https://docs.ragas.io/en/stable/howtos/applications/_metrics_llm_calls/#export-llm-traces @veenuvinod. Could you check it out and let us know if that is what you were looking for?

jjmachan added the answered 🤖 (The question has been answered. Will be closed automatically if no new comments) label Nov 14, 2024

dosubot bot commented Nov 14, 2024

To set verbose to True in RAGAS and see how the metrics are evaluated, you can call the evaluate function with the verbose parameter set to True. Here's an example:

# Note: this signature matches the LangSmith integration helper, not the core ragas.evaluate
from ragas.integrations.langsmith import evaluate

results = evaluate(
    dataset_name="MyDataset",
    llm_or_chain_factory=my_llm,  # your LLM or chain factory
    experiment_name="experiment_1_with_vanila_rag",
    verbose=True,
)
print(results)

Setting verbose=True will print detailed progress and results during the evaluation process, helping you understand the logic behind the metrics, including how ascore is computed [1].

To continue talking to Dosu, mention @dosu.
