Index Error when trying to evaluate() a simple example #1710

Open
taledv opened this issue Nov 26, 2024 · 1 comment

taledv commented Nov 26, 2024

[ ] I checked the issues and your site, and couldn't find an answer to my question.

My understanding + question

As I understand it, you can pass your own LLM to evaluate() (or omit it, in which case a default GPT-4o model is used). The important point is that you don't need to pass an object that combines the LLM and the retriever, because the dataset you send to evaluate() was already produced by running retrieval and generation beforehand; you just want to evaluate it.
If my understanding is correct, I don't know why the following code fails with this error:

Token indices sequence length is longer than the specified maximum sequence length for this model (1921 > 1024). Running this sequence through the model will result in indexing errors
Exception raised in Job[0]: IndexError(index out of range in self)

As you will see in the code below, I have a very simple data example, probably a few dozen tokens, so why does it say the input somehow exceeds the model's maximum sequence length (which is 1024)? I don't understand how it gets to 1921.
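
For reference, a minimal sketch of how one could count the tokens gpt2 actually receives, using the Hugging Face tokenizer; prompt_text below is just a placeholder for whatever prompt ragas renders internally:

from transformers import AutoTokenizer

# gpt2's tokenizer carries the model's 1024-token limit and warns when a sequence exceeds it
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt_text = "..."  # placeholder for the full prompt ragas sends to the model
token_ids = tokenizer.encode(prompt_text)
print(len(token_ids))  # the warning above reports 1921 for the rendered prompt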

Ragas version: 0.2.6
Python version: 3.10.8

Code

from langchain.llms import HuggingFacePipeline
from ragas.llms import LangchainLLMWrapper
from ragas import EvaluationDataset
from ragas.metrics import Faithfulness
from ragas import evaluate
from datasets import Dataset, DatasetDict
import torch

torch.device('mps')  # note: this only constructs a device object; nothing is moved to MPS here

data = {
    'user_input': ["When was America founded?"],
    'retrieved_contexts':  [[
    "The United States has over 331 million people.",
    'The United States of America was founded in 1776']],
    'response': ["America was founded in 1776"],
    'reference': ["America was founded in 1776"]
}

custom_dataset = DatasetDict({"eval": Dataset.from_dict(data)})
eval_dataset_custom = EvaluationDataset.from_hf_dataset(custom_dataset['eval'])

# gpt2's maximum sequence length is 1024 tokens
llm = HuggingFacePipeline.from_model_id(
    model_id='gpt2',
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 50},
)

results = evaluate(dataset=eval_dataset_custom, metrics=[Faithfulness(llm=LangchainLLMWrapper(llm))])

Additional context

I have closed my other issue (#1700), since there I passed a RetrievalQA.from_chain_type(llm=llm, retriever=retriever) to evaluate(), which, as I understand it, is not needed; only the llm object is. Now I get the different error mentioned above, and I would really appreciate your help. I think I wrote the simplest example that should work.

@taledv added the question (Further information is requested) label Nov 26, 2024
@dosubot bot added the bug (Something isn't working) label Nov 26, 2024
sahusiddharth (Collaborator) commented

Hi @taledv

This is happening due to the prompt used in the faithfulness metric. You can check the prompt with the code below to see how it's affecting the token count:

from ragas.metrics import Faithfulness
from ragas.llms import LangchainLLMWrapper
from langchain_openai import ChatOpenAI

evaluator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
scorer = Faithfulness(llm=evaluator_llm)
scorer.get_prompts()  # returns the prompts the metric sends to the evaluator LLM

If you’d like to modify it, you can follow the instructions in our how to modify metric prompt docs.
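
As a rough sketch of that flow (assuming get_prompts()/set_prompts() behave as described in those docs; the key name below is hypothetical, so check the actual keys returned for your metric first):

# list the prompt objects the metric uses; keys vary by metric
prompts = scorer.get_prompts()
print(prompts.keys())

# pick one prompt by its key ("some_prompt_name" is a placeholder for a real key),
# shorten or rewrite its instruction, and set it back on the metric
prompt = prompts["some_prompt_name"]
prompt.instruction = "Your shorter instruction here."
scorer.set_prompts(**{"some_prompt_name": prompt})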

@sahusiddharth added the module-metrics (this is part of metrics module) and answered 🤖 (The question has been answered. Will be closed automatically if no new comments) labels Jan 11, 2025
@sahusiddharth self-assigned this Jan 11, 2025