How to write custom prompt for faithfulness with PydanticPrompt #1729

pratikchhapolika opened this issue Dec 4, 2024 · 2 comments


pratikchhapolika commented Dec 4, 2024

How can I change the default prompt used in the faithfulness metric?
Here is an example that uses the default faithfulness prompt.

from ragas import EvaluationDataset, SingleTurnSample, evaluate, RunConfig
from ragas.metrics import ResponseRelevancy, faithfulness
from ragas.prompt import PydanticPrompt
from langchain_openai.chat_models import AzureChatOpenAI
from langchain_openai.embeddings import AzureOpenAIEmbeddings
from langchain_core.callbacks import BaseCallbackHandler
from datasets import Dataset
from pydantic import BaseModel, Field
import os

# Callback that prints every prompt sent to the LLM and the raw LLM response
class TestCallback(BaseCallbackHandler):

    def on_llm_start(self, serialized, prompts, **kwargs):
        print(f"**********Prompts*********:\n {prompts[0]}\n\n")

    def on_llm_end(self, response, **kwargs):
        print(f"**********Response**********:\n {response}\n\n")

data_samples = {
    'question': ['When was the first super bowl?'],
    'answer': ['The first superbowl was held on Jan 15, 1967'],
    'contexts' : [['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,'], 
    ],
}
dataset = Dataset.from_dict(data_samples)
# azure_model / azure_embeddings are pre-configured AzureChatOpenAI / AzureOpenAIEmbeddings instances (setup not shown)
score = evaluate(dataset, metrics=[faithfulness],
                 llm=azure_model,
                 embeddings=azure_embeddings,
                 raise_exceptions=True,
                 callbacks=[TestCallback()],
                 run_config=RunConfig(timeout=10, max_retries=1, max_wait=60, max_workers=1)
                )
score.to_pandas()
**********Prompts*********:
 Human: Given a question, an answer, and sentences from the answer analyze the complexity of each sentence given under 'sentences' and break down each sentence into one or more fully understandable statements while also ensuring no pronouns are used in each statement. Format the outputs in JSON.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{'$defs': {'SentenceComponents': {'properties': {'sentence_index': {'description': 'The index of the sentence', 'title': 'Sentence Index', 'type': 'integer'}, 'simpler_statements': {'description': 'A list of simpler statements that can be directly inferred from the context', 'items': {'type': 'string'}, 'title': 'Simpler Statements', 'type': 'array'}}, 'required': ['sentence_index', 'simpler_statements'], 'title': 'SentenceComponents', 'type': 'object'}}, 'properties': {'sentences': {'description': 'A list of sentences and their simpler versions', 'items': {'$ref': '#/$defs/SentenceComponents'}, 'title': 'Sentences', 'type': 'array'}}, 'required': ['sentences'], 'title': 'SentencesSimplified', 'type': 'object'}

--------EXAMPLES-----------
Example 1
Input: {
    "question": "Who was Albert Einstein and what is he best known for?",
    "answer": "He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time. He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics.",
    "sentences": {
        "0": "He was a German-born theoretical physicist, widely acknowledged to be one of the greatest and most influential physicists of all time.",
        "1": "He was best known for developing the theory of relativity, he also made important contributions to the development of the theory of quantum mechanics."
    }
}
Output: {
    "sentences": [
        {
            "sentence_index": 0,
            "simpler_statements": [
                "Albert Einstein was a German-born theoretical physicist.",
                "Albert Einstein is recognized as one of the greatest and most influential physicists of all time."
            ]
        },
        {
            "sentence_index": 1,
            "simpler_statements": [
                "Albert Einstein was best known for developing the theory of relativity.",
                "Albert Einstein also made important contributions to the development of the theory of quantum mechanics."
            ]
        }
    ]
}
-----------------------------

Now perform the same with the following input
input: {
    "question": "When was the first super bowl?",
    "answer": "The first superbowl was held on Jan 15, 1967",
    "sentences": {}
}
Output: 


**********Response**********:
 generations=[[ChatGeneration(text='```json\n{\n    "sentences": []\n}\n```', generation_info={'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, message=AIMessage(content='```json\n{\n    "sentences": []\n}\n```', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 600, 'total_tokens': 612, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_04751d0b65', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}], 'finish_reason': 'stop', 'logprobs': None, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'protected_material_code': {'filtered': False, 'detected': False}, 'protected_material_text': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}, id='run-b5aad4d8-0ad7-4296-8338-ccdb36a20add-0', usage_metadata={'input_tokens': 600, 'output_tokens': 12, 'total_tokens': 612, 'input_token_details': {}, 'output_token_details': {}}))]] llm_output={'token_usage': {'completion_tokens': 12, 'prompt_tokens': 600, 'total_tokens': 612, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_04751d0b65', 'prompt_filter_results': [{'prompt_index': 0, 'content_filter_results': {'hate': {'filtered': False, 'severity': 'safe'}, 'jailbreak': {'filtered': False, 'detected': False}, 'self_harm': {'filtered': False, 'severity': 'safe'}, 'sexual': {'filtered': False, 'severity': 'safe'}, 'violence': {'filtered': False, 'severity': 'safe'}}}]} run=None type='LLMResult'


**********Prompts*********:
 Human: Your task is to judge the faithfulness of a series of statements based on a given context. For each statement you must return verdict as 1 if the statement can be directly inferred based on the context or 0 if the statement can not be directly inferred based on the context.
Please return the output in a JSON format that complies with the following schema as specified in JSON Schema:
{'$defs': {'StatementFaithfulnessAnswer': {'properties': {'statement': {'description': 'the original statement, word-by-word', 'title': 'Statement', 'type': 'string'}, 'reason': {'description': 'the reason of the verdict', 'title': 'Reason', 'type': 'string'}, 'verdict': {'description': 'the verdict(0/1) of the faithfulness.', 'title': 'Verdict', 'type': 'integer'}}, 'required': ['statement', 'reason', 'verdict'], 'title': 'StatementFaithfulnessAnswer', 'type': 'object'}}, 'properties': {'statements': {'items': {'$ref': '#/$defs/StatementFaithfulnessAnswer'}, 'title': 'Statements', 'type': 'array'}}, 'required': ['statements'], 'title': 'NLIStatementOutput', 'type': 'object'}

--------EXAMPLES-----------
Example 1
Input: {
    "context": "John is a student at XYZ University. He is pursuing a degree in Computer Science. He is enrolled in several courses this semester, including Data Structures, Algorithms, and Database Management. John is a diligent student and spends a significant amount of time studying and completing assignments. He often stays late in the library to work on his projects.",
    "statements": [
        "John is majoring in Biology.",
        "John is taking a course on Artificial Intelligence.",
        "John is a dedicated student.",
        "John has a part-time job."
    ]
}
Output: {
    "statements": [
        {
            "statement": "John is majoring in Biology.",
            "reason": "John's major is explicitly mentioned as Computer Science. There is no information suggesting he is majoring in Biology.",
            "verdict": 0
        },
        {
            "statement": "John is taking a course on Artificial Intelligence.",
            "reason": "The context mentions the courses John is currently enrolled in, and Artificial Intelligence is not mentioned. Therefore, it cannot be deduced that John is taking a course on AI.",
            "verdict": 0
        },
        {
            "statement": "John is a dedicated student.",
            "reason": "The context states that he spends a significant amount of time studying and completing assignments. Additionally, it mentions that he often stays late in the library to work on his projects, which implies dedication.",
            "verdict": 1
        },
        {
            "statement": "John has a part-time job.",
            "reason": "There is no information given in the context about John having a part-time job.",
            "verdict": 0
        }
    ]
}

Example 2
Input: {
    "context": "Photosynthesis is a process used by plants, algae, and certain bacteria to convert light energy into chemical energy.",
    "statements": [
        "Albert Einstein was a genius."
    ]
}
Output: {
    "statements": [
        {
            "statement": "Albert Einstein was a genius.",
            "reason": "The context and statement are unrelated",
            "verdict": 0
        }
    ]
}
-----------------------------

Now perform the same with the following input
input: {
    "context": "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,",
    "statements": []
}
Output: 

I assume it uses 2 steps to calculate faithfulness: the first prompt breaks the answer into simpler statements, and the second prompt judges each statement against the context to produce a verdict.
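
If that is right, the final score should just be the share of generated statements that get verdict 1. A rough sketch of the arithmetic only (my reading of the metric, not the actual ragas code):

# faithfulness = statements judged faithful / total statements generated from the answer
verdicts = [1, 0, 1]                    # verdicts returned by the second (NLI) prompt
score = sum(verdicts) / len(verdicts)   # -> 0.67 (2 of 3 statements supported)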

How can I use the same two-step prompt style, calling the LLM twice, with my own custom prompts?

Here is what I tried:

# Step 1: Define custom input and output schemas
class CustomInput(BaseModel):
    question: str = Field(description="The question to be answered")
    answer: str = Field(description="The generated answer to the question")
    contexts: list[str] = Field(description="Relevant context documents")

class CustomOutput(BaseModel):
    score: float = Field(description="Faithfulness score between the answer and contexts")

# Step 2: Create a custom prompt
class CustomFaithfulnessPrompt(PydanticPrompt[CustomInput, CustomOutput]):
    instruction = "Evaluate how faithful the answer is to the provided contexts."
    input_model = CustomInput
    output_model = CustomOutput
    examples = [
        (
            CustomInput(
                question="What is the capital of France?",
                answer="The capital of France is Paris.",
                contexts=[
                    "France is a country in Europe. Its capital city is Paris, known for its landmarks like the Eiffel Tower."
                ]
            ),
            CustomOutput(score=1.0)
        )
    ]

# Step 3: Define the dataset
data_samples = {
    'question': ['When was the first super bowl?'],
    'answer': ['The first superbowl was held on Jan 15, 1967'],
    'contexts': [
        ['The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles,']
    ],
}
dataset = Dataset.from_dict(data_samples)
# print(dir(faithfulness))
faithfulness.long_form_answer_prompt = CustomFaithfulnessPrompt()
print("Custom prompt\n", faithfulness.long_form_answer_prompt.to_string())

# Step 4: Use evaluate with the custom prompt
score = evaluate(
    dataset,
    metrics=[faithfulness],
    llm=azure_model,
    embeddings=azure_embeddings,
    raise_exceptions=True,
    callbacks=[TestCallback()],
    run_config=RunConfig(timeout=10, max_retries=1, max_wait=60, max_workers=1),
)

# Convert the results to a DataFrame
print(score.to_pandas())

But it is not working the same way as the default prompt.

pratikchhapolika added the question (Further information is requested) label on Dec 4, 2024
pratikchhapolika (Author) commented:

@jjmachan would need your help on this

sahusiddharth added the module-metrics (this is part of metrics module) label on Jan 11, 2025
sahusiddharth (Collaborator) commented Jan 11, 2025

Hi @pratikchhapolika,

We have a section in our docs that explains how to change the prompt for the Ragas metrics. You can also have a look at the example below:

from ragas.metrics import Faithfulness

scorer = Faithfulness(llm=evaluator_llm)
scorer.get_prompts()

Output

{'n_l_i_statement_prompt': NLIStatementPrompt(instruction=Your task is to judge the faithfulness of a ..
'statement_generator_prompt': StatementGeneratorPrompt(instruction=Given a question, an answer, and sentences from ... }

Changing the prompt...

statement_prompt = scorer.get_prompts()["statement_generator_prompt"]  # breaks the answer into statements
verdict_prompt = scorer.get_prompts()["n_l_i_statement_prompt"]        # judges the faithfulness of each statement

statement_prompt.instruction = "New statement breaking prompt"
statement_prompt.examples = []  # new examples must conform to the example class of the statement generator prompt

verdict_prompt.instruction = "New verdict generation prompt"
verdict_prompt.examples = []  # new examples must conform to the example class of the NLI statement prompt

scorer.set_prompts(
    **{
        "statement_generator_prompt": statement_prompt,
        "n_l_i_statement_prompt": verdict_prompt,
    }
)

scorer.get_prompts()

Output

{'n_l_i_statement_prompt': NLIStatementPrompt(instruction=New verdict generation prompt
'statement_generator_prompt': StatementGeneratorPrompt(instruction=New statement breaking prompt}
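
If you also want to replace the few-shot examples and not just the instruction, you can build them from the prompt object's own input_model / output_model so they match the schema the metric expects. A rough sketch for the NLI prompt, with field names taken from the prompt logged above (treat it as a sketch and adapt it to your ragas version):

nli_prompt = scorer.get_prompts()["n_l_i_statement_prompt"]

# The prompt object exposes the Pydantic models it validates against,
# so example pairs can be built without importing ragas internals.
ExampleInput = nli_prompt.input_model    # context + statements, as in the logged schema
ExampleOutput = nli_prompt.output_model  # NLIStatementOutput: statement, reason, verdict

example_input = ExampleInput(
    context="The Eiffel Tower is located in Paris and was completed in 1889.",
    statements=["The Eiffel Tower is in Berlin."],
)
example_output = ExampleOutput(
    statements=[
        {
            "statement": "The Eiffel Tower is in Berlin.",
            "reason": "The context places the Eiffel Tower in Paris, not Berlin.",
            "verdict": 0,
        }
    ]
)

nli_prompt.examples = [(example_input, example_output)]
scorer.set_prompts(**{"n_l_i_statement_prompt": nli_prompt})

The same pattern applies to statement_generator_prompt, whose models follow the first logged schema (question, answer, sentences in; sentence_index, simpler_statements out). Make sure you then pass this scorer instance to evaluate(metrics=[scorer], ...) so the customized prompts are the ones actually used.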

sahusiddharth added the answered 🤖 (The question has been answered. Will be closed automatically if no new comments) label on Jan 11, 2025
sahusiddharth self-assigned this on Jan 11, 2025