Updated testset generation question prompts to use JSON formatting instructions + prompt tweaks for smaller LLMs #1354

Closed
wants to merge 4 commits into from
2 changes: 1 addition & 1 deletion src/ragas/llms/json_load.py
@@ -31,7 +31,7 @@ def load_as_json(text) -> t.Dict:

# not migrating to Prompt format to avoid circular imports
JSON_PROMPT = """\
-Rewrite the input into valid json
+Rewrite the input into valid json. Only output JSON and nothing else.

Input:
{{
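The added sentence matters because the model's reply is parsed as JSON afterwards, and any preamble or closing remark makes a strict parse fail. A minimal sketch of that failure mode, using only the standard library; the `extract_json_object` helper is hypothetical and is not part of ragas:

```python
import json


def extract_json_object(text: str) -> dict:
    """Hypothetical fallback: parse the span from the first '{' to the last '}'."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start : end + 1])


clean_reply = '{"keyphrases": ["Black hole", "Strong gravity"]}'
chatty_reply = "Sure! Here is the JSON:\n" + clean_reply + "\nHope this helps."

# A reply that contains only JSON parses directly.
parsed_clean = json.loads(clean_reply)

# A reply with surrounding chatter is rejected by the strict parser ...
try:
    json.loads(chatty_reply)
    strict_parse_ok = True
except json.JSONDecodeError:
    strict_parse_ok = False

# ... and is only recoverable with extra, heuristic post-processing.
parsed_chatty = extract_json_object(chatty_reply)
```

Smaller models tend to add such chatter, so asking for JSON only keeps the reply parseable without a heuristic step like the one sketched here.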
77 changes: 51 additions & 26 deletions src/ragas/testset/prompts.py
@@ -3,19 +3,31 @@
from ragas.llms.output_parser import RagasoutputParser, get_json_format_instructions
from ragas.llms.prompt import Prompt

+from typing import List
+

class AnswerFormat(BaseModel):
    answer: str
    verdict: int


+class KeyphraseFormat(BaseModel):
+    keyphrases: List[str]
+
+
+class RelevantContextFormat(BaseModel):
+    relevant_contexts: List[int]
+
+
question_answer_parser = RagasoutputParser(pydantic_object=AnswerFormat)
+keyphrase_parser = RagasoutputParser(pydantic_object=KeyphraseFormat)
+relevant_context_parser = RagasoutputParser(pydantic_object=RelevantContextFormat)


reasoning_question_prompt = Prompt(
name="reasoning_question",
instruction="""Complicate the given question by rewriting question into a multi-hop reasoning question based on the provided context.
-Answering the question should require the reader to make multiple logical connections or inferences using the information available in given context.
+Answering the question should require the reader to make multiple logical connections or inferences using the information available in given context. Only output the question and nothing else.
Rules to follow when rewriting question:
1. Ensure that the rewritten question can be answered entirely from the information present in the contexts.
2. Do not frame questions that contains more than 15 words. Use abbreviation wherever possible.
@@ -43,7 +55,7 @@ class AnswerFormat(BaseModel):
multi_context_question_prompt = Prompt(
name="multi_context_question",
instruction="""
-The task is to rewrite and complicate the given question in a way that answering it requires information derived from both context1 and context2.
+The task is to rewrite and complicate the given question in a way that answering it requires information derived from both context1 and context2. Only output the question and nothing else.
Follow the rules given below while rewriting the question.
1. The rewritten question should not be very long. Use abbreviation wherever possible.
2. The rewritten question must be reasonable and must be understood and responded by humans.
@@ -73,7 +85,7 @@ class AnswerFormat(BaseModel):
conditional_question_prompt = Prompt(
name="conditional_question",
instruction="""Rewrite the provided question to increase its complexity by introducing a conditional element.
-The goal is to make the question more intricate by incorporating a scenario or condition that affects the context of the question.
+The goal is to make the question more intricate by incorporating a scenario or condition that affects the context of the question. Only output the question and nothing else.
Follow the rules given below while rewriting the question.
1. The rewritten question should not be longer than 25 words. Use abbreviation wherever possible.
2. The rewritten question must be reasonable and must be understood and responded by humans.
@@ -100,7 +112,8 @@ class AnswerFormat(BaseModel):
compress_question_prompt = Prompt(
name="compress_question",
instruction="""Rewrite the following question to make it more indirect and shorter while retaining the essence of the original question.
-The goal is to create a question that conveys the same meaning but in a less direct manner. The rewritten question should shorter so use abbreviation wherever possible.""",
+The goal is to create a question that conveys the same meaning but in a less direct manner. The rewritten question should shorter so use abbreviation wherever possible.
+Only output the question and nothing else.""",
examples=[
{
"question": "What is the distance between the Earth and the Moon?",
@@ -185,28 +198,33 @@ class AnswerFormat(BaseModel):
keyphrase_extraction_prompt = Prompt(
name="keyphrase_extraction",
instruction="Extract the top 3 to 5 keyphrases from the provided text, focusing on the most significant and distinctive aspects. ",
+output_format_instruction=get_json_format_instructions(KeyphraseFormat),
examples=[
{
"text": "A black hole is a region of spacetime where gravity is so strong that nothing, including light and other electromagnetic waves, has enough energy to escape it. The theory of general relativity predicts that a sufficiently compact mass can deform spacetime to form a black hole.",
-"output": {
-"keyphrases": [
-"Black hole",
-"Region of spacetime",
-"Strong gravity",
-"Light and electromagnetic waves",
-"Theory of general relativity",
-]
-},
+"output": KeyphraseFormat.parse_obj(
+{
+"keyphrases": [
+"Black hole",
+"Region of spacetime",
+"Strong gravity",
+"Light and electromagnetic waves",
+"Theory of general relativity",
+]
+}
+).dict(),
},
{
"text": "The Great Wall of China is an ancient series of walls and fortifications located in northern China, built around 500 years ago. This immense wall stretches over 13,000 miles and is a testament to the skill and persistence of ancient Chinese engineers.",
-"output": {
-"keyphrases": [
-"Great Wall of China",
-"Ancient fortifications",
-"Northern China",
-]
-},
+"output": KeyphraseFormat.parse_obj(
+{
+"keyphrases": [
+"Great Wall of China",
+"Ancient fortifications",
+"Northern China",
+]
+}
+).dict(),
},
],
input_keys=["text"],
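Routing each example's output through `KeyphraseFormat.parse_obj(...).dict()` means a malformed example fails when the module is imported instead of silently reaching the prompt. The sketch below mimics that check with a standard-library dataclass rather than Pydantic, so `KeyphraseExample` and `from_obj` are stand-ins (not ragas or Pydantic APIs), and the check is stricter than Pydantic's defaults in that it also rejects extra keys:

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class KeyphraseExample:
    """Stand-in for the KeyphraseFormat model: one field, a list of strings."""

    keyphrases: List[str]

    @classmethod
    def from_obj(cls, obj: Dict[str, Any]) -> "KeyphraseExample":
        # Reject anything that is not exactly {"keyphrases": [str, ...]}.
        if set(obj) != {"keyphrases"}:
            raise ValueError(f"unexpected keys: {sorted(obj)}")
        phrases = obj["keyphrases"]
        if not isinstance(phrases, list) or not all(isinstance(p, str) for p in phrases):
            raise ValueError("keyphrases must be a list of strings")
        return cls(keyphrases=list(phrases))


good = KeyphraseExample.from_obj(
    {"keyphrases": ["Great Wall of China", "Ancient fortifications", "Northern China"]}
)
example_output = asdict(good)  # plain dict, ready to embed in a prompt example

# A bare string where a list is expected is caught before it reaches a prompt.
try:
    KeyphraseExample.from_obj({"keyphrases": "Great Wall of China"})
    bad_accepted = True
except ValueError:
    bad_accepted = False
```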
@@ -216,7 +234,7 @@ class AnswerFormat(BaseModel):

seed_question_prompt = Prompt(
name="seed_question",
-instruction="Generate a question that can be fully answered from given context. The question should be formed using topic",
+instruction="Generate a question that can be fully answered from given context. The question should be formed using topic. Only output the question and nothing else.",
examples=[
{
"context": "Photosynthesis in plants involves converting light energy into chemical energy, using chlorophyll and other pigments to absorb light. This process is crucial for plant growth and the production of oxygen.",
@@ -272,6 +290,7 @@ class AnswerFormat(BaseModel):
find_relevant_context_prompt = Prompt(
name="find_relevant_context",
instruction="Given a question and set of contexts, find the most relevant contexts to answer the question.",
+output_format_instruction=get_json_format_instructions(RelevantContextFormat),
examples=[
{
"question": "What is the capital of France?",
@@ -280,9 +299,11 @@ class AnswerFormat(BaseModel):
"2. The capital of France is Paris. It is also the most populous city in France, with a population of over 2 million people. Paris is known for its cultural landmarks like the Eiffel Tower and the Louvre Museum.",
"3. Paris is the capital of France. It is also the most populous city in France, with a population of over 2 million people. Paris is known for its cultural landmarks like the Eiffel Tower and the Louvre Museum.",
],
-"output": {
-"relevant_contexts": [1, 2],
-},
+"output": RelevantContextFormat.parse_obj(
+{
+"relevant_contexts": [2, 3],
+}
+).dict(),
},
{
"question": "How does caffeine affect the body and what are its common sources?",
@@ -291,7 +312,11 @@ class AnswerFormat(BaseModel):
"2. Regular physical activity is essential for maintaining good health. It can help control weight, combat health conditions, boost energy, and promote better sleep.",
"3. Common sources of caffeine include coffee, tea, cola, and energy drinks. These beverages are consumed worldwide and are known for providing a quick boost of energy.",
],
-"output": {"relevant_contexts": [1, 2]},
+"output": RelevantContextFormat.parse_obj(
+{
+"relevant_contexts": [1, 3],
+}
+).dict()
},
],
input_keys=["question", "contexts"],
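The example outputs refer to contexts by the number that prefixes each string (`[2, 3]` for the two Paris passages), so a consumer presumably has to treat the indices as 1-based. A minimal sketch of that lookup, assuming the model reply is a JSON object with a `relevant_contexts` list; `select_contexts` is a hypothetical helper, not a ragas function:

```python
import json
from typing import List


def select_contexts(reply: str, contexts: List[str]) -> List[str]:
    """Return the contexts named by 1-based indices in a JSON reply."""
    indices = json.loads(reply)["relevant_contexts"]
    selected = []
    for i in indices:
        # Skip out-of-range indices rather than failing on a sloppy reply.
        if isinstance(i, int) and 1 <= i <= len(contexts):
            selected.append(contexts[i - 1])
    return selected


contexts = [
    "1. Unrelated passage.",
    "2. The capital of France is Paris.",
    "3. Paris is the capital of France.",
]
picked = select_contexts('{"relevant_contexts": [2, 3, 7]}', contexts)
```

Dropping invalid indices is one possible policy; raising an error instead would surface model mistakes earlier at the cost of more failed generations.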
@@ -303,7 +328,7 @@ class AnswerFormat(BaseModel):

question_rewrite_prompt = Prompt(
name="rewrite_question",
-instruction="""Given a context, question and feedback, rewrite the question to improve its clarity and answerability based on the feedback provided.""",
+instruction="""Given a context, question and feedback, rewrite the question to improve its clarity and answerability based on the feedback provided. Only output the question and nothing else.""",
examples=[
{
"context": "The Eiffel Tower was constructed using iron and was originally intended as a temporary exhibit for the 1889 World's Fair held in Paris. Despite its initial temporary purpose, the Eiffel Tower quickly became a symbol of Parisian ingenuity and an iconic landmark of the city, attracting millions of visitors each year. The tower's design, created by Gustave Eiffel, was initially met with criticism from some French artists and intellectuals, but it has since been celebrated as a masterpiece of structural engineering and architectural design.",