ragas: pass rubric in via RubricsScore's SingleTurnPrompt.instruction
In ragas, RubricsScore.rubrics isn't used anywhere
except in __repr__, so it was never passed into
the prompt used by the judge to evaluate responses
against reference answers.

Setting SingleTurnPrompt.instruction to a string rubric
gets the rubric into the prompt that is sent
to the judge.

Signed-off-by: Ali Maredia <[email protected]>
alimaredia committed Jan 12, 2025
1 parent 03afb6c commit 8034f7e
Showing 1 changed file with 21 additions and 2 deletions.
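
In short, the change drops the unused rubrics= keyword and instead attaches the rubric text to the prompt object the metric actually sends to the judge. The following is a minimal sketch of that pattern, mirroring the diff below; the RUBRIC string here is only a stand-in for the full template defined in ragas.py.

# Minimal sketch of the pattern this commit adopts; it mirrors the diff
# below and uses the same ragas imports as src/instructlab/eval/ragas.py.
# RUBRIC is a stand-in for the full template defined in the diff.
from ragas.metrics._domain_specific_rubrics import RubricsScore, SingleTurnPrompt

RUBRIC = "Score the response from 1 to 5 against the reference answer ..."

# Before: RubricsScore(rubrics=...) only surfaced the rubric in __repr__,
# so the judge never saw it.
# After: setting SingleTurnPrompt.instruction puts the rubric into the
# prompt that is actually sent to the judge.
st_prompt = SingleTurnPrompt()
st_prompt.instruction = RUBRIC
metric = RubricsScore(single_turn_prompt=st_prompt)
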
src/instructlab/eval/ragas.py
@@ -12,8 +12,8 @@
 from ragas.evaluation import EvaluationDataset, EvaluationResult, RunConfig, evaluate
 from ragas.metrics import Metric
 from ragas.metrics._domain_specific_rubrics import ( # the rubrics we must instantiate are located inside of a file marked as private
-    DEFAULT_WITH_REFERENCE_RUBRICS,
     RubricsScore,
+    SingleTurnPrompt,
 )

 # Local
@@ -22,6 +22,23 @@

 logger = setup_logger(__name__)

+RUBRIC = """You are an evaluation system tasked with assessing the answer quality of an AI-generated response in relation to the posed question and reference answer. Assess whether the response is correct, accurate, and factual based on the reference answer.
+To evaluate the factuality of the answer, compare the model answer to the reference answer.
+Evaluate the answer_quality as:
+- Score 1: The response is completely incorrect, inaccurate, and/or not factual.
+- Score 2: The response is mostly incorrect, inaccurate, and/or not factual.
+- Score 3: The response is somewhat correct, accurate, and/or factual.
+- Score 4: The response is mostly correct, accurate, and factual.
+- Score 5: The response is completely correct, accurate, and factual.
+Here is the question: \n ------- \n {user_input} \n -------
+Here is the model answer: \n ------- \n {response} \n -------
+Here is the reference answer (it may be very short and lack details, or indirect, long, and extractive): \n ------- \n {reference} \n ------- \n
+Assess the quality of the model answer with respect to the reference answer, but do not penalize the model answer for adding details or for giving a direct answer to the user's question.
+Approach your evaluation in a step-by-step manner.
+To evaluate, first list out the key facts covered in the reference answer and check how many are covered by the model answer.
+If the question or reference answer is about steps, check whether the steps and their order in the model answer match the reference answer.
+Provide your response as a JSON object with two keys: 'reasoning' and 'answer_quality'."""
+


 class Sample(TypedDict):
     """
@@ -257,8 +274,10 @@ def _generate_answers_from_model(
     @staticmethod
     def _get_metrics() -> List[Metric]:
         # default set of metrics
+        st_prompt = SingleTurnPrompt()
+        st_prompt.instruction = RUBRIC
         return [
             RubricsScore(
-                rubrics=DEFAULT_WITH_REFERENCE_RUBRICS,
+                single_turn_prompt=st_prompt,
             )
         ]
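
For context, here is a hypothetical usage sketch (not part of the commit) showing how the rubric-bearing metric could be exercised. It assumes the ragas APIs already imported at the top of ragas.py (EvaluationDataset, evaluate), that EvaluationDataset.from_list accepts dicts keyed by the same fields the RUBRIC template references, and uses judge_llm as a placeholder for whichever judge model the evaluator is configured with.

# Hypothetical usage sketch (not part of this commit). Assumes the ragas
# APIs imported at the top of ragas.py; judge_llm is a placeholder for the
# configured judge model, and RUBRIC is the template defined in the diff above.
from ragas.evaluation import EvaluationDataset, evaluate
from ragas.metrics._domain_specific_rubrics import RubricsScore, SingleTurnPrompt

st_prompt = SingleTurnPrompt()
st_prompt.instruction = RUBRIC

dataset = EvaluationDataset.from_list(
    [
        {
            # Keys line up with the {user_input}/{response}/{reference}
            # placeholders in RUBRIC.
            "user_input": "What port does the service listen on by default?",
            "response": "It listens on port 8080 unless overridden.",
            "reference": "The default listening port is 8080.",
        }
    ]
)

result = evaluate(
    dataset=dataset,
    metrics=[RubricsScore(single_turn_prompt=st_prompt)],
    llm=judge_llm,  # placeholder: the judge model wrapper used by the evaluator
)
print(result)  # per-sample rubric scores produced by the judge

The key point is that whatever string ends up in SingleTurnPrompt.instruction is what the judge actually sees, which is why the rubric now lives there instead of in the unused rubrics field.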
