metrics: add domain specific rubrics based scoring #1189

Merged
merged 23 commits into from
Aug 15, 2024

vaishakhRaveendran
Contributor

@vaishakhRaveendran vaishakhRaveendran commented Aug 12, 2024

This can be regarded as the next step in refining the aspect critic metric, i.e.:
LEVEL 1: the user evaluates using just simple criteria
LEVEL 2: the user specifies the criteria associated with each score for the entire dataset
LEVEL 3: the user specifies individual criteria associated with each score for each sample in the dataset
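
As a rough illustration of LEVEL 2, a dataset-wide rubric can be thought of as a mapping from each score to the criterion that earns it, rendered into the text an evaluator LLM would see. This is a minimal sketch under stated assumptions: `DATASET_RUBRIC`, `format_rubric`, and the criterion wording are hypothetical and are not taken from the ragas API or from this PR.

```python
# Hypothetical illustration; none of these names come from ragas.
from typing import Dict

# LEVEL 2: one rubric shared by every row in the dataset.
# Keys are scores, values are the criterion that earns that score.
DATASET_RUBRIC: Dict[int, str] = {
    1: "The answer is incorrect or does not address the question.",
    2: "The answer is partially correct but has major omissions or errors.",
    3: "The answer is mostly correct but lacks detail or has minor errors.",
    4: "The answer is correct and clear, with only minor omissions.",
    5: "The answer is fully correct, complete, and clearly written.",
}


def format_rubric(rubric: Dict[int, str]) -> str:
    """Render a score -> criterion mapping as prompt text, lowest score first."""
    if not rubric:
        raise ValueError("rubric must define at least one score")
    return "\n".join(f"Score {score}: {rubric[score]}" for score in sorted(rubric))


if __name__ == "__main__":
    print(format_rubric(DATASET_RUBRIC))
```

LEVEL 1 would correspond to a single criterion string with no per-score breakdown, and LEVEL 3 to one such mapping per sample rather than one for the whole dataset.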

from ragas import evaluate
from datasets import Dataset

from ragas.metrics import reference_free_rubrics_score, labelled_rubrics_score
rows = {
    "question": [
        "What's the longest river in the world?",
        "What does the Democratic Republic of Congo flag represent?"
    ],
    "ground_truth": [
        "The Nile is a major north-flowing river in northeastern Africa. It flows into the Mediterranean Sea. The Nile is the longest river in Africa and has historically been considered the longest river in the world, though this has been contested by research suggesting that the Amazon River is slightly longer. Of the world's major rivers, the Nile is one of the smallest, as measured by annual flow in cubic metres of water. About 6,650 km (4,130 mi) long, its drainage basin covers eleven countries: the Democratic Republic of the Congo, Tanzania, Burundi, Rwanda, Uganda, Kenya, Ethiopia, Eritrea, South Sudan, Sudan, and Egypt.",
        "The national flag of the Democratic Republic of the Congo represents blue for peace, red for 'the blood of the country's martyrs', yellow for the country's wealth, and a star for a radiant future for the country."
    ],
    "answer": [
        "The longest river in the world is the Nile, stretching approximately 6,650 kilometers (4,130 miles) through northeastern Africa, flowing through countries such as Uganda, Sudan, and Egypt before emptying into the Mediterranean Sea. There is some debate about this title, as recent studies suggest the Amazon River could be longer if its longest tributaries are included, potentially extending its length to about 7,000 kilometers (4,350 miles).",
        "The flag of the Democratic Republic of the Congo (DRC) features a sky blue field with a red diagonal stripe bordered by narrow yellow edges, and a yellow five-pointed star in the upper left corner. Each element on the flag carries specific symbolism: the blue represents peace, the red symbolizes the blood of the country's martyrs, the yellow denotes the nation's wealth, and the star stands for hope for a better future."
    ],
    "contexts": [
        [
            "Scientists debate whether the Amazon or the Nile is the longest river in the world. Traditionally, the Nile is considered longer, but recent information suggests that the Amazon may be longer.",
            "The Nile River was central to the Ancient Egyptians' rise to wealth and power. Since rainfall is almost non-existent in Egypt, the Nile River and its yearly floodwaters offered the people a fertile oasis for rich agriculture.",
            "The world's longest rivers are defined as the longest natural streams whose water flows within a channel, or streambed, with defined banks.",
            "The Amazon River could be considered longer if its longest tributaries are included, potentially extending its length to about 7,000 kilometers."
        ],
        [
            "The flag of the second Republic of Mobutu Sese Seko became the official banner after Mobutu established his dictatorship. This flag was used from 1966 to 1971 and consisted of the same yellow star, now made smaller, situated in the top corner of the hoist side, with a red, yellow-lined band running diagonally across the center. The red symbolized the people's blood; the yellow symbolized prosperity; the blue symbolized hope; and the star represented unity.",
            "The current flag of the Democratic Republic of Congo, which has been adopted after the approval of a new constitution in 2006, is composed of a blue sheet, red diagonal stripe and a yellow five-pointed star at the top of the left part of the flag. Blue symbolizes peace, red stands for blood of martyrs, yellow color that frames the red stripe denotes prosperity and the star represents hope for a brighter future of the country.",
            "The blue color in the flag symbolizes peace, the red should remind of the country’s martyrs, the yellow is for the country’s riches and the star represents the future."
        ]
    ]
}



dataset = Dataset.from_dict(rows)

result = evaluate(
    dataset,
    metrics=[
        reference_free_rubrics_score,
        labelled_rubrics_score
    ],
)

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Aug 12, 2024
@shahules786 shahules786 self-requested a review August 13, 2024 06:35
@shahules786
Member

Thanks for the PR @vaishakhRaveendran
I checked some of the Prometheus models and unfortunately, none of them seem to be fine-tuned for JSON output. I have brought this to their notice in an issue.

For now, let's implement this as a model-agnostic metric and allow users to run Prometheus-style evaluation.

src/ragas/metrics/_prometheus.py (three outdated review threads, resolved)
Member

@shahules786 shahules786 left a comment

Also please add documentation along with this PR

@shahules786 shahules786 changed the title from "Integrate prometheus-eval into RAGAS" to "metrics: add prometheus style rubrics based scoring" Aug 14, 2024
@shahules786 shahules786 requested review from shahules786 and jjmachan and removed request for shahules786 August 14, 2024 07:38
@shahules786 shahules786 changed the title from "metrics: add prometheus style rubrics based scoring" to "metrics: add domain specific rubrics based scoring" Aug 14, 2024
@shahules786
Member

Hey @jjmachan, on further inspection and after reading through the Prometheus paper, I understood that Prometheus needs instance-level criteria and rubrics, which Ragas cannot support as of now (because we only have a limited set of columns as input). This would be an extra input associated with each row to be evaluated.
So I am changing this metric to a domain-specific rubric, that is, one rubric per dataset.
See the image here to understand the difference.
IMO both should be supported, but the former can only be added once we decide on a new input representation.
evals image
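
To make the difference concrete, here is a small sketch of how an evaluator could pick the rubric for a row once an extra per-row input exists: use the row's own rubric if it has one, otherwise fall back to the single dataset-level rubric. Everything here is an assumption for illustration; the `rubric` column name and the `resolve_rubric` helper are hypothetical and do not describe how Ragas represents its inputs.

```python
# Hypothetical illustration; the "rubric" column and this helper are not part of Ragas.
from typing import Dict, Mapping

Rubric = Dict[int, str]


def resolve_rubric(
    row: Mapping[str, object],
    dataset_rubric: Rubric,
    column: str = "rubric",
) -> Rubric:
    """Return the row's own rubric if present, else the dataset-level one."""
    row_rubric = row.get(column)
    if isinstance(row_rubric, dict) and row_rubric:
        return row_rubric  # instance-level: criteria written for this sample
    return dataset_rubric  # dataset-level: one rubric shared by all rows


if __name__ == "__main__":
    shared = {1: "Wrong or off-topic.", 5: "Fully correct and complete."}
    rows = [
        {"question": "What's the longest river in the world?"},
        {
            "question": "What does the Democratic Republic of Congo flag represent?",
            "rubric": {1: "Names no colour meanings.", 5: "Explains every element."},
        },
    ]
    for row in rows:
        print(row["question"], "->", resolve_rubric(row, shared))
```

With only the existing columns (question, answer, contexts, ground_truth), every row takes the fallback branch, which matches the one-rubric-per-dataset behaviour described above.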

@shahules786 shahules786 requested a review from jjmachan August 14, 2024 14:08
@shahules786 shahules786 merged commit 5c1f9a2 into explodinggradients:main Aug 15, 2024
15 checks passed
shahules786 added a commit that referenced this pull request Aug 15, 2024
3 participants