
[R-279] [R-280] Summarization Score Formula is unreasonable #1108

Closed
ofooo opened this issue Jul 18, 2024 · 7 comments · Fixed by #1201

Comments

@ofooo

ofooo commented Jul 18, 2024

I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I read the doc for the summarization score:
https://docs.ragas.io/en/stable/concepts/metrics/summarization_score.html


QA score: higher is better; range is (0, 1).

But conciseness score: lower is better; range is (0, infinity).

So summarization score = (QA score + conciseness score) / 2 is unreasonable.

I think the "conciseness score" should be converted into a value in the range (0, 1), where larger is better.
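A quick sketch of the mismatch with hypothetical numbers (following the docs' description of the conciseness score as an unbounded, lower-is-better value; these values are illustrative, not from the Ragas code):

```python
# Hypothetical numbers illustrating the mismatch described above.
qa_score = 0.8              # in (0, 1), higher is better
conciseness_score = 3.0     # unbounded ratio, lower is better
summarization_score = (qa_score + conciseness_score) / 2
print(summarization_score)  # 1.9 -- falls outside the (0, 1) range
```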

Ragas version: stable

R-280

@ofooo ofooo added the bug Something isn't working label Jul 18, 2024
@ofooo ofooo changed the title Summarization Score Formula error Summarization Score Formula is unreasonable Jul 18, 2024

dosubot bot commented Jul 18, 2024

Hey @ofooo! I'm here to help you with your bug and answer any questions you have while we wait for a human maintainer. Let's solve this together!

To address the issue with the Summarization Score formula, you can adjust the conciseness score to fit within the 0 to 1 range and ensure that a higher score is better. The current implementation in the Ragas framework already does this. Here is the relevant code:

def _compute_conciseness_score(self, text, summary) -> float:
    """Returns the conciseness score of the summary. This is calculated as
    (1- relative_length_of_summary), where relative_length_of_summary is the
    ratio of the length of the summary to the length of the original text.
    This promotes shorter summaries.
    """
    return 1 - (len(summary) / len(text))

This method calculates the conciseness score as 1 - (len(summary) / len(text)), ensuring that the score is within the 0 to 1 range and that a higher score indicates a more concise summary [1].

Additionally, the Summarization Score is calculated as the average of the QA score and the conciseness score:

def _compute_score(self, scores) -> float:
    return sum(scores) / len(scores)

These scores are combined in the _ascore method:

async def _ascore(self, row: Dict, callbacks: Callbacks, is_async: bool) -> float:
    text: str = "\n".join(row["contexts"])
    summary: str = row["summary"]
    keyphrases = await self._extract_keyphrases(text, callbacks, is_async)
    questions = await self._get_questions(text, keyphrases, callbacks, is_async)
    answers = await self._get_answers(questions, summary, callbacks, is_async)

    scores = []
    qa_score = self._compute_qa_score(answers)
    scores.append(qa_score)
    if self.length_penalty:
        conciseness_score = self._compute_conciseness_score(text, summary)
        scores.append(conciseness_score)
    return self._compute_score(scores)

This ensures that both the QA score and the conciseness score are within the 0 to 1 range, making the Summarization Score meaningful and balanced [1].

To continue talking to Dosu, mention @dosu.

@shahules786
Member

Hey @ofooo good point. I will take care of it this week.

@shahules786 shahules786 self-assigned this Jul 22, 2024
@jjmachan jjmachan added the linear Created by Linear-GitHub Sync label Jul 31, 2024
@jjmachan jjmachan added this to the v0.1.13 milestone Jul 31, 2024
@jjmachan jjmachan changed the title Summarization Score Formula is unreasonable [R-279] Summarization Score Formula is unreasonable Jul 31, 2024
@jjmachan jjmachan changed the title [R-279] Summarization Score Formula is unreasonable [R-280] Summarization Score Formula is unreasonable Jul 31, 2024
@jjmachan jjmachan changed the title [R-280] Summarization Score Formula is unreasonable [R-279] [R-280] Summarization Score Formula is unreasonable Jul 31, 2024
@sky-2002
Contributor

Hey @ofooo, you are absolutely correct. In the code it's correct; I missed this in the docs, and I have updated the docs.
But yes, we need to handle cases where the summary length is higher than the text length; maybe we can straightaway assign a score of 0.
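A minimal sketch of that edge-case handling, assuming the `_compute_conciseness_score` method quoted above; the clamp to 0 reflects the suggestion in this comment, not necessarily the shipped fix:

```python
def _compute_conciseness_score(self, text: str, summary: str) -> float:
    """Conciseness score in [0, 1]; higher means more concise.

    If the summary is longer than the original text,
    1 - len(summary) / len(text) would go negative, so clamp it to 0.
    """
    return max(0.0, 1.0 - (len(summary) / len(text)))
```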

@shahules786
Member

Thanks for pitching in @sky-2002 and pointing out the issue @ofooo
@sky-2002 can you fix that edge case in the summarisation metric?
Another point I would like to make is that summaries can be hard to score in general; even humans struggle with it. In coming versions, we will add support for ranking-based metrics (it's easier to rank such tasks).

@sky-2002
Contributor

@shahules786 interesting points; we can discuss these in the PR I created above or on Discord, as there are more points to discuss on summarization.
@ofooo you can also join the discussion in the PR; suggestions are welcome.

@shahules786
Member

shahules786 commented Aug 16, 2024

Hey @ofooo @sky-2002, an easy and intuitive fix for this is to modify the conciseness score as

conciseness score = min(length of summary, length of context) / (length of context + 1)
thereby mapping it into the range (0, 1).

Then, since for the conciseness score lower is better, we take (1 - conciseness score) when combining it with the QA score.

So the final score will be [QA score + (1 - conciseness score)] / 2

One more suggestion: here we have assumed the QA and conciseness scores to carry equal weight, but ideally the user should be able to control it. So, adding an extra argument coeff (in the range 0-1), the score would be

score = coeff * QA score + (1 - coeff) * (1 - conciseness score)

How does that look guys?
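A minimal sketch of this proposal (standalone functions for illustration; the remapped conciseness formula and the `coeff` argument are the suggestions above, not the current Ragas implementation):

```python
def compute_conciseness_score(text: str, summary: str) -> float:
    # Proposed remapping into (0, 1); lower still means more concise.
    return min(len(summary), len(text)) / (len(text) + 1)

def compute_summarization_score(qa_score: float, conciseness_score: float,
                                coeff: float = 0.5) -> float:
    # coeff in [0, 1] weighs the QA score against the inverted conciseness
    # score; coeff = 0.5 reproduces the plain average proposed above.
    return coeff * qa_score + (1 - coeff) * (1 - conciseness_score)

# Example: QA score 0.8, summary half the length of the context.
text, summary = "x" * 1000, "x" * 500
score = compute_summarization_score(0.8, compute_conciseness_score(text, summary))
print(round(score, 3))  # ~0.65, always within (0, 1)
```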

@sky-2002
Contributor

sky-2002 commented Aug 16, 2024

Great idea @shahules786. I had proposed this weighting initially, but we didn't want too much work on the user's side. Anyway, it's good to let the user control it; fixing it right away.

@jjmachan jjmachan linked a pull request Aug 27, 2024 that will close this issue
jjmachan pushed a commit that referenced this issue Aug 27, 2024
This PR adds a fix for the issue mentioned in #1108 

However, I have a few points to discuss @shahules786:
- I had added `conciseness_score` to penalize long summaries, but I also do not want to promote very short and skimpy summaries; we need to find a middle ground.
- Is `averaging` a good way to combine `QA_score` and `conciseness_score`?
- Ranking-based metrics to measure the quality of summarization (as mentioned by shahul in the above issue)

Given the conclusions we reach on these discussion points, I will push more commits; let's keep this PR open till we resolve them.

---------

Co-authored-by: Shahules786 <[email protected]>
@jjmachan jjmachan modified the milestones: v0.1.13, v.18 Aug 27, 2024
@jjmachan jjmachan closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 1, 2025