
[R-279] [R-280] Summarization Score Formula is unreasonable #1108

Closed
ofooo opened this issue Jul 18, 2024 · 7 comments · Fixed by #1201

Comments

@ofooo

ofooo commented Jul 18, 2024

I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
I read the doc for the summarization score:
https://docs.ragas.io/en/stable/concepts/metrics/summarization_score.html


QA score: higher is better; range is (0, 1).

But conciseness score: lower is better; range is (0, infinity).

So summarization score = (QA score + conciseness score) / 2 is unreasonable.

I think the "conciseness score" should be converted into a value in the range (0, 1), where larger is better.
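A quick sketch of the mismatch with hypothetical numbers (following the docs' description of the conciseness score as an unbounded, lower-is-better value; these values are illustrative, not from the Ragas code):

```python
# Hypothetical numbers illustrating the mismatch described above.
qa_score = 0.8              # in (0, 1), higher is better
conciseness_score = 3.0     # unbounded ratio, lower is better
summarization_score = (qa_score + conciseness_score) / 2
print(summarization_score)  # 1.9 -- falls outside the (0, 1) range
```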

Ragas version: stable

R-280

@ofooo ofooo added the bug Something isn't working label Jul 18, 2024
@ofooo ofooo changed the title Summarization Score Formula error Summarization Score Formula is unreasonable Jul 18, 2024

dosubot bot commented Jul 18, 2024

Hey @ofooo! I'm here to help you with your bug and answer any questions you have while we wait for a human maintainer. Let's solve this together!

To address the issue with the Summarization Score formula, you can adjust the conciseness score to fit within the 0 to 1 range and ensure that a higher score is better. The current implementation in the Ragas framework already does this. Here is the relevant code:

def _compute_conciseness_score(self, text, summary) -> float:
    """Returns the conciseness score of the summary. This is calculated as
    (1- relative_length_of_summary), where relative_length_of_summary is the
    ratio of the length of the summary to the length of the original text.
    This promotes shorter summaries.
    """
    return 1 - (len(summary) / len(text))

This method calculates the conciseness score as 1 - (len(summary) / len(text)), ensuring that the score is within the 0 to 1 range and that a higher score indicates a more concise summary [1].

Additionally, the Summarization Score is calculated as the average of the QA score and the conciseness score:

def _compute_score(self, scores) -> float:
    return sum(scores) / len(scores)

These scores are combined in the _ascore method:

async def _ascore(self, row: Dict, callbacks: Callbacks, is_async: bool) -> float:
    text: str = "\n".join(row["contexts"])
    summary: str = row["summary"]
    keyphrases = await self._extract_keyphrases(text, callbacks, is_async)
    questions = await self._get_questions(text, keyphrases, callbacks, is_async)
    answers = await self._get_answers(questions, summary, callbacks, is_async)

    scores = []
    qa_score = self._compute_qa_score(answers)
    scores.append(qa_score)
    if self.length_penalty:
        conciseness_score = self._compute_conciseness_score(text, summary)
        scores.append(conciseness_score)
    return self._compute_score(scores)

This ensures that both the QA score and the conciseness score are within the 0 to 1 range, making the Summarization Score meaningful and balanced [1].

To continue talking to Dosu, mention @dosu.

@shahules786
Member

Hey @ofooo good point. I will take care of it this week.

@shahules786 shahules786 self-assigned this Jul 22, 2024
@jjmachan jjmachan added the linear Created by Linear-GitHub Sync label Jul 31, 2024
@jjmachan jjmachan added this to the v0.1.13 milestone Jul 31, 2024
@jjmachan jjmachan changed the title Summarization Score Formula is unreasonable [R-279] Summarization Score Formula is unreasonable Jul 31, 2024
@jjmachan jjmachan changed the title [R-279] Summarization Score Formula is unreasonable [R-280] Summarization Score Formula is unreasonable Jul 31, 2024
@jjmachan jjmachan changed the title [R-280] Summarization Score Formula is unreasonable [R-279] [R-280] Summarization Score Formula is unreasonable Jul 31, 2024
@sky-2002
Contributor

Hey @ofooo, you are absolutely correct. In the code it's correct; I missed this in the docs, and I have updated the docs.
But yes, we need to handle cases where the summary length is higher than the text length; maybe we can straightaway assign a score of 0.
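A minimal sketch of that edge-case handling, assuming the `_compute_conciseness_score` method quoted above; the clamp to 0 reflects the suggestion in this comment, not necessarily the shipped fix:

```python
def _compute_conciseness_score(self, text: str, summary: str) -> float:
    """Conciseness score in [0, 1]; higher means more concise.

    If the summary is longer than the original text,
    1 - len(summary) / len(text) would go negative, so clamp it to 0.
    """
    return max(0.0, 1.0 - (len(summary) / len(text)))
```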

@shahules786
Member

Thanks for pitching in @sky-2002 and pointing out the issue @ofooo
@sky-2002 can you fix that edge case in the summarisation metric?
Another point I would like to make is that summaries can be hard to score in general; even humans struggle with it. In coming versions, we will add support for ranking-based metrics (it's easier to rank such tasks).

@sky-2002
Contributor

@shahules786 interesting points; we can discuss these in the PR I created above or on Discord, as there are more points to discuss on summarization.
@ofooo you can also join the discussion in the PR; suggestions are welcome.

@shahules786
Member

shahules786 commented Aug 16, 2024

Hey @ofooo @sky-2002, an easy and intuitive fix for this is to modify the conciseness score as

conciseness score = min(length of summary, length of context) / (length of context + 1)
thereby mapping it into the range (0, 1).

Then, since for the conciseness score lower is better, we take (1 - conciseness score) when combining it with the QA score.

So the final score will be [QA score + (1 - conciseness score)] / 2

One more suggestion: here we have assumed the QA and conciseness scores to carry equal weight, but ideally the user should be able to control it. So, adding an extra argument coeff (in the range 0-1), the score would be

score = coeff * QA score + (1 - coeff) * (1 - conciseness score)

How does that look guys?
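A minimal sketch of this proposal (standalone functions for illustration; the remapped conciseness formula and the `coeff` argument are the suggestions above, not the current Ragas implementation):

```python
def compute_conciseness_score(text: str, summary: str) -> float:
    # Proposed remapping into (0, 1); lower still means more concise.
    return min(len(summary), len(text)) / (len(text) + 1)

def compute_summarization_score(qa_score: float, conciseness_score: float,
                                coeff: float = 0.5) -> float:
    # coeff in [0, 1] weighs the QA score against the inverted conciseness
    # score; coeff = 0.5 reproduces the plain average proposed above.
    return coeff * qa_score + (1 - coeff) * (1 - conciseness_score)

# Example: QA score 0.8, summary half the length of the context.
text, summary = "x" * 1000, "x" * 500
score = compute_summarization_score(0.8, compute_conciseness_score(text, summary))
print(round(score, 3))  # ~0.65, always within (0, 1)
```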

@sky-2002
Contributor

sky-2002 commented Aug 16, 2024

Great idea @shahules786. I had proposed this weighting initially, but we didn't want too much work on the user's side. Anyway, it's good to let the user control it; fixing it right away.

@jjmachan jjmachan linked a pull request Aug 27, 2024 that will close this issue
jjmachan pushed a commit that referenced this issue Aug 27, 2024
This PR adds a fix for the issue mentioned in #1108 

However, I have a few points to discuss @shahules786:
- I had added `conciseness_score` to penalize long summaries, but I also do not want to promote very short and skimpy summaries; we need to find a middle ground.
- Is `averaging` a good way to combine `QA_score` and `conciseness_score`?
- Ranking-based metrics to measure the quality of summarization (as mentioned by shahul in the above issue)

Given the conclusions we reach on these discussion points, I will push more commits; let's keep this PR open till we resolve them.

---------

Co-authored-by: Shahules786 <[email protected]>
@jjmachan jjmachan modified the milestones: v0.1.13, v.18 Aug 27, 2024
@jjmachan jjmachan closed this as not planned (won't fix, can't repro, duplicate, stale) Feb 1, 2025