[R-279] [R-280] Summarization Score Formula is unreasonable #1108
Comments
Hey @ofooo! I'm here to help you with your bug and answer any questions you have while we wait for a human maintainer. Let's solve this together!

To address the issue with the Summarization Score formula, you can adjust the conciseness score to fit within the 0 to 1 range and ensure that a higher score is better. The current implementation in the Ragas framework already does this. Here is the relevant code:

```python
def _compute_conciseness_score(self, text, summary) -> float:
    """Returns the conciseness score of the summary. This is calculated as
    (1 - relative_length_of_summary), where relative_length_of_summary is the
    ratio of the length of the summary to the length of the original text.
    This promotes shorter summaries.
    """
    return 1 - (len(summary) / len(text))
```

This method calculates the conciseness score as `1 - (len(summary) / len(text))`, so shorter summaries score higher.

Additionally, the Summarization Score is calculated as the average of the QA score and the conciseness score:

```python
def _compute_score(self, scores) -> float:
    return sum(scores) / len(scores)
```

These scores are combined in the `_ascore` method:

```python
async def _ascore(self, row: Dict, callbacks: Callbacks, is_async: bool) -> float:
    text: str = "\n".join(row["contexts"])
    summary: str = row["summary"]
    keyphrases = await self._extract_keyphrases(text, callbacks, is_async)
    questions = await self._get_questions(text, keyphrases, callbacks, is_async)
    answers = await self._get_answers(questions, summary, callbacks, is_async)

    scores = []
    qa_score = self._compute_qa_score(answers)
    scores.append(qa_score)
    if self.length_penalty:
        conciseness_score = self._compute_conciseness_score(text, summary)
        scores.append(conciseness_score)
    return self._compute_score(scores)
```

This ensures that both the QA score and the conciseness score are within the 0 to 1 range, making the Summarization Score meaningful and balanced [1].
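To make the averaging concrete, here is a small worked example with made-up numbers (the text/summary lengths and the 0.9 QA score are hypothetical, not taken from this thread):

```python
# Hypothetical numbers, only to illustrate how the two scores are averaged.
text = "word " * 1000     # original text: 5000 characters
summary = "word " * 200   # summary: 1000 characters

conciseness_score = 1 - (len(summary) / len(text))  # 1 - 0.2 = 0.8
qa_score = 0.9  # assume the summary answers 9 of 10 keyphrase questions

summarization_score = (qa_score + conciseness_score) / 2
print(summarization_score)  # 0.85
```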
Hey @ofooo, good point. I will take care of it this week.
Hey @ofooo, you are absolutely correct. In the code it is correct; I missed this in the docs, and I have updated the docs.
Thanks for pitching in @sky-2002 and for pointing out the issue @ofooo.
@shahules786 interesting points you made. We can discuss these in the PR I created above or on Discord; there are more points to discuss on summarization.
Hey @ofooo @sky-2002, an easy and intuitive fix for this is to modify the conciseness score so that it stays within the 0 to 1 range. Then, since for the conciseness score lower is better, we take (1 - conciseness score) when combining it with the QA score, so the final score becomes a combination of the QA score and (1 - conciseness score).

One more suggestion: here we have assumed the QA score and the conciseness score to carry equal weightage, but ideally the user should be able to control it, so we can add an extra argument for the weighting coefficient.

How does that look, guys?
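A minimal sketch of what that weighted combination could look like, assuming a hypothetical `coeff` argument in [0, 1]; the actual argument name and formula adopted in Ragas may differ:

```python
# Sketch only: a hypothetical weighted combination of the two scores.
# `coeff` is an assumed argument name controlling the weight of the length penalty.
def combine_scores(qa_score: float, conciseness_score: float, coeff: float = 0.5) -> float:
    # conciseness_score is assumed to be "lower is better" in [0, 1],
    # so (1 - conciseness_score) rewards shorter summaries.
    return (1 - coeff) * qa_score + coeff * (1 - conciseness_score)

# Equal weighting: QA score 0.9, summary is 20% of the text length.
print(combine_scores(0.9, 0.2, coeff=0.5))  # 0.85
```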
Great idea @shahules786. I had proposed this weighting initially, but we didn't want too much work on the user's side. Anyway, it's good to let the user control it; fixing it right away.
This PR adds a fix for the issue mentioned in #1108. However, I have a few points to discuss @shahules786:

- I had added `conciseness_score` to penalize long summaries, but I also do not want to promote very short and skimpy summaries; we need to find a middle ground (one possible shape for such a penalty is sketched below).
- Is `averaging` a good way to combine `QA_score` and `conciseness_score`?
- Ranking-based metrics to measure the quality of summarization (as mentioned by shahul in the above issue).

Given the conclusions we reach on these discussion points, I will push more commits; let's keep this PR open till we resolve them.

---------

Co-authored-by: Shahules786 <[email protected]>
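As a hedged illustration of that middle ground (not the approach merged in Ragas, just a sketch of the idea), a length score could peak at an assumed target compression ratio so that both overly long and very skimpy summaries are penalized:

```python
# Sketch only: a hypothetical length score peaking at a target compression ratio.
# The target ratio and the linear fall-off are assumptions, not part of Ragas.
def length_score(text: str, summary: str, target_ratio: float = 0.2) -> float:
    ratio = len(summary) / max(len(text), 1)
    if ratio <= target_ratio:
        return ratio / target_ratio  # skimpy summaries score lower
    return max(0.0, 1 - (ratio - target_ratio) / (1 - target_ratio))  # long ones too

print(length_score("x" * 1000, "x" * 200))  # 1.0   (at the target ratio)
print(length_score("x" * 1000, "x" * 20))   # 0.1   (too skimpy)
print(length_score("x" * 1000, "x" * 900))  # 0.125 (barely shorter than the text)
```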
I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
I read the docs for the Summarization Score:
https://docs.ragas.io/en/stable/concepts/metrics/summarization_score.html

QA score: higher is better, and its range is (0, 1).
But the conciseness score: lower is better, and its range is (0, infinity).
So summarization score = (QA score + conciseness score) / 2 is unreasonable.

I think the conciseness score should be converted into a value whose range is (0, 1), where larger is better.
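A minimal sketch of one such conversion (a hypothetical helper, not the fix that was merged): clamp the length ratio so the score always lands in [0, 1] and higher is better.

```python
# Sketch only: a hypothetical conversion of conciseness to a [0, 1] score
# where higher is better, even if the summary is longer than the text.
def conciseness_as_unit_score(text: str, summary: str) -> float:
    ratio = len(summary) / max(len(text), 1)
    return max(0.0, 1.0 - ratio)  # clamp at 0 when the summary exceeds the text length

print(conciseness_as_unit_score("a" * 1000, "a" * 250))   # 0.75
print(conciseness_as_unit_score("a" * 1000, "a" * 1500))  # 0.0
```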
Ragas version: stable