return overall_score from MTBenchBranch.judge_answers() #138

Conversation

alimaredia
Contributor

This allows the overall_score to be shown by
callers of the library along with qa pairs
and the error rate.
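
As a rough illustration of what this enables on the caller side, here is a minimal sketch. It assumes the method returns a 3-tuple in the order `(overall_score, qa_pairs, error_rate)`. That order, the `fake_judge_answers` stub, the shape of each QA pair, and the averaging used to produce the score are all illustrative assumptions, not taken from the library.

```python
def fake_judge_answers():
    """Hypothetical stand-in for MTBenchBranch.judge_answers(); not the real implementation."""
    # Assumed shape: one dict per question with an id and a judge score.
    qa_pairs = [
        {"question_id": 1, "score": 8.0},
        {"question_id": 2, "score": 6.0},
    ]
    # Assumption for illustration only: the overall score is the mean of the pair scores.
    overall_score = sum(pair["score"] for pair in qa_pairs) / len(qa_pairs)
    error_rate = 0.0  # nothing was dropped in this toy run
    return overall_score, qa_pairs, error_rate


def format_summary(overall_score, qa_pairs, error_rate):
    """Render the overall score next to the error rate and per-question scores."""
    lines = [f"overall_score: {overall_score}", f"error_rate: {error_rate}"]
    for pair in qa_pairs:
        lines.append(f"  question {pair['question_id']}: {pair['score']}")
    return "\n".join(lines)


overall_score, qa_pairs, error_rate = fake_judge_answers()
summary = format_summary(overall_score, qa_pairs, error_rate)
print(summary)
```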

@danmcp
Member

danmcp commented Sep 26, 2024

@sallyom FYI on the API change

@danmcp
Member

danmcp commented Sep 26, 2024

Note: This can't merge until the CLI is changed to accept both the 2-element and the 3-element tuple return at the same time.
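
One way a caller could tolerate both return shapes during the transition is sketched below. The helper name `normalize_judgement` is hypothetical, and the element order (`(qa_pairs, error_rate)` for the old shape, `(overall_score, qa_pairs, error_rate)` for the new one) is an assumption based on the PR description rather than a confirmed signature.

```python
from typing import Any, List, Optional, Tuple


def normalize_judgement(result: Tuple[Any, ...]) -> Tuple[Optional[float], List[Any], float]:
    """Hypothetical helper: map either return shape to (overall_score, qa_pairs, error_rate)."""
    if len(result) == 3:
        # Assumed new shape: the overall score comes first.
        overall_score, qa_pairs, error_rate = result
        return overall_score, qa_pairs, error_rate
    if len(result) == 2:
        # Assumed old shape: no overall score is available, so report None.
        qa_pairs, error_rate = result
        return None, qa_pairs, error_rate
    raise ValueError(f"expected a 2- or 3-element tuple, got {len(result)} elements")


# Old-style and new-style results both normalize to the same 3-tuple layout.
old_style = normalize_judgement(([{"score": 7}], 0.0))
new_style = normalize_judgement((7.0, [{"score": 7}], 0.0))
```

Checking the tuple length keeps the caller working against either library version; once the older return shape is no longer in use, the 2-element branch can be dropped.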

alimaredia added a commit to alimaredia/instructlab that referenced this pull request Sep 27, 2024
This commit is in preparation of a commit to the
eval library [1] that returns the overall score
from MT-Bench-Branch judgement.

[1] instructlab/eval#138

Signed-off-by: Ali Maredia <[email protected]>
@alimaredia alimaredia force-pushed the mtbench-branch-judgement-return-overall-score branch from 2e8afd7 to 510478a Compare September 27, 2024 17:25
@mergify mergify bot added ci-failure and removed ci-failure labels Sep 27, 2024
Member

@danmcp danmcp left a comment

Looks good; the formatting error just needs to be fixed.

@mergify mergify bot added the one-approval label Sep 27, 2024
danmcp pushed a commit to alimaredia/instructlab that referenced this pull request Sep 27, 2024
This commit is in preparation of a commit to the
eval library [1] that returns the overall score
from MT-Bench-Branch judgement.

[1] instructlab/eval#138

Signed-off-by: Ali Maredia <[email protected]>
This allows the overall_score to be shown by
callers of the library along with qa pairs
and the error rate.

This commit changes what a function in the
library returns and thus is not backwards compatible.

Signed-off-by: Ali Maredia <[email protected]>
@alimaredia alimaredia force-pushed the mtbench-branch-judgement-return-overall-score branch from 510478a to 219bca1 Compare September 27, 2024 18:04
@mergify mergify bot removed the ci-failure label Sep 27, 2024
@mergify mergify bot removed the one-approval label Sep 27, 2024
@alimaredia alimaredia removed the request for review from alinaryan September 27, 2024 18:20
@mergify mergify bot added the ci-failure label Sep 27, 2024
@mergify mergify bot added CI/CD Affects CI/CD configuration and removed ci-failure labels Sep 27, 2024
@mergify mergify bot added ci-failure and removed ci-failure labels Sep 27, 2024
qa_pairs Question and answer pairs (with scores) from the evaluation
error_rate percentage of questions dropped due to errors during evaluation
Member

If you do make another change to this commit, this is also missing for the mt_bench case above

@danmcp danmcp merged commit 40cc370 into instructlab:main Sep 28, 2024
13 of 14 checks passed
@danmcp danmcp changed the title return overall_score from MTBenchBranch.generate_judgement() return overall_score from MTBenchBranch.judge_answers() Sep 28, 2024
Labels
CI/CD Affects CI/CD configuration ci-failure testing Relates to testing
Development

Successfully merging this pull request may close these issues.

Print full scores along with delta for MMLU-Branch scores
4 participants