return overall_score from MTBenchBranch.judge_answers() #138
@sallyom FYI on the API change.
Note: This can't merge until the CLI is changed to accept both the 2-element and the 3-element tuple return at the same time.
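One way a caller could tolerate both return shapes during the transition is sketched below. This is only an illustration, not the actual CLI change: the helper name `unpack_judgement` is hypothetical, and the tuple orderings `(qa_pairs, error_rate)` and `(overall_score, qa_pairs, error_rate)` are assumptions that should be checked against the library.

```python
from typing import Any, Optional, Tuple


def unpack_judgement(result: tuple) -> Tuple[Optional[float], Any, float]:
    """Normalize a judge_answers() result to (overall_score, qa_pairs, error_rate).

    Hypothetical helper. Assumes the old return is (qa_pairs, error_rate)
    and the new return is (overall_score, qa_pairs, error_rate).
    """
    if len(result) == 3:
        # Assumed new shape: the overall score comes first.
        overall_score, qa_pairs, error_rate = result
    elif len(result) == 2:
        # Assumed old shape: no overall score is available.
        qa_pairs, error_rate = result
        overall_score = None
    else:
        raise ValueError(f"expected 2 or 3 values, got {len(result)}")
    return overall_score, qa_pairs, error_rate
```

With a helper like this, the caller can print the overall score only when it is not `None`, so it keeps working against either version of the library.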
This commit is in preparation of a commit to the eval library [1] that returns the overall score from MT-Bench-Branch judgement.

[1] instructlab/eval#138

Signed-off-by: Ali Maredia <[email protected]>
2e8afd7 to 510478a
Looks good, just need to fix the formatting error
This allows the overall_score to be shown by callers of the library along with qa pairs and the error rate.

This commit changes what a function in the library returns and thus is not backwards compatible.

Signed-off-by: Ali Maredia <[email protected]>
510478a to 219bca1
Signed-off-by: Ali Maredia <[email protected]>
qa_pairs      Question and answer pairs (with scores) from the evaluation
error_rate    percentage of questions dropped due to errors during evaluation
If you do make another change to this commit, this is also missing for the mt_bench case above.