return overall_score from MTBenchBranch.judge_answers() #138

alimaredia · 2024-09-26T03:59:33Z

This allows the overall_score to be shown by
callers of the library along with qa pairs
and the error rate.

danmcp · 2024-09-26T04:08:07Z

@sallyom FYI on the api change

danmcp · 2024-09-26T04:08:52Z

Note: This can't merge until the CLI is changed to accept both the 2-element and the 3-element tuple return.
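A minimal sketch of how a caller could tolerate both return shapes during the transition. The helper name `unpack_judgement` is hypothetical and not part of the eval library or the CLI, and the element order (overall_score first, then qa_pairs and error_rate) is an assumption that should be checked against the merged code.

```python
from typing import Any, Optional, Sequence, Tuple


def unpack_judgement(result: Sequence[Any]) -> Tuple[Optional[float], Any, Any]:
    """Normalize a 2- or 3-element result into (overall_score, qa_pairs, error_rate).

    Hypothetical helper: assumes the older library returns
    (qa_pairs, error_rate) and the newer one returns
    (overall_score, qa_pairs, error_rate).
    """
    if len(result) == 3:
        overall_score, qa_pairs, error_rate = result
    elif len(result) == 2:
        # Older return shape: no overall score is available.
        overall_score = None
        qa_pairs, error_rate = result
    else:
        raise ValueError(f"expected 2 or 3 values, got {len(result)}")
    return overall_score, qa_pairs, error_rate
```

A caller would pass the result of the judgement call through this helper and only print the overall score when it is not None.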

This commit is in preparation of a commit to the eval library [1] that returns the overall score from MT-Bench-Branch judgement.

[1] instructlab/eval#138

Signed-off-by: Ali Maredia <[email protected]>

danmcp

Looks good, just need to fix the formatting error

This allows the overall_score to be shown by callers of the library along with qa pairs and the error rate.

This commit changes what a function in the library returns and thus is not backwards compatible.

Signed-off-by: Ali Maredia <[email protected]>

Signed-off-by: Ali Maredia <[email protected]>

danmcp · 2024-09-27T22:54:26Z

src/instructlab/eval/mt_bench.py

            qa_pairs        Question and answer pairs (with scores) from the evaluation
+            error_rate      percentage of questions dropped due to errors during evaluation


If you do make another change to this commit, the error_rate entry is also missing from the docstring for the mt_bench case above.

alimaredia requested review from danmcp and alinaryan September 26, 2024 03:59

mergify bot added the testing Relates to testing label Sep 26, 2024

alimaredia mentioned this pull request Sep 26, 2024

feat: add full scores to mt-bench/mmlu-branch output instructlab/instructlab#2316

Merged

mergify bot added the ci-failure label Sep 26, 2024

alimaredia linked an issue Sep 26, 2024 that may be closed by this pull request

Print full scores along with delta for MMLU-Branch scores instructlab/instructlab#2124

Closed

alimaredia mentioned this pull request Sep 27, 2024

fix: allow MTBenchBranchEvaluator to return a 2 or 3 sized tuple instructlab/instructlab#2330

Merged

6 tasks

alimaredia force-pushed the mtbench-branch-judgement-return-overall-score branch from 2e8afd7 to 510478a Compare September 27, 2024 17:25

mergify bot added ci-failure and removed ci-failure labels Sep 27, 2024

danmcp approved these changes Sep 27, 2024

mergify bot added the one-approval label Sep 27, 2024

alimaredia force-pushed the mtbench-branch-judgement-return-overall-score branch from 510478a to 219bca1 Compare September 27, 2024 18:04

mergify bot removed the ci-failure label Sep 27, 2024

nathan-weinberg approved these changes Sep 27, 2024

mergify bot removed the one-approval label Sep 27, 2024

alimaredia removed the request for review from alinaryan September 27, 2024 18:20

mergify bot added the ci-failure label Sep 27, 2024

update flag for basic workflow tests

b22b40b

Signed-off-by: Ali Maredia <[email protected]>

mergify bot added CI/CD Affects CI/CD configuration and removed ci-failure labels Sep 27, 2024

alinaryan approved these changes Sep 27, 2024

mergify bot added ci-failure and removed ci-failure labels Sep 27, 2024

danmcp reviewed Sep 27, 2024

danmcp merged commit 40cc370 into instructlab:main Sep 28, 2024
13 of 14 checks passed

danmcp changed the title ~~return overall_score from MTBenchBranch.generate_judgement()~~ return overall_score from MTBenchBranch.judge_answers() Sep 28, 2024
