Commit

readme
sedrick-keh-tri committed Jan 23, 2025
1 parent d19763a commit 2c5291e
Showing 2 changed files with 9 additions and 1 deletion.
2 changes: 2 additions & 0 deletions README.md
@@ -53,8 +53,10 @@ huggingface-cli login
- **IFEval**: [Instruction following capability evaluation](https://github.com/google-research/google-research/tree/master/instruction_following_eval)
- **AlpacaEval**: [Instruction following evaluation](https://github.com/tatsu-lab/alpaca_eval)
- **HumanEval**: [Code generation and problem solving](https://github.com/openai/human-eval)
- **HumanEvalPlus**: [HumanEval with more test cases](https://github.com/evalplus/evalplus)
- **ZeroEval**: [Logical reasoning and problem solving](https://github.com/WildEval/ZeroEval)
- **MBPP**: [Python programming benchmark](https://github.com/google-research/google-research/tree/master/mbpp)
- **MBPPPlus**: [MBPP with more test cases](https://github.com/evalplus/evalplus)
- **BigCodeBench:** [Benchmarking Code Generation with Diverse Function Calls and Complex Instructions](https://arxiv.org/abs/2406.15877)

> **🚨 Warning:** for BigCodeBench evaluation, we strongly recommend using a Docker container since the execution of LLM generated code on a machine can lead to destructive outcomes. More info is [here](eval/chat_benchmarks/BigCodeBench/README.md).
8 changes: 7 additions & 1 deletion reproduced_benchmarks.md
@@ -68,4 +68,10 @@
| | | meta-llama/Meta-Llama-3.1-8B-Instruct | instruct (pass@1) | 30.7 | 32.8 | |
| | | | complete (pass@1) | 41.9 | 40.5 | |
| | | Qwen/Qwen2.5-7B-Instruct | instruct (pass@1) | 35.2 | 37.6 | |
| | | | complete (pass@1) | 46.7 | 46.1 | |
| | | | complete (pass@1) | 46.7 | 46.1 | |
|HumanEvalPlus| Sedrick | mistralai/Mistral-7B-Instruct-v0.2 | accuracy (pass@1) | 27.44 | 36.0 | [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) |
| | | meta-llama/Llama-3.1-8B-Instruct | accuracy (pass@1) | 62.2 | 62.8 | [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) |
| | | google/codegemma-7b-it | accuracy (pass@1) | 36.6 | 51.8 | [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) |
| MBPPPlus | Sedrick | mistralai/Mistral-7B-Instruct-v0.2 | accuracy (pass@1) | 43.9 | 37.0 | [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) |
| | | meta-llama/Llama-3.1-8B-Instruct | accuracy (pass@1) | 58.7 | 55.6 | [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) |
| | | google/codegemma-7b-it | accuracy (pass@1) | 56.6 | 56.9 | [EvalPlus Leaderboard](https://evalplus.github.io/leaderboard.html) |
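
The `pass@1` metric in the rows above can be illustrated with the commonly used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated per problem and c is the number that pass all tests. The sketch below is only an illustration under that assumption; it is not confirmed to be how this repository or EvalPlus computes the reported numbers, and the function names `pass_at_k` and `mean_pass_at_k` are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: number of those samples that pass every test
    k: number of attempts the metric allows
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # If fewer than k samples fail, any draw of k samples contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical data: one sample per problem, two of four problems solved.
    demo = [(1, 1), (1, 0), (1, 1), (1, 0)]
    print(f"pass@1 = {100 * mean_pass_at_k(demo, k=1):.1f}")
```

With a single sample per problem (n = 1), pass@1 reduces to the fraction of problems solved, which is why it can be reported as an accuracy.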
