Commit

paul-gauthier committed Dec 23, 2024
1 parent fbc3f0c commit 87a9643
Showing 2 changed files with 4 additions and 4 deletions.
2 changes: 1 addition & 1 deletion aider/website/_data/polyglot_leaderboard.yml
@@ -78,7 +78,7 @@

 - dirname: 2024-12-21-19-23-03--polyglot-o1-hard-diff
   test_cases: 224
-  model: o1-2024-12-17
+  model: o1-2024-12-17 (high)
   edit_format: diff
   commit_hash: a755079-dirty
   pass_rate_1: 23.7
6 changes: 3 additions & 3 deletions benchmark/README.md
@@ -2,18 +2,18 @@
 # Aider benchmark harness

 Aider uses benchmarks to quantitatively measure how well it works
-various LLMs.
+with various LLMs.
 This directory holds the harness and tools needed to run the benchmarking suite.

 ## Background

 The benchmark is based on the [Exercism](https://github.com/exercism/python) coding exercises.
 This
-benchmark evaluates how effectively aider and GPT can translate a
+benchmark evaluates how effectively aider and LLMs can translate a
 natural language coding request into executable code saved into
 files that pass unit tests.
 It provides an end-to-end evaluation of not just
-GPT's coding ability, but also its capacity to *edit existing code*
+the LLM's coding ability, but also its capacity to *edit existing code*
 and *format those code edits* so that aider can save the
 edits to the local source files.

