This repository is used to evaluate a model's competition-level code generation abilities on CodeForces with human-comparable Elo ratings and percentiles among humans, using the method proposed in *CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings*.
> [!IMPORTANT]
> We have open-sourced all of the Elo calculation logic and ranking methods. The `BASE_URL` provided here points to our automated submission system. To prevent meaningless mass submissions and to comply with CodeForces policies, we require verified submissions. Due to ethical considerations, you need to agree to the AGREEMENT to obtain a `TOKEN` and `BASE_URL` to use this repository. Please fill in the blanks and email the letter to [email protected], and we will review it and respond as soon as possible. If you prefer not to use our automated system, you are free to implement your own submission mechanism by configuring the interfaces in `api.py` (see the sketch below).
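If you go the self-hosted route, the submission backend essentially needs two capabilities: submitting a solution for judging and polling its verdict. Below is a minimal sketch of what such a backend might look like; the function names, parameters, and endpoint paths are illustrative assumptions, not the actual interfaces declared in `api.py`, so adapt them to what `api.py` expects.

```python
# Hypothetical sketch of a self-hosted submission backend.
# The function names, parameters, and endpoint paths below are assumptions
# for illustration only; adapt them to the interfaces defined in api.py.
import os
import time

import requests

BASE_URL = os.environ["BASE_URL"]   # your own submission service
TOKEN = os.environ["TOKEN"]         # credential for that service


def submit_solution(contest_id: int, problem_index: str, source_code: str,
                    language: str = "GNU G++17") -> str:
    """Send one solution for judging and return a submission id."""
    resp = requests.post(
        f"{BASE_URL}/submit",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "contest_id": contest_id,
            "problem_index": problem_index,
            "language": language,
            "source_code": source_code,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["submission_id"]


def wait_for_verdict(submission_id: str, poll_seconds: int = 10) -> str:
    """Poll until the judge reports a final verdict (e.g. OK, WRONG_ANSWER)."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/verdict/{submission_id}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=30,
        )
        resp.raise_for_status()
        verdict = resp.json().get("verdict")
        if verdict and verdict != "TESTING":
            return verdict
        time.sleep(poll_seconds)
```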
- Send a request via email to obtain your access `TOKEN`, then set the `TOKEN` and `BASE_URL` variables in your environment:

  ```bash
  export TOKEN="your_actual_token"  # replace with your actual token
  export BASE_URL="your_base_url"   # replace with your base url
  ```
- To test a local model, you first need to host an LLM server. Here's an example:

  ```bash
  vllm serve Qwen/Qwen2.5-Coder-7B-Instruct
  ```
  If you're testing models via a third-party API, you can modify the `get_response` function in `llm_client` with your custom calling method (see the sketch after this list).
- To test the model, use the following command:

  ```bash
  python main.py --model Qwen/Qwen2.5-Coder-7B-Instruct \
      --bid 2000 --eid 2030
  ```

  This command will test all eligible contests with IDs ranging from 2000 to 2030.
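For the third-party API case mentioned above, here is a minimal sketch of what a custom `get_response` might look like, assuming it receives the prompt as a string and returns the model's reply as a string. The client targets an OpenAI-compatible endpoint (which also matches the server started by `vllm serve`); everything besides the name `get_response` is an illustrative assumption, so check the actual signature in `llm_client` before adapting it.

```python
# Illustrative replacement for get_response in llm_client.
# Assumption: get_response takes a prompt string and returns the reply string;
# verify the actual signature in llm_client before adapting this sketch.
import os

from openai import OpenAI

# Works with any OpenAI-compatible endpoint, including a local vLLM server
# (vLLM exposes an OpenAI-compatible API, typically at http://localhost:8000/v1).
client = OpenAI(
    base_url=os.environ.get("LLM_API_BASE", "http://localhost:8000/v1"),
    api_key=os.environ.get("LLM_API_KEY", "EMPTY"),
)


def get_response(prompt: str, model: str = "Qwen/Qwen2.5-Coder-7B-Instruct") -> str:
    """Send one prompt to the chat completions endpoint and return the reply text."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    return completion.choices[0].message.content
```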
```bibtex
@article{quan2025codeelo,
  title={CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings},
  author={Quan, Shanghaoran and Yang, Jiaxi and Yu, Bowen and Zheng, Bo and Liu, Dayiheng and Yang, An and Ren, Xuancheng and Gao, Bofei and Miao, Yibo and Feng, Yunlong and others},
  journal={arXiv preprint arXiv:2501.01257},
  year={2025}
}
```