CodeElo

This repository evaluates a model's competition-level code generation abilities on CodeForces, producing human-comparable Elo ratings and percentiles among human participants, using the method proposed in CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings.
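
For intuition, the sketch below shows the standard Elo expected-score and update formulas. This is only an illustration of the general idea; the repository's actual rating calculation follows the Codeforces-style method described in the paper, and the function names here are ours.

    def expected_score(rating_a: float, rating_b: float) -> float:
        """Standard Elo expected score of player A against player B."""
        return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

    def update_rating(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
        """Standard Elo update: move the rating toward the observed result."""
        return rating + k * (actual - expected)

    # Example: a participant rated 1500 beats an opponent rated 1600.
    e = expected_score(1500, 1600)            # ~0.36
    new_rating = update_rating(1500, e, 1.0)  # rating rises by ~20 points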

Important

We have open-sourced all of the Elo calculation logic and ranking methods. The BASE_URL provided here points to our automated submission system. To prevent meaningless mass submissions and to comply with CodeForces policies, we require verified submissions. For ethical reasons, you need to agree to the AGREEMENT to obtain the TOKEN and BASE_URL needed to use this repository. Please fill in the blanks in the AGREEMENT and email it to [email protected]; we will review it and respond as soon as possible. If you prefer not to use our automated system, you are free to implement your own submission mechanism by configuring the interfaces in api.py.
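
If you implement your own submission mechanism, api.py is the place to plug it in. The sketch below only illustrates the general shape such an interface could take; the endpoint paths, payload fields, and function names are assumptions for illustration, not the repository's actual API.

    import os

    import requests  # assumed HTTP client; the real api.py may use something else

    BASE_URL = os.environ["BASE_URL"]
    TOKEN = os.environ["TOKEN"]

    def submit_solution(contest_id: int, problem_index: str, source_code: str,
                        language: str = "cpp") -> str:
        """Hypothetical: send one solution to the submission service, return its id."""
        resp = requests.post(
            f"{BASE_URL}/submit",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "contest_id": contest_id,
                "problem_index": problem_index,
                "source_code": source_code,
                "language": language,
            },
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["submission_id"]

    def get_verdict(submission_id: str) -> str:
        """Hypothetical: poll the service for the judging verdict of a submission."""
        resp = requests.get(
            f"{BASE_URL}/verdict/{submission_id}",
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["verdict"]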

Quick Start

  1. Send a request via email to obtain your access TOKEN and BASE_URL, then set them as environment variables:

    export TOKEN="your_actual_token" # replace with your actual token
    export BASE_URL="your_base_url" # replace with base url
  2. To test a local model, you first need to host an LLM server. Here's an example:

    vllm serve Qwen/Qwen2.5-Coder-7B-Instruct

    If you're testing models via a third-party API, you can modify the get_response function in llm_client with your own calling method (see the sketch after this list).

  3. To test the model, use the following command:

    python main.py --model Qwen/Qwen2.5-Coder-7B-Instruct \
        --bid 2000 --eid 2030

    This command will test all eligible contests with IDs ranging from 2000 to 2030.
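
As mentioned in step 2, if you evaluate a model behind a third-party API, get_response in llm_client can be replaced with your own calling code. The sketch below assumes an OpenAI-compatible endpoint; the base URL, key, and function signature are placeholders and may differ from the repository's actual interface.

    # llm_client.py -- sketch of a custom get_response for an OpenAI-compatible API
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.example.com/v1",  # hypothetical third-party endpoint
        api_key="your_api_key",                 # replace with your actual key
    )

    def get_response(prompt: str, model: str = "Qwen/Qwen2.5-Coder-7B-Instruct") -> str:
        """Send one prompt to the API and return the generated text."""
        completion = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.2,
        )
        return completion.choices[0].message.content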

Citation

@article{quan2025codeelo,
  title={CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings},
  author={Quan, Shanghaoran and Yang, Jiaxi and Yu, Bowen and Zheng, Bo and Liu, Dayiheng and Yang, An and Ren, Xuancheng and Gao, Bofei and Miao, Yibo and Feng, Yunlong and others},
  journal={arXiv preprint arXiv:2501.01257},
  year={2025}
}
