AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement
AlphaVerus is a novel framework that bootstraps formally verified code generation by iteratively translating programs from a higher-resource language and leveraging feedback from a verifier. The system operates in three main phases:
- Exploration of Candidate Translations: Generates initial code translations
- Treefinement: A novel tree search algorithm for program refinement using verifier feedback
- Filtering: Prevents reward hacking by filtering misaligned specifications and programs
The framework teaches language models to generate verified code without human intervention or model finetuning.
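At a glance, the workflow maps onto the commands described in the rest of this README (a minimal sketch; arguments and file names are covered in detail below):

```
# Training: bootstrap the model via self-improving translation
cd training && python controller.py

# Inference step 1 -- Exploration: generate candidate programs
cd ../inference && python solve.py

# Inference step 2 -- Treefinement: refine failed candidates using verifier feedback
python rebase.py 0 <file_from_solve.py> <iteration_number>
```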
- Clone the repository:
```
git clone https://github.com/cmu-l3/alphaverus.git
cd alphaverus
```
- Install dependencies:
```
pip install -r requirements.txt
```
- Set up environment variables:
Create a `.env` file in both the `training/` and `inference/` directories with:
```
OPENAI_API_KEY=your_api_key_here
VERUS_PATH=path_to_verus_binary
```
Make sure your model is hosted on localhost at port 30000. We recommend hosting your model with SGLang's OpenAI-compatible server.
You can also use existing hosted models by setting the URL in the `run_llm_api.py` files in the `training/` and `inference/` folders. No other changes are required.
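For example, with SGLang installed, a model can be hosted on port 30000 as follows (the model path is a placeholder; substitute your own):

```
python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3-8B-Instruct --port 30000
```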
The training phase bootstraps the model with verified code translations:
- [Optional] Configure training parameters in `training/config.yaml`
- Run the training controller:
```
cd training
python controller.py
```
You can run multiple instances of `controller.py` in parallel to speed up the training process (see the example after the list below).
This will:
- Manage training iterations and experiment state
- Generate and verify code translations
- Apply Treefinement for error correction
- Filter misaligned specifications, preventing reward hacking attempts
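As a simple illustration (not a repository script), several controller instances can be launched in parallel from a shell; pick the instance count to match your hardware and API rate limits:

```
cd training
# Launch 4 controller instances in the background, then wait for all of them
for i in 1 2 3 4; do
  python controller.py &
done
wait
```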
Generating new formally verified code at inference time involves two steps:
- Exploration: handled by `solve.py`
- Treefinement: handled by `rebase.py`
Run the exploration step:
```
cd inference
python solve.py [arguments]
```
Available arguments for `solve.py`:
- `--SAVE_DIR`: Directory to save generated programs (default: `solved_programs_hev`)
- `--PROMPT_RANDOM_EXAMPLES`: Path to examples file for prompting (default: `../training/iter5.json`)
- `--BASE_PATH`: Base path for saved exemplars (default: `.../training/`)
- `--PROGRAMS_FILE`: Input programs file (default: `../datasets/hev_4nov.jsonl`)
- `--model`: Model to use (default: `default`)
- `--temperature`: Sampling temperature (default: 0.7)
- `--batch_size`: Batch size for processing (default: 32)
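For example, an explicit invocation that spells out the common options (all flags as documented above; the values shown are just the defaults):

```
python solve.py --SAVE_DIR solved_programs_hev --PROGRAMS_FILE ../datasets/hev_4nov.jsonl --model default --temperature 0.7 --batch_size 32
```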
Once the exploration step is complete, you can run the Treefinement step to refine the incorrect solutions:
```
python rebase.py [mode] [file_name] [iteration_number]
```
Arguments for `rebase.py`:
- `mode`: Integer selecting the Treefinement variant; 0 specifies REBASE (default: 0)
- `file_name`: Path to the input file to apply Treefinement to, i.e., the file generated by `solve.py` (required)
- `iteration_number`: Iteration number for the training phase (required)
Example:
```
python rebase.py 0 "solved_programs_hev/program_0.rs" 5
```
```
.
├── training/                  # Training phase implementation
│   ├── controller.py          # Main orchestration of the training pipeline; manages parallelization, iterations, and experiment state
│   ├── convert_single.py      # Handles individual program translations
│   ├── rebase_error_fix.py    # Implements the Treefinement algorithm for error correction
│   ├── verify_translations.py # Verification and filtering of generated translations
│   ├── run_llm_api.py         # Interface for LLM API interactions
│   ├── verus_utils.py         # Utilities for Verus verification
│   ├── verus_error_utils.py   # Error parsing and handling for Verus
│   ├── exploit_model.py       # Exploit Model for detecting specification exploits (reward hacking)
│   ├── exploit_prompt.py      # Prompt template for the Exploit Model
│   ├── utils.py               # General utility functions
│   ├── check_lock_files.py    # Manages the experiment locking mechanism in case of failures
│   └── config.yaml            # Training configuration
│
├── inference/                 # Inference phase implementation
│   ├── solve.py               # Main inference pipeline for generating verified code
│   ├── rebase.py              # Tree search implementation for refining generated code
│   ├── run_llm_api.py         # LLM API interface for inference
│   ├── verus_utils.py         # Verus verification utilities
│   ├── verus_error_utils.py   # Error handling for Verus verification
│   └── extras/                # Additional utilities
│       ├── exploit_autoverus.py
│       ├── openai_contam_test.py
│       └── critic_model.py
├── datasets/
│   ├── hev.jsonl
│   └── mbpp.jsonl
```
If you use AlphaVerus in your research, please cite:
```
@misc{aggarwal2024alphaverusbootstrappingformallyverified,
      title={AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement},
      author={Pranjal Aggarwal and Bryan Parno and Sean Welleck},
      year={2024},
      eprint={2412.06176},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2412.06176},
}
```