Here we provide general instructions for running our verifier on VNN-COMP benchmarks using the latest version of our verifier.
We assume conda
is available on your system; if not, you should first install
miniconda.
First, clone and install our α,β-CROWN verifier:
# Clone verifier code.
git clone --recursive https://github.com/Verified-Intelligence/alpha-beta-CROWN.git
cd alpha-beta-CROWN
# Remove the old environment, if necessary.
conda deactivate; conda env remove --name alpha-beta-crown
# Create conda environment with fixed package versions.
conda env create -f complete_verifier/environment.yaml --name alpha-beta-crown
conda activate alpha-beta-crown
# Get python installation path. You will need to use this path in later steps.
export VNNCOMP_PYTHON_PATH=$(python -c 'import os; print(os.path.dirname(os.path.realpath("/proc/self/exe")))')
echo "Please run \"export VNNCOMP_PYTHON_PATH=${VNNCOMP_PYTHON_PATH}\" before you run vnncomp scripts."
Gurobi should be automatically installed via the command above, and you need to
activate it using the grbgetkey
command. If you work in academia you can get an academic licenses
from here. A Gurobi license is required for
eran
, mnistfc
, marabou-cifar10
and verivital
benchmarks; other
benchmarks do not depend on Gurobi.
CPLEX is needed for the oval21
benchmark. It is free for students and
academics from
here.
# Install IBM CPLEX >= 22.1.0
# Download from https://community.ibm.com/community/user/datascience/blogs/xavier-nodet1/2020/07/09/cplex-free-for-students
chmod +x cplex_studio2210.linux_x86_64.bin # Any version >= 22.1.0 should work. Change executable name here.
# You can directly run the installer: ./cplex_studio2210.linux_x86_64.bin; the response.txt created below is for non-interactive installation.
cat > response.txt <<EOF
INSTALLER_UI=silent
LICENSE_ACCEPTED=true
EOF
sudo ./cplex_studio2210.linux_x86_64.bin -f response.txt
# Build the C++ code for CPLEX interface. Assuming we are still inside the alpha-beta-CROWN folder.
sudo apt install build-essential # A modern g++ (>=8.0) is required to compile the code.
# Change CPX_PATH in complete_verifier/cuts/CPLEX_cuts/Makefile if you installed CPlex to a non-default location, like inside your home folder.
make -C complete_verifier/cuts/CPLEX_cuts/
The DNNV package is needed for converting the VGG models in VNN-COMP 2022 and VNN-COMP 2023.
pip install -U --no-deps git+https://github.com/dlshriver/DNNV.git@4d4b124bd739b4ddc8c68fed1af3f85b90386155#egg=dnnv
Clone the VNN-COMP benchmarks and scripts:
git clone https://github.com/stanleybak/vnncomp2021.git
git clone https://github.com/ChristopherBrix/vnncomp2022_benchmarks.git
# Unzip and download necessary files
(cd vnncomp2022_benchmarks; ./setup.sh)
git clone https://github.com/ChristopherBrix/vnncomp2023_benchmarks.git
# Unzip and download necessary files
(cd vnncomp2023_benchmarks; ./setup.sh)
Then, change directory to the folder for the year you want to use. For example, for VNN-COMP 2023, run:
cd vnncomp2023_benchmarks
To run with the VNN-COMP scripts, provide the correct paths to the
verifier and conda environment via setting the VNNCOMP_PYTHON_PATH
environment
variable. Here we assume the alpha-beta-crown conda environment is installed
to ${HOME}/miniconda3/envs/alpha-beta-crown
(the command in the section above
should print the right value for VNNCOMP_PYTHON_PATH
), and the verifier code
is cloned to ${HOME}/alpha-beta-CROWN
:
# Please set this environment variable properly, see above instructions
export VNNCOMP_PYTHON_PATH=${HOME}/miniconda3/envs/alpha-beta-crown/bin
# Please check the path to the alpha-beta-CROWN repository and change it accordingly
./run_all_categories.sh v1 ${HOME}/alpha-beta-CROWN/vnncomp_scripts $(pwd) results_vit.csv counterexamples_vit "vit" all # Example to run the vit benchmark
Then results csv for the vit
benchmark will be saved in
results_vit.csv
. Change the benchmark name to run other
benchmarks. Note that in the vnncomp_scripts/prepare_instance.sh
script we
kill all python processes before each run, and you may want to comment out
these killall
commands if you have other python processes running at the same
time.
We can also run multiple benchmarks at one time:
# For VNN-COMP 2021
./run_all_categories.sh v1 ${HOME}/alpha-beta-CROWN/vnncomp_scripts $(pwd) all_results.csv ./counterexamples "acasxu cifar10_resnet cifar2020 eran marabou-cifar10 mnistfc nn4sys oval21 verivital" all 2>&1 | tee stdout.log
# For VNN-COMP 2022
./run_all_categories.sh v1 ${HOME}/alpha-beta-CROWN/vnncomp_scripts $(pwd) all_results.csv ./counterexamples "oval21 tllverifybench carvana_unet_2022 cifar_biasfield collins_rul_cnn mnist_fc nn4sys reach_prob_density vggnet16_2022 rl_benchmarks sri_resnet_a sri_resnet_b cifar100_tinyimagenet_resnet acasxu cifar2020" all 2>&1 | tee stdout.log
# For VNN-COMP 2023
./run_all_categories.sh v1 ${HOME}/alpha-beta-CROWN/vnncomp_scripts $(pwd) all_results.csv ./counterexamples "acasxu cgan collins_yolo_robustness metaroom nn4sys tllverifybench vggnet16 yolo cctsdb_yolo collins_rul_cnn dist_shift ml4acopf test traffic_signs_recognition vit" all 2>&1 | tee stdout.log
Alternatively, there is a more user-friendly way to run each benchmark by
loading a specific config file, for example to run all properties of cifar100_small
model in cifar100_tinyimagenet_resnet
benchmark from VNN-COMP 2022:
cd alpha-beta-CROWN/complete_verifier
python abcrown.py --config exp_configs/vnncomp22/cifar100_small_2022.yaml
This command will run multiple properties by only calling our verifier once
to avoid import/compiling overhead and should produce same final verification results
with less running time compare to those with the ./run_all_categories.sh
scripts
(in the competitions, the overhead was measured and excluded from scoring).
All the config files are stored at exp_configs/vnncomp21
, exp_configs/vnncomp22
,
exp_configs/vnncomp23
for the three years of VNN-COMP respectively.
The original code used in the competitions can be found at:
However, it is always recommended using the latest master version to run the benchmarks, especially when compared to a newly developed verifier.