MagicDrive video generation. We release this version mainly for reference. Please be prepared to solve any issue. Before getting start, it is necessary for users to setup and understand the code in main
branch.
The environment should be compatible with MagicDrive (single frame). However, this codebase rely on another version of bevfusion (in
third_party
) and some video related python packages.
The code is tested with Pytorch==1.10.2
and torchvision==0.11.3
.
You should have these packages before starting. To install additional packages, follow:
cd ${ROOT}
pip install -r requirements.txt
We opt to install the source code for the following packages, with cd ${FOLDER}; pip install -e .
# install third-party
third_party/
├── bevfusion -> based on db75150
├── diffusers -> based on v0.17.1 (afcca3916)
└── xformers -> (optional) we minorly change 0.0.19 to install with pytorch1.10.2
If you need our xformers, please find it here. Please read FAQ if you encounter any issues.
Our training are based on stable-diffusion-v1-5
We assume you put them at ${ROOT}/../pretrained/
as follows:
{ROOT}/../pretrained/stable-diffusion-v1-5/
├── README.md
├── feature_extractor
├── model_index.json
├── safety_checker
├── scheduler
├── text_encoder
├── tokenizer
├── unet
├── v1-5-pruned-emaonly.ckpt
├── v1-5-pruned.ckpt
├── v1-inference.yaml
└── vae
Pretrained weight of MagicDrive (image generation)
{ROOT}/../MagicDrive-pretrained/
└── SDv1.5mv-rawbox_2023-09-07_18-39_224x400
Please prepare the nuScenes dataset as bevfusion's instructions. Note:
- Run with our forked version of mmdet3d.
- It is better to run generation ONE-BY-ONE to avoid overwrite.
- You have to move
nuscenes_dbinfos_train.pkl
andnuscenes_gt_database
manual from nuscenes root toann_file
folder likenuscenes_mmdet3d
.
After preparation, you should have
${ROOT}/../data/
├── nuscenes
│ ├── ...
│ └── sweeps
└── nuscenes_mmdet3d
Generation ann_file
for video frames (with keyframes / sweeps). We use them to train 7~16-frame video model.
# create `nuscenes_mmdet3d-t-keyframes`
python tools/create_data.py nuscenes \
--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-keyframes/ \
--extra-tag nuscenes --only_info
# create `nuscenes_mmdet3d-t-use-break`
USE_BREAK=True \
python tools/create_data.py nuscenes \
--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-use-break/ \
--extra-tag nuscenes --only_info --with_cam_sweeps
The data structure should looks like:
${ROOT}/../data/
├── ...
├── nuscenes_mmdet3d-t-use-break
│ ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
│ ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database/
│ ├── nuscenes_infos_train_t6.pkl
│ └── nuscenes_infos_val_t6.pkl
└── nuscenes_mmdet3d-t-keyframes
├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database
├── nuscenes_infos_train.pkl
└── nuscenes_infos_val.pkl
Generation annotations for sweep frames and ann_file
for MagicDrive. We will use them to train 16-frame video models, and video generation for all 13~16 frame models.
- Please follow ASAP to generate
interp
annotations for nuScenes. Simply, the following command should do the work:# in ASAP root. bash scripts/ann_generator.sh 12 --ann_strategy 'interp'
- (Optional) Generate
advanced
annotations for sweeps. (We do not observe major difference betweeninterp
andadvanced
. This step can be skipped.) - Use commands in
scripts/prepare_dataset.sh
to generateann_file
and cache.
You should have
${ROOT}/../data/
├── ...
├── nuscenes
│ ├── advanced_12Hz_trainval
│ ├── interp_12Hz_trainval
│ ├── nuscenes_advanced_12Hz_gt_database
│ └── nuscenes_interp_12Hz_gt_database
└── nuscenes_mmdet3d-12Hz
├── nuscenes_advanced_12Hz_dbinfos_train.pkl
├── nuscenes_advanced_12Hz_infos_train.pkl
├── nuscenes_advanced_12Hz_infos_val.pkl
├── nuscenes_interp_12Hz_dbinfos_train.pkl
├── nuscenes_interp_12Hz_infos_train.pkl
└── nuscenes_interp_12Hz_infos_val.pkl
(Optional but recommended) To accelerate data loading, we prepared cache files in h5 format for BEV maps. They can be generated through tools/prepare_map_aux.py
with config in configs/exp/map_cache_gen.yaml
. You have to rename the cache files correctly after generating them.
${ROOT}/../data/
├── ...
├── nuscenes_map_aux # single frame cache, keyframes also use this.
│ ├── train_26x200x200_map_aux_full.h5
│ ├── train_26x400x400_map_aux_full.h5
│ ├── val_26x200x200_map_aux_full.h5
│ └── val_26x400x400_map_aux_full.h5
├── nuscenes_map_aux_12Hz_adv # from advanced
│ ├── train_26x200x200_12Hz_advanced.h5
│ └── val_26x200x200_12Hz_advanced.h5
├── nuscenes_map_aux_12Hz_int # from interp
│ ├── train_26x200x200_12Hz_interp.h5
│ └── val_26x200x200_12Hz_interp.h5
└── nuscenes_map_cache_t-use-break # with sweep, use break
├── train_8x200x200_map_use-break.h5
└── val_8x200x200_map_use-break.h5
Run training for 224x400 with 7 frames.
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.3
Run training for 224x400 with 16 frames.
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.4
Run training for 224x400 with 16 frames with sweeps and generated annotations.
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3
# or
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.4
Typically, train ~80000 steps would be enough.
Our default log directory is ${ROOT}/magicdrive-t-log
. Please be prepared.
Run video generation with 12Hz annotations.
python tools/test.py resume_from_checkpoint=${RUN_LOG_DIR} task_id=${ANY} \
runner.validation_times=4 runner.pipeline_param.init_noise=rand_all \
++dataset.data.val.ann_file=${ROOT}/../data/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_val.pkl
@inproceedings{gao2023magicdrive,
title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},
author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},
booktitle = {International Conference on Learning Representations},
year={2024}
}
We adopt following open-sourced projects: