Generating human motions from text descriptions, analogous to text-to-image models that generate new images from text prompts.
conda create -n text2motion python=3.9
conda activate text2motion
# Clone repository recursively
git clone https://github.com/Developer-Zer0/MoDDM-Text-to-Motion-Synthesis-Using-Discrete-Diffusion.git --recurse-submodules
# Install PyTorch 1.10.0 (**CUDA 11.1**)
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
# Install required packages
pip install -r requirements.txt
# Install DetUtil
cd DetUtil
python setup.py develop
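Optionally, a quick sanity check that PyTorch and CUDA were picked up correctly (a minimal sketch, not part of the original setup):

```bash
# Print the installed torch version and whether CUDA is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```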
API to run single-sample inference using the trained model on HumanML3D. Edit `sample_description.txt` to any text description of your choice. Inference does not require a GPU and runs entirely on CPU within about 15 seconds. The first run can take additional time to load CLIP.

- You need to set up FFMPEG for .mp4 generation. Follow the instructions at LINK. After installation, add the path to ffmpeg.exe (inside the bin folder) in `.env` (rename `.env.example`).
- Download the autoencoder checkpoint and the discrete diffusion checkpoint. Store them under `checkpoints/` (create it if it doesn't exist).
- You will also need to download SMPL_DATA and Deps for the human skeleton transformations and animations. Extract them and store them under `data/` (create it if it doesn't exist): `data/Deps`, `data/SMPL_DATA`.
- Run the following script and your human motion .mp4 will be stored in `generations/` (see the setup sketch after the command below).
python sample_generation.py
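A minimal end-to-end sketch of the steps above, assuming the checkpoints and the SMPL_DATA/Deps archives have already been downloaded (the prompt text is just an example):

```bash
# Create the expected folders if they don't exist yet
mkdir -p checkpoints data generations
# Place the downloaded checkpoints under checkpoints/ and the extracted
# SMPL_DATA and Deps folders under data/SMPL_DATA and data/Deps.

# Write a text description of your choice, then generate
echo "a person walks forward and waves with the right hand" > sample_description.txt
python sample_generation.py   # the resulting .mp4 is written to generations/
```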
To get both the HumanML3D and KIT-ML datasets, follow the instructions at https://github.com/EricGuo5513/HumanML3D. Once downloaded, store them under `data/` (create it if it doesn't exist). For training and evaluation, you will also need SMPL_DATA and Deps from Step 3 of single-sample inference.
HumanML3D is the default dataset in all experiments. To use the KIT-ML dataset, add `datamodule=guo-kit-ml.yaml` as a parameter in the command scripts, as in the sketch below.
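For example, a stage-1 training run on KIT-ML could look like this (the base command is the one from the training section below, with only the datamodule override added):

```bash
# Sketch: train on KIT-ML instead of the default HumanML3D
python src/train.py --config-name=train model=vq_vae.yaml model.do_evaluation=false \
    trainer.devices=[1] trainer.max_epochs=500 datamodule=guo-kit-ml.yaml
```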
You can skip this step by using a pre-trained autoencoder checkpoint. If you want to skip it, copy `autoencoder_finest.ckpt` into the same location and rename it to `autoencoder_trained.ckpt`.
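The copy-and-rename might look like this (a sketch, assuming `autoencoder_finest.ckpt` already sits in `checkpoints/`):

```bash
# Sketch: reuse the provided autoencoder checkpoint instead of training stage 1
cp checkpoints/autoencoder_finest.ckpt checkpoints/autoencoder_trained.ckpt
```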
Train the VQ-VAE reconstruction model on HumanML3D (or KIT-ML) by running the following script. All outputs and checkpoints will be stored in `logs/`.
python src/train.py --config-name=train model=vq_vae.yaml model.do_evaluation=false trainer.devices=[1] trainer.max_epochs=500
Setting `model.do_evaluation=True` will run the evaluator after every epoch to store FID and R-Precision. However, the evaluator is a pre-trained model from the work at https://github.com/EricGuo5513/TM2T, so you will need to download the pre-trained models from LINK. For the HumanML3D evaluator, you need `t2m/text_mot_match/model/finest.tar`. Store it at `checkpoints/t2m/text_mot_match/model/finest.tar`.
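A sketch of placing the HumanML3D evaluator checkpoint and then training with evaluation enabled (assuming the downloaded file is named `finest.tar` and lies in the current directory):

```bash
# Sketch: put the TM2T evaluator checkpoint where the code expects it
mkdir -p checkpoints/t2m/text_mot_match/model
mv finest.tar checkpoints/t2m/text_mot_match/model/finest.tar

# Stage-1 training with per-epoch evaluation (FID, R-Precision)
python src/train.py --config-name=train model=vq_vae.yaml model.do_evaluation=True \
    trainer.devices=[1] trainer.max_epochs=500
```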
The KIT-ML pre-trained models are from the above work as well and can be found at LINK. For the KIT-ML evaluator, you need `kit/text_mot_match/model/finest.tar`. Store it at `checkpoints/kit/text_mot_match/model/finest.tar`. Also include `eval_ckpt=checkpoints/kit/text_mot_match/model/finest.tar` as a parameter in the script.
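Combining the KIT-ML overrides, an evaluation-enabled stage-1 run might look like this (a sketch, not a prescribed command):

```bash
# Sketch: KIT-ML training with the KIT evaluator checkpoint
python src/train.py --config-name=train model=vq_vae.yaml model.do_evaluation=True \
    trainer.devices=[1] trainer.max_epochs=500 \
    datamodule=guo-kit-ml.yaml eval_ckpt=checkpoints/kit/text_mot_match/model/finest.tar
```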
Discrete diffusion training on HumanML3D (or KIT-ML). Copy the trained autoencoder checkpoint from the step above and paste it directly into `checkpoints/`. Rename the .ckpt file to `autoencoder_trained.ckpt` so that stage 2 can load it (a sketch of this follows the command below). All outputs and checkpoints will be stored in `logs/`. Three checkpoints will be created, corresponding to the epochs with the best validation FID, best validation R-Precision, and best validation loss. Run the following command.
python src/train.py --config-name=train model=vq_diffusion.yaml model.do_evaluation=false trainer.devices=[1] trainer.max_epochs=500
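The checkpoint copy-and-rename mentioned above might look like this; `<path-to-stage1-checkpoint>` is a placeholder for whichever stage-1 .ckpt you pick from `logs/`:

```bash
# Sketch: expose the trained stage-1 autoencoder to stage 2
cp <path-to-stage1-checkpoint> checkpoints/autoencoder_trained.ckpt
```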
Similar to stage-1 training, setting `model.do_evaluation=True` will run the evaluator after every epoch to store metrics. Follow the steps above to download the pre-trained evaluator models for HumanML3D (or KIT-ML). Set `logger=tensorboard` to get loss and metric plots across epochs.
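For instance, a stage-2 run with TensorBoard logging, followed by the standard TensorBoard viewer (a sketch; the exact subdirectory layout under `logs/` depends on the run):

```bash
# Sketch: stage-2 training with TensorBoard logging enabled
python src/train.py --config-name=train model=vq_diffusion.yaml model.do_evaluation=false \
    trainer.devices=[1] trainer.max_epochs=500 logger=tensorboard

# Inspect the logged curves
tensorboard --logdir logs/
```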
We compare our model to four methods: Seq2Seq, Language2Pose, TM2T, and Motion Diffusion Model (MDM). Seq2Seq and Language2Pose are deterministic motion generation baselines. TM2T utilizes a VQ-VAE and recurrent models for the text-to-motion synthesis task. MDM uses a conditional diffusion model on raw motions and has shown promising motion results.
| HumanML3D | KIT-ML |
| --- | --- |