Code for the paper "C$^2$KD: Bridging the Modality Gap for Cross-Modal Knowledge Distillation".
Dependencies are listed in requirements.txt.
Download the original datasets: CREMA-D, AVE, VGGSound.
For the AVE, CREMA-D, and VGGSound datasets, we provide code in the directory utils/data/ to pre-process videos into RGB frames and audio WAV files.
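For orientation only, here is a minimal sketch of what splitting a video into RGB frames and a WAV file can look like with the `ffmpeg` command-line tool. It is not the code in utils/data/; the function name `build_ffmpeg_commands`, the output layout, the frame rate, and the audio sample rate are all assumptions chosen for illustration.

```python
from pathlib import Path


def build_ffmpeg_commands(video_path, out_dir, fps=1, sample_rate=16000):
    """Build (but do not run) two ffmpeg commands for one video.

    Hypothetical helper: names, output layout, fps, and sample rate are
    illustrative assumptions, not the repository's actual settings.
    """
    video = Path(video_path)
    target = Path(out_dir) / video.stem  # one sub-directory per video

    # Sample `fps` RGB frames per second and write them as numbered JPEGs.
    frames_cmd = [
        "ffmpeg", "-i", str(video),
        "-vf", f"fps={fps}",
        str(target / "frame_%05d.jpg"),
    ]

    # Drop the video stream (-vn) and write mono audio at `sample_rate` Hz.
    audio_cmd = [
        "ffmpeg", "-i", str(video),
        "-vn", "-ac", "1", "-ar", str(sample_rate),
        str(target / "audio.wav"),
    ]
    return frames_cmd, audio_cmd


frames_cmd, audio_cmd = build_ffmpeg_commands("clip_0001.mp4", "processed")
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the target directory exists and `ffmpeg` is installed.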
Detailed descriptions of options can be found in main_overlap_tag.py
- Pre-train the single-modality model
- Conduct cross-modal knowledge distillation
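As background for the distillation step, the sketch below shows the generic temperature-scaled distillation loss (KL divergence between softened teacher and student predictions). It is the standard logit-based formulation, not the C$^2$KD objective implemented in this repository; the function names and the default temperature are assumptions, and it uses plain Python lists rather than tensors to stay dependency-free.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return [z - log_norm for z in scaled]


def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Generic distillation loss: T^2 * KL(teacher || student).

    Illustrative only; the temperature default is an assumption.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2


# Example: logits from a teacher of one modality and a student of another
# for the same sample (values are made up for illustration).
loss = kd_loss(student_logits=[0.2, 1.5, -0.3], teacher_logits=[0.1, 2.0, -1.0])
```

In the cross-modal setting, the teacher and student receive different modalities of the same sample (for example audio and RGB frames), so their logits are compared per sample exactly as above.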