This repository implements 'Self-supervised Video Representation Learning via Capturing Semantic Changes Indicated by Saccades (TCSVT2023)'.
Qiuxia Lai, Ailing Zeng, Ye Wang, Lihong Cao, Yu Li, Qiang Xu.
- python 3.7.12
- torch 1.13.0+cu116
- torchvision 0.14.0+cu116
- cudatoolkit 11.6.0
- tensorboard
- tensorboardX
- accimage (official github)
- faiss-gpu (official github)
- Pillow
- opencv
- scipy
- tqdm
Our models are pre-trained on UCF101 split 1 with RGB data only. Linear probe and fine-tuning are performed on UCF101 and HMDB51.
- Pre-trained models can be downloaded from this link.
- Linear probe models can be downloaded from this link.
- Finetune models can be downloaded from this link.
The datasets are arranged as follows, where <base_path> is defined in config.py
.
Note that all the split
folders are available at data/<dataset_name>/
.
<base_path>/DataSets/
|---UCF101/
| |---jpegs_256/
| | |---<video1>/
| | | |---XXX.jpg
| | | |--- ...
| | |---<video2>/
| | |--- ...
| |---split/
| | |---ClassInd.txt
| | |---trainlist01.txt
| | |---testlist01.txt
| | |--- ...
| |---scan_idx_7/
| | |---v_ApplyEyeMakeup_g01_c01.txt
| | |---v_ApplyEyeMakeup_g01_c02.txt
| | |---v_ApplyEyeMakeup_g01_c03.txt
| | |--- ...
|
|---HMDB51/
| |---jpegs_256/
| | | |---<video1>/
| | | | |---XXX.jpg
| | | | |--- ...
| | | |---<video2>/
| | | |--- ...
| |---split/
| | |---ClassInd.txt
| | |---trainlist01.txt
| | |---testlist01.txt
| | |--- ...
UCF101 with scanpath data can be downloaded from this BaiduDisk link. Our files are obtained from this repo. Official files can be downloaded from this link.
- Download three splits of the
zip
file. Unzip together, and got folderjpegs_256
. Put it into<base_path>\DataSets\UCF101
. - Download
scan_idx_7.zip
file. Unzip and Put it into<base_path>\DataSets\UCF101
.
The scanpath data is generated using G-Eymol (TPAMI 2019).
HMDB51 can be downloaded from this BaiduDisk link. Our files are obtained from this repo. Official files can be downloaded from this link.
- Download the
zip
file. Unzip together, and got folderjpegs_256
. Put it into<base_path>\DataSets\HMDB51
.
# R3D
python train_ssl_full.py --fs_warmup_epoch 241 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 16 --save_freq 60 --lr_ratio 0.001 --epochs 300 --learning_rate 0.1 --lr_decay_epochs 90 180 240 --gpu-id 2 \
--focus_num 3 --grid_num 7 --deduplicate True --margin_h 12
# R21D
python train_ssl_full.py --fs_warmup_epoch 301 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 8 --save_freq 60 --lr_ratio 0.001 --epochs 360 --learning_rate 0.1 --lr_decay_epochs 90 180 240 --gpu-id 3 \
--model r21d --pro_p 4 --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12
# R3D
python train_ssl_full.py --fs_warmup_epoch 241 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 16 --save_freq 60 --lr_ratio 0.001 --epochs 300 --learning_rate 0.0008 --lr_decay_epochs 90 180 240 --gpu-id 1 \
--evaluate --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12 --model_name <model_name> --resume <your_best_ckpt>.pth
# R21D
python train_ssl_full.py --fs_warmup_epoch 301 --cluster_freq 5 --num_clusters 1500 1500 1500 --dataset=ucf101 --batch_size 8 --save_freq 60 --lr_ratio 0.001 --epochs 360 --learning_rate 0.0008 --lr_decay_epochs 90 180 240 --gpu-id 0 \
--evaluate --model r21d --pro_p 4 --focus_num 3 --grid_num 7 --deduplicate True --margin_h 12 --model_name <model_name> --resume <your_best_ckpt>.pth
# ucf101
python downstream.py --finetune False --dropout 0.7 --seed 42 --model_postfix _dp_0_7_s42 --lr 0.1 --ft_lr 0.1 --epochs 200 --lr_decay_epochs 60 120 160 --lr_decay_rate 0.1 --batch_size 32 \
--focus_init <model_name>/<your_best_ckpt>.pth --gpu-id 0
# hmdb51
python downstream.py --dataset hmdb51 --finetune False --dropout 0.7 --seed 42 --model_postfix _dp_0_7_s42_hmdb --lr 0.1 --ft_lr 0.1 --epochs 200 --lr_decay_epochs 60 120 160 --lr_decay_rate 0.1 --batch_size 32 \
--focus_init <model_name>/<your_best_ckpt>.pth --gpu-id 1
python downstream.py --finetune True --dropout 0.7 --lr 0.1 --ft_lr 0.1 --epochs 200 --lr_decay_epochs 60 120 160 --lr_decay_rate 0.1 --batch_size 32 \
--final_bn False --final_norm False --focus_init <model_name>/<your_best_ckpt>.pth --gpu-id 0
If you find this repository useful, please consider citing the following reference.
@ARTICLE{lai2023self,
title={Self-supervised video representation learning via capturing semantic changes indicated by saccades},
author={Qiuxia Lai and Ailing Zeng and Ye Wang and Lihong Cao and Yu Li and Qiang Xu},
journal={IEEE Trans. on Circuits and Systems for Video Technology},
year={2023}
}
Qiuxia Lai: ashleylqxat
gmail.com | qxlaiat
cuc.edu.cn