
The avg accuracy on CIFAR100 50steps #19

jmin0530 opened this issue Nov 17, 2022 · 1 comment

jmin0530 commented Nov 17, 2022

Hello, thank you for your code.
I used the DyTox setting for 50 steps, but I got different results from those reported in your paper.

I ran the CLI command below:

bash train.sh 0,1 \
    --options options/data/cifar100_2-2.yaml options/data/cifar100_order1.yaml options/model/cifar_dytox.yaml \
    --name dytox \
    --data-path MY_PATH_TO_DATASET \
    --output-basedir PATH_TO_SAVE_CHECKPOINTS \
    --memory-size 1000
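
For context, a quick sanity check of what this setting implies (my own back-of-the-envelope numbers, assuming the 2-2 split means a base task of 2 classes plus 2 new classes per step, and that --memory-size is a single fixed budget):

```python
# Rough sanity check of the 50-step CIFAR-100 protocol (my own numbers,
# not taken from the DyTox code).
total_classes = 100
initial_classes = 2      # assumed meaning of "2-2": base task of 2 classes
increment = 2            # 2 new classes per incremental step
memory_size = 1000       # the --memory-size flag used above

num_steps = 1 + (total_classes - initial_classes) // increment
print(num_steps)                    # -> 50 steps in total

# If the rehearsal budget is one shared pool of 1000 images, by the final
# step it only leaves:
print(memory_size / total_classes)  # -> 10.0 exemplars per class
```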

According to your paper, your result on CIFAR-100 with 50 steps is "Avg acc: 64.82, Last acc: 45.61".
Here are my reproduction results for the three CIFAR-100 class orders:

Also, here are the DyTox settings I used:

# DyTox, for CIFAR100

# Model definition
model: convit
embed_dim: 384
depth: 6
num_heads: 12
patch_size: 4
input_size: 32
local_up_to_layer: 5
class_attention: true

# Training setting
no_amp: true
eval_every: 50

# Base hyperparameter
weight_decay: 0.000001
batch_size: 128
incremental_batch_size: 256
incremental_lr: 0.0005
rehearsal: icarl_all

# Knowledge Distillation
auto_kd: true

# Finetuning
finetuning: balanced
finetuning_epochs: 20
ft_no_sampling: true

# Dytox model
dytox: true
freeze_task: [old_task_tokens, old_heads]
freeze_ft: [sab]

# Divergence head to get diversity
head_div: 0.1
head_div_mode: tr

# Independent Classifiers
ind_clf: 1-1
bce_loss: true

# Advanced Augmentations, here disabled

# Erasing
reprob: 0.0
remode: pixel
recount: 1
resplit: false

# MixUp & CutMix
mixup: 0.0
cutmix: 0.0
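
For what it's worth, my reading of the model options above (just my interpretation, not the repo's code): patch_size: 4 on 32x32 inputs gives an 8x8 grid of 64 patch tokens, and with depth: 6, local_up_to_layer: 5 and class_attention: true the network should end with a single class/task-attention block on top of 5 shared self-attention blocks.

```python
# Rough reading of the model options above (my interpretation only).
input_size, patch_size = 32, 4
num_patch_tokens = (input_size // patch_size) ** 2
print(num_patch_tokens)            # -> 64 patch tokens per image

depth, local_up_to_layer = 6, 5
# I read depth - local_up_to_layer as the number of final blocks that use
# class/task attention instead of plain self-attention.
print(depth - local_up_to_layer)   # -> 1
```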

I can't understand why my reproduction results differ from those reported in your paper.
Thank you.

@arthurdouillard (Owner)

See https://github.com/arthurdouillard/dytox/blob/main/erratum_distributed.md

You probably want to use global memory and 2k memory.

If you use distributed memory with 1k, your effective memory size is rather low (much lower than 2k).
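
To make the arithmetic concrete (a toy illustration, assuming two training processes as in the command above, and that distributed memory keeps the budget per process instead of pooling it globally; the exact behaviour is described in erratum_distributed.md):

```python
# Toy comparison of the two memory settings discussed above
# (illustrative only, not the actual DyTox memory code).
total_classes = 100

# Global memory sized to match the paper's protocol: one shared pool of 2000.
global_memory = 2000
print(global_memory / total_classes)       # -> 20 exemplars per class at the end

# Distributed memory with --memory-size 1000: the budget stays per process,
# so the pool of distinct rehearsal images ends up far below 2000.
distributed_budget = 1000
print(distributed_budget / total_classes)  # -> 10 exemplars per class at most
```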
