
🐛 Instability of training under negative smoothing values #3

Closed
o-laurent opened this issue Nov 13, 2024 · 2 comments

o-laurent commented Nov 13, 2024

Hello!

Thanks for your work! Here is what happens when I train a ResNet on CIFAR-10 with a negative smooth rate, using the following command:

python3 main_GLS_direct_train.py --smooth_rate -1.0

I used -1.0, and, unless I am mistaken, your paper mentions smooth rates down to -6.0 (see Table 3).

Actual noise 0.20
over all noise rate is  0.20076
building model...
building model done
Epoch [1/200], Iter [50/390], Loss: 4.4735
Epoch [1/200], Iter [100/390], Loss: -2.9571
Epoch [1/200], Iter [150/390], Loss: -7.3548
Epoch [1/200], Iter [200/390], Loss: 10.6610
Epoch [1/200], Iter [250/390], Loss: 29.8863
Epoch [1/200], Iter [300/390], Loss: 24.9580
Epoch [1/200], Iter [350/390], Loss: -24.6937
previous_best 0.0
test acc on test images is  12.7
Epoch [2/200], Iter [50/390], Loss: -5.3613
Epoch [2/200], Iter [100/390], Loss: -5.3274
Epoch [2/200], Iter [150/390], Loss: -77.6080
Epoch [2/200], Iter [200/390], Loss: -1549.1436
Epoch [2/200], Iter [250/390], Loss: -199.6069
Epoch [2/200], Iter [300/390], Loss: -16850.7461
Epoch [2/200], Iter [350/390], Loss: -2025.1484
previous_best 12.7
test acc on test images is  13.66
Epoch [3/200], Iter [50/390], Loss: -4315.8242
Epoch [3/200], Iter [100/390], Loss: 11547.2734
Epoch [3/200], Iter [150/390], Loss: 249404.2500
Epoch [3/200], Iter [200/390], Loss: -78784.7734
Epoch [3/200], Iter [250/390], Loss: -60295.2031
Epoch [3/200], Iter [300/390], Loss: 679887.3750
Epoch [3/200], Iter [350/390], Loss: 3432846.2500
previous_best 13.66
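For reference, here is my understanding of the GLS objective as a minimal PyTorch sketch. The function name gls_loss and the exact formulation (cross-entropy against the smoothed target (1 - r) * one_hot + r / K) are my assumptions from reading the paper, not your code; please correct me if main_GLS_direct_train.py does something different. With a negative smooth rate the off-class target entries are negative, which would explain the unbounded negative losses above:

```python
import torch
import torch.nn.functional as F

def gls_loss(logits, targets, smooth_rate, num_classes=10):
    # Cross-entropy against the smoothed target (1 - r) * one_hot + r / K.
    # This is my reading of generalized label smoothing, not necessarily
    # what the repo implements.
    log_probs = F.log_softmax(logits, dim=-1)
    one_hot = F.one_hot(targets, num_classes).float()
    smoothed = (1.0 - smooth_rate) * one_hot + smooth_rate / num_classes
    # With smooth_rate < 0, the off-class target entries are negative, so
    # the per-sample loss is unbounded below as those class probabilities
    # shrink toward zero.
    return -(smoothed * log_probs).sum(dim=-1).mean()

# e.g. logits = torch.randn(4, 10); targets = torch.tensor([0, 1, 2, 3])
# gls_loss(logits, targets, smooth_rate=-1.0) can already be negative.
```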

Is this the intended behavior? If not, what command should I run to train a model with a negative smoothing rate? Ideally, I would like to use your method in the "noise-free" setting.

I have looked at Appendix D.2 of your paper, which you mentioned in #1, but I can't find information that would help reproduce Table 3. I would be completely fine with an accuracy between 88% and 92% (mentioned in D.2), but it seems that, currently, the training itself diverges. Do you think it could be due to different PyTorch versions?

Many thanks, and have a great day!

weijiaheng (Collaborator) commented

As discussed in the paper, direct training with NLS can be unstable. This is mainly because NLS relies on a relatively well pre-trained model.
As a practical implementation, we recommend warming up the training with the hard-label setting and then proceeding with negative labels in the later training stage; see here.
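In code, the idea looks roughly like the sketch below, reusing the gls_loss helper from the comment above. The warm-up length (120 epochs) and the final rate (-1.0) are illustrative assumptions, not the exact settings behind the link; `model`, `optimizer`, and `train_loader` stand for a standard CIFAR-10 setup.

```python
# Warm up with hard labels, then switch to negative label smoothing.
warmup_epochs, num_epochs, negative_rate = 120, 200, -1.0

for epoch in range(num_epochs):
    # smooth_rate = 0.0 reduces GLS to plain cross-entropy (hard labels);
    # switch to the negative rate only after the warm-up phase.
    smooth_rate = 0.0 if epoch < warmup_epochs else negative_rate
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = gls_loss(model(images), labels, smooth_rate)
        loss.backward()
        optimizer.step()
```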

o-laurent (Author) commented

Hello @weijiaheng,

Thank you for your answer! I most likely misunderstood what was written in Appendix D.2. I'll try using negative label smoothing when fine-tuning from a pre-trained model, and I'll let you know by re-opening this issue if I still encounter problems.
