Hello!
Thanks for your work! Here is what happens when I train a ResNet on CIFAR-10 using a negative smooth rate. I used -1.0, and, unless I'm mistaken, your paper mentions smooth rates down to -6.0 (see Table 3). The training output:
Actual noise 0.20
over all noise rate is 0.20076
building model...
building model done
Epoch [1/200], Iter [50/390], Loss: 4.4735
Epoch [1/200], Iter [100/390], Loss: -2.9571
Epoch [1/200], Iter [150/390], Loss: -7.3548
Epoch [1/200], Iter [200/390], Loss: 10.6610
Epoch [1/200], Iter [250/390], Loss: 29.8863
Epoch [1/200], Iter [300/390], Loss: 24.9580
Epoch [1/200], Iter [350/390], Loss: -24.6937
previous_best 0.0
test acc on test images is 12.7
Epoch [2/200], Iter [50/390], Loss: -5.3613
Epoch [2/200], Iter [100/390], Loss: -5.3274
Epoch [2/200], Iter [150/390], Loss: -77.6080
Epoch [2/200], Iter [200/390], Loss: -1549.1436
Epoch [2/200], Iter [250/390], Loss: -199.6069
Epoch [2/200], Iter [300/390], Loss: -16850.7461
Epoch [2/200], Iter [350/390], Loss: -2025.1484
previous_best 12.7
test acc on test images is 13.66
Epoch [3/200], Iter [50/390], Loss: -4315.8242
Epoch [3/200], Iter [100/390], Loss: 11547.2734
Epoch [3/200], Iter [150/390], Loss: 249404.2500
Epoch [3/200], Iter [200/390], Loss: -78784.7734
Epoch [3/200], Iter [250/390], Loss: -60295.2031
Epoch [3/200], Iter [300/390], Loss: 679887.3750
Epoch [3/200], Iter [350/390], Loss: 3432846.2500
previous_best 13.66
Is this the intended behavior? If not, what command should I run to train a model with a negative smoothing rate? Ideally, I would like to use your method in the "noise-free" setting.
I have looked at Appendix D.2 of your paper, which you mentioned in #1, but I don't find information that would help reproduce Table 3. I would be completely fine with an accuracy between 88% and 92% (mentioned in D.2), but it seems that the training itself currently diverges. Do you think it could be due to different PyTorch versions?
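If it helps, I suspect the negative loss values follow directly from the loss form itself. Here is a minimal check, assuming the generalized label-smoothing target (1 - r) * one_hot(y) + r / K from the paper (the variable names are my own, not the repo's):

```python
import torch
import torch.nn.functional as F

# One confident, correct prediction over K = 10 classes.
logits = torch.full((1, 10), -5.0)
logits[0, 3] = 5.0
target = torch.tensor([3])

log_probs = F.log_softmax(logits, dim=1)
nll = F.nll_loss(log_probs, target)  # CE against the hard label, ~0 here
uniform = -log_probs.mean()          # CE against the uniform target, ~9 here

r = -1.0  # the smooth rate I pass on the command line
loss = (1 - r) * nll + r * uniform
print(loss.item())  # ~ -9: negative by construction, and it keeps
                    # decreasing as the model becomes more confident
```

With r < 0 the uniform-target term enters with a negative weight, so the loss is unbounded below as the model grows more confident, which seems consistent with the log above.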
Many thanks, and have a great day!
As discussed in the paper, direct training with NLS can be unstable. This is mainly because NLS relies on a relatively well pre-trained model.
In practice, we recommend warming up training with the hard-label setting and then switching to negative labels in the later stage of training; see here.
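A minimal sketch of that schedule, assuming the generalized label-smoothing loss from the paper (the toy model, stand-in batches, and `warmup_epochs` below are placeholders to illustrate the switch, not this repo's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gls_loss(logits, targets, smooth_rate):
    """Cross-entropy against the generalized label-smoothing target
    (1 - r) * one_hot(y) + r / K. With r < 0 (NLS) the off-class
    target weights are negative, so the loss value can go negative."""
    log_probs = F.log_softmax(logits, dim=1)
    nll = -log_probs.gather(1, targets.unsqueeze(1)).squeeze(1)  # CE vs. hard label
    uniform = -log_probs.mean(dim=1)                             # CE vs. uniform target
    return ((1.0 - smooth_rate) * nll + smooth_rate * uniform).mean()

# Toy setup just to make the schedule concrete.
model = nn.Linear(32, 10)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
warmup_epochs, total_epochs = 100, 200  # assumption: tune the switch point

for epoch in range(total_epochs):
    # Hard labels (r = 0) during warm-up, then a negative smooth rate.
    smooth_rate = 0.0 if epoch < warmup_epochs else -1.0
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))  # stand-in batch
    loss = gls_loss(model(x), y, smooth_rate)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key point is that the negative rate is only applied once the model already fits the data reasonably well; starting from random initialization amplifies the unbounded-below behavior described above.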
Thank you for your answer! I most likely misunderstood what was written in Appendix D.2. I'll try using negative label smoothing when fine-tuning from a pre-trained model, and I'll re-open this issue if I still run into problems.