Large Batch Training of Convolutional Networks with Layer-wise Adaptive Rate Scaling
This is an unofficial implementation of the Layer-wise Adaptive Rate Scaling (LARS) optimizer from You, Yang, Igor Gitman, and Boris Ginsburg, "Large batch training of convolutional networks with layer-wise adaptive rate scaling" (ICLR'18), and of "Large Batch Optimization for Deep Learning: Training BERT in 76 Minutes" (ICLR'20).
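The core idea of LARS can be sketched in a few lines: instead of one global learning rate, each layer's update is scaled by a trust ratio between the norm of its weights and the norm of its (regularized) gradient. Below is a minimal NumPy sketch of a single LARS step for one layer; the function name, hyperparameter defaults, and the omission of momentum are simplifications for illustration, not the exact code of this repository.

```python
import numpy as np

def lars_update(w, grad, lr=0.1, trust_coef=0.001, weight_decay=5e-4, eps=1e-9):
    """One simplified LARS step for a single layer (momentum omitted).

    The global learning rate `lr` is rescaled per layer by the trust
    ratio ||w|| / (||grad + wd * w|| + eps), so layers with small
    gradients relative to their weights still take meaningful steps.
    """
    g = grad + weight_decay * w            # fold L2 regularization into the gradient
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(g)
    local_lr = trust_coef * w_norm / (g_norm + eps)  # layer-wise adaptive rate
    return w - lr * local_lr * g
```

In a full optimizer this local rate multiplies the momentum update for each layer's parameter tensor separately, which is what makes very large batch sizes trainable without the usual accuracy loss.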