Feature request
In transformers.optimization, support constant learning rate with cooldown functions.

Motivation
This schedule lets scaling experiments be performed with significantly reduced compute and GPU hours, since the constant-phase portion of a training run can be reused across runs instead of retrained from scratch. SmolLM used this method to train a series of SOTA small language models.
Paper: https://arxiv.org/pdf/2405.18392
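The schedule described in the linked paper combines a linear warmup, a long constant phase, and a short cooldown back to zero. A minimal sketch of the step-to-multiplier function, written in the style of the lambda-based schedules in transformers.optimization, might look like the following; the function name and parameter names here are hypothetical illustrations, not the actual transformers API:

```python
def constant_with_cooldown_lambda(
    current_step: int,
    *,
    num_warmup_steps: int,
    num_cooldown_steps: int,
    num_training_steps: int,
) -> float:
    """Return the learning-rate multiplier for `current_step`.

    Hypothetical sketch of a warmup -> constant -> cooldown schedule:
    - linear warmup from 0 to 1 over `num_warmup_steps`
    - constant at 1 until the cooldown window begins
    - linear cooldown from 1 to 0 over the final `num_cooldown_steps`
    """
    if current_step < num_warmup_steps:
        # Linear warmup: 0 -> 1.
        return current_step / max(1, num_warmup_steps)
    cooldown_start = num_training_steps - num_cooldown_steps
    if current_step < cooldown_start:
        # Constant phase.
        return 1.0
    # Linear cooldown: 1 -> 0, clamped at 0 past the end of training.
    return max(
        0.0, (num_training_steps - current_step) / max(1, num_cooldown_steps)
    )
```

In practice this multiplier would be bound with `functools.partial` and handed to `torch.optim.lr_scheduler.LambdaLR`, which is how the existing schedules in transformers.optimization are constructed.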
Your contribution
I've created a branch, am finishing the implementation of these functions, and intend to submit a PR.