
Support Constant Learning Rate with Cooldown #35449

Open
LoserCheems opened this issue Dec 29, 2024 · 0 comments · May be fixed by #35453
Labels: Feature request (Request for a new feature)

Comments

@LoserCheems

Feature request

Add constant learning rate with cooldown schedule functions to transformers.optimization.

Motivation

This schedule lets scaling experiments be run with significantly reduced compute and GPU hours: a single constant-learning-rate run can be reused, with cooldowns launched from intermediate checkpoints instead of retraining from scratch for each token budget.
SmolLM used this method to train a series of SOTA small language models.
Paper: https://arxiv.org/pdf/2405.18392
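As a rough illustration (not the final transformers API; the function name and parameters below are hypothetical), the schedule from the paper is often described as "warmup-stable-decay" (WSD): a linear warmup, a long constant plateau, then a cooldown to zero over the final steps. A minimal sketch of the multiplier, assuming a linear cooldown:

```python
def get_wsd_lr_lambda(current_step, *, num_warmup_steps, num_stable_steps,
                      num_decay_steps):
    """Hypothetical helper: return the LR multiplier for `current_step`
    under a warmup-stable-decay (constant-with-cooldown) schedule."""
    if current_step < num_warmup_steps:
        # Linear warmup from 0 to 1.
        return current_step / max(1, num_warmup_steps)
    if current_step < num_warmup_steps + num_stable_steps:
        # Constant phase at the peak learning rate.
        return 1.0
    # Linear cooldown from 1 to 0 over the final num_decay_steps.
    steps_into_decay = current_step - num_warmup_steps - num_stable_steps
    return max(0.0, 1.0 - steps_into_decay / max(1, num_decay_steps))
```

In practice this multiplier could be wrapped in `torch.optim.lr_scheduler.LambdaLR`, matching how the existing schedules in `transformers.optimization` are built; the paper also discusses non-linear cooldown shapes (e.g. 1-sqrt), which would only change the final branch.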

Your contribution

I've created a branch and am finishing the implementation of these functions; I intend to submit a PR.

@LoserCheems LoserCheems added the Feature request Request for a new feature label Dec 29, 2024
@LoserCheems LoserCheems linked a pull request Dec 29, 2024 that will close this issue