Feature request
In transformers.optimization, support constant learning rate with cooldown functions.

Motivation
This schedule lets scaling experiments be performed with significantly reduced compute and GPU hours, since the constant-phase portion of a training run can be reused across runs instead of retrained from scratch. SmolLM used this method to train a series of SOTA small language models.
Paper: https://arxiv.org/pdf/2405.18392
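The schedule described in the linked paper combines a linear warmup, a long constant phase, and a short cooldown back to zero. A minimal sketch of the step-to-multiplier function, written in the style of the lambda-based schedules in transformers.optimization, might look like the following; the function name and parameter names here are hypothetical illustrations, not the actual transformers API:

```python
def constant_with_cooldown_lambda(
    current_step: int,
    *,
    num_warmup_steps: int,
    num_cooldown_steps: int,
    num_training_steps: int,
) -> float:
    """Return the learning-rate multiplier for `current_step`.

    Hypothetical sketch of a warmup -> constant -> cooldown schedule:
    - linear warmup from 0 to 1 over `num_warmup_steps`
    - constant at 1 until the cooldown window begins
    - linear cooldown from 1 to 0 over the final `num_cooldown_steps`
    """
    if current_step < num_warmup_steps:
        # Linear warmup: 0 -> 1.
        return current_step / max(1, num_warmup_steps)
    cooldown_start = num_training_steps - num_cooldown_steps
    if current_step < cooldown_start:
        # Constant phase.
        return 1.0
    # Linear cooldown: 1 -> 0, clamped at 0 past the end of training.
    return max(
        0.0, (num_training_steps - current_step) / max(1, num_cooldown_steps)
    )
```

In practice this multiplier would be bound with `functools.partial` and handed to `torch.optim.lr_scheduler.LambdaLR`, which is how the existing schedules in transformers.optimization are constructed.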
Your contribution
I've created a branch, am finishing the implementation of these functions, and intend to submit a PR.