
Support constant lr with cooldown #35453

Open · LoserCheems wants to merge 51 commits into main

Conversation


@LoserCheems LoserCheems commented Dec 29, 2024

What does this PR do?

Fixes #35449
Adds the 'warmup_stable_cooldown' learning rate scheduler. The schedule has three phases: a linear warmup, a stable (constant) phase, and a cooldown, where the cooldown can follow a linear, cosine, or 1-sqrt curve. A minimal sketch of the schedule shape is given below.
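For illustration, here is a minimal sketch of the three-phase shape described above, written as a per-step multiplier in the style of a torch `LambdaLR` factor; the function and argument names are illustrative and may differ from the ones in the PR:

```python
import math

def wsc_lambda(current_step, num_warmup_steps, num_stable_steps, num_cooldown_steps,
               min_lr_ratio=0.0, cooldown_type="linear"):
    """Return the LR multiplier for a warmup -> stable -> cooldown schedule (illustrative only)."""
    if current_step < num_warmup_steps:
        # Phase 1: linear warmup from 0 up to the peak learning rate.
        return current_step / max(1, num_warmup_steps)
    if current_step < num_warmup_steps + num_stable_steps:
        # Phase 2: stable phase, learning rate held constant at its peak.
        return 1.0
    # Phase 3: cooldown, progress runs from 0 to 1 over num_cooldown_steps.
    progress = min(1.0, (current_step - num_warmup_steps - num_stable_steps) / max(1, num_cooldown_steps))
    if cooldown_type == "linear":
        factor = 1.0 - progress
    elif cooldown_type == "cosine":
        factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    elif cooldown_type == "1-sqrt":
        factor = 1.0 - math.sqrt(progress)
    else:
        raise ValueError(f"Unknown cooldown type: {cooldown_type}")
    # Interpolate toward min_lr_ratio instead of decaying all the way to zero.
    return factor * (1.0 - min_lr_ratio) + min_lr_ratio
```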

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@muellerzr and @SunMarc

@SunMarc SunMarc (Member) commented Jan 2, 2025

Thanks for the PR @LoserCheems, we already have a warmup_stable_decay scheduler with get_wsd_schedule that does practically the same thing as the new scheduler you proposed. I think what would be good is to add more options to the decay stage of the wsd scheduler, as you did in this PR (e.g. linear / 1-sqrt). WDYT?

@LoserCheems LoserCheems (Author) commented Jan 3, 2025

Oh, thanks for your suggestion. I intend to integrate the different cooldown methods and a minimum learning rate into get_wsd_schedule.
In addition, would it be possible to rename get_wsd_schedule to get_wsc_schedule, which is more in line with the three phases of warmup, stable, and cooldown? A hypothetical usage sketch of the extended scheduler is shown below.
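A hypothetical call to the extended scheduler might look like the sketch below; the keyword names (num_stable_steps, num_decay_steps, min_lr_ratio, decay_type) and the decay-type values are assumptions based on this discussion, not necessarily the final merged API:

```python
import torch
from transformers.optimization import get_wsd_schedule  # warmup-stable-decay scheduler

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Hypothetical usage once the extra cooldown options are merged; argument names are illustrative.
scheduler = get_wsd_schedule(
    optimizer,
    num_warmup_steps=100,    # linear warmup to the peak LR
    num_stable_steps=800,    # constant LR
    num_decay_steps=100,     # cooldown phase
    min_lr_ratio=0.1,        # cool down to 10% of the peak LR instead of 0
    decay_type="1-sqrt",     # or "linear" / "cosine"
)

for _ in range(1000):
    optimizer.step()
    scheduler.step()
```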

@SunMarc SunMarc (Member) commented Jan 3, 2025

Thanks! We can't really rename it to wsc; wsd is a real term that appears in the MiniCPM paper.

@LoserCheems LoserCheems (Author) commented

Thank you, the renaming of the function has been reverted.

@SunMarc SunMarc (Member) left a comment

Thanks for the PR! Left a comment.

(Review threads on src/transformers/optimization.py, now resolved)
@SunMarc SunMarc (Member) left a comment

Nice! Just a few nits and we should be able to merge this soon. Thanks for being reactive!

(Review threads on src/transformers/optimization.py, now resolved)
@LoserCheems LoserCheems (Author) commented

Everything is ready!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc (Member) left a comment

LGTM! Thanks for iterating! Just a few nits.

(Review threads on src/transformers/optimization.py, now resolved)
@LoserCheems LoserCheems (Author) commented Jan 10, 2025

It's done!

@LoserCheems LoserCheems requested a review from SunMarc January 15, 2025 13:14
@SunMarc SunMarc (Member) left a comment

LGTM!

@LoserCheems LoserCheems (Author) commented Jan 15, 2025

@SunMarc Thank you, this PR is ready for merge.

Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Support Constant Learning Rate with Cooldown
3 participants