-
Notifications
You must be signed in to change notification settings - Fork 494
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactoring FSDP. #1586
Refactoring FSDP. #1586
Conversation
Hi @AdamLouly, thanks for the PR 🙏 . The changes look good to me. The CI for ORT Training failed among which the Would you mind updating these training examples as well, otherwise we can get this PR in and target a new PR for updating those training examples. cc. @prathikr |
I think we can get this fix in and program another PR for updating those ORT training examples. Hi @AdamLouly, could you rebase the branch, as other maintainers have fixed some CIs which failed. |
With the commits I submitted, examples are running well in the docker container I built locally, but the CI failed with following error:
I will investigate further, but it's irrelevant to the fix so I will merge it anyway. Thanks again for the contribution @AdamLouly ! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* refactor fsdp * add trainer * remove hidden layers * update dockerfile --------- Co-authored-by: Adam Louly <[email protected]@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: JingyaHuang <[email protected]>
We are refactoring the FSDP implementation.
Fortunately, there aren't many changes required, as most of the modifications are in the Transformers Trainer. We don't need to duplicate those changes in the Optimum Trainer.