
FusedAdam optimizer doesn't have set_to_none keyword argument #1453

Open
MaciejBalaNV opened this issue Feb 4, 2025 · 0 comments · May be fixed by #1466
Comments

MaciejBalaNV commented Feb 4, 2025

PyTorch's torch.optim.Optimizer.zero_grad() has a set_to_none keyword argument. FusedAdam from TE doesn't accept this kwarg, despite inheriting from torch.optim.Optimizer. This breaks the inheritance contract and leads to various issues. For example, torch.distributed.checkpoint() assumes set_to_none is present on the optimizer when initializing the optimizer states in this code line. This is currently broken with the TE FusedAdam optimizer.
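A minimal repro sketch of the mismatch, assuming a CUDA build of TE and the import path transformer_engine.pytorch.optimizers (adjust to your TE version); the TypeError is what the missing keyword argument implies:

```python
# Hypothetical repro: FusedAdam.zero_grad() does not accept set_to_none,
# unlike torch.optim.Optimizer.zero_grad(set_to_none=True).
import torch
from transformer_engine.pytorch.optimizers import FusedAdam  # path may vary by TE version

model = torch.nn.Linear(16, 16).cuda()
opt = FusedAdam(model.parameters(), lr=1e-3)

# Works with torch.optim.Adam, but raises TypeError here because
# FusedAdam.zero_grad() lacks the set_to_none keyword argument:
opt.zero_grad(set_to_none=True)
```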

I understand TE FusedAdam has a set_grad_none attribute, but its zero_grad method should still accept the set_to_none keyword argument; otherwise some PyTorch functionality is broken. A possible workaround sketch is shown below.
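A possible interim workaround, as a sketch only: subclass FusedAdam and override zero_grad with the standard torch.optim.Optimizer semantics, so callers that pass set_to_none keep working. The class name PatchedFusedAdam and the import path are assumptions, not part of TE's API.

```python
# Hypothetical wrapper giving FusedAdam a zero_grad(set_to_none=...) that
# matches torch.optim.Optimizer's signature. Not TE's own API.
from transformer_engine.pytorch.optimizers import FusedAdam  # path may vary by TE version


class PatchedFusedAdam(FusedAdam):
    def zero_grad(self, set_to_none: bool = True) -> None:
        # Implement the generic torch.optim.Optimizer semantics directly
        # over param_groups instead of relying on TE's set_grad_none flag.
        for group in self.param_groups:
            for p in group["params"]:
                if set_to_none:
                    p.grad = None
                elif p.grad is not None:
                    p.grad.detach_()
                    p.grad.zero_()
```

With this, code such as torch.distributed.checkpoint that calls optimizer.zero_grad(set_to_none=True) should no longer fail on the signature, though the proper fix is for TE's FusedAdam to accept the kwarg itself.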
