PyTorch's `torch.optim.Optimizer.zero_grad()` has a `set_to_none` keyword argument. `FusedAdam` from TE doesn't accept this kwarg, despite inheriting from `torch.optim.Optimizer`. This breaks the inheritance contract and leads to various issues. For example, `torch.distributed.checkpoint()` assumes the optimizer's `zero_grad` accepts `set_to_none` when initializing the optimizer states in this code line. Currently this is broken with the TE `FusedAdam` optimizer.
I understand TE `FusedAdam` has a `set_grad_none` attribute, but its `zero_grad` method should still accept the `set_to_none` kwarg; otherwise some PyTorch functionality is broken.
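A minimal sketch of the kind of signature fix being asked for. It uses `torch.optim.Adam` as a stand-in for TE's `FusedAdam`, and the class name `MyFusedAdam` and its override are purely illustrative, not TE's actual implementation:

```python
import torch

class MyFusedAdam(torch.optim.Adam):
    """Illustrative stand-in for TE's FusedAdam (not the real implementation)."""

    def zero_grad(self, set_to_none: bool = True) -> None:
        # Accepting the base-class kwarg keeps callers that pass it
        # explicitly -- e.g. torch.distributed.checkpoint -- working.
        if set_to_none:
            for group in self.param_groups:
                for p in group["params"]:
                    p.grad = None
        else:
            super().zero_grad(set_to_none=False)


model = torch.nn.Linear(4, 4)
opt = MyFusedAdam(model.parameters(), lr=1e-3)
model(torch.randn(2, 4)).sum().backward()
opt.zero_grad(set_to_none=True)  # works because the signature matches the base class
```

Matching the base-class signature (rather than only exposing a constructor-level `set_grad_none` flag) is what keeps code written against the generic `Optimizer` interface working.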