Gradient Clipping #596
Comments
I know there are multiple ways to clip gradients (e.g. PyTorch has clip_grad_norm_ and clip_grad_value_). Do we know if one of these is more widely used than the other?
I think clip_grad_norm_ is more widely used; however, it is also more complex, as it first takes the norm over all of the gradients. clip_grad_value_ is used less often, but it is far more straightforward to implement, so I think it makes sense to add that first.
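To make the difference concrete, element-wise value clipping is just a clamp on every gradient entry. Here is a minimal sketch over plain f32 slices (the name and signature are placeholders, not the actual dfdx API):

```rust
/// Clamp every gradient element into [-clip_value, clip_value].
/// Illustrative only: plain f32 slices stand in for dfdx tensors.
fn clip_grad_value(grads: &mut [&mut [f32]], clip_value: f32) {
    for grad in grads.iter_mut() {
        for g in grad.iter_mut() {
            *g = g.clamp(-clip_value, clip_value);
        }
    }
}
```

Since each element is clamped independently, no global state is needed, which is why this is so much simpler than the norm-based version (that one has to see every gradient before it can compute the scale factor).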
It should be possible to implement a general …
That seems like all that would be needed for clip_grad_value_.
The PyTorch implementations of the above are pretty straightforward: https://github.com/pytorch/pytorch/blob/master/torch/nn/utils/clip_grad.py

I would say clip_grad_norm would be required to go through the TensorCollection API, so:

```rust
model.clip_grad_norm(&mut grads, 0.5);
model.clip_grad_value(&mut grads, 0.5);
```

Then we could implement clip_grad_norm with two passes with RecursiveWalker:
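Something like the following is the shape I'd expect for the two passes. This is a rough sketch over plain f32 vectors rather than the real TensorCollection/RecursiveWalker machinery; all names here are placeholders:

```rust
/// Two-pass gradient-norm clipping that returns rescaled copies of the gradients.
/// Pass 1 walks everything to accumulate the global L2 norm,
/// pass 2 walks again to apply a single scale factor.
/// Illustrative only: plain vectors stand in for dfdx tensors.
fn clip_grad_norm(grads: &[Vec<f32>], max_norm: f32) -> Vec<Vec<f32>> {
    // Pass 1: global L2 norm over every element of every gradient tensor.
    let total_norm: f32 = grads
        .iter()
        .flat_map(|g| g.iter())
        .map(|g| g * g)
        .sum::<f32>()
        .sqrt();

    // Pass 2: scale by min(1, max_norm / total_norm) so small gradients are untouched.
    let scale = (max_norm / (total_norm + 1e-6)).min(1.0);
    grads
        .iter()
        .map(|g| g.iter().map(|x| x * scale).collect())
        .collect()
}
```

The small epsilon in the denominator guards against division by zero when every gradient is zero; PyTorch's implementation does the same.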
If we wanted this all to be in-place:
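Again a rough sketch with placeholder types, this time mutating the gradient buffers directly instead of allocating new ones:

```rust
/// Same two-pass clipping as above, but applied in place.
/// Illustrative only: plain vectors stand in for dfdx tensors.
fn clip_grad_norm_inplace(grads: &mut [Vec<f32>], max_norm: f32) {
    // Pass 1: global L2 norm.
    let total_norm: f32 = grads
        .iter()
        .flat_map(|g| g.iter())
        .map(|g| g * g)
        .sum::<f32>()
        .sqrt();

    // Pass 2: only touch the gradients if they actually exceed the threshold.
    if total_norm > max_norm {
        let scale = max_norm / (total_norm + 1e-6);
        for grad in grads.iter_mut() {
            for g in grad.iter_mut() {
                *g *= scale;
            }
        }
    }
}
```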
Also separately, the …
Has any work been done on this?
I've submitted a draft PR, and once the examples are added I'll mark it as ready for review. But so far I think it's working correctly; I've been able to avoid exploding gradients.
When training large or deep models, exploding gradients are frequent and cause instability. Clipping them to a certain small amount is an effective way of stabilizing training.

To implement this, I believe a method on the Gradients struct would be needed (correct me if I'm wrong).
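For concreteness, here is a purely hypothetical shape such a method could take. The internal layout of Gradients shown here (a flat list of f32 buffers) is an assumption for illustration only, not dfdx's actual representation:

```rust
/// Hypothetical stand-in for the library's Gradients struct.
struct Gradients {
    /// Assumption: one flat f32 buffer per parameter's gradient.
    buffers: Vec<Vec<f32>>,
}

impl Gradients {
    /// Hypothetical method: clamp every gradient element into [-clip_value, clip_value].
    fn clip_value(&mut self, clip_value: f32) {
        for buf in self.buffers.iter_mut() {
            for g in buf.iter_mut() {
                *g = g.clamp(-clip_value, clip_value);
            }
        }
    }
}
```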