Hello everyone. The ReverseDiff documentation under "Limitations of ReverseDiff" lists the following:
Nested differentiation of closures is dangerous. Differentiating closures is safe, and nested differentiation is safe, but you might be vulnerable to a subtle bug if you try to do both. See this ForwardDiff issue for details. A fix is currently being planned for this problem.
But it appears that the listed ForwardDiff issue has been resolved. I ran this simple test:
ReverseDiff.gradient(x -> x .+ ReverseDiff.gradient(y -> x + 2*y, [1]), [1]) # Result = 1
which returns the correct value of 1. It seems as though this issue has been fixed?
As a bit of background, I have been working hard to use a Hessian in a Flux loss function (which requires me to take the gradient of this Hessian wrt a NN model's weights), but Zygote has a large number of issues with nested differentiation at the moment (not knocking it at all, it's an amazing development). It appears as though I am able to use ReverseDiff to correctly take the gradient of the Hessian wrt the weights using the Flux.destructure function:
using Flux, ReverseDiff, ForwardDiff
model = Chain(Dense(4 => 2, σ), Dense(2 => 1, σ)) # construct the model
weights, reconstructor = Flux.destructure(model) # destructure the model so ReverseDiff can track the weights
x = rand(4) # make a random data point
# calculate the gradient of the first element of the Hessian wrt the model's weights
ReverseDiff.gradient(temp -> ForwardDiff.hessian(temp2 -> reconstructor(temp)(temp2)[1], x)[1], weights)
I am able to use these gradients to train the model, and it appears to be working. However, I'm worried that the gradients may be slightly incorrect, yet still close enough for the optimizer to make progress. If someone could comment on the status of this issue, I would greatly appreciate it.
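In case it's useful, here is roughly how one could sanity-check the result (just a sketch; h11 and fd_gradient are names I'm making up for this post), comparing the AD gradient above against a hand-rolled central-difference approximation of the same quantity:
# first Hessian entry as a scalar function of the flat weight vector
h11(w) = ForwardDiff.hessian(z -> reconstructor(w)(z)[1], x)[1]
w64 = Float64.(weights) # work in Float64 so the finite differences aren't drowned in Float32 noise
g_ad = ReverseDiff.gradient(h11, w64) # the reverse-over-forward gradient in question
# central finite differences, no extra packages needed
function fd_gradient(f, w; h=1e-6)
    g = similar(w)
    for i in eachindex(w)
        wp = copy(w); wp[i] += h
        wm = copy(w); wm[i] -= h
        g[i] = (f(wp) - f(wm)) / (2h)
    end
    return g
end
g_fd = fd_gradient(h11, w64)
maximum(abs.(g_ad .- g_fd)) # a large discrepancy here would point at the nested-closure bug
If the difference stays near finite-difference noise, the gradients are at least not wildly wrong, which is what I'm hoping someone can confirm.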
Thanks in advance!
The issue is very much alive. For a minimal working example, take:
import ForwardDiff, ReverseDiff, AbstractDifferentiation as AD
n = 1
x0 = Array(rand(n))
M0 = rand(n, n)
function proto(x, M)
    M * x |> sum
end
fw = AD.ForwardDiffBackend()
rv = AD.ReverseDiffBackend()
# Gradients with respect to x
grad_x_FW(x, M) = AD.gradient(fw, x -> proto(x, M), x) |> first |> first
grad_x_RV(x, M) = AD.gradient(rv, x -> proto(x, M), x) |> first |> first
AD.gradient(fw, m -> grad_x_FW(x0, m), M0) # forward-over-forward: correct
AD.gradient(rv, m -> grad_x_FW(x0, m), M0) # reverse-over-forward: ERROR
AD.gradient(fw, m -> grad_x_RV(x0, m), M0) # forward-over-reverse: ERROR
AD.gradient(rv, m -> grad_x_RV(x0, m), M0) # reverse-over-reverse: wrong result
I would advise against using ReverseDiff for this kind of nested differentiation. I have found myself in exactly the same situation as you, and I'm completely clueless as to why it seems to work with destructured Flux models. Doing some experimentation with ForwardDiff (which can actually do this safely), I've found that the gradients from ReverseDiff are slightly off. This is probably due to perturbation confusion, i.e. the perturbations (infinitesimals) from the inner and outer differentiation leaking into each other.
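To make "slightly off" concrete, the comparison I have in mind looks roughly like this (a sketch in the setup from the original post; loss11 is just a name I'm using in this comment). A pure ForwardDiff gradient of the same scalar is safe against perturbation confusion thanks to ForwardDiff's tagging, so it can serve as the reference:
loss11(w) = ForwardDiff.hessian(z -> reconstructor(w)(z)[1], x)[1] # first Hessian entry as a function of the weights
g_rev = ReverseDiff.gradient(loss11, weights) # reverse-over-forward, the questionable combination
g_fwd = ForwardDiff.gradient(loss11, weights) # forward-over-forward-over-forward, the safe reference
maximum(abs.(g_rev .- g_fwd)) # this is the kind of discrepancy I was referring to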