Hello,

While reading the implementation, I noticed that in the forward-backward pass you use a dot product before running the backward pass, specifically in the following line:
GradCache/src/grad_cache/grad_cache.py
Line 241 in 0c33638
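If I read it correctly, in plain PyTorch a dot-product surrogate of this kind amounts to something like the sketch below (placeholder names, not the repository's code):

```python
import torch

# Hypothetical stand-ins for the encoder and a sub-batch of inputs.
encoder = torch.nn.Linear(4, 8)
x = torch.randn(2, 4)

# Second (gradient-enabled) forward pass recomputes the representations.
reps = encoder(x)

# Pretend this is the cached gradient d(loss)/d(reps) from the no-grad pass.
cached_grad = torch.randn_like(reps)

# Scalar surrogate: d(surrogate)/d(reps) == cached_grad, so calling
# backward() pushes exactly the cached gradient through the encoder.
surrogate = torch.dot(reps.flatten(), cached_grad.flatten())
surrogate.backward()
```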
I can't quite understand this: when reading the paper, I imagined that you would use the cached gradients directly, something like:
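(A hypothetical PyTorch sketch for illustration only; `reps` and `cached_grad` are placeholder names, not identifiers from the repository:)

```python
import torch

# The same hypothetical encoder and inputs as in the sketch above.
encoder = torch.nn.Linear(4, 8)
x = torch.randn(2, 4)

reps = encoder(x)                      # gradient-enabled recomputation
cached_grad = torch.randn_like(reps)   # cached d(loss)/d(reps)

# Feed the cached gradient in directly as the upstream gradient of reps.
# Mathematically this yields the same encoder gradients as the dot-product
# surrogate, since d(dot(reps, g))/d(reps) == g.
reps.backward(gradient=cached_grad)
```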
How exactly does the "surrogate" make use of the cached gradient, and why wouldn't the "standard" way of doing it work?
Thanks.