Mark some arrays as safe for accumulation #578

mcabbott · 2022-08-25T14:04:37Z

As far as I know InplaceableThunk never does anything at present, since gradients are accumulated with +, and this un-thunks before adding.

This PR marks some arrays as safe to mutate, and then mutates them when adding another array, or an InplaceableThunk. The only arrays it marks are ones produced by adding a thunk to something, as this is sure to be a new array. This is no help for an accumulation of 2 terms, but helps with the 3rd and after, and should make N terms O(1).

#539 explores a different rule, assuming any @thunk expands to something safe to mutate. That helps with the 2nd Thunk. Both could be done.

The marker it uses is a new AbstractThunk. This is nice because consumers of gradients must already know to call unthunk if they need an array.

But in fact there's a small zoo of functions which automatically un-thunk. I'm not sure these are a great idea, as using them without thinking means you are likely to un-thunk twice or N times by accident. And using them on purpose requires memorising what's in the zoo and what's not. This PR adds one more problem: If you were relying on + to un-thunk for you, it will now assume you are accumulating, mutate, and wrap the answer in another thunk.

The other snag is that a rule like that for z = x+y whose pullback passes the same dz to both dx and dy risks wrong answers if it passes this new marker through. (Whereas earlier Thunks were safe.) The rrule in CR does not appear to have this problem, as reshape automatically unthunks (and will be called N times). Rules for .+ and maybe .- may need care... I believe CR's will simply fail here since ndims(@thunk [1,2]) is an error. Perhaps no rrule should ever transmit a thunk unchanged, and this could be automated somewhere?

Maybe this isn't quite safe enough.

What was the plan for InplaceableThunk? Was anything written when it was designed?

mcabbott · 2022-08-25T14:06:34Z

src/tangent_types/thunks.jl

+julia> @btime Diffractor.gradient(x -> _getindex(x,1), $(rand(128 * 100)));
+  min 1.012 μs, mean 11.103 μs (2 allocations, 100.05 KiB)
+
+julia> @btime Diffractor.gradient(x -> _getindex(x,1)+_getindex(x,2), $(rand(128 * 100)));
+  min 7.625 μs, mean 46.941 μs (6 allocations, 300.14 KiB)  # unthunk, unthunk, add -- unchanged
+
+julia> @btime Diffractor.gradient(x -> _getindex(x,1)+_getindex(x,2)+_getindex(x,3), $(rand(128 * 100)));
+  min 16.791 μs, mean 67.720 μs (10 allocations, 500.23 KiB)  # before
+  min 8.625 μs, mean 44.642 μs (6 allocations, 300.14 KiB)    # after
+
+  min 1.036 μs, mean 12.684 μs (2 allocations, 100.05 KiB)    # with stronger assumption, overwrite any thunk


Some benchmarks.

mcabbott · 2022-08-25T14:08:34Z

src/accumulation.jl

+Before ChainRulesCore 1.16, it would guess `true` for most wrappers based on `parent`,
+but this is not safe, e.g. it will lead to an error with ReadOnltArrays.jl. 


This branch is on top of #577, mostly to make tests pass.

codecov-commenter · 2022-08-25T14:17:43Z

Codecov Report

Base: 93.11% // Head: 93.11% // No change to project coverage 👍

Coverage data is based on head (f6123ee) compared to base (f6123ee).
Patch has no changes to coverable lines.

❗ Current head f6123ee differs from pull request most recent head 4d48df8. Consider uploading reports for the commit 4d48df8 to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #578   +/-   ##
=======================================
  Coverage   93.11%   93.11%           
=======================================
  Files          15       15           
  Lines         901      901           
=======================================
  Hits          839      839           
  Misses         62       62

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

mcabbott · 2022-08-31T14:45:06Z

BTW, status is that I think it's probably too unsafe to load this onto +.

It would be better to have a dedicated function for accumulation that it not used in the sloppy use of "maybe this will unthunk automatically". Diffractor.accum is such a function, and attaching this behaviour to it would be better. Possibly CRC should own this function.

ToucheSir · 2022-09-04T19:27:32Z

How would such a function be distinct from add!!? Is that interface not a good fit for https://github.com/JuliaDiff/ChainRulesCore.jl/pull/578/files#diff-e87adbe51202ce0c86d1adf5ec04bc9857f26830dd7a634aa3b1b13cc15b967bR162-R183?

mcabbott · 2022-09-04T19:39:41Z

Whatever else it does, I think add!!(x::Array, y::Array) will always mutate x. I think this isn't acceptable for accum as, right now, we do not demand that rrules return something known to be safe to mutate.

Few are actually unsafe, essentially just +, - & broadcasted versions. So perhaps you could fix those and then make this a demand. I'm a bit scared that some other rules elsewhere will break this, e.g. by returning a Tangent containing within it some arrays which alias each other. The rule for + (and any others) would either need to make preventative copies always, or else wrap in some marker like ReadOnlyArray. Either way an explicit opt-out.

The safer approach seems to be opt-in. The result of dx + dx is certainly safe. #539 further regards @thunk as an opt-in. We could encourage @thunk A in rules for any array A you know to be safe (even when the thunk delays no computation.)

ToucheSir · 2022-09-04T20:47:05Z

Thanks for the explanation. Is there something that can be done on the rule level to annotate one as being safe like +? I know the word "linear" has been thrown around a few times, but unsure if that applies here. Or is this not something that can be captured at a rule boundary?

mcabbott · 2022-09-04T21:09:27Z

Unsafe like +, I'd say. In that it returns gradients for two arguments which may well alias each other.

Done manually? You could wrap both in a ReadOnlyArray. But forgetting this means silent wrong answers.

Done automatically? I messed around in FluxML/Zygote.jl#981 with making the Z-to-CR bridge look for repeated pointers... but I don't know how well this can work. Someone is going to have a rule in the wild which does Tangent{A}(; x = arr, y = dy), Tangent{B}(; p = dp, q = arr'), and mutating that arr is not safe.

oxinabox · 2022-09-17T19:01:37Z

add!!(x,y) may or may not mutate x.
It makes no promise as to if it will or will not.

oxinabox · 2022-09-17T19:06:03Z

But yes, there is a bunch of extra thinking that remains to be done as to if in-place accumulation is ever safe in the presence of aliasing.
I suspect it's important that the tangent only is an alias for another tangent, if the primal is also an alias for another primal.
And if you have mutations support that requires a "if and only if" for aliasing.
And yes that thinking requires thinking about "linear".

mcabbott · 2022-09-18T23:56:35Z

I suspect it's important that the tangent only is an alias for another tangent, if the primal is also an alias for another primal.

This strikes me as being quite far from status quo. Flux allows for shared parameters (and dealing with this is 99% of the headache in Optimisers.jl) but their gradients will almost never be aliased, as they are freshly allocated by *. But gradients elsewhere are (at present) allowed to be aliased, e.g. if the forward pass is z = hcat(v,w) + vcat(r',s') then the four vectors here could all have gradients which are views of dz. (Not sure the hcat rules does in fact make views, not yet.)

This PR wants to design around that. Assume any array you get from an rrule's pullback aliases others in horrible ways. But as soon as you add two, you know you have made a new array, and can safely mutate it to add the 3rd.

Can this be safe? What are the edge cases?

If some rule relies on + to auto-unthunk, then they may expect to get an Array not another thunk afterwards. If that leads to an error, OK, they will change it to unthunk(dy) manually. If the next step is something else which auto-unthunks, then all will be well.
If a rule has a trivial pullback, like z = f(x) has dz -> (NoTangent, dz), then is it OK for the AccumThunk to pass right through? Accumulation of dz is finished by the time this pullback is called, since all operations on z must happen after f is called on the forward pass. Accumulation of dx happens later, if x is also used in some g(x), but that doesn't matter right?
If the pullback duplicates, dz -> (NoTangent, dz, dz), then accumulation on dx and dy will interfere with each other. E.g. something like prod(x+y) / first(x) will have a second contribution to dx which should not influence dy. Are there any rules besides + which will do this? Not just produce aliasing, but pass unchanged an AccumThunk they receive from some other rule on to several others? (In fact CR's rule for + does unthunk once, to avoid unthunking several times.)

mcabbott · 2022-09-19T00:13:50Z

More broadly, am I right to think the "functional" approach of how rrule works is more-or-less inherited from Zygote?

The alternative would be to allocate all buffers on the forward pass. Then there would never be a need for thunks, you would always update the existing array. I think this is the path taken by Tracker, and if I understand right also by Enzyme. (For scalars, thunks save work on inactive paths, but this would presumably track activity.)

ToucheSir · 2022-09-20T23:00:46Z

How would preallocation interact with something like Zygote.jacobian which calls the pullback multiple times? I'm not aware of how ReverseDiff and Enzyme handle that, so perhaps it's a non-issue.

mcabbott · 2022-09-21T00:09:59Z

You'd have to zero them all before the second backward pass. (After copying the contents to one slice of the output.)

Tracker has will give an error if you accidentally call back! twice, but I don't immediately see how its jacobian handles this.

ToucheSir · 2022-09-21T00:24:30Z

It appears Tracker has some capability for auto-accumulation? https://github.com/FluxML/Tracker.jl/blob/7ab871f4e4d6410e98bb1d5f527e512eb912aff8/src/back.jl#L48-L58. Surely it's not as easy as always using += instead of = when writing to buffers in a pullback?

mcabbott added the inplace accumulation for things relating to inplace accumulation of gradients label Aug 25, 2022

mcabbott commented Aug 25, 2022

View reviewed changes

mcabbott added 2 commits September 16, 2022 22:36

small upgrade to add!!

e4ff1ab

AccumThunk

4d48df8

mcabbott force-pushed the add_thunks_new branch from ddd9b46 to 4d48df8 Compare September 17, 2022 02:36

ToucheSir mentioned this pull request May 3, 2023

Discussion: what does Enzyme.jl need for DL in Julia? EnzymeAD/Enzyme.jl#805

Closed

mcabbott closed this Mar 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mark some arrays as safe for accumulation #578

Mark some arrays as safe for accumulation #578

mcabbott commented Aug 25, 2022 •

edited

Loading

mcabbott Aug 25, 2022

mcabbott Aug 25, 2022

codecov-commenter commented Aug 25, 2022 •

edited

Loading

mcabbott commented Aug 31, 2022

ToucheSir commented Sep 4, 2022

mcabbott commented Sep 4, 2022

ToucheSir commented Sep 4, 2022

mcabbott commented Sep 4, 2022

oxinabox commented Sep 17, 2022

oxinabox commented Sep 17, 2022

mcabbott commented Sep 18, 2022

mcabbott commented Sep 19, 2022

ToucheSir commented Sep 20, 2022

mcabbott commented Sep 21, 2022

ToucheSir commented Sep 21, 2022 •

edited

Loading

		Before ChainRulesCore 1.16, it would guess `true` for most wrappers based on `parent`,
		but this is not safe, e.g. it will lead to an error with ReadOnltArrays.jl.

Mark some arrays as safe for accumulation #578

Mark some arrays as safe for accumulation #578

Conversation

mcabbott commented Aug 25, 2022 • edited Loading

mcabbott Aug 25, 2022

Choose a reason for hiding this comment

mcabbott Aug 25, 2022

Choose a reason for hiding this comment

codecov-commenter commented Aug 25, 2022 • edited Loading

Codecov Report

mcabbott commented Aug 31, 2022

ToucheSir commented Sep 4, 2022

mcabbott commented Sep 4, 2022

ToucheSir commented Sep 4, 2022

mcabbott commented Sep 4, 2022

oxinabox commented Sep 17, 2022

oxinabox commented Sep 17, 2022

mcabbott commented Sep 18, 2022

mcabbott commented Sep 19, 2022

ToucheSir commented Sep 20, 2022

mcabbott commented Sep 21, 2022

ToucheSir commented Sep 21, 2022 • edited Loading

mcabbott commented Aug 25, 2022 •

edited

Loading

codecov-commenter commented Aug 25, 2022 •

edited

Loading

ToucheSir commented Sep 21, 2022 •

edited

Loading