RFC: in-place `accum` #981

mcabbott · 2021-05-25T15:19:54Z

This is a minimal attempt to add safe in-place accumulation of gradients.

It assumes that any Δ::DenseArray may be mutated. To keep this safe, any rule which duplicates Δ should apply a NoWrite wrapper. This should prevent the problems seen in #962, which arose without such protection.

Rules which do Δ -> (Δ, Δ) (as for +) must apply this wrapper to both branches. (Ideally, once one has been used up, the other could be marked safe to mutate. That could be done but is more complicated, and can be added later.) For rules within Zygote, this protection is done by hand.
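To make the idea concrete, here is a minimal sketch of the protection scheme described above. It is not the PR's code: `NoWriteSketch`, `accum_sketch` and `plus_pullback_sketch` are hypothetical names, and the real wrapper in src/lib/protect.jl may differ in detail.

```julia
# Hypothetical stand-in for the NoWrite wrapper: a read-only view of an array.
struct NoWriteSketch{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    NoWriteSketch(x::AbstractArray{T,N}) where {T,N} = new{T,N,typeof(x)}(x)
end
Base.size(x::NoWriteSketch) = size(x.parent)
Base.getindex(x::NoWriteSketch, i::Int...) = x.parent[i...]
# No setindex! method is defined, so writing to the wrapper throws an error.

# Out-of-place fallback: allocates a new array for the sum.
accum_sketch(x::AbstractArray, y::AbstractArray) = x .+ y
# In-place fast path: an unwrapped dense array is assumed safe to overwrite.
accum_sketch(x::DenseArray, y::AbstractArray) = (x .+= y; x)

# A rule like the pullback of `+` hands the same Δ to both arguments,
# so it wraps both copies to stop either from being mutated downstream.
plus_pullback_sketch(Δ) = (NoWriteSketch(Δ), NoWriteSketch(Δ))
```

Because the wrapper is an AbstractArray but not a DenseArray, accumulating into a wrapped gradient takes the allocating method, and the freshly allocated result can then be mutated by later accumulations.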

For rules defined by ChainRules, the interface function should check the pointer of Δ (and remove NoWrite), then compare the pointer of what it returns, and re-wrap if necessary. I'm not sure this is done perfectly yet. I'm also not sure if this has performance overhead. Evil test cases would be welcome.
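The pointer check could look roughly like the sketch below. All names here (`NoWriteSketch`, `unprotect_sketch`, `same_memory`, `call_pullback_sketch`) are hypothetical, and the comparison is restricted to plain `Array`s for simplicity; the PR's actual interface code may handle more cases.

```julia
# Hypothetical read-only wrapper, standing in for NoWrite.
struct NoWriteSketch{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    NoWriteSketch(x::AbstractArray{T,N}) where {T,N} = new{T,N,typeof(x)}(x)
end
Base.size(x::NoWriteSketch) = size(x.parent)
Base.getindex(x::NoWriteSketch, i::Int...) = x.parent[i...]

unprotect_sketch(x) = x
unprotect_sketch(x::NoWriteSketch) = x.parent

# True when two plain arrays start at the same memory address.
same_memory(a::Array, b::Array) = pointer(a) == pointer(b)
same_memory(a, b) = false

# Unwrap Δ, call the rule's pullback, then re-wrap any returned gradient
# that is backed by the same memory as the protected input.
function call_pullback_sketch(back, Δ)
    raw = unprotect_sketch(Δ)
    was_protected = Δ isa NoWriteSketch
    grads = back(raw)
    map(grads) do g
        (was_protected && same_memory(g, raw)) ? NoWriteSketch(g) : g
    end
end
```

With a pullback such as `dy -> (dy,)` the protected gradient stays protected, while one such as `dy -> (2 .* dy,)` returns a fresh array that is left unwrapped.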

It's easy to make the NoWrite wrapper disappear on the RHS of broadcasting or reductions. But if it survives to meet * then it's likely to cause slow generic matmul. I've added quite a few explicit _unprotect calls to try to avoid this. (This should only matter for rules defined with @adjoint.)
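A small illustration of both halves of this point, using a hypothetical `NoWriteSketch` wrapper and `unprotect_sketch` helper rather than the PR's own definitions: broadcasting and reductions over the wrapper produce ordinary results, while a matrix product is given the unwrapped parent so that the usual dense method applies.

```julia
using LinearAlgebra

# Hypothetical read-only wrapper, standing in for NoWrite.
struct NoWriteSketch{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    NoWriteSketch(x::AbstractArray{T,N}) where {T,N} = new{T,N,typeof(x)}(x)
end
Base.size(x::NoWriteSketch) = size(x.parent)
Base.getindex(x::NoWriteSketch, i::Int...) = x.parent[i...]

# Stand-in for the PR's _unprotect: strip the wrapper if present.
unprotect_sketch(x) = x
unprotect_sketch(x::NoWriteSketch) = x.parent

W = [1.0 2.0; 3.0 4.0]
Δ = NoWriteSketch([1.0, 1.0])

doubled = 2 .* Δ   # broadcasting allocates a plain Vector, the wrapper is gone
total = sum(Δ)     # reductions only read, so the wrapper does not matter

# Gradient of x -> W * x with respect to x: unwrap before multiplying,
# so `*` sees a Matrix and a Vector rather than a wrapper type.
dx = W' * unprotect_sketch(Δ)
```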

Xref JuliaDiff/ChainRulesCore.jl#350

src/lib/protect.jl

mcabbott · 2021-06-15T00:29:03Z

Benchmarks:

Some here: "Flux & Zygote's AD slower than ForwardDiff" #994 (comment)
From "RFC: more efficient ∇getindex" #962:

julia> f4(x) = sum([x[i]^2 for i in eachindex(x)]);

julia> @btime Zygote.gradient(f4, $(collect(1:1000)))
  749.750 μs (7068 allocations: 15.77 MiB)  # v0.6.11
  455.959 μs (3071 allocations: 8.04 MiB)   # v0.6.12, with 962
  93.833 μs (2074 allocations: 472.20 KiB)  # this PR
([2, 4, 6, 8, 10, 12, 14, 16, 18, 20  …  1982, 1984, 1986, 1988, 1990, 1992, 1994, 1996, 1998, 2000],)

Example from "How to speed up pullbacks when iterating over arrays?" #644:

julia> function _evalpoly(x, p)
           N = length(p)
           ex = p[end]
           for i in N-1:-1:1
               ex = muladd(x, ex, p[i])
           end
           ex
       end
_evalpoly (generic function with 1 method)

julia> x, p = rand(), randn(10000);

julia> @btime _evalpoly(x, p);
  21.791 μs (1 allocation: 16 bytes)

julia> @btime Zygote.gradient(_evalpoly, x, p);
  197.007 ms (680107 allocations: 1.52 GiB)    # v0.6.11
  146.587 ms (660107 allocations: 792.75 MiB)  # v0.6.12, with 962
  62.367 ms (640111 allocations: 35.76 MiB)    # this PR

Easy example from "Incremental accumulation of gradients?" #905:

julia> @btime Zygote.gradient(x -> sum(abs2, net(x)), $(rand(50,50,50,50)));
  1.115 s (8266 allocations: 3.07 GiB)    # v0.6.11
  1.049 s (8187 allocations: 954.08 MiB)  # v0.6.12
  1.013 s (9138 allocations: 954.11 MiB)  # this PR

ToucheSir · 2022-04-27T04:09:23Z

How do we feel about this? Would it help to do an @adjoint -> rrule conversion first so that _unprotect is no longer required?

mcabbott · 2022-05-11T02:56:26Z

I haven't thought about this since, except to realise that https://github.com/bkamins/ReadOnlyArrays.jl might be better than the version I wrote here.

The checks I wrote for function (s::ZBack)(dy) try to handle these cases:

1. Rules returning (Δ, Δ), like for +: If any two gradients agree, wrap them both.
2. Rules receiving a wrapped gradient: Always unwrap before calling the rrule. Then re-wrap if (2a) the same answer emerges, or else (2b) if any two gradients agree.
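The "any two gradients agree" check could be sketched as below. `NoWriteSketch` and `protect_shared_sketch` are hypothetical names, and "agree" is taken here to mean object identity (`===`) between plain `Array`s, which is an assumption about what the real check compares.

```julia
# Hypothetical read-only wrapper, standing in for NoWrite.
struct NoWriteSketch{T,N,A<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::A
    NoWriteSketch(x::AbstractArray{T,N}) where {T,N} = new{T,N,typeof(x)}(x)
end
Base.size(x::NoWriteSketch) = size(x.parent)
Base.getindex(x::NoWriteSketch, i::Int...) = x.parent[i...]

# Wrap every array that appears more than once in the tuple of gradients;
# leave unique arrays and non-array gradients (e.g. nothing) untouched.
function protect_shared_sketch(grads::Tuple)
    map(grads) do g
        g isa Array || return g
        shared = count(h -> h === g, grads) > 1
        shared ? NoWriteSketch(g) : g
    end
end
```

For example, `protect_shared_sketch((Δ, Δ, other, nothing))` wraps the first two entries and returns the last two as they are. A check like this only sees top-level aliasing within one rule's outputs, so arrays shared inside nested containers would pass through unwrapped.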

Does (1)/(2b) ever occur, besides +? Maybe sum on an array of arrays is another possible case. I'm not sure this is caught, and it seems tricky.

Does (2a) ever occur?

Since accum is recursive, this accumulation will also mutate arrays inside the structural gradient of non-array objects. Are any of these ever shared? The existing checks will not notice.
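The hazard can be reproduced with a toy recursive accumulator. This is only a sketch: `accum_sketch` is a hypothetical simplification, not Zygote's `accum`, and the gradients are made up for illustration.

```julia
# Toy recursive accumulation over structural gradients.
accum_sketch(x, y) = x + y
accum_sketch(::Nothing, y) = y
accum_sketch(x, ::Nothing) = x
accum_sketch(::Nothing, ::Nothing) = nothing
# In-place path for plain arrays: overwrites the first argument.
accum_sketch(x::Array, y::AbstractArray) = (x .+= y; x)
# Recurse into NamedTuples with matching field names.
accum_sketch(x::NamedTuple{K}, y::NamedTuple{K}) where {K} = map(accum_sketch, x, y)

# Two structural gradients whose `weight` fields alias the same array.
shared = [1.0, 1.0]
g1 = (weight = shared, bias = nothing)
g2 = (weight = shared, bias = [0.5])

total = accum_sketch(g1, g2)
# total.weight is the same object as `shared`, so g2.weight has been
# overwritten as a side effect of the accumulation.
```

Nothing at the top level of `g1` or `g2` is an array, so a check that only inspects the outermost gradients would not notice the sharing.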

mcabbott · 2022-05-24T19:05:00Z

A narrower idea is to make in-place accumulation work only for the result of scalar indexing:

function accum(x::OneElement{T,N}, ys::OneElement{T,N}...) where {T,N}
    z = Buffer(x)
    fill!(z.data, zero(T))
    z[x.ind...] = x.val
    accum(z, ys...)
end
function accum(x::Buffer, ys::OneElement...)  # only produced by the above method
    for y in ys
        x[y.ind...] += y.val
    end
    x
end
_project(x::AbstractArray, dx::Buffer) = copy(dx)  # don't return this type

This gets a similar speedup on the above examples. I think it ought to be safe. Buffer is really just a flag here; I should think about 2nd derivatives too.
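For anyone who wants to try the pattern outside Zygote, here is a self-contained sketch for vectors. `OneElementSketch` and `AccumBufferSketch` are hypothetical stand-ins for Zygote's OneElement and Buffer, reduced to the one-dimensional case.

```julia
# Gradient of a scalar getindex: one nonzero value at position `ind`
# in a vector of length `len`.
struct OneElementSketch{T}
    val::T
    ind::Int
    len::Int
end

# Marker type: a dense array created here, hence known to be unshared.
struct AccumBufferSketch{T}
    data::Vector{T}
end

# The first accumulation allocates one dense array, then hands over to the
# in-place method below.
function accum_sketch(x::OneElementSketch{T}, ys::OneElementSketch{T}...) where {T}
    z = AccumBufferSketch(zeros(T, x.len))
    z.data[x.ind] = x.val
    accum_sketch(z, ys...)
end

# Later accumulations write single elements into the existing buffer.
function accum_sketch(z::AccumBufferSketch, ys::OneElementSketch...)
    for y in ys
        z.data[y.ind] += y.val
    end
    z
end

# Convert back to a plain array before returning the gradient to the user.
project_sketch(z::AccumBufferSketch) = copy(z.data)
```

Accumulating N one-element gradients this way allocates one dense array plus a final copy, instead of one new array per addition.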

ToucheSir · 2022-05-24T22:04:09Z

Sounds good to me. Provenance tracking of possibly shared arrays has proven to be a consistent thorn in our side, so the less that has to be done the better.


mcabbott mentioned this pull request May 27, 2021

Incremental accumulation of gradients? #905

Open

mcabbott mentioned this pull request Jun 14, 2021

Flux & Zygote's AD slower than ForwardDiff #994

Open

mcabbott mentioned this pull request Aug 2, 2021

Add rrules for extrema, findmax, maximum JuliaDiff/ChainRules.jl#480

Merged

mcabbott mentioned this pull request Aug 25, 2021

Rule for exp mutates its input JuliaDiff/ChainRules.jl#512

Closed

mcabbott force-pushed the inplace2 branch from 5315d12 to 039bf42 Compare October 24, 2021 01:35

mcabbott mentioned this pull request Nov 4, 2021

RFC: A General Recipe for Generic Rules and Natural Tangents (hopefully...) JuliaDiff/ChainRulesCore.jl#449

Open

mcabbott mentioned this pull request Jan 11, 2022

Accumulation JuliaDiff/Diffractor.jl#69

Open

mcabbott added 8 commits May 10, 2022 21:01

in-place accum with simple protect wrapper

90fb586

unprotect some linear algebra rules

84fb7fe

FFT + LinearAlgebra

0b2e5a2

four seven seven four

ef77843

make it fast again

3236701

better chainrules logic

c55dda2

fix Grads for implicit params

af48808

rm debug statements, to help tests

5884090

mcabbott force-pushed the inplace2 branch from 039bf42 to 5884090 Compare May 11, 2022 01:01

mcabbott added the discussion label May 11, 2022

mcabbott added the performance label Jul 4, 2022

ToucheSir mentioned this pull request Aug 5, 2022

Taking derivatives with respect to only some elements of an array #1283

Open

mcabbott mentioned this pull request Sep 4, 2022

Mark some arrays as safe for accumulation JuliaDiff/ChainRulesCore.jl#578

Closed

mcabbott closed this Mar 14, 2024
