
Add tutorials #15

Merged
merged 9 commits on Nov 17, 2023
70 changes: 56 additions & 14 deletions README.md
@@ -4,30 +4,72 @@

[![](https://img.shields.io/badge/docs-dev-blue.svg)](https://juliacomputing.github.io/AllocCheck.jl/dev/)

[AllocCheck.jl](https://github.com/JuliaComputing/AllocCheck.jl) is a Julia package that statically checks if a function call may allocate, analyzing the generated LLVM IR of it and its callees using LLVM.jl and GPUCompiler.jl.

AllocCheck operates on _functions_, trying to statically determine whether or not a function _may_ allocate memory, and if so, _where_ that allocation appears. This is different from measuring allocations using, e.g., `@time` or `@allocated`, which measure the allocations that _did_ happen during a particular execution of a function.

## Getting started

The primary entry point to check allocations is the macro [`@check_allocs`](@ref) which is used to annotate a function definition that you'd like to enforce allocation checks for:
```julia
julia> using AllocCheck

julia> @check_allocs multiply(x,y) = x * y
multiply (generic function with 1 method)

julia> multiply(1.5, 2.5) # call automatically checked for allocations
3.75

julia> multiply(rand(3,3), rand(3,3)) # result matrix requires an allocation
ERROR: @check_alloc function contains 1 allocations.
```

The `multiply(::Float64, ::Float64)` call happened without error, indicating that the function was proven not to allocate. On the other hand, the `multiply(::Matrix{Float64}, ::Matrix{Float64})` call raised an `AllocCheckFailure` due to one internal allocation.
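If the matrix case must also be proven allocation-free, one option is to let the caller preallocate the result and compute the product in place. A minimal sketch, using a hypothetical in-place variant named `multiply!` built on `LinearAlgebra.mul!`; whether the check passes depends on the in-place BLAS path being provably allocation-free:

```julia
using AllocCheck, LinearAlgebra

# Hypothetical in-place variant: the caller supplies the output buffer,
# so the product itself performs no allocation.
@check_allocs multiply!(out, x, y) = mul!(out, x, y)

A, B = rand(3, 3), rand(3, 3)
out = zeros(3, 3)
multiply!(out, A, B)  # writes the product into `out` in place
```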

The `allocs` field can be used to inspect the individual errors:
```julia
julia> try multiply(rand(3,3), rand(3,3)) catch err err.allocs[1] end
Allocation of Matrix{Float64} in ./boot.jl:477
| Array{T,2}(::UndefInitializer, m::Int, n::Int) where {T} =

Stacktrace:
[1] Array
@ ./boot.jl:477 [inlined]
[2] Array
@ ./boot.jl:485 [inlined]
[3] similar
@ ./array.jl:418 [inlined]
[4] *(A::Matrix{Float64}, B::Matrix{Float64})
@ LinearAlgebra ~/.julia/juliaup/julia-1.10.0-rc1+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:113
[5] var"##multiply#235"(x::Matrix{Float64}, y::Matrix{Float64})
@ Main ./REPL[13]:1
```

### Functions that throw exceptions

Some functions that we would not expect to allocate memory, like `sin`, actually may:
```julia
julia> @allocated try sin(Inf) catch end
48
```

The reason for this is that `sin` needs to allocate if it **throws an error**.

By default, `@check_allocs` ignores all such allocations and assumes that no exceptions are thrown. If you care about detecting these allocations anyway, you can use `ignore_throw=false`:
```julia
julia> @check_allocs mysin1(x) = sin(x)

julia> @check_allocs ignore_throw=false mysin2(x) = sin(x)

julia> mysin1(1.5)
0.9974949866040544

julia> mysin2(1.5)
ERROR: @check_alloc function contains 2 allocations.
```

#### Limitations

1. Runtime dispatch

   Any runtime dispatch is conservatively assumed to allocate.

Every call into a `@check_allocs` function behaves like a dynamic dispatch. This means that it can trigger compilation dynamically (involving lots of allocation), and even when the function has already been compiled, a small amount of allocation is still expected on function entry.

For most applications, the solution is to use `@check_allocs` to wrap your top-level entry point or your main application loop, in which case those costs are only incurred once. `@check_allocs` will guarantee that no dynamic compilation or allocation occurs once your function has started running.
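The recommended structure can be sketched as follows. This is a hedged example: `setup_buffers` and `main_loop!` are hypothetical names, and the loop body is assumed to contain only allocation-free operations:

```julia
using AllocCheck

setup_buffers(n) = zeros(n)  # one-time allocations happen here, outside the check

@check_allocs function main_loop!(buf)  # the checked top-level entry point
    s = 0.0
    for i in eachindex(buf)
        buf[i] = 2.0 * i     # in-place update; no allocation expected
        s += buf[i]
    end
    return s
end

buf = setup_buffers(1000)  # allocate before entering the checked region
main_loop!(buf)            # dispatch/compilation cost is incurred once, at this call
```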
5 changes: 5 additions & 0 deletions docs/make.jl
@@ -7,6 +7,11 @@ makedocs(
warnonly = [:missing_docs],
pages = [
"Home" => "index.md",
"Tutorials" => [
"Optional debugging and logging" => "tutorials/optional_debugging_and_logging.md",
"Hot loops" => "tutorials/hot_loop.md",
"Minimum latency error recovery" => "tutorials/error_recovery.md",
],
"API" => "api.md",
],
format = Documenter.HTML(prettyurls = haskey(ENV, "CI")),
6 changes: 5 additions & 1 deletion docs/src/api.md
@@ -8,4 +8,8 @@

```@docs
AllocCheck.check_allocs
```

```@docs
AllocCheck.@check_allocs
```
46 changes: 30 additions & 16 deletions docs/src/index.md
@@ -6,44 +6,58 @@ AllocCheck operates on _functions_, trying to statically determine whether or not

## Getting started

The primary entry point to check allocations is the macro [`@check_allocs`](@ref) which is used to annotate a function definition that you'd like to enforce allocation checks for:
```@repl README
using AllocCheck
using Test # hide
@check_allocs mymod(x) = mod(x, 2.5)

mymod(1.5) # call automatically checked for allocations
```
This call happened without error, indicating that the function was proven to not allocate any memory after it starts 🎉


When used on a function that may allocate memory
```@repl README
@check_allocs linsolve(a, b) = a \ b

linsolve(rand(10,10), rand(10))
```
the function call raises an `AllocCheckFailure`.

The `allocs` field allows us to inspect the individual errors to get some useful information. For example:

```@example README
try
linsolve(rand(10,10), rand(10))
catch err
err.allocs[1]
end
```

we see what type of object was allocated, and where in the code the allocation appeared.


### Functions that throw exceptions

Some functions that we would not expect to allocate memory, like `sin`, actually may:
```@example README
@allocated try sin(Inf) catch end
```

The reason for this is that `sin` needs to allocate if it **throws an error**.

By default, `@check_allocs` ignores all such allocations and assumes that no exceptions are thrown. If you care about detecting these allocations anyway, you can use `ignore_throw=false`:
```@example README
@check_allocs mysin1(x) = sin(x)
@check_allocs ignore_throw=false mysin2(x) = sin(x)

@test mysin1(1.5) == sin(1.5)
@test_throws AllocCheckFailure mysin2(1.5)
```

## Limitations

1. Runtime dispatch

   Any runtime dispatch is conservatively assumed to allocate.

Every call into a `@check_allocs` function behaves like a dynamic dispatch. This means that it can trigger compilation dynamically (involving lots of allocation), and even when the function has already been compiled, a small amount of allocation is still expected on function entry.

For most applications, the solution is to use `@check_allocs` to wrap your top-level entry point or your main application loop, in which case those costs are only incurred once. `@check_allocs` will guarantee that no dynamic compilation or allocation occurs once your function has started running.
60 changes: 60 additions & 0 deletions docs/src/tutorials/error_recovery.md
@@ -0,0 +1,60 @@
# Guaranteed Error Recovery

Safety-critical real-time systems are often required to have performance-critical error-recovery logic. While errors are not supposed to occur, they sometimes do anyway 😦, and when they do, we may want to make sure that the recovery logic runs with minimum latency.

In the following example, we are executing a loop that may throw an error. By default [`check_allocs`](@ref) allows allocations on the error path, i.e., allocations that occur as a consequence of an exception being thrown. This can cause the garbage collector to be invoked by the allocation, and introduce an unbounded latency before we execute the error recovery logic.

To guard ourselves against this, we may follow these steps:
1. Prove that the function does not allocate memory except for on exception paths.
2. Since we have proved that we are not allocating memory, we may disable the garbage collector. This prevents it from running before the error recovery logic.
3. To make sure that the garbage collector is re-enabled after an error has been recovered from, we re-enable it in a `finally` block.



```@example ERROR
function treading_lightly()
a = 0.0
GC.enable(false) # Turn off the GC before entering the loop
try
for i = 10:-1:-1
a += sqrt(i) # This throws an error for negative values of i
end
catch
exit_gracefully() # This function is supposed to run with minimum latency
finally
GC.enable(true) # Always turn the GC back on before exiting the function
end
a
end
exit_gracefully() = println("Calling mother")

using AllocCheck, Test
allocs = check_allocs(treading_lightly, ()) # Check that it's safe to proceed
```
```@example ERROR
@test isempty(allocs)
```

[`check_allocs`](@ref) returned zero allocations. If we invoke [`check_allocs`](@ref) with the flag `ignore_throw = false`, we will see that the function may allocate memory on the error path:

```@example ERROR
allocs = check_allocs(treading_lightly, (); ignore_throw = false)
length(allocs)
```

Finally, we test that the function is producing the expected result:

```@example ERROR
val = treading_lightly()
@test val ≈ 22.468278186204103 # hide
```

In this example, we accepted an allocation on the exception path with the motivation that it occurred once only, after which the program was terminated. Implicit in this approach is the assumption that the exception path does not allocate so much memory that the error-recovery logic cannot complete before the garbage collector is turned back on. We should thus convince ourselves that this assumption is valid, e.g., by means of testing:

```@example ERROR
treading_lightly() # Warm start
allocated_memory = @allocated treading_lightly() # A call that triggers the exception path
# @test allocated_memory < 1e4
```

The allocation sites reported with the flag `ignore_throw = false` may be used as a guide for what to test.
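For example, the reported sites can be enumerated and printed; each entry displays the allocated type and the source location (a sketch, reusing `treading_lightly` from above):

```julia
using AllocCheck

# Enumerate the exception-path allocation sites to guide what to test.
sites = check_allocs(treading_lightly, (); ignore_throw = false)
for (i, site) in enumerate(sites)
    println("site $i:\n", site)
end
```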
87 changes: 87 additions & 0 deletions docs/src/tutorials/hot_loop.md
@@ -0,0 +1,87 @@
# Allocations followed by a hot loop
A common pattern in high-performance Julia code, as well as in real-time systems, is to initially allocate some working memory, followed by the execution of a performance sensitive _hot loop_ that should perform no allocations. In the example below, we show a function `run_almost_forever` that resembles the implementation of a simple control system. The function starts by allocating a large `logvector` in which some measurement data is to be saved, followed by the execution of a loop which should run with as predictable timing as possible, i.e., we do not want to perform any allocations or invoke the garbage collector while executing the loop.
```@example HOT_LOOP
function run_almost_forever()
N = 100_000 # A large number
logvector = zeros(N) # Allocate a large vector for storing results
for i = 1:N # Run a hot loop that may not allocate
y = sample_measurement()
logvector[i] = y
u = controller(y)
apply_control(u)
Libc.systemsleep(0.01)
end
end

# Silly implementations of the functions used in the example
sample_measurement() = 2.0
controller(y) = -2y
apply_control(u) = nothing
nothing # hide
```

Here, the primary concern is the loop, while the preamble of the function should be allowed to allocate memory. The recommended strategy in this case is to refactor the function into a separate preamble and loop, like this:
```@example HOT_LOOP
function run_almost_forever2() # The preamble that performs allocations
N = 100_000 # A large number
logvector = zeros(N) # Allocate a large vector for storing results
run_almost_forever!(logvector)
end

function run_almost_forever!(logvector) # The hot loop that is allocation free
for i = eachindex(logvector) # Run a hot loop that may not allocate
y = sample_measurement()
@inbounds logvector[i] = y
u = controller(y)
apply_control(u)
Libc.systemsleep(0.01)
end
end
nothing # hide
```

We may now analyze the loop function `run_almost_forever!` to verify that it does not allocate memory:
```@example HOT_LOOP
using AllocCheck, Test
allocs = check_allocs(run_almost_forever!, (Vector{Float64},));
@test isempty(allocs)
```


## More complicated initialization
In practice, a function may need to perform several distinct allocations upfront, including allocating objects of complicated types, such as closures. In situations like this, the following pattern may be useful:
```julia
struct Workspace
# All you need to run the hot loop, for example:
cache1::Vector{Float64}
cache2::Matrix{Float64}
end

function setup(max_iterations::Int = 100_000)
# Allocate and initialize the workspace
cache1 = zeros(max_iterations)
cache2 = zeros(max_iterations, max_iterations)
return Workspace(cache1, cache2)
end

function run!(workspace::Workspace)
# The hot loop
for i = eachindex(workspace.cache1)
workspace.cache1[i] = my_important_calculation() # The allocated cache is modified in place
...
end
end

function run()
workspace = setup()
run!(workspace)
end
```

Here, `workspace` is a custom struct designed to serve as a workspace for the hot loop, but it could also be realized as a simple tuple of all the allocated objects required for the computations. Note that the struct `Workspace` in this example was not marked as mutable, while its contents, the two cache arrays, are. This means that the `run!` function may modify the contents of the cache arrays, but not rebind the fields themselves.
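The distinction between an immutable field binding and mutable array contents can be seen in a small sketch (the struct `Demo` is hypothetical, mirroring `Workspace`):

```julia
struct Demo
    cache::Vector{Float64}  # immutable field binding; the array itself is mutable
end

d = Demo(zeros(3))
d.cache[1] = 42.0    # fine: mutates the array's contents in place
# d.cache = zeros(3) # would throw: cannot rebind a field of an immutable struct
```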

The benefit of breaking the function up into two parts which are called from a third, is that we may now create the workspace object individually, and use it to compute the type of the arguments to the `run!` function that we are interested in analyzing:
```julia
workspace = setup()
allocs = check_allocs(run!, (typeof(workspace),))
```