Reduce number of dependencies #44
On my machine, I timed a non-exhaustive list of the heaviest dependencies' load times in a clean environment.
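For reference, a minimal way to take such measurements — the package names below are just examples from this thread, and each `using` should be timed in a fresh Julia session so earlier loads don't mask later ones:

```julia
# time how long each dependency takes to load (run one per session)
@time using AxisKeys
@time using LoopVectorization
@time using Distributions
```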
Yep, makes sense. I think a fair bit of these can be cut out. MCMCDiagnosticTools, as we discussed earlier, can be cut out pretty easily. Another way to cut it, besides what we discussed, is just copy/pasting the FFTESS code from MCMCDiagnosticTools so we don't need to take everything else, which might be the easiest option. I think both Distributions.jl and Polyester.jl can be cut out too at very low cost, and I'll make a PR doing that soon. It might be worth checking whether switching from AxisKeys to DimensionalData reduces compile time. My suspicion is it will -- AxisKeys works by turning every array into a functor that returns the output associated with an index, so maybe it compiles a function for every array? LoopVectorization may be harder to cut out without losing performance, but I've been considering whether a switch to TensorOperations.jl might be faster than LoopVectorization+Tullio for larger arrays anyway; I'm unsure how it would affect compile times.
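For context on that last point, here is the shape of the comparison — a hypothetical contraction, not ParetoSmooth's actual kernels:

```julia
using Tullio, LoopVectorization  # Tullio uses LoopVectorization when it's loaded
using TensorOperations

x = rand(1000, 500)
w = rand(500)

@tullio s1[i] := x[i, j] * w[j]  # Tullio (+LoopVectorization) einsum-style kernel
@tensor s2[i] := x[i, j] * w[j]  # the same contraction via TensorOperations
```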
Wow, yeah, the difference is huge. I stuck with AxisKeys because I wanted it to interface easily with ArviZ.jl, but DimensionalData does exactly the same job in under a tenth of the compile time. Switching should be very easy (if a bit tedious), since the only real difference is that DimensionalData uses a slightly different constructor and square brackets for lookup (instead of round brackets).
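To illustrate, a minimal sketch of the constructor and lookup difference (exact selector syntax depends on the package versions):

```julia
using AxisKeys, DimensionalData

# AxisKeys: keys attached via keywords, lookup with round brackets
ka = KeyedArray(rand(2, 3); chain=1:2, draw=[10, 20, 30])
ka(chain=2, draw=20)

# DimensionalData: dimensions passed to the constructor, lookup with
# square brackets and selectors
da = DimArray(rand(2, 3), (Dim{:chain}(1:2), Dim{:draw}([10, 20, 30])))
da[Dim{:chain}(At(2)), Dim{:draw}(At(20))]
```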
Similar story here. I'll probably switch them out even if the performance drops slightly, although I've been having some very annoying errors -- for some reason, trying to replace …
IMO copying code is usually one of the worst alternatives. Regarding FFT, I assume you could/should replace FFTW with AbstractFFTs, as we do in MCMCDiagnosticTools and MCMCChains as well.
FFT isn't the important bit -- what matters is that we need a function to calculate ESS but don't want to pull everything else in MCMCDiagnosticTools.jl along with it. Seth suggested letting users calculate the relative effective sample size themselves; I think this is reasonable, but it's a bit inconvenient.
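If users did supply it themselves, it would amount to something like this sketch (assuming MCMCDiagnosticTools' (draws, chains, parameters) array layout; `samples` is a placeholder):

```julia
using MCMCDiagnosticTools

samples = randn(1000, 4, 10)  # (draws, chains, parameters) placeholder
ess, _ = MCMCDiagnosticTools.ess_rhat(samples)

# relative ESS: effective draws per actual draw
r_eff = ess ./ (size(samples, 1) * size(samples, 2))
```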
Sure, the FFTW suggestion was just an additional comment. I haven't benchmarked the precompilation and loading times of MCMCDiagnosticTools, but it mainly depends on lightweight packages: https://github.com/devmotion/MCMCDiagnosticTools.jl/blob/main/Project.toml Most of these packages are just interfaces (AbstractFFTs, DataAPI, MLJModelInterface, Tables), stdlibs (LinearAlgebra, Random, Statistics), or standard packages from the math/stats ecosystem that basically every package depends on (SpecialFunctions and StatsBase). The only other dependency is Distributions, which is also quite common in stats/probability theory packages. If there are any issues with loading/compilation times, it would be good to track them down (invalidations in one of these dependencies or in MCMCDiagnosticTools?) and fix them.

From just looking at the list of dependencies, it is not clear to me why you would want to remove the dependency on MCMCDiagnosticTools and copy some of the code. In my experience this usually leads to diverging code bases, i.e., you will miss out on any improvements or bug fixes in MCMCDiagnosticTools and vice versa; this happened regularly and is still a problem, e.g., with the copied-from-Distributions-and-slightly-modified code in DistributionsAD.
Makes sense. From what I can see, MCMCDiagnosticTools' load time comes primarily from Distributions.jl. Do you happen to know where Distributions.jl is used in MCMCDiagnosticTools? Maybe if it's a few lines that need, say, a normal PDF calculation, that could just be hand-coded instead to remove the dependency. Another way to cut load times might be switching from FFTESS to another ESS method so we can drop FFTW, but I have no idea how they compare performance/accuracy-wise. (Using …)
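For instance, if all that were needed was a normal PDF, a hand-coded stand-in is a one-liner (a sketch; `normal_pdf` is a hypothetical helper, not an existing function):

```julia
# density of Normal(μ, σ) at x, without depending on Distributions
normal_pdf(μ, σ, x) = exp(-abs2((x - μ) / σ) / 2) / (σ * sqrt(2π))
```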
Performance-wise, in most of my benchmarks, computing the ESS with FFT is the worst out of all methods.
That's the main point of using AbstractFFTs -- users can pick whatever FFT library they want that implements the interface.
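Concretely, library code can be written against the AbstractFFTs interface alone, with whichever backend the user loads supplying the methods. A minimal autocovariance sketch via the Wiener-Khinchin theorem (ignoring the zero-padding a real implementation needs to avoid circular wrap-around):

```julia
# library code: depends only on the AbstractFFTs interface
using AbstractFFTs

function autocov_fft(x::AbstractVector{<:Real})
    n = length(x)
    y = x .- sum(x) / n           # center the series
    f = rfft(y)                   # frequency domain
    irfft(f .* conj.(f), n) ./ n  # inverse transform of the power spectrum
end

# user code: load any backend that implements the interface, e.g.
# using FFTW
```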
Some benchmarks: TuringLang/MCMCChains.jl#156 and TuringLang/MCMCChains.jl#215. FFT (using FFTW) was by far the slowest method.
This is also why it's neither the default nor recommended. 😄
Haha, makes sense! I just copied it from R's loo package, since I figured they must have some good reason for using it. I'll swap it out for another one then -- is the default the best to stick with?
According to the linked benchmarks, yes.

I checked where exactly Distributions is used in MCMCDiagnosticTools. It turns out it's mainly in the discrete diagnostics (for sampling and for evaluating CDFs; the former could maybe be replaced by some, possibly less optimized, calls to StatsBase and the latter by calling into StatsFuns), but unfortunately it is crucial for the Rstar diagnostic: for probabilistic predictions we return a scaled Poisson binomial distribution (https://github.com/devmotion/MCMCDiagnosticTools.jl/blob/90f502dec5f116f7c12a42df1aa285137dffe6af/src/rstar.jl#L129). So I don't think it can be removed as a dependency.

BTW, the dependencies of Distributions are lightweight as well (https://github.com/JuliaStats/Distributions.jl/blob/master/Project.toml), maybe apart from StatsFuns, which still depends on Rmath (there are JuliaStats/StatsFuns.jl#125 and JuliaStats/StatsFuns.jl#113, though), so probably any performance issues are caused by Distributions itself. Since it is loaded by most stats/probability theory/Turing packages anyway, I wonder how helpful it is to try to remove it.
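For the replaceable parts, the swap might look roughly like this (`normcdf` and `poiscdf` are StatsFuns functions; whether they cover every call site here is untested):

```julia
using StatsFuns

# evaluate CDFs directly instead of constructing distribution objects
normcdf(0.0, 1.0, 1.96)  # ≈ cdf(Normal(0, 1), 1.96)
poiscdf(3.0, 2.0)        # ≈ cdf(Poisson(3), 2)
```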
There are more factors to consider than benchmark speed, though. ESS computation is not a computational bottleneck in the statistical workflow (the inference method, e.g. MCMC, is), so IMO we should use whatever is currently considered the best ESS method. Right now, that appears to be the rank-normalized bulk ESS method from Vehtari et al., 2021, which notes that the autocorrelation estimates can be computed efficiently with FFT. For some reason though, the loo package doesn't seem to use the rank-normalized version.
I think it is nevertheless questionable whether the FFT-based approach should be considered the best one. It is difficult to comment on accuracy and efficiency without inspecting the actual implementation and making sure that all methods are implemented in an optimal way. And even then, efficiency might differ between programming languages due to differences in, e.g., how efficiently FFTs are implemented or whether the language yields optimized native code for for-loops or requires vectorization. (As a side remark, there's a paper that claims that the best way to compute the PMF values of a Poisson binomial distribution is based on FFTs; however, it turned out it's better to use the dynamic programming approach that the authors claimed to be worse, and therefore we replaced the FFT approach with it in Distributions.)

BTW, all three ESS implementations use the same ESS algorithm from the split-Rhat paper; really the only difference is how the autocorrelation terms are computed. @tpapp also noticed in MCMCDiagnostics that often only a few autocorrelation terms are needed, which I assume explains some of the efficiency differences.
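For reference, a sketch of that shared truncation rule (the initial positive sequence variant: autocorrelations are summed in pairs and the sum stops at the first non-positive pair, which is why direct computation often needs only a few lags):

```julia
# ρ[1] is the lag-0 autocorrelation (always 1.0); returns τ̂ with ESS ≈ N / τ̂
function tau_hat(ρ::AbstractVector{<:Real})
    τ = -1.0
    for m in 1:2:(length(ρ) - 1)
        Γ = ρ[m] + ρ[m + 1]  # pair Γ_t = ρ_{2t} + ρ_{2t+1}
        Γ > 0 || break       # truncate at the first non-positive pair
        τ += 2Γ
    end
    return τ
end
```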
Your point seems to be that papers can make mistakes, and I of course agree. But if we dismiss the advice, it should only be after running our own experiments. Unfortunately, they don't detail their FFT experiments or the magnitude of the improvement they saw by switching to FFT. Here's a quick check that indicates that, at least for small IID samples, their method has slightly lower variance:

```julia
using ArviZ, MCMCDiagnosticTools, Random, StatsBase, FFTW
ndraws = 100
nreps = 100_000
Random.seed!(42)

# standard Cauchy draws via the ratio of two standard normals
randcauchy(n) = map(_ -> randn() / randn(), 1:n)

rand_funs = [randn, rand, randexp, randcauchy]
ess_funs = [
    ESSMethod => x -> MCMCDiagnosticTools.ess_rhat(reshape(x, :, 1, 1))[1][1],
    FFTESSMethod => x -> MCMCDiagnosticTools.ess_rhat(reshape(x, :, 1, 1); method=FFTESSMethod())[1][1],
    ArviZ => x -> ArviZ.ess(reshape(x, 1, :)),
]
for f in rand_funs, (ess_name, ess_fun) in ess_funs
    μ, σ = mean_and_std(ess_fun(f(ndraws)) for _ in 1:nreps)
    println("$f: $(round(μ; digits=2)) ± $(round(σ; digits=2)) ($ess_name)")
end
```

```
randn: 100.46 ± 30.39 (ESSMethod)
randn: 100.54 ± 30.23 (FFTESSMethod)
randn: 99.05 ± 28.04 (ArviZ)
rand: 100.48 ± 30.57 (ESSMethod)
rand: 100.42 ± 30.35 (FFTESSMethod)
rand: 99.18 ± 27.94 (ArviZ)
randexp: 100.19 ± 27.99 (ESSMethod)
randexp: 100.21 ± 27.59 (FFTESSMethod)
randexp: 98.99 ± 27.86 (ArviZ)
randcauchy: 102.65 ± 210.04 (ESSMethod)
randcauchy: 101.47 ± 260.84 (FFTESSMethod)
randcauchy: 98.98 ± 27.92 (ArviZ)
```

We should also of course test on (auto/anti)correlated chains. So probably more important than FFT vs no FFT is adding the bulk/tail/quantile rank-normalized ESS methods to MCMCDiagnosticTools. It would be preferable if the user could mix and match autocov-computation methods, whether rank normalization is performed, and how maximum lag and bias are computed, but this would require a refactor.
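Such a refactor might expose something like the following — a purely hypothetical design, not the current MCMCDiagnosticTools API:

```julia
# hypothetical composable interface: each choice is independent
abstract type AutocovMethod end
struct DirectAutocov <: AutocovMethod end
struct FFTAutocov <: AutocovMethod end

function ess(samples; autocov::AutocovMethod=DirectAutocov(),
             rank_normalize::Bool=true, kind::Symbol=:bulk)
    # dispatch on `autocov` for the autocorrelation terms, optionally
    # rank-normalize first, then apply the shared Geyer truncation
end
```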
It is based on this paper (well, on an earlier version, https://arxiv.org/pdf/1903.08008v1.pdf, since the method was implemented quite a while ago), just without the rank normalization and other improvements in section 4. More concretely, it uses the formulas in section 3 (and appendix A in the earlier versions; it seems this has now all been merged into section 3).
Sure, we shouldn't just dismiss existing results. But we conducted benchmarks, and since all our runs showed a significant overhead of FFT vs non-FFT, and apparently the same was observed in MCMCDiagnostics, I am sceptical that an FFT-based approach is the best.
I agree, it seems there is an improvement here. A more detailed inspection of the distributions would probably be good; 100 ± 260 seems to indicate a very skewed distribution or negative values.
Sure, it would be good to add such improvements. But as you say, this is orthogonal to the original discussion about FFT vs non-FFT that I commented on.
@ParadaCarleton and I discussed trying to slim down the package to make it a more lightweight dependency. Currently on my machine this package loads in 5s, vs 5ms with https://github.com/arviz-devs/PSIS.jl, which only depends on standard library packages.
The two reasons to add dependencies are 1) convenience and 2) performance. In my view, features like pretty-printed results are not worth the extra load time if a package can otherwise be lightweight; alternatively, they can be made conditional dependencies.
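At the time of writing, the usual mechanism for a conditional dependency is Requires.jl — a sketch, where `pretty_printing.jl` is a hypothetical file and the PrettyTables UUID should be double-checked:

```julia
using Requires

function __init__()
    # pretty-printing only loads if the user has loaded PrettyTables
    @require PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d" begin
        include("pretty_printing.jl")
    end
end
```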
For performance, we discussed that Tullio+LoopVectorization may be unnecessary, as PSIS.jl is much faster than ParetoSmooth.jl for vectors and only 20% slower than ParetoSmooth.jl for arrays (if PSIS.jl uses the `@threads` macro like ParetoSmooth.jl does). If removing them only causes a ~20% decrease in performance but decreases load time significantly, this seems acceptable.