These benchmarks compare the performance of formulaic against the existing
formula parsers for Python (patsy) and R (model.matrix / sparse.model.matrix)
when interpreting Wilkinson formulas and generating the appropriate model
matrices. These benchmarks are somewhat
synthetic, and target large data sizes where performance is more critical. As
such, all of the formula-to-model-matrix transforms are tested on a data frame
with three million rows represented as a pandas or R dataframe. For the time
being, only CPU time is measured; memory utilization is not considered.
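As a rough illustration of the kind of measurement described above, here is a minimal timing sketch using only the standard library. It is not the code in benchmarks/benchmark.py: the `benchmark` helper is a hypothetical name, and reporting the sample standard deviation as the spread is an assumption about what the ± values in the results represent.

```python
import statistics
import time


def benchmark(fn, repeats=7):
    """Call fn() `repeats` times and return (mean, spread, n), times in seconds.

    `spread` is the sample standard deviation, or 0.0 for a single run.
    This is an illustrative assumption, not necessarily the statistic
    used by the real benchmark script.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return statistics.mean(durations), spread, len(durations)


# Stand-in workload so the sketch runs without any third-party packages.
mean, spread, n = benchmark(lambda: sum(range(100_000)))
print(f"workload: {mean:.4f}±{spread:.4f} (mean of {n})")
```

With formulaic and pandas installed, the stand-in workload could be swapped for something like `lambda: formulaic.model_matrix("a + A", df)`, where `df` is a data frame you construct with columns `a` and `A`.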
To run these benchmarks, install formulaic and the benchmarking dependencies
using:
pip install formulaic[benchmarks]
and then run the following from a checked-out copy of this repository:
python <formulaic_repo>/benchmarks/benchmark.py
Note: This will not install R or the required R dependency Matrix. The
benchmark will gracefully skip the R benchmarks if these are not found.
You can run the standard visualization code using:
python <formulaic_repo>/benchmarks/plot.py
On a ThinkPad T14s Gen 1 with an Intel(R) Core(TM) i7-10610U CPU @ 1.80GHz and 32 GB of DDR4 RAM, this benchmark yields the following results:
version information
python: 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:24:11)
formulaic: 0.3.0
patsy: 0.5.2
pandas: 1.4.1
R: R version 4.0.5 (2021-03-31) -- "Shake and Throw"
model.matrix: (inbuilt into R)
Matrix (sparse.model.matrix): 1.4.0
a
patsy: 0.0624±0.0054 (mean of 7)
formulaic: 0.0161±0.0033 (mean of 7)
formulaic_sparse: 0.326±0.016 (mean of 7)
R: 0.287±0.041 (mean of 7)
R_sparse: 0.38±0.11 (mean of 7)
A
patsy: 5.08±0.22 (mean of 5)
formulaic: 0.2096±0.0065 (mean of 7)
formulaic_sparse: 0.497±0.014 (mean of 7)
R: 0.271±0.048 (mean of 7)
R_sparse: 0.620±0.047 (mean of 7)
a+A
patsy: 5.37±0.25 (mean of 4)
formulaic: 0.2144±0.0050 (mean of 7)
formulaic_sparse: 0.592±0.011 (mean of 7)
R: 0.339±0.051 (mean of 7)
R_sparse: 0.843±0.054 (mean of 7)
a:A
patsy: 5.42±0.20 (mean of 4)
formulaic: 0.2448±0.0098 (mean of 7)
formulaic_sparse: 0.595±0.016 (mean of 7)
R: 0.325±0.053 (mean of 7)
R_sparse: 0.629±0.052 (mean of 7)
A+B
patsy: 10.59±0.36 (mean of 2)
formulaic: 0.3979±0.0042 (mean of 7)
formulaic_sparse: 0.7370±0.0056 (mean of 7)
R: 0.458±0.046 (mean of 7)
R_sparse: 1.129±0.073 (mean of 7)
a:A:B
patsy: 13.14±0.74 (mean of 2)
formulaic: 0.530±0.029 (mean of 7)
formulaic_sparse: 0.950±0.017 (mean of 7)
R: 0.512±0.059 (mean of 7)
R_sparse: 2.44±0.16 (mean of 7)
A:B:C:D
patsy: 33.971909284591675±0 (mean of 1)
formulaic: 1.400±0.013 (mean of 7)
formulaic_sparse: 2.664±0.059 (mean of 7)
R: 1.574±0.043 (mean of 7)
R_sparse: 11.207±0.072 (mean of 2)
a*b*A*B
patsy: 14.136±0.024 (mean of 2)
formulaic: 0.702±0.016 (mean of 7)
formulaic_sparse: 1.2937±0.0088 (mean of 7)
R: 0.744±0.078 (mean of 7)
R_sparse: 8.047±0.099 (mean of 3)
a*b*c*A*B*C
patsy: 52.30743145942688±0 (mean of 1)
formulaic: 3.124±0.016 (mean of 7)
formulaic_sparse: 4.723±0.058 (mean of 5)
R: 3.261±0.034 (mean of 7)
R_sparse: 96.12985253334045±0 (mean of 1)
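To read the listing above as relative speedups, the following sketch parses result lines in the printed format and divides one tool's mean by another's. The `parse_results` and `speedup` helpers are hypothetical names introduced here for illustration; only a few lines from the listing are embedded, copied verbatim.

```python
import re

# A subset of the results above, in the same textual format.
RESULTS = """\
A
patsy: 5.08±0.22 (mean of 5)
formulaic: 0.2096±0.0065 (mean of 7)
R: 0.271±0.048 (mean of 7)
a*b*c*A*B*C
patsy: 52.30743145942688±0 (mean of 1)
formulaic: 3.124±0.016 (mean of 7)
R: 3.261±0.034 (mean of 7)
"""

# Matches lines such as "formulaic_sparse: 0.326±0.016 (mean of 7)".
# Formula lines such as "a:A" do not match because no space follows the colon.
LINE = re.compile(r"^(\w+): ([\d.]+)±([\d.]+) \(mean of (\d+)\)$")


def parse_results(text):
    """Return {formula: {tool: mean}} from the benchmark's text output."""
    results = {}
    formula = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = LINE.match(line)
        if match is None:
            # Any non-result line is treated as the next formula heading.
            formula = line
            results[formula] = {}
        else:
            tool, mean, _spread, _runs = match.groups()
            results[formula][tool] = float(mean)
    return results


def speedup(results, formula, baseline, target="formulaic"):
    """How many times faster `target` is than `baseline` for `formula`."""
    return results[formula][baseline] / results[formula][target]


parsed = parse_results(RESULTS)
for formula in parsed:
    print(formula, f"patsy/formulaic = {speedup(parsed, formula, 'patsy'):.1f}x")
```

Because the ratio divides two means measured on the same machine, it is unitless, but it ignores the reported spread and the differing run counts, so treat it as indicative only.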