Speedup pytensor import #1161

Draft · ricardoV94 wants to merge 9 commits into main from speedup_pytensor_import
Conversation

@ricardoV94 (Member) commented Jan 21, 2025

If I provide the blas__ldflags to avoid #1160, I get these results from running:

python -m benchmark_imports pytensor

Before:

1.5028 root       pytensor                                
0.8698 project    pytensor.scalar                         
0.6148 project    pytensor.scalar.math                    
0.4898 dependency scipy.stats                             
0.3592 project    pytensor.tensor                         
0.3488 transitive scipy.stats._stats_py                    from scipy.stats
0.2696 project    pytensor.tensor.rewriting               
0.2544 project    pytensor.scalar.basic                   
0.2479 project    pytensor.printing                       
0.2077 transitive scipy.stats.distributions                from scipy.stats._stats_py
0.1850 project    pytensor.compile                        
0.1841 project    pytensor.compile.function               
0.1665 project    pytensor.compile.mode                   
0.1589 transitive scipy.stats._continuous_distns           from scipy.stats.distributions
0.1542 project    pytensor.link.c.basic                   
0.1317 project    pytensor.configdefaults                 
0.1116 project    pytensor.link.c.cmodule                 
0.0969 dependency numpy                                   
0.0939 dependency scipy.special                           
0.0863 project    pytensor.tensor.pad                     
0.0821 dependency setuptools                              
0.0808 project    pytensor.scan                           
0.0766 project    pytensor.tensor.rewriting.basic         
0.0760 project    pytensor.scan.rewriting                 
0.0732 project    pytensor.scan.op     

After:

python -m benchmark_imports pytensor
0.3682 root       pytensor                                
0.1243 project    pytensor.sparse                         
0.1234 project    pytensor.sparse.rewriting               
0.1089 project    pytensor.configdefaults                 
0.0993 project    pytensor.sparse.basic                   
0.0950 dependency numpy                                   
0.0550 project    pytensor.tensor                         
0.0486 dependency scipy.sparse                            
0.0458 project    pytensor.tensor.rewriting               
0.0325 transitive numpy.__config__                         from numpy
0.0319 transitive numpy.core                               from numpy.__config__
0.0264 transitive scipy.sparse.csgraph                     from scipy.sparse
0.0238 transitive scipy.sparse.csgraph._laplacian          from scipy.sparse.csgraph
0.0236 transitive scipy.sparse.linalg                      from scipy.sparse.csgraph._laplacian
0.0201 transitive numpy.lib                                from numpy
0.0196 project    pytensor.scalar                         
0.0177 project    pytensor.tensor.rewriting.math          
0.0164 project    pytensor.scalar.basic                   
0.0151 project    pytensor.tensor.rewriting.basic         
0.0150 transitive scipy.sparse.linalg._isolve              from scipy.sparse.linalg
0.0139 transitive scipy.sparse.linalg._isolve.iterative    from scipy.sparse.linalg._isolve
0.0132 transitive scipy.linalg                             from scipy.sparse.linalg._isolve.iterative
0.0128 transitive numpy.random                             from numpy
0.0124 transitive numpy.random._pickle                     from numpy.random
0.0119 project    pytensor.printing         

For perspective, numpy used to take 6.5% of the total import time (0.0969 s of 1.5028 s) and now takes 25% (0.0950 s of 0.3682 s).


I couldn't avoid importing scipy.sparse because we register the base spmatrix class on the as_symbolic and shared_constructor dispatch functions. The scipy.sparse import itself is messy: scipy/scipy#22382
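A minimal sketch of why the registration forces the import (hypothetical names, not PyTensor's actual dispatch code): registering an implementation for scipy.sparse.spmatrix needs the class object itself, so scipy.sparse has to be imported when the registering module is loaded.

```python
# Hypothetical sketch, not PyTensor's actual code: a singledispatch
# registration keyed on scipy.sparse.spmatrix needs the class object,
# so scipy.sparse must already be imported when this module loads.
from functools import singledispatch

import scipy.sparse as sp


@singledispatch
def as_symbolic(x):
    raise NotImplementedError(f"No handler for {type(x)}")


@as_symbolic.register(sp.spmatrix)  # the class is required here, eagerly
def _(x):
    ...  # placeholder for building the sparse symbolic variable
```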


📚 Documentation preview 📚: https://pytensor--1161.org.readthedocs.build/en/1161/

@ricardoV94 force-pushed the speedup_pytensor_import branch 3 times, most recently from 5e060e1 to 9f0a6cc on January 21, 2025 at 15:41
@ricardoV94 force-pushed the speedup_pytensor_import branch 2 times, most recently from 948b079 to 89fe0d7 on January 23, 2025 at 10:47
codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 60.29412% with 81 lines in your changes missing coverage. Please review.

Project coverage is 82.07%. Comparing base (a0fe30d) to head (0419bf6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pytensor/scalar/math.py | 30.00% | 42 Missing ⚠️ |
| pytensor/printing.py | 42.30% | 14 Missing and 1 partial ⚠️ |
| pytensor/tensor/blas.py | 11.11% | 8 Missing ⚠️ |
| pytensor/sparse/basic.py | 50.00% | 0 Missing and 5 partials ⚠️ |
| pytensor/d3viz/formatting.py | 25.00% | 3 Missing ⚠️ |
| pytensor/graph/rewriting/basic.py | 0.00% | 2 Missing ⚠️ |
| pytensor/link/c/cmodule.py | 66.66% | 2 Missing ⚠️ |
| pytensor/sparse/rewriting.py | 33.33% | 2 Missing ⚠️ |
| pytensor/sparse/type.py | 66.66% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1161      +/-   ##
==========================================
- Coverage   82.11%   82.07%   -0.04%     
==========================================
  Files         186      186              
  Lines       48201    48232      +31     
  Branches     8679     8673       -6     
==========================================
+ Hits        39579    39588       +9     
- Misses       6447     6473      +26     
+ Partials     2175     2171       -4     
| Files with missing lines | Coverage | Δ |
|---|---|---|
| pytensor/compile/compilelock.py | 97.22% <100.00%> | (+0.07%) ⬆️ |
| pytensor/configdefaults.py | 73.46% <100.00%> | (-0.09%) ⬇️ |
| pytensor/configparser.py | 92.53% <100.00%> | (ø) |
| pytensor/link/vm.py | 92.39% <100.00%> | (ø) |
| pytensor/scan/op.py | 84.69% <100.00%> | (ø) |
| pytensor/sparse/sharedvar.py | 94.11% <100.00%> | (ø) |
| pytensor/tensor/blas_scipy.py | 88.23% <100.00%> | (+3.23%) ⬆️ |
| pytensor/tensor/random/basic.py | 99.24% <100.00%> | (+0.01%) ⬆️ |
| pytensor/tensor/rewriting/blas_scipy.py | 87.50% <100.00%> | (+5.14%) ⬆️ |
| pytensor/tensor/slinalg.py | 93.68% <100.00%> | (+0.16%) ⬆️ |

... and 10 more

@Armavica (Member) left a comment


I read through the changes and they look good; however, I am wondering whether replacing one initial import with possibly several (one per function) would not make things slower overall.

@ricardoV94 (Member, Author)

Yeah, it should ideally be done outside of the function, but inside the perform method. For all those Scalar Ops that have an nfunc_spec, we don't really ever use the impl/st_impl. They are usually wrapped in an Elemwise that uses the function directly from the nfunc_spec. Same for Blockwise with gufunc_spec. And if you want fast code, you won't use the Python implementation to begin with.

The concern is valid for the linalg / random stuff. Gonna think about it a bit more then...

@ricardoV94 (Member, Author) commented Jan 23, 2025

There's an uglier alternative with global variables xD

```python
import numpy as np
from numpy import sin

def foo0(x):
    # attribute lookup on the module at every call
    return np.sin(x)

def foo1(x):
    # plain module-level name
    return sin(x)

sin2 = None

def foo2(x):
    # lazy import, cached in a global on first use
    global sin2
    if sin2 is None:
        from numpy import sin
        sin2 = sin

    return sin2(x)

def foo3(x):
    # re-runs the import statement on every call
    from numpy import sin
    return sin(x)

%timeit foo0(5.0)
%timeit foo1(5.0)
%timeit foo2(5.0)
%timeit foo3(5.0)
# 931 ns ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# 840 ns ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# 847 ns ± 7.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# 1.77 μs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

Always importing has a considerable overhead indeed.
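A slightly tidier variant of the same trick, as a sketch only (foo4 and _sin are made-up names, not code from this PR): hide the deferred import behind a functools.cache accessor so the import cost is paid once.

```python
from functools import cache


@cache
def _sin():
    # Deferred import, resolved on the first call and cached afterwards.
    from numpy import sin
    return sin


def foo4(x):
    return _sin()(x)
```

It still pays one extra function call per use, so it likely lands somewhere between foo2 and foo3 in the timings above.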

@Armavica (Member)
Could this help? https://scientific-python.org/specs/spec-0001/

@ricardoV94 (Member, Author)

> Could this help? https://scientific-python.org/specs/spec-0001/

I have to check carefully; we do a lot of registering when pytensor is loaded (all the rewrites and whatnot), which involves instantiating many Ops.
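For reference, SPEC 1 is built around deferring submodule imports with the lazy_loader package; a minimal sketch of that pattern (the submodule and attribute names here are illustrative, not a proposed layout for pytensor):

```python
# Illustrative sketch of the SPEC 1 / lazy_loader pattern, not this PR.
# Placed in a package's __init__.py:
import lazy_loader as lazy

# Submodules and attributes are only imported on first access, which also
# defers any rewrite registration that runs at their import time -- the
# caveat raised above.
__getattr__, __dir__, __all__ = lazy.attach(
    __name__,
    submodules=["sparse"],
    submod_attrs={"tensor": ["vector", "matrix"]},
)
```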
