Speedup pytensor import #1161

Draft · ricardoV94 wants to merge 9 commits into main from speedup_pytensor_import
Conversation

@ricardoV94 (Member) commented Jan 21, 2025

If I provide the blas__ldflags to avoid #1160, I get these results from running:

python -m benchmark_imports pytensor

Before:

1.5028 root       pytensor                                
0.8698 project    pytensor.scalar                         
0.6148 project    pytensor.scalar.math                    
0.4898 dependency scipy.stats                             
0.3592 project    pytensor.tensor                         
0.3488 transitive scipy.stats._stats_py                    from scipy.stats
0.2696 project    pytensor.tensor.rewriting               
0.2544 project    pytensor.scalar.basic                   
0.2479 project    pytensor.printing                       
0.2077 transitive scipy.stats.distributions                from scipy.stats._stats_py
0.1850 project    pytensor.compile                        
0.1841 project    pytensor.compile.function               
0.1665 project    pytensor.compile.mode                   
0.1589 transitive scipy.stats._continuous_distns           from scipy.stats.distributions
0.1542 project    pytensor.link.c.basic                   
0.1317 project    pytensor.configdefaults                 
0.1116 project    pytensor.link.c.cmodule                 
0.0969 dependency numpy                                   
0.0939 dependency scipy.special                           
0.0863 project    pytensor.tensor.pad                     
0.0821 dependency setuptools                              
0.0808 project    pytensor.scan                           
0.0766 project    pytensor.tensor.rewriting.basic         
0.0760 project    pytensor.scan.rewriting                 
0.0732 project    pytensor.scan.op     

After:

python -m benchmark_imports pytensor
0.3682 root       pytensor                                
0.1243 project    pytensor.sparse                         
0.1234 project    pytensor.sparse.rewriting               
0.1089 project    pytensor.configdefaults                 
0.0993 project    pytensor.sparse.basic                   
0.0950 dependency numpy                                   
0.0550 project    pytensor.tensor                         
0.0486 dependency scipy.sparse                            
0.0458 project    pytensor.tensor.rewriting               
0.0325 transitive numpy.__config__                         from numpy
0.0319 transitive numpy.core                               from numpy.__config__
0.0264 transitive scipy.sparse.csgraph                     from scipy.sparse
0.0238 transitive scipy.sparse.csgraph._laplacian          from scipy.sparse.csgraph
0.0236 transitive scipy.sparse.linalg                      from scipy.sparse.csgraph._laplacian
0.0201 transitive numpy.lib                                from numpy
0.0196 project    pytensor.scalar                         
0.0177 project    pytensor.tensor.rewriting.math          
0.0164 project    pytensor.scalar.basic                   
0.0151 project    pytensor.tensor.rewriting.basic         
0.0150 transitive scipy.sparse.linalg._isolve              from scipy.sparse.linalg
0.0139 transitive scipy.sparse.linalg._isolve.iterative    from scipy.sparse.linalg._isolve
0.0132 transitive scipy.linalg                             from scipy.sparse.linalg._isolve.iterative
0.0128 transitive numpy.random                             from numpy
0.0124 transitive numpy.random._pickle                     from numpy.random
0.0119 project    pytensor.printing         

For perspective, numpy used to take 6.5% of the total import time (0.0969 s of 1.5028 s) and now takes 25% (0.0950 s of 0.3682 s).


I couldn't avoid importing scipy.sparse because we register the base spmatrix class on the as_symbolic and shared_constructor dispatch functions. The scipy.sparse import itself is messy: scipy/scipy#22382
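A minimal sketch of why the registration forces the import (hypothetical names, not PyTensor's actual dispatch code): registering an implementation for scipy.sparse.spmatrix needs the class object itself, so scipy.sparse has to be imported when the registering module is loaded.

```python
# Hypothetical sketch, not PyTensor's actual code: a singledispatch
# registration keyed on scipy.sparse.spmatrix needs the class object,
# so scipy.sparse must already be imported when this module loads.
from functools import singledispatch

import scipy.sparse as sp


@singledispatch
def as_symbolic(x):
    raise NotImplementedError(f"No handler for {type(x)}")


@as_symbolic.register(sp.spmatrix)  # the class is required here, eagerly
def _(x):
    ...  # placeholder for building the sparse symbolic variable
```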


📚 Documentation preview 📚: https://pytensor--1161.org.readthedocs.build/en/1161/

@ricardoV94 force-pushed the speedup_pytensor_import branch 3 times, most recently from 5e060e1 to 9f0a6cc on January 21, 2025 at 15:41
@ricardoV94 force-pushed the speedup_pytensor_import branch 2 times, most recently from 948b079 to 89fe0d7 on January 23, 2025 at 10:47
codecov bot commented Jan 23, 2025

Codecov Report

Attention: Patch coverage is 60.29412% with 81 lines in your changes missing coverage. Please review.

Project coverage is 82.07%. Comparing base (a0fe30d) to head (0419bf6).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pytensor/scalar/math.py | 30.00% | 42 Missing ⚠️ |
| pytensor/printing.py | 42.30% | 14 Missing and 1 partial ⚠️ |
| pytensor/tensor/blas.py | 11.11% | 8 Missing ⚠️ |
| pytensor/sparse/basic.py | 50.00% | 0 Missing and 5 partials ⚠️ |
| pytensor/d3viz/formatting.py | 25.00% | 3 Missing ⚠️ |
| pytensor/graph/rewriting/basic.py | 0.00% | 2 Missing ⚠️ |
| pytensor/link/c/cmodule.py | 66.66% | 2 Missing ⚠️ |
| pytensor/sparse/rewriting.py | 33.33% | 2 Missing ⚠️ |
| pytensor/sparse/type.py | 66.66% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1161      +/-   ##
==========================================
- Coverage   82.11%   82.07%   -0.04%     
==========================================
  Files         186      186              
  Lines       48201    48232      +31     
  Branches     8679     8673       -6     
==========================================
+ Hits        39579    39588       +9     
- Misses       6447     6473      +26     
+ Partials     2175     2171       -4     
| Files with missing lines | Coverage | Δ |
|---|---|---|
| pytensor/compile/compilelock.py | 97.22% <100.00%> | (+0.07%) ⬆️ |
| pytensor/configdefaults.py | 73.46% <100.00%> | (-0.09%) ⬇️ |
| pytensor/configparser.py | 92.53% <100.00%> | (ø) |
| pytensor/link/vm.py | 92.39% <100.00%> | (ø) |
| pytensor/scan/op.py | 84.69% <100.00%> | (ø) |
| pytensor/sparse/sharedvar.py | 94.11% <100.00%> | (ø) |
| pytensor/tensor/blas_scipy.py | 88.23% <100.00%> | (+3.23%) ⬆️ |
| pytensor/tensor/random/basic.py | 99.24% <100.00%> | (+0.01%) ⬆️ |
| pytensor/tensor/rewriting/blas_scipy.py | 87.50% <100.00%> | (+5.14%) ⬆️ |
| pytensor/tensor/slinalg.py | 93.68% <100.00%> | (+0.16%) ⬆️ |

... and 10 more

@Armavica (Member) left a comment


I read through the changes and they look good; however, I am wondering whether replacing one initial import with possibly several (one per function) would not make things slower overall.

@ricardoV94 (Member, Author)

Yeah, it should ideally be done outside of the function, but inside the perform method. For all those Scalar Ops that have an nfunc_spec, we don't really ever use the impl/st_impl. They are usually wrapped in an Elemwise that uses the function directly from the nfunc_spec. Same for Blockwise with gufunc_spec. And if you want fast code, you won't use the Python implementation to begin with.

The concern is valid for the linalg / random stuff. Gonna think about it a bit more then...

@ricardoV94 (Member, Author) commented Jan 23, 2025

There's an uglier alternative with global variables xD

```python
import numpy as np
from numpy import sin

def foo0(x):
    # attribute lookup on the module at every call
    return np.sin(x)

def foo1(x):
    # plain module-level name
    return sin(x)

sin2 = None

def foo2(x):
    # lazy import, cached in a global on first use
    global sin2
    if sin2 is None:
        from numpy import sin
        sin2 = sin

    return sin2(x)

def foo3(x):
    # re-runs the import statement on every call
    from numpy import sin
    return sin(x)

%timeit foo0(5.0)
%timeit foo1(5.0)
%timeit foo2(5.0)
%timeit foo3(5.0)
# 931 ns ± 20.2 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# 840 ns ± 16.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# 847 ns ± 7.45 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
# 1.77 μs ± 12.7 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
```

Always importing has a considerable overhead indeed.
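A slightly tidier variant of the same trick, as a sketch only (foo4 and _sin are made-up names, not code from this PR): hide the deferred import behind a functools.cache accessor so the import cost is paid once.

```python
from functools import cache


@cache
def _sin():
    # Deferred import, resolved on the first call and cached afterwards.
    from numpy import sin
    return sin


def foo4(x):
    return _sin()(x)
```

It still pays one extra function call per use, so it likely lands somewhere between foo2 and foo3 in the timings above.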

@Armavica (Member)
Could this help? https://scientific-python.org/specs/spec-0001/

@ricardoV94 (Member, Author)

> Could this help? https://scientific-python.org/specs/spec-0001/

I have to check carefully; we do a lot of registering when pytensor is loaded (all the rewrites and whatnot), which involves instantiating many Ops.
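For reference, SPEC 1 is built around deferring submodule imports with the lazy_loader package; a minimal sketch of that pattern (the submodule and attribute names here are illustrative, not a proposed layout for pytensor):

```python
# Illustrative sketch of the SPEC 1 / lazy_loader pattern, not this PR.
# Placed in a package's __init__.py:
import lazy_loader as lazy

# Submodules and attributes are only imported on first access, which also
# defers any rewrite registration that runs at their import time -- the
# caveat raised above.
__getattr__, __dir__, __all__ = lazy.attach(
    __name__,
    submodules=["sparse"],
    submod_attrs={"tensor": ["vector", "matrix"]},
)
```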
