Speedup approx entropy #1107

Closed

@Scott-Simmons Scott-Simmons commented Jan 15, 2025

np.max(np.abs(x_re[:, np.newaxis] - x_re[np.newaxis, :]), axis=2) <= r was identified as a hotspot by the Scalene profiler.

This PR speeds this up.

Scalene profile before the change:

image

Scalene profile after the change:

image

The overall profile barely changes, so this hotspot may be a red herring. Profiling the changed line in isolation showed a massive speedup, but overall the change harms performance. Leaving this in draft for now.
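One way to probe the gap between the single-line result and the overall result is to time both expressions at several input lengths rather than one. The following is only a sketch under assumptions: the helper names (`broadcast_chebyshev`, `cdist_chebyshev`, `time_both`), the input lengths, the window size `m=2`, and the random test data are all illustrative and are not taken from tsfresh.

```python
import timeit

import numpy as np
from scipy.spatial.distance import cdist


def broadcast_chebyshev(x_re):
    # Original expression: pairwise max-abs difference via broadcasting.
    return np.max(np.abs(x_re[:, np.newaxis] - x_re[np.newaxis, :]), axis=2)


def cdist_chebyshev(x_re):
    # Candidate replacement: SciPy's pairwise Chebyshev distance.
    return cdist(x_re, x_re, metric="chebyshev")


def time_both(n, m=2, repeats=20):
    # Hypothetical helper: best-of-`repeats` wall time for each variant
    # on a random series of length n with window size m.
    rng = np.random.default_rng(0)
    x = rng.normal(size=n)
    x_re = np.lib.stride_tricks.sliding_window_view(x, window_shape=m)
    t_broadcast = min(
        timeit.repeat(lambda: broadcast_chebyshev(x_re), number=1, repeat=repeats)
    )
    t_cdist = min(
        timeit.repeat(lambda: cdist_chebyshev(x_re), number=1, repeat=repeats)
    )
    return t_broadcast, t_cdist


# Input lengths are arbitrary; adjust them to match the series in the real data.
timings = {n: time_both(n) for n in (50, 200, 500)}
for n, (t_broadcast, t_cdist) in timings.items():
    print(f"n={n}: broadcast={t_broadcast:.2e}s cdist={t_cdist:.2e}s")
```

This isolates only the distance computation, so it cannot explain a slowdown that comes from elsewhere in the feature calculator or from multiprocessing overhead.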

Reproduce profiling results with:

from tsfresh.examples import robot_execution_failures
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import impute
import pandas as pd
robot_execution_failures.download_robot_execution_failures()
df, y = robot_execution_failures.load_robot_execution_failures()
df_large = pd.concat([df]*100, ignore_index=True)
X = extract_features(
    df_large,
    column_id='id',
    column_sort='time',
    impute_function=impute,
    disable_progressbar=True,
)

Save the script above as profile_tsfresh.py, then run:

scalene --cpu --profile --multiprocessing profile_tsfresh.py

The Scalene profiler appears to be more informative than the other Python profiling tools that I know of.

@Scott-Simmons Scott-Simmons force-pushed the speedup-approx-entropy branch from e8e890e to 0db5ea8 Compare January 15, 2025 09:46
The new call produces functionally equivalent output but is much faster; see the example below:

In [16]: from scipy.spatial.distance import cdist

In [17]: import numpy as np

In [18]: x = [12,13,15,16,17]*10

In [19]: x_re = np.lib.stride_tricks.sliding_window_view(x, window_shape=2)

In [20]: %timeit np.max(np.abs(x_re[:, np.newaxis] - x_re[np.newaxis, :]), axis=2)
    ...:
62.9 μs ± 660 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [21]: %timeit cdist(x_re, x_re, metric="chebyshev")
5.58 μs ± 19.8 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
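
The timings above show speed only. A minimal sketch of how the equivalence claim could be checked on the same input follows; the tolerance value `r = 1.0` is a made-up illustration, not the value tsfresh passes in.

```python
import numpy as np
from scipy.spatial.distance import cdist

x = np.array([12, 13, 15, 16, 17] * 10)
x_re = np.lib.stride_tricks.sliding_window_view(x, window_shape=2)

# Original broadcasting expression: integer result for integer input.
old = np.max(np.abs(x_re[:, np.newaxis] - x_re[np.newaxis, :]), axis=2)

# cdist converts its inputs to float64, so the dtype differs from `old`.
new = cdist(x_re, x_re, metric="chebyshev")

r = 1.0  # hypothetical tolerance, for illustration only

same_distances = np.allclose(old, new)
same_mask = np.array_equal(old <= r, new <= r)
print(same_distances, same_mask)
```

Note that the two results differ in dtype (integer versus float64) for integer input, which matters only if downstream code depends on the dtype rather than on the `<= r` comparison.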