- A Collection of Methods for Online Changepoint Detection
- Installation
- Examples
- License
- GitHub Repository
- Experimental
- Contributors
- How to Cite This Work
- References
The changepoint_online package provides efficient algorithms for
detecting changes in data streams, based on the Focus
algorithm. The
Focus
algorithm solves the CUSUM likelihood-ratio test exactly in
Key Features
-
Contains all
Focus
exponential family algorithms as well as theNPFocus
algorithm for non-parametric changepoint detection. -
It’s versatile enough to be applied in scenarios where the pre-change parameter is either known or unknown.
-
It is possible to apply constraints to detect specific types of changes (such as increases or decreases in parameter values).
pip install changepoint-online
python -m pip install 'git+https://github.com/grosed/changepoint_online/#egg=changepoint_online&subdirectory=python/package'
To run package tests:
python -m changepoint_online.tests.runtests
from changepoint_online import Focus, Gaussian
import numpy as np
# generating some data with a change at 50,000
np.random.seed(0)
Y = np.concatenate((np.random.normal(loc=0.0, scale=1.0, size=50000), np.random.normal(loc=2.0, scale=1.0, size=5000)))
# initialize a Focus Gaussian detector
detector = Focus(Gaussian())
threshold = 13.0
for y in Y:
# update your detector sequentially with
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}
If the pre-change location is known (in case of previous training data), this can be specified with:
# initialize a Focus Gaussian detector (with pre-change location known)
detector = Focus(Gaussian(loc=0))
threshold = 13.0
for y in Y:
# update your detector sequentially with
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 50013, 'changepoint': 50000}
See help(FamilyName)
for distribution specific parameters,
e.g. help(Gaussian)
.
As in the Gaussian case, we can detect:
-
Poisson change-in-rate
-
Gamma change-in-scale (or rate). Exponential change-in-rate implemented as a Gamma
shape=1
. -
Bernoulli change-in-probability
For example:
from changepoint_online import Focus, Gamma
import numpy as np
np.random.seed(0)
Y = np.concatenate((np.random.gamma(4.0, scale=3.0, size=50000),
np.random.gamma(4.0, scale=6.0, size=5000)))
# initialize a Gamma change-in-scale detector (with shape = 4)
detector = Focus(Gamma(shape=4.0))
threshold = 12.0
for y in Y:
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 50012, 'changepoint': 50000}
If we do not know the underlying distribution, or if the nature of the
change is unkown a priori, we can then use NPFocus
.
from changepoint_online import NPFocus
import numpy as np
# Define a simple Gaussian noise function
def generate_gaussian_noise(size):
return np.random.normal(loc=0.0, scale=1.0, size=size)
# Generate mixed data with change in gamma component
gamma_1 = np.random.gamma(4.0, scale=3.0, size=5000)
gamma_2 = np.random.gamma(4.0, scale=6.0, size=5000)
gaussian_noise = generate_gaussian_noise(10000)
Y = np.concatenate((gamma_1 + gaussian_noise[:5000], gamma_2 + gaussian_noise[5000:]))
# Create and use NPFocus detector
## One needs to provide some quantiles to track the null distribuition over
quantiles = [np.quantile(Y[:100], q) for q in [0.25, 0.5, 0.75]]
## the detector can be initialised with those quantiles
detector = NPFocus(quantiles)
stat_over_time = []
for y in Y:
detector.update(y)
# we can sum the statistics over to get a detection
# see (Romano, Eckley, and Fearnhead 2024) for more details
if np.sum(detector.statistic()) > 25:
break
changepoint_info = detector.changepoint()
print(changepoint_info["stopping_time"])
5014
It is possible to run a multivariate analysis via MDFocus by feeding, at
each iteration, a numpy
array.
from changepoint_online import MDFocus, MDGaussian
import numpy as np
np.random.seed(123)
# Define means and standard deviations for pre-change and post-change periods (independent dimensions)
mean_pre = np.array([0.0, 0.0, 5.0])
mean_post = np.array([1.0, 1.0, 4.5])
# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500
# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.normal(mean_pre, size=(size_pre, 3))
# Generate post-change data (independent samples for each dimension)
Y_post = np.random.normal(mean_post, size=(size_post, 3))
# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))
# Initialize the Gaussian detector
# (for change in mean known, as in univariate case, write MDGaussian(loc=mean_pre))
detector = MDFocus(MDGaussian(), pruning_params = (2, 1))
threshold = 25
for y in Y:
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 5014, 'changepoint': 5000}
from changepoint_online import MDFocus, MDPoisson
import numpy as np
np.random.seed(123)
# Define rates (lambda) for pre-change and post-change periods (independent dimensions)
rate_pre = np.array([1.0, 2.0, 3.0])
rate_post = np.array([2.0, 3.0, 4.0])
# Sample sizes for pre-change and post-change periods
size_pre = 5000
size_post = 500
# Generate pre-change data (independent samples for each dimension)
Y_pre = np.random.poisson(rate_pre, size=(size_pre, 3))
# Generate post-change data (independent samples for each dimension)
Y_post = np.random.poisson(rate_post, size=(size_post, 3))
# Concatenate data with a changepoint in the middle
changepoint = size_pre
Y = np.concatenate((Y_pre, Y_post))
# Initialize the Poisson detector
detector = MDFocus(MDPoisson(), pruning_params = (2, 1))
threshold = 45 # Adjust the threshold as needed for the Poisson distribution
for y in Y:
detector.update(y)
if detector.statistic() >= threshold:
break
print(detector.changepoint())
{'stopping_time': 5056, 'changepoint': 5000}
More examples, including real-world applications, are found in the examples folder, including:
-
Change in the tails of Energy Wholesale Price. (markdown, quarto notebook)
-
Constrained Spike inference in calcium imaging data (markdown, quarto notebook)
Copyright (C) 2023 Gaetano Romano, Daniel Grose
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/gpl-3.0.txt.
Source files for the packages can be found at https://github.com/grosed/changepoint_online
The package includes the nunc
algorithm for non-parametric changepoint
detectino from Austin et al. (2023) as an experimental function. It is
possible to call nunc
as:
from changepoint_online.nunc import nunc, nunc_default_quantiles
import numpy as np
np.random.seed(1)
# Generate data with change
X1 = np.random.normal(1, 1, 3000)
X2 = np.random.normal(-2, 4, 5000)
Y = np.concatenate((X1, X2))
# Create and use NUNC detector
detector = nunc(300, 5, nunc_default_quantiles)
stat_over_time = []
for y in Y:
detector.update(y)
stat_over_time.append(detector.max_cost)
if detector.statistic() > 30:
break
changepoint_info = detector.changepoint()
print(f"Output information:\n{changepoint_info}")
Output information:
{'stopping_time': 300, 'changepoint': 284, 'max_cost': 31.492069041110202}
-
Gaetano Romano: email (Author) (Maintainer) (Creator) (Translator)
-
Daniel Grose: email (Author) (Maintainer) (Creator)
-
Kes Ward: email (Author)
-
Austin Edward: email (Author) (Maintainer)
-
Liudmila Pishchagina: email (Author)
-
Guillem Rigaill: email (Author) (Thesis Advisor)
-
Vincent Runge: email (Author) (Thesis Advisor)
-
Paul Fearnhead: email (Author) (Thesis Advisor)
-
Idris A. Eckley: email (Author) (Thesis Advisor)
A possible BibTeX entry for this package could be:
@software{changepoint_online,
author = {Gaetano Romano, Daniel Grose, Kes Ward, Austin Edward, Liudmila Pishchagina, Guillem Rigaill, Vincent Runge, Paul Fearnhead, Idris A. Eckley},
title = {changepoint_online: A Collection of Methods for Online Changepoint Detection.},
month = Apr,
year = 2024,
version = {v1.2.1},
url = {https://https://github.com/grosed/changepoint_online}
}
For citing the methodologies:
-
Gaussian FOCuS: (Romano et al. 2023)
-
Other Exponential Family detectors: (Ward et al. 2024)
-
Multi-dimentional FOCuS (Pishchagina et al. 2023)
-
NPFocus: (Romano, Eckley, and Fearnhead 2024)
-
NUNC: (Austin et al. 2023)
See references below.
Austin, Edward, Gaetano Romano, Idris A Eckley, and Paul Fearnhead. 2023. “Online Non-Parametric Changepoint Detection with Application to Monitoring Operational Performance of Network Devices.” Computational Statistics & Data Analysis 177: 107551.
Pishchagina, Liudmila, Gaetano Romano, Paul Fearnhead, Vincent Runge, and Guillem Rigaill. 2023. “Online Multivariate Changepoint Detection: Leveraging Links with Computational Geometry.” arXiv Preprint arXiv:2311.01174.
Romano, Gaetano, Idris A. Eckley, and Paul Fearnhead. 2024. “A Log-Linear Nonparametric Online Changepoint Detection Algorithm Based on Functional Pruning.” IEEE Transactions on Signal Processing 72: 594–606. https://doi.org/10.1109/tsp.2023.3343550.
Romano, Gaetano, Idris A Eckley, Paul Fearnhead, and Guillem Rigaill. 2023. “Fast Online Changepoint Detection via Functional Pruning CUSUM Statistics.” Journal of Machine Learning Research 24 (81): 1–36. https://www.jmlr.org/papers/v24/21-1230.html.
Ward, Kes, Gaetano Romano, Idris Eckley, and Paul Fearnhead. 2024. “A Constant-Per-Iteration Likelihood Ratio Test for Online Changepoint Detection for Exponential Family Models.” Statistics and Computing 34 (3): 1–11.