author: @amanqa
source: github.com/amanahuja/change-detection-tutorial
# Modified Stylesheet for notebook.
from IPython.core.display import HTML
def css_styling():
styles = open("custom.css", "r").read()
return HTML(styles)
css_styling()
Overview
An "offline" change detection algorithm is one in which we are given the entire signal and are attempting to "look back" and recognize where the change occcured.
An "online" algorithm is quite different, in which we imagine our signal as "streaming into" our detector, and at any given moment we only have knowledge of the most recent data point and whatever information the detector has retain about the signal. See wikipedia on Sequential Analysis http://en.wikipedia.org/wiki/Sequential_analysis
My interest is in the latter "online" class of algorithms. That means I want to make sure data points are passed to the algorithm one by one, and that I am interested in
I also want to make sure that I am not cheating when I run my experiments.
To do all this, it is helpful to write some code that helps me simulate the online streaming scenario of interest.
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict
np.random.seed(seed=111111)
np.set_printoptions(precision=2, suppress=True)
This notebook is an optional part of the tutorial, in which we create a class and some methods in order to "simulate" streaming data.
The scenario that we are simulating:
With these tools, which we will utilize in the tutorial, we can experiment with online algorithms and, more specifically, online change detection algorithms.
Thinking this through, here's what I came up with, in pseudocode
Essentially a change detection algorithm is just calculating residuals and updating them for each new value passed.
class ChangeDetector:
INIT:
initialize residuals (to zero or neutral values)
accept custom parameters
METHOD step(new_signal_value):
Update residuals with new_signal_value
IF Stopping Rules have been triggered:
RETURN True
We'll make that step method a generator, so we can yield values for each new value of the signal.
Then, we can write a simulation that accepts an instance of Change Detector and a signal, and conveniently steps through the signal (one data points at a time).
class OnlineSimulator(change_detector, signal):
METHOD run:
# Pass each data point to the algorithm one by one
FOR EACH data point in the signal:
CALL change_detector.step()
Store value of each residual in change detector
IF stopping rules were triggered:
BREAK FOR LOOP
PLOT signal, residual values
To see the implementation details, you can check out
src/change_detector.py
in the current github repository. But it should be possible to follow this tutorial if you just understand the pseudocode above
import sys; sys.path.append('../src/')
from change_detector import ChangeDetector
# Initialize change detector
detector = ChangeDetector()
print detector
print "Signal size:", detector.signal_size
Change Detector(triggered=False, residuals={}) Signal size: 0
# Add one point to the change_detector
next(detector.step(10))
# (The step() function is written as a python generator)
print detector
print "Signal size:", detector.signal_size
Change Detector(triggered=False, residuals={}) Signal size: 1
It would be inconvenient to have to manually pass the entire signal, one point at a time, to our change detector, so we will use the online simulator.
OnlineSimulator.run() will plot the signal, our residuals, and -- if any rules were triggered -- the stopping point will be shown in the plot.
from change_detector import OnlineSimulator
signal = np.random.randint(10, size=20)
print "Signal: {}".format(signal)
# reset
sim = OnlineSimulator(detector, signal)
# pass the signal the online_simulator
sim.run(plot=True)
Signal: [9 7 7 5 2 3 4 4 0 1 7 4 7 9 5 9 1 5 9 0] Residuals: [] Stopping rule not triggered.
False
print detector
Change Detector(triggered=False, residuals={})
When we send a signal to OnlineSimulator.run() we are stepping through each data point in the signal one by one, just as if
for datapoint in signal:
detector.step(datapoint)
At each step, the detector decides whether the stopping rule should be triggered with only the information available to it at that step.
This is what makes the detector an online algorithm.
class MeanDetector(ChangeDetector):
"""
This is a simple detector that extends the base
ChangeDetector.
This serves as a useful "skeleton" on how to override
the methods of ChangeDetector.
We're adding a "residual" -- we'll keep track of the
mean of the signal.
"""
def __init__(self):
super(MeanDetector, self).__init__()
self.total_val = 0
# Mean is a pretty simple and useful residual
self.mean_ = np.nan
def update_residuals(self, new_signal_value):
self._update_base_residuals(new_signal_value)
self.total_val += new_signal_value
self.mean_ = float(self.total_val) / self.signal_size
def check_stopping_rules(self, new_signal_value):
# Implement Stopping rules here
if np.abs(new_signal_value - self.mean_) > 100:
self.rules_triggered = True
We can use this improved detector just like before:
detector = MeanDetector()
OnlineSimulator(detector, signal).run()
Residuals: ['mean_'] Stopping rule not triggered.
False
We have a residual, now, called mean_
.
We are also retaining a new attribute, total_val
which we use to calculate the mean. Our scaffolding code only plots attributes ending with underscore (_
), which we use as a way of marking our residuals.
The mean is close to 5
right now. The stopping rule says that if a new value passed differs from the mean by more than 100 units, then the stopping rule should be triggered.
Let's test this out by building a new signal.
signal[17] = 200
plt.plot(signal)
[<matplotlib.lines.Line2D at 0x3e1d7d0>]
# Let's run our detector on this signal
OnlineSimulator(detector, signal).run()
Residuals: ['mean_'] Change detected. Stopping Rule triggered at 17.
True
Note that the signal was triggered at data point 17. No information about the rest of the signal was available to the detector at the time the stopping rule was triggered.
After the rules were triggered, the simulation stopped. No residuals were calculated for the remainder of the points. (The rest of signal is still shown, only for illustrative purposes).
Our online change detectors, at any given time $t_0$, only have access to the portion of the signal that has before, $t < t_0$.
Okay, so now we have the ability to write change detectors, easily pass signals to them in an "online" fashion, and to examine and visualize the results.
Return to the Change Detection Tutorial Table of Contents