Notebook

Change Detection Tutorial¶

author: @amanqa
source: github.com/amanahuja/change-detection-tutorial

In [1]:

# Modified Stylesheet for notebook.
from IPython.core.display import HTML
def css_styling():
    styles = open("custom.css", "r").read()
    return HTML(styles)

css_styling()

Out[1]:

Section 0: Some Scaffolding¶

Overview

Offline vs. Online change detection
We write some utility functions that allow us to
pass signals to a change detection algorithm in "online" simulation mode
calculate residuals after receiving each new data point
Test the stopping rules at each iteration
Plot results and output conveniently.

Aside: Offline vs. Online algorithms¶

An "offline" change detection algorithm is one in which we are given the entire signal and are attempting to "look back" and recognize where the change occcured.

An "online" algorithm is quite different, in which we imagine our signal as "streaming into" our detector, and at any given moment we only have knowledge of the most recent data point and whatever information the detector has retain about the signal. See wikipedia on Sequential Analysis http://en.wikipedia.org/wiki/Sequential_analysis

My interest is in the latter "online" class of algorithms. That means I want to make sure data points are passed to the algorithm one by one, and that I am interested in

detecting a change,
in deciding whether a change is significant enough to trigger an alert, and
in how quickly I can detect the change after it occurs.

I also want to make sure that I am not cheating when I run my experiments.

To do all this, it is helpful to write some code that helps me simulate the online streaming scenario of interest.

In [2]:

import matplotlib.pyplot as plt
import numpy as np

In [3]:

from collections import defaultdict
np.random.seed(seed=111111)
np.set_printoptions(precision=2, suppress=True)

... thus we need some "scaffolding"¶

This notebook is an optional part of the tutorial, in which we create a class and some methods in order to "simulate" streaming data.

The scenario that we are simulating:

each data point is passed to the algorithm one by one
the algorithm performs its analysis on that new information along with any stored historical information
the algorithm decides after each data point whether or not a change has been detected.

With these tools, which we will utilize in the tutorial, we can experiment with online algorithms and, more specifically, online change detection algorithms.

Pseudocode¶

Thinking this through, here's what I came up with, in pseudocode

Essentially a change detection algorithm is just calculating residuals and updating them for each new value passed.

class ChangeDetector:
    INIT:
        initialize residuals (to zero or neutral values)
        accept custom parameters

    METHOD step(new_signal_value):
        Update residuals with new_signal_value
        IF Stopping Rules have been triggered:
            RETURN True

We'll make that step method a generator, so we can yield values for each new value of the signal.

Then, we can write a simulation that accepts an instance of Change Detector and a signal, and conveniently steps through the signal (one data points at a time).

class OnlineSimulator(change_detector, signal): 

    METHOD run: 
        # Pass each data point to the algorithm one by one
        FOR EACH data point in the signal:
            CALL change_detector.step() 
            Store value of each residual in change detector
            IF stopping rules were triggered: 
                BREAK FOR LOOP

        PLOT signal, residual values

Import this scaffolding¶

To see the implementation details, you can check out

src/change_detector.py

in the current github repository. But it should be possible to follow this tutorial if you just understand the pseudocode above

In [4]:

import sys; sys.path.append('../src/')
from change_detector import ChangeDetector

In [5]:

# Initialize change detector

detector = ChangeDetector()

print detector
print "Signal size:", detector.signal_size

Change Detector(triggered=False, residuals={})
Signal size: 0

In [6]:

# Add one point to the change_detector
next(detector.step(10))

# (The step() function is written as a python generator)

print detector
print "Signal size:", detector.signal_size

Change Detector(triggered=False, residuals={})
Signal size: 1

The Online Simulator¶

It would be inconvenient to have to manually pass the entire signal, one point at a time, to our change detector, so we will use the online simulator.

OnlineSimulator.run() will plot the signal, our residuals, and -- if any rules were triggered -- the stopping point will be shown in the plot.

In [7]:

from change_detector import OnlineSimulator

In [8]:

signal = np.random.randint(10, size=20)
print "Signal: {}".format(signal)

# reset
sim = OnlineSimulator(detector, signal)

# pass the signal the online_simulator
sim.run(plot=True)

Signal: [9 7 7 5 2 3 4 4 0 1 7 4 7 9 5 9 1 5 9 0]
Residuals: []
Stopping rule not triggered.

Out[8]:

False

(Note that the stopping rule will not be triggered, because we haven't created any stopping rules yet.)

In [9]:

print detector

Change Detector(triggered=False, residuals={})

Important:¶

When we send a signal to OnlineSimulator.run() we are stepping through each data point in the signal one by one, just as if

for datapoint in signal: 
    detector.step(datapoint)

At each step, the detector decides whether the stopping rule should be triggered with only the information available to it at that step.

This is what makes the detector an online algorithm.

Fine, okay, that works, but it doesn't do much. We'll now write a new Class that extends ChangeDetector. We'll add a residual.

In [10]:

class MeanDetector(ChangeDetector):
    """
    This is a simple detector that extends the base 
    ChangeDetector. 
    
    This serves as a useful "skeleton" on how to override 
    the methods of ChangeDetector.
    
    We're adding a "residual" -- we'll keep track of the 
    mean of the signal. 
    """
    def __init__(self):
        super(MeanDetector, self).__init__()

        self.total_val = 0
        # Mean is a pretty simple and useful residual
        self.mean_ = np.nan
        
    def update_residuals(self, new_signal_value):
        self._update_base_residuals(new_signal_value)
        
        self.total_val += new_signal_value
        self.mean_ = float(self.total_val) / self.signal_size
    
    def check_stopping_rules(self, new_signal_value):
        # Implement Stopping rules here
        if np.abs(new_signal_value - self.mean_) > 100: 
            self.rules_triggered = True

We can use this improved detector just like before:

In [11]:

detector = MeanDetector()

OnlineSimulator(detector, signal).run()

Residuals: ['mean_']
Stopping rule not triggered.

Out[11]:

False

We have a residual, now, called mean_.

We are also retaining a new attribute, total_val which we use to calculate the mean. Our scaffolding code only plots attributes ending with underscore (_), which we use as a way of marking our residuals.

The mean is close to 5 right now. The stopping rule says that if a new value passed differs from the mean by more than 100 units, then the stopping rule should be triggered.

Let's test this out by building a new signal.

In [12]:

signal[17] = 200

plt.plot(signal)

Out[12]:

[<matplotlib.lines.Line2D at 0x3e1d7d0>]

In [13]:

# Let's run our detector on this signal

OnlineSimulator(detector, signal).run()

Residuals: ['mean_']
Change detected. Stopping Rule triggered at 17.

Out[13]:

True

Note that the signal was triggered at data point 17. No information about the rest of the signal was available to the detector at the time the stopping rule was triggered.

After the rules were triggered, the simulation stopped. No residuals were calculated for the remainder of the points. (The rest of signal is still shown, only for illustrative purposes).

This is the important disctinction between online and offline algorithms.¶

Our online change detectors, at any given time $t_0$, only have access to the portion of the signal that has before, $t < t_0$.

Finally, let's get started.¶

Okay, so now we have the ability to write change detectors, easily pass signals to them in an "online" fashion, and to examine and visualize the results.

Return to the Change Detection Tutorial Table of Contents