Change Detection Tutorial

author: @amanqa
source: github.com/amanahuja/change-detection-tutorial

In [1]:

# Modified stylesheet for the notebook.
from IPython.display import HTML

def css_styling():
    # Read the custom CSS file and return it as an HTML object for display.
    with open("custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()

Out[1]:

Section 01: Starting Simple

Lesson Plan

Section 01
- A trivial signal (Signal 1)
- A "static mean" change detector
    - Recent (amplitude) vs global (mean) detector
- Framing the problem and some utility code (maybe move this to an appendix?)

In [2]:

import matplotlib.pyplot as plt
import numpy as np

In [3]:

from collections import defaultdict
np.random.seed(seed=111111)
np.set_printoptions(precision=3, suppress=True)

Recap

So far, we have:

- discussed what an online algorithm is
- written some utility code to simulate an online algorithm

We'll import that utility code so we can use it:

In [4]:

import sys; sys.path.append('../src/')
from change_detector import ChangeDetector, OnlineSimulator

Let's examine a basic signal

We'll start with the most basic signal we can think of: a constant value, repeated, that changes to a different constant value at some point.

In [5]:

sig1 = np.ones(150)
sig1[:100] *= 50
sig1[100:] *= 40

plt.figure(figsize=(15, 5))
plt.plot(sig1, 'b.')
plt.plot(sig1, '-', alpha=0.2)
plt.ylim(0,100)
plt.title("sig1 : A trivial signal")

Out[5]:

[Figure: sig1 : A trivial signal]

1b. Static mean detector

Earlier we looked at a simple detector that calculates the mean of the signal at each step and triggers a stopping rule if an incoming signal value differs from that mean by more than some threshold percentage.

Here it is again:

In [18]:

class MeanDetector(ChangeDetector):
    """
    Static Mean Detector

    Residuals:
        mean_: the mean of signal values seen so far
        diff_: the absolute difference between the new value and mean_

    Stopping Rule:
        Stop if diff_ exceeds a threshold percentage of mean_.
        Default is 5%.
    """

    def __init__(self, threshold=0.05):
        super(MeanDetector, self).__init__()

        # Save hyper-parameter(s)
        self.threshold = threshold

        # Required attributes
        self.total_val = 0.0  # running sum, used to calculate the mean

        # New residual(s)
        self.diff_ = np.nan

    def update_residuals(self, new_signal_value):
        # Base-class bookkeeping; self.signal_size is used below
        self._update_base_residuals(new_signal_value)

        # Update attributes
        self.total_val += new_signal_value

        # Update residuals; float() avoids integer division under Python 2
        self.mean_ = self.total_val / float(self.signal_size)
        self.diff_ = np.absolute(self.mean_ - new_signal_value)

    def check_stopping_rules(self, new_signal_value):
        # Check if the new value differs from the mean by more than threshold %
        threshold_level = np.absolute(self.mean_) * self.threshold

        if self.diff_ > threshold_level:
            self.rules_triggered = True

In [19]:

# Create detector
detector = MeanDetector(threshold=0.05)

OnlineSimulator(detector, sig1).run()

Residuals: ['mean_', 'diff_']
Change detected. Stopping Rule triggered at 100.

Out[19]:

True

Now we have a residual, diff_, which is just the absolute difference between the latest signal value and the running mean. We also have a stopping rule that triggers if diff_ exceeds a manually chosen threshold percentage of the mean.
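For readers without ../src/change_detector.py at hand, the residual and stopping rule can be sketched without the ChangeDetector and OnlineSimulator classes. This is a minimal standalone sketch, not the tutorial's implementation: static_mean_first_trigger is a hypothetical helper that feeds values in one at a time, updates the running mean before testing the rule (the same order MeanDetector uses), and returns the index of the first trigger.

```python
import numpy as np


def static_mean_first_trigger(signal, threshold=0.05):
    """Return the index at which a static-mean stopping rule first fires, or None.

    Hypothetical helper mirroring MeanDetector's update order: the running
    mean is updated with the new value first, then the value is tested.
    """
    total = 0.0
    for i, value in enumerate(signal):
        total += value
        mean = total / (i + 1)          # running ("static") mean, including value
        diff = abs(value - mean)        # residual: distance from the mean
        if diff > abs(mean) * threshold:
            return i                    # stopping rule triggered
    return None


# Rebuild sig1 from the tutorial: 100 samples at 50, then 50 samples at 40.
sig1 = np.ones(150)
sig1[:100] *= 50
sig1[100:] *= 40

print(static_mean_first_trigger(sig1))  # 100
```

On sig1 it reports index 100, the same point the simulator reports above.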

Sounds too simple?

Yeah, it's pretty basic.

Scaling

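Because the threshold is a percentage of the mean, the stopping rule is invariant to scaling of the signal: multiplying every value by a positive constant scales both diff_ and the threshold level by the same factor. That's a good thing.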
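The scaling argument can be written out. Here x_t is the signal value at step t (counting from 0), \bar{x}_t is the running mean including x_t, and \tau is the threshold (0.05 above); these symbols are introduced only for this derivation. The mean is written with an absolute value, which agrees with the code for positive signals such as sig1.

```latex
\text{fire at } t \iff |x_t - \bar{x}_t| > \tau\,|\bar{x}_t|,
\qquad \bar{x}_t = \frac{1}{t+1}\sum_{i=0}^{t} x_i

\text{For } y_i = c\,x_i \text{ with } c > 0:\quad \bar{y}_t = c\,\bar{x}_t, \text{ so}

|y_t - \bar{y}_t| > \tau\,|\bar{y}_t|
\iff c\,|x_t - \bar{x}_t| > c\,\tau\,|\bar{x}_t|
\iff |x_t - \bar{x}_t| > \tau\,|\bar{x}_t|
```

This invariance is to multiplication only: adding a constant b to every value leaves |x_t - \bar{x}_t| unchanged but moves the threshold to \tau|\bar{x}_t + b|, so the detector is not invariant to an offset.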

In [8]:

# MeanDetector easily accommodates a scaling of sig1,
#  as the threshold is given as a percentage.

detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1 * 1000).run()

Residuals: ['mean_', 'diff_']
Change detected. Stopping Rule triggered at 100.

Out[8]:

True

Noise

Let's try adding some noise to the signal.

In [9]:

# Size of change in our test signal
jump_size = sig1[0] - sig1[-1]

# We'll add a small amount of Gaussian noise (standard deviation 0.02 x jump_size) to the signal.
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.02)

detector = MeanDetector(threshold=0.05)

OnlineSimulator(detector, sig1 + noise).run(signal_name='Sig1 with 2% noise')

Residuals: ['mean_', 'diff_']
Change detected. Stopping Rule triggered at 100.

Out[9]:

True

More noise?

If we add more noise, the detector becomes less reliable. With noise at 10% of the signal jump, I found that the detector sometimes, but not always, gives false positives. With noise at 20% of the jump, there are always false positives.

In [10]:

# 10% noise sometimes causes trouble
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.10)

detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1 + noise).run(signal_name='Sig1 with 10% noise')

# 20% noise pretty much always causes problems
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.20)

detector = MeanDetector(threshold=0.05)

OnlineSimulator(detector, sig1 + noise).run(signal_name='Sig1 with 20% noise')

Residuals: ['mean_', 'diff_']
Change detected. Stopping Rule triggered at 39.

Residuals: ['mean_', 'diff_']
Change detected. Stopping Rule triggered at 2.

Out[10]:

True

In both cases, note how easy it is for you, as a human, to spot the "real change" by visually inspecting the signal.
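The "sometimes" vs "always" observation can be checked by repeating each experiment over many independent noise draws instead of one. The sketch below is an illustration, not part of the tutorial's code: it uses a standalone version of the static-mean rule (the hypothetical helper static_mean_first_trigger), rebuilds sig1, and counts how often the rule fires before the true change at index 100. The false_positive_rate function, the trial count and the seed are all made up for illustration.

```python
import numpy as np


def static_mean_first_trigger(signal, threshold=0.05):
    """Index where the static-mean stopping rule first fires, or None (hypothetical helper)."""
    total = 0.0
    for i, value in enumerate(signal):
        total += value
        mean = total / (i + 1)
        if abs(value - mean) > abs(mean) * threshold:
            return i
    return None


def false_positive_rate(noise_frac, threshold=0.05, n_trials=200, seed=0):
    """Fraction of noisy copies of sig1 on which the rule fires before the change.

    noise_frac is the Gaussian noise standard deviation as a fraction of the jump.
    """
    rng = np.random.RandomState(seed)
    change_at = 100
    base = np.ones(150)
    base[:change_at] *= 50
    base[change_at:] *= 40
    jump_size = base[0] - base[-1]

    early = 0
    for _ in range(n_trials):
        noisy = base + rng.normal(scale=jump_size * noise_frac, size=base.shape)
        idx = static_mean_first_trigger(noisy, threshold)
        if idx is not None and idx < change_at:
            early += 1  # fired before the real change: a false positive
    return early / float(n_trials)


for frac in (0.02, 0.10, 0.20):
    rate = false_positive_rate(frac)
    print("noise %.0f%% of jump: false-positive rate %.2f" % (100 * frac, rate))
```

Any trigger before index 100 counts as a false positive here, because sig1 has no change before then.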

Additional Weaknesses of Mean Detector

This MeanDetector change detection method has additional weaknesses:

- Sensitive to the threshold value, which we are choosing manually.
- Sensitive to anomalous values and outliers.
- The signal must be roughly constant: the detector doesn't work well with drift (trend) or periodic variation (seasonality).
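The first two weaknesses can be seen together on a constant signal with a single outlier. In this hypothetical example (the flat signal and the helper static_mean_first_trigger are illustrations, not tutorial code), the default 5% threshold fires on the outlier, while a threshold large enough to ignore the outlier (25%) also misses the real jump in sig1.

```python
import numpy as np


def static_mean_first_trigger(signal, threshold=0.05):
    """Index where the static-mean stopping rule first fires, or None (hypothetical helper)."""
    total = 0.0
    for i, value in enumerate(signal):
        total += value
        mean = total / (i + 1)
        if abs(value - mean) > abs(mean) * threshold:
            return i
    return None


# sig1 from the tutorial: a real change from 50 to 40 at index 100.
sig1 = np.ones(150)
sig1[:100] *= 50
sig1[100:] *= 40

# A hypothetical constant signal with one outlier and no real change.
flat = np.full(150, 50.0)
flat[60] = 60.0

print(static_mean_first_trigger(flat))                  # 60: false alarm on the outlier
print(static_mean_first_trigger(flat, threshold=0.25))  # None: outlier ignored...
print(static_mean_first_trigger(sig1, threshold=0.25))  # None: ...but the real jump is missed too
```

For these two signals, only a very narrow band of thresholds, roughly 0.196 to 0.198, fires on the jump in sig1 but not on the outlier.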

Manual tuning and heuristics could solve some of these problems. We could tune the threshold to be just right, avoiding false positives when we know what to expect. We could prevent the stopping rules from triggering until the mean has stabilized.
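One way to keep the stopping rule quiet until the mean stabilizes is a burn-in period: keep updating the mean, but skip the rule for the first burn_in samples. The sketch below assumes that heuristic; burn_in and first_trigger_with_burn_in are hypothetical names, not part of the tutorial's MeanDetector. The test signal is sig1 with an invented glitch at index 1.

```python
import numpy as np


def first_trigger_with_burn_in(signal, threshold=0.05, burn_in=20):
    """Static-mean rule that only starts checking after burn_in samples.

    The running mean is still updated on every sample; only the stopping
    rule is suppressed during burn-in. burn_in is a hypothetical parameter.
    """
    total = 0.0
    for i, value in enumerate(signal):
        total += value
        mean = total / (i + 1)
        if i < burn_in:
            continue  # let the mean stabilize before testing
        if abs(value - mean) > abs(mean) * threshold:
            return i
    return None


# sig1 with one early glitch at index 1.
sig = np.ones(150)
sig[:100] *= 50
sig[100:] *= 40
sig[1] = 60.0

print(first_trigger_with_burn_in(sig, burn_in=0))   # 1: fires on the glitch
print(first_trigger_with_burn_in(sig, burn_in=20))  # 100: fires on the real change
```

A burn-in only protects against early glitches; an outlier that arrives after the mean has settled still triggers the rule, and the burn-in length becomes yet another value to tune by hand.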

That might be the right approach for certain cases. But let's expand our change detection algorithm arsenal and see what better tools are available.

Looking ahead: Seasonality

Imagine a seasonal signal like the one below. How would our MeanDetector perform on this type of signal? How could we build a change detector to handle trend and seasonality?

In [20]:

# Create a seasonal signal
# I imagined a metric that rises from 0 to 5 each calendar month
# (30 samples per "month", 12 months = 360 samples)

sig2 = np.linspace(0, 5, num=30)
sig2 = np.concatenate([sig2 for _ in range(12)])

# Add a jump
jump_size = 5
sig2[250:] += jump_size

# Noise
noise = np.random.normal(
    size=sig2.shape,
    scale=jump_size * 0.02)

plt.figure(figsize=(15, 5))
plt.plot(sig2 + noise, 'b.')
plt.plot(sig2 + noise, 'b-', alpha=0.15)
plt.ylim(0, 15)
plt.xlim(0, 365)
plt.title("Imaginary Seasonal signal")

Out[20]:

[Figure: Imaginary Seasonal signal]
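To get a feel for the first question, the static-mean rule can be run on a noise-free copy of sig2. This sketch rebuilds sig2 with np.tile (equivalent to the concatenation above) and uses the hypothetical standalone helper static_mean_first_trigger rather than the tutorial's MeanDetector class.

```python
import numpy as np


def static_mean_first_trigger(signal, threshold=0.05):
    """Index where the static-mean stopping rule first fires, or None (hypothetical helper)."""
    total = 0.0
    for i, value in enumerate(signal):
        total += value
        mean = total / (i + 1)
        if abs(value - mean) > abs(mean) * threshold:
            return i
    return None


# Noise-free sig2: a 0-to-5 ramp repeated 12 times, with a jump of 5 at index 250.
ramp = np.linspace(0, 5, num=30)
sig2_clean = np.tile(ramp, 12)
sig2_clean[250:] += 5

print(static_mean_first_trigger(sig2_clean))  # 1
```

The rule fires on the second sample (index 1), long before the jump at index 250: to a detector that assumes a constant level, the rising ramp itself looks like a change.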
