author: @amanqa
source: github.com/amanahuja/change-detection-tutorial
# Modified stylesheet for the notebook.
from IPython.display import HTML

def css_styling():
    # Read the custom CSS and hand it to the notebook for rendering
    with open("custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()
Lesson Plan
Section 01
A trivial signal
Signal 1
A "static mean" change detector
Framing the problem and some utility code
import matplotlib.pyplot as plt
import numpy as np
from collections import defaultdict
np.random.seed(seed=111111)
np.set_printoptions(precision=3, suppress=True)
So far, we have framed the problem and written some utility code. We'll import that utility code so we can use it:
import sys; sys.path.append('../src/')
from change_detector import ChangeDetector, OnlineSimulator
We'll start with about as basic a signal as we can consider: a single value, repeated, that changes at some point.
sig1 = np.ones(150)
sig1[:100] *= 50
sig1[100:] *= 40
plt.figure(figsize=(15, 5))
plt.plot(sig1, 'b.')
plt.plot(sig1, '-', alpha=0.2)
plt.ylim(0,100)
plt.title("sig1 : A trivial signal")
Earlier we looked at a simple detector that recalculates the mean of the signal at each step and stops when an incoming signal value differs from that mean by more than some threshold percentage.
Here it is again:
class MeanDetector(ChangeDetector):
    """
    Static Mean Detector

    Residuals:
        mean_: the mean of signal values seen so far
        diff_: the absolute difference between new value and mean_

    Stopping Rule:
        Stop if diff_ exceeds some threshold percentage of mean_.
        Default is 5%.
    """
    def __init__(self, threshold=0.05):
        super(MeanDetector, self).__init__()
        # Save hyper-parameter(s)
        self.threshold = threshold
        # Required attributes
        self.total_val = 0  # Used for calculating mean
        # New residual(s)
        self.diff_ = np.nan

    def update_residuals(self, new_signal_value):
        self._update_base_residuals(new_signal_value)
        # Update attributes
        self.total_val += new_signal_value
        # Update residuals
        self.mean_ = self.total_val / self.signal_size
        self.diff_ = np.absolute(self.mean_ - new_signal_value)

    def check_stopping_rules(self, new_signal_value):
        # Check if new value is more than threshold % different from mean
        threshold_level = self.mean_ * self.threshold
        if self.diff_ > threshold_level:
            self.rules_triggered = True
# Create detector
detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1).run()
Residuals: ['mean_', 'diff_'] Change detected. Stopping Rule triggered at 100.
True
Now we have a residual diff_, which is just the absolute difference between the latest signal value and the mean. We also have a stopping rule that triggers if diff_ is greater than a manually provided threshold percentage of the mean.
Yeah, it's pretty basic.
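To make the rule concrete without the tutorial's helper classes, here is a minimal standalone sketch of the same logic in plain NumPy. The function name `first_trigger_index` is hypothetical and not part of `change_detector`; it only mirrors the running mean and the `diff_ > mean_ * threshold` check shown above.

```python
import numpy as np

def first_trigger_index(signal, threshold=0.05):
    """Return the 0-based index of the first sample that trips the rule, or None.

    Hypothetical helper for illustration; not part of the tutorial's utility code.
    """
    signal = np.asarray(signal, dtype=float)
    # Running mean that includes the current sample, as in update_residuals
    running_mean = np.cumsum(signal) / np.arange(1, signal.size + 1)
    # diff_: absolute distance between each new value and the running mean
    diff = np.abs(running_mean - signal)
    # Stopping rule: diff_ exceeds threshold * mean_
    hits = np.flatnonzero(diff > running_mean * threshold)
    return int(hits[0]) if hits.size else None

# Same shape as sig1: 100 samples at 50, then 50 samples at 40
sig = np.concatenate([np.full(100, 50.0), np.full(50, 40.0)])
print(first_trigger_index(sig))
```

On this noise-free signal the sketch flags the first sample after the drop, because that sample sits about 9.9 away from a running mean of about 49.9, well above the 5% level of roughly 2.5.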
Since it's based on a threshold %, the stop conditions are invariant to scaling of the signal. That's a good thing.
# MeanDetector easily accommodates a scaling of sig1
# as the threshold is given as a percentage.
detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1 * 1000).run()
Residuals: ['mean_', 'diff_'] Change detected. Stopping Rule triggered at 100.
True
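The scale invariance follows directly from the stopping rule. As a short worked check, write $x_t$ for the new signal value, $m_t$ for the running mean, $\tau$ for the threshold, and $c$ for a positive scale factor (this notation is introduced here for illustration and does not appear in the utility code):

```latex
\begin{aligned}
\text{rule on } x_t:&\quad |x_t - m_t| > \tau\, m_t \\
\text{rule on } c\,x_t:&\quad |c\,x_t - c\,m_t| = c\,|x_t - m_t| > \tau\, c\, m_t
\;\Longleftrightarrow\; |x_t - m_t| > \tau\, m_t \qquad (c > 0)
\end{aligned}
```

Both sides pick up the same factor $c$, so the rule fires at the same step. The argument relies on $c > 0$ and on a positive mean. Adding a constant offset to the signal is a different matter: it leaves the difference unchanged but shifts the threshold level, so the rule is not invariant to offsets.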
Let's try adding some noise to the signal.
# Size of change in our test signal
jump_size = sig1[0] - sig1[-1]
# We'll add a small amount (0.02 x jump_size) of Gaussian noise to the signal.
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.02)
detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1 + noise).run(signal_name='Sig1 with 2% noise')
Residuals: ['mean_', 'diff_'] Change detected. Stopping Rule triggered at 100.
True
If we add more noise, the detector becomes less reliable. With noise at 10% of the signal jump, I found that the detector sometimes gives false positives, but not always. With noise at 20% of the jump, there are always false positives.
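A rough back-of-the-envelope check helps explain that pattern. The sketch below assumes the running mean has already settled at 50, so the 5% threshold level is 2.5, and treats each pre-change sample as an independent Gaussian draw. Both are simplifying assumptions, and `false_alarm_probability` is a hypothetical helper, not part of the tutorial code.

```python
import math

def false_alarm_probability(noise_std, mean=50.0, threshold=0.05):
    """Chance that one pure-noise sample lands further than threshold * mean from the mean.

    Assumes the mean is known exactly and the noise is Gaussian (illustrative only).
    """
    level = mean * threshold
    # Two-sided Gaussian tail: P(|X| > level) for X ~ N(0, noise_std^2)
    return math.erfc(level / (noise_std * math.sqrt(2.0)))

jump_size = 10.0  # sig1 drops from 50 to 40
for fraction in (0.02, 0.10, 0.20):
    p = false_alarm_probability(jump_size * fraction)
    # Chance of at least one false alarm across the 100 samples before the change
    p_any = 1.0 - (1.0 - p) ** 100
    print("noise %.0f%% of jump: per-sample %.4f, any of 100 samples %.3f"
          % (fraction * 100, p, p_any))
```

Under these assumptions the per-sample false alarm chance is negligible at 2% noise, roughly 1% at 10% noise, and roughly 21% at 20% noise. Over 100 pre-change samples that is consistent with a false positive on some runs at 10% noise and on nearly every run at 20%.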
# 10% noise sometimes causes trouble
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.10)
detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1 + noise).run(signal_name='Sig1 with 10% noise')
# 20% noise pretty much always causes problems
noise = np.random.normal(
    size=sig1.shape,
    scale=jump_size * 0.20)
detector = MeanDetector(threshold=0.05)
OnlineSimulator(detector, sig1 + noise).run(signal_name='Sig1 with 20% noise')
Residuals: ['mean_', 'diff_'] Change detected. Stopping Rule triggered at 39.
Residuals: ['mean_', 'diff_'] Change detected. Stopping Rule triggered at 2.
True
In both cases, note how easy it is for you, as a human, to spot the "real change" by visually inspecting the signal.
This MeanDetector change detection method has additional weaknesses.
Manual tuning and heuristics could solve some of these problems. We could tune the threshold to be just right, avoiding false positives when we know what to expect. We could try to force the stopping rules not to trigger before the mean stabilizes.
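As one possible shape for the "don't trigger before the mean stabilizes" heuristic, here is a minimal standalone sketch with a warm-up period. The function name and the `min_samples` parameter are hypothetical, introduced only for illustration; they are not options of `MeanDetector`.

```python
def first_trigger_with_warmup(signal, threshold=0.05, min_samples=10):
    """Apply the mean-based rule, but ignore the first min_samples samples.

    Hypothetical heuristic for illustration. Returns a 0-based index or None.
    """
    total = 0.0
    for i, value in enumerate(signal):
        total += value
        mean = total / (i + 1)
        if i < min_samples:
            continue  # still warming up: update the mean but do not test the rule
        if abs(mean - value) > mean * threshold:
            return i
    return None

# An early outlier (60) followed by a real change from 50 to 40 at index 100
signal = [50.0, 60.0] + [50.0] * 98 + [40.0] * 50
print(first_trigger_with_warmup(signal, min_samples=0))
print(first_trigger_with_warmup(signal, min_samples=10))
```

Without a warm-up the early outlier trips the rule at once, while a ten-sample warm-up lets the detector reach the real change. The cost is that any change during the warm-up goes unnoticed, and `min_samples` becomes one more value to tune by hand.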
That might be the right approach for certain cases. But let's expand our change detection algorithm arsenal and see what better tools are available.
Imagine a seasonal signal like the one below. How would our MeanDetector perform on this type of signal? How could we build a change detector to handle trend and seasonality?
# Create a seasonal signal
# I imagined a metric that rises from 0 to 5 each calendar month
sig2 = np.linspace(0, 5, num=30)
sig2 = np.concatenate([sig2 for _ in range(12)])  # repeat the monthly pattern 12 times
# Add a jump
jump_size = 5
sig2[250:] = sig2[250:] + jump_size
# Noise
noise = np.random.normal(
    size=sig2.shape,
    scale=jump_size * 0.02)
plt.figure(figsize=(15,5))
plt.plot(sig2 + noise, 'b.', linestyle='')
plt.plot(sig2 + noise, 'b-', alpha=0.15)
plt.ylim(0,15)
plt.xlim(0,365)
plt.title("Imaginary Seasonal signal")
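One common way to approach the seasonality question is seasonal differencing: subtract from each sample the value one full period earlier, so the repeating pattern cancels and a jump stands out. The sketch below is only an illustration on a noise-free copy of the signal, not necessarily the approach this tutorial takes next. The period of 30 and the cutoff of half the jump size are assumptions taken from how the signal was built.

```python
import numpy as np

period = 30  # samples per "month" in the seasonal signal
signal = np.tile(np.linspace(0, 5, num=period), 12)
signal[250:] += 5  # same jump as in sig2, without noise

# Seasonal differencing: compare each sample with the one a full period earlier
seasonal_diff = signal[period:] - signal[:-period]

# Hypothetical rule: flag the first difference larger than half the jump size
hits = np.flatnonzero(np.abs(seasonal_diff) > 2.5)
# Shift by one period to express the result as an index into the original signal
change_at = int(hits[0]) + period if hits.size else None
print(change_at)
```

The differenced series is zero wherever both samples sit on the same side of the jump, and equals the jump size for one period after the change. A real detector would still need to cope with noise and with not knowing the period in advance.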