This notebook shows how to use the Python implementation of Friedman's SuperSmoother, available at http://github.com/jakevdp/supersmoother/
SuperSmoother is a non-parametric, locally-linear smoother in which the size of the local neighborhood is adapted to the characteristics of the data. It was introduced by J. H. Friedman in his 1984 paper "A Variable Span Smoother" (pdf)
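To make "locally-linear smoother" concrete, here is a minimal NumPy sketch of a *fixed-span* local linear smoother: for each point, it fits an ordinary least-squares line to the nearest fraction `span` of the data and evaluates it there. This is only an illustration of the building block; Friedman's algorithm additionally varies the span per point and uses weighted fits.

```python
import numpy as np

def local_linear_smooth(t, y, span=0.2):
    """Fixed-span local linear smoother (illustrative sketch).

    For each point t[i], fit a least-squares line to its
    k = span * N nearest neighbors and evaluate it at t[i].
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    k = max(2, int(span * n))  # neighborhood size
    yhat = np.empty(n)
    for i in range(n):
        # indices of the k nearest neighbors in t
        idx = np.argsort(np.abs(t - t[i]))[:k]
        # least-squares line y = c0 + c1 * t through the neighborhood
        A = np.vstack([np.ones(k), t[idx]]).T
        coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
        yhat[i] = coef[0] + coef[1] * t[i]
    return yhat
```

A fixed span forces a global trade-off: a small span tracks sharp features but is noisy, a large span is stable but blurs them. The supersmoother's contribution is choosing the span locally.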
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# Use seaborn for plotting defaults.
# This can be safely commented-out if seaborn is not installed
import seaborn; seaborn.set()
# Install the package
# source at http://github.com/jakevdp/supersmoother
# or use ``pip install supersmoother``
from supersmoother import SuperSmoother, LinearSmoother
def make_test_set(N=200, rseed_x=None, rseed_y=None):
    """Generate the test set from Friedman 1984"""
    rng_x = np.random.RandomState(rseed_x)
    rng_y = np.random.RandomState(rseed_y)
    x = rng_x.rand(N)
    dy = x
    y = np.sin(2 * np.pi * (1 - x) ** 2) + dy * rng_y.randn(N)
    return x, y, dy
# Generate and visualize the data
t, y, dy = make_test_set(rseed_x=0, rseed_y=1)
plt.errorbar(t, y, dy, fmt='o', alpha=0.3);
This data is generated the same way as in Friedman's paper. It makes a good test case because the errors vary across the domain, and the second derivative of the true model changes appreciably over the range:
Here is how to fit the supersmoother to the data:
# fit the supersmoother model
model = SuperSmoother()
model.fit(t, y, dy)
# find the smoothed fit to the data
tfit = np.linspace(0, 1, 1000)
yfit = model.predict(tfit)
Now we'll visualize this smoothed curve:
# Show the smoothed model of the data
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)
plt.plot(tfit, yfit, '-k');
The supersmoother is built from initial smooths in which the size of each local neighborhood is a fixed fraction $f$ of the total dataset. In analogy with audio frequencies, Friedman calls these the tweeter $(f = 0.05)$, the midrange $(f = 0.2)$, and the woofer $(f = 0.5)$. We can visualize these individual fits here:
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)
for smooth in model.primary_smooths:
    plt.plot(tfit, smooth.predict(tfit),
             label='span = {0:.2f}'.format(smooth.span))
plt.legend();
The final supersmoother fit uses cross-validation to select the best span at each point in the dataset. We can plot these smoothed span values as follows:
t = np.linspace(0, 1, 1000)
plt.plot(t, model.span(t))
plt.xlabel('t')
plt.ylabel('smoothed span value');
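The span-selection idea can be sketched with plain NumPy (this is a simplified stand-in, not the package's algorithm: it uses a local-mean smoother and skips the smoothing of the residual curves that the real supersmoother performs). For each candidate span, we compute leave-one-out residuals, then at each point pick the span whose residual is smallest:

```python
import numpy as np

def loo_residuals(t, y, span):
    """Leave-one-out |residual| of a k-nearest-neighbor local mean."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(t)
    k = max(2, int(span * n))
    res = np.empty(n)
    for i in range(n):
        # nearest neighbors of t[i], excluding the point itself
        idx = np.argsort(np.abs(t - t[i]))[1:k + 1]
        res[i] = abs(y[i] - y[idx].mean())
    return res

# data generated as in Friedman's test set
rng = np.random.RandomState(0)
t = np.sort(rng.rand(200))
y = np.sin(2 * np.pi * (1 - t) ** 2) + t * rng.randn(200)

spans = np.array([0.05, 0.2, 0.5])
residuals = np.array([loo_residuals(t, y, f) for f in spans])
# at each point, keep the span with the smallest cross-validated residual
best_span = spans[residuals.argmin(axis=0)]
```

Even this crude version tends to select small spans where the function wiggles rapidly and large spans where it is flat and noisy, which is the qualitative behavior shown in the plot above.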
These spans are fit to this particular realization of the data. Friedman used 1000 realizations of the data to get a better estimate of how the span varies. We'll do the same here:
N = 1000
span = span2 = 0
tfit = np.linspace(0, 1, 100)
for rseed in np.arange(N):
    t, y, dy = make_test_set(rseed_x=0, rseed_y=rseed)
    model = SuperSmoother().fit(t, y, dy)
    span += model.span(tfit)
    span2 += model.span(tfit) ** 2
mean = span / N
std = np.sqrt(span2 / N - mean ** 2)
plt.plot(tfit, mean)
plt.fill_between(tfit, mean - std, mean + std, alpha=0.3)
plt.xlabel('t')
plt.ylabel('resulting span');
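Note that the loop accumulates running sums rather than storing all $N$ span curves: with $S_1 = \sum_i s_i(t)$ and $S_2 = \sum_i s_i(t)^2$, the mean is $\bar{s}(t) = S_1 / N$ and the standard deviation follows from $\sigma(t) = \sqrt{S_2 / N - \bar{s}(t)^2}$.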
The degree of smoothing can be tuned with the bass enhancement feature. This is a number $\alpha$ between 0 and 10, with larger values yielding a smoother curve:
rng = np.random.RandomState(0)
t = rng.rand(200)
dy = 0.5
y = np.sin(5 * np.pi * t ** 2) + dy * rng.randn(200)
plt.errorbar(t, y, dy, fmt='o', alpha=0.3)
for alpha in [0, 8, 10]:
    smoother = SuperSmoother(alpha=alpha)
    smoother.fit(t, y, dy)
    plt.plot(tfit, smoother.predict(tfit),
             label='alpha = {0}'.format(alpha))
plt.legend(loc=2);
The effect of $\alpha$ on the smoothing is nonlinear: a change from 0 to 1 has much less effect than a change from 9 to 10.