%pylab inline
rcParams['figure.figsize'] = 14, 7
While calculating CSD, we often have to deal with noisy measurements.
Finding a set of optimal parameters for our model, we have to solve the bias–variance dilemma: the problem of simultaneously minimizing the bias (how far the model's average prediction lies from the true function) and the variance of the model error (how sensitive the model is to small changes in the training set).
The error may be decomposed into:
$Err(x) = \left(E[\hat{f}(x)]-f(x)\right)^2 + E\left[\left(\hat{f}(x)-E[\hat{f}(x)]\right)^2\right] + \sigma_e^2$
$Err(x) = \mathrm{Bias}^2 + \mathrm{Variance} + \mathrm{Irreducible\ Error}$
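To make the decomposition concrete, here is a small illustrative sketch (an addition, not part of the original analysis): we repeatedly simulate noisy training sets from a known "true" function, fit a deliberately flexible polynomial to each (np.polyfit is used here only for brevity), and estimate the bias term from the average fit and the variance term from the spread of the fits around that average.
import numpy as np

def f(x):
    return 0.08*x**2 + 1.0            # a known "true" function (same one as in the example below)

x = np.arange(0, 10, dtype=float)
n_repeats = 200
degree = 7                            # deliberately flexible (high-variance) model

fits = np.empty((n_repeats, x.size))
for i in range(n_repeats):
    y_noisy = f(x) + np.random.random_sample(x.size) - 0.5   # zero-mean noise
    coeffs = np.polyfit(x, y_noisy, degree)
    fits[i] = np.polyval(coeffs, x)

mean_fit = fits.mean(axis=0)
bias_sq = np.mean((mean_fit - f(x))**2)     # (E[f_hat(x)] - f(x))^2, averaged over x
variance = np.mean(fits.var(axis=0))        # E[(f_hat(x) - E[f_hat(x)])^2], averaged over x
print("bias^2   ~ %.4f" % bias_sq)
print("variance ~ %.4f" % variance)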
Suppose we took some noisy measurements of the function $f(x) = 0.08 x^2 + 1.0$ for $x \in \lbrace 0,1,2,3,4,5,6,7,8,9 \rbrace$. The noise will add variance, but not much bias: it is drawn from a zero-mean uniform distribution, so $E[\mathrm{noise}] = 0$.
import numpy as np
X = np.array(range(0,10))
Y = 0.08*X**2 + 1.0
fig, (ax1, ax2) = plt.subplots(1, 2, sharex=True, sharey=True)
ax1.plot(X,Y)
ax1.set_title("model function")
Y2 = Y + np.random.random_sample(X.size) - 0.5   # uniform noise on [-0.5, 0.5)
ax2.scatter(X,Y2)
ax2.plot(X,Y)
ax2.set_title("noisy measurements")
Now let's try to fit some functions.
from scipy.optimize import curve_fit
def poly_7(x, a, b, c, d, e, f, g, h):
    return h*x**7 + g*x**6 + f*x**5 + e*x**4 + d*x**3 + c*x**2 + b*x + a
popt, pcov = curve_fit(poly_7, X, Y2, p0=(10.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0))
a, b, c, d, e, f, g, h = popt
residuals = Y2 - poly_7(X, a, b, c, d, e, f, g, h)
fres = np.linalg.norm(residuals)
print "7-degree polynomial error: ", fres
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, sharex=True, sharey=True)
ax1.scatter(X,Y2)
xlin = np.linspace(0,9, 50)
ax1.plot(xlin, poly_7(xlin,a,b,c,d,e,f,g,h))
ax1.set_title("Overfit - high variance")
def poly_1(x, a, b):
    return a*x + b
popt, pcov = curve_fit(poly_1, X, Y2, p0=(10.0, 0.2))
a, b = popt
residuals = Y2 - poly_1(X, a, b)
fres = np.linalg.norm(residuals)
print "linear approximation error: ", fres
ax2.scatter(X,Y2)
ax2.plot(xlin, poly_1(xlin, a, b))
ax2.set_title("Underfit - high bias")
def poly_2(x, a, b, c):
    return a*x**2 + b*x + c
popt, pcov = curve_fit(poly_2, X, Y2, p0=(10.0, 0.2, 0.0))
a, b, c = popt
residuals = Y2 - poly_2(X, a, b, c)
fres = np.linalg.norm(residuals)
print "error of the correct model: ", fres
ax3.scatter(X,Y2)
ax3.plot(xlin, poly_2(xlin, a,b,c))
ax3.set_title("Correct model")
7-degree polynomial error:  0.161557406365
linear approximation error:  1.48952709068
error of the correct model:  0.568171379783
We can see that the higher the polynomial order, the smaller the fitting error. However, if we added a few more points without recalculating the polynomial coefficients, the incorrect models would fail miserably.
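The sketch below (an added illustration reusing poly_7, poly_2, X and Y2 from the cells above; the evaluation range $x = 10 \ldots 12$ is an arbitrary choice) refits both models and evaluates them just outside the measured range, where the overfitted polynomial diverges quickly from the true curve.
# Illustrative: evaluate the fitted models outside the measured range.
x_new = np.linspace(10, 12, 5)
y_true_new = 0.08*x_new**2 + 1.0

popt7, _ = curve_fit(poly_7, X, Y2, p0=(10.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0))
popt2, _ = curve_fit(poly_2, X, Y2, p0=(10.0, 0.2, 0.0))

err7 = np.linalg.norm(poly_7(x_new, *popt7) - y_true_new)
err2 = np.linalg.norm(poly_2(x_new, *popt2) - y_true_new)
print("extrapolation error, 7th-degree model: %.3f" % err7)
print("extrapolation error, 2nd-degree model: %.3f" % err2)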
In order to give preference to some particular solution, we can introduce a regularization parameter $\lambda$.
The least-squares problem then turns into Tikhonov regularization (regularized least squares):
$$\min_{f \in H} \sum_{i = 1}^n\left(f(X_i) - Y_i\right)^2 + \lambda ~ ||{f}||^2_K$$
The regularization parameter $\lambda$ determines how much we distrust our data.
Too low a $\lambda$ will leave the noise in the fit, so the modelled function may not extrapolate well.
Too high a $\lambda$ will effectively ignore the data.
To validate the model we can only use our own measurements, so we split them into a training set and a test set, fit the function on the training set, and check how well it predicts the test samples. This procedure is called cross-validation.
Repeating the procedure for different $\lambda$, we can find which regularization parameter gives the best results (smallest error).
A very high $\lambda$ suggests that our measurements are too noisy or that our model may be invalid. We would expect $\lambda$ to lie somewhere between 0 and 1.
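As an illustration of how this can look in practice, here is a sketch (an addition to the original text; the polynomial design matrix, the degree and the $\lambda$ grid are all assumptions made for this example) that solves the regularized least-squares problem in closed form and picks $\lambda$ by leave-one-out cross-validation.
# Illustrative sketch: Tikhonov-regularized (ridge) polynomial fit,
# with lambda chosen by leave-one-out cross validation.
import numpy as np

def ridge_fit(x, y, degree, lam):
    # closed-form regularized least squares: w = (A^T A + lam*I)^-1 A^T y
    A = np.vander(x, degree + 1)
    return np.linalg.solve(A.T.dot(A) + lam*np.identity(degree + 1), A.T.dot(y))

def ridge_predict(x, w, degree):
    return np.vander(x, degree + 1).dot(w)

x = np.arange(0, 10, dtype=float)
y = 0.08*x**2 + 1.0 + np.random.random_sample(x.size) - 0.5

degree = 5                                   # intentionally too flexible a model
for lam in [0.0, 0.001, 0.01, 0.1, 1.0]:
    cv_error = 0.0
    for i in range(x.size):                  # leave-one-out split
        train = np.arange(x.size) != i
        w = ridge_fit(x[train], y[train], degree, lam)
        cv_error += abs(ridge_predict(x[i:i+1], w, degree)[0] - y[i])
    print("lambda = %-6g  LOO error = %.3f" % (lam, cv_error))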
In scikit-learn we have some handy procedures for cross-validation. For example, there are index generators for splitting strategies such as Leave-One-Out, Leave-P-Out, K-Fold, and ShuffleSplit.
Some examples:
from sklearn.cross_validation import LeaveOneOut
#for example
#X = np.array([[0., 0.], [1., 1.], [-1., -1.], [2., 2.]])
Y = np.array([0, 1, 0, 1])
loo = LeaveOneOut(len(Y), indices=True)
print(loo)
print "train test"
for train_ind, test_ind in loo:
    # do something with the indices and data
    print("%s %s" % (train_ind, test_ind))
sklearn.cross_validation.LeaveOneOut(n=4)
train test
[1 2 3] [0]
[0 2 3] [1]
[0 1 3] [2]
[0 1 2] [3]
Leave-P-Out divides the samples into a test subset of size $p$ and a training set of size $n-p$.
from sklearn.cross_validation import LeavePOut
lpo = LeavePOut(n=len(Y), p=2, indices=True)
print(lpo)
print "train test"
for train_ind, test_ind in lpo:
    # do something with the indices and data
    print("%s %s" % (train_ind, test_ind))
sklearn.cross_validation.LeavePOut(n=4, p=2)
train test
[2 3] [0 1]
[1 3] [0 2]
[1 2] [0 3]
[0 3] [1 2]
[0 2] [1 3]
[0 1] [2 3]
An advantage of K-Fold cross-validation is that it reduces the number of iterations required to complete the task: the model is refitted only $K$ times, instead of once per sample (or once per subset, as with Leave-P-Out).
from sklearn.cross_validation import KFold
#for example:
#X = np.array([[0., 0.], [1., 1.], [-1., -1.], [2., 2.]])
Y = np.array([0, 1, 0, 1])
kf = KFold(len(Y), n_folds=2, indices=True)
print(kf)
print "train test"
for train_ind, test_ind in kf:
    print("%s %s" % (train_ind, test_ind))
sklearn.cross_validation.KFold(n=4, n_folds=2)
train test
[2 3] [0 1]
[0 1] [2 3]
from sklearn.cross_validation import ShuffleSplit
Y = np.array([0, 1, 0, 1])
ss = ShuffleSplit(len(Y), n_iter=2, test_size = 0.5, indices=True)
print(ss)
print "train test"
for train_ind, test_ind in ss:
    print("%s %s" % (train_ind, test_ind))
ShuffleSplit(4, n_iter=2, test_size=0.5, indices=True, random_state=None)
train test
[3 2] [1 0]
[0 1] [2 3]
And more.
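There is also, for instance, StratifiedKFold, which keeps the class proportions roughly equal in every fold. The snippet below assumes the same old sklearn.cross_validation interface (including the indices flag) used in the examples above.
from sklearn.cross_validation import StratifiedKFold
Y = np.array([0, 1, 0, 1])
# stratified folds: each fold contains roughly the same proportion of 0s and 1s
skf = StratifiedKFold(Y, n_folds=2, indices=True)
print(skf)
print "train test"
for train_ind, test_ind in skf:
    print("%s %s" % (train_ind, test_ind))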
We could recalculate our model with different $\lambda$ to find out which value works best. Here, however, we use the previous functions to check how the models extrapolate: this is pure cross-validation and curve fitting, with no regularization parameter introduced in this example.
def error_7(x_train, y_train, x_test, y_test):
    popt, pcov = curve_fit(poly_7, x_train, y_train, p0=(10.0, 0.2, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0))
    a, b, c, d, e, f, g, h = popt
    y_estimated = poly_7(x_test, a, b, c, d, e, f, g, h)
    err = np.linalg.norm(y_test - y_estimated)
    return err
X = np.array(range(0,10))
Y = 0.04*X**2 + 1.0
Y2 = Y + np.random.random_sample(X.size) - 0.5
loo = LeaveOneOut(len(Y), indices=True)
error = 0.0
for ind_train, ind_test in loo:
    error += error_7(X[ind_train], Y2[ind_train], X[ind_test], Y2[ind_test])
print "Total error:", error
Total error: 27.2241098604
Now we can see how badly this model extrapolates.
def error_1(x_train, y_train, x_test, y_test):
    popt, pcov = curve_fit(poly_1, x_train, y_train, p0=(10.0, 0.2))
    a, b = popt
    y_estimated = poly_1(x_test, a, b)
    err = np.linalg.norm(y_test - y_estimated)
    return err
error = 0.0
for ind_train, ind_test in loo:
    error += error_1(X[ind_train], Y2[ind_train], X[ind_test], Y2[ind_test])
print "Total error:", error
Total error: 3.68014736226
Now this looks better.
def error_2(x_train, y_train, x_test, y_test):
    popt, pcov = curve_fit(poly_2, x_train, y_train, p0=(10.0, 0.2, 0.0))
    a, b, c = popt
    y_estimated = poly_2(x_test, a, b, c)
    err = np.linalg.norm(y_test - y_estimated)
    return err
error = 0.0
for ind_train, ind_test in loo:
    error += error_2(X[ind_train], Y2[ind_train], X[ind_test], Y2[ind_test])
print "Total error:", error
Total error: 1.90230633991
Here we can see that the 2nd-degree polynomial extrapolates best.
This is how we keep the error caused by excessive variance under control.
Let's now solve a linear equation $Ax = b$ and regularize the solution with a parameter $\lambda$:
$x = (A+\lambda I)^{-1} b$
A = np.array([
[1.0, 0.5, 0.2, 0.0, 0.0],
[0.5, 1.0, 0.5, 0.2, 0.0],
[0.2, 0.5, 1.0, 0.5, 0.2],
[0.0, 0.2, 0.5, 1.0, 0.5],
[0.0, 0.0, 0.2, 0.5, 1.0],
])
b = np.array([[1],[2],[3],[4],[5]])
fig, (ax1, ax2) = plt.subplots(1,2)
ax1.imshow(A, interpolation="none")
ax1.set_title("A")
ax2.imshow(b, interpolation="none")
ax2.set_title("b")
fig, (ax1, ax2, ax3, ax4) = plt.subplots(1, 4)
# unregularized solution
x = np.dot(np.linalg.inv(A), b)
ax1.imshow(x, interpolation="none")
ax1.set_title("x for lambda=0.0")
ax4.plot(x)
# mild regularization
lambd = 1.0
x = np.dot(np.linalg.inv(A + lambd*np.identity(A.shape[0])), b)
ax2.imshow(x, interpolation="none")
ax2.set_title("x for lambda=1.0")
ax4.plot(x)
# strong regularization
lambd = 20.0
x = np.dot(np.linalg.inv(A + lambd*np.identity(A.shape[0])), b)
ax3.imshow(x, interpolation="none")
ax3.set_title("x for lambda=20.0")
ax4.plot(x)
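To put numbers on what the three panels show, the sketch below (an added illustration, reusing A and b from above; the $\lambda$ grid is arbitrary) tracks how the residual norm $||Ax - b||$ grows and the solution norm $||x||$ shrinks as $\lambda$ increases: too small a $\lambda$ reproduces b, noise and all, while too large a $\lambda$ shrinks x towards zero and effectively ignores the data.
# Illustrative: the trade-off controlled by lambda, reusing A and b from above.
for lambd in [0.0, 0.1, 1.0, 5.0, 20.0]:
    x = np.dot(np.linalg.inv(A + lambd*np.identity(A.shape[0])), b)
    residual = np.linalg.norm(np.dot(A, x) - b)
    print("lambda = %-4g  ||Ax - b|| = %6.3f  ||x|| = %6.3f" % (lambd, residual, np.linalg.norm(x)))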