We will cover in detail the following sections of Lecture 3:
Section 5, Linear models.
Section 7, Linearized models.
Section 9, Nonparametric nonlinear models; nearest neighbors.
We will cover procedures for fitting these models in future tutorials.
Written by Franco Pestilli, PhD, Stanford University, April 2012
Translated to Python by Michael Waskom, June 2012
%pylab inline
from __future__ import division
import utils
import seaborn
seaborn.set()
colors = seaborn.color_palette("deep")
When we fit a straight line through a set of data points, we are implicitly fitting the linear model $y = ax + b$, where $a$ and $b$ are free parameters. We can say that $x$ is an input variable, $y$ is the output variable, and the model attempts to predict the output variable based on the input variable. Alternatively, we can say that $x$ is a regressor, $y$ is the data, and the model attempts to explain the data using the regressor.
More generally, a linear model expresses the data as a weighted sum of the regressors. For example, suppose we have a set of regressors $x_i$ where $i$ ranges from $1$ to $n$. Then the model is
$$y = \sum_{i=1}^{n} w_ix_i$$
where $w_i$ is the weight to apply to $x_i$ to obtain $y$. For example, with three regressors:
$$y = w_1x_1 + w_2x_2 + w_3x_3$$
Returning to the simple linear model, we can re-write $y = ax + b$ as $y = ax + b\mathbf{1}$, where $\mathbf{1}$ is a constant regressor consisting of all ones. By constant regressor we mean a vector of all 1s with the same number of entries (rows) as $x$.
Then, if we set $x_1 = x$ and $x_2 = \mathbf{1}$, the model can be re-written as $y = w_1x_1 + w_2x_2$, where $w_1$ and $w_2$ are free parameters (parameters that are free to vary). This is to say:
$$y = ax_1 + bx_2, \quad \text{where } x_1 = x \text{ and } x_2 = \mathbf{1}$$
Hereafter we will show that we can rewrite $y = ax + b$ as a weighted sum of regressors, and then in matrix notation as $y = \matrix{X}w$, where $w$ is a vector of weights.
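As a quick numerical check of this equivalence (a minimal sketch of our own, assuming the pylab namespace loaded above; the `_check` variable names are purely illustrative), we can verify that the two forms of the model give the same values:
# A small sketch: y = a*x + b equals the weighted sum w1*x1 + w2*x2
# when x1 = x and x2 is a constant regressor of ones
x_check = arange(5)
a_check, b_check = 2, 4
x1 = x_check
x2 = ones_like(x_check)
assert allclose(a_check * x_check + b_check, a_check * x1 + b_check * x2)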
# We will simulate a dataset with two variables
# First we create a variable x
x = arange(15)
a = 2 # Scaling, scales y in relation to the same value of x
b = 4 # Additive term, shifts y up in relation to the same value of x
# We will create a second variable as a linear transformation of the first
# one. We double x, add 4 to each of its elements, and add random
# Gaussian noise of the same length as x
noise = randn(len(x))
# Note that we are effectively using the formula y = ax + b to simulate the data
y = (a * x + b) + 3 * noise
# Now we take a look at the relation between these two variables using a scatterplot
plot(x, y, "o")
xlabel("variable 1")
ylabel("variable 2");
# To help visualize the relation between the two variables it is good to
# plot them on axes with equal spacing (on x and y) that display equal
# ranges of x and y
figure(figsize=(8, 8))
plot(x, y, "o")
axis((-1, 35, -1, 35))
# Above we simulated the data using the relation y = ax + b.
# Here we show that we can rewrite the same relation using matrix notation
X = row_stack((x, ones_like(x)))
# Notice that, the way we have set up our matrix X, each regressor is in a
# row and each observation is in a column. The matrix X should be
# transposed so that regressors are in columns and observations are in
# the rows. This dimensionality is essential for the matrix multiplication
# to work out properly
X = X.T # Here we transpose X
# We could have also done
X = column_stack((x, ones_like(x)))
# The matrix X can be multiplied by some weights (w). These weights
# effectively shift and scale the data in x. These weights are the
# coefficients of the linear equation, y = ax + b, that we used to simulate
# the data.
w = row_stack((a, b))
# We can describe the simulated data by plotting the straight line defined
# by the model y = ax + b. This is the model we used to transform x into y
y_model1 = a * x + b
# Plot the model on top of the data
plot(x, y_model1, linewidth=3)
# Alternatively we can describe the data by using the same linear model
# but written in matrix notation.
# Note that unlike MATLAB, using the "*" operator on numpy arrays
# performs element-wise multiplication. To perform real matrix
# multiplication, we could use numpy.matrix objects, which can
# be created with syntax similar to MATLAB
mtx1 = matrix("1 2; 3 4")
mtx2 = matrix(X)
# However, it is generally better practice to perform matrix multiplication
# using the numpy.dot() function
# For those of you who are not familiar with linear algebra:
# our vector of weights, denoted by w, must be (1) a column vector and (2)
# have as many entries as there are columns of X. This can be formalized
# in the following line of code, where we assert that the number of columns
# in X is equal to the number of rows in w.
assert X.shape[1] == w.shape[0], 'Dimension mismatch between X and w'
# Because the dimensions match we can perform the following multiplication
y_model2 = dot(X, w)
# Plot the model
# The two models, 1 and 2, are perfectly on top of each other. We play here
# with the line style to make them both visible. You
# should see a continuous green line (model 1) and on top of it a red
# dashed line (model 2).
plot(x, y_model2, "--", linewidth=3);
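As an additional sanity check (a small sketch of our own, not part of the original MATLAB tutorial), we can confirm numerically that the two descriptions of the model are identical:
# Model 1 (y = a*x + b) and model 2 (y = dot(X, w)) should agree exactly,
# up to floating-point precision; ravel() flattens the (15, 1) result of dot()
assert allclose(y_model1, y_model2.ravel())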
Linearized models are one type of nonlinear model. A nonlinear model is any model whose relation between input and output cannot be described by a single straight line when plotted on two axes.
Linearized models are nonlinear models that can be transformed into linear models using a single transformation. Generally, the transformation can be thought of as an expansion of the input space.
Hereafter we will simulate a nonlinear dataset with two variables. The two variables have a nonlinear relation between them. We will show how we can re-write the nonlinear model and linearize it, so that it can be described by a linear operation (addition and multiplication).
# First we create a variable, x
x = arange(-15, 16)
# We will now need three parameters
a = 1.5 # Scaling, expands y in relation to the same value of x.
b = 2 # Additive term, shifts y up in relation to the same value of x.
c = 1.5 # This is the term that describes the influence of the non-linear
# component. The greater this term, the stronger the nonlinearity
The second variable is a nonlinear transformation of the first one. We scale $x$ by $a$, add $b$ to each of its elements, add $c$ times the square of $x$, and add random Gaussian noise.
noise = randn(len(x)) * 6
y = c * x ** 2 + a * x + b + noise
Plotting the data, we can 'see' that a line is not a good model for it
plot(x, y, ".")
xlabel("variable 1")
ylabel("variable 2")
axis((-40, 40, -10, 400));
We introduced above the matrix notation $y = \matrix{X}w$. This same notation can be used to represent this new non-linear dataset.
To do so we effectively linearize the model that we use to represent the data set: $x^2$ is coded as a new column in $\matrix{X}$, so the model becomes a linear function of this new regressor. The coefficient $c$ is the weight to apply to this new component of the linearized model.
X = column_stack((x ** 2, x, ones_like(x)))
We can appreciate the matrix representation of the linearized model by plotting each value in $\matrix{X}$ as an intensity in an image with the same shape as $\matrix{X}$. We will use the matshow() function to do so. This is really just a wrapper around the more generic imshow() function.
# Plot the matrix
matshow(X)
# We will turn off the grid we have been using
grid()
# Set the colormap for the whole image to shades of gray
gray()
# Turn on a colorbar to compare the values in the image
colorbar()
Values are high (bright) in the first column at the top and bottom. This is because the first column contains $x^2$, which is large for both large negative and large positive values of $x$. Values are lower (dimmer) in the second column and even lower in the last column.
We can show that the linearized model $y = \matrix{X}w$ describes the data well using a single linear operation, matrix multiplication.
These are the new weights for the linearized model.
# Make the weight vector
w = row_stack((c, a, b))
# Check the dimensions of w and X are correct
sw = w.shape
print "w is %d by %d" % sw
sX = X.shape
print "X is %d by %d" % sX
# Formally check to make sure the dimensions match
assert sw[0] == sX[1], "Dimension mismatch between X and w"
w is 3 by 1
X is 31 by 3
# Build the model with matrix multiplication
y_model3 = dot(X, w)
# Plot the data and model
plot(x, y, ".")
axis((-40, 40, -10, 400))
plot(x, y_model3);
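The same linearization idea extends to polynomials of any degree: each power of $x$ simply becomes another column of $\matrix{X}$. Here is a minimal sketch of such an expansion (the helper name poly_design is our own, for illustration; it assumes the pylab namespace and the x and X defined above):
# Build a design matrix whose columns are x**degree, ..., x**1, x**0
# (the last column is the constant regressor of ones)
def poly_design(x, degree):
    return column_stack([x ** d for d in range(degree, -1, -1)])

# For degree 2 this reproduces the matrix X we built above
assert allclose(poly_design(x, 2), X)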
A third type of nonlinear model is a nonparametric nonlinear model. Nonparametric nonlinear models are essentially very flexible nonlinear models that can be generically applied to arbitrary datasets.
The distinction between the different types of nonlinear models can be hazy, but we can make some generalizations:
(1) Nonparametric nonlinear models are sometimes nonlinear with respect to the free parameters; thus, we cannot categorize them as linearized models.
(2) Nonparametric nonlinear models make few assumptions on the form of the nonlinearity of the model, unlike parametric nonlinear models.
Examples of nonparametric nonlinear models include nearest-neighbor methods (predict the output based on the nearest input example), local regression (fit a simple model at each point in the input space based on nearby data points), and neural networks (use generic basis functions such as sigmoidal or radial basis functions to model the output).
# Suppose we collected some measure of height, weight and the corresponding "beauty"
# on a set of individuals.
height = array([1.75, 1.55, 1.82, 1.77, 1.8, 1.65, 1.6]) # in meters
weight = array([75, 70, 72, 65, 68, 70, 68]) # in Kg.
beauty = array([1, 4, 2, 7, 7, 4, 3]) # 1-7, 7 is most beautiful
We plot here the data in a scatter plot of weight vs. height, where the size of the symbols is set by the beauty rating. More beautiful individuals have bigger symbols.
# Unlike the matlab code, we can do this in one step
scatter(height, weight, beauty * 20, color=colors[2])
axis((1.5, 2, 60, 80))
xlabel("Height (meters)")
ylabel("Weight (Kg)");
Now we want to build a model to predict beauty ratings of new individuals given this data set.
The model is a nearest-neighbor model, where a new individual's beauty is predicted by the individuals that are near them in Euclidean space. Note that this can be done with any number of predictor variables; however, it is easier to visualize the model with just 2 variables.
Suppose we know the height (1.9 meters) and the weight (67 Kg) of a new individual. The nearest-neighbor model will allow us to predict the beauty of this individual by using the original sample together with the measures of height and weight for the new individual.
height_ind = 1.9
weight_ind = 67
The prediction of a nearest-neighbor model for the new individual is the beauty rating of the individual that has the closest height and weight.
To set up the nearest-neighbor model we compute the distance of the new individual's height and weight from those of all the individuals in our original sample. We sum the squared differences in height and weight. In this way we have a measure of distance between the new individual and the rest of the group that combines both height and weight.
distance = (height_ind - height) ** 2 + (weight_ind - weight) ** 2
# We can also do this with a function from the scipy package
from scipy.spatial.distance import cdist
pop = column_stack((height, weight))
sample = column_stack((height_ind, weight_ind))
distance2 = cdist(pop, sample)
# By default, cdist computes the euclidean distance, but you can use many others
# It also generalizes to n dimensions
# (we compare with allclose rather than == to allow for floating-point rounding)
assert allclose(sqrt(distance), distance2.ravel())
After setting up our distance measure, we can find the individual in the original data set that has the smallest distance to the new individual. We can then use the beauty rating of that individual in the original data set as the predicted beauty for the new individual.
scatter(height, weight, beauty * 20, color=colors[2])
axis((1.5, 2, 60, 80))
xlabel("Height (meters)")
ylabel("Weight (Kg)")
# Find the closest individual
idx = argmin(distance)
# Look up their beauty
predicted_beauty = beauty[idx]
# The nearest-neighbor model uses the closest data point to
# predict the expected value of a new data point. In this way the model is
# nonparametric, because it uses the actual data (the closest point to the
# new point) as the model (prediction).
# Here we plot the new individual with the original group.
scatter(height_ind, weight_ind, predicted_beauty * 20, color=colors[0]);
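As noted above, the nearest-neighbor idea generalizes to any number of predictor variables. Here is a minimal sketch of such a generic predictor (the function name nearest_neighbor_predict is our own, for illustration; it assumes the pylab namespace and the cdist import from above):
# A small sketch: predict the value of a new point from the single nearest
# point in a sample with an arbitrary number of predictor variables
def nearest_neighbor_predict(sample_X, sample_y, new_X):
    d = cdist(sample_X, atleast_2d(new_X))  # euclidean distances to the new point
    return sample_y[argmin(d)]

# This reproduces the prediction made above for the new individual
print nearest_neighbor_predict(pop, beauty, [height_ind, weight_ind])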