%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import GPy
import pods
from IPython.display import display
In this section we'll combine expectation propagation with the low-rank approximation to build a simple image classification application. For this toy example we'll classify whether or not the subject of the image is wearing glasses. Set up the IPython environment and download the data:
data = pods.datasets.olivetti_glasses()
Xtrain = data['X']
ytrain = data['Y']
Here’s a simple way to visualise the data. Each pixel in the image will become an input to the GP.
plt.imshow(Xtrain[120].reshape(64,64,order='F'),
interpolation='nearest',cmap=plt.cm.gray)
The class labels (glasses / no glasses) were loaded above into ytrain along with the images. Next, divide the data into a training and a testing set:
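A minimal sketch of one way to do the split: shuffle the indices and hold out a fixed number of images for testing. The array sizes and the hold-out size of 50 below are illustrative assumptions, not taken from the dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                   # stand-in for the number of images
X = rng.standard_normal((n, 4096))        # stand-in for the image data
y = rng.integers(0, 2, size=(n, 1))       # stand-in for the binary labels

n_test = 50                               # illustrative hold-out size
perm = rng.permutation(n)                 # shuffle indices before splitting
test_idx, train_idx = perm[:n_test], perm[n_test:]
Xtest, ytest = X[test_idx], y[test_idx]
Xtrain, ytrain = X[train_idx], y[train_idx]
```

Shuffling before splitting matters here because the faces are stored grouped by subject, so a contiguous split would put whole subjects entirely in one set.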
Next we choose some inducing inputs. One option is to apply k-means clustering to the training data (commented out below); here we simply standardise the inputs and take a random subset of M training points. Think about whether this is a good scheme for choosing the inducing inputs. Can you devise a better one?
from scipy import cluster
M = 8
#Z, distortion = cluster.vq.kmeans(Xtrain,M)
Xtrain_std = (Xtrain - Xtrain.mean(0)[None,:] )/ (Xtrain.std(0)[None,:])
Z = np.random.permutation(Xtrain_std)[:M].copy()
print(Xtrain.mean())
132.132182617
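For comparison, the commented-out k-means option can be sketched as below. This is a standalone illustration on random stand-in data (the real code would pass `Xtrain_std` instead); note that SciPy's `kmeans` can occasionally return fewer than `M` centroids if a cluster empties.

```python
import numpy as np
from scipy.cluster.vq import kmeans

rng = np.random.default_rng(0)
X_std = rng.standard_normal((200, 16))   # stand-in for Xtrain_std
M = 8

# Z holds the cluster centroids, which serve as the inducing inputs
Z, distortion = kmeans(X_std, M)
```

Centroids tend to sit in dense regions of the input space, which is often a better starting point than a uniformly random subset, though the inducing inputs are optimized afterwards in either case.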
Finally, we’re ready to build the classifier object.
k = GPy.kern.RBF(Xtrain.shape[1],lengthscale=20) + GPy.kern.White(Xtrain.shape[1],0.001)
model = GPy.models.SparseGPClassification(Xtrain_std, ytrain, kernel=k, Z=Z)
display(model)
model.optimize()
Model: SparseGPClassification
Log-likelihood: -329.515082269
Number of Parameters: 32771
SparseGPClassification | Value | Constraint | Prior | Tied to
---|---|---|---|---
inducing inputs | (8, 4096) | | |
add.rbf.variance | 1.0 | +ve | |
add.rbf.lengthscale | 20.0 | +ve | |
add.white.variance | 0.001 | +ve | |
Look at the following figure. What is being shown? Why does it look like this?
plt.figure()
plt.imshow(model.Z[0].gradient.reshape(64,64,order='F'),
interpolation='nearest',
cmap=plt.cm.gray)
plt.colorbar()
Write some code to evaluate the model’s performance, using the held-out data that we separated earlier. What is the error rate? Is it better than random guessing?
Hint:
GPy.util.classification.conf_matrix(prob_estimate,ytest)
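As a starting point, here is a plain-NumPy sketch of turning probability estimates into an error rate, independent of the GPy helper in the hint. The probabilities and labels below are random stand-ins; in the exercise they would come from `model.predict` on the held-out inputs.

```python
import numpy as np

# hypothetical stand-ins for the model's probability estimates and test labels
rng = np.random.default_rng(0)
ytest = rng.integers(0, 2, size=50)
prob_estimate = rng.uniform(size=50)

# threshold the class-1 probabilities at 0.5 to get hard predictions
pred = (prob_estimate > 0.5).astype(int)
error_rate = np.mean(pred != ytest)
```

Comparing `error_rate` against 0.5 (and against the base rate of the majority class) tells you whether the classifier beats random guessing.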
Write a simple for loop to repeatedly optimize the model. How low can you get the error rate to go? What kind of kernel do you think might be appropriate for this classification task?