%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import GPy
import pods
In the worst case, inference in a Gaussian process has $\mathcal{O}(n^3)$ computational complexity and requires $\mathcal{O}(n^2)$ storage. For efficient inference on larger data sets we need to consider approximations. One approach is a low rank approximation of the covariance matrix (also known as a sparse approximation or, perhaps more accurately, a parsimonious approximation). We'll study these approximations by first creating a simple data set by sampling from a GP.
X = np.sort(np.random.rand(50,1)*12,0)  # 50 input locations on [0, 12], sorted for plotting
k = GPy.kern.RBF(1)                     # one-dimensional RBF covariance function
K = k.K(X)                              # covariance matrix evaluated at the inputs
K += np.eye(50)*0.01                    # add some independence (noise) to K
y = np.random.multivariate_normal(np.zeros(50), K).reshape(50,1)  # draw one sample from the GP
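A quick plot of the sampled function values (just matplotlib, nothing model-specific) helps to see what we are working with:
# Quick look at the sampled data before fitting a model.
plt.plot(X, y, 'kx', mew=1.5)
plt.xlabel('$x$')
plt.ylabel('$y$')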
Build a straightforward GP model of our simulation. We’ll also plot the posterior of $f$.
m = GPy.models.GPRegression(X,y)
m.optimize()
fig = plt.figure()
ax = fig.add_subplot(111)
m.plot_f(ax=ax)
m._raw_predict?
mu, var = m._raw_predict(X) # this fetches the posterior of f
plt.vlines(X[:,0], mu[:,0]-2.*np.sqrt(var[:,0]), mu[:,0]+2.*np.sqrt(var[:,0]),color='r',lw=2)
A natural question arises: do we need all of the data to form this posterior estimate? Are any of the data points redundant? What happens to the model if you remove some data?
Hint:
X2 = np.delete(X,range(8),0)
y2 = np.delete(y,range(8),0)
# Exercise 2 answer here
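One possible sketch (there are many ways to answer this): refit a full GP to the reduced data X2, y2 from the hint and compare its posterior and log likelihood with the model above.
# A possible sketch: refit on the reduced data and compare with the full model.
m2 = GPy.models.GPRegression(X2, y2)
m2.optimize()
fig = plt.figure()
ax = fig.add_subplot(111)
m2.plot_f(ax=ax)
mu2, var2 = m2._raw_predict(X2)
plt.vlines(X2[:,0], mu2[:,0]-2.*np.sqrt(var2[:,0]), mu2[:,0]+2.*np.sqrt(var2[:,0]), color='r', lw=2)
print(m.log_likelihood(), m2.log_likelihood())  # compare fits (note the data sets differ in size)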
Now we’ll consider a GP that uses a low rank approximation to fit the data.
from IPython.display import display
Z = np.random.rand(3,1)*12  # three randomly placed inducing inputs on [0, 12]
m = GPy.models.SparseGPRegression(X,y,Z=Z)
display(m)
In GPy, the sparse inputs $\mathbf{Z}$ are abbreviated 'iip', for inducing input. Plot the posterior of $u$ in the same manner as for the full GP:
mu, var = m._raw_predict(Z)
plt.vlines(Z[:,0], mu[:,0]-2.*np.sqrt(var[:,0]), mu[:,0]+2.*np.sqrt(var[:,0]),color='r')
a) Optimise and plot the model. The inducing inputs are marked in the plot; how are they placed? You can move them around with e.g. m['iip_2_0'] = 100. What happens to the likelihood? What happens to the fit if you remove an input?
# Exercise 3 a answer
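A minimal sketch for part (a), assuming the sparse model m defined above: optimise it, plot it (GPy marks the inducing input locations on the plot), and check the log likelihood before and after moving an inducing input.
# A possible sketch for part (a): optimise and plot the sparse model.
m.optimize()
fig = plt.figure()
ax = fig.add_subplot(111)
m.plot(ax=ax)  # the inducing input locations appear as markers
print(m.log_likelihood())
# Move an inducing input (e.g. m['iip_2_0'] = 100) and re-plot to see the effect.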
b) How does the fit of the sparse GP compare with the full GP? Play around with the number of inducing inputs; the fit should improve as $M$ increases. How many inducing points are needed? What do you think happens in higher dimensions?
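For part (b), one rough way to explore the effect of $M$ (a sketch; the particular values of $M$ below are arbitrary) is to refit the sparse model for several numbers of inducing inputs and watch the log likelihood (a lower bound in the sparse case):
# Sketch: refit the sparse GP with different numbers of inducing inputs.
for M in [3, 5, 10, 20]:
    Z_M = np.random.rand(M, 1) * 12
    m_M = GPy.models.SparseGPRegression(X, y, Z=Z_M)
    m_M.optimize()
    print(M, m_M.log_likelihood())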
Can you build a low rank Gaussian process with the intrinsic model of coregionalization? Do you have to treat the 2nd input (which specifies the event number) in a special way?
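The following is only a sketch of one way to combine the two ideas, not a prescribed solution. It assumes an augmented input matrix X_aug whose second column holds the event index and a stacked output vector y_aug (both from the earlier coregionalization exercise, and hypothetical here), and it uses GPy's Coregionalize kernel restricted to that second column via active_dims, so the event index does get special treatment.
# Hypothetical sketch: sparse GP with an intrinsic coregionalization (ICM) kernel.
# X_aug is assumed to be (n, 2): column 0 the input, column 1 the event index;
# y_aug is the corresponding stacked output; num_events is a placeholder value.
num_events = 4
k_icm = GPy.kern.RBF(1, active_dims=[0]) * GPy.kern.Coregionalize(1, output_dim=num_events, rank=1, active_dims=[1])
# The inducing inputs also need an event-index column.
Z_aug = np.hstack([np.random.rand(10, 1) * 12, np.random.randint(0, num_events, (10, 1))])
m_icm = GPy.models.SparseGPRegression(X_aug, y_aug, kernel=k_icm, Z=Z_aug)
m_icm.optimize()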