This tutorial will focus on the use and kernel selection of the $\textit{Coregionalized Regression}$ model in GPy.
The first thing to do is to set the plots to be interactive and to import GPy.
%pylab inline
import pylab as pb
pb.ion()
import GPy
Populating the interactive namespace from numpy and matplotlib
For this example we will generate an artificial dataset.
#These functions generate data corresponding to the two outputs
f_output1 = lambda x: 2. * np.sin(x/4.) + 4. * np.cos((x-15)/5.) - .4 * x + np.ones_like(x) * 35.+(np.random.rand(x.size)[:,None]-.5) * 2.
f_output2 = lambda x: 2. * np.sin(x/4.) + 4. * np.cos((x-15)/5.) + .2*x + np.ones_like(x) * 10. + (np.random.rand(x.size)[:,None]-.5) * 8.
#{X,Y} training set for each output
X1 = np.random.rand(100)[:,None]; X1=X1*75
X2 = np.random.rand(100)[:,None]; X2=X2*70 + 30
Y1 = f_output1(X1)
Y2 = f_output2(X2)
#{X,Y} test set for each output
Xt1 = np.random.rand(100)[:,None]*100
Xt2 = np.random.rand(100)[:,None]*100
Yt1 = f_output1(Xt1)
Yt2 = f_output2(Xt2)
Our two datasets look like this:
xlim = (0,100); ylim = (0,50)
fig = pb.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
ax1.set_xlim(xlim)
ax1.set_ylim(ylim)
ax1.set_title('Output 1')
ax1.plot(X1[:,:1],Y1,'kx',mew=1.5,label='Train set')
ax1.plot(Xt1[:,:1],Yt1,'rx',mew=1.5,label='Test set')
ax1.legend()
ax2 = fig.add_subplot(212)
ax2.set_xlim(xlim)
ax2.set_ylim(ylim)
ax2.set_title('Output 2')
ax2.plot(X2[:,:1],Y2,'kx',mew=1.5,label='Train set')
ax2.plot(Xt2[:,:1],Yt2,'rx',mew=1.5,label='Test set')
ax2.legend()
We will also define a function that will be used later for plotting our results.
def plot_2outputs(m,xlim,ylim):
    fig = pb.figure(figsize=(12,8))
    #Output 1
    ax1 = fig.add_subplot(211)
    ax1.set_xlim(xlim)
    ax1.set_ylim(ylim)
    ax1.set_title('Output 1')
    #fixed_inputs=[(1,0)] fixes input dimension 1 (the output index) to output 0;
    #rows 0-99 of the training data belong to the first output
    m.plot(plot_limits=xlim,fixed_inputs=[(1,0)],which_data_rows=slice(0,100),ax=ax1)
    ax1.plot(Xt1[:,:1],Yt1,'rx',mew=1.5)
    #Output 2
    ax2 = fig.add_subplot(212)
    ax2.set_xlim(xlim)
    ax2.set_ylim(ylim)
    ax2.set_title('Output 2')
    #rows 100-199 of the training data belong to the second output
    m.plot(plot_limits=xlim,fixed_inputs=[(1,1)],which_data_rows=slice(100,200),ax=ax2)
    ax2.plot(Xt2[:,:1],Yt2,'rx',mew=1.5)
The $\textit{Coregionalized Regression}$ model relies on the use of $\textit{multiple output kernels}$ (Álvarez, Rosasco and Lawrence, 2012), of the following form:
$$
{\bf B}\otimes{\bf K} = \left(\begin{array}{ccc}
B_{1,1}\times{\bf K}({\bf X}_{1},{\bf X}_{1}) & \ldots & B_{1,D}\times{\bf K}({\bf X}_{1},{\bf X}_{D})\\
\vdots & \ddots & \vdots\\
B_{D,1}\times{\bf K}({\bf X}_{D},{\bf X}_{1}) & \ldots & B_{D,D}\times{\bf K}({\bf X}_{D},{\bf X}_{D})
\end{array}\right).
$$
In the expression above, ${\bf K}$ is a kernel function, ${\bf B}$ is known as the coregionalization matrix, and ${\bf X}_i$ represents the inputs corresponding to the $i$-th output.
Notice that if $B_{i,j} = 0$ for $i \neq j$, then the outputs are modeled as independent of each other.
To ensure that the multiple output kernel is a valid kernel, both $\bf K$ and ${\bf B}$ need to be valid. If $\bf K$ is already a valid kernel, we just need to ensure that ${\bf B}$ is positive definite. The latter is achieved by defining ${\bf B} = {\bf W}{\bf W}^\top + \text{diag}({\boldsymbol \kappa})$, for some matrix $\bf W$ and vector ${\boldsymbol \kappa}$ with non-negative entries.
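Before moving to GPy, here is a minimal NumPy sketch of this construction (all variable names here are ours): it builds ${\bf B} = {\bf W}{\bf W}^\top + \text{diag}({\boldsymbol \kappa})$, checks that it is positive (semi-)definite, and, for the simplified case where all outputs share the same inputs, assembles the block covariance above as a Kronecker product.
import numpy as np
np.random.seed(0)
D = 2                                 #number of outputs
W = np.random.randn(D,1)              #low-rank factor
kappa = np.array([0.5,0.5])           #non-negative vector
B = np.dot(W,W.T) + np.diag(kappa)    #a valid coregionalization matrix
print(np.linalg.eigvalsh(B))          #all eigenvalues are >= 0

#If every output shares the same inputs X, B (x) K(X,X) is a Kronecker product
X = np.linspace(0,10,5)[:,None]
K = np.exp(-0.5*(X - X.T)**2)         #a simple RBF kernel evaluated by hand
full_cov = np.kron(B,K)               #(D*5, D*5) multi-output covariance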
In GPy, a multiple output kernel is defined in the following way:
import GPy
K=GPy.kern.RBF(1)
B = GPy.kern.Coregionalize(input_dim=1,output_dim=2)
multkernel = K.prod(B,name='B.K')
print multkernel
B_K.            |  Value  | Constraint | Prior | Tied to
rbf.variance    |    1.0  |    +ve     |       |
rbf.lengthscale |    1.0  |    +ve     |       |
coregion.W      |  (2, 1) |            |       |
coregion.kappa  |   (2,)  |    +ve     |       |
The components of the kernel can be accessed as follows:
#Components of B
print 'W matrix\n',B.W
print '\nkappa vector\n',B.kappa
print '\nB matrix\n',B.B
W matrix
Index | B_K.coregion.W | Constraint | Prior | Tied to
[0 0] |   -0.15188848  |            |       |   N/A
[1 0] |   -0.52590895  |            |       |   N/A

kappa vector
Index | B_K.coregion.kappa | Constraint | Prior | Tied to
 [0]  |         0.5        |    +ve     |       |   N/A
 [1]  |         0.5        |    +ve     |       |   N/A

B matrix
[[ 0.52307011  0.07987951]
 [ 0.07987951  0.77658023]]
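As a sanity check, the `B` attribute should equal ${\bf W}{\bf W}^\top + \text{diag}({\boldsymbol \kappa})$. GPy's parameters behave like NumPy arrays, so we can verify this directly:
#Rebuild the coregionalization matrix from its factors; this should match B.B above
print(np.dot(B.W, B.W.T) + np.diag(B.kappa.flatten()))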
We have built a function called $\textit{ICM}$ that handles the steps of defining the two kernels and multiplying them together.
icm = GPy.util.multioutput.ICM(input_dim=1,num_outputs=2,kernel=GPy.kern.RBF(1))
print icm
ICM.            |  Value  | Constraint | Prior | Tied to
rbf.variance    |    1.0  |    +ve     |       |
rbf.lengthscale |    1.0  |    +ve     |       |
B.W             |  (2, 1) |            |       |
B.kappa         |   (2,)  |    +ve     |       |
Now we will show how to add different kernels together to model the data in our example.
Once we have defined an appropriate kernel for our model, its use is straightforward. In the next example we will use a $\textit{Matern 3/2 kernel}$ as $\bf K$.
K = GPy.kern.Matern32(1)
icm = GPy.util.multioutput.ICM(input_dim=1,num_outputs=2,kernel=K)
m = GPy.models.GPCoregionalizedRegression([X1,X2],[Y1,Y2],kernel=icm)
m['.*Mat32.var'].constrain_fixed(1.) #For this kernel, B.kappa encodes the variance now.
m.optimize()
print m
plot_2outputs(m,xlim=(0,100),ylim=(-20,60))
WARNING: reconstraining parameters gp.ICM.Mat32.variance

Name : gp
Log-likelihood : -352.613958853
Number of Parameters : 8
Parameters:
gp.                                   |      Value      | Constraint | Prior | Tied to
ICM.Mat32.variance                    |            1.0  |   fixed    |       |
ICM.Mat32.lengthscale                 |  46.7840602961  |    +ve     |       |
ICM.B.W                               |         (2, 1)  |            |       |
ICM.B.kappa                           |          (2,)   |    +ve     |       |
mixed_noise.Gaussian_noise_0.variance | 0.256214397783  |    +ve     |       |
mixed_noise.Gaussian_noise_1.variance |    4.505340958  |    +ve     |       |
Notice that there are two noise variance parameters, one for each output.
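Assuming the parameter names shown in the printout above, each noise variance can be inspected (or constrained) with a regular expression, just as we did with the kernel parameters:
#Per-output noise variances live under the mixed_noise likelihood
print(m['.*Gaussian_noise_0.variance'])
print(m['.*Gaussian_noise_1.variance'])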
But what is the advantage of this model? Well, the fit of a non-coregionalized model (i.e., two independent models) would look like this:
K = GPy.kern.Matern32(1)
m1 = GPy.models.GPRegression(X1,Y1,kernel=K.copy())
m1.optimize()
m2 = GPy.models.GPRegression(X2,Y2,kernel=K.copy())
m2.optimize()
fig = pb.figure(figsize=(12,8))
#Output 1
ax1 = fig.add_subplot(211)
m1.plot(plot_limits=xlim,ax=ax1)
ax1.plot(Xt1[:,:1],Yt1,'rx',mew=1.5)
ax1.set_title('Output 1')
#Output 2
ax2 = fig.add_subplot(212)
m2.plot(plot_limits=xlim,ax=ax2)
ax2.plot(Xt2[:,:1],Yt2,'rx',mew=1.5)
ax2.set_title('Output 2')
The fit is clearly better when we coregionalize the outputs than when we use independent models. Can we improve the coregionalized fit further? Yes, we will have a look at that in the next section.
The data from both outputs is not centered on zero. A way of dealing with outputs of different means or magnitudes is to use a $\textit{bias kernel}$.
This kernel simply shifts the (constant) mean of the Gaussian process being fitted. There is no need to assume any sort of correlation between the two means, so we can define ${\bf W} = {\bf 0}$, leaving ${\bf B} = \text{diag}({\boldsymbol \kappa})$.
kernel = GPy.util.multioutput.ICM(input_dim=1,num_outputs=2,kernel=GPy.kern.Bias(input_dim=1))
m = GPy.models.GPCoregionalizedRegression(X_list=[X1,X2],Y_list=[Y1,Y2],kernel=kernel)
m['.*bias.var'].constrain_fixed(1) #B.kappa now encodes the variance.
m['.*W'].constrain_fixed(0)
m.optimize()
plot_2outputs(m,xlim=(-20,120),ylim=(0,60))
WARNING: reconstraining parameters gp.ICM.bias.variance
At the moment, our model is only able to explain the mean of the data. However, we can notice that there is a decreasing trend in the first output and an increasing trend in the second one. In this case we can model such trends with a $\textit{linear kernel}$.
Since the linear kernel only fits a line with constant slope along the output space, there is no need to assume any correlation between outputs.
We could define our new multiple output kernel as follows:
${\bf K}_{ICM} = {\bf B} \otimes ( {\bf K}_{Bias} + {\bf K}_{Linear} )$.
However, we can also define a more general kernel of the following form:
${\bf K}_{LCM} = {\bf B}_1 \otimes {\bf K}_{Bias} + {\bf B}_2 \otimes {\bf K}_{Linear}$.
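In GPy, such an LCM kernel can be assembled by hand as a sum of ICM kernels, one per base kernel. A sketch of this (the names `ICM0`/`ICM1` are ours, and we assume `ICM` accepts a `name` argument):
#Each ICM term contributes one B_i (x) K_i block to the sum
icm_bias = GPy.util.multioutput.ICM(input_dim=1,num_outputs=2,kernel=GPy.kern.Bias(1),name='ICM0')
icm_linear = GPy.util.multioutput.ICM(input_dim=1,num_outputs=2,kernel=GPy.kern.Linear(1),name='ICM1')
lcm_manual = icm_bias + icm_linear    #kernel addition builds B1 (x) K_Bias + B2 (x) K_Linear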
GPy also has a function which saves some steps in the definition of LCM kernels.
K1 = GPy.kern.Bias(1)
K2 = GPy.kern.Linear(1)
lcm = GPy.util.multioutput.LCM(input_dim=1,num_outputs=2,kernels_list=[K1,K2])
m = GPy.models.GPCoregionalizedRegression([X1,X2],[Y1,Y2],kernel=lcm)
m['.*bias.var'].constrain_fixed(1.)
m['.*W'].constrain_fixed(0)
m['.*linear.var'].constrain_fixed(1.)
m.optimize()
plot_2outputs(m,xlim=(-20,120),ylim=(0,60))
WARNING: reconstraining parameters gp.add.ICM0.bias.variance
WARNING: reconstraining parameters gp.add.ICM1.linear.variances
Now we will model the variation along the trend defined by the linear component. We will do this with a $\textit{Matern 3/2 kernel}$.
K1 = GPy.kern.Bias(1)
K2 = GPy.kern.Linear(1)
K3 = GPy.kern.Matern32(1)
lcm = GPy.util.multioutput.LCM(input_dim=1,num_outputs=2,kernels_list=[K1,K2,K3])
m = GPy.models.GPCoregionalizedRegression([X1,X2],[Y1,Y2],kernel=lcm)
m['.*ICM.*var'].unconstrain()
m['.*ICM0.*var'].constrain_fixed(1.)
m['.*ICM0.*W'].constrain_fixed(0)
m['.*ICM1.*var'].constrain_fixed(1.)
m['.*ICM1.*W'].constrain_fixed(0)
m.optimize()
plot_2outputs(m,xlim=(0,100),ylim=(-20,60))
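Finally, a usage sketch: to predict at new inputs with a coregionalized model, the output index is needed both as an extra input column and as metadata for the mixed-noise likelihood. The following assumes GPy's `Y_metadata` convention with an `'output_index'` key; treat it as an assumption if your version differs.
newX = np.arange(100,110)[:,None]
newX = np.hstack([newX,np.ones_like(newX)])           #append the output index (here: output 1)
noise_dict = {'output_index':newX[:,1:].astype(int)}  #tells the likelihood which noise model applies
mean, var = m.predict(newX,Y_metadata=noise_dict)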