By JB Poline and Matthew Brett.
# Compatibility with Python 3
from __future__ import print_function # print('me') instead of print 'me'
from __future__ import division # 1/2 == 0.5, not 0
# Array and plotting libraries
import numpy as np
import numpy.linalg as npl
import matplotlib.pyplot as plt
# Display plots inside the notebook
%matplotlib inline
# Make numpy print 4 decimal places for prettiness
np.set_printoptions(precision=4, suppress=True)
Imagine we have a TR (image) every 2 seconds, for 30 seconds. Here are the times of the TR onsets, in seconds:
times = np.arange(0, 30, 2)
times
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
Now we make a function returning an HRF shape for an input vector of times:
# Gamma distribution from scipy
from scipy.stats import gamma
def spm_hrf(times):
    """ Return values for SPM-like HRF at given times """
    # Make output vector
    values = np.zeros(len(times))
    # Only evaluate gamma above 0 (undefined at <= 0)
    valid_times = times[times > 0]
    # Gamma pdf for the peak
    peak_values = gamma.pdf(valid_times, 6)
    # Gamma pdf for the undershoot
    undershoot_values = gamma.pdf(valid_times, 12)
    # Combine them, put back into values vector
    values[times > 0] = peak_values - 0.35 * undershoot_values
    # Scale area under curve to 1
    return values / np.sum(values)
Sample the HRF at the given times (to simulate an event starting at time 0), and at times - 2 (simulating an event starting at time 2):
hrf1 = spm_hrf(times)
hrf2 = spm_hrf(times - 2) # An HRF with 2 seconds (one TR) delay
hrf1 = hrf1 - hrf1.mean()  # Mean center the regressors
hrf2 = hrf2 - hrf2.mean()
plt.plot(times, hrf1)
plt.plot(times, hrf2)
The Pearson correlation coefficient between the HRFs for the two events:
np.corrcoef(hrf1, hrf2)
array([[ 1.    ,  0.7023],
       [ 0.7023,  1.    ]])
Make a signal that comes from the combination of the two HRFs:
signal = hrf1 + hrf2
plt.plot(hrf1, label='hrf1')
plt.plot(hrf2, label='hrf2')
plt.plot(signal, label='signal (combined hrfs)')
plt.legend()
We're going to make our simulated data from taking the signal (the two HRFs) and adding some random noise:
noise = np.random.normal(size=times.shape)
Y = signal + noise
plt.plot(times, signal)
plt.plot(times, Y, '+')
We are going to model this simulated signal in several different ways. First, we make a model that only has the first HRF as a regressor (plus a column of ones to model the mean of the data):
X_one = np.vstack((hrf1, np.ones_like(hrf1))).T
plt.imshow(X_one, interpolation='nearest', cmap='gray')
plt.title('Model with first HRF regressor only')
Next we make a model where we also include the second HRF as a regressor:
X_both = np.vstack((hrf1, hrf2, np.ones_like(hrf1))).T
plt.imshow(X_both, interpolation='nearest', cmap='gray')
plt.title('Model with both HRF regressors')
Now we make a very large number of data vectors, each with the signal (both HRFs) plus a different vector of noise.
T = len(times)
iters = 100000
# Make 100000 Y vectors (new noise for each column)
noise_vectors = np.random.normal(size=(T, iters))
# add signal to make data vectors
Ys = noise_vectors + signal[:, np.newaxis]
Ys.shape
(15, 100000)
We first fit the model with only the first HRF regressor to every (signal + noise) sample vector.
# Fit X_one to signals + noise
B_ones = npl.pinv(X_one).dot(Ys)
Next fit the model with both HRFs as regressors:
# Fit X_both to signals + noise
B_boths = npl.pinv(X_both).dot(Ys)
Remember that the Student's t statistic is:

$$ t = \frac{c^T \hat\beta}{\sqrt{\mathrm{var}(c^T \hat\beta)}} $$

which works out to:

$$ t = \frac{c^T \hat\beta}{\sqrt{\hat{\sigma}^2 c^T (X^T X)^+ c}} $$

where $\hat{\sigma}^2$ is our estimate of the variance of the residuals, and $(X^T X)^+$ is the pseudo-inverse of $X^T X$.
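To make the formula concrete, here is a quick sketch (not part of the original analysis) of how this t statistic could be computed for the single simulated Y above, using the X_one design defined above and a contrast picking out the hrf1 parameter; the names B, residuals, df, sigma_2_hat, c and t are introduced here for illustration only:
# Sketch: t statistic for the hrf1 contrast in the X_one model, for the Y above
B = npl.pinv(X_one).dot(Y)                      # parameter estimates
residuals = Y - X_one.dot(B)                    # residuals from the fit
df = X_one.shape[0] - npl.matrix_rank(X_one)    # degrees of freedom
sigma_2_hat = residuals.dot(residuals) / df     # estimate of residual variance
c = np.array([1, 0])                            # contrast selecting the hrf1 parameter
t = c.dot(B) / np.sqrt(sigma_2_hat * c.dot(npl.pinv(X_one.T.dot(X_one))).dot(c))
The value of t will change with each new random noise vector, so we do not show an output here.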
That's the theory. So, what is the distribution of the estimates we get for the first beta in the single-HRF model?
plt.hist(B_ones[0], bins=100)
np.std(B_ones[0])
1.4859976445167784
That is the standard deviation of the estimates we actually observe. Does it match the standard deviation predicted by the denominator of the t-statistic formula above? (We made the noise with standard deviation 1, so we can take $\hat{\sigma}^2 = 1$ and drop it from the calculation.)
C_one = np.array([1, 0])[:, None] # column vector
np.sqrt(C_one.T.dot(npl.pinv(X_one.T.dot(X_one)).dot(C_one)))
array([[ 1.485]])
Notice that the mean of the estimates is somewhere above one, even though we only added 1 times the first HRF into the signal:
np.mean(B_ones[0])
1.7003184795688511
This is because the single hrf1 regressor has to fit both the first HRF in the signal and as much as possible of the second HRF, because there is nothing else in the model to capture the second HRF shape.
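One way to see this (a quick check that is not in the original): fit the X_one model to the noise-free signal. Because the noise averages to zero over many samples, the mean of the estimates above should be close to this noise-free fit:
# Fit the one-regressor model to the pure signal (hrf1 + hrf2)
# The first parameter should come out near the 1.70 mean we saw above
npl.pinv(X_one).dot(signal)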
What estimates do we get for the first regressor, when we have both regressors in the model?
plt.hist(B_boths[0], bins=100)
np.mean(B_boths[0]), np.std(B_boths[0])
(1.0040164097531259, 2.0898972889301848)
Two things have happened now that we have added the second (correlated) hrf2 regressor. First, the mean of the parameter for the hrf1 regressor has dropped to 1, because the hrf1 regressor no longer has to model the signal from the second HRF. Second, the variability of the estimate has increased. This is what the bottom half of the t-statistic predicts:
C_both = np.array([1, 0, 0])[:, None] # column vector
np.sqrt(C_both.T.dot(npl.pinv(X_both.T.dot(X_both)).dot(C_both)))
array([[ 2.0861]])
The estimate of the parameter for hrf2 has a mean of around 1, like the parameter estimates for hrf1. This is what we expect because we have 1 x hrf1 and 1 x hrf2 in the signal. Not surprisingly, the hrf2 parameter estimate has a similar variability to that for the hrf1 parameter estimate:
plt.hist(B_boths[1], bins=100)
np.mean(B_boths[1]), np.std(B_boths[1])
(0.99158549385148709, 2.0895047510765408)
C_both_1 = np.array([0, 1, 0])[:, None] # column vector
np.sqrt(C_both_1.T.dot(npl.pinv(X_both.T.dot(X_both)).dot(C_both_1)))
array([[ 2.0865]])
The parameter estimates for hrf1 and hrf2 are anti-correlated:
# Relationship of estimated parameter of hrf1 and hrf2
plt.plot(B_boths[0], B_boths[1], '.')
np.corrcoef(B_boths[0], B_boths[1])
array([[ 1.    , -0.7032],
       [-0.7032,  1.    ]])
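This anti-correlation is also what the model predicts. Under the usual assumptions the covariance of the estimates is proportional to $(X^T X)^+$, so we can read the predicted correlation off that matrix (a quick check, not in the original; xtx_inv is a name introduced here):
# Predicted correlation between the hrf1 and hrf2 parameter estimates
xtx_inv = npl.pinv(X_both.T.dot(X_both))
xtx_inv[0, 1] / np.sqrt(xtx_inv[0, 0] * xtx_inv[1, 1])  # should be close to -0.70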
hrf2 is correlated with hrf1. That means that we can split up hrf2 into two vectors, one being a multiple of hrf1, and the other being the remaining unique contribution of hrf2. The sum of the two vectors is the original hrf2 regressor. Like this:
# Regress hrf2 against hrf1 to get best fit of hrf2 using just hrf1
y = hrf2
X = hrf1[:, np.newaxis] # hrf1 as column vector
B_hrf1_in_hrf2 = npl.pinv(X).dot(y) # scalar multiple of hrf1 to best fit hrf2
hrf1_in_hrf2 = X.dot(B_hrf1_in_hrf2) # portion of hrf2 that can be explained by hrf1
unique_hrf2 = hrf2 - hrf1_in_hrf2 # portion of hrf2 that cannot be explained by hrf1
plt.plot(times, hrf1, label='hrf1')
plt.plot(times, hrf2, label='hrf2')
plt.plot(times, hrf1_in_hrf2, label='hrf1 in hrf2')
plt.plot(times, unique_hrf2, label='hrf2 orth wrt hrf1')
plt.legend()
# hrf1 part of hrf2, plus unique part, equals original hrf2
np.allclose(hrf2, hrf1_in_hrf2 + unique_hrf2)
True
How much of the first regressor did we find in the second regressor?
B_hrf1_in_hrf2
array([ 0.7022])
When we have the model with both hrf1 and hrf2, we are effectively multiplying both parts of hrf2 by the same beta parameter to fit the data. That is, we are applying the same scaling to the part of hrf2 that has the same shape as hrf1 and to the part of hrf2 that cannot be formed from the hrf1 shape.
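To write this out (using $a$ for the scalar in B_hrf1_in_hrf2 and $u$ for unique_hrf2, both defined above), the fitted combination is:

$$ \beta_1 \cdot \textrm{hrf1} + \beta_2 \cdot \textrm{hrf2} = \beta_1 \cdot \textrm{hrf1} + \beta_2 (a \cdot \textrm{hrf1} + u) = (\beta_1 + a \beta_2) \cdot \textrm{hrf1} + \beta_2 u $$

so a single $\beta_2$ scales both the hrf1-shaped part of hrf2 and its unique part.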
Now, what happens if we replace hrf2 with just the part of hrf2 that cannot be explained by hrf1? Our second regressor is then hrf2 orthogonalized with respect to hrf1:
X_both_o = np.vstack((hrf1, unique_hrf2, np.ones_like(hrf1))).T
plt.imshow(X_both_o, interpolation='nearest', cmap='gray')
plt.plot(times, X_both_o[:,0], times, X_both_o[:,1])
What will happen when we fit this model?
B_boths_o = npl.pinv(X_both_o).dot(Ys)
# Distribution of parameter for hrf1 in orth model
plt.hist(B_boths_o[0], bins=100)
np.mean(B_boths_o[0]), np.std(B_boths_o[0])
(1.7003184795688511, 1.4859976445167784)
# The predicted standard deviation of the hrf1 parameter is the same
# as for the model with hrf1 on its own
np.sqrt(C_both.T.dot(npl.pinv(X_both_o.T.dot(X_both_o)).dot(C_both)))
array([[ 1.485]])
The parameter for the hrf1 regressor has now returned to the same value and variance it had when hrf1 was the only regressor in the model (apart from the mean column). For the orthogonalized model, we removed the part of hrf2 that could be explained by hrf1. The amount of hrf1 that we could find in hrf2 has been added back onto the parameter for hrf1, so that the fitted $\hat{y}$ values stay the same as for the model with both HRFs.
np.mean(B_boths[0, :]) + B_hrf1_in_hrf2
array([ 1.7062])
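In fact the relationship holds exactly for every sample, not just on average: the hrf1 parameter in the orthogonalized model equals the hrf1 parameter from the non-orthogonalized model plus B_hrf1_in_hrf2 times the hrf2 parameter (a quick check, not in the original; it should return True):
# The orth-model hrf1 parameter is the non-orth hrf1 parameter
# plus B_hrf1_in_hrf2 times the hrf2 parameter
np.allclose(B_boths_o[0], B_boths[0] + B_hrf1_in_hrf2 * B_boths[1])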
The hrf1 parameter in the orthogonalized model is the same as for the model that only includes hrf1, as if the orthogonalized hrf2 were not present. The parameter for the orthogonalized hrf2 is the same as the parameter for hrf2 in the non-orthogonalized model: we still need the same amount of the orthogonal part of the second regressor to explain the signal:
# Example parameters from the single model
B_ones[:,:5]
array([[ 1.8449,  2.6261, -0.1168,  3.7753,  3.2674],
       [-0.1947,  0.0405,  0.0153, -0.3264,  0.295 ]])
# Example parameters from the non-orth model
B_boths[:,:5]
array([[ 1.2982, -0.2706, -2.4005,  1.4996,  2.7255],
       [ 0.7786,  4.1252,  3.2521,  3.2408,  0.7717],
       [-0.1947,  0.0405,  0.0153, -0.3264,  0.295 ]])
# Example parameters from the orth model
B_boths_o[:,:5]
array([[ 1.8449,  2.6261, -0.1168,  3.7753,  3.2674],
       [ 0.7786,  4.1252,  3.2521,  3.2408,  0.7717],
       [-0.1947,  0.0405,  0.0153, -0.3264,  0.295 ]])
# The parameter for the hrf1 regressor in the orth model
# is the same as the parameter for the hrf1 regressor in the
# single regressor model
plt.plot(B_ones[0], B_boths_o[0], '.')
np.allclose(B_ones[0], B_boths_o[0])
True
# The parameter for the orthogonalized hrf2 regressor is the same as the
# parameter for the original hrf2 regressor in the
# non-orthogonalized model
plt.plot(B_boths[1], B_boths_o[1], '.')
np.allclose(B_boths[1], B_boths_o[1])
True
# The parameter for the hrf1 regressor in the non-orth model
# is correlated with the parameter for the hrf1 regressor
# in the orth model.
plt.plot(B_boths[0], B_boths_o[0], '.')
np.corrcoef(B_boths[0], B_boths_o[0])
array([[ 1.    ,  0.7121],
       [ 0.7121,  1.    ]])
# Relationship of estimated parameters for hrf1 and orthogonalized hrf2
# (they should be independent)
plt.plot(B_boths_o[0], B_boths_o[1], '+')
np.corrcoef(B_boths_o[0], B_boths_o[1])
array([[ 1.    , -0.0015],
       [-0.0015,  1.    ]])
The coefficient of determination, $R^2$, is computed as:

$$ R^2 = \frac{SSReg}{SST - C} $$

where $SSReg$ is the regression sum of squares, $SST = \sum_i Y_i^2$ is the total sum of squares, and $C = n \bar{Y}^2$, with $\bar{Y}$ the mean of the $Y$s, so that $SST - C = \sum_i (Y_i - \bar{Y})^2$.
n, p = X_both.shape[0], X_both.shape[1]
# R2 and variance inflation factor
# Functions to compute R2 from Y and X
from scipy import linalg as slg
def projR(to_proj, X_):
    """ Project the 2D array `to_proj` onto the residual space of `X_`

    The SVD of `X_` gives an orthonormal basis for the space spanned
    by the columns of `X_`.
    """
    u_, _, _ = slg.svd(X_, full_matrices=False)
    return to_proj - u_.dot(u_.T.dot(to_proj))
# try this function with the mean:
ones_n1 = np.ones((n,1))
y = np.random.normal(3,1,size=(n,1))
print("this should be close to 3: ", y.mean())
print("this should be close to 0: ", projR(y, ones_n1).mean())
def R2(Y, X):
    """ Compute the coefficient of determination R2

    Y : numpy array shape (n, 1) : the data
    X : numpy array shape (n, p) : the design matrix
    """
    n, p = X.shape
    SST = (Y ** 2).sum()      # total sum of squares
    C = (Y.sum()) ** 2 / n    # n times the squared mean of Y
    ones_n1 = np.ones((n, 1))
    # Remove the mean from the design
    Xdemeaned = projR(X, ones_n1)
    # Compute the SVD to get an orthonormal basis for the demeaned design
    u_, _, _ = slg.svd(Xdemeaned, full_matrices=False)
    # Compute SSReg, noticing that Y^T u u^T Y = \sum (u^T Y)^2
    SSReg = ((u_.T.dot(Y)) ** 2).sum()
    return SSReg / (SST - C)
def SSR(Y, X):
    """ Compute the sum of squares of the regression of `Y` on `X` """
    n_, p_ = X.shape
    if n_ < p_:
        raise ValueError("n must be >= p")
    u_, _, _ = slg.svd(X, full_matrices=False)
    # Sum of squares of the projection of Y onto the column space of X
    return (((u_.T * Y.T).sum(axis=1)) ** 2).sum()
def Proj(X):
    """ Return the projector onto the space of the columns of `X` """
    n_, p_ = X.shape
    if n_ < p_:
        raise ValueError("n must be >= p")
    u_, _, _ = slg.svd(X, full_matrices=False)
    return u_.dot(u_.T)
this should be close to 3: 2.64427014763
this should be close to 0: -3.5527136788e-16
The $R^2$ from regressing hrf1 on the other regressors in X_both measures how much of hrf1 the other columns can explain:
print(R2(X_both[:,[0]], X_both[:,1:]))
0.493287984608
# The variance inflation factor (VIF) for hrf1 is 1 / (1 - R2)
VIF = 1. / (1. - R2(X_both[:,[0]], X_both[:,1:]))
print(round(VIF, 4))
1.9735
# With the orthogonalized model, the other regressors explain
# almost none of hrf1
print(R2(X_both_o[:,[0]], X_both_o[:,1:]))
0.000150782815012
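As a closing check (a sketch, not in the original notebook; sd_one and sd_both are names introduced here), the VIF should be the factor by which the variance of the hrf1 estimate is inflated by including the correlated hrf2 regressor, that is, the squared ratio of the two predicted standard deviations we computed earlier with C_one and C_both:
# Squared ratio of predicted standard deviations for the hrf1 parameter,
# with and without the correlated hrf2 regressor in the model
sd_one = np.sqrt(C_one.T.dot(npl.pinv(X_one.T.dot(X_one)).dot(C_one)))
sd_both = np.sqrt(C_both.T.dot(npl.pinv(X_both.T.dot(X_both)).dot(C_both)))
print((sd_both / sd_one) ** 2)  # should be close to the VIF above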