We know that the traditional way of looking at an F-contrast is to think of two models: a full model $X$ and a reduced model $X_0$, and to compare the sums of squares of the residuals under these two models. We also know that the numerator of an F test can be expressed with a contrast that imposes a constraint on the parameters of the model. This short notebook makes the relation between these two formulations explicit, so that unusual hypotheses can be tested more easily.
import scipy.linalg as lin
import scipy.stats as sst
import numpy as np
n = 30
# make a design for one factor with 3 levels, 10 observations per level
oneway_anov = np.kron(np.eye(3), np.ones((10, 1)))
#add a trend?
#trend = np.arange(0,n).reshape(n,1)
# add an intercept
intercept = np.ones((n,1))
X = np.hstack((oneway_anov, intercept))
print X
[[ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 1.  0.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  1.  0.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]
 [ 0.  0.  1.  1.]]
n, p = X.shape
Y = np.random.randn(n, 1)
# true coefficients: group effects (1, -1, 0) and intercept 3
csignal = np.array([1., -1., 0., 3.]).reshape(4, 1)
signal = X.dot(csignal)
Y += signal
print np.hstack((Y, X))
[[ 2.77986271  1.  0.  0.  1.]
 [ 3.99773182  1.  0.  0.  1.]
 [ 3.7271806   1.  0.  0.  1.]
 [ 3.8632183   1.  0.  0.  1.]
 [ 4.20122298  1.  0.  0.  1.]
 [ 4.1900779   1.  0.  0.  1.]
 [ 3.23495586  1.  0.  0.  1.]
 [ 3.49377632  1.  0.  0.  1.]
 [ 3.80760708  1.  0.  0.  1.]
 [ 4.15341048  1.  0.  0.  1.]
 [ 1.10881459  0.  1.  0.  1.]
 [ 1.45459043  0.  1.  0.  1.]
 [ 1.49804018  0.  1.  0.  1.]
 [ 2.04922078  0.  1.  0.  1.]
 [ 2.81045098  0.  1.  0.  1.]
 [ 0.11316675  0.  1.  0.  1.]
 [ 2.58020441  0.  1.  0.  1.]
 [ 0.17984951  0.  1.  0.  1.]
 [ 2.83416135  0.  1.  0.  1.]
 [ 0.83353014  0.  1.  0.  1.]
 [ 2.98371249  0.  0.  1.  1.]
 [ 3.74146005  0.  0.  1.  1.]
 [ 3.39001635  0.  0.  1.  1.]
 [ 2.33916425  0.  0.  1.  1.]
 [ 3.51469752  0.  0.  1.  1.]
 [ 4.16145559  0.  0.  1.  1.]
 [ 3.68332627  0.  0.  1.  1.]
 [ 3.08862975  0.  0.  1.  1.]
 [ 2.86829704  0.  0.  1.  1.]
 [ 3.266467    0.  0.  1.  1.]]
Let's imagine that we know the reduced model $X_0$: we would like to test what is in $X$ but not in $X_0$.
We define the orthogonal projector onto the space of $X$ as $P_X = X X^-$, where $X^-$ is the Moore-Penrose pseudo-inverse of $X$.
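As a quick illustration (a minimal sketch, not part of the analysis below), we can build $P_X$ for the design above from the pseudo-inverse and check that it is indeed an orthogonal projector:
P_X = X.dot(lin.pinv(X))                # P_X = X X^-
print np.allclose(P_X.dot(P_X), P_X)    # idempotent
print np.allclose(P_X.T, P_X)           # symmetric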
The F test comparing the reduced model $X_0$ to the full model $X$ is then:
\begin{eqnarray} F_{\nu_1, \nu_2} & = & \frac{(Y^T(I-P_{X_0})Y - Y^T(I-P_{X})Y)/ \nu_{1} }{Y^T(I-P_{X})Y/\nu_{2}} \\ & = & \frac{(SSR(X_0) - SSR(X))/\nu_1}{SSR(X)/\nu_2} \end{eqnarray}with $SSR(X)$ (resp. $SSR(X_0)$) the sum of squares of the residuals of model $X$ (resp. $X_0$), and
\begin{eqnarray} \nu_{1} & = & tr(P_X - P_{X_0}) = tr(R_0 - R_X) \\ \nu_{2} & = & tr(I - P_X) = tr(R_X) \end{eqnarray}where $R_X = I - P_X$ (resp. $R_0 = I - P_{X_0}$) is the residual-forming projector of model $X$ (resp. $X_0$).
from numpy.linalg import matrix_rank
# let's do an F-test comparing the full model X with the reduced model X0 = X L0
def F_test(Y, X, L0):
    '''
    Y:  numpy array
        data
    X:  numpy array
        design matrix
    L0: numpy array
        contrast forming X0 = X L0, the "null" (reduced) model
    returns:
    F, nu1, nu2 : float
    '''
    X0 = X.dot(L0)
    n = X.shape[0]
    p0 = matrix_rank(X0)
    p = matrix_rank(X)
    nu1 = p - p0
    nu2 = n - p
    # numerator: SSR(X0) - SSR(X) = Y'P_X Y - Y'P_X0 Y
    SSR_num = (Y.T.dot(proj(X, Y)) - Y.T.dot(proj(X0, Y)))
    numF = SSR_num/nu1
    print ' ', SSR_num, p, p0
    # denominator: SSR(X) = Y'(I - P_X)Y
    SSR_den = (Y.T.dot(R_proj(X, Y)))
    denF = SSR_den/nu2
    print ' ', SSR_den, n, p
    return numF/denF, nu1, nu2
def proj(X, Y=None):
    # project onto the column space of X: return P_X, or P_X Y if Y is given
    [u, s, vt] = lin.svd(X, full_matrices=False)
    # keep only the singular vectors associated with non negligible singular values
    tol = s.max() * max(X.shape) * np.finfo(s.dtype).eps
    nz = np.where(s > tol)[0]
    if Y is None:
        return u[:, nz].dot(u[:, nz].T)
    else:
        return u[:, nz].dot(u[:, nz].T.dot(Y))
def R_proj(X, Y=None):
    # project onto the space orthogonal to X: return R_X = I - P_X, or R_X Y
    if Y is None:
        return np.eye(X.shape[0]) - proj(X, Y)
    else:
        return Y - proj(X, Y)
Understanding the numerator: Matthew's proposal, which works under some conditions (essentially, when the tested effects are orthogonal to the reduced model).
At observation (line) $i$, write $\mu$ for the reduced-model (grand mean) fit, $\beta_i$ for the fitted group effect, $\epsilon_{0i} = Y_i - \mu$ for the reduced-model residual and $\epsilon_i = Y_i - \beta_i - \mu$ for the full-model residual. Then:
$$ \begin{eqnarray} \epsilon_{0i}^2 - \epsilon_i^2 &=& (Y_i - \mu)^2 - (Y_i - \beta_{i} - \mu)^2 \\ &=& (\epsilon_{i} + \beta_{i})^2 - \epsilon_i^2 \\ &=& \beta_i^2 + 2\epsilon_{i}\beta_{i} \end{eqnarray} $$Summing over $i$, the cross term $\sum_i\epsilon_{i}\beta_{i}$ vanishes because the full-model residuals are orthogonal to every column of $X$, and hence to the fitted effects; we are left with $SSR(X_0) - SSR(X) = \sum_i \beta_i^2$.
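A quick numerical sanity check of this argument (a sketch using the proj and R_proj helpers defined above): the cross term should be numerically zero, and the sum of the squared fitted effects should match the SSR difference computed further down (27.059 in this run).
eps_full = R_proj(X, Y)            # full-model residuals (I - P_X) Y
mu_hat = proj(intercept, Y)        # reduced-model fit: the grand mean
beta_fx = proj(X, Y) - mu_hat      # fitted effects beta_i, one per observation
print eps_full.T.dot(beta_fx)      # cross term: numerically zero
print (beta_fx ** 2).sum()         # should equal SSR(X0) - SSR(X)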
A more general version of this can be written in the following way, with a small bit of matrix algebra:
$$ \begin{eqnarray} SSR(X_0) - SSR(X) & = & Y^T(I-P_{X_0})Y - Y^T(I-P_{X})Y \\ & = & Y^TY - Y^TP_{X_0}Y - Y^TY + Y^TP_{X}Y \\ & = & Y^TP_{X}Y - Y^TP_{X_0}Y \end{eqnarray} $$But $Y^TP_{X}Y$ is also $Y^TP_{X}P_{X}Y = Y^TX(X^TX)^{-1}X^T X(X^TX)^{-1}X^T Y = \beta^T X^T X \beta$
The first equality holds because $P_{X} = P_{X}P_{X}$ (a projector is idempotent), the second by replacing $P_X$ with $X(X^TX)^{-1}X^T$, with $\beta = (X^TX)^{-1}X^T Y$.
Now, we can partition $X$ in $[X_t X_0]$ where $X_0$ is the null model and $X_t$ is the tested part (with parameters $\beta_t$), and if they are orthogonal, then
$$ SSR(X_0) - SSR(X) = \beta^T X^T X \beta - \beta_0^T X_0^T X_0 \beta_0 = \beta_t^T X_t^T X_t \beta_t $$which corresponds to the sum of squares of the (fitted) means, weighted by the number of observations: to see this, take the dot product of $X_t \beta_t$ with itself, and note that $X_t \beta_t$ is the vector of "means".
# reduced model X0 = X L0: keep only the intercept column
L0 = np.eye(p)[:,3:]
(F, n1, n2) = F_test(Y, X, L0)
print (F, n1, n2)
  [[ 27.05935053]] 3 1
  [[ 13.51305476]] 30 3
(array([[ 27.0332089]]), 2, 27)
# split X in Xt and X0 :
X0 = X[:,[3]]
# orthogonalize the first three columns of X against X0: this is the part we want to test
Xt = R_proj(X0, X[:,:3])
# recreate the full design matrix
Xall = np.hstack((Xt, X0))
betah = lin.pinv(Xall).dot(Y)
betah_t = betah[:3,:]
print "betah_t : ", betah_t
tmp = Xt.dot(betah_t)
SSRXt = tmp.T.dot(tmp)
print "Using the sum of square of the beta_t : SSRXt = %f \n" % SSRXt
SSRX_SSRX0 = Y.T.dot(proj(Xall, Y)) - Y.T.dot(proj(X0, Y))
print "Using SSR X - SSR X0 : SSRX_SSRX0 = %f \n" % SSRX_SSRX0
betah_t :  [[ 0.87996109]
 [-1.3187404 ]
 [ 0.43877931]]
Using the sum of square of the beta_t : SSRXt = 27.059351 

Using SSR X - SSR X0 : SSRX_SSRX0 = 27.059351 
from scipy.stats import f as Fdist
Fdist.sf(F,n1,n2)
array([[ 3.58143893e-07]])
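As a cross-check (a sketch, not in the original analysis): testing the full one-way design against the intercept-only reduced model is exactly the classic one-way ANOVA, so scipy.stats.f_oneway on the three groups should reproduce the same F (about 27.03) and p-value.
g1, g2, g3 = Y[:10, 0], Y[10:20, 0], Y[20:30, 0]
print sst.f_oneway(g1, g2, g3)     # should match the (F, p) obtained above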
# same check, but build Xt from two contrast columns so that Xall has no rank deficiency
# recreate the original design
X = np.hstack((oneway_anov, intercept))
Xt = X.dot(np.asarray([[1, -1, 0, 0],[0, 1, -1, 0]]).T)
X0 = X[:,[3]]
Xall = np.hstack((Xt, X0))
betah = np.linalg.pinv(Xall).dot(Y)
#print Xall
#print Xall.T.dot(Xall)
betah_t = betah[:2,:]
print " betah_t ", betah_t
tmp = Xt.dot(betah_t)
SSRXt = tmp.T.dot(tmp)
print "Using the sum of square of the beta_t : SSRXt = %f \n" % SSRXt
SSRX_SSRX0 = Y.T.dot(proj(Xall, Y)) - Y.T.dot(proj(X0, Y))
print "Using SSR X - SSR X0 : SSRX_SSRX0 = %f \n" % SSRX_SSRX0
 betah_t  [[ 0.50142161]
 [ 0.02168588]]
Using the sum of square of the beta_t : SSRXt = 4.820403 

Using SSR X - SSR X0 : SSRX_SSRX0 = 4.820403 
Now let's compute the F test with a contrast. For the moment we simply assume the formula (it is derived at the end of this notebook):
$$ \begin{equation} F_{\nu_1, \nu_2} = \frac{\hat\beta^T \Lambda (\Lambda^T (X^TX)^- \Lambda)^- \Lambda^T \hat\beta / \nu_{1} }{MSE} \end{equation} $$where $MSE$ is the mean square error $SSR(X)/\nu_2$.
def F_con(X, Y, L, verbose=False):
    # note: no explicit check that the contrast is well formed is done here
    B = lin.pinv(X).dot(Y)
    # Numerator: B'L (L'(X'X)^- L)^- L'B
    XtX = X.T.dot(X)
    piXtX = lin.pinv(X.T.dot(X))
    piLtpiXtXL = lin.pinv(L.T.dot(piXtX).dot(L))
    LtB = L.T.dot(B)
    Fnum = LtB.T.dot(piLtpiXtXL).dot(LtB)
    # Denominator: SSR(X) = Y'Y - B'X'X B
    SSRX = Y.T.dot(Y) - B.T.dot(XtX).dot(B)
    v1 = matrix_rank(L)
    v2 = X.shape[0] - matrix_rank(X)
    if verbose:
        print 'Fnum: %f, df1=%d, Fdenom: %f, df2=%d, \n' % (Fnum, v1, SSRX, v2)
    return (Fnum/v1)/(SSRX/v2), v1, v2
B = lin.pinv(X).dot(Y)
print B
#print X
L = np.array([[1., -1., 0, 0],[0., 1., -1., 0]]).T
print L
F,v1,v2 = F_con(X, Y, L, verbose = True)
print F,v1,v2
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-71-e62367a14bfc> in <module>()
      4 L = np.array([[1., -1., 0, 0],[0., 1., -1., 0]]).T
      5 print L
----> 6 F,v1,v2 = F_con(X, Y, L, verbose = True)
      7 print F,v1,v2

<ipython-input-70-ecf4a53ba7b2> in F_con(X, Y, L, verbose)
      6     # Numerator
      7     piXtX = lin.pinv(X.T.dot(X))
----> 8     piLtpiXtXL = lin.pinv(L.T.dot(piXtX).dot(L))
      9     LtB = L.T.dot(B)
     10     Fnum = LtB.T.dot(piLtpiXtXL).dot(LtB)

ValueError: objects are not aligned
[[ 1.0565886 ]
 [ 0.82571892]
 [ 0.36905904]
 [ 1.90725234]
 [ 0.34411422]
 [ 2.25136656]]
[[ 1.  0.]
 [-1.  1.]
 [ 0. -1.]
 [ 0.  0.]]
print X
[[ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]]
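The ValueError above, together with the fact that $\hat\beta$ has six entries and the printed $X$ has six columns, shows what went wrong: the cells were run out of order, so the $X$ in memory is already the six-column two-way design built below, while the contrast $\Lambda$ was written for the four-column one-way design. A minimal fix (a sketch) is to rebuild the one-way design before calling F_con; if Y still holds the one-way data simulated above, this should reproduce the F_test result (F of about 27.03 with 2 and 27 degrees of freedom).
# rebuild the one-way design so that X matches the 4-row contrast
X = np.hstack((oneway_anov, intercept))
L = np.array([[1., -1., 0., 0.], [0., 1., -1., 0.]]).T
F, v1, v2 = F_con(X, Y, L, verbose=True)
print F, v1, v2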
Here we quickly take the example of a two-way ANOVA with no interaction. Say we have two groups of subjects (e.g. patients and normal controls), each divided into 3 groups, for instance according to rehabilitation therapy. Let's build the design matrix:
# simplify the notation
from numpy import kron, ones, eye, zeros
nbs1 = 10; g1 = 3; g2 = 2
n = nbs1 * g1
factor1 = kron(eye(g1), ones((nbs1, 1)))
factor2 = kron(eye(g2), ones((n/(g1*g2), 1)))
factor2 = kron(ones((g1, 1)), factor2)
intercept = ones((n,1))
X = np.hstack((factor1, factor2, intercept))
print X
[[ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  1.  0.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 1.  0.  0.  0.  1.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  1.  0.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  1.  0.  0.  1.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]
 [ 0.  0.  1.  0.  1.  1.]]
Simulate the data :
print csignal
[[ 0.5]
 [-0.5]
 [ 0. ]
 [ 2. ]
 [ 0. ]
 [ 3. ]]
n, p = X.shape
# simulated effect sizes: factor 1 levels (0.5, -0.5, 0), factor 2 levels (2, 0), intercept 3
f1_fx = np.asarray([1., -1., 0]) * .5
f2_fx = np.asarray([2., 0.])
offset = np.asarray([3.])
csignal = np.hstack((f1_fx, f2_fx, offset)).reshape(p,1)
Y = np.random.randn(n, 1)
signal = X.dot(csignal)
Y = Y + signal
print signal
[[ 5.5]
 [ 5.5]
 [ 5.5]
 [ 5.5]
 [ 5.5]
 [ 3.5]
 [ 3.5]
 [ 3.5]
 [ 3.5]
 [ 3.5]
 [ 4.5]
 [ 4.5]
 [ 4.5]
 [ 4.5]
 [ 4.5]
 [ 2.5]
 [ 2.5]
 [ 2.5]
 [ 2.5]
 [ 2.5]
 [ 5. ]
 [ 5. ]
 [ 5. ]
 [ 5. ]
 [ 5. ]
 [ 3. ]
 [ 3. ]
 [ 3. ]
 [ 3. ]
 [ 3. ]]
Test for factor 1:
Fc1 = np.asarray([[1, -1, 0], [0, 1, -1]])
F_fill = zeros((2, p-3))
Fc = np.hstack((Fc1, F_fill))
print Fc
print p
(f, n1, n2) = F_con(X, Y, Fc.T, verbose=True)
# compute the P value:
Fdist.sf(f,n1,n2)
[[ 1. -1.  0.  0.  0.  0.]
 [ 0.  1. -1.  0.  0.  0.]]
6
Fnum: 1.796334, df1=2, Fdenom: 25.258972, df2=26, 
array([[ 0.40937548]])
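By analogy (a sketch, not in the original notebook), factor 2 can be tested with the corresponding contrast; since its simulated effect (2 vs. 0) is much larger than the factor 1 effect, the F should be large and the p-value small.
Fc2 = np.asarray([[0., 0., 0., 1., -1., 0.]])   # factor 2: level 1 vs level 2
(f2, m1, m2) = F_con(X, Y, Fc2.T, verbose=True)
print Fdist.sf(f2, m1, m2)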
We assume from now on that the contrast of parameters we would like to test may impose several constraints on $\beta$, and we write those constraints as $\Lambda^T \beta = 0$.
Setting $\Lambda^T \beta = 0$ is also setting $H^T X \beta = 0$ for some matrix $H$ (the number of columns in $H$ is the number of columns in $\Lambda$).
If we have a (legitimate) contrast of the parameters, what does $H$ look like? How does it relate to a reduced model? Why is all this relevant for fMRI? It is interesting to understand what $H$ is. While $\Lambda$ puts constraints on the parameters of the model, $H$ is the equivalent constraint on the space of $X$ \citep{Christensen2002}. The reduced model to be tested is $Y = X \beta + \epsilon$ $\textbf{and}$ $H^T X \beta = 0$, or, equivalently, $E(Y) \in C(X)$ $\textbf{and}$ $E(Y) \in C(H)^{\perp}$, where $C(H)^{\perp}$ is the space orthogonal to $C(H)$ (the set of vectors with zero correlation with every vector of $C(H)$). The reduced model is therefore a matrix $X_0$ such that $C(X_0) \subseteq C(X)$ $\textbf{and}$ $C(X_0) \subseteq C(H)^{\perp}$.
By choosing $H = X^{T-} \Lambda$, we impose the constraint on $X$ that corresponds to $\Lambda^T \beta = 0$. Since $H^T E(Y) = H^T X \beta = \Lambda^T X^- X \beta = \Lambda^T \beta = 0$, we have $E(Y) \in C(H)^\perp$ with this particular choice of $H$. Also, the columns of $H$ lie in $C(X)$, so $H$ imposes a constraint on $X$ from within the space of $X$, and represents the part of $X$ that is being tested. The part of $X$ that is not being tested, the reduced model, is found by projecting the space orthogonal to $H$ onto $X$: $X_0 = P_X R_H = P_X(I - P_H) = P_X - P_H$. Testing $\Lambda^T \beta = 0$ is equivalent to testing the reduced model $Y = X_0 \gamma + \epsilon$ with $X_0$ as defined above. With this definition of $X_0$, we have $P_X = P_{X_0} + P_H$.
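To make this construction concrete, here is a small numerical sketch (using the proj helper, the two-way design $X$ and the factor 1 contrast Fc from above): build $H = X^{T-}\Lambda$ and check that $P_X - P_H$ behaves as the projector onto the reduced-model space.
Lam = Fc.T                                 # 6 x 2 contrast matrix for factor 1
H = lin.pinv(X.T).dot(Lam)                 # H = X^{T-} Lambda, columns lie in C(X)
P_H = proj(H)
P_X = proj(X)
P_X0 = P_X - P_H                           # candidate projector onto the reduced space
print np.allclose(P_X0.dot(P_X0), P_X0)    # P_X - P_H is itself a projector...
print np.allclose(P_X0.dot(H), 0)          # ...onto a space orthogonal to H
print np.trace(P_H), np.trace(P_X0)        # dimensions: r(Lambda) and rank(X) - r(Lambda)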
Now, the numerator of the F test can be rewritten in a much simpler way, as a function of $\Lambda$ only:
\begin{eqnarray} Y^T P_H Y & = & Y^T X (X^T X)^- X^T H(H^T H)^- H^T X (X^T X)^- X^T Y \\ & = & \hat\beta^T \Lambda (\Lambda^T (X^TX)^- \Lambda)^- \Lambda^T \hat\beta \end{eqnarray}where $\hat\beta = X^- Y$. Dividing by $\nu_1 = r(\Lambda)$, the rank of $\Lambda$, the F test can be rewritten as:
\begin{equation} F_{\nu_1, \nu_2} = \frac{\hat\beta^T \Lambda (\Lambda^T (X^TX)^- \Lambda)^- \Lambda^T \hat\beta / \nu_{1} }{MSE} \end{equation}where $MSE$ is the mean square error $SSR(X)/\nu_2$, exactly the formula implemented in F_con above.
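Continuing the sketch above, we can check numerically that the two roads meet: the F computed from the reduced-model projector $P_{X_0} = P_X - P_H$ equals the F returned by F_con with the same contrast (here, the factor 1 test).
nu1 = matrix_rank(Lam)
nu2 = X.shape[0] - matrix_rank(X)
num = (Y.T.dot(P_X.dot(Y)) - Y.T.dot(P_X0.dot(Y))) / nu1   # Y' P_H Y / nu1
den = Y.T.dot(Y - P_X.dot(Y)) / nu2                        # SSR(X) / nu2
F_contrast, _, _ = F_con(X, Y, Lam)
print num / den, F_contrast        # the two formulations should agree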