Principles of statistics in Statistical Parametric Mapping¶

This notebook is a basic introduction to the statistics of statistical parametric mapping (SPM). The page is designed for readers who have had very little formal statistical background, and I have tried to keep formulae to a minimum.

For a more technical treatment of the statistical background, see the chapter on the general linear model in the book Human Brain Function (2nd edition)

First we load the interactive graphics and other libraries:

In [1]:

%pylab inline

Populating the interactive namespace from numpy and matplotlib

In [2]:

import numpy as np
import matplotlib
import matplotlib.pyplot as plt

I like gray images, with nearest neighbor resampling:

In [3]:

matplotlib.rcParams['image.cmap'] = 'gray'
matplotlib.rcParams['image.interpolation'] = 'nearest'

Introduction¶

The statistics interface to functional imaging packages can be hard to understand for those of us trained to think of statistics in terms of means and variances.

The reason that the statistical interface to functional imaging packages is so much harder to understand is that SPM, FSL and others choose to take the user much closer to the math; they sacrifice ease of understanding for great flexiblity of the designs that can be analysed with the same interface.

To understand these interfaces, you will need some understanding of the underlying math. I hope to show that this need not be an insurmountable task, even for the relative statistical novice.

To start, it is important to point out that the functional imaging analysis creates statistics by doing a seperate statistical analysis for each voxel. Standard analyses analyze each voxel independently. Specifically, the standard analysis:

does a regression analysis separately at each voxel;
makes t statistics from the results of this analysis, for each voxel;
works out a p value for each t statistic;
shows you an image of the t statistics;
suggests a correction to the significance of the t statistics which takes account of the multiple comparisons in the image.

This document tries to explain how steps 1 to 4 work. I have written a separate tutorial on Random Field Theory that describes the basics of step 5.

Before continuing, it is worth re-stressing that the regression analysis is entirely independent for each voxel.

The statistics it uses to do this are fairly straightforward, and can be found in most statistics textbooks.

Naming of parts¶

This section goes through the standard statistical terms for an analysis, and how they relate to a functional imaging analysis.

The response variable in statistics is some measured data for each observation. A response variable is often referred to as dependent variable. In the case of functional imaging, the response variable is made up of all the values from an individual voxel for each of the scans in the analysis.

The predictor variable contains some value used to predict the data in the response variable. A predictor variable may also be called an independent variable. Each variable contains a possible effect. In our case a predictor variable might be some covariate such as task difficulty which is known for each scan, and which might influence the response variable - in our case, voxel values.

Thus, in functional imaging:

observation : a voxel value, in the voxel we are analysing, for one scan;
response variable : data for all the scans for one voxel (i.e. all the observations);
predictor variable : covariate = effect.

Getting the data¶

In order to run this tutorial, you'll need to download and unpack a set of data files.

The following two commands will download and unpack them on linux:

curl -O http://imaging.mrc-cbu.cam.ac.uk/downloads/Tutscans/egscans.tar.gz
tar xzf egscans.tar.gz

The fetch_scans() function below will do all the downloading in a cross-platform manner, using python-only tools. You should run it even if you downloaded the files manually, as it will compute the necessary paths for the rest of the script (it won't re-download files already present).

In [4]:

import os
import tarfile
import urllib

def fetch_scans():
    # Names of the files in the data bundle
    scans_dir = 'egscans'
    files = [os.path.join(scans_dir, 'snn03055dy%i.img' % i) for i in range(1,13)]
    
    # The remote url and filename for local storage
    data_url = 'http://imaging.mrc-cbu.cam.ac.uk/downloads/Tutscans/egscans.tar.gz'
    tarball = 'egscans.tar.gz'

    # First, check that we don't already have the files, whose names are:
    if all(map(os.path.isfile, files)):
        print 'All files already present, nothing to download.'
        return files
    
    # If we don't have them, make the storage directory, fetch (if needed)
    # and unpack skipping the .mat files that are in the tarball.
    if not os.path.isdir(scans_dir):
        os.mkdir(scans_dir)

    if not os.path.isfile(tarball):
        print "Fetching remote tarball, this may take some time depending on your network..."
        urllib.urlretrieve(data_url, tarball)

    print "Unpacking scans, omitting .mat files"
    tf = tarfile.open(tarball, 'r:gz')
    image_files = [ f for f in tf.getmembers() if not f.name.endswith('.mat')]
    tf.extractall(scans_dir, image_files)
    print "All files unpacked into", scans_dir, ', done!'
    return files

files = fetch_scans()

All files already present, nothing to download.

Starting the analysis¶

Let's now make actual images out of these files

In [5]:

import nibabel as nib
images = map(nib.load, files)

In [6]:

# grand mean for data
GM = 50

Now we load the affine transformation from the first image (it's the same for all) and extract the conversion parameters from voxels to MNI space (in mm).

In [7]:

# transformation from voxel no to mm
M = images[0].get_affine()
# mm to voxel no
iM = np.linalg.inv(M)
# coordinates of voxel of interest in mm (MNI space)
posmm = [-20.0, -42, 34]
# coordinates in voxel space (in homogenous coordinates)
posvox = np.dot(iM, posmm + [1])
# We grab the spatial part of the output.  Since we want to use it as an 
# index, we need to make it a tuple
posvox = tuple(posvox[:3].astype(int))
posvox

Out[7]:

(29, 35, 42)

We need to define a small utility function commonly used in SPM, we'll call it global_mean.

In [8]:

def global_mean(img):
    """Compute a "sorted global mean" of an image.

    This returns the mean of the image voxels that are in the range above 
    1/8 of the mean."""
    
    m = img.mean()/8.0
    return img[img>m].mean()

With this utility, we can now get the data for the voxel of interest across all the images, and compute the global mean of each scans:

In [9]:

nimgs = len(images)
vdata = np.zeros(nimgs)
gdata = np.zeros(nimgs)
for i, V in enumerate(images):
    d = V.get_data()
    vdata[i] = d[posvox]
    gdata[i] = global_mean(d)
gdata

Out[9]:

array([ 0.00459615,  0.004154  ,  0.00462038,  0.00439584,  0.00417357,
        0.00442748,  0.00416478,  0.00471157,  0.00419319,  0.00455416,
        0.00420025,  0.00405441])

We can display slices from the scans. Here is slice 30 of the last scan:

In [10]:

last_image = images[-1]
last_data = last_image.get_data()
imshow(last_data[:, :, 30])

Out[10]:

<matplotlib.image.AxesImage at 0x717f350>

Plot globals¶

In [11]:

plt.plot(gdata, vdata, 'x')
plt.xlabel('TMVV for scan')
plt.ylabel('Voxel value for -20 -42 34')
plt.title('The relationship of overall signal to values for an individual voxel')

Out[11]:

<matplotlib.text.Text at 0x71a2b90>

In [12]:

# proportionally scale data
pvdata = vdata / gdata
# scale to global mean of 50
Y = pvdata * GM
# our covariate
ourcov = np.array([5,4,4,2,3,1,6,3,1,6,5,2])

In [13]:

# plot data 
plot(ourcov, Y, 'x')
xlabel('Task difficulty')
ylabel('PS voxel value')
xlim(0, 7)
title('The relationship of task difficulty to voxel value');

In [14]:

# guess intercept, slope, estimated points
guessic   = 55
guesslope = 0.5
guessPts  = guessic + guesslope*ourcov

For plotting it will be convenient to have the boundaries of our plot region available

In [15]:

minx = 0
maxx = ourcov.max()+1 # min max x for plots
axs = np.array([minx, maxx, Y.min()*0.98, Y.max()*1.02])

In [16]:

# plot data with guess intercept and slope
plot(ourcov, Y, 'x')
plot(axs[:2], guessic+guesslope*axs[:2], ':') # guessed line
# plot guessed residuals
for i, c in enumerate(ourcov):
  plot([c, c], [Y[i], guessPts[i]], color='r')
plt.axis(axs)
xlabel('Task difficulty')
ylabel('PS voxel value')
xlim(0, 7)
title('A first guess at a linear relationship, and its residuals');

In [17]:

# residuals from guess slope
guessRes = Y - guessPts

Now we construct the design matrix $X$.

In [18]:

X  = np.column_stack([ourcov, ones(nimgs)])

We would like to display it in a useful way

In [19]:

def scale_design_mtx(X):
    """utility to scale the design matrix for display

    This scales the columns to their own range so we can see the variations 
    across the column for all the columns, regardless of the scaling of the 
    column.
    """
    mi, ma = X.min(axis=0), X.max(axis=0)
    col_neq = (ma - mi) > 1.e-8
    Xs = np.ones_like(X)
    mi = mi[col_neq]
    ma = ma[col_neq]
    Xs[:,col_neq] = (X[:,col_neq] - mi)/(ma - mi)
    return Xs

In [20]:

def show_design(X, design_title):
    """ Show the design matrix nicely """
    figure()
    plt.gray()
    imshow(scale_design_mtx(X), interpolation='nearest')
    title(design_title)

In [21]:

# display using our fancy function
show_design(X, 'Design matrix for the first analysis')

We need to know the degrees of freedom. In this simple case, we have two effects (the intercept and the slope) and 12 observations, leaving us with 10 degrees of freedom. In general, the number of effects we have is given by the number of independent columns in the design matrix $X$ above. This is the column rank of $X$. This is given by the numpy function numpy.linalg.matrix_rank for numpy >= 1.5, but in case you have an earlier version of numpy, a crude calculation of the matrix rank is a couple of lines of code:

In [22]:

def matrix_rank(X):
    U, S, V = np.linalg.svd(X)
    # Number of singular values above an arbitrary small threshold
    return np.sum(S > 1e-5)

Now we can calculate the degrees of freedom in a more general way:

In [23]:

df = nimgs - matrix_rank(X)
# mean SoS for guess slope
guessRSS = (guessRes**2).sum() / df

The analysis giving the least squares fit is very simple: we matrix multiply the pseudoinverse of $X$ (denoted $X^+$) by the data, to get the estimation of the model parameters, the $\hat\beta$ 's : $\hat\beta = X^+ Y$

In [24]:

# the analysis, giving slope and constant
B = dot(pinv(X), Y)
print B

[  0.63928135  54.39383796]

In [25]:

# plot data with new least squares line
plot(ourcov, Y, 'x')
bound = np.array([minx, maxx])
plot(bound, bound*B[0]+B[1], 'r')
axis(axs)
xlabel('Task difficulty')
ylabel('PS voxel value')
title('The least squares linear relationship');

We'll need some statistical machinery from scipy

In [26]:

from scipy.stats import t as tdist, norm as normdist

And now we can compute our stats for this contrast: The $\textbf{Residual Sum of Square}$ (RSS) is :

$$ RSS = \sum_{i=1}^{12} (Y_i - X_i \beta)^2$$

$X_i$ is the $i$th row of the matrix $X$. The $\textbf{Mean Residual Sum of Square}$ is :

$$ MRSS = RSS / \textit{df} $$

where $\textit{df}$ are the degrees of freedom, computed as the number of observations (here, 12) minus the rank of the design matrix (ie, the number of independent vectors in X). The $MRSS$ is the noise variance estimation: $\widehat{\sigma^2}$.

The $\textbf{Standard Error}$ (SE) of $c \beta$ is computed with $\textrm{SE} = \sqrt{\widehat{\sigma^2} c (X^t X)^+ c^t} $ (why is that so is to be expanded here ...)

The t test is then : $t = \frac{c \beta}{\textrm{SE}} $

In [27]:

# Contrast
C = np.array([1, 0])
# t statistic and significance test
RSS   = ((Y - dot(X, B))**2).sum()
MRSS  = RSS / df
SE    = np.sqrt(MRSS*dot(C, (dot(pinv(dot(X.T, X)), C.T))))
t     = dot(C, B)/SE
ltp   = tdist(df).cdf(t) # lower tail p
p     = 1-ltp           # upper tail p 

# print results to window
print 'First TD analysis: t= %2.2f, p= %0.6f' % (t, p)

First TD analysis: t= 7.95, p= 0.000006

We can package the analysis up into a function:

In [28]:

def t_stat(Y, X, C):
    """ betas, t statistic and significance test given data, design matrix, contrast
    """
    # Calculate the parameters
    B = dot(pinv(X), Y)
    RSS   = ((Y - dot(X, B))**2).sum(axis=0)
    # Recalculate df
    df =  X.shape[0] - matrix_rank(X)
    MRSS  = RSS / df
    SE    = np.sqrt(MRSS*dot(C, (dot(pinv(dot(X.T, X)), C.T))))
    t     = dot(C, B)/SE
    ltp   = tdist(df).cdf(t) # lower tail p
    p = 1 - ltp # upper tail p
    return B, t, df, p

# Results are the same
print 'First TD analysis again: B = %s, t= %2.2f, df= %d, p= %0.6f' % t_stat(Y, X, C)

First TD analysis again: B = [  0.63928135  54.39383796], t= 7.95, df= 10, p= 0.000006

In [29]:

# save data for analysis in some package that reads csv type files
np.savetxt("voxdata.txt", Y)

Now the analysis for the added covariate.

This and all the other analyses here use exactly the same code as above.

In [30]:

# design matrix, df
X = np.column_stack([ourcov, np.arange(1, nimgs+1), np.ones(nimgs)])
show_design(X, 'Design matrix for added covariate')

In [31]:

# Contrast
C = np.array([1, 0, 0])
# the analysis
B, t, df, p = t_stat(Y, X, C)
print 'Added covariate analysis: t= %2.2f, p= %0.6f' % (t, p)

Added covariate analysis: t= 7.81, p= 0.000013

Conditions analysis¶

In [32]:

X = np.zeros((nimgs, 2))
X[0:6, 0] = 1
X[6:, 1] = 1
show_design(X, 'Design matrix for two conditions')

In [33]:

B = np.linalg.pinv(X).dot(Y)
B

Out[33]:

array([ 56.54455457,  56.71809077])

These are the means for each set of observations (why?):

In [34]:

np.mean(Y[:6]), np.mean(Y[6:])

Out[34]:

(56.544554566379446, 56.718090771077186)

In [35]:

# Contrast for activation minus rest
C = np.array([-1, 1])
# the analysis
B, t, df, p = t_stat(Y, X, C)
print 'Analysis for two conditions: t= %2.2f, p= %0.6f' % (t, p)

Analysis for two conditions: t= 0.23, p= 0.409795

Some extra analyses to show how it all works¶

In [36]:

# do an ANCOVA for globals, instead of proportional scaling
# GM scale
Yanc = vdata * GM / np.mean(gdata)

In [37]:

X = np.column_stack((ourcov, gdata, np.ones(nimgs)))
show_design(X, 'Design with ANCOVA for global signal')

In [38]:

# Contrast for task covariate only
C = np.array([1, 0, 0])
# the analysis
B, t, df, p = t_stat(Yanc, X, C)
print 'TD analysis with ANCOVA for global: t= %2.2f, p= %0.6f' % (t, p)

TD analysis with ANCOVA for global: t= 7.15, p= 0.000027

Another voxel

In [39]:

# coordinates of voxel of interest in mm (MNI space)
posmm2  = [6, 0, 6]
# Coordinates in voxel space
posvox2 = np.dot(iM, posmm2 + [1])
# We grab the spatial part of the output.  Since we want to use it as an 
# index, we need to make it a tuple
posvox2 = tuple(posvox2[:3].astype(int))
posvox2

Out[39]:

(42, 56, 28)

In [40]:

# Get the data for this voxel
vdata2 = np.zeros(nimgs)
for i, V in enumerate(images):
    d = V.get_data()
    vdata2[i] = d[posvox2]

In [41]:

# Proportional and global mean scale
Y2 = vdata2 / gdata * 50

In [42]:

X = np.column_stack((ourcov, np.ones(nimgs)))
C = np.array([1, 0])
# the analysis
B, t, df, p = t_stat(Y2, X, C)
print 'First TD analysis second voxel: t= %2.2f, p= %0.6f' % (t, p)

First TD analysis second voxel: t= 5.84, p= 0.000082

Analyzing two voxels at the same time, but independently:

In [43]:

Y_both = np.column_stack((Y, Y2))

In [44]:

# the analysis
B, t, df, p = t_stat(Y_both, X, C)
print 'First TD analysis first voxel: t= %2.2f, p= %0.6f' % (t[0], p[0])
print 'First TD analysis second voxel: t= %2.2f, p= %0.6f' % (t[1], p[1])

First TD analysis first voxel: t= 7.95, p= 0.000006
First TD analysis second voxel: t= 5.84, p= 0.000082

In [45]:

B.shape

Out[45]:

(2, 2)

You can see that the result when analyzing two voxels at the same time are identical to the results when analyzing them separately.