We discovered a strange truncation of strongly negative values when converting T statistics --> P values --> Z scores. First we will show the strangeness.
This is a group map for a "story" contrast from a language task in the Human Connectome Project (HCP). The task has alternating blocks of solving math problems and listening to a story; this contrast is for the "story" blocks.
We concatenated each single-subject cope1.nii.gz image representing this contrast in time, for a total of 486 subjects (timepoints), and ran FSL's randomise with 5000 permutations.
Now we can read in the file, and first look at the image itself and the T-distribution.
import matplotlib
import matplotlib.pylab as plt
import numpy as np
%matplotlib inline
import nibabel as nib
from nilearn.plotting import plot_stat_map, plot_roi
from scipy.spatial.distance import pdist
from scipy.stats import norm, t
import seaborn as sns
all_copes_file = "/home/vanessa/Desktop/tfMRI_LANGUAGE_STORY.nii_tstat1.nii.gz"
all_copes = nib.load(all_copes_file)
plot_stat_map(all_copes)
print("Here is our map created with randomise for all 486 subjects, for the story contrast")
Now I want to point out something about this map - we have a set of strongly negative outliers.
# Function to flag outliers
def plot_outliers(image, n_std=6):
    mr = nib.load(image)
    data = mr.get_fdata()
    mean = data.mean()
    std = data.std()
    dev_up = mean + n_std * std
    dev_down = mean - n_std * std
    # Mask of voxels more than n_std standard deviations from the mean
    empty_brain = np.zeros(data.shape)
    empty_brain[data >= dev_up] = 1
    empty_brain[data <= dev_down] = 1
    outlier_nii = nib.nifti1.Nifti1Image(empty_brain, affine=mr.affine, header=mr.header)
    plot_roi(outlier_nii)
plot_outliers(all_copes_file)
Whether those outliers should be there at all is a separate question entirely; for the purposes of this problem, any conversion from T to Z should preserve them. Let's now look at the distribution of the data.
data = all_copes.get_fdata()
data = data[data != 0]
sns.distplot(data.flatten(), label="Original T-Stat Data")
plt.legend()
print("Here is the distribution of the t-stat map for all 486 subjects, for the story contrast")
We have a heavy left tail, meaning lots of strongly negative values.
We next convert the T scores into P values using the survival function from the scipy.stats t module. The survival function is 1 minus the cumulative distribution function (CDF); for each T value it gives the probability of observing a value at least that large.
The degrees of freedom should be the number of subjects from which the group map was derived, minus 2.
dof=486 - 2
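Before computing the p-values on the map, a quick standalone sketch (using the same scipy.stats imports as above) shows two things: the survival function really is 1 - CDF, and for strongly negative T values it gets so close to 1 that float64 can no longer represent the difference, which is where the trouble starts:

```python
import numpy as np
from scipy.stats import t

dof = 484  # same degrees of freedom as in this walkthrough

# The survival function is simply 1 - CDF:
ts = np.array([-3.0, 0.0, 3.0])
print(np.allclose(t.sf(ts, df=dof), 1.0 - t.cdf(ts, df=dof)))  # True

# For strongly negative t, sf(t) is so close to 1 that float64 can no
# longer represent the difference -- the tail information is destroyed
# before we ever convert to Z:
print(t.sf(-7, df=dof) < 1.0)    # True: still (barely) below 1.0
print(t.sf(-30, df=dof) == 1.0)  # True: rounds to exactly 1.0
```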
data = all_copes.get_fdata()
p_values = t.sf(data, df=dof)
# norm.isf(1.0) would return -inf, so nudge exact ones just below 1.0
p_values[p_values == 1] = 0.99999999999999
sns.distplot(p_values.flatten(), label="P-Values from T-Stat Data")
plt.legend()
print("Here are the p-values created from the t-stat map, including all zeros in the map when we calculate")
Now we can use the inverse survival function from scipy.stats.norm to "undo" the p-values back into normal (Z) scores.
z_values = norm.isf(p_values)
sns.distplot(z_values.flatten(), label="Z-Values from T-Stat Data")
plt.legend()
print("Here are the z-values created from the t-stat map, including all zeros in the map when we calculate")
But now we see something strange. The distribution looks almost truncated. When we look at the new image, the strong negative values (previously outliers in ventricles) aren't there either:
# Need to make sure we look at the same slices :)
def plot_outliers(image, cut_coords, n_std=6):
    mr = nib.load(image)
    data = mr.get_fdata()
    mean = data.mean()
    std = data.std()
    dev_up = mean + n_std * std
    dev_down = mean - n_std * std
    # Mask of voxels more than n_std standard deviations from the mean
    empty_brain = np.zeros(data.shape)
    empty_brain[data >= dev_up] = 1
    empty_brain[data <= dev_down] = 1
    outlier_nii = nib.nifti1.Nifti1Image(empty_brain, affine=mr.affine, header=mr.header)
    plot_roi(outlier_nii, cut_coords=cut_coords)
Z_nii = nib.nifti1.Nifti1Image(z_values, affine=all_copes.affine, header=all_copes.header)
nib.save(Z_nii, "/home/vanessa/Desktop/Zimage.nii")
plot_outliers("/home/vanessa/Desktop/Zimage.nii",cut_coords=(7,0,13))
And here is the problem. The outliers are clearly gone, and it's because the distribution has been truncated:
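To see exactly where the truncation comes from: a float64 p-value can get no closer to 1.0 than about 1e-16, so the most negative Z recoverable from `norm.isf` is only about -8.2, no matter how extreme the original T was. A minimal standalone sketch (names here are my own, not from the pipeline above):

```python
import numpy as np
from scipy.stats import norm, t

dof = 484

# Moderate scores round-trip fine (t ~ z at this many dof):
print(abs(norm.isf(t.sf(3.0, df=dof)) - 3.0) < 0.05)  # True

# The largest double below 1.0 sets a hard floor on recoverable Z:
p_max = np.nextafter(1.0, 0.0)
print(norm.isf(p_max))  # roughly -8.2: the truncation floor

# Once sf rounds to exactly 1.0, isf returns -inf (hence the
# p_values == 1 clamp earlier in this notebook):
print(norm.isf(t.sf(-30, df=dof)))  # -inf
```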
I found this paper, which summarizes the problem:
The code below is adapted from the code provided in the paper above. Thank you!
data = all_copes.get_fdata()
# Select just the nonzero voxels
nonzero = data[data != 0]
# We will store our results here
Z = np.zeros(len(nonzero))
# Split the values at zero
k1 = nonzero <= 0
k2 = nonzero > 0
t1 = nonzero[k1]
t2 = nonzero[k2]
# For t <= 0: the lower-tail p-value is tiny and precise, so no truncation
p_values_t1 = t.cdf(t1, df=dof)
z_values_t1 = norm.ppf(p_values_t1)
# For t > 0: mirror through the (equally tiny) upper tail
p_values_t2 = t.cdf(-t2, df=dof)
z_values_t2 = -norm.ppf(p_values_t2)
Z[k1] = z_values_t1
Z[k2] = z_values_t2
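The sign-split steps above can be wrapped into one small reusable helper (a hypothetical function of my own naming, not taken from the paper):

```python
import numpy as np
from scipy.stats import norm, t

def t_to_z(t_vals, dof):
    """Convert T statistics to Z scores without left-tail truncation,
    by always working through the small tail on each side."""
    t_vals = np.asarray(t_vals, dtype=np.float64)
    z = np.zeros_like(t_vals)
    neg = t_vals <= 0
    # Negative side: the lower-tail p-value is tiny and precise
    z[neg] = norm.ppf(t.cdf(t_vals[neg], df=dof))
    # Positive side: mirror through the upper tail
    z[~neg] = -norm.ppf(t.cdf(-t_vals[~neg], df=dof))
    return z

z = t_to_z([-12.0, -2.0, 0.0, 2.0, 12.0], dof=484)
print(z)  # extreme negatives survive as finite, strongly negative Z
```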
sns.distplot(Z, label="Z-Values from T-Stat Data")
plt.legend()
Did we fix it?
empty_nii = np.zeros(all_copes.shape)
empty_nii[all_copes.get_fdata() != 0] = Z
Z_nii_fixed = nib.nifti1.Nifti1Image(empty_nii, affine=all_copes.affine, header=all_copes.header)
nib.save(Z_nii_fixed, "/home/vanessa/Desktop/Zfixed.nii")
plot_stat_map(Z_nii_fixed)