There are infinte values in the data when we don't separate the distribution into two:
import matplotlib
import matplotlib.pylab as plt
import numpy as np
%matplotlib inline
import nibabel as nib
from nilearn.plotting import plot_stat_map, plot_roi
from scipy.spatial.distance import pdist
from scipy.stats import norm, t
import seaborn as sns
all_copes_file = "/home/vanessa/Desktop/tfMRI_LANGUAGE_STORY.nii_tstat1.nii.gz"
all_copes = nib.load(all_copes_file)
data = all_copes.get_data()
# Let's select just the nonzero voxels
nonzero = data[data!=0]
# Degrees of freedom is number of subjects -2
dof = 486 -2
Zdata = norm.ppf(t.cdf(data, dof))
# Are there inf in the data?
infs = np.isinf(Zdata).any()
print "Are there infs in the data?: %s" %(infs)
print "There are %s infs in the data" %(len(np.where(np.isinf(Zdata))[0]))
Are there infs in the data?: True There are 7982 infs in the data
This first issue can be traced to the case when the p_values are equal to 1.0
p_values = t.cdf(data, dof)
Zdata = norm.ppf(p_values)
index = np.where(np.isinf(Zdata))
print "The inf values occur where p-values are %s" %(np.unique(p_values[index]))
The inf values occur where p-values are [ 1.]
This can be helped by fudging the number a bit: instead of 1.0 we can fill in 0.999999
p_values[p_values==1.0] = 0.999999999
Zdata = norm.ppf(p_values)
# Are there inf in the data?
infs = np.isinf(Zdata).any()
print "Are there infs in the data?: %s" %(infs)
print "There are %s infs in the data" %(len(np.where(np.isinf(Zdata))[0]))
Are there infs in the data?: False There are 0 infs in the data
Here I thought we were in the clear! But then I looked at the distribution of the map:
Z_nii = nib.nifti1.Nifti1Image(Zdata,affine=all_copes.get_affine(),header=all_copes.get_header())
nib.save(Z_nii,"/home/vanessa/Desktop/Zdata.nii")
On the left is the Zdata that we just made - if it were a lizard, someone cut part of his right tail off! The map on the right is generated by first splitting at zero.
Apologies for the different scalings for the two distributions, I just opened them up in MriCron :)
The concern is that we are losing potentially the strongest activations in the conversion.