Here we will work through an example of using PCA to characterize responses during the optomotor response, which is a behavioral response (swimming) induced by the presentation of a moving visual stimulus. The data are stored on S3 in a form that can be read into Thunder as a Series
object. A Series
object is a distributed collection of indexed records, each of which is a key-value pair, where the key is a label (in this case, the spatial coordinate), and the value is a 1D array (in this case, a time series).
The data set is approximately 50 GB total as integers, and 200 GB when converted to floats.
Before starting, we will import the functions and classes we'll need, as well as set up plotting inside the noteboook. We'll use the seaborn
package for plotting, but this is entirely optional. If you cannot load the seaborn package, just ignore commands involving sns
and everything should otherwise work as described.
from thunder import RegressionModel, PCA, Colorize
from numpy import amax
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_context('notebook')
First we will load the data and do some basic operations to inspect it, like looking at a single time point, and looking at various summary statistics. To load the data, we use the loadExampleEC2
method of the ThunderContext
, which is automatically created for us as tsc
when we start Thunder.
data, params = tsc.loadExampleEC2('zebrafish-optomotor-response')
Compute the mean of each voxel and pack it into a local image on the driver. Check the shape (it's an xyz volume) and look at a maximum intensity projection.
img_mean = data.seriesMean().pack()
img_mean.shape
(1250, 1650, 15)
sns.set_context("poster")
sns.set_style('white')
plt.imshow(amax(img_mean,2), cmap='gray', clim=(0,1500));
As another example, we can select a single time point, pack it into an image, and look at a single time point.
img_single = data.select(100).pack()
sns.set_context("poster")
sns.set_style('white')
plt.imshow(img_single[:,:,10], cmap='gray', clim=(0,1500));
Now compute the mean across voxels, which will yield a time series. This is raw flouresnce; note the unnormalized values on the y-axis. Also note the drift in the signal over time.
y = data.mean()
sns.set_context('notebook')
sns.set_style('darkgrid')
plt.plot(y);
We can apply some simple filtering operations using methods availiable on a Series
object. In this case, we'll perform linear detrending, as well as normalize by subtracting and dividing by a percentile. Note how these changes are reflected in the mean.
y = data.detrend().normalize().mean()
sns.set_context('notebook')
sns.set_style('darkgrid')
plt.plot(y);
The 394 time points of the traces above correspond to 9 repeated presentations of the stimulus. There are several ways to compute a trial-triggered average, but one simple method is through regression, where we build a design matrix such that regressing against this matrix computes the correct average. We loaded that matrix above in the variable params
. First, we build a model from it.
model = RegressionModel.load(params, 'linear')
Filter the data as we did above, and fit the model to the filtered data.
data_filtered = data.detrend().normalize()
params = model.fit(data_filtered)
params.index
['betas', 'stats', 'resid']
We have three sets of parameters from the fit: the betas (or coeffients) for each voxel, corresponding to the triggered-average, as well as an r2 statistic, and residuals. For now, we select the betas, because we'll perform PCA on those. And we cache them to speed up any subsequent operations performed on them. Cacheing these, rather than the raw data, avoids storing the raw data set in RAM (which we otherwise no longer need).
betas = params.select('betas')
betas.cache();
Perform PCA on the trial-triggered responses, and plot the resulting components, which are basis functions in time.
pca = PCA(k=3).fit(betas)
sns.set_context("notebook")
sns.set_palette("husl", 3)
plt.plot(pca.comps.T);
PCA also recovers basis functions in space. These are stored in the attribute scores
, which we can pack into as image just as we did earlier with the mean. Check the dimensions, they should be kxyz, where k is the number of components, because we get a full volume for each component.
imgs = pca.scores.pack()
imgs.shape
(3, 1250, 1650, 15)
For visualization, it's useful to convert the scores from the first two principal components into a polar angle space, where color describes the relative projection onto the two components, and brightness indicates response strength. We can perform this conversion, as well as several other conversions from numerical data into colors, using the Colorize
function.
maps = Colorize("polar", scale=1000).images(imgs[0:2])
Look at the result maps from a couple single planes, as well as a maximum intensity projection
sns.set_context("poster")
sns.set_style("white")
plt.imshow(maps[:,:,2,:]);
sns.set_context("poster")
sns.set_style("white")
plt.imshow(maps[:,:,10,:]);
sns.set_context("poster")
sns.set_style("white")
plt.imshow(amax(maps, 2));
Another useful way to inspect the results of PCA is to look at the scores of individual voxels in a 2D scatter plot. We can do this by extracting a subset of points, and we threshold on the standard deviation to ensure we select voxels with at least some signal.
pts = pca.scores.subset(nsamples=500, thresh=0.0003)
sns.set_style("darkgrid")
sns.set_context("notebook")
clrs = Colorize("polar", scale=1000).points(pts[:,0:2])
a = plt.scatter(pts[:, 0], pts[:, 1], s=150, c=clrs, alpha=0.6, edgecolor='black', linewidth=0.2)
xlim([-0.003, 0.002]); ylim([-0.004, 0.004]);
A final useful visualization is to plot, for each of the same points, the time series associated with that voxel reconstructed in the space spanned by the first two principal components.
recon = map(lambda x: (x[0] * pca.comps[0, :] + x[1] * pca.comps[1, :]).tolist(), pts)
plt.gca().set_color_cycle(clrs)
plt.plot(asarray(recon).T);
plt.ylim(-0.0007,0.0007);