# Overview

In this lab, we'll walk through some of the features of the Neurosynth core tools (http://github.com/neurosynth/neurosynth). By the end, you'll know how to:

• Import the modules you'll need to perform basic analyses
• Create a new Neurosynth dataset object from provided text files
• Run a simple term-based meta-analysis
• Run a slightly more complicated term-based meta-analysis
• Perform meta-analytic contrasts
• Generate seed-based coactivation maps

## Installation

We're not going to cover installation here--sorry! See the quickstart guide in the GitHub repository for that.

## Running this code

To run the examples below, you have several options:

• Run from within IPython Notebook. This is the preferred approach if you have IPython Notebook installed. Put this file (neurosynth_demo.ipynb) in its own directory. Now launch the IPython dashboard from a command line prompt by typing "ipython notebook" (without quotes), then open the Neurosynth demo. You should now be able to run any cell in the notebook.

• Run in IPython. From a terminal prompt, launch IPython (just type ipython). Now paste the code blocks below.

• Write a standalone script. Save the code blocks below to a separate file and run it as a Python script. This is not recommended as it's non-interactive.

In all cases, you'll need to make sure that the data files used below (database.txt and features.txt) are accessible at the locations indicated (a subfolder called data/), or change the paths to point to the correct location. You can obtain the files from the neurosynth-data GitHub repository, or by following the instructions in the core tools repository.
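Before running anything expensive, it can help to confirm that the data files are where the examples expect them. The snippet below is a minimal sketch using only the standard library; the helper name `missing_data_files` is made up for illustration and is not part of Neurosynth.

```python
import os

def missing_data_files(data_dir="data", names=("database.txt", "features.txt")):
    """Return the expected data files that are not present in data_dir."""
    return [name for name in names
            if not os.path.isfile(os.path.join(data_dir, name))]

# Report anything that is missing from the default data/ subfolder
missing = missing_data_files()
if missing:
    print("Missing data files: %s" % ", ".join(missing))
else:
    print("All data files found.")
```

If your files live somewhere else, pass that directory as `data_dir` and adjust the paths in the cells below to match.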

## Importing modules

Let's start with the basics. In Python, most modules (i.e., organized chunks of code) are inaccessible by default. Other than a few very basic built-in functions, you'll need to explicitly include every piece of code you want to work with. This may seem cumbersome, but it has the nice effect of (a) making sure you always know exactly what dependencies your code has, and (b) minimizing the memory footprint of your app by only including functionality you know you'll need.

Like most Python packages, Neurosynth consists of several modules arranged into a semi-sensible tree structure. For this lab, we'll need functionality available in several modules, which we can include like so:

In [1]:
# Core functionality for managing and accessing data
from neurosynth import Dataset
# Analysis tools for meta-analysis, image decoding, and coactivation analysis
from neurosynth import meta, decode, network
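If the import cell above fails, the usual cause is that the package is not installed in the Python environment you are running. A quick way to check, sketched here with the standard library and assuming Python 3.4 or later, is to ask the import system whether it can locate a module. The helper name `is_installed` is hypothetical.

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None

# Check the package this lab depends on
print("neurosynth installed: %s" % is_installed("neurosynth"))
```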


## Creating a new dataset

Next, we create a Dataset, which is the core object most Neurosynth tools operate on. We initialize a Dataset by passing in a database file, which is essentially just a giant list of activation coordinates and associated study IDs. This file can be downloaded from the Neurosynth website or installed from the data submodule (see the Readme for instructions).

Creating the object will take a few minutes on most machines, as we need to process about 200,000 activations drawn from nearly 6,000 studies. Once that's done, we also need to add some features to the Dataset. Features are just variables associated with the studies in our dataset; literally any dimension a study could be coded on can constitute a feature that Neurosynth can use. In practice, the default set of features included in the data download includes 500 psychological terms (e.g., 'language', 'emotion', 'memory', etc.) that occur with some frequency in the dataset. So when we're talking about the "emotion" feature, we're really talking about how frequently each study in the Dataset uses the word 'emotion' in the full-text of the corresponding article.

In [ ]:
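# Create a new Dataset instance
dataset = Dataset('data/database.txt')
# Add the features described above (per-study term frequencies)
dataset.add_features('data/features.txt')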



Because this takes a while, we'll save our Dataset object to disk. That way, the next time we want to use it, we won't have to sit through the whole creation operation again:

In [ ]:
dataset.save('dataset.pkl')


In future sessions, instead of waiting, we can just load the dataset from file:

In [ ]:
dataset = Dataset.load('dataset.pkl')   # Note the capital D in the second Dataset--load() is a class method
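The save/load pattern above, with `load()` called on the class rather than on an instance, can be illustrated with a toy class. This is only a sketch of the general idea using `pickle`; `ToyDataset` is a hypothetical stand-in, and nothing here is meant to describe how Neurosynth stores its data internally.

```python
import os
import pickle
import tempfile

class ToyDataset(object):
    """Hypothetical stand-in for an object that is expensive to build."""

    def __init__(self, activations):
        self.activations = activations

    def save(self, filename):
        # Write the object's state to disk
        with open(filename, "wb") as f:
            pickle.dump({"activations": self.activations}, f)

    @classmethod
    def load(cls, filename):
        # A class method: builds a new instance without needing an existing one
        with open(filename, "rb") as f:
            state = pickle.load(f)
        return cls(**state)

path = os.path.join(tempfile.mkdtemp(), "toy_dataset.pkl")
ToyDataset([(0, 20, 28)]).save(path)
restored = ToyDataset.load(path)   # called on the class, not on an instance
print(restored.activations)
```

Because `load()` is a class method, you don't need to build a dataset first just to read one back from disk.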


## Doing stuff with Neurosynth

Now that our Dataset has both activation data and some features, we're ready to start doing some analyses! By design, Neurosynth focuses on facilitating simple, fast, and modestly useful analyses. This means you probably won't break any new ground using Neurosynth, but you should be able to supplement results you've generated using other approaches with a bunch of nifty analyses that take just 2-3 lines of code.

### Simple feature-based meta-analyses

The most straightforward thing you can do with Neurosynth is use the features we just loaded above to perform automated large-scale meta-analyses of the literature. Let's see what features we have:

In [ ]:
dataset.get_feature_names()


If the loading process went smoothly, this should return a list of over 3,000 features (the exact count depends on the version of the feature file). We can use these terms--either in isolation or in combination--to select articles for inclusion in a meta-analysis. For example, suppose we want to run a meta-analysis of emotion studies. We could operationally define a study of emotion as one in which the authors used words starting with 'emo' with high frequency:

In [ ]:
ids = dataset.get_studies(features='emo*', frequency_threshold=0.05)


Here we're asking for a list of IDs of all studies that use words starting with 'emo' (e.g., 'emotion', 'emotional', 'emotionally') with a loading greater than 0.05. In the default feature set we loaded above, values reflect tf-idf frequencies. Tf-idf is a normalized frequency metric that ranges from 0 to 1. Let's find out how many studies are in our list:

In [ ]:
len(ids)


The resulting set includes 1264 studies (if you get a different number, you're probably using a different version of the feature file, but all the examples in this notebook should still run just fine).
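To make the wildcard-plus-threshold selection concrete, here is a toy version that runs on a hand-made table of feature weights. It is a sketch under stated assumptions: the study IDs and weights are invented, the helper `select_studies` is hypothetical, and summing the weights of all matching features before comparing against the threshold is an assumption about how multiple matches are combined.

```python
from fnmatch import fnmatchcase

# Hypothetical per-study feature weights (stand-ins for tf-idf values);
# real values come from the feature file, not from this toy table.
study_features = {
    "study_a": {"emotion": 0.08, "emotional": 0.02, "memory": 0.00},
    "study_b": {"emotion": 0.01, "emotional": 0.01, "memory": 0.20},
    "study_c": {"emotion": 0.03, "emotional": 0.04, "memory": 0.00},
}

def select_studies(study_features, pattern, threshold):
    """Return IDs whose summed weight over matching features exceeds threshold."""
    selected = []
    for study_id, weights in sorted(study_features.items()):
        # Assumption: weights of all features matching the pattern are summed
        total = sum(w for name, w in weights.items()
                    if fnmatchcase(name, pattern))
        if total > threshold:
            selected.append(study_id)
    return selected

print(select_studies(study_features, "emo*", 0.05))
```

In this toy table, no single 'emo' feature in `study_c` passes 0.05 on its own; the study is selected only because its matching features are summed.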

Once we've got a set of studies we're happy with, we can run a simple meta-analysis, prefixing all output files with the string 'emotion' to distinguish them from other analyses we might run:

In [ ]:
# Run a meta-analysis on emotion
ids = dataset.get_studies(features='emo*', frequency_threshold=0.05)
ma = meta.MetaAnalysis(dataset, ids)
ma.save_results('.', 'emotion')


You should now have a set of NIfTI-format brain images on your drive that display various meta-analytic results. The image names are somewhat cryptic; see documentation elsewhere for details. It's important to note that the meta-analysis routines currently implemented in Neurosynth aren't very sophisticated; they're designed primarily for efficiency (most analyses should take just a few seconds), and take multiple shortcuts compared to other approaches like ALE or MKDA. With that caveat in mind (one that will hopefully be remedied in the near future), Neurosynth gives you a streamlined and quick way of running large-scale meta-analyses of fMRI data. Of course, all of the images you could generate using individual features are already available on the Neurosynth website, so there's probably not much point in doing this kind of thing yourself unless you've defined entirely new features.

### More complex feature-based meta-analyses

Fortunately, we're not constrained to using single features in our meta-analyses. Neurosynth implements a parsing expression grammar, which is a fancy way of saying you can combine terms according to syntactic rules--in this case, basic logical operations.

For example, suppose we want to restrict our analysis to studies of emotion that do NOT use the terms 'reward' or 'pain', which we might construe as somewhat non-prototypical affective states. Then we could do the following:

In [ ]:
ids = dataset.get_studies(expression='emo* &~ (reward* | pain*)', frequency_threshold=0.05)
ma = meta.MetaAnalysis(dataset, ids)
ma.save_results('.', 'emotion_without_reward_or_pain')
print("Found %d studies." % len(ids))


This meta-analysis is somewhat more restrictive than the previous one (1108 instead of 1264), and the result should theoretically be at least somewhat more spatially specific.

There's no inherent restriction on how many terms you combine or how deeply you nest logical expressions within parentheses, but the cardinal rule of GIGO (garbage in, garbage out) always applies, so if your expression is very specific and the number of studies drops too far (in practice, sensible results are unlikely with fewer than 50 studies), don't expect to see much.
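The logic of an expression like 'emo* &~ (reward* | pain*)' maps directly onto set operations on study IDs. The sketch below uses invented integer IDs and plain Python sets to show the idea; it does not reproduce Neurosynth's expression parser. The minimum of 50 studies is the rule of thumb mentioned above.

```python
# Hypothetical sets of study IDs that pass the threshold for each term
emo = {1, 2, 3, 4, 5}
reward = {2, 6}
pain = {5, 7}

# "emo AND NOT (reward OR pain)" is a set difference against a union
selected = emo - (reward | pain)
print(sorted(selected))

# Rule of thumb from the text: very small samples rarely give sensible maps
MIN_STUDIES = 50
if len(selected) < MIN_STUDIES:
    print("Only %d studies selected; results are unlikely to be meaningful."
          % len(selected))
```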

### Meta-analytic contrasts

In addition to various logical operations, one handy thing you can do with Neurosynth is perform meta-analytic contrasts. That is, you can identify voxels in which the average likelihood of activation being reported differs between two sets of studies. For example, let's say you want to meta-analytically contrast studies that use the term 'recollection' with studies that use the term 'recognition'. You can do this by defining both sets of studies separately, and then passing them both to the meta-analysis object:

In [ ]:
# Get the recognition studies and print some info...
recog_ids = dataset.get_studies(features='recognition', frequency_threshold=0.05)
print("We found %d studies of recognition" % len(recog_ids))

# Repeat for recollection studies
recoll_ids = dataset.get_studies(features='recollection', frequency_threshold=0.05)
print("We found %d studies of recollection" % len(recoll_ids))

# Run the meta-analysis
ma = meta.MetaAnalysis(dataset, recog_ids, recoll_ids)
ma.save_results('.', 'recognition_vs_recollection')


This produces the same set of maps we've seen before, except the images now represent a meta-analytic contrast between two specific sets of studies, rather than between one set of studies and all other studies in the database.

It's worth noting that meta-analytic contrasts generated using Neurosynth should be interpreted very cautiously. Remember that this is a meta-analytic contrast rather than a meta-analysis of contrasts. In the above example, we're comparing activation in all studies in which the term recognition shows up often to activation in all studies in which the term recollection shows up often (implicitly excluding studies that use both terms). We are NOT meta-analytically combining direct contrasts of recollection and recognition, which would be a much more sensible thing to do (but is something that can't be readily automated).
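To see what "comparing two sets of studies" means at a single voxel, here is a deliberately simplified sketch. It drops studies that fall in both sets and then compares the proportion of studies in each set that report activation at a voxel. All data and names are hypothetical, and a plain difference of proportions stands in for the statistical tests a real meta-analysis would use.

```python
def activation_rate(study_ids, active_voxels, voxel):
    """Proportion of the given studies that report activation at `voxel`."""
    if not study_ids:
        return 0.0
    hits = sum(1 for s in study_ids if voxel in active_voxels[s])
    return hits / float(len(study_ids))

# Hypothetical data: study ID -> set of voxel indices with reported activation
active_voxels = {
    "s1": {10, 11}, "s2": {10}, "s3": {12}, "s4": {10, 12}, "s5": {12},
}
recog_ids = {"s1", "s2", "s4"}
recoll_ids = {"s3", "s4", "s5"}

# Drop studies that appear in both sets so each study counts on one side only
both = recog_ids & recoll_ids
recog_only = recog_ids - both
recoll_only = recoll_ids - both

for voxel in (10, 12):
    diff = (activation_rate(recog_only, active_voxels, voxel)
            - activation_rate(recoll_only, active_voxels, voxel))
    print("voxel %d: difference in activation rate = %+.2f" % (voxel, diff))
```

A positive difference means the voxel is reported more often in the recognition set, a negative one more often in the recollection set. Either way, the comparison is between groups of studies, not between conditions within a study.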

### Seed-based coactivation maps

By now you're all familiar with seed-based functional connectivity. We can do something very similar at a meta-analytic level (e.g., Toro et al., 2008; Robinson et al., 2010; Chang et al., 2012) using the Neurosynth data. Specifically, we can define a seed region and then ask what other regions tend to be reported in studies that report activity in our seed region. The Neurosynth tools make this very easy to do. We can either pass in a mask image defining our ROI, or pass in a list of coordinates to use as the centers of spheres. In this example, we'll do the latter:

In [ ]:
# Seed-based coactivation
network.coactivation(dataset, [[0, 20, 28]], threshold=0.1, r=10, output_dir='.', prefix='acc_seed')


Here we're generating a coactivation map for a sphere with radius 10 mm centered on an anterior cingulate cortex (ACC) voxel. The threshold argument indicates what proportion of voxels within the ACC sphere have to be activated for a study to be considered 'active'. We write the resulting images to the current directory, prefixed with 'acc_seed'.
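The role of the threshold argument can be sketched with a toy model in which both the seed and each study's reported activation are sets of grid points. Everything here is an illustrative assumption: the 2 mm grid, the 6 mm spheres standing in for a study's reported activation, the "at or above threshold" comparison, and the helper names `sphere_voxels` and `is_active`.

```python
import itertools

def sphere_voxels(center, radius, step=2):
    """Grid points (in mm, spaced `step` apart) within `radius` of `center`."""
    cx, cy, cz = center
    offsets = range(-radius, radius + 1, step)
    return {
        (cx + dx, cy + dy, cz + dz)
        for dx, dy, dz in itertools.product(offsets, repeat=3)
        if dx * dx + dy * dy + dz * dz <= radius * radius
    }

def is_active(study_voxels, seed_voxels, threshold):
    """True if the fraction of seed voxels the study activates meets threshold."""
    overlap = len(seed_voxels & study_voxels)
    return overlap / float(len(seed_voxels)) >= threshold

# Seed: 10 mm sphere around the ACC coordinate used above
seed = sphere_voxels((0, 20, 28), radius=10)

# Two hypothetical studies, each represented by one 6 mm sphere of activation
study_near = sphere_voxels((2, 20, 28), radius=6)    # overlaps the seed
study_far = sphere_voxels((40, -60, 10), radius=6)   # nowhere near the seed

print(is_active(study_near, seed, threshold=0.1))
print(is_active(study_far, seed, threshold=0.1))
```

Studies that pass this test form the 'active' group, which is then contrasted with the remaining studies to produce the coactivation map.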

In general, meta-analytic coactivation produces results that are quite similar to, but substantially less spatially specific than, those of time series-based functional connectivity. Note that if you're only interested in individual points in the brain, you can find precomputed coactivation maps for spheres centered on every gray matter voxel in the brain on the Neurosynth website.

# Decode images