By Ariel Rokem, late of this university
The platform for this workshop will be files such as this one, which interleave explanation text, code and data visualizations.
To execute each cell of the notebook, hit shift-enter.
That will put you in the next cell. There are two types of input cells, controlled by the menu bar at the top of this page. One type is cells like this one, called "markdown cells". They are called that because they contain a simple, minimalistic form of markup as input to the browser, which then renders these cells in a useful way. You can find a pretty good markdown cheat-sheet here. See - for example - that was a hyper-link to a web page!
These cells can also be used to write math-y stuff, using $\LaTeX$. To write math that will be rendered in this way, we simply enter a piece of syntactically correct LaTeX between two `$` signs. Like so: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Another type of input cell in the notebook is a code cell. These cells contain code, and when they are executed, this piece of code is sent to the Python interpreter. For example:
a = 1
print a
1
In contrast to some other commonly used scripting languages (notably Matlab), Python is an object-oriented language. In Python, that means that everything is an object. [And there's a big difference between having an object data-type and having almost everything be an object].
For example, if you type `a.` and hit the `tab` key, you will see all the attributes of this object:
print a.real
print a.imag
1
0
So far, that looks a lot like the Matlab `struct` array. However, an important difference is that some of these attributes can be functions that can be called on/from the object. These are also referred to as 'methods' of the object. For example:
print a.bit_length
<built-in method bit_length of int object at 0x100311c88>
To tell what this method does, we can get IPython to give us a brief description of this function. This is done by attaching a `?` after it:
a.bit_length?
This should open a sub-window at the bottom of this window with the 'docstring' of this method. If the author of the function was kind enough to provide useful information in the docstring, we can use this information to know what this method does and what are its expected inputs. This function has no inputs, so to call it, we simply provide the method with empty parentheses:
a.bit_length()
1
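Tab completion and the `?` suffix are IPython conveniences. In a plain Python session, the built-in `dir` and `callable` functions and the `__doc__` attribute give roughly the same information. A minimal sketch, reusing `a` from above:

```python
a = 1

# dir() lists the names of all attributes and methods of an object
attribute_names = dir(a)
print('bit_length' in attribute_names)

# The docstring that `a.bit_length?` displays is stored on the method itself
print(a.bit_length.__doc__)

# callable() distinguishes methods from plain data attributes
print(callable(a.bit_length))
print(callable(a.real))
```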
There is a lot more to say about objects and object-orientation (and what it is good for). For now, this will suffice.
Python comes with several built-in data structures. Some of them look deceptively like structures in other languages. For example, here's a 'list':
b = [1,2,3,4]
print len(b)
4
Here's a classic gotcha. Python indexes start at 0:
print(b[0])
1
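Zero-based indexing has a few companions that are worth seeing next to it: negative indices and slices. A short sketch on the same list (the variable names `first`, `last` and `middle` are only for illustration):

```python
b = [1, 2, 3, 4]

first = b[0]        # index 0 is the first item
last = b[-1]        # negative indices count from the end
middle = b[1:3]     # slices include the start index and exclude the stop index

print(first)
print(last)
print(middle)
```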
Lists can be changed:
b[0] = 2
print b
[2, 2, 3, 4]
They have a ton of interesting attributes and methods. Feel free to explore them as you just learned above. Importantly, though they look a lot like a Matlab vector, they are not the Python analogue of that. Wait just a few more cells for that one to show up.
print b * 2
[2, 2, 3, 4, 2, 2, 3, 4]
print b+1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-5bd19596e089> in <module>()
----> 1 print b+1

TypeError: can only concatenate list (not "int") to list
print b + b
[2, 2, 3, 4, 2, 2, 3, 4]
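So `*` repeats a list and `+` concatenates; neither does arithmetic on the items. If you do want element-by-element arithmetic on a plain list, one option is a list comprehension, which is a compact form of a loop. A sketch, assuming `b` still holds the values from above:

```python
b = [2, 2, 3, 4]

# A list comprehension builds a new list by applying an expression to each item
b_plus_one = [item + 1 for item in b]
b_doubled = [item * 2 for item in b]

print(b_plus_one)
print(b_doubled)
```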
Let's look at one useful list method and notice some interesting (and possibly surprising) behaviors:
new_b = b.append(5)
What do you expect `new_b` to look like?
print new_b
None
This is because the `append` method has no output. It changes the object on which it is called and doesn't return anything (more precisely, it returns `None`). We'll see more about what that means when we examine functions below. In the meanwhile, here's what `b` now looks like:
print b
[2, 2, 3, 4, 5]
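The same split between "changes the object in place" and "returns a new object" shows up elsewhere. A small sketch with a separate, made-up list, so that `b` is left untouched:

```python
numbers = [3, 1, 2]

# sort() is a method: it reorders the list in place and returns None
result_of_method = numbers.sort()
print(result_of_method)
print(numbers)

# sorted() is a built-in function: it returns a new list and leaves its input alone
original = [3, 1, 2]
new_list = sorted(original)
print(new_list)
print(original)

# Concatenation also returns a new list rather than changing the original
longer = original + [5]
print(longer)
```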
'tuples' are very similar to lists:
c = (1,2,3,4)
print c[0]
1
In contrast to lists, they are immutable:
c[0] = 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-5673658116c7> in <module>()
----> 1 c[0] = 2

TypeError: 'tuple' object does not support item assignment
And they are also not analogous to Matlab vectors:
print c + c
(1, 2, 3, 4, 1, 2, 3, 4)
So far, looks kinda useless. It'll become more interesting in just a minute, after we introduce 'dictionaries', or as they are affectionately known 'dicts':
d = {'key_the_first':1000.0, 'key_the_second':'woot!', 10:1 , (10,11):12}
This data structure holds key-value pairs, such that when you refer to one, you can get the other:
print d['key_the_first']
1000.0
print d[10]
1
Note that while lists cannot be keys to a dict, tuples can:
print d[(10,11)]
12
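To see the difference for yourself, here is a sketch of what happens when a list is used as a key. The `try`/`except` is only there so that the error gets caught and printed instead of stopping the cell:

```python
d = {(10, 11): 12}

# A tuple of hashable items is itself hashable, so it can serve as a key
print(d[(10, 11)])

# Lists are mutable and not hashable, so using one as a key raises a TypeError
try:
    d[[10, 11]] = 12
    list_key_worked = True
except TypeError as error:
    list_key_worked = False
    print(error)
```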
In CS-speak, a thing such as a dict is sometimes referred to as a hash-table and is considered a very useful thing to have. For example, think what you would have to do in order to count the frequency of appearance of certain words in a big text file.
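With a dict, the word-counting problem takes only a few lines. A minimal sketch on a made-up sentence (it uses a `for` loop, which is covered just below):

```python
text = 'the cat sat on the mat and the dog sat too'

word_counts = {}
for word in text.split():
    # get() returns the current count, or 0 if the word has not been seen yet
    word_counts[word] = word_counts.get(word, 0) + 1

print(word_counts['the'])
print(word_counts['sat'])
```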
As in other languages you might know, there are `if`, `else`, `elif` (not `elseif`), `for` and `while` control statements. There are a few interesting idiosyncrasies. First of all, if you have something that has several components, you can probably just loop over it, using the `for` loop:
for x in b:
    print x
2
2
3
4
5
Second, you might notice that there's no explicit delimiter to that for loop. How does python know where it started and where it ended? The answer is that the indentation signals that. In contrast to other languages, indentation in Python is not only a stylistic preference, but a syntactical requirement. That is, if you do not indent the contents of a control block, you will get an error:
for x in b:
print x
IndentationError: expected an indented block
This level of stickler-ish insistence on such things as where the white-space appears has made people call Python a "bondage and discipline" language. But maybe that's your thing?
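Going back to the first point: lists are not the only thing a `for` loop can walk over. A sketch with a tuple, a dict and the built-in `enumerate`, using cut-down copies of the objects defined earlier:

```python
b = [2, 2, 3, 4, 5]
d = {'key_the_first': 1000.0, 'key_the_second': 'woot!'}

# Looping over a tuple works the same way as looping over a list
for x in (1, 2, 3):
    print(x)

# Looping over a dict yields its keys
keys_seen = []
for key in d:
    keys_seen.append(key)
print(sorted(keys_seen))

# enumerate() pairs each item with its position, starting at 0
positions = []
for idx, x in enumerate(b):
    positions.append((idx, x))
print(positions)
```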
Obviously, we can nest several different control structures in each other:
for lookfor in range(10):
    if lookfor in b:
        print 'I think we have a %s in b'%lookfor
I think we have a 2 in b
I think we have a 3 in b
I think we have a 4 in b
I think we have a 5 in b
This also demonstrates `range` and `in`.
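The `elif` and `while` statements were named above but not shown. A short sketch of both; the labels and the countdown are made up for illustration:

```python
b = [2, 2, 3, 4, 5]

# Classify each item with if / elif / else
labels = []
for x in b:
    if x < 3:
        labels.append('small')
    elif x == 3:
        labels.append('three')
    else:
        labels.append('big')
print(labels)

# A while loop repeats for as long as its condition holds
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)
    countdown -= 1
print(steps)
```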
The idea behind modules and importing is that, apart from very few built-in constructs and types, when you start a Python interpreter, it knows very little. To allow it to know more, you need to assign variables (which we did above), you need to load things from files (which we'll see later), or you need to import modules that contain more objects, functions, etc. For example, let's import a favorite module of mine, the `os` module:
import os
`import`ing this name into our name-space now makes a lot of other names available through it. Try typing `os.` and tab-completing on that. And some of the names under that have additional names under them. For example:
curdir = os.path.curdir
listdir = os.listdir(curdir)
print os.path.join(curdir, listdir[1])
./.gitignore
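What you see printed depends on what happens to be in your current directory, and on the order in which `os.listdir` returns it. A few more names from `os` and `os.path` that tend to be useful, sketched with a made-up file name that does not need to exist:

```python
import os

# os.getcwd() returns the absolute path of the current working directory
cwd = os.getcwd()
print(cwd)

# os.path.join() glues path components together with the right separator
# ('some_file.txt' is a made-up name; the file does not need to exist)
full_path = os.path.join(cwd, 'some_file.txt')

# basename() and splitext() take a path apart again
file_name = os.path.basename(full_path)
root, extension = os.path.splitext(file_name)
print(file_name)
print(extension)
```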
Defining your own function takes no other files, compilation steps, or anything. Just define it and it's there:
def add_two_numbers(num1, num2):
    """
    This is the function docstring. Ideally it is informative. For example:

    Add two numbers to each other.

    Parameters
    ----------
    num1 : int/float
        A number
    num2 : int/float
        Another number

    Returns
    -------
    sum_it : the sum of the inputs
    """
    sum_it = num1 + num2
    return sum_it
add_two_numbers(1,2)
3
add_two_numbers?
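The `append` example earlier promised more on what it means for a call to return nothing. A sketch: `add_and_forget` is a hypothetical variant of the function above that computes the sum but has no `return` statement, so calling it gives back `None`, just like `append` did.

```python
def add_two_numbers(num1, num2):
    """Add two numbers to each other."""
    sum_it = num1 + num2
    return sum_it


def add_and_forget(num1, num2):
    """Add two numbers, but do not return the result (hypothetical example)."""
    sum_it = num1 + num2


# Arguments can be passed by position or by name
by_position = add_two_numbers(1, 2)
by_name = add_two_numbers(num2=2, num1=1)
print(by_position)
print(by_name)

# Without a return statement, the result of the call is None
forgotten = add_and_forget(1, 2)
print(forgotten)
```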
If we have time, let's look at a slightly more interesting example:
import numpy as np  # The array library - we'll come back to this one

def word_count(url):
    """
    Count word frequencies in the text body of a given url

    Parameters
    ----------
    url : string
        The address of a url to be scraped

    Returns
    -------
    vals : array
        Frequency counts, sorted in descending order
    keys : array
        The words, in the same order as their counts
    """
    try:
        from urllib.request import urlretrieve  # Python 3
    except ImportError:
        from urllib import urlretrieve  # Python 2
    # urlretrieve downloads to a temporary file and returns (filename, headers)
    url_get = urlretrieve(url)
    with open(url_get[0]) as f:
        txt = f.read()
    start_idx = txt.find('<body>')
    end_idx = txt.find('</body>')
    new_txt = txt[start_idx:end_idx]
    new_txt = new_txt.split(' ')
    word_dict = {}
    for word in new_txt:
        # Get rid of all kinds of html crud and the empty character:
        if not('>' in word or '<' in word or '=' in word or '-' in word or '&' in word or word == ''):
            if word in word_dict:
                word_dict[word] += 1
            else:
                word_dict[word] = 1
    # list() keeps this working where values() and keys() return views
    vals = np.array(list(word_dict.values()))
    keys = np.array(list(word_dict.keys()))
    # Indices that would sort the counts in ascending order
    sort_idx = np.argsort(vals)
    # Reverse both arrays so that the most frequent words come first
    return (vals[sort_idx][::-1], keys[sort_idx[::-1]])
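The last few lines of `word_count` are the least obvious part, so here is the sorting step on its own. A sketch on a tiny, made-up dict standing in for the real word counts (the counts are all different, so the ordering is unambiguous):

```python
import numpy as np

# A tiny, made-up word-count dict standing in for the real one
word_dict = {'the': 5, 'fMRI': 2, 'of': 4, 'V1': 1}

vals = np.array(list(word_dict.values()))
keys = np.array(list(word_dict.keys()))

# argsort() returns the indices that would sort vals in ascending order
sort_idx = np.argsort(vals)

# Reversing with [::-1] turns ascending order into descending order
sorted_vals = vals[sort_idx][::-1]
sorted_keys = keys[sort_idx[::-1]]

print(sorted_vals)
print(sorted_keys)
```

Indexing `keys` with the same (reversed) indices is what keeps each word lined up with its count.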
Let's apply this to a paper about fMRI:
word_arr = word_count('http://www.journalofvision.org/content/11/5/12.full?')
For simplicity, let's examine the top 50 results:
to_plot_vals = word_arr[0][:50]
to_plot_words = word_arr[1][:50]
print to_plot_vals
print to_plot_words
[557 492 491 336 213 209 198 149 148 134 134 120 105 91 81 77 77 77 76 74 71 62 62 54 53 47 46 46 42 42 41 39 38 37 35 34 34 34 33 33 33 32 31 30 30 30 29 29 28 27]
['the' 'in' 'of' '\n' 'to' 'and' 'a' 'is' 'that' 'text"\n' 'reference' 'visual' 'fMRI' 'with' 'V1' 'by' 'response' 'attention' 'for' 'be' 'BOLD' 'human' 'signal' 'responses' 'on' 'not' 'are' 'et' 'neurons' 'stimulus' 'as' 'effects' 'from' 'This' 'electrophysiological' 'signals' 'cortex' 'between' 'macaque' 'or' 'an' 'neuronal' 'effect' 'but' 'The' 'spatial' 'time' 'this' 'primary' 'may']
I think that it's fair to say that the first 11 words are not very interesting, so let's ignore those and plot the others, wordle-style:
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline]. For more information, type 'help(pylab)'.
fig, ax = plt.subplots(1)
for word_idx in range(len(to_plot_words[11:])):
    ax.text(np.random.rand(), np.random.rand(), to_plot_words[11:][word_idx], fontsize=to_plot_vals[11:][word_idx])
ax.set_axis_off()
For help: tab-complete, `something?`, and when in doubt, Google is your friend.
Python is an object-oriented language with powerful libraries for representation of numerical objects, and for scientific analysis and visualization. Using the ipython notebook, we can interactively analyze and visualize data in an iterative fashion. We can now move on to examine some more interesting data. Namely, we will start looking at some MRI data, using neuroimaging-specific libraries.