Software Carpentry Bootcamp
Scripps Research Institute, La Jolla, CA
November 2012
Prepared by: Justin Kitzes
Thanks to: Matt Davis
The most basic component of any programming language are "things", also called variables or (in special cases) objects.
The most common basic "things" in Python are integers, floats, strings, booleans, and some special objects of various types. We'll meet many of these as we go through the lesson.
TIP: To run the code in a cell quickly, press Ctrl-Enter.
TIP: To quickly create a new cell below an existing one, type Ctrl-m then b. Other shortcuts for making, deleting, and moving cells are in the menubar on the left of the screen. To hide the menubar, click on the vertical gray divider separating it from this main page.
# A thing
2
2
# Use print to show multiple things in the same cell
# Note that you can use single or double quotes for strings
print 2
print 'hello'
2 hello
# Things can be stored as variables
a = 2
b = 'hello'
c = True # This is case sensitive
print a, b, c
2 hello True
# The type function tells us the type of thing we have
print type(a)
print type(b)
print type(c)
<type 'int'> <type 'str'> <type 'bool'>
Just storing data in variables isn't much use to us. Right away, we'd like to start performing operations and manipulations on data and variables.
There are three very common means of performing an operation on a thing.
All of the basic math operators work like you think they should for numbers. They can also
do some useful operations on other things, like strings. There are also boolean operators that
compare quantities and give back a bool
variable as a result.
# Standard math operators work as expected on numbers
a = 2
b = 3
print a + b
print a * b
print a ** b
5 6 8
# Except for division
print 2 / 3
print 2 / 3.
0 0.666666666667
# There are also operators for strings
print 'hello' + 'world'
print 'hello' * 3
# print hello / 3
helloworld hellohellohello
# Boolean operators compare two things
a = (1 > 3)
b = (3 == 3)
print a
print b
print a & b
False True False
These will be very familiar to anyone who has programmed in any language, and work like you would expect.
# There are thousands of functions that operate on things
print type(3)
print len('hello')
print round(3.3)
<type 'int'> 5 3.0
TIP: To find out what a function does, you can type it's name and then a question mark to get a pop up help window. Or, to see what arguments it takes, you can type its name, an open parenthesis, and hit tab.
#round?
#round(
round(3.14159, 2)
3.14
TIP: Many useful functions are not in the Python built in library, but are in external scientific packages. These need to be imported into your Python notebook (or program) before they can be used. Probably the most important of these are numpy and matplotlib.
# Many useful functions are in external packages
# Let's meet numpy
import numpy as np
# To see what's in a package, type the name, a period, then hit tab
#np?
#np.
# Some examples of numpy functions and "things"
print np.sqrt(4)
print np.pi # Not a function, just a variable
print np.sin(np.pi)
2.0 3.14159265359 1.22464679915e-16
Before we get any farther into the Python language, we have to say a word about "objects". We will not be teaching object oriented programming in this workshop, but you will encounter objects throughout Python (in fact, even seemingly simple things like ints and strings are actually objects in Python).
In the simplest terms, you can think of an object as a small bundled "thing" that contains within itself both data and functions that operate on that data. For example, strings in Python are objects that contain a set of characters and also various functions that operate on the set of characters. When bundled in an object, these functions are called "methods".
Instead of the "normal" function(variable, arguments)
syntax, methods are called using the
syntax variable.method(arguments)
.
# A string is actually an object
a = 'hello, world'
print type(a)
<type 'str'>
# Objects have bundled methods
#a.
print a.capitalize()
print a.replace('l', 'X')
Hello, world heXXo, worXd
Throughout this lesson, we will successively build towards a program that will calculate the logistic growth of a population of bacteria in a petri dish (or bears in the woods, if you prefer). The exercises will build on each other - if at any time you get behind, you can open the notebook python-full to review answers to the previous exercises and catch up.
As a reminder, a commonly used discrete time equation for logistic population growth is
n(t+1) = n(t) + r n(t) [1 - n(t) / K]
where n(t) is the population size at time t, r is the net per capita growth rate, and K is the carrying capacity of the dish/woods.
To get started, write Python expressions that do the following:
r
, K
, and n0
, setting these equal to 0.6, 100, and 10, respectively.n1
and calculate it's value. Do the same for n2
.n2
- what is it?n2
is larger than 20, and print out a line that says "n2 more than 20: "followed by the answer.
n2
is an integer, and print out a line that says "n2 is int: " followedby the answer (HINT: look at the methods of n2
.)
n1
and n2
so that these values are rounded to the nearestinteger.
r = 0.6
K = 100
n0 = 10
n1 = n0 + r*n0*(1 - n0/K)
n2 = n1 + r*n1*(1 - n1/K)
print n0, n1, n2
print type(n2)
print 'n2 more than 20: ', n2 > 20
print 'n2 is int: ', n2.is_integer()
10 16.0 24.064 <type 'float'> n2 more than 20: True n2 is int: False
Once the number of variables that you are interested in starts getting large, working with them all individually starts to get unwieldy. To help stay organized, we can use collections of things.
Probably 99% of your work in scientific Python will use one of four types of collections: lists, tuples, dictionaries, and numpy arrays. We'll look quickly at each of these and what they can do for you.
Lists are probably the handiest and most flexible type of container. Lists are declared with
square brackets []. Individual elements of a list can be selected using the syntax a[ind]
.
# Lists are created with square bracket syntax
a = ['hi', 'hello', 'yo']
print a, type(a)
['hi', 'hello', 'yo'] <type 'list'>
# Lists (and all collections) are also indexed with square brackets
# NOTE: The first index is zero, not one
print a[0]
print a[1]
hi hello
# Lists can be sliced by putting a colon between indexes
# NOTE: The end value is not inclusive
print a[0:2]
['hi', 'hello']
# You can leave off the start or end if desired
print a[:2]
print a[2:]
print a[:]
print a[:-1]
['hi', 'hello'] ['yo'] ['hi', 'hello', 'yo'] ['hi', 'hello']
# Lists are objects, like everything else, and have methods such as append
a.append('hiya')
print a
a.append([1,2])
print a
a.pop()
print a
['hi', 'hello', 'yo', 'hiya'] ['hi', 'hello', 'yo', 'hiya', [1, 2]] ['hi', 'hello', 'yo', 'hiya']
TIP: A 'gotcha' for some new Python users is that many collections, including lists, actually store pointers to data, not the data itself. This makes the language much more efficient, but can lead to some very hard to trace bugs...
# Assigning b = a does not make a copy of a, rather b and a now point to the same data
b = a
print a, b
b[0] = 'bye'
print a, b
b[0] = 'hi'
# To make a copy, use either list(a) or a[:]
b = list(a)
print a, b
b[0] = 'seeya'
print a, b
['hi', 'hello', 'yo', 'hiya'] ['hi', 'hello', 'yo', 'hiya'] ['bye', 'hello', 'yo', 'hiya'] ['bye', 'hello', 'yo', 'hiya'] ['hi', 'hello', 'yo', 'hiya'] ['hi', 'hello', 'yo', 'hiya'] ['hi', 'hello', 'yo', 'hiya'] ['seeya', 'hello', 'yo', 'hiya']
Copy your code from Exercise 1 into the box below, and do the following:
n0
, n1
, n2
, and n3
are stored in a list and not asseparate individual variables. HINT: You can start off by declaring an empty list using the syntax
n = []
, and then append each new calculated value of nt
to the list.
followed by the result. Extract the last value in two different ways: first, by using the index for the last item in the list, and second, presuming that you do not know how long the list is.
r
and K
to make sure that your cell still runs correctly and givesreasonable answers.
######################################
# This code deletes our old variables
try: del n0, n1, n2, r, K
except: pass
######################################
r = 0.6
K = 100
n = []
n.append(10) # Append n0 in the first location
n.append(round(n[0] + r*n[0]*(1 - n[0]/K))) # Append n1
n.append(round(n[1] + r*n[1]*(1 - n[1]/K))) # Append n2
n.append(round(n[2] + r*n[2]*(1 - n[2]/K))) # Append n3
print n
print "Grew by a factor of ", n[3]/n[0] # or n[-1]/n[0]
[10, 16.0, 24.0, 35.0] Grew by a factor of 3.5
We won't say a whole lot about tuples except to mention that they basically work just like lists, with two major exceptions:
You'll see tuples come up throughout the Python language, and over time you'll develop a feel for when to use them. In general, they're often used instead of lists to group items when the position in the collection is critical to understanding the item's meaning, such as (x,y), and when you want to make sure that you don't accidentally modify any of the items later.
Dictionaries are the collection to use when you want to store and retrieve things by their names (or some other kind of key) instead of by their position in the collection. A good example is a set of model parameters, each of which has a name and a value. Dictionaries are declared using {}.
# Make a dictionary of model parameters
params = {'n0': 10, 'r': 0.5}
print params
print params['r']
{'n0': 10, 'r': 0.5} 0.5
params['K'] = 200
print params
{'n0': 10, 'K': 200, 'r': 0.5}
print params['c']
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) /Users/jkitzes/Projects/swc/2012-11-scripps/python/<ipython-input-477-c0b07735ba0a> in <module>() ----> 1 print params['c'] KeyError: 'c'
Even though numpy arrays (often written as ndarrays, for n-dimensional arrays) are not part of the core Python libraries, they are so useful in scientific Python that we'll include them here in the core lesson. Numpy arrays are collections of things, all of which must be the same type, that work similarly to lists (as we've described them so far). The most important are:
Arrays can be created from existing collections such as lists, or instantiated "from scratch" in a few useful ways.
When getting started with scientific Python, you will probably want to try to use ndarrays whenever possible, saving the other types of collections for those cases when you have a specific reason to use them.
# Make an array from a list
alist = [2, 3, 4]
blist = [5, 6, 7]
a = np.array(alist)
b = np.array(blist)
print a, type(a)
print b, type(b)
[2 3 4] <type 'numpy.ndarray'> [5 6 7] <type 'numpy.ndarray'>
# Do arithmetic on arrays
print a**2
print np.sin(a)
print a * b
print a.dot(b), np.dot(a, b)
[ 4 9 16] [ 0.90929743 0.14112001 -0.7568025 ] [10 18 28] 56 56
# Boolean operators work on arrays too, and they return boolean arrays
print a > 2
print b == 6
c = a > 2
print c
print type(c)
print c.dtype
[False True True] [False True False] [False True True] <type 'numpy.ndarray'> bool
# Indexing arrays
print a[0:2]
c = np.random.rand(3,3)
print c
print '\n'
print c[1:3,0:2]
c[0,:] = a
print '\n'
print c
[2 3] [[ 0.74050824 0.26058895 0.17520427] [ 0.86331106 0.10234613 0.74713171] [ 0.53824664 0.57843474 0.53802683]] [[ 0.86331106 0.10234613] [ 0.53824664 0.57843474]] [[ 2. 3. 4. ] [ 0.86331106 0.10234613 0.74713171] [ 0.53824664 0.57843474 0.53802683]]
# Arrays can also be indexed with other boolean arrays
print a
print b
print a > 2
print a[a > 2]
print b[a > 2]
b[a == 3] = 77
print b
[2 3 4] [ 5 77 7] [False True True] [3 4] [77 7] [ 5 77 7]
# ndarrays have attributes in addition to methods
#c.
print c.shape
print c.prod()
(3, 3) 0.26539136205
# There are handy ways to make arrays full of ones and zeros
print np.zeros(5), '\n'
print np.ones(5), '\n'
print np.identity(5), '\n'
[ 0. 0. 0. 0. 0.] [ 1. 1. 1. 1. 1.] [[ 1. 0. 0. 0. 0.] [ 0. 1. 0. 0. 0.] [ 0. 0. 1. 0. 0.] [ 0. 0. 0. 1. 0.] [ 0. 0. 0. 0. 1.]]
# You can also easily make arrays of number sequences
print np.arange(10)
print np.arange(0, 10, 2)
print np.linspace(5, 10, 33)
[0 1 2 3 4 5 6 7 8 9] [0 2 4 6 8] [ 5. 5.15625 5.3125 5.46875 5.625 5.78125 5.9375 6.09375 6.25 6.40625 6.5625 6.71875 6.875 7.03125 7.1875 7.34375 7.5 7.65625 7.8125 7.96875 8.125 8.28125 8.4375 8.59375 8.75 8.90625 9.0625 9.21875 9.375 9.53125 9.6875 9.84375 10. ]
Copy your code from Exercise 2 into the box below, and do the following:
Make sure to pre-allocate space for the array using np.zeros() or something similar. Pre-allocate an
array of length 100, as if we were going to fill in 100 time steps, including n0.
2. Imagine that each discrete time step actually represents 0.25 of an hour. Create an array t
storing
the time, in hours, of each step of the calculations (for example, t0 is 0, t1 is 0.25, t2 is 0.5, etc.)
up to 100 time steps (the final time will thus be 24.75).
3. Use boolean indexing to extract the value of n
corresponding to a t
of 0.5.
r = 0.6
K = 100
n = np.zeros(100)
n[0] = 10
n[1] = round(n[0] + r*n[0]*(1 - n[0]/K))
n[2] = round(n[1] + r*n[1]*(1 - n[1]/K))
n[3] = round(n[2] + r*n[2]*(1 - n[2]/K))
print n
t = np.arange(0, 25, 0.25)
t = np.linspace(0, 25, 100, endpoint=False)
print t
print n[t == 0.5]
[ 10. 15. 21. 29. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [ 0. 0.25 0.5 0.75 1. 1.25 1.5 1.75 2. 2.25 2.5 2.75 3. 3.25 3.5 3.75 4. 4.25 4.5 4.75 5. 5.25 5.5 5.75 6. 6.25 6.5 6.75 7. 7.25 7.5 7.75 8. 8.25 8.5 8.75 9. 9.25 9.5 9.75 10. 10.25 10.5 10.75 11. 11.25 11.5 11.75 12. 12.25 12.5 12.75 13. 13.25 13.5 13.75 14. 14.25 14.5 14.75 15. 15.25 15.5 15.75 16. 16.25 16.5 16.75 17. 17.25 17.5 17.75 18. 18.25 18.5 18.75 19. 19.25 19.5 19.75 20. 20.25 20.5 20.75 21. 21.25 21.5 21.75 22. 22.25 22.5 22.75 23. 23.25 23.5 23.75 24. 24.25 24.5 24.75] [ 21.]
So far, everything that we've done could, in principle, be done by hand calculation. In this section and the next, we really start to take advantage of the power of programming languages to do things for us automatically.
We start here with ways to repeat yourself. The two most common ways of doing this are known as for loops and while loops. For loops in Python are useful when you want to cycle over all of the items in a collection (such as all of the elements of an array), and while loops are useful when you want to cycle for an indefinite amount of time until some condition is met.
The basic examples below will work for looping over lists, tuples, and arrays. Looping over dictionaries is a bit different, since there is a key and a value for each item in a dictionary. Have a look at the Python docs for more information.
# A basic for loop - don't forget the white space!
wordlist = ['hi', 'hello', 'bye']
for word in wordlist:
print word + '!'
hi! hello! bye!
# Sum all of the values in a collection using a for loop
numlist = [1, 4, 77, 3]
sum = 0
for num in numlist:
sum = sum + num
print "Sum is", sum
Sum is 85
# Often we want to loop over the indexes of a collection, not just the items
wordrange = np.arange(len(wordlist))
print wordrange
for i in wordrange:
print i, wordlist[i]
[0 1 2] 0 hi 1 hello 2 bye
# While loops are useful when you don't know how many steps you will need,
# and want to stop once a certain condition is met.
step = 0
prod = 1
while prod < 100:
step = step + 1
prod = prod * 2
print step, prod
print 'Reached a product of', prod, 'at step number', step
1 2 2 4 3 8 4 16 5 32 6 64 7 128 Reached a product of 128 at step number 7
TIP: Once we start really generating useful and large collections of data, it becomes unwieldy to inspect our results manually. The code below shows how to make a very simple plot of an array. We'll do much more plotting later on, this is just to get started.
# Load up pylab, a useful plotting library
%pylab inline
# Make some x and y data and plot it
y = np.arange(100)**2
plot(y)
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline]. For more information, type 'help(pylab)'.
[<matplotlib.lines.Line2D at 0x11967a510>]
FINALLY, let's get smart about our calculations of nt
. Copy your code from Exercise 3 into the box
below, and do the following:
nt
for 100 time steps. HINT: You will need tocreate an array of the step numbers using a command like step = np.arange(1, 100)
. (Why does
this array start at 1 and not at 0?). Then, loop over the values of the step array,
and use each step value to index the array n
.
n
.r
and K
and see how it changes the plot. What happens if you setr
to 1.9 or 3?
Bonus
greater than 90. HINT: Start a step counter i
at 1, and check that the value in n[i-1]
is less than 90
each time around the loop. Increment the step counter within the loop.
r = 0.5
K = 100
n = np.zeros(100, dtype=float)
n[0] = 10
steps = np.arange(1, 100)
for i in steps:
n[i] = round(n[i-1] + r*n[i-1]*(1 - n[i-1]/K))
plot(n)
[<matplotlib.lines.Line2D at 0x116a94590>]
r = 0.6
K = 100
n = np.zeros(100)
n[0] = 10
i = 1
while n[i-1] < 90:
n[i] = round(n[i-1] + r*n[i-1]*(1 - n[i-1]/K))
print i, n[i]
i = i + 1
print 'Ended with a population size of', n[i-1], 'at step', i-1
print 'Population sizes to this step were', n[0:i]
1 15.0 2 23.0 3 34.0 4 47.0 5 62.0 6 76.0 7 87.0 8 94.0 Ended with a population size of 94.0 at step 8 Population sizes to this step were [ 10. 15. 23. 34. 47. 62. 76. 87. 94.]
Often we want to check if a condition is True and take one action if it is, and another action if the condition is False. We can achieve this in Python with an if statement.
TIP: You can use any expression that returns a boolean value (True or False) in an if statement.
Common boolean operators are ==, !=, <, <=, >, >=. You can also use is
and is not
if you want to
check if two variables are identical in the sense that they are stored in the same location in memory.
# A simple if statement
x = 3
if x > 0:
print 'x is positive'
elif x < 0:
print 'x is negative'
else:
print 'x is zero'
x is negative
# If statements can rely on boolean variables
x = -1
test = (x > 0)
if test:
print 'Test was true'
Deterministic models are boring, so let's introduce some element of randomness into our logistic growth model. We'll model a simple "catastrophe" process, in which a catastrophe happens in 10% of the time steps that reduces the population back down to the size at n0. Copy your code from Exercise 4 into the box below, and do the following:
cat
, for catastrophe, that will be True if a catastrophehas occurred, and False if it hasn't. A simple way to do this is to generate a random number using
np.random.rand()
, which will give you a pseudorandom number between 0 and 1. Check whether this number
is less than 0.1 - this check will be True 10% of the time.
cat
, add an if statement to your for loop that checks whether cat
istrue in each time step. If it is true, set the population back to the size at n[0]. Otherwise, perform the usual logistic growth calculation.
Bonus
n
, count the number of time steps in which the population was above 50.You can do this with a for loop (loop through each value of nt
, check if it is > 50, and if so
increment a counter), or better with a simple boolean comparison. HINT: If you take the sum of a boolean
array (using np.sum()
), it will give you the number of True values (since a True is considered to
be a 1, and False is a 0).
# Set up initial model parameters
r = 0.6
K = 100
n = np.zeros(100)
n[0] = 10
# Loop through all time steps
steps = np.arange(1, 100)
for i in steps:
cat = (np.random.rand() < 0.1) # Random catastrophe 10% of time
if cat:
n[i] = n[0]
else:
n[i] = round(n[i-1] + r*n[i-1]*(1 - n[i-1]/K))
plot(n)
print 'Population was above 50 in', np.sum(n > 50), 'out of 100 steps'
Population was above 50 in 62 out of 100 steps
One way to write a program is to simply string together commands, like the ones described above, in a long file, and then to run that file to generate your results. This may work, but it can be cognitively difficult to follow the logic of programs written in this style. Also, it does not allow you to reuse your code easily - for example, what if we wanted to run our logistic growth model for several different choices of initial parameters?
The most important ways to "chunk" code into more manageable pieces is to create functions and then to gather these functions into modules, and eventually package. Below we will discuss how to create functions and modules. A third common type of "chunk" in Python is classes, but we will not be covering object-oriented programming in this workshop.
# We've been using functions all day
x = 3.333333
print round(x, 2)
print np.sin(x)
3.33 -0.190567635651
# It's very easy to write your own functions
def multiply(x, y):
return x*y
# Once a function is "run" and saved in memory, it's available just like any other function
print type(multiply)
print multiply(4, 3)
<type 'function'> 12
# It's useful to include docstrings to describe what your function does
def say_hello(time, place):
'''
Function returns a greeting string. Useful for engendering goodwill.
Takes two arguments (strings): time, place
'''
return 'Good ' + time + ', ' + place
say_hello('afternoon', 'Scripps')
'Good afternoon, Scripps'
# All arguments must be present, or the function will return an error
say_hello('afternoon')
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/jkitzes/Projects/swc/2012-11-scripps/python/<ipython-input-723-627d50fc9487> in <module>() 1 # All arguments must be present, or the function will return an error ----> 2 say_hello('afternoon') TypeError: say_hello() takes exactly 2 arguments (1 given)
# Keyword arguments can be used to make some arguments optional by giving them a default value
# All mandatory arguments must come first, in order
def say_hello(time, place='California'):
return 'Good ' + time + ', ' + place
say_hello('afternoon')
'Good afternoon, California'
say_hello('afternoon', 'La Jolla')
'Good afternoon, La Jolla'
Finally, let's turn our logistic growth model into a function that we can use over and over again. Copy your code from Exercise 5 into the box below, and do the following:
logistic_growth
that takes four arguments: r
, K
, n0
,and p
(the probability of a catastrophe). Make p
a keyword argument with a default value of 0.1.
Have your function return the n
array.
Store the returned value of n
and make a plot from it.
Bonus
population given the old population. Make this line another function called grow_one_step
that takes
in the old population, r
, and K
, and returns the new population. Have your logistic_growth
function
use the grow_one_step
function to calculate the new population in each time step.
def logistic_growth(r, K, n0, p=0.1):
'''
Function to simulate discrete time stochastic logistic growth
Arguments
---------
r : float
Reproductive rate of population
K : float
Carrying capacity
n0 : float
Initial population
p : float
Probability of a catastrophe resetting the population to n0
Returns
-------
n : ndarray
Array of 100 time steps of population values
'''
# Set up population array
n = np.zeros(100, dtype=float)
n[0] = n0
# Loop through all time steps
steps = np.arange(1, 100)
for i in steps:
cat = (np.random.rand() < p) # Random catastrophe 10% of time
if cat:
n[i] = 10
else:
n[i] = grow_one_step(n[i-1], r, K)
return n
def grow_one_step(nold, r, K):
'''Calculate new population from old one.'''
return round(nold + r*nold*(1 - nold/K))
a1 = logistic_growth(0.6, 100, 10)
a2 = logistic_growth(0.6, 100, 10)
a3 = logistic_growth(0.6, 100, 10)
hold(True)
plot(a1)
plot(a2)
plot(a3)
[<matplotlib.lines.Line2D at 0x11967eb90>]
We can make our functions more easily reusable by placing them into modules that we can import, just
like we have been doing with numpy
. It's pretty simple to do this.
called pop.py
.
how to run the module, and explain to humans what the module is doing. (The instructor will show you how to do this.)
import pop
to import the module. Type pop.
and hit tab to see the availablefunctions in the module. Try running the logistic growth function that was imported from the module with a few different parameter values.
# import pop
# plot(pop.logistic_growth(0.6, 100, 10))