This is a Jupyter notebook. Here is a Jupyter tutorial.
This cell is a Markdown cell: it contains text with markup commands rather than executable computer code.
You can also style text in a Markdown cell using html tags.
Markdown cells in Jupyter support MathJax, so you can write lovely typeset mathematics, e.g., $ e^{i\pi} = -1$.
$$ e^x \equiv \sum_{j=0}^\infty \frac{x^j}{j!}. $$
Here is a demo of Markdown and MathJax in Jupyter.
This particular Jupyter notebook is a Python notebook (there are also R and Julia notebooks, among other programming languages). This notebook is communicating with a Python kernel and can execute Python commands. The code cells below use Python 2 syntax, for example the print statement without parentheses.
Here is an introduction to Python.
For a more thorough introduction to Python for Science, see https://scipy-lectures.github.io/
Most of Jupyter's functionality is clear from its drop-down menus: commands to insert or delete cells, to execute cells, to clear output, etc.
Two of the most useful features of Jupyter are its help functions and tab completion.
For instance, pressing the Tab key while you are typing the name of a function will give you a list of names that start with the letters you have typed so far.
Please click "help" and take the User Interface Tour.
The rest of this notebook is a brief introduction to Python (within Jupyter). We will see more of Python in later sections of the course, as we encounter particular topics such as linear algebra, least squares, optimization, random number generation, the Bootstrap, etc.
# This is a code cell (but this line is a comment, because it starts with '#')
# This is a Python notebook, so you can type Python commands into this cell, for example:
print('Hello world! I\'m so happy to be in Tokyo!')
Hello world! I'm so happy to be in Tokyo!
# more Python
# arithmetic
print 5+2
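print 5^2 # careful: ^ is bitwise XOR, not exponentiation, so this prints 7
print 5**2 # ** is exponentiation
print 5/2 # by default in Python 2, division of two integers is floor division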
print 5/2.0
7 7 25 2 2.5
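The second 7 in the output above comes from `5^2`: the caret is bitwise XOR, not a power. The following is a small sketch of the three operators that are easy to confuse; the variable names are illustrative only. It uses `//`, which means floor division in both Python 2 and Python 3.

```python
# ^ is bitwise XOR: 5 is 0b101 and 2 is 0b010, so 5 ^ 2 is 0b111
xor_result = 5 ^ 2
# ** is exponentiation
power_result = 5 ** 2
# // is floor division: the result is rounded toward minus infinity
floor_pos = 5 // 2
floor_neg = -5 // 2

print(xor_result)    # 7
print(power_result)  # 25
print(floor_pos)     # 2
print(floor_neg)     # -3 (not -2: floor division does not round toward zero)
```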
# You will use many Python packages to get higher-level data constructs, functions, etc.
# __future__ arithmetic doesn't truncate integer division
from __future__ import division
print 5/2
2.5
sqrt(4) # sqrt is not in the core language. It's in the math package.
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-e80ea8d2b357> in <module>()
----> 1 sqrt(4) # sqrt is not in the core language. It's in the math package.

NameError: name 'sqrt' is not defined
import math
print math.sqrt(4)
print math.sqrt(-1) # no complex arithmetic in the basic math package
2.0
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-266c381803bf> in <module>()
      1 import math
      2 print math.sqrt(4)
----> 3 print math.sqrt(-1) # no complex arithmetic in the basic math package

ValueError: math domain error
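If you need complex square roots without leaving the standard library, the `cmath` module is one option. A minimal sketch (the name `root` is just for illustration):

```python
import cmath

# cmath.sqrt accepts negative (and complex) arguments
root = cmath.sqrt(-1)
print(root)            # 1j

# cmath always returns a complex number, even for a positive real input
print(cmath.sqrt(4))   # (2+0j)
```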
import numpy as np # numeric Python. Lots of goodies.
import scipy as sp # scientific Python. More goodies.
print np.sqrt(-1+0j) # Python uses j to denote sqrt(-1). Numpy can take the square root of complex numbers
print np.sqrt(4)
print np.sqrt(4+0j)
1j 2.0 (2+0j)
# variables
x = 5
print x
print x**2
y = x
print y
5 25 5
# Python has some pre-defined values
print math.pi
print math.e
3.14159265359 2.71828182846
# not a number
x = float('nan')
print x
print x + 1 # arithmetic with NaN gives NaN
print math.isnan(x)
print math.isnan(math.pi)
nan nan True False
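Two more properties of NaN are worth knowing: it compares unequal to everything, including itself, and NumPy can test for it elementwise. A hedged sketch follows; the array `a` is made-up data for illustration.

```python
import numpy as np

x = float('nan')
# NaN is not equal to itself, so == cannot be used to detect it
nan_equals_itself = (x == x)

# made-up data with one missing value
a = np.array([1.0, float('nan'), 3.0])
mask = np.isnan(a)           # elementwise test for NaN
clean_mean = np.nanmean(a)   # mean of the non-NaN entries

print(nan_equals_itself)
print(mask)
print(clean_mean)
```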
# ranges
print range(5) # by default, ranges start at 0
print range(1,5) # includes lower endpoint but not upper endpoint
print range(1,10,2) # step size
print range(10,5,-1) # negative steps are OK
[0, 1, 2, 3, 4]
[1, 2, 3, 4]
[1, 3, 5, 7, 9]
[10, 9, 8, 7, 6]
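The lists shown above are what Python 2 prints. In Python 3, `range` returns a lazy sequence object instead of a list, so printing it directly shows something like `range(1, 10, 2)`. Wrapping it in `list` gives the same result in both versions, as in this small sketch:

```python
r = range(1, 10, 2)

# list() materializes the values in both Python 2 and Python 3
as_list = list(r)
print(as_list)   # [1, 3, 5, 7, 9]

# len() works on the range object itself
print(len(r))    # 5
```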
# non-integer spaced ranges
print np.linspace(1, 3, num=5) # 5 equispaced points between 1 and 3
print np.linspace(3, 1, num=5) # 5 equispaced points between 3 and 1
print np.arange(1, 2, step = 0.1) # go from 1 to 2 in steps of 0.1
print np.arange(3, 1, step = -0.5) # from 3 to 1 in steps of -0.5
[ 1. 1.5 2. 2.5 3. ]
[ 3. 2.5 2. 1.5 1. ]
[ 1. 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9]
[ 3. 2.5 2. 1.5]
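With a non-integer step, floating-point rounding can make it hard to predict exactly how many points `np.arange` returns. One alternative, sketched here, is to give `np.linspace` the number of points and ask it to leave out the endpoint, which reproduces the half-open grid from 1 to 2 used above.

```python
import numpy as np

a = np.arange(1, 2, step=0.1)                   # step-based grid
b = np.linspace(1, 2, num=10, endpoint=False)   # count-based grid, endpoint excluded

print(len(a))
print(np.allclose(a, b))   # the two grids agree up to rounding
```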
# how long is a list?
y = range(5)
print len(y)
# indexing a list
print y[0] # Python uses 0-based indexing
print y[4]
print y[6] # nothing there!
5 0 4
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-11-832e3785ec08> in <module>()
      6 print y[0] # Python uses 0-based indexing
      7 print y[4]
----> 8 print y[6] # nothing there!

IndexError: list index out of range
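Two related idioms, shown as a sketch: negative indices count from the end of the list, and an out-of-range access can be caught with `try`/`except` rather than stopping the cell. The names `last` and `value` are illustrative.

```python
y = list(range(5))

# negative indices count back from the end of the list
last = y[-1]
print(last)    # 4

# catch the IndexError instead of letting it halt execution
try:
    value = y[6]
except IndexError:
    value = None   # fall back to a placeholder when the index is out of range
print(value)   # None
```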
# you can append to lists
y.append(2)
print y
print len(y)
[0, 1, 2, 3, 4, 2] 6
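Because `append` changes the list in place, every name bound to the same list sees the change. This sketch contrasts a second name for the same list with an independent copy; `alias` and `snapshot` are illustrative names.

```python
y = list(range(5))

alias = y            # a second name for the same list object
snapshot = list(y)   # a new list with the same elements

y.append(2)          # modifies the shared list in place

print(alias)      # [0, 1, 2, 3, 4, 2]
print(snapshot)   # [0, 1, 2, 3, 4]
```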
# you can pop elements from lists
y = range(5)
print y
print y.pop() # remove and return the last element of the list
print y # last element is gone now
y = range(5)
print y
print y.pop(1) # remove and return the element in position 1 (recall Python uses 0-based indices)
print y # element in position 1 is gone now
[0, 1, 2, 3, 4]
4
[0, 1, 2, 3]
[0, 1, 2, 3, 4]
1
[0, 2, 3, 4]
# slightly more advanced indexing
#
y = range(5)
print y[2:4] # 3rd and 4th elements of y--remember, last element isn't included
print y[2:5] # 3rd through 5th
# logical indexing: easy to use list comprehensions. List comprehensions are a great Python language feature!
ygt2 = [v for v in y if v > 2]
print ygt2
# alternatively, use a numpy array
y = np.array(range(5))
print y[np.where(y > 2)]
[2, 3]
[2, 3, 4]
[3, 4]
[3 4]
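A NumPy array can also be indexed directly with a Boolean array, without `np.where`. The sketch below shows this, plus how to combine two conditions with `&` (each condition needs its own parentheses).

```python
import numpy as np

y = np.arange(5)

mask = y > 2            # Boolean array, one entry per element of y
selected = y[mask]      # keeps the entries where mask is True
print(selected)

# combine conditions elementwise with & (and), | (or), ~ (not)
middle = y[(y > 0) & (y < 4)]
print(middle)
```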
# build a matrix
x = np.array([[0,1], [0,2], [1,3]])
print x
print x.T
[[0 1]
 [0 2]
 [1 3]]
[[0 0 1]
 [1 2 3]]
# building arrays
x = np.array(range(1,6))
y = np.array(range(6, 11))
# row binding
print np.vstack((x,y)) # glue these together as rows
print np.vstack((x,y,x**2))
# column binding
print np.vstack((x,y)).T # glue these together as columns
print np.vstack((x,y,x**2)).T
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [ 1  4  9 16 25]]
[[ 1  6]
 [ 2  7]
 [ 3  8]
 [ 4  9]
 [ 5 10]]
[[ 1  6  1]
 [ 2  7  4]
 [ 3  8  9]
 [ 4  9 16]
 [ 5 10 25]]
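NumPy also has a function that binds one-dimensional arrays as columns in a single step, `np.column_stack`. A short sketch, checking that it matches the transpose-of-`vstack` approach used above:

```python
import numpy as np

x = np.arange(1, 6)
y = np.arange(6, 11)

# each 1-D array becomes one column of the result
cols = np.column_stack((x, y, x**2))
print(cols.shape)   # (5, 3)

# same result as stacking rows and then transposing
same = np.array_equal(cols, np.vstack((x, y, x**2)).T)
print(same)         # True
```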
# pre-defined arrays in numpy
print np.ones((3,4))
print np.zeros((4,3))
[[ 1. 1. 1. 1.]
 [ 1. 1. 1. 1.]
 [ 1. 1. 1. 1.]]
[[ 0. 0. 0.]
 [ 0. 0. 0.]
 [ 0. 0. 0.]
 [ 0. 0. 0.]]
# changing the dimension of an array
x = np.ones((3,4))
print x
x = np.reshape(x, (4,3))
print x
[[ 1. 1. 1. 1.]
 [ 1. 1. 1. 1.]
 [ 1. 1. 1. 1.]]
[[ 1. 1. 1.]
 [ 1. 1. 1.]
 [ 1. 1. 1.]
 [ 1. 1. 1.]]
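With an array of ones it is hard to see what `reshape` does to the entries. This sketch uses distinct values instead: by default the elements are read and refilled row by row, so reshaping a 3-by-4 array to 4-by-3 is not the same as transposing it. Passing `-1` for one dimension asks NumPy to infer it.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))   # rows: 0..3, 4..7, 8..11
b = a.reshape((4, 3))               # rows: 0..2, 3..5, 6..8, 9..11
c = a.reshape((2, -1))              # -1 lets NumPy infer the second dimension

print(b)
print(c.shape)                      # (2, 6)
print(np.array_equal(b, a.T))       # False: reshape is not transpose
```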
# operations on numpy arrays
y = np.array(range(1,5))
print y
print y**2
print y^2 # careful: the caret operator is bitwise XOR, not exponentiation!
print np.sqrt(y)
print np.log(y)
[1 2 3 4]
[ 1  4  9 16]
[3 0 1 6]
[ 1. 1.41421356 1.73205081 2. ]
[ 0. 0.69314718 1.09861229 1.38629436]
# linear algebra: linalg package in scipy
x = np.array([[0,1], [0,2], [1,3]])
print x
xtx = np.dot(x.T, x) # matrix multiplication
print xtx
from scipy import linalg
xtxInv = sp.linalg.inv(xtx) # matrix inversion is in the linalg part of scipy.
# NOTE: generally should avoid inverting matrices
# there are much better ways to solve linear systems
print xtxInv
print np.dot(xtx, xtxInv) # was that really the inverse?
[[0 1]
 [0 2]
 [1 3]]
[[ 1  3]
 [ 3 14]]
[[ 2.8 -0.6]
 [-0.6  0.2]]
[[ 1.00000000e+00 -1.11022302e-16]
 [ 0.00000000e+00 1.00000000e+00]]
# solving linear systems: the linalg package in scipy
b = np.array([0, 1])
y = linalg.solve(xtx,b)
print y, np.dot(xtx,y) -b
[-0.6 0.2] [ 0. 0.]
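Floating-point results are rarely exact, so a tolerance-based comparison such as `np.allclose` is a convenient way to check a solution. The sketch below uses `numpy.linalg` rather than `scipy.linalg` only so that it depends on NumPy alone; both packages provide `solve` and `inv`. It also confirms that solving the system directly agrees with multiplying by the inverse for this small example.

```python
import numpy as np

x = np.array([[0, 1], [0, 2], [1, 3]])
xtx = np.dot(x.T, x)
b = np.array([0, 1])

y_solve = np.linalg.solve(xtx, b)          # solve the system directly
y_inv = np.dot(np.linalg.inv(xtx), b)      # same answer via the explicit inverse

# compare with a tolerance instead of testing for exact equality
print(np.allclose(y_solve, y_inv))
print(np.allclose(np.dot(xtx, y_solve), b))
```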
# least squares: linalg package in scipy
# build a quadratic with noise; fit a cubic
import numpy.random
n = 25
x = np.linspace(0,1,n) # grid of x values
A = np.vstack((np.ones(n), x, x**2, x**3)).T # Design matrix. The column of ones gives the constant term.
# generate fake data
coeffs = np.random.rand(3) # three independent uniform[0,1] variables
coeffs = np.append(coeffs,0)
y = np.dot(A, coeffs) + np.random.randn(n) # data are quadratic plus independent standard Gaussian noise
fitc, err, rank, sigma = linalg.lstsq(A, y)
print coeffs, fitc
print 'difference:', fitc-coeffs
[ 0.49868413 0.47601081 0.55385093 0. ] [ 0.46695487 3.08249731 -7.61942371 4.67542972]
difference: [-0.03172925 2.6064865 -8.17327463 4.67542972]
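The fitted coefficients can differ a lot from the true ones because the noise (standard deviation 1) is large compared with the signal. What least squares does guarantee is that the residual is orthogonal to the columns of the design matrix, that is, $A^T(y - A\hat{c}) = 0$ up to rounding. The sketch below checks this. It makes a few assumptions that are not in the cell above: a fixed seed (0, chosen arbitrarily) for reproducibility, `numpy.linalg.lstsq` in place of the SciPy version, and a cross-check against `np.polyfit`.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed; the value 0 is an arbitrary choice
n = 25
x = np.linspace(0, 1, n)
A = np.vstack((np.ones(n), x, x**2, x**3)).T   # cubic design matrix

coeffs = np.append(rng.rand(3), 0)             # true model is quadratic
y = np.dot(A, coeffs) + rng.randn(n)           # add standard Gaussian noise

fitc, resid, rank, sv = np.linalg.lstsq(A, y, rcond=None)

# np.polyfit returns the highest power first, so reverse it to match fitc
poly = np.polyfit(x, y, 3)[::-1]

# normal equations: the residual is orthogonal to every column of A
normal_eq_residual = np.dot(A.T, y - np.dot(A, fitc))

print(rank)
print(np.allclose(fitc, poly))
print(np.allclose(normal_eq_residual, 0))
```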
# a little plotting
# the %matplotlib inline call is a bit of "magic" so that plots display in the browser.
%matplotlib inline
import matplotlib.pyplot as plt # amazing matplotlib plotting library. See http://matplotlib.org/gallery.html
preds = np.dot(A, fitc)
# blue stars for data, red dashes for fitted function, green solid line for true function
plt.plot(x,y,'b*',x,preds,'r--',x,np.dot(A,coeffs),'g-', linewidth=2)
plt.axis([-0.1,1.1,-3,4]) # axis limits
plt.xlabel('$x$') # axis labels. Can use LaTeX math markup
plt.ylabel('$y$, predictions, truth')
plt.title('Fit of cubic function to quadratic data with $N(0,1)$ errors')
plt.show()
# logical (Boolean) variables
print True
print False
x = True
print x
print 1 == 2
print 1 > 2
print 1 >= 2
print 1 != 2
True False True False False False True
# logical operators
print not True
print not False
print True or False # logical "or"
print False or False
print True and True # logical "and"
print True and False
print not(True and False)
False True True False True False True
# most values can be used as Booleans; "and" and "or" return one of their operands, so the first line prints 0 rather than False
import math
print 0 and True
print 0 or True
print 1 and True
print math.pi and True
print "hello" and True
0 True True True True
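To see the truth value of an object explicitly, pass it to `bool`. The sketch below also shows that `and` and `or` return one of their operands rather than a strict Boolean, which is why the first line of the output above is `0`. The variable names are illustrative.

```python
# zero and empty containers are falsy; other numbers and non-empty strings are truthy
print(bool(0))         # False
print(bool(3.14))      # True
print(bool(""))        # False
print(bool("hello"))   # True

# "and" returns the first falsy operand (or the last operand if all are truthy)
first_falsy = 0 and True
print(first_falsy)     # 0

# "or" returns the first truthy operand, a common way to supply a default
fallback = "" or "default"
print(fallback)        # default
```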
# sorting, max, min
x = np.array(range(10,5,-1))
print x
print np.sort(x)
# what permutation puts the list in sorted order?
print np.argsort(x)
#
print np.max(x), np.min(x), np.max(x**2), np.max(x)**2
print np.sum(x), np.cumsum(x)
print np.prod(x), np.cumprod(x)
[10  9  8  7  6]
[ 6  7  8  9 10]
[4 3 2 1 0]
10 6 100 100
40 [10 19 27 34 40]
30240 [ 10 90 720 5040 30240]
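The permutation returned by `np.argsort` can be used as an index: applying it to the array gives the sorted array, and applying it to a second array reorders that array the same way. A small sketch, where `labels` is made-up data for illustration:

```python
import numpy as np

x = np.array([10, 9, 8, 7, 6])
order = np.argsort(x)

# indexing with the permutation reproduces np.sort
sorted_x = x[order]
print(sorted_x)

# the same permutation reorders a companion array consistently
labels = np.array(['a', 'b', 'c', 'd', 'e'])
print(labels[order])

# sort in descending order by sorting the negated values
descending = x[np.argsort(-x)]
print(descending)
```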
# Set operations
x = [1, 2, 3, 3, 3]
print x
s = set(x)
print s
print len(s) # cardinality of s
print 1 in s # is 1 an element of s?
print 0 not in s # is 0 absent from s?
print s.isdisjoint([1, 2])
print s.isdisjoint([0, 10])
print s.issubset(range(1,5))
print s <= set(range(1,5)) # the comparison operators need a set on both sides; the named methods accept any iterable
print s.issubset(range(2))
print s <= set(range(2))
print s < set(range(1,5)) # proper subset?
print s < set(range(1,4))
print s.issuperset(range(1,3))
print s >= set(range(1,3))
print s > set(range(1,4))
print s > set(range(1,3))
print s.union(range(5,10))
print s | set(range(5,10))
print s.intersection(range(2))
print s & set(range(2))
print s.difference(range(2))
[1, 2, 3, 3, 3]
set([1, 2, 3])
3
True
True
False
True
True
True
False
False
True
False
True
True
False
True
set([1, 2, 3, 5, 6, 7, 8, 9])
set([1, 2, 3, 5, 6, 7, 8, 9])
set([1])
set([1])
set([2, 3])
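The set operators (`|`, `&`, `-`, `^`, and the comparisons) are only meaningful when both operands are sets, whereas the named methods accept any iterable. The sketch below shows the operator forms, including the symmetric difference; `t` is an illustrative second set.

```python
s = set([1, 2, 3])
t = set(range(2, 6))   # the set containing 2, 3, 4, 5

diff = s - t           # elements of s that are not in t
sym = s ^ t            # elements in exactly one of the two sets
is_proper_subset = set([1, 2]) < s

print(sorted(diff))       # [1]
print(sorted(sym))        # [1, 4, 5]
print(is_proper_subset)   # True

# the method form accepts a list; it is converted internally
print(sorted(s.symmetric_difference([2, 3, 4, 5])))   # [1, 4, 5]
```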
# printing and flow control
x = 3
for i in range(x): # first flow control
    print x**i
i = 1
while i <= 5: # second flow control
    print x**i
    i = i+1 # indentation matters in Python! this line must line up with the print above to stay inside the loop
if 1 < 2: # third flow control
    print '1 is less than 2'
if 2 < 1:
    print '2 is less than 1'
else:
    print '2 is not less than 1'
if 2 < 0:
    print '2 is less than 0'
elif 2 < 1:
    print '2 is less than 1'
else:
    print '2 is neither less than 1 nor less than zero'
1
3
9
3
9
27
81
243
1 is less than 2
2 is not less than 1
2 is neither less than 1 nor less than zero
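Indentation is what tells Python which statements belong to a loop or a branch: every statement in a block must be indented by the same amount. The sketch below repeats the `while` and `if`/`elif`/`else` patterns with consistent four-space indentation, collecting results in lists (`powers` and `signs` are illustrative names) so they can be inspected afterwards.

```python
# while loop: both indented lines belong to the loop body
powers = []
i = 1
while i <= 5:
    powers.append(3 ** i)
    i = i + 1          # same indentation as the line above, so it runs on every pass
print(powers)          # [3, 9, 27, 81, 243]

# if / elif / else inside a for loop: each nested block is indented one level deeper
signs = []
for v in [-2, 0, 3]:
    if v < 0:
        signs.append('negative')
    elif v == 0:
        signs.append('zero')
    else:
        signs.append('positive')
print(signs)           # ['negative', 'zero', 'positive']
```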
Next chapter: Sets, Combinatorics, & Probability
%run talkTools.py