In this tutorial, we'll learn the basic functionality of two important Python packages for scientific computing: NumPy and matplotlib.
First, recall from previous tutorials some useful shortcuts for IPython notebook:
Material has been adapted from the following:
Python includes a package for numerical computation called numpy, which is an essential tool for scientific computing. It is based on multidimensional arrays (vectors and matrices) and provides a large library of functions and operators that work efficiently on these objects. In this way, any numerical algorithm is expressed as operations on arrays and in many cases runs almost as quickly as the equivalent C code.
Numpy is complemented by SciPy
; a library that adds more MATLAB-like functionality and matplotlib
; a plotting package that provides MATLAB-like plotting functionality.
Like MATLAB, NumPy relies on BLAS and LAPACK for efficient linear algebra computations.
NumPy's is used in: Numerical Analysis, Linear algebra, Solution of differential equations, Image and signal processing, Statistical analysis, Data transformation and query and many more...
Consider the following list and suppose we want to perform some simple algebraic operations:
price = [5.99, 10.25, 2.0, 40.99, 5.60, 63.49]
price*2
[5.99, 10.25, 2.0, 40.99, 5.6, 63.49, 5.99, 10.25, 2.0, 40.99, 5.6, 63.49]
price+0.5
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-458-ba6ebb983a2c> in <module>() ----> 1 price+0.5 TypeError: can only concatenate list (not "float") to list
Clearly, if we wanted to double the value of each element and add half, this cannot be done easily. Instead, we would have to use a loop:
for i in range(len(price)):
price[i] = price[i]*2
price[i] = price[i]+0.5
print price
But, there should be a more convinient way...
We start, by importing the NumPy module. Import statements like this are the typical way of getting access to functions in Python. The most general way to import a module is the following:
from numpy import *
The above imports all functions included in the package and can call them simply by their name.
sin(3*pi/2)*exp(2)
-7.3890560989306504
Instead we can import the numpy module and tell Python that we want to refer to numpy by the short abbreviation np. This is recommended since we keep track of which module the functions belong to.
import numpy as np
NumPy's main object is the homogeneous multidimensional array. It is a table of elements - all of the same type - indexed by a tuple of positive integers.
NumPy's array class is called ndarray and the package includes basic utility functions for creating and manipulate arrays in a similar fashion to MATLAB. For example we can create a one-dimensional array by:
price = np.array([5.99, 10.25, 2.0, 40.99, 5.60, 63.49])
print type(price)
price = price*2+0.5
print price
<type 'numpy.ndarray'> [ 12.48 21. 4.5 82.48 11.7 127.48]
There are other ways to create one-dimensional arrays:
x = np.linspace(-1,1,5)
print x
[-1. -0.5 0. 0.5 1. ]
y = np.arange(0,1,0.2)
print y
[ 0. 0.2 0.4 0.6 0.8]
Exercise 1: Change the inputs of the linspace and arange function until you understand exactly what they do. Using both ways, create a vector of starting from -1.5 to 4.5 exclusive with a step of 0.5.
# type your solution here
#%load solutions/1D_array.py
We can always find out what a function does by typing its name, followed by a question mark:
np.arange?
Resize the help window (to get it out of your way) by dragging the divider. Double click on the divider to close it.
We can create multidimensional arrays, like the following:
A = np.array([[1,2.4,-13],[4.1,5,0],[7.2,8,9]])
print "A = ", A
print type(A)
print A.shape
print A.dtype
A = [[ 1. 2.4 -13. ] [ 4.1 5. 0. ] [ 7.2 8. 9. ]] <type 'numpy.ndarray'> (3, 3) float64
To see all different attributes of the array object type its name followed by a fullstop and press tap. Note that some attributes are basically functions and require parentheses (with or without arguments), e.g. conjugate and others don't, e.g. size:
A = np.array([[1,2+4j,3j],[4j,5,6-5j],[7j,8,9+3j]]) # dtype=complex
print A.conjugate()
print A.size
[[ 1.-0.j 2.-4.j 0.-3.j] [ 0.-4.j 5.-0.j 6.+5.j] [ 0.-7.j 8.-0.j 9.-3.j]] 9
We can also define the shape of the array:
np.random.uniform(0,1,size=(5,5)) # 5 x 5 matrix with elements from a uniform distribution
array([[ 0.36571257, 0.97149998, 0.93448159, 0.63618724, 0.11734503], [ 0.82468843, 0.0165409 , 0.38839101, 0.41888606, 0.98623419], [ 0.92330802, 0.02180409, 0.57944221, 0.90391814, 0.99616971], [ 0.82753157, 0.99756878, 0.82759013, 0.14440918, 0.38945326], [ 0.95052569, 0.29399635, 0.85205134, 0.47462323, 0.94028725]])
The function zeros creates an array full of zeros, the function ones creates an array full of ones and eye an identity array. Function diag is used for a diagonal array with specified diagonal elements. A matrix with random entries is generated by the random function.
print np.zeros((3,4))
print np.ones((2,3))
print np.eye(3)
[[ 0. 0. 0. 0.] [ 0. 0. 0. 0.] [ 0. 0. 0. 0.]] [[ 1. 1. 1.] [ 1. 1. 1.]] [[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.]]
print np.diag(x,0)
print np.diag([2,3,4])
print np.random.random((2,2))
[[-1. 0. 0. 0. 0. ] [ 0. -0.5 0. 0. 0. ] [ 0. 0. 0. 0. 0. ] [ 0. 0. 0. 0.5 0. ] [ 0. 0. 0. 0. 1. ]] [[2 0 0] [0 3 0] [0 0 4]] [[ 0.14303763 0.08297788] [ 0.04270201 0.26137453]]
Notice that there are two pairs of parentheses used when calling the zeros, ones and random command, because we are passing as argument (e.g. the Python tuple (3,4)), which is the shape of the resulting array.
Manipulation of arrays can be done in many ways:
A = np.arange(1,10)
print "A = ", A
A = A.reshape((3,3))
print A
A = [1 2 3 4 5 6 7 8 9] [[1 2 3] [4 5 6] [7 8 9]]
B = arange(15).reshape((3,5))
print "B = ", B
print B.ravel() # flatten the array
B = [[ 0 1 2 3 4] [ 5 6 7 8 9] [10 11 12 13 14]] [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
We can slice numpy arrays just as other programming languages, but the indexing is a little different. See the examples below:
print A
print A[0,0]
print A[1:3,0:2]
print A[1:3,[0,1]]
[[1 2 3] [4 5 6] [7 8 9]] 1 [[4 5] [7 8]] [[4 5] [7 8]]
Note that the fundamental difference from MATLAB is that the indexing starts from zero and is exclusive of the second index. Hence, in the above example 1:3 refers to the second and third row and 0:2 to the first and second column.
We can index in some clever ways too:
print A
[[1 2 3] [4 5 6] [7 8 9]]
print A[0,:] # row selection
print A[:,0] # column selection
[1 2 3] [1 4 7]
print A[:,1:] # all rows, from 2nd column until the end
print A[-1,:] # last row, all columns
[[2 3] [5 6] [8 9]] [7 8 9]
We can change array's elements:
A[:,2] = [.3,.4,9.12]
print A
print A.dtype
[[1 2 0] [4 5 0] [7 8 9]] int64
Notice that the elements of A remain integers. That is because A has type int. The following does the desired change:
A = array(A,dtype='float')
A[:,2] = [.3,.4,9.12]
print A
[[ 1. 2. 0.3 ] [ 4. 5. 0.4 ] [ 7. 8. 9.12]]
There is an easy way to reverse a vector:
a = arange(10)
print a
print a[: :-1] # reverse a
[0 1 2 3 4 5 6 7 8 9] [9 8 7 6 5 4 3 2 1 0]
We can also index an array using boolean arrays:
b = a>5
print b
a[b] = 0 # All elements of 'a' higher than 5 become 0
print a
[False False False False False False True True True True] [0 1 2 3 4 5 0 0 0 0]
print a**2
print 10*sin(np.pi/a[1:5]) # avoiding division with zero!
print A*A
[ 0 1 4 9 16 25 0 0 0 0] [ 1.22464680e-15 1.00000000e+01 8.66025404e+00 7.07106781e+00] [[ 1. 4. 0.09 ] [ 16. 25. 0.16 ] [ 49. 64. 83.1744]]
By default, NumPy uses component-wise multiplication. For a matrix-matrix (or matrix-vector) multiplication, we use the dot function:
np.dot(A,A)
array([[ 11.1 , 14.4 , 3.836 ], [ 26.8 , 36.2 , 6.848 ], [ 102.84 , 126.96 , 88.4744]])
A = array(A,dtype='int')
print A
[[1 2 0] [4 5 0] [7 8 9]]
print A.sum() # sum of all elements
print A.min()
print A.max()
36 0 9
print A.sum(axis=0) # sum of each column
print A.min(axis=1) # min of each row
print A.cumsum(axis=1) # cumulative sum along each row
[12 15 9] [0 0 7] [[ 1 3 3] [ 4 9 9] [ 7 15 24]]
In Python axis refers to the dimensions of arrays; axis=0 refers to the colums and axis=1 to the rows.
The vstack and hstack commands can be used to stack arrays and hsplit to split.
A = floor(10*random.random((2,2)))
print A
B = floor(10*random.random((2,2)))
print B
[[ 1. 6.] [ 7. 1.]] [[ 2. 1.] [ 2. 9.]]
print np.vstack((A,B))
print np.hstack((B,A))
[[ 1. 6.] [ 7. 1.] [ 2. 1.] [ 2. 9.]] [[ 2. 1. 1. 6.] [ 2. 9. 7. 1.]]
A = floor(10*random.random((2,12)))
print A
[[ 5. 3. 9. 2. 3. 0. 2. 5. 6. 6. 0. 9.] [ 7. 2. 4. 7. 9. 5. 4. 3. 7. 8. 5. 6.]]
print np.hsplit(A,3) # Split A into 3 arrays
print np.hsplit(A,(3,4)) # Split A after the third and the fourth column
[array([[ 5., 3., 9., 2.], [ 7., 2., 4., 7.]]), array([[ 3., 0., 2., 5.], [ 9., 5., 4., 3.]]), array([[ 6., 6., 0., 9.], [ 7., 8., 5., 6.]])] [array([[ 5., 3., 9.], [ 7., 2., 4.]]), array([[ 2.], [ 7.]]), array([[ 3., 0., 2., 5., 6., 6., 0., 9.], [ 9., 5., 4., 3., 7., 8., 5., 6.]])]
Exercise 2:
# type your solution here
#%load solutions/array_operations.py
Now let's work with some real data. Ensure that the stockholm_temp.txt in the same directory with this notebook. This file contains the daily average temperatures according to observations of Stockholm for the years 1756 - 2012. The following code checks if the file exists, opens it, prints the first lines and stores the data in an array.
import os
filename = 'stockholm_temp.txt'
if os.path.exists(filename):
# look at the first 3 lines
with open(filename,'r') as f:
print '\n'.join(f.readlines()[:3])
# alternative look at the first head lines
!head 'stockholm_temp.txt'
data = np.loadtxt(filename) # alternative we can use genfromtxt command
else:
print 'Weather data does not exist, please download "stockholm_temp.txt" from Github.'
1.756000000000000000e+03 1.000000000000000000e+00 1.000000000000000000e+00 -8.699999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 2.000000000000000000e+00 -9.199999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 3.000000000000000000e+00 -8.599999999999999645e+00 1.756000000000000000e+03 1.000000000000000000e+00 1.000000000000000000e+00 -8.699999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 2.000000000000000000e+00 -9.199999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 3.000000000000000000e+00 -8.599999999999999645e+00 1.756000000000000000e+03 1.000000000000000000e+00 4.000000000000000000e+00 -7.700000000000000178e+00 1.756000000000000000e+03 1.000000000000000000e+00 5.000000000000000000e+00 -7.200000000000000178e+00 1.756000000000000000e+03 1.000000000000000000e+00 6.000000000000000000e+00 -1.600000000000000089e+00 1.756000000000000000e+03 1.000000000000000000e+00 7.000000000000000000e+00 6.999999999999999556e-01 1.756000000000000000e+03 1.000000000000000000e+00 8.000000000000000000e+00 1.300000000000000044e+00 1.756000000000000000e+03 1.000000000000000000e+00 9.000000000000000000e+00 2.399999999999999911e+00 1.756000000000000000e+03 1.000000000000000000e+00 1.000000000000000000e+01 8.000000000000000444e-01
Since the data are in the form *Year
we would like to express them in a better way. We define a dtype
, that is the type for each column: int, int, int, float:
dt = np.dtype([('Year', 'int16'), ('Month', 'int8'), ('Day', 'int8'), ('Temp', 'float64')])
data = np.loadtxt(filename,dtype=dt)
data[:10] # first 10 entries of our data
array([(1756, 1, 1, -8.7), (1756, 1, 2, -9.2), (1756, 1, 3, -8.6), (1756, 1, 4, -7.7), (1756, 1, 5, -7.2), (1756, 1, 6, -1.6), (1756, 1, 7, 0.7), (1756, 1, 8, 1.3), (1756, 1, 9, 2.4), (1756, 1, 10, 0.8)], dtype=[('Year', '<i2'), ('Month', 'i1'), ('Day', 'i1'), ('Temp', '<f8')])
We can now look for the example only on the temperatures:
data['Temp']
array([-8.7, -9.2, -8.6, ..., 0.2, 2.6, 2.2])
or find all leap years:
years = np.unique(data['Year']) # making an array with each year appearing only once
leap_years = np.zeros((1,),dtype='int16') # initializing array for leap years
for i in range(len(years)):
if np.fmod(years[i],4)==0 and ((np.fmod(years[i],100)==0 and np.fmod(years[i],400)==0) or np.fmod(years[i],100)!=0):
leap_years = np.append(leap_years,years[i]) # adding a leap year
leap_years = leap_years[1:] # removing dummy first value
print leap_years
[1756 1760 1764 1768 1772 1776 1780 1784 1788 1792 1796 1804 1808 1812 1816 1820 1824 1828 1832 1836 1840 1844 1848 1852 1856 1860 1864 1868 1872 1876 1880 1884 1888 1892 1896 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012]
Exercise 3: Make an array with the temperatures of the 5th day of every month.
# type your solution here
#%load solutions/temperatures.py
Handling ndarrays is really powerful. Of the many things one can do with NumPy's arrays is linear algebra:
x = np.random.random(10)
print x
np.linalg.norm(x,np.inf) # maximum norm of a vector
[ 0.59126588 0.175877 0.27998015 0.61563441 0.49037974 0.9935111 0.04109928 0.53477242 0.30145292 0.02317688]
0.99351110414064647
We can use this module to solve a linear system of equations $Ax=b$:
A = np.array([[0,2],[8,0]])
b = np.array([1,2])
print A
print b
[[0 2] [8 0]] [1 2]
x = np.linalg.solve(A,b)
print x
[ 0.25 0.5 ]
We can also solve eigenvalue problems:
lamda,V = np.linalg.eig(A)
print lamda
print V
[ 4. -4.] [[ 0.4472136 -0.4472136 ] [ 0.89442719 0.89442719]]
Notice we have put two variables on the left of the equals sign, to assign the outputs of eig() to two different variables. Here lamda are the eigenvalues and the columns of V are the eigenvectors.
Exercise 4: Create a diagonal matrix using the lamda variable and then multiply from the left with V and from the right with the inverse of V (use np.linalg.inv()). What do you find?
# type your solution here
#%load solutions/linear_algebra.py
Matplotlib
is the basic package of Python for plotting.
%pylab inline --no-import-all
import matplotlib.pyplot as plt
Populating the interactive namespace from numpy and matplotlib
The first line above ensures that our plots appear in the browser, instead in a separate window.
Let's see a very simple example. Suppose we want to plot the function $\sin(\exp(x))$ on the interval $x\in(0,\pi)$. We'll want to use the NumPy functions of the sine and exponential functions. (Note: the math module operates only on scalars so we should use NumPy in such cases)
x = np.linspace(0,np.pi,1000)
f = np.sin(np.exp(x))
g = np.cos(2*x)
plt.plot(x,f)
plt.plot(x,g)
[<matplotlib.lines.Line2D at 0x10f73a050>]
There are several ways to format a plot in python. Here we just mention only the basic functionality. It's straightforward to use them and you will generally only get into its advanced functionality if you are customizing the look of a plot for a publication. Matplotlib documentation is very good, have a look here. You can also read this tutorial for an easy start. More references are listed at the end of this tutorial.
The following code does the basic formatting for the plot above:
plt.figure(figsize=(10,6), dpi=100) # changing figure's shape
plt.xlim(x.min()-0.1, x.max()+0.1) # adjusting horizonatal axis limits
plt.ylim(f.min()-0.1, f.max()+0.1) # adjusting vertical axis limits
plt.xticks([0, np.pi/2, np.pi],[r'$-\pi$', r'$-\pi/2$', r'$0$', r'$+\pi/2$', r'$+\pi$'],fontsize=14) # setting ticks
plt.yticks([-1, 0, +1],[r'$-1$', r'$0$', r'$+1$'],fontsize=14) # setting ticks
plt.plot(x, f, color="blue", linewidth=2.5, linestyle="-", label="$\sin(e^x)$") # changing color and thickness
plt.plot(x, g, color="red", linewidth=2.5, linestyle="-", label="$\cos(2x)$") # changing color and thickness
plt.legend(loc='lower left',prop={'size':16}) # placing legend on bottom left
plt.xlabel('$x$',fontsize=16) # horizontal axis name
plt.ylabel('test functions',fontsize=16) # vertical axis name
plt.title('Sample plot',fontsize=18) # title
plt.grid(True) # enabling grid
#plt.savefig('trig_functions.pdf')
The image can be resized by dragging the handle in the lower right corner. Double clicking will return them to their original size.
One thing to be aware of is that by default, the figure object is cleared at the end of each cell, so you will need to issue all plotting commands for a single figure in a single cell.
plt.savefig('trig_functions.pdf')
<matplotlib.figure.Figure at 0x110360b90>
Go on and open the pdf file. It is empty! Now uncomment the save line in the block where the figure is plotted.
The following example is adapted from this tutorial. Let's define the following function:
alpha = 0.7
phi_ext = 2*np.pi*0.5
def flux_qubit_potential(phi_m, phi_p):
return 2 + alpha - 2*np.cos(phi_p)*np.cos(phi_m) - alpha*np.cos(phi_ext - 2*phi_p)
Now create a mesh and evaluate the function over that mesh:
phi_m = np.linspace(0, 2*np.pi, 100)
phi_p = np.linspace(0, 2*np.pi, 100)
X,Y = np.meshgrid(phi_p, phi_m)
Z = flux_qubit_potential(X, Y).T
Plot the surface:
from mpl_toolkits.mplot3d.axes3d import Axes3D # imports 3D plotting
from matplotlib import cm # module for color pattern
fig = plt.figure(figsize=(14,6))
# `ax` is a 3D-aware axis instance because of the projection='3d' keyword argument to add_subplot
ax = fig.add_subplot(1, 2, 1, projection='3d')
p = ax.plot_surface(X, Y, Z, rstride=4, cstride=4, linewidth=0)
# surface_plot with color grading and color bar
ax = fig.add_subplot(1, 2, 2, projection='3d')
p = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.coolwarm, linewidth=0, antialiased=False)
cb = fig.colorbar(p, shrink=0.5) # enables colorbar
Exercise 5: Use the weather data variable defined previously to plot the temperature over the years. Put a title, axis labels and make the plot wide enough. Hint: the x-axis represents the time and it can be evaluated by Year + Month/12.0 + Day/365.0.
# type your solution here
#%load solutions/plotting.py
There are many more fancy things to do with Python visualization package. For example we can create animations.
The code below uses the solution of the Woodward-Colella blast wave problem solved by the Clawpack package.
def solution(filename, path='woodward_colella_blast'):
"""
This function reads the solution stored in an
ascii file written by pyclaw.
"""
import sys
# Concatenate path and file name
pathfilename = path + "/" + filename
try:
f = open(pathfilename,"r")
except IOError as e:
print("({})".format(e))
sys.exit()
# Read file header
# The information contained in the first two lines are not used.
unused = f.readline() # grid_number
unused = f.readline() # AMR_level
# Read mx, my, xlow, ylow, dx and dy
line = f.readline()
sline = line.split()
mx = int(sline[0])
line = f.readline()
sline = line.split()
xlower = float(sline[0])
line = f.readline()
sline = line.split()
dx = float(sline[0])
# Grid:
xupper = xlower + mx * dx
xc = np.linspace(xlower+dx/2.,xupper-dx/2.,mx)
# Read solution
# Define arrays of conserved variables
q = np.zeros((mx,3))
line = f.readline()
for j in range(mx):
line = f.readline()
q[j,:] = np.array(map(float, line.split()))
return q, xc
def time(filename, path='woodward_colella_blast'):
"""
This function reads the time stored in an
ascii file written by pyclaw.
"""
from clawpack import pyclaw
import sys
# Concatenate path and file name
pathfilename = path + "/" + filename
try:
f = open(pathfilename,"r")
except IOError as e:
print("({})".format(e))
sys.exit()
# Read time
line = f.readline()
sline = line.split()
t = float(sline[0])
return t
def fplot(i):
index = "%02d" % i
q, xc = solution('fort.q00' + index)
t = time('fort.t00' + index)
line.set_data(xc, q[:,0])
ax.set_title('Density at t='+str(t))
return line
def init():
line.set_data([], [])
return line,
### main block ###
from matplotlib import animation
import matplotlib.pyplot as plt
from clawpack.visclaw.JSAnimation import IPython_display
_, xc = solution('fort.q0000')
fig = plt.figure(figsize=[8,4])
ax = plt.axes(xlim=(xc[0], xc[-1]), ylim=(0, 10))
line, = ax.plot([], [], lw=2)
animation.FuncAnimation(fig, fplot, frames=11, init_func=init, blit=True)
For more comprehensive examples have a look at the following links and check the capabilities of plotly
and Matplotlib
:
Besides numpy and matplotlib, there are many other useful Python packages for scientific computing. Here is a short list:
Copyright 2014, Yiannis Hadjimichael, KAUST ACM Student Chapter Member.