This tutorial is prepared by ACM Student Chapter of King Abdullah University of Science and Technology (KAUST).
Material has been adapted from the following:
- David Ketcheson's [Introduction to NumPy and Matplotlib](http://nbviewer.ipython.org/github/ketch/AMCS252/blob/master/2_Introduction_to_Numpy%20and_Matplotlib.ipynb)
- Software Carpentry's [Numerical analysis with NumPy](http://nbviewer.ipython.org/github/geocarpentry/2014-01-30-mit/blob/gh-pages/lessons/vignettes/Numerical%20analysis%20with%20NumPy.ipynb)
- J.R. Johansson's [Numpy - multidimensional data arrays](http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-2-Numpy.ipynb) and [SciPy - Library of scientific algorithms for Python](http://nbviewer.ipython.org/github/jrjohansson/scientific-python-lectures/blob/master/Lecture-3-Scipy.ipynb#SciPy---Library-of-scientific-algorithms-for-Python)
- Official [Tentative NumPy tutorial](http://www.scipy.org/Tentative_NumPy_Tutorial)
Prerequisites: Introduction to Python
Python includes a package for numerical computation called numpy, which is an essential tool for scientific computing. It is based on multidimensional arrays (vectors and matrices) and provides a large library of functions and operators that work efficiently on these objects. In this way, any numerical algorithm is expressed as operations on arrays and in many cases runs almost as quickly as the equivalent C code.
Like MATLAB, NumPy
relies on BLAS and LAPACK for efficient linear algebra computations.
NumPy
is used in:
Numerical Analysis, Linear algebra, Solution of differential equations, Statistical analysis and many more...
Consider the following list and suppose we want to perform some simple algebraic operations:
price = [5.99, 10.25, 2.0, 40.99, 5.60, 63.49]
price*2
[5.99, 10.25, 2.0, 40.99, 5.6, 63.49, 5.99, 10.25, 2.0, 40.99, 5.6, 63.49]
price+0.5
--------------------------------------------------------------------------- TypeError Traceback (most recent call last) <ipython-input-4-ba6ebb983a2c> in <module>() ----> 1 price+0.5 TypeError: can only concatenate list (not "float") to list
Clearly, if we wanted to double the value of each element and add half, this cannot be done easily. Instead, we would have to use a loop:
for i in range(len(price)):
price[i] = price[i]*2
price[i] = price[i]+0.5
print price
[12.48, 21.0, 4.5, 82.48, 11.7, 127.48]
But, there should be a more convinient way...
We start, by importing the NumPy module. Import statements like this are the typical way of getting access to functions in Python. The most general way to import a module is the following:
from numpy import *
The above imports all functions included in the package and can call them simply by their name.
sin(3*pi/2)*exp(2)
-7.3890560989306504
Instead we can import the numpy module and tell Python that we want to refer to numpy by the short abbreviation np. This is recommended since we keep track of which module the functions belong to.
import numpy as np
NumPy's main object is the homogeneous multidimensional array. It is a table of elements - all of the same type - indexed by a tuple of positive integers.
NumPy's array class is called ndarray and the package includes basic utility functions for creating and manipulate arrays in a similar fashion to MATLAB. For example we can create a one-dimensional array by:
price = np.array([5.99, 10.25, 2.0, 40.99, 5.60, 63.49])
print type(price)
price = price*2+0.5
print price
<type 'numpy.ndarray'> [ 12.48 21. 4.5 82.48 11.7 127.48]
There are other ways to create one-dimensional arrays:
x = np.linspace(-1,1,5)
print x
[-1. -0.5 0. 0.5 1. ]
y = np.arange(0,1,0.2)
print y
[ 0. 0.2 0.4 0.6 0.8]
Change the inputs of the linspace and arange function until you understand exactly what they do. Using both ways, create a vector of starting from -1.5 to 4.5 inclusive with a step of 0.5.
# type your solution here
#%load solutions/1D_array.py
We can always find out what a function does by typing its name, followed by a question mark:
np.arange?
Resize the help window (to get it out of your way) by dragging the divider. Double click on the divider to close it.
We can create multidimensional arrays, like the following:
A = np.array([[1,2.4,-13],[4.1,5,0],[7.2,8,9]])
print "A = ", A
print type(A)
print A.shape
print A.dtype
A = [[ 1. 2.4 -13. ] [ 4.1 5. 0. ] [ 7.2 8. 9. ]] <type 'numpy.ndarray'> (3, 3) float64
To see all different attributes of the array object type its name followed by a fullstop and press tap. Note that some attributes are basically functions and require parentheses (with or without arguments), e.g. conjugate and others don't, e.g. size:
A = np.array([[1,2+4j,3j],[4j,5,6-5j],[7j,8,9+3j]]) # dtype=complex
print A.conjugate()
print A.size
[[ 1.-0.j 2.-4.j 0.-3.j] [ 0.-4.j 5.-0.j 6.+5.j] [ 0.-7.j 8.-0.j 9.-3.j]] 9
We can also define the shape of the array:
np.random.uniform(0,1,size=(5,5)) # 5 x 5 matrix with elements from a uniform distribution
array([[ 0.18894904, 0.4016313 , 0.30167486, 0.04452608, 0.29880108], [ 0.85862765, 0.18898635, 0.46421297, 0.6303044 , 0.56185042], [ 0.78056999, 0.82705839, 0.63247392, 0.07986216, 0.64999473], [ 0.07852346, 0.27204531, 0.85521149, 0.94891731, 0.7810846 ], [ 0.47172365, 0.57284027, 0.57618836, 0.30150177, 0.54945194]])
The function zeros creates an array full of zeros, the function ones creates an array full of ones and eye an identity array. Function diag is used for a diagonal array with specified diagonal elements. A matrix with random entries is generated by the random function.
print np.zeros((3,4))
print np.ones((2,3))
print np.eye(3)
[[ 0. 0. 0. 0.] [ 0. 0. 0. 0.] [ 0. 0. 0. 0.]] [[ 1. 1. 1.] [ 1. 1. 1.]] [[ 1. 0. 0.] [ 0. 1. 0.] [ 0. 0. 1.]]
print np.diag(x,0)
print np.diag([2,3,4])
print np.random.random((2,2))
[[-1. 0. 0. 0. 0. ] [ 0. -0.5 0. 0. 0. ] [ 0. 0. 0. 0. 0. ] [ 0. 0. 0. 0.5 0. ] [ 0. 0. 0. 0. 1. ]] [[2 0 0] [0 3 0] [0 0 4]] [[ 0.89073408 0.92386659] [ 0.43499535 0.1673178 ]]
Notice that there are two pairs of parentheses used when calling the zeros, ones and random command, because we are passing as argument (e.g. the Python tuple (3,4)), which is the shape of the resulting array.
Manipulation of arrays can be done in many ways:
A = np.arange(1,10)
print "A = ", A
A = A.reshape((3,3))
print A
A = [1 2 3 4 5 6 7 8 9] [[1 2 3] [4 5 6] [7 8 9]]
B = arange(15).reshape((3,5))
print "B = ", B
print B.ravel() # flatten the array
B = [[ 0 1 2 3 4] [ 5 6 7 8 9] [10 11 12 13 14]] [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
We can slice numpy arrays just as other programming languages, but the indexing is a little different. See the examples below:
print A
print A[0,0]
print A[1:3,0:2]
print A[1:3,[0,1]]
[[1 2 3] [4 5 6] [7 8 9]] 1 [[4 5] [7 8]] [[4 5] [7 8]]
Note that the fundamental difference from MATLAB is that the indexing starts from zero and is exclusive of the second index. Hence, in the above example 1:3 refers to the second and third row and 0:2 to the first and second column.
We can index in some clever ways too:
print A
[[1 2 3] [4 5 6] [7 8 9]]
print A[0,:] # row selection
print A[:,0] # column selection
[1 2 3] [1 4 7]
print A[:,1:] # all rows, from 2nd column until the end
print A[-1,:] # last row, all columns
[[2 3] [5 6] [8 9]] [7 8 9]
We can change array's elements:
A[:,2] = [.3,.4,9.12]
print A
print A.dtype
[[1 2 0] [4 5 0] [7 8 9]] int64
Notice that the elements of A remain integers. That is because A has type int. The following does the desired change:
A = A.astype(float)
A[:,2] = [.3,.4,9.12]
print A
[[ 1. 2. 0.3 ] [ 4. 5. 0.4 ] [ 7. 8. 9.12]]
There is an easy way to reverse a vector:
a = arange(10)
print a
print a[: :-1] # reverse a
[0 1 2 3 4 5 6 7 8 9] [9 8 7 6 5 4 3 2 1 0]
We can also index an array using boolean arrays:
b = a>5
print b
a[b] = 0 # All elements of 'a' higher than 5 become 0
print a
[False False False False False False True True True True] [0 1 2 3 4 5 0 0 0 0]
print a**2
print 10*sin(np.pi/a[1:5]) # avoiding division with zero!
print A*A
[ 0 1 4 9 16 25 0 0 0 0] [ 1.22464680e-15 1.00000000e+01 8.66025404e+00 7.07106781e+00] [[ 1. 4. 0.09 ] [ 16. 25. 0.16 ] [ 49. 64. 83.1744]]
By default, NumPy uses component-wise multiplication. For a matrix-matrix (or matrix-vector) multiplication, we use the dot function:
np.dot(A,A)
array([[ 11.1 , 14.4 , 3.836 ], [ 26.8 , 36.2 , 6.848 ], [ 102.84 , 126.96 , 88.4744]])
A = array(A,dtype='int')
print A
[[1 2 0] [4 5 0] [7 8 9]]
print A.sum() # sum of all elements
print A.min()
print A.max()
36 0 9
print A.sum(axis=0) # sum of each column
print A.min(axis=1) # min of each row
print A.cumsum(axis=1) # cumulative sum along each row
[12 15 9] [0 0 7] [[ 1 3 3] [ 4 9 9] [ 7 15 24]]
In Python axis refers to the dimensions of arrays; axis=0 refers to the colums and axis=1 to the rows.
The vstack and hstack commands can be used to stack arrays and hsplit to split.
A = floor(10*random.random((2,2)))
print A
B = floor(10*random.random((2,2)))
print B
[[ 4. 0.] [ 6. 7.]] [[ 1. 2.] [ 5. 3.]]
print np.vstack((A,B))
print np.hstack((B,A))
[[ 4. 0.] [ 6. 7.] [ 1. 2.] [ 5. 3.]] [[ 1. 2. 4. 0.] [ 5. 3. 6. 7.]]
A = floor(10*random.random((2,12)))
print A
[[ 6. 7. 5. 2. 3. 6. 2. 4. 7. 5. 2. 9.] [ 8. 4. 4. 5. 7. 0. 5. 3. 3. 3. 6. 3.]]
print np.hsplit(A,3) # Split A into 3 arrays
print np.hsplit(A,(3,4)) # Split A after the third and the fourth column
[array([[ 6., 7., 5., 2.], [ 8., 4., 4., 5.]]), array([[ 3., 6., 2., 4.], [ 7., 0., 5., 3.]]), array([[ 7., 5., 2., 9.], [ 3., 3., 6., 3.]])] [array([[ 6., 7., 5.], [ 8., 4., 4.]]), array([[ 2.], [ 5.]]), array([[ 3., 6., 2., 4., 7., 5., 2., 9.], [ 7., 0., 5., 3., 3., 3., 6., 3.]])]
# type your solution here
#%load solutions/array_operations.py
Now let's work with some real data. Ensure that the stockholm_temp.txt in the same directory with this notebook. This file contains the daily average temperatures according to observations of Stockholm for the years 1756 - 2012. The following code checks if the file exists, opens it, prints the first lines and stores the data in an array.
import os
filename = 'data/stockholm_temp.txt'
if os.path.exists(filename):
# look at the first 3 lines
with open(filename,'r') as f:
print '\n'.join(f.readlines()[:3])
# alternative look at the first head lines
!head 'data/stockholm_temp.txt'
data = np.loadtxt(filename) # alternative we can use getfromtxt command
else:
print 'Weather data does not exist, please download "stockholm_temp.txt" from Github.'
1.756000000000000000e+03 1.000000000000000000e+00 1.000000000000000000e+00 -8.699999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 2.000000000000000000e+00 -9.199999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 3.000000000000000000e+00 -8.599999999999999645e+00 1.756000000000000000e+03 1.000000000000000000e+00 1.000000000000000000e+00 -8.699999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 2.000000000000000000e+00 -9.199999999999999289e+00 1.756000000000000000e+03 1.000000000000000000e+00 3.000000000000000000e+00 -8.599999999999999645e+00 1.756000000000000000e+03 1.000000000000000000e+00 4.000000000000000000e+00 -7.700000000000000178e+00 1.756000000000000000e+03 1.000000000000000000e+00 5.000000000000000000e+00 -7.200000000000000178e+00 1.756000000000000000e+03 1.000000000000000000e+00 6.000000000000000000e+00 -1.600000000000000089e+00 1.756000000000000000e+03 1.000000000000000000e+00 7.000000000000000000e+00 6.999999999999999556e-01 1.756000000000000000e+03 1.000000000000000000e+00 8.000000000000000000e+00 1.300000000000000044e+00 1.756000000000000000e+03 1.000000000000000000e+00 9.000000000000000000e+00 2.399999999999999911e+00 1.756000000000000000e+03 1.000000000000000000e+00 1.000000000000000000e+01 8.000000000000000444e-01
Since the data are in the form *Year
we would like to express them in a better way. We define a dtype
, that is the type for each column: int, int, int, float:
dt = np.dtype([('Year', 'int16'), ('Month', 'int8'), ('Day', 'int8'), ('Temp', 'float64')])
data = np.loadtxt(filename,dtype=dt)
data[:10] # first 10 entries of our data
array([(1756, 1, 1, -8.7), (1756, 1, 2, -9.2), (1756, 1, 3, -8.6), (1756, 1, 4, -7.7), (1756, 1, 5, -7.2), (1756, 1, 6, -1.6), (1756, 1, 7, 0.7), (1756, 1, 8, 1.3), (1756, 1, 9, 2.4), (1756, 1, 10, 0.8)], dtype=[('Year', '<i2'), ('Month', 'i1'), ('Day', 'i1'), ('Temp', '<f8')])
We can now look for the example only on the temperatures:
data['Temp']
array([-8.7, -9.2, -8.6, ..., 0.2, 2.6, 2.2])
or find all leap years:
years = np.unique(data['Year']) # making an array with each year appearing only once
leap_years = np.zeros((1,),dtype='int16') # initializing array for leap years
for i in range(len(years)):
if np.fmod(years[i],4)==0 and ((np.fmod(years[i],100)==0 and np.fmod(years[i],400)==0) or np.fmod(years[i],100)!=0):
leap_years = np.append(leap_years,years[i]) # adding a leap year
leap_years = leap_years[1:] # removing dummy first value
print leap_years
[1756 1760 1764 1768 1772 1776 1780 1784 1788 1792 1796 1804 1808 1812 1816 1820 1824 1828 1832 1836 1840 1844 1848 1852 1856 1860 1864 1868 1872 1876 1880 1884 1888 1892 1896 1904 1908 1912 1916 1920 1924 1928 1932 1936 1940 1944 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012]
# type your solution here
#%load solutions/temperatures.py
Handling ndarrays is really powerful. Of the many things one can do with NumPy's arrays is linear algebra:
x = np.random.random(10)
print x
np.linalg.norm(x,np.inf) # maximum norm of a vector
[ 0.05834554 0.05282837 0.39608145 0.37972821 0.59102374 0.38928025 0.00254778 0.4471012 0.22718756 0.32243128]
0.59102373985998502
We can use this module to solve a linear system of equations $Ax=b$:
A = np.array([[0,2],[8,0]])
b = np.array([1,2])
print A
print b
[[0 2] [8 0]] [1 2]
x = np.linalg.solve(A,b)
print x
[ 0.25 0.5 ]
We can also solve eigenvalue problems:
lamda,V = np.linalg.eig(A)
print lamda
print V
[ 4. -4.] [[ 0.4472136 -0.4472136 ] [ 0.89442719 0.89442719]]
Notice we have put two variables on the left of the equals sign, to assign the outputs of eig() to two different variables. Here lamda are the eigenvalues and the columns of V are the eigenvectors.
Create a diagonal matrix using the lamda variable and then multiply from the left with V and from the right with the inverse of V (use np.linalg.inv()). What do you find?
# type your solution here
#%load solutions/linear_algebra.py
NumPy
is complemented by SciPy
; a library that adds more MATLAB-like functionality.
SciPy
is a collection of mathematical algorithms and convenience functions that provides high-level commands and classes for Nowdays, everything from parallel programming to web and data-base subroutines and classes have been made available to Python All of this power is also accessible through the the mathematical libraries in SciPy NumPy
and SciPy
.
SciPy
is used in:
Numerical Analysis, Optimization, Linear algebra, Image and signal processing, Statistical analysis, Data processing, Data transformation and query and many more...
As with NumPy, we can access all packages in SciPy:
from scipy import *
SciPy provides implementations of a large set of special functions, usually used in mathematics, physics and engineering. Available functions include airy, elliptic, bessel, gamma, beta, hypergeometric, parabolic cylinder, mathieu, spheroidal wave, struve, and kelvin.
For a list of functions see SciPy's documention at http://docs.scipy.org/doc/scipy/reference/special.html#module-scipy.special. or try
from scipy import special
#help(special)
For example we can look at the Bessel functions.
Bessel functions are a family of solutions to Bessel’s differential equation with real or complex order alpha: \begin{align} x^2 \frac{d^2 y}{d x^2} + x \frac{d y}{d x} + (x^2 −\alpha^2)y = 0 \end{align}
from scipy.special import jn, jn_zeros
x = 0.0
n = 0 # order
# Bessel function of first kind, zero order
print "J_%d(%f) = %f" % (n, x, jn(n, x))
n = 1 # order
# Bessel function of first kind, first order
print "J_%d(%f) = %f" % (n, x, jn(n, x))
J_0(0.000000) = 1.000000 J_1(0.000000) = 0.000000
The next two lines allow plotting inside the ipython notebook. More during the next sessions...
# The next two
%pylab inline --no-import-all
import matplotlib.pyplot as plt
Populating the interactive namespace from numpy and matplotlib
We can now plot the Bessel functions of first kind for different values of order:
x = linspace(0, 12, 100)
fig, ax = plt.subplots()
ax.plot(x,zeros(len(x)),'--k')
for n in range(4):
ax.plot(x, jn(n, x), label=r"$J_%d(x)$" % n)
ax.legend();
and check for their zeros...
# zeros of Bessel functions
n = 0 # order
m = 4 # number of roots to compute
jn_zeros(n, m)
array([ 2.40482556, 5.52007811, 8.65372791, 11.79153444])
The numerical evaluation of the definite integral of a function is called numerical quadrature, or simply quadature. SciPy provides a series of functions for different kind of quadrature, for example the quad, dblquad and tplquad for single, double and triple integrals, respectively.
from scipy.integrate import quad
xl = 0 # the lower limit of x
xu = 10 # the upper limit of x
val, abserr = quad(lambda x: jn(0,x), xl, xu)
print "integral value =", val, ", absolute error =", abserr
integral value = 1.06701130396 , absolute error = 7.43478946065e-14
We can pass extra arguments to integrand function, by using the args
keyword argument.
For example we can use args=(3,)
for the third order Bessel function.
Create a function that returns the Bessel function of first kinf for different orders and then find the integral.
# type your solution here
#%load solutions/bessel_integral.py
The interp1d
function accepts given arrays describing points X and Y data.
It returns a function that can be evaluated everywhere in the domain for an arbitrary value of $x \in X$ and the image it is corresponding interpolated value. Options for interpoland kinds are linear, quadratic and cubic spline interpolation.
from scipy.interpolate import interp1d
x = np.linspace(0, 20, 20)
y = lambda x: jn(5,x) # sampled on given values x
f1 = interp1d(x, y(x)) # linear interpolation
f2 = interp1d(x, y(x), kind='quadratic')
f3 = interp1d(x, y(x), kind='cubic')
# new points where we want to evaluate the interpoland
xnew = np.linspace(0, 20, 100)
fig, ax = plt.subplots(figsize=(10,4))
plt.plot(x,y(x),'o',xnew,f1(xnew),'-', xnew, f2(xnew),'--',xnew, f3(xnew),'--',xnew,y(xnew),'k')
plt.legend(['data', 'linear', 'quadratic','cubic','true'], loc='best')
plt.show()
Sparse matrices are often useful in numerical analysis dealing with large systems. Whenever the problem involves vectors or matrices with mostly zero values then one should take advantage of the sparsity. SciPy has an extensive library for sparse matrices, with basic linear algebra operations (such as equation solving, eigenvalue calculations, etc).
To create a sparse matrix we have to choose the format that will be stored in:
from scipy.sparse import *
# dense matrix
M = array([[10,0,0,0], [0,.3,0,0], [0,2,1.5,0], [-3,0,0,100]])
print M
[[ 10. 0. 0. 0. ] [ 0. 0.3 0. 0. ] [ 0. 2. 1.5 0. ] [ -3. 0. 0. 100. ]]
# convert from dense to sparse
M_sparse = csr_matrix(M); # or csc_matrix(M)
print M_sparse
(0, 0) 10.0 (1, 1) 0.3 (2, 1) 2.0 (2, 2) 1.5 (3, 0) -3.0 (3, 3) 100.0
We can go back to the dense form:
print M_sparse
print M_sparse.todense()
(0, 0) 10.0 (1, 1) 0.3 (2, 1) 2.0 (2, 2) 1.5 (3, 0) -3.0 (3, 3) 100.0 [[ 10. 0. 0. 0. ] [ 0. 0.3 0. 0. ] [ 0. 2. 1.5 0. ] [ -3. 0. 0. 100. ]]
The easiest way to create a sparse matrix is using the lil_matrix
command. It creates an empty matrix and then we allogating non-zero entries by using matrix indexing. This way we avoid creating large dense matrices.
A = lil_matrix((4,4)) # empty 4x4 sparse matrix using a linked list format
A[0,0] = 1
A[1,1] = 3
A[2,2] = A[2,1] = 1
A[3,3] = A[3,0] = 1
A
<4x4 sparse matrix of type '<type 'numpy.float64'>' with 6 stored elements in LInked List format>
We can also change the format based on the sparsity:
A = csr_matrix(A); A # if the rows are sparse
<4x4 sparse matrix of type '<type 'numpy.float64'>' with 6 stored elements in Compressed Sparse Row format>
A = csc_matrix(A); A # if the columns are sparse
<4x4 sparse matrix of type '<type 'numpy.float64'>' with 6 stored elements in Compressed Sparse Column format>
Construct a $10000 \times 10000$ lil_matrix
$A$ and add values to it by setting all rows and columns $2000$ to $3000$ into random values. Also set random diagonal entries (you can do this without a loop!). Then use the spy
function from matplotlib.pyplot
to "spy" the matrix.
Using a vector $b$ with $10000$ random values solve the system $A x = b$.
To do so use the direcet solver linsolve
from scipy.sparse.linalg.dsolve
.
Change to different sparse formats (CSR and CSC) and measure the time needed to solve the system. You can use the time
package.
# type your solution here
#%load solutions/sparse_matrices.py
Besides NumPy and Scipy, there are many other useful Python packages for scientific computing. Here is a short list:
Copyright 2015, Maruan Al-Shedivat, Yiannis Hadjimichael, ACM Student Chapter.