Learning objectives:
You have known vectors since high school mathematics, e.g., point $(x,y)$ in the plane, point $(x,y,z)$ in space. In general, we can describe a vector $v$ as an $n$-tuple of numbers: $v=(v_0,\ldots,v_{n-1})$. One way to store vectors in Python is by using lists: $v_i$ is stored as v[i].
Arrays are a generalization of vectors where we can have multiple indices: $A_{i,j}$, $A_{i,j,k}$. In Python code this is represented as a nested list (see previous lecture), accessed as A[i][j], A[i][j][k].
Example: table of numbers, one index for the row, one for the column $$ \left\lbrack\begin{array}{cccc} 0 & 12 & -1 & 5q -1 & -1 & -1 & 0\cr 11 & 5 & 5 & -2 \end{array}\right\rbrack \hspace{1cm} A = \left\lbrack\begin{array}{ccc} A_{0,0} & \cdots & A_{0,n-1}\cr \vdots & \ddots & \vdots\cr A_{m-1,0} & \cdots & A_{m-1,n-1} \end{array}\right\rbrack $$ The number of indices in an array is the rank or number of dimensions. Using these terms, a vector can be described as a one-dimensional array, or rank 1 array.
In practice, we use Numerical Python (NumPy) arrays instead of lists to represent mathematical arrays because it is much faster for large arrays.
Let's consider an example where we store $(x,y)$ points along a curve in Python lists and numpy arrays:
# Sample function
def f(x):
return x**3
# Generate n points in [0,1]
n = 5
dx = 1.0/(n-1) # x spacing
xlist = [i*dx for i in range(n)] # Python lists
ylist = [f(x) for x in xlist]
# Turn these Python lists into Numerical Python (NumPy) arrays:
from numpy import *
x2 = array(xlist)
y2 = array(ylist)
Instead of first making lists with $x$ and $y = f (x)$ data, and then turning lists into arrays, we can make NumPy arrays directly:
n = 5 # number of points
x2 = linspace(0, 1, n) # generates n points between 0 and 1
y2 = zeros(n) # n zeros (float data type by default)
for i in range(n):
y2[i] = f(x2[i])
List comprehensions create lists, not arrays, but we can do:
y2 = array([f(xi) for xi in x2]) # list -> array
x = linspace(0, 2, 10001)
y = zeros(10001)
for i in range(len(x)):
y[i] = sin(x[i])
can be coded as
y = sin(x)
In the latter case the loop over all elements is now performed in a very efficient C function.
Operations on whole arrays, instead of using Python for-loops, is called vectorization and is a very convenient, efficient and therefore important programming technique to master.
Let's consider a simple vectorisation example: a loop to compute $x$ coordinates (x2) and $y=f(x)$ coordinates (y2) along a function curve:
x2 = linspace(0, 1, n)
y2 = zeros(n)
for i in range(n):
y2[i] = f(x2[i])
This computation can be replaced by:
x2 = linspace(0, 1, n)
y2 = f(x2)
The advantage of this approach is:
Consider the function
def f(x):
return x**3
$f(x)$ is intended for a number $x$, i.e. a scalar. So what happens when we call f(x2), where x2 is an NumPy array? The function simply evaluates $x^3$ for an array x. NumPy supports arithmetic operations on arrays, which correspond to the equivalent operations on each element, e.g.:
x**3 # x[i]**3 forr all i
cos(x) # cos(x[i]) for all i
x**3 + x*cos(x) # x[i]**3 + x[i]*cos(x[i]) for all i
x/3*exp(-x*0.5) # x[i]/3*exp(-x[i]*0.5) for all i
array([ 0.00000000e+00, 6.66600003e-05, 1.33306669e-04, ..., 2.45252956e-01, 2.45252960e-01, 2.45252961e-01])
In each of these cases a highly optimised C function is actually called to evaluate the expression. In this example, the cos function called for an array is imported from numpy rather than from the math module which only acts on scalars.
Notes:
Consider this code:
a=x
a[-1] = 42
print(x[-1])
42.0
Notice what happened here - we changed a value in a but the corresponding value in x was also changed! This is because a refers to the same array as x. If you really want a seperate copy of x then we have to make an explicit copy:
a = x.copy()
A function with many applications in science is defined as: $h(x) = \frac{1}{\sqrt{2\pi}}\exp(-0.5x^2)$
Fill two lists xlist and hlist with x and h(x) values for uniformly spaced x coordinates in [−4, 4]. You may adapt the first example in the lecture 4 notes.
Fill two arrays x and y with x and h(x) values, respectively, where h(x) is defined above. Let the x values be uniformly spaced in [−4, 4]. Use list comprehensions to create the x and y arrays.
Vectorize the code by creating the x values using the linspace function and by evaluating h(x) for an array argument.
Given a vector $v = (2, 3, −1)$ and a function $f(x) = x^3 + xe^x + 1$, apply $f$ to each element in $v$. Then calculate $f(v)$ as $v^3 + ve^v + 1$ using vector computing rules. Show that the two results are equal.
Suppose x and t are two arrays of the same length, and given the vectorized expression,
y = cos(sin(x)) + exp(1/t)
assuming x holds two elements, 0 and 2, and t holds the elements 1 and 1.5, calculate by hand (using a calculator) the y array.
Write a program that mimics the series of computations you did by hand (use explicit loops, but at the end you can use NumPy functionality to check the results).
We can select a slice of an array using a[start:stop:inc], where the slice start:stop:inc implies a set of indices starting from start, up to stop in increments of inc. In fact, any integer list or array can be used to indicate a set of indices:
a = linspace(1, 8, 8)
print(a)
[ 1. 2. 3. 4. 5. 6. 7. 8.]
a[[1,6,7]] = 10 # i.e. set the elements with indicies 1,6, and 7 in the list to 10.
print(a)
[ 1. 10. 3. 4. 5. 6. 10. 10.]
a[range(2,8,3)] = -2 # same as a[2:8:3] = -2
print(a)
[ 1. 10. -2. 4. 5. -2. 10. 10.]
Even boolean expressions can also be used to select part of an array(!)
print(a[a < 0]) # pick out all negative elements
[-2. -2.]
a[a < 0] = a.max() # if a[i]<0, set a[i]=10
print(a)
[ 1. 10. 10. 4. 5. 10. 10. 10.]
Create an array w with values 0, 0.1, 0.2, ..., 3. Write out w[:], w[:-2], w[::5], w[2:-2:6]. Convince yourself in each case that you understand which elements of the array are printed.
First of all, a little house keeping. There are quite a few ways of plotting graphs etc. in Python. Currently the best way is using PyLab. If you are someone who likes the details then please go to the PyLab website and read. Secondly, because we are doing this within IPython NoteBook, and we do not want additional windows popping up all over the place, we execute this next line:
%pylab inline
Populating the interactive namespace from numpy and matplotlib
/home/nbuser/anaconda3_431/lib/python3.6/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['f'] `%matplotlib` prevents importing * from pylab and numpy "\n`%matplotlib` prevents importing * from pylab and numpy"
Now, onwards and upwards...
A curve $y = f(x)$ stored in the 1D NumPy arrays x and y can easily be plotted:
t = linspace(0, 3, 51)
y = t**2*exp(-t**2)
plot(t, y)
show()
Plots also should have labels on the axis, a title, and sometimes a specific extent of the axis (perhaps you wish to easily compare two graphs side-by-side):
def f(t):
return t**2*exp(-t**2)
t = linspace(0, 3, 51) # Generates 51 points between 0 and 3
y = f(t)
plot(t, y)
xlabel('t')
ylabel('y')
legend(('t^2*exp(-t^2)',))
axis([0, 3, -0.05, 0.6]) # specify the extent of the axes [tmin, tmax, ymin, ymax]
title('My second PyLab graph')
show()
The function $f(x, t) = \exp(-(x - 3t)^2)\sin(3\pi(x - t))$ describes, for a fixed value of t, a wave localized in space. Make a program that visualizes this function as a function of x on the interval [−4, 4] when t = 0.
We can also plot several curves in one plot:
def f1(t):
return t**2*exp(-t**2)
def f2(t):
return t**2*f1(t)
t = linspace(0, 3, 51)
y1 = f1(t)
y2 = f2(t)
# Matlab-style syntax:
plots = plot(t, y1, t, y2)
legend(plots, ('t^4*exp(-t^2)', 't^4*exp(-t^2)'), loc='best')
xlabel('t')
ylabel('y')
title('Plotting two curves in the same plot')
show()
When plotting multiple curves in the same plot, PyLab usually does a good job in making sure that the different lines actually look different. However, sometimes you need to take action yourself (e.g. if you need to print your graph out in black&white). To do this we can add an extra argument to the plot command where we specify what we want - e.g. "r-" means a red line, while "bo" means blue circles:
plot(t, y1, 'r-', t, y2, 'bo')
show()
For further examples check out the PyLab website.
When we have a table of numbers,
$$ \left\lbrack\begin{array}{cccc} 0 & 12 & -1 & 5\cr -1 & -1 & -1 & 0\cr 11 & 5 & 5 & -2 \end{array}\right\rbrack $$(i.e. a matrix) it is natural to use a two-dimensional array $A_{i,j}$ with one index for the rows and one for the columns:
$$ A = \left\lbrack\begin{array}{ccc} A_{0,0} & \cdots & A_{0,n-1}\cr \vdots & \ddots & \vdots\cr A_{m-1,0} & \cdots & A_{m-1,n-1} \end{array}\right\rbrack $$Let's recreate this array using NumPy:
A = zeros((3,4))
A[0,0] = 0
A[1,0] = -1
A[2,0] = 11
A[0,1] = 12
A[1,1] = -1
A[2,1] = 5
A[0,2] = -1
A[1,2] = -1
A[2,2] = 5
# we can also use the same syntax that we used for nested lists
A[0][3] = 5
A[1][3] = 0
A[2][3] = -2
print(A)
[[ 0. 12. -1. 5.] [ -1. -1. -1. 0.] [ 11. 5. 5. -2.]]
Next let's convert a nested list from a previous example into a 2D array:
Cdegrees = range(0, 101, 10)
Fdegrees = [9./5*C + 32 for C in Cdegrees]
table = [[C, F] for C, F in zip(Cdegrees, Fdegrees)]
print(table)
[[0, 32.0], [10, 50.0], [20, 68.0], [30, 86.0], [40, 104.0], [50, 122.0], [60, 140.0], [70, 158.0], [80, 176.0], [90, 194.0], [100, 212.0]]
# Convert this into a NumPy array:
table2 = array(table)
print(table2)
[[ 0. 32.] [ 10. 50.] [ 20. 68.] [ 30. 86.] [ 40. 104.] [ 50. 122.] [ 60. 140.] [ 70. 158.] [ 80. 176.] [ 90. 194.] [ 100. 212.]]
To see the number of elements in each dimension:
print(table2.shape)
(11, 2)
i.e. 11 rows and 2 columns.
Let's write a loop over all array elements of A:
for i in range(table2.shape[0]):
for j in range(table2.shape[1]):
print('table2[%d,%d] = %g' % (i, j, table2[i,j]))
table2[0,0] = 0 table2[0,1] = 32 table2[1,0] = 10 table2[1,1] = 50 table2[2,0] = 20 table2[2,1] = 68 table2[3,0] = 30 table2[3,1] = 86 table2[4,0] = 40 table2[4,1] = 104 table2[5,0] = 50 table2[5,1] = 122 table2[6,0] = 60 table2[6,1] = 140 table2[7,0] = 70 table2[7,1] = 158 table2[8,0] = 80 table2[8,1] = 176 table2[9,0] = 90 table2[9,1] = 194 table2[10,0] = 100 table2[10,1] = 212
Alternatively:
for index_tuple, value in ndenumerate(table2):
print('index %s has value %g' % (index_tuple, table2[index_tuple]))
index (0, 0) has value 0 index (0, 1) has value 32 index (1, 0) has value 10 index (1, 1) has value 50 index (2, 0) has value 20 index (2, 1) has value 68 index (3, 0) has value 30 index (3, 1) has value 86 index (4, 0) has value 40 index (4, 1) has value 104 index (5, 0) has value 50 index (5, 1) has value 122 index (6, 0) has value 60 index (6, 1) has value 140 index (7, 0) has value 70 index (7, 1) has value 158 index (8, 0) has value 80 index (8, 1) has value 176 index (9, 0) has value 90 index (9, 1) has value 194 index (10, 0) has value 100 index (10, 1) has value 212
We can also extract slices from multi-dimensional arrays as before. For example, extract the second column:
print(table2[:, 1]) # 2nd column (index 1)
[ 32. 50. 68. 86. 104. 122. 140. 158. 176. 194. 212.]
Play with this more complicated example:
t = linspace(1, 30, 30).reshape(5, 6)
print(t)
[[ 1. 2. 3. 4. 5. 6.] [ 7. 8. 9. 10. 11. 12.] [ 13. 14. 15. 16. 17. 18.] [ 19. 20. 21. 22. 23. 24.] [ 25. 26. 27. 28. 29. 30.]]
print(t[1:-1:2, 2:])
[[ 9. 10. 11. 12.] [ 21. 22. 23. 24.]]
A matrix $\mathbf{A}$ and a vector $\mathbf{b}$, represented in Python as a 2D array and a 1D array respectively, are given by:
$$ \mathbf{A} = \left\lbrack\begin{array}{ccc} 0 & 12 & -1\cr -1 & -1 & -1\cr 11 & 5 & 5 \end{array}\right\rbrack $$$$ \mathbf{b} = \left\lbrack\begin{array}{c} -2\cr 1\cr 7 \end{array}\right\rbrack $$Multiplying a matrix by a vector results in another vector $\mathbf{c}$, whose components are defined by the general rule
$$\mathbf{c}_i = \sum_j\mathbf{A}_{i,j}\mathbf{b}_j$$Define $\mathbf{A}$ and $\mathbf{b}$ as NumPy arrays, and multiply them together using the above rule.
The file data/precipitation.csv contains monthly precipitation data for South-East England between 1916 and 2016 provided by the Met Office. Each row represents a year, and each column a month. To open the data file, use the command below:
import numpy as np
data = np.loadtxt(fname='data/precipitation.csv', delimiter=',')
Open the data file, and print its content. Observe that it is a multi-dimensional Numpy array. Print its shape. How many years are represented ?
Write a function that converts a length in mm into a length in inches. Use this function to convert the precipitation data into inches in a vectorized way.
With a for
loop, print (in a formatted way) for each month, the year where the monthly precipitation was the highest. A line like:
The maximal precipitation for January is 181.9 mm/month and was reached in 2014.
is expected.
Hint: you should use the max and argmax functions provided by Numpy.