#!/usr/bin/env python # coding: utf-8 # # Introduction to programming for Geoscientists through Python # ### [Gerard Gorman](http://www.imperial.ac.uk/people/g.gorman), [Nicolas Barral](http://www.imperial.ac.uk/people/n.barral) # # # Lecture 5: Array computing and curve plotting # Learning objectives: # # * Learn how to compute using numpy arrays, *i.e.* vectorise code. # * Learn how to generate 2D graphs. # ## Vectors and arrays # # You have known **vectors** since high school mathematics, *e.g.*, point $(x,y)$ in the plane, point $(x,y,z)$ in space. In general, we can describe a vector $v$ as an $n$-tuple of numbers: $v=(v_0,\ldots,v_{n-1})$. One way to store vectors in Python is by using *lists*: $v_i$ is stored as *v[i]*. # **Arrays** are a generalization of vectors where we can have multiple indices: $A_{i,j}$, $A_{i,j,k}$. In Python code this is represented as a nested list (see previous lecture), accessed as *A[i][j]*, *A[i][j][k]*. # # Example: table of numbers, one index for the row, one for the column # $$ # \left\lbrack\begin{array}{cccc} # 0 & 12 & -1 & 5q # -1 & -1 & -1 & 0\cr # 11 & 5 & 5 & -2 # \end{array}\right\rbrack # \hspace{1cm} # A = # \left\lbrack\begin{array}{ccc} # A_{0,0} & \cdots & A_{0,n-1}\cr # \vdots & \ddots & \vdots\cr # A_{m-1,0} & \cdots & A_{m-1,n-1} # \end{array}\right\rbrack # $$ # The number of indices in an array is the *rank* or *number of dimensions*. Using these terms, a vector can be described as a one-dimensional array, or rank 1 array. # In practice, we use [Numerical Python (*NumPy*)](http://www.numpy.org/) arrays instead of lists to represent mathematical arrays because it is **much** faster for large arrays. # # Let's consider an example where we store $(x,y)$ points along a curve in Python lists and numpy arrays: # In[1]: # Sample function def f(x): return x**3 # Generate n points in [0,1] n = 5 dx = 1.0/(n-1) # x spacing xlist = [i*dx for i in range(n)] # Python lists ylist = [f(x) for x in xlist] # Turn these Python lists into Numerical Python (NumPy) arrays: from numpy import * x2 = array(xlist) y2 = array(ylist) # Instead of first making lists with $x$ and $y = f (x)$ data, and then turning lists into arrays, we can make NumPy arrays # directly: # In[2]: n = 5 # number of points x2 = linspace(0, 1, n) # generates n points between 0 and 1 y2 = zeros(n) # n zeros (float data type by default) for i in range(n): y2[i] = f(x2[i]) # List comprehensions create lists, not arrays, but we can do: # In[3]: y2 = array([f(xi) for xi in x2]) # list -> array # ### When and where to use NumPy arrays # # * Python lists can hold any sequence of any Python objects, however, NumPy arrays can only hold objects of the same type. # * Arrays are most efficient when the elements are of basic number types (*float*, *int*, *complex*). # * In that case, arrays are stored efficiently in the computer's memory and we can compute very efficiently with the array elements. # * Mathematical operations on whole arrays can be done without loops in Python. For example, # In[4]: x = linspace(0, 2, 10001) y = zeros(10001) for i in range(len(x)): y[i] = sin(x[i]) # can be coded as # In[5]: y = sin(x) # In the latter case the loop over all elements is now performed in a very efficient C function. # # Operations on whole arrays, instead of using Python *for*-loops, is called vectorization and is a very **convenient**, **efficient** and therefore important programming technique to master. # Let's consider a simple vectorisation example: a loop to compute $x$ coordinates (*x2*) and $y=f(x)$ coordinates (*y2*) along a function curve: # In[6]: x2 = linspace(0, 1, n) y2 = zeros(n) for i in range(n): y2[i] = f(x2[i]) # This computation can be replaced by: # In[7]: x2 = linspace(0, 1, n) y2 = f(x2) # The advantage of this approach is: # # * There is no need to allocate space for y2 (via the NumPy *zeros* function). # * There is no need for a loop. # * It is *much* faster. # ## How vectorised functions work # Consider the function # In[8]: def f(x): return x**3 # $f(x)$ is intended for a number $x$, *i.e.* a *scalar*. So what happens when we call *f(x2)*, where *x2* is an NumPy array? **The function simply evaluates $x^3$ for an array x**. NumPy supports arithmetic operations on arrays, which correspond to the equivalent operations on each element, *e.g.*: # In[9]: x**3 # x[i]**3 forr all i cos(x) # cos(x[i]) for all i x**3 + x*cos(x) # x[i]**3 + x[i]*cos(x[i]) for all i x/3*exp(-x*0.5) # x[i]/3*exp(-x[i]*0.5) for all i # In each of these cases a highly optimised C function is actually called to evaluate the expression. In this example, the *cos* function called for an *array* is imported from *numpy* rather than from the *math* module which only acts on scalars. # # Notes: # # * Functions that can operate on arrays are called **vectorized functions**. # * Vectorization is the process of turning a non-vectorized expression/algorithm into a vectorized expression/algorithm. # * Mathematical functions in Python automatically work for both scalar and array (vector) arguments, *i.e.* no vectorization is needed by the programmer. # # ### Watch out for references Vs. copies of arrays! # Consider this code: # In[10]: a=x a[-1] = 42 print(x[-1]) # Notice what happened here - we changed a value in *a* but the corresponding value in *x* was also changed! This is because *a* refers to the same array as *x*. If you really want a seperate copy of *x* then we have to make an explicit copy: # In[11]: a = x.copy() # ## Exercise 5.1: Fill lists and arrays with function values # A function with many applications in science is defined as:

# $h(x) = \frac{1}{\sqrt{2\pi}}\exp(-0.5x^2)$

# # * Fill two lists *xlist* and *hlist* with *x* and *h(x)* values for uniformly spaced *x* coordinates in [−4, 4]. You may adapt the first example in the lecture 4 notes. # # * Fill two arrays *x* and *y* with *x* and *h(x)* values, respectively, where *h(x)* is defined above. Let the *x* values be uniformly spaced in [−4, 4]. Use list comprehensions to create the *x* and *y* arrays. # # * Vectorize the code by creating the *x* values using the *linspace* function and by evaluating *h(x)* for an array argument. # In[ ]: # ## Exercise 5.2: Apply a function to a vector # Given a vector $v = (2, 3, −1)$ and a function $f(x) = x^3 + xe^x + 1$, apply $f$ to each element in $v$. Then calculate $f(v)$ as $v^3 + ve^v + 1$ using vector computing rules. Show that the two results are equal. # In[ ]: # ## Exercise 5.3: Simulate by hand a vectorized expression # Suppose *x* and *t* are two arrays of the same length, and given the vectorized expression, # # ```python # y = cos(sin(x)) + exp(1/t) # ``` # # assuming *x* holds two elements, 0 and 2, and *t* holds the elements 1 and 1.5, calculate by hand (using a calculator) the *y* array. # # Write a program that mimics the series of computations you did by hand (use explicit loops, but at the end you can use NumPy functionality to check the results). # In[ ]: # ## Generalised array indexing # We can select a slice of an array using *a[start:stop:inc]*, where the slice *start:stop:inc* implies a set of indices starting from *start*, up to *stop* in increments of *inc*. In fact, any integer list or array can be used to indicate a set of indices: # In[12]: a = linspace(1, 8, 8) print(a) # In[13]: a[[1,6,7]] = 10 # i.e. set the elements with indicies 1,6, and 7 in the list to 10. print(a) # In[14]: a[range(2,8,3)] = -2 # same as a[2:8:3] = -2 print(a) # Even boolean expressions can also be used to select part of an array(!) # In[15]: print(a[a < 0]) # pick out all negative elements # In[16]: a[a < 0] = a.max() # if a[i]<0, set a[i]=10 print(a) # ## Exercise 5.4: Demonstrate array slicing # Create an array *w* with values 0, 0.1, 0.2, ..., 3. Write out *w[:]*, *w[:-2]*, *w[::5]*, *w[2:-2:6]*. Convince yourself in each case that you understand which elements of the array are printed. # In[ ]: # ## Plotting curves - the basics # First of all, a little house keeping. There are quite a few ways of plotting graphs etc. in Python. Currently the best way is using [PyLab](http://wiki.scipy.org/PyLab). If you are someone who likes the details then please go to the [PyLab](http://wiki.scipy.org/PyLab) website and read. Secondly, because we are doing this within IPython NoteBook, and we do not want additional windows popping up all over the place, we execute this next line: # In[17]: get_ipython().run_line_magic('pylab', 'inline') # Now, onwards and upwards... # # A curve $y = f(x)$ stored in the 1D NumPy arrays *x* and *y* can easily be plotted: # In[18]: t = linspace(0, 3, 51) y = t**2*exp(-t**2) plot(t, y) show() # Plots also should have **labels** on the axis, a **title**, and sometimes a specific extent of the axis (perhaps you wish to easily compare two graphs side-by-side): # In[19]: def f(t): return t**2*exp(-t**2) t = linspace(0, 3, 51) # Generates 51 points between 0 and 3 y = f(t) plot(t, y) xlabel('t') ylabel('y') legend(('t^2*exp(-t^2)',)) axis([0, 3, -0.05, 0.6]) # specify the extent of the axes [tmin, tmax, ymin, ymax] title('My second PyLab graph') show() # ## Exercise 5.5: Plot a formula # * Make a plot of the function $y(t) = v_0t − 0.5gt^2$ for $v_0 = 10$, $g = 9.81$, and $t \in [0, 2v_0/g]$. The label on the *x* axis should be 'time (s)' and the label on the *y* axis should be 'height (m)'. # * Extend the program such that the minimum and maximum *x* and *y* values are computed, and use the extreme values to specify the extent of the *x* and *y* axes. Add some space above the heighest curve. # In[ ]: # ## Exercise 5.6: Plot another formula # The function

# $f(x, t) = \exp(-(x - 3t)^2)\sin(3\pi(x - t))$ #

# describes, for a fixed value of *t*, a wave localized in space. Make a program that visualizes this function as a function of *x* on the interval [−4, 4] when *t* = 0. # In[ ]: # ## Multiple curves in one plot # We can also plot several curves in one plot: # In[20]: def f1(t): return t**2*exp(-t**2) def f2(t): return t**2*f1(t) t = linspace(0, 3, 51) y1 = f1(t) y2 = f2(t) # Matlab-style syntax: plots = plot(t, y1, t, y2) legend(plots, ('t^4*exp(-t^2)', 't^4*exp(-t^2)'), loc='best') xlabel('t') ylabel('y') title('Plotting two curves in the same plot') show() # When plotting multiple curves in the same plot, PyLab usually does a good job in making sure that the different lines actually look different. However, sometimes you need to take action yourself (*e.g.* if you need to print your graph out in black&white). To do this we can add an extra argument to the plot command where we specify what we want - *e.g.* "r-" means a *red line*, while "bo" means *blue circles*: # In[21]: plot(t, y1, 'r-', t, y2, 'bo') show() # For further examples check out the [PyLab website](http://scipy.org/PyLab). # ## Exercise 5.7: Plot a formula for several parameters # - Make a program in which you define a set of $v_0$ values and plot the corresponding curves $y(t) = v_0t − 0.5gt^2$ in the same figure (set $g = 9.81$). Let $t \in [0, 2v_0/g$] for each curve, which implies that you need a different vector of $t$ coordinates for each curve. # - Make a similar program where the values of $v_0$ are given by the user through an iPython widget as a series of numbers separated by a comma. # In[ ]: # ## 2D arrays # When we have a table of numbers, # # $$ # \left\lbrack\begin{array}{cccc} # 0 & 12 & -1 & 5\cr # -1 & -1 & -1 & 0\cr # 11 & 5 & 5 & -2 # \end{array}\right\rbrack # $$ # # (*i.e.* a *matrix*) it is natural to use a two-dimensional array $A_{i,j}$ with one index for the rows and one for the columns: # # $$ # A = # \left\lbrack\begin{array}{ccc} # A_{0,0} & \cdots & A_{0,n-1}\cr # \vdots & \ddots & \vdots\cr # A_{m-1,0} & \cdots & A_{m-1,n-1} # \end{array}\right\rbrack # $$ # # Let's recreate this array using NumPy: # In[22]: A = zeros((3,4)) A[0,0] = 0 A[1,0] = -1 A[2,0] = 11 A[0,1] = 12 A[1,1] = -1 A[2,1] = 5 A[0,2] = -1 A[1,2] = -1 A[2,2] = 5 # we can also use the same syntax that we used for nested lists A[0][3] = 5 A[1][3] = 0 A[2][3] = -2 print(A) # Next let's convert a nested list from a previous example into a 2D array: # In[23]: Cdegrees = range(0, 101, 10) Fdegrees = [9./5*C + 32 for C in Cdegrees] table = [[C, F] for C, F in zip(Cdegrees, Fdegrees)] print(table) # In[24]: # Convert this into a NumPy array: table2 = array(table) print(table2) # To see the number of elements in each dimension: # In[25]: print(table2.shape) # *i.e.* 11 rows and 2 columns. # Let's write a loop over all array elements of A: # In[26]: for i in range(table2.shape[0]): for j in range(table2.shape[1]): print('table2[%d,%d] = %g' % (i, j, table2[i,j])) # Alternatively: # In[27]: for index_tuple, value in ndenumerate(table2): print('index %s has value %g' % (index_tuple, table2[index_tuple])) # We can also extract slices from multi-dimensional arrays as before. For example, extract the second column: # In[28]: print(table2[:, 1]) # 2nd column (index 1) # Play with this more complicated example: # In[29]: t = linspace(1, 30, 30).reshape(5, 6) print(t) # In[30]: print(t[1:-1:2, 2:]) # ## Exercise 5.8: Implement matrix-vector multiplication # A matrix $\mathbf{A}$ and a vector $\mathbf{b}$, represented in Python as a 2D array and a 1D array respectively, are given by: # # $$ # \mathbf{A} = \left\lbrack\begin{array}{ccc} # 0 & 12 & -1\cr # -1 & -1 & -1\cr # 11 & 5 & 5 # \end{array}\right\rbrack # $$ # # $$ # \mathbf{b} = \left\lbrack\begin{array}{c} # -2\cr # 1\cr # 7 # \end{array}\right\rbrack # $$ # # Multiplying a matrix by a vector results in another vector $\mathbf{c}$, whose components are defined by the general rule # # $$\mathbf{c}_i = \sum_j\mathbf{A}_{i,j}\mathbf{b}_j$$ # # Define $\mathbf{A}$ and $\mathbf{b}$ as NumPy arrays, and multiply them together using the above rule. # In[ ]: # ## Exercise 5.9: Plot meteorological data # The file data/precipitation.csv contains monthly precipitation data for South-East England between 1916 and 2016 provided by the [Met Office](https://www.metoffice.gov.uk/hadobs/hadukp/data/download.html). Each row represents a year, and each column a month. # To open the data file, use the command below: # # ```python # import numpy as np # data = np.loadtxt(fname='data/precipitation.csv', delimiter=',') # ``` # # - Open the data file, and print its content. Observe that it is a multi-dimensional Numpy array. Print its shape. How many years are represented ? # # - Write a function that converts a length in mm into a length in inches. Use this function to convert the precipitation data into inches in a vectorized way. # # - With a `for` loop, print (in a formatted way) for each month, the year where the monthly precipitation was the highest. A line like:
# `The maximal precipitation for January is 181.9 mm/month and was reached in 2014.`
# is expected.
# Hint: you should use the [max](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amax.html) and [argmax](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.argmax.html) functions provided by Numpy. # # - Plot the [average](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.mean.html), [maximal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amax.html) and [minimal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amin.html) monthly precipitation with respect to the year. Don't forget to decorate the plot properly (add a plot title, legends and axis labels). # # - Plot the same quantities, but only for the last twenty years. # # - Plot the same quantities, but only for years ending in 6. # In[ ]: