#!/usr/bin/env python
# coding: utf-8
# # Introduction to programming for Geoscientists through Python
# ### [Gerard Gorman](http://www.imperial.ac.uk/people/g.gorman), [Nicolas Barral](http://www.imperial.ac.uk/people/n.barral)
#
# # Lecture 5: Array computing and curve plotting
# Learning objectives:
#
# * Learn how to compute using numpy arrays, *i.e.* vectorise code.
# * Learn how to generate 2D graphs.
# ## Vectors and arrays
#
# You have known **vectors** since high school mathematics, *e.g.*, point $(x,y)$ in the plane, point $(x,y,z)$ in space. In general, we can describe a vector $v$ as an $n$-tuple of numbers: $v=(v_0,\ldots,v_{n-1})$. One way to store vectors in Python is by using *lists*: $v_i$ is stored as *v[i]*.
# **Arrays** are a generalization of vectors where we can have multiple indices: $A_{i,j}$, $A_{i,j,k}$. In Python code this is represented as a nested list (see previous lecture), accessed as *A[i][j]*, *A[i][j][k]*.
#
# Example: table of numbers, one index for the row, one for the column
# $$
# \left\lbrack\begin{array}{cccc}
# 0 & 12 & -1 & 5q
# -1 & -1 & -1 & 0\cr
# 11 & 5 & 5 & -2
# \end{array}\right\rbrack
# \hspace{1cm}
# A =
# \left\lbrack\begin{array}{ccc}
# A_{0,0} & \cdots & A_{0,n-1}\cr
# \vdots & \ddots & \vdots\cr
# A_{m-1,0} & \cdots & A_{m-1,n-1}
# \end{array}\right\rbrack
# $$
# The number of indices in an array is the *rank* or *number of dimensions*. Using these terms, a vector can be described as a one-dimensional array, or rank 1 array.
# In practice, we use [Numerical Python (*NumPy*)](http://www.numpy.org/) arrays instead of lists to represent mathematical arrays because it is **much** faster for large arrays.
#
# Let's consider an example where we store $(x,y)$ points along a curve in Python lists and numpy arrays:
# In[1]:
# Sample function
def f(x):
return x**3
# Generate n points in [0,1]
n = 5
dx = 1.0/(n-1) # x spacing
xlist = [i*dx for i in range(n)] # Python lists
ylist = [f(x) for x in xlist]
# Turn these Python lists into Numerical Python (NumPy) arrays:
from numpy import *
x2 = array(xlist)
y2 = array(ylist)
# Instead of first making lists with $x$ and $y = f (x)$ data, and then turning lists into arrays, we can make NumPy arrays
# directly:
# In[2]:
n = 5 # number of points
x2 = linspace(0, 1, n) # generates n points between 0 and 1
y2 = zeros(n) # n zeros (float data type by default)
for i in range(n):
y2[i] = f(x2[i])
# List comprehensions create lists, not arrays, but we can do:
# In[3]:
y2 = array([f(xi) for xi in x2]) # list -> array
# ### When and where to use NumPy arrays
#
# * Python lists can hold any sequence of any Python objects, however, NumPy arrays can only hold objects of the same type.
# * Arrays are most efficient when the elements are of basic number types (*float*, *int*, *complex*).
# * In that case, arrays are stored efficiently in the computer's memory and we can compute very efficiently with the array elements.
# * Mathematical operations on whole arrays can be done without loops in Python. For example,
# In[4]:
x = linspace(0, 2, 10001)
y = zeros(10001)
for i in range(len(x)):
y[i] = sin(x[i])
# can be coded as
# In[5]:
y = sin(x)
# In the latter case the loop over all elements is now performed in a very efficient C function.
#
# Operations on whole arrays, instead of using Python *for*-loops, is called vectorization and is a very **convenient**, **efficient** and therefore important programming technique to master.
# Let's consider a simple vectorisation example: a loop to compute $x$ coordinates (*x2*) and $y=f(x)$ coordinates (*y2*) along a function curve:
# In[6]:
x2 = linspace(0, 1, n)
y2 = zeros(n)
for i in range(n):
y2[i] = f(x2[i])
# This computation can be replaced by:
# In[7]:
x2 = linspace(0, 1, n)
y2 = f(x2)
# The advantage of this approach is:
#
# * There is no need to allocate space for y2 (via the NumPy *zeros* function).
# * There is no need for a loop.
# * It is *much* faster.
# ## How vectorised functions work
# Consider the function
# In[8]:
def f(x):
return x**3
# $f(x)$ is intended for a number $x$, *i.e.* a *scalar*. So what happens when we call *f(x2)*, where *x2* is an NumPy array? **The function simply evaluates $x^3$ for an array x**. NumPy supports arithmetic operations on arrays, which correspond to the equivalent operations on each element, *e.g.*:
# In[9]:
x**3 # x[i]**3 forr all i
cos(x) # cos(x[i]) for all i
x**3 + x*cos(x) # x[i]**3 + x[i]*cos(x[i]) for all i
x/3*exp(-x*0.5) # x[i]/3*exp(-x[i]*0.5) for all i
# In each of these cases a highly optimised C function is actually called to evaluate the expression. In this example, the *cos* function called for an *array* is imported from *numpy* rather than from the *math* module which only acts on scalars.
#
# Notes:
#
# * Functions that can operate on arrays are called **vectorized functions**.
# * Vectorization is the process of turning a non-vectorized expression/algorithm into a vectorized expression/algorithm.
# * Mathematical functions in Python automatically work for both scalar and array (vector) arguments, *i.e.* no vectorization is needed by the programmer.
#
# ### Watch out for references Vs. copies of arrays!
# Consider this code:
# In[10]:
a=x
a[-1] = 42
print(x[-1])
# Notice what happened here - we changed a value in *a* but the corresponding value in *x* was also changed! This is because *a* refers to the same array as *x*. If you really want a seperate copy of *x* then we have to make an explicit copy:
# In[11]:
a = x.copy()
# ## Exercise 5.1: Fill lists and arrays with function values
# A function with many applications in science is defined as:
# $h(x) = \frac{1}{\sqrt{2\pi}}\exp(-0.5x^2)$
#
# * Fill two lists *xlist* and *hlist* with *x* and *h(x)* values for uniformly spaced *x* coordinates in [−4, 4]. You may adapt the first example in the lecture 4 notes.
#
# * Fill two arrays *x* and *y* with *x* and *h(x)* values, respectively, where *h(x)* is defined above. Let the *x* values be uniformly spaced in [−4, 4]. Use list comprehensions to create the *x* and *y* arrays.
#
# * Vectorize the code by creating the *x* values using the *linspace* function and by evaluating *h(x)* for an array argument.
# In[ ]:
# ## Exercise 5.2: Apply a function to a vector
# Given a vector $v = (2, 3, −1)$ and a function $f(x) = x^3 + xe^x + 1$, apply $f$ to each element in $v$. Then calculate $f(v)$ as $v^3 + ve^v + 1$ using vector computing rules. Show that the two results are equal.
# In[ ]:
# ## Exercise 5.3: Simulate by hand a vectorized expression
# Suppose *x* and *t* are two arrays of the same length, and given the vectorized expression,
#
# ```python
# y = cos(sin(x)) + exp(1/t)
# ```
#
# assuming *x* holds two elements, 0 and 2, and *t* holds the elements 1 and 1.5, calculate by hand (using a calculator) the *y* array.
#
# Write a program that mimics the series of computations you did by hand (use explicit loops, but at the end you can use NumPy functionality to check the results).
# In[ ]:
# ## Generalised array indexing
# We can select a slice of an array using *a[start:stop:inc]*, where the slice *start:stop:inc* implies a set of indices starting from *start*, up to *stop* in increments of *inc*. In fact, any integer list or array can be used to indicate a set of indices:
# In[12]:
a = linspace(1, 8, 8)
print(a)
# In[13]:
a[[1,6,7]] = 10 # i.e. set the elements with indicies 1,6, and 7 in the list to 10.
print(a)
# In[14]:
a[range(2,8,3)] = -2 # same as a[2:8:3] = -2
print(a)
# Even boolean expressions can also be used to select part of an array(!)
# In[15]:
print(a[a < 0]) # pick out all negative elements
# In[16]:
a[a < 0] = a.max() # if a[i]<0, set a[i]=10
print(a)
# ## Exercise 5.4: Demonstrate array slicing
# Create an array *w* with values 0, 0.1, 0.2, ..., 3. Write out *w[:]*, *w[:-2]*, *w[::5]*, *w[2:-2:6]*. Convince yourself in each case that you understand which elements of the array are printed.
# In[ ]:
# ## Plotting curves - the basics
# First of all, a little house keeping. There are quite a few ways of plotting graphs etc. in Python. Currently the best way is using [PyLab](http://wiki.scipy.org/PyLab). If you are someone who likes the details then please go to the [PyLab](http://wiki.scipy.org/PyLab) website and read. Secondly, because we are doing this within IPython NoteBook, and we do not want additional windows popping up all over the place, we execute this next line:
# In[17]:
get_ipython().run_line_magic('pylab', 'inline')
# Now, onwards and upwards...
#
# A curve $y = f(x)$ stored in the 1D NumPy arrays *x* and *y* can easily be plotted:
# In[18]:
t = linspace(0, 3, 51)
y = t**2*exp(-t**2)
plot(t, y)
show()
# Plots also should have **labels** on the axis, a **title**, and sometimes a specific extent of the axis (perhaps you wish to easily compare two graphs side-by-side):
# In[19]:
def f(t):
return t**2*exp(-t**2)
t = linspace(0, 3, 51) # Generates 51 points between 0 and 3
y = f(t)
plot(t, y)
xlabel('t')
ylabel('y')
legend(('t^2*exp(-t^2)',))
axis([0, 3, -0.05, 0.6]) # specify the extent of the axes [tmin, tmax, ymin, ymax]
title('My second PyLab graph')
show()
# ## Exercise 5.5: Plot a formula
# * Make a plot of the function $y(t) = v_0t − 0.5gt^2$ for $v_0 = 10$, $g = 9.81$, and $t \in [0, 2v_0/g]$. The label on the *x* axis should be 'time (s)' and the label on the *y* axis should be 'height (m)'.
# * Extend the program such that the minimum and maximum *x* and *y* values are computed, and use the extreme values to specify the extent of the *x* and *y* axes. Add some space above the heighest curve.
# In[ ]:
# ## Exercise 5.6: Plot another formula
# The function
# $f(x, t) = \exp(-(x - 3t)^2)\sin(3\pi(x - t))$
#
# describes, for a fixed value of *t*, a wave localized in space. Make a program that visualizes this function as a function of *x* on the interval [−4, 4] when *t* = 0.
# In[ ]:
# ## Multiple curves in one plot
# We can also plot several curves in one plot:
# In[20]:
def f1(t):
return t**2*exp(-t**2)
def f2(t):
return t**2*f1(t)
t = linspace(0, 3, 51)
y1 = f1(t)
y2 = f2(t)
# Matlab-style syntax:
plots = plot(t, y1, t, y2)
legend(plots, ('t^4*exp(-t^2)', 't^4*exp(-t^2)'), loc='best')
xlabel('t')
ylabel('y')
title('Plotting two curves in the same plot')
show()
# When plotting multiple curves in the same plot, PyLab usually does a good job in making sure that the different lines actually look different. However, sometimes you need to take action yourself (*e.g.* if you need to print your graph out in black&white). To do this we can add an extra argument to the plot command where we specify what we want - *e.g.* "r-" means a *red line*, while "bo" means *blue circles*:
# In[21]:
plot(t, y1, 'r-', t, y2, 'bo')
show()
# For further examples check out the [PyLab website](http://scipy.org/PyLab).
# ## Exercise 5.7: Plot a formula for several parameters
# - Make a program in which you define a set of $v_0$ values and plot the corresponding curves $y(t) = v_0t − 0.5gt^2$ in the same figure (set $g = 9.81$). Let $t \in [0, 2v_0/g$] for each curve, which implies that you need a different vector of $t$ coordinates for each curve.
# - Make a similar program where the values of $v_0$ are given by the user through an iPython widget as a series of numbers separated by a comma.
# In[ ]:
# ## 2D arrays
# When we have a table of numbers,
#
# $$
# \left\lbrack\begin{array}{cccc}
# 0 & 12 & -1 & 5\cr
# -1 & -1 & -1 & 0\cr
# 11 & 5 & 5 & -2
# \end{array}\right\rbrack
# $$
#
# (*i.e.* a *matrix*) it is natural to use a two-dimensional array $A_{i,j}$ with one index for the rows and one for the columns:
#
# $$
# A =
# \left\lbrack\begin{array}{ccc}
# A_{0,0} & \cdots & A_{0,n-1}\cr
# \vdots & \ddots & \vdots\cr
# A_{m-1,0} & \cdots & A_{m-1,n-1}
# \end{array}\right\rbrack
# $$
#
# Let's recreate this array using NumPy:
# In[22]:
A = zeros((3,4))
A[0,0] = 0
A[1,0] = -1
A[2,0] = 11
A[0,1] = 12
A[1,1] = -1
A[2,1] = 5
A[0,2] = -1
A[1,2] = -1
A[2,2] = 5
# we can also use the same syntax that we used for nested lists
A[0][3] = 5
A[1][3] = 0
A[2][3] = -2
print(A)
# Next let's convert a nested list from a previous example into a 2D array:
# In[23]:
Cdegrees = range(0, 101, 10)
Fdegrees = [9./5*C + 32 for C in Cdegrees]
table = [[C, F] for C, F in zip(Cdegrees, Fdegrees)]
print(table)
# In[24]:
# Convert this into a NumPy array:
table2 = array(table)
print(table2)
# To see the number of elements in each dimension:
# In[25]:
print(table2.shape)
# *i.e.* 11 rows and 2 columns.
# Let's write a loop over all array elements of A:
# In[26]:
for i in range(table2.shape[0]):
for j in range(table2.shape[1]):
print('table2[%d,%d] = %g' % (i, j, table2[i,j]))
# Alternatively:
# In[27]:
for index_tuple, value in ndenumerate(table2):
print('index %s has value %g' % (index_tuple, table2[index_tuple]))
# We can also extract slices from multi-dimensional arrays as before. For example, extract the second column:
# In[28]:
print(table2[:, 1]) # 2nd column (index 1)
# Play with this more complicated example:
# In[29]:
t = linspace(1, 30, 30).reshape(5, 6)
print(t)
# In[30]:
print(t[1:-1:2, 2:])
# ## Exercise 5.8: Implement matrix-vector multiplication
# A matrix $\mathbf{A}$ and a vector $\mathbf{b}$, represented in Python as a 2D array and a 1D array respectively, are given by:
#
# $$
# \mathbf{A} = \left\lbrack\begin{array}{ccc}
# 0 & 12 & -1\cr
# -1 & -1 & -1\cr
# 11 & 5 & 5
# \end{array}\right\rbrack
# $$
#
# $$
# \mathbf{b} = \left\lbrack\begin{array}{c}
# -2\cr
# 1\cr
# 7
# \end{array}\right\rbrack
# $$
#
# Multiplying a matrix by a vector results in another vector $\mathbf{c}$, whose components are defined by the general rule
#
# $$\mathbf{c}_i = \sum_j\mathbf{A}_{i,j}\mathbf{b}_j$$
#
# Define $\mathbf{A}$ and $\mathbf{b}$ as NumPy arrays, and multiply them together using the above rule.
# In[ ]:
# ## Exercise 5.9: Plot meteorological data
# The file data/precipitation.csv contains monthly precipitation data for South-East England between 1916 and 2016 provided by the [Met Office](https://www.metoffice.gov.uk/hadobs/hadukp/data/download.html). Each row represents a year, and each column a month.
# To open the data file, use the command below:
#
# ```python
# import numpy as np
# data = np.loadtxt(fname='data/precipitation.csv', delimiter=',')
# ```
#
# - Open the data file, and print its content. Observe that it is a multi-dimensional Numpy array. Print its shape. How many years are represented ?
#
# - Write a function that converts a length in mm into a length in inches. Use this function to convert the precipitation data into inches in a vectorized way.
#
# - With a `for` loop, print (in a formatted way) for each month, the year where the monthly precipitation was the highest. A line like:
# `The maximal precipitation for January is 181.9 mm/month and was reached in 2014.`
# is expected.
# Hint: you should use the [max](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amax.html) and [argmax](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.argmax.html) functions provided by Numpy.
#
# - Plot the [average](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.mean.html), [maximal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amax.html) and [minimal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amin.html) monthly precipitation with respect to the year. Don't forget to decorate the plot properly (add a plot title, legends and axis labels).
#
# - Plot the same quantities, but only for the last twenty years.
#
# - Plot the same quantities, but only for years ending in 6.
# In[ ]: