#!/usr/bin/env python
# coding: utf-8

# # Introduction to programming for Geoscientists through Python
# ### [Gerard Gorman](http://www.imperial.ac.uk/people/g.gorman),  [Nicolas Barral](http://www.imperial.ac.uk/people/n.barral)
# 
# # Lecture 5: Array computing and curve plotting

# Learning objectives: 
# 
# * Learn how to compute using numpy arrays, *i.e.* vectorise code.
# * Learn how to generate 2D graphs.

# ## Vectors and arrays
# 
# You have known **vectors** since high school mathematics, *e.g.*, point $(x,y)$ in the plane, point $(x,y,z)$ in space. In general, we can describe a vector $v$ as an $n$-tuple of numbers: $v=(v_0,\ldots,v_{n-1})$. One way to store vectors in Python is by using *lists*: $v_i$ is stored as *v[i]*.

# **Arrays** are a generalization of vectors where we can have multiple indices: $A_{i,j}$, $A_{i,j,k}$. In Python code this is represented as a nested list (see previous lecture), accessed as *A[i][j]*, *A[i][j][k]*.
# 
# Example: table of numbers, one index for the row, one for the column
# $$
# \left\lbrack\begin{array}{cccc}
# 0 & 12 & -1 & 5q
# -1 & -1 & -1 & 0\cr
# 11 & 5 & 5 & -2
# \end{array}\right\rbrack
# \hspace{1cm}
# A =
# \left\lbrack\begin{array}{ccc}
# A_{0,0} & \cdots &  A_{0,n-1}\cr
# \vdots & \ddots &  \vdots\cr
# A_{m-1,0} & \cdots & A_{m-1,n-1}
# \end{array}\right\rbrack
# $$
# The number of indices in an array is the *rank* or *number of dimensions*. Using these terms, a vector can be described as a one-dimensional array, or rank 1 array.

# In practice, we use [Numerical Python (*NumPy*)](http://www.numpy.org/) arrays instead of lists to represent mathematical arrays because it is **much** faster for large arrays.
# 
# Let's consider an example where we store $(x,y)$ points along a curve in Python lists and numpy arrays:

# In[1]:


# Sample function
def f(x):
    return x**3

# Generate n points in [0,1]
n = 5
dx = 1.0/(n-1) # x spacing

xlist = [i*dx for i in range(n)] # Python lists
ylist = [f(x) for x in xlist]

# Turn these Python lists into Numerical Python (NumPy) arrays:
from numpy import *
x2 = array(xlist)
y2 = array(ylist)


# Instead of first making lists with $x$ and $y = f (x)$ data, and then turning lists into arrays, we can make NumPy arrays
# directly:

# In[2]:


n = 5                     # number of points
x2 = linspace(0, 1, n)    # generates n points between 0 and 1
y2 = zeros(n)             # n zeros (float data type by default)
for i in range(n):     
    y2[i] = f(x2[i])


# List comprehensions create lists, not arrays, but we can do:

# In[3]:


y2 = array([f(xi) for xi in x2]) # list -> array


# ### When and where to use NumPy arrays
# 
# * Python lists can hold any sequence of any Python objects, however, NumPy arrays can only hold objects of the same type.
# * Arrays are most efficient when the elements are of basic number types (*float*, *int*, *complex*).
# * In that case, arrays are stored efficiently in the computer's memory and we can compute very efficiently with the array elements.
# * Mathematical operations on whole arrays can be done without loops in Python. For example,

# In[4]:


x = linspace(0, 2, 10001)
y = zeros(10001)
for i in range(len(x)):
    y[i] = sin(x[i])


# can be coded as

# In[5]:


y = sin(x)


# In the latter case the loop over all elements is now performed in a very efficient C function.
# 
# Operations on whole arrays, instead of using Python *for*-loops, is called vectorization and is a very **convenient**, **efficient** and therefore important programming technique to master.

# Let's consider a simple vectorisation example: a loop to compute $x$ coordinates (*x2*) and $y=f(x)$ coordinates (*y2*) along a function curve:

# In[6]:


x2 = linspace(0, 1, n)
y2 = zeros(n)
for i in range(n):
    y2[i] = f(x2[i])


# This computation can be replaced by:

# In[7]:


x2 = linspace(0, 1, n)
y2 = f(x2)


# The advantage of this approach is:
# 
# * There is no need to allocate space for y2 (via the NumPy *zeros* function).
# * There is no need for a loop.
# * It is *much* faster.

# ## How vectorised functions work
# Consider the function

# In[8]:


def f(x):
    return x**3


# $f(x)$ is intended for a number $x$, *i.e.* a *scalar*. So what happens when we call *f(x2)*, where *x2* is an NumPy array? **The function simply evaluates $x^3$ for an array x**. NumPy supports arithmetic operations on arrays, which correspond to the equivalent operations on each element, *e.g.*:

# In[9]:


x**3                # x[i]**3 forr all i
cos(x)              # cos(x[i]) for all i
x**3 + x*cos(x)     # x[i]**3 + x[i]*cos(x[i]) for all i
x/3*exp(-x*0.5)     # x[i]/3*exp(-x[i]*0.5) for all i 


# In each of these cases a highly optimised C function is actually called to evaluate the expression. In this example, the *cos* function called for an *array* is imported from *numpy* rather than from the *math* module which only acts on scalars.
# 
# Notes:
# 
# * Functions that can operate on arrays are called **vectorized functions**.
# * Vectorization is the process of turning a non-vectorized expression/algorithm into a vectorized expression/algorithm.
# * Mathematical functions in Python automatically work for both scalar and array (vector) arguments, *i.e.* no vectorization is needed by the programmer.
# 

# ### Watch out for references Vs. copies of arrays!
# Consider this code:

# In[10]:


a=x
a[-1] = 42
print(x[-1])


# Notice what happened here - we changed a value in *a* but the corresponding value in *x* was also changed! This is because *a* refers to the same array as *x*. If you really want a seperate copy of *x* then we have to make an explicit copy:

# In[11]:


a = x.copy()


# ## <span style="color:blue">Exercise 5.1: Fill lists and arrays with function values</span>
# A function with many applications in science is defined as:</br></br>
# $h(x) = \frac{1}{\sqrt{2\pi}}\exp(-0.5x^2)$</br></br>
# 
# * Fill two lists *xlist* and *hlist* with *x* and *h(x)* values for uniformly spaced *x* coordinates in [−4, 4]. You may adapt the first example in the lecture 4 notes.
# 
# * Fill two arrays *x* and *y* with *x* and *h(x)* values, respectively, where *h(x)* is defined above. Let the *x* values be uniformly spaced in [−4, 4]. Use list comprehensions to create the *x* and *y* arrays.
# 
# * Vectorize the code by creating the *x* values using the *linspace* function and by evaluating *h(x)* for an array argument.

# In[ ]:


# ## <span style="color:blue">Exercise 5.2: Apply a function to a vector</span>
# Given a vector $v = (2, 3, −1)$ and a function $f(x) = x^3 + xe^x + 1$, apply $f$ to each element in $v$. Then calculate $f(v)$ as $v^3 + ve^v + 1$ using vector computing rules. Show that the two results are equal.

# In[ ]:


# ## <span style="color:blue">Exercise 5.3: Simulate by hand a vectorized expression</span>
# Suppose *x* and *t* are two arrays of the same length, and given the vectorized expression,
# 
# ```python
# y = cos(sin(x)) + exp(1/t)
# ```
# 
# assuming *x* holds two elements, 0 and 2, and *t* holds the elements 1 and 1.5, calculate by hand (using a calculator) the *y* array.
# 
# Write a program that mimics the series of computations you did by hand (use explicit loops, but at the end you can use NumPy functionality to check the results).

# In[ ]:


# ## Generalised array indexing
# We can select a slice of an array using *a[start:stop:inc]*, where the slice *start:stop:inc* implies a set of indices starting from *start*, up to *stop* in increments of *inc*. In fact, any integer list or array can be used to indicate a set of indices:

# In[12]:


a = linspace(1, 8, 8)
print(a)


# In[13]:


a[[1,6,7]] = 10 # i.e. set the elements with indicies 1,6, and 7 in the list to 10.
print(a)


# In[14]:


a[range(2,8,3)] = -2   # same as a[2:8:3] = -2
print(a)


# Even boolean expressions can also be used to select part of an array(!)

# In[15]:


print(a[a < 0]) # pick out all negative elements


# In[16]:


a[a < 0] = a.max() # if a[i]<0, set a[i]=10
print(a)


# ## <span style="color:blue">Exercise 5.4: Demonstrate array slicing</span>
# Create an array *w* with values 0, 0.1, 0.2, ..., 3. Write out *w[:]*, *w[:-2]*, *w[::5]*, *w[2:-2:6]*. Convince yourself in each case that you understand which elements of the array are printed.

# In[ ]:


# ## Plotting curves - the basics
# First of all, a little house keeping. There are quite a few ways of plotting graphs etc. in Python. Currently the best way is using [PyLab](http://wiki.scipy.org/PyLab). If you are someone who likes the details then please go to the [PyLab](http://wiki.scipy.org/PyLab) website and read. Secondly, because we are doing this within IPython NoteBook, and we do not want additional windows popping up all over the place, we execute this next line:

# In[17]:


get_ipython().run_line_magic('pylab', 'inline')


# Now, onwards and upwards...
# 
# A curve $y = f(x)$ stored in the 1D NumPy arrays *x* and *y* can easily be plotted:

# In[18]:


t = linspace(0, 3, 51)
y = t**2*exp(-t**2)
plot(t, y)
show()


# Plots also should have **labels** on the axis, a **title**, and sometimes a specific extent of the axis (perhaps you wish to easily compare two graphs side-by-side):

# In[19]:


def f(t):
    return t**2*exp(-t**2)

t = linspace(0, 3, 51) # Generates 51 points between 0 and 3
y = f(t)
plot(t, y)

xlabel('t')
ylabel('y')
legend(('t^2*exp(-t^2)',))
axis([0, 3, -0.05, 0.6]) # specify the extent of the axes [tmin, tmax, ymin, ymax]

title('My second PyLab graph')
show()


# ## <span style="color:blue">Exercise 5.5: Plot a formula</span>
# * Make a plot of the function $y(t) = v_0t − 0.5gt^2$ for $v_0 = 10$, $g = 9.81$, and $t \in [0, 2v_0/g]$. The label on the *x* axis should be 'time (s)' and the label on the *y* axis should be 'height (m)'.
# * Extend the program such that the minimum and maximum *x* and *y* values are computed, and use the extreme values to specify the extent of the *x* and *y* axes. Add some space above the heighest curve.

# In[ ]:


# ## <span style="color:blue">Exercise 5.6: Plot another formula</span>
# The function</br></br>
# $f(x, t) = \exp(-(x - 3t)^2)\sin(3\pi(x - t))$
# </br></br>
# describes, for a fixed value of *t*, a wave localized in space. Make a program that visualizes this function as a function of *x* on the interval [−4, 4] when *t* = 0.

# In[ ]:


# ## Multiple curves in one plot
# We can also plot several curves in one plot:

# In[20]:


def f1(t):
    return t**2*exp(-t**2)

def f2(t):
    return t**2*f1(t)

t = linspace(0, 3, 51)
y1 = f1(t)
y2 = f2(t)

# Matlab-style syntax:
plots = plot(t, y1, t, y2)
legend(plots, ('t^4*exp(-t^2)', 't^4*exp(-t^2)'), loc='best')
xlabel('t')
ylabel('y')
title('Plotting two curves in the same plot')
show()


# When plotting multiple curves in the same plot, PyLab usually does a good job in making sure that the different lines actually look different. However, sometimes you need to take action yourself (*e.g.* if you need to print your graph out in black&white). To do this we can add an extra argument to the plot command where we specify what we want - *e.g.* "r-" means a *red line*, while "bo" means *blue circles*:

# In[21]:


plot(t, y1, 'r-', t, y2, 'bo')
show()


# For further examples check out the [PyLab website](http://scipy.org/PyLab).

# ## <span style="color:blue">Exercise 5.7: Plot a formula for several parameters</span>
#   - Make a program in which you define a set of $v_0$ values and plot the corresponding curves $y(t) = v_0t − 0.5gt^2$ in the same figure (set $g = 9.81$). Let $t \in [0, 2v_0/g$] for each curve, which implies that you need a different vector of $t$ coordinates for each curve.
#   - Make a similar program where the values of $v_0$ are given by the user through an iPython widget as a series of numbers separated by a comma.

# In[ ]:


# ## 2D arrays
# When we have a table of numbers,
# 
# $$
# \left\lbrack\begin{array}{cccc}
# 0 & 12 & -1 & 5\cr
# -1 & -1 & -1 & 0\cr
# 11 & 5 & 5 & -2
# \end{array}\right\rbrack
# $$
# 
# (*i.e.* a *matrix*) it is natural to use a two-dimensional array $A_{i,j}$ with one index for the rows and one for the columns:
# 
# $$
# A = 
# \left\lbrack\begin{array}{ccc}
# A_{0,0} & \cdots &  A_{0,n-1}\cr
# \vdots & \ddots &  \vdots\cr
# A_{m-1,0} & \cdots & A_{m-1,n-1}
# \end{array}\right\rbrack
# $$
# 
# Let's recreate this array using NumPy:

# In[22]:


A = zeros((3,4))
A[0,0] = 0
A[1,0] = -1
A[2,0] = 11

A[0,1] = 12
A[1,1] = -1
A[2,1] = 5

A[0,2] = -1
A[1,2] = -1
A[2,2] = 5

# we can also use the same syntax that we used for nested lists

A[0][3] = 5
A[1][3] = 0
A[2][3] = -2

print(A)


# Next let's convert a nested list from a previous example into a 2D array:

# In[23]:


Cdegrees = range(0, 101, 10)
Fdegrees = [9./5*C + 32 for C in Cdegrees]
table = [[C, F] for C, F in zip(Cdegrees, Fdegrees)]
print(table)


# In[24]:


# Convert this into a NumPy array:
table2 = array(table)
print(table2)


# To see the number of elements in each dimension:

# In[25]:


print(table2.shape)


# *i.e.* 11 rows and 2 columns.

# Let's write a loop over all array elements of A:

# In[26]:


for i in range(table2.shape[0]):
    for j in range(table2.shape[1]):
        print('table2[%d,%d] = %g' % (i, j, table2[i,j]))


# Alternatively:

# In[27]:


for index_tuple, value in ndenumerate(table2):
    print('index %s has value %g' % (index_tuple, table2[index_tuple]))


# We can also extract slices from multi-dimensional arrays as before. For example, extract the second column:

# In[28]:


print(table2[:, 1]) # 2nd column (index 1)


# Play with this more complicated example:

# In[29]:


t = linspace(1, 30, 30).reshape(5, 6)
print(t)


# In[30]:


print(t[1:-1:2, 2:])


# ## <span style="color:blue">Exercise 5.8: Implement matrix-vector multiplication</span>
# A matrix $\mathbf{A}$ and a vector $\mathbf{b}$, represented in Python as a 2D array and a 1D array respectively, are given by:
# 
# $$
# \mathbf{A} = \left\lbrack\begin{array}{ccc}
# 0 & 12 & -1\cr
# -1 & -1 & -1\cr
# 11 & 5 & 5
# \end{array}\right\rbrack
# $$
# 
# $$
# \mathbf{b} = \left\lbrack\begin{array}{c}
# -2\cr
# 1\cr
# 7
# \end{array}\right\rbrack
# $$
# 
# Multiplying a matrix by a vector results in another vector $\mathbf{c}$, whose components are defined by the general rule
# 
# $$\mathbf{c}_i = \sum_j\mathbf{A}_{i,j}\mathbf{b}_j$$
# 
# Define $\mathbf{A}$ and $\mathbf{b}$ as NumPy arrays, and multiply them together using the above rule.

# In[ ]:


# ## <span style="color:blue">Exercise 5.9: Plot meteorological data</span>
# The file data/precipitation.csv contains monthly precipitation data for South-East England between 1916 and 2016 provided by the [Met Office](https://www.metoffice.gov.uk/hadobs/hadukp/data/download.html). Each row represents a year, and each column a month.
# To open the data file, use the command below:
# 
# ```python
# import numpy as np
# data = np.loadtxt(fname='data/precipitation.csv', delimiter=',')
# ```
# 
#  - Open the data file, and print its content. Observe that it is a multi-dimensional Numpy array. Print its shape. How many years are represented ?
#  
#   - Write a function that converts a length in mm into a length in inches. Use this function to convert the precipitation data into inches in a vectorized way.
#  
#  - With a `for` loop, print (in a formatted way) for each month, the year where the monthly precipitation was the highest. A line like: <br />
# `The maximal precipitation for January is 181.9 mm/month and was reached in 2014.` <br />
# is expected. <br />
# Hint: you should use the [max](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amax.html) and [argmax](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.argmax.html) functions provided by Numpy.
# 
#  - Plot the [average](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.mean.html), [maximal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amax.html) and [minimal](https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.amin.html) monthly precipitation with respect to the year. Don't forget to decorate the plot properly (add a plot title, legends and axis labels).
#  
#  - Plot the same quantities, but only for the last twenty years.
#  
#  - Plot the same quantities, but only for years ending in 6.

# In[ ]: