In today's session, we will be using some of the LAI datasets we examined last week (masked by national boundaries) and doing some analysis on them.
5.1 Making 3D datasets and Movies
First, we will examine how to improve our data reading function by extracting only the area we are interested in. This involves querying the 'country' mask to find its limits and passing this information through to the reader.
5.2 Interpolation
Then we will look at various methods to interpolate and smooth over gaps in datasets.
5.3 Function Fitting
Finally, we will look at fitting models to datasets, in this case a model describing LAI phenology.
First though, we will briefly go over once more the work we did on downloading data (using wget), generating 3D masked datasets, and making movies.
This time, we will concentrate more on generating functions that we can re-use for other purposes.
We can download data from the MODIS server. The URL for this is http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/, and you can see that inside that URL, we have a folder per time step. Inside that folder, all the files for each of the land tiles are present. Currently, you need a username and password to access these folders, which you can register for here. You should then be able to click on the links and see and download individual files.
Let's try to write some Python code to download individual files. The code will try to do the following: fetch the HTML directory listing for a given date using the requests package, parse the listing to find the links it contains, and pick out the .hdf file that matches the tile we want.
Let's see how this works...
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# The following two modules allow you to...
import bs4 # Parse HTML data
import requests # Get data on the internet using an URL
the_url = 'http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/'
tile = "h31v11"
dates = "2013.02.18" # Eg.
r = requests.get(the_url + dates)
if not r.ok:
    print "Problem!"
html_content = r.content
soup = bs4.BeautifulSoup(html_content, "lxml")
for link in soup.findAll("a"):
    fname = link.get("href")
    if fname.find(tile) >= 0 and fname.split(".")[-1] == "hdf":
        print fname
MCD15A2H.A2013049.h31v11.006.2015256194138.hdf
The previous code is OK for a single date, but we can make it even more useful if we stick it in a function to download data for a particular date.
We might want to check that the date is a valid date for the particular dataset that we're targeting...
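Before hitting the network, we might also sanity-check the date string itself. A minimal sketch, using the standard datetime module (the helper name valid_date is ours, purely for illustration):
import datetime
def valid_date(date, fmt="%Y.%m.%d"):
    """Return True if date parses as year.month.day, e.g. "2013.02.18"."""
    try:
        datetime.datetime.strptime(date, fmt)
        return True
    except ValueError:
        return False
This only checks the format; whether the server actually holds data for that date is handled by the function below, which returns None when the URL doesn't resolve.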
def get_modis_fname ( modis_product_url='http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/',
tile="h31v11", date="2013.02.18"):
"""Get the URL for a MODIS Coll5 HDF file for a given tile and date.
Parameters
-----------
modis_product_url: str
A valid URL that points to the MODIS product
tile: str
A tile in MODIS parlance (e.g. "h17v03")
date: str
A date, in year.month.date
Returns
--------
Either a full URL to download, or None if the e.g.
the date isn't available.
"""
import bs4
import requests
r = requests.get ( modis_product_url + "/" + date )
if not r.ok:
# A problem was found. Warn the user and return None
print "There was a problem obtaining the URL %s" % ( the_url + date )
return None
html_content = r.content
soup = bs4.BeautifulSoup(html_content)
for link in soup.findAll("a"):
fname = link.get("href")
if fname.find (tile) >= 0 and fname.split(".")[-1] == "hdf":
return modis_product_url + "/" + date + "/" + fname
# We assume we only have a single file fitting the
# previous criteria
print get_modis_fname ( modis_product_url='http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/',
tile="h31v11", date="2013.02.18")
print get_modis_fname ( modis_product_url='http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/',
tile="h31v11", date="2013.02.19")
http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006//2013.02.18/MCD15A2H.A2013049.h31v11.006.2015256194138.hdf There was a problem obtaining the URL http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/2013.02.19 None
In order to retrieve the file, we need a username and password (see above), and we need to authenticate ourselves in order to download the data. The following code (taken from get_modis) deals with the authentication, but also does some other useful stuff: it streams the download in 64 kbyte chunks (useful for large files), writes the output file inside a context manager, and reports some timing statistics.
Perhaps the context manager is the weirdest concept here, but it's just a safe and convenient way of opening and closing a file. The following code
fp = open ("myfile.txt", 'w')
fp.write ( x )
fp.close()
can be written as
with open("myfile.txt", 'w') as fp:
fp.write ( x )
The second option is preferred if you're doing more complicated stuff than just writing, since if something fails inside the context manager, fp will still always be closed.
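Under the hood, the with statement is roughly equivalent to a try/finally block. A minimal sketch of the guarantee it provides:
fp = open("myfile.txt", 'w')
try:
    fp.write(x)
finally:
    # this runs even if fp.write() raises an exception
    fp.close()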
Note that we also use a context manager to authenticate ourselves against the MODIS server. The way the authentication works is through a redirection: you ask for the URL, and the server tells you to go somewhere else for authentication and then redirects you back to your HDF file.
def get_modis_product ( url, username, password, out_dir="./"):
""""
Download a MODIS product from a given URL. This method requires an
username and password, as well as an output directory (optional) to
save the file. The method will authenticate against the server, get
the data, and report some timing statistics.
Parameters
------------
url: str
The URL. Must exist, otherwise, we raise an IOError exception
username: str
The useranme. I bet that was surprising.
password: str
The day of the week. Naaaah, the password!!!
out_dir: str
The output directory where the file will be downloaded.
Returns
--------
Nothing, it reports some speed statistics on the command line.
"""
import requests
import os
import time
# The following is an HTTP session, denoted by `s`
with requests.Session() as s:
# Set up the authentication
s.auth = (username, password)
# Get the named URL from the user. This will
# result in an HTTP redirection for which we
# need to harvest the url
r1 = s.request('get', url)
# The redirection is stored in the .url bit
# We also open as a stream for downloading larger
# files using an iterator.
r = s.get(r1.url, stream=True)
# Check whether it all worked
if not r.ok:
raise IOError("Can't start download... [%s]" % url)
# Get the file size from the headers. File size is in bytes
file_size = int(r.headers['content-length'])
# Get the filename from the url
fname = url.split("/")[-1]
print "Starting download on %s(%d bytes) ..." % (
os.path.join(out_dir, fname), file_size)
# The next line stores the current time in seconds. It's our
# start time
tic = int(round(time.time() ))
# Open the output file to write
with open(os.path.join(out_dir, fname), 'wb') as fp:
# This iterates over the remote file contents in
# 64kbytes chunks. If the chunk is valid, we just
# write to the output file
for chunk in r.iter_content(chunk_size=65535):
if chunk:
fp.write(chunk)
# Time at the end
toc = int(round(time.time() ))
print "Done in %d s. Speed = %g bytes/second" % (
(toc-tic), float(file_size)/ (toc-tic))
the_url = get_modis_fname ()
username = "ProfLewis"
password = "GeogG1222016"
get_modis_product ( the_url, username, password)
Starting download on ./MCD15A2H.A2013049.h31v11.006.2015256194138.hdf(9557411 bytes) ... Done in 5 s. Speed = 1.91148e+06 bytes/second
Last time, we generated a function to read MODIS LAI data.
We have now included such a function in the directory files/python, called get_lai.py.
The only added sophistication is that when we call ReadAsArray, we give it the starting cols and rows, and the number of cols and rows to read (e.g. xsize=600, yoff=300, xoff=300, ysize=600):
import glob
filelist = glob.glob("data/MCD15A2.A2005*hdf")
# Now we have a list of filenames
# load get_lai
import sys
sys.path.insert(0,'python')
from lai_library import get_lai
help(get_lai)
Help on function get_lai in module lai_library: get_lai(filename, qc_layer='FparLai_QC', scale=[0.1, 0.1], mincol=0, minrow=0, ncol=None, nrow=None, selected_layers=['Lai_1km', 'LaiStdDev_1km'])
# e.g. for reading a single file:
lai_file0 = get_lai(filelist[20],ncol=600,mincol=300,minrow=400,nrow=800)
plt.imshow(lai_file0['Lai_1km'], vmin=0, vmax=5)
<matplotlib.image.AxesImage at 0x7f00e6602510>
print type(lai_file0)
print lai_file0.keys()
<type 'dict'> ['Lai_1km', 'LaiStdDev_1km']
The function returns a dictionary which has one key per selected layer, here ['Lai_1km', 'LaiStdDev_1km']:
print lai_file0['Lai_1km'].shape
(800, 600)
Each of these datasets is of shape (1200, 1200), but we have read only 600 columns and 800 rows in this case. Note that the numpy indexing is (rows, cols).
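A quick illustration of the (rows, cols) convention on a toy array (not the MODIS data):
import numpy as np
arr = np.zeros((1200, 1200))
# rows 400:1200, columns 300:900, i.e. arr[minrow:minrow+nrow, mincol:mincol+ncol]
cutout = arr[400:400+800, 300:300+600]
print cutout.shape  # (800, 600), i.e. (nrow, ncol)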
We know how to create a mask from a vector dataset from the last session. First, we'll copy and unzip the global country shapefile:
%%bash
cd data
unzip -x -o ../../Chapter4_GDAL/data/ne_50m*zip
cd ..
Archive: ../../Chapter4_GDAL/data/ne_50m_admin_0_countries.zip inflating: ne_50m_admin_0_countries.README.html extracting: ne_50m_admin_0_countries.VERSION.txt extracting: ne_50m_admin_0_countries.cpg inflating: ne_50m_admin_0_countries.dbf inflating: ne_50m_admin_0_countries.prj inflating: ne_50m_admin_0_countries.shp inflating: ne_50m_admin_0_countries.shx
# have to make sure we have access to the gdal data files
# make a raster mask for IRELAND
# from ne_50m_admin_0_countries.shp
from lai_library import rasterise_vector
filename = filelist[0]
file_template = 'HDF4_EOS:EOS_GRID:"%s":MOD_Grid_MOD15A2:%s'
file_spec = file_template%(filename,'Lai_1km')
mask = rasterise_vector ( file_spec, "data/ne_50m_admin_0_countries.shp", "NAME = 'IRELAND'", verbose=True)
mask = mask.astype(np.bool)
plt.imshow(mask)
plt.colorbar()
>>> Opened file HDF4_EOS:EOS_GRID:"data/MCD15A2.A2005049.h17v03.005.2007360165724.hdf":MOD_Grid_MOD15A2:Lai_1km >>> Projection: PROJCS["unnamed",GEOGCS["Unknown datum based upon the custom spheroid",DATUM["Not specified (based on custom spheroid)",SPHEROID["Custom spheroid",6371007.181,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["longitude_of_center",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["Meter",1]] >>> File size 1200 rows, 1200 columns >>> UL corner: -1.11195e+06, 6.6717e+06 Done!
<matplotlib.colorbar.Colorbar at 0x7f00f4ff0450>
In this case, the data we want is only a small section of the whole spatial dataset.
It would be convenient to extract only the part we want.
We can use numpy.where()
to help with this:
# The mask is True for the area we want
rowpix,colpix = np.where(mask == True)
print rowpix,colpix
[ 556 556 556 ..., 1022 1022 1022] [699 700 701 ..., 471 472 473]
rowpix and colpix are arrays of pixel coordinates where the condition we specified is True (i.e. where mask is True).
If we wanted to find the bounds of this area, we simply need to know the minimum and maximum column and row in these lists:
mincol,maxcol = min(colpix),max(colpix)
minrow,maxrow = min(rowpix),max(rowpix)
# think about why the + 1 here!!!
# what if maxcol and mincol were the same?
ncol = maxcol - mincol + 1
nrow = maxrow - minrow + 1
print minrow,mincol,nrow,ncol
556 431 467 335
We could use this information to extract only the area we want when we read the data:
lai_file0 = get_lai(filelist[20],\
ncol=ncol,nrow=nrow,mincol=mincol,minrow=minrow)
plt.imshow(lai_file0['Lai_1km'],interpolation='none')
<matplotlib.image.AxesImage at 0x7f00e636ef10>
Now, let's extract this portion of the mask:
small_mask = mask[minrow:minrow+nrow,mincol:mincol+ncol]
plt.imshow(small_mask,interpolation='none')
<matplotlib.image.AxesImage at 0x7f00e62adcd0>
And combine the country mask with the small dataset:
As a recap, we can use the function rasterise_vector that we gave you last time to develop a raster mask from an ESRI shapefile (ne_50m_admin_0_countries.shp here).
We can then combine this mask with the QC-derived mask in the LAI dataset. The LAI mask (flagged with np.isnan on the NaN values in the code below) is True for bad data and False for good data, whereas the country mask is True within the selected country. To combine them, we want some operator X for which:
LAI mask        Country mask          Result
False (OK)      False (not Ireland)   False (invalid pixel)
False (OK)      True (Ireland)        True (valid pixel)
True (bad LAI)  False (not Ireland)   False (invalid pixel)
True (bad LAI)  True (Ireland)        False (invalid pixel)
We can obtain this if we NOT the LAI mask and AND it with the country mask:
np.logical_and(small_mask, np.logical_not(lai_mask))
If we want to use this result as the mask of a masked array, remember that there the mask is True if the pixel is masked and False if it is unmasked, so we flip it over using e.g. np.logical_not when passing it to the masked array.
However, it's easy to just carry on using NaN rather than a masked array.
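We can check the truth table above with toy boolean arrays, one element per row of the table:
import numpy as np
lai_bad = np.array([False, False, True, True])   # LAI mask: True = bad pixel
country = np.array([False, True, False, True])   # country mask: True = Ireland
print np.logical_and(country, np.logical_not(lai_bad))
# [False  True False False], matching the Result column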
import numpy.ma as ma
lai_file0 = get_lai(filelist[20],
ncol=ncol,nrow=nrow,mincol=mincol,minrow=minrow)
layer = 'Lai_1km'
lai = lai_file0[layer]
small_mask = mask[minrow:minrow+nrow,mincol:mincol+ncol]
# combined mask
lai_mask = np.isnan(lai)
new_mask = np.logical_and(small_mask, np.logical_not(lai_mask))
plt.figure(figsize=(7,7))
plt.imshow(new_mask,interpolation='none')
plt.title("Mask of good observations over Ireland")
lai = ma.array(lai,mask=np.logical_not(new_mask))
plt.figure(figsize=(7,7))
plt.imshow(lai,vmin=0, vmax=5, interpolation='none')
plt.title("Masked array LAI over Ireland")
plt.figure(figsize=(7,7))
plt.title("Non masked array with NaN")
plt.imshow(lai*new_mask, vmin=0, vmax=5, interpolation="nearest")
<matplotlib.image.AxesImage at 0x7f00e598ced0>
We should be used to writing loops around such functions.
In this case, we read all of the files in filelist and put the data into the dictionary called lai here.
Because there are multiple layers in the datasets, we loop over layer and append to each list individually. But first, let's build the mask:
country = 'UNITED KINGDOM'
# make a raster mask for UNITED KINGDOM
# from ne_50m_admin_0_countries.shp
filename = filelist[0]
file_template = 'HDF4_EOS:EOS_GRID:"%s":MOD_Grid_MOD15A2:%s'
file_spec = file_template%(filename,'Lai_1km')
mask = rasterise_vector ( file_spec,
"data/ne_50m_admin_0_countries.shp", "NAME = '{}'".format(country),
verbose=False)
# extract just the area we want
# by getting the min/max rows/cols
# of the data mask
# The mask is True for the area we want
rowpix,colpix = np.where(mask == True)
mincol,maxcol = min(colpix),max(colpix)
minrow,maxrow = min(rowpix),max(rowpix)
ncol = maxcol - mincol + 1
nrow = maxrow - minrow + 1
# and make a small mask
small_mask = mask[minrow:minrow+nrow, mincol:mincol+ncol]
small_mask = np.where(small_mask == 1, 1., np.nan)
plt.imshow(small_mask, interpolation="nearest")
Done!
<matplotlib.image.AxesImage at 0x7f00e581d310>
data_fields = ['LaiStdDev_1km', 'Lai_1km']
# make a dictionary and put the filenames in it
# along with the mask and min/max info
# & data_fields with empty lists
lai = {'filenames':np.sort(filelist),
'minrow':minrow,
'mincol':mincol,
'mask':small_mask,
'LaiStdDev_1km':[],
'Lai_1km':[] }
# loop over each filename
for ii, f in enumerate(np.sort(lai['filenames'])):
    this_lai = get_lai(f,
                       mincol=mincol, ncol=ncol,
                       minrow=minrow, nrow=nrow)
    for layer in data_fields:
        # apply the mask
        lai_p = this_lai[layer]
        this_lai[layer] = lai_p * small_mask
        lai[layer].append(this_lai[layer])
# have a look at one of these
i = 20
# just see what the shape is ...
print lai['Lai_1km'][i].shape
root = 'images/lai_uk'
cmap = plt.cm.Greens
f = lai['filenames'][i]
fig = plt.figure(figsize=(7,7))
# get some info from filename
file_id = f.split('/')[-1].split('.')[-5][1:]
print file_id
plt.imshow(lai['Lai_1km'][i],cmap=cmap,interpolation='none',vmax=4.,vmin=0.0)
# plot a jpg
plt.title(file_id)
plt.colorbar()
plt.savefig('images/lai_uk_%s.jpg'%file_id)
(1197, 568) 2005161
# that's quite good, so put it in a function:
import numpy.ma as ma
import numpy as np
import sys
sys.path.insert(0,'python')
from lai_library import get_lai, rasterise_vector
def read_lai(filelist, country=None):
    '''
    Read MODIS LAI data from the set of files
    in the list filelist.
    Parameters:
        filelist : list of LAI files
    Options:
        country : country name (a NAME in
                  data/ne_50m_admin_0_countries.shp)
    Returns:
        lai dictionary
    '''
    if country:
        # make a raster mask
        # for the given country
        filename = filelist[0]
        file_template = 'HDF4_EOS:EOS_GRID:"%s":MOD_Grid_MOD15A2:%s'
        file_spec = file_template % (filename, 'Lai_1km')
        mask = rasterise_vector(file_spec,
                "data/ne_50m_admin_0_countries.shp",
                "NAME = '{}'".format(country), verbose=False)
        # extract just the area we want
        # by getting the min/max rows/cols
        # of the data mask
        # The mask is True for the area we want
        rowpix, colpix = np.where(mask == True)
        mincol, maxcol = min(colpix), max(colpix)
        minrow, maxrow = min(rowpix), max(rowpix)
        ncol = maxcol - mincol + 1
        nrow = maxrow - minrow + 1
        # and make a small mask
        small_mask = mask[minrow:minrow+nrow, mincol:mincol+ncol]
        small_mask = np.where(small_mask == 1, 1., np.nan)
    else:
        # no country: read the full tile, with no mask
        mincol = 0
        minrow = 0
        ncol = None
        nrow = None
        small_mask = None
    # data_fields with empty lists
    data_fields = ['LaiStdDev_1km', 'Lai_1km']
    # make a dictionary and put the filenames in it
    # along with the mask and min/max info
    lai = {'filenames': np.sort(filelist),
           'minrow': minrow, 'mincol': mincol,
           'mask': small_mask,
           'Lai_1km': [],
           'LaiStdDev_1km': []}
    # loop over each filename
    for f in lai['filenames']:
        this_lai = get_lai(f,
                           mincol=mincol, ncol=ncol,
                           minrow=minrow, nrow=nrow)
        for layer in data_fields:
            if country:
                # apply the mask
                this_lai[layer] = this_lai[layer] * small_mask
            lai[layer].append(this_lai[layer])
    return lai
# test this ... the one in the file
# does a cutout of the data area as well
# which will keep the memory
# requirements down
#from get_lai import read_lai
lai = read_lai(filelist,country='IRELAND')
# have a look at one of these
i = 20
# just see what the shape is ...
print lai['Lai_1km'][i].shape
root = 'images/lai_eire'
cmap = plt.cm.Greens
f = lai['filenames'][i]
fig = plt.figure(figsize=(7,7))
# get some info from filename
file_id = f.split('/')[-1].split('.')[-5][1:]
print file_id
plt.imshow(lai['Lai_1km'][i],cmap=cmap,interpolation='none',vmax=4.,vmin=0.0)
# plot a jpg
plt.title(file_id)
plt.colorbar()
plt.savefig('%s_%s.jpg'%(root,file_id))
Done! (467, 335) 2005161
# make a movie
import pylab as plt
import os
# just see what the shape is ...
print lai['Lai_1km'][0].shape
root = 'images/lai_country_eire'
cmap = plt.cm.Greens
for i, f in enumerate(lai['filenames']):
    fig = plt.figure(figsize=(7,7))
    # get some info from filename
    file_id = f.split('/')[-1].split('.')[-5][1:]
    print file_id
    plt.imshow(lai['Lai_1km'][i], cmap=cmap, interpolation='none', vmax=4., vmin=0.0)
    # plot a jpg
    plt.title(file_id)
    plt.colorbar()
    plt.savefig('%s_%s.jpg' % (root, file_id))
    plt.close(fig)
cmd = 'convert -delay 100 -loop 0 {0}_*.jpg {0}_movie.gif'.format(root)
os.system(cmd)
(467, 335) 2005001 2005009 2005017 2005025 2005033 2005041 2005049 2005057 2005065 2005073 2005081 2005089 2005097 2005105 2005113 2005121 2005129 2005137 2005145 2005153 2005161 2005169 2005177 2005185 2005193 2005201 2005209 2005217 2005225 2005233 2005241 2005249 2005257 2005265 2005273 2005281 2005289 2005297 2005305 2005313 2005321 2005329 2005337 2005345 2005353 2005361
0
# The movie making works, so pack that into a function
import pylab as plt
import os
root = 'images/lai_eire'
def make_movie(lai, root, layer='Lai_1km', vmax=4., vmin=0., do_plot=False):
    '''
    Make an animated gif from MODIS LAI data in
    dictionary 'lai'.
    Parameters:
        lai     : data dictionary
        root    : root file/directory name of frames and movie
        layer   : data layer to plot
        vmax    : max value for plotting
        vmin    : min value for plotting
        do_plot : set True if you want the individual plots
                  to display
    Returns:
        movie name
    '''
    cmap = plt.cm.Greens
    for i, f in enumerate(lai['filenames']):
        fig = plt.figure(figsize=(7,7))
        # get some info from filename
        file_id = f.split('/')[-1].split('.')[-5][1:]
        print file_id
        plt.imshow(lai[layer][i], cmap=cmap, interpolation='none',
                   vmax=vmax, vmin=vmin)
        # plot a jpg
        plt.title(file_id)
        plt.colorbar()
        plt.savefig('%s_%s.jpg' % (root, file_id))
        if not do_plot:
            plt.close(fig)
    cmd = 'convert -delay 100 -loop 0 {0}_*.jpg {0}_movie.gif'.format(root)
    os.system(cmd)
    return '{0}_movie.gif'.format(root)
import glob
year = 2005
tile = "h17v03"
# test it
# Search for all the files using wildcards
filelist = glob.glob ("data/MCD15A2.A{}*{}*.hdf".format(year, tile))
filelist.sort()
lai_uk = read_lai(filelist,country='UNITED KINGDOM')
root = 'images/lai_UK'
movie = make_movie(lai_uk,root)
print movie
Done! 2005001 2005009 2005017 2005025 2005033 2005041 2005049 2005057 2005065 2005073 2005081 2005089 2005097 2005105 2005113 2005121 2005129 2005137 2005145 2005153 2005161 2005169 2005177 2005185 2005193 2005201 2005209 2005217 2005225 2005233 2005241 2005249 2005257 2005265 2005273 2005281
So, we can load the data we want from multiple MODIS HDF files that we have downloaded from the NASA server into a 3D masked numpy array, with a country boundary mask (projected into the raster data coordinate system) derived from a vector dataset.
Let's start to explore the data then.
You should have the LAI data for Ireland (a list of 2D arrays, one per date):
type(lai['Lai_1km'])
list
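Since lai['Lai_1km'] is a plain Python list of 2D arrays, np.array stacks it into the 3D (time, rows, cols) dataset we want (the shape below assumes the 46 dates and the Ireland cutout from above):
data = np.array(lai['Lai_1km'])
print data.shape  # (46, 467, 335): (time, rows, cols)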
Let's plot the LAI for some given pixels.
First, we might like to identify which pixels actually have any data.
A convenient function for this is np.where, which returns the indices of items that are True. Since bad data are flagged as NaN, we can use np.isfinite so that good data come out as True:
data = np.array(lai['Lai_1km'])
np.where(np.isfinite(data))
(array([ 3, 3, 3, ..., 39, 39, 39]), array([318, 320, 321, ..., 466, 466, 466]), array([ 73, 136, 74, ..., 37, 40, 42]))
An example of a good pixel is therefore (3, 320, 136), i.e. (time, row, col). Let's look at this pixel for all time periods:
r = 320
c = 136
pixel = data[:, r, c]/10.
# plot red stars at the data points
plt.plot(np.arange(len(pixel))*8,pixel,'r*')
# plot a black (k) dashed line (--)
plt.plot(np.arange(len(pixel))*8,pixel,'k--')
plt.xlabel('doy')
plt.ylabel('LAI')
plt.title('pixel %03d %03d'%(r,c))
<matplotlib.text.Text at 0x7f00e1b82cd0>
The data follow the trend of what we might expect for LAI development, but they are clearly a little noisy.
We also have access to uncertainty information (standard deviation):
# copy the data in case we change it any
data = np.array(lai['Lai_1km']) * 1.
sd = np.array(lai['LaiStdDev_1km'])*1.
r = 320
c = 136
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
x = np.arange(len(pixel))*8
# plot red stars at the data points
plt.plot(x,pixel,'r*')
# plot a black (k) dashed line (--)
plt.plot(x,pixel,'k--')
# plot error bars:
# 1.96 because that is the 95% confidence interval
plt.errorbar(x,pixel,yerr=pixel_sd*1.96)
plt.xlabel('doy')
plt.ylabel('LAI')
plt.title('pixel %03d %03d'%(r,c))
<matplotlib.text.Text at 0x7f00e1ab7b50>
We would generally expect LAI to be quite smoothly varying over time. Visualising the data with 95% confidence intervals is quite useful as we can now 'imagine' some smooth line that would generally go within these bounds.
Some of the uncertainty estimates are really rather small though, and these are probably not reliable.
Let's inflate them:
# copy the data in case we change it any
data = np.array(lai['Lai_1km']) * 1.
sd = np.array(lai['LaiStdDev_1km'])*1.
r = 320
c = 136
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
# threshold
thresh = 0.25
pixel_sd[pixel_sd<thresh] = thresh
x = np.arange(len(pixel))*8
# plot red stars at the data points
plt.plot(x,pixel,'r*')
# plot a black (k) dashed line (--)
plt.plot(x,pixel,'k--')
# plot error bars:
# 1.96 because that is the 95% confidence interval
plt.errorbar(x,pixel,yerr=pixel_sd*1.96)
plt.xlabel('doy')
plt.ylabel('LAI')
plt.title('pixel %03d %03d'%(r,c))
<matplotlib.text.Text at 0x7f00e1a0c550>
This is perhaps a bit more realistic ...
The data now have some missing values (data gaps) and, as we have noted, are a little noisy.
A Python module we can use for many scientific functions is scipy; in particular here, the scipy interpolation functions.
We need to make a careful choice of interpolation function.
We might, in many circumstances, simply want something that interpolates between data points, i.e. that goes through the data points we have.
Many interpolators will not provide extrapolation, so in the example above we could not get an estimate of LAI before the first sample or after the last.
The best way to deal with that would be to have multiple years of data.
Instead, here we will repeat the dataset three times to mimic this:
from scipy import interpolate
pixel = data[:,r,c]
# original x,y
y_ = pixel
x_ = (np.arange(len(y_))*8.+1)[np.isfinite(pixel)]
y_ = y_[np.isfinite(pixel)]
# extend: using np.tile() to repeat data
y_extend = np.tile(y_,3)
# extend: using np.hstack to concatenate 3 shifted copies of x
x_extend = np.hstack((x_-46*8, x_, x_+46*8))
# plot the extended dataset
plt.figure(figsize=(12,3))
plt.plot(x_extend,y_extend,'b')
plt.plot(x_,y_,'k+')
plt.axvline(0, color="r")
plt.axvline(365, color="r")
plt.xlim(-365,2*365)
plt.xlabel('day of year')
plt.ylabel('LAI')
<matplotlib.text.Text at 0x7f00e119e2d0>
# define xnew at 1 day interval
xnew = np.arange(1.,366.)
# linear interpolation
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
ynew = f(xnew)
plt.plot(xnew,ynew)
plt.plot(x_,y_,'r+')
plt.xlim(1,366)
(1, 366)
# cubic interpolation
f = interpolate.interp1d(x_extend,y_extend,kind='cubic')
ynew = f(xnew)
plt.plot(xnew,ynew)
plt.plot(x_,y_,'r+')
plt.xlim(1,366)
(1, 366)
# nearest neighbour interpolation
f = interpolate.interp1d(x_extend,y_extend,kind='nearest')
ynew = f(xnew)
plt.plot(xnew,ynew)
plt.plot(x_,y_,'r+')
plt.xlim(1,366)
(1, 366)
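As an aside, more recent versions of scipy (0.17 onwards) let interp1d extrapolate directly, which avoids the tiling trick above. A sketch, assuming such a scipy version is available:
f = interpolate.interp1d(x_, y_, kind='linear',
                         bounds_error=False, fill_value="extrapolate")
ynew = f(xnew)  # values outside [x_[0], x_[-1]] are linearly extrapolated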
Depending on the problem you are trying to solve, different interpolation schemes will be appropriate. For categorical data (e.g. 'snow' coded as 1 and 'no snow' coded as 0), for instance, a nearest neighbour interpolation might be a good idea.
One issue with the schemes above is that they go exactly through the data points, but a more realistic description of the data might be one that incorporated the uncertainty information we have. Visually, this is quite easy to imagine, but how can we implement such ideas?
One way of thinking about this is to think about other sources of information that we might bring to bear on the problem. One such would be that we expect the function to be 'quite smooth'. This allows us to consider applying smoothness as an additional constraint to the solution.
Many such problems can be phrased as convolution operations.
Convolution is a form of digital filtering that combines two sequences of numbers $y$ and $w$ to give a third, the result $z$ that is a filtered version of $y$, where for each element $j$ of $y$:
$$ z_j = \sum_{i=-n}^{i=n}{w_i y_{j+i}} $$
where $n$ is the half width of the filter $w$. For a smoothing filter, the elements of this will sum to 1 (so that the magnitude of $y$ is not changed).
To illustrate this in Python:
# a simple box smoothing filter
# filter width 11
w = np.ones(11)
# normalise
w = w/w.sum()
# half width
n = len(w)/2
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
# where we will put the result
z = np.zeros_like(y)
# This is a straight implementation of the
# equation above
for j in xrange(n, len(y)-n):
    for i in xrange(-n, n+1):
        z[j] += w[n+i] * y[j+i]
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d'%len(w))
<matplotlib.text.Text at 0x7f00e0ab9f50>
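Incidentally, the double loop above is just what np.convolve computes (for a symmetric filter such as this box, convolution and correlation coincide). A minimal check:
z_np = np.convolve(y, w, mode='same')
# 'same' zero-pads at the edges, whereas our loop leaves the first and
# last n samples of z at zero, so we compare the interior only
print np.allclose(z[n:-n], z_np[n:-n])  # True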
As we suggested, the result of convolving $y$ with the filter $w$ (of width 11 here) is $z$, a smoothed version of $y$.
You might notice that the filter is only applied once we are n samples into the signal, so we get 'edge effects'. There are various ways of dealing with edge effects, such as repeating the signal (as we did above, for much the same reason), reflecting the signal, or assuming the signal to be some constant value (e.g. 0) outside of its defined domain.
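For example, reflecting the signal at the edges before filtering removes the untouched border. A sketch using np.pad:
y_pad = np.pad(y, n, mode='reflect')
# 'valid' only keeps samples where the filter fully overlaps the padded
# signal, so the result has the same length as the original y
z_reflect = np.convolve(y_pad, w, mode='valid')
print len(z_reflect) == len(y)  # True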
If we make the filter wider (width 31 now):
# a simple box smoothing filter
# filter width 31
w = np.ones(31)
# normalise
w = w/w.sum()
# half width
n = len(w)/2
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
# where we will put the result
z = np.zeros_like(y)
# This is a straight implementation of the
# equation above
for j in xrange(n, len(y)-n):
    for i in xrange(-n, n+1):
        z[j] += w[n+i] * y[j+i]
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d'%len(w))
<matplotlib.text.Text at 0x7f00e0951410>
Then the signal is 'more' smoothed.
There are many filters implemented in scipy.signal that you should look over.
A very commonly used smoothing filter is the Savitzky-Golay filter, for which you define the window size and filter order.
As with most filters, the filter width controls the degree of smoothing (see examples above). The filter order (related to polynomial order) in essence controls the shape of the filter and defines the 'peakiness' of the response.
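Recent versions of scipy (0.14 onwards) ship this filter directly as scipy.signal.savgol_filter; a sketch, assuming such a version (below we use the Cookbook implementation instead):
from scipy.signal import savgol_filter
# window_length must be odd, and polyorder < window_length
z = savgol_filter(y, window_length=31, polyorder=1)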
import sys
sys.path.insert(0,'python')
# see http://wiki.scipy.org/Cookbook/SavitzkyGolay
from savitzky_golay import *
window_size = 31
order = 1
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
z = savitzky_golay(y,window_size,order)
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d order %.2f'%(window_size,order))
<matplotlib.text.Text at 0x7f00e076f8d0>
import sys
sys.path.insert(0,'python')
# see http://wiki.scipy.org/Cookbook/SavitzkyGolay
from savitzky_golay import *
window_size = 61
order = 2
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
z = savitzky_golay(y,window_size,order)
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d order %.2f'%(window_size,order))
<matplotlib.text.Text at 0x7f00e0e242d0>
If the samples $y$ have uncertainty (standard deviation $\sigma_j$ for sample $j$) associated with them, we can incorporate this into the smoothing, although many of the methods in scipy and numpy do not directly allow for this.
Instead, we call an optimal interpolation scheme (a regulariser) here that achieves this. This also has the advantage of giving an estimate of uncertainty for the smoothed samples.
In this case, the parameters are: smoothOrder (as above, but only integer in this implementation), sd (the per-sample standard deviation, used to weight the observations), and s, which controls the degree of smoothness.
import glob
from smoothn import *
tile = 'h17v03'
year = '2005'
# Search for all the files using wildcards
filelist = glob.glob ("data/MCD15A2.A{}*{}*.hdf".format(year, tile))
filelist.sort()
try:
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
except:
    lai = read_lai(filelist, country='IRELAND')
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
thresh = 0.25
sd[sd < thresh] = thresh
#print np.where(np.isfinite(data))
r = 320
c = 136
# this is about the right amount of smoothing here
gamma = .7
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
#pixel[np.isnan(pixel)] = 0.
#pixel_sd[np.isnan(pixel_sd)] = 0.
x = np.arange(46)*8+1
order = 2
z = smoothn(pixel, s=gamma, sd=pixel_sd, smoothOrder=2.0)[0]
# plot
plt.plot(x,pixel,'k*',label='y')
plt.errorbar(x,pixel,pixel_sd*1.96)
plt.plot(x,z,'r',label='z')
# lower and upper bounds of 95% CI
plt.xlim(1,366)
plt.ylim(0.,5)
plt.legend(loc='best')
/opt/anaconda/lib/python2.7/site-packages/ipykernel_launcher.py:19: RuntimeWarning: invalid value encountered in less python/smoothn.py:179: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. if sd != None: python/smoothn.py:186: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. if W != None: python/smoothn.py:204: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. if W == None:
<matplotlib.legend.Legend at 0x7f00e10f8410>
# test it on a new pixel
r = 432
c = 86
gamma = 5
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
x = np.arange(46)*8+1
order = 2
z = smoothn(pixel,s=gamma,sd=pixel_sd,smoothOrder=2.0)[0]
# plot
plt.plot(x,pixel,'k*',label='y')
plt.errorbar(x,pixel,pixel_sd*1.96)
plt.plot(x,z,'r',label='z')
plt.xlim(1,366)
plt.legend(loc='best')
z.ndim
1
# and test it on a new pixel
r = 372
c = 56
#r = 9
#c = 277
gamma = .1
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
x = np.arange(46)*8+1
order = 2
# solve for gamma - degree of smoothness
zz = smoothn(pixel,sd=pixel_sd,smoothOrder=2.0)
z = zz[0]
print zz[1],zz[2]
gamma = zz[1]
# plot
plt.plot(x,pixel,'k*',label='y')
plt.errorbar(x,pixel,pixel_sd*1.96)
plt.plot(x,z,'r',label='z')
plt.xlim(1,366)
plt.legend(loc='best')
0.652313558672 True
<matplotlib.legend.Legend at 0x7fcb2f5b2f10>
To apply this approach to our 3D dataset, we could simply loop over all pixels.
Note that any per-pixel processing will be slow ... but this is quite a fast smoothing method, so is feasible here.
# we have put in an axis control to smoothn
# here so it will only smooth over doy
# This will take a few minutes to process
# we switch on verbose mode to get some feedback
# on progress
# make a mask of pixels where there is at least 1 sample
# over the time period
#mask = (data.mask.sum(axis=0) == 0)
#mask = np.array([mask]*data.shape[0])
z = smoothn(data,s=5.0,sd=sd,smoothOrder=2.0,axis=0,TolZ=0.05,verbose=True)[0]
#z = ma.array(z,mask=mask)
python/smoothn.py:181: RuntimeWarning: invalid value encountered in greater mask = (sd > 0.)
tol 1.0 nit 0 tol 1.03468577991 nit 1 tol 0.694005334662 nit 2 tol 0.552663819671 nit 3 tol 0.378617705139 nit 4 tol 0.296861598869 nit 5 tol 0.211077369397 nit 6 tol 0.161583424458 nit 7 tol 0.117940775214 nit 8 tol 0.0890834140148 nit 9 tol 0.0661973636164 nit 10
plt.figure(figsize=(9,9))
plt.imshow(z[20],interpolation='none',vmax=6)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7fcb30bff590>
# similarly, take frame 20
# and smooth that
ZZ = smoothn(z[20],smoothOrder=2.)
# self-calibrated smoothness term
s = ZZ[1]
print 's =',s
Z = ZZ[0]
plt.figure(figsize=(9,9))
plt.imshow(Z,interpolation='none',vmax=6)
plt.colorbar()
s = 0.75661616792
<matplotlib.colorbar.Colorbar at 0x7fcb33e0eb10>
# now take the whole smoothed series
# and smooth each frame spatially (axis=(1,2))
ZZ = smoothn(z,s=s,smoothOrder=2.,axis=(1,2),verbose=True)
Z = ZZ[0]
plt.figure(figsize=(9,9))
plt.imshow(Z[30],interpolation='none',vmax=6)
plt.colorbar()
tol 1.0 nit 0
<matplotlib.colorbar.Colorbar at 0x7fcb2dae4e10>
x = np.arange(46)*8+1.
try:
    plt.plot(x, np.mean(Z, axis=(1,2)))
    plt.plot(x, np.min(Z, axis=(1,2)), 'r--')
    plt.plot(x, np.max(Z, axis=(1,2)), 'r--')
except:
    # older numpy versions don't accept a tuple of axes
    plt.plot(x, np.mean(Z, axis=2).mean(axis=1))
    plt.plot(x, np.min(Z, axis=2).min(axis=1), 'r--')
    plt.plot(x, np.max(Z, axis=2).max(axis=1), 'r--')
plt.title('LAI variation of Eire')
<matplotlib.text.Text at 0x7fcb30b86610>
# or doing this pixel by pixel ...
# which is slower than using axis
# pixels that have some data
mask = np.isfinite(data).sum(axis=0)
odata = np.zeros((46,) + mask.shape)
rows, cols = np.where(mask > 0)
len_x = len(rows)
order = 2
gamma = 5.
for i in xrange(len_x):
    r, c = rows[i], cols[i]
    # progress bar
    if i % (len_x/20) == 0:
        print '... %4.2f percent' % (i*100./float(len_x))
    pixel = data[:, r, c]
    pixel_sd = sd[:, r, c]
    zz = smoothn(pixel, s=gamma, sd=pixel_sd, smoothOrder=order, TolZ=0.05)
    odata[:, r, c] = zz[0]
... 0.00 percent ... 5.00 percent ... 10.00 percent ... 15.00 percent ... 20.00 percent ... 25.00 percent ... 30.00 percent ... 35.00 percent ... 40.00 percent ... 45.00 percent ... 50.00 percent ... 55.00 percent ... 60.00 percent ... 65.00 percent ... 70.00 percent ... 75.00 percent ... 80.00 percent ... 85.00 percent ... 90.00 percent ... 95.00 percent
import pylab as plt
import os
root = 'images/lai_eire_colourZ'
for i, f in enumerate(lai['filenames']):
    fig = plt.figure(figsize=(7,7))
    # get some info from filename
    file_id = f.split('/')[-1].split('.')[-5][1:]
    print file_id
    plt.imshow(Z[i], interpolation='none', vmax=6., vmin=0.0)
    # plot a jpg
    plt.title(file_id)
    plt.colorbar()
    plt.savefig('%s_%s.jpg' % (root, file_id))
    plt.close(fig)
2005001 2005009 2005017 2005025 2005033 2005041 2005049 2005057 2005065 2005073 2005081 2005089 2005097 2005105 2005113 2005121 2005129 2005137 2005145 2005153 2005161 2005169 2005177 2005185 2005193 2005201 2005209 2005217 2005225 2005233 2005241 2005249 2005257 2005265 2005273 2005281 2005289 2005297 2005305 2005313 2005321 2005329 2005337 2005345 2005353 2005361
cmd = 'convert -delay 100 -loop 0 {0}_*.jpg {0}_movie2.gif'.format(root)
os.system(cmd)
0
Sometimes, instead of applying some arbitrary smoothing function to data, we want to extract particular information from the time series.
One way to approach this is to fit some function to the time series at each location.
Let us suppose that we wish to characterise the phenology of vegetation in Ireland.
One way we could do this would be to look in the LAI data for the most rapid changes.
Another would be to explicitly fit some mathematical function to the LAI data that we would expect to describe typical LAI trajectories.
One example of such a function is the double logistic:
$$ \hat{y} = p_0 - p_1 \left( \frac{1}{1 + e^{p_2 (t - p_3)}} + \frac{1}{1 + e^{-p_4 (t - p_5)}} - 1\right) $$
We can write this as a Python function:
def dbl_logistic_model(p, t):
    """A double logistic model, as in Sobrino and Julien,
    or Zhang et al."""
    return p[0] - p[1] * (1./(1 + np.exp(p[2]*(t - p[3]))) +
                          1./(1 + np.exp(-p[4]*(t - p[5]))) - 1)
tile = 'h17v03'
year = '2005'
# specify the file with the urls in
ifile= 'data/modis_lai_%s_%s.txt'%(tile,year)
fp = open(ifile)
filelist = [url.split('/')[-1].strip() for url in fp.readlines()]
fp.close()
import sys
sys.path.insert(0,'python')
from get_lai import *
try:
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
except:
    lai = read_lai(filelist, country='IRELAND')
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
thresh = 0.25
sd[sd < thresh] = thresh
# test pixel
r = 472
c = 84
y = data[:, r, c]
mask = np.isfinite(y)
y = y[mask]
x = (np.arange(46)*8 + 1.)[mask]
unc = sd[:, r, c][mask]
And see what this looks like:
# define x (time)
x_full = np.arange(1,366)
# some default values for the parameters
p = np.zeros(6)
# some stats on y
ysd = np.std(y)
ymean = np.mean(y)
# some rough guesses at the parameters
p[0] = ymean - 1.151*ysd; # minimum (1.151 is 75% CI)
p[1] = 2*1.151*ysd # range
p[2] = 0.19 # related to up slope
p[3] = 120 # midpoint of up slope
p[4] = 0.13 # related to down slope
p[5] = 220 # midpoint of down slope
y_hat = dbl_logistic_model(p,x_full)
plt.clf()
plt.plot(x_full,y_hat)
plt.plot(x,y,'*')
plt.errorbar(x,y,unc*1.96)
<Container object of 3 artists>
We could manually 'tweak' the parameters until we got a better 'fit' to the observations.
First though, let's define a measure of 'fit':
$$ Z_i = \frac{\hat{y}_i - y_i}{\sigma_i} $$
$$ Z^2 = \sum_i{Z_i^2} = \sum_i{\left( \frac{\hat{y}_i - y_i}{\sigma_i} \right)^2} $$
and implement this as a mismatch function where we have data points:
def mismatch_function(p, x, y, unc):
    y_hat = dbl_logistic_model(p, x)
    diff = (y_hat - y) / unc
    return diff
Z = mismatch_function(p,x,y,unc)
plt.plot([1,365.],[0,0.],'k-')
plt.xlim(0,365)
plt.plot(x,Z,'*')
print 'Z^2 =',(Z**2).sum()
Z^2 = 113.325251358
Now let's change p a bit:
p[0] = ymean - 1.151*ysd; # minimum (1.151 is 75% CI)
p[1] = 2*1.151*ysd # range
p[2] = 0.19 # related to up slope
p[3] = 140 # midpoint of up slope
p[4] = 0.13 # related to down slope
p[5] = 220 # midpoint of down slope
Z = mismatch_function(p,x,y,unc)
plt.plot([1,365.],[0,0.],'k-')
plt.xlim(0,365)
plt.plot(x,Z,'*')
print 'Z^2 =',(Z**2).sum()
Z^2 = 105.478274642
We have made the mismatch go down a little ...
Clearly it would be tedious (and impractical) to do a lot of such tweaking, so we can use methods that seek the minimum of some function.
One such method (a bounded minimiser) is implemented in scipy.optimize.fmin_l_bfgs_b:
from scipy import optimize
# initial estimate is in p
print 'initial parameters:',p[0],p[1],p[2],p[3],p[4],p[5]
# set some bounds for the parameters
bound = np.array([(0.,10.),(0.,10.),(0.01,1.),\
(50.,300.),(0.01,1.),(50.,300.)])
# test pixel
r = 472
c = 84
y = data[:, r, c]
mask = np.isfinite(y)
y = y[mask]
x = (np.arange(46)*8 + 1.)[mask]
unc = sd[:, r, c][mask]
# define function to give Z^2
def sse(p, x, y, unc):
    '''Sum of squared error'''
    # penalise p[3] > p[5]
    err = np.max([0., (p[3] - p[5])]) * 1e4
    return (mismatch_function(p, x, y, unc)**2).sum() + err
# we pass the function:
#
# sse : the name of the function we wrote to give
# sum of squares of Z_i
# p : an initial estimate of the parameters
# args=(x,y,unc) : the other information (other than p) that
# mismatch_function needs
# approx_grad : if we don't have a function for the gradient,
#                we have to get the solver to approximate it,
#                which takes time ... see if you can work out
#                d_sse / dp and use that to speed this up!
psolve = optimize.fmin_l_bfgs_b(sse,p,approx_grad=True,iprint=-1,\
args=(x,y,unc),bounds=bound)
print psolve[1]
pp = psolve[0]
plt.plot(x,y,'*')
plt.errorbar(x,y,unc*1.96)
y_hat = dbl_logistic_model(pp,x_full)
plt.plot(x_full,y_hat)
print 'solved parameters: ',pp[0],pp[1],pp[2],pp[3],pp[4],pp[5]
# if we define the phenology as the parameter p[3]
# and the 'length' of the growing season:
print 'phenology',pp[3],pp[5]-pp[3]
initial parameters: 0.468054306457 1.78821571141 0.19 140.0 0.13 220.0 23.2758233693 solved parameters: 0.615260788506 2.86137770501 0.0359482293173 158.838968533 0.0384063066821 227.611227167 phenology 158.838968533 68.7722586339
# and run over each pixel ... this will take some time
# pixels that have some data
mask = np.isfinite(data).sum(axis=0)
pdata = np.zeros((7,) + mask.shape)
rows, cols = np.where(mask > 0)
len_x = len(rows)
# lets just do some random ones to start with
#rows = rows[::10]
#cols = cols[::10]
len_x = len(rows)
for i in xrange(len_x):
    r, c = rows[i], cols[i]
    # progress bar
    if i % (len_x/40) == 0:
        print '... %4.2f percent' % (i*100./float(len_x))
    y = data[:, r, c]
    mask = np.isfinite(y)
    y = y[mask]
    x = (np.arange(46)*8 + 1.)[mask]
    unc = sd[:, r, c][mask]
    # need to get an initial estimate of the parameters
    # some stats on y
    ysd = np.std(y)
    ymean = np.mean(y)
    p[0] = ymean - 1.151*ysd  # minimum (1.151 is 75% CI)
    p[1] = 2*1.151*ysd        # range
    p[2] = 0.19               # related to up slope
    p[3] = 140                # midpoint of up slope
    p[4] = 0.13               # related to down slope
    p[5] = 220                # midpoint of down slope
    # set factr to quite a large number (relative error in solution)
    # as it'll take too long otherwise
    psolve = optimize.fmin_l_bfgs_b(sse, p, approx_grad=True, iprint=-1,
                                    args=(x, y, unc), bounds=bound, factr=1e12)
    pdata[:-1, r, c] = psolve[0]
    pdata[-1, r, c] = psolve[1]  # sse
... 0.00 percent ... 2.50 percent ... 5.00 percent ... 7.50 percent ... 10.00 percent ... 12.50 percent ... 15.00 percent ... 17.50 percent ... 19.99 percent ... 22.49 percent ... 24.99 percent ... 27.49 percent ... 29.99 percent ... 32.49 percent ... 34.99 percent ... 37.49 percent ... 39.99 percent ... 42.49 percent ... 44.99 percent ... 47.49 percent ... 49.99 percent ... 52.49 percent ... 54.99 percent
plt.figure(figsize=(10,10))
plt.imshow(pdata[3],interpolation='none',vmin=137,vmax=141)
plt.title('green up doy')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(pdata[5]-pdata[3],interpolation='none',vmin=74,vmax=84)
plt.title('season length')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(pdata[0],interpolation='none',vmin=0.,vmax=6.)
plt.title('min LAI')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(pdata[1]+pdata[0],interpolation='none',vmin=0.,vmax=6.)
plt.title('max LAI')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(np.sqrt(pdata[-1]),interpolation='none',vmax=np.sqrt(500))
plt.title('RSSE')
plt.colorbar()
# check a few pixels
c = 200
for r in xrange(200, 400, 25):
    y = data[:, r, c]
    mask = np.isfinite(y)
    y = y[mask]
    x = (np.arange(46)*8 + 1.)[mask]
    unc = sd[:, r, c][mask]
    x_full = np.arange(1, 366)
    # the fitted parameters for this pixel
    pp = pdata[:-1, r, c]
    plt.figure(figsize=(7,7))
    plt.title('r %d c %d' % (r, c))
    plt.plot(x, y, '*')
    plt.errorbar(x, y, unc*1.96)
    y_hat = dbl_logistic_model(pp, x_full)
    plt.plot(x_full, y_hat)
    print 'solved parameters: ', pp[0], pp[1], pp[2], pp[3], pp[4], pp[5]
    # if we define the phenology as the parameter p[3]
    # and the 'length' of the growing season:
    print 'phenology', pp[3], pp[5] - pp[3]