In today's session, we will be using some of the LAI datasets we examined last week (masked by national boundaries) and doing some analysis on them.
5.1 Making 3D datasets and Movies
First, we will examine how to improve our data reading function by extracting only the area we are interested in. This involves querying the 'country' mask to find its limits and passing this information through to the reader.
5.2 Interpolation
Then we will look at various methods to interpolate and smooth over gaps in datasets.
5.3 Function Fitting
Finally, we will look at fitting models to datasets, in this case a model describing LAI phenology.
First though, we will briefly go over once more the work we did on downloading data (using wget), generating 3D masked datasets, and making movies.
This time, we will concentrate more on generating functions that we can re-use for other purposes.
We can download data from the MODIS server. The URL for this is http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/, and you can see that inside that URL, we have a folder per time step. Inside that folder, all the files for each of the land tiles are present. Currently, you need a username and password to access these folders, which you can register for here. You should then be able to click on the links and see and download individual files.
Let's try to write some Python code to download individual files. The code will try to do the following: fetch the HTML directory listing for a given date using the requests package, parse the listing to find the links it contains, and pick out the .hdf file that matches the tile we want.
Let's see how this works...
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# The following two modules allow you to...
import bs4 # Parse HTML data
import requests # Get data on the internet using an URL
the_url = 'http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/'
tile = "h31v11"
dates = "2013.02.18" # Eg.
r = requests.get(the_url + dates)
if not r.ok:
    print "Problem!"
html_content = r.content
soup = bs4.BeautifulSoup(html_content, "lxml")
for link in soup.findAll("a"):
    fname = link.get("href")
    if fname.find(tile) >= 0 and fname.split(".")[-1] == "hdf":
        print fname
MCD15A2H.A2013049.h31v11.006.2015256194138.hdf
The previous code is OK for a single date, but we can make it even more useful if we stick it in a function to download data for a particular date.
We might want to check that the date is a valid date for the particular dataset that we're targeting...
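Before hitting the network, we might also sanity-check the date string itself. A minimal sketch, using the standard datetime module (the helper name valid_date is ours, purely for illustration):
import datetime
def valid_date(date, fmt="%Y.%m.%d"):
    """Return True if date parses as year.month.day, e.g. "2013.02.18"."""
    try:
        datetime.datetime.strptime(date, fmt)
        return True
    except ValueError:
        return False
This only checks the format; whether the server actually holds data for that date is handled by the function below, which returns None when the URL doesn't resolve.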
def get_modis_fname ( modis_product_url='http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/',
tile="h31v11", date="2013.02.18"):
"""Get the URL for a MODIS Coll5 HDF file for a given tile and date.
Parameters
-----------
modis_product_url: str
A valid URL that points to the MODIS product
tile: str
A tile in MODIS parlance (e.g. "h17v03")
date: str
A date, in year.month.date
Returns
--------
Either a full URL to download, or None if the e.g.
the date isn't available.
"""
import bs4
import requests
r = requests.get ( modis_product_url + "/" + date )
if not r.ok:
# A problem was found. Warn the user and return None
print "There was a problem obtaining the URL %s" % ( the_url + date )
return None
html_content = r.content
soup = bs4.BeautifulSoup(html_content)
for link in soup.findAll("a"):
fname = link.get("href")
if fname.find (tile) >= 0 and fname.split(".")[-1] == "hdf":
return modis_product_url + "/" + date + "/" + fname
# We assume we only have a single file fitting the
# previous criteria
print get_modis_fname ( modis_product_url='http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/',
tile="h31v11", date="2013.02.18")
print get_modis_fname ( modis_product_url='http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/',
tile="h31v11", date="2013.02.19")
http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006//2013.02.18/MCD15A2H.A2013049.h31v11.006.2015256194138.hdf There was a problem obtaining the URL http://e4ftl01.cr.usgs.gov//MODV6_Cmp_A/MOTA/MCD15A2H.006/2013.02.19 None
In order to retrieve the file, we need a username and password (see above), and we need to authenticate ourselves in order to download the data. The following code (taken from get_modis) deals with the authentication, but also does some other useful stuff: it streams the download in 64 kbyte chunks (useful for large files), writes the output file inside a context manager, and reports some timing statistics.
Perhaps the context manager is the weirdest concept here, but it's just a safe and convenient way of opening and closing a file. The following code
fp = open ("myfile.txt", 'w')
fp.write ( x )
fp.close()
can be written as
with open("myfile.txt", 'w') as fp:
fp.write ( x )
The second option is preferred if you're doing more complicated stuff than just writing, since if something fails inside the context manager, fp will still always be closed.
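Under the hood, the with statement is roughly equivalent to a try/finally block. A minimal sketch of the guarantee it provides:
fp = open("myfile.txt", 'w')
try:
    fp.write(x)
finally:
    # this runs even if fp.write() raises an exception
    fp.close()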
Note that we also use a context manager to authenticate ourselves against the MODIS server. The way the authentication works is through a redirection: you ask for the URL, and the server tells you to go somewhere else for authentication and then redirects you back to your HDF file.
def get_modis_product ( url, username, password, out_dir="./"):
""""
Download a MODIS product from a given URL. This method requires an
username and password, as well as an output directory (optional) to
save the file. The method will authenticate against the server, get
the data, and report some timing statistics.
Parameters
------------
url: str
The URL. Must exist, otherwise, we raise an IOError exception
username: str
The useranme. I bet that was surprising.
password: str
The day of the week. Naaaah, the password!!!
out_dir: str
The output directory where the file will be downloaded.
Returns
--------
Nothing, it reports some speed statistics on the command line.
"""
import requests
import os
import time
# The following is an HTTP session, denoted by `s`
with requests.Session() as s:
# Set up the authentication
s.auth = (username, password)
# Get the named URL from the user. This will
# result in an HTTP redirection for which we
# need to harvest the url
r1 = s.request('get', url)
# The redirection is stored in the .url bit
# We also open as a stream for downloading larger
# files using an iterator.
r = s.get(r1.url, stream=True)
# Check whether it all worked
if not r.ok:
raise IOError("Can't start download... [%s]" % url)
# Get the file size from the headers. File size is in bytes
file_size = int(r.headers['content-length'])
# Get the filename from the url
fname = url.split("/")[-1]
print "Starting download on %s(%d bytes) ..." % (
os.path.join(out_dir, fname), file_size)
# The next line stores the current time in seconds. It's our
# start time
tic = int(round(time.time() ))
# Open the output file to write
with open(os.path.join(out_dir, fname), 'wb') as fp:
# This iterates over the remote file contents in
# 64kbytes chunks. If the chunk is valid, we just
# write to the output file
for chunk in r.iter_content(chunk_size=65535):
if chunk:
fp.write(chunk)
# Time at the end
toc = int(round(time.time() ))
print "Done in %d s. Speed = %g bytes/second" % (
(toc-tic), float(file_size)/ (toc-tic))
the_url = get_modis_fname ()
username = "ProfLewis"
password = "GeogG1222016"
get_modis_product ( the_url, username, password)
Starting download on ./MCD15A2H.A2013049.h31v11.006.2015256194138.hdf(9557411 bytes) ... Done in 5 s. Speed = 1.91148e+06 bytes/second
Last time, we generated a function to read MODIS LAI data.
We have now included such a function in the directory files/python, called get_lai.py.
The only added sophistication is that when we call ReadAsArray, we give it the starting cols and rows, and the number of cols and rows to read (e.g. xsize=600, yoff=300, xoff=300, ysize=600):
import glob
filelist = glob.glob("data/MCD15A2.A2005*hdf")
# Now we have a list of filenames
# load get_lai
import sys
sys.path.insert(0,'python')
from lai_library import get_lai
help(get_lai)
Help on function get_lai in module lai_library: get_lai(filename, qc_layer='FparLai_QC', scale=[0.1, 0.1], mincol=0, minrow=0, ncol=None, nrow=None, selected_layers=['Lai_1km', 'LaiStdDev_1km'])
# e.g. for reading a single file:
lai_file0 = get_lai(filelist[20],ncol=600,mincol=300,minrow=400,nrow=800)
plt.imshow(lai_file0['Lai_1km'], vmin=0, vmax=5)
<matplotlib.image.AxesImage at 0x7f00e6602510>
print type(lai_file0)
print lai_file0.keys()
<type 'dict'> ['Lai_1km', 'LaiStdDev_1km']
The function returns a dictionary which has one key per selected layer, here ['Lai_1km', 'LaiStdDev_1km']:
print lai_file0['Lai_1km'].shape
(800, 600)
Each of these datasets is of shape (1200, 1200), but we have read only 600 columns and 800 rows in this case. Note that the numpy indexing is (rows, cols).
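A quick illustration of the (rows, cols) convention on a toy array (not the MODIS data):
import numpy as np
arr = np.zeros((1200, 1200))
# rows 400:1200, columns 300:900, i.e. arr[minrow:minrow+nrow, mincol:mincol+ncol]
cutout = arr[400:400+800, 300:300+600]
print cutout.shape  # (800, 600), i.e. (nrow, ncol)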
We know how to create a mask from a vector dataset from the last session. First, we'll copy and unzip the global country shapefile:
%%bash
cd data
unzip -x -o ../../Chapter4_GDAL/data/ne_50m*zip
cd ..
Archive: ../../Chapter4_GDAL/data/ne_50m_admin_0_countries.zip inflating: ne_50m_admin_0_countries.README.html extracting: ne_50m_admin_0_countries.VERSION.txt extracting: ne_50m_admin_0_countries.cpg inflating: ne_50m_admin_0_countries.dbf inflating: ne_50m_admin_0_countries.prj inflating: ne_50m_admin_0_countries.shp inflating: ne_50m_admin_0_countries.shx
# have to make sure we have access to the gdal data files
# make a raster mask for IRELAND
# from ne_50m_admin_0_countries.shp
from lai_library import rasterise_vector
filename = filelist[0]
file_template = 'HDF4_EOS:EOS_GRID:"%s":MOD_Grid_MOD15A2:%s'
file_spec = file_template%(filename,'Lai_1km')
mask = rasterise_vector ( file_spec, "data/ne_50m_admin_0_countries.shp", "NAME = 'IRELAND'", verbose=True)
mask = mask.astype(np.bool)
plt.imshow(mask)
plt.colorbar()
>>> Opened file HDF4_EOS:EOS_GRID:"data/MCD15A2.A2005049.h17v03.005.2007360165724.hdf":MOD_Grid_MOD15A2:Lai_1km >>> Projection: PROJCS["unnamed",GEOGCS["Unknown datum based upon the custom spheroid",DATUM["Not specified (based on custom spheroid)",SPHEROID["Custom spheroid",6371007.181,0]],PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["longitude_of_center",0],PARAMETER["false_easting",0],PARAMETER["false_northing",0],UNIT["Meter",1]] >>> File size 1200 rows, 1200 columns >>> UL corner: -1.11195e+06, 6.6717e+06 Done!
<matplotlib.colorbar.Colorbar at 0x7f00f4ff0450>
In this case, the data we want is only a small section of the whole spatial dataset.
It would be convenient to extract only the part we want.
We can use numpy.where()
to help with this:
# The mask is True for the area we want
rowpix,colpix = np.where(mask == True)
print rowpix,colpix
[ 556 556 556 ..., 1022 1022 1022] [699 700 701 ..., 471 472 473]
rowpix and colpix are arrays of pixel coordinates where the condition we specified is True (i.e. where mask is True).
If we wanted to find the bounds of this area, we simply need to know the minimum and maximum column and row in these lists:
mincol,maxcol = min(colpix),max(colpix)
minrow,maxrow = min(rowpix),max(rowpix)
# think about why the + 1 here!!!
# what if maxcol and mincol were the same?
ncol = maxcol - mincol + 1
nrow = maxrow - minrow + 1
print minrow,mincol,nrow,ncol
556 431 467 335
We could use this information to extract only the area we want when we read the data:
lai_file0 = get_lai(filelist[20],\
ncol=ncol,nrow=nrow,mincol=mincol,minrow=minrow)
plt.imshow(lai_file0['Lai_1km'],interpolation='none')
<matplotlib.image.AxesImage at 0x7f00e636ef10>
Now, let's extract this portion of the mask:
small_mask = mask[minrow:minrow+nrow,mincol:mincol+ncol]
plt.imshow(small_mask,interpolation='none')
<matplotlib.image.AxesImage at 0x7f00e62adcd0>
And combine the country mask with the small dataset:
As a recap, we can use the function rasterise_vector that we gave you last time to develop a raster mask from an ESRI shapefile (ne_50m_admin_0_countries.shp here).
We can then combine this mask with the QC-derived mask in the LAI dataset. The LAI mask (flagged with np.isnan on the NaN values in the code below) is True for bad data and False for good data, whereas the country mask is True within the selected country. To combine them, we want some operator X for which:
LAI mask        Country mask          Result
False (OK)      False (not Ireland)   False (invalid pixel)
False (OK)      True (Ireland)        True (valid pixel)
True (bad LAI)  False (not Ireland)   False (invalid pixel)
True (bad LAI)  True (Ireland)        False (invalid pixel)
We can obtain this if we NOT the LAI mask and AND it with the country mask:
np.logical_and(small_mask, np.logical_not(lai_mask))
If we want to use this result as the mask of a masked array, remember that there the mask is True if the pixel is masked and False if it is unmasked, so we flip it over using e.g. np.logical_not when passing it to the masked array.
However, it's easy to just carry on using NaN rather than a masked array.
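We can check the truth table above with toy boolean arrays, one element per row of the table:
import numpy as np
lai_bad = np.array([False, False, True, True])   # LAI mask: True = bad pixel
country = np.array([False, True, False, True])   # country mask: True = Ireland
print np.logical_and(country, np.logical_not(lai_bad))
# [False  True False False], matching the Result column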
import numpy.ma as ma
lai_file0 = get_lai(filelist[20],
ncol=ncol,nrow=nrow,mincol=mincol,minrow=minrow)
layer = 'Lai_1km'
lai = lai_file0[layer]
small_mask = mask[minrow:minrow+nrow,mincol:mincol+ncol]
# combined mask
lai_mask = np.isnan(lai)
new_mask = np.logical_and(small_mask, np.logical_not(lai_mask))
plt.figure(figsize=(7,7))
plt.imshow(new_mask,interpolation='none')
plt.title("Mask of good observations over Ireland")
lai = ma.array(lai,mask=np.logical_not(new_mask))
plt.figure(figsize=(7,7))
plt.imshow(lai,vmin=0, vmax=5, interpolation='none')
plt.title("Masked array LAI over Ireland")
plt.figure(figsize=(7,7))
plt.title("Non masked array with NaN")
plt.imshow(lai*new_mask, vmin=0, vmax=5, interpolation="nearest")
<matplotlib.image.AxesImage at 0x7f00e598ced0>
We should be used to writing loops around such functions.
In this case, we read all of the files in filelist and put the data into the dictionary called lai here.
Because there are multiple layers in the datasets, we loop over layer and append to each list individually. But first, let's build the mask:
country = 'UNITED KINGDOM'
# make a raster mask for UNITED KINGDOM
# from ne_50m_admin_0_countries.shp
filename = filelist[0]
file_template = 'HDF4_EOS:EOS_GRID:"%s":MOD_Grid_MOD15A2:%s'
file_spec = file_template%(filename,'Lai_1km')
mask = rasterise_vector ( file_spec,
"data/ne_50m_admin_0_countries.shp", "NAME = '{}'".format(country),
verbose=False)
# extract just the area we want
# by getting the min/max rows/cols
# of the data mask
# The mask is True for the area we want
rowpix,colpix = np.where(mask == True)
mincol,maxcol = min(colpix),max(colpix)
minrow,maxrow = min(rowpix),max(rowpix)
ncol = maxcol - mincol + 1
nrow = maxrow - minrow + 1
# and make a small mask
small_mask = mask[minrow:minrow+nrow, mincol:mincol+ncol]
small_mask = np.where(small_mask == 1, 1., np.nan)
plt.imshow(small_mask, interpolation="nearest")
Done!
<matplotlib.image.AxesImage at 0x7f00e581d310>
data_fields = ['LaiStdDev_1km', 'Lai_1km']
# make a dictionary and put the filenames in it
# along with the mask and min/max info
# & data_fields with empty lists
lai = {'filenames':np.sort(filelist),
'minrow':minrow,
'mincol':mincol,
'mask':small_mask,
'LaiStdDev_1km':[],
'Lai_1km':[] }
# loop over each filename
for ii, f in enumerate(np.sort(lai['filenames'])):
    this_lai = get_lai(f,
                       mincol=mincol, ncol=ncol,
                       minrow=minrow, nrow=nrow)
    for layer in data_fields:
        # apply the mask
        lai_p = this_lai[layer]
        this_lai[layer] = lai_p * small_mask
        lai[layer].append(this_lai[layer])
# have a look at one of these
i = 20
# just see what the shape is ...
print lai['Lai_1km'][i].shape
root = 'images/lai_uk'
cmap = plt.cm.Greens
f = lai['filenames'][i]
fig = plt.figure(figsize=(7,7))
# get some info from filename
file_id = f.split('/')[-1].split('.')[-5][1:]
print file_id
plt.imshow(lai['Lai_1km'][i],cmap=cmap,interpolation='none',vmax=4.,vmin=0.0)
# plot a jpg
plt.title(file_id)
plt.colorbar()
plt.savefig('images/lai_uk_%s.jpg'%file_id)
(1197, 568) 2005161
# that's quite good, so put it in a function:
import numpy.ma as ma
import numpy as np
import sys
sys.path.insert(0,'python')
from lai_library import get_lai, rasterise_vector
def read_lai(filelist, country=None):
    '''
    Read MODIS LAI data from the set of files
    in the list filelist.
    Parameters:
        filelist : list of LAI files
    Options:
        country : country name (a NAME in
                  data/ne_50m_admin_0_countries.shp)
    Returns:
        lai dictionary
    '''
    if country:
        # make a raster mask
        # for the given country
        filename = filelist[0]
        file_template = 'HDF4_EOS:EOS_GRID:"%s":MOD_Grid_MOD15A2:%s'
        file_spec = file_template % (filename, 'Lai_1km')
        mask = rasterise_vector(file_spec,
                "data/ne_50m_admin_0_countries.shp",
                "NAME = '{}'".format(country), verbose=False)
        # extract just the area we want
        # by getting the min/max rows/cols
        # of the data mask
        # The mask is True for the area we want
        rowpix, colpix = np.where(mask == True)
        mincol, maxcol = min(colpix), max(colpix)
        minrow, maxrow = min(rowpix), max(rowpix)
        ncol = maxcol - mincol + 1
        nrow = maxrow - minrow + 1
        # and make a small mask
        small_mask = mask[minrow:minrow+nrow, mincol:mincol+ncol]
        small_mask = np.where(small_mask == 1, 1., np.nan)
    else:
        # no country: read the full tile, with no mask
        mincol = 0
        minrow = 0
        ncol = None
        nrow = None
        small_mask = None
    # data_fields with empty lists
    data_fields = ['LaiStdDev_1km', 'Lai_1km']
    # make a dictionary and put the filenames in it
    # along with the mask and min/max info
    lai = {'filenames': np.sort(filelist),
           'minrow': minrow, 'mincol': mincol,
           'mask': small_mask,
           'Lai_1km': [],
           'LaiStdDev_1km': []}
    # loop over each filename
    for f in lai['filenames']:
        this_lai = get_lai(f,
                           mincol=mincol, ncol=ncol,
                           minrow=minrow, nrow=nrow)
        for layer in data_fields:
            if country:
                # apply the mask
                this_lai[layer] = this_lai[layer] * small_mask
            lai[layer].append(this_lai[layer])
    return lai
# test this ... the one in the file
# does a cutout of the data area as well
# which will keep the memory
# requirements down
#from get_lai import read_lai
lai = read_lai(filelist,country='IRELAND')
# have a look at one of these
i = 20
# just see what the shape is ...
print lai['Lai_1km'][i].shape
root = 'images/lai_eire'
cmap = plt.cm.Greens
f = lai['filenames'][i]
fig = plt.figure(figsize=(7,7))
# get some info from filename
file_id = f.split('/')[-1].split('.')[-5][1:]
print file_id
plt.imshow(lai['Lai_1km'][i],cmap=cmap,interpolation='none',vmax=4.,vmin=0.0)
# plot a jpg
plt.title(file_id)
plt.colorbar()
plt.savefig('%s_%s.jpg'%(root,file_id))
Done! (467, 335) 2005161
# make a movie
import pylab as plt
import os
# just see what the shape is ...
print lai['Lai_1km'][0].shape
root = 'images/lai_country_eire'
cmap = plt.cm.Greens
for i, f in enumerate(lai['filenames']):
    fig = plt.figure(figsize=(7,7))
    # get some info from filename
    file_id = f.split('/')[-1].split('.')[-5][1:]
    print file_id
    plt.imshow(lai['Lai_1km'][i], cmap=cmap, interpolation='none', vmax=4., vmin=0.0)
    # plot a jpg
    plt.title(file_id)
    plt.colorbar()
    plt.savefig('%s_%s.jpg' % (root, file_id))
    plt.close(fig)
cmd = 'convert -delay 100 -loop 0 {0}_*.jpg {0}_movie.gif'.format(root)
os.system(cmd)
(467, 335) 2005001 2005009 2005017 2005025 2005033 2005041 2005049 2005057 2005065 2005073 2005081 2005089 2005097 2005105 2005113 2005121 2005129 2005137 2005145 2005153 2005161 2005169 2005177 2005185 2005193 2005201 2005209 2005217 2005225 2005233 2005241 2005249 2005257 2005265 2005273 2005281 2005289 2005297 2005305 2005313 2005321 2005329 2005337 2005345 2005353 2005361
0
# The movie making works, so pack that into a function
import pylab as plt
import os
root = 'images/lai_eire'
def make_movie(lai, root, layer='Lai_1km', vmax=4., vmin=0., do_plot=False):
    '''
    Make an animated gif from MODIS LAI data in
    dictionary 'lai'.
    Parameters:
        lai     : data dictionary
        root    : root file/directory name of frames and movie
        layer   : data layer to plot
        vmax    : max value for plotting
        vmin    : min value for plotting
        do_plot : set True if you want the individual plots
                  to display
    Returns:
        movie name
    '''
    cmap = plt.cm.Greens
    for i, f in enumerate(lai['filenames']):
        fig = plt.figure(figsize=(7,7))
        # get some info from filename
        file_id = f.split('/')[-1].split('.')[-5][1:]
        print file_id
        plt.imshow(lai[layer][i], cmap=cmap, interpolation='none',
                   vmax=vmax, vmin=vmin)
        # plot a jpg
        plt.title(file_id)
        plt.colorbar()
        plt.savefig('%s_%s.jpg' % (root, file_id))
        if not do_plot:
            plt.close(fig)
    cmd = 'convert -delay 100 -loop 0 {0}_*.jpg {0}_movie.gif'.format(root)
    os.system(cmd)
    return '{0}_movie.gif'.format(root)
import glob
year = 2005
tile = "h17v03"
# test it
# Search for all the files using wildcards
filelist = glob.glob ("data/MCD15A2.A{}*{}*.hdf".format(year, tile))
filelist.sort()
lai_uk = read_lai(filelist,country='UNITED KINGDOM')
root = 'images/lai_UK'
movie = make_movie(lai_uk,root)
print movie
Done! 2005001 2005009 2005017 2005025 2005033 2005041 2005049 2005057 2005065 2005073 2005081 2005089 2005097 2005105 2005113 2005121 2005129 2005137 2005145 2005153 2005161 2005169 2005177 2005185 2005193 2005201 2005209 2005217 2005225 2005233 2005241 2005249 2005257 2005265 2005273 2005281
So, we can load the data we want from multiple MODIS HDF files that we have downloaded from the NASA server into a 3D masked numpy array, with a country boundary mask (projected into the raster data coordinate system) derived from a vector dataset.
Let's start to explore the data then.
You should have the LAI data for Ireland (a list of 2D arrays, one per date):
type(lai['Lai_1km'])
list
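Since lai['Lai_1km'] is a plain Python list of 2D arrays, np.array stacks it into the 3D (time, rows, cols) dataset we want (the shape below assumes the 46 dates and the Ireland cutout from above):
data = np.array(lai['Lai_1km'])
print data.shape  # (46, 467, 335): (time, rows, cols)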
Let's plot the LAI for some given pixels.
First, we might like to identify which pixels actually have any data.
A convenient function for this is np.where, which returns the indices of items that are True. Since bad data are flagged as NaN, we can use np.isfinite so that good data come out as True:
data = np.array(lai['Lai_1km'])
np.where(np.isfinite(data))
(array([ 3, 3, 3, ..., 39, 39, 39]), array([318, 320, 321, ..., 466, 466, 466]), array([ 73, 136, 74, ..., 37, 40, 42]))
An example of a good pixel is therefore (3, 320, 136), i.e. (time, row, col). Let's look at this pixel for all time periods:
r = 320
c = 136
pixel = data[:, r, c]/10.
# plot red stars at the data points
plt.plot(np.arange(len(pixel))*8,pixel,'r*')
# plot a black (k) dashed line (--)
plt.plot(np.arange(len(pixel))*8,pixel,'k--')
plt.xlabel('doy')
plt.ylabel('LAI')
plt.title('pixel %03d %03d'%(r,c))
<matplotlib.text.Text at 0x7f00e1b82cd0>
The data follow the trend of what we might expect for LAI development, but they are clearly a little noisy.
We also have access to uncertainty information (standard deviation):
# copy the data in case we change it any
data = np.array(lai['Lai_1km']) * 1.
sd = np.array(lai['LaiStdDev_1km'])*1.
r = 320
c = 136
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
x = np.arange(len(pixel))*8
# plot red stars at the data points
plt.plot(x,pixel,'r*')
# plot a black (k) dashed line (--)
plt.plot(x,pixel,'k--')
# plot error bars:
# 1.96 because that is the 95% confidence interval
plt.errorbar(x,pixel,yerr=pixel_sd*1.96)
plt.xlabel('doy')
plt.ylabel('LAI')
plt.title('pixel %03d %03d'%(r,c))
<matplotlib.text.Text at 0x7f00e1ab7b50>
We would generally expect LAI to be quite smoothly varying over time. Visualising the data with 95% confidence intervals is quite useful as we can now 'imagine' some smooth line that would generally go within these bounds.
Some of the uncertainty estimates are really rather small though, and these are probably not reliable.
Let's inflate them:
# copy the data in case we change it any
data = np.array(lai['Lai_1km']) * 1.
sd = np.array(lai['LaiStdDev_1km'])*1.
r = 320
c = 136
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
# threshold
thresh = 0.25
pixel_sd[pixel_sd<thresh] = thresh
x = np.arange(len(pixel))*8
# plot red stars at the data points
plt.plot(x,pixel,'r*')
# plot a black (k) dashed line (--)
plt.plot(x,pixel,'k--')
# plot error bars:
# 1.96 because that is the 95% confidence interval
plt.errorbar(x,pixel,yerr=pixel_sd*1.96)
plt.xlabel('doy')
plt.ylabel('LAI')
plt.title('pixel %03d %03d'%(r,c))
<matplotlib.text.Text at 0x7f00e1a0c550>
This is perhaps a bit more realistic ...
The data now have some missing values (data gaps) and, as we have noted, are a little noisy.
A Python module we can use for many scientific functions is scipy; in particular here, the scipy interpolation functions.
We need to make a careful choice of interpolation function.
We might, in many circumstances, simply want something that interpolates between data points, i.e. that goes through the data points we have.
Many interpolators will not provide extrapolation, so in the example above we could not get an estimate of LAI before the first sample or after the last.
The best way to deal with that would be to have multiple years of data.
Instead, here we will repeat the dataset three times to mimic this:
from scipy import interpolate
pixel = data[:,r,c]
# original x,y
y_ = pixel
x_ = (np.arange(len(y_))*8.+1)[np.isfinite(pixel)]
y_ = y_[np.isfinite(pixel)]
# extend: using np.tile() to repeat data
y_extend = np.tile(y_,3)
# extend: using np.hstack to concatenate 3 shifted copies of x
x_extend = np.hstack((x_-46*8, x_, x_+46*8))
# plot the extended dataset
plt.figure(figsize=(12,3))
plt.plot(x_extend,y_extend,'b')
plt.plot(x_,y_,'k+')
plt.axvline(0, color="r")
plt.axvline(365, color="r")
plt.xlim(-365,2*365)
plt.xlabel('day of year')
plt.ylabel('LAI')
<matplotlib.text.Text at 0x7f00e119e2d0>
# define xnew at 1 day interval
xnew = np.arange(1.,366.)
# linear interpolation
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
ynew = f(xnew)
plt.plot(xnew,ynew)
plt.plot(x_,y_,'r+')
plt.xlim(1,366)
(1, 366)
# cubic interpolation
f = interpolate.interp1d(x_extend,y_extend,kind='cubic')
ynew = f(xnew)
plt.plot(xnew,ynew)
plt.plot(x_,y_,'r+')
plt.xlim(1,366)
(1, 366)
# nearest neighbour interpolation
f = interpolate.interp1d(x_extend,y_extend,kind='nearest')
ynew = f(xnew)
plt.plot(xnew,ynew)
plt.plot(x_,y_,'r+')
plt.xlim(1,366)
(1, 366)
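As an aside, more recent versions of scipy (0.17 onwards) let interp1d extrapolate directly, which avoids the tiling trick above. A sketch, assuming such a scipy version is available:
f = interpolate.interp1d(x_, y_, kind='linear',
                         bounds_error=False, fill_value="extrapolate")
ynew = f(xnew)  # values outside [x_[0], x_[-1]] are linearly extrapolated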
Depending on the problem you are trying to solve, different interpolation schemes will be appropriate. For categorical data (e.g. 'snow' coded as 1 and 'no snow' coded as 0), for instance, a nearest neighbour interpolation might be a good idea.
One issue with the schemes above is that they go exactly through the data points, but a more realistic description of the data might be one that incorporated the uncertainty information we have. Visually, this is quite easy to imagine, but how can we implement such ideas?
One way of thinking about this is to think about other sources of information that we might bring to bear on the problem. One such would be that we expect the function to be 'quite smooth'. This allows us to consider applying smoothness as an additional constraint to the solution.
Many such problems can be phrased as convolution operations.
Convolution is a form of digital filtering that combines two sequences of numbers $y$ and $w$ to give a third, the result $z$ that is a filtered version of $y$, where for each element $j$ of $y$:
$$ z_j = \sum_{i=-n}^{i=n}{w_i y_{j+i}} $$
where $n$ is the half width of the filter $w$. For a smoothing filter, the elements of this will sum to 1 (so that the magnitude of $y$ is not changed).
To illustrate this in Python:
# a simple box smoothing filter
# filter width 11
w = np.ones(11)
# normalise
w = w/w.sum()
# half width
n = len(w)/2
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
# where we will put the result
z = np.zeros_like(y)
# This is a straight implementation of the
# equation above
for j in xrange(n, len(y)-n):
    for i in xrange(-n, n+1):
        z[j] += w[n+i] * y[j+i]
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d'%len(w))
<matplotlib.text.Text at 0x7f00e0ab9f50>
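Incidentally, the double loop above is just what np.convolve computes (for a symmetric filter such as this box, convolution and correlation coincide). A minimal check:
z_np = np.convolve(y, w, mode='same')
# 'same' zero-pads at the edges, whereas our loop leaves the first and
# last n samples of z at zero, so we compare the interior only
print np.allclose(z[n:-n], z_np[n:-n])  # True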
As we suggested, the result of convolving $y$ with the filter $w$ (of width 11 here) is $z$, a smoothed version of $y$.
You might notice that the filter is only applied once we are n samples into the signal, so we get 'edge effects'. There are various ways of dealing with edge effects, such as repeating the signal (as we did above, for much the same reason), reflecting the signal, or assuming the signal to be some constant value (e.g. 0) outside of its defined domain.
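For example, reflecting the signal at the edges before filtering removes the untouched border. A sketch using np.pad:
y_pad = np.pad(y, n, mode='reflect')
# 'valid' only keeps samples where the filter fully overlaps the padded
# signal, so the result has the same length as the original y
z_reflect = np.convolve(y_pad, w, mode='valid')
print len(z_reflect) == len(y)  # True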
If we make the filter wider (width 31 now):
# a simple box smoothing filter
# filter width 31
w = np.ones(31)
# normalise
w = w/w.sum()
# half width
n = len(w)/2
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
# where we will put the result
z = np.zeros_like(y)
# This is a straight implementation of the
# equation above
for j in xrange(n, len(y)-n):
    for i in xrange(-n, n+1):
        z[j] += w[n+i] * y[j+i]
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d'%len(w))
<matplotlib.text.Text at 0x7f00e0951410>
Then the signal is 'more' smoothed.
There are many filters implemented in scipy.signal that you should look over.
A very commonly used smoothing filter is the Savitzky-Golay filter, for which you define the window size and filter order.
As with most filters, the filter width controls the degree of smoothing (see examples above). The filter order (related to polynomial order) in essence controls the shape of the filter and defines the 'peakiness' of the response.
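Recent versions of scipy (0.14 onwards) ship this filter directly as scipy.signal.savgol_filter; a sketch, assuming such a version (below we use the Cookbook implementation instead):
from scipy.signal import savgol_filter
# window_length must be odd, and polyorder < window_length
z = savgol_filter(y, window_length=31, polyorder=1)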
import sys
sys.path.insert(0,'python')
# see http://wiki.scipy.org/Cookbook/SavitzkyGolay
from savitzky_golay import *
window_size = 31
order = 1
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
z = savitzky_golay(y,window_size,order)
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d order %.2f'%(window_size,order))
<matplotlib.text.Text at 0x7f00e076f8d0>
import sys
sys.path.insert(0,'python')
# see http://wiki.scipy.org/Cookbook/SavitzkyGolay
from savitzky_golay import *
window_size = 61
order = 2
# Take the linear interpolation of the LAI above as the signal
# linear interpolation
x = xnew
f = interpolate.interp1d(x_extend,y_extend,kind='linear')
y = f(x)
z = savitzky_golay(y,window_size,order)
plt.plot(x,y,'k--',label='y')
plt.plot(x,z,'r',label='z')
plt.xlim(x[0],x[-1])
plt.legend(loc='best')
plt.title('smoothing with filter width %d order %.2f'%(window_size,order))
<matplotlib.text.Text at 0x7f00e0e242d0>
If the samples $y$ have uncertainty (standard deviation $\sigma_j$ for sample $j$) associated with them, we can incorporate this into the smoothing, although many of the methods in scipy and numpy do not directly allow for this.
Instead, we call an optimal interpolation scheme (a regulariser) here that achieves this. This also has the advantage of giving an estimate of uncertainty for the smoothed samples.
In this case, the parameters are: smoothOrder (as above, but only integer in this implementation), sd (the per-sample standard deviation, used to weight the observations), and s, which controls the degree of smoothness.
import glob
from smoothn import *
tile = 'h17v03'
year = '2005'
# Search for all the files using wildcards
filelist = glob.glob ("data/MCD15A2.A{}*{}*.hdf".format(year, tile))
filelist.sort()
try:
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
except:
    lai = read_lai(filelist, country='IRELAND')
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
thresh = 0.25
sd[sd < thresh] = thresh
#print np.where(np.isfinite(data))
r = 320
c = 136
# this is about the right amount of smoothing here
gamma = .7
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
#pixel[np.isnan(pixel)] = 0.
#pixel_sd[np.isnan(pixel_sd)] = 0.
x = np.arange(46)*8+1
order = 2
z = smoothn(pixel, s=gamma, sd=pixel_sd, smoothOrder=2.0)[0]
# plot
plt.plot(x,pixel,'k*',label='y')
plt.errorbar(x,pixel,pixel_sd*1.96)
plt.plot(x,z,'r',label='z')
# lower and upper bounds of 95% CI
plt.xlim(1,366)
plt.ylim(0.,5)
plt.legend(loc='best')
/opt/anaconda/lib/python2.7/site-packages/ipykernel_launcher.py:19: RuntimeWarning: invalid value encountered in less python/smoothn.py:179: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. if sd != None: python/smoothn.py:186: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. if W != None: python/smoothn.py:204: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. if W == None:
<matplotlib.legend.Legend at 0x7f00e10f8410>
# test it on a new pixel
r = 432
c = 86
gamma = 5
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
x = np.arange(46)*8+1
order = 2
z = smoothn(pixel,s=gamma,sd=pixel_sd,smoothOrder=2.0)[0]
# plot
plt.plot(x,pixel,'k*',label='y')
plt.errorbar(x,pixel,pixel_sd*1.96)
plt.plot(x,z,'r',label='z')
plt.xlim(1,366)
plt.legend(loc='best')
z.ndim
1
# and test it on a new pixel
r = 372
c = 56
#r = 9
#c = 277
gamma = .1
pixel = data[:,r,c]
pixel_sd = sd[:,r,c]
x = np.arange(46)*8+1
order = 2
# solve for gamma - degree of smoothness
zz = smoothn(pixel,sd=pixel_sd,smoothOrder=2.0)
z = zz[0]
print zz[1],zz[2]
gamma = zz[1]
# plot
plt.plot(x,pixel,'k*',label='y')
plt.errorbar(x,pixel,pixel_sd*1.96)
plt.plot(x,z,'r',label='z')
plt.xlim(1,366)
plt.legend(loc='best')
0.652313558672 True
<matplotlib.legend.Legend at 0x7fcb2f5b2f10>
To apply this approach to our 3D dataset, we could simply loop over all pixels.
Note that any per-pixel processing will be slow ... but this is quite a fast smoothing method, so is feasible here.
# we have put in an axis control to smoothn
# here so it will only smooth over doy
# This will take a few minutes to process
# we switch on verbose mode to get some feedback
# on progress
# make a mask of pixels where there is at least 1 sample
# over the time period
#mask = (data.mask.sum(axis=0) == 0)
#mask = np.array([mask]*data.shape[0])
z = smoothn(data,s=5.0,sd=sd,smoothOrder=2.0,axis=0,TolZ=0.05,verbose=True)[0]
#z = ma.array(z,mask=mask)
python/smoothn.py:181: RuntimeWarning: invalid value encountered in greater mask = (sd > 0.)
tol 1.0 nit 0 tol 1.03468577991 nit 1 tol 0.694005334662 nit 2 tol 0.552663819671 nit 3 tol 0.378617705139 nit 4 tol 0.296861598869 nit 5 tol 0.211077369397 nit 6 tol 0.161583424458 nit 7 tol 0.117940775214 nit 8 tol 0.0890834140148 nit 9 tol 0.0661973636164 nit 10
plt.figure(figsize=(9,9))
plt.imshow(z[20],interpolation='none',vmax=6)
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7fcb30bff590>
# similarly, take frame 20
# and smooth that
ZZ = smoothn(z[20],smoothOrder=2.)
# self-calibrated smoothness term
s = ZZ[1]
print 's =',s
Z = ZZ[0]
plt.figure(figsize=(9,9))
plt.imshow(Z,interpolation='none',vmax=6)
plt.colorbar()
s = 0.75661616792
<matplotlib.colorbar.Colorbar at 0x7fcb33e0eb10>
# now take the whole smoothed series
# and smooth each frame spatially (axis=(1,2))
ZZ = smoothn(z,s=s,smoothOrder=2.,axis=(1,2),verbose=True)
Z = ZZ[0]
plt.figure(figsize=(9,9))
plt.imshow(Z[30],interpolation='none',vmax=6)
plt.colorbar()
tol 1.0 nit 0
<matplotlib.colorbar.Colorbar at 0x7fcb2dae4e10>
x = np.arange(46)*8+1.
try:
    plt.plot(x, np.mean(Z, axis=(1,2)))
    plt.plot(x, np.min(Z, axis=(1,2)), 'r--')
    plt.plot(x, np.max(Z, axis=(1,2)), 'r--')
except:
    # older numpy versions don't accept a tuple of axes
    plt.plot(x, np.mean(Z, axis=2).mean(axis=1))
    plt.plot(x, np.min(Z, axis=2).min(axis=1), 'r--')
    plt.plot(x, np.max(Z, axis=2).max(axis=1), 'r--')
plt.title('LAI variation of Eire')
<matplotlib.text.Text at 0x7fcb30b86610>
# or doing this pixel by pixel ...
# which is slower than using axis
# pixels that have some data
mask = np.isfinite(data).sum(axis=0)
odata = np.zeros((46,) + mask.shape)
rows, cols = np.where(mask > 0)
len_x = len(rows)
order = 2
gamma = 5.
for i in xrange(len_x):
    r, c = rows[i], cols[i]
    # progress bar
    if i % (len_x/20) == 0:
        print '... %4.2f percent' % (i*100./float(len_x))
    pixel = data[:, r, c]
    pixel_sd = sd[:, r, c]
    zz = smoothn(pixel, s=gamma, sd=pixel_sd, smoothOrder=order, TolZ=0.05)
    odata[:, r, c] = zz[0]
... 0.00 percent ... 5.00 percent ... 10.00 percent ... 15.00 percent ... 20.00 percent ... 25.00 percent ... 30.00 percent ... 35.00 percent ... 40.00 percent ... 45.00 percent ... 50.00 percent ... 55.00 percent ... 60.00 percent ... 65.00 percent ... 70.00 percent ... 75.00 percent ... 80.00 percent ... 85.00 percent ... 90.00 percent ... 95.00 percent
import pylab as plt
import os
root = 'images/lai_eire_colourZ'
for i, f in enumerate(lai['filenames']):
    fig = plt.figure(figsize=(7,7))
    # get some info from filename
    file_id = f.split('/')[-1].split('.')[-5][1:]
    print file_id
    plt.imshow(Z[i], interpolation='none', vmax=6., vmin=0.0)
    # plot a jpg
    plt.title(file_id)
    plt.colorbar()
    plt.savefig('%s_%s.jpg' % (root, file_id))
    plt.close(fig)
2005001 2005009 2005017 2005025 2005033 2005041 2005049 2005057 2005065 2005073 2005081 2005089 2005097 2005105 2005113 2005121 2005129 2005137 2005145 2005153 2005161 2005169 2005177 2005185 2005193 2005201 2005209 2005217 2005225 2005233 2005241 2005249 2005257 2005265 2005273 2005281 2005289 2005297 2005305 2005313 2005321 2005329 2005337 2005345 2005353 2005361
cmd = 'convert -delay 100 -loop 0 {0}_*.jpg {0}_movie2.gif'.format(root)
os.system(cmd)
0
Sometimes, instead of applying some arbitrary smoothing function to data, we want to extract particular information from the time series.
One way to approach this is to fit some function to the time series at each location.
Let us suppose that we wish to characterise the phenology of vegetation in Ireland.
One way we could do this would be to look in the LAI data for the most rapid changes.
Another would be to explicitly fit some mathematical function to the LAI data that we would expect to describe typical LAI trajectories.
One example of such a function is the double logistic:
$$ \hat{y} = p_0 - p_1 \left( \frac{1}{1 + e^{p_2 (t - p_3)}} + \frac{1}{1 + e^{-p_4 (t - p_5)}} - 1\right) $$
We can write this as a Python function:
def dbl_logistic_model(p, t):
    """A double logistic model, as in Sobrino and Julien,
    or Zhang et al."""
    return p[0] - p[1] * (1./(1 + np.exp(p[2]*(t - p[3]))) +
                          1./(1 + np.exp(-p[4]*(t - p[5]))) - 1)
tile = 'h17v03'
year = '2005'
# specify the file with the urls in
ifile= 'data/modis_lai_%s_%s.txt'%(tile,year)
fp = open(ifile)
filelist = [url.split('/')[-1].strip() for url in fp.readlines()]
fp.close()
import sys
sys.path.insert(0,'python')
from get_lai import *
try:
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
except:
    lai = read_lai(filelist, country='IRELAND')
    data = np.array(lai['Lai_1km'])
    sd = np.array(lai['LaiStdDev_1km'])
thresh = 0.25
sd[sd < thresh] = thresh
# test pixel
r = 472
c = 84
y = data[:, r, c]
mask = np.isfinite(y)
y = y[mask]
x = (np.arange(46)*8 + 1.)[mask]
unc = sd[:, r, c][mask]
And see what this looks like:
# define x (time)
x_full = np.arange(1,366)
# some default values for the parameters
p = np.zeros(6)
# some stats on y
ysd = np.std(y)
ymean = np.mean(y)
# some rough guesses at the parameters
p[0] = ymean - 1.151*ysd; # minimum (1.151 is 75% CI)
p[1] = 2*1.151*ysd # range
p[2] = 0.19 # related to up slope
p[3] = 120 # midpoint of up slope
p[4] = 0.13 # related to down slope
p[5] = 220 # midpoint of down slope
y_hat = dbl_logistic_model(p,x_full)
plt.clf()
plt.plot(x_full,y_hat)
plt.plot(x,y,'*')
plt.errorbar(x,y,unc*1.96)
<Container object of 3 artists>
We could manually 'tweak' the parameters until we got a better 'fit' to the observations.
First though, let's define a measure of 'fit':
$$ Z_i = \frac{\hat{y}_i - y_i}{\sigma_i} $$
$$ Z^2 = \sum_i{Z_i^2} = \sum_i{\left( \frac{\hat{y}_i - y_i}{\sigma_i} \right)^2} $$
and implement this as a mismatch function where we have data points:
def mismatch_function(p, x, y, unc):
    y_hat = dbl_logistic_model(p, x)
    diff = (y_hat - y) / unc
    return diff
Z = mismatch_function(p,x,y,unc)
plt.plot([1,365.],[0,0.],'k-')
plt.xlim(0,365)
plt.plot(x,Z,'*')
print 'Z^2 =',(Z**2).sum()
Z^2 = 113.325251358
Now let's change p a bit:
p[0] = ymean - 1.151*ysd; # minimum (1.151 is 75% CI)
p[1] = 2*1.151*ysd # range
p[2] = 0.19 # related to up slope
p[3] = 140 # midpoint of up slope
p[4] = 0.13 # related to down slope
p[5] = 220 # midpoint of down slope
Z = mismatch_function(p,x,y,unc)
plt.plot([1,365.],[0,0.],'k-')
plt.xlim(0,365)
plt.plot(x,Z,'*')
print 'Z^2 =',(Z**2).sum()
Z^2 = 105.478274642
We have made the mismatch go down a little ...
Clearly it would be tedious (and impractical) to do a lot of such tweaking, so we can use methods that seek the minimum of some function.
One such method (a bounded minimiser) is implemented in scipy.optimize.fmin_l_bfgs_b:
from scipy import optimize
# initial estimate is in p
print 'initial parameters:',p[0],p[1],p[2],p[3],p[4],p[5]
# set some bounds for the parameters
bound = np.array([(0.,10.),(0.,10.),(0.01,1.),\
(50.,300.),(0.01,1.),(50.,300.)])
# test pixel
r = 472
c = 84
y = data[:, r, c]
mask = np.isfinite(y)
y = y[mask]
x = (np.arange(46)*8 + 1.)[mask]
unc = sd[:, r, c][mask]
# define function to give Z^2
def sse(p, x, y, unc):
    '''Sum of squared error'''
    # penalise p[3] > p[5]
    err = np.max([0., (p[3] - p[5])]) * 1e4
    return (mismatch_function(p, x, y, unc)**2).sum() + err
# we pass the function:
#
# sse : the name of the function we wrote to give
# sum of squares of Z_i
# p : an initial estimate of the parameters
# args=(x,y,unc) : the other information (other than p) that
# mismatch_function needs
# approx_grad : if we don't have a function for the gradient,
#                we have to get the solver to approximate it,
#                which takes time ... see if you can work out
#                d_sse / dp and use that to speed this up!
psolve = optimize.fmin_l_bfgs_b(sse,p,approx_grad=True,iprint=-1,\
args=(x,y,unc),bounds=bound)
print psolve[1]
pp = psolve[0]
plt.plot(x,y,'*')
plt.errorbar(x,y,unc*1.96)
y_hat = dbl_logistic_model(pp,x_full)
plt.plot(x_full,y_hat)
print 'solved parameters: ',pp[0],pp[1],pp[2],pp[3],pp[4],pp[5]
# if we define the phenology as the parameter p[3]
# and the 'length' of the growing season:
print 'phenology',pp[3],pp[5]-pp[3]
initial parameters: 0.468054306457 1.78821571141 0.19 140.0 0.13 220.0 23.2758233693 solved parameters: 0.615260788506 2.86137770501 0.0359482293173 158.838968533 0.0384063066821 227.611227167 phenology 158.838968533 68.7722586339
# and run over each pixel ... this will take some time
# pixels that have some data
mask = np.isfinite(data).sum(axis=0)
pdata = np.zeros((7,) + mask.shape)
rows, cols = np.where(mask > 0)
len_x = len(rows)
# lets just do some random ones to start with
#rows = rows[::10]
#cols = cols[::10]
len_x = len(rows)
for i in xrange(len_x):
    r, c = rows[i], cols[i]
    # progress bar
    if i % (len_x/40) == 0:
        print '... %4.2f percent' % (i*100./float(len_x))
    y = data[:, r, c]
    mask = np.isfinite(y)
    y = y[mask]
    x = (np.arange(46)*8 + 1.)[mask]
    unc = sd[:, r, c][mask]
    # need to get an initial estimate of the parameters
    # some stats on y
    ysd = np.std(y)
    ymean = np.mean(y)
    p[0] = ymean - 1.151*ysd  # minimum (1.151 is 75% CI)
    p[1] = 2*1.151*ysd        # range
    p[2] = 0.19               # related to up slope
    p[3] = 140                # midpoint of up slope
    p[4] = 0.13               # related to down slope
    p[5] = 220                # midpoint of down slope
    # set factr to quite a large number (relative error in solution)
    # as it'll take too long otherwise
    psolve = optimize.fmin_l_bfgs_b(sse, p, approx_grad=True, iprint=-1,
                                    args=(x, y, unc), bounds=bound, factr=1e12)
    pdata[:-1, r, c] = psolve[0]
    pdata[-1, r, c] = psolve[1]  # sse
... 0.00 percent ... 2.50 percent ... 5.00 percent ... 7.50 percent ... 10.00 percent ... 12.50 percent ... 15.00 percent ... 17.50 percent ... 19.99 percent ... 22.49 percent ... 24.99 percent ... 27.49 percent ... 29.99 percent ... 32.49 percent ... 34.99 percent ... 37.49 percent ... 39.99 percent ... 42.49 percent ... 44.99 percent ... 47.49 percent ... 49.99 percent ... 52.49 percent ... 54.99 percent
plt.figure(figsize=(10,10))
plt.imshow(pdata[3],interpolation='none',vmin=137,vmax=141)
plt.title('green up doy')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(pdata[5]-pdata[3],interpolation='none',vmin=74,vmax=84)
plt.title('season length')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(pdata[0],interpolation='none',vmin=0.,vmax=6.)
plt.title('min LAI')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(pdata[1]+pdata[0],interpolation='none',vmin=0.,vmax=6.)
plt.title('max LAI')
plt.colorbar()
plt.figure(figsize=(10,10))
plt.imshow(np.sqrt(pdata[-1]),interpolation='none',vmax=np.sqrt(500))
plt.title('RSSE')
plt.colorbar()
# check a few pixels
c = 200
for r in xrange(200, 400, 25):
    y = data[:, r, c]
    mask = np.isfinite(y)
    y = y[mask]
    x = (np.arange(46)*8 + 1.)[mask]
    unc = sd[:, r, c][mask]
    x_full = np.arange(1, 366)
    # the fitted parameters for this pixel
    pp = pdata[:-1, r, c]
    plt.figure(figsize=(7,7))
    plt.title('r %d c %d' % (r, c))
    plt.plot(x, y, '*')
    plt.errorbar(x, y, unc*1.96)
    y_hat = dbl_logistic_model(pp, x_full)
    plt.plot(x_full, y_hat)
    print 'solved parameters: ', pp[0], pp[1], pp[2], pp[3], pp[4], pp[5]
    # if we define the phenology as the parameter p[3]
    # and the 'length' of the growing season:
    print 'phenology', pp[3], pp[5] - pp[3]