#!/usr/bin/env python
# coding: utf-8

# # 7.0 Fire/ENSO teleconnections

# ## 7.1 Introduction

# There is much public and scientific interest in monitoring and predicting the activity of wildfires, and such topics are [often in the media](http://www.bbc.co.uk/news/science-environment-15691060).
#
# Part of this interest stems from the role fire plays in issues such as land cover change, deforestation and forest degradation, and [carbon emissions](http://www.google.com/url?sa=t&rct=j&q=fire%20carbon%20emissions&source=web&cd=6&ved=0CFEQFjAF&url=http%3A%2F%2Fwww.kcl.ac.uk%2Fsspp%2Fdepartments%2Fgeography%2Fpeople%2Facademic%2Fwooster%2F30yongwoosteretal.pdf&ei=4FPCTuvZE9Gg8gPZybyxBA&usg=AFQjCNG81fTRoCcK1nhKnk3u0b8az24bGQ&sig2=EjJYm2S-_2gHu2vgt4ByvA&cad=rja) from the land surface to the atmosphere; human health impacts are also of concern. The impacts of fire should not, however, be considered as wholly negative, as fire plays a [significant role in natural ecosystem processes](http://www.fl-dof.com/publications/fires_natural_role.html#firerole).
#
# For many regions of the Earth, there are large inter-annual variations in the timing, frequency and severity of wildfires. Whilst anthropogenic activity accounts for a [large and probably increasing proportion](http://www.google.com/url?sa=t&rct=j&q=fire%20frequency%20july%204th&source=web&cd=7&ved=0CFIQFjAG&url=http%3A%2F%2Farctic.synergiesprairies.ca%2Farctic%2Findex.php%2Farctic%2Farticle%2Fdownload%2F2806%2F2783&ei=K1bCTqv1MYS28QOR9ekG&usg=AFQjCNFKillAOZMXrT5xpFhckMKvqW50Vg&sig2=r3J6454VcvI1xpC3Sf3RKw&cad=rja) of fires started, this is not in itself [a new phenomenon](http://www.google.com/url?sa=t&rct=j&q=anthropogenic%20fire&source=web&cd=2&ved=0CCcQFjAB&url=http%3A%2F%2Fwww.as.ua.edu%2Fant%2Fbindon%2Fant475%2FPapers%2FHamm.pdf&ei=rFXCTu-PHsay8QPdy-2MBA&usg=AFQjCNGUMrfnDTwRDBxFB-wioZokBt8EtA&sig2=Zt1nfHoKktbka-pEZs6NGw&cad=rja).
#
# Fires spread where: (i) there is an ignition source (lightning or humans, mostly); and (ii) there is sufficient combustible fuel to maintain the fire. The latter is strongly dependent on fuel loads and moisture content, as well as on meteorological conditions. Generally then, when conditions are drier (and there is sufficient fuel and appropriate weather), we would expect fire spread to increase. If the number of ignitions remained approximately constant, this would mean more fires. [Many models of fire activity](http://www.nasa.gov/images/content/492949main_Figure-2-Wildfires.jpg) predict increases in fire frequency in the coming decades, although there may well be [different behaviours in different parts of the world](http://science.sciencemag.org/content/334/6057/787.full).
#
# [![](http://www.nasa.gov/images/content/492949main_Figure-2-Wildfires.jpg)](http://www.nasa.gov/images/content/492949main_Figure-2-Wildfires.jpg)
#
# Satellite data have provided us with increasingly useful tools for monitoring wildfire activity, particularly since 2000 with the MODIS instruments on the NASA Terra and Aqua (2002) satellites. A suite of [‘fire’ products](http://modis-fire.umd.edu/index.html) has been generated from these data and used in a large number of [publications](http://modis-fire.umd.edu/Publications.html) and [practical/management projects](https://earthdata.nasa.gov/data/near-real-time-data/firms).
#
# There is growing evidence of ‘teleconnection’ links between fire occurrence and large-scale climate patterns, such as [ENSO](https://www.ncdc.noaa.gov/teleconnections/enso/enso-tech.php).
# [![](http://www.esrl.noaa.gov/psd/enso/mei/ts.gif)](http://www.esrl.noaa.gov/psd/enso/mei/)
#
# The proposed mechanisms are, essentially, that such climatic patterns are linked to local water status and temperature, and thus affect the ability of fires to spread. For some regions of the Earth, empirical models built from such considerations have quite reasonable predictive skill, meaning that fire season severity might be predicted [some months ahead of time](http://www.sciencemag.org/content/334/6057/787.full).

# ## 7.2 A Practical Exercise

# ### 7.2.1 In This Session

# In this session, you will be working in groups (of 3 or 4) to build a computer code in Python to explore links between fire activity and Sea Surface Temperature anomalies.
#
# This is a team exercise, but it does not form part of your formal assessment for this course. You should be able to complete the exercise in the 3 hour session if you work effectively as a team. Staff will be on hand to provide pointers.
#
# You should be able to complete the exercise using coding skills and Python modules that you have previous experience of, though we will also provide some pointers to get you started.

# ### 7.2.2 Statement of the problem

# **Using monthly fire count data from MODIS Terra, develop and test a predictive model for the number of fires per unit area per year, driven by Sea Surface Temperature anomaly data.**
#
# The datasets should be created at 0.5 degree resolution on a latitude/longitude grid.
#
# You should concentrate on building the model for the *peak fire count* in a particular year at a particular location, i.e. derive your model for annual peak fire count.
#
# Reading datasets
# ------------------
#
# You should use the datasets described below.
#
# If you follow the naming conventions below, you should have:
#
# ----
#
# `fdata` : shape e.g. `(186, 36, 72)`, i.e. `(nmonths, nlat, nlon)`, of fire count data, as a masked array.
#
# `fyear` : shape e.g. `(186,)`, the year associated with each month of fire count data.
#
# `fmonth` : shape e.g. `(186,)`, the associated month.
#
# ----
#
# `cyears` : shape e.g. `(69,)`, the years for which the Sea Surface Temperature anomaly data are available.
#
# `cdata` : shape e.g. `(12, 69)`, the Sea Surface Temperature anomaly for each month (the `12`) and each of those years.
#
# ----
#
# Derived Information
# ---------------------
#
# To do the modelling for this exercise, you should derive:
#
# `peak_count` : the peak fire count per year
#
# `peak_month` : the month in which the peak fire count occurred in a particular year
#
# (a minimal sketch of deriving these is given at the end of this section)

# ### 7.2.3 Datasets

# We suggest that the datasets you use for this analysis, following Chen et al. (2011), are:
#
# - MODIS Terra fire counts (2001-2011) (MOD14CMH). The particular dataset you will want from the file is ‘SUBDATASET_2 [360x720] CloudCorrFirePix (16-bit integer)’.
# - Climate index data from NOAA
#
# If you ever wish to take this study further, you can find various other useful datasets such as these.
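# As promised above, here is a minimal sketch (on toy data; the exercise below builds the real versions step by step) of how `peak_count` and `peak_month` can be derived from arrays following the naming conventions given:

# In[ ]:

import numpy as np

# toy stand-ins for fdata (nmonths, nlat, nlon) and fyear (nmonths,)
toy_fdata = np.random.rand(24, 36, 72)
toy_fyear = np.repeat([2001, 2002], 12)

peak_count, peak_month = [], []
for y in np.unique(toy_fyear):
    year_block = toy_fdata[toy_fyear == y]        # (12, nlat, nlon) for year y
    peak_count.append(year_block.max(axis=0))     # largest monthly count per cell
    peak_month.append(year_block.argmax(axis=0))  # month index (0-11) of that peak
peak_count, peak_month = np.array(peak_count), np.array(peak_month)
print peak_count.shape, peak_month.shape          # -> (2, 36, 72) each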
# #### Fire Data
#
# The MOD14CMH [CMG data](http://nsidc.org/data/modis/data_summaries/cmg_sample.html) are available from the [UMD ftp server](ftp://fire:burnt@fuoco.geog.umd.edu/modis/C5/cmg/monthly/hdf) but have also been packaged for you as [![DOI](https://sandbox.zenodo.org/badge/DOI/10.5072/zenodo.61299.svg)](https://doi.org/10.5072/zenodo.61299)
#
# If, for any reason, you *did* want to copy or update them, use the following unix command:
#
# `cd Chapter7_ENSO/data; wget 'ftp://fire:burnt@fuoco.geog.umd.edu/modis/C5/cmg/monthly/hdf/*'`
#
# (though you may need to update the password). The data need to be placed in [data](data) and are in HDF format, so you should know how to read them into a numpy array in Python.

# In[1]:

import matplotlib
import numpy as np
import matplotlib.pyplot as plt
get_ipython().run_line_magic('matplotlib', 'inline')

get_ipython().system('ls data/*hdf')

# If you are **really** stuck on reading the data, or just want to move on to the next parts, you can use [`python/reader.py`](python/reader.py), which will create a masked array in `data` and arrays of years (`year`) and months (`month`):

# In[341]:

get_ipython().run_line_magic('run', 'python/reader')

# In[344]:

# refine the data mask: mask any cell that never records a fire
data.mask = data.data.sum(axis=0) == 0

# In[345]:

# the reader script loads the firecount data into the data array
# shape: (nmonths, nlat, nlon)
print 'data shape', data.shape
# data is a masked array:
print 'data type', type(data)
# so it has a mask:
print 'mask shape', data.mask.shape
# also: year and month
print 'year and month shape', year.shape, month.shape

# display the firecount data
plt.figure(figsize=(10,6))
x = year + month/12.
y = np.sum(data, axis=(1,2))
plt.plot(x, y)
plt.ylabel('global fire count')
plt.xlabel('time')
plt.title('global fire count')
plt.xlim(x[0], x[-1])

fyear, fmonth = year, month

# Data pre-processing
# =====================
#
# A. Sort the data size required
# ----------------------------
#
# This dataset is at 0.5 degree resolution and we want to perform the analysis at 5 degrees.
#
# **First, confirm for yourself that the shape of the dataset is consistent with the data being at 0.5 degree resolution.**

# To use the dataset here, we will then need to shrink it by a factor of 10, to a 5 degree dataset.
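# As an aside, one quick way to see what this factor-of-10 reduction means (a sketch on a toy array; the cells below build the same reduction by slicing instead) is to reshape so that each 10 x 10 block of cells gets its own axes, then sum over those axes:

# In[ ]:

# factor-of-10 block aggregation sketch on a toy (360, 720) grid
toy = np.ones((360, 720))
reduced = toy.reshape(36, 10, 72, 10).sum(axis=(1, 3))
print reduced.shape   # -> (36, 72): each cell is the sum of a 10 x 10 block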
# There are different ways to achieve this, but one way would be to reorganise the data:

# In[352]:

plt.figure(figsize=(10,5))
plt.imshow(data.mask[0])

# In[358]:

plt.figure(figsize=(10,5))
plt.imshow(rdata.sum(axis=0)[0], interpolation='none')

# In[420]:

# how does this work? do it in stages to try to understand it
# NB -- all we are trying to achieve is a reduction in the
# dataset resolution, by summing over each 10 x 10 block of cells
rdata = np.array([data[:, i::10, j::10] for i in xrange(10) for j in xrange(10)])
# sum over the 1st axis to achieve the reduced resolution summation
rdata = rdata.sum(axis=0)

nslots, nlat, nlon = rdata.shape

# set a minimum count
mincount = 1000
# build a mask
all_mask = rdata.sum(axis=0) <= mincount
# repeat it for each frame
rmask = np.repeat(all_mask.T, nslots).reshape((nlon, nlat, nslots)).T

plt.figure(figsize=(10,5))
print rmask.shape
plt.imshow(rmask[0], interpolation='none')

# note: np.ma.array (ma on its own is not imported)
fdata = np.ma.array(rdata, mask=rmask)

plt.figure(figsize=(10,5))
plt.title('log fire count for %d month %02d' % (year[10], month[10]))
plt.imshow(np.ma.log(rdata[10]), interpolation='none')
plt.colorbar(fraction=0.02)

# In[423]:

# or even make a movie

lf = np.ma.log(fdata)
vmax = np.max(lf[lf > 0])
root = 'images/'
print lf.shape

# loop over lf.shape[0] (time samples)
for i in xrange(lf.shape[0]):
    print i,
    fig = plt.figure(figsize=(10,5))
    plt.imshow(np.ma.log(fdata[i]), interpolation='nearest', vmax=vmax)
    plt.colorbar(fraction=0.02)
    file_id = '%d month %02d' % (year[i], month[i])
    plt.title('log fire count for %s' % file_id)
    plt.savefig('%s_%s.jpg' % (root, file_id.replace(' ', '_')))
    plt.close(fig)

# In[425]:

# make a movie from the files
import os
cmd = 'convert -delay 100 -loop 0 {0}_*month*.jpg {0}fire_movie3.gif'.format(root)
os.system(cmd)

# ![](files/images/fire_movie3.gif)

# B. Sort the information required
# ----------------------------
#
# What information do we need to derive from the fire dataset?
#
# There are several ways we could approach the problem, but a simple method might be to first find the 'typical' peak fire month per geographical location. We could call that the 'fire month'.
#
# To calculate that, we could find which month in each (full) year the maximum fire count occurs in.
#
# A useful method for that will be [`np.argmax()`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.argmax.html), which gives the argument (the 'index', if you like) of the maximum value in an array.

# In[435]:

# (use a throwaway name here, so we do not overwrite our fire `data` array)
tdata = np.random.rand(100)
amax = np.argmax(tdata)
print 'the maximum value in the array is', tdata[amax], 'which occurs at sample index', amax

# Once we have defined the spatial distribution of fire month, we can calculate an annual dataset containing the firecount (for the fire month) for each year and location.

# In[426]:

plt.figure(figsize=(10,5))
plt.imshow(fdata.mask[0], interpolation='nearest')
plt.colorbar(fraction=0.02)

# In[297]:

# find complete years
# the list of years
good_year = []
for y in np.unique(fyear):
    if (fyear == y).sum() == 12:
        # then we have a full year
        good_year.append(y)
good_year = np.array(good_year)
print good_year
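# The `axis` keyword is the important extension of `np.argmax()` for our gridded data: applied along axis 0 of a `(nmonths, nlat, nlon)` array, it gives, for every grid cell, the index of the month with the largest count. A toy demonstration:

# In[ ]:

# argmax along an axis: one month index per (lat, lon) grid cell
toy = np.random.rand(12, 3, 4)        # toy (nmonths, nlat, nlon) stand-in
print np.argmax(toy, axis=0).shape    # -> (3, 4)
print np.argmax(toy, axis=0)          # month index (0-11) of each cell's maximum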
# In[501]:

# so, for these years, find the max fire count
# and the associated month
good_year = []
month_data = []
for y in np.unique(fyear):
    if (fyear == y).sum() == 12:
        # then we have a full year
        good_year.append(y)
        # this gives the month in this year where the firecount is maximum
        year_data = np.ma.array(np.ma.argmax(fdata[fyear == y], axis=0),
                                mask=fdata.mask[0])
        month_data.append(year_data)

# now find, e.g., the median of these to get the representative fire month ...
month_data = np.median(np.ma.array(month_data), axis=0).astype(int)

plt.figure(figsize=(10,5))
plt.imshow(month_data, interpolation='nearest', vmax=11)
plt.colorbar(fraction=0.02)
plt.title('fire month')

# In[547]:

# now generate the fire count data for each year in an array fire_count
fire_count = np.zeros((len(good_year), nlat, nlon))
for iy, y in enumerate(good_year):
    year_data = np.ma.array(fdata[fyear == y])
    for m in np.arange(12):
        r, c = np.where(month_data == m)
        fire_count[iy, r, c] = year_data[m, r, c]

fire_count_mask = fire_count == 0
fire_count = np.ma.array(fire_count, mask=fire_count_mask)

# plot a few years' data
for y in xrange(2):
    plt.figure(figsize=(10,5))
    plt.imshow(np.ma.log(fire_count[y]), interpolation='nearest')
    plt.colorbar(fraction=0.02)
    plt.title('log max fire count %d' % good_year[y])

# plot for one location:
# the biggest one ...
y, r, c = np.where(fire_count == np.ma.max(fire_count))
plt.figure(figsize=(10,5))
plt.title('log fire count for pixel %d %d' % (r, c))
plt.plot(good_year, np.log(fire_count[:, r, c]))
plt.xlabel('year')
plt.ylabel('log fire count')

# In[549]:

print fire_count.shape
print len(good_year)
print month_data.shape

# In summary, we now have the fire month stored in `month_data` and, for each year (in `good_year`), the fire count for that month in `fire_count`.
#
# This is the core information we require for modelling.
#
# We now consider the associated climate datasets.

# #### Climate Data
#
# The climate data you will want will be some form of Sea Surface Temperature (SST) anomaly measure. There is a long list of such measures on [http://www.esrl.noaa.gov/psd/data/climateindices/list](http://www.esrl.noaa.gov/psd/data/climateindices/list/).
#
# Examples would be [AMO](http://www.esrl.noaa.gov/psd/data/correlation/amon.us.data) or [ONI](http://www.esrl.noaa.gov/psd/data/correlation/oni.data). Note that some of these measures are smoothed and others are not.
#
# When you see a new dataset, you should investigate its formatting and any other issues before trying to interpret it.
#
# Whilst learning coding, it is often worthwhile doing the 'low level' reading and formatting yourself. We provide two examples of this below.
#
# There are simpler ways of reading in tabular ('csv') datasets, so we show two of those as well.
#
# First, let's explore the dataset format.
#
# Suppose we had selected AMO and we want to read it directly from the url:

# In[550]:

import urllib2

url = 'http://www.esrl.noaa.gov/psd/data/correlation/amon.us.data'
# as a backup, in case the noaa server is not responding,
# you can access the data from the data directory or from
# https://github.com/profLewis/geogg122/blob/master/Chapter7_ENSO/data/amon.us.data
req = urllib2.Request(url)
raw_data = urllib2.urlopen(req).readlines()
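# (An aside: `urllib2` is Python 2 only, which is what these notes use throughout. If you are on Python 3, an equivalent sketch, guarded so it is a no-op under Python 2, would be:)

# In[ ]:

# Python 3 equivalent of the urllib2 read above (a sketch)
import sys
if sys.version_info[0] >= 3:
    from urllib.request import urlopen
    raw_data = [line.decode('utf-8') for line in urlopen(url).readlines()]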
# Now examine the data format and work out how to access the information we want.

# In[551]:

# look at the first 2 lines of the data we pull from the url
#
# The first line gives the date range
#
# The data values appear on subsequent lines
print 'first 2 lines:'
print raw_data[:2]

# years: integer array
year_range = np.array(raw_data[0].split()).astype(int)
print 'years range:', year_range

# look at the end of the data:
print 'last 5 lines:'
print raw_data[-5:]

# so the last 4 lines of the data contain
# dataset information
fill_value = float(raw_data[-4])
print 'fill value:', fill_value
data_name = raw_data[-3]
print 'data name:', data_name
data_info = raw_data[-2]
print 'data info:', data_info
data_url = raw_data[-1]
print 'data url:', data_url

# we notice from inspection that
# we want data from rows 1 to -4,
# i.e. raw_data[1:-4]
# set this as data_str, the string representation of the data
data_str = raw_data[1:-4]
print 'length of data_str:', len(data_str)

# In[552]:

# so, to read the data, we loop over each line
# Version 1:
#
# we could do this with a loop of this sort:

# We want the data in an array of shape (nmonths, nyears),
# so initialise this
nyears = year_range[-1] - year_range[0] + 1
nmonths = 12
first_year = year_range[0]
print 'nyears, nmonths:', nyears, nmonths
print 'first year:', first_year

cdata = np.zeros((nmonths, nyears))
cyears = np.zeros((nyears))

# loop over all lines in data_str
for i in xrange(len(data_str)):
    # load the string data for that line
    line_data = data_str[i]
    # split it into an array of (13) floats
    fline_data = np.array(line_data.split()).astype(float)
    # the first entry is the year
    this_year = int(fline_data[0])
    # give the user info on which years we read
    print this_year,
    cyears[i] = this_year
    # the rest of the line is the 12 data entries, one per month
    year_data = fline_data[1:]
    # the year index is this_year minus first_year
    year_index = this_year - first_year
    print '(%d),' % year_index,
    # store that in cdata
    cdata[:, year_index] = year_data

# now, generate a mask of the fill values
cmask = (cdata == fill_value)
# where are the duff (fill) values?
# (why would this be?)
print np.where(cmask)

# We can see that the dataset has no gaps in year, so instead of working out `year_index` in the loop, we could more simply have used the index `i`.
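# One quick way to convince yourself of that (a sketch; run it after the cell above):

# In[ ]:

# if the year sequence is gap-free, each stored year equals first_year plus its row index
assert np.all(cyears == first_year + np.arange(nyears))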
# Actually, we could read the data in a more 'pythonic' manner, with a much simpler loop:

# In[553]:

# Version 2:

# form a float array of all 13 data columns
cdata = np.array([r.split() for r in raw_data[1:-4]]).astype(float).T
# check the shape is how we want it
print 'shape:', cdata.shape
# separate the data and year values
cyears = cdata[0]
cdata = cdata[1:]
# make a mask for fill values
cmask = (cdata == fill_value)
# put cdata in a masked array (np.ma: ma on its own is not imported)
cdata = np.ma.array(cdata, mask=cmask)
print cdata.shape
print 'fill values:', fill_value, np.where(cmask)

# In[554]:

# Version 3:
# use numpy.genfromtxt
# see http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
cdata = np.genfromtxt(raw_data, skip_header=1, skip_footer=4, unpack=True, usemask=True)
# In this case, we need to specify and fix the fill value
cdata.fill_value = -99.99
cdata.mask[cdata.data == cdata.fill_value] = True
# separate the years column and the data
cyears = cdata[0]
cdata = cdata[1:]
print cdata.shape

# In[555]:

fig = plt.figure(figsize=(15,3))
plt.plot(cyears, cdata[0])
plt.xlabel('year')
plt.ylabel('January AMO')
plt.xlim(year_range[0], year_range[1])

# In[556]:

# Version 4:
#
# use pandas to do the whole thing from the url.
# pandas is a useful package for dealing with (and plotting) tabular data
# that you may like to consider
import pandas as pd
# see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
# and http://pandas.pydata.org/pandas-docs/stable/tutorials.html
c = pd.read_csv(url, skiprows=1,
                header=None, delim_whitespace=True, skipfooter=4,
                index_col=0, engine='python').T
c[c == -99.99] = np.nan
print c.shape

# now display the dataset in a table
# print c.info()
print 'keys:', c.keys()
# display option
pd.set_option('display.max_columns', 10)
# show table
c

# In[557]:

ax = c.T.plot.line(figsize=(15,5), title='ENSO AMO')
ax.set_xlabel('years')

# In[558]:

# to just dump it as a numpy array, if that's all you want to do ...
cdata = np.array(pd.DataFrame(c))
cyears = np.array(c.keys()).astype(int)
year_range = np.array([cyears[0], cyears[-1]])

fig = plt.figure(figsize=(15,3))
plt.plot(cyears, cdata[0])
plt.xlabel('year')
plt.ylabel('January AMO')
plt.xlim(year_range[0], year_range[1])

# ### 7.2.4 Data Processing

# The fire information we have is the peak fire count, and the month this occurred in, for each location and year.
#
# The climate data give the monthly ENSO-like variable for each month and year.

# In[569]:

# relevant datasets
print '------- Fire -----------'
print fire_count.shape
print len(good_year)
print month_data.shape
print '------- Climate --------'
print cyears.shape
print cdata.shape

# We now want to form a dataset that matches the ENSO index from `N` months prior to the fire month against the peak fire activity.
#
# We can prototype this for `N = 0` to test the idea.
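# The only subtlety is the wrap-around at year boundaries: `N` months before a month early in the year falls in the previous year. A minimal sketch of the lag arithmetic (using a hypothetical helper, with months indexed 0-11):

# In[ ]:

# lag arithmetic sketch: month m of year y, lagged by N months
def lagged(m, y, N):
    '''Return (month, year) for N months before month m (0-11) of year y.'''
    cm = m - N
    if cm < 0:
        # the lagged month falls in the previous year
        return cm + 12, y - 1
    return cm, y

print lagged(1, 2005, 3)   # -> (10, 2004): 3 months before Feb 2005 is Nov 2004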
# As noted, `N` months before a given month may lie in the previous year, so we have to build a case for this:

# In[615]:

# what offset (lag) to use
N = 0

enso = np.zeros_like(fire_count)
for iy, y in enumerate(good_year):
    for m in np.arange(12):
        r, c = np.where(month_data == m)
        # which month in the climate dataset
        climate_month = m - N
        # account for the possibility of the month being in
        # the previous year
        if climate_month < 0:
            climate_month += 12
            climate_year = y - 1
        else:
            climate_year = y
        # NB use climate_year (not y) here, so that lags which cross a
        # year boundary pull the value from the previous year
        yy = np.where(cyears == climate_year)[0]
        # pull the climate value for this month and year
        climate_data = cdata[climate_month, yy]
        # store in enso
        enso[y - good_year[0], r, c] = climate_data

plt.figure(figsize=(10,5))
plt.imshow(enso[0], interpolation='nearest')
plt.colorbar(fraction=0.02)
plt.title('enso value for year %d at lag %d' % (good_year[0], N))

# these should match up
print enso.shape, fire_count.shape

# So, we now have the ENSO index dataset for lag month `N`, and the fire count data.
#
# Let's look at a plot of these two for a particular location:

# In[616]:

import scipy.stats
# see http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.linregress.html

y, r, c = np.where(fire_count == np.ma.max(fire_count))
enso_ = np.array(enso[:, r, c]).flatten()
fire_count_ = np.array(fire_count[:, r, c]).flatten()

plt.figure(figsize=(5,5))
plt.plot(enso_, fire_count_, '+')
plt.xlabel('ENSO')
plt.ylabel('fire count')

# linear regression
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(enso_, fire_count_)
print 'slope, intercept, r_value, p_value, std_err', slope, intercept, r_value, p_value, std_err

# We see a rather weak relationship between the two here, but that may just be down to the particular sample we have selected.
#
# Let's make a method to help with this:

# In[622]:

def enso_data(N, cdata=cdata, fire_count=fire_count, good_year=good_year,
              month_data=month_data, cyears=cyears):
    '''Return the ENSO index N months before the fire month,
    for each year and location.'''
    enso = np.zeros_like(fire_count)
    for iy, y in enumerate(good_year):
        for m in np.arange(12):
            r, c = np.where(month_data == m)
            # which month in the climate dataset
            climate_month = m - N
            # account for the possibility of the month being in
            # the previous year
            if climate_month < 0:
                climate_month += 12
                climate_year = y - 1
            else:
                climate_year = y
            yy = np.where(cyears == climate_year)[0]
            # pull the climate value for this month and year
            climate_data = cdata[climate_month, yy]
            # store in enso
            enso[y - good_year[0], r, c] = climate_data
    return enso

# In[637]:

y, r, c = np.where(fire_count == np.ma.max(fire_count))
fire_count_ = np.array(fire_count[:, r, c]).flatten()

r_2 = np.zeros(12)
for N in xrange(12):
    enso = enso_data(N)
    enso_ = np.array(enso[:, r, c]).flatten()
    slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(enso_, fire_count_)
    r_2[N] = r_value**2

print np.argmax(r_2), r_2.max()
plt.plot(r_2)
plt.xlabel('lag month')
plt.ylabel('R^2')

# So the highest correlation is achieved at `N = 8` for this location.
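# Bear in mind that we have selected the best of 12 lags from only around ten years of data, so some of this apparent skill is a selection effect. A rough way to gauge that (a sketch; this check is not part of the original exercise) is to shuffle the fire count years and see how large the best-lag $R^2$ gets by chance:

# In[ ]:

# permutation check (a sketch): how big does the best-lag R^2 get
# when the pairing of years is destroyed by shuffling?
enso_by_lag = [np.array(enso_data(N)[:, r, c]).flatten() for N in xrange(12)]
nperm = 200
null_best = np.zeros(nperm)
for k in xrange(nperm):
    shuffled = np.random.permutation(fire_count_)
    null_best[k] = max(scipy.stats.linregress(e, shuffled)[2]**2
                       for e in enso_by_lag)
print 'fraction of shuffled best R^2 >= observed:', (null_best >= r_2.max()).mean()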
# In[643]:

# now do this spatially
r_2 = np.zeros((12, nlat, nlon))
for N in xrange(12):
    enso = enso_data(N)
    rs, cs = np.where(~fire_count.mask[0])
    for i in xrange(len(rs)):
        r, c = rs[i], cs[i]
        enso_ = np.array(enso[:, r, c]).flatten()
        fire_count_ = np.array(fire_count[:, r, c]).flatten()
        # regress
        slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(enso_, fire_count_)
        r_2[N, r, c] = r_value**2

# In[653]:

lag_month = np.argmax(r_2, axis=0)
max_r2 = np.max(r_2, axis=0)

plt.figure(figsize=(10,5))
plt.imshow(lag_month, interpolation='nearest')
plt.colorbar(fraction=0.02)
plt.title('enso optimal lag month')

plt.figure(figsize=(10,5))
plt.imshow(np.sqrt(max_r2), interpolation='nearest')
plt.colorbar(fraction=0.02)
plt.title('enso/fire |r|')

# ### 7.2.5 Code to perform correlation analysis

# The idea here is, for a particular SST anomaly measure (or a set of them), to work out which 'lag' month gives the highest correlation coefficient with fire count.
#
# By 'lag' month, we mean e.g.: if the peak fire month for a particular pixel was September, which month prior to that has a set of SST anomalies, over the sample years, that is most strongly correlated with fire count?
#
# So, if we were using a single SST anomaly measure (e.g. AMO or ONI) and sample years 2001 to 2009 to build our model, we would do a linear regression of fire count for a particular pixel over these years against e.g. AMO data for September (lag 0), then August (lag 1), then July (lag 2), etc., and see which produced the highest $R^2$.
#
# Before we get into that, let's look again at the data structure we have. (Note: `peak_count`, `av_fire_month` and `min_year` below are the `fire_count`, `month_data` and `good_year[0]` values built above, renamed following the conventions of section 7.2.2, i.e. `peak_count = fire_count`, `av_fire_month = month_data` and `min_year = good_year[0]`.)

# In[ ]:

# climate data
print 'cdata', cdata.shape
print 'cyears', cyears.shape
# From the fire data
print 'peak_count', peak_count.shape
print 'av_fire_month', av_fire_month.shape
print 'min_year', min_year

# So, if we want to select data for particular years:

# In[ ]:

# which years (inclusive)
years = [2001, 2010]
ypeak_count = peak_count[years[0] - min_year:years[1] - min_year + 1]
ycdata = cdata[:, years[0] - cyears[0]:years[1] - cyears[0] + 1]
# check the shapes
print ycdata.shape, ypeak_count.shape, av_fire_month.shape

# We need to consider the implementation of lag a little carefully ...
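# The fiddle is how Python's negative indices behave; a quick demonstration:

# In[ ]:

# numpy (and Python) negative indices wrap to the end of the axis:
a = np.arange(12)   # stand-in for the 12 months of one year
print a[-3]         # -> 9, i.e. month index 9 (October) of the *same* year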
# In[ ]:

# we will need to access ycdata[month - n][year],
# which is a bit fiddly, as e.g. -3 will be interpreted as
# October of that same year, rather than the previous year
y = 2001 - min_year
m = 2
lag = 5
print m - lag, y

# In[ ]:

# so one way to fix this is to decrease y by one
# if m - lag is -ve
Y = y - (m - lag < 0)
print m - lag, Y

# In[ ]:

from scipy.stats import linregress

# examine an example row, col
# for a given month, over all years
c = 24
r = 19
m = av_fire_month[r, c]

# pull the data
yyears = np.arange(years[1] - years[0] + 1)
# R^2 for each lag n: linregress(...)[2] is the correlation coefficient r,
# which we square
R2 = np.array([linregress(ycdata[m - n, yyears - (m - n < 0)],
                          ypeak_count[yyears - (m - n < 0), r, c])[2]
               for n in xrange(12)])**2
n = np.argmax(R2)

x = ycdata[m - n, yyears - (m - n < 0)]
y = ypeak_count[yyears - (m - n < 0), r, c]
slope, intercept, R, p, err = linregress(x, y)
print slope, intercept, p, err

plt.plot(x, y, 'r+')
plt.xlabel('Climate Index')
plt.ylabel('Fire count')
plt.plot([x.min(), x.max()],
         [intercept + slope * x.min(), intercept + slope * x.max()], 'k--')
plt.title('Fire count at r %03d c %03d: R^2 = %.3f: lag %d' % (r, c, R2[n], n))

# In[ ]:

# loop over all pixels with sufficient data
data_mask = ypeak_count.sum(axis=0) > 100
rs, cs = np.where(data_mask)

results = {'intercept': 0, 'slope': 0, 'p': 0, 'R': 0, 'stderr': 0, 'lag': 0}
for k in results.keys():
    results[k] = np.zeros_like(av_fire_month).astype(float)
    results[k] = np.ma.array(results[k], mask=~data_mask)

for r, c in zip(rs, cs):
    m = av_fire_month[r, c]
    # pull the data
    yyears = np.arange(years[1] - years[0] + 1)
    # R^2 for each lag n
    R2 = np.array([linregress(ycdata[m - n, yyears - (m - n < 0)],
                              ypeak_count[yyears - (m - n < 0), r, c])[2]
                   for n in xrange(12)])**2
    n = np.argmax(R2)
    results['lag'][r, c] = n
    x = ycdata[m - n, yyears - (m - n < 0)]
    y = ypeak_count[yyears - (m - n < 0), r, c]
    results['slope'][r, c], results['intercept'][r, c], \
        results['R'][r, c], results['p'][r, c], \
        results['stderr'][r, c] = linregress(x, y)

# In[ ]:

plt.figure(figsize=(10,4))
plt.imshow(results['R'], interpolation='nearest')
plt.colorbar()
plt.title('R')

plt.figure(figsize=(10,4))
plt.imshow(results['p'], interpolation='nearest')
plt.colorbar()
plt.title('p')

plt.figure(figsize=(10,4))
plt.imshow(results['slope'], interpolation='nearest')
plt.colorbar()
plt.title('slope')

plt.figure(figsize=(10,4))
plt.imshow(results['lag'], interpolation='nearest')
plt.colorbar()
plt.title('lag')

# With these results, we can now predict:

# In[ ]:

# prediction year
pyear = 2012

# which month (and year) does each pixel's lagged index come from?
M = av_fire_month - results['lag']
Y = np.zeros_like(M) + pyear
Y[M < 0] -= 1
# wrap negative month indices into the previous year's calendar
M[M < 0] += 12

# In[ ]:

# let's look at that ...
plt.imshow(Y, interpolation='nearest')

# In[ ]:

# climate data
scdata = np.zeros_like(Y).astype(float)
for y in [pyear, pyear - 1]:
    for m in xrange(12):
        scdata[(Y == y) & (M == m)] = cdata[m, y - cyears[0]]
plt.imshow(scdata, interpolation='nearest')

# In[ ]:

# now predict
fc_predict = results['intercept'] + results['slope'] * scdata

plt.figure(figsize=(10,4))
plt.imshow(fc_predict, interpolation='nearest', vmin=0, vmax=10000)
plt.colorbar()
plt.title('predicted peak fire count for %d' % pyear)

plt.figure(figsize=(10,4))
plt.imshow(peak_count[pyear - min_year],
           interpolation='nearest', vmin=0, vmax=10000)
plt.colorbar()
plt.title('actual peak fire count for %d' % pyear)

# In[ ]:

x = peak_count[pyear - min_year].flatten()
y = fc_predict.flatten()
slope, intercept, R, p, err = linregress(x, y)
plt.plot(x, y, '+')
plt.xlabel('measured fire count')
plt.ylabel('predicted fire count')
cc = np.array([0., x.max()])
plt.plot(cc, cc, 'k--')
plt.plot(cc, slope * cc + intercept, 'k-')
plt.title('fire count predictions')
print slope, intercept, R, p, err
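# Beyond the scatter plot, a single summary number can be useful. A minimal sketch of an RMSE skill score (assuming `x` and `y` from the cell above, and honouring their masks):

# In[ ]:

# root-mean-square error between measured (x) and predicted (y) peak fire counts
diff = np.ma.masked_invalid(np.ma.asarray(y) - np.ma.asarray(x))
rmse = np.sqrt(np.ma.mean(diff**2))
print 'RMSE of predicted vs measured peak fire count: %.1f' % rmse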