Scott Lindell at MBL was working on aquaculture siting in Cape Cod Bay, and wanted to know when and where endangered Right Whales were found in shallow water.
import pandas as pd
import netCDF4
from mpl_toolkits.basemap import interp
Scott's data was in a simple spreadsheet, so we saved it to CSV and read it into Pandas:
df = pd.read_csv('/usgs/data2/notebook/right_whale.csv',index_col='DATE',parse_dates=True)
df.head(4)
DATE | Species Code | SPECIES TOTAL | Latitude | Longitude
---|---|---|---|---
1998-12-13 | RIWH | 1 | 42.05167 | -70.15833
1998-12-13 | RIWH | 1 | 42.05167 | -70.14833
1998-12-13 | RIWH | 1 | 41.86000 | -70.26334
1998-12-13 | RIWH | 2 | 41.79167 | -70.35333
To see what depths the whales were in, we need a high-resolution bathymetry dataset for the Gulf of Maine. We extract just the bathymetric grid we need by determining the bounding box of the whale data and then using it to subset the data via OPeNDAP. We access OPeNDAP using NetCDF4-Python, which makes a remote OPeNDAP dataset look like a local NetCDF file.
bathy_url='http://geoport.whoi.edu/thredds/dodsC/bathy/gom03_v1_0'
nc = netCDF4.Dataset(bathy_url).variables
lon=nc['lon'][:]
lat=nc['lat'][:]
bi=(lon>=df['Longitude'].min())&(lon<=df['Longitude'].max())
bj=(lat>=df['Latitude'].min())&(lat<=df['Latitude'].max())
z=nc['topo'][bj,bi]
lon=lon[bi]
lat=lat[bj]
print(shape(z))
(368, 722)
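The subsetting trick above is worth dwelling on: indexing the coordinate vectors with boolean masks means only the windowed slice of `topo` is requested from the server, not the whole grid. A minimal sketch of the same bounding-box masking on a synthetic coordinate vector (the grid spacing and box limits here are made up for illustration):

```python
import numpy as np

# Synthetic 1-D coordinate vector standing in for the OPeNDAP 'lon' array
lon = np.arange(-71.0, -69.0, 0.1)

# Boolean mask covering an assumed bounding box of the sighting longitudes
lon_min, lon_max = -70.55, -70.05
bi = (lon >= lon_min) & (lon <= lon_max)

# Indexing with the mask keeps only the window we need; over OPeNDAP
# this becomes a subset request rather than a full-grid download
lon_sub = lon[bi]
print(lon_sub.min(), lon_sub.max())
```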
Get whale depths by interpolating from the bathymetric grid:
df['Depth']=interp(-z,lon,lat,df['Longitude'],df['Latitude'])
df.head()
DATE | Species Code | SPECIES TOTAL | Latitude | Longitude | Depth
---|---|---|---|---|---
1998-12-13 | RIWH | 1 | 42.05167 | -70.15833 | 6.740654
1998-12-13 | RIWH | 1 | 42.05167 | -70.14833 | 3.542641
1998-12-13 | RIWH | 1 | 41.86000 | -70.26334 | 28.271041
1998-12-13 | RIWH | 2 | 41.79167 | -70.35333 | 23.508510
1999-01-06 | RIWH | 1 | 41.83333 | -70.15833 | 6.958285
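Since Basemap's `interp` has since been deprecated along with Basemap itself, the same grid-to-points interpolation can be done with SciPy's `RegularGridInterpolator`. A sketch on a small synthetic grid (the grid values here are invented, not the real bathymetry; note the axes are passed in the order of the array's dimensions, `(lat, lon)`):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Small synthetic depth grid (positive down), standing in for -z
lon = np.linspace(-70.6, -70.0, 7)
lat = np.linspace(41.75, 42.06, 5)
depth = np.outer(np.ones_like(lat), np.linspace(0.0, 60.0, 7))  # shape (lat, lon)

# Build a bilinear interpolator over the regular grid
f = RegularGridInterpolator((lat, lon), depth)

# Interpolate depth at two hypothetical sighting positions (lat, lon pairs)
pts = np.column_stack([[41.9, 42.0], [-70.3, -70.1]])
whale_depth = f(pts)
print(whale_depth)
```

Because the synthetic depth varies linearly from 0 to 60 m across the longitude range, the interpolated values can be checked by hand.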
At what water depths are whales usually found in Cape Cod Bay?
hist(df['Depth']);
# create dataframe of shallow sightings
ishallow = where(df['Depth'] < 10.0)[0]
df_shallow = df.iloc[ishallow]   # .ix is deprecated; .iloc takes positional indices
df_shallow.head(10)
DATE | Species Code | SPECIES TOTAL | Latitude | Longitude | Depth
---|---|---|---|---|---
1998-12-13 | RIWH | 1 | 42.05167 | -70.15833 | 6.740654
1998-12-13 | RIWH | 1 | 42.05167 | -70.14833 | 3.542641
1999-01-06 | RIWH | 1 | 41.83333 | -70.15833 | 6.958285
1999-01-21 | RIWH | 1 | 41.83333 | -70.16167 | 7.666747
1999-01-21 | RIWH | 2 | 41.83333 | -70.16167 | 7.666747
1999-01-26 | RIWH | 1 | 41.75167 | -70.23000 | 9.675799
1999-02-01 | RIWH | 1 | 41.83167 | -70.10500 | 9.478573
1999-02-17 | RIWH | 1 | 41.83333 | -70.16666 | 8.686150
2007-05-13 | RIWH | 2 | 41.80426 | -70.07568 | 9.200651
2007-04-25 | RIWH | 1 | 42.02145 | -70.19609 | 1.540242
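The `where(...)` + positional-index step above can be collapsed into a single boolean-indexing expression, which is the idiomatic Pandas spelling. A minimal sketch on a synthetic frame (the values are invented; the real `df` is indexed by DATE):

```python
import pandas as pd

# Synthetic frame standing in for df
df = pd.DataFrame({'SPECIES TOTAL': [1, 2, 1],
                   'Depth': [6.7, 28.3, 3.5]})

# Boolean indexing replaces where(...) + positional lookup in one step
df_shallow = df[df['Depth'] < 10.0]
print(len(df_shallow))
```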
figure(figsize=(12,12))
z = ma.masked_where(z>0,z)
subplot(111,aspect=(1.0/cos(mean(lat)*pi/180.0)))
pcolormesh(lon,lat,z,vmax=0)
plot(df_shallow.Longitude,df_shallow.Latitude,'ko');
plot(df_shallow.Longitude,df_shallow.Latitude,'w+');
axis([-70.6, -70.0, 41.75, 42.06])
grid()
# determine stats by year (or month, quarter, etc)
#df_shallow_stats = df_shallow.resample('6M').mean() # six-month means
df_shallow_stats = df_shallow.resample('A-AUG').sum() # annual sum, ending in August
df_shallow_stats['SPECIES TOTAL']
DATE
1998-08-31     9
1999-08-31     9
2000-08-31     4
2001-08-31     3
2002-08-31   NaN
2003-08-31   NaN
2004-08-31     8
2005-08-31     1
2006-08-31   NaN
2007-08-31     3
2008-08-31     6
2009-08-31     4
2010-08-31     5
2011-08-31    25
Freq: A-AUG, Name: SPECIES TOTAL, dtype: float64
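The `resample(..., how=...)` keyword has been removed in modern Pandas; the current spelling chains the aggregation after `resample()`. A sketch with a few invented sighting dates (note how the `'A-AUG'` anchor bins each date into the year ending the following August 31):

```python
import pandas as pd

# Synthetic sightings indexed by date, standing in for df_shallow
idx = pd.to_datetime(['1998-12-13', '1999-01-06', '1999-02-17', '1999-09-01'])
df = pd.DataFrame({'SPECIES TOTAL': [1, 1, 2, 3]}, index=idx)

# Annual sums over August-anchored years: the first three dates fall in
# the year ending 1999-08-31, the last in the year ending 2000-08-31
annual = df['SPECIES TOTAL'].resample('A-AUG').sum()
print(annual)
```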
# any significant trend in time?
df_shallow_stats['SPECIES TOTAL'].plot(kind='bar',figsize=(12,5))
So how many whales on average would we expect to see in shallow water in specific months? We can use the Pandas `groupby` method to group by month and take the mean:
# monthly means of the shallow sightings
df_monthly_mean = df_shallow.groupby(lambda x: x.month).mean()
df_monthly_mean.head(12)
Month | SPECIES TOTAL | Latitude | Longitude | Depth
---|---|---|---|---
1 | 1.875 | 41.917880 | -70.189633 | 6.414131
2 | 1.000 | 41.926982 | -70.141615 | 4.845008
3 | 1.800 | 41.952314 | -70.153653 | 6.878730
4 | 2.000 | 41.934977 | -70.191988 | 6.019028
5 | 1.750 | 41.925157 | -70.116205 | 7.168017
12 | 1.000 | 42.051670 | -70.153330 | 5.141647
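Grouping by a lambda on the index works, but grouping on `df.index.month` directly says the same thing more plainly. A sketch with invented dates and counts:

```python
import pandas as pd

# Synthetic shallow sightings, standing in for df_shallow
idx = pd.to_datetime(['1999-01-06', '1999-01-21', '1999-02-17', '2007-01-10'])
df = pd.DataFrame({'SPECIES TOTAL': [1, 2, 1, 1]}, index=idx)

# Group on the month number directly instead of a lambda;
# .mean() gives the average sighting size in each calendar month
monthly_mean = df.groupby(df.index.month)['SPECIES TOTAL'].mean()
print(monthly_mean)
```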
# plot 6-frame panel of sightings in each month
iframe = 0
figure(figsize=(20,10))
grouped = df_shallow.groupby(lambda x: x.month)
for month, group in grouped:
    iframe += 1
    subplot(2,3,iframe,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    pcolormesh(lon,lat,z,vmax=0)
    plot(group.Longitude,group.Latitude,'ko')
    plot(group.Longitude,group.Latitude,'w+')
    axis('tight')
    #axis('off')
    grid('on')
    title('Month = %d' % month)
for month, group in grouped:
    group.to_csv('/usgs/data2/notebook/%s.csv' % month)
# plot each month separately
iframe = 0
for month, group in grouped:
    iframe += 1
    figure(iframe,figsize=(12,12))
    subplot(111,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    pcolormesh(lon,lat,z,vmax=0)
    plot(group.Longitude,group.Latitude,'ko')
    plot(group.Longitude,group.Latitude,'w+')
    axis([-70.6, -70.0, 41.75, 42.06])
    #axis('tight')
    #axis('off')
    grid('on')
    title('Month = %d' % month)
df_shallow_stats[['SPECIES TOTAL','Depth']].describe()
Statistic | SPECIES TOTAL | Depth
---|---|---
count | 11.000000 | 11.000000
mean | 7.000000 | 25.700881
std | 6.511528 | 19.597841
min | 1.000000 | -0.481648
25% | 3.500000 | 12.481057
50% | 5.000000 | 20.128451
75% | 8.500000 | 29.539113
max | 25.000000 | 60.851178
df_shallow.index
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-12-13 00:00:00, ..., 2001-03-17 00:00:00]
Length: 45, Freq: None, Timezone: None
type(df_shallow.index)
pandas.tseries.index.DatetimeIndex