Scott Lindell at MBL was working on aquaculture siting in Cape Cod Bay, and wanted to know when and where endangered Right Whales were found in shallow water.
import pandas as pd
import netCDF4
from mpl_toolkits.basemap import interp
Scott's data was in a simple spreadsheet, so we saved it to CSV and read it into Pandas:
df = pd.read_csv('/usgs/data2/notebook/right_whale.csv',index_col='DATE',parse_dates=True)
df.head(4)
DATE | Species Code | SPECIES TOTAL | Latitude | Longitude
---|---|---|---|---
1998-12-13 | RIWH | 1 | 42.05167 | -70.15833
1998-12-13 | RIWH | 1 | 42.05167 | -70.14833
1998-12-13 | RIWH | 1 | 41.86000 | -70.26334
1998-12-13 | RIWH | 2 | 41.79167 | -70.35333
To see what depths the whales were in, we need a high-resolution bathymetry dataset for the Gulf of Maine. We extract just the bathymetric grid we need by determining the bounding box of the whale data and then using it to subset the data via OPeNDAP. We access OPeNDAP using NetCDF4-Python, which makes a remote OPeNDAP dataset look like a local NetCDF file.
bathy_url='http://geoport.whoi.edu/thredds/dodsC/bathy/gom03_v1_0'
nc = netCDF4.Dataset(bathy_url).variables
lon=nc['lon'][:]
lat=nc['lat'][:]
bi=(lon>=df['Longitude'].min())&(lon<=df['Longitude'].max())
bj=(lat>=df['Latitude'].min())&(lat<=df['Latitude'].max())
z=nc['topo'][bj,bi]
lon=lon[bi]
lat=lat[bj]
print(shape(z))
(368, 722)
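The subsetting trick above is worth dwelling on: indexing the coordinate vectors with boolean masks means only the windowed slice of `topo` is requested from the server, not the whole grid. A minimal sketch of the same bounding-box masking on a synthetic coordinate vector (the grid spacing and box limits here are made up for illustration):

```python
import numpy as np

# Synthetic 1-D coordinate vector standing in for the OPeNDAP 'lon' array
lon = np.arange(-71.0, -69.0, 0.1)

# Boolean mask covering an assumed bounding box of the sighting longitudes
lon_min, lon_max = -70.55, -70.05
bi = (lon >= lon_min) & (lon <= lon_max)

# Indexing with the mask keeps only the window we need; over OPeNDAP
# this becomes a subset request rather than a full-grid download
lon_sub = lon[bi]
print(lon_sub.min(), lon_sub.max())
```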
Get whale depths by interpolating from the bathymetric grid:
df['Depth']=interp(-z,lon,lat,df['Longitude'],df['Latitude'])
df.head()
DATE | Species Code | SPECIES TOTAL | Latitude | Longitude | Depth
---|---|---|---|---|---
1998-12-13 | RIWH | 1 | 42.05167 | -70.15833 | 6.740654
1998-12-13 | RIWH | 1 | 42.05167 | -70.14833 | 3.542641
1998-12-13 | RIWH | 1 | 41.86000 | -70.26334 | 28.271041
1998-12-13 | RIWH | 2 | 41.79167 | -70.35333 | 23.508510
1999-01-06 | RIWH | 1 | 41.83333 | -70.15833 | 6.958285
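Since Basemap's `interp` has since been deprecated along with Basemap itself, the same grid-to-points interpolation can be done with SciPy's `RegularGridInterpolator`. A sketch on a small synthetic grid (the grid values here are invented, not the real bathymetry; note the axes are passed in the order of the array's dimensions, `(lat, lon)`):

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Small synthetic depth grid (positive down), standing in for -z
lon = np.linspace(-70.6, -70.0, 7)
lat = np.linspace(41.75, 42.06, 5)
depth = np.outer(np.ones_like(lat), np.linspace(0.0, 60.0, 7))  # shape (lat, lon)

# Build a bilinear interpolator over the regular grid
f = RegularGridInterpolator((lat, lon), depth)

# Interpolate depth at two hypothetical sighting positions (lat, lon pairs)
pts = np.column_stack([[41.9, 42.0], [-70.3, -70.1]])
whale_depth = f(pts)
print(whale_depth)
```

Because the synthetic depth varies linearly from 0 to 60 m across the longitude range, the interpolated values can be checked by hand.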
At what water depths are whales usually found in Cape Cod Bay?
hist(df['Depth']);
# create dataframe of shallow sightings
ishallow = where(df['Depth'] < 10.0)[0]
df_shallow = df.iloc[ishallow]   # .ix is deprecated; .iloc takes positional indices
df_shallow.head(10)
DATE | Species Code | SPECIES TOTAL | Latitude | Longitude | Depth
---|---|---|---|---|---
1998-12-13 | RIWH | 1 | 42.05167 | -70.15833 | 6.740654
1998-12-13 | RIWH | 1 | 42.05167 | -70.14833 | 3.542641
1999-01-06 | RIWH | 1 | 41.83333 | -70.15833 | 6.958285
1999-01-21 | RIWH | 1 | 41.83333 | -70.16167 | 7.666747
1999-01-21 | RIWH | 2 | 41.83333 | -70.16167 | 7.666747
1999-01-26 | RIWH | 1 | 41.75167 | -70.23000 | 9.675799
1999-02-01 | RIWH | 1 | 41.83167 | -70.10500 | 9.478573
1999-02-17 | RIWH | 1 | 41.83333 | -70.16666 | 8.686150
2007-05-13 | RIWH | 2 | 41.80426 | -70.07568 | 9.200651
2007-04-25 | RIWH | 1 | 42.02145 | -70.19609 | 1.540242
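The `where(...)` + positional-index step above can be collapsed into a single boolean-indexing expression, which is the idiomatic Pandas spelling. A minimal sketch on a synthetic frame (the values are invented; the real `df` is indexed by DATE):

```python
import pandas as pd

# Synthetic frame standing in for df
df = pd.DataFrame({'SPECIES TOTAL': [1, 2, 1],
                   'Depth': [6.7, 28.3, 3.5]})

# Boolean indexing replaces where(...) + positional lookup in one step
df_shallow = df[df['Depth'] < 10.0]
print(len(df_shallow))
```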
figure(figsize=(12,12))
z = ma.masked_where(z>0,z)
subplot(111,aspect=(1.0/cos(mean(lat)*pi/180.0)))
pcolormesh(lon,lat,z,vmax=0)
plot(df_shallow.Longitude,df_shallow.Latitude,'ko');
plot(df_shallow.Longitude,df_shallow.Latitude,'w+');
axis([-70.6, -70.0, 41.75, 42.06])
grid()
# determine stats by year (or month, quarter, etc)
#df_shallow_stats = df_shallow.resample('6M').mean() # six-month means
df_shallow_stats = df_shallow.resample('A-AUG').sum() # annual sum, ending in August
df_shallow_stats['SPECIES TOTAL']
DATE
1998-08-31     9
1999-08-31     9
2000-08-31     4
2001-08-31     3
2002-08-31   NaN
2003-08-31   NaN
2004-08-31     8
2005-08-31     1
2006-08-31   NaN
2007-08-31     3
2008-08-31     6
2009-08-31     4
2010-08-31     5
2011-08-31    25
Freq: A-AUG, Name: SPECIES TOTAL, dtype: float64
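The `resample(..., how=...)` keyword has been removed in modern Pandas; the current spelling chains the aggregation after `resample()`. A sketch with a few invented sighting dates (note how the `'A-AUG'` anchor bins each date into the year ending the following August 31):

```python
import pandas as pd

# Synthetic sightings indexed by date, standing in for df_shallow
idx = pd.to_datetime(['1998-12-13', '1999-01-06', '1999-02-17', '1999-09-01'])
df = pd.DataFrame({'SPECIES TOTAL': [1, 1, 2, 3]}, index=idx)

# Annual sums over August-anchored years: the first three dates fall in
# the year ending 1999-08-31, the last in the year ending 2000-08-31
annual = df['SPECIES TOTAL'].resample('A-AUG').sum()
print(annual)
```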
# any significant trend in time?
df_shallow_stats['SPECIES TOTAL'].plot(kind='bar',figsize=(12,5))
So how many whales on average would we expect to see in shallow water in specific months? We can use the Pandas `groupby` method to group by month and take the mean:
# monthly means of the shallow sightings
df_monthly_mean = df_shallow.groupby(lambda x: x.month).mean()
df_monthly_mean.head(12)
Month | SPECIES TOTAL | Latitude | Longitude | Depth
---|---|---|---|---
1 | 1.875 | 41.917880 | -70.189633 | 6.414131
2 | 1.000 | 41.926982 | -70.141615 | 4.845008
3 | 1.800 | 41.952314 | -70.153653 | 6.878730
4 | 2.000 | 41.934977 | -70.191988 | 6.019028
5 | 1.750 | 41.925157 | -70.116205 | 7.168017
12 | 1.000 | 42.051670 | -70.153330 | 5.141647
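Grouping by a lambda on the index works, but grouping on `df.index.month` directly says the same thing more plainly. A sketch with invented dates and counts:

```python
import pandas as pd

# Synthetic shallow sightings, standing in for df_shallow
idx = pd.to_datetime(['1999-01-06', '1999-01-21', '1999-02-17', '2007-01-10'])
df = pd.DataFrame({'SPECIES TOTAL': [1, 2, 1, 1]}, index=idx)

# Group on the month number directly instead of a lambda;
# .mean() gives the average sighting size in each calendar month
monthly_mean = df.groupby(df.index.month)['SPECIES TOTAL'].mean()
print(monthly_mean)
```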
# plot 6-frame panel of sightings in each month
iframe = 0
figure(figsize=(20,10))
grouped = df_shallow.groupby(lambda x: x.month)
for month, group in grouped:
    iframe += 1
    subplot(2,3,iframe,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    pcolormesh(lon,lat,z,vmax=0)
    plot(group.Longitude,group.Latitude,'ko')
    plot(group.Longitude,group.Latitude,'w+')
    axis('tight')
    #axis('off')
    grid('on')
    title('Month = %d' % month)
for month, group in grouped:
    group.to_csv('/usgs/data2/notebook/%s.csv' % month)
# plot each month separately
iframe = 0
for month, group in grouped:
    iframe += 1
    figure(iframe,figsize=(12,12))
    subplot(111,aspect=(1.0/cos(mean(lat)*pi/180.0)))
    pcolormesh(lon,lat,z,vmax=0)
    plot(group.Longitude,group.Latitude,'ko')
    plot(group.Longitude,group.Latitude,'w+')
    axis([-70.6, -70.0, 41.75, 42.06])
    #axis('tight')
    #axis('off')
    grid('on')
    title('Month = %d' % month)
df_shallow_stats[['SPECIES TOTAL','Depth']].describe()
Statistic | SPECIES TOTAL | Depth
---|---|---
count | 11.000000 | 11.000000
mean | 7.000000 | 25.700881
std | 6.511528 | 19.597841
min | 1.000000 | -0.481648
25% | 3.500000 | 12.481057
50% | 5.000000 | 20.128451
75% | 8.500000 | 29.539113
max | 25.000000 | 60.851178
df_shallow.index
<class 'pandas.tseries.index.DatetimeIndex'>
[1998-12-13 00:00:00, ..., 2001-03-17 00:00:00]
Length: 45, Freq: None, Timezone: None
type(df_shallow.index)
pandas.tseries.index.DatetimeIndex