matplotlib Demo¶

In this demo we'll use figures to explore some historical hurricane data. The raw data are in this CSV file, a machine-readable form of the data in the Weather Underground's Hurricane Archive.

Read the Data¶

We'll start off by reading the CSV file via numpy.genfromtxt:

In [1]:

import numpy as np

In [2]:

!head -n5 HurricaneHistoryData.csv

Year,Storms,Hurricanes,Deaths,Damage (millions USD),Number of Retired Names,Retired Names
2011,20,7,100,21000,1,Irene
2010,21,12,287,12356,2,"Igor, Tomas"
2009,11,3,6,77,0,
2008,16,8,761,24945,3,"Gustav, Ike, Paloma"

In [3]:

# np.int was removed in NumPy 1.24; the builtin int is equivalent
year, storms, hurricanes, deaths, damage, retired = np.genfromtxt(
    'HurricaneHistoryData.csv', delimiter=',', usecols=range(6),
    dtype=int, unpack=True, skip_header=1)

The usecols keyword specifies exactly which columns we want back. In this case we skip the last column of retired names, since strings would be tricky to deal with in a numeric array. I happen to know all the numbers in this file are integers, so I've used the dtype keyword to specify that. With unpack=True each column comes back as its own array.
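As a small, self-contained check of how these keywords behave, the sketch below parses the five lines printed by head above from an in-memory buffer rather than from the file. The io.StringIO stand-in is an assumption made only for illustration. As far as I know, genfromtxt does not understand CSV quoting, so a quoted field such as "Igor, Tomas" is split on its comma; because usecols keeps only the first six fields, that doesn't affect the result here.

```python
import io
import numpy as np

# The first five lines of the CSV, copied from the `head` output above.
sample = io.StringIO(
    'Year,Storms,Hurricanes,Deaths,Damage (millions USD),'
    'Number of Retired Names,Retired Names\n'
    '2011,20,7,100,21000,1,Irene\n'
    '2010,21,12,287,12356,2,"Igor, Tomas"\n'
    '2009,11,3,6,77,0,\n'
    '2008,16,8,761,24945,3,"Gustav, Ike, Paloma"\n'
)

# Same keywords as the real call: skip the header row, keep the six
# numeric columns, and unpack them into one array per column.
year, storms, hurricanes, deaths, damage, retired = np.genfromtxt(
    sample, delimiter=',', usecols=range(6),
    dtype=int, unpack=True, skip_header=1)

print(year)     # one entry per data row
print(retired)  # number of retired names per year
```

Each returned array has one integer per data row, in the same order as the file.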

In [4]:

year[:5]

Out[4]:

array([2011, 2010, 2009, 2008, 2007])

Setup and Imports¶

Here we'll turn on inline plotting, set the figure type to SVG, and import matplotlib:

In [5]:

%matplotlib inline

In [6]:

%config InlineBackend.figure_format = 'SVG'

In [7]:

import matplotlib.pyplot as plt

Plotting vs. Time¶

One thing we might do is a simple line plot of the number of storms, hurricanes, and retired names vs. year. The first thing to do when making a new plot is to use the plt.subplots function to make Figure and Axes objects that will be our interfaces to matplotlib:

In [8]:

fig, ax = plt.subplots()
ax.plot(year, storms, label='Storms')
ax.plot(year, hurricanes, label='Hurricanes')
ax.plot(year, retired, label='Retired Names')
ax.set_xlabel('Year')
ax.legend(loc='upper left')

Out[8]:

<matplotlib.legend.Legend at 0x10bb9d650>

Notice that we didn't need to pass much information to the ax.legend method. Because we used the label keyword in the calls to plot, legend was able to construct an appropriate legend automatically.
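As an aside, here is a minimal sketch of where legend finds its entries, using made-up data (the arrays and label strings are assumptions for illustration). Each labeled artist on the Axes is reported by ax.get_legend_handles_labels, which returns the same handles and labels that a bare ax.legend() call uses.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)  # made-up data for illustration

fig, ax = plt.subplots()
ax.plot(x, x, label='Linear')
ax.plot(x, x ** 2, label='Quadratic')
ax.plot(x, x ** 3)  # no label, so this line is left out of the legend

# The labeled artists that a bare ax.legend() call will pick up.
handles, labels = ax.get_legend_handles_labels()
legend = ax.legend(loc='upper left')

print(labels)
```

The unlabeled third line is skipped, which is a handy way to keep helper lines out of a legend.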

The variability in this plot makes it a bit busy, though. If we rearrange the data a little we can try some other plot types. The hurricanes are really a subset of the storms, and the retired names are probably a subset of the hurricanes. So let's separate those out:

In [9]:

tropical_storms = storms - hurricanes
unretired_hurricanes = hurricanes - retired
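To see why this split is useful for stacked plots, here is a quick worked check on the four years shown in the head output earlier (the values are copied from those rows). The three groups no longer overlap, so stacking them adds back up to the total storm count.

```python
import numpy as np

# Values for 2011, 2010, 2009, 2008, copied from the CSV rows shown earlier.
storms = np.array([20, 21, 11, 16])
hurricanes = np.array([7, 12, 3, 8])
retired = np.array([1, 2, 0, 3])

tropical_storms = storms - hurricanes        # storms that never became hurricanes
unretired_hurricanes = hurricanes - retired  # hurricanes whose names were kept

# The three disjoint groups sum back to the original totals.
total = tropical_storms + unretired_hurricanes + retired
print(tropical_storms, unretired_hurricanes, total)
```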

With these distinct groups we could try a stackplot:

In [10]:

fig, ax = plt.subplots()
ax.stackplot(year, retired, unretired_hurricanes, tropical_storms)
ax.set_xlabel('Year')

Out[10]:

<matplotlib.text.Text at 0x10bbe5b90>

That's still pretty tough to read. Maybe a stacked bar chart:

In [11]:

left = year - 0.5
fig, ax = plt.subplots()
# Newer matplotlib releases no longer accept left=, so pass the left edges positionally with align='edge'
ax.bar(left, height=tropical_storms, width=1, align='edge', linewidth=0, color='b', label='Storms')
ax.bar(left, height=unretired_hurricanes, bottom=tropical_storms, width=1, align='edge', linewidth=0, color='r', label='Hurricanes')
ax.bar(left, height=retired, bottom=tropical_storms + unretired_hurricanes, width=1, align='edge', linewidth=0, color='g', label='Retired')
ax.set_xlabel('Year')
ax.legend(loc='upper left', fontsize=10)

Out[11]:

<matplotlib.legend.Legend at 0x10be38810>
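As an aside on how the stacking works: the bottom keyword sets where each bar starts on the y-axis, so passing the heights of the layer below lifts the next layer on top of it. The sketch below uses three made-up years (an assumption for illustration) and inspects the rectangles that ax.bar returns.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for illustration only.
x = np.array([2008, 2009, 2010])
lower = np.array([8, 8, 9])
upper = np.array([5, 3, 10])

fig, ax = plt.subplots()
lower_bars = ax.bar(x, lower, width=1, label='Lower')
# bottom=lower makes each upper bar start where the lower bar ends.
upper_bars = ax.bar(x, upper, bottom=lower, width=1, label='Upper')

upper_starts = [float(rect.get_y()) for rect in upper_bars]
upper_tops = [float(rect.get_y() + rect.get_height()) for rect in upper_bars]
print(upper_starts, upper_tops)
```

For a third layer, bottom is the sum of the two layers below it, which is what the last ax.bar call in the cell above does.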

This is looking pretty good! If we could space things out a bit more it would be easier to read. We could try setting the limits:

In [12]:

left = year - 0.5
fig, ax = plt.subplots()
ax.bar(left, height=tropical_storms, width=1, align='edge', linewidth=0, color='b', label='Storms')
ax.bar(left, height=unretired_hurricanes, bottom=tropical_storms, width=1, align='edge', linewidth=0, color='r', label='Hurricanes')
ax.bar(left, height=retired, bottom=tropical_storms + unretired_hurricanes, width=1, align='edge', linewidth=0, color='g', label='Retired')
ax.set_xlabel('Year')
ax.legend(loc='upper left', fontsize=10)
ax.set_xlim(year.min() - 0.5, year.max() + 0.5)

Out[12]:

(1850.5, 2011.5)

That helps, but increasing the figure size would spread things out even more. We can do that with the figsize argument to plt.subplots, which takes a (width, height) tuple in inches:

In [13]:

left = year - 0.5
fig, ax = plt.subplots(figsize=(14, 4))
ax.bar(left, height=tropical_storms, width=1, align='edge', linewidth=0, color='b', label='Storms')
ax.bar(left, height=unretired_hurricanes, bottom=tropical_storms, width=1, align='edge', linewidth=0, color='r', label='Hurricanes')
ax.bar(left, height=retired, bottom=tropical_storms + unretired_hurricanes, width=1, align='edge', linewidth=0, color='g', label='Retired')
ax.set_xlabel('Year')
ax.legend(loc='upper left', fontsize=10)
ax.set_xlim(year.min() - 0.5, year.max() + 0.5)

Out[13]:

(1850.5, 2011.5)

That is looking nice!
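If you want to confirm what the two sizing calls did, both settings can be read back from the objects. This is a minimal sketch on an empty figure; the limits are the ones reported in the output above.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(14, 4))
ax.set_xlim(1850.5, 2011.5)  # the limits shown in the output above

# figsize is stored in inches as (width, height).
width_in, height_in = fig.get_size_inches()
x_low, x_high = ax.get_xlim()
print(width_in, height_in, x_low, x_high)
```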

So far we've only been plotting the various storm counts and haven't touched the deaths or damage data. Those are very different quantities, so we can't put them on the same axis as the storm counts, but by using a second y-axis we can put them on the same figure. We can do this using the ax.twinx method:

In [14]:

left = year - 0.5
fig, ax = plt.subplots(figsize=(14, 4))
ax.bar(left, height=tropical_storms, width=1, align='edge', linewidth=0, color='b', label='Storms')
ax.bar(left, height=unretired_hurricanes, bottom=tropical_storms, width=1, align='edge', linewidth=0, color='r', label='Hurricanes')
ax.bar(left, height=retired, bottom=tropical_storms + unretired_hurricanes, width=1, align='edge', linewidth=0, color='g', label='Retired')
ax.set_xlabel('Year')
ax.legend(loc='upper left', fontsize=10)

ax2 = ax.twinx()
ax2.set_yscale('log')
ax2.plot(year, damage, color='c', linewidth=2)
ax2.set_ylabel('Damage (millions of USD)')

ax.set_xlim(year.min() - 0.5, year.max() + 0.5)

Out[14]:

(1850.5, 2011.5)
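As an aside on what twinx gives us: the new Axes shares its x-axis with the original but keeps its own y-axis, so the two can use different scales. Here is a minimal sketch with made-up data (the arrays are assumptions for illustration).

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration only.
x = np.arange(1, 11)
counts = x * 2
costs = 10.0 ** (x / 2.0)

fig, ax = plt.subplots()
ax.plot(x, counts)

ax2 = ax.twinx()       # second y-axis on the right, same x-axis
ax2.set_yscale('log')  # only the right-hand axis becomes logarithmic
ax2.plot(x, costs, color='c')

# Because the x-axis is shared, limits set on one Axes apply to both.
ax.set_xlim(0, 10)
print(ax.get_yscale(), ax2.get_yscale(), ax2.get_xlim())
```

This is why the cell above only needs a single set_xlim call at the end.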

That's really getting a bit unreadable, though. A better solution might be to have two subplots on the figure, one for the bar chart and another to show the deaths and damage as line plots. To get two axes we'll change our call to plt.subplots a little, specifying the number of rows and columns we want on the figure. We'll also increase the figsize to account for the two axes:

In [15]:

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 6), sharex=True) # ax1 will be top plot, ax2 the bottom

damage_line, = ax1.plot(year, damage, color='blue', label='Damage')
ax1.set_ylabel('Damage (millions of USD)')

ax1_right = ax1.twinx()
deaths_line, = ax1_right.plot(year, deaths, color='red', label='Deaths')
ax1_right.set_ylabel('Deaths')

ax1.legend((damage_line, deaths_line), ('Damage', 'Deaths'), loc='upper left', fontsize=10)

ax2.bar(left, height=tropical_storms, width=1, align='edge', linewidth=0, color='b', label='Storms')
ax2.bar(left, height=unretired_hurricanes, bottom=tropical_storms, width=1, align='edge', linewidth=0, color='r', label='Hurricanes')
ax2.bar(left, height=retired, bottom=tropical_storms + unretired_hurricanes, width=1, align='edge', linewidth=0, color='g', label='Retired')
ax2.set_xlabel('Year')
ax2.set_xlim(year.min() - 0.5, year.max() + 0.5)
ax2.legend(loc='upper left', fontsize=10)

fig.tight_layout(h_pad=0)

Notice that for the top plot we weren't able to construct a legend automatically, because the two lines are on different Axes objects. To construct the legend in this case we had to capture the output of the calls to plot and then pass those into legend along with the labels. The Axes plotting methods return the objects that represent the data in the figure; plot returns a list of lines, which is why we unpack it with a trailing comma.
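An alternative to capturing the lines by hand, sketched here on the four years shown in the head output earlier, is to ask each Axes for its labeled artists with get_legend_handles_labels and concatenate the results. This is one possible approach, not the only one; the variable names are made up for the example.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Values for 2011, 2010, 2009, 2008, copied from the CSV rows shown earlier.
year = np.array([2011, 2010, 2009, 2008])
damage = np.array([21000, 12356, 77, 24945])
deaths = np.array([100, 287, 6, 761])

fig, ax_left = plt.subplots()
ax_left.plot(year, damage, color='blue', label='Damage')

ax_right = ax_left.twinx()
ax_right.plot(year, deaths, color='red', label='Deaths')

# Collect the labeled artists from both Axes and build one legend.
left_handles, left_labels = ax_left.get_legend_handles_labels()
right_handles, right_labels = ax_right.get_legend_handles_labels()
legend = ax_left.legend(left_handles + right_handles,
                        left_labels + right_labels, loc='upper left')

combined_labels = [text.get_text() for text in legend.get_texts()]
print(combined_labels)
```

This way the label strings are written only once, in the calls to plot.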

Now that we've got this awesome figure we can save it to disk with the fig.savefig method:

In [16]:

fig.savefig('hurricane_history.pdf')
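savefig infers the output format from the file extension, so the same figure can be written as a PDF or a PNG. The sketch below is a self-contained example under a few assumptions: the figure is a stand-in, the files go to a temporary directory, and the dpi and bbox_inches values are just illustrative choices.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs outside a notebook
import matplotlib.pyplot as plt

# A stand-in figure; in the notebook you would reuse the existing fig.
fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel('x')

out_dir = tempfile.mkdtemp()  # scratch directory for the example files
pdf_path = os.path.join(out_dir, 'example.pdf')
png_path = os.path.join(out_dir, 'example.png')

fig.savefig(pdf_path)  # vector output, format taken from '.pdf'
# For raster output, dpi controls resolution and bbox_inches='tight'
# trims the surrounding whitespace.
fig.savefig(png_path, dpi=150, bbox_inches='tight')

print(os.path.getsize(pdf_path) > 0, os.path.getsize(png_path) > 0)
```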

More with matplotlib¶

So what else can you do with matplotlib? Tons! To keep learning, take a look at the official documentation and browse the example gallery.
