This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
You can find the instructions to install PyMC on the package's website. (http://pymc-devs.github.io/pymc/)
You also need the Storms dataset from the book's website. (http://ipython-books.github.io)
import numpy as np
import pandas as pd
import pymc
import matplotlib.pyplot as plt
%matplotlib inline
# http://www.ncdc.noaa.gov/ibtracs/index.php?name=wmo-data
df = pd.read_csv("data/Allstorms.ibtracs_wmo.v03r05.csv",
delim_whitespace=False)
NA
), then we group the rows by year (Season
), and we take the number of unique storm (Serial_Num
) as each storm can span several days (nunique
method).cnt = df[df['Basin'] == ' NA'].groupby('Season') \
['Serial_Num'].nunique()
years = cnt.index
y0, y1 = years[0], years[-1]
arr = cnt.values
plt.figure(figsize=(8,4));
plt.plot(years, arr, '-ok');
plt.xlim(y0, y1);
plt.xlabel("Year");
plt.ylabel("Number of storms");
early_mean
before a certain switch point, and a second value late_mean
after that point. These three unknown parameters are treated as random variables (we will describe them more in the How it works... section).A Poisson process is a particular point process, that is, a stochastic process describing the random occurence of instantaneous events. The Poisson process is fully random: the events occur independently at a given rate.
switchpoint = pymc.DiscreteUniform('switchpoint',
lower=0, upper=len(arr))
early_mean = pymc.Exponential('early_mean', beta=1)
late_mean = pymc.Exponential('late_mean', beta=1)
@pymc.deterministic(plot=False)
def rate(s=switchpoint, e=early_mean, l=late_mean):
out = np.empty(len(arr))
out[:s] = e
out[s:] = l
return out
storms = pymc.Poisson('storms', mu=rate, value=arr, observed=True)
sample
method launches the fitting iterative procedure.model = pymc.Model([switchpoint, early_mean, late_mean, rate, storms])
mcmc = pymc.MCMC(model)
mcmc.sample(iter=10000, burn=1000, thin=10)
plt.figure(figsize=(8,8))
plt.subplot(311);
plt.plot(mcmc.trace('switchpoint')[:]);
plt.ylabel("Switch point");
plt.subplot(312);
plt.plot(mcmc.trace('early_mean')[:]);
plt.ylabel("Early mean");
plt.subplot(313);
plt.plot(mcmc.trace('late_mean')[:]);
plt.xlabel("Iteration");
plt.ylabel("Late mean");
plt.figure(figsize=(14,3))
plt.subplot(131);
plt.hist(mcmc.trace('switchpoint')[:] + y0, 15);
plt.xlabel("Switch point")
plt.ylabel("Distribution")
plt.subplot(132);
plt.hist(mcmc.trace('early_mean')[:], 15);
plt.xlabel("Early mean");
plt.subplot(133);
plt.hist(mcmc.trace('late_mean')[:], 15);
plt.xlabel("Late mean");
yp = y0 + mcmc.trace('switchpoint')[:].mean()
em = mcmc.trace('early_mean')[:].mean()
lm = mcmc.trace('late_mean')[:].mean()
print((yp, em, lm))
plt.figure(figsize=(8,4));
plt.plot(years, arr, '-ok');
plt.axvline(yp, color='k', ls='--');
plt.plot([y0, yp], [em, em], '-b', lw=3);
plt.plot([yp, y1], [lm, lm], '-r', lw=3);
plt.xlim(y0, y1);
plt.xlabel("Year");
plt.ylabel("Number of storms");
For a possible scientific interpretation of the data considered here, see http://www.gfdl.noaa.gov/global-warming-and-hurricanes.
You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).