This notebook is also available as a blog post, along with other state space resources, at http://pages.uoregon.edu/cfulton/posts/state_space_mh.html.
%matplotlib inline
import numpy as np
import pandas as pd
import pymc as mc
from scipy import signal
import statsmodels.api as sm
import matplotlib.pyplot as plt
np.set_printoptions(precision=4, suppress=True, linewidth=120)
Suppose we have a time series $Y_T \equiv \{ y_t \}_{t=0}^T$ which we model as a local level process:
$$ \begin{align} y_t & = \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2) \\ \mu_{t+1} & = \mu_t + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2) \\ \end{align} $$In this model, there are two unknown parameters, which we will collect in a vector $\psi$, so that $\psi = (\sigma_\varepsilon^2, \sigma_\eta^2)$; let's set their true values as follows (denoted with the subscript 0):
$$ \psi_0 = (\sigma_{\varepsilon,0}^2, \sigma_{\eta,0}^2) = (3, 10) $$Finally, we also must specify the prior $\mu_0 \sim N(m_0, P_0)$ to initialize the Kalman filter.
Set $T = 1000$.
# True values
T = 1000
sigma2_eps0 = 3
sigma2_eta0 = 10
# Simulate data
np.random.seed(1234)
eps = np.random.normal(scale=sigma2_eps0**0.5, size=T)
eta = np.random.normal(scale=sigma2_eta0**0.5, size=T)
mu = np.cumsum(eta)
y = mu + eps
# Plot the time series
fig, ax = plt.subplots(figsize=(13,2))
ax.plot(y);
ax.set(xlabel='$t$', title='Simulated series');
It turns out to be convenient to write the model in terms of the precision of $\varepsilon$, $h \equiv 1 / \sigma_\varepsilon^2$ (so that $h^{-1} = \sigma_\varepsilon^2$), and the ratio of the variances, $q \equiv \sigma_\eta^2 / \sigma_\varepsilon^2$, so that $q h^{-1} = \sigma_\eta^2$.
Then our error terms can be written:
$$ \varepsilon_t \sim N(0, h^{-1}) \\ \eta_t \sim N(0, q h^{-1}) \\ $$And the true values are:
$$ h_0 = 1 / \sigma_{\varepsilon,0}^2 = 1/3, \qquad q_0 = \sigma_{\eta,0}^2 / \sigma_{\varepsilon,0}^2 = 10/3 $$
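As a quick check, we can compute these from the simulation constants above (the names `h0` and `q0` are introduced here just for illustration):
# True values under the (h, q) parameterization
h0 = 1. / sigma2_eps0                 # = 1/3
q0 = 1. * sigma2_eta0 / sigma2_eps0   # = 10/3
print(h0)
print(q0)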
To take a Bayesian approach to this problem, we treat $\psi$ as a random variable and seek to learn about its value from the data $Y_T$; in fact, we want the full posterior density $p(\psi | Y_T)$. To do this, we use Bayes' rule to write:
$$ p(\psi | Y_T) = \frac{p(Y_T | \psi) p(\psi)}{p(Y_T)} $$or
$$ \underbrace{p(\psi | Y_T)}_{\text{posterior}} \propto \underbrace{p(Y_T | \psi)}_{\text{likelihood}} \underbrace{p(\psi)}_{\text{prior}} $$The object of interest is the posterior; to compute it we need to specify a prior density for the unknown parameters and to evaluate the likelihood function of the model.
We will use the following priors:
Since the precision must be positive, but has no theoretical upper bound, we use a Gamma prior:
$$h \sim \text{Gamma}(\alpha_h, \beta_h)$$to be specific, the density is written:
$$p(h) = \frac{\beta_h^{\alpha_h}}{\Gamma(\alpha_h)} h^{\alpha_h-1}e^{-\beta_h h}$$and we set the hyperparameters as $\alpha_h = 2, \beta_h = 4$, matching the code below. In this case we have $E(h) = \alpha_h / \beta_h = 1/2$ and, since $\alpha_h > 1$, also $E(h^{-1}) = E(\sigma_\varepsilon^2) = \beta_h / (\alpha_h - 1) = 4$.
Similarly, the ratio of variances must be positive, but has no theoretical upper bound, so we again use an (independent) Gamma prior:
$$q \sim \text{Gamma}(\alpha_q, \beta_q)$$and we set the hyperparameters $\alpha_q = 2, \beta_q = 1$, again matching the code below. Since $E(q) = \alpha_q / \beta_q = 2$, a priori the state disturbance variance is expected to be twice the observation variance. By independence of the priors, we then have $E(\sigma_\eta^2) = E(q h^{-1}) = E(q) E(h^{-1}) = 8$.
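To see what these priors imply in the original parameterization, here is a small check using `scipy.stats` (a sketch; note that SciPy parameterizes the Gamma by a shape and a *scale*, the inverse of the rate $\beta$, and that $1/h$ follows an inverse-Gamma distribution with the rate as its scale):
from scipy import stats
# h ~ Gamma(alpha_h=2, rate beta_h=4)
print(stats.gamma(a=2, scale=1. / 4).mean())   # E(h) = 0.5
# 1/h ~ InvGamma(alpha_h=2, scale beta_h=4)
print(stats.invgamma(a=2, scale=4.).mean())    # E(sigma2_eps) = 4
# q ~ Gamma(alpha_q=2, rate beta_q=1)
print(stats.gamma(a=2, scale=1.).mean())       # E(q) = 2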
As noted above, the Kalman filter must be initialized with $\mu_0 \sim N(m_0, P_0)$. We will use the following approximately diffuse prior:
$$\mu_0 \sim N(0, 10^6)$$For given parameters, the likelihood of this model can be calculated via the prediction error decomposition, applying the Kalman filter iterations.
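To make the prediction error decomposition concrete, the following is a minimal pure-Python Kalman filter for the local level model (a sketch for exposition, not the implementation used below; it should agree closely with the state space model's log-likelihood, since both discard the first observation's contribution under the approximately diffuse initialization):
def local_level_loglike(y, sigma2_eps, sigma2_eta, m0=0., P0=1e6, burn=1):
    llf = 0.
    m, P = m0, P0
    for t in range(len(y)):
        # One-step-ahead prediction error and its variance
        v = y[t] - m
        F = P + sigma2_eps
        # Log-likelihood contribution (skipping the burned observation)
        if t >= burn:
            llf -= 0.5 * (np.log(2 * np.pi * F) + v**2 / F)
        # Update the filtered state, then predict mu_{t+1}
        K = P / F
        m = m + K * v
        P = P * (1 - K) + sigma2_eta
    return llf
For example, local_level_loglike(y, 3., 10.) evaluates the log-likelihood at the true variances.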
# Priors
precision = mc.Gamma('precision', 2, 4)   # h ~ Gamma(alpha_h=2, beta_h=4)
ratio = mc.Gamma('ratio', 2, 1)           # q ~ Gamma(alpha_q=2, beta_q=1)
# Likelihood calculated using the state-space model
class LocalLevel(sm.tsa.statespace.MLEModel):
    def __init__(self, endog):
        # Initialize the state space model
        super(LocalLevel, self).__init__(endog, k_states=1,
                                         initialization='approximate_diffuse',
                                         loglikelihood_burn=1)

        # Initialize known components of the state space matrices
        self.ssm['design', :] = 1
        self.ssm['transition', :] = 1
        self.ssm['selection', :] = 1

    @property
    def start_params(self):
        return [1. / np.var(self.endog), 1.]

    @property
    def param_names(self):
        return ['h_inv', 'q']

    def update(self, params, transformed=True):
        params = super(LocalLevel, self).update(params, transformed)
        h, q = params

        # Map (h, q) back to the variances
        sigma2_eps = 1. / h
        sigma2_eta = q * sigma2_eps

        self.ssm['obs_cov', 0, 0] = sigma2_eps
        self.ssm['state_cov', 0, 0] = sigma2_eta
# Instantiate the local level model with our simulated data
ll_mod = LocalLevel(y)
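As a quick usage check, we can evaluate the model's log-likelihood at the true parameter values $h_0 = 1/3$, $q_0 = 10/3$:
# Log-likelihood at the true values of (h, q)
print(ll_mod.loglike([1. / 3, 10. / 3], transformed=True))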
# Create the stochastic (observed) component
@mc.stochastic(dtype=LocalLevel, observed=True)
def local_level(value=ll_mod, h=precision, q=ratio):
return value.loglike([h, q], transformed=True)
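Up to the normalizing constant $p(Y_T)$, the log-posterior at the nodes' current values is just the sum of the log-prior and log-likelihood terms; as a quick sanity check (not required for sampling), PyMC exposes these through each node's `logp` attribute:
# Unnormalized log-posterior at the current values of h and q
print(precision.logp + ratio.logp + local_level.logp)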
# Create the PyMC model
ll_mc = mc.Model((precision, ratio, local_level))
# Create a PyMC sample
ll_sampler = mc.MCMC(ll_mc)
# Sample
ll_sampler.sample(iter=10000, burn=1000, thin=10)
# Plot traces
mc.Matplot.plot(ll_sampler)
[-----------------100%-----------------] 10000 of 10000 complete in 19.8 sec
Plotting ratio
Plotting precision
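Having sampled, we can transform the retained draws of $h$ and $q$ back to the original variance parameters; a sketch (the `*_draws` names are ours):
# Retained posterior draws
h_draws = ll_sampler.trace('precision')[:]
q_draws = ll_sampler.trace('ratio')[:]
# Posterior means of the implied variances
print(np.mean(1. / h_draws))        # sigma2_eps; true value is 3
print(np.mean(q_draws / h_draws))   # sigma2_eta; true value is 10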
For comparison, here is the output from maximum likelihood estimation:
ll_res = ll_mod.fit()
print(ll_res.summary())
                           Statespace Model Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                 1000
Model:                     LocalLevel   Log Likelihood               -2789.529
Date:                Fri, 31 Jul 2015   AIC                           5583.058
Time:                        12:33:08   BIC                           5592.873
Sample:                             0   HQIC                          5586.788
                               - 1000                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
h_inv          0.3045      0.049      6.196      0.000       0.208       0.401
q              2.9588      0.690      4.289      0.000       1.607       4.311
==============================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients.
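Since $\sigma_\varepsilon^2 = 1/h$ and $\sigma_\eta^2 = q/h$, the point estimates translate back to the original parameterization, landing close to the true values $(3, 10)$:
h_hat, q_hat = ll_res.params
print(1. / h_hat)       # sigma2_eps: about 3.3
print(q_hat / h_hat)    # sigma2_eta: about 9.7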
It can be shown (see e.g. Harvey (1989), among many others) that the local level model is equivalent to an ARIMA(0,1,1) model, which is written:
$$\Delta y_t = \zeta_t + \theta \zeta_{t-1}, \qquad \zeta_t \sim N(0, \sigma_\zeta^2)$$It is important to note that $\sigma_\zeta^2$ here is not equal to $\sigma_\varepsilon^2$ from the local level model. However, we can connect the two models through $q = - (1 + \theta)^2 / \theta$. Note that for $q$ to be positive, $\theta$ must be negative.
We will use the same Gamma prior as above on the precision of the ARIMA disturbance, $h \equiv 1 / \sigma_\zeta^2$, but we must specify a new prior for $\theta$. We specify $-\theta \equiv \gamma \sim \text{Beta}(\alpha_\theta, \beta_\theta)$ and set $\alpha_\theta = 2, \beta_\theta = 1$, as in the code below. Since the entire support of the Beta distribution falls between 0 and 1, this prior implies $\theta \in (-1, 0)$.
We would expect that $\hat \theta \approx -0.21$, since then $q = \frac{-(1 - 0.21)^2}{-0.21} \approx 3$ (and so $\hat \gamma \approx 0.21$).
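To check this numerically, note that $q = -(1+\theta)^2 / \theta$ rearranges to $\theta^2 + (q + 2)\theta + 1 = 0$, and the invertible root gives $\theta$ as a function of $q$ (the helper name is ours):
def theta_from_q(q):
    # Invertible root of theta^2 + (q + 2) * theta + 1 = 0
    b = q + 2
    return (-b + np.sqrt(b**2 - 4)) / 2
print(theta_from_q(10. / 3))   # at the true q: about -0.19
print(theta_from_q(2.9588))    # at the estimated q: about -0.21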
# Priors
arima_precision = mc.Gamma('arima_precision', 2, 4)   # h ~ Gamma(2, 4), as before
gamma = mc.Beta('gamma', 2, 1)                        # gamma = -theta ~ Beta(2, 1)
# Instantiate the ARIMA model with our simulated data
arima_mod = sm.tsa.SARIMAX(y, order=(0,1,1))
# Create the stochastic (observed) component
@mc.stochastic(dtype=sm.tsa.SARIMAX, observed=True)
def arima(value=arima_mod, h=arima_precision, gamma=gamma):
    # Rejection: a log-probability of -inf rejects invalid parameter values
    if gamma < 0 or h < 0:
        return -np.inf
    return value.loglike(np.r_[-gamma, 1. / h], transformed=True)
# Create the PyMC model
arima_mc = mc.Model((arima_precision, gamma, arima))
# Create a PyMC sample
arima_sampler = mc.MCMC(arima_mc)
# Sample
arima_sampler.sample(iter=10000, burn=1000, thin=10)
# Plot traces
mc.Matplot.plot(arima_sampler)
[-----------------100%-----------------] 10000 of 10000 complete in 22.0 sec
Plotting arima_precision
Plotting gamma
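As before, the retained draws can be transformed back to $\theta = -\gamma$ and $\sigma_\zeta^2 = 1/h$; a sketch (names are ours):
gamma_draws = arima_sampler.trace('gamma')[:]
arima_h_draws = arima_sampler.trace('arima_precision')[:]
print(np.mean(-gamma_draws))        # posterior mean of theta
print(np.mean(1. / arima_h_draws))  # posterior mean of sigma2_zeta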
arima_res = arima_mod.fit()
print(arima_res.summary())
                           Statespace Model Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                 1000
Model:               SARIMAX(0, 1, 1)   Log Likelihood               -2789.529
Date:                Fri, 31 Jul 2015   AIC                           5583.058
Time:                        12:37:46   BIC                           5592.873
Sample:                             0   HQIC                          5586.788
                               - 1000                                         
Covariance Type:                  opg                                         
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ma.L1         -0.2106      0.032     -6.576      0.000      -0.273      -0.148
sigma2        15.5913      0.667     23.386      0.000      14.285      16.898
==============================================================================

Warnings:
[1] Covariance matrix calculated using the outer product of gradients.
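Finally, the two maximum likelihood fits are mutually consistent: plugging the estimated MA coefficient into the mapping above recovers the $q$ estimated directly from the local level model:
theta_hat = arima_res.params[0]            # ma.L1, about -0.21
print(-(1 + theta_hat)**2 / theta_hat)     # about 2.96, matching q from the local level fit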