This is based on Example 5.1 (pages 143-145) and Figures 5.1 (page 144) and 5.2 (page 146) of Gamerman and Lopes (2006), which in turn is based on an example from Carlin, Gelfand and Smith (1992).
Essentially, we consider count data where $$y_i \sim \text{Poisson}(\lambda_i)$$ subject to
$\lambda_i = \lambda_b$ if year $i$ is before the changepoint, and $\lambda_i = \lambda_a$ otherwise. Assume $\lambda_b \sim \text{Gamma}(\alpha, \beta)$ and $\lambda_a \sim \text{Gamma}(\gamma, \delta)$, and place a discrete uniform prior on the changepoint year. The posterior density is given by:
$$f(\lambda_b, \lambda_a, m \mid y) \propto f(y_1, \ldots, y_n \mid \lambda_b, \lambda_a, m)\, p(\lambda_b, \lambda_a, m)$$ where $m$ denotes the changepoint. Setting the two sufficient statistics $s_b = \sum_{i=1}^{m} y_i$ and $s_a = \sum_{i=m+1}^{n} y_i$, we have the posterior: $$f(\lambda_b, \lambda_a, m \mid y) \propto \lambda_b^{\alpha + s_b - 1} e^{-(\beta+m)\lambda_b}\, \lambda_a^{\gamma+s_a-1} e^{-(\delta+n-m)\lambda_a}$$
This has three simple conditional densities which are specified later.
In total, there are 111 observations of $y_i$, one per year from 1851 to 1961.
from numpy import array
%load_ext rmagic
disasters = ( 4, 5, 4, 0, 1, 4, 3, 4, 0, 6, 3, 3, 4, 0, 2, 6,
3, 3, 5, 4, 5, 3, 1, 4, 4, 1, 5, 5, 3, 4, 2, 5,
2, 2, 3, 4, 2, 1, 3, 2, 2, 1, 1, 1, 1, 3, 0, 0,
1, 0, 1, 1, 0, 0, 3, 1, 0, 3, 2, 2, 0, 1, 1, 1,
0, 1, 0, 1, 0, 0, 0, 2, 1, 0, 0, 0, 1, 1, 0, 2,
3, 3, 1, 1, 2, 1, 1, 1, 1, 2, 4, 2, 0, 0, 1, 4,
0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1)
years = list(range(1851, 1962))  ## 1851-1961 inclusive: 111 years
len(disasters)
111
idx = range(111)
%Rpush years disasters
%R plot(years, disasters)
import numpy as np
idxl = np.tril_indices(111)
idxu = np.triu_indices(111)
##create blank matrices of zeros
a=np.zeros(shape=(111,111))
b=np.zeros(shape=(111,111))
## Fill the lower triangle of a and the upper triangle of b with ones
a[idxl] = 1
b[idxu] = 1
## Now get cumulative sums
cumsumafter = np.dot(b, disasters)
cumsumbefore = np.dot(a, disasters)
## Append a zero at the end of cumsumafter and the start of cumsumbefore,
## so each array has one entry per candidate changepoint.
cumsumafter = np.append(cumsumafter, 0)
cumsumbefore = np.append(0, cumsumbefore)
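As an aside, the triangular-matrix products above are just cumulative sums, so `np.cumsum` builds the same arrays directly. A sketch, using a short illustrative prefix of the data:

```python
import numpy as np

disasters = np.array([4, 5, 4, 0, 1])  ## illustrative prefix of the data

## sum of observations strictly before each candidate changepoint
cumsumbefore = np.append(0, np.cumsum(disasters))
## sum of observations from each candidate changepoint onwards
cumsumafter = np.append(np.cumsum(disasters[::-1])[::-1], 0)
```

Both arrays end up one entry longer than the data, matching the appended zeros in the matrix version.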
%Rpush years cumsumbefore
%R plot(years, cumsumbefore[-1])
## scipy.random was only an alias for numpy.random (since removed), so import from numpy directly
from numpy import random, exp
#Provide values for the priors
alphaprior = 0.001
betaprior = 0.001
gammaprior = 0.001
deltaprior = 0.001
We need to do the last step first and define a function to calculate the messiest full conditional, namely the discrete conditional for the changepoint $m$:
$$f(m \mid \lambda_b, \lambda_a, y) = \frac{\lambda_b^{\alpha + s_b - 1} e^{-(\beta+m)\lambda_b}\, \lambda_a^{\gamma+s_a-1} e^{-(\delta+n-m)\lambda_a}}{\sum_{m'=1}^{n} \lambda_b^{\alpha + s_b(m') - 1} e^{-(\beta+m')\lambda_b}\, \lambda_a^{\gamma+s_a(m')-1} e^{-(\delta+n-m')\lambda_a}}$$ where $s_b(m')$ and $s_a(m')$ are the sufficient statistics evaluated at candidate changepoint $m'$. Having computed these values, we can sample a changepoint value with `random.choice`.
def mprob(lb, la, ap, bp, gp, dp, m, sm, sl):
    ## Unnormalised full conditional for the changepoint m
    return (lb**(ap+sm-1)) * exp(-(bp+m)*lb) * (la**(gp+sl-1)) * exp(-(dp+(111-m))*la)
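For extreme values of $\lambda$ or longer series, the product of powers and exponentials in `mprob` can underflow or overflow. A sketch of a log-scale alternative (the names `log_mprob` and `normalise` are mine, not from the original) that yields the same normalised weights via the log-sum-exp trick:

```python
import numpy as np

def log_mprob(lb, la, ap, bp, gp, dp, m, sm, sl, n=111):
    ## log of the unnormalised full conditional for the changepoint m
    return ((ap + sm - 1) * np.log(lb) - (bp + m) * lb
            + (gp + sl - 1) * np.log(la) - (dp + n - m) * la)

def normalise(logw):
    ## subtract the maximum before exponentiating, so the largest weight is exp(0) = 1
    w = np.exp(logw - np.max(logw))
    return w / w.sum()
```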
The easier full conditionals to simulate from are the two Poisson means. With $m$ the current changepoint,
$$\lambda_b \mid \cdot \sim \text{Gamma}(\alpha + s_b, \beta + m)$$ and $$\lambda_a \mid \cdot \sim \text{Gamma}(\gamma + s_a, \delta + (n - m))$$
This requires only the sufficient statistics $s_b$, $s_a$ and the current value of the changepoint.
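As a minimal sketch (the values of `m`, `sb` and `sa` below are illustrative, not taken from the data), one draw from each Gamma full conditional with numpy looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 111
alpha, beta, gamma_, delta = 0.001, 0.001, 0.001, 0.001
m, sb, sa = 40, 120, 50  ## illustrative changepoint and sufficient statistics

## numpy parameterises the Gamma by shape and scale, hence the reciprocal rates
lam_b = rng.gamma(alpha + sb, 1.0 / (beta + m))
lam_a = rng.gamma(gamma_ + sa, 1.0 / (delta + (n - m)))
```

Note that `rng.gamma(shape, scale)` expects shape and scale, so the rate parameters $\beta + m$ and $\delta + (n - m)$ must be inverted.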
The Gibbs sampler proceeds in three stages: draw $\lambda_b$ from its Gamma full conditional; draw $\lambda_a$ from its Gamma full conditional; draw the changepoint from its discrete full conditional. Then repeat.
outlambdabefore = np.zeros(5000)
outlambdaafter = np.zeros(5000)
outm = np.zeros(5000, dtype=int)  ## must be integer: outm is used as an index below
outlambdabefore[0] = 12
outlambdaafter[0] = 3
outm[0] = 60
for iter in range(1, 5000):
    m = outm[iter-1]
    ## numpy's gamma takes shape and scale, hence the reciprocal of the rate
    outlambdabefore[iter] = random.gamma(alphaprior + cumsumbefore[m], 1.0/(betaprior + m))
    outlambdaafter[iter] = random.gamma(gammaprior + cumsumafter[m], 1.0/(deltaprior + 111 - m))
    blah = np.zeros(111)
    for i in range(111):
        blah[i] = mprob(outlambdabefore[iter], outlambdaafter[iter], alphaprior, betaprior, gammaprior, deltaprior, i, cumsumbefore[i], cumsumafter[i])
    condalpha = blah / sum(blah)
    outm[iter] = random.choice(111, p=condalpha)
%Rpush outlambdabefore outlambdaafter outm
%R plot(c(1:5000), outlambdabefore, type = "l")
%R par(mfrow = c(2,2)); hist(outlambdabefore[c(1000:5000)], xlim = c(0,5)); hist(outlambdaafter[c(1000:5000)], xlim =c(0,5))
%R plot(xtabs(~outm[c(1000:5000)]))
%R acf(outlambdabefore[c(1000:5000)]) ; acf(outlambdaafter[c(1000:5000)])
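Beyond the plots, simple numeric summaries of the changepoint chain are useful, e.g. converting the modal changepoint index back to a calendar year. A sketch, run here on a synthetic stand-in for `outm` since the real chain comes from the sampler above:

```python
import numpy as np

rng = np.random.default_rng(1)
## synthetic stand-in for the sampler's changepoint chain, for illustration only
outm = rng.choice([39, 40, 41], size=5000, p=[0.2, 0.6, 0.2])

burn = 1000
post = outm[burn:]
## modal changepoint index, converted to a year (index 0 corresponds to 1851)
modal_year = 1851 + np.bincount(post).argmax()
```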
This was a quick sketch of a Gibbs sampler for a rather famous changepoint problem. The first thing to note is that analytic values are available, so the answers can be checked.
As a programming exercise there is still a little work: the conditional weights for the changepoint are currently computed with a for loop that could be vectorised, and a few other efficiency tweaks could be made. But on the whole this appears to work.
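One possible vectorisation of that loop (a sketch; the helper name `changepoint_probs` is mine) evaluates all 111 conditional weights in a single array expression, working on the log scale for stability:

```python
import numpy as np

def changepoint_probs(lb, la, ap, bp, gp, dp, cumsumbefore, cumsumafter, n=111):
    m = np.arange(n)
    sb = cumsumbefore[:n]  ## observations before each candidate changepoint
    sa = cumsumafter[:n]   ## observations from each candidate changepoint onwards
    logw = ((ap + sb - 1) * np.log(lb) - (bp + m) * lb
            + (gp + sa - 1) * np.log(la) - (dp + n - m) * la)
    w = np.exp(logw - logw.max())  ## guard against overflow before normalising
    return w / w.sum()
```

A quick sanity check on this: when $\lambda_b = \lambda_a$ the likelihood carries no information about the changepoint, so the weights should come out uniform.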