The aim here is to run through some of the core concepts in Bayesian modelling and see how to implement them with PyMC, using a simple linear regression model. We will look at generating data from the model, estimating the parameters from a small number of observations, and posterior prediction.
%matplotlib inline
import matplotlib.pyplot as plt
plt.rc('font', size=20)
import numpy as np
import pymc
print(pymc.__version__)
2.3.3
For this example we assume values of $x$ are uniformly distributed between 0 and 10:
$x \sim Uniform(0,10)$
The true value of $y$, which we will call $\mu$ is given by the standard linear regression equation $\mu=mx+c$
$\mu = mx + c$
But observations are noisy, with precision $\tau$ (the inverse variance, $\tau = 1/\sigma^2$):
$y \sim Normal(\mu, \tau)$
And we have priors over the parameters: the slope $m$, the intercept $c$, and the precision $\tau$:
$m \sim Normal(0,1/100)$
$c \sim Normal(0,1/100)$
$\tau \sim Gamma(0.1, 0.1)$
true_m = 1.0
true_c = 0.0
true_tau = 1.0/(1.5*1.5)
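Before handing the model to PyMC, the same generative process can be sketched in plain NumPy as a cross-check. Note that NumPy parameterises the normal by the standard deviation $\sigma = 1/\sqrt{\tau}$, not the precision; the seed and the sample size of 900 here are arbitrary choices, not anything PyMC produces.

```python
import numpy as np

rng = np.random.default_rng(0)

true_m, true_c = 1.0, 0.0
true_tau = 1.0 / (1.5 * 1.5)            # noise precision
true_sigma = 1.0 / np.sqrt(true_tau)    # standard deviation = 1.5

# x ~ Uniform(0, 10); mu = m*x + c; y ~ Normal(mu, sigma)
x = rng.uniform(0, 10, size=900)
mu = true_m * x + true_c
y = rng.normal(mu, true_sigma)

print(x.shape, y.shape)  # (900,) (900,)
```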
The following code creates the PyMC objects that embody the probabilistic model above.
def generate():
    # fix the parameters at their true values (observed=True) so that
    # sampling from the model generates synthetic data
    x = pymc.Uniform("x", lower=0, upper=10)
    tau = pymc.Gamma("tau", alpha=0.1, beta=0.1, value=true_tau, observed=True)
    m = pymc.Normal("m", mu=0.0, tau=1.0/100, value=true_m, observed=True)
    c = pymc.Normal("c", mu=0.0, tau=1.0/100, value=true_c, observed=True)

    @pymc.deterministic(plot=False)
    def mu(m=m, x=x, c=c):
        return m*x + c

    y = pymc.Normal("y", mu=mu, tau=tau)
    return locals()
# generate MCMC object
m1 = pymc.MCMC(generate())
m1.sample(iter=1000, burn=100)
[-----------------100%-----------------] 1000 of 1000 complete in 0.1 sec
tempy = m1.y.trace()
tempx = m1.x.trace()
plt.plot(tempx,tempy,'o')
[<matplotlib.lines.Line2D at 0x10ed98750>]
tempx.shape
(900,)
tempy.shape
(900,)
So now we have generated $(x, y)$ pairs from the model, essentially by sampling many times from the distribution $P(x,y|m,c,\tau)$. Our observed data are in tempx and tempy, but we generated many samples; we want to see how good our inferences can be with just a handful of $(x, y)$ data points.
N=10
xdata = tempx[:N]
ydata = tempy[:N]
plt.plot(xdata,ydata,'o')
plt.xlabel('x')
plt.ylabel('y')
<matplotlib.text.Text at 0x10edad4d0>
print(xdata.size)
print(ydata.size)
10 10
def sampling():
    # observed data: x and y are clamped to the generated points,
    # and the noise precision tau is fixed at its true value
    x = pymc.Uniform("x", lower=0, upper=10, value=xdata, observed=True)
    tau = pymc.Gamma("tau", alpha=0.1, beta=0.1, value=true_tau, observed=True)
    # priors over the parameters we want to infer
    m = pymc.Normal("m", mu=0.0, tau=1.0/100)
    c = pymc.Normal("c", mu=0.0, tau=1.0/100)

    @pymc.deterministic(plot=False)
    def mu(m=m, x=x, c=c):
        return m*x + c

    y = pymc.Normal("y", mu=mu, tau=tau, value=ydata, observed=True)

    # Posterior prediction: x_pred and y_pred are influenced by the
    # parameters but separated from the data.
    x_pred = pymc.Uniform("x_pred", lower=0, upper=10)
    y_pred = pymc.Normal("y_pred", mu=mu, tau=tau)  # note: this 'mu' IS tied to the data
    # If you really wanted, you could instead set up uniformly spaced
    # x values and compute mu for those to get predicted y values. As
    # written, this gives a distribution of expected y values for each
    # provided x value.
    return locals()
# generate MCMC object
m2 = pymc.MCMC(sampling())
m2.sample(iter=1000, burn=500)
[-----------------100%-----------------] 1000 of 1000 complete in 0.4 sec
pymc.Matplot.plot(m2.m)
Plotting m
pymc.Matplot.plot(m2.c)
Plotting c
plt.plot( m2.c.trace() , m2.m.trace() ,'.')
plt.xlabel('c')
plt.ylabel('m')
<matplotlib.text.Text at 0x10f474510>
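As a sanity check on these posteriors, the posterior means of m and c should land close to the ordinary least-squares fit of the same 10 points. Here is a sketch in plain NumPy; the xdata/ydata below are synthetic stand-ins drawn with the true parameters, since the actual values come from the cells above.

```python
import numpy as np

# Stand-in data: 10 points from y = 1.0*x + 0.0 with sd-1.5 noise,
# mimicking the xdata/ydata used in the notebook.
rng = np.random.default_rng(1)
xdata = rng.uniform(0, 10, size=10)
ydata = 1.0 * xdata + 0.0 + rng.normal(0, 1.5, size=10)

# Ordinary least-squares slope and intercept via the design matrix [x, 1]
A = np.column_stack([xdata, np.ones_like(xdata)])
(m_ols, c_ols), *_ = np.linalg.lstsq(A, ydata, rcond=None)
print(m_ols, c_ols)

# Compare against the posterior means from the trace:
# m2.m.trace().mean(), m2.c.trace().mean()
```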
The posterior predictive check was actually already generated in Part 2 where we did parameter estimation. But here we examine the predictions (in x/y data space).
tempx = m2.x_pred.trace()
tempy = m2.y_pred.trace()
print(tempx.shape)
print(tempy.shape)
(500,) (500, 10)
TODO: deal with y_pred having more points than x_pred (caused by it creating a vector whose length equals the number of input x points).
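One way to line the two traces up, assuming the goal is a flat scatter of every y_pred sample: broadcast the (500,) x_pred trace across the 10 columns of y_pred before flattening. This mirrors what plt.plot does implicitly when handed a 1-D x and a 2-D y (it plots each column of y against x). Synthetic arrays with the same shapes stand in for m2.x_pred.trace() and m2.y_pred.trace() here.

```python
import numpy as np

# Stand-ins with the trace shapes above: x_pred is (500,), y_pred is (500, 10)
x_pred = np.linspace(0, 10, 500)
y_pred = np.random.default_rng(2).normal(x_pred[:, None], 1.5, size=(500, 10))

# Repeat each x_pred sample across y_pred's columns, then flatten both
x_rep = np.broadcast_to(x_pred[:, None], y_pred.shape).ravel()
y_flat = y_pred.ravel()
print(x_rep.shape, y_flat.shape)  # (5000,) (5000,)
# plt.plot(x_rep, y_flat, '.', alpha=0.2)
```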
plt.plot(m2.x_pred.trace(),
m2.y_pred.trace(),'o')
[<matplotlib.lines.Line2D at 0x10f75e1d0>, ...]
pymc.Matplot.plot(m2.x_pred)
Plotting x_pred
pymc.Matplot.plot(m2.y_pred)
Plotting y_pred_0 Plotting y_pred_1 Plotting y_pred_2 Plotting y_pred_3 Plotting y_pred_4 Plotting y_pred_5 Plotting y_pred_6 Plotting y_pred_7 Plotting y_pred_8 Plotting y_pred_9
m2.y_pred.shape
(10,)
m2.x_pred.shape
()
m2.y_pred.parent_names
['mu', 'tau']