벨기에 유로 동전을 250번 던졌더니 앞면(Head)이 140번, 뒷면(Tail)이 110번 나왔다. 이 동전은 공정한 것일까?
(Information Theory, Inference and Learning Algorithms, David MacKay)
추정 목표
Prior
Likelihood
Posterior
import thinkbayes
import thinkplot
class Euro(thinkbayes.Suite):
"""Represents hypotheses about the probability of heads."""
def Likelihood(self, data, hypo):
"""Computes the likelihood of the data under the hypothesis.
hypo: integer value of x, the probability of heads (0-100)
data: string 'H' or 'T'
"""
x = hypo / 100.0
if data == 'H':
return x
else:
return 1-x
def UniformPrior():
"""Makes a Suite with a uniform prior."""
suite = Euro(xrange(0, 101))
return suite
def RunUpdate(suite, heads=140, tails=110):
"""Updates the Suite with the given number of heads and tails.
suite: Suite object
heads: int
tails: int
"""
dataset = 'H' * heads + 'T' * tails
for data in dataset:
suite.Update(data)
def PlotSuites(suites):
"""Plots two suites.
suite1, suite2: Suite objects
root: string filename to write
"""
thinkplot.Clf()
thinkplot.PrePlot(len(suites))
thinkplot.Pmfs(suites)
suite1 = UniformPrior()
suite1.name = 'uniform'
RunUpdate(suite1)
PlotSuites([suite1])
kwargs = {'title':'Posterior from Uniform Prior', 'xlabel':'p(%)', 'ylabel':'probability mass'}
thinkplot.Show(**kwargs)
def Summarize(suite):
"""Prints summary statistics for the suite."""
print suite.Prob(50)
print 'MLE', suite.MaximumLikelihood()
print 'Mean', suite.Mean()
print 'Median', thinkbayes.Percentile(suite, 50)
print '5th %ile', thinkbayes.Percentile(suite, 5)
print '95th %ile', thinkbayes.Percentile(suite, 95)
print 'CI', thinkbayes.CredibleInterval(suite, 90)
Summarize(suite1)
0.0209765261295 MLE 56 Mean 55.9523809524 Median 56 5th %ile 51 95th %ile 61 CI (51, 61)
def TrianglePrior():
"""Makes a Suite with a triangular prior."""
suite = Euro()
for x in range(0, 51):
suite.Set(x, x)
for x in range(51, 101):
suite.Set(x, 100-x)
suite.Normalize()
return suite
suite1 = UniformPrior()
suite1.name = 'uniform'
suite2 = TrianglePrior()
suite2.name = 'triangle'
PlotSuites([suite1, suite2])
kwargs = {'title':'Different Priors', 'xlabel':'p(%)', 'ylabel':'probability mass'}
thinkplot.Show(**kwargs)
RunUpdate(suite1)
RunUpdate(suite2)
PlotSuites([suite1, suite2])
kwargs = {'title':'Posterior from Triangular Prior', 'xlabel':'p(%)', 'ylabel':'probability mass'}
thinkplot.Show(**kwargs)
Summarize(suite2)
0.0238475372147 MLE 56 Mean 55.7434994386 Median 56 5th %ile 51 95th %ile 61 CI (51, 61)
Gaussian, Beta, Dirichlet distribution 등등...
Likelihood가 특정한 형태일 때, prior와 posterior가 같은 형태의 함수로 표혐됨
$ x $ 가 $ [0, 1]$ 안에 있을 때,
단 $a, b$는 양수이고 $B(a,b)$는 베타 함수(Beta function) 여기에 들어가는 $\Gamma(a)$는 감마 함수(Gamma function)
어차피 베타 함수의 역할은 normalizing constant
이미 구현된 계산 함수를 쓰면 됨ㅋ
국문 위키 http://ko.wikipedia.org/wiki/%EB%B2%A0%ED%83%80_%EB%B6%84%ED%8F%AC
''' building beta distribution '''
import numpy as np
from scipy.special import beta # beta function
def beta_pdf(x, a, b):
return (x**(a-1)) *((1-x)**(b-1)) / (beta(a,b))
1.0 [ 1. 1.]
x = np.linspace(0,1,200)
p = beta_pdf(x,2,2)
plt.plot(x,p,'-')
plt.title('Beta distribution - Prior density')
plt.xlabel('x')
plt.ylabel('probability density')
<matplotlib.text.Text at 0x7fe4919909d0>
''' a=b=1 gives uniform prior distribution '''
a = 1
b = 1
''' The Bayesian update '''
a += 140
b += 110
x = np.linspace(0,1,200)
p = beta_pdf(x,a,b)
plt.plot(x,p,'-')
plt.title('Beta distribution - Posterior density')
plt.xlabel('x')
plt.ylabel('probability density')
plt.figure()
''' comparison to the discrete prior case '''
PlotSuites([suite1])
kwargs = {'title':'Posterior from Uniform Prior', 'xlabel':'p(%)', 'ylabel':'probability mass'}
thinkplot.Show(**kwargs)