In this problem, there are three assistants working at a company: Moe, Larry, and Curly. Their primary job duty is to file paperwork in the filing cabinet when papers become available. The three assistants have different work schedules:
| | Moe | Larry | Curly |
|---|---|---|---|
| Workload | 60% | 30% | 10% |
That is, Moe works 60% of the time, Larry works 30% of the time, and Curly does the remaining 10%, and all three file documents at approximately the same speed. Suppose a person were to select one of the documents from the cabinet at random. Let M be the event that Moe filed the document, and let L and C be the events that Larry and Curly, respectively, filed it. What are these events' respective probabilities? In the absence of additional information, reasonable prior probabilities are simply the workloads:
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | P(M) = 0.60 | P(L) = 0.30 | P(C) = 0.10 |
Now, the boss comes in one day, opens up the file cabinet, and selects a file at random. The boss discovers that the file has been misplaced. The boss is so angry at the mistake that (s)he threatens to fire the one who erred.
# who ??
The boss decides to use probability to decide, and walks straight to the workload schedule. (S)he reasons that, since the three employees work at the same speed, the probability that a randomly selected file would have been filed by each one would be proportional to his workload. The boss notifies Moe that he has until the end of the day to empty his desk.
But Moe argues in his defense that the boss has ignored additional information. Moe’s likelihood of having misfiled a document is smaller than Larry’s and Curly’s, since he is a diligent worker who pays close attention to his work. Moe admits that he works longer than the others, but he doesn’t make as many mistakes as they do. Thus, Moe recommends that – before making a decision – the boss should update the probability (initially based on workload alone) to incorporate the likelihood of having observed a misfiled document.
And, as it turns out, the boss has information about Moe, Larry, and Curly’s filing accuracy in the past (due to historical performance evaluations). The performance information may be represented by the following table:
| | Moe | Larry | Curly |
|---|---|---|---|
| Misfile Rate | 0.003 | 0.007 | 0.010 |
# who ??
In other words, on average Moe misfiles 0.3% (a rate of 0.003) of the documents he is supposed to file. Notice that Moe was correct: he is the most accurate filer, followed by Larry, and lastly Curly. If the boss were to make a decision based only on the workers' overall accuracy, then Curly should get the axe. But Curly hears this and interjects that he works only a short period during the day, and consequently makes mistakes only very rarely; there is only the tiniest chance that he misfiled this particular document.
The boss would like to use this accuracy information to update the probabilities for the three assistants; that is, (s)he wants to combine the workload-based priors with the likelihood that each assistant misfiles a document, updating his/her beliefs about the likely culprit. Let A be the event that a document is misfiled. What the boss would like to know are the three posterior probabilities P(M|A), P(L|A), and P(C|A):
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.6 | 0.3 | 0.1 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.37 | 0.43 | 0.20 |
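The Posterior row is just Bayes' rule applied to the two tables above. Writing it out for Moe (with A the misfiled-document event; Larry and Curly are analogous):

```latex
P(M \mid A) = \frac{P(M)\,P(A \mid M)}{P(A)}
            = \frac{(0.60)(0.003)}{(0.60)(0.003) + (0.30)(0.007) + (0.10)(0.010)}
            = \frac{0.0018}{0.0049} \approx 0.37
```

The denominator P(A) = 0.0049 is the overall probability of a misfile, and it is the same "evidence" term computed in the code that follows.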
# who ??
import numpy as np
# set prior (workloads)
prior = np.array([0.6, 0.3, 0.1])
# set likelihood (misfile rates)
like = np.array([0.003, 0.007, 0.01])
# normalizing constant (overall probability of a misfile)
evidence = np.sum(prior * like)
# get posterior
post = prior * like / evidence
print(prior)
print(like)
print(evidence)
print(post)
[ 0.6 0.3 0.1] [ 0.003 0.007 0.01 ] 0.0049 [ 0.36734694 0.42857143 0.20408163]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.6 | 0.3 | 0.1 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.37 | 0.43 | 0.20 |
!wget http://thinkbayes.com/thinkbayes.py
--2015-02-25 16:49:55-- http://thinkbayes.com/thinkbayes.py ... 2015-02-25 16:49:57 (104 KB/s) - `thinkbayes.py' saved [42406/42406]
!ls
ch02_ex_prb.ipynb ch02_ex_sol.ipynb thinkbayes.py
prior_prob = np.array([0.6, 0.3, 0.1])
like_prob = np.array([0.003, 0.007, 0.01])
names = ['Moe', 'Larry', 'Curly']
from thinkbayes import Pmf
pmf = Pmf()
# set prior
for name, prob in zip(names, prior_prob):
    pmf.Set(name, prob)
    print(name, prob)
Moe 0.6 Larry 0.3 Curly 0.1
pmf.d
{'Curly': 0.10000000000000001, 'Larry': 0.29999999999999999, 'Moe': 0.59999999999999998}
# set likelihood
for name, prob in zip(names, like_prob):
    pmf.Mult(name, prob)
    print(name, prob)
Moe 0.003 Larry 0.007 Curly 0.01
pmf.d
{'Curly': 0.001, 'Larry': 0.0020999999999999999, 'Moe': 0.0018}
# normalizing
pmf.Normalize()
0.0048999999999999998
# get posterior
pmf.Prob('Moe')
0.36734693877551022
pmf.d
{'Curly': 0.20408163265306123, 'Larry': 0.42857142857142855, 'Moe': 0.36734693877551022}
newprior_prob = [pmf.d[name] for name in ['Moe', 'Larry', 'Curly']]
newprior_prob
[0.36734693877551022, 0.42857142857142855, 0.20408163265306123]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.6 | 0.3 | 0.1 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.37 | 0.43 | 0.20 |
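Since the same prior-times-likelihood-then-normalize step will be repeated below, it can be wrapped in a small helper. This is a sketch of my own; the function name `bayes_update` is not from the text:

```python
import numpy as np

def bayes_update(prior, like):
    """One Bayes-rule update: posterior is proportional to prior * likelihood."""
    unnorm = np.asarray(prior) * np.asarray(like)
    return unnorm / unnorm.sum()

prior = np.array([0.6, 0.3, 0.1])       # workloads
like = np.array([0.003, 0.007, 0.01])   # misfile rates

post = bayes_update(prior, like)
print(post)  # same posterior as above: about [0.367, 0.429, 0.204]
```

Feeding a posterior back in as the next prior is then a one-liner, which is exactly what the next example does.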
Suppose the boss has a change of heart and does not fire anybody. But the next day (s)he randomly selects another file and again finds it to be misplaced.
# who ??
new_prior = post
evidence = np.sum(new_prior * like)
post = new_prior * like / evidence
print(new_prior)
print(like)
print(evidence)
print(post)
[ 0.36734694 0.42857143 0.20408163] [ 0.003 0.007 0.01 ] 0.00614285714286 [ 0.17940199 0.48837209 0.33222591]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.18 | 0.49 | 0.33 |
pmf = Pmf()
# set the updated prior (yesterday's posterior)
for name, prob in zip(names, newprior_prob):
    pmf.Set(name, prob)
# multiply in the likelihood
for name, prob in zip(names, like_prob):
    pmf.Mult(name, prob)
pmf.Normalize()
pmf.d
{'Curly': 0.33222591362126247, 'Larry': 0.48837209302325579, 'Moe': 0.17940199335548174}
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.18 | 0.49 | 0.33 |
Let us start from the posterior probabilities obtained after the first misfiled document and suppose that the assistants misfile seven more documents. Using Bayes' rule, what would the new posterior probabilities be?
new_prior = np.array(newprior_prob)  # posterior after the first misfiled document
evidence = np.sum(new_prior * like**7)
post = new_prior * like**7 / evidence
print(new_prior)
print(like)
print(evidence)
print(post)
[ 0.36734694 0.42857143 0.20408163] [ 0.003 0.007 0.01 ] 2.39456671429e-15 [ 3.35504436e-04 1.47394933e-01 8.52269563e-01]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood (per document) | 0.003 | 0.007 | 0.01 |
| Posterior | 0.000335 | 0.1474 | 0.8523 |
pmf = Pmf()
# set the updated prior (posterior after the first misfiled document)
for name, prob in zip(names, newprior_prob):
    pmf.Set(name, prob)
# multiply in the likelihood of seven more misfiles
for name, prob in zip(names, like_prob**7):
    pmf.Mult(name, prob)
pmf.Normalize()
pmf.d
{'Curly': 0.85226956273773158, 'Larry': 0.14739493282620111, 'Moe': 0.00033550443606733543}
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood (per document) | 0.003 | 0.007 | 0.01 |
| Posterior | 0.000335 | 0.1474 | 0.8523 |
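As a sanity check (a sketch of my own, not from the text): raising the likelihood to the seventh power in one batch update gives the same posterior as applying the single-document update seven times in a row, because the intermediate normalization constants cancel.

```python
import numpy as np

def bayes_update(prior, like):
    # one Bayes-rule update: posterior is proportional to prior * likelihood
    unnorm = prior * like
    return unnorm / unnorm.sum()

prior = np.array([0.36734694, 0.42857143, 0.20408163])  # posterior after the first misfile
like = np.array([0.003, 0.007, 0.01])                   # per-document misfile rates

# seven sequential single-document updates
post_seq = prior.copy()
for _ in range(7):
    post_seq = bayes_update(post_seq, like)

# one batch update with like**7
post_batch = bayes_update(prior, like**7)

print(post_batch)
print(np.allclose(post_seq, post_batch))  # True
```

This is why the notebook can shortcut seven updates with `like**7`: order and grouping of independent updates do not change the final posterior.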