In this problem, there are three assistants working at a company: Moe, Larry, and Curly. Their primary job duty is to file paperwork in the filing cabinet when papers become available. The three assistants have different work schedules:
| | Moe | Larry | Curly |
|---|---|---|---|
| Workload | 60% | 30% | 10% |
That is, Moe works 60% of the time, Larry works 30% of the time, and Curly does the remaining 10%, and all three file documents at approximately the same speed. Suppose a person were to select one of the documents from the cabinet at random. Let M be the event that Moe filed the document, and let L and C be the events that Larry and Curly, respectively, filed it. What are these events' respective probabilities? In the absence of additional information, reasonable prior probabilities are simply the workloads:
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | P(M) = 0.60 | P(L) = 0.30 | P(C) = 0.10 |
Now, the boss comes in one day, opens up the file cabinet, and selects a file at random. The boss discovers that the file has been misplaced. The boss is so angry at the mistake that (s)he threatens to fire the one who erred.
# who ??
The boss decides to use probability to decide, and walks straight to the workload schedule. (S)he reasons that, since the three employees work at the same speed, the probability that a randomly selected file would have been filed by each one would be proportional to his workload. The boss notifies Moe that he has until the end of the day to empty his desk.
But Moe argues in his defense that the boss has ignored additional information. Moe’s likelihood of having misfiled a document is smaller than Larry’s and Curly’s, since he is a diligent worker who pays close attention to his work. Moe admits that he works longer than the others, but he doesn’t make as many mistakes as they do. Thus, Moe recommends that – before making a decision – the boss should update the probability (initially based on workload alone) to incorporate the likelihood of having observed a misfiled document.
And, as it turns out, the boss has information about Moe, Larry, and Curly’s filing accuracy in the past (due to historical performance evaluations). The performance information may be represented by the following table:
| | Moe | Larry | Curly |
|---|---|---|---|
| Misfile Rate | 0.003 | 0.007 | 0.010 |
# who ??
In other words, on average Moe misfiles 0.3% (a rate of 0.003) of the documents he is supposed to file. Notice that Moe was correct: he is the most accurate filer, followed by Larry, and lastly Curly. If the boss were to make a decision based only on the workers' overall accuracy, then Curly should get the axe. But Curly hears this and interjects that he works only a short period during the day, and consequently makes mistakes only very rarely; there is only the tiniest chance that he misfiled this particular document.
The boss would like to use this accuracy information to update the probabilities for the three assistants; that is, (s)he wants to combine the workload-based priors with the likelihood that each assistant misfiles a document, updating his/her beliefs about the likely culprit. Let A be the event that a document is misfiled. What the boss would like to know are the three posterior probabilities P(M|A), P(L|A), and P(C|A):
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.6 | 0.3 | 0.1 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.37 | 0.43 | 0.20 |
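The Posterior row is just Bayes' rule applied to the two tables above. Writing it out for Moe (with A the misfiled-document event; Larry and Curly are analogous):

```latex
P(M \mid A) = \frac{P(M)\,P(A \mid M)}{P(A)}
            = \frac{(0.60)(0.003)}{(0.60)(0.003) + (0.30)(0.007) + (0.10)(0.010)}
            = \frac{0.0018}{0.0049} \approx 0.37
```

The denominator P(A) = 0.0049 is the overall probability of a misfile, and it is the same "evidence" term computed in the code that follows.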
# who ??
import numpy as np
# set prior (workloads)
prior = np.array([0.6, 0.3, 0.1])
# set likelihood (misfile rates)
like = np.array([0.003, 0.007, 0.01])
# normalizing constant (overall probability of a misfile)
evidence = np.sum(prior * like)
# get posterior
post = prior * like / evidence
print(prior)
print(like)
print(evidence)
print(post)
[ 0.6 0.3 0.1] [ 0.003 0.007 0.01 ] 0.0049 [ 0.36734694 0.42857143 0.20408163]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.6 | 0.3 | 0.1 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.37 | 0.43 | 0.20 |
!wget http://thinkbayes.com/thinkbayes.py
--2015-02-25 16:49:55-- http://thinkbayes.com/thinkbayes.py ... 2015-02-25 16:49:57 (104 KB/s) - `thinkbayes.py' saved [42406/42406]
!ls
ch02_ex_prb.ipynb ch02_ex_sol.ipynb thinkbayes.py
prior_prob = np.array([0.6, 0.3, 0.1])
like_prob = np.array([0.003, 0.007, 0.01])
names = ['Moe', 'Larry', 'Curly']
from thinkbayes import Pmf
pmf = Pmf()
# set prior
for name, prob in zip(names, prior_prob):
    pmf.Set(name, prob)
    print(name, prob)
Moe 0.6 Larry 0.3 Curly 0.1
pmf.d
{'Curly': 0.10000000000000001, 'Larry': 0.29999999999999999, 'Moe': 0.59999999999999998}
# set likelihood
for name, prob in zip(names, like_prob):
    pmf.Mult(name, prob)
    print(name, prob)
Moe 0.003 Larry 0.007 Curly 0.01
pmf.d
{'Curly': 0.001, 'Larry': 0.0020999999999999999, 'Moe': 0.0018}
# normalizing
pmf.Normalize()
0.0048999999999999998
# get posterior
pmf.Prob('Moe')
0.36734693877551022
pmf.d
{'Curly': 0.20408163265306123, 'Larry': 0.42857142857142855, 'Moe': 0.36734693877551022}
newprior_prob = [pmf.d[name] for name in ['Moe', 'Larry', 'Curly']]
newprior_prob
[0.36734693877551022, 0.42857142857142855, 0.20408163265306123]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.6 | 0.3 | 0.1 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.37 | 0.43 | 0.20 |
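Since the same prior-times-likelihood-then-normalize step will be repeated below, it can be wrapped in a small helper. This is a sketch of my own; the function name `bayes_update` is not from the text:

```python
import numpy as np

def bayes_update(prior, like):
    """One Bayes-rule update: posterior is proportional to prior * likelihood."""
    unnorm = np.asarray(prior) * np.asarray(like)
    return unnorm / unnorm.sum()

prior = np.array([0.6, 0.3, 0.1])       # workloads
like = np.array([0.003, 0.007, 0.01])   # misfile rates

post = bayes_update(prior, like)
print(post)  # same posterior as above: about [0.367, 0.429, 0.204]
```

Feeding a posterior back in as the next prior is then a one-liner, which is exactly what the next example does.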
Suppose the boss has a change of heart and does not fire anybody. But the next day (s)he randomly selects another file and again finds it to be misplaced.
# who ??
new_prior = post
evidence = np.sum(new_prior * like)
post = new_prior * like / evidence
print(new_prior)
print(like)
print(evidence)
print(post)
[ 0.36734694 0.42857143 0.20408163] [ 0.003 0.007 0.01 ] 0.00614285714286 [ 0.17940199 0.48837209 0.33222591]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.18 | 0.49 | 0.33 |
pmf = Pmf()
# set the updated prior (yesterday's posterior)
for name, prob in zip(names, newprior_prob):
    pmf.Set(name, prob)
# multiply in the likelihood
for name, prob in zip(names, like_prob):
    pmf.Mult(name, prob)
pmf.Normalize()
pmf.d
{'Curly': 0.33222591362126247, 'Larry': 0.48837209302325579, 'Moe': 0.17940199335548174}
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood | 0.003 | 0.007 | 0.01 |
| Posterior | 0.18 | 0.49 | 0.33 |
Let us start from the posterior probabilities obtained after the first misfiled document and suppose that the assistants misfile seven more documents. Using Bayes' rule, what would the new posterior probabilities be?
new_prior = np.array(newprior_prob)  # posterior after the first misfiled document
evidence = np.sum(new_prior * like**7)
post = new_prior * like**7 / evidence
print(new_prior)
print(like)
print(evidence)
print(post)
[ 0.36734694 0.42857143 0.20408163] [ 0.003 0.007 0.01 ] 2.39456671429e-15 [ 3.35504436e-04 1.47394933e-01 8.52269563e-01]
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood (per document) | 0.003 | 0.007 | 0.01 |
| Posterior | 0.000335 | 0.1474 | 0.8523 |
pmf = Pmf()
# set the updated prior (posterior after the first misfiled document)
for name, prob in zip(names, newprior_prob):
    pmf.Set(name, prob)
# multiply in the likelihood of seven more misfiles
for name, prob in zip(names, like_prob**7):
    pmf.Mult(name, prob)
pmf.Normalize()
pmf.d
{'Curly': 0.85226956273773158, 'Larry': 0.14739493282620111, 'Moe': 0.00033550443606733543}
| | Moe | Larry | Curly |
|---|---|---|---|
| Prior | 0.37 | 0.43 | 0.20 |
| Likelihood (per document) | 0.003 | 0.007 | 0.01 |
| Posterior | 0.000335 | 0.1474 | 0.8523 |
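As a sanity check (a sketch of my own, not from the text): raising the likelihood to the seventh power in one batch update gives the same posterior as applying the single-document update seven times in a row, because the intermediate normalization constants cancel.

```python
import numpy as np

def bayes_update(prior, like):
    # one Bayes-rule update: posterior is proportional to prior * likelihood
    unnorm = prior * like
    return unnorm / unnorm.sum()

prior = np.array([0.36734694, 0.42857143, 0.20408163])  # posterior after the first misfile
like = np.array([0.003, 0.007, 0.01])                   # per-document misfile rates

# seven sequential single-document updates
post_seq = prior.copy()
for _ in range(7):
    post_seq = bayes_update(post_seq, like)

# one batch update with like**7
post_batch = bayes_update(prior, like**7)

print(post_batch)
print(np.allclose(post_seq, post_batch))  # True
```

This is why the notebook can shortcut seven updates with `like**7`: order and grouping of independent updates do not change the final posterior.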