#!/usr/bin/env python
# coding: utf-8

# # Chapter 9: Model Comparison

# (LaTeX macros)
# $
# \newcommand{\tage}{t_{age}}
# \newcommand{\dataerr}{\overrightarrow{\sigma_d}}
# \newcommand{\nclobs}{n_{c,obs}}
# \newcommand{\nclpred}{n_{c,pred}}
# \newcommand{\birthrate}{\dot n_{birth}}
# \newcommand{\Mi}{M_{i}}
# \newcommand{\ts}{\tilde{t}}
# \newcommand{\PI}{\overrightarrow{\pi}}
# \newcommand{\dif}{\,\text{d}}
# \newcommand{\dt}{\Delta t}
# \newcommand{\Ge}{\gamma_e}
# \newcommand{\Gd}{\gamma_d}
# \newcommand{\given}{\,\mid\,}
# \newcommand{\prob}{\mathcal{P}}
# \newcommand{\data}{\overrightarrow{d}}
# \newcommand{\T}{\overrightarrow{\theta}}
# \newcommand{\err}{\overrightarrow{\sigma}}
# \newcommand{\like}{\mathcal{L}}
# \newcommand{\best}{\mathrm{best}}
# \newcommand{\normal}{\mathcal{N}}
# \newcommand{\Pbad}{\prob_{\mathrm{b}}}
# \newcommand{\union}{\cup}
# \newcommand{\intersect}{\cap}
# \newcommand{\argmax}{\operatornamewithlimits{arg\,max}}
# \newcommand{\argmin}{\operatornamewithlimits{arg\,min}}
# \newcommand{\card}{\operatorname{card}}
# \newcommand{\sgn}{\operatorname{sgn}}
# \newcommand{\rank}{\operatorname{rank}}
# \newcommand{\EE}{\operatornamewithlimits{E}}
# \newcommand{\id}{\operatorname{id}}
# \newcommand{\abs}[1]{\left| #1 \right|}
# \newcommand{\norm}[1]{\left\| #1 \right\|}
# \newcommand{\pa}[1]{\left(#1\right)}
# \newcommand{\bra}[1]{\left[#1\right]}
# \newcommand{\cbra}[1]{\left\{#1\right\}}
# \newcommand{\Vec}[1]{\overrightarrow{#1}}
# \newcommand{\mmatrix}[1]{\boldsymbol{#1}}
# \newcommand{\inverse}[1]{{#1}^{-1}}
# \newcommand{\transpose}[1]{{#1}^{\scriptscriptstyle \top}}
# \newcommand{\mean}[1]{\left<{#1}\right>}
# \newcommand{\Proba}[1]{\prob\left(\, #1 \,\right)}
# \newcommand{\proba}[1]{\prob(\, #1 \,)}
# \newcommand{\set}[1]{\left\{\,#1\,\right\}}
# $
# $
# \newcommand{\Unit}[1]{{\mathrm{~#1}}} % define unit
# \newcommand{\um}{\mu\mathrm{m}}
# \newcommand{\erg}{\Unit{erg}}
# \newcommand{\ergs}{\Unit{erg~s}}
# \newcommand{\yr}{\Unit{yr}}
# \newcommand{\Myr}{\Unit{Myr}}
# \newcommand{\Gyr}{\Unit{Gyr}}
# \newcommand{\pc}{\Unit{pc}}
# \newcommand{\kpc}{\Unit{kpc}}
# \newcommand{\Mpc}{\Unit{Mpc}}
# \newcommand{\Lsun}{\Unit{L}_{\odot}}
# \newcommand{\Zsun}{\Unit{Z}_{\odot}}
# \newcommand{\msun}{\Unit{M}_{\odot}}
# \newcommand{\kms}{\Unit{km\,s^{-1}}}
# \newcommand{\ang}{\AA} % Angstrom unit
# \newcommand{\degpoint}{\mbox{$^\circ\mskip-7.0mu.\,$}}
# \newcommand{\halpha}{\mbox{H$\alpha$}}
# \newcommand{\hbeta}{\mbox{H$\beta$}}
# \newcommand{\hgamma}{\mbox{H$\gamma$}}
# \newcommand{\lya}{\mbox{Ly$\alpha$}}
# \newcommand{\lyb}{\mbox{Ly$\beta$}}
# \newcommand{\minpoint}{\mbox{$'\mskip-4.7mu.\mskip0.8mu$}}
# \newcommand{\mv}{\mbox{$m_{_V}$}}
# \newcommand{\Mv}{\mbox{$M_{_V}$}}
# \newcommand{\peryr}{\mbox{$\>\rm yr^{-1}$}}
# \newcommand{\secpoint}{\mbox{$''\mskip-7.6mu.\,$}}
# \newcommand{\sqdeg}{\mbox{${\rm deg}^2$}}
# \newcommand{\squig}{\sim\!\!}
# \newcommand{\subsun}{\mbox{$_{\normalsize\odot}$}}
# \newcommand{\sq}{\mbox{\rlap{$\sqcap$}$\sqcup$}}%
# \newcommand{\arcdeg}{\mbox{$^\circ$}}%
# \newcommand{\arcmin}{\mbox{$^\prime$}}%
# \newcommand{\arcsec}{\mbox{$^{\prime\prime}$}}%
# \newcommand{\fd}{\mbox{$.\!\!^{\mathrm d}$}}%
# \newcommand{\fh}{\mbox{$.\!\!^{\mathrm h}$}}%
# \newcommand{\fm}{\mbox{$.\!\!^{\mathrm m}$}}%
# \newcommand{\fs}{\mbox{$.\!\!^{\mathrm s}$}}%
# \newcommand{\fdg}{\mbox{$.\!\!^\circ$}}%
# \newcommand{\slantfrac}{\case}%
# \newcommand{\onehalf}{\slantfrac{1}{2}}%
# \newcommand{\onethird}{\slantfrac{1}{3}}%
# \newcommand{\twothirds}{\slantfrac{2}{3}}%
# \newcommand{\onequarter}{\slantfrac{1}{4}}%
# \newcommand{\threequarters}{\slantfrac{3}{4}}%
# \newcommand{\ubvr}{\mbox{$U\!BV\!R$}}% % UBVR system
# \newcommand{\ub}{\mbox{$U\!-\!B$}}%   % U-B
# \newcommand{\bv}{\mbox{$B\!-\!V$}}%   % B-V
# \newcommand{\vr}{\mbox{$V\!-\!R$}}%   % V-R
# \newcommand{\ur}{\mbox{$U\!-\!R$}}%   % U-R
# \newcommand{\ion}[2]{#1$\;${\small\rmfamily\@Roman{#2}}\relax}%
# \newcommand{\nodata}{ ~$\cdots$~ }%
# \newcommand{\diameter}{\ooalign{\hfil/\hfil\crcr\mathhexbox20D}}%
# \newcommand{\degr}{\arcdeg}%
# \newcommand{\sun}{\odot}%
# \newcommand{\Sun}{\sun}%
# \newcommand{\Sol}{\sun}%
# \newcommand{\Av}{\ensuremath{{\mathrm{A}}_{\mathrm{V}}}}
# $

# # Exercises

# ## Exercise 1 (30 points)
#
# A survey of $50$ galaxies classifies $17$ of them as having "active nuclei".
# On the basis of one theory, model $M_1$ says that the fraction, $p$, of active
# nuclei in such a survey is $1/3$. Another model, $M_2$, says that this fraction is $1/2$ with a standard
# deviation, $\sigma$, of $0.1$ (i.e., this model says that there is a natural variation in $p$).
#
# Interpreting this latter statement as a beta distribution, what is the Bayes factor, $\proba{D \given M_1} / \proba{D \given M_2}$, for these two models?
#
# There is some dispute amongst supporters of $M_2$ as to the value of $\sigma$. Plot the Bayes
# factor as a function of $\sigma$ as you vary it from $0.02$ to $0.4$ in small steps.
#
# For what values of $\sigma$, if any, does the evidence favour $M_2$ over $M_1$?

# ## Exercise 2 (40 points)
#
# Use Bayesian model comparison to decide whether a quadratic model or a linear model is a better
# fit of $y$ vs. $x$ to the data in the file 2Dline `modelcomparison.dat`. That is, calculate the evidence for
# each model and then their Bayes factor. The quadratic model is defined as
# $$y = a_0 + a_1\,x + a_2\,x^2 + \mathcal{N}(0, \sigma)$$
# The linear model is the same with $a_2 = 0$.
#
# Use the following priors for the quadratic model:
# \begin{eqnarray}
# \proba{a_0} = \mathcal{N}(0, 4)\\
# \proba{\arctan a_1} = \mathcal{U}(-\pi/2, \pi/2)\\
# \proba{a_2} = \mathcal{N}(0, 1)\\
# \proba{\log_{10} \sigma} = \mathcal{U}(-\log_{10} 2, \log_{10} 2)
# \end{eqnarray}
# and the same priors for the linear model, but of course without $a_2$.
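# One workable way to estimate each evidence is simple Monte Carlo over the priors: draw parameter
# samples from the priors above and average the likelihood. The sketch below illustrates this; the
# synthetic stand-in data, seed, and sample size are assumptions for demonstration only — in the
# exercise you would load the `modelcomparison.dat` data instead.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the data set (assumption: ~20 (x, y) pairs).
x = np.linspace(-1.0, 1.0, 20)
y = 0.5 + 1.2 * x + rng.normal(0.0, 0.8, x.size)

def log_evidence(x, y, quadratic=True, n=100_000):
    """Estimate log Z = log < L(theta) >_prior by sampling the priors."""
    a0 = rng.normal(0.0, 4.0, n)                                 # P(a0) = N(0, 4)
    a1 = np.tan(rng.uniform(-np.pi / 2, np.pi / 2, n))           # P(arctan a1) = U(-pi/2, pi/2)
    a2 = rng.normal(0.0, 1.0, n) if quadratic else np.zeros(n)   # P(a2) = N(0, 1)
    sigma = 10.0 ** rng.uniform(-np.log10(2), np.log10(2), n)    # P(log10 sigma) = U(...)
    # Gaussian log-likelihood of all data points, for every prior sample.
    resid = y[None, :] - (a0[:, None] + a1[:, None] * x[None, :]
                          + a2[:, None] * x[None, :] ** 2)
    loglike = np.sum(-0.5 * (resid / sigma[:, None]) ** 2
                     - np.log(sigma[:, None] * np.sqrt(2.0 * np.pi)), axis=1)
    # log-mean-exp for numerical stability.
    m = loglike.max()
    return m + np.log(np.mean(np.exp(loglike - m)))

log_bf = log_evidence(x, y, quadratic=False) - log_evidence(x, y, quadratic=True)
print(f"log10 Bayes factor (linear / quadratic) = {log_bf / np.log(10):.2f}")
```

# Repeating the run with different seeds (or more samples) gives a feel for the Monte Carlo noise,
# which is part of judging how conclusive the comparison is.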
# $\mathcal{N}(\alpha, \beta)$ indicates a Gaussian with mean $\alpha$ and standard deviation $\beta$. $\mathcal{U}(\alpha, \beta)$ indicates a distribution which is uniform between $\alpha$ and $\beta$, and zero outside. Use at least $10^5$ samples of the prior. How conclusive is your analysis?
# Investigate how sensitive your results are to the priors.

# ## Exercise 3 (30 points)
#
# When maximum a posteriori differs from the Bayesian approach.
#
# Carol wants to play a strange game with Bob and Alice. In front of them is a rectangular table. Carol asks Bob and Alice to take positions along one of the short sides of the table. Carol virtually cuts the table at a random position parallel to the long edge, defining a line that Alice and Bob do not know. This line stays fixed for the whole game. Carol explains the game to the players: "You will throw pucks onto the table wherever you want. Any puck stopping on the left side of the line I have defined will give one point to Alice, and any on the right side will give one to Bob. The first to get 6 points wins."
#
# After each player has thrown 4 pucks (8 in total), Carol announces that Alice has 5 points and Bob has 3.
#
# Alice is one point away from winning, and the placement of the line seems to favour her. Thus, it seems likely that the next round will go to her as well. She has three opportunities to get a favourable throw before Bob can win, whereas Bob must win the next three rounds to win the game. Quantitatively, what is the probability that Bob will nonetheless win the game?
#
#     ++==============================================================+ p=0
#     ||            x                                                 |
#     ||     x               x                                        |
#     ++----- virtual cut -------------------------------------------+ p
#     ||         x         x                                          |
#     ||                                                              |
#     ||   x                                                          |
#     ||                x                                             |
#     ||          x                                                   |
#     ++==============================================================+ p=1
#
# To answer this problem, we can adopt at least two approaches: maximum a posteriori (MAP) or Bayesian.
# The following questions will help you answer using both methods.
#
# 1) **MAP approach**
#
# 1.1) Given the current score, estimate where the dividing line sits. Use $p$ as the probability that any toss lands in Alice's favor.
#
# 1.2) Write down an analytic expression for the probability that Bob wins the game, and evaluate it using the given data.
#
# 2) **Bayesian approach**
#
# To help you out, we consider the following random variables:
#
# * $B$ = Bob wins the game,
# * $D$ = observed data, i.e., $D = (n_A, n_B) = (5, 3)$, the respective numbers of points for Alice and Bob,
# * $p$ = unknown probability that a puck lands on Alice's side in any throw.
#
# The problem translates into finding $\proba{B \given D}$, the posterior probability that Bob wins given our observation that Alice currently has five points and Bob has only three.
# $\proba{B \given D}$ does not depend on $p$. Therefore, we marginalize over the unknown parameter $p$ (a nuisance parameter). In this case, that means we express the posterior probability as
# $$\proba{B \given D} = \int_{0}^{1} \proba{B, p \given D}\, dp$$
#
# 2.1) Write down an expression for $\proba{B, p \given D}$ in terms of known quantities using the law of joint probability and Bayes' theorem. Then show that
# $$\proba{B \given D} = \frac{\int_0^1 \pa{1 - p}^6 p^5\, dp}{\int_0^1 \pa{1 - p}^3 p^5\, dp}$$
#
# 2.2) Give the numerical value of $\proba{B \given D}$ in this approach.
#
# **hint**: realize that you know $\proba{B \given p, D}$.
#
# 3) Compare the values from both approaches.
#
# 4) We will make a Monte Carlo simulation of the entire game to see which approach gives a result closer to the simulation result.
#
# 4.1) How many pucks at most need to be thrown to get to the end of any one game?
#
# 4.2) Draw $10^6$ random values of $p$ and, for each, draw as many puck positions (from 0 to 1, corresponding to the left to right position) as necessary to finish the game, and count the points.
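# The simulation described in 4.2 can be sketched in vectorised form as below. The variable names
# and the seed are illustrative; the cap of 11 throws per game encodes the answer to 4.1 (a
# first-to-six game lasts at most 6 + 5 throws).

```python
import numpy as np

rng = np.random.default_rng(0)

n_games = 1_000_000
p = rng.uniform(0.0, 1.0, n_games)       # unknown cut position, one per game
# Pre-throw the maximum 11 pucks per game; a throw landing below p scores for Alice.
alice = rng.uniform(0.0, 1.0, (n_games, 11)) < p[:, None]

# Keep only the games matching the observation: Alice 5, Bob 3 after 8 throws.
match = alice[:, :8].sum(axis=1) == 5

# In a matching game, Bob wins iff he takes all of throws 9-11
# (any Alice point ends the game in her favour first).
bob_wins = ~alice[match, 8:].any(axis=1)
print(f"matching games: {match.sum()},  P(Bob wins) ~ {bob_wins.mean():.3f}")
```

# Pre-throwing all 11 pucks is equivalent to stopping early, because the discarded throws after a
# win cannot change who reached six points first.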
# Select only the games that reproduce the current conditions after 8 pucks are thrown, and conclude from this simulation how often Bob wins.
#
# 5) In light of this test, which approach should you adopt for this particular problem?
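# Once you have derived your own expressions, both approaches can be cross-checked numerically
# against the simulation. The sketch below (standard library only) uses the Beta-function identity
# $\int_0^1 p^{a-1}(1-p)^{b-1}\, dp = B(a, b)$ to evaluate the ratio of integrals from 2.1.

```python
from math import gamma

def beta(a, b):
    """Euler Beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

# MAP: plug in the point estimate p_hat = 5/8, then P(Bob wins) = (1 - p_hat)^3.
p_map = (1.0 - 5.0 / 8.0) ** 3

# Bayesian: ratio of the two integrals from question 2.1.
p_bayes = beta(6, 7) / beta(6, 4)

print(f"MAP:      {p_map:.4f}")
print(f"Bayesian: {p_bayes:.4f}")
```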