#!/usr/bin/env python
# coding: utf-8

# # Chapter 7: MCMC

# ## Exercise 1 (35 points)
#
# Let's play with a die and Markov processes...
#
# You set the 6-sided die on the table with the one-face up, then you close your eyes and tip it once [not roll] (the one-face is now on one side), and tip it again, and so on, until you have tipped it $20$ times.
#
# We note:
# * $X_k = (x_1,...,x_6)$ the state of the die at the $k$th tipping, where $x_i$ is the event that face $i$ is up,
# * and $P_k = P(X_k | X_{k-1}) = P(x_1, ..., x_6 | X_{k-1})$ the probability of the die state given the previous configuration, where the subscript on the $P$ and $X$ indicates how many tippings have occurred.
#
# The probability distribution for the initial state is
# $$P_0 = P(X_0) = (p(x_1), ..., p(x_6)) = (1, 0, 0, 0, 0, 0)$$
# where the six-tuple gives the probabilities of 1-face up, 2-face up, and so on. Note that $X_0$ is imposed and therefore there is no conditional probability.
#
# 1. At any further state $(k>0)$, the up-face after each tipping depends on the up-face of the previous state. What is the probability distribution after one tipping, $P_1$? What is $P_2$?
#
# 2. More generally we want to find $P_k$. Explain why the probability of the new state does not depend on more than just the last previous step:
# $$P(X_{k+1} | X_0, X_1, ..., X_k) = P(X_{k+1} | X_k)$$
#
# 3. Let's consider the transition probability from an up-face $i$ to an up-face $j$ after tipping, $p(x_j | x_i)$. Find the (6x6) transition matrix $T$ from the $k$th state to the $(k+1)$th state, in which $T[i,j] = p(x_j|x_i)$. Justify that
# $P(X_{k+1} | X_k) = T \cdot P(X_k | X_{k-1})$ (i.e., $T$ does not depend on $k$).
#
# 4. Explain why this transition matrix $T$ does not depend on $k$. Deduce $P(X_{k+1} | X_0)$.
#
# 5. Using the above, write a piece of code that simulates the tipping (a minimal sketch is given at the end of this notebook). Using 1 as the initial up-face, what is $P_{20}$?
#
# 6. Using your code and the same initial state, plot the evolution of the different face-up probabilities as a function of the number of tippings. What do you observe?
#
# 7. How many tippings $k$ are required to obtain a uniform probability $P_k$ to within a precision of $10^{-5}$? What is this value, and what interpretation can you make of it given the problem?
#
# 8. What happens if, instead of a fixed $X_0$, you roll the die? Explain the changes you need to make. What do you expect the chain of probabilities to become? Simulate a few tippings with your updates and confirm this behavior.

# ## Exercise 2 (50 points)
#
# Line fitting using Monte Carlo sampling of the posterior. Here we explore using the Metropolis Monte Carlo method to "fit" a linear function ($y = a_0 + a_1 x + \epsilon$) to the dataset `2Dline.dat`, following the approach covered in section 7.6 (make sure you have read this section carefully).
#
# > You are advised to use the R code provided (these are all online -- see the course web site), which means that you will need to do little coding, but you will need to make some minor modifications for this exercise. You should therefore first look over the code to understand what it is doing, and not try to use it blindly. Note that you should not use the initial part of `linearmodel_functions.R` to simulate new data!
#
# > You are of course free to use some other code or programming language instead (a Python sketch of a Metropolis sampler is given at the end of this notebook).
#
# 1. Inspect the data and explicitly assign suitable priors to the parameters. Identify also a sensible initial position for the Monte Carlo sampling and sensible step sizes.
# (in the R code: the priors are defined in the function `logprior.linearmodel` in the file `linearmodel_functions.R`).
#
# 2. Run the MCMC algorithm to sample the posterior. Experiment with using different step sizes, starting points, and numbers of samples in the MCMC algorithm. Plot (for yourself) the chains and posterior PDFs (as in Figure 7.5). You do not need to hand in your plots, but settle on a choice of step sizes and initial value, and explain briefly how sensitive your resulting posteriors are to these numbers.
#
# 3. Using your chosen step sizes and initial value, and a burn-in of $10^2$ samples, use the Metropolis algorithm to generate $10^5$ samples from the posterior, treating $a_0$, $a_1$, and $\sigma$ as free parameters.
#
# 4. Plot the samples: (i) in 2D as scatter plots, (ii) in 1D as a distribution (use a histogram or kernel density estimation). Plot the data and the fit as well as the uncertainties on the fit (you can for instance draw some/many models from your Metropolis samples and plot them).
#
# 5. Calculate the mean of the three 1D marginal posterior PDFs (i.e., for each parameter) as well as the mode of the full 3D posterior.
#
# 6. For each 1D marginal posterior, calculate the standard deviation, as well as the $0.159$ and $0.841$ quantiles (see the lecture notes for how to do this). Together these quantiles form a $0.841-0.159=0.682$ confidence interval (for each parameter), whose width would be the same as twice the standard deviation if the 1D marginal posteriors were Gaussian.
#
# 7. Repeat steps (3) to (6) using just lines 1, 4, 7, 10, 13 (i.e., every third line) of the data set. By how much do your 1D means (means of the 1D marginalized posteriors) change? Are these changes commensurate with the standard deviations of the 1D posteriors? In other words, do the standard deviations characterise the precisions reasonably well?

# ## Exercise 3 (15 points)
#
# 1. Alice has two children. At a charity event she wins a makeup set, but doesn't want to keep it. Carol, who does not know her, suggests "If you have a daughter, you could give it to her". Alice replies "I do, so I'll do that". What is the probability that Alice has a son?
#
# 2. Bob has two children. One of them is a daughter. What's the probability that Bob has a son?
#
# Give an intuitive explanation for both of your answers, and explain why they are the same or are different.
#
# **everything is in the explanation!**
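# ## Appendix: illustrative code sketches

# A minimal sketch of the tipping simulation referred to in Exercise 1, questions 5-7. It is
# only one possible implementation: it assumes that each of the four tipping directions is
# equally likely and that opposite faces of the die sum to 7, and the helper names
# (`transition_matrix`, `evolve`) are just placeholders.

import numpy as np


def transition_matrix():
    """Return T with T[i, j] = p(face j+1 up | face i+1 up) after one tipping."""
    T = np.zeros((6, 6))
    for i in range(6):
        for j in range(6):
            # A tip moves the up-face to one of the 4 adjacent faces, never to the same
            # face or to the opposite face (opposite faces give i + j = 5 in 0-based
            # indexing, since their values sum to 7).
            if j != i and i + j != 5:
                T[i, j] = 0.25
    return T


def evolve(P0, n_tippings):
    """Propagate the initial face-up distribution P0 through n_tippings tippings."""
    T = transition_matrix()
    P = np.asarray(P0, dtype=float)
    history = [P.copy()]
    for _ in range(n_tippings):
        P = P @ T  # P_{k+1}[j] = sum_i P_k[i] * T[i, j]
        history.append(P.copy())
    return np.array(history)


# Example: start with the one-face up and inspect P_20.
history = evolve([1, 0, 0, 0, 0, 0], 20)
print(history[20])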
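# A minimal Python sketch of a Metropolis sampler for Exercise 2, for those not using the
# provided R code. The Gaussian proposal acts on (a0, a1, log sigma); the flat prior, the
# step sizes, the starting point, and the assumed two-column (x, y) layout of `2Dline.dat`
# are placeholders to be replaced by your own choices after inspecting the data.

import numpy as np

x, y = np.loadtxt("2Dline.dat", unpack=True)[:2]  # adapt to the actual file layout


def log_posterior(theta):
    """Unnormalised log posterior of (a0, a1, log sigma) for the straight-line model."""
    a0, a1, log_sigma = theta
    sigma = np.exp(log_sigma)
    log_prior = 0.0  # placeholder flat prior; substitute the priors you chose in question 1
    resid = y - (a0 + a1 * x)
    log_like = -0.5 * np.sum((resid / sigma) ** 2) - len(x) * np.log(sigma)
    return log_prior + log_like


def metropolis(theta0, step, n_samples, seed=None):
    """Generate n_samples Metropolis samples using independent Gaussian proposal steps."""
    rng = np.random.default_rng(seed)
    step = np.asarray(step, dtype=float)
    theta = np.asarray(theta0, dtype=float)
    logp = log_posterior(theta)
    samples = np.empty((n_samples, theta.size))
    for k in range(n_samples):
        proposal = theta + step * rng.standard_normal(theta.size)
        logp_new = log_posterior(proposal)
        if np.log(rng.uniform()) < logp_new - logp:  # accept with probability min(1, ratio)
            theta, logp = proposal, logp_new
        samples[k] = theta
    return samples


# Example with placeholder starting point and step sizes; discard a burn-in before analysis.
chain = metropolis(theta0=[0.0, 1.0, 0.0], step=[0.1, 0.1, 0.1], n_samples=100_000)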