Inference and Reasoning with Bayesian Networks

COMP4670/8600 - Introduction to Statistical Machine Learning - Assignment 2

Name: SOLUTION FILE

Student ID:

Instructions

Notes
Maximum marks: 19
Weight: 19% of final grade
Format: Complete this IPython notebook. Do not forget to fill in your name and student ID above.
Submission mode: Use Wattle.
Formulas: All formulas which you derive need to be explained unless you use very common mathematical facts. Picture yourself explaining your arguments to somebody who is just learning about your assignment. In other words, do not assume that the person marking your assignment knows all the background and can follow formulas written down without any explanation. It is your task to convince the reader that you know what you are doing when you derive an argument. Typeset all formulas in $\LaTeX$.
Code quality: Python code should be well structured, use meaningful identifiers for variables and subroutines, and provide sufficient comments. Please refer to the examples given in the tutorials.
Code efficiency: An efficient implementation of an algorithm uses fast subroutines provided by the language or additional libraries. For the purpose of implementing machine learning algorithms in this course, that means using the appropriate data structures provided by Python and numpy/scipy (e.g. linear algebra routines and random number generators).
Cooperation: All assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (please see the ANU policies on Academic Honesty and Plagiarism). Hence, for example, code for programming assignments must not be developed in groups, nor should code be shared. You are encouraged to broadly discuss ideas, approaches and techniques with a few other students, but not at a level of detail where specific solutions or implementation issues are described by anyone. If you choose to consult with other students, include the names of your discussion partners for each solution. If you have any questions on this, please ask the lecturer before you act.

$\newcommand{\dotprod}[2]{\left\langle #1, #2 \right\rangle}$ $\newcommand{\onevec}{\mathbb{1}}$

Setting up the environment (there is some hidden LaTeX which needs to be run in this cell).

In [ ]:
import itertools, copy
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display, Image

%matplotlib inline

Part 1: Graphical Models

Problem setting

We are interested in predicting the outcome of the election in an imaginary country called Under Some Assumptions (USA). There are four candidates for whom the citizens can Vote: Bernie, Donald, Hillary, and Ted. The citizens live in four Regions: north, south, east and west. We have general demographic information about the people, namely their Gender (male, female) and Handedness (right, left). Based on surveys done by an external company, we believe that Region and Gender affect whether people wear their Jackets full time, part time or never. Surprisingly, the company told us that the Age of people's shoes (new, worn, old) depends on how often they wear their Jackets. Furthermore, Gender and preferred Hand affect the Colour of their hat (white, black). Finally, surveys say that the citizens will Vote based on their Region, the Age of their shoes and the Colour of their hats.

The directed graphical model is depicted below:

In [ ]:
Image(url="https://machlearn.gitlab.io/isml2017/assignments/election_model.png")

Conditional probability tables

After paying the survey firm some more money, they provided the following conditional probability tables.

| $p(R)$ | R=n | R=s | R=e | R=w |
|---|---|---|---|---|
| marginal | 0.2 | 0.1 | 0.5 | 0.2 |

| $p(G)$ | G=m | G=f |
|---|---|---|
| marginal | 0.3 | 0.7 |

| $p(H)$ | H=r | H=l |
|---|---|---|
| marginal | 0.9 | 0.1 |

| $p(J \mid R,G)$ | R=n,G=m | R=n,G=f | R=s,G=m | R=s,G=f | R=e,G=m | R=e,G=f | R=w,G=m | R=w,G=f |
|---|---|---|---|---|---|---|---|---|
| J=full | 0.9 | 0.8 | 0.1 | 0.3 | 0.4 | 0.01 | 0.02 | 0.2 |
| J=part | 0.08 | 0.17 | 0.03 | 0.35 | 0.05 | 0.01 | 0.2 | 0.08 |
| J=never | 0.02 | 0.03 | 0.87 | 0.35 | 0.55 | 0.98 | 0.78 | 0.72 |

| $p(A \mid J)$ | J=full | J=part | J=never |
|---|---|---|---|
| A=new | 0.01 | 0.96 | 0.3 |
| A=worn | 0.98 | 0.03 | 0.5 |
| A=old | 0.01 | 0.01 | 0.2 |

| $p(C \mid G,H)$ | G=m,H=r | G=m,H=l | G=f,H=r | G=f,H=l |
|---|---|---|---|---|
| C=black | 0.9 | 0.83 | 0.17 | 0.3 |
| C=white | 0.1 | 0.17 | 0.83 | 0.7 |

The final conditional probability table is given by the matrix below. The order of its rows and columns is also given below.

In [ ]:
vote_column_names = ['north,new,black', 'north,new,white', 'north,worn,black', 'north,worn,white', 
                'north,old,black', 'north,old,white', 'south,new,black', 'south,new,white', 
                'south,worn,black', 'south,worn,white', 'south,old,black', 'south,old,white', 
                'east,new,black', 'east,new,white', 'east,worn,black', 'east,worn,white', 
                'east,old,black', 'east,old,white', 'west,new,black', 'west,new,white', 
                'west,worn,black', 'west,worn,white', 'west,old,black', 'west,old,white']

vote_outcomes = ('bernie','donald','hillary','ted')

vote_pmf_array = np.array(
        [
            [0.1,0.1,0.4,0.02,0.2,0.1,0.1,0.04,0.2,0.1,0.1 ,0.1,0.4 ,0.1 ,0.1,0.1 ,0.1,0.04,0.3,0.2,0.1,0.3,0.34,0.35],
            [0.3,0.4,0.2,0.5 ,0.1,0.2,0.1,0.5 ,0.1,0.2,0.5 ,0.3,0.2 ,0.42,0.2,0.67,0.4,0.4 ,0.1,0.1,0.5,0.1,0.1 ,0.1],
            [0.5,0.4,0.3,0.3 ,0.5,0.6,0.6,0.3 ,0.5,0.4,0.36,0.3,0.28,0.3 ,0.4,0.1 ,0.4,0.16,0.4,0.2,0.3,0.3,0.4 ,0.5],
            [0.1,0.1,0.1,0.18,0.2,0.1,0.2,0.16,0.2,0.3,0.04,0.3,0.12,0.18,0.3,0.13,0.1,0.4 ,0.2,0.5,0.1,0.3,0.16,0.05],
        ]
)

The 7 conditional probability tables above are encoded in Python below.

Base your subsequent computations on these objects.

In [ ]:
class RandomVariable(object):
    """A discrete random variable together with its conditional probability table.

    `pmfs` maps each tuple of parent outcomes (the empty tuple for root
    variables) to a dictionary from this variable's outcomes to probabilities.
    """
    def __init__(self, name, parents, outcomes, pmf_array):
        assert isinstance(name, str)
        assert isinstance(parents, tuple)
        assert all(isinstance(_, RandomVariable) for _ in parents)
        assert isinstance(outcomes, tuple)
        assert all(isinstance(_, str) for _ in outcomes)
        assert isinstance(pmf_array, np.ndarray)
        # One key per joint outcome of the parents; just () if there are no parents.
        keys = tuple(map(tuple, itertools.product(*[_.outcomes for _ in parents])))
        # Each column of pmf_array must be a valid pmf over `outcomes`.
        assert np.allclose(np.sum(pmf_array, 0), 1)
        expected_shape = (len(outcomes), len(keys))
        assert pmf_array.shape == expected_shape, (pmf_array.shape, expected_shape)
        pmfs = {k: {outcome: probability for outcome, probability in zip(outcomes, probabilities)} 
                for k, probabilities in zip(keys, pmf_array.T)}
        self.name, self.parents, self.outcomes, self.pmfs = name, parents, outcomes, pmfs


class BayesianNetwork(object):
    def __init__(self, *random_variables):
        assert all(isinstance(_, RandomVariable) for _ in random_variables)
        self.random_variables = random_variables
        

region_pmf_array = np.array([[0.2, 0.1, 0.5, 0.2]]).T
region = RandomVariable(
    name='region',
    parents=tuple(),
    outcomes=('north', 'south', 'east', 'west'), 
    pmf_array = region_pmf_array,
)

gender_pmf_array = np.array([[0.3, 0.7]]).T
gender = RandomVariable(
    name='gender',
    parents=tuple(),
    outcomes=('male', 'female'), 
    pmf_array = gender_pmf_array
)

hand_pmf_array = np.array([[0.9, 0.1]]).T
hand = RandomVariable(
    name='hand',
    parents=tuple(),
    outcomes=('right', 'left'),  # ordered to match the table above: p(H=r)=0.9, p(H=l)=0.1
    pmf_array = hand_pmf_array
)

jacket_pmf_array = np.array(
        [
            [0.9,0.8,0.1,0.3,0.4,0.01,0.02,0.2],
            [0.08,0.17,0.03,0.35,0.05,0.01,0.2,0.08],
            [0.02,0.03,0.87,0.35,0.55,0.98,0.78,0.72],
        ]
    )
jacket = RandomVariable(
    name='jacket',
    parents=(region, gender),
    outcomes=('full', 'part', 'never'), 
    pmf_array = jacket_pmf_array
)

age_pmf_array = np.array(
        [
            [0.01,0.96,0.3],
            [0.98,0.03,0.5],
            [0.01,0.01,0.2],
        ]
    )
age = RandomVariable(
    name='age',
    parents=(jacket, ),
    outcomes=('new', 'worn', 'old'), 
    pmf_array = age_pmf_array
)

colour_pmf_array = np.array(
        [
            [0.9,0.83,0.17,0.3],
            [0.1,0.17,0.83,0.7],
        ]
    )
colour = RandomVariable(
    name='colour',
    parents=(gender, hand),
    outcomes=('black', 'white'),
    pmf_array = colour_pmf_array
)

vote = RandomVariable(
    name='vote',
    parents=(region, age, colour),
    outcomes=vote_outcomes,
    pmf_array = vote_pmf_array
)


election_model = BayesianNetwork(region, gender, hand, jacket, age, colour, vote)

1A (1 mark) Joint distribution of a subset

  • Compute the joint distribution of Jacket, Region and Gender. Print the resulting distribution in a tab-separated format.
In [ ]:
# Solution goes here
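
Since $J$ has parents $R$ and $G$, the joint factorises as $p(J,R,G) = p(J \mid R,G)\,p(R)\,p(G)$. Below is a minimal sketch of one possible computation (not necessarily the intended solution), reading the factors directly from the `pmfs` dictionaries defined above:

In [ ]:
# Sketch: enumerate all (jacket, region, gender) combinations and multiply
# the corresponding factors of p(J,R,G) = p(J|R,G) p(R) p(G).
print('jacket\tregion\tgender\tprobability')
for r in region.outcomes:
    for g in gender.outcomes:
        for j in jacket.outcomes:
            p_jrg = jacket.pmfs[(r, g)][j] * region.pmfs[()][r] * gender.pmfs[()][g]
            print('{}\t{}\t{}\t{:.6f}'.format(j, r, g, p_jrg))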

1B1 (2 marks) Variable Ordering

  1. Implement a function which determines an appropriate ordering of the variables for use in the following scheme:
    • For the first node R, draw a sample from p(R).
    • For each subsequent node, draw a sample from the conditional distribution $p(X \,|\, \text{parents}(X))$ where $\text{parents}(X)$ are the parents of the variable $X$ in the graphical model.
  2. Use your function to compute such an ordering and print the result in a human friendly format.
In [ ]:
# Solution goes here
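
A minimal sketch of one way to do this (the helper name `topological_ordering` is illustrative): repeatedly place any variable whose parents have all been placed already, i.e. a simple topological sort, which terminates because the model is a DAG.

In [ ]:
def topological_ordering(network):
    """Order the variables so that every variable appears after its parents."""
    ordered = []
    remaining = list(network.random_variables)
    while remaining:
        for rv in remaining:
            # A variable is ready once all of its parents have been placed.
            if all(parent in ordered for parent in rv.parents):
                ordered.append(rv)
                remaining.remove(rv)
                break
    return tuple(ordered)

ordering = topological_ordering(election_model)
print(' -> '.join(rv.name for rv in ordering))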

1B2 (2 marks) Sampling

  1. Given the ordering you have determined above, implement the sampler described in the previous question. If you were unable to compute the ordering in the previous question, use ordering = (hand, region, gender, colour, jacket, age, vote).
  2. Draw a single sample and print the result in a human friendly format.
In [ ]:
# Solution goes here
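
A minimal sketch of the ancestral sampler, assuming the `ordering` computed above: because parents precede children in the ordering, the conditioning key for each variable is always available by the time it is sampled.

In [ ]:
def draw_sample(ordering):
    """Draw one joint sample by sampling each variable given its sampled parents."""
    sample = {}
    for rv in ordering:
        # Look up the conditional pmf for the already-sampled parent outcomes.
        key = tuple(sample[parent.name] for parent in rv.parents)
        pmf = rv.pmfs[key]
        outcomes = list(pmf.keys())
        probabilities = [pmf[outcome] for outcome in outcomes]
        sample[rv.name] = np.random.choice(outcomes, p=probabilities)
    return sample

print(draw_sample(ordering))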

1B3 (2 marks) Marginals

  1. Calculate (and show in LaTeX) the marginal distribution for Jacket.
  2. Implement a function which computes the marginal distribution of each variable. It should make use of the ordering used by your sampler.
  3. Implement a function which takes a list of samples from the model and computes the empirical marginal distribution of each variable.
  4. Print the theoretical and approximate marginals which you obtain, along with the absolute percent error between the two, in the format below:

region
outcome  exact  approx  error (%)
north     0.20    0.20        0.4
south     0.10    0.10        0.1
east      0.50    0.50        0.4
west      0.20    0.20        0.6

gender
outcome  exact  approx  error (%)
male      0.30    0.30        0.1
female    0.70    0.70        0.0

etc.
In [ ]:
# Solution goes here
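
For part 1, the marginal of $J$ follows by summing out its parents: $p(J=j) = \sum_{r,g} p(J=j \mid R=r, G=g)\,p(R=r)\,p(G=g)$. For parts 2 and 3, a minimal sketch (helper names are illustrative) that computes exact marginals by brute-force enumeration of the full joint, which is feasible for this small model, and empirical marginals by counting:

In [ ]:
def exact_marginals(network):
    """Exact marginals of every variable by summing over the full joint."""
    ordering = topological_ordering(network)
    names = [rv.name for rv in ordering]
    marginals = {rv.name: {o: 0.0 for o in rv.outcomes} for rv in ordering}
    for assignment in itertools.product(*[rv.outcomes for rv in ordering]):
        values = dict(zip(names, assignment))
        # Joint probability of this full assignment.
        joint = 1.0
        for rv in ordering:
            key = tuple(values[parent.name] for parent in rv.parents)
            joint *= rv.pmfs[key][values[rv.name]]
        # Accumulate it into every variable's marginal.
        for rv in ordering:
            marginals[rv.name][values[rv.name]] += joint
    return marginals

def empirical_marginals(network, samples):
    """Empirical marginals from a list of samples (dicts mapping name to outcome)."""
    marginals = {rv.name: {o: 0.0 for o in rv.outcomes}
                 for rv in network.random_variables}
    for sample in samples:
        for name, outcome in sample.items():
            marginals[name][outcome] += 1.0 / len(samples)
    return marginals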

1B4 (1 mark) Easy conditional probabilities

Compute $p(X \,|\, G=\text{female})$ for all $X$ other than $G$,

  1. Approximately, using your sampler
  2. Exactly, using your marginal calculating function from the previous question. Hint: what happens if you set $p(G=\text{female})=1$ in your model?
  3. Print the results side by side in the same format as the previous question.
  4. State for which variables other than $G$ the theoretical scheme above can be used to compute such conditionals, and why.
In [ ]:
# Solution goes here
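
A minimal sketch of the exact part, following the hint: since $G$ is a root node, clamping its prior to put all mass on female turns every marginal of the modified model into the corresponding conditional $p(X \mid G=\text{female})$. The approximate part can reuse the sampler on the clamped copy (or, equivalently, keep only the samples with gender equal to female). The name `model_female` is illustrative; `copy` is imported at the top of the notebook.

In [ ]:
# Clamp p(G=female)=1 in a deep copy so the original model is untouched.
model_female = copy.deepcopy(election_model)
for rv in model_female.random_variables:
    if rv.name == 'gender':
        rv.pmfs[()] = {'male': 0.0, 'female': 1.0}

# Every marginal of the clamped model is now a conditional given G=female.
print(exact_marginals(model_female))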

1B5 (3 marks) General conditional probabilities

  1. Write down the expression for the joint probability $p(R,G,H,J,A,C,V)$ in terms of the conditional probabilities in the graphical model.
  2. Derive $p(G = \text{male} \,|\, V = \text{Donald})$ in terms of the conditional probabilities of the graphical model.
  3. Compute and display in a human friendly format the conditional distributions $p(G=g \,|\, V=v)$ for all genders $g$ and votes $v$, by naively marginalising the other variables (other than $G$ and $V$, that is).

Solution description

In [ ]:
# Solution goes here
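
For reference, the model factorises as $p(R,G,H,J,A,C,V) = p(R)\,p(G)\,p(H)\,p(J \mid R,G)\,p(A \mid J)\,p(C \mid G,H)\,p(V \mid R,A,C)$, and Bayes' rule gives $p(G=g \mid V=v) = p(G=g, V=v) / \sum_{g'} p(G=g', V=v)$. A minimal sketch of part 3 (the helper name is illustrative), naively summing the full joint over all variables other than $G$ and $V$:

In [ ]:
def joint_gender_vote(network):
    """p(G=g, V=v) by naively summing the joint over all other variables."""
    ordering = topological_ordering(network)
    names = [rv.name for rv in ordering]
    table = {}
    for assignment in itertools.product(*[rv.outcomes for rv in ordering]):
        values = dict(zip(names, assignment))
        joint = 1.0
        for rv in ordering:
            key = tuple(values[parent.name] for parent in rv.parents)
            joint *= rv.pmfs[key][values[rv.name]]
        gv = (values['gender'], values['vote'])
        table[gv] = table.get(gv, 0.0) + joint
    return table

table = joint_gender_vote(election_model)
for v in vote_outcomes:
    normaliser = sum(table[(g, v)] for g in gender.outcomes)
    for g in gender.outcomes:
        print('p(G={} | V={}) = {:.4f}'.format(g, v, table[(g, v)] / normaliser))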

1B6 (2 marks) Variable elimination

Denote the graphical model considered thus far by $\mathcal{M}$.

  1. Derive $p(R,G,J,A,C,V)$ by marginalising over $H$ in $\mathcal{M}$.
  2. Describe how the structure (connectivity) of the new graphical model (call it $\mathcal{M}_H$) over all variables other than $H$ changes after eliminating $H$ in this way.
  3. Describe which conditional(s) in the new graphical model differ from the original model.
  4. Encode $\mathcal{M}_H$ in Python similarly to $\mathcal{M}$.
In [ ]:
# Solution goes here
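
Since the only child of $H$ is $C$, summing out $H$ only replaces the factors $p(H)$ and $p(C \mid G,H)$ by a single new conditional $p(C \mid G) = \sum_h p(C \mid G, H=h)\,p(H=h)$; everything else is unchanged. Below is a minimal sketch of one way to encode $\mathcal{M}_H$ (the names `colour_H`, `vote_H` and `election_model_H` are illustrative); note that `vote` must be rebuilt so that its parent is the new colour variable.

In [ ]:
# p(C | G) = sum_h p(C | G, h) p(h). The columns of colour_pmf_array are
# ordered (G=m,H=r), (G=m,H=l), (G=f,H=r), (G=f,H=l), and hand_pmf_array
# holds (p(H=r), p(H=l)).
hand_probs = hand_pmf_array[:, 0]
colour_given_gender = np.stack([
    colour_pmf_array[:, 0:2].dot(hand_probs),  # column for G=male
    colour_pmf_array[:, 2:4].dot(hand_probs),  # column for G=female
], axis=1)

colour_H = RandomVariable(
    name='colour',
    parents=(gender,),
    outcomes=('black', 'white'),
    pmf_array=colour_given_gender,
)
# Same vote table as before, but its colour parent is now colour_H.
vote_H = RandomVariable(
    name='vote',
    parents=(region, age, colour_H),
    outcomes=vote_outcomes,
    pmf_array=vote_pmf_array,
)
election_model_H = BayesianNetwork(region, gender, jacket, age, colour_H, vote_H)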

1B7 (3 marks) General estimation of conditional probabilities (revisited)

  1. As you did earlier, compute and display in a human friendly format the conditional distributions $p(G=g \,|\, V=v)$ for all genders $g$ and votes $v$, by naively marginalising the other variables (other than $G$ and $V$, that is). This time however, do so using $\mathcal{M}_H$ rather than $\mathcal{M}$.
  2. Quantify the computational advantages of using $\mathcal{M}_H$ in this way rather than $\mathcal{M}$.
  3. Which variable (or variables) would be the best to eliminate in this way, in terms of the aforementioned computational advantages, and why?
  4. Pick a (best) variable from the previous question and call it $X$. Assuming we have eliminated $X$, which would be the "best" variable to subsequently eliminate in a similar fashion?
In [ ]:
# Solution goes here
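
A minimal sketch for part 1: the naive marginalisation helper from before runs unchanged on $\mathcal{M}_H$. For part 2, the cost of naive enumeration is proportional to the joint state-space size, which halves because $|H| = 2$.

In [ ]:
# Naive enumeration cost is proportional to the number of joint configurations:
#   M   : |R||G||H||J||A||C||V| = 4*2*2*3*3*2*4 = 1152
#   M_H : |R||G||J||A||C||V|    = 4*2*3*3*2*4   =  576
table_H = joint_gender_vote(election_model_H)
for v in vote_outcomes:
    normaliser = sum(table_H[(g, v)] for g in gender.outcomes)
    for g in gender.outcomes:
        print('p(G={} | V={}) = {:.4f}'.format(g, v, table_H[(g, v)] / normaliser))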

Part 2: Theory

2A (3 marks) Functions of random variables

$u$ and $v$ are independently sampled from the standard uniform distribution on the unit interval, $[0,1]$.

If $s=(2u-1)^2+(2v-1)^2 \geq 1$ then $x$ is sampled from the standard normal distribution, $\mathcal{N}(0,1)$. Otherwise $x=(2u-1)\sqrt{-2 \log(s)/s}$.

How is $x$ distributed?

Solution description
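
A sketch of the reasoning (assuming $\log$ is the natural logarithm): $a = 2u-1$ and $b = 2v-1$ are independent and uniform on $[-1,1]$. Conditioned on $s = a^2 + b^2 < 1$, the point $(a,b)$ is uniform on the unit disc, and Marsaglia's polar method (a rearrangement of the Box-Muller transform) shows that
$$x = a\sqrt{-2\log(s)/s} \sim \mathcal{N}(0,1).$$
When $s \geq 1$, the scheme instead draws $x$ directly from $\mathcal{N}(0,1)$. Since $x$ is standard normal on both branches, $x \sim \mathcal{N}(0,1)$ unconditionally.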