#!/usr/bin/env python
# coding: utf-8

# # Random Variables, Expectation, Random Vectors, and Stochastic Processes
# 
# ## Random Variables
# A _real-valued random variable_ is a mapping from outcome space $\mathcal{S}$ to the real line $\Re$.
# A _real-valued random variable_ $X$ can be characterized by its probability distribution. The distribution specifies the chance that the value of $X$ will be in each set in a suitable collection of subsets of the real line $\Re$ (a collection that forms a sigma-algebra).
# There are technical requirements regarding _measurability_, which we will generally ignore.
# Perhaps the most natural mathematical setting for probability theory involves _Lebesgue integration_;
# we will largely ignore the difference between a _Riemann integral_ and a _Lebesgue integral_.
# 
# Let $P_X$ denote the probability distribution of the random variable $X$. 
# Then if $A \subset \Re$, $P_X(A) = {\mathbb P} \{ X \in A \}$.
# We write $X \sim P_X$,
# pronounced "$X$ is distributed as $P_X$" or "$X$ has distribution $P_X$." 
# 
# If two random variables $X$ and $Y$ have the same distribution, we write $X \sim Y$ and we say that $X$ and $Y$
# are _identically distributed_.
# 
# Real-valued random variables can be _continuous_, _discrete_, or _mixed (general)_.
# 
# Continuous random variables have _probability density functions_ with respect to Lebesgue measure.
# If $X$ is a continuous random variable, there is some nonnegative function $f(x)$,
# the probability density of $X$, such that
# for any (suitable) set $A \subset \Re$,
# $$
#   {\mathbb P} \{ X \in A \} = \int_A f(x) dx.
# $$
# Since ${\mathbb P} \{ X \in \Re \} = 1$, it follows that $\int_{-\infty}^\infty f(x) dx = 1$.
# 
# _Example._ 
# Let $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, with $\lambda > 0$ fixed, and $f(x) = 0$ otherwise.
# Clearly $f(x) \ge 0$.
# $$
#   \int_{-\infty}^\infty f(x) dx = \int_0^\infty \lambda e^{-\lambda x} dx
#   = - e^{-\lambda x}|_0^\infty = - 0 + 1 = 1.
# $$
# Hence, $\lambda e^{-\lambda x}$ can be the probability density of a continuous random variable.
# A random variable with this density is said to be _exponentially distributed_.
# Exponentially distributed random variables are used to model radioactive decay and the failure
# of items that do not "fatigue." For instance, the lifetime of a semiconductor after an initial
# "burn-in" period is often modeled as an exponentially distributed random variable.
# It is also a common model for the occurrence of earthquakes (although it does not fit the data well).
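# As a numerical check on the calculation above, the sketch below integrates the exponential density with `scipy.integrate.quad`.
# The rate `lam = 2.0` and the point `t = 1.5` are arbitrary illustrative choices. Any $\lambda > 0$ should give a total of 1, up to quadrature error.
# The second integral recovers the tail probability ${\mathbb P}\{X > t\} = e^{-\lambda t}$, which follows from the same antiderivative.

```python
import math
from scipy import integrate

lam = 2.0  # illustrative rate parameter (any lam > 0 works)

def exp_density(x, lam=lam):
    """Exponential density: lam * exp(-lam * x) for x >= 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

# Integrate over [0, inf); the density is zero on (-inf, 0).
total, abserr = integrate.quad(exp_density, 0, math.inf)
print(total)  # 1 up to quadrature error

# Tail probability P{X > t} = exp(-lam * t), checked against the integral.
t = 1.5
tail, _ = integrate.quad(exp_density, t, math.inf)
print(tail, math.exp(-lam * t))
```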
# 
# _Example._
# Let $a$ and $b$ be real numbers with $a < b$, and let $f(x) = \frac{1}{b-a}$, $x \in [a, b]$ and 
# $f(x)=0$, otherwise. 
# Then $f(x) \ge 0$ and $\int_{-\infty}^\infty f(x) dx = \int_a^b \frac{1}{b-a} dx = 1$,
# so $f(x)$ can be the probability density function of a continuous random variable.
# A random variable with this density is said to be _uniformly distributed on the interval $[a, b]$_.
# 
# Discrete random variables assign all their probability to some _countable_ set of points $\{x_i\}_{i=1}^n$,
# where $n$ might be infinite.
# Discrete random variables have _probability mass functions_.
# If $X$ is a discrete random variable, there is a nonnegative function $p$, the probability mass function
# of $X$, such that
# for any set $A \subset \Re$,
# $$
#   {\mathbb P} \{X \in A \} = \sum_{i: x_i \in A} p(x_i).
# $$
# For each $i$, $p(x_i) = {\mathbb P} \{X = x_i\}$, and $\sum_{i=1}^n p(x_i) = 1$.
# 
# _Example._
# Fix $\lambda > 0$.
# Let $x_i = i-1$ for $i=1, 2, \ldots$, and let $p(x_i) = e^{-\lambda} \lambda^{x_i}/x_i!$.
# Then $p(x_i) > 0$ and 
# $$ 
# \sum_{i=1}^\infty p(x_i) = e^{-\lambda} \sum_{j=0}^\infty \lambda^j/j! = e^{-\lambda} e^{\lambda} = 1.
# $$
# Hence, $p(x)$ is the probability mass function of a discrete random variable.
# A random variable with this probability mass function is said to be _Poisson distributed (with parameter
# $\lambda$)_.
# Poisson-distributed random variables are often used to model rare events.
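# The infinite sum above can be checked numerically by truncating it. The sketch below assumes $\lambda = 3$ and stops at $j = 100$; for that $\lambda$ the omitted tail is negligible.
# It also compares the hand-written mass function with `scipy.stats.poisson.pmf`.

```python
import math
from scipy import stats

lam = 3.0  # illustrative parameter

def poisson_pmf(k, lam):
    """p(k) = exp(-lam) * lam**k / k!  for k = 0, 1, 2, ..."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Truncate the infinite sum at k = 100.
partial_sum = sum(poisson_pmf(k, lam) for k in range(101))
print(partial_sum)

# Compare a few values with scipy's implementation.
for k in range(5):
    print(k, poisson_pmf(k, lam), stats.poisson.pmf(k, lam))
```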
# 
# 
# _Example._
# Let $x_i = i$ for $i=1, \ldots, n$, and let $p(x_i) = 1/n$ and $p(x) = 0$, otherwise.
# Then $p(x) \ge 0$ and $\sum_{x_i} p(x_i) = 1$.
# Hence, $p(x)$ can be the probability mass function of a discrete random variable.
# A random variable with this probability mass function is said to be _uniformly distributed on $1, \ldots, n$_.
# 
# _Example._
# Let $x_i = i-1$ for $i=1, \ldots, n+1$, and let $p(x_i) = {n \choose x_i} p^{x_i} (1-p)^{n-x_i}$, and
# $p(x) = 0$ otherwise.
# Then $p(x) \ge 0$ and 
# $$
# \sum_{x_i} p(x_i) = \sum_{j=0}^n {n \choose j} p^j (1-p)^{n-j} = 1,
# $$
# by the binomial theorem.
# Hence $p(x)$ is the probability mass function of a discrete random variable.
# A random variable with this probability mass function is said to be _binomially distributed
# with parameters $n$ and $p$_.
# The number of successes in $n$ independent trials that each have the same probability $p$ of success
# has a binomial distribution with parameters $n$ and $p$.
# For instance, the number of times a fair die lands with 3 spots showing in 10 independent rolls has
# a binomial distribution with parameters $n=10$ and $p = 1/6$.
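# For the die example, the binomial probabilities can be computed exactly with rational arithmetic. Using `fractions.Fraction` is a choice made here so that the sum comes out to exactly 1 rather than approximately 1.
# This sketch assumes Python 3.8 or later, for `math.comb`.

```python
from fractions import Fraction
from math import comb

def binom_pmf(x, n, p):
    """P{X = x} = C(n, x) p^x (1-p)^(n-x) for x = 0, ..., n."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, Fraction(1, 6)   # 10 rolls of a fair die; "success" = 3 spots showing
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

print(sum(pmf))             # exactly 1, by the binomial theorem
print(float(pmf[0]))        # chance that 3 spots never shows in 10 rolls
print(float(sum(pmf[2:])))  # chance that 3 spots shows at least twice
```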
# 
# For general random variables, the chance that $X$ is in some subset of $\Re$ cannot in general be written as
# a sum or as a Riemann integral; it is more naturally represented as a Lebesgue integral (with respect to
# a measure other than Lebesgue measure).
# For example, fix $\alpha \in (0, 1)$ and imagine a random variable $X$ that equals zero with probability $\alpha$
# and otherwise has a uniform distribution on the interval $[0, 1]$.
# Such a random variable is neither continuous (it assigns positive probability to the single point $0$) nor discrete (the remaining probability $1-\alpha$ is spread over an interval, not a countable set).
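# The mixed random variable can be simulated directly. The sketch below assumes $\alpha = 0.3$, and the seed and sample size are arbitrary.
# The fraction of draws exactly equal to 0 should be close to $\alpha$, which no continuous random variable could produce. Draws in $(0.2, 0.5]$ should occur with frequency close to $(1-\alpha) \times 0.3$.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seeded for reproducibility
alpha = 0.3                         # illustrative probability of the atom at 0
reps = 100_000

# With probability alpha, X = 0; otherwise X ~ U[0, 1].
is_atom = rng.random(reps) < alpha
x = np.where(is_atom, 0.0, rng.random(reps))

print(np.mean(x == 0))                  # close to alpha: a single point with positive probability
print(np.mean((x > 0.2) & (x <= 0.5)))  # close to (1 - alpha) * 0.3
```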
# 
# Most of the random variables in this class are either discrete or continuous.
# 
# If $X$ is a random variable such that, for some constant $x_1 \in \Re$, ${\mathbb P}(X = x_1) = 1$, $X$
# is called a _constant random variable_.

# <hr />
# ### Exercises
# 
# 1. Show analytically that the binomial probability mass function sums to one: $\sum_{x_i} p(x_i) = \sum_{j=0}^n {n \choose j} p^j (1-p)^{n-j} = 1$.
# 1. Write a Python program that verifies that equation numerically for $n=10$: for 1000 values of $p$
# equispaced on the interval $(0, 1)$, find the maximum absolute value of the difference between the sum and 1.
# 1. Let $p \in (0, 1]$; let $x_i = i$ for $i = 1, 2, \ldots$; and define $p(x_i) = (1-p)^{x_i-1}p$, and $p(x) = 0$ otherwise. Show analytically that $p(x)$ is the probability mass function of a discrete random variable.
# (A random variable with this probability mass function is said to be _geometrically distributed with parameter $p$_.)
# 
# <hr />

# ### Cumulative Distribution Functions
# 
# The _cumulative distribution function_ or _cdf_ of a real-valued random variable is the chance that the variable is less than or equal to $x$, as a function of $x$.
# Cumulative distribution functions are often denoted with capital Roman letters ($F$ is especially common notation):
# 
# $$F_X(x) \equiv \mathbb{P}(X \le x).$$
# 
# Clearly:
# 
# + $0 \le F_X(x) \le 1$
# + $F_X(x)$ is nondecreasing in $x$ (i.e., $F_X(a) \le F_X(b)$ if $a \le b$)
# + $\lim_{x \rightarrow -\infty} F_X(x) = 0$
# + $\lim_{x \rightarrow \infty} F_X(x) = 1$
# 
# The cdf of a continuous real-valued random variable is a continuous function.
# The cdf of a discrete real-valued random variable is piecewise constant, with jumps at the possible values of the random variable.
# If the cdf of a real-valued random variable has jumps and also regions where it is not constant, the random variable is neither continuous nor discrete.
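# The mixed random variable described earlier makes a concrete case. It equals 0 with probability $\alpha$ and is otherwise uniform on $[0, 1]$, so its cdf is $F(x) = 0$ for $x < 0$, $F(x) = \alpha + (1-\alpha)x$ for $0 \le x < 1$, and $F(x) = 1$ for $x \ge 1$.
# The sketch below evaluates that formula with $\alpha = 0.3$ (an arbitrary choice). The cdf jumps by $\alpha$ at 0 and then rises linearly, so it is neither piecewise constant nor continuous.

```python
import numpy as np

def mixed_cdf(x, alpha):
    """cdf of X that equals 0 with probability alpha and is otherwise U[0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, 0.0, np.where(x < 1, alpha + (1 - alpha) * x, 1.0))

alpha = 0.3
grid = np.array([-0.5, -1e-9, 0.0, 0.25, 0.5, 0.999, 1.0, 2.0])
F = mixed_cdf(grid, alpha)
print(F)  # a jump of size alpha at 0, then a linear increase on [0, 1)
```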
# 
# ### Examples
# [To Do]

# In[6]:


# boilerplate
from __future__ import division  # must precede every other statement; a no-op in Python 3
get_ipython().run_line_magic('matplotlib', 'inline')
import math
import numpy as np
import scipy as sp
from scipy import stats  # distributions
from scipy import special # special functions
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, FloatRangeSlider, fixed # interactive stuff


# In[ ]:


# Examples of densities and cdfs

# U[a,b]
def pltUnif(a, b):
    if b <= a:
        return  # the uniform distribution needs a < b
    ffac = 0.1
    s = b - a
    fudge = ffac*s
    x = np.arange(a-fudge, b+fudge, s/200)
    y = np.ones(len(x))/s   # density is 1/(b-a) on [a, b]
    y[x < a] = 0            # zero for x < a
    y[x > b] = 0            # zero for x > b
    Y = (x-a)/s             # uniform CDF is linear on [a, b]
    Y[x < a] = 0
    Y[x >= b] = 1
    plt.plot(x,y,'b-',x,Y,'r-',linewidth=2)
    plt.plot((a-fudge, b+fudge), (0.5, 0.5), 'g--')  # horizontal green dashed line at 0.5
    plt.plot((a-fudge, b+fudge), (0, 0), 'k-')  # horizontal black line at 0
    plt.xlabel('$x$')  # axis labels. Can use LaTeX math markup
    plt.ylabel(r'$f(x) = 1_{[a,b]}(x)/(b-a)$; $F(x)$')
    plt.axis([a-fudge,b+fudge,-0.1,(1+ffac)*max(1, 1/s)])  # axis limits
    plt.title('The $U[{}, {}]$ density and cdf'.format(a, b))
    plt.show()

def pltUnifRange(ab):
    a, b = ab  # FloatRangeSlider passes its value as a (lower, upper) tuple
    pltUnif(a, b)

interact(pltUnifRange,
         ab=FloatRangeSlider(min=-5, max=5, step=0.05, value=(-1, 1), description='[a, b]'))


# In[ ]:


# Exponential(lambda)

def plotExp(lam):
    ffac = 0.05
    x = np.arange(0, 5/lam, step=(5/lam)/200)
    y = sp.stats.expon.pdf(x, scale = 1/lam)
    Y = sp.stats.expon.cdf(x, scale = 1/lam)
    plt.plot(x,y,'b-',x,Y,'r-',linewidth=2)
    plt.plot((-.1, (1+ffac)*np.max(x)), (0.5, 0.5), 'g--')  # horizontal line at 0.5
    plt.plot((-.1, (1+ffac)*np.max(x)), (1, 1), 'k:')  # horizontal line at 1
    plt.xlabel('$x$')  # axis labels. Can use LaTeX math markup
    plt.ylabel(r'$f(x) = \lambda e^{-\lambda x}$; $F(x) = 1-e^{-\lambda x}$')
    plt.title(r'The exponential density and cdf for $\lambda=$' + str(lam))
    plt.axis([-.1,(1+ffac)*np.max(x),-0.1,(1+ffac)*max(1, lam)])  # axis limits
    plt.show()
    
interact(plotExp, lam=(0.5, 10, 0.5))  # lam must be positive; lam = 0 would divide by zero


# ## Jointly Distributed Random Variables
# 
# Often we work with more than one random variable at a time.
# Indeed, much of this course concerns _random vectors_, the components of which are individual
# real-valued random variables.
# 
# The _joint probability distribution_ of a collection of random variables $\{X_i\}_{i=1}^n$ gives the probability that
# the variables simultaneously fall in subsets of their possible values.
# That is, for every (suitable) subset $A \subset \Re^n$, the joint probability distribution of $\{X_i\}_{i=1}^n$
# gives ${\mathbb P} \{ (X_1, \ldots, X_n) \in A \}$.
# 
# An _event determined by the random variable $X$_ is an event of the form $X \in A$, where $A \subset \Re$.
# 
# An _event determined by the random variables $\{X_j\}_{j \in J}$_ is an event of the form
# $(X_j)_{j \in J} \in A$, where $A \subset \Re^{\#J}$.
# 
# Two random variables $X_1$ and $X_2$ are _independent_ if every event determined by $X_1$ is independent
# of every event determined by $X_2$.
# If two random variables are not independent, they are _dependent_.
# 
# A collection of random variables $\{X_i\}_{i=1}^n$ is _independent_ if every event determined by every subset
# of those variables is independent of every event determined by any disjoint subset of those variables.
# If a collection of random variables is not independent, it is _dependent_.
# 
# Loosely speaking, a collection of random variables is independent if learning the values of some of them
# tells you nothing about the values of the rest of them.
# If learning the values of some of them tells you anything about the values of the rest of them,
# the collection is dependent.
# 
# For instance, imagine tossing a fair coin twice and rolling a fair die.
# Let $X_1$ be the number of times the coin lands heads, and $X_2$ be the number of spots that show on the die.
# Then $X_1$ and $X_2$ are independent: learning how many times the coin lands heads tells you nothing about what
# the die did.
# 
# On the other hand, let $X_1$ be the number of times the coin lands heads, and let $X_2$ be the sum of the
# number of heads and the number of spots that show on the die.
# Then $X_1$ and $X_2$ are dependent. For instance, if you know the coin landed heads twice, you know that the sum
# of the number of heads and the number of spots must be at least 3.
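# The coin-and-die example can be checked exactly by enumerating the 24 equally likely outcomes. For discrete random variables, independence amounts to the joint probabilities factoring: ${\mathbb P}\{X_1 = a, X_2 = b\} = {\mathbb P}\{X_1 = a\}{\mathbb P}\{X_2 = b\}$ for all values $a, b$.
# The helper `factorizes` is a name introduced here for illustration. It tests exactly that condition, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# All 2 * 2 * 6 = 24 equally likely outcomes of two coin tosses and one die roll.
outcomes = list(product('HT', 'HT', range(1, 7)))
prob = Fraction(1, len(outcomes))

def P(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum((prob for o in outcomes if event(o)), Fraction(0))

def factorizes(f, g):
    """True if P{f = a, g = b} == P{f = a} P{g = b} for all values a, b."""
    fvals = {f(o) for o in outcomes}
    gvals = {g(o) for o in outcomes}
    return all(
        P(lambda o: f(o) == a and g(o) == b) == P(lambda o: f(o) == a) * P(lambda o: g(o) == b)
        for a in fvals for b in gvals
    )

heads = lambda o: (o[0] == 'H') + (o[1] == 'H')  # X_1: number of heads
die = lambda o: o[2]                             # spots on the die
total = lambda o: heads(o) + o[2]                # heads plus spots

print(factorizes(heads, die))    # True: independent
print(factorizes(heads, total))  # False: dependent
```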

# ## Expectation
# 
# See [SticiGui: The Long Run and the Expected Value](http://www.stat.berkeley.edu/~stark/SticiGui/Text/expectation.htm) for an elementary introduction to expectation.
# 
# The _expectation_ or _expected value_ of a random variable $X$, denoted ${\mathbb E}X$, is a probability-weighted average of its possible values.
# From a frequentist perspective, it is the long-run limit (in probability) of the average of its values in repeated experiments.
# The expected value of a real-valued random variable (when it exists) is a fixed number, not a random value.
# The expected value depends on the probability distribution of $X$ but not on any realized value of $X$.
# If two random variables have the same probability distribution, they have the same expected value.
# 
# <hr />
# ### Properties of Expectation
# 
# + For any real $\alpha \in \Re$, if ${\mathbb P} \{X = \alpha\} = 1$, then ${\mathbb E}X = \alpha$: the expected
# value of a constant random variable is that constant.
# + For any real $\alpha \in \Re$, ${\mathbb E}(\alpha X) = \alpha {\mathbb E}X$: scalar homogeneity.
# + If $X$ and $Y$ are random variables, ${\mathbb E}(X+Y) = {\mathbb E}X + {\mathbb E}Y$: additivity.
# 
# <hr />
# 
# ### Calculating Expectation
# If $X$ is a continuous real-valued random variable with density $f(x)$, then the expected value of $X$ is
# $$
#    {\mathbb E}X = \int_{-\infty}^\infty x f(x) dx,
# $$
# provided the integral exists.
# 
# If $X$ is a discrete real-valued random variable with probability mass function $p$, then the expected value of $X$ is
# $$
#    {\mathbb E}X = \sum_{i=1}^\infty x_i p(x_i),
# $$
# where $\{x_i\} = \{ x \in \Re: p(x) > 0\}$,
# provided the sum exists.
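# Both formulas translate directly into numerical calculations. This sketch assumes a uniform density on $[2, 5]$ and a Poisson mass function with $\lambda = 3$, both arbitrary choices. The expected values should be $(a+b)/2$ and $\lambda$, respectively.
# The Poisson sum is truncated at $j = 100$, which leaves a negligible tail for this $\lambda$.

```python
import math
from scipy import integrate

# Continuous case: uniform density on [a, b]; E X should be (a + b) / 2.
a, b = 2.0, 5.0  # illustrative endpoints
ex_unif, _ = integrate.quad(lambda x: x / (b - a), a, b)  # integrand x f(x) on [a, b]
print(ex_unif, (a + b) / 2)

# Discrete case: Poisson(lam); E X = sum over j of j p(j), truncated at j = 100.
lam = 3.0
ex_pois = sum(j * math.exp(-lam) * lam**j / math.factorial(j) for j in range(101))
print(ex_pois, lam)
```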

# ## Examples
# 
# ### Uniform
# Suppose $X$ has density $f(x) = \frac{1}{b-a}$ for $a \le x \le b$ and $0$ otherwise.
# Then 
# 
# $$ \mathbb{E}X = \int_{-\infty}^\infty x f(x) dx = \frac{1}{b-a} \int_a^b x dx = \frac{b^2-a^2}{2(b-a)} =
# \frac{a+b}{2}.$$
# 
# 
# 
# ### Poisson
# Suppose $X$ has a Poisson distribution with parameter $\lambda$.
# Then 
# 
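# $$\mathbb{E}X = e^{-\lambda} \sum_{j=0}^\infty j \frac{\lambda^j}{j!} = \lambda e^{-\lambda} \sum_{j=1}^\infty \frac{\lambda^{j-1}}{(j-1)!} = \lambda e^{-\lambda} \sum_{k=0}^\infty \frac{\lambda^k}{k!} = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$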

# ## Examples related to Bernoulli Trials
# 
# ### Bernoulli
# Suppose $X$ can take only two values, 0 and 1, and the probability that $X= 1$ is $p$.
# Then
# 
# $$\mathbb{E} X = 1 \times p + 0 \times (1-p) = p.$$
# 
# ### Binomial
# [To do.] Derive the Binomial distribution as the number of successes in $n$ iid Bernoulli trials.
# 
# The number of successes $X$ in $n$ trials equals the sum of indicators for the success in each trial. That is,
# 
# $$ X = \sum_{i=1}^n X_i,$$
# 
# where $X_i = 1$ if the $i$th trial results in success, and $X_i = 0$ otherwise.
# By the additive property of expectation,
# 
# $$ \mathbb{E}X = \mathbb{E} \sum_{i=1}^n X_i = \sum_{i=1}^n \mathbb{E}X_i =
# \sum_{i=1}^n p = np.$$
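# The indicator representation can be simulated directly. This sketch assumes $n = 10$ and $p = 1/6$ (the die example), and the seed and number of replications are arbitrary.
# Each row of the array holds the indicators $X_1, \ldots, X_n$, and the row sums are draws of $X$. The average of those sums should be close to $np$.

```python
import numpy as np

rng = np.random.default_rng(2024)  # seeded for reproducibility
n, p, reps = 10, 1 / 6, 200_000

# Each row holds n Bernoulli(p) indicators X_1, ..., X_n; the row sum is X.
indicators = rng.random((reps, n)) < p
X = indicators.sum(axis=1)

print(indicators.mean(axis=0))  # each column mean is close to p
print(X.mean(), n * p)          # the mean of X is close to np
```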
# 
# ### Geometric
# 
# The number of trials to the first success in iid Bernoulli($p$) trials has a _geometric distribution with parameter $p$_.
# 
# [To do.] Derive the geometric and calculate expectation.
# 
# 
# ### Negative Binomial
# The number of trials to the $k$th success in iid Bernoulli($p$) trials
# has a _negative binomial distribution with parameters $p$ and $k$_.
# 
# [To do.] Derive the negative binomial.
# 
# The number of trials $X$ until the $k$th success in iid Bernoulli trials can be written as the number of trials until the first success, plus the number of additional trials from the first success to the second, plus $\ldots$, plus the number of additional trials from the $(k-1)$st success to the $k$th.
# Each of those $k$ "waiting times" $X_i$ has a geometric distribution with parameter $p$.
# Hence
# 
# $$ \mathbb{E}X = \mathbb{E} \sum_{i=1}^k X_i = \sum_{i=1}^k \mathbb{E}X_i =
# \sum_{i=1}^k 1/p = k/p.$$
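# The waiting-time decomposition can be simulated with NumPy. NumPy's `Generator.geometric` returns the number of trials up to and including the first success (support $1, 2, \ldots$), which matches the $X_i$ here.
# By contrast, `scipy.stats.nbinom` counts failures rather than trials, so it is not used. The values $p = 0.25$ and $k = 4$ are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded for reproducibility
p, k, reps = 0.25, 4, 200_000

# Column i holds the waiting time from the (i-1)st to the i-th success.
waits = rng.geometric(p, size=(reps, k))
X = waits.sum(axis=1)  # trials to the k-th success

print(waits.mean(), 1 / p)  # each waiting time averages about 1/p
print(X.mean(), k / p)      # the total averages about k/p
```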
# 
# ### Hypergeometric
# [To do.] Derive hypergeometric.
# 
# Consider a population of $N$ numbers, of which $G$ equal 1 and $N-G$ equal 0.
# Let $X$ be the number of 1s in a sample of size $n$ drawn without replacement. Then
# 
# $$ \mathbb{P} \{X = x\} = \frac{ {{G} \choose {x}}{{N-G} \choose {n-x}}}{{N}\choose{n}}.$$
# 
# [To do.] Calculate expected value. Use random permutations of "tickets" to show that expected value in each position is $G/N$.
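# The hypergeometric formula can be checked with exact arithmetic. The sketch below assumes $N = 20$, $G = 7$, and $n = 5$, all arbitrary.
# It confirms that the probabilities sum to 1 and that the expected value is $nG/N$, which is consistent with each position in the sample having expected value $G/N$. It also compares one value with `scipy.stats.hypergeom.pmf`, whose parameter order is (total population, number of 1s, sample size).

```python
from fractions import Fraction
from math import comb
from scipy import stats

def hypergeom_pmf(x, N, G, n):
    """P{X = x} for the number of 1s in n draws without replacement
    from N tickets, G of which are 1s (math.comb returns 0 out of range)."""
    return Fraction(comb(G, x) * comb(N - G, n - x), comb(N, n))

N, G, n = 20, 7, 5  # illustrative population and sample sizes
pmf = {x: hypergeom_pmf(x, N, G, n) for x in range(n + 1)}

print(sum(pmf.values()))                     # exactly 1
print(sum(x * px for x, px in pmf.items()))  # expected value n * G / N
print(float(pmf[2]), stats.hypergeom.pmf(2, N, G, n))
```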

# ## Examples related to sampling from finite populations
# 
# ### One draw from a box of numbered tickets
# [To do.]
# 
# ### The sample sum of $n$ draws from a box
# [To do.]
# 
# ### The sample mean of $n$ draws from a box
# [To do.]

# ## Variance,  Standard Error, and Covariance
# 
# See [SticiGui: Standard Error](http://www.stat.berkeley.edu/~stark/SticiGui/Text/standardError.htm) for an elementary introduction to variance and standard error.
# 
# The _variance_ of a random variable $X$ is $\mbox{Var }X = {\mathbb E}(X - {\mathbb E}X)^2$.
# 
# Algebraically, the following identity holds:
# $$
# \mbox{Var } X = {\mathbb E}(X - {\mathbb E}X)^2 = {\mathbb E}X^2 - 2({\mathbb E}X)^2 + ({\mathbb E}X)^2 =
# {\mathbb E}X^2 - ({\mathbb E}X)^2.
# $$
# However, this is generally not a good way to calculate $\mbox{Var} X$ numerically, because of roundoff:
# it sacrifices precision unnecessarily.
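# A small example shows the roundoff problem. The data are an illustrative choice: five values $10^9 + 0, \ldots, 10^9 + 4$, which have a large common offset and a small spread.
# In double precision, the squares (about $10^{18}$) are rounded to multiples of 128, so ${\mathbb E}X^2 - ({\mathbb E}X)^2$ loses the answer. Centering first, as in the definition, does not.

```python
import numpy as np

# Values with a large common offset and a small spread.
x = 1e9 + np.arange(5.0)

naive = np.mean(x**2) - np.mean(x)**2     # E X^2 - (E X)^2
two_pass = np.mean((x - np.mean(x))**2)   # E (X - E X)^2, centering first

print(two_pass)  # 2.0, the population variance of 0, 1, 2, 3, 4
print(naive)     # far from 2.0 because of roundoff in the squares
```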
# 
# The _standard error_ of a random variable $X$ is $\mbox{SE }X = \sqrt{\mbox{Var } X}$.
# 
# If $\{X_i\}_{i=1}^n$ are independent, then $\mbox{Var} \sum_{i=1}^n X_i = \sum_{i=1}^n \mbox{Var }X_i$.
# 
# If $X$ and $Y$ have a joint distribution, then $\mbox{cov} (X,Y) = {\mathbb E} (X - {\mathbb E}X)(Y - {\mathbb E}Y)$.
# It follows from this definition (and the commutativity of multiplication)
# that $\mbox{cov}(X,Y) = \mbox{cov}(Y,X)$.
# Also,
# $$
# \mbox{var }(X+Y) = \mbox{var }X + \mbox{var }Y + 2\mbox{cov}(X,Y).
# $$
# 
# If $X$ and $Y$ are independent, $\mbox{cov }(X,Y) = 0$. 
# However, the converse is not necessarily true: $\mbox{cov}(X,Y) = 0$ does not in general imply that
# $X$ and $Y$ are independent.
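# A standard counterexample shows that zero covariance does not imply independence. Let $X$ be uniform on $\{-1, 0, 1\}$ and let $Y = X^2$.
# The sketch below computes the covariance exactly with fractions, then exhibits one event pair that violates the product rule.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 is a function of X, so they are dependent.
support = [-1, 0, 1]
px = Fraction(1, 3)

E = lambda g: sum(px * g(x) for x in support)  # expectation of g(X)
EX = E(lambda x: x)
EY = E(lambda x: x**2)
cov = E(lambda x: (x - EX) * (x**2 - EY))
print(cov)  # 0

# Dependence: P{X = 0, Y = 0} = 1/3, but P{X = 0} P{Y = 0} = 1/9.
p_joint = px                                        # {X = 0} implies {Y = 0}
p_prod = px * sum(px for x in support if x**2 == 0)
print(p_joint, p_prod)
```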

# ## Examples
# 
# ### Variance of a Bernoulli random variable
# 
# ### Variance of a Binomial random variable
# 
# ### Variance of a Geometric and Negative Binomial random variable
# 
# ### Variance of the sample sum and sample mean

# ## Random Vectors
# 
# Suppose $\{X_i\}_{i=1}^n$ are jointly distributed random variables, and let
# $$
# X = 
# \begin{pmatrix}
# X_1 \\
# \vdots \\
# X_n
# \end{pmatrix}
# .
# $$
# Then $X$ is a random vector, an $n$ by $1$ vector of real-valued random variables.
# 
# The expected value of $X$ is
# $$
# {\mathbb E} X \equiv
# \begin{pmatrix}
# {\mathbb E} X_1 \\
# \vdots \\
# {\mathbb E} X_n
# \end{pmatrix}
# .
# $$
# 
# The _covariance matrix_ of $X$ is
# $$
# \mbox{cov } X \equiv
# {\mathbb E} 
# \left (
# \begin{pmatrix}
# X_1 - {\mathbb E} X_1 \\
# \vdots \\
# X_n - {\mathbb E} X_n
# \end{pmatrix}
# \begin{pmatrix}
# X_1 - {\mathbb E} X_1 & \cdots & X_n - {\mathbb E} X_n
# \end{pmatrix}
# \right )
# =
# {\mathbb E} 
# \begin{pmatrix}
# (X_1 - {\mathbb E} X_1)^2 & (X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2) & \cdots & (X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n) \\
# (X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2) & (X_2 - {\mathbb E} X_2)^2 & \cdots & (X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n) \\
# \vdots & \vdots & \ddots & \vdots \\
# (X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n) & (X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n) & \cdots & (X_n - {\mathbb E} X_n)^2
# \end{pmatrix}
# $$
# 
# $$
# = 
# \begin{pmatrix}
# {\mathbb E}(X_1 - {\mathbb E} X_1)^2 & {\mathbb E}((X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2)) & \cdots & 
# {\mathbb E}((X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n)) \\
# {\mathbb E}((X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2)) & {\mathbb E}(X_2 - {\mathbb E} X_2)^2 & \cdots & 
# {\mathbb E}((X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n)) \\
# \vdots & \vdots & \ddots & \vdots \\
# {\mathbb E}((X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n)) & {\mathbb E}((X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n)) & \cdots & {\mathbb E}(X_n - {\mathbb E} X_n)^2
# \end{pmatrix}
# .
# $$
# 
# Covariance matrices are always _positive semidefinite_: for any fixed $a \in \Re^n$, $a' (\mbox{cov } X) a = \mbox{Var}(a'X) \ge 0$.
# (A symmetric matrix $A$ is _nonnegative definite_, or _positive semidefinite_, if $x'Ax \ge 0$ for all $x \in \Re^n$. [Here](./linalg.ipynb) is a review of linear algebra.)
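# The property can be illustrated with a sample covariance matrix from `np.cov`. The random vector below is a hypothetical construction, and its third component is a linear combination of the first two ($X_3 = 2X_2 - 3X_1$). The covariance matrix is therefore singular: one eigenvalue is essentially zero, and none is negative beyond roundoff.
# The last line checks the identity $v'Cv = $ sample variance of $v'X$.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
reps = 50_000

# An illustrative random vector (X1, X2, X3) with dependent components.
z = rng.standard_normal((reps, 2))
X = np.column_stack([z[:, 0], z[:, 0] + z[:, 1], 2 * z[:, 1] - z[:, 0]])

C = np.cov(X, rowvar=False)  # 3 x 3 sample covariance matrix
print(np.round(C, 2))

eigs = np.linalg.eigvalsh(C)  # eigenvalues of the symmetric matrix C
print(eigs)                   # all >= 0 up to roundoff

# v' C v equals the sample variance of the linear combination X @ v.
v = np.array([1.0, -2.0, 0.5])
print(v @ C @ v, np.var(X @ v, ddof=1))
```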

# ## The Multivariate Normal Distribution
# 
# The notation $X \sim {\mathcal N}(\mu, \sigma^2)$ means that $X$ has a normal distribution with mean $\mu$
# and variance $\sigma^2$.
# This distribution is continuous, with probability density function
# $$
# \frac{1}{\sqrt{2\pi} \sigma} e^{\frac{-(x-\mu)^2}{2\sigma^2}}.
# $$
# 
# If $X \sim {\mathcal N}(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} \sim {\mathcal N}(0, 1)$,
# the _standard normal distribution_.
# The probability density function of the standard normal distribution is
# $$
# \phi(x) = \frac{1}{\sqrt{2\pi}} e^ {-x^2/2}.
# $$

# In[ ]:


## Plot the normal density and cdf

def plotNorm(mu, sigma):
    x = np.arange(mu-4*sigma, mu+4*sigma, 8*sigma/200)
    y = np.exp(-(x-mu)**2/(2*sigma**2))/(sigma*math.sqrt(2*math.pi))  # density written out for clarity
    Y = sp.stats.norm.cdf(x, loc=mu, scale=sigma)  # using scipy for convenience
    plt.plot(x,y,'b-',x,Y,'r-',linewidth=2)
    plt.plot((mu-4.1*sigma, mu+4.1*sigma), (0.5, 0.5), 'g--')  # horizontal line at 0.5
    plt.xlabel('$x$')  # axis labels. Can use LaTeX math markup
    plt.ylabel(r'$f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/2\sigma^2}$; $F(x)$')
    plt.axis([mu-4.1*sigma, mu+4.1*sigma,0,max(1.1,max(y))])  # axis limits
    plt.title(r'The $\mathcal{N}($' + str(mu) + ',' + str(sigma**2) + '$)$ density and cdf')
    plt.show()
    
interact(plotNorm, mu=(-5,5,.05), sigma=(0.1, 10, .1))


# A collection of random variables $\{ X_1, X_2, \ldots, X_n\} = \{X_j\}_{j=1}^n$ is _jointly normal_
# if all linear combinations of those variables have normal distributions.
# That is, the collection is jointly normal if for all $\alpha \in \Re^n$, $\sum_{j=1}^n \alpha_j X_j$
# has a normal distribution.
# 
# If $\{X_j \}_{j=1}^n$ are independent, normally distributed random variables, they are jointly normal.
# 
# If for some $\mu \in \Re^n$ and positive-definite matrix $G$, the joint density of $\{X_j \}_{j=1}^n$ is
# $$ 
# \left ( \frac{1}{\sqrt{2 \pi}}\right )^n \frac{1}{\sqrt{\left | G \right |}} 
# \exp \left \{ - \frac{1}{2} (x - \mu)'G^{-1}(x-\mu) \right \},
# $$
# then $\{X_j \}_{j=1}^n$ are jointly normal, and the covariance matrix of $\{X_j\}_{j=1}^n$ is $G$.
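# The joint density formula can be coded directly and compared with `scipy.stats.multivariate_normal`. The mean vector, the matrix $G$, and the evaluation point below are illustrative choices.
# Solving a linear system instead of forming $G^{-1}$ explicitly is a common numerical choice. It is not required by the formula.

```python
import numpy as np
from scipy import stats

def mvn_density(x, mu, G):
    """(2 pi)^(-n/2) |G|^(-1/2) exp(-(x - mu)' G^{-1} (x - mu) / 2)."""
    x, mu, G = np.asarray(x, float), np.asarray(mu, float), np.asarray(G, float)
    n = mu.size
    d = x - mu
    q = d @ np.linalg.solve(G, d)  # (x - mu)' G^{-1} (x - mu) without forming G^{-1}
    return (2 * np.pi) ** (-n / 2) / np.sqrt(np.linalg.det(G)) * np.exp(-q / 2)

mu = np.array([1.0, -1.0])
G = np.array([[2.0, 0.6],
              [0.6, 1.0]])  # symmetric, positive definite
x = np.array([0.5, 0.0])

print(mvn_density(x, mu, G))
print(stats.multivariate_normal(mean=mu, cov=G).pdf(x))
```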

# ## The Central Limit Theorem
# 
# For an elementary discussion, see [SticiGui: The Normal Curve, The Central Limit Theorem, and Markov's and Chebychev's Inequalities for Random Variables](http://www.stat.berkeley.edu/~stark/SticiGui/Text/clt.htm).
# 
# Suppose $\{X_j \}_{j=1}^\infty$ are independent and identically distributed (iid), have finite expected value ${\mathbb E}X_j = \mu$, and have finite variance $\mbox{var }X_j = \sigma^2$.
# 
# Define the sum $S_n \equiv \sum_{j=1}^n X_j$.
# Then 
# $$
# {\mathbb E}S_n = {\mathbb E} \sum_{j=1}^n X_j = \sum_{j=1}^n {\mathbb E} X_j = \sum_{j=1}^n \mu = n\mu,
# $$
# and
# $$
# \mbox{var }S_n = \mbox{var } \sum_{j=1}^n X_j = n\sigma^2.
# $$
# (The last step follows from the independence of $\{X_j\}$: the variance of the sum is the sum of the variances.)
# 
# Define $Z_n \equiv \frac{S_n - n\mu}{\sqrt{n}\sigma}$.
# Then for every $a, b \in \Re$ with $a \le b$,
# $$
# \lim_{n \rightarrow \infty} {\mathbb P} \{ a \le Z_n \le b \} = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} dx.
# $$
# This is a basic form of the Central Limit Theorem.
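# A small simulation illustrates the theorem. This sketch assumes the $X_j$ are iid exponential with mean 1 (so $\mu = \sigma = 1$), with $n = 200$ and 20,000 replications, all arbitrary choices.
# It compares the simulated frequency of $\{-1 \le Z_n \le 1\}$ with the standard normal area from `scipy.stats.norm.cdf`. The agreement is approximate, reflecting both simulation error and the finite $n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # seeded for reproducibility
n, reps = 200, 20_000
mu, sigma = 1.0, 1.0  # mean and SD of the exponential(1) distribution

# Each row is one sample X_1, ..., X_n; the row sums are draws of S_n.
S = rng.exponential(scale=1.0, size=(reps, n)).sum(axis=1)
Z = (S - n * mu) / (np.sqrt(n) * sigma)

a, b = -1.0, 1.0
empirical = np.mean((a <= Z) & (Z <= b))
normal = stats.norm.cdf(b) - stats.norm.cdf(a)
print(empirical, normal)
```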

# ## Conditional Distributions
# 
# The conditional distribution of a random variable or random vector
# $X$ given the event $A$ is 
# 
# $$\mathbb{P}_{X|A}(B) = \mathbb{P} \{ X \in B | A \}$$
# 
# as a function of $B$, provided $\mathbb{P} (A) > 0$.
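# A fair die gives a minimal example. Take $X$ to be the number of spots and $A = \{X \text{ is even}\}$. The function `cond_prob` (a name introduced for illustration) evaluates ${\mathbb P}\{X \in B | A\} = {\mathbb P}(\{X \in B\} \cap A)/{\mathbb P}(A)$ for any set $B$, using exact fractions.

```python
from fractions import Fraction

# A fair die: X is uniform on {1, ..., 6}; condition on A = {X is even}.
faces = range(1, 7)
p = {x: Fraction(1, 6) for x in faces}
A = {x for x in faces if x % 2 == 0}
PA = sum(p[x] for x in A)

def cond_prob(B):
    """P{X in B | A} = P({X in B} and A) / P(A); requires P(A) > 0."""
    return sum(p[x] for x in B if x in A) / PA

print(cond_prob({2}), cond_prob({1}), cond_prob({2, 3, 4}))
```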
# 
# 
# [To do]
# 

# ## Conditional Expectation
# [To do]
# 
# Conditional expectation is a random variable...
# The expectation of the conditional expectation is the unconditional expectation
# $\mathbb{E}(\mathbb{E}(X|Y)) = \mathbb{E} X$.
# This is essentially another expression of the law of total probability.
# 
# ### Examples
# [To do] Use random permutation of a list of numbers to illustrate: $\mathbb{E}X_j$, $\mathbb{E}(X_j | X_k = x)$, $\mathbb{E}(X_j | X_k)$, $\mathbb{E} (\mathbb{E}(X_j | X_k)) = \mathbb{E}X_j$.
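# One way to carry out that illustration is sketched below, assuming a short hypothetical list of numbers so that every permutation can be enumerated. $X_j$ and $X_k$ are the entries in positions $j$ and $k$ of a uniformly random permutation.
# The sketch computes $\mathbb{E}X_j$, the conditional expectations $\mathbb{E}(X_j | X_k = x)$, and the average of $\mathbb{E}(X_j | X_k)$, all exactly with fractions.

```python
from fractions import Fraction
from itertools import permutations

tickets = [1, 2, 2, 5]  # illustrative list of numbers
j, k = 0, 2             # two distinct positions in the shuffled list

perms = list(permutations(tickets))  # all orderings, each equally likely
prob = Fraction(1, len(perms))

E_Xj = sum(prob * s[j] for s in perms)

def E_Xj_given_Xk(x):
    """E(X_j | X_k = x): average of X_j over the permutations with X_k = x."""
    matching = [s for s in perms if s[k] == x]
    return Fraction(sum(s[j] for s in matching), len(matching))

# E(X_j | X_k) is a random variable: the function above evaluated at X_k.
E_of_cond = sum(prob * E_Xj_given_Xk(s[k]) for s in perms)

print(E_Xj, Fraction(sum(tickets), len(tickets)))  # both equal the list mean
print({x: E_Xj_given_Xk(x) for x in set(tickets)})
print(E_of_cond)                                    # equals E(X_j)
```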
# 
# 

# ## Point Processes
# 
# Point processes formalize the notion of something occurring at a random place or time (or both).
# 
# For instance, imagine the radioactive decay of a mass of uranium; the particular times at which an atom decays are modeled well as a Poisson process (described below).
# 
# 
# ### Poisson Processes
# 
# Temporal, spatiotemporal.  Waiting times (inter-arrival times) are exponential.
# Alternative characterizations.
# 
# Temporal point processes: the counting function.
# [To Do]
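# The exponential-waiting-time description suggests a direct simulation of a homogeneous temporal Poisson process. Add independent exponential inter-arrival times with mean $1/\text{rate}$ until the time horizon $T$ is passed.
# The rate, horizon, and number of replications below are illustrative. The number of arrivals in $[0, T]$ should be Poisson with parameter $\text{rate} \times T$, so its sample mean and variance should both be close to $\text{rate} \times T$.

```python
import numpy as np

rng = np.random.default_rng(3)  # seeded for reproducibility
rate, T, reps = 2.0, 10.0, 5_000

def poisson_process_times(rate, T, rng):
    """Arrival times in [0, T] of a homogeneous Poisson process, built from
    iid exponential inter-arrival times with mean 1/rate."""
    times = []
    t = rng.exponential(1 / rate)
    while t <= T:
        times.append(t)
        t += rng.exponential(1 / rate)
    return np.array(times)

counts = np.array([poisson_process_times(rate, T, rng).size for _ in range(reps)])
print(counts.mean(), counts.var(), rate * T)  # mean and variance both near rate * T
```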
# 
# 
# #### Marked Poisson Processes
# [To Do]
# 
# ### Renewal Processes
# [To Do]
# 
# 
# ### Branching Processes
# [To Do]
# 
# 
# #### Hawkes Processes
# [To Do]
# 

# Previous: [Theories of Probability](probTheory.ipynb) Next: [Probability Inequalities](ineq.ipynb)

# In[ ]:


get_ipython().run_line_magic('run', 'talkTools.py')


# In[ ]: