#!/usr/bin/env python
# coding: utf-8

# # Random Variables, Expectation, Random Vectors, and Stochastic Processes
#
# ## Random Variables
# A _real-valued random variable_ is a mapping from outcome space $\mathcal{S}$ to the real line $\Re$.
# A _real-valued random variable_ $X$ can be characterized by its probability distribution, which specifies (for a suitable collection of subsets of the real line $\Re$ that comprises a sigma-algebra), the chance that the value of $X$ will be in each such subset.
# There are technical requirements regarding _measurability_, which generally we will ignore.
# Perhaps the most natural mathematical setting for probability theory involves _Lebesgue integration_;
# we will largely ignore the difference between a _Riemann integral_ and a _Lebesgue integral_.
#
# Let $P_X$ denote the probability distribution of the random variable $X$.
# Then if $A \subset \Re$, $P_X(A) = {\mathbb P} \{ X \in A \}$.
# We write $X \sim P_X$,
# pronounced "$X$ is distributed as $P_X$" or "$X$ has distribution $P_X$."
#
# If two random variables $X$ and $Y$ have the same distribution, we write $X \sim Y$ and we say that $X$ and $Y$
# are _identically distributed_.
#
# Real-valued random variables can be _continuous_, _discrete_, or _mixed (general)_.
#
# Continuous random variables have _probability density functions_ with respect to Lebesgue measure.
# If $X$ is a continuous random variable, there is some nonnegative function $f(x)$,
# the probability density of $X$, such that
# for any (suitable) set $A \subset \Re$,
# $$
# {\mathbb P} \{ X \in A \} = \int_A f(x) dx.
# $$
# Since ${\mathbb P} \{ X \in \Re \} = 1$, it follows that $\int_{-\infty}^\infty f(x) dx = 1$.
#
# _Example._
# Let $f(x) = \lambda e^{-\lambda x}$ for $x \ge 0$, with $\lambda > 0$ fixed, and $f(x) = 0$ otherwise.
# Clearly $f(x) \ge 0$.
# $$
# \int_{-\infty}^\infty f(x) dx = \int_0^\infty \lambda e^{-\lambda x} dx
# = - e^{-\lambda x}|_0^\infty = - 0 + 1 = 1.
# $$
# Hence, $\lambda e^{-\lambda x}$ can be the probability density of a continuous random variable.
# A random variable with this density is said to be _exponentially distributed_.
# Exponentially distributed random variables are used to model radioactive decay and the failure
# of items that do not "fatigue." For instance, the lifetime of a semiconductor after an initial
# "burn-in" period is often modeled as an exponentially distributed random variable.
# The exponential distribution is also a common model for the occurrence of earthquakes (although it does not fit the data well).
#
# _Example._
# Let $a$ and $b$ be real numbers with $a < b$, and let $f(x) = \frac{1}{b-a}$, $x \in [a, b]$ and
# $f(x)=0$, otherwise.
# Then $f(x) \ge 0$ and $\int_{-\infty}^\infty f(x) dx = \int_a^b \frac{1}{b-a} dx = 1$,
# so $f(x)$ can be the probability density function of a continuous random variable.
# A random variable with this density is said to be _uniformly distributed on the interval $[a, b]$_.
#
# Discrete random variables assign all their probability to some _countable_ set of points $\{x_i\}_{i=1}^n$,
# where $n$ might be infinite.
# Discrete random variables have _probability mass functions_.
# If $X$ is a discrete random variable, there is a nonnegative function $p$, the probability mass function
# of $X$, such that
# for any set $A \subset \Re$,
# $$
# {\mathbb P} \{X \in A \} = \sum_{i: x_i \in A} p(x_i).
# $$
# The value $p(x_i) = {\mathbb P} \{X = x_i\}$, and $\sum_{i=1}^n p(x_i) = 1$.
#
# _Example._
# Fix $\lambda > 0$.
# Let $x_i = i-1$ for $i=1, 2, \ldots$, and let $p(x_i) = e^{-\lambda} \lambda^{x_i}/x_i!$.
# Then $p(x_i) > 0$ and
# $$
# \sum_{i=1}^\infty p(x_i) = e^{-\lambda} \sum_{j=0}^\infty \lambda^j/j! = e^{-\lambda} e^{\lambda} = 1.
# $$
# Hence, $p(x)$ is the probability mass function of a discrete random variable.
# A random variable with this probability mass function is said to be _Poisson distributed (with parameter
# $\lambda$)_.
# Poisson-distributed random variables are often used to model rare events.
#
#
# _Example._
# Let $x_i = i$ for $i=1, \ldots, n$, and let $p(x_i) = 1/n$ and $p(x) = 0$, otherwise.
# Then $p(x) \ge 0$ and $\sum_{x_i} p(x_i) = 1$.
# Hence, $p(x)$ can be the probability mass function of a discrete random variable.
# A random variable with this probability mass function is said to be _uniformly distributed on $1, \ldots, n$_.
#
# _Example._
# Fix $p \in [0, 1]$ and a positive integer $n$.
# Let $x_i = i-1$ for $i=1, \ldots, n+1$, and let $p(x_i) = {n \choose x_i} p^{x_i} (1-p)^{n-x_i}$, and
# $p(x) = 0$ otherwise.
# Then $p(x) \ge 0$ and
# $$
# \sum_{x_i} p(x_i) = \sum_{j=0}^n {n \choose j} p^j (1-p)^{n-j} = 1,
# $$
# by the binomial theorem.
# Hence $p(x)$ is the probability mass function of a discrete random variable.
# A random variable with this probability mass function is said to be _binomially distributed
# with parameters $n$ and $p$_.
# The number of successes in $n$ independent trials that each have the same probability $p$ of success
# has a binomial distribution with parameters $n$ and $p$.
# For instance, the number of times a fair die lands with 3 spots showing in 10 independent rolls has
# a binomial distribution with parameters $n=10$ and $p = 1/6$.
#
# For general random variables, the chance that $X$ is in some subset of $\Re$ cannot be written as
# a sum or as a Riemann integral; it is more naturally represented as a Lebesgue integral (with respect to
# a measure other than Lebesgue measure).
# For example, imagine a random variable $X$ that has probability $\alpha$ of being equal to zero;
# and if $X$ is not zero, it has a uniform distribution on the interval $[0, 1]$.
# Such a random variable is neither continuous nor discrete.
#
# Most of the random variables in this class are either discrete or continuous.
#
# If $X$ is a random variable such that, for some constant $x_1 \in \Re$, ${\mathbb P}(X = x_1) = 1$, $X$
# is called a _constant random variable_.
#
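# As a rough illustration of the mixed random variable described above (probability $\alpha$ of being exactly zero, otherwise uniform on $[0, 1]$), the sketch below simulates it and compares two empirical frequencies with the values that description implies.
# The function name `sample_mixed`, the choice $\alpha = 0.3$, the seed, and the sample size are illustrative assumptions, not part of the notes.

```python
import numpy as np

def sample_mixed(alpha, size, rng):
    """Draw `size` values: 0 with probability alpha, otherwise U[0, 1]."""
    is_zero = rng.random(size) < alpha      # which draws land on the point mass at 0
    uniform_part = rng.random(size)         # the continuous part
    return np.where(is_zero, 0.0, uniform_part)

rng = np.random.default_rng(12345)          # arbitrary seed
alpha = 0.3                                 # hypothetical value of alpha
x = sample_mixed(alpha, 200_000, rng)

frac_zero = np.mean(x == 0.0)               # estimates P(X = 0) = alpha
frac_below_half = np.mean(x <= 0.5)         # estimates alpha + (1 - alpha)/2
print(frac_zero, alpha)
print(frac_below_half, alpha + (1 - alpha) * 0.5)
```

# Because a single point carries positive probability while the remaining probability is spread over an interval, neither a probability mass function nor a density alone describes $X$.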
# ### Exercises
#
# 1. Show analytically that $\sum_{x_i} p(x_i) = \sum_{j=0}^n {n \choose j} p^j (1-p)^{n-j} = 1$.
#     + Write a Python program that verifies that equation numerically for $n=10$: for 1000 values of $p$
#     equispaced on the interval $(0, 1)$, find the maximum absolute value of the difference between the sum and 1.
# 1. Let $p \in (0, 1]$; let $x_i = i$ for $i = 1, 2, \ldots$; and define $p(x_i) = (1-p)^{x_i-1}p$, and $p(x) = 0$ otherwise. Show analytically that $p(x)$ is the probability mass function of a discrete random variable.
# (A random variable with this probability mass function is said to be _geometrically distributed with parameter $p$_.)
#
#
# ### Cumulative Distribution Functions
#
# The _cumulative distribution function_ or _cdf_ of a real-valued random variable is the chance that the variable is less than or equal to $x$, as a function of $x$.
# Cumulative distribution functions are often denoted with capital Roman letters ($F$ is especially common notation):
#
# $$F_X(x) \equiv \mathbb{P}(X \le x).$$
#
# Clearly:
#
# + $0 \le F_X(x) \le 1$
# + $F_X(x)$ is monotonically nondecreasing in $x$ (i.e., $F_X(a) \le F_X(b)$ if $a \le b$).
# + $\lim_{x \rightarrow -\infty} F_X(x) = 0$
# + $\lim_{x \rightarrow \infty} F_X(x) = 1$
#
# The cdf of a continuous real-valued random variable is a continuous function.
# The cdf of a discrete real-valued random variable is piecewise constant, with jumps at the possible values of the random variable.
# If the cdf of a real-valued random variable has jumps and also regions where it is not constant, the random variable is neither continuous nor discrete.
#
# ### Examples
# [To Do]

# In[6]:

# boilerplate
get_ipython().run_line_magic('matplotlib', 'inline')

# Python 3 is assumed, so "/" is true division.
import math
import numpy as np
import scipy as sp
from scipy import stats  # distributions
from scipy import special  # special functions
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, FloatRangeSlider, fixed  # interactive stuff

# In[ ]:

# Examples of densities and cdfs

# U[a,b]
def pltUnif(a, b):
    ffac = 0.1
    s = b - a
    fudge = ffac*s
    x = np.arange(a-fudge, b+fudge, s/200)
    y = np.ones(len(x))/s
    y[x < a] = np.zeros(np.sum(x < a))   # zero for x < a
    y[x > b] = np.zeros(np.sum(x > b))   # zero for x > b
    Y = (x-a)/s                          # uniform CDF is linear on [a, b]
    Y[x < a] = np.zeros(np.sum(x < a))   # cdf is 0 to the left of a
    Y[x >= b] = np.ones(np.sum(x >= b))  # cdf is 1 from b onward
    plt.plot(x, y, 'b-', x, Y, 'r-', linewidth=2)
    plt.plot((a-fudge, b+fudge), (0.5, 0.5), 'g--')  # horizontal green dashed line at 0.5
    plt.plot((a-fudge, b+fudge), (0, 0), 'k-')  # horizontal black line at 0
    plt.xlabel('$x$')  # axis labels. Can use LaTeX math markup
    plt.ylabel(r'$f(x) = 1_{[a,b]}/(b-a)$')
    plt.axis([a-fudge, b+fudge, -0.1, (1+ffac)*max(1, 1/s)])  # axis limits
    plt.title('The $U[$' + str(a) + ',' + str(b) + '$]$ density and cdf')
    plt.show()

# Helper (not in the original cell) that unpacks the slider's (lower, upper) pair.
def pltUnifRange(ab):
    pltUnif(ab[0], ab[1])

interact(pltUnifRange,
         ab=FloatRangeSlider(value=[-1, 1], min=-5, max=5, step=0.05))

# In[ ]:

# Exponential(lambda)
def plotExp(lam):
    ffac = 0.05
    x = np.arange(0, 5/lam, step=(5/lam)/200)
    y = sp.stats.expon.pdf(x, scale=1/lam)
    Y = sp.stats.expon.cdf(x, scale=1/lam)
    plt.plot(x, y, 'b-', x, Y, 'r-', linewidth=2)
    plt.plot((-.1, (1+ffac)*np.max(x)), (0.5, 0.5), 'g--')  # horizontal line at 0.5
    plt.plot((-.1, (1+ffac)*np.max(x)), (1, 1), 'k:')  # horizontal line at 1
    plt.xlabel('$x$')  # axis labels. Can use LaTeX math markup
    plt.ylabel(r'$f(x) = \lambda e^{-\lambda x}; F(x) = 1-e^{-\lambda x}$.')
    plt.title(r'The exponential density and cdf for $\lambda=$' + str(lam))
    plt.axis([-.1, (1+ffac)*np.max(x), -0.1, (1+ffac)*max(1, lam)])  # axis limits
    plt.show()

interact(plotExp, lam=(1, 10, 1))  # lambda must be positive: lam=0 would divide by zero

# ## Jointly Distributed Random Variables
#
# Often we work with more than one random variable at a time.
# Indeed, much of this course concerns _random vectors_, the components of which are individual
# real-valued random variables.
#
# The _joint probability distribution_ of a collection of random variables $\{X_i\}_{i=1}^n$ gives the probability that
# the variables simultaneously fall in subsets of their possible values.
# That is, for every (suitable) subset $A \subset \Re^n$, the joint probability distribution of $\{X_i\}_{i=1}^n$
# gives ${\mathbb P} \{ (X_1, \ldots, X_n) \in A \}$.
#
# An _event determined by the random variable $X$_ is an event of the form $X \in A$, where $A \subset \Re$.
#
# An _event determined by the random variables $\{X_j\}_{j \in J}$_ is an event of the form
# $(X_j)_{j \in J} \in A$, where $A \subset \Re^{\#J}$.
#
# Two random variables $X_1$ and $X_2$ are _independent_ if every event determined by $X_1$ is independent
# of every event determined by $X_2$.
# If two random variables are not independent, they are _dependent_.
#
# A collection of random variables $\{X_i\}_{i=1}^n$ is _independent_ if every event determined by every subset
# of those variables is independent of every event determined by any disjoint subset of those variables.
# If a collection of random variables is not independent, it is _dependent_.
#
# Loosely speaking, a collection of random variables is independent if learning the values of some of them
# tells you nothing about the values of the rest of them.
# If learning the values of some of them tells you anything about the values of the rest of them,
# the collection is dependent.
#
# For instance, imagine tossing a fair coin twice and rolling a fair die.
# Let $X_1$ be the number of times the coin lands heads, and $X_2$ be the number of spots that show on the die.
# Then $X_1$ and $X_2$ are independent: learning how many times the coin lands heads tells you nothing about what
# the die did.
#
# On the other hand, let $X_1$ be the number of times the coin lands heads, and let $X_2$ be the sum of the
# number of heads and the number of spots that show on the die.
# Then $X_1$ and $X_2$ are dependent. For instance, if you know the coin landed heads twice, you know that the sum
# of the number of heads and the number of spots must be at least 3.

# ## Expectation
#
# See [SticiGui: The Long Run and the Expected Value](http://www.stat.berkeley.edu/~stark/SticiGui/Text/expectation.htm) for an elementary introduction to expectation.
#
# The _expectation_ or _expected value_ of a random variable $X$, denoted ${\mathbb E}X$, is a probability-weighted average of its possible values.
# From a frequentist perspective, it is the long-run limit (in probability) of the average of its values in repeated experiments.
# The expected value of a real-valued random variable (when it exists) is a fixed number, not a random value. # The expected value depends on the probability distribution of $X$ but not on any realized value of $X$. # If two random variables have the same probability distribution, they have the same expected value. # #
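# The long-run-average interpretation above can be illustrated by simulation.
# The sketch below rolls a fair die many times and tracks the running average, which should settle near the probability-weighted average of the faces.
# The die, the seed, and the number of rolls are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2024)                 # arbitrary seed
faces = np.arange(1, 7)                           # possible values 1, ..., 6
expected = np.sum(faces * (1 / 6))                # probability-weighted average

rolls = rng.integers(1, 7, size=100_000)          # upper bound is exclusive
running_mean = np.cumsum(rolls) / np.arange(1, rolls.size + 1)

for n in (10, 1_000, 100_000):
    print(n, running_mean[n - 1])                 # average of the first n rolls
print("expected value:", expected)
```

# The expected value is computed from the distribution alone, while the running averages depend on the realized rolls and change from one seed to another.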
# ### Properties of Expectation # # + For any real $\alpha \in \Re$, if ${\mathbb P} \{X = \alpha\} = 1$, then ${\mathbb E}X = \alpha$: the expected # value of a constant random variable is that constant. # + For any real $\alpha \in \Re$, ${\mathbb E}(\alpha X) = \alpha {\mathbb E}X$: scalar homogeneity. # + If $X$ and $Y$ are random variables, ${\mathbb E}(X+Y) = {\mathbb E}X + {\mathbb E}Y$: additivity. # #
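# Scalar homogeneity and additivity can be checked exactly on a small example.
# The sketch below reuses the two-coin-tosses-and-a-die setting from the discussion of independence, with $X$ the number of heads and $Y$ the number of heads plus the number of spots, so $X$ and $Y$ are dependent.
# The helper names (`expect`, `heads`, `heads_plus_die`) and the scalar 3 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

# All 2 * 2 * 6 = 24 equally likely outcomes: (first toss, second toss, die).
outcomes = list(product([0, 1], [0, 1], range(1, 7)))
prob = Fraction(1, len(outcomes))

def expect(g):
    """Exact expected value of g(outcome) when outcomes are equally likely."""
    return sum(prob * g(o) for o in outcomes)

def heads(o):
    # X: number of heads in the two tosses
    return o[0] + o[1]

def heads_plus_die(o):
    # Y: number of heads plus number of spots (dependent on X)
    return o[0] + o[1] + o[2]

e_x = expect(heads)
e_y = expect(heads_plus_die)
e_sum = expect(lambda o: heads(o) + heads_plus_die(o))   # E(X + Y)
e_scaled = expect(lambda o: 3 * heads(o))                # E(3X)

print(e_x, e_y, e_sum, e_scaled)
print(e_sum == e_x + e_y, e_scaled == 3 * e_x)
```

# Exact rational arithmetic is used so that the two equalities hold exactly rather than up to roundoff; additivity holds here even though $X$ and $Y$ are dependent.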
#
# ### Calculating Expectation
# If $X$ is a continuous real-valued random variable with density $f(x)$, then the expected value of $X$ is
# $$
# {\mathbb E}X = \int_{-\infty}^\infty x f(x) dx,
# $$
# provided the integral exists.
#
# If $X$ is a discrete real-valued random variable with probability mass function $p$, then the expected value of $X$ is
# $$
# {\mathbb E}X = \sum_{i=1}^\infty x_i p(x_i),
# $$
# where $\{x_i\} = \{ x \in \Re: p(x) > 0\}$,
# provided the sum exists.

# ## Examples
#
# ### Uniform
# Suppose $X$ has density $f(x) = \frac{1}{b-a}$ for $a \le x \le b$ and $0$ otherwise.
# Then
#
# $$ \mathbb{E}X = \int_{-\infty}^\infty x f(x) dx = \frac{1}{b-a} \int_a^b x dx = \frac{b^2-a^2}{2(b-a)} =
# \frac{a+b}{2}.$$
#
#
#
# ### Poisson
# Suppose $X$ has a Poisson distribution with parameter $\lambda$.
# Then
#
# $$\mathbb{E}X = e^{-\lambda} \sum_{j=0}^\infty j \lambda^j/j! = \lambda e^{-\lambda} \sum_{j=1}^\infty \lambda^{j-1}/(j-1)! = \lambda e^{-\lambda} e^{\lambda} = \lambda.$$

# ## Examples related to Bernoulli Trials
#
# ### Bernoulli
# Suppose $X$ can take only two values, 0 and 1, and the probability that $X= 1$ is $p$.
# Then
#
# $$\mathbb{E} X = 1 \times p + 0 \times (1-p) = p.$$
#
# ### Binomial
# [To do.] Derive the Binomial distribution as the number of successes in $n$ iid Bernoulli trials.
#
# The number of successes $X$ in $n$ trials equals the sum of the indicators of success in each trial. That is,
#
# $$ X = \sum_{i=1}^n X_i,$$
#
# where $X_i = 1$ if the $i$th trial results in success, and $X_i = 0$ otherwise.
# By the additive property of expectation,
#
# $$ \mathbb{E}X = \mathbb{E} \sum_{i=1}^n X_i = \sum_{i=1}^n \mathbb{E}X_i =
# \sum_{i=1}^n p = np.$$
#
# ### Geometric
#
# The number of trials to the first success in iid Bernoulli($p$) trials has a _geometric distribution with parameter $p$_.
#
# [To do.] Derive the geometric and calculate expectation.
#
#
# ### Negative Binomial
# The number of trials to the $k$th success in iid Bernoulli($p$) trials
# has a _negative binomial distribution with parameters $p$ and $k$_.
#
# [To do.] Derive the negative binomial.
#
# The number of trials $X$ until the $k$th success in iid Bernoulli trials can be written as the number of trials until the 1st success, plus the number of additional trials until the 2nd success, plus $\ldots$, plus the number of additional trials from the $(k-1)$st success until the $k$th success.
# Each of those $k$ "waiting times" $X_i$ has a geometric distribution with parameter $p$, which has expected value $1/p$.
# Hence
#
# $$ \mathbb{E}X = \mathbb{E} \sum_{i=1}^k X_i = \sum_{i=1}^k \mathbb{E}X_i =
# \sum_{i=1}^k 1/p = k/p.$$
#
# ### Hypergeometric
# [To do.] Derive hypergeometric.
#
# Consider a population of $N$ numbers, of which $G$ equal 1 and $N-G$ equal 0.
# Let $X$ be the number of 1s in a random sample of size $n$ drawn without replacement.
# Then
#
# $$ \mathbb{P} \{X = x\} = \frac{ {{G} \choose {x}}{{N-G} \choose {n-x}}}{{N}\choose{n}}.$$
#
# [To do.] Calculate expected value. Use random permutations of "tickets" to show that expected value in each position is $G/N$.

# ## Examples related to sampling from finite populations
#
# ### One draw from a box of numbered tickets
# [To do.]
#
# ### The sample sum of $n$ draws from a box
# [To do.]
#
# ### The sample mean of $n$ draws from a box
# [To do.]

# ## Variance, Standard Error, and Covariance
#
# See [SticiGui: Standard Error](http://www.stat.berkeley.edu/~stark/SticiGui/Text/standardError.htm) for an elementary introduction to variance and standard error.
#
# The _variance_ of a random variable $X$ is $\mbox{Var }X = {\mathbb E}(X - {\mathbb E}X)^2$.
#
# Algebraically, the following identity holds:
# $$
# \mbox{Var } X = {\mathbb E}(X - {\mathbb E}X)^2 = {\mathbb E}X^2 - 2({\mathbb E}X)^2 + ({\mathbb E}X)^2 =
# {\mathbb E}X^2 - ({\mathbb E}X)^2.
# $$
# However, this is generally not a good way to calculate $\mbox{Var } X$ numerically, because of roundoff:
# it sacrifices precision unnecessarily.
#
# The _standard error_ of a random variable $X$ is $\mbox{SE }X = \sqrt{\mbox{Var } X}$.
#
# If $\{X_i\}_{i=1}^n$ are independent, then $\mbox{Var} \sum_{i=1}^n X_i = \sum_{i=1}^n \mbox{Var }X_i$.
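# The roundoff remark about the identity $\mbox{Var } X = {\mathbb E}X^2 - ({\mathbb E}X)^2$ can be made concrete with a small numerical sketch.
# It treats four numbers with a large mean and a small spread as equally likely values of $X$ and computes the variance both ways in double precision.
# The particular numbers are an illustrative assumption.

```python
import numpy as np

# Four equally likely values: large mean (about 1e9), small spread.
x = 1e9 + np.array([4.0, 7.0, 13.0, 16.0])

# Identity form: E X^2 - (E X)^2 subtracts two numbers that are both near 1e18,
# where adjacent double-precision values are far more than 1 apart.
naive = np.mean(x**2) - np.mean(x)**2

# Definition: the average squared deviation from the mean.
# The deviations are -6, -3, 3, 6, which are represented exactly.
two_pass = np.mean((x - np.mean(x))**2)

print("identity form:", naive)
print("definition:   ", two_pass)
```

# Centering before squaring keeps the intermediate quantities on the scale of the spread rather than the scale of the squared mean, which is why the definition loses much less precision here.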
#
# If $X$ and $Y$ have a joint distribution, then $\mbox{cov} (X,Y) = {\mathbb E} (X - {\mathbb E}X)(Y - {\mathbb E}Y)$.
# It follows from this definition (and the commutativity of multiplication)
# that $\mbox{cov}(X,Y) = \mbox{cov}(Y,X)$.
# Also,
# $$
# \mbox{var }(X+Y) = \mbox{var }X + \mbox{var }Y + 2\mbox{cov}(X,Y).
# $$
#
# If $X$ and $Y$ are independent, $\mbox{cov }(X,Y) = 0$.
# However, the converse is not necessarily true: $\mbox{cov}(X,Y) = 0$ does not in general imply that
# $X$ and $Y$ are independent.

# ## Examples
#
# ### Variance of a Bernoulli random variable
#
# ### Variance of a Binomial random variable
#
# ### Variance of a Geometric and Negative Binomial random variable
#
# ### Variance of the sample sum and sample mean

# ## Random Vectors
#
# Suppose $\{X_i\}_{i=1}^n$ are jointly distributed random variables, and let
# $$
# X =
# \begin{pmatrix}
# X_1 \\
# \vdots \\
# X_n
# \end{pmatrix}
# .
# $$
# Then $X$ is a random vector, an $n$ by $1$ vector of real-valued random variables.
#
# The expected value of $X$ is
# $$
# {\mathbb E} X \equiv
# \begin{pmatrix}
# {\mathbb E} X_1 \\
# \vdots \\
# {\mathbb E} X_n
# \end{pmatrix}
# .
# $$
#
# The _covariance matrix_ of $X$ is
# $$
# \mbox{cov } X \equiv
# {\mathbb E}
# \left (
# \begin{pmatrix}
# X_1 - {\mathbb E} X_1 \\
# \vdots \\
# X_n - {\mathbb E} X_n
# \end{pmatrix}
# \begin{pmatrix}
# X_1 - {\mathbb E} X_1 & \cdots & X_n - {\mathbb E} X_n
# \end{pmatrix}
# \right )
# =
# {\mathbb E}
# \begin{pmatrix}
# (X_1 - {\mathbb E} X_1)^2 & (X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2) & \cdots & (X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n) \\
# (X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2) & (X_2 - {\mathbb E} X_2)^2 & \cdots & (X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n) \\
# \vdots & \vdots & \ddots & \vdots \\
# (X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n) & (X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n) & \cdots & (X_n - {\mathbb E} X_n)^2
# \end{pmatrix}
# $$
#
# $$
# =
# \begin{pmatrix}
# {\mathbb E}(X_1 - {\mathbb E} X_1)^2 & {\mathbb E}((X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2)) & \cdots &
# {\mathbb E}((X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n)) \\
# {\mathbb E}((X_1 - {\mathbb E} X_1)(X_2 - {\mathbb E} X_2)) & {\mathbb E}(X_2 - {\mathbb E} X_2)^2 & \cdots &
# {\mathbb E}((X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n)) \\
# \vdots & \vdots & \ddots & \vdots \\
# {\mathbb E}((X_1 - {\mathbb E} X_1)(X_n - {\mathbb E} X_n)) & {\mathbb E}((X_2 - {\mathbb E} X_2)(X_n - {\mathbb E} X_n)) & \cdots & {\mathbb E}(X_n - {\mathbb E} X_n)^2
# \end{pmatrix}
# .
# $$
#
# Covariance matrices are always _positive semidefinite_.
# (A matrix $A$ is _nonnegative definite_, or _positive semidefinite_, if $x'Ax \ge 0$ for all $x \in \Re^n$.)
# [Here](./linalg.ipynb) is a review of linear algebra.

# ## The Multivariate Normal Distribution
#
# The notation $X \sim {\mathcal N}(\mu, \sigma^2)$ means that $X$ has a normal distribution with mean $\mu$
# and variance $\sigma^2$.
# This distribution is continuous, with probability density function
# $$
# \frac{1}{\sqrt{2\pi} \sigma} e^{\frac{-(x-\mu)^2}{2\sigma^2}}.
# $$
#
# If $X \sim {\mathcal N}(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} \sim {\mathcal N}(0, 1)$,
# the _standard normal distribution_.
# The probability density function of the standard normal distribution is
# $$
# \phi(x) = \frac{1}{\sqrt{2\pi}} e^ {-x^2/2}.
# $$

# In[ ]:

# Plot the normal density and cdf for a given mean and standard deviation
def plotNorm(mu, sigma):
    x = np.arange(mu-4*sigma, mu+4*sigma, 8*sigma/200)
    # density written out explicitly for clarity; note the centering at mu
    y = np.exp(-(x-mu)**2/(2*sigma**2))/(sigma*math.sqrt(2*math.pi))
    Y = sp.stats.norm.cdf(x, loc=mu, scale=sigma)  # using scipy for convenience
    plt.plot(x, y, 'b-', x, Y, 'r-', linewidth=2)
    plt.plot((mu-4.1*sigma, mu+4.1*sigma), (0.5, 0.5), 'g--')  # horizontal line at 0.5
    plt.xlabel('$x$')  # axis labels. Can use LaTeX math markup
    plt.ylabel(r'$f(x) = \frac{1}{\sqrt{2\pi}\sigma} e^{-(x-\mu)^2/2\sigma^2}$; $F(x)$')
    plt.axis([mu-4.1*sigma, mu+4.1*sigma, 0, max(1.1, max(y))])  # axis limits
    plt.title(r'The $\mathcal{N}($' + str(mu) + ',' + str(sigma**2) + '$)$ density and cdf')
    plt.show()

interact(plotNorm, mu=(-5, 5, .05), sigma=(0.1, 10, .1))

# A collection of random variables $\{ X_1, X_2, \ldots, X_n\} = \{X_j\}_{j=1}^n$ is _jointly normal_
# if all linear combinations of those variables have normal distributions.
# That is, the collection is jointly normal if for all $\alpha \in \Re^n$, $\sum_{j=1}^n \alpha_j X_j$
# has a normal distribution.
#
# If $\{X_j \}_{j=1}^n$ are independent, normally distributed random variables, they are jointly normal.
#
# If for some $\mu \in \Re^n$ and positive-definite matrix $G$, the joint density of $\{X_j \}_{j=1}^n$ is
# $$
# \left ( \frac{1}{\sqrt{2 \pi}}\right )^n \frac{1}{\sqrt{\left | G \right |}}
# \exp \left \{ - \frac{1}{2} (x - \mu)'G^{-1}(x-\mu) \right \},
# $$
# then $\{X_j \}_{j=1}^n$ are jointly normal, and the covariance matrix of $\{X_j\}_{j=1}^n$ is $G$.
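# The statement that $G$ is the covariance matrix can be checked approximately by simulation.
# The sketch below draws from a bivariate normal distribution, compares the sample covariance matrix with $G$, and compares the sample mean and variance of one linear combination $\alpha'X$ with $\alpha'\mu$ and $\alpha'G\alpha$.
# The values of `mu`, `G`, and `alpha`, the seed, and the sample size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)                     # arbitrary seed
mu = np.array([1.0, -2.0])                         # hypothetical mean vector
G = np.array([[2.0, 0.6],
              [0.6, 1.0]])                         # hypothetical positive-definite matrix

x = rng.multivariate_normal(mu, G, size=200_000)   # each row is one draw of (X_1, X_2)
sample_cov = np.cov(x, rowvar=False)               # 2 x 2 sample covariance matrix

alpha = np.array([0.5, -1.5])                      # arbitrary coefficients
combo = x @ alpha                                  # alpha_1 X_1 + alpha_2 X_2 for each draw

print(sample_cov)
print(combo.mean(), alpha @ mu)                    # sample mean vs. alpha' mu
print(combo.var(), alpha @ G @ alpha)              # sample variance vs. alpha' G alpha
```

# Agreement is only approximate and depends on the sample size; a histogram of `combo` could also be compared with the corresponding normal density.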
# ## The Central Limit Theorem
#
# For an elementary discussion, see [SticiGui: The Normal Curve, The Central Limit Theorem, and Markov's and Chebychev's Inequalities for Random Variables](http://www.stat.berkeley.edu/~stark/SticiGui/Text/clt.htm).
#
# Suppose $\{X_j \}_{j=1}^\infty$ are independent and identically distributed (iid), have finite expected value ${\mathbb E}X_j = \mu$, and have finite variance $\mbox{var }X_j = \sigma^2$.
#
# Define the sum $S_n \equiv \sum_{j=1}^n X_j$.
# Then
# $$
# {\mathbb E}S_n = {\mathbb E} \sum_{j=1}^n X_j = \sum_{j=1}^n {\mathbb E} X_j = \sum_{j=1}^n \mu = n\mu,
# $$
# and
# $$
# \mbox{var }S_n = \mbox{var } \sum_{j=1}^n X_j = n\sigma^2.
# $$
# (The last step follows from the independence of $\{X_j\}$: the variance of the sum is the sum of the variances.)
#
# Define $Z_n \equiv \frac{S_n - n\mu}{\sqrt{n}\sigma}$.
# Then for every $a, b \in \Re$ with $a \le b$,
# $$
# \lim_{n \rightarrow \infty} {\mathbb P} \{ a \le Z_n \le b \} = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} dx.
# $$
# This is a basic form of the Central Limit Theorem.

# ## Conditional Distributions
#
# The conditional distribution of a random variable or random vector
# $X$ given the event $A$ is
#
# $$\mathbb{P}_{X|A}(B) = \mathbb{P} \{ X \in B | A \}$$
#
# as a function of $B$, provided $\mathbb{P} (A) > 0$.
#
#
# [To do]
#
# ## Conditional Expectation
# [To do]
#
# Conditional expectation is a random variable...
# The expectation of the conditional expectation is the unconditional expectation:
# $\mathbb{E}(\mathbb{E}(X|Y)) = \mathbb{E} X$.
# This is essentially another expression of the law of total probability.
#
# ### Examples
# [To do] Use random permutation of a list of numbers to illustrate: $\mathbb{E}X_j$, $\mathbb{E}(X_j | X_k = x)$, $\mathbb{E}(X_j | X_k)$, $\mathbb{E} (\mathbb{E}(X_j | X_k)) = \mathbb{E}X_j$.
#
#
# ## Point Processes
#
# Point processes formalize the notion of something occurring at a random place or time (or both).
# # For instance, imagine the radioactive decay of a mass of uranium; the particular times at which an atom decays are modeled well as a Poisson process (described below). # # # ### Poisson Processes # # Temporal, spatiotemporal. Waiting times (inter-arrival times) are exponential. # Alternative characterizations. # # Temporal point processes: the counting function. # [To Do] # # # #### Marked Poisson Processes # [To Do] # # ### Renewal Processes # [To Do] # # # ### Branching Processes # [To Do] # # # #### Hawkes Processes # [To Do] # # Previous: [Theories of Probability](probTheory.ipynb) Next: [Probability Inequalities](ineq.ipynb) # In[ ]: get_ipython().run_line_magic('run', 'talkTools.py') # In[ ]: