%pylab inline
Populating the interactive namespace from numpy and matplotlib
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats
t_dist = scipy.stats.t
norm_dist = scipy.stats.norm
x_values = np.linspace(-4, 4, 100)
The $t$ distribution at different degrees of freedom:
t_prob_3 = t_dist.pdf(x_values, 3)
t_prob_10 = t_dist.pdf(x_values, 10)
t_prob_40 = t_dist.pdf(x_values, 40)
z_prob = norm_dist.pdf(x_values)
plt.figure(figsize=(10, 8))
plt.plot(x_values, t_prob_3, 'r:', label='t df=3')
plt.plot(x_values, t_prob_10, 'g:', label='t df=10')
plt.plot(x_values, t_prob_40, 'b:', label='t df=40')
plt.plot(x_values, z_prob, 'k', label='z')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability')
plt.legend()
<matplotlib.legend.Legend at 0x490bfd0>
This is the probability density function (PDF) - the probability of observing a $z$ or $t$ value.
The cumulative density function (CDF) is the area under the curve of the PDF up until a particular value, and is the probability of observing a $z$ or $t$ value less than or equal to the value on the x axis:
t_cdf_10 = t_dist.cdf(x_values, 10)
plt.figure(figsize=(10, 8))
plt.plot(x_values, t_cdf_10, 'b', label='t cdf df=10')
plt.plot(x_values, t_prob_10, 'k', label='t pdf df=10')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability')
plt.legend()
<matplotlib.legend.Legend at 0x54d6b10>
The CDF maps $z$ or $t$ value to a probability. That is, it is a function $f$ such that $p = f(t)$ returns the propability of observing (say) $t$ value greater than or equal to $t$. Here is the probability of observing a t value of 2 or less in a t distribution with 10 degrees of freedom:
p = t_dist.cdf(2, 10)
p
0.96330598261462974
What if we want to go in the opposite direction? I mean, we want a function $g$ such that $t = g(p)$ - where $p$ is a probability and $t$ is the $t$ value?
In this case our function $g$ is the inverse of $f$ - that is $g(f(t)) = t$. In scipy
the inverse of the CDF is the percent point function or ppf
(see: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm).
The $t$ value corresponding to $p$ above should be 2:
t_dist.ppf(p, 10)
1.9999999999960625
It isn't exactly 2, because of floating point error, but as you can see, it is very close.
The ppf
(inverse cumulative density function) allows us to find a $t$ threshold for a probility of - say - 0.05:
t_for_05 = t_dist.ppf(0.95, 10)
t_for_05
1.8124611228107335
Just for completeness, scipy
also has the survival function which is just $1 - cdf(t)$:
p_sf = t_dist.sf(2, 10)
p_sf
0.036694017385370196
1 - t_dist.cdf(2, 10)
0.036694017385370259
scipy
also has the inverse of the survival function, isf
:
t_dist.isf(p_sf, 10)
1.9999999999960638
scipy
has these functions for lots of distributions, including the normal distribution.
So - what is the probability of observing a $z$ value (from the normal distribution) of 0.9 or lower?
norm_dist.cdf(0.9)
0.81593987465324047
What $z$ value gives me a probability of 0.95 of seeing this $z$ value or less?
norm_dist.ppf(0.95)
1.6448536269514722