In [1]:

%pylab inline

Populating the interactive namespace from numpy and matplotlib

In [2]:

import numpy as np
import matplotlib.pyplot as plt

In [3]:

import scipy.stats

In [4]:

t_dist = scipy.stats.t
norm_dist = scipy.stats.norm

In [5]:

x_values = np.linspace(-4, 4, 100)

The $t$ distribution at different degrees of freedom:

In [6]:

t_prob_3 = t_dist.pdf(x_values, 3)
t_prob_10 = t_dist.pdf(x_values, 10)
t_prob_40 = t_dist.pdf(x_values, 40)
z_prob = norm_dist.pdf(x_values)

In [7]:

plt.figure(figsize=(10, 8))
plt.plot(x_values, t_prob_3, 'r:', label='t df=3')
plt.plot(x_values, t_prob_10, 'g:', label='t df=10')
plt.plot(x_values, t_prob_40, 'b:', label='t df=40')
plt.plot(x_values, z_prob, 'k', label='z')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability')
plt.legend()

Out[7]:

<matplotlib.legend.Legend at 0x490bfd0>

This is the probability density function (PDF) - the probability of observing a $z$ or $t$ value.

The cumulative density function (CDF) is the area under the curve of the PDF up until a particular value, and is the probability of observing a $z$ or $t$ value less than or equal to the value on the x axis:

In [8]:

t_cdf_10 = t_dist.cdf(x_values, 10)
plt.figure(figsize=(10, 8))
plt.plot(x_values, t_cdf_10, 'b', label='t cdf df=10')
plt.plot(x_values, t_prob_10, 'k', label='t pdf df=10')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability')
plt.legend()

Out[8]:

<matplotlib.legend.Legend at 0x54d6b10>

The CDF maps $z$ or $t$ value to a probability. That is, it is a function $f$ such that $p = f(t)$ returns the propability of observing (say) $t$ value greater than or equal to $t$. Here is the probability of observing a t value of 2 or less in a t distribution with 10 degrees of freedom:

In [9]:

p = t_dist.cdf(2, 10)
p

Out[9]:

0.96330598261462974

What if we want to go in the opposite direction? I mean, we want a function $g$ such that $t = g(p)$ - where $p$ is a probability and $t$ is the $t$ value?

In this case our function $g$ is the inverse of $f$ - that is $g(f(t)) = t$. In scipy the inverse of the CDF is the percent point function or ppf (see: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm).

The $t$ value corresponding to $p$ above should be 2:

In [10]:

t_dist.ppf(p, 10)

Out[10]:

1.9999999999960625

It isn't exactly 2, because of floating point error, but as you can see, it is very close.

The ppf (inverse cumulative density function) allows us to find a $t$ threshold for a probility of - say - 0.05:

In [11]:

t_for_05 = t_dist.ppf(0.95, 10)
t_for_05

Out[11]:

1.8124611228107335

Just for completeness, scipy also has the survival function which is just $1 - cdf(t)$:

In [12]:

p_sf = t_dist.sf(2, 10)
p_sf

Out[12]:

0.036694017385370196

In [13]:

1 - t_dist.cdf(2, 10)

Out[13]:

0.036694017385370259

scipy also has the inverse of the survival function, isf:

In [14]:

t_dist.isf(p_sf, 10)

Out[14]:

1.9999999999960638

scipy has these functions for lots of distributions, including the normal distribution.

So - what is the probability of observing a $z$ value (from the normal distribution) of 0.9 or lower?

In [15]:

norm_dist.cdf(0.9)

Out[15]:

0.81593987465324047

What $z$ value gives me a probability of 0.95 of seeing this $z$ value or less?

In [16]:

norm_dist.ppf(0.95)

Out[16]:

1.6448536269514722