IPython notebook in Dr. H's class¶

Statistical distribution¶

In [6]:

%pylab inline

Populating the interactive namespace from numpy and matplotlib

WARNING: pylab import has clobbered these variables: ['random']
`%matplotlib` prevents importing * from pylab and numpy

Write a function that generates random coin flips, storing a value of 0 and +1 to correspond to heads and tails.
Then, write a loop to flip this X time (ie x = 1, 10, 100) and see what our average score (total number of heads - total number tails) is!

In [7]:

import random

def fair_coin():
    return random.choice([0,1])

Test it out

In [8]:

def multi_trial(flips, flip_function):
    total = 0
    for i in range(FLIPS):
        total += flip_function()   
    return total

In [29]:

FLIPS = 10 #How many flips to do

score = multi_trial(FLIPS, fair_coin)
print 'Total score is: %s out of %s flips for "%s"' % (score, FLIPS, 'fair_coin')

Total score is: 3 out of 10 flips for "fair_coin"

What is a score that is out of the ordinary for 10 flips? IE we expect on average, to get 0; however, we know that we won't always get 0. We don't know the statistical distribution? Let's test this for 1000 simulations of 10 flips.

In [27]:

def simulation(trials=5, fcn=fair_coin, plot=True):
    ''' Run a function TRIALS times, plot the results in a histogram '''
    total = []
    for i in range(trials):
        total.append(multi_trial(FLIPS, fcn))

    if plot:
        plt.hist(total, bins=20, color='black')
        plt.title('%s trials of %s' % (trials, fcn.__name__))
        plt.xlabel('Score')
        plt.ylabel('Occurrences')
              
simulation(2000) 

TRY THIS WITH MORE FLIPS PER TRIAL (IE CHANGE FLIPS) Notice how the space fills up!

Is this the binomial distribution?

In [13]:

from IPython.display import Image
Image(url='http://docs.scipy.org/doc/numpy/_images/math/d0e9bf22b11bd63cac4e02feb23f7c40e2c26943.png')

Out[13]:

We can use Numpy's binomial distribution generator to check!.

In [14]:

help(np.random.binomial)

Help on built-in function binomial:

binomial(...)
    binomial(n, p, size=None)
    
    Draw samples from a binomial distribution.
    
    Samples are drawn from a Binomial distribution with specified
    parameters, n trials and p probability of success where
    n an integer >= 0 and p is in the interval [0,1]. (n may be
    input as a float, but it is truncated to an integer in use)
    
    Parameters
    ----------
    n : float (but truncated to an integer)
            parameter, >= 0.
    p : float
            parameter, >= 0 and <=1.
    size : {tuple, int}
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.
    
    Returns
    -------
    samples : {ndarray, scalar}
              where the values are all integers in  [0, n].
    
    See Also
    --------
    scipy.stats.distributions.binom : probability density function,
        distribution or cumulative density function, etc.
    
    Notes
    -----
    The probability density for the Binomial distribution is
    
    .. math:: P(N) = \binom{n}{N}p^N(1-p)^{n-N},
    
    where :math:`n` is the number of trials, :math:`p` is the probability
    of success, and :math:`N` is the number of successes.
    
    When estimating the standard error of a proportion in a population by
    using a random sample, the normal distribution works well unless the
    product p*n <=5, where p = population proportion estimate, and n =
    number of samples, in which case the binomial distribution is used
    instead. For example, a sample of 15 people shows 4 who are left
    handed, and 11 who are right handed. Then p = 4/15 = 27%. 0.27*15 = 4,
    so the binomial distribution should be used in this case.
    
    References
    ----------
    .. [1] Dalgaard, Peter, "Introductory Statistics with R",
           Springer-Verlag, 2002.
    .. [2] Glantz, Stanton A. "Primer of Biostatistics.", McGraw-Hill,
           Fifth Edition, 2002.
    .. [3] Lentner, Marvin, "Elementary Applied Statistics", Bogden
           and Quigley, 1972.
    .. [4] Weisstein, Eric W. "Binomial Distribution." From MathWorld--A
           Wolfram Web Resource.
           http://mathworld.wolfram.com/BinomialDistribution.html
    .. [5] Wikipedia, "Binomial-distribution",
           http://en.wikipedia.org/wiki/Binomial_distribution
    
    Examples
    --------
    Draw samples from the distribution:
    
    >>> n, p = 10, .5 # number of trials, probability of each trial
    >>> s = np.random.binomial(n, p, 1000)
    # result of flipping a coin 10 times, tested 1000 times.
    
    A real world example. A company drills 9 wild-cat oil exploration
    wells, each with an estimated probability of success of 0.1. All nine
    wells fail. What is the probability of that happening?
    
    Let's do 20,000 trials of the model, and count the number that
    generate zero positive results.
    
    >>> sum(np.random.binomial(9,0.1,20000)==0)/20000.
    answer = 0.38885, or 38%.

In [15]:

n, p = 10, .5 # number of trials, probability of each trial
s = np.random.binomial(n, p, size=1000) # result of flipping a coin 10 times, tested 1000 times.
plt.hist(s, bins=20);

The variance of a binomial distribution follows the following formula:

$V(X) = np(1-p)$

Where n is the number of trials/flips. The distribution is N, ie this was done thousands of times to get a real statistical distribution.

Meaning the standard deviation is:

$\sigma(x) = \sqrt{V(x)}$

$\sigma(x) = \sqrt{np(1-p)}$

For a fair coin:

$\sigma(x) = \sqrt{n} \times 0.25$

In [16]:

print 'std is %s' %(np.std(s))
print 'sqrt n times .25 is',sqrt(10.0 * 0.25)

std is 1.52804417475
sqrt n times .25 is 1.58113883008

By the way, how fast this this?

In [17]:

%%timeit
s = np.random.binomial(n, p, size=1000) # result of flipping a coin 10 times, tested 1000 times.

10000 loops, best of 3: 99.6 µs per loop

In [18]:

%%timeit
simulation(2000, plot=False) 

10 loops, best of 3: 28.1 ms per loop

What, 100-1000 times faster?

What about two coins¶

We were pretty smart with our design of functions, so we can ensemble space, ie the number of possible outcomes.

In [33]:

def two_flips():
    return fair_coin() + fair_coin()

simulation(trials=5000, fcn=two_flips, plot=True)

In [ ]:

IGNORE ME¶

In [19]:

import os.path as op
from IPython.display import display, HTML

def load_style(s, figsize=None, loghide=False):
    """Load a CSS stylesheet in the notebook either by builtin styles, or
    from a file, or from a URL.
    Examples::
        %load_style 
        %load_style mystyle.css
        %load_style http://ipynbstyles.com/otherstyle.css
    """
    if s.startswith('http'):
        try:
            import requests
        except ImportError:
            raise ImportError('Failed to import python "requests" library;'
                              'please install to use load_style(url)')
        r =requests.get(s)
        style = r.text

    else:
        try:
            with open(s, 'r') as f:
                style = f.read()
        except IOError:
            raise IOError('Failed to load style as a url, file or builtin type.'
               ' Valid builtins are "%s"' % '","'.join(__all__.keys() ))

    out = '<style>\n{style}\n</style>'.format(style=style)
    
    if figsize:
        if figsize == True:
            figsize = 8, 5.5
        else:
            try:
                fx, fy = figsize
            except Exception:
                fx, fy = figsize, figsize
            
        fstring = "\ninput:pylab.rcParams['figure.figsize'] = %s, %s\n" \
            % (fx,fy)
        out += fstring

    display(HTML(out))
    
    if loghide:
        import warnings #supress non-skspec log msgs
        warnings.filterwarnings('ignore')      

In [20]:

#load_style('https://raw.githubusercontent.com/panditarevolution/ipythonNotebook_customs/master/monokai/custom.css')
load_style('https://raw.githubusercontent.com/hugadams/skspec/master/skspec/bundled/gwu.css')