Exercise 8.2

Every day, you visit the JCR, Library Cafe, College Cafe and all the other Taste Imperial outlets, and count how many Chicken and Bacon, Ham and Cheese, and Carrot and Hommous baguettes they have on sale. You record the numbers in a table:

Day\Baguette	C&B	H&C	C&H
Monday	32	35	38
Tuesday	20	18	30
Wednesday	27	29	8
Thursday	16	19	10
Friday	22	27	20

You have procured all this information because you read somewhere that, supposedly, 20 of each type are being added by Taste Imperial each day and that approximately 20 of each are eaten each day. You realise the ideal distribution should be:

Day\Baguette	C&B	H&C	C&H
Monday	20	20	20
Tuesday	20	20	20
Wednesday	20	20	20
Thursday	20	20	20
Friday	20	20	20

Perform a chi-squared test and see if reality matches the statistic that you read about.

(Note: All the above numbers have been invented and may not be anywhere close to the actual values)
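
For reference, the quantity behind the test is Pearson's chi-squared statistic. The symbols here are notation introduced for this note, not taken from the exercise: $O_{ij}$ is the observed count on day $i$ for baguette type $j$, and $E_{ij}$ is the corresponding expected count.

```latex
\chi^2 = \sum_{i=1}^{5} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
\text{e.g. Monday, C\&B with } E = 20:\quad \frac{(32 - 20)^2}{20} = 7.2
```

A large value of $\chi^2$ relative to its degrees of freedom indicates that the observed table is unlikely under the expected one.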

In [1]:

import numpy as np
from scipy import stats

# This solution is broadly similar to the previous question, so let's use
# that answer as a base to answer this one.
def sandwiches_atypical(Obs):
    # Total number of baguettes observed (all days and types)
    total = np.sum(Obs)

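    # Ideal stock level: 20 of each baguette on each day.
    typical = np.ones_like(Obs)*20
    # Here I make an array of the same shape and size as Obs
    # that consists entirely of ones (a useful numpy function)
    # and multiply it by 20, to get an array with the ideal values

    # Expected number of baguettes. The ideal table is rescaled so that it
    # sums to the observed total, because stats.chisquare expects the
    # observed and expected totals to agree.
    Exp = total*typical/float(np.sum(typical))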

    # Check assumptions for the use of this test.
    at_least_5 = np.size(Exp[Exp>=5])
    # We need the number of elements in the array that are at least 5.
    # We can find this using a neat property of numpy arrays: indexing an
    # array with a condition returns a new array holding only the elements
    # that satisfy it, e.g. y = x[x==0] is an array of all the elements in
    # x that are equal to 0. If there are 2, y will have size 2; if there are
    # none, y will have size 0, etc. This feature is extremely useful when
    # manipulating arrays and DOES NOT work with lists! Only numpy arrays!

    # Rule of thumb: at least 80% of the expected values should be 5 or more.
    assert at_least_5 >= 0.8*Exp.size

   
    s_statistic, p_value = stats.chisquare(Obs, Exp, axis = None)
    # By adding axis = None the test is applied to the whole array and not column by column
    
    if p_value < 0.05:
        return True # ie the distributions are not the same
    else:
        return False


Obs = np.array([[32,35,38],
               [20,18,30],
               [27,29,8],
               [16,19,10],
               [22,27,20]])
print("Dataset: \n {}".format(Obs))
# '\n' means 'new line' when printing a string
if sandwiches_atypical(Obs):
    print("Number of Sandwiches are atypical.")
else:
    print("Number of Sandwiches are typical.")

Dataset: 
 [[32 35 38]
 [20 18 30]
 [27 29  8]
 [16 19 10]
 [22 27 20]]
Number of Sandwiches are atypical.
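
As a cross-check, the statistic can also be computed by hand against the ideal table of exactly 20 per cell. This is a minimal sketch rather than part of the original solution: the names `observed`, `ideal`, `chi2_stat` and `dof` are illustrative, and taking the degrees of freedom as the number of cells minus one is an assumption chosen to mirror the default of `stats.chisquare`. The hand calculation is used because the observed total (351) differs from the ideal total (300), and recent SciPy releases check that the two totals agree.

```python
import numpy as np
from scipy import stats

# Observed counts from the first table
# (rows: Monday to Friday; columns: C&B, H&C, C&H).
observed = np.array([[32, 35, 38],
                     [20, 18, 30],
                     [27, 29, 8],
                     [16, 19, 10],
                     [22, 27, 20]], dtype=float)

# Ideal table from the exercise: 20 baguettes in every cell.
ideal = np.full(observed.shape, 20.0)

# Pearson statistic: sum over all cells of (O - E)^2 / E.
chi2_stat = np.sum((observed - ideal) ** 2 / ideal)

# Assumption: degrees of freedom = number of cells - 1,
# mirroring the default used by stats.chisquare.
dof = observed.size - 1

# The survival function of the chi-squared distribution gives the p-value.
p_value = stats.chi2.sf(chi2_stat, dof)

print("chi-squared = {:.2f}, dof = {}, p = {:.2e}".format(chi2_stat, dof, p_value))
```

For this data the statistic comes to 62.05, well above the 5% critical value for 14 degrees of freedom (about 23.7), so this version of the test also rejects the claim that stock sits at 20 of each type.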