Every day, you visit the JCR, Library Cafe, College Cafe and all the other Taste Imperial outlets and count how many Chicken and Bacon (C&B), Ham and Cheese (H&C), and Carrot and Houmous (C&H) baguettes they have on sale. You record the numbers in a nice table:
Day\Baguette | C&B | H&C | C&H |
---|---|---|---|
Monday | 32 | 35 | 38 |
Tuesday | 20 | 18 | 30 |
Wednesday | 27 | 29 | 8 |
Thursday | 16 | 19 | 10 |
Friday | 22 | 27 | 20 |
You have collected all this information because you read somewhere that, supposedly, 20 of each type are added by Taste Imperial each day and approximately 20 of each are eaten each day. If that were true, you would expect to find about 20 of each type on sale every day, so the ideal distribution should be:
Day\Baguette | C&B | H&C | C&H |
---|---|---|---|
Monday | 20 | 20 | 20 |
Tuesday | 20 | 20 | 20 |
Wednesday | 20 | 20 | 20 |
Thursday | 20 | 20 | 20 |
Friday | 20 | 20 | 20 |
Perform a chi-squared test and see if reality matches the statistic that you read about.
(Note: All the above numbers have been invented and may not be anywhere close to the actual values)
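The statistic behind the test can be written out explicitly. As a sketch, let $O_{ij}$ be the observed count for day $i$ and baguette type $j$, and $E_{ij}$ the corresponding expected count (these symbols are introduced here for illustration and do not appear in the exercise itself):

$$\chi^2 = \sum_{i=1}^{5}\sum_{j=1}^{3}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$

Treating all 15 cells as a single set of categories, the statistic is compared against a chi-squared distribution with $15 - 1 = 14$ degrees of freedom. A large value of $\chi^2$ (and so a small p-value) suggests that the counts do not follow the expected pattern.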
```python
import numpy as np
from scipy import stats

# This solution is broadly similar to the previous question, so let's use
# that answer as a base to answer this one.

def sandwiches_atypical(Obs):
    # Total number of baguettes observed
    total = np.sum(Obs)

    # Typical distribution of baguettes: 20 of each type on each day.
    # Here I make an array of the same shape and size as Obs
    # that consists entirely of ones (a useful numpy function)
    # and multiply it by 20, to get an array with the typical values.
    typical = np.ones_like(Obs)*20

    # Expected number of sandwiches.
    # The typical values are rescaled so that they add up to the observed
    # total. Dividing by 100.0 would only be right if typical held
    # percentages; here it sums to 300. Recent versions of scipy raise an
    # error if the observed and expected totals differ.
    Exp = total*typical/float(np.sum(typical))

    # Check assumptions for the use of this test.
    # We need the total number of elements in the array that are greater than 5.
    # We can find this using a neat property of numpy arrays. You can make
    # a new array of elements that fulfil a conditional statement when applied
    # to the old array, e.g. for y = x[x==0], y will be an array of all the
    # elements in x that are equal to 0. If there are 2, y will be size 2;
    # if there are none, y will be size 0, etc. This feature is extremely
    # useful when manipulating arrays and DOES NOT work with lists!
    # Only numpy arrays!
    greater_than_5 = np.size(Exp[Exp > 5])

    # Common rule of thumb: no more than 20% of the expected counts should
    # fall below 5, i.e. at least 80% should pass this check.
    assert greater_than_5 >= 0.8*Exp.size

    # By adding axis=None the test is applied to the whole array
    # and not column by column.
    s_statistic, p_value = stats.chisquare(Obs, Exp, axis=None)

    if p_value < 0.05:
        return True  # i.e. the distributions are not the same
    else:
        return False

Obs = np.array([[32, 35, 38],
                [20, 18, 30],
                [27, 29, 8],
                [16, 19, 10],
                [22, 27, 20]])

# '\n' means 'new line' when printing a string
print("Dataset: \n {}".format(Obs))

if sandwiches_atypical(Obs):
    print("Number of Sandwiches are atypical.")
else:
    print("Number of Sandwiches are typical.")
```
```
Dataset: 
 [[32 35 38]
 [20 18 30]
 [27 29  8]
 [16 19 10]
 [22 27 20]]
Number of Sandwiches are atypical.
```
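As a cross-check on the result above, the statistic can also be computed by hand and compared with what `stats.chisquare` reports. This is only a sketch under one stated assumption: the uniform "20 per cell" expectation is rescaled so that it sums to the observed total (351 baguettes over 15 cells, i.e. 23.4 per cell), since recent versions of SciPy require the observed and expected totals to agree. The names `obs`, `exp`, `frac_ok`, `chi2_manual` and `p_manual` are illustrative and not part of the original solution.

```python
import numpy as np
from scipy import stats

obs = np.array([[32, 35, 38],
                [20, 18, 30],
                [27, 29, 8],
                [16, 19, 10],
                [22, 27, 20]])

# Assumption: a uniform expectation rescaled to the observed total,
# so every cell expects obs.sum() / obs.size baguettes.
exp = np.full(obs.shape, obs.sum() / obs.size)

# Boolean arrays make the assumption check a one-liner:
# (exp >= 5) is an array of True/False, and its mean is the
# fraction of cells whose expected count is at least 5.
frac_ok = np.mean(exp >= 5)

# Chi-squared statistic: sum over all cells of (O - E)^2 / E.
chi2_manual = np.sum((obs - exp) ** 2 / exp)

# One set of 15 categories gives 15 - 1 degrees of freedom.
dof = obs.size - 1

# Survival function = probability of a value at least this large.
p_manual = stats.chi2.sf(chi2_manual, dof)

# The same test through scipy, applied to the whole array at once.
chi2_scipy, p_scipy = stats.chisquare(obs, exp, axis=None)

print("fraction of cells with expected count >= 5:", frac_ok)
print("manual: chi2 = {:.2f}, p = {:.3g}".format(chi2_manual, p_manual))
print("scipy:  chi2 = {:.2f}, p = {:.3g}".format(chi2_scipy, p_scipy))
```

The two routes should agree, and the p-value falls well below 0.05, which is consistent with the "atypical" verdict printed above.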