Analysis of 2000 overall course marks from ESESIS shows that the typical marks breakdown is as follows:
Grade | Typical share of students |
---|---|
Fail | 4.3% |
3rd | 9.5% |
2ii | 18.4% |
2i | 38.4% |
1st | 29.4% |
Now consider the following distribution of results from two different groups of students:
Grade | Students - group 1 | Students - group 2 |
---|---|---|
Fail | 3 | 0 |
3rd | 10 | 8 |
2ii | 23 | 7 |
2i | 30 | 25 |
1st | 20 | 39 |
Consider each group in turn: are its results atypical? We answer this with a chi-squared goodness-of-fit test at the 5% significance level.
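For reference, the statistic that `stats.chisquare` computes is Pearson's chi-squared goodness-of-fit statistic. Let $k = 5$ be the number of grade bands, $O_i$ the observed count in band $i$, $p_i$ the typical percentage for that band and $N$ the group size (86 for group 1, 79 for group 2). The expected counts and the statistic are then:

$$E_i = N\,\frac{p_i}{100}, \qquad \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

If the group follows the typical breakdown (the null hypothesis), $\chi^2$ approximately follows a chi-squared distribution with $k - 1 = 4$ degrees of freedom. The approximation is usually considered reasonable when no $E_i$ is below 1 and at most 20% of the $E_i$ are below 5.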
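```python
import numpy as np
from scipy import stats

# Typical percentage of students in each band (Fail, 3rd, 2ii, 2i, 1st).
TYPICAL_PCT = np.array([4.3, 9.5, 18.4, 38.4, 29.4])

# As we want to do this twice, let's write a function.
def marks_atypical(obs):
    """Return True if the observed counts differ significantly from the typical breakdown."""
    obs = np.asarray(obs)
    # Total number of students in the group.
    total = np.sum(obs)
    # Expected counts if this group followed the typical distribution.
    exp = total * TYPICAL_PCT / 100.0
    # Check the usual assumptions for the chi-squared test:
    # every expected count must be at least 1...
    assert np.all(exp >= 1), "an expected count is below 1"
    # ...and at least 80% of the expected counts must be 5 or more.
    assert np.sum(exp >= 5) >= 0.8 * exp.size, "too many expected counts below 5"
    # Pearson's chi-squared goodness-of-fit test.
    statistic, p_value = stats.chisquare(obs, exp)
    # A p-value below 0.05 means the distributions are probably not the same.
    return p_value < 0.05
```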
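To see what `stats.chisquare` does inside the function, a minimal sketch can compute the statistic by hand for group 1 and take the p-value from the upper tail of a chi-squared distribution with 4 degrees of freedom (`stats.chi2.sf`). The variable names here are illustrative; the hand calculation and scipy's result should agree.

```python
import numpy as np
from scipy import stats

# Group 1 observed counts and the typical percentages from the text.
observed = np.array([3, 10, 23, 30, 20])
typical_pct = np.array([4.3, 9.5, 18.4, 38.4, 29.4])

# Expected counts: scale the typical percentages to this group's size.
expected = observed.sum() * typical_pct / 100.0

# Pearson's statistic, computed directly from its definition.
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# Upper-tail probability with k - 1 degrees of freedom.
dof = observed.size - 1
p_manual = stats.chi2.sf(chi2_manual, dof)

# scipy's goodness-of-fit test should match the hand calculation.
chi2_scipy, p_scipy = stats.chisquare(observed, expected)
print(f"manual: chi2 = {chi2_manual:.3f}, p = {p_manual:.3f}")
print(f"scipy:  chi2 = {chi2_scipy:.3f}, p = {p_scipy:.3f}")
```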
```python
# Test each group against the typical breakdown.
for obs in (np.array([3, 10, 23, 30, 20]), np.array([0, 8, 7, 25, 39])):
    print("Dataset:", obs)
    if marks_atypical(obs):
        print("Marks are atypical.")
    else:
        print("Marks are typical.")
```
```
Dataset: [ 3 10 23 30 20]
Marks are typical.
Dataset: [ 0  8  7 25 39]
Marks are atypical.
```
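A yes/no answer hides how far each group is from the 5% threshold. A minimal sketch that reports the statistic and p-value for both groups instead might look like this; the dictionary layout and the printed format are illustrative choices, not part of the original analysis.

```python
import numpy as np
from scipy import stats

# Typical percentage of students in each band (Fail, 3rd, 2ii, 2i, 1st).
TYPICAL_PCT = np.array([4.3, 9.5, 18.4, 38.4, 29.4])

groups = {
    "group 1": np.array([3, 10, 23, 30, 20]),
    "group 2": np.array([0, 8, 7, 25, 39]),
}

results = {}
for name, observed in groups.items():
    # Expected counts scaled to this group's size.
    expected = observed.sum() * TYPICAL_PCT / 100.0
    statistic, p_value = stats.chisquare(observed, expected)
    results[name] = (statistic, p_value)
    print(f"{name}: chi2 = {statistic:.2f}, p = {p_value:.4f}")
```

Reporting the p-value makes it clear whether a "typical" verdict is comfortably above the threshold or only just above it.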