Exercise 8.1 Chi-squared Test

Determine whether marks classifications for a course are atypical

Analysis of 2000 overall course marks from ESESIS shows that the typical marks breakdown is as follows:

Grade	Typical share
Fail	4.3%
3rd	9.5%
2ii	18.4%
2i	38.4%
1st	29.4%

Now consider the following distribution of results from two different groups of students:

Grade	Students - group 1	Students - group 2
Fail	3	0
3rd	10	8
2ii	23	7
2i	30	25
1st	20	39

Consider each group in turn: are its results atypical, that is, significantly different from the typical breakdown at the 5% level?
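
Before running the test, it can help to see what the typical breakdown implies for cohorts of these sizes. The sketch below scales the percentages to each group's total. The names `typical_pct`, `groups` and `expected_counts` are illustrative and not part of the exercise.

```python
import numpy as np

# Typical breakdown in percent, in the order Fail, 3rd, 2ii, 2i, 1st
# (values taken from the text above).
typical_pct = np.array([4.3, 9.5, 18.4, 38.4, 29.4])

# Observed counts per grade, taken from the table above.
groups = {
    "group 1": np.array([3, 10, 23, 30, 20]),
    "group 2": np.array([0, 8, 7, 25, 39]),
}

def expected_counts(observed):
    """Scale the typical percentages to the size of the observed cohort."""
    return observed.sum() * typical_pct / 100.0

for name, observed in groups.items():
    expected = expected_counts(observed)
    print(name, "total:", observed.sum())
    print("  observed:", observed)
    print("  expected:", np.round(expected, 2))
```

In both groups the expected number of fails comes out below 5, which is why the assumptions of the test need checking before the p-value is trusted.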

In [13]:

import numpy as np
from scipy import stats

# As we want to do this twice, let's write a function.
def marks_atypical(Obs):
    # Total number of observations (students in the group).
    total = np.sum(Obs)

    # Typical distribution of marks, in percent: Fail, 3rd, 2ii, 2i, 1st.
    typical = np.array([4.3, 9.5, 18.4, 38.4, 29.4])

    # Expected number of students in each classification.
    Exp = total*typical/100.0

    # Check assumptions for the use of this test.
    at_least_5 = 0
    for val in Exp:
        # Check all expected counts are greater than 1.
        assert val > 1
        if val >= 5:
            at_least_5 += 1
    # Check at least 80% of expected counts are 5 or more
    # (i.e. no more than 20% fall below 5).
    assert at_least_5 >= 0.8*Exp.size

    # Chi-squared goodness-of-fit test of observed against expected counts.
    chi2_statistic, p_value = stats.chisquare(Obs, Exp)

    # True means the null hypothesis is rejected at the 5% level,
    # i.e. the distributions are not the same.
    return bool(p_value < 0.05)

Obs = np.array([3, 10, 23, 30, 20])
print("Dataset: ", Obs)
if marks_atypical(Obs):
    print("Marks are atypical.")
else:
    print("Marks are typical.")

Obs = np.array([0, 8, 7, 25, 39])
print("Dataset: ", Obs)
if marks_atypical(Obs):
    print("Marks are atypical.")
else:
    print("Marks are typical.")

Dataset:  [ 3 10 23 30 20]
Marks are typical.
Dataset:  [ 0  8  7 25 39]
Marks are atypical.
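
The function above only reports a yes/no answer. As a cross-check, the statistic can also be computed directly from its definition, the sum over grades of (observed - expected)^2 / expected, and compared with a chi-squared distribution with 4 degrees of freedom (five categories minus one). This is a sketch rather than part of the original solution, and `chi2_by_hand` is a hypothetical helper name.

```python
import numpy as np
from scipy import stats

# Typical breakdown in percent: Fail, 3rd, 2ii, 2i, 1st.
typical_pct = np.array([4.3, 9.5, 18.4, 38.4, 29.4])

def chi2_by_hand(observed):
    """Return (statistic, p_value) for a goodness-of-fit test against typical_pct."""
    observed = np.asarray(observed, dtype=float)
    expected = observed.sum() * typical_pct / 100.0
    # Pearson's statistic: squared deviations scaled by the expected counts.
    statistic = np.sum((observed - expected) ** 2 / expected)
    # Degrees of freedom: number of categories minus one.
    dof = observed.size - 1
    # Upper-tail probability of the chi-squared distribution.
    p_value = stats.chi2.sf(statistic, dof)
    return statistic, p_value

for label, observed in [("group 1", [3, 10, 23, 30, 20]),
                        ("group 2", [0, 8, 7, 25, 39])]:
    statistic, p_value = chi2_by_hand(observed)
    print("%s: chi2 = %.2f, p = %.4f" % (label, statistic, p_value))
```

For these data the statistic should come out at roughly 5.2 for group 1 (p around 0.27) and roughly 19.0 for group 2 (p below 0.001), consistent with the typical/atypical verdicts printed above.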
