This notebook accompanies my blog article Benchmarking Statistics. It provides the data so you can play around with it yourself.
First, we need the data from the article:
A = [9,7,4,6,3]
B = [2,3,3,8,4]
So as a first step, we compute the averages.
import scipy.stats as st
def avg(lst):
    return float(sum(lst)) / len(lst)
print("A:", avg(A))
print("B:", avg(B))
print("Speedup: %.2f" % ((avg(A) / avg(B))))
A: 5.8
B: 4.0
Speedup: 1.45
We see $A>B$ on average, so B is 45% faster? Not so fast. Let's look at the standard deviation first.
# sample standard deviation (ddof=1); scipy.stats.nanstd has been
# removed from modern SciPy releases, so we use numpy directly
import numpy as np
def stddev(lst):
    return np.std(lst, ddof=1)
print("A: %.2f" % stddev(A))
print("B: %.2f" % stddev(B))
print(avg(A)-stddev(A), "<", avg(B)+stddev(B), " (overlap!)")
A: 2.39 B: 2.35 3.41253272274 < 6.34520787991 (overlap!)
OK, now for real. Let's do Student's t-test.
pval = st.ttest_ind(A,B)[1]
print("p-value: %.0f%%" % (pval*100))
p-value: 26%
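To build some intuition for what this number measures, here is a small permutation-test sketch (my addition, not part of the original article): it asks how often a random 5/5 split of the pooled measurements produces a mean difference at least as large as the observed 1.8. It won't match the t-test exactly, but it lands in the same "not significant" ballpark.

```python
from itertools import combinations

A = [9,7,4,6,3]
B = [2,3,3,8,4]
pool = A + B
obs = abs(sum(A)/5.0 - sum(B)/5.0)  # observed difference of means: 1.8

# enumerate all C(10,5) = 252 ways to split the pooled data 5/5
total = 0
extreme = 0
for idx in combinations(range(10), 5):
    g1 = [pool[i] for i in idx]
    g2 = [pool[i] for i in range(10) if i not in idx]
    total += 1
    if abs(sum(g1)/5.0 - sum(g2)/5.0) >= obs:
        extreme += 1
print("permutation p-value: %.1f%%" % (100.0 * extreme / total))
```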
A p-value of 26% means "not really significant". Usually we want p to be less than 5%. Let's take more samples!
pval = st.ttest_ind(A*3,B*3)[1]
print("p-value: %.2f%%" % (pval*100))
p-value: 3.25%
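A caveat worth flagging (my note, not the article's): `A*3` merely repeats the same five measurements three times, which is not the same as collecting fresh samples. The duplication alone drives the p-value down, as this sketch shows:

```python
import scipy.stats as st

A = [9,7,4,6,3]
B = [2,3,3,8,4]

# replicating identical data k times shrinks the standard error
# without adding any new information about the two programs
for k in (1, 2, 3, 4):
    pval = st.ttest_ind(A*k, B*k)[1]
    print("k=%d: p-value %.2f%%" % (k, pval*100))
```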
Hm. LibreOffice gave me 1%. Maybe I mixed up the t-test config?
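One plausible mixup (my guess, not verified against the spreadsheet): LibreOffice's TTEST function can be configured as one-tailed, whereas `scipy.stats.ttest_ind` reports a two-tailed p-value. Halving the two-tailed result gives roughly 1.6%, much closer to the 1% figure:

```python
import scipy.stats as st

A = [9,7,4,6,3]
B = [2,3,3,8,4]

p_two = st.ttest_ind(A*3, B*3)[1]  # scipy reports the two-tailed p-value
print("two-tailed: %.2f%%" % (p_two*100))
# for a symmetric t distribution, the one-tailed p is half the two-tailed p
print("one-tailed: %.2f%%" % (p_two/2*100))
```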