This notebook accompanies my blog article Benchmarking Statistics. It provides the data so you can play around with it yourself.
First, we need the data from the article:
A = [9,7,4,6,3]
B = [2,3,3,8,4]
So as a first step, we compute the averages.
import scipy.stats as st
def avg(lst):
    return float(sum(lst)) / len(lst)
print("A:", avg(A))
print("B:", avg(B))
print("Speedup: %.2f" % ((avg(A) / avg(B))))
A: 5.8
B: 4.0
Speedup: 1.45
We see $A>B$ on average, so B is 45% faster? Not so fast. Let's look at the standard deviation first.
# sample standard deviation (ddof=1); scipy.stats.nanstd has been
# removed from modern SciPy releases, so we use numpy directly
import numpy as np
def stddev(lst):
    return np.std(lst, ddof=1)
print("A: %.2f" % stddev(A))
print("B: %.2f" % stddev(B))
print(avg(A)-stddev(A), "<", avg(B)+stddev(B), " (overlap!)")
A: 2.39 B: 2.35 3.41253272274 < 6.34520787991 (overlap!)
OK, now for real. Let's do Student's t-test.
pval = st.ttest_ind(A,B)[1]
print("p-value: %.0f%%" % (pval*100))
p-value: 26%
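To build some intuition for what this number measures, here is a small permutation-test sketch (my addition, not part of the original article): it asks how often a random 5/5 split of the pooled measurements produces a mean difference at least as large as the observed 1.8. It won't match the t-test exactly, but it lands in the same "not significant" ballpark.

```python
from itertools import combinations

A = [9,7,4,6,3]
B = [2,3,3,8,4]
pool = A + B
obs = abs(sum(A)/5.0 - sum(B)/5.0)  # observed difference of means: 1.8

# enumerate all C(10,5) = 252 ways to split the pooled data 5/5
total = 0
extreme = 0
for idx in combinations(range(10), 5):
    g1 = [pool[i] for i in idx]
    g2 = [pool[i] for i in range(10) if i not in idx]
    total += 1
    if abs(sum(g1)/5.0 - sum(g2)/5.0) >= obs:
        extreme += 1
print("permutation p-value: %.1f%%" % (100.0 * extreme / total))
```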
A p-value of 26% means "not really significant". Usually we want p to be less than 5%. Let's take more samples!
pval = st.ttest_ind(A*3,B*3)[1]
print("p-value: %.2f%%" % (pval*100))
p-value: 3.25%
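A caveat worth flagging (my note, not the article's): `A*3` merely repeats the same five measurements three times, which is not the same as collecting fresh samples. The duplication alone drives the p-value down, as this sketch shows:

```python
import scipy.stats as st

A = [9,7,4,6,3]
B = [2,3,3,8,4]

# replicating identical data k times shrinks the standard error
# without adding any new information about the two programs
for k in (1, 2, 3, 4):
    pval = st.ttest_ind(A*k, B*k)[1]
    print("k=%d: p-value %.2f%%" % (k, pval*100))
```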
Hm. LibreOffice gave me 1%. Maybe I mixed up the t-test config?
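One plausible mixup (my guess, not verified against the spreadsheet): LibreOffice's TTEST function can be configured as one-tailed, whereas `scipy.stats.ttest_ind` reports a two-tailed p-value. Halving the two-tailed result gives roughly 1.6%, much closer to the 1% figure:

```python
import scipy.stats as st

A = [9,7,4,6,3]
B = [2,3,3,8,4]

p_two = st.ttest_ind(A*3, B*3)[1]  # scipy reports the two-tailed p-value
print("two-tailed: %.2f%%" % (p_two*100))
# for a symmetric t distribution, the one-tailed p is half the two-tailed p
print("one-tailed: %.2f%%" % (p_two/2*100))
```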