Suppose we have two groups, A and B, in our experiment, of approximately equal size $N$. The groups have true conversion rates $p_A, p_B$ respectively. We use a beta-binomial model to find the posterior of each rate, e.g. $p_A \sim Beta(\alpha=1 + c_A, \beta= 1 + N - c_A)$, where $c_A$ is the number of conversions observed in group A. For large $N$, this posterior is very well approximated by a Normal distribution:
$$p_A \sim Nor \left( \mu_A = \frac{\alpha}{\alpha+\beta},\; \sigma_A^2 = \frac{\frac{\alpha}{\alpha+\beta}\frac{\beta}{\alpha + \beta}}{\alpha+\beta+1} \right)$$

Ultimately, we are interested in when $Pr( p_A > p_B \;|\; c_A, c_B, N ) \ge 0.95$, or equivalently $Pr( p_A - p_B > 0 \;|\; c_A, c_B, N ) \ge 0.95$. As both $p_A$ and $p_B$ are approximately Normal, the difference $D = p_A - p_B$ is also Normal:

$$D \;|\; c_A, c_B, N \sim Nor\left( \mu = \mu_A - \mu_B,\; \sigma^2 = \sigma_A^2 + \sigma_B^2 \right)$$

Suppose $\mu_A > \mu_B$, so that $\mu > 0$.
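As a quick visual check, we can overlay the exact Beta posterior and its Normal approximation. A minimal sketch, with the sample size and conversion count below chosen purely for illustration:

import numpy as np
from scipy.stats import beta, norm
import matplotlib.pyplot as plt

N, c_A = 1000, 150  # illustrative values: 150 conversions out of 1000
a, b = 1 + c_A, 1 + N - c_A  # Beta posterior parameters
mu = a / (a + b)
sigma = np.sqrt(mu * (1 - mu) / (a + b + 1))

x = np.linspace(0.10, 0.20, 500)
plt.plot(x, beta.pdf(x, a, b), label="exact Beta posterior")
plt.plot(x, norm.pdf(x, loc=mu, scale=sigma), "--", label="Normal approximation")
plt.legend()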
We'd like to find $\sigma^2 = \sigma^2(N)$ (as $\sigma_A, \sigma_B$ are functions of the sample size $N$) such that $Pr(D > 0 \;|\; c_A, c_B, N ) \ge 0.95$, or equivalently $Pr(D < 0 \;|\; c_A, c_B, N ) \le 0.05$:
$$ Pr\left(\frac{D - \mu}{\sigma} < \frac{-\mu}{\sigma}\right) \le 0.05 $$

Inverting the normal CDF (using $\Phi^{-1}(0.05) \approx -1.65$):
$$\frac{-\mu}{\sigma} \le -1.65$$

$$\frac{\mu^2}{1.65^2} \ge \sigma^2 = \sigma_A^2 + \sigma_B^2 $$

For large $N$, $\sigma_A^2$ can be approximated by $\frac{\hat{p}_A(1-\hat{p}_A)}{N}$ (and likewise $\sigma_B^2$), so:
$$ N \ge \frac{1.65^2\left(\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)\right)}{\mu^2} $$

where $\mu = \hat{p}_A - \hat{p}_B$. Denote
$$ N^* = \frac{1.65^2\left(\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)\right)}{\mu^2} $$
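For example, with $\hat{p}_A = 0.15$ and $\hat{p}_B = 0.07$ (the rates used in the code below):

$$ N^* = \frac{1.65^2 \left(0.15 \cdot 0.85 + 0.07 \cdot 0.93\right)}{0.08^2} \approx 82 $$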
It can be shown empirically that, when using $N^*$ samples per group,

$$E\left[ 1_{\left\{ Pr( D > 0 \;|\; C_A, C_B, N^*) \ge 0.95 \right\}} \right] = 0.5$$

i.e. there is a 50% chance that a sample of size $N^*$ will achieve significance. This is a desirable property when we talk about expected sample size, but practically $N^*$ should be treated as a lower bound, and anything above it should be chosen.
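We can verify this claim with a small Monte Carlo simulation. A minimal sketch; the true rates are arbitrary illustrative values, and each simulated experiment judges significance from 5,000 posterior draws:

import numpy as np
from scipy.stats import beta

p_A, p_B = 0.15, 0.07  # assumed true rates, chosen for illustration
N_star = int(1.65**2 * (p_A*(1-p_A) + p_B*(1-p_B)) / (p_A - p_B)**2)

hits, trials = 0, 2000
for _ in range(trials):
    c_A = np.random.binomial(N_star, p_A)  # conversions in each group
    c_B = np.random.binomial(N_star, p_B)
    post_A = beta.rvs(1 + c_A, 1 + N_star - c_A, size=5000)
    post_B = beta.rvs(1 + c_B, 1 + N_star - c_B, size=5000)
    hits += (post_A - post_B > 0).mean() >= 0.95
print(hits / trials)  # hovers around 0.5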
Thus the "power" (defined as the probability of correctly rejecting insignificance) of the test is 50%. Can we tweak the above formula to add more power?
Of course increasing $N$ does this, but by how much? What is a good pre-determined power? Most practitioners choose 80%, so let's try that first.
Empirically, it seems that multiplying $N^*$ by 2.25 achieves 80% power.
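This agrees with classical power analysis, where the required sample size scales with $(z_{1-\alpha} + z_{\text{power}})^2$. Relative to the 50%-power case (where $z_{0.50} = 0$):

$$ \frac{(z_{0.95} + z_{0.80})^2}{z_{0.95}^2} = \frac{(1.645 + 0.842)^2}{1.645^2} \approx 2.29 $$

which is very close to the empirical 2.25.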
What about the case where $N_A \ne N_B$? We get as far as:
$$\frac{\mu^2}{1.65^2} \ge \frac{\hat{p}_A(1-\hat{p}_A)}{N_A}+ \frac{\hat{p}_B(1-\hat{p}_B)}{N_B} $$

$$\frac{\mu^2}{1.65^2} \ge \frac{ \frac{\hat{p}_A(1-\hat{p}_A)}{N_A}+ \frac{\hat{p}_B(1-\hat{p}_B)}{N_B} } {\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)} \left(\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)\right) $$

We can call the term
$$ \frac{ \hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B) } { \frac{\hat{p}_A(1-\hat{p}_A)}{N_A}+ \frac{\hat{p}_B(1-\hat{p}_B)}{N_B} } $$

the effective sample size. So if we assign $N_A = N$, $N_B = rN_A$, where $0 < r \le 1$ (so $r = 1$ in the equal case), then the effective sample size is:
$$ \frac{ \hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B) } { \frac{\hat{p}_A(1-\hat{p}_A)}{N}+ \frac{\hat{p}_B(1-\hat{p}_B)}{rN} } = N \frac{ \hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B) } { \hat{p}_A(1-\hat{p}_A) + \frac{\hat{p}_B(1-\hat{p}_B)}{r} } $$

Notice that assuming $r \le 1$ loses no generality: if $r > 1$, we can relabel the groups ($N_B = N$, $N_A = rN_B$) so that $r < 1$ again. For every $r < 1$ the denominator above exceeds the numerator, so the effective sample size is strictly smaller than $N$.
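As a sanity check, consider the symmetric special case where both variance terms are equal, $\hat{p}_A(1-\hat{p}_A) = \hat{p}_B(1-\hat{p}_B) = v$. Then the effective sample size reduces to

$$ N \frac{2v}{v + v/r} = \frac{2r}{1+r} N \le N \quad \text{for } r \le 1 $$

so with $r = 0.5$ the effective sample size is only $\frac{2}{3}N$.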
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# N* formula and the normal-approximation variance of a conversion rate
needed = lambda p_A, p_B: 1.65**2 * (p_A*(1-p_A) + p_B*(1-p_B)) / (p_A - p_B)**2
Var = lambda p, N: p*(1-p) / N
from scipy.stats import norm

plt.figure(figsize=(12, 6))
x = np.linspace(-0.5, 0.5, 1000)
p_A, p_B = 0.15, 0.07
n = needed(p_A, p_B)
print(n)
mu = p_A - p_B
std = np.sqrt(Var(p_A, n) + Var(p_B, n))  # standard deviation of D = p_A - p_B
plt.plot(x, norm.pdf(x, loc=mu, scale=std))
print(norm.cdf(0, loc=mu, scale=std))  # Pr(D < 0); about 0.05 at N*, as designed
81.930234375
0.0494714680336
n = 4000  # number of simulated experiments
p_A, p_B = 0.11, 0.1
N = int(needed(p_A, p_B) * 2.25)
print(N)
11510
from scipy.stats import beta

s = 0
for _ in range(n):
    c_A = np.random.binomial(1, p_A, size=N)
    c_B = np.random.binomial(1, p_B, size=N)
    # Monte Carlo estimate of Pr(p_A > p_B | data); here p_A is the larger rate
    post_A = beta.rvs(1 + c_A.sum(), 1 + N - c_A.sum(), size=5000)
    post_B = beta.rvs(1 + c_B.sum(), 1 + N - c_B.sum(), size=5000)
    s += (post_A - post_B > 0).mean() > 0.95
print(s / n)
0.8095
for _ in range(10):
    v = np.random.random(size=2)
    p_A = v.min()
    p_B = v.max()  # here p_B is the larger rate
    N = int(needed(p_A, p_B) * 2.25)
    s = 0
    for _ in range(n):
        c_A = np.random.binomial(1, p_A, size=N)
        c_B = np.random.binomial(1, p_B, size=N)
        post_A = beta.rvs(1 + c_A.sum(), 1 + N - c_A.sum(), size=5000)
        post_B = beta.rvs(1 + c_B.sum(), 1 + N - c_B.sum(), size=5000)
        s += (post_B - post_A > 0).mean() > 0.95
    print("Power: %.2f" % (s / n), "p_A: %.2f, p_B %.2f" % (p_A, p_B), "N: %d" % N)
Power: 0.76 p_A: 0.10, p_B 0.44 N: 18
Power: 0.04 p_A: 0.16, p_B 0.92 N: 2
Power: 0.78 p_A: 0.27, p_B 0.52 N: 42
Power: 0.74 p_A: 0.42, p_B 0.78 N: 19
Power: 0.70 p_A: 0.18, p_B 0.58 N: 14
Power: 0.00 p_A: 0.02, p_B 0.72 N: 2
Power: 0.75 p_A: 0.18, p_B 0.55 N: 17
Power: 0.77 p_A: 0.22, p_B 0.47 N: 41
Power: 0.72 p_A: 0.37, p_B 0.74 N: 19
Power: 0.79 p_A: 0.89, p_B 0.97 N: 135

The power sits near, but somewhat below, 0.80; the worst cases are the very large rate differences, where $N^*$ is tiny and the Normal approximation behind the formula breaks down.
# effective sample size: with N_B = r*N_A, inflate N_A so the effective
# sample size still meets the requirement derived above
r = 0.5
p_A, p_B = 0.11, 0.1  # reuse the earlier example rates
v_A, v_B = p_A*(1-p_A), p_B*(1-p_B)
N = int(needed(p_A, p_B) * 2.25 * (v_A + v_B/r) / (v_A + v_B))