Suppose we have two groups, A and B, in our experiment, of approximately equal size $N$. The groups have true conversion rates $p_A, p_B$ respectively. We use a beta-binomial model to find the posterior of each rate, e.g. $p_A \sim Beta(\alpha=1 + c_A, \beta= 1 + N - c_A)$, where $c_A$ is the number of conversions observed in group A. For large $N$, this posterior is very well approximated by a Normal distribution:
$$p_A \sim Nor \left( \mu_A = \frac{\alpha}{\alpha+\beta},\; \sigma_A^2 = \frac{\frac{\alpha}{\alpha+\beta}\frac{\beta}{\alpha + \beta}}{\alpha+\beta+1} \right)$$

Ultimately, we are interested in when $Pr( p_A > p_B \;|\; c_A, c_B, N ) \ge 0.95$, or equivalently $Pr( p_A - p_B > 0 \;|\; c_A, c_B, N ) \ge 0.95$. As both $p_A$ and $p_B$ are approximately Normal, the difference $D = p_A - p_B$ is also Normal:

$$D \;|\; c_A, c_B, N \sim Nor\left( \mu = \mu_A - \mu_B,\; \sigma^2 = \sigma_A^2 + \sigma_B^2 \right)$$

Suppose $\mu_A > \mu_B$, so that $\mu > 0$.
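As a quick visual check, we can overlay the exact Beta posterior and its Normal approximation. A minimal sketch, with the sample size and conversion count below chosen purely for illustration:

import numpy as np
from scipy.stats import beta, norm
import matplotlib.pyplot as plt

N, c_A = 1000, 150  # illustrative values: 150 conversions out of 1000
a, b = 1 + c_A, 1 + N - c_A  # Beta posterior parameters
mu = a / (a + b)
sigma = np.sqrt(mu * (1 - mu) / (a + b + 1))

x = np.linspace(0.10, 0.20, 500)
plt.plot(x, beta.pdf(x, a, b), label="exact Beta posterior")
plt.plot(x, norm.pdf(x, loc=mu, scale=sigma), "--", label="Normal approximation")
plt.legend()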
We'd like to find $\sigma^2 = \sigma^2(N)$ (as $\sigma_A, \sigma_B$ are functions of the sample size $N$) such that $Pr(D > 0 \;|\; c_A, c_B, N ) \ge 0.95$, or equivalently $Pr(D < 0 \;|\; c_A, c_B, N ) \le 0.05$:
$$ Pr\left(\frac{D - \mu}{\sigma} < \frac{-\mu}{\sigma}\right) \le 0.05 $$

Inverting the normal CDF (using $\Phi^{-1}(0.05) \approx -1.65$):
$$\frac{-\mu}{\sigma} \le -1.65$$

$$\frac{\mu^2}{1.65^2} \ge \sigma^2 = \sigma_A^2 + \sigma_B^2 $$

For large $N$, $\sigma_A^2$ can be approximated by $\frac{\hat{p}_A(1-\hat{p}_A)}{N}$ (and likewise $\sigma_B^2$), so:
$$ N \ge \frac{1.65^2\left(\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)\right)}{\mu^2} $$

where $\mu = \hat{p}_A - \hat{p}_B$. Denote
$$ N^* = \frac{1.65^2\left(\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)\right)}{\mu^2} $$
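For example, with $\hat{p}_A = 0.15$ and $\hat{p}_B = 0.07$ (the rates used in the code below):

$$ N^* = \frac{1.65^2 \left(0.15 \cdot 0.85 + 0.07 \cdot 0.93\right)}{0.08^2} \approx 82 $$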
It can be shown empirically that, when using $N^*$ samples per group,

$$E\left[ 1_{\left\{ Pr( D > 0 \;|\; C_A, C_B, N^*) \ge 0.95 \right\}} \right] = 0.5$$

i.e. there is a 50% chance that a sample of size $N^*$ will achieve significance. This is a desirable property when we talk about expected sample size, but practically $N^*$ should be treated as a lower bound, and anything above it should be chosen.
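We can verify this claim with a small Monte Carlo simulation. A minimal sketch; the true rates are arbitrary illustrative values, and each simulated experiment judges significance from 5,000 posterior draws:

import numpy as np
from scipy.stats import beta

p_A, p_B = 0.15, 0.07  # assumed true rates, chosen for illustration
N_star = int(1.65**2 * (p_A*(1-p_A) + p_B*(1-p_B)) / (p_A - p_B)**2)

hits, trials = 0, 2000
for _ in range(trials):
    c_A = np.random.binomial(N_star, p_A)  # conversions in each group
    c_B = np.random.binomial(N_star, p_B)
    post_A = beta.rvs(1 + c_A, 1 + N_star - c_A, size=5000)
    post_B = beta.rvs(1 + c_B, 1 + N_star - c_B, size=5000)
    hits += (post_A - post_B > 0).mean() >= 0.95
print(hits / trials)  # hovers around 0.5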
Thus the "power" (defined as the probability of correctly rejecting insignificance) of the test is 50%. Can we tweak the above formula to add more power?
Of course increasing $N$ does this, but by how much? What is a good pre-determined power? Most practitioners choose 80%, so let's try that first.
Empirically, it seems that multiplying $N^*$ by 2.25 achieves 80% power.
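This agrees with classical power analysis, where the required sample size scales with $(z_{1-\alpha} + z_{\text{power}})^2$. Relative to the 50%-power case (where $z_{0.50} = 0$):

$$ \frac{(z_{0.95} + z_{0.80})^2}{z_{0.95}^2} = \frac{(1.645 + 0.842)^2}{1.645^2} \approx 2.29 $$

which is very close to the empirical 2.25.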
What about the case where $N_A \ne N_B$? We get as far as:
$$\frac{\mu^2}{1.65^2} \ge \frac{\hat{p}_A(1-\hat{p}_A)}{N_A}+ \frac{\hat{p}_B(1-\hat{p}_B)}{N_B} $$

$$\frac{\mu^2}{1.65^2} \ge \frac{ \frac{\hat{p}_A(1-\hat{p}_A)}{N_A}+ \frac{\hat{p}_B(1-\hat{p}_B)}{N_B} } {\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)} \left(\hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B)\right) $$

We can call the term
$$ \frac{ \hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B) } { \frac{\hat{p}_A(1-\hat{p}_A)}{N_A}+ \frac{\hat{p}_B(1-\hat{p}_B)}{N_B} } $$

the effective sample size. So if we assign $N_A = N$, $N_B = rN_A$, where $0 < r \le 1$ (so $r = 1$ in the equal case), then the effective sample size is:
$$ \frac{ \hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B) } { \frac{\hat{p}_A(1-\hat{p}_A)}{N}+ \frac{\hat{p}_B(1-\hat{p}_B)}{rN} } = N \frac{ \hat{p}_A(1-\hat{p}_A)+ \hat{p}_B(1-\hat{p}_B) } { \hat{p}_A(1-\hat{p}_A) + \frac{\hat{p}_B(1-\hat{p}_B)}{r} } $$

Notice that assuming $r \le 1$ loses no generality: if $r > 1$, we can relabel the groups ($N_B = N$, $N_A = rN_B$) so that $r < 1$ again. For every $r < 1$ the denominator above exceeds the numerator, so the effective sample size is strictly smaller than $N$.
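As a sanity check, consider the symmetric special case where both variance terms are equal, $\hat{p}_A(1-\hat{p}_A) = \hat{p}_B(1-\hat{p}_B) = v$. Then the effective sample size reduces to

$$ N \frac{2v}{v + v/r} = \frac{2r}{1+r} N \le N \quad \text{for } r \le 1 $$

so with $r = 0.5$ the effective sample size is only $\frac{2}{3}N$.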
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# N* formula and the normal-approximation variance of a conversion rate
needed = lambda p_A, p_B: 1.65**2 * (p_A*(1-p_A) + p_B*(1-p_B)) / (p_A - p_B)**2
Var = lambda p, N: p*(1-p) / N
from scipy.stats import norm

plt.figure(figsize=(12, 6))
x = np.linspace(-0.5, 0.5, 1000)
p_A, p_B = 0.15, 0.07
n = needed(p_A, p_B)
print(n)
mu = p_A - p_B
std = np.sqrt(Var(p_A, n) + Var(p_B, n))  # standard deviation of D = p_A - p_B
plt.plot(x, norm.pdf(x, loc=mu, scale=std))
print(norm.cdf(0, loc=mu, scale=std))  # Pr(D < 0); about 0.05 at N*, as designed
81.930234375
0.0494714680336
n = 4000  # number of simulated experiments
p_A, p_B = 0.11, 0.1
N = int(needed(p_A, p_B) * 2.25)
print(N)
11510
from scipy.stats import beta

s = 0
for _ in range(n):
    c_A = np.random.binomial(1, p_A, size=N)
    c_B = np.random.binomial(1, p_B, size=N)
    # Monte Carlo estimate of Pr(p_A > p_B | data); here p_A is the larger rate
    post_A = beta.rvs(1 + c_A.sum(), 1 + N - c_A.sum(), size=5000)
    post_B = beta.rvs(1 + c_B.sum(), 1 + N - c_B.sum(), size=5000)
    s += (post_A - post_B > 0).mean() > 0.95
print(s / n)
0.8095
for _ in range(10):
    v = np.random.random(size=2)
    p_A = v.min()
    p_B = v.max()  # here p_B is the larger rate
    N = int(needed(p_A, p_B) * 2.25)
    s = 0
    for _ in range(n):
        c_A = np.random.binomial(1, p_A, size=N)
        c_B = np.random.binomial(1, p_B, size=N)
        post_A = beta.rvs(1 + c_A.sum(), 1 + N - c_A.sum(), size=5000)
        post_B = beta.rvs(1 + c_B.sum(), 1 + N - c_B.sum(), size=5000)
        s += (post_B - post_A > 0).mean() > 0.95
    print("Power: %.2f" % (s / n), "p_A: %.2f, p_B %.2f" % (p_A, p_B), "N: %d" % N)
Power: 0.76 p_A: 0.10, p_B 0.44 N: 18
Power: 0.04 p_A: 0.16, p_B 0.92 N: 2
Power: 0.78 p_A: 0.27, p_B 0.52 N: 42
Power: 0.74 p_A: 0.42, p_B 0.78 N: 19
Power: 0.70 p_A: 0.18, p_B 0.58 N: 14
Power: 0.00 p_A: 0.02, p_B 0.72 N: 2
Power: 0.75 p_A: 0.18, p_B 0.55 N: 17
Power: 0.77 p_A: 0.22, p_B 0.47 N: 41
Power: 0.72 p_A: 0.37, p_B 0.74 N: 19
Power: 0.79 p_A: 0.89, p_B 0.97 N: 135

The power sits near, but somewhat below, 0.80; the worst cases are the very large rate differences, where $N^*$ is tiny and the Normal approximation behind the formula breaks down.
# effective sample size: with N_B = r*N_A, inflate N_A so the effective
# sample size still meets the requirement derived above
r = 0.5
p_A, p_B = 0.11, 0.1  # reuse the earlier example rates
v_A, v_B = p_A*(1-p_A), p_B*(1-p_B)
N = int(needed(p_A, p_B) * 2.25 * (v_A + v_B/r) / (v_A + v_B))