Supporting material for
an article published in Theoretical Population Biology.
This IPython notebook is provided for reproduction of Figure 2 of the article. It can be viewed by copying its URL to nbviewer and it can be run by opening it in binder. The notebook also contains functions for deterministic and stochastic calculation of different quantities in FGM.
We start by loading required modules and extensions:
# This will probably produce a warning. That's OK.
%pylab inline
Populating the interactive namespace from numpy and matplotlib
WARNING: pylab import has clobbered these variables: ['draw_if_interactive', 'new_figure_manager'] `%matplotlib` prevents importing * from pylab and numpy
import scipy.stats as st
import scipy.special as sp
from scipy.integrate import quad as integrate
rcParams['figure.figsize'] = 8,6
# http://nbviewer.ipython.org/github/rossant/ipycache/blob/master/examples/example.ipynb
!pip install ipycache
Requirement already satisfied (use --upgrade to upgrade): ipycache in d:\workspace\fgmprob\venv\lib\site-packages
%load_ext ipycache
The ipycache extension is already loaded. To reload it, use: %reload_ext ipycache
Define model parameters:
d = 1.0
r = 0.1
n = 10
A = zeros(n)
A[0] = d/2
assert allclose(norm(A), d/2)
reps = 1000
draw
is used to draw a random mutation in Fisher's geometric model, i.e., a random vector of magnitude r
.
def draw(d,r,n):
Z = normal(0,1,n)
X = r * Z / norm(Z)
assert allclose(r, norm(X))
return X
X = draw(d,r,n)
print X[0]/r < -r/d
print norm(X + A) < norm(A)
False False
Xs = array([draw(d,r,n) for _ in range(reps)])
assert [X[0]/r < -r/d for X in Xs] == [norm(X + A) < norm(A) for X in Xs]
We with the basic result, first presented by Fisher, of the probability of improvement in FGM.
We estimate this probability by drawing many mutations and calculating the fraction of mutations that are beneficial.
def simulate_pben(d,r,n):
A = zeros(n)
A[0] = d/2
assert allclose(norm(A), d/2)
Xs = array([draw(d,r,n) for _ in range(reps)])
ps = [(norm(X + A) < norm(A)) for X in Xs]
mean_p = mean(ps)
std_p = std(ps, ddof=1)/sqrt(float(reps))
return mean_p, std_p
simulate_pben(d,r,n)
(0.38600000000000001, 0.015402637476784374)
This is $p_n^*$ from our article, first shown in (Kimura 1983) (see main text for references):
def kimura_pben(d,r,n): # Kimura
return 0.5 * sp.betainc((n-1)/2., 0.5, 1-r**2/d**2)
kimura_pben(d,r,n)
0.3849374999460683
This is Fisher's original result from (Fisher 1930):
def fisher_pben(d,r,n):
return 1-st.norm.cdf(sqrt(n)*r/d)
fisher_pben(d,r,n)
0.37591481702292462
This is $p^{\#}_{n}$, from (Hartl & Taubes 1996; Rice 1990):
def hartl_pben(d,r,n): # Hartl & Taubes 1996, eq. 1
numer = integrate(lambda x: sin(x)**(n-2), 0, arccos(r/d))[0]
denom = integrate(lambda x: sin(x)**(n-2), 0, pi)[0]
return numer/denom
hartl_pben(d,r,n)
0.38493749994606824
We now do the calculations and plot a figure, which is not used in the main text:
%%cache pben_emp.pkl nrange pben_emp
nrange = range(2,102,5)
pben_emp = array([simulate_pben(d,r,n) for n in nrange])
[Skipped the cell's code and loaded variables nrange, pben_emp from file 'D:\workspace\FGMProb\pben_emp.pkl'.]
pben_kimura = [kimura_pben(d,r,n) for n in nrange]
pben_fisher = [fisher_pben(d,r,n) for n in nrange]
pben_hartl = [hartl_pben(d,r,n) for n in nrange]
n = 10
errorbar(nrange, pben_emp[:,0], yerr=pben_emp[:,1], fmt='.k', capsize=10, label='Empricial')
plot(nrange, pben_kimura, '-k', lw=2, label='Theoretical')
plot(nrange, pben_fisher, '--r', label='Fisher')
plot(nrange, pben_hartl, '-b', label='Hartl')
xlim(-5,105)
xticks([1]+range(20,101,20))
yticks(arange(0.15,0.5,0.1))
xlabel("$n$", fontsize=16)
ylabel(r"$P(|A+X|<|A|)$", fontsize=16)
title("Probability of a beneficial mutation")
legend(fontsize=16);
Next, we evaluate the expected improvement towards the optimum in FGM. Functions denoted by TPB
refer to equations derived in the main text.
def simulate_X1(d,r,n):
A = zeros(n)
A[0] = d/2
assert allclose(norm(A), d/2)
p = kimura_pben(d,r,n)
Xs = array([draw(d,r,n) for _ in range(int(reps/p))])
X1 = [abs(X[0]) for X in Xs if (norm(A + X) < norm(A))]
mean_X1 = mean(X1)
std_X1 = std(X1, ddof=1)/sqrt(len(X1))
return mean_X1, std_X1
simulate_X1(d,r,n)
(0.033368079181765507, 0.00050703916553601364)
def TPB_X1(d,r,n):
r2d2 = (r**2) / (d**2)
n_min_1_by_2 = 0.5 * (n-1)
numer = 2 * r * (1 - r2d2)**n_min_1_by_2
denom = (n - 1) * sp.betainc(n_min_1_by_2, 0.5, 1 - r2d2) * sp.beta(n_min_1_by_2, 0.5)
return numer/denom
TPB_X1(d,r,n)
0.032115722558925502
%%cache X1_emp.pkl X1_emp
X1_emp = array([simulate_X1(d,r,n) for n in nrange])
[Skipped the cell's code and loaded variables X1_emp from file 'D:\workspace\FGMProb\X1_emp.pkl'.]
X1_TPB = array([TPB_X1(d,r,n) for n in nrange])
n = 10
errorbar(nrange, X1_emp[:,0], yerr=X1_emp[:,1], fmt='.k', capsize=10, label='empricial')
plot(nrange, X1_TPB, '--k', lw=2, label='Theoretical')
xlim(-5,105)
xticks([1]+range(20,101,20))
xlabel("$n$", fontsize=16)
ylabel(r"$E[|X_1| | |A+X|<|A|]$", fontsize=16)
title("Expected effect towards optimum in beneficial mutations")
legend(fontsize=16);
The following quantity does not appear at all in the meain text, but rather in (Hartl and Taubes 1996). Here $\Delta$ is the change in phenotypic distance to the optimal phenotype.
d,r,n = 1., 0.1, 10
def simulate_deltad(d,r,n):
A = zeros(n)
A[0] = d/2
assert allclose(norm(A), d/2)
p = kimura_pben(d,r,n)
Xs = array([draw(d,r,n) for _ in range(int(reps/p))])
deltad = [abs(norm(A+X) - norm(A)) for X in Xs if (norm(A + X) < norm(A))]
mean_deltad = mean(deltad)
std_deltad = std(deltad, ddof=1)/sqrt(len(deltad))
return mean_deltad, std_deltad
simulate_deltad(d,r,n)
(0.022234901179445685, 0.00052840418560986993)
def hartl_deltad(d,r,n): # Hartl & Taubes 1996, eq. 3
delta = lambda theta: d/2 - sqrt(0.25 * d**2 + r**2 - r * d * cos(theta))
numer = integrate(lambda x: delta(x) * sin(x)**(n-2), 0, arccos(r/d))[0]
denom = integrate(lambda x: sin(x)**(n-2), 0, arccos(r/d))[0]
return numer/denom
hartl_deltad(d,r,n)
0.0229329880606476
%%cache deltad_emp.pkl deltad_emp
deltad_emp = array([simulate_deltad(d,r,n) for n in nrange])
[Skipped the cell's code and loaded variables deltad_emp from file 'D:\workspace\FGMProb\deltad_emp.pkl'.]
deltad_hartl = array([hartl_deltad(d,r,n) for n in nrange])
n = 10
errorbar(nrange, deltad_emp[:,0], yerr=deltad_emp[:,1], fmt='.k', capsize=10, label='empricial')
plot(nrange, deltad_hartl, '-b', label='Hartl')
xlim(-5,105)
#ylim(0.1, 0.5)
xticks([1]+range(20,101,20))
#yticks(arange(0.15,0.5,0.1))
xlabel("$n$", fontsize=16)
ylabel(r"$E[|A+X|-|A| | |A+X|<|A|]$", fontsize=16)
title("Expected change in distance in beneficial mutations")
legend(fontsize=16);
Here we analyze the expected change in fitness in beneficial mutations. Fitness is defined by: $$\omega(A) = 1 - ||A||^2$$.
def fitness(A):
return 1 - norm(A)**2
assert fitness([d/2,0,0]) == 1 - d**2/4
assert fitness(zeros(n)) == 1
d,r,n = 1., 0.1, 10
def simulate_s(d,r,n):
A = zeros(n)
A[0] = d/2
assert allclose(norm(A), d/2)
fitness_A = fitness(A)
p = kimura_pben(d,r,n)
Xs = array([draw(d,r,n) for _ in range(int(reps/p))])
s = array([fitness(X + A)/fitness_A - 1 for X in Xs])
s = s[s > 0]
mean_s = mean(s)
std_s = std(s, ddof=1)/sqrt(len(s))
return mean_s, std_s
simulate_s(d,r,n)
(0.030619426578933063, 0.00070996698649594962)
This is equation (10) from the main text:
def TPB_s(d,r,n):
r2d2 = (r**2) / (d**2)
n1by2 = 0.5 * (n-1)
s = 2 * d * r * (1 - r2d2)**n1by2
s /= (n - 1) * (1 - 0.25 * d**2) * sp.betainc(n1by2, 0.5, 1 - r2d2) * sp.beta(n1by2, 0.5)
s -= r**2 / (1 - 0.25 * d**2)
return s
TPB_s(d,r,n)
0.029487630078567334
The same value, calculated from $\Delta$ (Hartl & Taubes 1996):
def hartl_s(d,r,n):
deltad = hartl_deltad(d,r,n)
s = fitness(d/2 - deltad)/fitness(d/2) - 1
return s
hartl_s(d,r,n)
0.029876088159010328
We plot the results, once for $n$, number of traits, and once for $r$, the mutation size ($d$ is fixed at 1).
%%cache s_emp.pkl reps s_emp
reps = 500
s_emp = array([simulate_s(d,r,n) for n in nrange])
[Skipped the cell's code and loaded variables reps, s_emp from file 'D:\workspace\FGMProb\s_emp.pkl'.]
s_TPB= array([TPB_s(d,r,n) for n in nrange])
s_hartl = array([hartl_s(d,r,n) for n in nrange])
n = 10
errorbar(nrange, s_emp[:,0], yerr=s_emp[:,1], fmt='ok', capsize=10, label='Empricial')
plot(nrange, s_TPB, '--k', lw=2, label='Theoretical')
plot(nrange, s_hartl, '-b', label='Hartl')
xlim(-5,105)
ylim(0, 0.09)
xticks([1]+range(20,101,20))
yticks(arange(0.02,0.1,0.02))
xlabel("$n$", fontsize=16)
ylabel(r"$E[s | s>0]$", fontsize=16)
title("Expected selection coefficient in beneficial mutations")
legend(fontsize=16);
n = 10
rrange = linspace(0.001,0.5,25)
s_TPB_r = array([TPB_s(d,r,n) for r in rrange])
%%cache s_emp_r.pkl reps s_emp_r
reps = 500
s_emp_r = array([simulate_s(d,r,n) for r in rrange])
[Skipped the cell's code and loaded variables reps, s_emp_r from file 'D:\workspace\FGMProb\s_emp_r.pkl'.]
errorbar(rrange, s_emp_r[:,0], yerr=s_emp_r[:,1], fmt='ok', capsize=10, label='Empricial')
plot(rrange, s_TPB_r, '--k', lw=2, label='Theoretical')
xlim(-0.01,0.51)
ylim(0, 0.09)
#xticks([1]+range(20,101,20))
yticks(arange(0.02,0.1,0.02))
xlabel("$r$", fontsize=16)
ylabel(r"$E[s | s>0]$", fontsize=16)
title("Expected selection coefficient in beneficial mutations")
legend(loc=2,fontsize=16);
The main result of this notebook, the probability that a mutation (a) is beneficial, and (b) avoids extinction.
n = 10
d = 1.
r = 0.1
This is a Wright-Fisher simulation. A single mutation appears in the population, and if it is beneficial, the evolutionary process proceeds until the mutation either reaches majority (f > 0.5
) or goes to extinction (f < 1/N
). $N$ is the variance effective population size. The simulation includes both selection:
$$
f' = f / \bar{\omega} \\\\
\bar{\omega} = f + (1-f)(1-s),
$$
and drift:
$$
n \sim Bin(N,f) \\\\
f'' = n/N.
$$
def simulation(d,r,n,N = 1e6):
N = float(N)
A = zeros(n)
A[0] = d/2
X = draw(d,r,n)
s = fitness(A + X)/fitness(A) - 1
if s < 0:
return False
f = 1/N
while True:
assert 0 <= f <= 1
if f < 1/N: return False
elif f > 0.5: return True
f = f
f /= 1 - s * (1 - f)
f = binomial(N,f)/N
simulation(d,r,n)
False
This function estimates the fixation probability by running many simulations:
def simulate_pfix(d,r,n,reps=1000):
fixes = array([simulation(d,r,n) for _ in range(reps)])
return mean(fixes), std(fixes) / sqrt(reps)
simulate_pfix(d,r,n)
(0.031, 0.0054807846153630381)
This is equation (6) from the main text:
def TPB_pfix(d,r,n):
s = TPB_s(d,r,n)
p = kimura_pben(d,r,n)
return 2*s*p
TPB_pfix(d,r,n)
0.02270178920355639
rrange = linspace(0.001,0.5,25)
pfix_TPB = array([TPB_pfix(d,r,n) for r in rrange])
%%cache pfix_emp.pkl pfix_emp reps
reps = 100000
pfix_emp = array([simulate_pfix(d,r,n,reps=reps) for r in rrange])
[Saved variables pfix_emp, reps to file 'D:\workspace\FGMProb\pfix_emp.pkl'.]
n = 10
r = 0.1
Here we plot Figure 2, the fixation probability as a function of $r/d$:
fig, ax = subplots(1, 1, figsize=(10,8))
ax.errorbar(rrange, pfix_emp[:,0], yerr=pfix_emp[:,1], fmt='.k', capsize=5, label='Empricial')
ax.plot(rrange, pfix_TPB, '-k', lw=2, label='Theoretical')
ax.set_xlabel("Relative mutation size - $r/d$", fontsize=24)
ax.set_ylabel(r"Fixation probability - $p_{fix}$", fontsize=24)
ax.set_xlim(-0.01, 0.51)
ax.set_xticks(arange(0, 0.51, 0.1))
ax.set_ylim(-0.001, 0.0311)
ax.set_yticks(arange(0, 0.031, 0.01))
ax.set_title("Fixation probability of beneficial mutations")
fig.tight_layout();
ax.set_title('')
ax.legend().set_visible(False)
fig.savefig("fig2.png")
fig.savefig("fig2.pdf", dpi=300, papertype='a4')