from math import *
from matplotlib.pylab import *            # pylab-style namespace: plot, rcParams, linspace, etc.
from IPython.core.pylabtools import figsize
import matplotlib
import matplotlib.pyplot as plt
assert matplotlib.__version__ == '1.3.0'  # this notebook was written against matplotlib 1.3.0
xkcd(scale=0.5)  # get the Humor Sans font from https://github.com/shreyankg/xkcd-desktop/blob/master/Humor-Sans.ttf, install, then delete ~/.matplotlib/fontList.cache
#from mpltools import style
#style.use('jfm')
rcParams['font.size'] = 22
rcParams['figure.figsize'] = (8,8)
Let's start with a simple case. There are two traits (dimensions) to an individual, $x$ and $y$. The optimal trait combination is $(0,0)$ and the distance of the current trait combination from the optimum is $d/2$ - for example, the current trait combination can be $(d/2,0)$.
A mutation occurs, changing the trait combination $(x,y)$ to some $(x',y')$, such that the distance between the previous and the new combinations is $r$. If the direction of the mutation is random, but its size ($r$) is positive and non-random, what are the chances that the new trait combination is better than the previous one?
To demonstrate this, look at the following diagram. $A=(d/2,0)$ is the current trait combination. The blue circle marks all the combinations that are $r$ away from $A$. The red circle marks all the combinations that are exactly as good as $A$ - they are all $d/2$ away from the optimum (at $O$). A beneficial mutation is one that lands on the part of the blue circle that lies inside the red circle. The probability $p$ that a mutation is beneficial is therefore the ratio between the part of the blue circle that lies inside the red circle and the whole blue circle.
Next I will draw the diagram - the specific choices of $d/2$ and $r$ are just for the sake of making a clear diagram.
d = 1.
r = 0.8 * d/2
x_high = d/2 - r**2/d                 # x-coordinate of the two intersection points of the circles
x_low = d/2 - r                       # leftmost point of the blue circle
y_low = -sqrt(r**2 * (1 - (r/d)**2))  # y-coordinates of the intersection points
y_high = -y_low
xlim((-0.7,1.2))
ylim((-0.95,0.95))
xlabel('trait 1')
ylabel('trait 2')
axhline(y=0, ls='--', color='gray')
axvline(x=0, ls='--', color='gray')
gca().add_artist(Circle((0,0), d/2, color='r', fill=False))
gca().add_artist(Circle((d/2,0), r, color='b', fill=False))
gca().add_artist(Polygon([[d/2,0],[x_high,y_low],[x_high,y_high]], color="green", fill=False))
gca().add_artist(Polygon([[-d/2, -d/2-r/2],[d/2, -d/2-r/2]], color="gray", fill=False))
gca().add_artist(Polygon([[d/2-r, -d/2-r/4],[d/2, -d/2-r/4]], color="gray", fill=False))
scatter([0, d/2, x_high, x_high, x_high],[0, 0, 0, y_high, -y_high])
text(-.07, .02, 'O')
text(d/2 + .02, 0.02, 'A')
text(x_high, y_high + .05, 'B')
text(x_high, y_low - .1, 'D')
text(x_high - .07, .02, 'C')
text(d/2-.075, .02, r'$\theta$')
text(d/4, -d/2 - r/2 + .12, r'$r$')
text(d/4, -d/2 - r/2 - .09, r'$d$')
text(d/2 + r/2, d/2 + .25, r'$AC=\frac{r^2}{d}$')
text(d/2 + r/2, d/2, r'$AB=r$')
text(d/2 + r/2, -d/2, r'$\cos{\theta}=\frac{AC}{AB}$')
text(d/2 + r/2, -d/2 - .25, r'$p=\frac{\theta}{\pi}$');
Some facts about the diagram: 1. $C=(d/2-\frac{r^2}{d},0)$ is the projection of the intersection points $B$ and $D$ onto the horizontal axis, and therefore $AC=\frac{r^2}{d}$. 2. $AB$ is the radius of the blue circle, $r$. 3. $\theta$ is the angle between $AC$ and $AB$. 4. $\cos{\theta}=\frac{AC}{AB}=\frac{r^2/d}{r}=\frac{r}{d}$. 5. $p$ is the ratio between the part of the blue circle that lies inside the red circle - the arc of angle $2\theta$ - and the whole circle - $2\pi$.
The final formula for $p$ is (see also Rice 1990): $$ p = \frac{1}{\pi} \arccos{\frac{r}{d}} $$
ram_2D_p = lambda r,d: acos(r/d)/pi
d = 1.0
rs = linspace(0, d/2, 100)
ps_2D = [ram_2D_p(r,d) for r in rs]
plot(rs, ps_2D)
xlabel('mutational radius')
ylabel('probability of beneficial');
When going from the two-dimensional to the three-dimensional case we have two spheres: a red one centered at $(0,0,0)$ with radius $d/2$, and a blue one centered at $(d/2,0,0)$ with radius $r$. Their intersection defines a plane which cuts the blue sphere to create a spherical cap. The area of this spherical cap has a very simple formula - $2\pi r (r-a)$, where $a$ is the distance from the center of the blue sphere to the intersection plane. This distance is $a = d/2-x$, where $x = d/2-\frac{r^2}{d}$ is the position of the intersection found above, so $a$ is equal to $\frac{r^2}{d}$.
Because the entire surface area of the blue sphere is $4\pi r^2$, the ratio between the area of the spherical cap and the whole sphere, and therefore the probability that a mutation of size $r$ is beneficial, is: $$ p = \frac{2\pi r (r-\frac{r^2}{d})}{4\pi r^2} = \frac{1}{2}(1-\frac{r}{d}) $$ which is exactly the expression Fisher gives in his book: $\frac{1}{2}(1-\frac{r}{d})$!
fisher_3D_p = lambda r,d: (1-r/d)/2.0
ps_3D = [fisher_3D_p(r,d) for r in rs]
plot(rs, ps_2D)
plot(rs, ps_3D)
legend(['2D','3D'], loc=3)
xlabel('mutational radius')
ylabel('probability of beneficial');
In the case of $n$ dimensions we are looking at two $(n-1)$-dimensional hyperspheres (the surfaces of $n$-dimensional balls), but most of the other quantities remain the same.
Following (Li 2011), but slightly changed according to Wikipedia, the area of a hyperspherical cap is: $$ \frac{1}{2} A_n r^{n-1} I_{(2rh-h^2)/r^2}(\frac{n-1}{2},\frac{1}{2}) $$ Here $h$ is the height of the cap, equal to $r-(\frac{d}{2}-x)$ where $x$ is the distance of the intersecting hyperplane from the origin, and there are two new symbols: $A_n$, the surface area of the unit hypersphere in $n$ dimensions, and $I_x(a,b)$, the regularized incomplete beta function.
Because we want the ratio of the area of the cap to the area of the entire hypersphere, we divide by $A_n r^{n-1}$ to get $$ \frac{1}{2} I_{(2rh-h^2)/r^2}(\frac{n-1}{2},\frac{1}{2}) $$.
Let's find $h$:
$$ h = r-\frac{d}{2}+x \\\\ x = \frac{d}{2}-\frac{r^2}{d} \Rightarrow \\\\ h= r(1-\frac{r}{d}) $$We use $h$ to find the expression in the subscript of $I$: $$ \frac{2rh-h^2}{r^2} = 1-\frac{r^2}{d^2} $$ So we get that the probability of a beneficial mutation is: $$ p = \frac{1}{2} I_{1-(r/d)^2}(\frac{n-1}{2},\frac{1}{2}) $$ or in full form: $$ p = \frac{1}{2} \frac{\int\limits_0^{1-r^2/d^2}{t^{(n-3)/2}(1-t)^{-1/2} \, d t}}{\int\limits_0^{1}{t^{(n-3)/2}(1-t)^{-1/2} \, d t}} $$
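As a quick sanity check (my addition; it assumes SciPy is available, and the helper name `p_by_quad` is mine), the full integral form can be evaluated with `scipy.integrate.quad` and compared against SciPy's regularized incomplete beta function:

```python
# Sanity check: integrate the incomplete-beta integrand numerically and
# compare with scipy.special.betainc, which computes the regularized
# incomplete beta function I_x(a, b) directly.
from scipy.integrate import quad
from scipy.special import betainc

def p_by_quad(r, d, n):
    # integrand of B_x(a, b) with a = (n-1)/2, b = 1/2
    integrand = lambda t: t**((n - 3) / 2.0) * (1 - t)**(-0.5)
    num, _ = quad(integrand, 0, 1 - (r / d)**2)
    den, _ = quad(integrand, 0, 1)
    return 0.5 * num / den

print(p_by_quad(0.3, 1.0, 5))
print(0.5 * betainc((5 - 1) / 2.0, 0.5, 1 - 0.3**2))
```

The two printed values should agree up to numerical integration error.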
Here is a comparison of the previous derived formulas with the general formula:
from scipy.special import betainc
full_p = lambda r,d,n: 0.5 * betainc((n - 1)/2.0, 0.5, 1 - (r/d)**2)
ps_full_2D = [full_p(r, d, 2) for r in rs]
plot(rs, ps_2D)
plot(rs, ps_full_2D, '--')
legend(['2D','General'], loc=3)
xlabel('mutational radius')
ylabel('probability of beneficial')
title("2D");
ps_full_3D = [full_p(r, d, 3) for r in rs]
plot(rs, ps_3D)
plot(rs, ps_full_3D, '--')
legend(['3D','General'], loc=3)
xlabel('mutational radius')
ylabel('probability of beneficial')
title("3D");
You can see that this new formula agrees perfectly with the derived formulas from above.
How does the probability change with the number of dimensions?
ns = range(2,1000)
rs = [d/100, d/20, d/10]
ps = [[full_p(r,d,n) for r in rs] for n in ns]
plot(ns, ps)
legend(rs, title="r/d", loc=5)
xlabel('dimensions')
ylabel('probability of beneficial');
ns = [2,3,4,5,50]
rs = linspace(0,1,500)
ps = [[full_p(r,d,n) for n in ns] for r in rs]
plot(rs, ps)
legend(ns, title="Dimensions", loc=1)
xlabel('Mutational Radius')
ylabel('probability of beneficial');
Rice gives a different formula:
$$ \frac{\int_0^{\arccos{\frac{r}{d}}}{\sin^{n-2}({\theta}) \, d\theta}}{\int_0^\pi{\sin^{n-2}({\theta}) \, d\theta}} $$How does this relate to our result? Let's start with the numerator. The integral of a power of sine can be expressed using the hypergeometric function $_2F_1$:
$$ \int{\sin^{n-2}{\theta} \, d\theta} = -\cos{\theta} \; _2F_1[\frac{1}{2}, \frac{3-n}{2}, \frac{3}{2}, \cos^2{\theta}] $$The hypergeometric function is related to the incomplete beta function by $B_x(p,q) = x^p \frac{1}{p} \; _2F_1[p,1-q,p+1,x]$. Therefore,
$$ \int{\sin^{n-2}{\theta} \, d\theta} = -\cos{\theta} \frac{1}{2} \frac{1}{\cos{\theta}} B_{\cos^2{\theta}}(\frac{1}{2},\frac{n-1}{2}) = - \frac{1}{2} B_{\cos^2{\theta}}(\frac{1}{2},\frac{n-1}{2}) $$Calculating the definite integral from $0$ to $\arccos{\frac{r}{d}}$, as in Rice's formula: $$ \int_0^{\arccos{\frac{r}{d}}}{\sin^{n-2}{\theta} \, d\theta} = \\ = \frac{1}{2} ( B(\frac{1}{2},\frac{n-1}{2}) - B(\frac{1}{2},\frac{n-1}{2}) I_{r^2/d^2}(\frac{1}{2},\frac{n-1}{2} )) = \\ = \frac{1}{2} B(\frac{1}{2},\frac{n-1}{2}) ( 1 - I_{r^2/d^2}(\frac{1}{2},\frac{n-1}{2} )) = \\ = \frac{1}{2} B(\frac{1}{2},\frac{n-1}{2}) I_{1 - r^2/d^2}(\frac{n-1}{2}, \frac{1}{2} ) $$
Now for the denominator. We will use the relation between Wallis' integrals and the beta function - $W_n = \int_0^{\pi/2}{\sin^n{\theta} \, d\theta} = \frac{1}{2} B(\frac{n+1}{2}, \frac{1}{2})$. Also, remember that $\sin$ is symmetric around $\frac{\pi}{2}$ and that the beta function is symmetric in its two arguments ($B(a,b)=B(b,a)$).
$$ \int_0^\pi{\sin^{n-2}{\theta} \, d\theta} = 2 W_{n-2} = B(\frac{1}{2},\frac{n-1}{2}) $$Combining the results for the numerator and denominator, we get an equality between Rice's formula and our own: $$ \frac{\int_0^{\arccos{\frac{r}{d}}}{\sin^{n-2}{\theta} \, d\theta}}{\int_0^\pi{\sin^{n-2}{\theta} \, d\theta}} = \frac{\frac{1}{2} B(\frac{1}{2},\frac{n-1}{2}) I_{1 - r^2/d^2}(\frac{n-1}{2}, \frac{1}{2} )}{B(\frac{1}{2},\frac{n-1}{2})} = \frac{1}{2} I_{1 - r^2/d^2}(\frac{n-1}{2}, \frac{1}{2} ) $$
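To verify this equivalence numerically (my addition; `rice_p` is a name I made up), we can integrate Rice's formula directly with `scipy.integrate.quad` and compare:

```python
# Numerical check that Rice's sine-integral formula equals
# (1/2) * I_{1 - r^2/d^2}((n-1)/2, 1/2).
from math import acos, sin, pi
from scipy.integrate import quad
from scipy.special import betainc

def rice_p(r, d, n):
    num, _ = quad(lambda theta: sin(theta)**(n - 2), 0, acos(r / d))
    den, _ = quad(lambda theta: sin(theta)**(n - 2), 0, pi)
    return num / den

r0, d0, n0 = 0.3, 1.0, 7
print(rice_p(r0, d0, n0))
print(0.5 * betainc((n0 - 1) / 2.0, 0.5, 1 - (r0 / d0)**2))
```

The two printed values should match to high precision.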
The following is a simple algorithm (Muller 1959) for picking random points uniformly on the surface of a hypersphere. (Naive alternatives, such as drawing each coordinate uniformly and normalizing, are not appropriate because they do not produce a uniform distribution on the sphere.)
from scipy.linalg import norm
from random import gauss
def random_vector(n, r):
    # a Gaussian vector scaled to length r is uniformly distributed
    # on the sphere of radius r (Muller 1959)
    vec = [gauss(0, 1) for i in range(n)]
    mag = norm(vec)
    return array([r * x/mag for x in vec])

def is_beneficial(n, d, r):
    # start at distance d/2 from the optimum at the origin, mutate by a
    # random vector of length r, and check whether we got closer
    z = zeros(n)
    z[0] = d/2
    v = random_vector(n,r)
    return norm(z + v) < d/2

def empirical_p(n, d, r, replicates=1000):
    # fraction of random mutations of size r that are beneficial
    return mean([is_beneficial(n, d, r) for _ in range(replicates)])
d = 1.0
N = 10000
rs = linspace(0,1,15)
ns = [2,5,100]
empirical_r = [[empirical_p(n, d, r, N) for n in ns] for r in rs]
full_p_r = [[full_p(r, d, n) for n in ns] for r in rs]
plot(rs, full_p_r)
plot(rs, empirical_r, 'ok')
legend(["%dD" % n for n in ns], title="Dimensions", loc=1)
xlabel('Mutational Radius')
ylabel('Probability of beneficial');
N = 10000
d = 1.0
ns = range(2, 1001, 100)
rs = [d/100, d/20, d/10]
empirical_n = [[empirical_p(n, d, r, N) for r in rs] for n in ns]
full_p_n = [[full_p(r, d, n) for r in rs] for n in ns]
plot(ns, full_p_n)
plot(ns, empirical_n, 'ok')
legend(rs, title="r/d", loc=1)
xlabel('dimensions')
ylabel('probability of beneficial');
The following is Figure 3 from page 40 of Fisher's book [@Fisher1930]:
This expression for the probability of a beneficial mutation is equal to the complement of the cumulative distribution function of a standard normal distribution: $$ 1-\Phi(\sqrt{n} \frac{r}{d}) = \frac{1}{\sqrt{2 \pi}} \int_{\sqrt{n} \frac{r}{d}}^{\infty}{e^{-\frac{1}{2}t^2} \, dt} $$ So one can interpret this as the probability of getting a value higher than $\sqrt{n} \frac{r}{d}$ from a standard normal r.v., or of getting a value larger than $\frac{r}{d}$ from a normal r.v. with variance $1/n$, the reciprocal of the number of dimensions.
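Fisher's expression can also be computed directly with SciPy's normal survival function (my addition; the helper name `fisher_nD_p` is mine), as an alternative to the symbolic integration below:

```python
# Fisher's probability of improvement, computed as the upper tail
# (survival function) of a standard normal distribution.
from scipy import stats

fisher_nD_p = lambda r, d, n: stats.norm.sf(n ** 0.5 * r / d)  # = 1 - Phi(sqrt(n) r/d)
print(fisher_nD_p(0.0, 1.0, 7))  # at r = 0, half of all mutations are beneficial: 0.5
```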
I reproduce the figure here:
from sympy import simplify, symbols, integrate, exp, pi, oo, sqrt, limit, gamma
from IPython.display import display
from sympy.interactive import printing
printing.init_printing(use_latex=True)
x,t = symbols('x t')
fisher_integral = (1 / sqrt(2 * pi)) * integrate(exp(-(t**2) / 2),(t,x,oo))
fisher_integral = simplify(fisher_integral)
xs = linspace(0,3.5,50)
fisher_ps = [fisher_integral.subs(x,y).evalf() for y in xs]
fisher_integral
figsize(10,5)
plt.plot(xs,fisher_ps,'k');
xlim((0,3.2))
ylim((0,0.5))
ylabel("probability of improvement")
xlabel(u"magnitude of change");
Next, I reproduce the figure with my function:
n = 7
d = 1.0
max_x = 3.2
full_p_x = lambda x,n: 0.5 * betainc((n-1)/2., 0.5, max(0, 1 - x**2/n))  # clipped at x = sqrt(n), beyond which r > d and p = 0
full_ps_x = [full_p_x(x, n) for x in xs]
figsize(10,5)
plt.plot(xs, fisher_ps,'k',label='large D');
plt.plot(xs, full_ps_x,'ob',label='7D');
xlim((0,max_x))
ylim((0,0.5))
ylabel("probability of improvement")
xlabel(u"magnitude of change")
legend();
Define $Z_i \sim Normal(0,1)$ for $i\in \{1,...,n\}$ and $X_i = \frac{rZ_i}{||Z||}$ where $Z=(Z_1,...,Z_n)$ and $||\cdot||$ is the Euclidean norm ($L_2$ norm). This means that $||X||^2 = r^2$, as required.
Now, we want to find the probability that the distance to the origin after the mutation is smaller than before. The point before the mutation is $(d/2,0,...,0)$ and after the mutation it is $(d/2+X_1, X_2, ..., X_n)$.
So we want to find:
$$ p = P(||(d/2+X_1, X_2, ..., X_n)|| < \frac{d}{2}) = \\\\ P((d/2 + X_1)^2 + X_2^2 + ... + X_n^2 < \frac{d^2}{4}) = \\\\ P(\frac{d^2}{4} + dX_1 + X_1^2 + X_2^2 + ... + X_n^2 < \frac{d^2}{4}) = \\\\ P(dX_1 + X_1^2 + X_2^2 + ... + X_n^2 < 0) = \\\\ P(dX_1 + r^2 < 0) = \\\\ P(\frac{rZ_1}{||Z||} < -\frac{r^2}{d}) = \\\\ P(\frac{Z_1}{||Z||} < -\frac{r}{d}) = \\\\ P(\frac{Z_1^2}{||Z||^2} > \frac{r^2}{d^2} \; and \; Z_1 < 0 ) \Rightarrow \\\\ p = P(Z_1 < 0 ) P(\frac{Z_1^2}{||Z||^2} > \frac{r^2}{d^2}) $$The last step uses the fact that, by the symmetry of the normal distribution, the sign of $Z_1$ is independent of $\frac{Z_1^2}{||Z||^2}$. The actual size of the mutational step in the 1st coordinate is $\frac{r Z_1}{||Z||}$, so $\frac{Z_1}{||Z||}$ can be interpreted as the ratio between the change in the correct direction and the total change in trait space.
This leads to the interpretation that the probability for beneficial mutation $p$ is equal to the probability that the ratio of change in the correct direction $rZ_1/||Z||$ to the total change $r$ is larger than the ratio of the total change $r$ to the distance from the optimum $d$, given that the mutational step was in the correct direction ($Z_1<0$).
Denote $A = Z_1^2$ and $B=Z_2^2+...+Z_n^2$. The sum of squares of standard normal r.v.s has a chi-squared distribution:
$$ A \sim \chi^2(1) \\\\ B \sim \chi^2(n-1) \\\\ A+B = ||Z||^2 \Rightarrow \\\\ \frac{Z_1^2}{||Z||^2} = \frac{A}{A+B} $$Also, the chi-square distribution is related to the Beta distribution: $$ \frac{A}{A+B} \sim Beta(\frac{1}{2},\frac{n-1}{2}) $$
Therefore, because the cdf of a Beta rv $X\sim Beta(a,b)$ is $P(X<x) = I_x(a,b)$, we get:
$$ p = P(Z_1 < 0 ) P(\frac{Z_1^2}{||Z||^2} > \frac{r^2}{d^2}) = \frac{1}{2} (1 - I_{\frac{r^2}{d^2}}(\frac{1}{2},\frac{n-1}{2})) $$which is the same as the result we got through the geometric argument.
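This equality between the probabilistic and geometric forms can be checked numerically (my addition; `p_probabilistic` is my name, and `scipy.stats.beta.sf` computes $P(B>x)$ for a Beta r.v.):

```python
# The probabilistic result 1/2 * P(B > r^2/d^2), B ~ Beta(1/2, (n-1)/2),
# should equal the geometric result 1/2 * I_{1-r^2/d^2}((n-1)/2, 1/2).
from scipy import stats
from scipy.special import betainc

def p_probabilistic(r, d, n):
    return 0.5 * stats.beta.sf((r / d)**2, 0.5, (n - 1) / 2.0)

r0, d0, n0 = 0.3, 1.0, 7
print(p_probabilistic(r0, d0, n0))
print(0.5 * betainc((n0 - 1) / 2.0, 0.5, 1 - (r0 / d0)**2))
```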
I draw random vectors on the hypersphere with $n$ dimensions and radius $r$, estimate the probability that their first element is larger than $\frac{r^2}{d}$ (which, by symmetry, equals the probability of being smaller than $-\frac{r^2}{d}$), and compare that to my analytic result for $p$.
def empirical_p(n,d,r,N):
    # estimate p as the fraction of random vectors whose first coordinate
    # exceeds r^2/d (equal, by symmetry, to P(X_1 < -r^2/d))
    v = [random_vector(n,r) for _ in range(N)]
    return sum([x[0] > r**2/d for x in v])/float(N)
d = 1.0
N = 10000
rs = linspace(0,1,25)
ns = [2,5,100]
empirical_r = [[empirical_p(n, d, r, N) for n in ns] for r in rs]
full_p_r = [[full_p(r, d, n) for n in ns] for r in rs]
lines = plot(rs, full_p_r)
points = plot(rs, empirical_r, 'o')
for i,l in enumerate(lines):
    points[i].set_color(l.get_color())
legend(["%dD" % n for n in ns], title="Dimensions", loc=1)
xlabel('Mutational Radius')
ylabel('Probability of beneficial');
We developed here the formula:
$$ p = \frac{1}{2} (1 - I_{r^2/d^2}(\frac{1}{2}, \frac{n-1}{2})) = \frac{1}{2} P(B>\frac{r^2}{d^2}) \\\\ B \sim Beta(\frac{1}{2}, \frac{n-1}{2}) $$How does this relate to Fisher's result, $p = 1-\Phi(\sqrt{n}\frac{r}{d})$?
We use the identity $I_x(a,a) = \frac{1}{2} I_{4x(1-x)}(a,\frac{1}{2}) = \frac{1}{2}(1- I_{1-4x(1-x)}(\frac{1}{2},a))$ for $0\le x \le \frac{1}{2}$.
What is $x$ in terms of $r$ and $d$?
$$ 1-4x(1-x) = \frac{r^2}{d^2} \Rightarrow 1-4x+4x^2-\frac{r^2}{d^2}=0 \Rightarrow \\\\ 4x^2 - 4x +1-\frac{r^2}{d^2} = 0 \Rightarrow \\\\ x = \frac{4 \pm \sqrt{16 - 16(1-\frac{r^2}{d^2})}}{8} \Rightarrow \\\\ x = \frac{4 \pm 4\frac{r}{d}}{8} = \frac{1}{2} (1 \pm \frac{r}{d}) $$Since the identity requires that $0 \le x \le \frac{1}{2}$, we take $x = \frac{1}{2}(1-\frac{r}{d})$.
So, we got that $$ p = \frac{1}{2} (1 - I_{\frac{r^2}{d^2}}(\frac{1}{2},\frac{n-1}{2})) = I_{\frac{1}{2}(1-\frac{r}{d})}(\frac{n-1}{2},\frac{n-1}{2}) $$
Or that $p=P(D<\frac{1}{2}(1-\frac{r}{d}))$ where $D \sim Beta(\frac{n-1}{2},\frac{n-1}{2})$.
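A quick numerical check of this symmetric-Beta form (my addition, using SciPy's `betainc`):

```python
# Check that p = I_{(1-r/d)/2}(a, a), with a = (n-1)/2, matches the
# earlier form p = 1/2 * (1 - I_{r^2/d^2}(1/2, a)).
from scipy.special import betainc

n0, d0 = 7, 1.0
a = (n0 - 1) / 2.0
for r0 in (0.1, 0.3, 0.45):
    lhs = 0.5 * (1 - betainc(0.5, a, (r0 / d0)**2))
    rhs = betainc(a, a, 0.5 * (1 - r0 / d0))
    print(r0, lhs, rhs)
```

Each printed pair should agree.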
Now, a Beta distribution can be approximated by a normal distribution when its two parameters are roughly equal and large enough. The approximating normal distribution has the same mean and variance as the Beta; when the parameters are equal (denote their common value by $a$), these are $\frac{1}{2}$ and $\frac{1}{8a+4}$:
$$ E[D] = \frac{1}{2} \\\\ V[D] = \frac{1}{8\frac{n-1}{2}+4} = \frac{1}{4n} $$So our approximation r.v. is $Q=Normal(\frac{1}{2}, \frac{1}{4n})$, and we approximate by:
$$ p = P(D<\frac{1}{2}(1-\frac{r}{d})) \approx P(Q<\frac{1}{2}(1-\frac{r}{d})) = \\\\ \Phi( \frac{\frac{1}{2}(1-\frac{r}{d}) - E[Q]}{SD[Q]}) = \Phi( \frac{\frac{1}{2}(1-\frac{r}{d}) - \frac{1}{2}}{\sqrt{\frac{1}{4n}}}) = \\\\ \Phi( \frac{\frac{1}{2}-\frac{r}{2d} - \frac{1}{2}}{\frac{1}{2\sqrt{n}}}) = \Phi( -\sqrt{n} \frac{r}{d} ) = \\\\ 1 - \Phi( \sqrt{n} \frac{r}{d} ) $$This is the result that Fisher presented in his book in 1930.
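To see the approximation at work (my addition), compare the exact probability with Fisher's normal approximation as $n$ grows; the gap shrinks as the number of dimensions increases:

```python
# The exact probability approaches Fisher's approximation
# 1 - Phi(sqrt(n) * r / d) as the number of dimensions n grows.
from scipy.special import betainc
from scipy import stats

r0, d0 = 0.1, 1.0
for n0 in (10, 100, 1000):
    exact = 0.5 * betainc((n0 - 1) / 2.0, 0.5, 1 - (r0 / d0)**2)
    approx = stats.norm.sf(n0 ** 0.5 * r0 / d0)
    print(n0, exact, approx, abs(exact - approx))
```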