Behaviour of the Gaussian distribution with increasing dimension

This notebook requires the brand new Meijer integrator, so make sure you have the latest release of SymPy.

In [29]:
%pylab inline
import numpy as np
import matplotlib.pyplot as plt
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
In [30]:
from sympy import *
mint = integrals.meijerint_definite
x, n, r = s.var('x,n,r')

We'd like to investigate how the cumulative Gaussian distribution (the integral of the probability density function or PDF) changes over increasing dimension. The approach here is to, for each dimension, plot the integral of the multivariate PDF over spheres of different radii.

The multivariate $n$-dimensional Gaussian PDF is

$$\mathcal{N}(\mathbf{x} | \mathbf{\mu}, \mathbf{\Sigma}) = \frac{1}{(2 \pi)^{n/2}} \frac{1}{|\mathbf{\Sigma}|^{1/2}} \exp \left( -\frac{1}{2}(\mathbf{x} - \mathbf{\mu})^\mathrm{T} \mathbf{\Sigma}^{-1} (\mathbf{x} - \mathbf{\mu}) \right).$$

Let's work with a standard normal distribution, so that $\mathbf{\Sigma}=I$ and $\mathbf{\mu}=\mathbf{0}$, then the expression above becomes

$$\mathcal{N}(\mathbf{x} | \mathbf{\mu}, I) = \frac{1}{(2 \pi)^{n/2}} \exp \left( -\frac{1}{2}\mathbf{x}^\mathrm{T} \mathbf{x} \right).$$

This can also be expressed in spherical coordinates as

$$p_n(r) = \frac{1}{(2 \pi)^{n/2}} e^{-1/2 r^2}.$$

Our aim is to plot the integral of $p_n(r)$ over different values of $r$, and for different dimensions, i.e.

$$P_n(x) = \int_0^x p_n(r) dV_r.$$

For an $n$-dimensional sphere (or hypersphere) the volume element is

$$dV_r = 2 \pi^{n/2} \frac{1}{ \Gamma\left(\frac{n}{2}\right) } r^{n-1} dr,$$

so that the full integral becomes

$$ P_n(x) = \frac{1}{2^{n/2 - 1}} \frac{1}{\Gamma\left(\frac{n}{2}\right)} \int_0^x e^{-1/2 r^2} r^{n-1} dr. $$
In [31]:
f, cond = mint(1/(2**(n/2 - 1)*gamma(n/2))*exp(-S(1)/2*r**2)*r**(n-1), r, 0, x)
In [32]:
R = np.linspace(0, 10)
for k in range(1, 10+1, 2):
    plt.plot(R, [f.subs(n, k).subs(x, xx).evalf() for xx in R], label='d = %d' % k)

t = plt.title(r'Cumulative Gaussian distribution (dimension n)', fontsize=16)
plt.ylabel('$P_n(r)$', fontsize=16)
plt.xlabel('$r$', fontsize=16);
In [47]: