# Large Scale Structure Statistics¶

Material used to prepare my 2 Feb 2015 LSST DESC Dark Energy School lecture. You are welcome to modify and improve this material. Contact me at [email protected] with questions or corrections. This material is released under the MIT License.

Some of the data files used for the plots are created with external programs, using the command lines provided below. There is also an accompanying Mathematica notebook used to generate some of the files read below.

## Initialization¶

In [1]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib

In [2]:
import math

In [3]:
import astropy.units as u
import astropy.constants as const

In [4]:
import scipy.interpolate
import scipy.fftpack

In [5]:
# yt is not included in anaconda but installs easily with "pip install yt".
# It is only used to make one plot below, so you can skip this import if you don't need that plot.
import yt


## Fiducial Cosmology¶

In [5]:
from astropy.cosmology import Planck13 as fiducial

In [6]:
h0 = fiducial.H0/(100*u.km/u.s/u.Mpc)
print 'Fiducial h0 =',h0

Fiducial h0 = 0.6777


The LSST "gold" weak-lensing sample (i < 25.3, SNR > 20 for point sources) has a median redshift z ≈ 1.2. Redshift errors in this sample should be $\sigma(z)/(1+z)$ = (1-2)% with 90% of galaxies better than 4% (from Section 4.1 of http://arxiv.org/abs/0805.2366).

Load the Planck13 power spectrum at z = 1.2, which has sigma8 = 0.465 (compared with sigma8 = 0.835 at z = 0). We use the accompanying CAMB input file camb/Planck13.camb with the July 2013 version of CAMB, since that was the version used for the Planck13 likelihood calculations. The CAMB input file was generated with the CosmoTools Mathematica package (via the accompanying camb/Planck13.nb notebook) and would need some minor modifications to run with more recent versions of CAMB due to changes in how the neutrino sector is parameterized.

In [7]:
kvec,planck13_Pk = np.loadtxt('camb/Planck13_matterpower_1.dat').transpose()


Create an interpolating function that is linear in log(k), log(P(k)) and returns zero at k = 0, where k is in h/Mpc:
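Linear interpolation in log(k), log(P) is exact for a pure power law, which makes it well suited to broadband spectra like this one. A toy check (the grid and slope here are illustrative, not the CAMB data):

```python
import numpy as np
import scipy.interpolate

# Sample a pure power law P(k) = k**-1.5 on a coarse grid.
k_nodes = np.array([0.01, 0.1, 1.0, 10.0])
loglog_interp = scipy.interpolate.InterpolatedUnivariateSpline(
    np.log(k_nodes), -1.5 * np.log(k_nodes), k=1)

# Evaluate between nodes: the log-log linear fit recovers the power law exactly.
k_test = 0.3
p_interp = np.exp(loglog_interp(np.log([k_test])))[0]
p_exact = k_test ** -1.5
print('%.6f %.6f' % (p_interp, p_exact))
```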

In [8]:
pk_interpolator = scipy.interpolate.InterpolatedUnivariateSpline(np.log(kvec),np.log(planck13_Pk),k=1)

In [9]:
def pk_func(kvec_hMpc):
    pk = np.zeros_like(kvec_hMpc)
    nonzero = (kvec_hMpc>0)
    pk[nonzero] = np.exp(pk_interpolator(np.log(kvec_hMpc[nonzero])))
    # Units of P(k) are (Mpc/h)**3
    return pk


Tabulate the k-space window function for evaluating sigma8:

In [10]:
kR = 8.*kvec
sigma8_window = (3/kR**3*(np.sin(kR)-kR*np.cos(kR)))**2*kvec**3/(2*np.pi**2)
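The tabulated window gives sigma8 directly when integrated against $P(k)$ in $d\ln k$: $\sigma_8^2 = \int W^2(kR)\,k^3 P(k)/(2\pi^2)\,d\ln k$ with the spherical top-hat window $W(x) = 3(\sin x - x\cos x)/x^3$. A self-contained check of this quadrature against a toy Gaussian spectrum with a known answer (the function name and spectrum are illustrative, not part of the notebook):

```python
import numpy as np

def sigma_R(kvec, pk, R=8.0):
    # sigma_R^2 = integral of W(kR)^2 * k^3 P(k)/(2 pi^2) d(ln k),
    # evaluated by the trapezoid rule on the tabulated grid.
    kR = R * kvec
    W = 3.0 / kR**3 * (np.sin(kR) - kR * np.cos(kR))
    integrand = W**2 * kvec**3 * pk / (2 * np.pi**2)
    lnk = np.log(kvec)
    return np.sqrt(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(lnk)))

# For P(k) = exp(-k^2/2) and R -> 0 the window -> 1, so
# sigma_R^2 -> integral k^2 P(k) dk / (2 pi^2) = sqrt(pi/2)/(2 pi^2).
k = np.logspace(-3, 1.5, 10000)
approx = sigma_R(k, np.exp(-k**2 / 2), R=1e-3)
exact = np.sqrt(np.sqrt(np.pi / 2) / (2 * np.pi**2))
print('%.5f %.5f' % (approx, exact))
```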


Plot the sigma8 window function superimposed on $P(k)$ (not $k^3 P(k)/(2\pi^2)$):

In [11]:
fig = plt.figure(figsize=(6,4))
plt.fill_between(kvec,planck13_Pk,edgecolor='blue',lw=1,facecolor=(0.,0.,1.,0.25))
plt.xlim(kvec[0],kvec[-1])
plt.xlabel('Wavenumber k (h/Mpc)')
plt.ylabel('Power P(k) (Mpc/h)^3',color='blue')
plt.xscale('log')
ax2 = plt.gca().twinx()
ax2.fill_between(kvec,sigma8_window,edgecolor='red',lw=0.5,facecolor=(1.,0.,0.,0.25))
ax2.set_xlim(kvec[0],kvec[-1])
ax2.set_yticklabels([])
ax2.set_yticks([]);
ax2.set_ylabel('Sigma8 Window Function',color='red');
plt.savefig('plots/sigma8window.pdf')


## Alternative Cosmologies¶

Define some alternative power spectra tabulated on the same k-vector grid:

In [13]:
k0 = 0.05 # h/Mpc
r0 = 100 # Mpc/h
powerlaw_Pk = 1e6*kvec/(1+(kvec/k0)**3.5)
wiggles_Pk = powerlaw_Pk*(1 + np.sin(kvec*r0)*np.exp(-(kvec*r0)**2/1e4))
bump1_Pk = 4e4*np.exp(-np.log(kvec/k0)**2)
bump2_Pk = 4e4*np.exp(-np.log(kvec/(2*k0))**2)
noscale_Pk = kvec**-2

In [14]:
plt.plot(kvec/k0,planck13_Pk,'k-')
plt.plot(kvec/k0,powerlaw_Pk,'r-')
plt.plot(kvec/k0,wiggles_Pk,'b-')
plt.plot(kvec/k0,bump1_Pk,'g-')
plt.plot(kvec/k0,bump2_Pk,'g--')
plt.plot(kvec/k0,noscale_Pk,'magenta')
plt.xlim(kvec[0]/k0,kvec[-1]/k0)
plt.ylim(1e-3,1e5)
plt.xscale('log')
plt.yscale('log')
plt.xlabel('Wavenumber k/k0')
plt.ylabel('Power P(k) (Mpc/h)^3')
plt.grid();


Save these tabulated power spectra:

In [15]:
np.savetxt('alt/powerlaw_Pk.dat',np.vstack((kvec,powerlaw_Pk)).T)
np.savetxt('alt/wiggles_Pk.dat',np.vstack((kvec,wiggles_Pk)).T)
np.savetxt('alt/bump1_Pk.dat',np.vstack((kvec,bump1_Pk)).T)
np.savetxt('alt/bump2_Pk.dat',np.vstack((kvec,bump2_Pk)).T)
np.savetxt('alt/noscale_Pk.dat',np.vstack((kvec,noscale_Pk)).T)


Calculate the corresponding r-space correlation functions (times 2*pi^2) using the following command lines:

cosmotrans -i camb/Planck13_matterpower_1.dat -o camb/Planck13_xi.dat --min 1 --max 200
cosmotrans -i alt/powerlaw_Pk.dat -o alt/powerlaw_xi.dat --min 1 --max 200
cosmotrans -i alt/wiggles_Pk.dat -o alt/wiggles_xi.dat --min 1 --max 200
cosmotrans -i alt/bump1_Pk.dat -o alt/bump1_xi.dat --min 1 --max 200
cosmotrans -i alt/bump2_Pk.dat -o alt/bump2_xi.dat --min 1 --max 200
cosmotrans -i alt/noscale_Pk.dat -o alt/noscale_xi.dat --min 1 --max 200

The cosmotrans program is part of the cosmo package used by baofit.
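The transform being computed is $\xi(r) = \frac{1}{2\pi^2}\int_0^\infty k^2 P(k)\, j_0(kr)\, dk$ with $j_0(x) = \sin(x)/x$, which is why the factor of $2\pi^2$ is divided out below. A minimal python sketch of the same transform by direct quadrature (illustrative only, not the cosmotrans algorithm, which handles the oscillatory integrand more carefully):

```python
import numpy as np

def xi_from_pk(r, kvec, pk):
    # xi(r) = 1/(2 pi^2) * integral k^2 P(k) j0(k r) dk with j0(x) = sin(x)/x.
    # np.sinc(x) = sin(pi x)/(pi x), so np.sinc(k r / pi) = j0(k r).
    integrand = kvec**2 * pk * np.sinc(kvec * r / np.pi) / (2 * np.pi**2)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(kvec))

# Check against the analytic Fourier pair
# P(k) = (2 pi)^{3/2} s^3 exp(-k^2 s^2/2)  <->  xi(r) = exp(-r^2/(2 s^2)).
s = 5.0  # toy correlation length in Mpc/h
k = np.linspace(0.0, 4.0, 20000)
pk = (2 * np.pi)**1.5 * s**3 * np.exp(-k**2 * s**2 / 2)
r = 3.0
print('%.5f %.5f' % (xi_from_pk(r, k, pk), np.exp(-r**2 / (2 * s**2))))
```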

Load and plot these correlation functions:

In [16]:
rvec,planck13_xi = np.loadtxt('camb/Planck13_xi.dat').transpose()
rvec,powerlaw_xi = np.loadtxt('alt/powerlaw_xi.dat').transpose()
rvec,wiggles_xi = np.loadtxt('alt/wiggles_xi.dat').transpose()
rvec,bump1_xi = np.loadtxt('alt/bump1_xi.dat').transpose()
rvec,bump2_xi = np.loadtxt('alt/bump2_xi.dat').transpose()
rvec,noscale_xi = np.loadtxt('alt/noscale_xi.dat').transpose()
planck13_xi /= 2*np.pi**2
powerlaw_xi /= 2*np.pi**2
wiggles_xi /= 2*np.pi**2
bump1_xi /= 2*np.pi**2
bump2_xi /= 2*np.pi**2
noscale_xi /= 2*np.pi**2

In [17]:
plt.plot(rvec*k0,rvec**2*planck13_xi,'k-')
plt.plot(rvec*k0,rvec**2*powerlaw_xi,'r-')
plt.plot(rvec*k0,rvec**2*wiggles_xi,'b-')
plt.plot(rvec*k0,rvec**2*bump1_xi,'g-')
plt.plot(rvec*k0,rvec**2*bump2_xi,'g--')
plt.plot(rvec*k0,rvec**2*noscale_xi,'magenta')
plt.xlim(0.,rvec[-1]*k0)
plt.ylim(-150.,500.)
plt.xlabel('Separation r*k0')
plt.ylabel('Correlation function r^2*xi(r) (Mpc/h)^2');


Run the following commands to generate (100 Mpc/h)^3 Gaussian random fields from these power spectra and save 2D slices:

cosmogrf --verbose --spacing 0.5 --nx 200 --load-power camb/Planck13_matterpower_1.dat --save-delta-slice camb/planck13_slice.dat
cosmogrf --verbose --spacing 0.5 --nx 200 --load-power alt/powerlaw_Pk.dat --save-delta-slice alt/powerlaw_slice.dat
cosmogrf --verbose --spacing 0.5 --nx 200 --load-power alt/wiggles_Pk.dat --save-delta-slice alt/wiggles_slice.dat
cosmogrf --verbose --spacing 0.5 --nx 200 --load-power alt/bump1_Pk.dat --save-delta-slice alt/bump1_slice.dat
cosmogrf --verbose --spacing 0.5 --nx 200 --load-power alt/bump2_Pk.dat --save-delta-slice alt/bump2_slice.dat
cosmogrf --verbose --spacing 0.5 --nx 200 --load-power alt/noscale_Pk.dat --save-delta-slice alt/noscale_slice.dat

The cosmogrf program is also in the cosmo package.

Load these realization slices:

In [18]:
planck13_slice = np.loadtxt('camb/planck13_slice.dat')
powerlaw_slice = np.loadtxt('alt/powerlaw_slice.dat')
wiggles_slice = np.loadtxt('alt/wiggles_slice.dat')
bump1_slice = np.loadtxt('alt/bump1_slice.dat')
bump2_slice = np.loadtxt('alt/bump2_slice.dat')
noscale_slice = np.loadtxt('alt/noscale_slice.dat')


Create images where students have to match P(k), xi(r) and a realization.

In [19]:
def plot_Pk(data,label):
    plt.plot(kvec/k0,data,'k-')
    visible = (1e-2 < kvec/k0) & (kvec/k0 < 1e2)
    pmax = np.max(data[visible])
    plt.xlim(1e-2,1e2)
    plt.ylim(2e-4*pmax,2*pmax)
    plt.xscale('log')
    plt.yscale('log')
    plt.xlabel('Wavenumber $k/k_{0}$')
    plt.ylabel('Power $P(k)$')
    plt.gca().get_yaxis().set_ticklabels([])
    plt.grid()
    plt.annotate(label,xy=(0.85,0.85),xycoords='axes fraction',fontsize='xx-large')

def plot_xi(data,label):
    plt.plot(k0*rvec,rvec**2*data,'k-')
    plt.xlim(0.,rvec[-1]*k0)
    if np.min(data) >= 0:
        plt.ylim(-0.05*np.max(rvec**2*data),None)
    plt.xlabel('Separation $k_{0} r$')
    plt.ylabel('Correlation $r^{2} \\xi(r)$')
    plt.gca().get_yaxis().set_ticklabels([])
    plt.gca().axhline(0., linestyle='--', color='k')
    plt.annotate(label,xy=(0.85,0.85),xycoords='axes fraction',fontsize='xx-large')

# Display a 2D density field using a color map where 0 = white, >0 is red, <0 is blue.
# The input data is assumed to have a mean near zero.
def slice_plot(data,clip_percent=1.,**kwargs):
    lo,hi = np.percentile(data,(clip_percent,100.-clip_percent))
    assert lo < 0. and hi > 0.
    plt.imshow(data,cmap='RdBu_r',vmin=lo,vmax=hi,**kwargs)

def plot_slice(data,label):
    slice_plot(data,extent=(0.,5.,0.,5.))
    plt.xlabel('$k_{0}x$')
    plt.ylabel('$k_{0}y$')
    plt.annotate(label,xy=(0.85,0.85),xycoords='axes fraction',fontsize='xx-large')

In [20]:
def match1(save_name=None):
    fig = plt.figure(figsize=(7.5,7.5))
    # planck13 P(k)
    plt.subplot(3,3,1)
    plot_Pk(planck13_Pk,'A')
    # bump2 P(k)
    plt.subplot(3,3,4)
    plot_Pk(bump2_Pk,'B')
    # bump1 P(k)
    plt.subplot(3,3,7)
    plot_Pk(bump1_Pk,'C')
    # bump2 xi(r)
    plt.subplot(3,3,2)
    plot_xi(bump2_xi,'A')
    # bump1 xi(r)
    plt.subplot(3,3,5)
    plot_xi(bump1_xi,'B')
    # planck13 xi(r)
    plt.subplot(3,3,8)
    plot_xi(planck13_xi,'C')
    # bump1 realization
    plt.subplot(3,3,3)
    plot_slice(bump1_slice,'A')
    # planck13 realization
    plt.subplot(3,3,6)
    plot_slice(planck13_slice,'B')
    # bump2 realization
    plt.subplot(3,3,9)
    plot_slice(bump2_slice,'C')
    #
    plt.tight_layout()
    if save_name:
        plt.savefig(save_name)

match1('plots/match1.pdf')

In [21]:
def match2(save_name=None):
    fig = plt.figure(figsize=(7.5,7.5))
    # noscale P(k)
    plt.subplot(3,3,1)
    plot_Pk(noscale_Pk,'A')
    # powerlaw P(k)
    plt.subplot(3,3,4)
    plot_Pk(powerlaw_Pk,'B')
    # wiggles P(k)
    plt.subplot(3,3,7)
    plot_Pk(wiggles_Pk,'C')
    # powerlaw xi(r)
    plt.subplot(3,3,2)
    plot_xi(powerlaw_xi,'A')
    # noscale xi(r)
    plt.subplot(3,3,5)
    plot_xi(noscale_xi,'B')
    # wiggles xi(r)
    plt.subplot(3,3,8)
    plot_xi(wiggles_xi,'C')
    # noscale realization
    plt.subplot(3,3,3)
    plot_slice(noscale_slice,'A')
    # powerlaw realization
    plt.subplot(3,3,6)
    plot_slice(powerlaw_slice,'B')
    # wiggles realization
    plt.subplot(3,3,9)
    plot_slice(wiggles_slice,'C')
    #
    plt.tight_layout()
    if save_name:
        plt.savefig(save_name)

match2('plots/match2.pdf')


## Gaussian Random Fields¶

The code below demonstrates how to generate a Gaussian random field $\delta(r)$ as a realization of an arbitrary power spectrum $P(k)$. The first step is to generate a random complex-valued $\delta(k)$ where the variance of $|\delta(k)|$ values within a small volume $dk^3$ is $P(k) dk^3/(2\pi)^3$ and the random phases of $\delta(k)$ are generated uniformly on a circle. The next step is to Fourier transform $\delta(k)$ to $\delta(r)$.

There are two factors of 2 to be aware of when using $d\sigma(k)^2 = P(k) dk^3/(2\pi)^3$, which happen to cancel in the code we are using here. First, since we sample Gaussians for both the real and imaginary parts of $\delta(k)$, rather than generating a Gaussian $|\delta(k)|$ and uniform phase, we should divide $d\sigma(k)^2$ by two. Second, since we do not impose the symmetries on $\delta(k)$ required to ensure that its Fourier transform $\delta(r)$ is real, we should multiply $d\sigma(k)^2$ by two (and we also get two independent realizations as the real and imaginary parts of the complex $\delta(r)$).
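This factor-of-2 bookkeeping is easy to verify in one dimension, where the same recipe predicts that each returned field has variance $\int dk\,P(k)/(2\pi) = P_0/\Delta x$ for a constant spectrum $P(k) = P_0$ and grid spacing $\Delta x$. A toy sketch (numpy FFT, white-noise spectrum; the names here are illustrative):

```python
import numpy as np

N, spacing, P0 = 65536, 1.0, 1.0     # grid size, spacing, constant P(k)
dk = 2 * np.pi / (N * spacing)
# Variance of the real and imaginary parts in each k-space cell; the
# halving (two Gaussian components) and doubling (no reality condition)
# described above cancel, leaving P(k) dk / (2 pi).
variance = P0 * dk / (2 * np.pi)
rng = np.random.RandomState(2015)
delta_k = N * np.sqrt(variance) * rng.normal(size=2 * N).view(dtype=complex)
# No Hermitian symmetry was imposed, so the inverse FFT is complex and its
# real and imaginary parts are two independent realizations.
delta_r = np.fft.ifft(delta_k)
print('%.4f %.4f %.4f' % (np.var(delta_r.real), np.var(delta_r.imag), P0 / spacing))
```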

This code is primarily intended as a pedagogical demonstration: it is not very memory efficient, since several large intermediate arrays are created rather than reusing a single buffer. The input spacing is in Mpc/h.

In [13]:
def generate_field(spacing_Mpch=1.,num_grid=64,seed=2015):
    dk = 2*np.pi/(spacing_Mpch*num_grid) # in h/Mpc
    kvec = 2*np.pi*np.fft.fftfreq(num_grid,d=spacing_Mpch) # in h/Mpc
    kx,ky,kz = np.meshgrid(kvec,kvec,kvec,sparse=True)
    knorm = np.sqrt(kx**2 + ky**2 + kz**2)
    del kx,ky,kz # We don't need these large arrays any more
    print 'Sampling P(k) for k = (%.3f - %.3f) h/Mpc' % (np.min(knorm[knorm>0]),np.max(knorm))
    # Calculate the variance in each k-space cell of the real and imaginary parts of the k-space delta field.
    variance = pk_func(knorm)*dk**3/(8*np.pi**3)
    del knorm # We don't need this large array any more
    # Generate a complex-valued random number for each cell, with its real and imaginary parts
    # sampled from independent zero-mean unit Gaussians, scaled to the correct variance.
    # The extra factor of num_grid**3 is due to the normalization of scipy.fftpack inverse transforms.
    generator = np.random.RandomState(seed)
    delta_k = num_grid**3*np.sqrt(variance)*(generator.normal(size=(num_grid,num_grid,2*num_grid)).view(dtype=complex))
    del variance # We don't need this large array any more
    # Transform the k-space complex delta field to an r-space complex delta field.
    delta_r = scipy.fftpack.ifftn(delta_k,overwrite_x=True).view(dtype=float).reshape(num_grid,num_grid,num_grid,2)
    # Since we treat the k-space delta field as a general complex field, its inverse FT is not real.
    # We could save some memory by building in the necessary k-space symmetries and using a special iFFT algorithm,
    # but instead we return a pair of uncorrelated real fields (as views into the complex r-space field).
    return delta_r[:,:,:,0],delta_r[:,:,:,1]

In [14]:
delta_r1,delta_r2 = generate_field(spacing_Mpch=8./15.,num_grid=256);

Sampling P(k) for k = (0.046 - 10.203) h/Mpc


Create a 2D slice from the 3D random field by averaging over a thin slab.

In [17]:
def create_slice(delta_r,num_zavg):
    return np.sum(delta_r[:,:,:num_zavg],axis=-1)/num_zavg


Generate 2D slices with a thickness of $20\times (8/15) = (4/3)\times 8$ Mpc/h so that circles of radius 8 Mpc/h represent the same volume as spheres of radius 8 Mpc/h.
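The slab thickness follows from matching volumes: a disk of radius $R$ in a slab of thickness $t$ encloses the same volume as a sphere of radius $R$ when

$$\pi R^2 t = \frac{4}{3}\pi R^3 \quad\Longrightarrow\quad t = \frac{4}{3}R = \frac{32}{3}\ \text{Mpc/h for}\ R = 8\ \text{Mpc/h}.$$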

In [18]:
slice1 = create_slice(delta_r1,20)
slice2 = create_slice(delta_r2,20)


Compare with the output of the cosmogrf C++ Gaussian random field generator as a cross-check of our python implementation (8/15 ≈ 0.533):

cosmogrf --verbose --spacing 0.533 --nx 256 --load-power camb/Planck13_matterpower_1.dat --save-delta-slice camb/tracer_slice1.dat --seed 1 --delta-slice-avg 20

Read 695 rows from camb/Planck13_matterpower_1.dat
Memory size = 64.5 Mb
Delta field mean = -3.05917e-11, variance = 5.21133, P(k) variance estimate is 5.05447

Generate a second independent realization with:

cosmogrf --verbose --spacing 0.533 --nx 256 --load-power camb/Planck13_matterpower_1.dat --save-delta-slice camb/tracer_slice2.dat --seed 2 --delta-slice-avg 20

Note that these slices cover a different volume than the ones generated above for the match1 & match2 exercises.

Load the generated slices and compare their variance with the python-generated slices (looks good!):

In [19]:
tracer_slice1 = np.loadtxt('camb/tracer_slice1.dat')
tracer_slice2 = np.loadtxt('camb/tracer_slice2.dat')

In [20]:
np.var(slice1),np.var(slice2),np.var(tracer_slice1),np.var(tracer_slice2)

Out[20]:
(1.1099509623398982,
1.0490263502410166,
1.0305633113734194,
0.95044534605442055)

Sample an r-space density field slice to generate random tracer positions in 2D:

In [21]:
def sample_tracers(slice_data,num_tracers,spacing,bias=1.5,seed=2015):
    print 'Using bias = %.3f' % bias
    nx,ny = slice_data.shape
    print 'mean tracer density = %.5f / (units of spacing)**2' % (float(num_tracers)/(nx*ny*spacing**2))
    # Set the overall mean number of tracers averaged over all cells.
    overall_mean = float(num_tracers)/slice_data.size
    # Calculate the mean number of tracers in each cell.
    cell_mean = overall_mean*(1 + bias*slice_data)
    # Clip any negative cell means.
    print 'Clipped %d / %d cells with mean < 0.' % (np.count_nonzero(cell_mean < 0),nx*ny)
    cell_mean[cell_mean < 0] = 0.
    # Generate the actual number of tracers in each cell.
    generator = np.random.RandomState(seed)
    cell_count = generator.poisson(cell_mean)
    # Clip any cells with more than 1 tracer.
    clipped = (cell_count > 1)
    print 'Clipped %d / %d cells with > 1 tracer.' % (np.count_nonzero(clipped),nx*ny)
    cell_count[clipped] = 1
    # Create the coordinate grid.
    xgrid = np.linspace(-nx+1,nx-1,nx,endpoint=True)*spacing/2.
    ygrid = np.linspace(-ny+1,ny-1,ny,endpoint=True)*spacing/2.
    x,y = np.meshgrid(xgrid,ygrid)
    filled = (cell_count > 0)
    print 'generated %d tracers' % np.count_nonzero(filled)
    return x[filled],y[filled]

In [22]:
tracer1_x,tracer1_y = sample_tracers(tracer_slice1,1000,8./15./h0)
tracer2_x,tracer2_y = sample_tracers(tracer_slice2,1000,8./15./h0)

Using bias = 1.500
mean tracer density = 0.02464 / (units of spacing)**2
Clipped 18530 / 65536 cells with mean < 0.
Clipped 20 / 65536 cells with > 1 tracer.
generated 1155 tracers
Using bias = 1.500
mean tracer density = 0.02464 / (units of spacing)**2
Clipped 14260 / 65536 cells with mean < 0.
Clipped 31 / 65536 cells with > 1 tracer.
generated 1330 tracers
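The Poisson sampling above imprints the density field on the tracers with the specified bias: each cell's count has mean $\mu(1 + b\,\delta)$, so the covariance of the counts with $\delta$ is $\mu\,b\,\mathrm{Var}(\delta)$ and $b$ can be recovered from the counts. A quick statistical check with toy numbers (independent of the slices used here, with fluctuations small enough that no clipping occurs):

```python
import numpy as np

rng = np.random.RandomState(2015)
num_cells, mu, b = 1000000, 5.0, 1.5
delta = rng.normal(scale=0.1, size=num_cells)   # small fluctuations
counts = rng.poisson(mu * (1 + b * delta))
# cov(counts, delta) = mu * b * var(delta), so estimate the bias as:
b_hat = np.cov(counts, delta)[0, 1] / (mu * np.var(delta))
print('%.3f' % b_hat)
```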


### Density Field Visualizations¶

The yt package is required for the visualizations in this section. Just skip it if you do not have yt installed.

In [23]:
data = dict(Density = delta_r1)
size = 128*8./15./h0
bbox = np.array([[-size,+size],[-size,+size],[-size,+size]])
ds = yt.load_uniform_grid(data,delta_r1.shape,length_unit = 'Mpc',bbox = bbox,nprocs = 4)

In [35]:
slc = yt.SlicePlot(ds,'z',['Density'])
slc.set_cmap('Density','RdBu_r')
slc.set_log('Density',False)
slc.show()

yt : [INFO     ] 2015-02-06 11:42:12,525 Loading field plugins.
yt : [INFO     ] 2015-02-06 11:42:12,676 xlim = -100.732871 100.732871
yt : [INFO     ] 2015-02-06 11:42:12,676 ylim = -100.732871 100.732871
yt : [INFO     ] 2015-02-06 11:42:12,677 Making a fixed resolution buffer of (('gas', 'Density')) 800 by 800
yt : [WARNING  ] 2015-02-06 11:42:12,746 Plot image for field ('gas', 'Density') has both positive and negative values. Min = -6.737681, Max = 6.568211.
yt : [WARNING  ] 2015-02-06 11:42:12,747 Switching to symlog colorbar scaling unless linear scaling is specified later