Problem 1¶

Let $X$ be an R.V. that is a linear combination (with known, fixed coefficients $\alpha_k$) of twenty $N(0,1)$ deviates. That is, $X = \sum_{k=1}^{20} \alpha_k T_k$ where $T_k \sim N(0,1)$. How can you most simply form a t-value-squared (that is, something distributed as $\text{Chisquare}(1)$ from $X$? For some particular choice of $\alpha_k$'s (random is ok), generate a sample of $x$'s, plot their histogram, and show that it agrees with $\text{Chisquare}(1)$.

$X \sim N(0, \sqrt{\sum_i{\alpha^2}})$, so $t^2 = \frac{X^2}{\sum_i{\alpha^2}}$

In [1]:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats

np.random.seed(13)

Randomly generate $\alpha$ and sample a bunch of $X$s.

In [2]:

alphas = np.random.gamma(6, 1, size=20) - 5
alphas

Out[2]:

array([-0.86562055,  2.65704066,  0.56138695,  1.81167494,  4.52672201,
        2.9739262 ,  4.96578367, -1.47533179,  2.11488308,  0.10694607,
        3.13198172,  1.45618077,  0.97514323,  7.48169484, -1.36807812,
        3.99997065,  0.32797569,  1.25194973,  4.69276641, -2.37408539])

In [3]:

xs = (np.random.normal(0, 1, (1000000, 20)) * alphas).sum(axis=1)
xs

Out[3]:

array([  1.67577355e-01,   1.42682015e-02,  -1.68161358e-01, ...,
         2.14471568e+01,   5.95289406e-01,  -1.95527541e+01])

Show that $X \sim N(0, \sqrt{\sum_i{\alpha^2}})$

In [4]:

fig, ax = plt.subplots()

ax.hist(xs, bins=100, normed=True, label='$X$')

mean = 0
std = np.sqrt((alphas**2).sum())
x = np.linspace(xs.min(), xs.max(), num=1000)
y = scipy.stats.norm(loc=mean, scale=std).pdf(x)
ax.plot(x, y, color='red', label='$Normal({0},{1:.2f})$'.format(mean, std))

ax.legend()

Out[4]:

<matplotlib.legend.Legend at 0xa99956c>

Show that $\frac{X^2}{\sum_i{\alpha^2}} \sim Chisquare(1)$

In [5]:

fig, ax = plt.subplots()

xmin, xmax = 0, 4
ymin, ymax = 0, 5

ax.hist((xs / std)**2, bins=100, normed=True, label='$(X / std(X))^2$', range=(xmin, xmax))

x = np.linspace(xmin + 0.001, xmax, num=1000)
y = scipy.stats.chi2(1).pdf(x)
plt.plot(x, y, color='red', label='$Chisquare(1)$')

ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.legend()

Out[5]:

<matplotlib.legend.Legend at 0xac8312c>

Problem 2¶

From some matrix of known coefficients $\alpha_{ik}$ with $k=1,\ldots,20$ and $i = 1,\ldots,100$, generate 100 R.V.s $X_i = \sum_{k=1}^{20} \alpha_{ik} T_k$ where $T_k \sim N(0,1)$. In other words, you are expanding 20 i.i.d. $T_k$'s into 100 R.V.'s. Form a sum of 100 t-values-squareds obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as $\text{Chisquare}(\nu)$? What is the value of $\nu$? Use enough samples so that you could distinguish between $\nu$ and $\nu-1$.

In [6]:

alpha = np.random.random((100, 20))
alpha_sums_squared = (alpha ** 2).sum(axis=1)

In [7]:

def sample():
    t = np.random.normal(0, 1, (100, 20))
    xs = (alpha * t).sum(axis=1)
    return np.sum(xs**2 / alpha_sums_squared)

In [8]:

%time vals = np.array([sample() for _ in xrange(2500000)])

CPU times: user 10min 53s, sys: 104 ms, total: 10min 53s
Wall time: 10min 53s

In [11]:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
xmin, xmax = 40, 160
x = np.linspace(xmin, xmax, num=1000)

ax1.hist(vals, bins=250, label='Sum of t-values-squared', normed=True, alpha=0.5)
ax1.plot(x, scipy.stats.chi2(99).pdf(x),  label='$Chisquare(99)$', linewidth=2)
ax1.plot(x, scipy.stats.chi2(100).pdf(x), label='$Chisquare(100)$', linewidth=2)
ax1.plot(x, scipy.stats.chi2(101).pdf(x), label='$Chisquare(101)$', linewidth=2)
ax1.set_xlim(xmin, xmax)
ax1.legend(prop={'size': 10})

ax2.hist(vals, bins=500, label='Sum of t-values-squared', normed=True, alpha=0.5)
ax2.plot(x, scipy.stats.chi2(99).pdf(x),  label='$Chisquare(99)$', linewidth=2)
ax2.plot(x, scipy.stats.chi2(100).pdf(x), label='$Chisquare(100)$', linewidth=2)
ax2.plot(x, scipy.stats.chi2(101).pdf(x), label='$Chisquare(101)$', linewidth=2)
ax2.set_xlim(90, 120)
ax2.set_ylim(0.01, 0.03)
ax2.legend(prop={'size': 10})

Out[11]:

<matplotlib.legend.Legend at 0xc84bf4c>