Let $X$ be an R.V. that is a linear combination (with known, fixed coefficients $\alpha_k$) of twenty $N(0,1)$ deviates. That is, $X = \sum_{k=1}^{20} \alpha_k T_k$ where $T_k \sim N(0,1)$. How can you most simply form a t-value-squared (that is, something distributed as $\text{Chisquare}(1)$ from $X$? For some particular choice of $\alpha_k$'s (random is ok), generate a sample of $x$'s, plot their histogram, and show that it agrees with $\text{Chisquare}(1)$.
$X \sim N(0, \sqrt{\sum_i{\alpha^2}})$, so $t^2 = \frac{X^2}{\sum_i{\alpha^2}}$
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats
np.random.seed(13)
Randomly generate $\alpha$ and sample a bunch of $X$s.
alphas = np.random.gamma(6, 1, size=20) - 5
alphas
array([-0.86562055, 2.65704066, 0.56138695, 1.81167494, 4.52672201, 2.9739262 , 4.96578367, -1.47533179, 2.11488308, 0.10694607, 3.13198172, 1.45618077, 0.97514323, 7.48169484, -1.36807812, 3.99997065, 0.32797569, 1.25194973, 4.69276641, -2.37408539])
xs = (np.random.normal(0, 1, (1000000, 20)) * alphas).sum(axis=1)
xs
array([ 1.67577355e-01, 1.42682015e-02, -1.68161358e-01, ..., 2.14471568e+01, 5.95289406e-01, -1.95527541e+01])
Show that $X \sim N(0, \sqrt{\sum_i{\alpha^2}})$
fig, ax = plt.subplots()
ax.hist(xs, bins=100, normed=True, label='$X$')
mean = 0
std = np.sqrt((alphas**2).sum())
x = np.linspace(xs.min(), xs.max(), num=1000)
y = scipy.stats.norm(loc=mean, scale=std).pdf(x)
ax.plot(x, y, color='red', label='$Normal({0},{1:.2f})$'.format(mean, std))
ax.legend()
<matplotlib.legend.Legend at 0xa99956c>
Show that $\frac{X^2}{\sum_i{\alpha^2}} \sim Chisquare(1)$
fig, ax = plt.subplots()
xmin, xmax = 0, 4
ymin, ymax = 0, 5
ax.hist((xs / std)**2, bins=100, normed=True, label='$(X / std(X))^2$', range=(xmin, xmax))
x = np.linspace(xmin + 0.001, xmax, num=1000)
y = scipy.stats.chi2(1).pdf(x)
plt.plot(x, y, color='red', label='$Chisquare(1)$')
ax.set_xlim(xmin, xmax)
ax.set_ylim(ymin, ymax)
ax.legend()
<matplotlib.legend.Legend at 0xac8312c>
From some matrix of known coefficients $\alpha_{ik}$ with $k=1,\ldots,20$ and $i = 1,\ldots,100$, generate 100 R.V.s $X_i = \sum_{k=1}^{20} \alpha_{ik} T_k$ where $T_k \sim N(0,1)$. In other words, you are expanding 20 i.i.d. $T_k$'s into 100 R.V.'s. Form a sum of 100 t-values-squareds obtained from these variables and demonstrate numerically by repeated sampling that it is distributed as $\text{Chisquare}(\nu)$? What is the value of $\nu$? Use enough samples so that you could distinguish between $\nu$ and $\nu-1$.
alpha = np.random.random((100, 20))
alpha_sums_squared = (alpha ** 2).sum(axis=1)
def sample():
t = np.random.normal(0, 1, (100, 20))
xs = (alpha * t).sum(axis=1)
return np.sum(xs**2 / alpha_sums_squared)
%time vals = np.array([sample() for _ in xrange(2500000)])
CPU times: user 10min 53s, sys: 104 ms, total: 10min 53s Wall time: 10min 53s
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
xmin, xmax = 40, 160
x = np.linspace(xmin, xmax, num=1000)
ax1.hist(vals, bins=250, label='Sum of t-values-squared', normed=True, alpha=0.5)
ax1.plot(x, scipy.stats.chi2(99).pdf(x), label='$Chisquare(99)$', linewidth=2)
ax1.plot(x, scipy.stats.chi2(100).pdf(x), label='$Chisquare(100)$', linewidth=2)
ax1.plot(x, scipy.stats.chi2(101).pdf(x), label='$Chisquare(101)$', linewidth=2)
ax1.set_xlim(xmin, xmax)
ax1.legend(prop={'size': 10})
ax2.hist(vals, bins=500, label='Sum of t-values-squared', normed=True, alpha=0.5)
ax2.plot(x, scipy.stats.chi2(99).pdf(x), label='$Chisquare(99)$', linewidth=2)
ax2.plot(x, scipy.stats.chi2(100).pdf(x), label='$Chisquare(100)$', linewidth=2)
ax2.plot(x, scipy.stats.chi2(101).pdf(x), label='$Chisquare(101)$', linewidth=2)
ax2.set_xlim(90, 120)
ax2.set_ylim(0.01, 0.03)
ax2.legend(prop={'size': 10})
<matplotlib.legend.Legend at 0xc84bf4c>