The name "computational statistics" implies that the field is composed of two indispensable parts: statistical inference models and the corresponding algorithms implemented on computers. Based on the kinds of hypotheses they admit, statistical inference divides into two schools: the frequentist school and the Bayesian school.
Here, we describe each one briefly. Let $\mathcal{P}$ be a premise and $O$ be an observation that may give evidence for $\mathcal{P}$. The prior $P(\mathcal{P})$ is the probability that $\mathcal{P}$ is true before the observation is considered. Likewise, the posterior $P(\mathcal{P} | O)$ is the probability that $\mathcal{P}$ is true after the observation $O$ is considered.
The likelihood $P(O | \mathcal{P})$ is the probability of the observation $O$ given that the premise $\mathcal{P}$ holds. Finally, $P(O)$ is the total probability, calculated as follows:
$$ P(O)=\sum_{\mathcal{P}} P(O | \mathcal{P}) P(\mathcal{P}). $$

Connecting the probabilities above is Bayes' formula, a cornerstone of probability theory:
$$ P(\mathcal{P} | O)=\frac{P(O | \mathcal{P}) P(\mathcal{P})}{P(O)} \propto P(O | \mathcal{P}) P(\mathcal{P}), $$

where $P(O)$ can be computed automatically once the likelihood $P(O | \mathcal{P})$ and the prior $P(\mathcal{P})$ are known. The frequentist school presumes that some hypothesis (the parameter specifying the conditional distribution of the data) is true and that the observed data are sampled from that distribution; that is, $P(\mathcal{P})=1$, and only the conditional distributions of the data given specific hypotheses are used. The Bayesian school, in contrast, makes no such presumption: the hypothesis itself carries a prior probability, $\mathcal{P} \sim P(\mathcal{P})$, and inference combines the information from the prior and the likelihood. Evidently, the frequentist view is a special case of the Bayesian view, while the Bayesian view is more comprehensive and requires more information.
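As a minimal numerical sketch of Bayes' formula, consider a discrete set of two hypotheses about a coin. The hypotheses and all the probabilities below are illustrative choices, not taken from the text:

```python
# Prior P(P) over two hypotheses about a coin: fair vs. biased (illustrative values).
prior = {"fair": 0.5, "biased": 0.5}

# Likelihood P(O | P) of observing heads under each hypothesis.
likelihood = {"fair": 0.5, "biased": 0.9}

# Total probability: P(O) = sum over P of P(O | P) * P(P).
p_obs = sum(likelihood[h] * prior[h] for h in prior)

# Bayes' formula: P(P | O) = P(O | P) * P(P) / P(O).
posterior = {h: likelihood[h] * prior[h] / p_obs for h in prior}

print(p_obs)      # 0.7
print(posterior)  # {'fair': 0.357..., 'biased': 0.642...}
```

Observing heads shifts probability mass from the fair hypothesis toward the biased one, exactly as the proportionality $P(\mathcal{P} | O) \propto P(O | \mathcal{P}) P(\mathcal{P})$ predicts.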
Take the Gaussian distribution with known variance for the likelihood as an example. Without loss of generality, we assume the variance $\sigma^{2}=1$. In other words, the data point is viewed as a random variable $\mathbf{X}$ following the rule below:
$$ \mathbf{X} \sim P(x | \mathcal{P})=\frac{1}{\sqrt{2 \pi}} e^{-\frac{(x-\mu)^{2}}{2}}, $$
where the hypothesis is
$$ \mathcal{P}=\{\mu \,|\, \mu \in(-\infty, \infty) \text { is some fixed real number }\}. $$
Let the data set be $O=\left\{x_{i}\right\}_{i=1}^{n}$. The frequentist school requires us to compute the maximum likelihood estimate, or equivalently the maximum log-likelihood estimate, that is,
$$ \begin{aligned} \underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} f(\mu) &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log P(O | \mathcal{P}) \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}}\left(\log \prod_{i=1}^{n} P\left(x_{i} \in O | \mathcal{P}\right)\right) \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log \left[\left(\frac{1}{\sqrt{2 \pi}}\right)^{n} e^{-\frac{\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}}{2}}\right] \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmin}}\left[\frac{1}{2} \sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}+n \log \sqrt{2 \pi}\right], \end{aligned} $$

as shown in classical textbooks such as [RS15]. The Bayesian school instead requires computing the maximum a posteriori (MAP) estimate, or equivalently the maximum log-posterior estimate; for this we must assume a reasonable prior distribution.
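Setting the derivative of $\frac{1}{2} \sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}$ to zero gives the maximizer $\hat{\mu}=\bar{x}$, the sample mean. A small numerical check (the data, seed, and grid below are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)  # X ~ N(mu = 2, sigma^2 = 1)

def neg_log_likelihood(mu, x):
    # Negative log-likelihood up to the constant n*log(sqrt(2*pi)),
    # which does not affect the argmin.
    return 0.5 * np.sum((x - mu) ** 2)

# Scan a grid of candidate mu values and pick the minimizer.
grid = np.linspace(0.0, 4.0, 4001)
mu_hat = grid[np.argmin([neg_log_likelihood(m, data) for m in grid])]

print(mu_hat, data.mean())  # the two values agree up to the grid resolution
```

A grid scan is used only to make the argmin explicit; in practice one would use the closed form $\hat{\mu}=\bar{x}$ directly.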
If we take a Gaussian prior $\mu \sim \mathcal{N}\left(0, \sigma_{0}^{2}\right)$, that is, $P(\mathcal{P})=\frac{1}{\sqrt{2 \pi} \sigma_{0}} e^{-\frac{\mu^{2}}{2 \sigma_{0}^{2}}}$, we have:
$$ \begin{aligned} \underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} f(\mu) &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log P(O | \mathcal{P}) P(\mathcal{P}) \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log \left(\prod_{i=1}^{n} P\left(x_{i} \in O | \mathcal{P}\right)\right) P(\mathcal{P}) \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log \left\{\left[\left(\frac{1}{\sqrt{2 \pi}}\right)^{n} e^{-\frac{\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}}{2}}\right] \cdot\left(\frac{1}{\sqrt{2 \pi} \sigma_{0}}\right) e^{-\frac{\mu^{2}}{2 \sigma_{0}^{2}}}\right\} \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmin}}\left[\frac{1}{2} \sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}+\frac{\mu^{2}}{2 \sigma_{0}^{2}}+n \log \sqrt{2 \pi}+\log \left(\sqrt{2 \pi} \sigma_{0}\right)\right]. \end{aligned} $$
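For the Gaussian prior $\mu \sim \mathcal{N}\left(0, \sigma_{0}^{2}\right)$, setting the derivative of the objective to zero gives the closed form $\mu_{\mathrm{MAP}}=\frac{\sum_{i=1}^{n} x_{i}}{n+1 / \sigma_{0}^{2}}$, the sample mean shrunk toward the prior mean $0$. A sketch verifying this numerically (data, seed, and $\sigma_{0}$ are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=50)  # X ~ N(2, 1)
sigma0 = 1.0  # prior standard deviation (illustrative)

def map_objective(mu, x, s0):
    # Negative log-posterior up to additive constants:
    # (1/2) * sum((x_i - mu)^2) + mu^2 / (2 * sigma_0^2).
    return 0.5 * np.sum((x - mu) ** 2) + mu ** 2 / (2 * s0 ** 2)

# Grid search for the minimizer.
grid = np.linspace(-1.0, 4.0, 50001)
mu_map = grid[np.argmin([map_objective(m, data, sigma0) for m in grid])]

# Closed form: sum(x_i) / (n + 1 / sigma_0^2).
closed_form = data.sum() / (len(data) + 1 / sigma0 ** 2)

print(mu_map, closed_form)  # agree up to the grid resolution
```

Note the shrinkage: the MAP estimate lies strictly between $0$ and the sample mean, and the quadratic penalty $\frac{\mu^{2}}{2 \sigma_{0}^{2}}$ is exactly the $L_2$ (ridge) regularizer.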
Alternatively, if we take a Laplace prior $P(\mathcal{P})=\frac{1}{2 \sigma_{0}^{2}} e^{-\frac{|\mu|}{\sigma_{0}^{2}}}$, we have:
$$ \begin{aligned} \underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} f(\mu) &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log P(O | \mathcal{P}) P(\mathcal{P}) \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log \left(\prod_{i=1}^{n} P\left(x_{i} \in O | \mathcal{P}\right)\right) P(\mathcal{P}) \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmax}} \log \left\{\left[\left(\frac{1}{\sqrt{2 \pi}}\right)^{n} e^{-\frac{\sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}}{2}}\right] \cdot\left(\frac{1}{2 \sigma_{0}^{2}}\right) e^{-\frac{|\mu|}{\sigma_{0}^{2}}}\right\} \\ &=\underset{\mu \in(-\infty, \infty)}{\operatorname{argmin}}\left[\frac{1}{2} \sum_{i=1}^{n}\left(x_{i}-\mu\right)^{2}+\frac{|\mu|}{\sigma_{0}^{2}}+n \log \sqrt{2 \pi}+\log \left(2 \sigma_{0}^{2}\right)\right], \end{aligned} $$
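For a Laplace prior with density $\frac{1}{2 \sigma_{0}^{2}} e^{-|\mu| / \sigma_{0}^{2}}$, the $|\mu|$ penalty makes this an $L_1$-regularized problem, and the minimizer is given by soft-thresholding: $\mu_{\mathrm{MAP}}=\operatorname{sign}(S) \max \left(|S|-1 / \sigma_{0}^{2}, 0\right) / n$ with $S=\sum_{i=1}^{n} x_{i}$. A sketch checking this against a grid search (data, seed, and $\sigma_{0}$ are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=0.05, scale=1.0, size=20)  # weak signal near zero
sigma0 = 0.5  # Laplace prior scale parameter (illustrative)

def laplace_map_objective(mu, x, s0):
    # Negative log-posterior up to additive constants:
    # (1/2) * sum((x_i - mu)^2) + |mu| / sigma_0^2.
    return 0.5 * np.sum((x - mu) ** 2) + abs(mu) / s0 ** 2

# Grid search for the minimizer.
grid = np.linspace(-1.0, 1.0, 40001)
mu_map = grid[np.argmin([laplace_map_objective(m, data, sigma0) for m in grid])]

# Soft-thresholding closed form: sign(S) * max(|S| - 1/sigma_0^2, 0) / n.
s = data.sum()
soft_threshold = np.sign(s) * max(abs(s) - 1 / sigma0 ** 2, 0) / len(data)

print(mu_map, soft_threshold)  # agree up to the grid resolution
```

Unlike the Gaussian prior, which only shrinks the estimate, the Laplace prior can drive it exactly to zero when the data are weak enough; this is the lasso ($L_1$) effect in its simplest one-parameter form.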
where $C=2 \sqrt{2 \pi} \sigma_{0,1}^{2} \sigma_{0,2}$