General-purpose Inequalities

The Arithmetic-Geometric Mean Inequality

Let $\{ a_j \}_{j=1}^n$ and $\{q_j\}_{j=1}^n$ be sets of nonnegative numbers with $\sum_j q_j = 1$. Then $$ \prod_{j=1}^n a_j^{q_j} < \sum_{j=1}^n q_j a_j. $$


Let $x \equiv (x_j)_{j=1}^n \in {\mathbb R}^n$. The decreasing rearrangement of $x$, denoted $x^*$, has the same multiset of component values as $x$, but ordered so that $$ x_1^* \ge x_2^* \ge \cdots \ge x_n^*. $$

The increasing rearrangmement of $x$, denoted $x_*$, has the same multiset of component values as $x$, but ordered so that $$ x_{*1} \le x_{*2} \le \cdots \le x_{*n}. $$

The Rearrangement Inequality for two vectors

Suppose $x \in \mathbb{R}^n$ and $y \in \mathbb{R}^n$. Let $x \cdot y \equiv \sum_{j=1}^n x_j y_j$. Then $$ x^* \cdot y_* \le x \cdot y \le x^* \cdot y^*. $$

The Rearrangement Inequality for three vectors

[To do]

Probability Inequalities

This section follows Lugosi (2006) closely.

The tail-sum formula

If $X$ is a nonnegative real-valued random variable, $$ {\mathbb E} X = \int_0^\infty {\mathbb{P}} \{X \ge x\} dx $$

Jensen's Inequality

If $\phi$ is a convex function from ${\mathcal X}$ to $\mathbb{R}$, then $\phi({\mathbb E} X) \le {\mathbb E} \phi(X)$.

From the tail-sum formula, $$ {\mathbb E} X \ge \int_0^t {\mathbb{P}} \{X \ge x\} dx \ge t {\mathbb{P}} \{X \ge t \} $$ so $$ {\mathbb{P}} \{ X \ge t \} \le \frac{{\mathbb E} X}{t}. $$ This is Markov's Inequality.

Moreover, for any strictly monotonic function $f$ and nonnegative $X$, $$ {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ f(X) \ge f(t) \} \le \frac{{\mathbb E} f(X)}{f(t)}. $$ In particular, for any real-valued $X$ and real $q > 0$, $| X - {\mathbb E} X |$ is a nonnegative random variable and $f(x) = x^q$ is strictly monotonic, so $$ {\mathbb{P}} \{| X - {\mathbb E} X | \ge t \} = {\mathbb{P}} \{ | X - {\mathbb E} X |^q \ge t^q \} \le \frac{{\mathbb E} | X - {\mathbb E} X |^q}{t^q}. $$ This is the Generalized Markov Inequality.

Chebychev's Inequality is a special case: $$ {\mathbb{P}} \{ | X - {\mathbb E} X |^2 \ge t^2 \} \le \frac{{\mathbb E} | X - {\mathbb E} X |^2}{t^2} = \frac{{\mbox{Var}} X}{t^2}. $$

Chernoff bounds

If $X$ has a moment generating function (if ${\mathbb{E}} e^{sX}$ exists for $s \in [-a, a]$ for some $a > 0$), we can apply the generalized Markov Inequality with $f(x) = e^{sx}$ for positive $s$: $$ {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ e^{sX} \ge e^{st} \} \le \frac{{\mathbb E} e^{sX}}{e^{st}} $$ for all $s \in [-a, a]$. For any particular $X$, one can optimize this over $s$: $$ {\mathbb{P}} \{ X \ge t \} = {\mathbb{P}} \{ e^{sX} \ge e^{st} \} \le \inf_{s \in [-a, a]} \frac{{\mathbb E} e^{sX}}{e^{st}}. $$

Hoeffding's Inequality

Let $\{ X_j \}_{j=1}^n$ be independent, and define $S_n \equiv \sum_{j=1}^n X_j$. Then, applying the Chernoff bound gives $$ {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-st} {\mathbb E} e^{sS_n} = e^{-st} \prod_{j=1}^n e^{s(X_j - E X_j)}. $$ Hoeffding bounds the moment generating function for a bounded random variable $X$: If $a \le X \le b$ with probability 1, then $$ {\mathbb E} e^{sX} \le e^{s^2 (b-a)^2/8}, $$ from which follows Hoeffding's tail bound:

If $\{X_j\}_{j=1}^n$ are independent and ${\mathbb{P}} \{a_j \le X_j \le b_j\} = 1$, then $$ {\mathbb{P}} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} $$ and $$ {\mathbb{P}} \{ S_n - {\mathbb E} S_n \le -t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} $$

Hoeffding's Other Inequality

Suppose $f$ is a convex, continuous, real function and ${\mathcal X}$ is a finite set. Let $\{X_j \}_{j=1}^n$ be a simple random sample from ${\mathcal X}$ and let $\{Y_j \}_{j=1}^n$ be an iid uniform random sample (with replacement) from ${\mathcal X}$. Then $$ {\mathbb E} f \left ( \sum_{j=1}^n X_j \right ) \le {\mathbb E} f \left ( \sum_{j=1}^n Y_j \right ). $$

Bernstein's Inequality

Suppose $\{X_j \}_{j=1}^n$ are independent with ${\mathbb E} X_j = 0$ for all $j$, ${\mathbb{P}} \{ | X_j | \le c\} = 1$, $\sigma_j^2 = {\mathbb E} X_j^2$ and $\sigma = \frac{1}{n} \sum_{j=1}^n \sigma_j^2$. Then for any $\epsilon > 0$, $$ {\mathbb{P}} \{ S_n/n > \epsilon \} \le e^{-n \epsilon^2 / 2(\sigma^2 + c\epsilon/3)}. $$

In [ ]: