This follows Lugosi (2006) rather closely; it is also a repeat of the results in the chapter on Mathematical Preliminaries, which we largely omitted.
If $X$ is a nonnegative real-valued random variable, we have the tail-integral formula: $$ {\mathbb E} X = \int_0^\infty {\mathbb P} \{X \ge x\} dx. $$
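As a quick numerical sanity check (a sketch, not part of the text; the exponential distribution, seed, and sample size are arbitrary illustrative choices), we can compare a Riemann sum of the empirical tail with the sample mean:

```python
# Sketch: check the tail-integral formula for X ~ Exponential(1), where E X = 1.
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)

grid = np.linspace(0, 20, 401)                      # truncate the integral at 20
tail = np.array([np.mean(x >= t) for t in grid])    # empirical P{X >= t}
integral = np.sum(tail[:-1] * np.diff(grid))        # left Riemann sum of the tail

print(f"sample mean:     {x.mean():.4f}")
print(f"integrated tail: {integral:.4f}")           # both should be close to 1
```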
If $\phi$ is a convex function from ${\mathcal X}$ to $\Re$, then $\phi({\mathbb E} X) \le {\mathbb E} \phi(X)$ (Jensen's Inequality).
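A small numerical sketch (with the arbitrary choices $\phi(x) = e^x$ and $X$ standard normal, for which $\phi({\mathbb E} X) = 1$ while ${\mathbb E} \phi(X) = e^{1/2} \approx 1.6487$):

```python
# Sketch: Jensen's inequality with the convex function phi(x) = exp(x).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100_000)

print(f"phi(E X): {np.exp(x.mean()):.4f}")    # approx 1
print(f"E phi(X): {np.exp(x).mean():.4f}")    # approx 1.6487, larger as Jensen guarantees
```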
From the tail-integral formula, for any $t > 0$,
$$ {\mathbb E} X \ge \int_0^t {\mathbb P} \{X \ge x\} dx \ge t {\mathbb P} \{X \ge t \}, $$since ${\mathbb P} \{X \ge x\}$ is nonincreasing in $x$. This yields Markov's Inequality:
$$ {\mathbb P} \{ X \ge t \} \le \frac{{\mathbb E} X}{t}. $$Moreover, for any strictly increasing nonnegative function $f$ and nonnegative $X$, we have the Generalized Markov Inequality:
$$ {\mathbb P} \{ X \ge t \} = {\mathbb P} \{ f(X) \ge f(t) \} \le \frac{{\mathbb E} f(X)}{f(t)}. $$In particular, for any real-valued $X$ and real $q > 0$, $| X - {\mathbb E} X |$ is a nonnegative random variable and $f(x) = x^q$ is strictly increasing on $[0, \infty)$, so
$$ {\mathbb P} \{| X - {\mathbb E} X | \ge t \} = {\mathbb P} \{ | X - {\mathbb E} X |^q \ge t^q \} \le \frac{{\mathbb E} | X - {\mathbb E} X |^q}{t^q}. $$Chebyshev's Inequality is the special case $q = 2$:
$$ {\mathbb P} \{ | X - {\mathbb E} X |^2 \ge t^2 \} \le \frac{{\mathbb E} | X - {\mathbb E} X |^2}{t^2} = \frac{{\mbox{Var}} X}{t^2}. $$Apply the Generalized Markov Inequality with $f(x) = e^{sx}$ for positive $s$ to obtain the Chernoff Bound: $$ {\mathbb P} \{ X \ge t \} = {\mathbb P} \{ e^{sX} \ge e^{st} \} \le \frac{{\mathbb E} e^{sX}}{e^{st}} $$ for all $s > 0$. For a particular $X$, one can optimize the bound over $s$.
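For illustration (a sketch using an arbitrary example, not from the text): for $X \sim N(0,1)$, ${\mathbb E} e^{sX} = e^{s^2/2}$, so the Chernoff Bound is $e^{s^2/2 - st}$; minimizing over $s > 0$ gives $s = t$ and the bound $e^{-t^2/2}$, much smaller than the Chebyshev bound for large $t$:

```python
# Sketch: Chebyshev vs. optimized Chernoff for a standard normal, t = 2.
from math import erf, exp, sqrt

t = 2.0
true_tail = 0.5 * (1 - erf(t / sqrt(2)))   # exact P{X >= t} for N(0,1)
chebyshev = 1 / t**2                       # Var X = 1
chernoff = exp(-t**2 / 2)                  # Chernoff bound at the optimal s = t

print(f"true tail: {true_tail:.5f}")       # 0.02275
print(f"Chebyshev: {chebyshev:.5f}")       # 0.25
print(f"Chernoff:  {chernoff:.5f}")        # 0.13534
```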
Let $\{ X_j \}_{j=1}^n$ be independent, and define $S_n \equiv \sum_{j=1}^n X_j$. Applying the Chernoff Bound to $S_n - {\mathbb E} S_n$ yields $$ {\mathbb P} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-st} {\mathbb E} e^{s(S_n - {\mathbb E} S_n)} = e^{-st} \prod_{j=1}^n {\mathbb E} e^{s(X_j - {\mathbb E} X_j)}, $$ where the product form follows from independence. Hoeffding bounds the moment generating function of a bounded, zero-mean random variable $X$: if ${\mathbb E} X = 0$ and $a \le X \le b$ with probability 1, then $$ {\mathbb E} e^{sX} \le e^{s^2 (b-a)^2/8}, $$ from which follows Hoeffding's tail bound.
If $\{X_j\}_{j=1}^n$ are independent and ${\mathbb P} \{a_j \le X_j \le b_j\} = 1$ for each $j$, then $$ {\mathbb P} \{ S_n - {\mathbb E} S_n \ge t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2} $$ and $$ {\mathbb P} \{ S_n - {\mathbb E} S_n \le -t \} \le e^{-2t^2/\sum_{j=1}^n (b_j - a_j)^2}. $$
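A numerical sketch (the Bernoulli example and all parameters are illustrative choices, not from the text): for $X_j$ iid Bernoulli(1/2), $a_j = 0$ and $b_j = 1$, so the bound is $e^{-2t^2/n}$:

```python
# Sketch: empirical tail of S_n - E S_n vs. the Hoeffding bound, Bernoulli(1/2).
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 100_000
S = rng.binomial(1, 0.5, size=(reps, n)).sum(axis=1)   # reps draws of S_n
ES = n * 0.5

for t in [5, 10, 15]:
    empirical = np.mean(S - ES >= t)
    bound = np.exp(-2 * t**2 / n)
    print(f"t={t}: empirical {empirical:.4f} <= Hoeffding bound {bound:.4f}")
```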
Suppose $f$ is a convex, real-valued function and ${\mathcal X}$ is a finite set. Let $\{X_j \}_{j=1}^n$ be a simple random sample from ${\mathcal X}$ (drawn without replacement) and let $\{Y_j \}_{j=1}^n$ be an iid uniform random sample (drawn with replacement) from ${\mathcal X}$. Then $$ {\mathbb E} f \left ( \sum_{j=1}^n X_j \right ) \le {\mathbb E} f \left ( \sum_{j=1}^n Y_j \right ). $$ In particular, since $e^{sx}$ is convex, Chernoff-style bounds for iid sampling also hold for sampling without replacement.
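A simulation sketch of this comparison (the population, sample size, and convex function below are arbitrary illustrative choices): with $f(x) = x^2$, both sums have the same mean, but sampling with replacement has larger variance, so ${\mathbb E} f$ is larger:

```python
# Sketch: E f(sum of SRS) <= E f(sum of iid sample) for the convex f(x) = x^2.
import numpy as np

rng = np.random.default_rng(3)
population = np.arange(100, dtype=float)   # the finite set X
n, reps = 10, 50_000

srs = np.array([rng.choice(population, size=n, replace=False).sum() for _ in range(reps)])
iid = np.array([rng.choice(population, size=n, replace=True).sum() for _ in range(reps)])

def f(x):
    return x**2

print(f"E f(sum), without replacement: {f(srs).mean():,.0f}")
print(f"E f(sum), with replacement:    {f(iid).mean():,.0f}")   # should be larger
```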
Suppose $\{X_j \}_{j=1}^n$ are independent with ${\mathbb E} X_j = 0$ for all $j$, ${\mathbb P} \{ | X_j | \le c\} = 1$, $\sigma_j^2 = {\mathbb E} X_j^2$, and $\sigma^2 = \frac{1}{n} \sum_{j=1}^n \sigma_j^2$. Then for any $\epsilon > 0$, we have Bernstein's Inequality: $$ {\mathbb P} \{ S_n/n > \epsilon \} \le e^{-n \epsilon^2 / (2(\sigma^2 + c\epsilon/3))}. $$
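Bernstein's Inequality can be much sharper than Hoeffding's when the variance is small. A sketch with an illustrative choice of distribution (not from the text): take $X_j = \pm 1$ with probability $p/2$ each and $0$ otherwise, so $c = 1$ and $\sigma^2 = p$:

```python
# Sketch: Bernstein vs. Hoeffding for low-variance bounded variables.
from math import exp

n, p, eps = 1000, 0.01, 0.05
sigma2 = p                                         # Var X_j = p for this example
bernstein = exp(-n * eps**2 / (2 * (sigma2 + eps / 3)))
hoeffding = exp(-2 * (n * eps)**2 / (n * 2**2))    # a_j = -1, b_j = 1

print(f"Bernstein: {bernstein:.2e}")   # exploits the small variance
print(f"Hoeffding: {hoeffding:.2e}")   # far weaker here
```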