Wald's Sequential Probability Ratio Test

Sequential test: draw data sequentially

Sequence of data $X_1, X_2, \ldots$.

Two hypotheses, $H_0$ and $H_1$, each of which completely specifies the joint distribution of the data.

Assume that the joint distributions under $H_0$ and $H_1$ are absolutely continuous with respect to each other, relative to some underlying measure.

Let $f_{0m}$ be the likelihood of $H_0$ for data $(X_j)_{j=1}^m$ and let $f_{1m}$ be the likelihood of $H_1$ for data $(X_j)_{j=1}^m$.

The likelihood ratio of $H_1$ to $H_0$ is $f_{1m}/f_{0m}$; this is (loosely speaking) the probability of observing $X_1, \ldots, X_m$ if $H_1$ is true, divided by the probability of observing $X_1, \ldots, X_m$ if $H_0$ is true.

The probability of observing the data actually observed will tend to be higher for whichever hypothesis is in fact true, so this likelihood ratio will tend to be greater than $1$ if $H_1$ is true, and will tend to be less than $1$ if $H_0$ is true. The more observations we make, the more probable it is that the resulting likelihood ratio will be small if $H_0$ is true. Wald (1945) showed that if $H_0$ is true, then the probability is at most $\alpha$ that the likelihood ratio is ever greater than $1/\alpha$, no matter how many observations are made. More generally, we have:

 

Wald's SPRT

For any $\alpha \in (0, 1)$ and $\beta \in [0, 1)$, the following sequential algorithm tests the hypothesis $H_0$ at level no larger than $\alpha$ and with power at least $1-\beta$ against the alternative $H_1$.

Set $m=0$.

  1. Increment $m$
  2. If $\frac{f_{1m}}{f_{0m}} \ge \frac{1-\beta}{\alpha}$, stop and reject $H_0$.
  3. If $\frac{f_{1m}}{f_{0m}} \le \frac{\beta}{1-\alpha}$, stop and do not reject $H_0$.
  4. If $\frac{\beta}{1-\alpha} < \frac{f_{1m}}{f_{0m}} < \frac{1-\beta}{\alpha}$, go to step 1.

SPRT miracle

Don't need to know the distribution of the likelihood ratio $\mbox{LR}=\frac{f_{1m}}{f_{0m}}$ under the null hypothesis to find the critical values for the test.

Connection to Gambler's ruin

For iid data, the likelihood ratio is a product of terms. On a log scale, it's a sum. Each "trial" produces another term in the sum, positive or negative. But—unlike the classical Gambler's Ruin problem in which the game is fair—the terms are not equal in magnitude: the steps are not all the same size.

Likelihood ratio for two values of $p$ in iid Bernoulli trials

Suppose $X_1, X_2, \ldots,$ are iid $\mbox{Bernoulli}(p)$ random variables and let $p_1 > p_0$.

Set $\mbox{LR} \leftarrow 1$ and $j \leftarrow 0$.

  • Increment $j$
  • If $X_j = 1$, $\mbox{LR} \leftarrow \mbox{LR} \times p_1/p_0$.
  • If $X_j = 0$, $\mbox{LR} \leftarrow \mbox{LR} \times (1-p_1)/(1- p_0)$.

What's $\mbox{LR}$ at stage $m$? Let $T_m \equiv \sum_{j=1}^m X_j$. $$ \frac{p_{1m}}{p_{0m}} \equiv \frac{p_1^{T_m}(1-p_1)^{m-T_m}}{p_0^{T_m}(1-p_0)^{m-T_m}}. $$ This is the ratio of binomial probability when $p = p_1$ to binomial probability when $p = p_0$ (the binomial coefficients in the numerator and denominator cancel).

Wald's SPRT for $p$ in iid Bernoulli trials

Conclude $p > p_0$ if $$ \frac{p_{1m}}{p_{0m}} \ge \frac{1-\beta}{\alpha}. $$ Conclude $p \le p_0$ if $$ \frac{p_{1m}}{p_{0m}} \le \frac{\beta}{1-\alpha}. $$ Otherwise, draw again.

The SPRT approximately minimizes the expected sample size when $p \le p_0$ or $p > p_1$. For values in $(p_1, p_0)$, it can have larger sample sizes than fixed-sample-size tests.

Simulation of SPRT for Binomial

Let's watch the SPRT in action for iid Bernoulli trials.

In [1]:
# This is the first cell with code: set up the Python environment
%matplotlib inline
from __future__ import print_function, division
import matplotlib.pyplot as plt
import math
import numpy as np
import scipy as sp
import scipy.stats
from scipy.stats import binom
import pandas as pd
# For interactive widgets
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML
In [2]:
def plotBinomialSPRT(n, p, p0, p1, alpha, beta):
    '''
       Plots the progress of the SPRT for n iid Bernoulli trials with probabiity p
       of success, for testing the hypothesis that p=p0 against the hypothesis p=p1
       with significance level alpha and power 1-beta
    '''
    fig, ax = plt.subplots(nrows=2, ncols=1, sharex=True)
    trials = sp.stats.binom.rvs(1, p, size=n+1)  # leave room to start at 1
    terms = np.ones(n+1)
    sfac = p1/p0
    ffac = (1.0-p1)/(1.0-p0)
    terms[trials == 1.0] = sfac
    terms[trials == 0.0] = ffac
    terms[0] = 1.0
    logterms = np.log(terms)
    #
    ax[0].plot(range(n+1),np.cumprod(terms), color='b')
    ax[0].axhline(y=(1-beta)/alpha, xmin=0, xmax=n, color='g', label=r'$(1-\beta)/\alpha$')
    ax[0].axhline(y=beta/(1-alpha), xmin=0, xmax=n, color='r', label=r'$\beta/(1-\alpha)$')
    ax[0].set_title('Simulation of Wald\'s SPRT')
    ax[0].set_ylabel('LR')
    ax[0].legend(loc='best')
    #
    ax[1].plot(range(n+1),np.cumsum(logterms), color='b', linestyle='--')
    ax[1].axhline(y=math.log((1-beta)/alpha), xmin=0, xmax=n, color='g', label=r'$log((1-\beta)/\alpha)$')
    ax[1].axhline(y=math.log(beta/(1-alpha)), xmin=0, xmax=n, color='r', label=r'$log(\beta/(1-\alpha))$')
    ax[1].set_ylabel('log(LR)')
    ax[1].set_xlabel('trials')
    ax[1].legend(loc='best')
    plt.show()


interact(plotBinomialSPRT, n=widgets.IntSlider(min=5, max=300, step=5, value=100),\
         p=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.45),\
         p0=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.5),\
         p1=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.6),\
         alpha=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.05),\
         beta=widgets.FloatSlider(min=0.001, max=1, step=0.01, value=.05)
         )
Out[2]:
<function __main__.plotBinomialSPRT>

Behavior

For $p_0 < p_1$,

  • when $p \ge p_1$, the likelihood ratio is likely to cross the upper (green) line eventually and unlikely to cross the lower (red) line.
  • when $p \le p_0$, the likelihood ratio is likely to cross the lower (red) line eventually and unlikely to cross the upper (green) line.
  • the SPRT approximately minimizes the expected number of trials before rejecting one or the other hypothesis, provided $p \notin (p_0, p_1)$.

For $p_1 < p_0$, the directions are reversed.

It works for $\beta = 0$ too: $P$-values

The inequalities hold when $\beta = 0$ also. Then the likelihood ratio has chance at most $\alpha$ of ever being greater than $1/\alpha$, if in fact $p > p_0$.

Hence, $1/\mbox{LR}$ is the $P$-value of the hypothesis $p > p_0$. This can be used to construct one-sided confidence bounds for $p$ and other parameters. The next chapter does exactly that, to find a lower confidence bound for the mean of a nonnegative population.

Dependent observations

As an illustration of SPRT in for obervations that are not iid, consider the following problem.

There is a population of $B$ items. Item $b$ has "weight" $N_b$ and "value" $a_b \in \{0, 1\}$. Let $N = \sum_{b=1}^B N_b$. (You might think of items as manufacturing lots, where $a_b = 1$ means the lot is acceptable and $a_b=0$ means the lot is defective.)

We want to test the hypothesis $H_0$ that $\frac{1}{N}\sum_b N_b a_b = 1/2$ against the alternative hypothesis $H_1$ that $\frac{1}{N}\sum_b N_b a_b = \gamma $, for some fixed $\gamma > 1/2$.

We will draw items sequentially, without replacement, such that the chance that item $b$ is selected in draw $i$, assuming it has not been selected already, is $N_b/\sum_{j \notin {\mathcal B_{i-1}}} N_j$, where ${\mathcal B_{i-1}}$ is the sample up to and including the $i-1$st draw, and ${\mathcal B_0} \equiv \emptyset$. That is, the chance of selecting an item is in proportion to its weight among the items that have not yet been selected.

Let $\mathbb B_i$ denote the item selected at random in such a manner in the $i$th draw.

The chance that the first draw ${\mathbb B_1}$ gives an item with value 1, i.e., $\Pr \{a_{\mathbb B_1} = 1\}$, is $\frac{1}{N}\sum_b N_b a_b$. Under $H_0$, this chance is $p_{01} = 1/2$; under $H_1$, this chance is $p_{11} = \gamma$.

Given the values of $\{a_{\mathbb B_k}\}_{k=1}^i$, the conditional probability that the $i$th draw gives an item with value 1 is

$$ \Pr \{a_{\mathbb B_i} = 1 | {\mathcal B_{i-1}} \} = \frac{ \sum_{b \notin {\mathcal B_{i-1}}} N_b a_b}{\sum_{b \notin {\mathcal B_{i-1}}} N_b}. $$

Under $H_0$, this chance is

$$ p_{0i} = \frac{N/2 - \sum_{b \in {\mathcal B_{i-1}}} N_b a_b}{N - \sum_{b \in {\mathcal B_{i-1}}} N_b}. $$

Under $H_1$, this chance is

$$ p_{1i} = \frac{N \gamma - \sum_{b \in {\mathcal B_{i-1}}} N_b a_b}{N - \sum_{b \in {\mathcal B_{i-1}}} N_b}. $$

Let $X_k$ be the indicator of the event that the $k$th draw gives an item with value $1$, i.e., the indicator of the event $a_{\mathbb B_k} = 1$. The likelihood ratio for a given sequence $\{X_k\}_{k=1}^i$ is

$$ \mbox{LR} = \frac{\prod_{k=1}^i p_{1k}^{X_k}(1-p_{1k})^{1-X_k}} {\prod_{k=1}^i p_{0k}^{X_k}(1-p_{0k})^{1-X_k}}. $$

This can be simplified: $p_{0k}$ and $p_{1k}$ have the same denominator, $N - \sum_{b \in {\mathcal B_{i-1}}} N_b$, and their numerators share a term. Define $N(k) \equiv \sum_{j = 1}^{k-1}N_b$ and $A(k) \equiv \sum_{j = 1}^{k-1}N_b a_b$. Then

$$ \mbox{LR} = \prod_{k=1}^i \left ( \frac{N\gamma - A(k)}{N/2 - A(k)} \right )^{X_k} \left ( \frac{N-N\gamma - (N(k) - A(k))}{N-N/2 - (N(k)-A(k))} \right )^{1-X_k} $$$$ = \prod_{k=1}^i \left ( \frac{N\gamma - A(k)}{N/2 - A(k)} \right )^{X_k} \left ( \frac{N(1-\gamma) - N(k) + A(k)}{N/2 - N(k) + A(k)} \right )^{1-X_k}, $$

where the products are defined to be infinite if the denominator vanishes anywhere.

If $H_0$ is true, the chance that $\mbox{LR}$ is ever greater than $1/\alpha$ is at most $\alpha$.

In [ ]: