Confidence bounds for the mean of a bounded population: Binomial and Hypergeometric

We will use basic combinatorics to find upper and lower confidence bounds for the mean of a bounded population.

  • Elementary derivation for {0, 1} populations
  • Obvious extension to 2-valued populations
  • Thresholding for general bounded populations
  • Trinomial and Multinomial bounds

Comparison with the normal approximation

Let's start by constructing two-sided confidence intervals for the same {0, 1} cases in which the Normal approximation did so badly.

To find two-sided intervals, we find lower and upper confidence bounds at half the value of $\alpha$. This is equivalent to constructing acceptance regions that are "balanced" in the sense that, under the null hypothesis, the chance of rejecting the hypothesis because $X$ is too big is equal to the chance of rejecting the hypothesis because $X$ is too small.

The code tells the story:

In [1]:
# This is the first cell with code: set up the Python environment
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import scipy as sp
import scipy.stats
from scipy.stats import binom
import pandas as pd
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML
In [2]:
def binoLowerCL(n, x, cl = 0.975, inc=0.000001, p = None):
    "Lower confidence level cl confidence interval for Binomial p, for x successes in n trials"
    if p is None:
            p = float(x)/float(n)
    lo = 0.0
    if (x > 0):
            f = lambda q: cl - scipy.stats.binom.cdf(x-1, n, q)
            lo = sp.optimize.brentq(f, 0.0, p, xtol=inc)
    return lo

def binoUpperCL(n, x, cl = 0.975, inc=0.000001, p = None):
    "Upper confidence level cl confidence interval for Binomial p, for x successes in n trials"
    if p is None:
            p = float(x)/float(n)
    hi = 1.0
    if (x < n):
            f = lambda q: scipy.stats.binom.cdf(x, n, q) - (1-cl)
            hi = sp.optimize.brentq(f, p, 1.0, xtol=inc) 
    return hi
In [3]:
# Population of two values, {0, 1}, in various proportions.  Amounts to Binomial random variable
ns = np.array([25, 50, 100, 400])  # sample sizes
ps = np.array([.001, .01, 0.1])    # mixture fractions, proportion of 1s in the population
alpha = 0.05  # 1- (confidence level)
reps = int(1.0e3)   # just for demonstration
vals = [0, 1]

simTable = pd.DataFrame(columns=('fraction of 1s', 'sample size', 'Student-t cov',\
                                 'Binom cov', 'Student-t len', 'Binom len'))
for p in ps:
    popMean = p
    for n in ns:
        tCrit = sp.stats.t.ppf(q=1.0-alpha/2, df=n-1)
        samMean = np.zeros(reps)
        sam = sp.stats.binom.rvs(n, p, size=reps)
        samMean = sam/float(n)
        samSD = np.sqrt(samMean*(1-samMean)/(n-1))
        coverT = (np.fabs(samMean-popMean) < tCrit*samSD).sum()
        aveLenT = 2*(tCrit*samSD).mean()
        coverB = 0
        totLenB = 0.0
        for r in range(int(reps)):  
            lo = binoLowerCL(n, sam[r], cl=1.0-alpha/2)
            hi = binoUpperCL(n, sam[r], cl=1.0-alpha/2)
            coverB += ( p >= lo) & (p <= hi)
            totLenB += hi-lo
        simTable.loc[len(simTable)] =  p, n, str(100*float(coverT)/float(reps)) + '%', \
            str(100*float(coverB)/float(reps)) + '%',\
            str(round(aveLenT,4)),\
            str(round(totLenB/float(reps),4))
#
ansStr =  '<h3>Simulated coverage probability and expected length of Student-t and Binomial confidence intervals for a {0, 1} population</h3>' +\
          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
          '%</strong>.<br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)

Simulated coverage probability and expected length of Student-t and Binomial confidence intervals for a {0, 1} population

Nominal coverage probability 95.0%.
Estimated from 1000 replications.
fraction of 1s sample size Student-t cov Binom cov Student-t len Binom len
0 0.001 25.0 2.8% 97.2% 0.0047 0.1391
1 0.001 50.0 5.1% 99.9% 0.0041 0.0729
2 0.001 100.0 9.5% 99.6% 0.0038 0.038
3 0.001 400.0 34.6% 99.0% 0.0037 0.011
4 0.010 25.0 20.8% 99.8% 0.0357 0.1518
5 0.010 50.0 40.7% 98.6% 0.0361 0.0881
6 0.010 100.0 60.9% 97.5% 0.0297 0.052
7 0.010 400.0 90.8% 95.8% 0.0188 0.0221
8 0.100 25.0 93.4% 99.3% 0.2377 0.2631
9 0.100 50.0 87.9% 96.3% 0.1675 0.1812
10 0.100 100.0 93.4% 95.2% 0.1183 0.126
11 0.100 400.0 93.9% 95.1% 0.0588 0.061

The empirical coverage rates are gnerally much higher than 95% when $n$ is small, because of the discreteness of the Binomial distribution: to ensure that a test rejects at most 5% of the time might require rejecting far less than 5% of the time. That is, the tests are conservative rather than exact.

We could construct randomized tests that have exactly level $\alpha$, but we won't go there...

Hypergeometric confidence bounds

Consider $n$ uniform draws without replacement from a $\{0, 1\}$ population of known size $N$ that contains an unknown number $G$ of 1s—i.e., a simple random sample of size $n$. Let $X$ denote the number of 1s in the sample. Then $X$ has a Hypergeometric distribution with parameters $N$, $G$, and $n$.

In particular, $$ \mathbb P_{N, G, n} (X = k) = \frac{{G \choose k}{N-G \choose n-k}}{{N \choose n}}$$ for $\max(0, n − (N−G)) \le k \le \min(n, G)$.

We can get sharper confidence bounds in this case by inverting hypergeometric tests instead of Binomial tests.

The strategy is the same: construct a family of acceptance regions for each possible value of the parameter, then let the confidence set be the parameter values for which, given the observed data, the test would not reject.

To find a one-sided lower confidence bound for $G$ in a Hypergeometric$(N, G, n)$ distribution, with $N$ and $n$ known, we would invert one-sided upper tests, that is, tests that reject the hypothesis $G = g$ when $X$ is large. (The corresponding confidence bound for the population mean is the confidence bound for $G$, divided by $N$.)

The form of the acceptance region for the test is: $$ A_G \equiv \ [0, a_G],$$ where $$ a_G \equiv \min \left \{ k: \sum_{i=k+1}^n \frac{{G \choose i}{N-G \choose n-i}}{{N \choose n}} \le \alpha \right \}.$$

Two-valued populations

Clearly, if the population is known to contain only the values $\{a, b\}$ instead of the values $\{0, 1\}$, it's trivial to re-scale binomial or hypergeometric confidence bounds.

If we observe the sum $Y$ of $n$ iid uniform draws from the population,

  • let $X \equiv (Y - na)/(b-a)$
  • find the Binomial confidence bound for $p$ based on $X$ with $n$ trials
  • transform each endpoint $q$ of that interval to $(1-q)a + qb$ to get the corresponding endpoint of the interval for the mean of the original population.

For example, suppose the population was known to contain only the values 1 and 10, but in unknown proportions.

We draw an iid uniform sample of 50 items; the sample sum is 320 (corresponding to drawing "1" twenty times and "10" thirty times).

Then a 95% upper confidence bound for the fraction of 10s in the population is binoUpperCL(50, 30, cl=0.95) = 0.7169, so a 95% upper confidence bound for the population mean is

$$(1-0.7169)\times 1 + 0.7169\times 10) = 7.45182.$$

Thresholding for bounded populations

The applications that motivate this inquiry generally call for one-sided confidence bounds rather than confidence intervals. For instance:

  • A plaintiff may want to show that, with high confidence, the damages are at least \$ $x$
  • A prosecutor may want to show that, with high confidence, the accused committed fraud worth at least \$ $x$
  • A financial auditor might want to certify that, with high confidence, a company's assets are overstated by at most \$$x$ and its liabilities are understated by _at most_ \$$y$.
  • A local election official might want determine whether, with high confidence, the error in tallying the votes is not as large as the margin
  • An online marketer might want to show that, with high confidence, its technology increases sales by at least $x$%.
  • A pharmaceutical company might want to show that, with high confidence, its headache remedy decreases the average duration of headaches by at least $x$ hours, or that its Ebola vaccine increases the rate of survival by at least $x$%.

In general, some constraint is needed to get even a one-sided (conservative or exact) confidence bound.

In these problems, there are natural constraints, for instance:

  • Damages cannot be negative (the plaintiff doesn't owe the defendant—unless the defendant sues the plaintiff, becoming the plaintiff in a new action)
  • Fraudulent billing is non-negative; the frauduent portion of a bill does not exceed total bill

Consider the second example in the Normal Approximation notebook, a mixture of a uniform and a point mass at zero. Suppose for the moment that we are interested in an upper confidence bound, e.g., an upper bound on the overstatement of a collection of accounts, for the purpose of establishing that a company's books are fairly represented.

We have an unknown population $\{ x_j \}_{j=1}^N$. We want an upper confidence bound for $\mu_x \equiv \frac{1}{N} \sum_{j=1}^N x_j$.

Suppose we have an a priori upper bound $u_j$ on the value $x_j$ of item $j$, $\forall j$. For simplicity, let's take $u_j = 1$. Pick a threshold $t < 1$. Imagine a new population $\{ y_j \}_{j=1}^N$, where

$$y_j \equiv \left \{ \begin{array}{ll} t, & x_j \le t \cr 1, & x_j > t. \end{array} \right . $$

Clearly $\mu_y \equiv \frac{1}{N} \sum_{j=1}^N y_j \ge \mu_x$, so an upper confidence bound for $\mu_y$ is also an upper confidence bound for $\mu_x$. But we can find an upper confidence bound for $\mu_y$ using a random sample with replacement and the general two-value transformation of the Binomial, as sketched above.

  • draw a random sample with replacement of size $n$ from the population
  • let $X$ denote the number of items in the sample with value greater than $t$. Then $X$ has a Binomial distribution with parameters $n$ and $p$, where $p$ is the population fraction of items with value greater than $t$
  • find an upper $1-\alpha$ confidence bound $p^+$ for $p$ by inverting Binomial tests
  • an upper $1-\alpha$ confidence bound for $\mu$ is $(1-p^+)t + p^+$.

Let's see how this does compared to a one-sided Student-t interval. We will estimate the coverage for a variety of thresholds $t$. We will also estimate the average length of the intervals, to get an idea of the tradeoff between coverage and precision.

It is perhaps discomfiting to have a tuning parameter $t$ in the method, but regardless of the choice of $t$, the intervals are guaranteed to be conservative (true converage probability at least $1-\alpha$. However, the expected length of the interval depends on $t$ and on the underlying population of values.

In [4]:
# Nonstandard mixture: a pointmass at zero and a uniform[0,1]
ns = np.array([25, 50, 100, 400])  # sample sizes
ps = np.array([0.9, 0.99, 0.999])    # mixture fraction, weight of pointmass
thresh = [0.2, 0.1, 0.01, .001]
alpha = 0.05  # 1- (confidence level)
reps = 1.0e3   # just for demonstration

cols = ['mass at 0', 'sample size', 'Student-t cov']
for i in range(len(thresh)):
    cols.append('Bin t=' + str(thresh[i]) + ' cov')
cols.append('Student-t len')
for i in range(len(thresh)):
    cols.append('Bin t=' + str(thresh[i]) + ' len')


simTable = pd.DataFrame(columns=cols)

for p in ps:
    popMean = (1-p)*0.5  #  p*0 + (1-p)*.5
    for n in ns:
        tCrit = sp.stats.t.ppf(q=1-alpha, df=n-1)
        coverT = 0    # coverage of t intervals
        tUp = 0       # mean upper bound of t intervals
        coverB = np.zeros(len(thresh))  # coverage of binomial threshold intervals
        bUp = np.zeros(len(thresh))     # mean upper bound of binomial threshold intervals
        for rep in range(int(reps)):
            sam = np.random.uniform(size=n)
            ptMass = np.random.uniform(size=n)
            sam[ptMass < p] = 0.0
            samMean = np.mean(sam)
            samSD = np.std(sam, ddof=1)
            tlim = samMean + tCrit*samSD
            coverT += (popMean <= tlim)  # one-sided Student-t
            tUp += tlim
            for i in range(len(thresh)):
                x = (sam > thresh[i]).sum()  # number of binomial "successes"
                pPlus = binoUpperCL(n, x, cl=1-alpha)
                blim = thresh[i]*(1.0-pPlus) + pPlus
                coverB[i] += (popMean <= blim)
                bUp[i] += blim
        theRow = [p, n, str(100*float(coverT)/float(reps)) + '%']
        for i in range(len(thresh)):
            theRow.append(str(100*float(coverB[i])/float(reps)) + '%')
        theRow.append(str(round(tUp/float(reps), 3)))
        for i in range(len(thresh)):
            theRow.append(str(round(bUp[i]/float(reps), 3)))
        simTable.loc[len(simTable)] =  theRow
#
ansStr =  '<h3>Simulated coverage probability and expected lengths of one-sided Student-t confidence intervals and threshold ' +\
          'Binomial intervals for mixture of U[0,1] and pointmass at 0</h3>' +\
          '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
          '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'

display(HTML(ansStr))
display(simTable)

Simulated coverage probability and expected lengths of one-sided Student-t confidence intervals and threshold Binomial intervals for mixture of U[0,1] and pointmass at 0

Nominal coverage probability 95.0%.
Estimated from 1000 replications.
mass at 0 sample size Student-t cov Bin t=0.2 cov Bin t=0.1 cov Bin t=0.01 cov Bin t=0.001 cov Student-t len Bin t=0.2 len Bin t=0.1 len Bin t=0.01 len Bin t=0.001 len
0 0.900 25.0 89.5% 100.0% 100.0% 100.0% 100.0% 0.316 0.383 0.318 0.262 0.257
1 0.900 50.0 98.8% 100.0% 100.0% 100.0% 100.0% 0.332 0.337 0.266 0.204 0.198
2 0.900 100.0 99.9% 100.0% 100.0% 100.0% 100.0% 0.334 0.311 0.236 0.17 0.163
3 0.900 400.0 100.0% 100.0% 100.0% 100.0% 100.0% 0.339 0.285 0.206 0.136 0.129
4 0.990 25.0 22.9% 100.0% 100.0% 100.0% 100.0% 0.047 0.301 0.214 0.137 0.129
5 0.990 50.0 38.0% 100.0% 100.0% 100.0% 100.0% 0.058 0.257 0.165 0.083 0.075
6 0.990 100.0 60.2% 100.0% 100.0% 100.0% 100.0% 0.069 0.234 0.139 0.055 0.046
7 0.990 400.0 96.7% 100.0% 100.0% 100.0% 100.0% 0.093 0.216 0.119 0.032 0.023
8 0.999 25.0 2.3% 100.0% 100.0% 100.0% 100.0% 0.004 0.291 0.203 0.123 0.115
9 0.999 50.0 6.3% 100.0% 100.0% 100.0% 100.0% 0.008 0.248 0.154 0.07 0.061
10 0.999 100.0 9.8% 100.0% 100.0% 100.0% 100.0% 0.008 0.225 0.128 0.041 0.032
11 0.999 400.0 32.5% 100.0% 100.0% 100.0% 100.0% 0.015 0.207 0.108 0.019 0.01

Notice that in many cases the expected length of the Student-t interval is greater than the length of the binomial interval, even though the coverage probability of the Student-t interval is smaller.

Trinomial Bounds, Multinomial Bounds, and the Stringer Bound

Instead of thresholding at one value (yielding a binomial variable: "success" means "above threshold"), we could discretize the support of $\bar{X}$ into three or more ranges, and make inferences based on the trinomial or multinomial distribution that induces.

Specifying the acceptance regions in a way that allows the tests to be inverted in a simple way is not simple (and there are issues with published methods).

The Stringer Bound is well known in financial auditing. It amounts to combining confidence limits for data-dependent ranges (adaptive thresholds) into an overall confidence limit. Empirically, it is conservative, and Bickel has shown that it is asymptotically conservative, but as far as I know, there is no proof that it is conservative for finite samples.

Moreover, computational evidence suggests that methods we will explore in later lectures perform better than multinomial methods and the Stringer bound in practice, so we will not dwell on them.

What's next?

Now we will consider some conservative methods for constructing lower confidence intervals for the mean of nonnegative populations

In [5]:
%run talkTools.py
In [ ]: