Confidence bounds for the mean of a bounded population: Binomial and Hypergeometric

- Elementary derivation for {0, 1} populations
- Obvious extension to 2-valued populations
- Thresholding for general bounded populations
- Trinomial and Multinomial bounds

Let's start by constructing two-sided confidence intervals for the same {0, 1} cases in which the Normal approximation did so badly.

To find two-sided intervals, we find lower and upper confidence bounds at half the value of $\alpha$. This is equivalent to constructing acceptance regions that are "balanced" in the sense that, under the null hypothesis, the chance of rejecting the hypothesis because $X$ is too big is equal to the chance of rejecting the hypothesis because $X$ is too small.

The code tells the story:

In [1]:

```
# This is the first cell with code: set up the Python environment
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import scipy as sp
import scipy.stats
from scipy.stats import binom
import pandas as pd
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML
```

In [2]:

```
def binoLowerCL(n, x, cl = 0.975, inc=0.000001, p = None):
"Lower confidence level cl confidence interval for Binomial p, for x successes in n trials"
if p is None:
p = float(x)/float(n)
lo = 0.0
if (x > 0):
f = lambda q: cl - scipy.stats.binom.cdf(x-1, n, q)
lo = sp.optimize.brentq(f, 0.0, p, xtol=inc)
return lo
def binoUpperCL(n, x, cl = 0.975, inc=0.000001, p = None):
"Upper confidence level cl confidence interval for Binomial p, for x successes in n trials"
if p is None:
p = float(x)/float(n)
hi = 1.0
if (x < n):
f = lambda q: scipy.stats.binom.cdf(x, n, q) - (1-cl)
hi = sp.optimize.brentq(f, p, 1.0, xtol=inc)
return hi
```

In [3]:

```
# Population of two values, {0, 1}, in various proportions. Amounts to Binomial random variable
ns = np.array([25, 50, 100, 400]) # sample sizes
ps = np.array([.001, .01, 0.1]) # mixture fractions, proportion of 1s in the population
alpha = 0.05 # 1- (confidence level)
reps = int(1.0e3) # just for demonstration
vals = [0, 1]
simTable = pd.DataFrame(columns=('fraction of 1s', 'sample size', 'Student-t cov',\
'Binom cov', 'Student-t len', 'Binom len'))
for p in ps:
popMean = p
for n in ns:
tCrit = sp.stats.t.ppf(q=1.0-alpha/2, df=n-1)
samMean = np.zeros(reps)
sam = sp.stats.binom.rvs(n, p, size=reps)
samMean = sam/float(n)
samSD = np.sqrt(samMean*(1-samMean)/(n-1))
coverT = (np.fabs(samMean-popMean) < tCrit*samSD).sum()
aveLenT = 2*(tCrit*samSD).mean()
coverB = 0
totLenB = 0.0
for r in range(int(reps)):
lo = binoLowerCL(n, sam[r], cl=1.0-alpha/2)
hi = binoUpperCL(n, sam[r], cl=1.0-alpha/2)
coverB += ( p >= lo) & (p <= hi)
totLenB += hi-lo
simTable.loc[len(simTable)] = p, n, str(100*float(coverT)/float(reps)) + '%', \
str(100*float(coverB)/float(reps)) + '%',\
str(round(aveLenT,4)),\
str(round(totLenB/float(reps),4))
#
ansStr = '<h3>Simulated coverage probability and expected length of Student-t and Binomial confidence intervals for a {0, 1} population</h3>' +\
'<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
'%</strong>.<br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)
```

The empirical coverage rates are gnerally much *higher* than 95% when $n$ is small, because of the discreteness of the Binomial distribution: to ensure that a test rejects *at most* 5% of the time might require rejecting far less than 5% of the time. That is, the tests are conservative rather than exact.

We could construct *randomized* tests that have exactly level $\alpha$, but we won't go there...

Consider $n$ uniform draws *without* replacement from a $\{0, 1\}$ population of known size $N$ that contains an unknown number $G$ of 1s—i.e., a *simple random sample* of size $n$.
Let $X$ denote the number of 1s in the sample.
Then $X$ has a Hypergeometric distribution with parameters $N$, $G$, and $n$.

In particular, $$ \mathbb P_{N, G, n} (X = k) = \frac{{G \choose k}{N-G \choose n-k}}{{N \choose n}}$$ for $\max(0, n − (N−G)) \le k \le \min(n, G)$.

We can get sharper confidence bounds in this case by inverting hypergeometric tests instead of Binomial tests.

The strategy is the same: construct a family of acceptance regions for each possible value of the parameter, then let the confidence set be the parameter values for which, given the observed data, the test would not reject.

To find a one-sided lower confidence bound for $G$ in a Hypergeometric$(N, G, n)$ distribution, with $N$ and $n$ known, we would invert one-sided upper tests, that is, tests that reject the hypothesis $G = g$ when $X$ is large. (The corresponding confidence bound for the population mean is the confidence bound for $G$, divided by $N$.)

The form of the acceptance region for the test is: $$ A_G \equiv \ [0, a_G],$$ where $$ a_G \equiv \min \left \{ k: \sum_{i=k+1}^n \frac{{G \choose i}{N-G \choose n-i}}{{N \choose n}} \le \alpha \right \}.$$

Clearly, if the population is known to contain only the values $\{a, b\}$ instead of the values $\{0, 1\}$, it's trivial to re-scale binomial or hypergeometric confidence bounds.

If we observe the sum $Y$ of $n$ iid uniform draws from the population,

- let $X \equiv (Y - na)/(b-a)$
- find the Binomial confidence bound for $p$ based on $X$ with $n$ trials
- transform each endpoint $q$ of that interval to $(1-q)a + qb$ to get the corresponding endpoint of the interval for the mean of the original population.

For example, suppose the population was known to contain only the values 1 and 10, but in unknown proportions.

We draw an iid uniform sample of 50 items; the sample sum is 320 (corresponding to drawing "1" twenty times and "10" thirty times).

Then a 95% upper confidence bound for the fraction of 10s in the population is binoUpperCL(50, 30, cl=0.95) = 0.7169, so a 95% upper confidence bound for the population mean is

$$(1-0.7169)\times 1 + 0.7169\times 10) = 7.45182.$$The applications that motivate this inquiry generally call for **one-sided confidence bounds** rather than confidence intervals. For instance:

- A plaintiff may want to show that, with high confidence, the damages are
*at least*\$ $x$ - A prosecutor may want to show that, with high confidence, the accused committed fraud worth
*at least*\$ $x$ - A financial auditor might want to certify that, with high confidence, a company's assets are overstated by
*at most*\$$x$ and its liabilities are understated by _at most_ \$$y$. - A local election official might want determine whether, with high confidence, the error in tallying the votes is
*not as large as*the margin - An online marketer might want to show that, with high confidence, its technology increases sales by
*at least*$x$%. - A pharmaceutical company might want to show that, with high confidence, its headache remedy decreases the average duration of headaches by
*at least*$x$ hours, or that its Ebola vaccine increases the rate of survival by*at least*$x$%.

In general, *some* constraint is needed to get even a one-sided (conservative or exact) confidence bound.

In these problems, there are natural constraints, for instance:

- Damages cannot be negative (the plaintiff doesn't owe the defendant—unless the defendant sues the plaintiff, becoming the plaintiff in a new action)
- Fraudulent billing is non-negative; the frauduent portion of a bill does not exceed total bill

Consider the second example in the Normal Approximation notebook, a mixture of a uniform and a point mass at zero. Suppose for the moment that we are interested in an upper confidence bound, e.g., an upper bound on the overstatement of a collection of accounts, for the purpose of establishing that a company's books are fairly represented.

We have an unknown population $\{ x_j \}_{j=1}^N$. We want an upper confidence bound for $\mu_x \equiv \frac{1}{N} \sum_{j=1}^N x_j$.

Suppose we have an a priori upper bound $u_j$ on the value $x_j$ of item $j$, $\forall j$. For simplicity, let's take $u_j = 1$. Pick a threshold $t < 1$. Imagine a new population $\{ y_j \}_{j=1}^N$, where

$$y_j \equiv \left \{ \begin{array}{ll} t, & x_j \le t \cr 1, & x_j > t. \end{array} \right . $$

Clearly $\mu_y \equiv \frac{1}{N} \sum_{j=1}^N y_j \ge \mu_x$, so an upper confidence bound for $\mu_y$ is also an upper confidence bound for $\mu_x$. But we can find an upper confidence bound for $\mu_y$ using a random sample with replacement and the general two-value transformation of the Binomial, as sketched above.

- draw a random sample with replacement of size $n$ from the population
- let $X$ denote the number of items in the sample with value greater than $t$. Then $X$ has a Binomial distribution with parameters $n$ and $p$, where $p$ is the population fraction of items with value greater than $t$
- find an upper $1-\alpha$ confidence bound $p^+$ for $p$ by inverting Binomial tests
- an upper $1-\alpha$ confidence bound for $\mu$ is $(1-p^+)t + p^+$.

Let's see how this does compared to a one-sided Student-t interval. We will estimate the coverage for a variety of thresholds $t$. We will also estimate the average length of the intervals, to get an idea of the tradeoff between coverage and precision.

It is perhaps discomfiting to have a tuning parameter $t$ in the method, but regardless of the choice of $t$, the intervals are guaranteed to be *conservative* (true converage probability at least $1-\alpha$.
However, the expected *length* of the interval depends on $t$ and on the underlying population of values.

In [4]:

```
# Nonstandard mixture: a pointmass at zero and a uniform[0,1]
ns = np.array([25, 50, 100, 400]) # sample sizes
ps = np.array([0.9, 0.99, 0.999]) # mixture fraction, weight of pointmass
thresh = [0.2, 0.1, 0.01, .001]
alpha = 0.05 # 1- (confidence level)
reps = 1.0e3 # just for demonstration
cols = ['mass at 0', 'sample size', 'Student-t cov']
for i in range(len(thresh)):
cols.append('Bin t=' + str(thresh[i]) + ' cov')
cols.append('Student-t len')
for i in range(len(thresh)):
cols.append('Bin t=' + str(thresh[i]) + ' len')
simTable = pd.DataFrame(columns=cols)
for p in ps:
popMean = (1-p)*0.5 # p*0 + (1-p)*.5
for n in ns:
tCrit = sp.stats.t.ppf(q=1-alpha, df=n-1)
coverT = 0 # coverage of t intervals
tUp = 0 # mean upper bound of t intervals
coverB = np.zeros(len(thresh)) # coverage of binomial threshold intervals
bUp = np.zeros(len(thresh)) # mean upper bound of binomial threshold intervals
for rep in range(int(reps)):
sam = np.random.uniform(size=n)
ptMass = np.random.uniform(size=n)
sam[ptMass < p] = 0.0
samMean = np.mean(sam)
samSD = np.std(sam, ddof=1)
tlim = samMean + tCrit*samSD
coverT += (popMean <= tlim) # one-sided Student-t
tUp += tlim
for i in range(len(thresh)):
x = (sam > thresh[i]).sum() # number of binomial "successes"
pPlus = binoUpperCL(n, x, cl=1-alpha)
blim = thresh[i]*(1.0-pPlus) + pPlus
coverB[i] += (popMean <= blim)
bUp[i] += blim
theRow = [p, n, str(100*float(coverT)/float(reps)) + '%']
for i in range(len(thresh)):
theRow.append(str(100*float(coverB[i])/float(reps)) + '%')
theRow.append(str(round(tUp/float(reps), 3)))
for i in range(len(thresh)):
theRow.append(str(round(bUp[i]/float(reps), 3)))
simTable.loc[len(simTable)] = theRow
#
ansStr = '<h3>Simulated coverage probability and expected lengths of one-sided Student-t confidence intervals and threshold ' +\
'Binomial intervals for mixture of U[0,1] and pointmass at 0</h3>' +\
'<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
'%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)
```

Notice that in many cases the expected length of the Student-t interval is *greater* than the length of the binomial interval, even though the coverage probability of the Student-t interval is smaller.

Instead of thresholding at one value (yielding a binomial variable: "success" means "above threshold"), we could discretize the support of $\bar{X}$ into three or more ranges, and make inferences based on the trinomial or multinomial distribution that induces.

Specifying the acceptance regions in a way that allows the tests to be inverted in a simple way is not simple (and there are issues with published methods).

The Stringer Bound is well known in financial auditing. It amounts to combining confidence limits for data-dependent ranges (adaptive thresholds) into an overall confidence limit. Empirically, it is conservative, and Bickel has shown that it is asymptotically conservative, but as far as I know, there is no proof that it is conservative for finite samples.

Moreover, computational evidence suggests that methods we will explore in later lectures perform better than multinomial methods and the Stringer bound in practice, so we will not dwell on them.

Now we will consider some conservative methods for constructing lower confidence intervals for the mean of nonnegative populations

In [5]:

```
%run talkTools.py
```

In [ ]:

```
```