I hope you're convinced that Student-t intervals don't necessarily have true coverage levels close to their nominal coverage levels, even for large sample sizes.

We have also seen a variety of conservative nonparametric methods for populations bounded on one or both sides. They produce one-sided or two-sided confidence intervals whose coverage probabilities are guaranteed to be at least their nominal confidence level.

Which is best?

If the population really consists of only two values, then for one-sided bounds it is impossible to improve on exact Binomial intervals (for samples drawn with replacement) or exact Hypergeometric intervals (for samples drawn without replacement). For two-sided bounds, there is no unique "best" choice.
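For a population of 0s and 1s sampled with replacement, the mean is the Binomial $p$. The exact one-sided lower bound is the value of $p$ at which $P_p(X \ge x) = \alpha$. Below is a minimal pure-Python sketch. It assumes Python 3.8+ for `math.comb`, and the function names are hypothetical, not part of the notebook. It solves the same equation as the notebook's `binoLowerCL` with `cl = 1 - alpha`, but uses bisection instead of SciPy's `brentq`:

```python
import math

# Illustrative sketch; function names are hypothetical, not from the notebook.
def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly with math.comb."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def exact_binomial_lower_bound(n, x, alpha=0.05, tol=1e-12):
    """One-sided exact (Clopper-Pearson) lower 1-alpha confidence bound for p,
    given x successes in n independent trials.

    P_p(X >= x) increases continuously in p, so we bisect for the p at which
    it equals alpha.
    """
    if x == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_sf(x, n, mid) < alpha:
            lo = mid   # x or more successes would be too surprising if p were mid
        else:
            hi = mid
    return lo   # the conservative end of the final bracket

# Example: 7 "ones" in a sample of 25 drawn with replacement
print(exact_binomial_lower_bound(25, 7))
```

When $x = n$, the equation reduces to $p^n = \alpha$, which gives an easy check: `exact_binomial_lower_bound(10, 10)` should equal $0.05^{1/10}$.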

For more general populations, your mileage may vary.

None of these methods is best in every situation. Relative performance depends on the population distribution and on the sample size.

Let's compare the principal methods we've developed, using simulations from a broader variety of populations. We will skip the thresholded binomial, Chebychev's inequality, and Markov's inequality: they are dominated by other methods.

Some of the methods (Hoeffding, Penny Sampling) require upper and lower population bounds. When they are applicable, we might expect them to do better than methods that require only one-sided population bounds (MDKW, Kaplan-Wald), since they use more information.
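For reference, here is where the truncated Hoeffding lower bound used below comes from. Assume the population lies in $[0,1]$ and the sample $X_1, \ldots, X_n$ is drawn with replacement (iid), with mean $\mu$. Hoeffding's inequality gives a one-sided tail bound, and setting that tail bound equal to $\alpha$ gives the critical value:

$$
\Pr\left(\bar{X} - \mu \ge t\right) \le e^{-2nt^2}
\quad\Longrightarrow\quad
t_\alpha = \sqrt{\frac{\log(1/\alpha)}{2n}}, \qquad
\Pr\left(\mu \ge \bar{X} - t_\alpha\right) \ge 1 - \alpha .
$$

Since $\mu \ge 0$, the bound can be truncated at zero without losing coverage, which gives $\max(\bar{X} - t_\alpha, 0)$. This $t_\alpha$ is numerically the same as the one-sided MDKW constant $\sqrt{\log(1/\alpha)/(2n)}$. A two-sided Hoeffding interval would use $\alpha/2$ in place of $\alpha$.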


```
# This is the first cell with code: set up the Python environment
from __future__ import division  # a __future__ import must precede every other statement, including magics
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import scipy as sp
import scipy.stats
from scipy.stats import binom
import scipy.optimize
import pandas as pd
from ipywidgets import interact, interactive, fixed
import ipywidgets as widgets
from IPython.display import clear_output, display, HTML
```


```
def binoLowerCL(n, x, cl = 0.975, inc=0.000001, p = None):
    "Lower confidence bound at confidence level cl for Binomial p, for x successes in n trials"
    if p is None:
        p = float(x)/float(n)
    lo = 0.0
    if (x > 0):
        f = lambda q: cl - scipy.stats.binom.cdf(x-1, n, q)
        lo = sp.optimize.brentq(f, 0.0, p, xtol=inc)
    return lo

def binoUpperCL(n, x, cl = 0.975, inc=0.000001, p = None):
    "Upper confidence bound at confidence level cl for Binomial p, for x successes in n trials"
    if p is None:
        p = float(x)/float(n)
    hi = 1.0
    if (x < n):
        f = lambda q: scipy.stats.binom.cdf(x, n, q) - (1-cl)
        hi = sp.optimize.brentq(f, p, 1.0, xtol=inc)
    return hi

def ecdf(x):
    '''
    calculates the empirical cdf of data x
    returns the unique values of x in ascending order and the cumulative probability at those values
    NOTE: This is not an efficient algorithm: it is O(n^2), where n is the length of x.
    A better algorithm would rely on the collections module or something similar and could work
    in O(n log n)
    '''
    theVals = sorted(np.unique(x))
    theProbs = np.array([sum(x <= v) for v in theVals])/float(len(x))
    if (theVals[0] > 0.0):
        theVals = np.append(0., theVals)
        theProbs = np.append(0., theProbs)
    return theVals, theProbs

def ksLowerMean(x, c):
    '''
    lower confidence bound for the mean of a nonnegative population
    x is an iid sample with replacement from the population
    c is the Massart constant for the desired coverage
    '''
    # find the ecdf
    vals, probs = ecdf(x)
    probs = np.fmin(probs+c, 1) # This is G^-
    gProbs = np.diff(np.append([0.0], probs)) # pre-pend a 0 so that diff does the right thing;
                                              # gProbs is the vector of masses
    return (vals*gProbs).sum()

def kaplanWaldLowerCI(x, cl = 0.95, gamma = 0.99, xtol=1.e-12, logf=True):
    '''
    Calculates the Kaplan-Wald lower confidence bound, at confidence level cl = 1-alpha,
    for the mean of a nonnegative random variable, from the sample x.
    '''
    x = np.asarray(x, dtype=float)
    alpha = 1.0-cl
    if any(x < 0):
        raise ValueError('Data x must be nonnegative.')
    elif all(x <= 0):
        lo = 0.0
    else:
        if logf:
            f = lambda t: (np.max(np.cumsum(np.log(gamma*x/t + 1 - gamma))) + np.log(alpha))
        else:
            f = lambda t: (np.max(np.cumprod(gamma*x/t + 1 - gamma)) - 1/alpha)
        xm = np.mean(x)
        if f(xtol)*f(xm) > 0.0:
            lo = 0.0  # no sign change on [xtol, mean]: fall back to the trivial bound
        else:
            lo = sp.optimize.brentq(f, xtol, xm, xtol=xtol)
    return lo

def pennySampleReplacement(weights, n):
    '''
    Weighted random sample of size n drawn with replacement.
    Returns indices of the selected items, the "remainder pennies,"
    and the raw uniform values used to select the sample
    '''
    weights = np.asarray(weights)
    if any(weights < 0):
        raise ValueError('negative weight in pennySampleReplacement')
    totWt = np.sum(weights, dtype=float)
    wc = np.cumsum(weights, dtype=float)/totWt # normalize so the cumulative weights end at 1
    theSam = np.random.random_sample((n,))
    inx = np.array(wc).searchsorted(theSam)
    penny = [(wc[inx[i]]-theSam[i])*totWt for i in range(n)]
    return inx, penny, theSam

def pennyBinomialLowerBound(x, inx, pennies, cl=0.95):
    '''
    Penny sampling lower (one-sided) 1-alpha confidence bound on the mean, for sampling with replacement.
    x is the vector of observed values
    pennies is the vector of _which_ "penny" in each sampled item is to be adjudicated as "good" or "bad"
    The first x_j pennies in item j are deemed "good," the remaining (u_j - x_j) are "bad."
    Returns the lower bound and the number of "good" pennies in the sample.
    '''
    s = sum([pennies[i] <= x[inx[i]] for i in range(len(pennies))])
    n = len(inx)
    return binoLowerCL(n, s, cl=cl), s

def pennyBinomialBounds(x, inx, pennies, cl=0.95):
    '''
    Penny sampling 2-sided confidence interval for the mean, for sampling with replacement.
    x is the vector of observed values
    pennies is the vector of _which_ "penny" in each sampled item is to be adjudicated as "good" or "bad"
    The first x_j pennies in item j are deemed "good," the remaining (u_j - x_j) are "bad."
    Returns the lower bound, the upper bound and the number of "good" pennies in the sample.
    '''
    s = sum([pennies[i] <= x[inx[i]] for i in range(len(pennies))])
    n = len(inx)
    return binoLowerCL(n, s, cl=1-(1-cl)/2), binoUpperCL(n, s, cl=1-(1-cl)/2), s
```
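The docstring of `ecdf` notes that an $O(n \log n)$ version is possible. Here is one way to do it, a sketch that assumes the standard library's `collections.Counter` is acceptable (`ecdf_fast` is a hypothetical name). It tallies each distinct value once, sorts only the distinct values, and accumulates the counts. It returns plain lists rather than NumPy arrays, and, like `ecdf`, it prepends the value 0 with probability 0 when every observation is positive:

```python
from collections import Counter

# Illustrative sketch; ecdf_fast is a hypothetical name, not from the notebook.
def ecdf_fast(x):
    """Empirical CDF in O(n log n): sorted unique values and cumulative probabilities."""
    counts = Counter(x)      # O(n) tally of each distinct value
    vals = sorted(counts)    # sort only the distinct values
    n = len(x)
    probs, running = [], 0
    for v in vals:
        running += counts[v]
        probs.append(running / n)
    if vals[0] > 0.0:        # put the point 0 into the support, as ecdf does
        vals = [0.0] + vals
        probs = [0.0] + probs
    return vals, probs

print(ecdf_fast([0.5, 0.2, 0.5, 1.0]))
```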

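We will compare one-sided lower confidence bounds from truncated Hoeffding, MDKW, Kaplan-Wald, and Continuous Penny Sampling. We start with a population that mixes a pointmass at 0 with a uniform distribution on [0, 1].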
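It may help to see what the MDKW bound does to a sample. The sketch below is a pure-Python restatement of the idea behind `ksLowerMean` (the names are hypothetical). It moves probability mass $c = \sqrt{\log(1/\alpha)/(2n)}$ from the largest observations down to 0 and takes the mean of what remains. When every observation lies in $[0,1]$, moving mass $c$ lowers the mean by at most $c$. So with the same constant, the MDKW bound can never fall below the truncated Hoeffding bound $\max(\bar{x} - c, 0)$. The loop at the end checks this on random samples:

```python
import math
import random

# Illustrative sketch; function names are hypothetical, not from the notebook.
def mdkw_lower_mean(x, alpha=0.05):
    """Lower 1-alpha bound for the mean of a nonnegative population (one-sided MDKW).

    Start from the empirical distribution (mass 1/n at each observation), move
    total mass c = sqrt(log(1/alpha)/(2n)) from the largest observations to 0,
    and return the mean of the resulting distribution.
    """
    n = len(x)
    to_move = math.sqrt(math.log(1 / alpha) / (2 * n))
    total = 0.0
    for v in sorted(x, reverse=True):  # the largest values give up their mass first
        moved = min(1 / n, to_move)
        to_move -= moved
        total += v * (1 / n - moved)   # moved mass now sits at 0 and adds nothing
    return total

def truncated_hoeffding_lower_mean(x, alpha=0.05):
    """max(sample mean - c, 0) with the same c; valid for populations in [0, 1]."""
    n = len(x)
    c = math.sqrt(math.log(1 / alpha) / (2 * n))
    return max(sum(x) / n - c, 0.0)

rng = random.Random(2024)
sample = [rng.random() for _ in range(100)]   # 100 draws from U[0,1]
print(mdkw_lower_mean(sample), truncated_hoeffding_lower_mean(sample))

# MDKW >= truncated Hoeffding on random samples in [0, 1] (up to rounding)
for _ in range(200):
    xs = [rng.random() for _ in range(rng.randint(1, 60))]
    assert mdkw_lower_mean(xs) >= truncated_hoeffding_lower_mean(xs) - 1e-12
```

The sketch should agree with `ksLowerMean(x, c)` up to rounding, because raising the empirical CDF by $c$ (capped at 1) is the same as moving mass $c$ from the top of the distribution to 0.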


```
# Nonstandard mixture: a pointmass at zero and a uniform[0,1]
ns = np.array([25, 50, 100, 400]) # sample sizes
ps = np.array([0.9, 0.99, 0.999]) # mixture fraction, weight of pointmass
alpha = 0.05 # 1 - (confidence level)
reps = int(1.0e4) # just for demonstration
gamma = 0.99 # tuning constant in Kaplan-Wald
xtol = 1.0e-6 # numerical tolerance for Kaplan-Wald
cols = ['mass at 0', 'sample size', 'Trunc Hoeff cov', 'MDKW cov', 'KW cov', 'Penny cov',
        'Trunc Hoeff low', 'MDKW low', 'KW low', 'Penny low']
simTable = pd.DataFrame(columns=cols)
for p in ps:
    popMean = (1-p)*0.5 # p*0 + (1-p)*.5
    for n in ns:
        hCrit = np.sqrt(-math.log(alpha)/(2*n)) # one-sided Hoeffding constant for a population in [0,1]
        mCrit = np.sqrt(-np.log(alpha)/(2.0*n)) # the 1-sided MDKW constant
        covH = 0
        covM = 0
        covK = 0
        covP = 0
        lowH = 0.0
        lowM = 0.0
        lowK = 0.0
        lowP = 0.0
        for rep in range(int(reps)):
            sam = np.random.uniform(size=n)
            ptMass = np.random.uniform(size=n)
            pennies = np.random.uniform(size=n)
            sam[ptMass < p] = 0.0
            samMean = np.mean(sam)
            # truncated Hoeffding
            hLow = max(samMean - hCrit, 0.0)
            covH += (hLow <= popMean)
            lowH += hLow
            # MDKW
            mLow = ksLowerMean(sam, mCrit)
            covM += (mLow <= popMean)
            lowM += mLow
            # Kaplan-Wald
            kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma, xtol = xtol)
            covK += (kLow <= popMean)
            lowK += kLow
            # continuous penny sampling
            pLow, s = pennyBinomialLowerBound(sam, np.r_[0:n], pennies, cl=1-alpha)
            covP += (pLow <= popMean)
            lowP += pLow
        simTable.loc[len(simTable)] = (p, n,
            str(100*float(covH)/float(reps)) + '%',
            str(100*float(covM)/float(reps)) + '%',
            str(100*float(covK)/float(reps)) + '%',
            str(100*float(covP)/float(reps)) + '%',
            str(round(lowH/float(reps), 4)),
            str(round(lowM/float(reps), 4)),
            str(round(lowK/float(reps), 4)),
            str(round(lowP/float(reps), 4)))
#
ansStr = '<h3>Simulated coverage probability and mean lower bounds of one-sided nonparametric confidence bounds ' +\
    'for a mixture of U[0,1] and a pointmass at 0</h3>' +\
    '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
    '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)
```

Truncated Hoeffding bounds do not appear to be competitive, even though they use more information than the Kaplan-Wald bound (an upper as well as a lower bound on the population). For this population, and with this value of $\gamma$, the Kaplan-Wald bound is slightly worse than the continuous penny sampling bound. However, KW requires only that the population be nonnegative.
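The Kaplan-Wald bound is easier to reason about with the statistic written out. Here is a pure-Python sketch of the computation in `kaplanWaldLowerCI`. The names are hypothetical, bisection stands in for `brentq`, and requiring $0 < \gamma < 1$ is an assumption of this sketch. For a candidate mean $t$, each observation multiplies a running product by $\gamma x_i/t + 1 - \gamma$. Suppose the sample is drawn with replacement and the population mean really is $t$. Then each factor has expected value 1, so by Ville's inequality the running maximum of the product reaches $1/\alpha$ with probability at most $\alpha$. The statistic decreases in $t$, and the lower bound is the boundary between the values of $t$ that the data reject and those they do not:

```python
import math

# Illustrative sketch; function names are hypothetical, not from the notebook.
def kw_log_stat(x, t, gamma):
    """log of max_j prod_{i<=j} (gamma * x_i / t + 1 - gamma); nonincreasing in t."""
    running, best = 0.0, -math.inf
    for xi in x:
        running += math.log(gamma * xi / t + 1 - gamma)
        best = max(best, running)
    return best

def kw_lower_bound(x, alpha=0.05, gamma=0.99, tol=1e-10):
    """Kaplan-Wald lower 1-alpha confidence bound for the mean of a nonnegative population."""
    if not 0 < gamma < 1:
        raise ValueError('gamma must be strictly between 0 and 1')
    if any(xi < 0 for xi in x):
        raise ValueError('data must be nonnegative')
    if all(xi == 0 for xi in x):
        return 0.0
    target = math.log(1 / alpha)
    g = lambda t: kw_log_stat(x, t, gamma) - target   # >= 0 means "mean = t" is rejected
    lo, hi = tol, sum(x) / len(x)
    if g(lo) < 0 or g(hi) >= 0:
        return 0.0   # no sign change on [tol, mean]: same fallback as kaplanWaldLowerCI
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) >= 0:
            lo = mid   # mean = mid is rejected, so the bound lies above mid
        else:
            hi = mid
    return lo

print(kw_lower_bound([1.0] * 50, alpha=0.05, gamma=0.5))
```

For a sample of fifty 1s with $\gamma = 0.5$, the running product is largest at the end. The bound therefore solves $50\log(0.5/t + 0.5) = \log 20$, that is, $t = 0.5/(20^{1/50} - 0.5)$.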

Let's look at what happens with a pointmass at 1 instead of 0.


```
# Nonstandard mixture: a pointmass at 1 and a uniform[0,1]
ns = np.array([25, 50, 100, 400]) # sample sizes
ps = np.array([0.9, 0.99, 0.999]) # mixture fraction, weight of pointmass
alpha = 0.05 # 1 - (confidence level)
reps = int(1.0e4) # just for demonstration
gamma = 0.99 # tuning constant in Kaplan-Wald
xtol = 1.0e-12 # numerical tolerance for Kaplan-Wald
cols = ['mass at 1', 'sample size', 'Trunc Hoeff cov', 'MDKW cov', 'KW cov', 'Penny cov',
        'Trunc Hoeff low', 'MDKW low', 'KW low', 'Penny low']
simTable = pd.DataFrame(columns=cols)
for p in ps:
    popMean = (1-p)*0.5 + p
    for n in ns:
        hCrit = np.sqrt(-math.log(alpha)/(2*n)) # one-sided Hoeffding constant for a population in [0,1]
        mCrit = np.sqrt(-np.log(alpha)/(2.0*n)) # the 1-sided MDKW constant
        covH = 0
        covM = 0
        covK = 0
        covP = 0
        lowH = 0.0
        lowM = 0.0
        lowK = 0.0
        lowP = 0.0
        for rep in range(int(reps)):
            sam = np.random.uniform(size=n)
            ptMass = np.random.uniform(size=n)
            pennies = np.random.uniform(size=n)
            sam[ptMass < p] = 1.0
            samMean = np.mean(sam)
            # truncated Hoeffding
            hLow = max(samMean - hCrit, 0.0)
            covH += (hLow <= popMean)
            lowH += hLow
            # MDKW
            mLow = ksLowerMean(sam, mCrit)
            covM += (mLow <= popMean)
            lowM += mLow
            # Kaplan-Wald
            kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma, xtol = xtol)
            covK += (kLow <= popMean)
            lowK += kLow
            # continuous penny sampling
            pLow, s = pennyBinomialLowerBound(sam, np.r_[0:n], pennies, cl=1-alpha)
            covP += (pLow <= popMean)
            lowP += pLow
        simTable.loc[len(simTable)] = (p, n,
            str(100*float(covH)/float(reps)) + '%',
            str(100*float(covM)/float(reps)) + '%',
            str(100*float(covK)/float(reps)) + '%',
            str(100*float(covP)/float(reps)) + '%',
            str(round(lowH/float(reps), 4)),
            str(round(lowM/float(reps), 4)),
            str(round(lowK/float(reps), 4)),
            str(round(lowP/float(reps), 4)))
#
ansStr = '<h3>Simulated coverage probability and mean lower bounds of one-sided nonparametric confidence bounds ' +\
    'for a mixture of U[0,1] and a pointmass at 1</h3>' +\
    '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
    '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)
```

Here again, with $\gamma = 0.99$, the Kaplan-Wald method performs essentially as well as Continuous Penny Sampling. This holds even though KW requires only nonnegativity, whereas Continuous Penny Sampling also requires an upper bound on the population.
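Why does Continuous Penny Sampling need an upper bound? Assume the population lies in $[0,1]$. If $U$ is uniform on $[0,1]$ and independent of $X$, then $P(U \le X) = E[X] = \mu$. So for a sample of size $n$ drawn with replacement, the number $s$ of "good" pennies is Binomial($n$, $\mu$), and any lower confidence bound for a Binomial $p$ is a lower confidence bound for $\mu$. The upper bound is what makes $U$ uniform on a known interval. Here is a quick Monte Carlo check of $P(U \le X) = \mu$ on the notebook's pointmass-at-0 mixture. The names are hypothetical, and this is a sketch, not the notebook's `pennyBinomialLowerBound`:

```python
import random

# Illustrative sketch; function names are hypothetical, not from the notebook.
def draw_mixture(n, p_mass, rng):
    """n iid draws: 0 with probability p_mass, otherwise U[0,1] (the notebook's test population)."""
    return [0.0 if rng.random() < p_mass else rng.random() for _ in range(n)]

def good_penny_count(sample, rng):
    """Count observations v whose independent U[0,1] penny lands at or below v
    (which happens with probability v when the population lies in [0, 1])."""
    return sum(rng.random() <= v for v in sample)

rng = random.Random(12345)
p_mass, n = 0.9, 200_000
sample = draw_mixture(n, p_mass, rng)
frac_good = good_penny_count(sample, rng) / n
pop_mean = (1 - p_mass) * 0.5
print(f"fraction of good pennies: {frac_good:.4f}, population mean: {pop_mean:.4f}")
```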

Let's see what happens as $\gamma$ varies.


```
# Nonstandard mixture: a pointmass at 0 and a uniform[0,1]
ns = np.array([25, 50, 100, 400]) # sample sizes
ps = np.array([0.9, 0.99, 0.999]) # mixture fraction, weight of pointmass
alpha = 0.05 # 1 - (confidence level)
reps = int(1.0e4) # just for demonstration
gamma = np.array([0.01, 0.1, 0.5, 0.9, 0.999]) # values of the Kaplan-Wald tuning constant to compare
xtol = 1.0e-12 # numerical tolerance for Kaplan-Wald
cols = ['mass at 0', 'sample size']
cols.extend(['KW cov ' + str(g) for g in gamma])
cols.extend(['KW low ' + str(g) for g in gamma])
simTable = pd.DataFrame(columns=cols)
for p in ps:
    popMean = (1-p)*0.5
    for n in ns:
        covK = np.zeros(len(gamma))
        lowK = np.zeros(len(gamma))
        for rep in range(int(reps)):
            sam = np.random.uniform(size=n)
            ptMass = np.random.uniform(size=n)
            sam[ptMass < p] = 0.0
            # the same sample is used for every value of gamma
            for i in range(len(gamma)):
                kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma[i], xtol = xtol)
                covK[i] += (kLow <= popMean)
                lowK[i] += kLow
        theRow = [p, n]
        theRow.extend([str(100*float(covK[i])/float(reps)) + '%' for i in range(len(gamma))])
        theRow.extend([str(round(lowK[i]/float(reps), 4)) for i in range(len(gamma))])
        simTable.loc[len(simTable)] = theRow
#
ansStr = '<h3>Simulated coverage probability and mean lower bounds of one-sided Kaplan-Wald confidence bounds ' +\
    'for a mixture of U[0,1] and a pointmass at 0</h3>' +\
    '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
    '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)
```

Smaller values of $\gamma$ give larger, and hence more informative, lower confidence bounds when many observations are zero or nearly zero. With $\gamma = 0.1$, the Kaplan-Wald method is quite competitive with Continuous Penny Sampling for this population.


```
# Nonstandard mixture: a pointmass at 1 and a uniform[0,1]
ns = np.array([25, 50, 100, 400]) # sample sizes
ps = np.array([0.9, 0.99, 0.999]) # mixture fraction, weight of pointmass
alpha = 0.05 # 1 - (confidence level)
reps = int(1.0e4) # just for demonstration
gamma = np.array([0.01, 0.1, 0.5, 0.9, 0.999]) # values of the Kaplan-Wald tuning constant to compare
xtol = 1.0e-12 # numerical tolerance for Kaplan-Wald
cols = ['mass at 1', 'sample size']
cols.extend(['KW cov ' + str(g) for g in gamma])
cols.extend(['KW low ' + str(g) for g in gamma])
simTable = pd.DataFrame(columns=cols)
for p in ps:
    popMean = (1-p)*0.5 + p
    for n in ns:
        covK = np.zeros(len(gamma))
        lowK = np.zeros(len(gamma))
        for rep in range(int(reps)):
            sam = np.random.uniform(size=n)
            ptMass = np.random.uniform(size=n)
            sam[ptMass < p] = 1.0
            # the same sample is used for every value of gamma
            for i in range(len(gamma)):
                kLow = kaplanWaldLowerCI(sam, cl = 1-alpha, gamma = gamma[i], xtol = xtol)
                covK[i] += (kLow <= popMean)
                lowK[i] += kLow
        theRow = [p, n]
        theRow.extend([str(100*float(covK[i])/float(reps)) + '%' for i in range(len(gamma))])
        theRow.extend([str(round(lowK[i]/float(reps), 4)) for i in range(len(gamma))])
        simTable.loc[len(simTable)] = theRow
#
ansStr = '<h3>Simulated coverage probability and mean lower bounds of one-sided Kaplan-Wald confidence bounds ' +\
    'for a mixture of U[0,1] and a pointmass at 1</h3>' +\
    '<strong>Nominal coverage probability ' + str(100*(1-alpha)) +\
    '%</strong>. <br /><strong>Estimated from ' + str(int(reps)) + ' replications.</strong>'
display(HTML(ansStr))
display(simTable)
```

However, when the data have *few* values near zero, a small value of $\gamma$ makes the lower confidence bound worse (smaller), at least for small sample sizes.
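One way to see both effects of $\gamma$: each observation $x$ multiplies the Kaplan-Wald statistic by $\gamma x/t + 1 - \gamma$. An observation at 0 multiplies it by $1 - \gamma$. That factor is harsh when $\gamma$ is near 1 and mild when $\gamma$ is small. An observation at 1 multiplies the statistic by $\gamma/t + 1 - \gamma$, which is close to 1 when $\gamma$ is small, so evidence accumulates slowly. The sketch below is a back-of-the-envelope illustration, not part of the notebook; the names and the candidate mean $t = 0.5$ are hypothetical. It tabulates these factors and counts how many consecutive 1s it takes to push the product to $1/\alpha = 20$:

```python
import math

# Illustrative sketch; names and the candidate mean t are hypothetical.
def kw_factor(x, t, gamma):
    """Kaplan-Wald multiplicative factor for one observation x at hypothesized mean t."""
    return gamma * x / t + 1 - gamma

def ones_needed(t, gamma, alpha=0.05):
    """Consecutive observations equal to 1 needed for the product to reach 1/alpha.
    Assumes t < 1, so that the factor for x = 1 exceeds 1."""
    return math.ceil(math.log(1 / alpha) / math.log(kw_factor(1.0, t, gamma)))

t = 0.5   # hypothetical candidate value of the mean
for gamma in (0.1, 0.5, 0.99):
    print(f"gamma={gamma}: factor at x=0 is {kw_factor(0.0, t, gamma):.2f}, "
          f"factor at x=1 is {kw_factor(1.0, t, gamma):.2f}, "
          f"ones needed to reject t: {ones_needed(t, gamma)}")
```

With $\gamma = 0.1$, a zero costs only a factor of 0.9, but it takes 32 consecutive 1s to reject $t = 0.5$. With $\gamma = 0.99$, a single zero multiplies the statistic by 0.01, but 5 consecutive 1s suffice.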