Expected Value of Closest Two-Sum To Zero¶

The Closest Two-Sum To Zero problem, posted at Programming Praxis, asks us to write a function that takes a list of k random integers on the interval (-n..n), finds the pair that have a sum closest to zero, and return that sum. After suggesting a solution, Praxis asked "What is the expected minimum sum for a given k and n?"

Here is some code that tries to answer that question by running many trials for various combinations of n and k, then calculate the expected values and plot some results. To keep it accessable, standard python is used to do the trials and calculations; Pandas is only used for graphing.

In [11]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [12]:

import itertools as it

from collections import Counter
from random import randrange

In [13]:

ns = (50, 100, 150)      # possible values for n and k
ks = (5, 10, 15)
ntrials = 10000          # the number of trials for each (n, k)-combination.
plot_range = [-10, 11]   # only plot sums that fall in this range

Generate a list of k random integers in the range(-n, n+1). Then find a pair that have the sum that is closest to zero and return the minimum sum.

In [14]:

def trial_min_2sum(n, k):
    lst = [randrange(-n, n+1) for _ in range(k)]
    lst.sort(key=abs)
    return min(map(sum, zip(lst, lst[1:])), key=abs)

The question "What is the expected minimum sum for a given k and n?" is a somewhat ambiguous--is the minimum sum the actual sum or the distance from zero, i.e., the absolute value of the sum?

The code below looks at both interpretations. A "raw" prefix or suffix indicates the actual sum that is closest to zero, but could be positive or negative; whereas "abs" indicates the distance from zero, and is only positive.

In [15]:

trials = []
for n, k in it.product(ns, ks):
    trial = [trial_min_2sum(n, k) for _ in range(ntrials)]
    raw_counts = Counter(trial)
    abs_counts = Counter(map(abs, trial))
    trials.append((n, k, raw_counts, abs_counts))

print('raw counts')
for n, k, raw_cnt, _ in trials:
    print("{:3} {:2} {}".format(n, k, [raw_cnt[i] for i in range(*plot_range)]))
print('\nabs counts')
for n, k, _, abs_cnt in trials:
    print("{:3} {:2} {}".format(n, k, [abs_cnt[i] for i in range(plot_range[1])]))

raw counts
 50  5 [128, 151, 194, 236, 283, 351, 450, 545, 706, 813, 940, 782, 688, 513, 445, 364, 288, 230, 193, 157, 134]
 50 10 [3, 2, 9, 23, 31, 69, 120, 313, 785, 1892, 3571, 1858, 735, 329, 130, 59, 29, 7, 9, 5, 4]
 50 15 [0, 0, 0, 0, 0, 0, 5, 23, 214, 1559, 6414, 1556, 190, 28, 7, 2, 1, 1, 0, 0, 0]
100  5 [161, 183, 214, 261, 296, 310, 323, 388, 432, 434, 514, 452, 457, 351, 308, 290, 273, 254, 228, 182, 169]
100 10 [22, 35, 75, 107, 139, 211, 341, 614, 938, 1419, 2016, 1404, 933, 563, 373, 232, 165, 94, 60, 57, 26]
100 15 [0, 1, 6, 8, 9, 38, 80, 225, 681, 1949, 4057, 1895, 660, 234, 94, 33, 19, 7, 1, 1, 0]
150  5 [160, 163, 203, 193, 211, 256, 257, 286, 288, 326, 357, 308, 284, 285, 255, 237, 211, 225, 193, 163, 180]
150 10 [81, 86, 118, 160, 270, 319, 396, 644, 849, 1098, 1429, 1097, 778, 592, 491, 368, 248, 179, 121, 108, 81]
150 15 [6, 8, 17, 22, 52, 110, 197, 440, 865, 1780, 2859, 1847, 858, 442, 239, 109, 73, 25, 13, 6, 10]

abs counts
 50  5 [940, 1595, 1394, 1058, 895, 715, 571, 466, 387, 308, 262]
 50 10 [3571, 3750, 1520, 642, 250, 128, 60, 30, 18, 7, 7]
 50 15 [6414, 3115, 404, 51, 12, 2, 1, 1, 0, 0, 0]
100  5 [514, 886, 889, 739, 631, 600, 569, 515, 442, 365, 330]
100 10 [2016, 2823, 1871, 1177, 714, 443, 304, 201, 135, 92, 48]
100 15 [4057, 3844, 1341, 459, 174, 71, 28, 15, 7, 2, 0]
150  5 [357, 634, 572, 571, 512, 493, 422, 418, 396, 326, 340]
150 10 [1429, 2195, 1627, 1236, 887, 687, 518, 339, 239, 194, 162]
150 15 [2859, 3627, 1723, 882, 436, 219, 125, 47, 30, 14, 16]

Calculate the expected values

In [16]:

print("n    k   E(raw)  E(abs)")
for n, k, raw_counts, abs_counts in trials:
    e_raw = sum(k*v for k,v in raw_counts.items())/ntrials
    e_abs = sum(k*v for k,v in abs_counts.items())/ntrials
    print("{:3}  {:2}  {:6.2f}  {:6.2f}".format(n, k, e_raw, e_abs))

n    k   E(raw)  E(abs)
 50   5   -0.07    5.77
 50  10   -0.03    1.15
 50  15   -0.00    0.41
100   5   -0.05   11.54
100  10    0.01    2.38
100  15   -0.00    0.93
150   5    0.45   17.28
150  10    0.02    3.47
150  15    0.04    1.45

Build a pandas DataFrame. The data in the columns is the raw counts for sums in the plot_range. The columns are labelled (n,k).

In [17]:

raw_df = pd.DataFrame({(n,k):[cnt[i] for i in range(*plot_range)] for n,k,cnt,_ in trials}, index=range(*plot_range))

In [18]:

plt.figure(); raw_df.plot(figsize=(12,8)); plt.title('Count of raw sums'); plt.legend(loc='best')

Out[18]:

<matplotlib.legend.Legend at 0x8892d30>

<matplotlib.figure.Figure at 0x85ed2e8>

This pandas DataFrame is for the abs counts (i.e., the distance from zero).

In [19]:

abs_df = pd.DataFrame({(n,k):[cnt[i] for i in range(plot_range[1])] for n,k,_,cnt in trials}, index=range(plot_range[1]))

In [20]:

plt.figure(); abs_df.plot(figsize=(12,8)); plt.title('Count of abs(sum)'); plt.legend(loc='best')

Out[20]:

<matplotlib.legend.Legend at 0x8cdae48>

<matplotlib.figure.Figure at 0x85edfd0>

In [ ]: