The Closest Two-Sum To Zero problem, posted at Programming Praxis, asks us to write a function that takes a list of k random integers on the interval (-n..n), finds the pair that have a sum closest to zero, and return that sum. After suggesting a solution, Praxis asked "What is the expected minimum sum for a given k and n?"
Here is some code that tries to answer that question by running many trials for various combinations of n and k, then calculate the expected values and plot some results. To keep it accessable, standard python is used to do the trials and calculations; Pandas is only used for graphing.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import itertools as it
from collections import Counter
from random import randrange
ns = (50, 100, 150) # possible values for n and k
ks = (5, 10, 15)
ntrials = 10000 # the number of trials for each (n, k)-combination.
plot_range = [-10, 11] # only plot sums that fall in this range
Generate a list of k random integers in the range(-n, n+1). Then find a pair that have the sum that is closest to zero and return the minimum sum.
def trial_min_2sum(n, k):
lst = [randrange(-n, n+1) for _ in range(k)]
lst.sort(key=abs)
return min(map(sum, zip(lst, lst[1:])), key=abs)
The question "What is the expected minimum sum for a given k and n?" is a somewhat ambiguous--is the minimum sum the actual sum or the distance from zero, i.e., the absolute value of the sum?
The code below looks at both interpretations. A "raw" prefix or suffix indicates the actual sum that is closest to zero, but could be positive or negative; whereas "abs" indicates the distance from zero, and is only positive.
trials = []
for n, k in it.product(ns, ks):
trial = [trial_min_2sum(n, k) for _ in range(ntrials)]
raw_counts = Counter(trial)
abs_counts = Counter(map(abs, trial))
trials.append((n, k, raw_counts, abs_counts))
print('raw counts')
for n, k, raw_cnt, _ in trials:
print("{:3} {:2} {}".format(n, k, [raw_cnt[i] for i in range(*plot_range)]))
print('\nabs counts')
for n, k, _, abs_cnt in trials:
print("{:3} {:2} {}".format(n, k, [abs_cnt[i] for i in range(plot_range[1])]))
raw counts 50 5 [128, 151, 194, 236, 283, 351, 450, 545, 706, 813, 940, 782, 688, 513, 445, 364, 288, 230, 193, 157, 134] 50 10 [3, 2, 9, 23, 31, 69, 120, 313, 785, 1892, 3571, 1858, 735, 329, 130, 59, 29, 7, 9, 5, 4] 50 15 [0, 0, 0, 0, 0, 0, 5, 23, 214, 1559, 6414, 1556, 190, 28, 7, 2, 1, 1, 0, 0, 0] 100 5 [161, 183, 214, 261, 296, 310, 323, 388, 432, 434, 514, 452, 457, 351, 308, 290, 273, 254, 228, 182, 169] 100 10 [22, 35, 75, 107, 139, 211, 341, 614, 938, 1419, 2016, 1404, 933, 563, 373, 232, 165, 94, 60, 57, 26] 100 15 [0, 1, 6, 8, 9, 38, 80, 225, 681, 1949, 4057, 1895, 660, 234, 94, 33, 19, 7, 1, 1, 0] 150 5 [160, 163, 203, 193, 211, 256, 257, 286, 288, 326, 357, 308, 284, 285, 255, 237, 211, 225, 193, 163, 180] 150 10 [81, 86, 118, 160, 270, 319, 396, 644, 849, 1098, 1429, 1097, 778, 592, 491, 368, 248, 179, 121, 108, 81] 150 15 [6, 8, 17, 22, 52, 110, 197, 440, 865, 1780, 2859, 1847, 858, 442, 239, 109, 73, 25, 13, 6, 10] abs counts 50 5 [940, 1595, 1394, 1058, 895, 715, 571, 466, 387, 308, 262] 50 10 [3571, 3750, 1520, 642, 250, 128, 60, 30, 18, 7, 7] 50 15 [6414, 3115, 404, 51, 12, 2, 1, 1, 0, 0, 0] 100 5 [514, 886, 889, 739, 631, 600, 569, 515, 442, 365, 330] 100 10 [2016, 2823, 1871, 1177, 714, 443, 304, 201, 135, 92, 48] 100 15 [4057, 3844, 1341, 459, 174, 71, 28, 15, 7, 2, 0] 150 5 [357, 634, 572, 571, 512, 493, 422, 418, 396, 326, 340] 150 10 [1429, 2195, 1627, 1236, 887, 687, 518, 339, 239, 194, 162] 150 15 [2859, 3627, 1723, 882, 436, 219, 125, 47, 30, 14, 16]
Calculate the expected values
print("n k E(raw) E(abs)")
for n, k, raw_counts, abs_counts in trials:
e_raw = sum(k*v for k,v in raw_counts.items())/ntrials
e_abs = sum(k*v for k,v in abs_counts.items())/ntrials
print("{:3} {:2} {:6.2f} {:6.2f}".format(n, k, e_raw, e_abs))
n k E(raw) E(abs) 50 5 -0.07 5.77 50 10 -0.03 1.15 50 15 -0.00 0.41 100 5 -0.05 11.54 100 10 0.01 2.38 100 15 -0.00 0.93 150 5 0.45 17.28 150 10 0.02 3.47 150 15 0.04 1.45
Build a pandas DataFrame. The data in the columns is the raw counts for sums in the plot_range. The columns are labelled (n,k).
raw_df = pd.DataFrame({(n,k):[cnt[i] for i in range(*plot_range)] for n,k,cnt,_ in trials}, index=range(*plot_range))
plt.figure(); raw_df.plot(figsize=(12,8)); plt.title('Count of raw sums'); plt.legend(loc='best')
<matplotlib.legend.Legend at 0x8892d30>
<matplotlib.figure.Figure at 0x85ed2e8>
This pandas DataFrame is for the abs counts (i.e., the distance from zero).
abs_df = pd.DataFrame({(n,k):[cnt[i] for i in range(plot_range[1])] for n,k,_,cnt in trials}, index=range(plot_range[1]))
plt.figure(); abs_df.plot(figsize=(12,8)); plt.title('Count of abs(sum)'); plt.legend(loc='best')
<matplotlib.legend.Legend at 0x8cdae48>
<matplotlib.figure.Figure at 0x85edfd0>