I recently learned that NumPy 1.7/1.8 will have a choice
function that randomly chooses k of n elements, with or without replacement, with or without weighted probabilities. I was interested in the simplest case of choosing without replacement and without weights (uniformly).
I took a look at the code and saw the implementation for this case is basically:
return np.random.permutation(n)[:k]
which surprised me because if n
is much larger than k
than it seems you are doing a lot of work for nothing permuting that big array.
I stumbled upon a different suggestion on stackoverflow (see the comment by Saša Šijak on this question which suggested using random.sample
- that is built-in random, not NumPy's random.
So I made a quick benchmark for my specific case:
import random
import numpy as np
def choose_no_rep_python(n,k):
return random.sample(xrange(n), k)
def choose_no_rep_numpy(n, k):
return np.random.permutation(n)[:k]
def choose_no_rep_numpy_take1(n, k):
return np.random.permutation(n).take(range(k))
def choose_no_rep_numpy_take2(n, k):
return np.random.permutation(n).take(arange(k))
def choose_no_rep_numpy_take3(n, k):
return np.random.permutation(n).take(xrange(k))
%timeit -n 10000 choose_no_rep_python(1000, 4)
%timeit -n 10000 choose_no_rep_numpy(1000, 4)
%timeit -n 10000 choose_no_rep_numpy_take1(1000, 4)
%timeit -n 10000 choose_no_rep_numpy_take2(1000, 4)
%timeit -n 10000 choose_no_rep_numpy_take3(1000, 4)
10000 loops, best of 3: 6.73 us per loop 10000 loops, best of 3: 320 us per loop 10000 loops, best of 3: 453 us per loop 10000 loops, best of 3: 439 us per loop 10000 loops, best of 3: 456 us per loop
It seems that the naive python implementation is ~50-fold faster than the NumPy implementation, which surprised me...
import os, sys, platform
print platform.system(), platform.release()
print "python version", sys.version
fin = os.popen("ipython -V")
print "ipython version", fin.read(),
fin.close()
print "numpy version", np.version.version
Windows 7 python version 2.7.3 |EPD 7.3-2 (64-bit)| (default, Apr 12 2012, 15:20:16) [MSC v.1500 64 bit (AMD64)] ipython version 0.13 numpy version 1.6.1