This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
We import the built-in random Python module and NumPy.
import random
import numpy as np
We use the %precision magic (defined in IPython) to show only three decimals in the Python output. This just keeps the output compact.
%precision 3
We generate two Python lists, x and y, each containing one million random numbers between 0 and 1.
n = 1000000
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]
x[:3], y[:3]
We compute the element-wise sum of these lists: the first element of x plus the first element of y, and so on. We use a for loop in a list comprehension.
z = [x[i] + y[i] for i in range(n)]
z[:3]
IPython provides the %timeit magic command to quickly evaluate the time taken by a single command.
%timeit [x[i] + y[i] for i in range(n)]
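Outside IPython, a similar measurement can be made with the standard library's timeit module. The following is a minimal sketch, not what %timeit does internally: the smaller n, the helper name add_lists, and the repeat counts are assumptions chosen to keep the run short.

```python
import random
import timeit

# Smaller n than in the recipe (an assumption to keep this sketch quick).
n = 100_000
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]


def add_lists():
    """Element-wise sum of x and y with a list comprehension."""
    return [x[i] + y[i] for i in range(n)]


# Each measurement calls the function 10 times; we take 3 measurements
# and keep the fastest one to reduce noise from other processes.
timings = timeit.repeat(add_lists, number=10, repeat=3)
best_per_call = min(timings) / 10
print(f"best of 3: {best_per_call * 1e3:.3f} ms per call")
```

Taking the minimum rather than the mean is a common convention, since slower runs usually reflect interference from the rest of the system rather than the code itself.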
To repeat the operation with NumPy, we first need to convert our lists to arrays. The np.array() function does just that.
xa = np.array(x)
ya = np.array(y)
xa[:3]
The arrays xa and ya contain exactly the same numbers as our original lists x and y. Whereas those lists were instances of the built-in class list, our arrays are instances of the NumPy class ndarray. Those types are implemented very differently in Python and NumPy. We will see that, in this example, using arrays instead of lists leads to drastic performance improvements.
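The difference between the two types can be inspected directly. The short sketch below uses a five-element list (an assumption made for readability) and prints the container type, the element type, and the size of the array's data buffer. A list holds references to separate Python float objects, while an ndarray stores values of a single dtype in one buffer.

```python
import random

import numpy as np

# A tiny list (an assumption) so that the printed values are easy to read.
x = [random.random() for _ in range(5)]
xa = np.array(x)

print(type(x))      # the built-in list class
print(type(xa))     # the NumPy ndarray class
print(type(x[0]))   # each list item is a separate Python float object
print(xa.dtype)     # a single element type shared by the whole array
print(xa.itemsize)  # number of bytes per element
print(xa.nbytes)    # total number of bytes in the data buffer
```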
To compute the element-wise sum of these arrays, we do not need a for loop anymore. In NumPy, adding two arrays means adding the elements of the arrays component by component.
za = xa + ya
za[:3]
We see that the list z and the array za contain the same elements (the sums of the corresponding numbers in x and y).
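Rather than comparing the first three values by eye, the equality can be checked over all elements. This is a small sketch that rebuilds both results on a shorter input (n = 1000 is an assumption) and compares them with np.allclose().

```python
import random

import numpy as np

# Shorter input than in the recipe (an assumption to keep the check fast).
n = 1000
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]

# Pure Python element-wise sum.
z = [x[i] + y[i] for i in range(n)]

# NumPy element-wise sum.
za = np.array(x) + np.array(y)

# Compare every element, allowing for tiny floating-point differences.
print(np.allclose(z, za))
```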
%timeit xa + ya
We observe that this operation is more than one order of magnitude faster in NumPy than in pure Python!
Next, we compute the sum of all elements in x or xa. Although this is not an element-wise operation, NumPy is still highly efficient here. The pure Python version uses the built-in sum function on an iterable. The NumPy version uses the np.sum() function on a NumPy array.
%timeit sum(x)  # pure Python
%timeit np.sum(xa) # NumPy
We also observe an impressive speedup here.
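The two functions do not necessarily add the numbers in the same order, so their results may differ in the last few digits. The sketch below compares both against math.fsum(), which returns a correctly rounded sum. The smaller n and the tolerance of 1e-9 are assumptions for illustration.

```python
import math
import random

import numpy as np

# Smaller input than in the recipe (an assumption to keep the sketch quick).
n = 100_000
x = [random.random() for _ in range(n)]
xa = np.array(x)

s_py = sum(x)           # built-in sum on a list
s_np = np.sum(xa)       # NumPy sum on an array
s_ref = math.fsum(x)    # correctly rounded reference sum

print(s_py, s_np, s_ref)

# The results agree to within a small relative tolerance.
print(math.isclose(s_py, s_np, rel_tol=1e-9))
print(math.isclose(s_ref, s_np, rel_tol=1e-9))
```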
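Finally, we compute the absolute difference between every pair of numbers taken from the first 1000 elements of x and y. The pure Python version uses two nested for loops.
d = [abs(x[i] - y[j])
     for i in range(1000) for j in range(1000)]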
d[:3]
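The NumPy version indexes xa with None to turn it into a column, so that the subtraction produces a 2D array of all pairwise differences.
da = np.abs(xa[:1000, None] - ya[:1000])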
da
%timeit [abs(x[i] - y[j]) for i in range(1000) for j in range(1000)]
%timeit np.abs(xa[:1000, None] - ya[:1000])
Here again, we observe significant speedups.
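The expression xa[:1000, None] - ya[:1000] relies on NumPy broadcasting. The sketch below reproduces it on four elements (the size m = 4 and the name col are assumptions) to show the shapes involved, and checks that the 2D result matches the flat list built by the nested loops.

```python
import random

import numpy as np

# A tiny size (an assumption) so that the shapes are easy to follow.
m = 4
x = [random.random() for _ in range(m)]
y = [random.random() for _ in range(m)]
xa, ya = np.array(x), np.array(y)

# Indexing with None adds an axis: xa becomes a column of shape (m, 1).
col = xa[:, None]
print(col.shape, ya.shape)

# Shapes (m, 1) and (m,) are broadcast to (m, m):
# da[i, j] is the absolute difference between x[i] and y[j].
da = np.abs(col - ya)
print(da.shape)

# The pure Python list is flat, with i as the outer loop, so it matches
# the 2D array flattened row by row.
d = [abs(x[i] - y[j]) for i in range(m) for j in range(m)]
print(np.allclose(da.ravel(), d))
```

Note that the NumPy result is a 2D array while the pure Python result is a flat list, which is why the comparison flattens da first.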
You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).