Here is the fastest way to achieve parallel computation in an IPython notebook when applying a function to independent data points via the map idiom.
The full tutorial can be found here.
We define the function we want to use. In this case we want to calculate this classic integral: $$ \int_0^x e^{-t^2}\, dt $$
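As a sanity check on the numbers below, note that this is just a rescaled error function, so the values should approach $\sqrt{\pi}/2$: $$ \int_0^x e^{-t^2}\, dt = \frac{\sqrt{\pi}}{2}\operatorname{erf}(x) \to \frac{\sqrt{\pi}}{2} \approx 0.8862269 \quad \text{as } x\to\infty $$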
Start with the standard calculation using scipy.integrate.quad:
from scipy.integrate import quad
from numpy import exp

def integrand(x):
    # the Gaussian integrand e^(-x^2)
    return exp(-x**2)

print "integrand(1) =", integrand(1)

def integral(x):
    # quad returns (value, error estimate); keep only the value
    return quad(integrand, 0, x)[0]

print "integral(1) =", integral(1)
integrand(1) = 0.367879441171
integral(1) = 0.746824132812
Our benchmark will be the calculation of the integral on the segments $[0,n]$ for $n \in \{0, 1, 2, \dots, N-1\}$, i.e. the values produced by range(N):
N = 1000
%timeit -n 10 map(integral, range(N))
10 loops, best of 3: 618 ms per loop
To do the parallelization we first need to start an IPython cluster - this can be done from the Clusters tab of the IPython dashboard.
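If you prefer the terminal, the same cluster can be started with the ipcluster command (here assuming 4 engines; adjust -n to your machine):

ipcluster start -n 4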
Then these two lines will import the parallelization client and get a worker pool from it:
from IPython.parallel import Client
pool = Client()[:]
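Indexing the Client like this returns a DirectView; the [:] slice takes all available engines, but you can also slice out a subset. A minimal sketch, assuming at least two engines are running:

from IPython.parallel import Client

client = Client()
pool_all = client[:]    # a DirectView over all engines
pool_two = client[0:2]  # a DirectView over the first two engines only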
To make sure it worked, let's check how many workers/CPUs we have (the number of workers is set when you start the cluster in the IPython dashboard):
print "# CPUs",len(pool.client.ids)
# CPUs 4
We need to set up the environment on the engines - do the imports and load the required functions and variables (the engines are separate processes, so there is no shared memory!):
with pool.sync_imports():
    # run these imports on every engine as well as locally
    from scipy.integrate import quad
    from numpy import exp

# push the integrand function to all engines
pool['integrand'] = integrand
importing quad from scipy.integrate on engine(s)
importing exp from numpy on engine(s)
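As a side note, the dictionary-style assignment above is shorthand for the view's push method, so the same transfer can be written explicitly as:

pool.push(dict(integrand=integrand))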
Now we can run the computation in parallel using map_sync, which is similar to map:
print pool.map_sync(integral, range(10))
[0.0, 0.7468241328124271, 0.8820813907624215, 0.8862073482595214, 0.8862269117895689, 0.8862269254513955, 0.8862269254527582, 0.8862269254527579, 0.8862269254527579, 0.8862269254527579]
%timeit -n 10 pool.map(integral, range(N))
10 loops, best of 3: 1.97 ms per loop
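One caveat about that last number: unlike map_sync, pool.map does not block - it returns an AsyncMapResult immediately, so the timing above mostly measures how fast the tasks are submitted, not how fast they complete. To time the full round trip, benchmark the blocking variant instead (timing not shown here; the speedup is bounded by the number of engines):

%timeit -n 10 pool.map_sync(integral, range(N))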
That's it.
The notebook can be found at http://ipython.yoavram.com.
This notebook is licensed under CC-BY-SA 3.0.