Here we illustrate the basic parallel computing capabilities of IPython.
First, IPython engines must be started, for example with the following command to launch 2 engines (one per core):
ipcluster start -n 2
from IPython.parallel import Client
The Client class allows us to submit jobs to the engines.
rc = Client()
We can obtain the engines identifiers through the client.
rc.ids
[0, 1]
ERRATUM: the original code did not contain %px before the import os statement. This magic command is necessary so that the import occurs on all engines.
%px import os
The %px magic command allows us to execute commands in parallel on every engine.
%px print(os.getpid())
[stdout:0] 3256
[stdout:1] 1056
With %pxconfig, we can specify the engine identifiers on which subsequent commands should be executed (here, the second engine).
%pxconfig --targets 1
%px print(os.getpid())
1056
Another possibility is to use the %%px cell magic to run an entire cell on all engines. The --targets option also accepts a slice object (here, all engines except the last one).
%%px --targets :-1
print(os.getpid())
[stdout:0] 3256
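The --targets slice follows standard Python slicing semantics. As a plain-list illustration (the ids list below is just an example matching our two engines):

```python
# Standard Python slice semantics, as used by --targets.
ids = [0, 1]      # example engine identifiers
print(ids[:-1])   # all engines but the last -> [0]
print(ids[::2])   # every other engine -> [0]
```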
By default, the parallel calls are synchronous (blocking) but we can ask IPython to make asynchronous calls.
%%px --noblock
import time
time.sleep(1)
os.getpid()
<AsyncResult: execute>
With asynchronous (non-blocking) calls, the results can be retrieved later from the engines with %pxresult. This call blocks until the results are available.
%pxresult
Out[1:4]: 1056
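This submit-then-collect pattern is not specific to IPython. A minimal stdlib sketch with concurrent.futures (our own stand-in, not IPython's API) shows the same non-blocking submission and blocking retrieval:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(x):
    time.sleep(0.1)  # simulate work done on an engine
    return x * x

with ThreadPoolExecutor(max_workers=1) as ex:
    fut = ex.submit(task, 7)  # returns immediately, like --noblock
    print(fut.done())         # False while the task is still running
    print(fut.result())       # blocks until the result is ready, like %pxresult
```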
Another option for running tasks on the engines is to use map. First, we need to retrieve a view on the engines, which represents a particular subset of the running engines.
v = rc[:]
We import a module on each engine.
with v.sync_imports():
    import time
importing time on engine(s)
We define a simple function.
def f(x):
    time.sleep(1)
    return x * x
Now, we call map_sync
, which is a synchronous and parallel version of Python's built-in map
function. We execute f
on all integers between 0 and 9 in parallel across all engines.
v.map_sync(f, range(10))
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
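For comparison, the same ordered, blocking map pattern can be sketched with the stdlib concurrent.futures module (a stand-in for readers without a running cluster):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    return x * x

# Like map_sync, executor.map blocks until all results are available
# and returns them in input order.
with ThreadPoolExecutor(max_workers=2) as ex:
    results = list(ex.map(f, range(10)))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```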
We check how much time the equivalent serial computation takes. Note that in Python 3, map is lazy, so we wrap it in list() to force the evaluation:
%timeit -n 1 -r 1 list(map(f, range(10)))
1 loops, best of 1: 10 s per loop
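As a sanity check of that 10-second figure: ten serial calls to a function that sleeps for one second take about ten seconds. A shortened, self-contained version (0.01 s sleeps instead of 1 s):

```python
import time

def f(x):
    time.sleep(0.01)  # shortened from 1 s for this illustration
    return x * x

t0 = time.perf_counter()
results = [f(x) for x in range(10)]
elapsed = time.perf_counter() - t0
print(results)
print(elapsed)  # roughly 10 * 0.01 = 0.1 s
```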
And we compare with the time taken by the parallel version.
r = v.map(f, range(10))
r.ready(), r.elapsed
(False, 0.065)
We wait and get the results.
r.get()
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
r.elapsed, r.serial_time
(5.009, 10.0)
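Here serial_time sums the compute time spent across all engines, whereas elapsed is the wall-clock time; their ratio gives the parallel speedup. With the numbers above and two engines:

```python
# Values reported above for two engines.
serial_time = 10.0  # total compute time across engines (s)
elapsed = 5.009     # wall-clock time (s)

speedup = serial_time / elapsed
print(round(speedup, 2))  # close to 2, the number of engines
```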