A Client is the low-level object which manages your connection to the various Schedulers and the Hub. Everything you do passes through one of these objects, either indirectly or directly.
It has an ids property, which is always an up-to-date list of the integer engine IDs currently available.
import os, sys, time
import numpy
from IPython import parallel
rc = parallel.Client()
rc.ids
The most basic function of the Client is to create the View objects, which are the interfaces for actual communication with the engines.
There are two basic models for working with engines. Let's start with the simplest case for remote execution, a DirectView of one engine:
e0 = rc[0] # index-access of a client gives us a DirectView
e0.block = True # let's start synchronous
e0
It's all about:
view.apply(f, *args, **kwargs)
We want the interface for remote and parallel execution to be as natural as possible.
And what's the most natural unit of execution? Code! Simply define a function, just as you would locally, and instead of calling it, pass it to view.apply(), with the remaining arguments just as you would have passed them to the function.
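As a mental model (the LocalView class below is purely illustrative, not part of IPython.parallel), view.apply(f, *args, **kwargs) behaves exactly like calling f(*args, **kwargs), except that the call runs on the engine:

```python
# Hypothetical stand-in for a DirectView, to illustrate the calling
# convention of view.apply() -- not part of IPython.parallel.
class LocalView:
    def apply(self, f, *args, **kwargs):
        # A real DirectView serializes f and its arguments, sends them
        # to the engine, and returns the (possibly asynchronous) result.
        # Here we just call f in the local process.
        return f(*args, **kwargs)

view = LocalView()
result = view.apply(pow, 2, 10)  # equivalent to pow(2, 10)
```

The point is that the signature is unchanged: whatever you would have passed to f, you pass to apply after f itself.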
def get_norms(A, levels=[2]):
    """get all the requested norms for an array"""
    norms = {}
    for level in levels:
        norms[level] = numpy.linalg.norm(A, level)
    return norms
A = numpy.random.random(1024)
get_norms(A, levels=[1,2,3,numpy.inf])
To call this remotely, simply replace 'get_norms(' with 'e0.apply(get_norms,'. This replacement is generally true for turning local execution into remote. Note that this will probably raise a NameError on numpy:
e0.apply(get_norms, A, levels=[1,2,3,numpy.inf])
The simplest way to import numpy is to do:
e0.execute("import numpy")
But if you want to simultaneously import modules locally and remotely, you can use view.sync_imports():
with e0.sync_imports():
    import numpy

e0.apply(get_norms, A, levels=[1,2,3,numpy.inf])
Functions don't have to be interactively defined; you can use module functions as well:
e0.apply(numpy.linalg.norm, A, 2)
You can also run files or strings, with run and execute respectively. For instance, I have a script myscript.py that defines a function mysquare:
import math
import numpy
import sys

a = 5

def mysquare(x):
    return x*x
I can run that remotely, just like I can locally with %run, and then I will have mysquare(), and any imports and globals from the script, in the engine's namespace:
%pycat myscript.py
e0.run("myscript.py")
e0.execute("b=mysquare(a)")
e0['a']
e0['b']
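As a rough sketch (an analogy, not ipyparallel's actual implementation), execute behaves like exec of a string against the engine's namespace, and run like exec of a file's contents:

```python
import os
import tempfile

ns = {}  # stand-in for the engine's namespace

# "run" a script: execute a file's contents in the namespace
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write("a = 5\n\ndef mysquare(x):\n    return x*x\n")
    path = f.name
with open(path) as script:
    exec(script.read(), ns)
os.unlink(path)

# "execute" a string against the same namespace
exec("b = mysquare(a)", ns)
```

After this, ns holds a, mysquare, and b, just as the engine's namespace does after run and execute.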
The namespace on the engine is accessible to your functions as globals. So if you want to work with values that persist in the engine namespace, you just use global variables.
def inc_a(increment):
    global a
    a += increment

print(" %2i" % e0['a'])
e0.apply(inc_a, 5)
print(" + 5")
print(" = %2i" % e0['a'])
And just like the rest of Python, you don’t have to specify global variables if you aren’t assigning to them:
def mul_by_a(b):
    return a*b

e0.apply(mul_by_a, 10)
If you want to do multiple actions on data, you obviously don't want to send it every time. For this, we have a Reference class. A Reference is just a wrapper around an identifier that gets deserialized by pulling the corresponding object out of the engine namespace.
def is_it_a(b):
    return a is b

e0.apply(is_it_a, 5)
e0.apply(is_it_a, parallel.Reference('a'))
parallel.Reference is useful to avoid repeated data movement.
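How a Reference resolves can be pictured with a small local model (Ref and apply_with_refs below are illustrative, not the real implementation): before calling the function, the engine swaps each Reference argument for the object of that name in its own namespace:

```python
class Ref:
    # Toy version of parallel.Reference: just carries a name.
    def __init__(self, name):
        self.name = name

def apply_with_refs(f, namespace, *args):
    # Swap each Ref for the named object in the (engine-side) namespace
    # before calling f -- only the tiny Ref travels, not the data.
    resolved = [namespace[a.name] if isinstance(a, Ref) else a
                for a in args]
    return f(*resolved)

engine_ns = {'a': [1, 2, 3]}  # stand-in for the engine's namespace
total = apply_with_refs(sum, engine_ns, Ref('a'))  # sum([1, 2, 3])
```

This is why the second is_it_a call above can return True: the function receives the actual engine-side object, not a copy sent from the client.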
In addition to calling functions and executing code on engines, you can transfer Python objects to and from your IPython session and the engines. In IPython, these operations are called push (sending an object to the engines) and pull (getting an object from the engines).
push takes a dictionary, used to update the remote namespace:
e0.push(dict(a=1.03234, b=3453))
pull takes one or more keys:
e0.pull('a')
e0.pull(('b','a'))
Treating a DirectView like a dictionary results in push/pull operations:
e0['a'] = range(5)
e0.execute('b = a[::-1]')
e0['b']
get() and update() work as well.
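The dictionary analogy is deliberately close to Python's own dict semantics; a plain dict (standing in for the engine namespace, not a real DirectView) shows the same operations:

```python
# Plain-dict analogy for the DirectView dictionary interface.
ns = {}
ns['a'] = list(range(5))          # like e0['a'] = range(5)   (push)
a = ns['a']                       # like e0['a']              (pull)
ns.update(dict(b=3453))           # like e0.update(...)
missing = ns.get('c', 'default')  # like e0.get('c', 'default')
```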
Can you get the eigenvalues (numpy.linalg.eigvals) and norms (numpy.linalg.norm) of an array that's already on e0?
A = numpy.random.random((16,16))
A = A.dot(A.T)
e0['A'] = A
numpy.linalg.eigvals(A)
numpy.linalg.norm(A, 2)
We have covered the basic methods for running code remotely, but we have been using block=True. We can also do non-blocking execution.
e0.block = False
In non-blocking mode, apply submits the command to be executed and then returns an AsyncResult object immediately. The AsyncResult object gives you a way of getting a result at a later time through its get() method.
The AsyncResult object provides a superset of the interface in multiprocessing.pool.AsyncResult. See the official Python documentation for more.
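Since the interface is a superset of multiprocessing.pool.AsyncResult, the same ready()/get() pattern can be exercised locally with just the standard library (a ThreadPool here, no IPython engines involved):

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

pool = ThreadPool(1)
ar = pool.apply_async(square, (7,))  # returns an AsyncResult immediately
ar.wait()                  # block until the call completes
ready = ar.ready()         # True once the result is available
value = ar.get(timeout=5)  # fetch the result (raises on timeout)
pool.close()
pool.join()
```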
def wait(t):
    import time
    tic = time.time()
    time.sleep(t)
    return time.time() - tic

ar = e0.apply(wait, 10)
ar
ar.ready() tells us if the result is ready:
ar.ready()
ar.get() blocks until the result is ready, or until a timeout is reached if one is specified:
%time ar.get(1)
%time ar.get()
For convenience, you can set block for a single call with the extra sync/async methods:
e0.apply_sync(os.getpid)
ar = e0.apply_async(os.getpid)
ar
ar.get()
ar.metadata
Now that we have the basic interface covered, we can really get going in parallel.