A Client is the low-level object which manages your connection to the various Schedulers and the Hub. Everything you do passes through one of these objects, either indirectly or directly.
It has an ids property, which is always an up-to-date list of the integer engine IDs currently available.
import os, sys, time
import numpy
from IPython import parallel
rc = parallel.Client()
rc.ids
The most basic function of the Client is to create the View objects, which are the interfaces for actual communication with the engines.
There are two basic models for working with engines. Let's start with the simplest case for remote execution, a DirectView of one engine:
e0 = rc[0] # index-access of a client gives us a DirectView
e0.block = True # let's start synchronous
e0
It's all about:
view.apply(f, *args, **kwargs)
We want the interface for remote and parallel execution to be as natural as possible.
And what's the most natural unit of execution? Code! Simply define a function, just as you would locally, and instead of calling it, pass it to view.apply(), with the remaining arguments just as you would have passed them to the function.
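As a mental model (the LocalView class below is purely illustrative, not part of IPython.parallel), view.apply(f, *args, **kwargs) behaves exactly like calling f(*args, **kwargs), except that the call runs on the engine:

```python
# Hypothetical stand-in for a DirectView, to illustrate the calling
# convention of view.apply() -- not part of IPython.parallel.
class LocalView:
    def apply(self, f, *args, **kwargs):
        # A real DirectView serializes f and its arguments, sends them
        # to the engine, and returns the (possibly asynchronous) result.
        # Here we just call f in the local process.
        return f(*args, **kwargs)

view = LocalView()
result = view.apply(pow, 2, 10)  # equivalent to pow(2, 10)
```

The point is that the signature is unchanged: whatever you would have passed to f, you pass to apply after f itself.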
def get_norms(A, levels=[2]):
    """get all the requested norms for an array"""
    norms = {}
    for level in levels:
        norms[level] = numpy.linalg.norm(A, level)
    return norms
A = numpy.random.random(1024)
get_norms(A, levels=[1,2,3,numpy.inf])
To call this remotely, simply replace 'get_norms(' with 'e0.apply(get_norms,'. This replacement is generally true for turning local execution into remote. Note that this will probably raise a NameError on numpy:
e0.apply(get_norms, A, levels=[1,2,3,numpy.inf])
The simplest way to import numpy is to do:
e0.execute("import numpy")
But if you want to simultaneously import modules locally and remotely, you can use view.sync_imports():
with e0.sync_imports():
    import numpy

e0.apply(get_norms, A, levels=[1,2,3,numpy.inf])
Functions don't have to be interactively defined; you can use module functions as well:
e0.apply(numpy.linalg.norm, A, 2)
You can also run files or strings, with run and execute respectively. For instance, I have a script myscript.py that defines a function mysquare:
import math
import numpy
import sys

a = 5

def mysquare(x):
    return x*x
I can run that remotely, just like I can locally with %run, and then I will have mysquare(), and any imports and globals from the script, in the engine's namespace:
%pycat myscript.py
e0.run("myscript.py")
e0.execute("b=mysquare(a)")
e0['a']
e0['b']
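As a rough sketch (an analogy, not ipyparallel's actual implementation), execute behaves like exec of a string against the engine's namespace, and run like exec of a file's contents:

```python
import os
import tempfile

ns = {}  # stand-in for the engine's namespace

# "run" a script: execute a file's contents in the namespace
with tempfile.NamedTemporaryFile('w', suffix='.py', delete=False) as f:
    f.write("a = 5\n\ndef mysquare(x):\n    return x*x\n")
    path = f.name
with open(path) as script:
    exec(script.read(), ns)
os.unlink(path)

# "execute" a string against the same namespace
exec("b = mysquare(a)", ns)
```

After this, ns holds a, mysquare, and b, just as the engine's namespace does after run and execute.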
The namespace on the engine is accessible to your functions as globals. So if you want to work with values that persist in the engine namespace, you just use global variables.
def inc_a(increment):
    global a
    a += increment

print(" %2i" % e0['a'])
e0.apply(inc_a, 5)
print(" + 5")
print(" = %2i" % e0['a'])
And just like the rest of Python, you don’t have to specify global variables if you aren’t assigning to them:
def mul_by_a(b):
    return a*b

e0.apply(mul_by_a, 10)
If you want to do multiple actions on data, you obviously don't want to send it every time. For this, we have a Reference class. A Reference is just a wrapper around an identifier that gets deserialized by pulling the corresponding object out of the engine namespace.
def is_it_a(b):
    return a is b

e0.apply(is_it_a, 5)
e0.apply(is_it_a, parallel.Reference('a'))
parallel.Reference is useful to avoid repeated data movement.
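How a Reference resolves can be pictured with a small local model (Ref and apply_with_refs below are illustrative, not the real implementation): before calling the function, the engine swaps each Reference argument for the object of that name in its own namespace:

```python
class Ref:
    # Toy version of parallel.Reference: just carries a name.
    def __init__(self, name):
        self.name = name

def apply_with_refs(f, namespace, *args):
    # Swap each Ref for the named object in the (engine-side) namespace
    # before calling f -- only the tiny Ref travels, not the data.
    resolved = [namespace[a.name] if isinstance(a, Ref) else a
                for a in args]
    return f(*resolved)

engine_ns = {'a': [1, 2, 3]}  # stand-in for the engine's namespace
total = apply_with_refs(sum, engine_ns, Ref('a'))  # sum([1, 2, 3])
```

This is why the second is_it_a call above can return True: the function receives the actual engine-side object, not a copy sent from the client.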
In addition to calling functions and executing code on engines, you can transfer Python objects to and from your IPython session and the engines. In IPython, these operations are called push (sending an object to the engines) and pull (getting an object from the engines).
push takes a dictionary, used to update the remote namespace:
e0.push(dict(a=1.03234, b=3453))
pull takes one or more keys:
e0.pull('a')
e0.pull(('b','a'))
Treating a DirectView like a dictionary results in push/pull operations:
e0['a'] = range(5)
e0.execute('b = a[::-1]')
e0['b']
get() and update() work as well.
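The dictionary analogy is deliberately close to Python's own dict semantics; a plain dict (standing in for the engine namespace, not a real DirectView) shows the same operations:

```python
# Plain-dict analogy for the DirectView dictionary interface.
ns = {}
ns['a'] = list(range(5))          # like e0['a'] = range(5)   (push)
a = ns['a']                       # like e0['a']              (pull)
ns.update(dict(b=3453))           # like e0.update(...)
missing = ns.get('c', 'default')  # like e0.get('c', 'default')
```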
Can you get the eigenvalues (numpy.linalg.eigvals) and norms (numpy.linalg.norm) of an array that's already on e0?
A = numpy.random.random((16,16))
A = A.dot(A.T)
e0['A'] = A
numpy.linalg.eigvals(A)
numpy.linalg.norm(A, 2)
We have covered the basic methods for running code remotely, but we have been using block=True. We can also do non-blocking execution.
e0.block = False
In non-blocking mode, apply submits the command to be executed and then returns an AsyncResult object immediately. The AsyncResult object gives you a way of getting a result at a later time through its get() method.
The AsyncResult object provides a superset of the interface in multiprocessing.pool.AsyncResult. See the official Python documentation for more.
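Since the interface is a superset of multiprocessing.pool.AsyncResult, the same ready()/get() pattern can be exercised locally with just the standard library (a ThreadPool here, no IPython engines involved):

```python
from multiprocessing.pool import ThreadPool

def square(x):
    return x * x

pool = ThreadPool(1)
ar = pool.apply_async(square, (7,))  # returns an AsyncResult immediately
ar.wait()                  # block until the call completes
ready = ar.ready()         # True once the result is available
value = ar.get(timeout=5)  # fetch the result (raises on timeout)
pool.close()
pool.join()
```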
def wait(t):
    import time
    tic = time.time()
    time.sleep(t)
    return time.time() - tic

ar = e0.apply(wait, 10)
ar
ar.ready() tells us if the result is ready:
ar.ready()
ar.get() blocks until the result is ready, or until a timeout is reached if one is specified:
%time ar.get(1)
%time ar.get()
For convenience, you can set block for a single call with the extra sync/async methods:
e0.apply_sync(os.getpid)
ar = e0.apply_async(os.getpid)
ar
ar.get()
ar.metadata
Now that we have the basic interface covered, we can really get going in parallel.