Working with interactively defined classes in IPython.parallel

Create our Client

In [1]:
from IPython import parallel
client = parallel.Client()
lbv = client.load_balanced_view()
dview = client[:]
dview
<DirectView [0, 1]>

The problem

Interactively defined classes can be awkward to use with IPython.parallel. This stems from how pickle serializes class instances:

In [2]:
# the real model object is much more involved...
class MyModel(object):
    def maximize(self, r):
        return max(r, 4)

def evaluate(r, model):
    return model.maximize(r)
In [3]:
import cPickle as pickle

m = MyModel()
print pickle.dumps(m)

Note how the pickled model is almost nothing but a reference to __main__.MyModel and object. This is how the instance will be reconstructed remotely. The problem is that MyModel is not defined on the engines:

In [4]:
model = MyModel()
lbv.apply_sync(evaluate, 2, model)
AttributeError                            Traceback (most recent call last)
/Users/minrk/dev/ip/mine/IPython/kernel/zmq/serialize.pyc in unpack_apply_message(bufs, g, copy)
    191     args = []
    192     for i in range(info['nargs']):
--> 193         arg, arg_bufs = unserialize_object(arg_bufs, g)
    194         args.append(arg)
    195     args = tuple(args)
/Users/minrk/dev/ip/mine/IPython/kernel/zmq/serialize.pyc in unserialize_object(buffers, g)
    130         # a zmq message
    131         pobj = bytes(pobj)
--> 132     canned = pickle.loads(pobj)
    133     if istype(canned, sequence_types) and len(canned) < MAX_ITEMS:
    134         for c in canned:
AttributeError: 'DummyMod' object has no attribute 'MyModel'

This is a common problem with relying on interactively defined names, but there are issues peculiar to classes, as opposed to locally defined functions.
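The failure can be reproduced without a cluster. The following is a minimal sketch, not IPython's own code: a throwaway module named interactive_ns (a hypothetical stand-in for __main__) plays the role of the interactive namespace, and deleting the class from it imitates an engine where the class was never defined.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the interactive namespace (__main__ in a notebook).
interactive_ns = types.ModuleType("interactive_ns")
sys.modules["interactive_ns"] = interactive_ns


class MyModel(object):
    def maximize(self, r):
        return max(r, 4)


# Make pickle look the class up in the stand-in module.
MyModel.__module__ = "interactive_ns"
MyModel.__qualname__ = "MyModel"
interactive_ns.MyModel = MyModel

payload = pickle.dumps(MyModel())

# The payload names the module and the class, but carries no method code.
names_class = b"interactive_ns" in payload and b"MyModel" in payload
carries_methods = b"maximize" in payload

# "Client side": the class is defined, so the instance round-trips.
local_result = pickle.loads(payload).maximize(2)

# "Engine side": the module exists but the class was never defined there.
del interactive_ns.MyModel
try:
    pickle.loads(payload)
    engine_error = None
except AttributeError as exc:
    engine_error = exc

print(names_class, carries_methods, local_result, type(engine_error).__name__)
```

The instance unpickles only while the name it refers to can be looked up, which is why the class has to exist on every engine before a task that receives an instance can run.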

The old way

In IPython 0.13, there are a few shortcomings:

  1. Classes cannot be pushed, unlike functions.
  2. There is no way to use %%px to define a class both locally and remotely.

So defining a class everywhere is a two-step process, either manually:

In [19]:
class MyModel1(object):
    def maximize(self, r):
        return max(r, 0)
In [20]:
%%px
# same definition again, this time run on the engines
class MyModel1(object):
    def maximize(self, r):
        return max(r, 0)
In [21]:
model = MyModel1()
lbv.apply_sync(evaluate, 1, model)

Or more automatically:

In [22]:
class MyModel2(object):
    def maximize(self, r):
        return max(r, 2)
In [23]:
# this just executes the previous cell on all our engines
# (note that In[-1] is *this* cell, so In[-2] is the previous one)
dview.execute(In[-2], block=True)
<AsyncResult: finished>
In [24]:
model = MyModel2()
lbv.apply_sync(evaluate, 1, model)
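Both steps can be wrapped in one small helper. This is a sketch under assumptions: define_everywhere is a hypothetical name, not part of IPython, and the only thing it relies on from the view is the execute(code, block=True) call used above. A recording stub stands in for the DirectView so the example runs without a cluster.

```python
def define_everywhere(source, view, local_ns):
    """Hypothetical helper: run source locally, then on every engine."""
    exec(source, local_ns)                   # step 1: define locally
    return view.execute(source, block=True)  # step 2: define on the engines


class RecordingView(object):
    """Stub standing in for a DirectView; it only records what it was sent."""

    def __init__(self):
        self.executed = []

    def execute(self, code, block=None):
        self.executed.append(code)


source = (
    "class MyModel2(object):\n"
    "    def maximize(self, r):\n"
    "        return max(r, 2)\n"
)

local_ns = {}
view = RecordingView()
define_everywhere(source, view, local_ns)

model = local_ns["MyModel2"]()
print(model.maximize(1))
```

With a running cluster, the real dview would take the place of the stub and globals() the place of local_ns.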

The new way

IPython 1.0 alleviates this to some degree by making interactively defined classes pushable and by adding %%px --local.

So now you can use the same push step that is commonly used for tasks that span multiple functions:

In [25]:
class MyModel3(object):
    def maximize(self, r):
        return max(r, 3)
In [26]:
dview['MyModel3'] = MyModel3
In [27]:
model = MyModel3()
lbv.apply_sync(evaluate, 0, model)
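To give a rough idea of what shipping a class by value involves, here is a toy sketch written for Python 3. It is not IPython's implementation: can_class and uncan_class are hypothetical helpers, they handle only plain methods on a class deriving from object, and they rely on marshal, whose format is specific to the Python version, so both sides must run the same interpreter.

```python
import builtins
import marshal
import types


def can_class(cls):
    """Hypothetical helper: capture a simple class as plain data."""
    methods = {
        name: marshal.dumps(attr.__code__)
        for name, attr in vars(cls).items()
        if isinstance(attr, types.FunctionType)
    }
    return {"name": cls.__name__, "methods": methods}


def uncan_class(canned, namespace):
    """Hypothetical helper: rebuild the class inside the given namespace."""
    namespace.setdefault("__builtins__", builtins)
    attrs = {
        name: types.FunctionType(marshal.loads(code), namespace, name)
        for name, code in canned["methods"].items()
    }
    cls = type(canned["name"], (object,), attrs)
    namespace[cls.__name__] = cls
    return cls


class MyModel3(object):
    def maximize(self, r):
        return max(r, 3)


canned = can_class(MyModel3)   # a dict holding only strings and bytes
engine_ns = {}                 # stand-in for an engine's namespace
Remote = uncan_class(canned, engine_ns)
print(Remote().maximize(0))
```

The point of the sketch is only that the definition itself travels, instead of a name that the receiving side must already know.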

Or you can define the class locally and remotely in one step with %%px --local:

In [28]:
%%px --local
# the real model object is much more involved...
class MyModel4(object):
    def maximize(self, r):
        return max(r, 4)
In [29]:
model = MyModel4()
lbv.apply_sync(evaluate, 2, model)

The new new way

Now, if you really want the bleeding edge, there is an outstanding Pull Request that lets you define dependencies for locally defined names. It attaches the local objects to the task itself, so they are guaranteed to be defined when the task runs:

In [30]:
class MyModel5(object):
    def maximize(self, r):
        return max(r, 5)
In [31]:
def evaluate(r, model):
    return model.maximize(r)
In [32]:
model = MyModel5()
lbv.apply_sync(evaluate, 2, model)