Using dill to pickle anything

IPython.parallel doesn't do much in the way of serialization. It has custom zero-copy handling of numpy arrays, but beyond that it does only the bare minimum to make basic interactively defined functions and classes sendable.

There are a few projects that extend pickle to make just about anything sendable, and one of these is dill.

NOTE: This requires IPython 1.0-dev and latest dill, currently only available from:

pip install dill-0.2a.dev-20120503.tar.gz

First, as always, we create our example: a function with a closure

In [1]:
def make_closure(a):
    """make a function with a closure, and return it"""
    def has_closure(b):
        return a * b
    return has_closure
In [2]:
closer = make_closure(5)
In [3]:
closer(2)
Out[3]:
10
In [4]:
import pickle

Without help, pickle can't deal with closures

In [5]:
pickle.dumps(closer)
---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
<ipython-input-5-2dad7d7aaa8a> in <module>()
----> 1 pickle.dumps(closer)

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py in dumps(obj, protocol)
   1372 def dumps(obj, protocol=None):
   1373     file = StringIO()
-> 1374     Pickler(file, protocol).dump(obj)
   1375     return file.getvalue()
   1376 

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py in dump(self, obj)
    222         if self.proto >= 2:
    223             self.write(PROTO + chr(self.proto))
--> 224         self.save(obj)
    225         self.write(STOP)
    226 

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288 

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py in save_global(self, obj, name, pack)
    746             raise PicklingError(
    747                 "Can't pickle %r: it's not found as %s.%s" %
--> 748                 (obj, module, name))
    749         else:
    750             if klass is not obj:

PicklingError: Can't pickle <function has_closure at 0x10b6ac140>: it's not found as __main__.has_closure
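To see why, note that standard pickle serializes functions *by reference*: it records the module and name, and the receiver re-imports them. A closure exists only inside the call that created it, so that lookup has nowhere to point. A minimal stdlib sketch (on Python 2 this raises the PicklingError shown above; recent Python 3 raises AttributeError instead, for the same underlying reason):

```python
import pickle

def make_closure(a):
    """make a function with a closure, and return it"""
    def has_closure(b):
        return a * b
    return has_closure

closer = make_closure(5)

# the closed-over value lives in a cell attached to the function object
print(closer.__closure__[0].cell_contents)  # 5

# standard pickle stores functions by reference (module + name),
# and has_closure is not importable as a module attribute
try:
    pickle.dumps(closer)
except (pickle.PicklingError, AttributeError) as e:
    print(type(e).__name__)
```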

But after we import dill, magic happens

In [6]:
import dill
In [7]:
pickle.dumps(closer)[:64] + '...'
Out[7]:
"cdill.dill\n_load_type\np0\n(S'FunctionType'\np1\ntp2\nRp3\n(cdill.dill..."

So from now on, pretty much everything is pickleable.
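As an aside, when dill isn't an option, this particular pattern also has a stdlib workaround: `functools.partial` applied to a module-level function, which standard pickle handles by reference. A sketch, where `multiply` is an illustrative stand-in for the body of `make_closure`:

```python
import pickle
from functools import partial

def multiply(a, b):
    return a * b

# behaves like make_closure(5), but pickles with plain stdlib pickle,
# because partial stores a reference to the module-level multiply
closer = partial(multiply, 5)
restored = pickle.loads(pickle.dumps(closer))
print(restored(2))  # 10
```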

Now use this in IPython.parallel

As usual, we start by creating our Client and View

In [8]:
from IPython import parallel
rc = parallel.Client()
view = rc.load_balanced_view()

We can use dill to allow IPython.parallel to send anything.

Step 1. Disable IPython's special handling of function objects, which dill makes unnecessary:

In [9]:
from types import FunctionType
from IPython.utils.pickleutil import can_map

can_map.pop(FunctionType, None)
Out[9]:
IPython.utils.pickleutil.CannedFunction

Step 2. Point the serialize module at the pure-Python pickle module instead of cPickle (dill extends pickle so it can dump almost anything, but it can't hook into the optimized cPickle):

In [10]:
from IPython.kernel.zmq import serialize
serialize.pickle = pickle
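This swap works because the serialize module resolves `pickle.dumps` through its module global at *call* time, so reassigning the attribute redirects every later call. A toy stand-in (the `toy_serialize` module and `FakePickle` are hypothetical, for illustration only):

```python
import pickle
import types

# a toy module that, like IPython's serialize, binds a pickle-ish
# module at import time and calls pickle.dumps(...) later
toy = types.ModuleType("toy_serialize")
toy.pickle = pickle  # plays the role of the original cPickle binding
exec("def dumps(obj):\n    return pickle.dumps(obj)", toy.__dict__)

class FakePickle(object):
    @staticmethod
    def dumps(obj):
        return b"fake:" + repr(obj).encode()

# module globals are looked up on each call, so this redirects
# toy.dumps -- the same trick as `serialize.pickle = pickle`
toy.pickle = FakePickle
print(toy.dumps([1, 2]))  # b'fake:[1, 2]'
```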

And that's it! Now we can send closures and other previously non-pickleables to our engines.

In [11]:
view.apply_sync(closer, 3)
Out[11]:
15

But wait, there's a catch: the engines haven't been patched yet, so they can't pickle the closure that make_closure returns

In [12]:
view.apply_sync(make_closure, 2)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Users/minrk/dev/ip/mine/IPython/kernel/zmq/serialize.pyc in serialize_object(obj, buffer_threshold, item_threshold)
    107         buffers.extend(_extract_buffers(cobj, buffer_threshold))
    108 
--> 109     buffers.insert(0, pickle.dumps(cobj,-1))
    110     return buffers
    111 
/Users/minrk/dev/ip/mine/IPython/utils/codeutil.pyc in reduce_code(co)
     32 def reduce_code(co):
     33     if co.co_freevars or co.co_cellvars:
---> 34         raise ValueError("Sorry, cannot pickle code objects with closures")
     35     args =  [co.co_argcount, co.co_nlocals, co.co_stacksize,
     36             co.co_flags, co.co_code, co.co_consts, co.co_names,
ValueError: Sorry, cannot pickle code objects with closures

We can also apply the very same patch on the engines, so we can send previously unpickleable objects in both directions. Here's a way to apply the dill patch everywhere at once, using %%px --local:

In [13]:
%%px --local

# load dill
import dill

# disable special function handling
from types import FunctionType
from IPython.utils.pickleutil import can_map

can_map.pop(FunctionType, None)

# fallback to pickle instead of cPickle, so that dill can take over
import pickle
from IPython.kernel.zmq import serialize
serialize.pickle = pickle
In [14]:
remote_closure = view.apply_sync(make_closure, 4)
remote_closure(5)
Out[14]:
20

At this point, we can send/recv all kinds of stuff

In [15]:
def outer(a):
    def inner(b):
        def inner_again(c):
            return c * b * a
        return inner_again
    return inner

So outer returns a function with a closure, which returns a function with a closure.
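Before involving the engines, we can inspect what these nested closures look like locally: `inner_again` carries closure cells for both `a` and `b`. A small stdlib sketch:

```python
def outer(a):
    def inner(b):
        def inner_again(c):
            return c * b * a
        return inner_again
    return inner

f = outer(1)(2)

# inner_again closed over both a and b; each lives in its own cell
print(sorted(cell.cell_contents for cell in f.__closure__))  # [1, 2]
print(f(3))  # 6
```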

Now, we can resolve the first closure on the engine, the second here, and the third on a different engine, after passing through a lambda we define here and call there, just for good measure.

In [16]:
view.apply_sync(lambda f: f(3), view.apply_sync(outer, 1)(2))
Out[16]:
6