import numba
print(numba.__version__)
0.17.0
Numba allows the compilation of selected portions of Python code to native code, using LLVM as its backend. This allows the selected functions to execute at a speed competitive with code generated by C compilers.
It works at the function level: we can take a function and generate native code for it, as well as the wrapper code needed to call it directly from Python. This compilation is done on the fly and in memory.
In this notebook I will illustrate some very simple usage of numba.
Let's start with a simple, yet time consuming function: a Python implementation of bubblesort. This bubblesort implementation works on a NumPy array.
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
                # X[i], X[i + 1] = X[i + 1], X[i]
Now, let's try the function to check that it works. First we'll create an array of sorted values and randomly shuffle them:
import numpy as np
original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)
Now we'll create a copy and do our bubble sort on the copy:
sorted = shuffled.copy()
bubblesort(sorted)
print(np.array_equal(sorted, original))
True
Let's see how it behaves in terms of execution time:
sorted[:] = shuffled[:]
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)
1 loops, best of 3: 317 ms per loop
Note that execution time may depend on the input, and the function itself is destructive, so I make sure to use the same input in all timings by copying the original shuffled array into the array being sorted. %timeit makes several runs and takes the best result; if the copy weren't done inside the timed code, the vector would only be unsorted in the first iteration. Since bubblesort runs faster on vectors that are already sorted, the later runs would be selected, and we would end up measuring bubblesort on an already sorted array. In our case the copy time is minimal, though:
%timeit sorted[:] = shuffled[:]
The slowest run took 19.48 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 771 ns per loop
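The same measurement can be reproduced with the standard timeit module instead of the %timeit magic. A minimal sketch follows, with the pure-Python bubblesort repeated so the snippet is self-contained (the shorthand tuple swap is the commented-out alternative from the listing above):

```python
import timeit

import numpy as np


def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            if X[i] > X[i + 1]:
                X[i], X[i + 1] = X[i + 1], X[i]


original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)
sorted_arr = shuffled.copy()


def run():
    # Copy inside the timed statement, just like in the %timeit cell,
    # so every repetition sorts the same unsorted input.
    sorted_arr[:] = shuffled
    bubblesort(sorted_arr)


t = timeit.timeit(run, number=3)
print("3 runs took %.3f s in total" % t)
```

The important detail is the same as with %timeit: the reset of `sorted_arr` is part of the timed statement, so each repetition sorts a genuinely unsorted array.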
Let's get a numba version of this code running. One way to compile a function is by using the numba.jit decorator with an explicit signature. Later, we will see that we can get by without providing such a signature by letting numba figure out the signatures by itself. However, it is useful to know what the signature is, and what role it has in numba.
First, let's start by peeking at the numba.jit docstring:
print(numba.jit.__doc__)
jit([signature_or_function, [locals={}, [target='cpu', [**targetoptions]]]])

This function is used to compile a Python function into native code.
It is designed to be used as a decorator for the function to be compiled,
but it can also be called as a regular function.

Args
----
signature_or_function: function or str
    This argument takes either the function to be compiled, or the
    signature of the function to be compiled. If this function is used
    as a decorator, the function to be compiled is the decorated
    function. In that case, this argument should only be used to
    optionally specify the function signature. If this function is
    called like a regular function, and this argument is used to
    specify the function signature, this function will return another
    jit function object which can be called again with the function to
    be compiled as this argument.

argtypes: deprecated
restype: deprecated

locals: dict
    Mapping of local variable names to Numba types. Used to override
    the types deduced by Numba's type inference engine.

targets: str
    Specifies the target platform to compile for. Valid targets are
    cpu, gpu, npyufunc, and cuda. Defaults to cpu.

targetoptions:
    For a cpu target, valid options are:

    nopython: bool
        Set to True to disable the use of PyObjects and Python API
        calls. The default behavior is to allow the use of PyObjects
        and Python API. Default value is False.

    forceobj: bool
        Set to True to force the use of PyObjects for every value.
        Default value is False.

    looplift: bool
        Set to True to enable jitting loops in nopython mode while
        leaving surrounding code in object mode. This allows functions
        to allocate NumPy arrays and use Python objects, while the
        tight loops in the function can still be compiled in nopython
        mode. Any arrays that the tight loop uses should be created
        before the loop is entered. Default value is True.

    wraparound: bool
        Set to True to enable array indexing wraparound for negative
        indices, for a small performance penalty. Default value is
        True.

Returns
-------
compiled function

Examples
--------
The function can be used in the following ways:

1) jit(signature, [target='cpu', [**targetoptions]]) -> jit(function)

    Equivalent to:

        d = dispatcher(function, targetoptions)
        d.compile(signature)

    Create a dispatcher object for a python function and default
    target-options. Then, compile the function with the given
    signature.

    Example:

        @jit("void(int32, float32)")
        def foo(x, y):
            return x + y

2) jit(function) -> dispatcher

    Same as old autojit. Create a dispatcher function object that
    specializes at call site.

    Example:

        @jit
        def foo(x, y):
            return x + y

3) jit([target='cpu', [**targetoptions]]) -> configured_jit(function)

    Same as old autojit and 2), but configured with target and default
    target-options.

    Example:

        @jit(target='cpu', nopython=True)
        def foo(x, y):
            return x + y
So let's make a compiled version of our bubblesort:
bubblesort_jit = numba.jit("void(f4[:])")(bubblesort)
At this point, bubblesort_jit contains the compiled function (wrapped so that it is directly callable from Python) generated from the original bubblesort function. Note the parameter "void(f4[:])" that is passed: it describes the signature of the function to generate (more on this later).
Let's check that it works:
sorted[:] = shuffled[:] # reset to shuffled before sorting
bubblesort_jit(sorted)
print(np.array_equal(sorted, original))
True
Now let's compare the execution time of the compiled function with that of the original:
%timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)
1000 loops, best of 3: 820 µs per loop
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)
1 loops, best of 3: 307 ms per loop
Bear in mind that numba.jit is a decorator, although for practical reasons in this tutorial we will be calling it like a function so we have access to both the original function and the jitted one. In many practical uses, the decorator syntax is more appropriate. With the decorator syntax our sample looks like this:
@numba.jit("void(f4[:])")
def bubblesort_jit(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
In order to generate fast code, the compiler needs type information for the code. This allows a direct mapping from the Python operations to the appropriate machine instructions without any type check/dispatch mechanism. In numba, in most cases it suffices to specify the types of the parameters. In many cases, numba can deduce the types of intermediate values, as well as of the return value, using type inference. For convenience, it is also possible to specify the type of the return value in the signature.
A numba.jit compiled function will only work when called with the right type of arguments (it may, however, perform some conversions on types that it considers equivalent).
A signature contains the return type as well as the argument types. One way to specify the signature is using a string, as in our example. The signature takes the form: <return type>(<arg1 type>, <arg2 type>, ...). The types may be scalars or (NumPy) arrays. In our example, void(f4[:]) means a function with no return value (return type void) that takes as its only argument a one-dimensional array of 4-byte floats (f4[:]). Starting with numba version 0.12 the result type is optional; in that case the signature looks like (<arg1 type>, <arg2 type>, ...). When the signature doesn't provide a type for the return value, the type is inferred.
One way to specify the signature is by using such a string, the type for each argument being based on NumPy dtype strings for base types. Array types are also supported by using the [:] notation, where [:] is a one-dimensional strided array, [::1] is a one-dimensional contiguous array, [:,:] a two-dimensional strided array, [:,:,:] a three-dimensional array, and so on. There are other ways to build the signature; you can find more details on signatures in the documentation.
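The strided-versus-contiguous distinction can be inspected on the NumPy side without numba at all. A small illustration: a freshly allocated array is C-contiguous (so it would match an [::1] type), while a stepped slice of it is merely strided (matching only the generic [:] form):

```python
import numpy as np

a = np.zeros(10, dtype='f4')  # freshly allocated: C-contiguous
b = a[::2]                    # stepped view over the same data: strided

print(a.flags['C_CONTIGUOUS'])  # True
print(b.flags['C_CONTIGUOUS'])  # False
print(a.strides, b.strides)     # (4,) vs (8,): byte step between elements
```

The strides make the difference visible: in `a` consecutive elements are 4 bytes apart (one float32), while in `b` they are 8 bytes apart, so `b` cannot be described by a contiguous array type.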
Some sample signatures follow:
signature | meaning
---|---
`void(f4[:], u8)` | a function with no return value taking a one-dimensional array of single precision floats and a 64-bit unsigned integer.
`i4(f8)` | a function returning a 32-bit signed integer taking a double precision float as argument.
`void(f4[:,:], f4[:,:])` | a function with no return value taking two two-dimensional arrays as arguments.
For a more in-depth explanation on supported types you can take a look at the "Numba types" notebook tutorial.
Starting with numba version 0.12, it is possible to use numba.jit without providing a type-signature for the function. This functionality was provided by numba.autojit in previous versions of numba. The old numba.autojit has been deprecated in favour of this signature-less version of numba.jit.
When no type-signature is provided, the decorator returns wrapper code that will automatically create and run a numba-compiled version when called. When called, the resulting function infers the types of the arguments being used, and that information is used to generate the signature for compilation. The resulting compiled function is then called with the provided arguments.
For performance reasons, functions are cached so that code is only compiled once for a given signature. It is possible to call the function with different signatures; in that case, different native code will be generated and the right version will be chosen based on the argument types.
For most uses, using jit without a signature will be the simplest option.
bubblesort_autojit = numba.jit(bubblesort)
%timeit sorted[:] = shuffled[:]; bubblesort_autojit(sorted)
The slowest run took 90.48 times longer than the fastest. This could mean that an intermediate result is being cached 1000 loops, best of 3: 1.32 ms per loop
There is no magic; there are several details that are good to know about numba.
First, compiling takes time. Luckily it will not be a lot of time, especially for small functions, but when compiling many functions with many specializations the time may add up. Numba tries to do its best by caching compilations as much as possible, so no time is spent on spurious compilation. It also tries to be lazy about compiling, so no compilation time is paid for code that is not used.
Second, not all code is compiled equal. There will be code that numba compiles down to an efficient native function. Sometimes the generated code has to fall back to the Python object system and its dispatch semantics. Other code may not compile at all.
When targeting the "cpu" target (the default), numba will either generate:
- Fast native code (also called 'nopython' mode): the compiler was able to infer all the types in the function, so it can translate the code to a fast native routine without making use of the Python runtime.
- Native code with calls to the Python run-time (also called object mode): the compiler was not able to infer all the types, so at some point a value was typed as a generic 'object'. This means the full native version can't be used. Instead, numba generates code using the Python run-time; this should be faster than plain interpretation but quite far from what you could expect from a full native function.
By default, the 'cpu' target tries to compile the function in 'nopython' mode. If this fails, it tries again in object mode.
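The cost of routing arithmetic through generic Python objects can be felt even without numba. The sketch below, in plain Python, contrasts a cheap integer operation with the same loop going through Decimal's boxed arithmetic, which is the kind of object-protocol work object mode has to fall back to (the absolute timings are illustrative only):

```python
import time
from decimal import Decimal


def mod_int(n):
    out = [0] * n
    for i in range(n):
        out[i] = i % 100  # cheap integer operation
    return out


def mod_decimal(n):
    out = [0] * n
    d = Decimal(100)
    for i in range(n):
        out[i] = i % d  # boxed arithmetic through the object protocol
    return out


n = 100000
t0 = time.time(); a = mod_int(n); t_int = time.time() - t0
t0 = time.time(); b = mod_decimal(n); t_dec = time.time() - t0
print("int: %.4fs  Decimal: %.4fs" % (t_int, t_dec))
```

Both loops compute the same values; the difference is purely in how much per-operation machinery each element goes through, which mirrors the nopython-versus-object-mode gap shown in the numba example below.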
This example shows how falling back to Python objects may cause a slowdown in the generated code:
@numba.jit("void(i1[:])")
def test(value):
    for i in xrange(len(value)):
        value[i] = i % 100
from decimal import Decimal
@numba.jit("void(i1[:])")
def test2(value):
    for i in xrange(len(value)):
        value[i] = i % Decimal(100)
res = np.zeros((10000,), dtype="i1")
%timeit test(res)
The slowest run took 15.52 times longer than the fastest. This could mean that an intermediate result is being cached 10000 loops, best of 3: 26.6 µs per loop
%timeit test2(res)
1 loops, best of 3: 367 ms per loop
It is possible to force an error when nopython code generation fails. This gives feedback about whether code that doesn't rely on the Python run-time can be generated for a given function. That feedback helps when trying to write fast code, as object mode can carry a huge performance penalty.
@numba.jit("void(i1[:])", nopython=True)
def test(value):
    for i in xrange(len(value)):
        value[i] = i % 100
On the other hand, test2 fails if we pass the nopython keyword:
@numba.jit("void(i1[:])", nopython=True)
def test2(value):
    for i in xrange(len(value)):
        value[i] = i % Decimal(100)
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-19-6038b783c49c> in <module>()
----> 1 @numba.jit("void(i1[:])", nopython=True)
      2 def test2(value):
      3     for i in xrange(len(value)):
      4         value[i] = i % Decimal(100)

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/decorators.pyc in wrapper(func)
    169         disp = dispatcher(py_func=func, locals=locals,
    170                           targetoptions=targetoptions)
--> 171         disp.compile(sig)
    172         disp.disable_compile()
    173         return disp

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/dispatcher.pyc in compile(self, sig)
    275                                       self.py_func,
    276                                       args=args, return_type=return_type,
--> 277                                       flags=flags, locals=self.locals)
    278
    279         # Check typing error if object mode is used

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library)
    545     pipeline = Pipeline(typingctx, targetctx, library,
    546                         args, return_type, flags, locals)
--> 547     return pipeline.compile_extra(func)
    548
    549

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in compile_extra(self, func)
    291                 raise e
    292
--> 293         return self.compile_bytecode(bc, func_attr=self.func_attr)
    294
    295     def compile_bytecode(self, bc, lifted=(),

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in compile_bytecode(self, bc, lifted, func_attr)
    299         self.lifted = lifted
    300         self.func_attr = func_attr
--> 301         return self._compile_bytecode()
    302
    303     def compile_internal(self, bc, func_attr=DEFAULT_FUNCTION_ATTRIBUTES):

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in _compile_bytecode(self)
    532
    533         pm.finalize()
--> 534         return pm.run(self.status)
    535
    536

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in run(self, status)
    189                 # No more fallback pipelines?
    190                 if is_final_pipeline:
--> 191                     raise patched_exception
    192                 # Go to next fallback pipeline
    193                 else:

TypingError: Failed at nopython (nopython frontend)
Untyped global name 'Decimal'
File "<ipython-input-19-6038b783c49c>", line 4