import numba
print(numba.__version__)
0.17.0
Numba allows the compilation of selected portions of Python code to native code, using LLVM as its backend. This allows the selected functions to execute at a speed competitive with code generated by C compilers.
It works at the function level: we can take a function and generate native code for it, as well as the wrapper code needed to call it directly from Python. This compilation is done on the fly and in memory.
In this notebook I will illustrate some very simple usage of numba.
Let's start with a simple, yet time consuming function: a Python implementation of bubblesort. This bubblesort implementation works on a NumPy array.
def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
                # X[i], X[i + 1] = X[i + 1], X[i]
Now, let's try the function to check that it works. First we'll create an array of sorted values and randomly shuffle them:
import numpy as np
original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)
Now we'll create a copy and do our bubble sort on the copy:
sorted = shuffled.copy()
bubblesort(sorted)
print(np.array_equal(sorted, original))
True
Let's see how it behaves in terms of execution time:
sorted[:] = shuffled[:]
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)
1 loops, best of 3: 317 ms per loop
Note that execution time may depend on the input, and the function itself is destructive, so I make sure to use the same input in all timings by copying the original shuffled array into the array being sorted. %timeit makes several runs and takes the best result; if the copy weren't done inside the timed code, the vector would only be unsorted in the first iteration. Since bubblesort runs faster on vectors that are already sorted, the later runs would be selected, and we would end up measuring bubblesort on an already sorted array. In our case the copy time is minimal, though:
%timeit sorted[:] = shuffled[:]
The slowest run took 19.48 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 771 ns per loop
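The same measurement can be reproduced with the standard timeit module instead of the %timeit magic. A minimal sketch follows, with the pure-Python bubblesort repeated so the snippet is self-contained (the shorthand tuple swap is the commented-out alternative from the listing above):

```python
import timeit

import numpy as np


def bubblesort(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            if X[i] > X[i + 1]:
                X[i], X[i + 1] = X[i + 1], X[i]


original = np.arange(0.0, 10.0, 0.01, dtype='f4')
shuffled = original.copy()
np.random.shuffle(shuffled)
sorted_arr = shuffled.copy()


def run():
    # Copy inside the timed statement, just like in the %timeit cell,
    # so every repetition sorts the same unsorted input.
    sorted_arr[:] = shuffled
    bubblesort(sorted_arr)


t = timeit.timeit(run, number=3)
print("3 runs took %.3f s in total" % t)
```

The important detail is the same as with %timeit: the reset of `sorted_arr` is part of the timed statement, so each repetition sorts a genuinely unsorted array.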
Let's get a numba version of this code running. One way to compile a function is by using the numba.jit decorator with an explicit signature. Later, we will see that we can get by without providing such a signature by letting numba figure out the signatures by itself. However, it is useful to know what the signature is, and what role it has in numba.
First, let's start by peeking at the numba.jit docstring:
print(numba.jit.__doc__)
jit([signature_or_function, [locals={}, [target='cpu', [**targetoptions]]]])

This function is used to compile a Python function into native code.
It is designed to be used as a decorator for the function to be compiled,
but it can also be called as a regular function.

Args
----
signature_or_function: function or str
    This argument takes either the function to be compiled, or the
    signature of the function to be compiled. If this function is used
    as a decorator, the function to be compiled is the decorated
    function. In that case, this argument should only be used to
    optionally specify the function signature. If this function is
    called like a regular function, and this argument is used to
    specify the function signature, this function will return another
    jit function object which can be called again with the function to
    be compiled as this argument.

argtypes: deprecated
restype: deprecated

locals: dict
    Mapping of local variable names to Numba types. Used to override
    the types deduced by Numba's type inference engine.

targets: str
    Specifies the target platform to compile for. Valid targets are
    cpu, gpu, npyufunc, and cuda. Defaults to cpu.

targetoptions:
    For a cpu target, valid options are:

    nopython: bool
        Set to True to disable the use of PyObjects and Python API
        calls. The default behavior is to allow the use of PyObjects
        and Python API. Default value is False.

    forceobj: bool
        Set to True to force the use of PyObjects for every value.
        Default value is False.

    looplift: bool
        Set to True to enable jitting loops in nopython mode while
        leaving surrounding code in object mode. This allows functions
        to allocate NumPy arrays and use Python objects, while the
        tight loops in the function can still be compiled in nopython
        mode. Any arrays that the tight loop uses should be created
        before the loop is entered. Default value is True.

    wraparound: bool
        Set to True to enable array indexing wraparound for negative
        indices, for a small performance penalty. Default value is
        True.

Returns
-------
compiled function

Examples
--------
The function can be used in the following ways:

1) jit(signature, [target='cpu', [**targetoptions]]) -> jit(function)

    Equivalent to:

        d = dispatcher(function, targetoptions)
        d.compile(signature)

    Create a dispatcher object for a python function and default
    target-options. Then, compile the function with the given
    signature.

    Example:

        @jit("void(int32, float32)")
        def foo(x, y):
            return x + y

2) jit(function) -> dispatcher

    Same as old autojit. Create a dispatcher function object that
    specializes at call site.

    Example:

        @jit
        def foo(x, y):
            return x + y

3) jit([target='cpu', [**targetoptions]]) -> configured_jit(function)

    Same as old autojit and 2), but configured with target and default
    target-options.

    Example:

        @jit(target='cpu', nopython=True)
        def foo(x, y):
            return x + y
So let's make a compiled version of our bubblesort:
bubblesort_jit = numba.jit("void(f4[:])")(bubblesort)
At this point, bubblesort_jit contains the compiled function (wrapped so that it is directly callable from Python) generated from the original bubblesort function. Note the parameter "void(f4[:])" that is passed: it describes the signature of the function to generate (more on this later).
Let's check that it works:
sorted[:] = shuffled[:] # reset to shuffled before sorting
bubblesort_jit(sorted)
print(np.array_equal(sorted, original))
True
Now let's compare the execution time of the compiled function with that of the original:
%timeit sorted[:] = shuffled[:]; bubblesort_jit(sorted)
1000 loops, best of 3: 820 µs per loop
%timeit sorted[:] = shuffled[:]; bubblesort(sorted)
1 loops, best of 3: 307 ms per loop
Bear in mind that numba.jit is a decorator, although for practical reasons in this tutorial we will be calling it like a function so we have access to both the original function and the jitted one. In many practical uses, the decorator syntax is more appropriate. With the decorator syntax our sample looks like this:
@numba.jit("void(f4[:])")
def bubblesort_jit(X):
    N = len(X)
    for end in range(N, 1, -1):
        for i in range(end - 1):
            cur = X[i]
            if cur > X[i + 1]:
                tmp = X[i]
                X[i] = X[i + 1]
                X[i + 1] = tmp
In order to generate fast code, the compiler needs type information for the code. This allows a direct mapping from the Python operations to the appropriate machine instructions without any type check/dispatch mechanism. In numba, in most cases it suffices to specify the types of the parameters. In many cases, numba can deduce the types of intermediate values, as well as of the return value, using type inference. For convenience, it is also possible to specify the type of the return value in the signature.
A numba.jit compiled function will only work when called with the right type of arguments (it may, however, perform some conversions on types that it considers equivalent).
A signature contains the return type as well as the argument types. One way to specify the signature is using a string, as in our example. The signature takes the form: <return type>(<arg1 type>, <arg2 type>, ...). The types may be scalars or (NumPy) arrays. In our example, void(f4[:]) means a function with no return value (return type void) that takes as its only argument a one-dimensional array of 4-byte floats (f4[:]). Starting with numba version 0.12 the result type is optional; in that case the signature looks like (<arg1 type>, <arg2 type>, ...). When the signature doesn't provide a type for the return value, the type is inferred.
One way to specify the signature is by using such a string, the type for each argument being based on NumPy dtype strings for base types. Array types are also supported by using the [:] notation, where [:] is a one-dimensional strided array, [::1] is a one-dimensional contiguous array, [:,:] a two-dimensional strided array, [:,:,:] a three-dimensional array, and so on. There are other ways to build the signature; you can find more details on signatures in the documentation.
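The strided-versus-contiguous distinction can be inspected on the NumPy side without numba at all. A small illustration: a freshly allocated array is C-contiguous (so it would match an [::1] type), while a stepped slice of it is merely strided (matching only the generic [:] form):

```python
import numpy as np

a = np.zeros(10, dtype='f4')  # freshly allocated: C-contiguous
b = a[::2]                    # stepped view over the same data: strided

print(a.flags['C_CONTIGUOUS'])  # True
print(b.flags['C_CONTIGUOUS'])  # False
print(a.strides, b.strides)     # (4,) vs (8,): byte step between elements
```

The strides make the difference visible: in `a` consecutive elements are 4 bytes apart (one float32), while in `b` they are 8 bytes apart, so `b` cannot be described by a contiguous array type.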
Some sample signatures follow:
signature | meaning
---|---
`void(f4[:], u8)` | a function with no return value taking a one-dimensional array of single precision floats and a 64-bit unsigned integer.
`i4(f8)` | a function returning a 32-bit signed integer taking a double precision float as argument.
`void(f4[:,:], f4[:,:])` | a function with no return value taking two two-dimensional arrays as arguments.
For a more in-depth explanation on supported types you can take a look at the "Numba types" notebook tutorial.
Starting with numba version 0.12, it is possible to use numba.jit without providing a type-signature for the function. This functionality was provided by numba.autojit in previous versions of numba. The old numba.autojit has been deprecated in favour of this signature-less version of numba.jit.
When no type-signature is provided, the decorator returns wrapper code that will automatically create and run a numba-compiled version when called. When called, the resulting function infers the types of the arguments being used, and that information is used to generate the signature for compilation. The resulting compiled function is then called with the provided arguments.
For performance reasons, functions are cached so that code is only compiled once for a given signature. It is possible to call the function with different signatures; in that case, different native code will be generated and the right version will be chosen based on the argument types.
For most uses, using jit without a signature will be the simplest option.
bubblesort_autojit = numba.jit(bubblesort)
%timeit sorted[:] = shuffled[:]; bubblesort_autojit(sorted)
The slowest run took 90.48 times longer than the fastest. This could mean that an intermediate result is being cached 1000 loops, best of 3: 1.32 ms per loop
There is no magic; there are several details that are good to know about numba.
First, compiling takes time. Luckily it will not be a lot of time, especially for small functions, but when compiling many functions with many specializations the time may add up. Numba tries to do its best by caching compilations as much as possible, so no time is spent on spurious compilation. It also tries to be lazy about compiling, so no compilation time is paid for code that is not used.
Second, not all code is compiled equal. There will be code that numba compiles down to an efficient native function. Sometimes the generated code has to fall back to the Python object system and its dispatch semantics. Other code may not compile at all.
When targeting the "cpu" target (the default), numba will either generate:
- Fast native code (also called 'nopython' mode): the compiler was able to infer all the types in the function, so it can translate the code to a fast native routine without making use of the Python runtime.
- Native code with calls to the Python run-time (also called object mode): the compiler was not able to infer all the types, so at some point a value was typed as a generic 'object'. This means the full native version can't be used. Instead, numba generates code using the Python run-time; this should be faster than plain interpretation but quite far from what you could expect from a full native function.
By default, the 'cpu' target tries to compile the function in 'nopython' mode. If this fails, it tries again in object mode.
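The cost of routing arithmetic through generic Python objects can be felt even without numba. The sketch below, in plain Python, contrasts a cheap integer operation with the same loop going through Decimal's boxed arithmetic, which is the kind of object-protocol work object mode has to fall back to (the absolute timings are illustrative only):

```python
import time
from decimal import Decimal


def mod_int(n):
    out = [0] * n
    for i in range(n):
        out[i] = i % 100  # cheap integer operation
    return out


def mod_decimal(n):
    out = [0] * n
    d = Decimal(100)
    for i in range(n):
        out[i] = i % d  # boxed arithmetic through the object protocol
    return out


n = 100000
t0 = time.time(); a = mod_int(n); t_int = time.time() - t0
t0 = time.time(); b = mod_decimal(n); t_dec = time.time() - t0
print("int: %.4fs  Decimal: %.4fs" % (t_int, t_dec))
```

Both loops compute the same values; the difference is purely in how much per-operation machinery each element goes through, which mirrors the nopython-versus-object-mode gap shown in the numba example below.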
This example shows how falling back to Python objects may cause a slowdown in the generated code:
@numba.jit("void(i1[:])")
def test(value):
    for i in xrange(len(value)):
        value[i] = i % 100
from decimal import Decimal
@numba.jit("void(i1[:])")
def test2(value):
    for i in xrange(len(value)):
        value[i] = i % Decimal(100)
res = np.zeros((10000,), dtype="i1")
%timeit test(res)
The slowest run took 15.52 times longer than the fastest. This could mean that an intermediate result is being cached 10000 loops, best of 3: 26.6 µs per loop
%timeit test2(res)
1 loops, best of 3: 367 ms per loop
It is possible to force an error when nopython code generation fails. This gives feedback about whether code that doesn't rely on the Python run-time can be generated for a given function. That feedback helps when trying to write fast code, as object mode can carry a huge performance penalty.
@numba.jit("void(i1[:])", nopython=True)
def test(value):
    for i in xrange(len(value)):
        value[i] = i % 100
On the other hand, test2 fails if we pass the nopython keyword:
@numba.jit("void(i1[:])", nopython=True)
def test2(value):
    for i in xrange(len(value)):
        value[i] = i % Decimal(100)
---------------------------------------------------------------------------
TypingError                               Traceback (most recent call last)
<ipython-input-19-6038b783c49c> in <module>()
----> 1 @numba.jit("void(i1[:])", nopython=True)
      2 def test2(value):
      3     for i in xrange(len(value)):
      4         value[i] = i % Decimal(100)

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/decorators.pyc in wrapper(func)
    169         disp = dispatcher(py_func=func, locals=locals,
    170                           targetoptions=targetoptions)
--> 171         disp.compile(sig)
    172         disp.disable_compile()
    173         return disp

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/dispatcher.pyc in compile(self, sig)
    275                                       self.py_func,
    276                                       args=args, return_type=return_type,
--> 277                                       flags=flags, locals=self.locals)
    278
    279         # Check typing error if object mode is used

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library)
    545     pipeline = Pipeline(typingctx, targetctx, library,
    546                         args, return_type, flags, locals)
--> 547     return pipeline.compile_extra(func)
    548
    549

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in compile_extra(self, func)
    291                 raise e
    292
--> 293         return self.compile_bytecode(bc, func_attr=self.func_attr)
    294
    295     def compile_bytecode(self, bc, lifted=(),

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in compile_bytecode(self, bc, lifted, func_attr)
    299         self.lifted = lifted
    300         self.func_attr = func_attr
--> 301         return self._compile_bytecode()
    302
    303     def compile_internal(self, bc, func_attr=DEFAULT_FUNCTION_ATTRIBUTES):

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in _compile_bytecode(self)
    532
    533         pm.finalize()
--> 534         return pm.run(self.status)
    535
    536

/Users/aterrel/workspace/apps/anaconda/envs/pydata_apps/lib/python2.7/site-packages/numba/compiler.pyc in run(self, status)
    189                 # No more fallback pipelines?
    190                 if is_final_pipeline:
--> 191                     raise patched_exception
    192                 # Go to next fallback pipeline
    193                 else:

TypingError: Failed at nopython (nopython frontend)
Untyped global name 'Decimal'
File "<ipython-input-19-6038b783c49c>", line 4