This notebook was put together by Jake Vanderplas for UW's Astro 599 course. Source and license info is on GitHub.
In a previous session, we talked about ways to get performance out of Python using vectorization strategies within NumPy. Today we're going to talk about approaches which involve interfacing Python to external compiled code, or converting Python code to compilable code. There are several different approaches. We'll give examples of solving a problem by several means:
There is no way to go into depth on all of these, but I mainly just want to show some examples so that you know the types of problems they solve. At the above links, you'll find more in-depth documentation and tutorials.
There are other options we won't talk about here, because they're not as useful for our purposes:
The optimal approach depends on the desired use-case. We'll go through a few examples, from lowest to highest level:
There are a few reasons you might wish to use the tools outlined below:
Sometimes vectorization of algorithms can lead to excessive memory usage, because it relies on temporary arrays to hold intermediate results. Directly implementing the routine in compiled code can lead to more efficient computation.
Sometimes problems cannot be implemented as vectorized NumPy operations. An example is in tree-based algorithms. This is why I used cython when I wrote the k-neighbors and kernel density estimation code within sklearn.neighbors
.
Sometimes there are legacy code-bases that you don't want to have to re-implement: you'd much rather just write some sort of bindings to call the old code from within Python.
We'll do a brief demonstration of some of these tools. These tools do have a bit of a learning curve, so I'll provide references for where you might look to go deeper. The purpose of this lecture is to get a taste of some of the better options for addressing these problems.
Ctypes is part of the standard library, and offers tools to call routines from compiled libraries with Python. These can either be routines in your system libraries, or in shared libraries you compile yourself. We'll see examples of both.
One weakness of ctypes is that it can be very platform-specific. For example, shared-library extensions are different from platform to platform (e.g. *.so
on Linux, *.dll
on Windows, *.dylib
on OSX). Other platform-by-platform issues like 32/64 bit can also make things difficult.
In Linux & Mac, you'll usually find the standard libraries here (remember that !
lets us execute shell commands within IPython):
!ls /usr/lib/
arc libmecabra.dylib arch_tool libmenu.5.4.dylib bundle1.o libmenu.dylib c++ libmx.A.dylib charset.alias libmx.dylib clang libncurses.5.4.dylib cron libncurses.5.dylib crt1.10.5.o libncurses.dylib crt1.10.6.o libneon.27.dylib crt1.o libneon.dylib dtrace libnetsnmp.15.1.2.dylib dyld libnetsnmp.15.dylib dylib1.10.5.o libnetsnmp.25.dylib dylib1.o libnetsnmp.5.2.1.dylib freeradius libnetsnmp.5.dylib gcrt1.o libnetsnmp.dylib groff libnetsnmpagent.25.dylib lazydylib1.o libnetsnmpagent.dylib libACSClient.dylib libnetsnmphelpers.25.dylib libBSDPClient.A.dylib libnetsnmphelpers.dylib libBSDPClient.dylib libnetsnmpmibs.25.dylib libCRFSuite.dylib libnetsnmpmibs.dylib libCRFSuite0.12.dylib libnetsnmptrapd.25.dylib libCoreStorage.dylib libnetsnmptrapd.dylib libDHCPServer.A.dylib libobjc.A.dylib libDHCPServer.dylib libobjc.dylib libDiagnosticMessagesClient.dylib libodbc.a libIOKit.A.dylib libodfde.dylib libIOKit.dylib libpam.1.dylib libLTO.dylib libpam.2.dylib libMatch.1.dylib libpam.dylib libMatch.dylib libpanel.5.4.dylib libOpenScriptingUtil.dylib libpanel.dylib libScreenReader.dylib libpcap.A.dylib libSystem.B.dylib libpcap.dylib libSystem.B_debug.dylib libpcre.0.dylib libSystem.dylib libpcre.dylib libSystem_debug.dylib libpcreposix.0.dylib libUniversalAccess.dylib libpcreposix.dylib libXplugin.1.dylib libpgtypes.3.2.dylib libXplugin.dylib libpgtypes.3.dylib libalias.A.dylib libpgtypes.dylib libalias.dylib libpoll.dylib libapr-1.0.4.5.dylib libpq.5.4.dylib libapr-1.0.dylib libpq.5.dylib libapr-1.dylib libpq.dylib libaprutil-1.0.3.12.dylib libproc.dylib libaprutil-1.0.dylib libprofile_rt.a libaprutil-1.dylib libprofile_rt.dylib libarchive.2.dylib libpthread.dylib libarchive.dylib libpython.dylib libatlas.dylib libpython2.5.dylib libauditd.0.dylib libpython2.6.dylib libauditd.dylib libpython2.7.dylib libauto.dylib libquit.dylib libblas.dylib libreadline.dylib libbsm.0.dylib libresolv.9.dylib libbsm.dylib libresolv.dylib libbz2.1.0.5.dylib librpcsvc.dylib libbz2.1.0.dylib libruby.1.dylib libbz2.dylib libruby.dylib libc++.1.dylib libsandbox.1.dylib libc++.dylib libsandbox.dylib libc++abi.dylib libsasl2.2.0.1.dylib libc.dylib libsasl2.2.0.15.dylib libcblas.dylib libsasl2.2.0.21.dylib libcharset.1.0.0.dylib libsasl2.2.0.22.dylib libcharset.1.dylib libsasl2.2.dylib libcharset.dylib libsasl2.dylib libclang.dylib libsqlite3.0.dylib libclapack.dylib libsqlite3.dylib libcom_err.dylib libssl.0.9.7.dylib libcpp_kext.a libssl.0.9.8.dylib libcrypto.0.9.7.dylib libssl.dylib libcrypto.0.9.8.dylib libstdc++.6.0.9.dylib libcrypto.dylib libstdc++.6.dylib libcsfde.dylib libstdc++.dylib libcups.2.dylib libsvn_client-1.0.0.0.dylib libcups.dylib libsvn_client-1.0.dylib libcupscgi.1.dylib libsvn_client-1.dylib libcupscgi.dylib libsvn_delta-1.0.0.0.dylib libcupsimage.2.dylib libsvn_delta-1.0.dylib libcupsimage.dylib libsvn_delta-1.dylib libcupsmime.1.dylib libsvn_diff-1.0.0.0.dylib libcupsmime.dylib libsvn_diff-1.0.dylib libcupsppdc.1.dylib libsvn_diff-1.dylib libcupsppdc.dylib libsvn_fs-1.0.0.0.dylib libcurl.3.dylib libsvn_fs-1.0.dylib libcurl.4.dylib libsvn_fs-1.dylib libcurl.dylib libsvn_fs_fs-1.0.0.0.dylib libcurses.dylib libsvn_fs_fs-1.0.dylib libdbm.dylib libsvn_fs_fs-1.dylib libdes425.dylib libsvn_fs_util-1.0.0.0.dylib libdl.dylib libsvn_fs_util-1.0.dylib libdtrace.dylib libsvn_fs_util-1.dylib libecpg.6.dylib libsvn_ra-1.0.dylib libecpg.dylib libsvn_ra-1.dylib libecpg_compat.3.3.dylib libsvn_ra_local-1.0.0.0.dylib libecpg_compat.3.dylib libsvn_ra_local-1.0.dylib libecpg_compat.dylib libsvn_ra_local-1.dylib libedit.2.dylib libsvn_ra_neon-1.0.0.0.dylib libedit.3.0.dylib libsvn_ra_neon-1.0.dylib libedit.3.dylib libsvn_ra_neon-1.dylib libedit.dylib libsvn_ra_svn-1.0.0.0.dylib libexpat.1.5.2.dylib libsvn_ra_svn-1.0.dylib libexpat.1.dylib libsvn_ra_svn-1.dylib libexpat.dylib libsvn_repos-1.0.0.0.dylib libexslt.0.dylib libsvn_repos-1.0.dylib libexslt.dylib libsvn_repos-1.dylib libf77lapack.dylib libsvn_subr-1.0.0.0.dylib libffi.dylib libsvn_subr-1.0.dylib libform.5.4.dylib libsvn_subr-1.dylib libform.dylib libsvn_wc-1.0.0.0.dylib libgcc_s.1.dylib libsvn_wc-1.0.dylib libgcc_s.10.4.dylib libsvn_wc-1.dylib libgcc_s.10.5.dylib libsysmon.dylib libgermantok.dylib libtcl.dylib libgmalloc.B.dylib libtcl8.5.dylib libgmalloc.dylib libtclstub8.5.a libgssapi_krb5.dylib libtermcap.dylib libhunspell-1.2.0.0.0.dylib libtidy.A.dylib libhunspell-1.2.0.dylib libtidy.dylib libhunspell-1.2.dylib libtk.dylib libiconv.2.4.0.dylib libtk8.5.dylib libiconv.2.dylib libtkstub8.5.a libiconv.dylib libutil.dylib libicucore.A.dylib libutil1.0.dylib libicucore.dylib libxar.1.dylib libinfo.dylib libxar.dylib libiodbc.2.1.18.dylib libxml2.2.dylib libiodbc.2.dylib libxml2.dylib libiodbc.a libxsanmgrcommon.dylib libiodbc.dylib libxslt.1.dylib libiodbcinst.2.1.18.dylib libxslt.dylib libiodbcinst.2.dylib liby.a libiodbcinst.a libz.1.1.3.dylib libiodbcinst.dylib libz.1.2.5.dylib libipconfig.dylib libz.1.dylib libipsec.A.dylib libz.dylib libipsec.dylib pam libk5crypto.dylib php libkmod.a pkgconfig libkmodc++.a postgresql libkrb4.dylib python2.5 libkrb5.dylib python2.6 libkrb524.dylib python2.7 libkrb5support.dylib rpcsvc libl.a ruby liblangid.dylib sa liblapack.dylib sasl2 liblber.dylib sqlite3 libldap.dylib system libldap_r.dylib tclConfig.sh libm.dylib tkConfig.sh libmecab.1.0.0.dylib zsh libmecab.dylib
CTypes lets you link to any of these libraries and call the functions directly:
from ctypes import CDLL
libc_name = 'libc.dylib' # OSX
# libc_name = 'libc.so.6' # Linux
# libc_name = 'libc.dll' # Windows
libc = CDLL(libc_name)
libc.time
<_FuncPtr object at 0x1021a1ae0>
print "Seconds since January 1, 1970:"
print libc.time()
Seconds since January 1, 1970: 1384797127
You can do more than simply wrapping the functionality of system libraries: you can also create your own C functions and wrap them with CTypes (this will require having a C compiler installed):
%%file my_sum.c
#include <stdio.h>
// sum all the values in the array x
// x is a pointer to a memory block
// of length n
int sum(int *x, int n)
{
int i, counter;
counter = 0;
for(i=0; i<n; i++)
{
counter += x[i];
}
return counter;
}
Overwriting my_sum.c
%%bash
gcc -c my_sum.c
gcc -shared -o my_sum.so my_sum.o
!ls my_sum.*
my_sum.c my_sum.o my_sum.so
Now we'll use the CTypes API to create Python objects that can be passed to this function:
from ctypes import CDLL, c_void_p
import numpy as np
my_sum=CDLL('my_sum.so')
a = np.arange(10, dtype=np.int32)
adata = a.ctypes.data_as(c_void_p)
asize = a.size
print my_sum.sum(adata, asize)
print a.sum()
45 45
Here we've used just simple built-in numerical types: we can also do more complicated things like defining C Structures within Python:
%%file my_sum2.c
#include <stdio.h>
// Define a simple array struct, containing
// a pointer and a length
struct Array{
int *x;
int n;
};
// Sum the values in the array struct
int sum(struct Array a)
{
int counter, i;
counter = 0;
for(i=0; i<a.n; i++)
{
counter += a.x[i];
}
return counter;
}
Overwriting my_sum2.c
%%bash
gcc -c my_sum2.c
gcc -shared -o my_sum2.so my_sum2.o
Now the function requires us to pass a structure as an argument. To handle this, we need to create a mirror of this structure within our Python script. This can be done via the CTypes API:
from ctypes import Structure, POINTER, c_int
# Here's our C structure defined in Python
class Array(Structure):
_fields_ = ("x", c_void_p), ("n", c_int)
my_sum2=CDLL('my_sum2.so')
a = np.arange(10, dtype=np.int32)
arr = Array(a.ctypes.data_as(c_void_p), a.size)
print my_sum2.sum(arr)
print a.sum()
45 45
We're starting to see how this can be helpful, but what if we want something more involved?
Let's interface to the system LAPACK (Linear Algebra Package) routines and solve a linear equation. This is the type of function you might write in order to interface with a code someone else has written.
We'll use the DGESV routine from LAPACK: the Double precision GEneral SolVer for linear equations. This is a standard routine found on most computers. See http://www.netlib.no/netlib/lapack/double/dgesv.f for a description of the call signature.
from ctypes import CDLL, c_int, c_void_p, byref
import numpy as np
def solve(M, b):
"""Solve the linear equation M x = b using LAPACK's DGESV"""
# Interface to the LAPACK system library
# This may be different depending on your operating system
lapack = CDLL('liblapack.dylib')
# Make sure M is double precision fortran-ordered
M = np.asarray(M, dtype=np.float64, order='F')
# Make a copy of b. It will be overwritten with the result
b = np.array(b, dtype=np.float64, copy=True)
n = b.size
# We could generalize this, but we'll do a simple problem for now
assert M.shape == (n, n)
assert b.shape == (n,)
# prepare the variables for dgesv.
size = c_int(n)
one = c_int(1)
info = c_int(0)
lapack.dgesv(byref(size), # size of the problem
byref(one), # number of columns of b
M.ctypes.data_as(c_void_p), # M data array
byref(size), # size of the problem
(n * c_int)(), # integer work array
b.ctypes.data_as(c_void_p), # b data array
byref(size), # size of b
byref(info)) # exit flag
return b
X = np.random.random((5, 5))
b = np.random.random(5)
print solve(X, b)
[ 1.18799324 -0.49926094 1.43864153 -1.71137939 1.89825396]
print np.linalg.solve(X, b)
[ 1.18799324 -0.49926094 1.43864153 -1.71137939 1.89825396]
The routine np.linalg.solve
actually uses this same routine (depending on your install options), so it should give precisely the same results. You'd never want to do this for LAPACK routines (they're already in numpy, after all), but this sort of thing can come in handy to call legacy C/Fortran code from Python.
F2Py is a submodule of NumPy, and is designed to create easy interfaces to Fortran code (though it will also work with C). We can do something similar to above, but use f2py
rather than gcc
to compile it (note that this will require you to have a Fortran compiler installed).
We'll start with a simple example that computes the first N fibonacci numbers. Here's a Python version of the function:
import numpy as np
def fib_py(N):
x = np.zeros(N, dtype=float)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
print fib_py(10)
[ 0. 1. 1. 2. 3. 5. 8. 13. 21. 34.]
%timeit fib_py(10000)
100 loops, best of 3: 8.66 ms per loop
-c:11: RuntimeWarning: overflow encountered in double_scalars
Now we'll write the same routine in Fortran:
%%file fib.f
subroutine fib(A,N)
integer N
double precision A(N)
do i=1,N
if (i.EQ.1) then
A(i) = 0.0D0
elseif (i.EQ.2) then
A(i) = 1.0D0
else
A(i) = A(i-1) + A(i-2)
endif
enddo
end
Overwriting fib.f
We'll compile this from the command line using f2py
:
!f2py -c -m fib fib.f
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "fib" sources
f2py options: []
f2py:> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fibmodule.c
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7
Reading fortran codes...
Reading file 'fib.f' (format:fix,strict)
Post-processing...
Block: fib
Block: fib
Post-processing (stage 2)...
Building modules...
Building module "fib"...
Constructing wrapper function "fib"...
fib(a,[n])
Wrote C/API module "fib" to file "/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fibmodule.c"
adding '/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.c' to sources.
adding '/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7' to include_dirs.
copying /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.c -> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7
copying /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.h -> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7
build_src: building npy-pkg config files
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'fib' extension
compiling C sources
C compiler: /usr/bin/clang -fno-strict-aliasing -I/Users/jakevdp/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7
compile options: '-I/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7 -I/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include -I/Users/jakevdp/anaconda/include/python2.7 -c'
clang: /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.c
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.c:2:
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.h:13:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1728:
/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION"
^
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.c:338:30: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
if ((fp->defs[i].func==NULL)) {
~~~~~~~~~~~~~~~~^~~~~~
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.c:338:30: note: remove extraneous parentheses around the comparison to silence this warning
if ((fp->defs[i].func==NULL)) {
~ ^ ~
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.c:338:30: note: use '=' to turn this equality comparison into an assignment
if ((fp->defs[i].func==NULL)) {
^~
=
2 warnings generated.
clang: /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fibmodule.c
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fibmodule.c:18:
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.h:13:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1728:
/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION"
^
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fibmodule.c:111:12: warning: unused function 'f2py_size' [-Wunused-function]
static int f2py_size(PyArrayObject* var, ...)
^
2 warnings generated.
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -Wall -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
compile options: '-I/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7 -I/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include -I/Users/jakevdp/anaconda/include/python2.7 -c'
gfortran:f77: fib.f
/usr/local/bin/gfortran -Wall -m64 -Wall -undefined dynamic_lookup -bundle /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fibmodule.o /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/src.macosx-10.5-x86_64-2.7/fortranobject.o /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI/fib.o -L/usr/local/lib/gcc/x86_64-apple-darwin12.4.0/4.8.1 -lgfortran -o ./fib.so
running scons
Removing build directory /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpe_sWoI
import numpy as np
import fib
a = np.zeros(10, dtype='d')
fib.fib(a)
print a
[ 0. 1. 1. 2. 3. 5. 8. 13. 21. 34.]
a = np.zeros(10000, dtype='d')
%timeit fib.fib(a)
10000 loops, best of 3: 26.8 µs per loop
The Fortran version is about 300x faster!
But this is a bit awkward that we have to create the array that will hold the results... It would be nice if this could happen automatically. We can make this happen through the use of interface files, which have a .pyf
extension.
An interface template can be generated automatically with f2py. We'll tell it that we want a module called fib2
:
!rm -f _fib2.pyf
!f2py fib.f -m fib2 -h _fib2.pyf
Reading fortran codes... Reading file 'fib.f' (format:fix,strict) Post-processing... Block: fib2 Block: fib Post-processing (stage 2)... Saving signatures to file "./_fib2.pyf"
!cat _fib2.pyf
! -*- f90 -*- ! Note: the context of this file is case sensitive. python module fib2 ! in interface ! in :fib2 subroutine fib(a,n) ! in :fib2:fib.f double precision dimension(n) :: a integer, optional,check(len(a)>=n),depend(a) :: n=len(a) end subroutine fib end interface end python module fib2 ! This file was auto-generated with f2py (version:2). ! See http://cens.ioc.ee/projects/f2py2e/
Take a look at the default settings. We have our subroutine called fib(a, n)
, which lists the following:
double precision dimension(n) :: a
This lists a variable which is an input into the routine: an array of length n
integer, optional,check(len(a)>=n),depend(a) :: n=len(a)
This tells us what the routine expects n
to be. It is optionally specified, with a default value which is the length of a
. Also, there is a check in here that makes sure a
is long enough. Let's check this out:
from fib import fib
a = np.zeros(10)
fib(a, 10) # specify the optional keyword
print a
[ 0. 1. 1. 2. 3. 5. 8. 13. 21. 34.]
a = np.zeros(10)
fib(a, 5) # specify only part of the array
print a
[ 0. 1. 1. 2. 3. 0. 0. 0. 0. 0.]
a = np.zeros(10)
fib(a, 20) # specify a length which is too long
--------------------------------------------------------------------------- error Traceback (most recent call last) <ipython-input-26-3aa9413db322> in <module>() 1 a = np.zeros(10) ----> 2 fib(a, 20) # specify a length which is too long error: (len(a)>=n) failed for 1st keyword n: fib:n=20
What we want to do is to modify this interface so that the length n
is an input, and the array a
is an (automatically constructed) output. We'll do this below; for more details on the options available in F2Py interface files, see the F2Py documentation.
%%file fib2.pyf
! -*- f90 -*-
! Note: the context of this file is case sensitive.
python module fib2 ! in
interface ! in :fib2
subroutine fib(a,n) ! in :fib2:fib.f
double precision dimension(n), intent(out), depend(n) :: a
integer intent(in) :: n
end subroutine fib
end interface
end python module fib2
Overwriting fib2.pyf
We've specified the intent and the dependencies of each variable using the f2py
specifications.
Here we compile fib2 using the interface file we've created:
!f2py -c -m fib2 fib2.pyf fib.f
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "fib2" sources
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7
f2py options: []
f2py: fib2.pyf
Reading fortran codes...
Reading file 'fib2.pyf' (format:free)
Post-processing...
Block: fib2
Block: fib
Post-processing (stage 2)...
Building modules...
Building module "fib2"...
Constructing wrapper function "fib"...
a = fib(n)
Wrote C/API module "fib2" to file "/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fib2module.c"
adding '/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.c' to sources.
adding '/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7' to include_dirs.
copying /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.c -> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7
copying /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.h -> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7
build_src: building npy-pkg config files
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'fib2' extension
compiling C sources
C compiler: /usr/bin/clang -fno-strict-aliasing -I/Users/jakevdp/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7
compile options: '-I/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7 -I/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include -I/Users/jakevdp/anaconda/include/python2.7 -c'
clang: /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fib2module.c
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fib2module.c:18:
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.h:13:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1728:
/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION"
^
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fib2module.c:104:12: warning: unused function 'f2py_size' [-Wunused-function]
static int f2py_size(PyArrayObject* var, ...)
^
2 warnings generated.
clang: /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.c
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.c:2:
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.h:13:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1728:
/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION"
^
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.c:338:30: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
if ((fp->defs[i].func==NULL)) {
~~~~~~~~~~~~~~~~^~~~~~
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.c:338:30: note: remove extraneous parentheses around the comparison to silence this warning
if ((fp->defs[i].func==NULL)) {
~ ^ ~
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.c:338:30: note: use '=' to turn this equality comparison into an assignment
if ((fp->defs[i].func==NULL)) {
^~
=
2 warnings generated.
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -ffixed-form -fno-second-underscore -Wall -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
compile options: '-I/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7 -I/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include -I/Users/jakevdp/anaconda/include/python2.7 -c'
gfortran:f77: fib.f
/usr/local/bin/gfortran -Wall -m64 -Wall -undefined dynamic_lookup -bundle /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fib2module.o /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/src.macosx-10.5-x86_64-2.7/fortranobject.o /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7/fib.o -L/usr/local/lib/gcc/x86_64-apple-darwin12.4.0/4.8.1 -lgfortran -o ./fib2.so
running scons
Removing build directory /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpegZnM7
Now we can import and call the function like this:
import fib2
print fib2.fib(10)
[ 0. 1. 1. 2. 3. 5. 8. 13. 21. 34.]
fib2.fib?
%timeit fib2.fib(10000)
10000 loops, best of 3: 29.3 µs per loop
From our simple FORTRAN function, along with an interface file and a compilation with f2py, we now have a Python module which is callable in a very intuitive way.
If you look into the source code of SciPy, you'll see that this is how much of its functionality is implemented, through wrapping pieces of the NETLIB repository.
CTypes and F2Py provide the ability to wrap Fortran, C, and C++ code so that it can be imported into Python. Cython enables this as well, though we will not focus on that part of it here. The biggest part of Cython is that it lets you convert Python code and Python-like code into compiled C code, which can run many times faster than the original code.
Let's see a quick example. Here's a Python function which computes the N^th fibonacci number:
def nth_fib(n):
a, b = 0, 1
for i in range(n):
b, a = a + b, b
return a
[nth_fib(i) for i in range(10)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
%timeit nth_fib(10000)
100 loops, best of 3: 3.25 ms per loop
Now we'll take the exact same code, and compile it with Cython. In general, this will be done by saving the code to file, and running cython
on the command line. You can read about that in the documentation. Here we'll use IPython's Cython magic to streamline the process:
%load_ext cythonmagic
%%cython
def nth_fib2(n):
a, b = 0, 1
for i in range(n):
b, a = a + b, b
return a
[nth_fib2(i) for i in range(10)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
%timeit nth_fib2(10000)
100 loops, best of 3: 2.81 ms per loop
Just compiling the code in Cython gave us a ~10% speedup. But we can do better by adding type annotations.
See, the main reason Python is slow is because it has to do dynamic type checking each time it evaluates an expression. If we can tell Cython what the types are from the beginning, this step can be skipped, and we have large time savings. We do this through a cdef
command. We also do the temporary assignment explicitly to remove the Python tuple assignment:
%%cython
def nth_fib3(int n):
cdef int a = 0
cdef int b = 1
cdef int tmp
for i in range(n):
tmp = b
b = a + b
a = tmp
return a
[nth_fib3(i) for i in range(10)]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
print "Python only:"
%timeit nth_fib(10000)
print "Bare Cython:"
%timeit nth_fib(10000)
print "Typed Cython:"
%timeit nth_fib3(10000)
Python only: 100 loops, best of 3: 3.13 ms per loop Bare Cython: 100 loops, best of 3: 3.29 ms per loop Typed Cython: 100000 loops, best of 3: 5.99 µs per loop
By adding some type information to our script, we sped up the execution by several orders of magnitude! This shows how easy Cython is for simple problems.
One very useful trick to be aware of is cython -a
. This will produce an annotated html document showing which lines of the program are causing problems. You can run it like this:
%%file nth_fib.pyx
# Slow version:
def nth_fib2(n):
a, b = 0, 1
for i in range(n):
b, a = a + b, b
return a
# Fast Version:
def nth_fib3(int n):
cdef int a = 0
cdef int b = 1
cdef int tmp
for i in range(n):
tmp = b
b = a + b
a = tmp
return a
Overwriting nth_fib.pyx
!cython -a nth_fib.pyx
If we open the resulting HTML file, we'll see a highlighted version of our code. The darkness of the highlight shows how many lines of C code were generated by the line of Python. More yellow lines generally means slower code. This can be very helpful when your code is not running as quickly as you'd expected. To see the results, open nth_fib.html after running the above code.
Cython provides a really nice interface to numpy arrays via the Typed Memoryview syntax. Let's implement the same fib function as above, but using Cython.
First we'll simply compile our Python function again:
%%cython
import numpy as np
def fib_cy(N):
x = np.zeros(N, dtype=float)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
print np.allclose(fib_py(10000), fib_cy(10000))
%timeit fib_py(10000)
%timeit fib_cy(10000)
True 100 loops, best of 3: 8.73 ms per loop
-c:1: RuntimeWarning: overflow encountered in double_scalars -c:257: RuntimeWarning: overflow encountered in double_scalars
100 loops, best of 3: 7.58 ms per loop
Again, a very small improvement. Let's add some type information and see how we do:
%%cython
import numpy as np
from numpy cimport float_t
def fib_cy2(int N):
cdef int i
cdef float_t[::1] x = np.zeros(N, dtype=float)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
%timeit fib_cy(10000)
%timeit fib_cy2(10000)
100 loops, best of 3: 7.82 ms per loop 10000 loops, best of 3: 37.2 µs per loop
Wow! Simply adding some type information for the Cython compiler made our code orders of magnitude faster! This is because the array can now be treated as a contiguous memory block rather than a Python object. This makes each of the indexing operations much more efficient, because they no longer have to go through the Python interface layer.
There's a whole lot more to know about Cython, especially the details of working with numpy arrays. Here are some more resources:
You might wonder whether some of that type information could be inferred by the compiler, or whether the compilation phase is really necessary. This is where Numba comes in. Numba is an LLVM-based Just In Time (JIT) compiler for Python. It allows simple annotations of Python code which can lead to huge speedups for certain operations.
Numba comes from Continuum Analytics, a company founded by the creator of NumPy and SciPy, and the same people who are behind the Anaconda Python distribution. Numba is still relatively young, and can be a little finnicky for some problems, but the approach is really clean.
You can install numba from scratch, but because of the dependencies on LLVM and other difficult-to-install tools, I'd highly recommend just going with Anaconda.
Let's go back to our Python fib()
function:
def fib(N):
x = np.zeros(N, dtype=np.float64)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
To speed this up with Numba, we simply use the jit
decorator:
from numba import autojit
fib_nb = autojit(fib)
fib_nb(10)
array([ 0., 1., 1., 2., 3., 5., 8., 13., 21., 34.])
%timeit fib(10000)
%timeit fib_nb(10000)
100 loops, best of 3: 8.51 ms per loop 10000 loops, best of 3: 33.1 µs per loop
-c:9: RuntimeWarning: overflow encountered in double_scalars
Usually, numba is used as a decorator, which essentially does the same as we saw above. You'd write a decorated function this way:
@autojit
def fib_nb(N):
x = np.zeros(N, dtype=np.float64)
for i in range(N):
if i == 0:
x[i] = 0
elif i == 1:
x[i] = 1
else:
x[i] = x[i - 1] + x[i - 2]
return x
%timeit fib_nb(10000)
10000 loops, best of 3: 34.2 µs per loop
You can see that simply by passing our fib
function through the autojit
decorator, we've attained execution which is a few percent faster than what we got with Cython - and there's no fancy compilation stage required! The funcion is compiled on demand through LLVM, which makes numba very convenient to use.
Now for the bad news: Numba is still young, and there are a lot of corner cases where the compiler will just break without any useful error message. For more complicated problems and functions, the output code is not always faster than Python+NumPy, and can sometimes even be slower!
My recommendation right now is to keep Numba in mind, to try it in your problems, but not necessarily depend on it -- yet.
There's still not a lot of information out there, but here are a few resources:
One of the more interesting and beautiful calculations you can do with Python is the Mandelbrot Set, a straightforwardly-generated set of points whose boundary is a fractal. The computation of whether a point is in the mandelbrot set can be done as follows:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
def mandel(x, y, max_iter):
"""
Given z = x + iy and max_iter, determine whether the candidate
is in the mandelbrot set for the given number of iterations
"""
c = complex(x, y)
z = 0.0j
for i in range(max_iter):
z = z*z + c
if (z.real*z.real + z.imag*z.imag) >= 4:
return i
return max_iter
def create_fractal(Nx, Ny, xmin, xmax, ymin, ymax, max_iter):
"""Create and return a fractal image"""
image = np.zeros((Ny, Nx), dtype=float)
dx = (xmax - xmin) * 1. / Nx
dy = (ymax - ymin) * 1. / Ny
for x in range(Nx):
rpart = xmin + x * dx
for y in range(Ny):
ipart = ymin + y * dy
color = mandel(rpart, ipart, max_iter)
image[y, x] = color
return image
image = create_fractal(300, 200, -2, 1, -1, 1, 20)
plt.imshow(image, cmap=plt.cm.jet)
<matplotlib.image.AxesImage at 0x109981390>
Using this simple code, you can generate some absolutely stunning images. The problem is, to go deeper you need very high resolution and very many iterations, both of which are extremely slow in Python!
Your assignment is to make these faster:
Speed up this example using either CTypes, F2Py, or Cython. With CTypes/F2Py, you'll have to write the above functionality as a C/Fortran routine, and use the tools above to wrap that functionality.
For Cython, it will require adding appropriate type information to each of the variables used in the computation. Also, check the Cython documentation and make sure to make mandel()
a cdef
-ed function. Once you're finished, use the cython -a
trick to generate annotated code, and make sure there are no yellow-highlighted lines in the core of your algorithm: the loops should look completely clean!
Speed up this example using Numba. Numba is much more of a black box, but you should be able to do it very quickly. You'll likely find that simply adding the decorator to the function doesn't work. I'll give you a hint: Numba doesn't like an array specified with dtype=float
. Look back to our previous example for clues about how to fix this.
(Also, you might find this notebook to be a useful resource: it is all about doing the mandelbrot set with Numba).
Use %timeit
to compare your three implementations side-by-side for the values passed above.
Once you have these working, choose one of the methods and use it to zoom-in on a portion of the above plot. Go to high resolution and a large number of iterations (say 256). Experiment with different matplotlib color schemes (see some of the options here) and produce your most beautiful visualization of the Mandelbrot set.