name = '2018-02-19-debugging-profiling'
title = 'Debugging and profiling'
tags = 'optimisation, debugging, profiling, basics'
author = 'Denis Sergeev'
from nb_tools import connect_notebook_to_post
from IPython.core.display import HTML
html = connect_notebook_to_post(name, title, tags, author)
Today we went through some basic tools to inspect Python scripts for errors and performance bottlenecks.
The standard Python tool for interactive debugging is pdb, the Python debugger. In IPython, its enhanced counterpart is ipdb, the IPython debugger. First, we import a function to insert breakpoints:
from IPython.core.debugger import set_trace
Define a simple function, insert a breakpoint:
def fun(foo, bar):
    set_trace()
    return foo + bar
# fun(1, 2)
(Uncomment it to run with the debugger.)
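Since Python 3.7 there is also the built-in breakpoint() function, which drops into pdb (or the debugger named by the PYTHONBREAKPOINT environment variable) without any imports. A minimal sketch:

```python
import os

# Setting PYTHONBREAKPOINT=0 disables all breakpoint() calls,
# which is handy when running the same script non-interactively.
os.environ["PYTHONBREAKPOINT"] = "0"

def fun(foo, bar):
    breakpoint()  # drops into pdb when breakpoints are enabled
    return foo + bar

print(fun(1, 2))  # runs straight through because breakpoints are off
```

The environment variable is consulted on every call, so you can toggle debugging without touching the code.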
In IPython, perhaps the most convenient interface to debugging is the %debug
magic command. If you call it after hitting an exception, it will automatically open an interactive debugging prompt at the point of the exception. The ipdb
prompt lets you explore the current state of the stack, explore the available variables, and even run Python commands!
def func1(a, b):
    return a / b

def func2(x):
    a = x
    b = x - 1
    return func1(a, b)

func2(1)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-7-7cb498ea7ed1> in <module>()
----> 1 func2(1)

<ipython-input-6-586ccabd0db3> in func2(x)
      5     a = x
      6     b = x - 1
----> 7     return func1(a, b)

<ipython-input-6-586ccabd0db3> in func1(a, b)
      1 def func1(a, b):
----> 2     return a / b
      3
      4 def func2(x):
      5     a = x

ZeroDivisionError: division by zero
# %debug
If you'd like the debugger to launch automatically whenever an exception is raised, you can use the %pdb
magic function to turn on this automatic behavior:
%pdb on
...some code to debug...
%pdb off
In IPython and Jupyter you can use special magics to measure how much time a snippet of code takes to run.
%timeit sum(range(100))
1.59 µs ± 229 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Note that because this operation is so fast, %timeit
automatically does a large number of repetitions. For slower commands, %timeit
will automatically adjust and perform fewer repetitions.
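Outside IPython, the standard-library timeit module offers the same functionality; here is a minimal sketch timing the same snippet:

```python
import timeit

# Time the snippet; `number` controls how many times it runs.
elapsed = timeit.timeit("sum(range(100))", number=100_000)
per_loop = elapsed / 100_000
print(f"{per_loop * 1e6:.2f} µs per loop")

# repeat() runs several independent trials, like %timeit's "7 runs";
# the minimum is usually the most stable estimate.
best = min(timeit.repeat("sum(range(100))", number=100_000, repeat=3))
```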
Once you have your code working, it can be useful to dig into its efficiency a bit. Sometimes it's useful to check the execution time of a given command or set of commands; other times it's useful to dig into a multiline process and determine where the bottleneck lies in some complicated series of operations.
You can either use the built-in profiler:
import cProfile
cp = cProfile.Profile()
cp.enable()
# some code to profile
cp.disable()
cp.print_stats()
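A more complete sketch, using pstats to sort and trim the report (the workload here is just a stand-in loop):

```python
import cProfile
import io
import pstats

cp = cProfile.Profile()
cp.enable()
total = sum(i * i for i in range(100_000))  # stand-in workload to profile
cp.disable()

# Sort the collected stats by cumulative time and show the top 5 entries
stream = io.StringIO()
stats = pstats.Stats(cp, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```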
Let's define a function with a large number of for-loop iterations.
def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total
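Before profiling line by line, a quick wall-clock check with time.perf_counter gives a baseline (a sketch; the function is repeated here to keep the block self-contained):

```python
import time

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

start = time.perf_counter()
result = sum_of_lists(100_000)
elapsed = time.perf_counter() - start
print(f"sum_of_lists(100000) took {elapsed:.3f} s")
```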
We can now load the line_profiler extension (after installing the line_profiler package with pip or conda).
%load_ext line_profiler
And call the function using the special profiler magic:
%lprun -f sum_of_lists sum_of_lists(5000)
In a similar manner, we can even do memory profiling (once the memory_profiler package is installed).
%load_ext memory_profiler
%memit sum_of_lists(100000)
peak memory: 50.20 MiB, increment: 7.16 MiB
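If memory_profiler is not available, the standard-library tracemalloc module reports similar peak-memory information (a sketch, again with the function repeated for self-containment):

```python
import tracemalloc

def sum_of_lists(N):
    total = 0
    for i in range(5):
        L = [j ^ (j >> i) for j in range(N)]
        total += sum(L)
    return total

# Trace allocations made while the function runs
tracemalloc.start()
sum_of_lists(100_000)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak memory: {peak / 1024**2:.2f} MiB")
```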
Finally, let's go deeper and see what happens under the hood of a simple function:
def foo(a, b):
    return a + b
Let's disassemble it with the dis
module:
import dis
dis.dis(foo)
  2           0 LOAD_FAST                0 (a)
              2 LOAD_FAST                1 (b)
              4 BINARY_ADD
              6 RETURN_VALUE
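The dis module also offers a programmatic interface via dis.Bytecode, which yields Instruction objects we can inspect in code. Note that on Python 3.11+ the addition shows up as the generic BINARY_OP instruction rather than BINARY_ADD:

```python
import dis

def foo(a, b):
    return a + b

# Collect the opcode names of the compiled function
ops = [ins.opname for ins in dis.Bytecode(foo)]
print(ops)
```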
What is BINARY_ADD?
We crack open CPython's source code and take a look inside Python/ceval.c:
/* Python/ceval.c */
TARGET(BINARY_ADD) {
PyObject *right = POP();
PyObject *left = TOP();
PyObject *sum;
/* NOTE(haypo): Please don't try to micro-optimize int+int on
CPython using bytecode, it is simply worthless.
See http://bugs.python.org/issue21955 and
http://bugs.python.org/issue10044 for the discussion. In short,
no patch shown any impact on a realistic benchmark, only a minor
speedup on microbenchmarks. */
if (PyUnicode_CheckExact(left) &&
PyUnicode_CheckExact(right)) {
sum = unicode_concatenate(left, right, f, next_instr);
/* unicode_concatenate consumed the ref to left */
}
else {
sum = PyNumber_Add(left, right); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Py_DECREF(left);
}
Py_DECREF(right);
SET_TOP(sum);
if (sum == NULL)
goto error;
DISPATCH();
}
What is PyNumber_Add(left, right)?
/* Objects/abstract.c */
PyObject *
PyNumber_Add(PyObject *v, PyObject *w)
{
PyObject *result = binary_op1(v, w, NB_SLOT(nb_add)); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
if (result == Py_NotImplemented) {
PySequenceMethods *m = v->ob_type->tp_as_sequence;
Py_DECREF(result);
if (m && m->sq_concat) {
return (*m->sq_concat)(v, w);
}
result = binop_type_error(v, w, "+");
}
return result;
}
What is binary_op1()?
static PyObject *
binary_op1(PyObject *v, PyObject *w, const int op_slot)
{
PyObject *x;
binaryfunc slotv = NULL;
binaryfunc slotw = NULL;
if (v->ob_type->tp_as_number != NULL)
slotv = NB_BINOP(v->ob_type->tp_as_number, op_slot);
if (w->ob_type != v->ob_type &&
w->ob_type->tp_as_number != NULL) {
slotw = NB_BINOP(w->ob_type->tp_as_number, op_slot); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
if (slotw == slotv)
slotw = NULL;
}
if (slotv) {
if (slotw && PyType_IsSubtype(w->ob_type, v->ob_type)) {
x = slotw(v, w);
if (x != Py_NotImplemented)
return x;
Py_DECREF(x); /* can't do it */
slotw = NULL;
}
x = slotv(v, w);
if (x != Py_NotImplemented)
return x;
Py_DECREF(x); /* can't do it */
}
if (slotw) {
x = slotw(v, w);
if (x != Py_NotImplemented)
return x;
Py_DECREF(x); /* can't do it */
}
Py_RETURN_NOTIMPLEMENTED;
}
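We can observe binary_op1's slot protocol from Python itself: __add__ returns NotImplemented when it cannot handle the right-hand operand, which is the interpreter's cue to try the other operand's reflected method. A small demo:

```python
# int's nb_add slot handles int + int directly...
print((1).__add__(2))      # 3

# ...but returns NotImplemented for operands it doesn't know,
# so the interpreter would next try the other operand's __radd__
print((1).__add__("a"))    # NotImplemented
```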
What is NB_BINOP()?
#define NB_BINOP(nb_methods, slot) \
    (*(binaryfunc*)(& ((char*)nb_methods)[slot]))
For int operands, the nb_add slot resolves to long_add:
/* Objects/longobject.c */
static PyObject *
long_add(PyLongObject *a, PyLongObject *b)
{
PyLongObject *z;
CHECK_BINOP(a, b);
if (Py_ABS(Py_SIZE(a)) <= 1 && Py_ABS(Py_SIZE(b)) <= 1) {
return PyLong_FromLong(MEDIUM_VALUE(a) + MEDIUM_VALUE(b)); // <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
}
if (Py_SIZE(a) < 0) {
if (Py_SIZE(b) < 0) {
z = x_add(a, b);
if (z != NULL) {
/* x_add received at least one multiple-digit int,
and thus z must be a multiple-digit int.
That also means z is not an element of
small_ints, so negating it in-place is safe. */
assert(Py_REFCNT(z) == 1);
Py_SIZE(z) = -(Py_SIZE(z));
}
}
else
z = x_sub(b, a);
}
else {
if (Py_SIZE(b) < 0)
z = x_sub(a, b);
else
z = x_add(a, b);
}
return (PyObject *)z;
}
What is MEDIUM_VALUE()?
/* Objects/longobject.c */
#define MEDIUM_VALUE(x) (assert(-1 <= Py_SIZE(x) && Py_SIZE(x) <= 1), \
Py_SIZE(x) < 0 ? -(sdigit)(x)->ob_digit[0] : \
(Py_SIZE(x) == 0 ? (sdigit)0 : \
(sdigit)(x)->ob_digit[0]))
And that is only the small-integer fast path of a single bytecode instruction (BINARY_ADD in this case). The same instruction also has to handle foo('a', 'b'), or indeed any types that support + by defining an __add__ or __radd__ magic method. All these features mean a lot of code at the C level.
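To see those magic methods in action, here is a toy class (the name and semantics are made up for illustration) that hooks into the same + machinery:

```python
class Meters:
    """Toy wrapper showing how + dispatches to __add__/__radd__."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.value + other.value)
        if isinstance(other, (int, float)):
            return Meters(self.value + other)
        return NotImplemented  # let the other operand try __radd__

    def __radd__(self, other):
        # Reached when the left operand's __add__ gave up
        return self.__add__(other)

total = Meters(2) + Meters(3)  # uses Meters.__add__
shifted = 1 + Meters(4)        # int.__add__ fails, falls back to __radd__
print(total.value, shifted.value)
```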
%timeit foo(1, 2)
104 ns ± 1.22 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%timeit foo('a', 'b')
150 ns ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Note that string addition is only slightly slower, and that it takes its own special path (unicode_concatenate) in ceval.c.
HTML(html)
This post was written as an IPython (Jupyter) notebook. You can view or download it using nbviewer.