Debugging (pdb)

It happens to the best of us. Our code, forged in the caffeine-fueled pits of night, now brought forth into the day, cracks and crumbles before our eyes. But what can we do about it?
Debugging is the process of identifying systematic errors in applications, whether from formal errors or modeling errors. An example of a formal error is an out-of-bounds error on an array; an example of a modeling error is mistyping the differential equation being solved. Debugging involves analyzing raw code, execution behavior, and output.
from __future__ import print_function, division
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
Consider the Zen of Python:
import this
A few working definitions for our discussion today:
- Exceptions — unusual behavior (although not necessarily unexpected behavior, particularly in Python)
- Errors — exceptions which cause the program to be unrunnable (cannot be handled at run time)
- Tracebacks — listings of the function calls on the stack at the time the exception arises
- Bugs — errors and exceptions, but also miswritten, ambiguous, or incorrect code which in fact runs but does not advertise its miscreancy
Formally, an exception is an event raised to indicate that a function has failed. Most of the time, this means that the function was passed bad data, or it encountered a situation it can't handle, or just reached a known invalid result, like division by zero. (However, this may also be intentional—Python causes a container to raise a StopIteration
exception to signal that there are no items left to iterate over, for instance in a for
loop.)
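This StopIteration protocol is easy to observe directly with next(), which is what a for loop uses behind the scenes; a minimal sketch:

```python
# Manually exhausting an iterator raises StopIteration, the same
# signal a for loop silently handles for us.
items = iter([10, 20])
print(next(items))  # 10
print(next(items))  # 20
try:
    next(items)
except StopIteration:
    print("no items left")
```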
Common exceptions include:
- SyntaxError — check missing colons or parentheses
- NameError — check for typos, function definitions
- TypeError — check variable types (coerce if necessary)
- ValueError — check function parameters
- IOError — check that files exist
- IndexError — don't reference nonexistent list elements
- KeyError — similar to an IndexError, but for dictionaries
- ZeroDivisionError — three guesses...
- IndentationError — check that spaces and tabs aren't mixed
- Exception — generic error category
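A quick way to get acquainted with several of these is to provoke them deliberately and catch them; this sketch triggers three of the types listed above and reports their names.

```python
# Each operation below raises one of the common exception types;
# catching Exception (the generic category) lets us report each name.
for operation in (lambda: int('abc'),      # ValueError: bad literal
                  lambda: [1, 2, 3][10],   # IndexError: nonexistent element
                  lambda: {}['missing']):  # KeyError: nonexistent key
    try:
        operation()
    except Exception as e:
        print(type(e).__name__)
```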
When something goes wrong in Python, the interpreter helpfully tries to show you where and why the exception occurred. Although this can be intimidating to new users, the traceback is quite useful in determining the offending bit of code.
Programs generally call functions on the stack: that is, each time a function is called, the calling function is set aside and the new function becomes the active site for the program. When this function completes, control is returned to the initial function. When this extends across many function calls, we have a deeply nested structure.
%%file main.py
def do_numerics():
    print(sin(5.0))  # sin was never imported, so this raises a NameError

do_numerics()
This is what the traceback is showing us, indicating where the code failed and tracing the stack, or nested function calls, to show you what the chain of calls was in case that helps you figure out why things went wrong.
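The chain of calls can also be inspected programmatically. This sketch (the function names inner and outer are illustrative) walks the traceback object attached to a caught exception, frame by frame:

```python
def inner():
    return 1 / 0    # the actual failure site

def outer():
    return inner()  # an intermediate frame on the stack

try:
    outer()
except ZeroDivisionError as e:
    # e.__traceback__ is a linked list of frames, outermost call first.
    names = []
    tb = e.__traceback__
    while tb is not None:
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
    print(names)  # the list ends with 'outer', then 'inner'
```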
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1, 10001)
y = np.cos(np.pi/x) * np.exp(-2*x)
plt.plot(x[0:-1:2], y)
plt.show()
Two things happened above:
The first was a warning or two, marked in Jupyter by red highlighting. These don't impact the successful completion of our code, but they can affect the quality of the results or have other externalities.
Next, a misalignment of dimensions in the vectors we desire to plot leads to an irreconcilable difficulty in the code. By my count in the current versions of NumPy and Matplotlib, we are generating an error six layers deep in the function stack.
In order to fix this problem, we need to align the vectors: either sample y at the same rate as x, or don't downsample x.
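One way to repair the example above, sketched here: apply the same downsampling slice to both arrays. (Starting the domain just above zero to silence the divide-by-zero warning is an adjustment of mine, not part of the original.)

```python
import numpy as np

# Start just above zero so np.pi/x never divides by zero (an adjustment).
x = np.linspace(1e-4, 1, 10001)
y = np.cos(np.pi / x) * np.exp(-2 * x)

# Apply the same downsampling slice to both arrays so their shapes match.
x2, y2 = x[0:-1:2], y[0:-1:2]
assert x2.shape == y2.shape
# plt.plot(x2, y2) now succeeds: both vectors have 5000 points.
```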
Tracebacks occasionally signal problems with your installation, rather than your code. For instance, I recently had the following error arise:
>>> from numpy import sin
Traceback (most recent call last):
File "
It turns out that my distribution of Python was incorrectly using certain libraries: the $PYTHONPATH
environment variable was set up wrong. (Incidentally, this is the sort of thing that has motivated people to counsel against using $PYTHONPATH
at all.) When fixed, the traceback disappeared and the import
worked properly.
Let's try invoking a few more dramatic errors. First, how about an infinite recursion?
f = lambda f:f(f)
f(f)
Of course, although the Python and IPython interpreters are extremely robust, there are limits. The following, for instance, will crash the interpreter without even a traceback.
# Try this in a Python interpreter.
import sys
sys.setrecursionlimit(1<<30)
f = lambda f:f(f)
f(f)
Tracebacks and exceptions are objects, and you can extract much more information if so inclined. Try this sophisticated analysis of exceptions, using sys.exc_info
:
import sys, os
try:
    raise NotImplementedError("No error")
except Exception as e:
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print(exc_type.__name__, fname, exc_tb.tb_lineno)
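The standard traceback module offers a shorter route to much of the same information; format_exc() renders the current exception's traceback as a string you can log or inspect.

```python
import traceback

try:
    1 / 0
except ZeroDivisionError:
    tb_text = traceback.format_exc()

# The final line of the formatted traceback names the exception.
print(tb_text.strip().splitlines()[-1])
```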
Exception Handling (try/except/else/finally)

Now what we saw above were all in fact errors in that they caused execution to halt. What if we wanted to use them more intelligently—that is, to diagnose and handle problems before they crash our code?
This is what the try/except/else/finally workflow attempts to do. Basically, we can place a snippet of code which is susceptible to a specific error inside a try block, and then deal with the aftermath in the except block.
try:
    x = 1 / 0
except ZeroDivisionError:
    print("Division by zero occurred.")
denom = 0
while True:
    try:
        # Read a value from the console.
        denom = input()
        # Use as denominator.
        i = 1 / float(denom)
    except:
        print("non-numeric value entered")
    else:
        print(i)
    finally:
        if denom == 'q':
            break
try/except error handling should encapsulate the fewest statements possible. It can also reduce code readability, and so should probably be used only where things are likely to go wrong, in your judgment: the file system, or some obtuse calculation.
Basically, try lets you execute an error-prone segment of code, while except lets you handle any or all of the errors that arise. (It is better to handle less, as a general maxim, so that you don't mask other errors lurking in an operation.) An optional finally clause will execute in any case.
filename = 'spring.data'
try:
    data = np.genfromtxt(filename)
except:
    print('Unable to read file "%s".' % filename)
filename = 'spring.data'
try:
    data = np.genfromtxt(filename)
    print(data)
except IOError as err:
    print('Unable to read file "%s"; %s.' % (filename, err))
    # why output err? what else can go wrong?
finally:
    print('Done with data loading code.')
_**The Principle of Least Astonishment**_: The result of performing some operation should be obvious, consistent, and predictable, based upon the name of the operation and other clues.
Just as you can handle exceptions to make your code run properly, you can raise
them as well. Generic, specific, and user-specified exceptions are all available to you.
#raise( Exception, "This is my customised error message." ) #Python 2
raise Exception( "This is my customised error message." ) #Python 3
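User-specified exceptions are ordinary classes derived from Exception. A minimal sketch (the names SpringDataError and check_stiffness are illustrative, not from the original):

```python
# A user-defined exception: subclass Exception or a more specific type.
class SpringDataError(Exception):
    """Raised when spring data fail a sanity check."""

def check_stiffness(k):
    if k <= 0:
        raise SpringDataError("stiffness must be positive, got %g" % k)
    return k

try:
    check_stiffness(-2.0)
except SpringDataError as e:
    print("caught:", e)
```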
If we are going to intelligently use exceptions to control the execution of our program, what impact will this have?
Cons:
- Using exceptions for flow control (e.g., to break out of for statements) is controversial.

Pros:
- Exceptions make failure handling explicit, and the cost in performance is small.
The following code gives perspective on the question of relative efficiency in Python (src):
SETUP = 'counter = 0'
LOOP_IF = """
counter += 1
"""
LOOP_EXCEPT = """
try:
    counter += 1
except:
    pass
"""
import timeit
if_time = timeit.Timer(LOOP_IF, setup=SETUP)
except_time = timeit.Timer(LOOP_EXCEPT, setup=SETUP)
print('using if statement: {}'.format(min(if_time.repeat(number=10 ** 7))))
print('using exception: {}'.format(min(except_time.repeat(number=10 ** 7))))
So the use of exception-handling code is not that big of a deal—if raise
makes your code easier to understand or operate, then please use it liberally. Code that is easier to read and debug is code less likely to have bugs, since the emergent features can be well-characterized.
As I said before, you can debug by simply attempting to run your code. This, however, is very annoying. First off, the code will always stop at the first exception. This means that, if you have ten errors, you'll have to run the code ten times to find all of them. Now, imagine that this is long-running code. Imagine waiting five minutes for your code to run, only to discover it breaks because of a typo. Doesn't that sound exciting? Linting is the process of statically analyzing the code in order to catch multiple errors simultaneously (rather than dealing with them one at a time in the debugger). There are several available; I'll illustrate the use of pyflakes.
In a written natural language, there are many ways to express the same idea. To make the consumption of information easier, people define style guides to enforce particularly effective ways of writing. This is even more true in coding; consistent style choices make scripts much easier to read. Just like version control, standards become absolutely essential as projects become large ($n>1$, where $n$ is the number of coders).
Some programming languages have multiple competing standards, and it's easy to imagine how messy this can get. You can find strong opinions on what constitutes a tab, for instance. Luckily, Python doesn't have this issue. The official standard, PEP8, is used everywhere. Unless you plan on hiding all the code you write from the outside world, you should learn it.
To help out coders, there are tools to test for compliance. The aptly named pep8
library shows you where your code deviates from the PEP8 standard, and autopep8
goes a step further by trying to fix all of your errors for you. These are both run from the shell. (They are not available in Canopy Basic, so you'll need to install them yourself or get the Academic license.)
The Python Debugger (pdb)

Now that we've seen how errors and exceptions are raised and handled, let's handle some code interactively using pdb
, an invaluable tool for finding subtle errors in numerical code, for instance. (If you have used the GNU DeBugger, gdb
, before then you will see many similarities.)
Essentially, a common mode of use for pdb
allows you to run your Python code normally until an error is reached in the interpreter. At this point, your program crashes—you're probably used to this—but all of the data remain available, and you can manipulate them to figure out what went awry and why. Another mode of interaction lets you pause program execution periodically so you can look "under the hood", figuring out exactly what is going on and whether the results align with expectations.
(This sounds a lot like the Python interpreter, and it is. pdb
just extends that functionality to standalone scripts as well.)
First, let's just run pdb
in a conventional program, either in IPython (which works but can output some strange tracebacks occasionally) or in a conventional Python or IPython interpreter. Another common method for using pdb
—arguably more useful—is a GUI-based IDE such as Spyder, which makes the active use of code breakpoints particularly tractable.
%%file pdb_vel.py
import numpy as np
import pdb

def y_fall(t, x0, v0):
    a = -9.8
    y = a*t**2 + v0*t + x0
    return y

v = 2520 # m/s
x0 = 0 # m
t = np.arange(0,300,0.1) # s
pdb.set_trace()
print(y_fall(t, x0, v))
When you execute python pdb_vel.py
, you notice that a new prompt appears, (Pdb)
. At this prompt, you can type a series of commands. Try, for instance, p v
or p t[0]
. You can even directly manipulate variables: t[0] = -1
. When you are ready to continue execution, enter continue
, at which point the program will proceed directly (including the values of altered variables, if any).
pdb
can be invoked in several ways. Generally speaking, you want it to either pop up after an exception has been raised or run continuously from a designated breakpoint. The following chart illustrates a few ways you can get into the pdb
interface.
This chart records the most commonly used pdb
commands. We'll do some hands-on exercises in a few minutes so you can get a feel for what they do, and which are the most useful in a given context.
Most of these need to be seen in context to appreciate them, but I'll point out up front the most frequently used:
- n[ext]
- s[tep]
- r[eturn]
- p[rint]
%%file divisible.py
from __future__ import print_function
import sys

def is_divisible(a, b):
    """Determines if integer a is divisible by integer b."""
    remainder = a % b
    # if there's no remainder, then a is divisible by b
    if not remainder:
        return True
    else:
        return False

def find_divisors(integer):
    """Find all divisors of an integer and return them as a list."""
    divisors = []
    # we know that an integer divides itself
    divisors.append(integer)
    # we also know that the biggest divisor other than the integer itself
    # must be at most half the value of the integer (think about it)
    divisor = integer / 2
    while divisor >= 0:
        if is_divisible(integer, divisor):
            divisors.append(divisor)
        divisor =- 1
    return divisors

if __name__ == '__main__':
    # do some checking of the user's input
    try:
        if len(sys.argv) == 2:
            # the following statement will raise a ValueError for
            # non-integer arguments
            test_integer = int(sys.argv[1])
            # check to make sure the integer was positive
            if test_integer <= 0:
                raise ValueError("integer must be positive")
        elif len(sys.argv) == 1:
            # print the usage if there are no arguments and exit
            print(__doc__)
            sys.exit(0)
        else:
            # alert the user they did not provide the correct number of
            # arguments
            raise ValueError("too many arguments")
    # catch the errors raised if sys.argv[1] is not a positive integer
    except ValueError as e:
        # alert the user to provide a valid argument
        print("Error: please provide one positive integer as an argument.")
        sys.exit(2)
    divisors = find_divisors(test_integer)
    # print the results
    print("The divisors of %d are:" % test_integer)
    for divisor in divisors:
        print(divisor)
!python divisible.py 100
Run this code with pdb
activated as follows:
$ python -m pdb divisible.py 100

Consider the following series statement for a Bessel function of the first kind, $$J_{0}(x) = \sum_{m=0}^{\infty} \frac{(-1)^{m}}{(m!)^{2}} {\left(\frac{x}{2}\right)}^{2m} \text{.} $$
%%file bessel.py
from scipy.misc import factorial2 as fact
from scipy.special import j0 # for testing error
import pdb
pdb.set_trace()
def term(m, x):
    return ((-1)**m)/(fact(m)*fact(m+1)*(0.5**x)*(2*m))
value = 0.5
max_term = 20
my_sum = 0.0
for i in range(0, max_term):
my_sum += term(i, value)
print('series gives %f'%my_sum)
print('scipy gives %f'%j0(value))
print('error is %f'%(my_sum-j0(value)))
!python bessel.py
Using pdb in IPython

IPython has slightly enhanced support for debugging as compared to the regular Python interpreter. One convenient magic is %debug
, which can be called immediately after failing code to analyze the traceback. (This works poorly in the IPython Notebook, but is more tractable in the command-line interpreter.)
import string
greek = { 'a':u'α', 'b':u'β', 'g':u'γ', 'd':u'δ', 'e':u'ε', 'z':u'ζ' }
for letter in string.ascii_lowercase:
    print(greek[letter])
%debug
Context Managers (with)

Incidentally, one very useful practice is to allow a Python context manager to deal with much of the boilerplate of setting up and always taking down an object (even in the case of failure). The with
statement is used thus:
sums = [sum(range(0,i+1)) for i in range(1,10)]
with open('sums.txt', 'w') as file_out:
    for value in sums:
        print(value, file=file_out)
print("Successfully wrote file.")
with open('sums.txt', 'r') as file_in:
    for line in file_in:
        print(line.strip(), end=',')
print("\nSuccessfully read file.")
Naturally it fails the same way, but it also closes the file without your explicit intervention.
with open('nonexisting.txt', 'r') as file_in:
    for line in file_in:
        print(line.strip(), end=',')
Thus you still have exceptions raised as you would expect, but if a file operation—or anything else—fails, the object is automatically closed (deallocated, whatever is in __exit__
) by the context manager arranged by with
. (ref)
The following two snippets of code are thus equivalent (src):
Old way:

try:
    f = open("file", "r")
    try:
        line = f.readline()
    finally:
        f.close()
except IOError:
    pass  # handle error

New way:

try:
    with open("file", "r") as f:
        line = f.readline()
except IOError:
    pass  # handle error
You can see that the first opens, reads, and closes the file; the second does the same, only much more elegantly and Pythonically.
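You can build your own context managers too: contextlib.contextmanager turns a generator into one, with the code after yield playing the role of __exit__. A minimal sketch (the name managed_resource is illustrative):

```python
from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    print("setting up", name)
    try:
        yield name                      # the with-block body runs here
    finally:
        print("taking down", name)      # runs even if the body raises

with managed_resource("demo") as r:
    print("using", r)
```

The try/finally around the yield guarantees the teardown line runs whether the body completes or raises, just as a hand-written __exit__ would.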
Finite-difference models are used throughout engineering to obtain numerical solutions to differential equations. This particular system models the heat equation
$$ \frac{1}{\alpha} \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$$given an initial condition of $u(x,t=0) = \sin\left(\pi x/L\right)$ and boundary conditions of $u(x=0,t) = 0$ and $u(x=L,t) = 0$.
To approximate a derivative, the most straightforward way is to take the formal definition
$$f'(x) = \frac{f(x+h)-f(x)}{h}$$and use a small but nonzero step $h$ in your calculation.
Application of this principle to the heat equation leads to a statement of the form
$$ \frac{1}{\alpha} \frac{u^m_i - u^{m-1}_i}{\Delta t} = \frac{u^{m-1}_{i-1} - 2 u^{m-1}_{i} + u^{m-1}_{i+1}}{\Delta x^2} $$or $u^m_i = \frac{\alpha \Delta t}{\Delta x^2}u^{m-1}_{i-1} + \left[1 - 2\left(\frac{\alpha \Delta t}{\Delta x^2}\right)\right]u^{m-1}_{i} + \frac{\alpha \Delta t}{\Delta x^2}u^{m-1}_{i+1}$.
This clearly yields a way to calculate subsequent time steps point-by-point from the previous time step's data.
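The update rule can be sketched for a single time step with NumPy slices. The parameter values below are illustrative (chosen so that $r = \alpha \Delta t / \Delta x^2 < 1/2$, the stability condition for this explicit scheme), not the ones used in the exercise code.

```python
import numpy as np

# One explicit time step of the 1-D heat equation.
nx, alpha, length = 25, 0.1, 1.0
dx = length / (nx - 1)
dt = 0.004                       # chosen so r = alpha*dt/dx**2 < 0.5
r = alpha * dt / dx**2

x = np.linspace(0, length, nx)
u_old = np.sin(np.pi * x / length)   # initial condition u(x, 0)
u_new = u_old.copy()

# Interior points follow u_i^m = r*u_{i-1} + (1-2r)*u_i + r*u_{i+1}.
u_new[1:-1] = r*u_old[:-2] + (1 - 2*r)*u_old[1:-1] + r*u_old[2:]
# Boundary values u_new[0] and u_new[-1] stay fixed at zero.
```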
Debug this model using pdb. Consider the following elements of the problem: are dt, dx, etc., correct?

%%file fd_he.py
import numpy as np
# Basic parameters
nt = 120
nx = 25
alpha = 0.1
length = 1.0
tmax = 0.5
# Derived parameters: mesh spacing and time step size
dx = length / nx
dt = tmax / (nt-1)
# Create arrays to save data in process.
x = np.linspace(0, length+1e-15, nx)
t = np.linspace(0, tmax+1e-15, nt)
u = np.zeros([nx, nt])
# Set initial and boundary conditions.
u[:, 0] = np.sin(np.pi*x/length)**2
#boundaries are implicitly set by this initial condition
# Loop through each time step.
r = alpha * dt / (dx*dx)
s = 1 - 2*r
for n in range(1, nt):
    for j in range(1, nx):
        u[n, j] = r*u[j-1, n-1] + s*u[j, n-1] + r*u[j+1, n-1]
# Output the results.
np.savetxt('fd_data.txt', u)
Neal Davis and Lakshmi Rao developed these materials for Computational Science and Engineering at the University of Illinois at Urbana–Champaign. It incorporates some elements from Software Carpentry, contributed by Greg Wilson and others; and The Hacker Within, contributed by Katy Huff, Anthony Scopatz, Joshua R. Smith, and Sri Hari Krishna Narayanan.
This content is available under a [Creative Commons Attribution 3.0 Unported License](https://creativecommons.org/licenses/by/3.0/).