Debugging (pdb)

It happens to the best of us. Our code, forged in the caffeine-fueled pits of night, now brought forth into the day, cracks and crumbles before our eyes. But what can we do about it?
Debugging is the process of identifying systematic errors in applications, whether from formal errors or modeling errors. An example of a formal error is an out-of-bounds error on an array; an example of a modeling error is mistyping the differential equation being solved. Debugging involves analyzing raw code, execution behavior, and output.
from __future__ import print_function, division
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
%matplotlib inline
Consider the Zen of Python:
import this
A few working definitions for our discussion today:
- Exceptions — unusual behavior (although not necessarily unexpected behavior, particularly in Python)
- Errors — exceptions which cause the program to be unrunnable (cannot be handled at run time)
- Tracebacks — listings of the function calls on the stack at the time the exception arises
- Bugs — errors and exceptions, but also miswritten, ambiguous, or incorrect code which in fact runs but does not advertise its miscreancy
Formally, an exception is an event raised to indicate that a function has failed. Most of the time, this means that the function was passed bad data, or it encountered a situation it can't handle, or just reached a known invalid result, like division by zero. (However, this may also be intentional—Python causes a container to raise a StopIteration
exception to signal that there are no items left to iterate over, for instance in a for
loop.)
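This StopIteration protocol is easy to observe directly with next(), which is what a for loop uses behind the scenes; a minimal sketch:

```python
# Manually exhausting an iterator raises StopIteration, the same
# signal a for loop silently handles for us.
items = iter([10, 20])
print(next(items))  # 10
print(next(items))  # 20
try:
    next(items)
except StopIteration:
    print("no items left")
```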
Common exceptions include:
- SyntaxError — check missing colons or parentheses
- NameError — check for typos, function definitions
- TypeError — check variable types (coerce if necessary)
- ValueError — check function parameters
- IOError — check that files exist
- IndexError — don't reference nonexistent list elements
- KeyError — similar to an IndexError, but for dictionaries
- ZeroDivisionError — three guesses...
- IndentationError — check that spaces and tabs aren't mixed
- Exception — generic error category
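A quick way to get acquainted with several of these is to provoke them deliberately and catch them; this sketch triggers three of the types listed above and reports their names.

```python
# Each operation below raises one of the common exception types;
# catching Exception (the generic category) lets us report each name.
for operation in (lambda: int('abc'),      # ValueError: bad literal
                  lambda: [1, 2, 3][10],   # IndexError: nonexistent element
                  lambda: {}['missing']):  # KeyError: nonexistent key
    try:
        operation()
    except Exception as e:
        print(type(e).__name__)
```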
When something goes wrong in Python, the interpreter helpfully tries to show you where and why the exception occurred. Although this can be intimidating to new users, the traceback is quite useful in determining the offending bit of code.
Programs generally call functions on the stack: that is, each time a function is called, the calling function is set aside and the new function becomes the active site for the program. When this function completes, control is returned to the initial function. When this extends across many function calls, we have a deeply nested structure.
%%file main.py
def do_numerics():
    print(sin(5.0))  # sin was never imported, so this raises a NameError

do_numerics()
This is what the traceback is showing us, indicating where the code failed and tracing the stack, or nested function calls, to show you what the chain of calls was in case that helps you figure out why things went wrong.
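The chain of calls can also be inspected programmatically. This sketch (the function names inner and outer are illustrative) walks the traceback object attached to a caught exception, frame by frame:

```python
def inner():
    return 1 / 0    # the actual failure site

def outer():
    return inner()  # an intermediate frame on the stack

try:
    outer()
except ZeroDivisionError as e:
    # e.__traceback__ is a linked list of frames, outermost call first.
    names = []
    tb = e.__traceback__
    while tb is not None:
        names.append(tb.tb_frame.f_code.co_name)
        tb = tb.tb_next
    print(names)  # the list ends with 'outer', then 'inner'
```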
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0, 1, 10001)
y = np.cos(np.pi/x) * np.exp(-2*x)
plt.plot(x[0:-1:2], y)
plt.show()
Two things happened above:
The first was a warning or two, marked in Jupyter by red highlighting. These don't impact the successful completion of our code, but they can affect the quality of the results or have other externalities.
Next, a misalignment of dimensions in the vectors we desire to plot leads to an irreconcilable difficulty in the code. By my count in the current versions of NumPy and Matplotlib, we are generating an error six layers deep in the function stack.
In order to fix this problem, we need to align the vectors: either sample y at the same rate as x, or don't downsample x.
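One way to repair the example above, sketched here: apply the same downsampling slice to both arrays. (Starting the domain just above zero to silence the divide-by-zero warning is an adjustment of mine, not part of the original.)

```python
import numpy as np

# Start just above zero so np.pi/x never divides by zero (an adjustment).
x = np.linspace(1e-4, 1, 10001)
y = np.cos(np.pi / x) * np.exp(-2 * x)

# Apply the same downsampling slice to both arrays so their shapes match.
x2, y2 = x[0:-1:2], y[0:-1:2]
assert x2.shape == y2.shape
# plt.plot(x2, y2) now succeeds: both vectors have 5000 points.
```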
Tracebacks occasionally signal problems with your installation, rather than your code. For instance, I recently had the following error arise:
>>> from numpy import sin
Traceback (most recent call last):
File "
It turns out that my distribution of Python was incorrectly using certain libraries: the $PYTHONPATH
environment variable was set up wrong. (Incidentally, this is the sort of thing that has motivated people to counsel against using $PYTHONPATH
at all.) When fixed, the traceback disappeared and the import
worked properly.
Let's try invoking a few more dramatic errors. First, how about an infinite recursion?
f = lambda f:f(f)
f(f)
Of course, although the Python and IPython interpreters are extremely robust, there are limits. The following, for instance, will crash the interpreter without even a traceback.
# Try this in a Python interpreter.
import sys
sys.setrecursionlimit(1<<30)
f = lambda f:f(f)
f(f)
Tracebacks and exceptions are objects, and you can extract much more information if so inclined. Try this sophisticated analysis of exceptions, using sys.exc_info
:
import sys, os
try:
    raise NotImplementedError("No error")
except Exception as e:
    exc_type, exc_obj, exc_tb = sys.exc_info()
    fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
    print(exc_type.__name__, fname, exc_tb.tb_lineno)
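The standard traceback module offers a shorter route to much of the same information; format_exc() renders the current exception's traceback as a string you can log or inspect.

```python
import traceback

try:
    1 / 0
except ZeroDivisionError:
    tb_text = traceback.format_exc()

# The final line of the formatted traceback names the exception.
print(tb_text.strip().splitlines()[-1])
```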
Exception Handling (try/except/else/finally)

Now what we saw above were all in fact errors in that they caused execution to halt. What if we wanted to use them more intelligently—that is, to diagnose and handle problems before they crash our code?
This is what the try/except/else/finally workflow attempts to do. Basically, we can place a snippet of code which is susceptible to a specific error inside a try block, and then deal with the aftermath in the except block.
try:
    x = 1 / 0
except ZeroDivisionError:
    print("Division by zero occurred.")
denom = 0
while True:
    try:
        # Read a value from the console.
        denom = input()
        # Use as denominator.
        i = 1 / float(denom)
    except:
        print("non-numeric value entered")
    else:
        print(i)
    finally:
        if denom == 'q':
            break
try/except error handling should encapsulate the fewest statements possible. It can also reduce code readability, and so should probably be used only where things are likely to go wrong, in your judgment: the file system, or some obtuse calculation.
Basically, try lets you execute an error-prone segment of code, while except lets you handle any or all of the errors that arise. (It is better to handle less, as a general maxim, so that you don't mask other errors lurking in an operation.) An optional finally clause will execute in any case.
filename = 'spring.data'
try:
    data = np.genfromtxt(filename)
except:
    print('Unable to read file "%s".' % filename)
filename = 'spring.data'
try:
    data = np.genfromtxt(filename)
    print(data)
except IOError as err:
    print('Unable to read file "%s"; %s.' % (filename, err))
    # why output err? what else can go wrong?
finally:
    print('Done with data loading code.')
_**The Principle of Least Astonishment**_: The result of performing some operation should be obvious, consistent, and predictable, based upon the name of the operation and other clues.
Just as you can handle exceptions to make your code run properly, you can raise
them as well. Generic, specific, and user-specified exceptions are all available to you.
#raise( Exception, "This is my customised error message." ) #Python 2
raise Exception( "This is my customised error message." ) #Python 3
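User-specified exceptions are ordinary classes derived from Exception. A minimal sketch (the names SpringDataError and check_stiffness are illustrative, not from the original):

```python
# A user-defined exception: subclass Exception or a more specific type.
class SpringDataError(Exception):
    """Raised when spring data fail a sanity check."""

def check_stiffness(k):
    if k <= 0:
        raise SpringDataError("stiffness must be positive, got %g" % k)
    return k

try:
    check_stiffness(-2.0)
except SpringDataError as e:
    print("caught:", e)
```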
If we are going to intelligently use exceptions to control the execution of our program, what impact will this have?
Cons:
- Using exceptions for flow control (e.g., to break out of for statements) is controversial.

Pros:
- Exceptions make failure handling explicit, and the cost in performance is small.
The following code gives perspective on the question of relative efficiency in Python (src):
SETUP = 'counter = 0'
LOOP_IF = """
counter += 1
"""
LOOP_EXCEPT = """
try:
    counter += 1
except:
    pass
"""
import timeit
if_time = timeit.Timer(LOOP_IF, setup=SETUP)
except_time = timeit.Timer(LOOP_EXCEPT, setup=SETUP)
print('using if statement: {}'.format(min(if_time.repeat(number=10 ** 7))))
print('using exception: {}'.format(min(except_time.repeat(number=10 ** 7))))
So the use of exception-handling code is not that big of a deal—if raise
makes your code easier to understand or operate, then please use it liberally. Code that is easier to read and debug is code less likely to have bugs, since the emergent features can be well-characterized.
As I said before, you can debug by simply attempting to run your code. This, however, is very annoying. First off, the code will always stop at the first exception. This means that, if you have ten errors, you'll have to run the code ten times to find all of them. Now, imagine that this is long-running code. Imagine waiting five minutes for your code to run, only to discover it breaks because of a typo. Doesn't that sound exciting? Linting is the process of statically analyzing the code in order to catch multiple errors simultaneously (rather than dealing with them one at a time in the debugger). There are several available; I'll illustrate the use of pyflakes.
In a written natural language, there are many ways to express the same idea. To make the consumption of information easier, people define style guides to enforce particularly effective ways of writing. This is even more true in coding; consistent style choices make scripts much easier to read. Just like version control, standards become absolutely essential as projects become large ($n>1$, where $n$ is the number of coders).
Some programming languages have multiple competing standards, and it's easy to imagine how messy this can get. You can find strong opinions on what constitutes a tab, for instance. Luckily, Python doesn't have this issue. The official standard, PEP8, is used everywhere. Unless you plan on hiding all the code you write from the outside world, you should learn it.
To help out coders, there are tools to test for compliance. The aptly named pep8
library shows you where your code deviates from the PEP8 standard, and autopep8
goes a step further by trying to fix all of your errors for you. These are both run from the shell. (They are not available in Canopy Basic, so you'll need to install them yourself or get the Academic license.)
The Python Debugger (pdb)

Now that we've seen how errors and exceptions are raised and handled, let's handle some code interactively using pdb
, an invaluable tool for finding subtle errors in numerical code, for instance. (If you have used the GNU DeBugger, gdb
, before then you will see many similarities.)
Essentially, a common mode of use for pdb
allows you to run your Python code normally until an error is reached in the interpreter. At this point, your program crashes—you're probably used to this—but all of the data remain available, and you can manipulate them to figure out what went awry and why. Another mode of interaction lets you pause program execution periodically so you can look "under the hood", figuring out exactly what is going on and whether the results align with expectations.
(This sounds a lot like the Python interpreter, and it is. pdb
just extends that functionality to standalone scripts as well.)
First, let's just run pdb
in a conventional program, either in IPython (which works but can output some strange tracebacks occasionally) or in a conventional Python or IPython interpreter. Another common method for using pdb
—arguably more useful—is a GUI-based IDE such as Spyder, which makes the active use of code breakpoints particularly tractable.
%%file pdb_vel.py
import numpy as np
import pdb

def y_fall(t, x0, v0):
    a = -9.8
    y = a*t**2 + v0*t + x0
    return y

v = 2520 # m/s
x0 = 0 # m
t = np.arange(0,300,0.1) # s
pdb.set_trace()
print(y_fall(t, x0, v))
When you execute python pdb_vel.py
, you notice that a new prompt appears, (Pdb)
. At this prompt, you can type a series of commands. Try, for instance, p v
or p t[0]
. You can even directly manipulate variables: t[0] = -1
. When you are ready to continue execution, enter continue
, at which point the program will proceed directly (including the values of altered variables, if any).
pdb
can be invoked in several ways. Generally speaking, you want it to either pop up after an exception has been raised or run continuously from a designated breakpoint. The following chart illustrates a few ways you can get into the pdb
interface.
This chart records the most commonly used pdb
commands. We'll do some hands-on exercises in a few minutes so you can get a feel for what they do, and which are the most useful in a given context.
Most of these need to be seen in context to appreciate them, but I'll point out up front the most frequently used:
- n[ext]
- s[tep]
- r[eturn]
- p[rint]
%%file divisible.py
from __future__ import print_function
import sys

def is_divisible(a, b):
    """Determines if integer a is divisible by integer b."""
    remainder = a % b
    # if there's no remainder, then a is divisible by b
    if not remainder:
        return True
    else:
        return False

def find_divisors(integer):
    """Find all divisors of an integer and return them as a list."""
    divisors = []
    # we know that an integer divides itself
    divisors.append(integer)
    # we also know that the biggest divisor other than the integer itself
    # must be at most half the value of the integer (think about it)
    divisor = integer / 2
    while divisor >= 0:
        if is_divisible(integer, divisor):
            divisors.append(divisor)
        divisor =- 1
    return divisors

if __name__ == '__main__':
    # do some checking of the user's input
    try:
        if len(sys.argv) == 2:
            # the following statement will raise a ValueError for
            # non-integer arguments
            test_integer = int(sys.argv[1])
            # check to make sure the integer was positive
            if test_integer <= 0:
                raise ValueError("integer must be positive")
        elif len(sys.argv) == 1:
            # print the usage if there are no arguments and exit
            print(__doc__)
            sys.exit(0)
        else:
            # alert the user they did not provide the correct number of
            # arguments
            raise ValueError("too many arguments")
    # catch the errors raised if sys.argv[1] is not a positive integer
    except ValueError as e:
        # alert the user to provide a valid argument
        print("Error: please provide one positive integer as an argument.")
        sys.exit(2)
    divisors = find_divisors(test_integer)
    # print the results
    print("The divisors of %d are:" % test_integer)
    for divisor in divisors:
        print(divisor)
!python divisible.py 100
Run this code with pdb
activated as follows:
$ python -m pdb divisible.py 100

Consider the following series statement for a Bessel function of the first kind, $$J_{0}(x) = \sum_{m=0}^{\infty} \frac{(-1)^{m}}{(m!)^{2}} {\left(\frac{x}{2}\right)}^{2m} \text{.} $$
%%file bessel.py
from scipy.misc import factorial2 as fact
from scipy.special import j0 # for testing error
import pdb
pdb.set_trace()
def term(m, x):
    return ((-1)**m)/(fact(m)*fact(m+1)*(0.5**x)*(2*m))
value = 0.5
max_term = 20
my_sum = 0.0
for i in range(0, max_term):
my_sum += term(i, value)
print('series gives %f'%my_sum)
print('scipy gives %f'%j0(value))
print('error is %f'%(my_sum-j0(value)))
!python bessel.py
Using pdb in IPython

IPython has slightly enhanced support for debugging as compared to the regular Python interpreter. One convenient magic is %debug
, which can be called immediately after failing code to analyze the traceback. (This works poorly in the IPython Notebook, but is more tractable in the command-line interpreter.)
import string
greek = { 'a':u'α', 'b':u'β', 'g':u'γ', 'd':u'δ', 'e':u'ε', 'z':u'ζ' }
for letter in string.ascii_lowercase:
    print(greek[letter])
%debug
Context Managers (with)

Incidentally, one very useful practice is to allow a Python context manager to deal with much of the boilerplate of setting up and always taking down an object (even in the case of failure). The with
statement is used thus:
sums = [sum(range(0,i+1)) for i in range(1,10)]
with open('sums.txt', 'w') as file_out:
    for value in sums:
        print(value, file=file_out)
print("Successfully wrote file.")
with open('sums.txt', 'r') as file_in:
    for line in file_in:
        print(line.strip(), end=',')
print("\nSuccessfully read file.")
Naturally it fails the same way, but it also closes the file without your explicit intervention.
with open('nonexisting.txt', 'r') as file_in:
    for line in file_in:
        print(line.strip(), end=',')
Thus you still have exceptions raised as you would expect, but if a file operation—or anything else—fails, the object is automatically closed (deallocated, whatever is in __exit__
) by the context manager arranged by with
. (ref)
The following two snippets of code are thus equivalent (src):
Old way:

try:
    f = open("file", "r")
    try:
        line = f.readline()
    finally:
        f.close()
except IOError:
    pass  # handle error

New way:

try:
    with open("file", "r") as f:
        line = f.readline()
except IOError:
    pass  # handle error
You can see that the first opens, reads, and closes the file; the second does the same, only much more elegantly and Pythonically.
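You can build your own context managers too: contextlib.contextmanager turns a generator into one, with the code after yield playing the role of __exit__. A minimal sketch (the name managed_resource is illustrative):

```python
from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    print("setting up", name)
    try:
        yield name                      # the with-block body runs here
    finally:
        print("taking down", name)      # runs even if the body raises

with managed_resource("demo") as r:
    print("using", r)
```

The try/finally around the yield guarantees the teardown line runs whether the body completes or raises, just as a hand-written __exit__ would.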
Finite-difference models are used throughout engineering to obtain numerical solutions to differential equations. This particular system models the heat equation
$$ \frac{1}{\alpha} \frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2}$$given an initial condition of $u(x,t=0) = \sin\left(\pi x/L\right)$ and boundary conditions of $u(x=0,t) = 0$ and $u(x=L,t) = 0$.
To approximate a derivative, the most straightforward way is to take the formal definition
$$f'(x) = \frac{f(x+h)-f(x)}{h}$$and use a small but nonzero step $h$ in your calculation.
Application of this principle to the heat equation leads to a statement of the form
$$ \frac{1}{\alpha} \frac{u^m_i - u^{m-1}_i}{\Delta t} = \frac{u^{m-1}_{i-1} - 2 u^{m-1}_{i} + u^{m-1}_{i+1}}{\Delta x^2} $$or $u^m_i = \frac{\alpha \Delta t}{\Delta x^2}u^{m-1}_{i-1} + \left[1 - 2\left(\frac{\alpha \Delta t}{\Delta x^2}\right)\right]u^{m-1}_{i} + \frac{\alpha \Delta t}{\Delta x^2}u^{m-1}_{i+1}$.
This clearly yields a way to calculate subsequent time steps point-by-point from the previous time step's data.
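The update rule can be sketched for a single time step with NumPy slices. The parameter values below are illustrative (chosen so that $r = \alpha \Delta t / \Delta x^2 < 1/2$, the stability condition for this explicit scheme), not the ones used in the exercise code.

```python
import numpy as np

# One explicit time step of the 1-D heat equation.
nx, alpha, length = 25, 0.1, 1.0
dx = length / (nx - 1)
dt = 0.004                       # chosen so r = alpha*dt/dx**2 < 0.5
r = alpha * dt / dx**2

x = np.linspace(0, length, nx)
u_old = np.sin(np.pi * x / length)   # initial condition u(x, 0)
u_new = u_old.copy()

# Interior points follow u_i^m = r*u_{i-1} + (1-2r)*u_i + r*u_{i+1}.
u_new[1:-1] = r*u_old[:-2] + (1 - 2*r)*u_old[1:-1] + r*u_old[2:]
# Boundary values u_new[0] and u_new[-1] stay fixed at zero.
```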
Debug this model using pdb. Consider the following elements of the problem: are dt, dx, etc., correct?

%%file fd_he.py
import numpy as np
# Basic parameters
nt = 120
nx = 25
alpha = 0.1
length = 1.0
tmax = 0.5
# Derived parameters: mesh spacing and time step size
dx = length / nx
dt = tmax / (nt-1)
# Create arrays to save data in process.
x = np.linspace(0, length+1e-15, nx)
t = np.linspace(0, tmax+1e-15, nt)
u = np.zeros([nx, nt])
# Set initial and boundary conditions.
u[:, 0] = np.sin(np.pi*x/length)**2
#boundaries are implicitly set by this initial condition
# Loop through each time step.
r = alpha * dt / (dx*dx)
s = 1 - 2*r
for n in range(1, nt):
    for j in range(1, nx):
        u[n, j] = r*u[j-1, n-1] + s*u[j, n-1] + r*u[j+1, n-1]
# Output the results.
np.savetxt('fd_data.txt', u)
Neal Davis and Lakshmi Rao developed these materials for Computational Science and Engineering at the University of Illinois at Urbana–Champaign. It incorporates some elements from Software Carpentry, contributed by Greg Wilson and others; and The Hacker Within, contributed by Katy Huff, Anthony Scopatz, Joshua R. Smith, and Sri Hari Krishna Narayanan.
This content is available under a [Creative Commons Attribution 3.0 Unported License](https://creativecommons.org/licenses/by/3.0/).