Testing

Code is assumed guilty until proven innocent. This applies to software written by other people, but even more so to software written by yourself. The mechanism that builds trust that software is performing correctly is called testing. Testing is the process by which the expected results of code are compared against the observed results of actually having run that code.

What and How to Test?

Let’s see how this mindset applies to an actual physics problem. Given two previous observations in the sky and the time between them, Kepler’s Laws provide a closed-form equation for the future location of a celestial body. This can be implemented via a function named kepler_loc(). The following is a stub interface representing this function that lacks the actual function body:

In [10]:
def kepler_loc(p1, p2, dt, t):
    ...
    return p3

As a basic test of this function, we can take three points on the planet Jupiter’s actual measured path and use the latest of these as the expected result. We will then compare this to the result that we observe as the output of the kepler_loc() function.

The following example is pseudocode for testing that the measured positions of Jupiter, given by the function jupiter(), can be predicted with the kepler_loc() function:

In [11]:
# note: the assertion shown in the next example is preferable
# to raising ValueError by hand
def test_kepler_loc():
    p1 = jupiter(two_days_ago)
    p2 = jupiter(yesterday)
    exp = jupiter(today)
    obs = kepler_loc(p1, p2, 1, 1)
    if exp != obs:
        raise ValueError("Jupiter is not where it should be!")

The following pseudocode represents the basic foundation of this, and indeed all, tests. This example uses an assertion.

In [12]:
def test_func():
    exp = get_expected()
    obs = func(*args, **kwargs)
    assert exp == obs
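As a concrete, runnable instance of this pattern, consider a hypothetical count_vowels() helper (a stand-in for func() above, not part of the Kepler example) and its test:

```python
def count_vowels(word):
    # count the vowels in a word; a simple stand-in for func() above
    return sum(1 for ch in word.lower() if ch in "aeiou")

def test_count_vowels():
    exp = 3
    obs = count_vowels("banana")
    assert exp == obs

test_count_vowels()  # silence means the test passed
```

As with all assertion-based tests, no output means success; a failure raises AssertionError.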

Below, we rewrite the Kepler test using an assertion.

In [13]:
def test_kepler_loc():
    p1 = jupiter(two_days_ago)
    p2 = jupiter(yesterday)
    exp = jupiter(today)
    obs = kepler_loc(p1, p2, 1, 1)
    assert exp == obs

Exercise: Add a Test to Your Project

1) Create a file called test_filename.py for a file (filename) in your project source code.

2) For the most important function in the file, create a test function using an assertion.

3) Save and run the test file. Does the test pass? How can you tell?

Nose

nose has a variety of helpful and specific assertion functions that display extra debugging information when they fail. These are all accessible through the nose.tools module. The simplest one is named assert_equal().

In [14]:
from nose.tools import assert_equal

def test_kepler_loc():
    p1 = jupiter(two_days_ago)
    p2 = jupiter(yesterday)
    exp = jupiter(today)
    obs = kepler_loc(p1, p2, 1, 1)
    assert_equal(exp, obs)

Running Tests

The major boon a testing framework provides is a utility to find and run the tests automatically. With nose, this is a command-line tool called nosetests. If the following Fibonacci function is being tested, we can run all of the tests with the nosetests command on the command line.

In [15]:
def fib(n):
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
In [16]:
from nose.tools import assert_equal

def test_fib0():
    # test edge 0
    obs = fib(0)
    assert_equal(1, obs)

def test_fib1():
    # test edge 1
    obs = fib(1)
    assert_equal(1, obs)

def test_fib6():
    # test regular point
    obs = fib(6)
    assert_equal(13, obs)
In [17]:
# Running the test functions manually should produce no output if the tests pass
test_fib0()
test_fib1()
test_fib6()
Exercise: Use Nose For Your Project

1) Rewrite your new project test to use nose instead.

2) To run the nose test, type nosetests in the directory where the test file is stored.

3) Attempt this and debug your function and test until the test passes.

4) If you have extra time, try writing another test or two.

In [18]:
import numpy as np

def sinc2d(x, y):
    if x == 0.0 and y == 0.0:
        return 1.0
    elif x == 0.0:
        return np.sin(y) / y
    elif y == 0.0:
        return np.sin(x) / x
    else:
        return (np.sin(x) / x) * (np.sin(y) / y)
In [19]:
import numpy as np
from nose.tools import assert_equal

def test_internal():
    exp = (2.0 / np.pi) * (-2.0 / (3.0 * np.pi))
    obs = sinc2d(np.pi / 2.0, 3.0 * np.pi / 2.0)
    assert_equal(exp, obs)

def test_edge_x():
    exp = (-2.0 / (3.0 * np.pi))
    obs = sinc2d(0.0, 3.0 * np.pi / 2.0)
    assert_equal(exp, obs)

def test_edge_y():
    exp = (2.0 / np.pi)
    obs = sinc2d(np.pi / 2.0, 0.0)
    assert_equal(exp, obs)

def test_corner():
    exp = 1.0
    obs = sinc2d(0.0, 0.0)
    assert_equal(exp, obs)
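One caveat: assert_equal() compares values exactly, which is brittle for floating-point results computed along different code paths. nose.tools also provides assert_almost_equal() for this situation, and the standard library's math.isclose() achieves the same effect. A sketch of the stdlib route, using a simplified one-dimensional sinc to keep the example self-contained:

```python
import math

def sinc1d(x):
    # hypothetical 1D helper for illustration only
    return 1.0 if x == 0.0 else math.sin(x) / x

def test_internal_isclose():
    exp = 2.0 / math.pi
    obs = sinc1d(math.pi / 2.0)
    # tolerant comparison instead of exact equality
    assert math.isclose(exp, obs, rel_tol=1e-12)

test_internal_isclose()
```

The sinc2d() tests above happen to pass with exact comparison, but tolerant comparisons are the safer default for floating-point math.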
In [20]:
# run the tests for sinc2d() manually here!
In [21]:
def a(x):
    return x + 1

def b(x):
    return 2 * x

def c(x):
    return b(a(x))
In [22]:
from nose.tools import assert_equal

def test_c():
    exp = 6
    obs = c(2)
    assert_equal(exp, obs)
    
test_c()

Test-Driven Development

To start, we write a test for computing the standard deviation from a list of numbers as follows:

In [23]:
from nose.tools import assert_equal

def test_std1():
    obs = std([0.0, 2.0])
    exp = 1.0
    assert_equal(obs, exp)

Next, we write the minimal version of std() that will cause test_std1() to pass:

In [24]:
def std(vals):
    # surely this is cheating...
    return 1.0

# run the test
test_std1()

If we only ever want to take the standard deviation of the numbers 0.0 and 2.0, or 1.0 and 3.0, and so on, then this implementation will work perfectly. If we want to branch out, then we probably need to write more robust code. However, before we can write more code, we first need to add another test or two:

In [25]:
def test_std1():
    obs = std([0.0, 2.0])
    exp = 1.0
    assert_equal(obs, exp)

def test_std2():
    obs = std([]) 
    exp = 0.0
    assert_equal(obs, exp)

def test_std3():
    obs = std([0.0, 4.0])
    exp = 2.0
    assert_equal(obs, exp)
    
# run the tests
test_std1()
test_std2()
test_std3()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-25-7d31efb13ae3> in <module>()
     16 # run the tests
     17 test_std1()
---> 18 test_std2()
     19 test_std3()

<ipython-input-25-7d31efb13ae3> in test_std2()
      7     obs = std([])
      8     exp = 0.0
----> 9     assert_equal(obs, exp)
     10 
     11 def test_std3():

/Users/khuff/anaconda3/lib/python3.4/unittest/case.py in assertEqual(self, first, second, msg)
    795         """
    796         assertion_func = self._getAssertEqualityFunc(first, second)
--> 797         assertion_func(first, second, msg=msg)
    798 
    799     def assertNotEqual(self, first, second, msg=None):

/Users/khuff/anaconda3/lib/python3.4/unittest/case.py in _baseAssertEqual(self, first, second, msg)
    788             standardMsg = '%s != %s' % _common_shorten_repr(first, second)
    789             msg = self._formatMessage(msg, standardMsg)
--> 790             raise self.failureException(msg)
    791 
    792     def assertEqual(self, first, second, msg=None):

AssertionError: 1.0 != 0.0

We'll need to improve the function to make these pass.

In [ ]:
def std(vals):
    # a little better
    if len(vals) == 0:
        return 0.0
    return vals[-1] / 2.0
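Re-running the three tests against this version produces no output, confirming that all of them now pass (restated here as one self-contained cell):

```python
def std(vals):
    # a little better, but still not a real standard deviation
    if len(vals) == 0:
        return 0.0
    return vals[-1] / 2.0

def test_std1():
    assert std([0.0, 2.0]) == 1.0

def test_std2():
    assert std([]) == 0.0

def test_std3():
    assert std([0.0, 4.0]) == 2.0

# all three pass silently for this still-not-general implementation
test_std1()
test_std2()
test_std3()
```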

Even though the tests all pass, this is clearly still not a generic standard deviation function. To create a better implementation, TDD states that we again need to expand the test suite:

In [ ]:
def test_std1():
    obs = std([0.0, 2.0])
    exp = 1.0
    assert_equal(obs, exp)

def test_std2():
    obs = std([])
    exp = 0.0
    assert_equal(obs, exp)

def test_std3():
    obs = std([0.0, 4.0])
    exp = 2.0
    assert_equal(obs, exp)

def test_std4():
    obs = std([1.0, 3.0])
    exp = 1.0
    assert_equal(obs, exp)

def test_std5():
    obs = std([1.0, 1.0, 1.0])
    exp = 0.0
    assert_equal(obs, exp)

# run the tests
test_std1()
test_std2()
test_std3()
test_std4()
test_std5()

At this point, we may as well try to implement a generic standard deviation function. We would spend more time trying to come up with clever approximations to the standard deviation than we would spend actually coding it. Just biting the bullet, we might write the following implementation:

In [ ]:
def std(vals):
    # finally, some math
    n = len(vals)
    if n == 0:
        return 0.0
    mu = sum(vals) / n
    var = 0.0
    for val in vals:
        var = var + (val - mu)**2
    return (var / n)**0.5
In [ ]:
# run the tests
test_std1()
test_std2()
test_std3()
test_std4()
test_std5()
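As an extra sanity check beyond the hand-written tests, note that this implementation computes the population standard deviation (dividing by n rather than n - 1), so it can be compared against the standard library's statistics.pstdev():

```python
import statistics

def std(vals):
    # the implementation from above
    n = len(vals)
    if n == 0:
        return 0.0
    mu = sum(vals) / n
    var = sum((v - mu) ** 2 for v in vals)
    return (var / n) ** 0.5

data = [1.0, 3.0, 7.0, 11.0]
# the two results should agree to within floating-point noise
assert abs(std(data) - statistics.pstdev(data)) < 1e-9
```

Cross-checking against an independent implementation is a cheap way to gain extra confidence in numerical code.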

Testing Wrap-Up

At this point, you should know that:

  • Tests verify that the result observed from running code matches what was expected ahead of time.
  • Tests should be written at the same time as the code they are testing is written.
  • The person best suited to write a test is the author of the original code.
  • Tests are grouped together in a test suite.
  • Test frameworks, like nose, discover and execute tests for you automatically.
  • An edge case is when an input is at the limit of its range.
  • A corner case is where two or more edge cases meet.
  • Unit tests try to test the smallest pieces of code possible, usually functions and methods.
  • Integration tests make sure that code units work together properly.
  • Regression tests ensure that everything works the same today as it did yesterday.
  • Test generators can be used to efficiently check many cases.
  • Test coverage is the percentage of the code base that is executed by the test suite.
  • Test-driven development says to write your tests before you write the code that is being tested.
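One bullet above deserves a sketch: nose discovers test generators, functions that yield (check_function, arguments) tuples, and runs each yielded case as a separate test. The generator below follows that style, and can also be driven by hand without nose:

```python
def fib(n):
    # the recursive Fibonacci function from earlier in the chapter
    if n == 0 or n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)

def check_fib(n, expected):
    assert fib(n) == expected

def test_fib_cases():
    # nose-style test generator: each yield becomes one test case
    for n, expected in [(0, 1), (1, 1), (2, 2), (6, 13)]:
        yield check_fib, n, expected

# without nose, the generator can be driven manually
for case in test_fib_cases():
    func, *args = case
    func(*args)
```

Generators keep many similar cases in one place instead of one near-identical test function per case.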