## Unit tests

This is an example of unit testing with nose. We want to make sure that the function calc_gc correctly calculates the GC fraction of a DNA sequence.

Problems worked through in class included:

1. the sequence contained 'N's
2. the sequence contained lowercase characters
3. divide by zero for sequences with no A, T, C, or G

In [1]:
%%file calc_gc.py
def calc_gc(sequence):
    sequence = sequence.upper()                    # make all chars uppercase
    n = sequence.count('T') + sequence.count('A')  # count only A, T,
    m = sequence.count('G') + sequence.count('C')  # C, and G -- nothing else (no Ns, Rs, Ws, etc.)
    if n + m == 0:
        return 0.                                  # avoid divide-by-zero
    return float(m) / float(n + m)

def test_1():
    result = round(calc_gc('ATGGCAT'), 2)
    print 'hello, this is a test; the value of result is', result
    assert result == 0.43

def test_2(): # test handling N
    result = round(calc_gc('NATGC'), 2)
    assert result == 0.5, result

def test_3(): # test handling lowercase
    result = round(calc_gc('natgc'), 2)
    assert result == 0.5, result

Overwriting calc_gc.py
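
The third problem on the list, divide-by-zero, has no test of its own in the cell above. A minimal sketch of what one might look like follows. calc_gc is repeated so the block runs on its own, and the names test_4 and test_5 are hypothetical additions, not part of the class file.

```python
def calc_gc(sequence):
    sequence = sequence.upper()                    # make all chars uppercase
    n = sequence.count('T') + sequence.count('A')  # count A and T
    m = sequence.count('G') + sequence.count('C')  # count G and C
    if n + m == 0:
        return 0.                                  # avoid divide-by-zero
    return float(m) / float(n + m)

def test_4():  # hypothetical: sequence with no A, T, C, or G
    result = calc_gc('NNNN')
    assert result == 0., result

def test_5():  # hypothetical: empty sequence
    result = calc_gc('')
    assert result == 0., result
```

Added to calc_gc.py, these would be picked up by nosetests along with the other three tests.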


## Running nosetests

Here, the 'nosetests' command looks through calc_gc.py, finds all functions whose names start with test_, and runs them. Each dot in the output stands for one passing test.

In [2]:
!nosetests calc_gc.py

...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK
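
To make the discovery step less magical, here is a rough sketch of the idea: look through a namespace, call everything whose name starts with test_, and record whether it raised an AssertionError. This is not nose's actual implementation (nose matches names against a configurable pattern and also collects test classes). The run_tests function and the example namespace are hypothetical.

```python
def run_tests(namespace):
    """Call every callable whose name starts with 'test_'.

    Returns a dict mapping each test name to 'ok' or 'FAIL'.
    """
    results = {}
    for name in sorted(namespace):
        func = namespace[name]
        if not (name.startswith('test_') and callable(func)):
            continue                      # skip helpers and non-functions
        try:
            func()
            results[name] = 'ok'
        except AssertionError:
            results[name] = 'FAIL'        # a failed assert marks the test as failed
    return results

def _passes():
    assert round(3 / 7.0, 2) == 0.43

def _fails():
    assert 1 + 1 == 3

# hypothetical namespace standing in for the contents of a module
namespace = {'test_passes': _passes, 'test_fails': _fails, 'helper': _passes}
results = run_tests(namespace)
```

Only the two test_ entries end up in results. The helper entry is ignored because of its name, even though it is callable.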


You can also run nosetests with the '-v' (verbose) option, which reports each test by name:

In [3]:
!nosetests -v calc_gc.py

calc_gc.test_1 ... ok
calc_gc.test_2 ... ok
calc_gc.test_3 ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK


## Regression testing

Here I'm going to set up some regression tests, where we're simply comparing the output of a previously run script with the output of that script now. If we're running on the same data, we should get the same answer... right?

The script just calculates the GC fraction of each sequence in 25k.fq.gz and prints the average of those per-sequence values.

In [4]:
%%file gc-of-seqs.py
import sys
import screed
import calc_gc

filename = sys.argv[1]    # take the sequence filename in from the command line
total_gc = []
for record in screed.open(filename):
    gc = calc_gc.calc_gc(record.sequence)
    total_gc.append(gc)

print sum(total_gc) / float(len(total_gc))

Overwriting gc-of-seqs.py

In [5]:
# run the script and look at the output -- then write that output into the following file.
!python gc-of-seqs.py 25k.fq.gz

0.607911191366
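
To see what the script computes without needing screed or 25k.fq.gz, here is a sketch that applies the same averaging to a small in-memory list. The sequences and the mean_gc helper are made up for illustration, and calc_gc is repeated so the block stands alone.

```python
def calc_gc(sequence):
    sequence = sequence.upper()
    n = sequence.count('T') + sequence.count('A')
    m = sequence.count('G') + sequence.count('C')
    if n + m == 0:
        return 0.                       # avoid divide-by-zero
    return float(m) / float(n + m)

def mean_gc(sequences):
    """Average the per-sequence GC fractions, as gc-of-seqs.py does."""
    gc_values = [calc_gc(s) for s in sequences]
    return sum(gc_values) / float(len(gc_values))

# hypothetical sequences with GC fractions 0.5, 1.0, and 0.0
sequences = ['ATGC', 'GGCC', 'AATT']
average = mean_gc(sequences)            # (0.5 + 1.0 + 0.0) / 3
```

Each sequence contributes equally to this average regardless of its length.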

In [6]:
%%file test_gc_script.py
import subprocess

correct_output = "0.607911191366\n"   # this is taken from the previous exec'd cell

# the following function checks to see if running this script at the command line
# returns the right result.  make sure you're running this from *within* the python/ subdirectory
# of the 2012-11-scripps/ repository.
def test_run():
    p = subprocess.Popen('python gc-of-seqs.py 25k.fq.gz', shell=True, stdout=subprocess.PIPE)
    (stdout, stderr) = p.communicate()
    assert stdout == correct_output


Overwriting test_gc_script.py

In [7]:
!nosetests test_gc_script.py

.
----------------------------------------------------------------------
Ran 1 test in 0.937s

OK
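
The test above compares stdout as an exact string, so it depends on how the number happens to be formatted. One possible alternative, sketched here under assumptions, is to parse the output as a float and compare within a tolerance. The helper names, the tolerance, and the stand-in command are hypothetical. The stand-in simply prints the recorded value, so the block runs without gc-of-seqs.py or the data file.

```python
import subprocess
import sys

def run_and_parse(cmd):
    """Run cmd (a list of arguments) and return its stdout parsed as a float."""
    stdout = subprocess.check_output(cmd)
    return float(stdout.decode('utf-8').strip())

def close_enough(observed, expected, tolerance=1e-9):
    """True if observed is within tolerance of expected."""
    return abs(observed - expected) <= tolerance

# stand-in for ['python', 'gc-of-seqs.py', '25k.fq.gz']
cmd = [sys.executable, '-c', 'print(0.607911191366)']
observed = run_and_parse(cmd)
```

Whether a tolerance is appropriate depends on what the regression test is meant to catch. An exact string match also flags formatting changes, which a numeric comparison would let through.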
