This is an example of unit testing with nose. We want to make sure that the function calc_gc correctly calculates the GC fraction of a DNA sequence.
Problems worked through in class included --
%%file calc_gc.py
def calc_gc(sequence):
    sequence = sequence.upper()                    # make all chars uppercase
    n = sequence.count('T') + sequence.count('A')  # count only A, T,
    m = sequence.count('G') + sequence.count('C')  # C, and G -- nothing else (no Ns, Rs, Ws, etc.)
    if n + m == 0:
        return 0.                                  # avoid divide-by-zero
    return float(m) / float(n + m)

def test_1():
    result = round(calc_gc('ATGGCAT'), 2)
    print 'hello, this is a test; the value of result is', result
    assert result == 0.43

def test_2():                                      # test handling N
    result = round(calc_gc('NATGC'), 2)
    assert result == 0.5, result

def test_3():                                      # test handling lowercase
    result = round(calc_gc('natgc'), 2)
    assert result == 0.5, result
Overwriting calc_gc.py
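The three tests above do not exercise the divide-by-zero guard. As a minimal sketch, a few extra tests could cover that branch and the upper bound. The test names here (test_empty, test_only_ambiguous, test_all_gc) are hypothetical and not part of the class material, and calc_gc is repeated only so the block runs on its own.

```python
def calc_gc(sequence):
    """GC fraction counting only A, T, G and C (same logic as calc_gc.py)."""
    sequence = sequence.upper()
    at = sequence.count('A') + sequence.count('T')
    gc = sequence.count('G') + sequence.count('C')
    if at + gc == 0:
        return 0.0  # avoid divide-by-zero
    return float(gc) / float(at + gc)


def test_empty():  # hypothetical extra test: empty input hits the guard
    assert calc_gc('') == 0.0


def test_only_ambiguous():  # hypothetical extra test: no A/T/G/C at all
    assert calc_gc('NNNN') == 0.0


def test_all_gc():  # hypothetical extra test: upper bound of the fraction
    assert calc_gc('GGCC') == 1.0
```

If these were appended to calc_gc.py, nosetests should pick them up alongside test_1 through test_3.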
Here, the 'nosetests' command looks through calc_gc.py, finds all functions whose names start with test_, and runs them.
!nosetests calc_gc.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK
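To make the discovery step concrete, here is a deliberately simplified sketch of the idea: collect callables whose names start with test_, call each one, and record which ones raise AssertionError. This is not how nose is implemented (nose's matching rules and reporting are richer); run_tests and demo are hypothetical names used only for illustration.

```python
def run_tests(namespace):
    """Call every function in `namespace` whose name starts with 'test_'.

    Returns (number of tests run, list of names that failed).
    """
    ran, failed = 0, []
    for name in sorted(namespace):
        obj = namespace[name]
        if name.startswith('test_') and callable(obj):
            ran += 1
            try:
                obj()
            except AssertionError:
                failed.append(name)
    return ran, failed


def demo():
    """Build a tiny namespace with one passing test, one failing test, and a helper."""
    def test_passes():
        assert 1 + 1 == 2

    def test_fails():
        assert 1 + 1 == 3

    def helper():  # never collected: name does not start with 'test_'
        raise RuntimeError('should never be called')

    return run_tests({'test_passes': test_passes,
                      'test_fails': test_fails,
                      'helper': helper})
```

Calling demo() runs the two test_ functions and skips helper, which is the same selection behavior seen in the nosetests output for calc_gc.py.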
You can also run nosetests with a '-v' option:
!nosetests -v calc_gc.py
calc_gc.test_1 ... ok
calc_gc.test_2 ... ok
calc_gc.test_3 ... ok

----------------------------------------------------------------------
Ran 3 tests in 0.001s

OK
Here I'm going to set up some regression tests, where we're simply comparing the output of a previously run script with the output of that script now. If we're running on the same data, we should get the same answer... right?
The script calculates the GC fraction of each sequence in 25k.fq.gz and then prints the mean of those per-sequence values.
%%file gc-of-seqs.py
import sys
import screed
import calc_gc
filename = sys.argv[1] # take the sequence filename in from the command line
total_gc = []
for record in screed.open(filename):
    gc = calc_gc.calc_gc(record.sequence)
    total_gc.append(gc)

print sum(total_gc) / float(len(total_gc))
Overwriting gc-of-seqs.py
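The averaging step in the script can be tried on a couple of in-memory sequences without screed or a data file. The sketch below is an illustration only: mean_gc is a hypothetical helper, the two sequences are made up, and returning 0.0 for an empty input is an assumption (the script itself would raise ZeroDivisionError on an empty file). It also shows that the mean of per-sequence fractions is not the same number as the GC fraction of all bases pooled together.

```python
def calc_gc(sequence):
    """GC fraction counting only A, T, G and C (same logic as calc_gc.py)."""
    sequence = sequence.upper()
    at = sequence.count('A') + sequence.count('T')
    gc = sequence.count('G') + sequence.count('C')
    if at + gc == 0:
        return 0.0
    return float(gc) / float(at + gc)


def mean_gc(sequences):
    """Mean of the per-sequence GC fractions, as gc-of-seqs.py computes it."""
    fractions = [calc_gc(s) for s in sequences]
    if not fractions:
        return 0.0  # assumption: report 0 when there are no sequences
    return sum(fractions) / float(len(fractions))


seqs = ['GGCC', 'ATATATAT']        # made-up sequences of different lengths
per_read = mean_gc(seqs)           # (1.0 + 0.0) / 2
pooled = calc_gc(''.join(seqs))    # 4 G/C bases out of 12 total
```

Here per_read is 0.5 while pooled is one third, because the per-sequence mean weights every read equally regardless of its length.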
# run the script and look at the output -- then write that output into the following file.
!python gc-of-seqs.py 25k.fq.gz
0.607911191366
%%file test_gc_script.py
import subprocess
correct_output = "0.607911191366\n" # this is taken from the previous exec'd cell
# the following function checks to see if running this script at the command line
# returns the right result. make sure you're running this from *within* the python/ subdirectory
# of the 2012-11-scripps/ repository.
def test_run():
    p = subprocess.Popen('python gc-of-seqs.py 25k.fq.gz', shell=True, stdout=subprocess.PIPE)
    (stdout, stderr) = p.communicate()
    assert stdout == correct_output
Overwriting test_gc_script.py
!nosetests test_gc_script.py
.
----------------------------------------------------------------------
Ran 1 test in 0.937s

OK
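Comparing the exact output string ties the regression test to how the float happens to be printed, which can differ between Python versions. One possible alternative, sketched here under assumptions, is to parse the number and compare it within a tolerance. run_and_parse and close_enough are hypothetical helpers, the tolerance of 1e-9 is an arbitrary choice, and a stand-in command replaces the real script so the block runs without 25k.fq.gz.

```python
import subprocess
import sys


def run_and_parse(args):
    """Run a command and parse its stdout as a single float."""
    out = subprocess.check_output(args)
    return float(out.decode().strip())


def close_enough(actual, expected, tol=1e-9):
    """True if the two values differ by no more than `tol`."""
    return abs(actual - expected) <= tol


# Stand-in command so the sketch runs anywhere. In the regression test this
# would be something like [sys.executable, 'gc-of-seqs.py', '25k.fq.gz'].
value = run_and_parse([sys.executable, '-c', 'print(0.25 + 0.25)'])
assert close_enough(value, 0.5)
```

The trade-off is that a tolerance can hide very small numerical changes, so the exact-match version remains the stricter check when the output format is known to be stable.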