code by Steven H. D. Haddock and Casey W. Dunn as described in:
Practical Computing for Biologists
by Steven H. D. Haddock and Casey W. Dunn
published in 2011 by Sinauer Associates.
ISBN 978-0-87893-391-4
http://www.sinauer.com/practical-computing-for-biologists.html
see practicalcomputing.org
Scripts freely available by the original authors at practicalcomputing.org
DIRECT LINK: http://practicalcomputing.org/files/pcfb_examples.zip
Updated to Python 3 by Wayne Decatur
Posted as a Gist and IPython Notebook by Wayne (fomightez at GitHub) with full credit and reference to original code authors.
DNASeq = "ATGTCTCATTCAAAGCA"
SeqLength = float(len(DNASeq))  # sequence length as a float
BaseList = "ACGT"
for Base in BaseList:
    # percentage of the sequence made up of this base
    Percent = 100 * DNASeq.count(Base) / SeqLength
    print("%s: %4.1f" % (Base, Percent))
A: 35.3
C: 23.5
G: 11.8
T: 29.4
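The loop above scans the sequence once per base with count(). As a minimal sketch of an alternative, and not part of the original book code, collections.Counter from the standard library can tally every character in a single pass. The name counts is introduced here only for illustration. In Python 3 the / operator always performs true division, so the float() conversion is not needed in this version.

```python
from collections import Counter

DNASeq = "ATGTCTCATTCAAAGCA"

# Tally every character of the sequence in one pass.
# "counts" is an illustrative name, not from the original script.
counts = Counter(DNASeq)

for Base in "ACGT":
    # A Counter returns 0 for a base that never occurs.
    Percent = 100 * counts[Base] / len(DNASeq)
    print("%s: %4.1f" % (Base, Percent))
```

This should print the same four percentages as the loop above.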
See the code in action and explore it interactively here.
Obtain a copy of this entire IPython Notebook here in order to explore it interactively.
%whos
Variable    Type     Data/Info
------------------------------
Base        str      T
BaseList    str      ACGT
DNASeq      str      ATGTCTCATTCAAAGCA
Percent     float    29.41176470588235
SeqLength   float    17.0
The special command above, %whos, lists the variables that are currently defined and available in an IPython Notebook.
(For some reason, it does not work for any of the initial variables in the interactive gist console.)
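Where %whos is unavailable, a rough stand-in can be written in plain Python. The sketch below is an assumption-laden illustration, not how IPython implements %whos: the helper list_variables and the example dictionary are hypothetical names, and the helper simply walks a namespace dictionary and skips private names, callables, and modules.

```python
import types


def list_variables(namespace):
    """Return (name, type name, value) rows for plain data in a namespace dict.

    Hypothetical helper for illustration; not IPython's implementation.
    """
    rows = []
    for name in sorted(namespace):
        value = namespace[name]
        # Skip private names, functions/classes, and imported modules.
        if name.startswith("_") or callable(value):
            continue
        if isinstance(value, types.ModuleType):
            continue
        rows.append((name, type(value).__name__, value))
    return rows


# A small stand-in namespace; in a script one could pass globals() instead.
example = {"DNASeq": "ATGTCTCATTCAAAGCA", "SeqLength": 17.0, "BaseList": "ACGT"}
for name, type_name, value in list_variables(example):
    print("%-10s %-6s %s" % (name, type_name, value))
```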
We can go ahead and define a function that will do this calculation:
def calc(MyDNASeq):
    # print the percentage of each base in MyDNASeq
    SeqLength = float(len(MyDNASeq))
    BaseList = "ACGT"
    for Base in BaseList:
        Percent = 100 * MyDNASeq.count(Base) / SeqLength
        print("%s: %4.1f" % (Base, Percent))
Then we define a variable:
MyDNASeq = "TTGGGGGGCGAAAA"
Then we feed that variable to the function:
calc(MyDNASeq)
A: 28.6
C: 7.1
G: 50.0
T: 14.3
In fact, we can even skip the variable and directly input the sequence into the function:
calc("TGTTTTTCTTTTTCCCCCCCAAAA")
A: 16.7
C: 33.3
G: 4.2
T: 45.8
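Because calc prints its results, the percentages cannot be reused later in a program. One possible variation, sketched here under our own assumptions rather than taken from the book, returns the values in a dictionary instead. The function name base_percentages, its bases parameter, the upper-casing of the input, and the handling of an empty sequence are all illustrative choices.

```python
def base_percentages(seq, bases="ACGT"):
    """Return a dict mapping each base to its percentage of seq.

    Hypothetical variant of calc() that returns values instead of printing.
    """
    seq = seq.upper()  # assumption: treat lower-case input the same way
    length = len(seq)
    if length == 0:
        # Avoid dividing by zero for an empty sequence.
        return {base: 0.0 for base in bases}
    return {base: 100 * seq.count(base) / length for base in bases}


result = base_percentages("TGTTTTTCTTTTTCCCCCCCAAAA")
for base, percent in result.items():
    print("%s: %4.1f" % (base, percent))
```

The returned dictionary can then be stored, compared between sequences, or formatted however is needed.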