organisms = ['Pan troglodytes', 'Gallus gallus', 'Xenopus laevis', 'Vipera palaestinae']
We access elements of lists by using their index:
print(organisms[0])
print(organisms[2])
Pan troglodytes
Xenopus laevis
Dictionaries are another data structure used to store collections of elements, only this time each element is accessed through a key rather than a position. Keys can be of any immutable (hashable) type, such as a string, an integer, a float or a tuple. Each key is connected to a value, which can be of any type.
organisms_classes = {'Pan troglodytes': 'Mammalia', 'Gallus gallus': 'Aves', 'Xenopus laevis': 'Amphibia', 'Vipera palaestinae': 'Reptilia'}
In this dictionary, the keys are the organisms and the values are the class of each organism. Both are of type str.
Another example would be a dictionary representing the number of observations of various species:
observations = {'Equus zebra': 143,
'Hippopotamus amphibius': 27,
'Giraffa camelopardalis': 71,
'Panthera leo': 112}
Here, the keys are of type str and the values are of type int. Any other combination could be used.
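As a small illustration of mixing types, here is a sketch with made-up data (the dictionary names and their contents are hypothetical, not part of the lesson's data). It uses integer keys with string values and string keys with list values, and it also shows that a mutable object such as a list is rejected as a key.

```python
# Hypothetical example data, for illustration only
# int keys -> str values
chromosome_names = {1: 'chr1', 2: 'chr2', 23: 'chrX'}
# str keys -> list values
gene_exons = {'geneA': [100, 250, 400], 'geneB': [50]}

print(chromosome_names[23])
print(gene_exons['geneA'])

# A list is mutable, so Python refuses to use it as a key
try:
    bad_dict = {['a', 'b']: 1}
except TypeError as error:
    print('Cannot use a list as a key:', error)
```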
Accessing a dictionary record is similar to what we did with lists, only this time we'll call a key instead of an index:
print(organisms_classes['Pan troglodytes'])
print(organisms_classes['Gallus gallus'])
Mammalia
Aves
We can change the dictionary by simply assigning a new value to a key.
organisms_classes['Pan troglodytes'] = 'Mammals'
print(organisms_classes['Pan troglodytes'])
Mammals
Similarly, we can use this syntax to add new records:
organisms_classes['Danio rerio'] = 'Actinopterygii'
print(organisms_classes['Danio rerio'])
Actinopterygii
Note: A dictionary may not contain multiple records with the same key, but it may contain many keys with the same value.
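A minimal sketch of what this note means in practice, using a small stand-alone dictionary (the name classes is just for this example): writing the same key twice keeps only the last value, while several keys can happily share one value.

```python
# The same key appears twice: only the last value is kept
classes = {'Gallus gallus': 'Reptilia', 'Gallus gallus': 'Aves'}
print(len(classes))
print(classes['Gallus gallus'])

# Different keys may share the same value
classes['Pan troglodytes'] = 'Mammalia'
classes['Bos taurus'] = 'Mammalia'
print(classes)
```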
Remember the for loop and how we used it to loop over lists?
for organism in organisms:
    print(organism)
Pan troglodytes
Gallus gallus
Xenopus laevis
Vipera palaestinae
Well, it also works on dictionaries! The for loop simply iterates over the keys of the dictionary.
for organism in organisms_classes:
    print(organism, 'belongs to the', organisms_classes[organism], 'class.')
Pan troglodytes belongs to the Mammals class.
Gallus gallus belongs to the Aves class.
Vipera palaestinae belongs to the Reptilia class.
Danio rerio belongs to the Actinopterygii class.
Xenopus laevis belongs to the Amphibia class.
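Notice that the items were not printed in their original order. In the Python version used for this notebook (3.4), dictionaries do not keep the order in which records were added. Since Python 3.7, dictionaries are guaranteed to preserve insertion order, so on a newer version the output follows the order of insertion.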
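A common alternative to looking up each value inside the loop is the items() method, which hands us the key and the value together. The sketch below uses a shortened copy of the dictionary; it also uses sorted() to get a predictable order of keys regardless of how the dictionary stores them.

```python
organisms_classes = {'Pan troglodytes': 'Mammalia', 'Gallus gallus': 'Aves'}

# items() yields (key, value) pairs, unpacked here into two loop variables
for organism, organism_class in organisms_classes.items():
    print(organism, 'belongs to the', organism_class, 'class.')

# sorted() returns the keys as an alphabetically sorted list
for organism in sorted(organisms_classes):
    print(organism)
```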
We can even change values while looping:
for animal in observations:
    if observations[animal] > 50:
        observations[animal] = True
    else:
        observations[animal] = False
print(observations)
{'Giraffa camelopardalis': True, 'Equus zebra': True, 'Panthera leo': True, 'Hippopotamus amphibius': False}
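Changing values inside the loop is fine, but adding or removing keys while looping over the same dictionary raises a RuntimeError. A common workaround, sketched below on a shortened copy of the original observations data, is to loop over a list of the keys instead. The del statement, which removes a record, has not been covered in this lesson and appears here only for illustration.

```python
observations = {'Equus zebra': 143, 'Hippopotamus amphibius': 27, 'Panthera leo': 112}

# list(observations) makes a separate list of the keys,
# so removing records inside the loop does not disturb the iteration
for animal in list(observations):
    if observations[animal] <= 50:
        del observations[animal]

print(observations)
```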
We can check if a key is in the dictionary using the in operator, which is handy inside an if statement:
'Vipera palaestinae' in organisms_classes
True
'Bos taurus' in organisms_classes
False
new_organism = ['Vipera palaestinae', 'Bos taurus']
for organism in new_organism:
    if organism in organisms_classes:
        print(organism, 'belongs to the', organisms_classes[organism], 'class.')
    else:
        print(organism, 'not found in dictionary.')
Vipera palaestinae belongs to the Reptilia class.
Bos taurus not found in dictionary.
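When all we need is a fallback value for missing keys, the dictionary method get() can replace the if/else check. This is a sketch on a shortened copy of the dictionary; the fallback text 'unknown' is an arbitrary choice for this example.

```python
organisms_classes = {'Vipera palaestinae': 'Reptilia', 'Gallus gallus': 'Aves'}

for organism in ['Vipera palaestinae', 'Bos taurus']:
    # get() returns its second argument when the key is missing
    organism_class = organisms_classes.get(organism, 'unknown')
    print(organism, '->', organism_class)
```

Unlike organisms_classes['Bos taurus'], which would stop the program with a KeyError, get() never fails on a missing key.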
# Create dictionary
details_dict = {'Name': 'James Watson', 'Address': 'Cambridge', 'Phone': '12345678'}
# print sentence
print("My name is",details_dict['Name'],"I live in",details_dict['Address'],"My phone number is",details_dict['Phone'])
print("your print", end='')
# Create codons dictionary
bases = ['t', 'c', 'a', 'g']
codons = [a+b+c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))
# Sequence list
seq_list = ["atg","caa","ggc","ata","tca","tgg","cga","agg","cct","taa"]
# iterate on list and translate
for codon in seq_list:
    print(codon_table[codon], end='')
MQGISWRRP*
x = 3
y = 2*x + 6
print(y)
12
A function is a piece of code that performs some process. Like the mathematical concept, a function receives inputs and returns outputs.
We define functions with the def command.
The general syntax is:
def function_name(input1, input2, input3,...):
    # some processes
    .
    .
    .
    return output
def linear1(x):
    y = 2*x + 6
    return y
Once a function is defined, we can call it whenever we need it (i.e. multiple times), with different inputs.
result1 = linear1(3)
print(result1)
12
result2 = linear1(7)
print(result2)
20
A function may have more than one input, and they can also be other types of variables.
For example, the following function receives a list of sequences and concatenates a given sequence string to each sequence in the list. It then returns the new list.
def concat_to_sequences(sequence_list, sequence_to_concat):
    new_list = []
    for seq in sequence_list:
        new_list.append(seq + sequence_to_concat)
    return new_list
my_sequences = ['AGTTAGAGTTA', 'TTACCAGTG', 'GGCAACTTTAGG']
new_sequences = concat_to_sequences(my_sequences, 'GGG')
print(my_sequences)
print(new_sequences)
['AGTTAGAGTTA', 'TTACCAGTG', 'GGCAACTTTAGG']
['AGTTAGAGTTAGGG', 'TTACCAGTGGGG', 'GGCAACTTTAGGGGG']
The inputs listed in a function's definition are called parameters (or formal parameters). The actual values we pass when calling the function are called arguments.
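To make the terminology concrete, the sketch below calls the same function in two ways. Passing values by position is what we did above; passing them by name (keyword arguments) is standard Python but has not been introduced in this lesson, so treat it as a preview.

```python
def concat_to_sequences(sequence_list, sequence_to_concat):
    new_list = []
    for seq in sequence_list:
        new_list.append(seq + sequence_to_concat)
    return new_list

# Positional arguments: matched to the parameters by their order
by_position = concat_to_sequences(['AGT', 'TTA'], 'GGG')

# Keyword arguments: matched by parameter name, so the order does not matter
by_keyword = concat_to_sequences(sequence_to_concat='GGG', sequence_list=['AGT', 'TTA'])

print(by_position)
print(by_keyword)
```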
So why bother? Can't we just write code as we did so far and avoid all that functions mess?
Functions are good for (at least) three reasons:
Now, let's use some of the stuff we've learned to write a function that finds the reverse complement of a given sequence. Let's start by finding the complement.
def complement(sequence):
    transcript_dict = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    complement = ''
    for base in sequence:
        complement += transcript_dict[base]
    return complement
my_dna = 'ACGCTATTAGAGGGCGAGAAGCTAGAGGA'
my_complement = complement(my_dna)
print(my_complement)
TGCGATAATCTCCCGCTCTTCGATCTCCT
Now, let's write another function that reverses a given sequence.
def reverse_sequence(sequence):
    reversed_seq = ''
    seq_as_list = list(sequence)
    for base in reversed(seq_as_list):
        reversed_seq += base
    return reversed_seq
my_reverse_complement = reverse_sequence(my_complement)
print(my_reverse_complement)
TCCTCTAGCTTCTCGCCCTCTAATAGCGT
We can call functions from within a function, thereby wrapping the two functions we have in a third function.
def reverse_complement(sequence):
    complement_seq = complement(sequence)
    reverse_complement = reverse_sequence(complement_seq)
    return reverse_complement
print(reverse_complement(my_dna))
TCCTCTAGCTTCTCGCCCTCTAATAGCGT
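For comparison, here is a more compact sketch of the same idea. The function name reverse_complement_short is made up for this example, and it relies on two features not covered above: the string method join() and the slice [::-1], which reads a string backwards. It assumes the input contains only the upper-case letters A, C, G and T.

```python
def reverse_complement_short(sequence):
    # Hypothetical compact variant; assumes only A, C, G, T in the input
    pairs = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    # Build the complement base by base and glue the pieces into one string
    complement_seq = ''.join(pairs[base] for base in sequence)
    # A slice with step -1 returns the string in reverse order
    return complement_seq[::-1]

print(reverse_complement_short('ACGCTATTAGAGGGCGAGAAGCTAGAGGA'))
```

Keeping complement and reverse as separate functions, as in the lesson, is longer but lets us reuse each step on its own.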
Functions don't have to return anything. Sometimes they just print stuff to the screen or to a file (next lesson). For example, we can take the function we created above and simply replace 'return' with 'print':
def print_reverse_complement(sequence):
    complement_seq = complement(sequence)
    reverse_complement = reverse_sequence(complement_seq)
    print(reverse_complement)
print_reverse_complement(my_dna)
TCCTCTAGCTTCTCGCCCTCTAATAGCGT
So, what's the difference between return and print?
As the names suggest, print just displays the output of the function on the screen, while return hands back a value that can be stored in a variable. The difference is especially noticeable when the output is not a string (e.g. a list or a dictionary). Even if the output is a string, return lets you manipulate it further, while print does not.
my_reverse_complement = reverse_complement(my_dna)
final_sequence = "ATG" + my_reverse_complement + "TAA"
print(final_sequence)
ATGTCCTCTAGCTTCTCGCCCTCTAATAGCGTTAA
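One way to see the difference directly is to store the result of each kind of function in a variable. In this sketch the two function names are made up for the example; a function without a return statement gives back the special value None.

```python
def shout_return(text):
    return text.upper()

def shout_print(text):
    print(text.upper())

returned = shout_return('atg')
printed = shout_print('atg')   # prints ATG, but hands nothing back

print(returned)
print(printed)
```

The last line prints None, because shout_print() never returned the text it displayed.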
It is considered good practice to add documentation to functions you write - what do they do, what's their input and output etc. It becomes very useful once you have lots of code that you want to reuse. If you document your functions, you won't have to read the whole code when you need them again.
Documenting functions is done by adding a 'docstring' right under the definition line. It is enclosed by """. For example:
def reverse_complement(sequence):
    """
    Receives a string of DNA sequence and returns a string of its reverse complement
    """
    complement_seq = complement(sequence)
    reverse_complement = reverse_sequence(complement_seq)
    return reverse_complement
You can easily access the documentation of a function using the help() function.
help(reverse_complement)
Help on function reverse_complement in module __main__:

reverse_complement(sequence)
    Receives a string of DNA sequence and returns a string of its reverse complement
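The docstring is also stored on the function itself, in an attribute called __doc__. Here is a sketch with a made-up function, gc_content, written only for this example; note that it would fail with a ZeroDivisionError on an empty sequence.

```python
def gc_content(sequence):
    """
    Receives a DNA sequence (str, upper case) and returns the fraction of G and C bases (float).
    """
    gc_count = sequence.count('G') + sequence.count('C')
    return gc_count / len(sequence)

# The docstring is available as plain text
print(gc_content.__doc__)
print(gc_content('GGCA'))
```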
# define function
def first_5_longer_sequence(seq1, seq2):
    if len(seq1) > len(seq2):
        return seq1[:5]
    else:
        return seq2[:5]
# Test function
sequence1 = "aggtctcggatataggcgcgatattta"
sequence2 = "ttaagccacgcttcggatta"
first_5 = first_5_longer_sequence(sequence1, sequence2)
print(first_5)
# define function
def odd_bases(seq):
    odd_bases_list = []
    for i in range(len(seq)):
        # the 1st, 3rd, 5th... bases sit at indices 0, 2, 4...
        if i % 2 == 0:
            odd_bases_list.append(seq[i])
    return odd_bases_list
# or another option
def odd_bases(seq):
    odd_bases_list = list(seq[::2])
    return odd_bases_list
# Test function
odd_bases_list = odd_bases("aggtctcggatataggcgcgatattta")
print(odd_bases_list)
In fact, we've used functions before, without defining them first. For example: print(), type(), int(), len() etc. These built-in functions are provided courtesy of the Python developers. It is strongly advised not to overwrite built-in functions with your own functions. That is, don't do:
def len(lst):
    .
    .
    .
Just use another name.
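To see why, here is a minimal sketch of what happens when the name len is reused. The function name demo_shadowing is made up for this illustration, and the reuse is kept inside a function so that the real len() stays intact everywhere else.

```python
def demo_shadowing():
    # Inside this function, the name len now refers to a number
    len = 5
    try:
        len('ACGT')
    except TypeError as error:
        # The built-in is hidden, so the call fails
        return str(error)

print(demo_shadowing())
# Outside the function, the built-in len() still works
print(len('ACGT'))
```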
We can acquire more functions written by others by importing them into our code. We'll do that in the next lesson.
Assume we have the following function, which calculates the hypotenuse given the two other sides of a right triangle. (Remember Pythagoras' theorem?)
def pythagoras(a,b):
    hypo_square = a**2 + b**2
    hypo = hypo_square**0.5
And now we want to run our function on the sides a = 3 and b = 5. So we do:
pythagoras(3,5)
print(hypo)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-14-4c15aca3e7a8> in <module>()
      1 pythagoras(3,5)
----> 2 print(hypo)

NameError: name 'hypo' is not defined
What happened to our result?
The answer is Scope!
The variable hypo 'lives' only as long as the function is running. In other words, it exists only within the scope of the function, and so do a, b and hypo_square!
If we print hypo from within the function instead, it works:
def pythagoras(a,b):
    hypo_square = a**2 + b**2
    hypo = hypo_square**0.5
    print(hypo)
pythagoras(3,5)
5.830951894845301
Or even better, we can use the return statement to get the result. Like this:
def pythagoras(a,b):
    hypo_square = a**2 + b**2
    hypo = hypo_square**0.5
    return hypo
result = pythagoras(3,5)
print(result)
5.830951894845301
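Scope also works in the other direction: a variable created inside a function does not overwrite a variable with the same name outside it. A small sketch, using sides 3 and 4 (rather than 3 and 5) so that the result is a round number, and an outer variable hypo added only for this demonstration:

```python
# A variable named hypo defined outside the function
hypo = 'outside'

def pythagoras(a, b):
    # This hypo is a new local variable; the outer one is untouched
    hypo = (a**2 + b**2)**0.5
    return hypo

result = pythagoras(3, 4)
print(result)
print(hypo)
```

The outer hypo still holds 'outside' after the call, while the value computed inside the function reaches us only through return.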
This notebook is part of the Python Programming for Life Sciences Graduate Students course given at Tel-Aviv University, Spring 2015.
The notebook was written using Python 3.4.1 and IPython 2.1.0 (download from PyZo).
The code is available at https://github.com//Py4Life/TAU2015/blob/master/lecture3.ipynb.
The notebook can be viewed online at http://nbviewer.ipython.org//Py4Life/TAU2015/Py4Life/blob/master/lecture3.ipynb.
The notebook is also available as a PDF at https://github.com//Py4Life/TAU2015/blob/master/lecture3.pdf?raw=true.
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.