This assignment first requires you to become familiar with two of the most common web-based tools for working with human genomic data: the UCSC Human Genome Browser and the 1000 Genomes database. All of the questions in this assignment are based on actual data from those two sites. Once you are familiar with these tools, you will use the data to complete some simple population genetics exercises.
Become familiar with resources in the field of human genetics. Develop an understanding of how the principles of population genetics can be used to deduce physical characteristics of a population from its genetic makeup.
You should turn in two files: a .pdf file with all of your written answers and a .ipynb file with any code that you ran.
Remember that to learn what a function does, you can run:
help(name_of_function)
Try this with the functions below to see what they do.
from __future__ import division
from assignment_5_util import get_genotype_counts, create_x2_distribution_plot
There are three VCF files available, generated from the 1000 Genomes dataset. Complete the following questions for the SNP at location 89721094, first for the British (GBR) and Finnish (FIN) populations individually and then again for the two populations together. You are welcome to calculate these values by hand or with Python. You will need to show your work.
The three VCF files, which can be opened by running the cell below, contain information about the Finnish population (FIN), the British population (GBR), and the two populations together (GBR_and_FIN).
#Use this cell to calculate the values in question 3.
Great_Britain = 'GBR.vcf'
Finland = 'FIN.vcf'
GBR_and_FIN = 'GBR_and_FIN.vcf'
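If you would like to see what a genotype tally looks like without the course helper, the following is a minimal sketch that counts genotype calls at one position directly from VCF text. It assumes the standard VCF layout (tab-separated columns, sample columns starting at the tenth field, and the genotype as the first colon-separated entry of each sample). The function name `count_genotypes` and the example record are hypothetical and are not part of `assignment_5_util`; the sketch is not meant to reproduce the behavior of `get_genotype_counts`.

```python
from collections import Counter


def count_genotypes(vcf_lines, position):
    """Count genotype calls (e.g. '0/0', '0/1', '1/1') at one position.

    vcf_lines: any iterable of VCF text lines (an open file works).
    position:  integer value expected in the POS column.
    """
    counts = Counter()
    for line in vcf_lines:
        if line.startswith('#'):
            continue  # skip meta-information and header lines
        fields = line.rstrip('\n').split('\t')
        if int(fields[1]) != position:
            continue  # POS is the second column
        for sample in fields[9:]:  # sample columns follow FORMAT
            genotype = sample.split(':')[0].replace('|', '/')
            # Sort the alleles so that 0/1 and 1/0 are tallied together.
            counts['/'.join(sorted(genotype.split('/')))] += 1
    return counts


# Made-up record for illustration only; these are not real 1000 Genomes data.
example_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\n",
    "N\t89721094\t.\tA\tG\t100\tPASS\t.\tGT\t0|0\t0|1\t1|0\t1|1\n",
]
example_counts = count_genotypes(example_lines, 89721094)
print(dict(example_counts))
```

To try it on one of the files named above, you could pass an open file handle, for example `count_genotypes(open(Great_Britain), 89721094)`.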
#Use this cell to help answer question 3
Use the create_x2_distribution_plot function to estimate the p-value for each of the test statistics generated above. This does not have to be an exact number, though you should indicate whether it is significant or not based on an alpha of .05.
#Use this cell to answer question 4
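As a cross-check on a p-value read off a plot, here is a minimal sketch that computes the upper-tail p-value of a chi-squared statistic numerically. It assumes the test has 1 degree of freedom, which may or may not match the test you ran, so adjust accordingly. The function name `chi2_p_value_df1` and the example statistic are hypothetical and are not taken from the assignment.

```python
import math

ALPHA = 0.05  # significance threshold stated in the assignment


def chi2_p_value_df1(x2):
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    if x2 < 0:
        raise ValueError("a chi-squared statistic cannot be negative")
    # With 1 degree of freedom, P(X >= x2) = erfc(sqrt(x2 / 2)).
    return math.erfc(math.sqrt(x2 / 2.0))


# Hypothetical test statistic, for illustration only.
example_statistic = 4.2
p_value = chi2_p_value_df1(example_statistic)
is_significant = p_value < ALPHA
print(p_value, is_significant)
```

For 1 degree of freedom the critical value at an alpha of .05 is about 3.84, so any statistic above that is significant under this assumption.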