Notebook

Setup¶

I want to define an automated way of solving a small subset of the genetic problems we have been solving in class. I am focusing problems that might involve a Punnet Square, if done on paper, and that show only classical Mendelian inhertiance.

Going up the tree¶

The first problem was one from a packet in class:

In guinea pigs R-rough, r-smooth, S-short hair, s-long hair, B-black, b-white. A rough, long haired, black male is mated to a rough, short haired, white female. After they have produced several litters, their offspring are found to be as follows: 15 rough, short haired, black; 13 rough, long haired, black, 4 smooth, short haired, black; 4 smooth, long haired, black. What were the genotypes of the parents?

When we solved it in class, we realized that there actually two possible solutions, but one is much more likely than the other. So I want it to return the likelihood of each solution.

In [59]:

from collections import Counter
from helpers import Parents

assert round_values(parent_genotypes(
    parents_phenotypes=Parents( # parents are specified pheontypically
        ('rough', 'long haired', 'black'),
        ('rough', 'short haired', 'white'),
    ),
    offspring_phenotypes=list(Counter({ # offspring as well
        ('rough', 'short haired', 'black'): 15,
        ('rough', 'long haired', 'black'): 13,
        ('smooth', 'short haired', 'black'): 4,
        ('smooth', 'long haired', 'black'): 4,
    }).elements()),
    structure=( # show each of the traits, dominant first. genotype is created with first lettter of word
        ('rough', 'smooth'),
        ('short haired', 'long haired'),
        ('black', 'white'),
    ),
), 1) == Counter({'RrssBB (x) RrSsbb': 1.0, 'RrssBb (x) RrSsbb': 0.0}) # map genotypes of parents to probability

I wrote a lot of code to solve this, so I moved most of it into a seperate file. However here is the parent function that is the most readable and everything else is just helpers for that.

In [60]:

from helpers import *

from itertools import repeat
from collections import Counter

def parent_genotypes(parents_phenotypes, offspring_phenotypes, structure):
    '''
    Returns the probabiblity of different parent genotypes, based on the phenotypes of the known children
    and parents.
    '''

    parents_genotypes_probability = Counter()

    # lets try a brute force technique. get a combination of every possible parents genotypes and
    # try them on the parents and offspring phenotypes to see if they match.
    for trial_parents_genotypes in all_genotype_parents_from_structure(structure):
        # first make sure that the parents work
        trial_parents_phenotypes = map(phenotype_from_genotype, trial_parents_genotypes.elements(), repeat(structure))
        if Counter(trial_parents_phenotypes) == Counter(parents_phenotypes):
            # then we want to see if the real offspring are a subset of the children of those parents
            trial_offspring_genotypes = offspring_genotypes(trial_parents_genotypes.elements())
            trial_offspring_phenotypes = Counter(map(phenotype_from_genotype, trial_offspring_genotypes.elements(), repeat(structure)))
            if set(offspring_phenotypes).issubset(set(trial_offspring_phenotypes)):
                # then we know that those parents could actually be the right ones, however they might not be,
                # so we also need to compute, how probable it is.

                # one way to look at it is, if the `trial_parents` had `len(offspring_phenotypes)` kids, what percentage
                # of the time would the phenotypes of their kids be `offspring_phenotypes`?

                # that gives us the probability of getting the actual litter we have with these parents.

                # then we can compare that probability to the other options, and see which one is more likely.
                chance_that_offspring_came_from_trial_parents = 1

                trial_offspring_phenotypes_percentiles = normalize_values(trial_offspring_phenotypes)
                for offspring_phenotype in offspring_phenotypes:
                    # calculate the chance of getting each child, and since we are talking about succesion we multiply them
                    chance_that_offspring_came_from_trial_parents *= trial_offspring_phenotypes_percentiles[offspring_phenotype]

                parents_genotypes_probability[pprint_parents_genotypes(trial_parents_genotypes.elements())] = chance_that_offspring_came_from_trial_parents

    return normalize_values(parents_genotypes_probability)

How Many Offspring Do We Need?¶

Another problem I thought was interesting was the case of parents where one is dominant and the other recessive, phenotypically and children that all turn out to be dominant. If you can't tell the genotypess directly, how many children should you breed until you can be fairly certain that the genotypes are 'dd (x) DD' and not 'dd (x) Dd', if all the children keep turning out dominant?

In [61]:

def n_dom_child(n):
    return parent_genotypes(
        parents_phenotypes=Parents(
            ('dom',),
            ('rec',),
        ),
        offspring_phenotypes=list(Counter({('dom',): n}).elements()),
        structure=(
            ('dom', 'rec'),
        ),
    )

In [62]:

n_dom_child(1)

Out[62]:

Counter({'dd (x) DD': 0.6666666666666666, 'dd (x) Dd': 0.3333333333333333})

So with one kid who is dominant, there is a 66% chance of the parents being 'dd (x) DD'

In [67]:

[[i, n_dom_child(i).most_common()] for i in range(10)]

Out[67]:

[[0, [('dd (x) DD', 0.5), ('dd (x) Dd', 0.5)]],
 [1, [('dd (x) DD', 0.6666666666666666), ('dd (x) Dd', 0.3333333333333333)]],
 [2, [('dd (x) DD', 0.8), ('dd (x) Dd', 0.2)]],
 [3, [('dd (x) DD', 0.8888888888888888), ('dd (x) Dd', 0.1111111111111111)]],
 [4, [('dd (x) DD', 0.9411764705882353), ('dd (x) Dd', 0.058823529411764705)]],
 [5, [('dd (x) DD', 0.9696969696969697), ('dd (x) Dd', 0.030303030303030304)]],
 [6, [('dd (x) DD', 0.9846153846153847), ('dd (x) Dd', 0.015384615384615385)]],
 [7, [('dd (x) DD', 0.9922480620155039), ('dd (x) Dd', 0.007751937984496124)]],
 [8,
  [('dd (x) DD', 0.9961089494163424), ('dd (x) Dd', 0.0038910505836575876)]],
 [9, [('dd (x) DD', 0.9980506822612085), ('dd (x) Dd', 0.001949317738791423)]]]

And then you can see that by 6 kids you have a 98% chance of being 'dd (x) DD'. Good enough for me!

In [ ]: