Notebook

Answer all questions and submit your answers as an IPython notebook, a LaTeX document, or a Markdown document. Each question is worth 25 points.

This homework is due Friday, February 12, 2022.

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

DATA_URL = 'https://raw.githubusercontent.com/fonnesbeck/Bios8366/master/data/'

Question 1

The data below provides counts of a flour beetle (Tribolium confusum) population at various points in time:

In [2]:

days = 0,8,28,41,63,79,97,117,135,154
beetles = 2,47,192,256,768,896,1120,896,1184,1024

plt.plot(days, beetles)

Out[2]:

[<matplotlib.lines.Line2D at 0x1c357adb100>]

An elementary model for population growth is the logistic model:

$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$

where $N$ is population size, $t$ is time, $r$ is a growth rate parameter, and $K$ is a parameter that represents the population carrying capacity of the environment. The solution to this differential equation is given by:

$$N_t = f(t) = \frac{KN_0}{N_0 + (K - N_0)\exp(-rt)}$$

where $N_t$ denotes the population size at time $t$.
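As a minimal sketch, the closed-form solution above can be written as a NumPy function. The name `logistic_growth` and the parameter values below are illustrative assumptions, not fitted estimates.

```python
import numpy as np

def logistic_growth(t, r, K, N0):
    """Closed-form logistic solution N_t = K*N0 / (N0 + (K - N0)*exp(-r*t))."""
    t = np.asarray(t, dtype=float)
    return K * N0 / (N0 + (K - N0) * np.exp(-r * t))

# Illustrative parameter values only (assumed, not estimated from the data)
t_grid = np.array([0.0, 50.0, 1000.0])
curve = logistic_growth(t_grid, r=0.1, K=1000.0, N0=2.0)
```

At $t = 0$ the function returns $N_0$, and for large $t$ it approaches $K$, which gives a quick sanity check on any implementation.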

Fit the logistic growth model to the flour beetle data using optimization to minimize the sum of squared errors between model predictions and observed counts.
In many population modeling applications, an assumption of lognormality is adopted. The simplest assumption would be that the $\log(N_t)$ are independent and normally distributed with mean $\log[f(t)]$ and variance $\sigma^2$. Find the MLEs under this assumption, and provide estimates of standard errors and correlation between them.
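
For reference, with $n$ observations at times $t_1, \ldots, t_n$, the lognormal assumption above implies the following Gaussian log-likelihood for the log-counts. Here $f$ depends on $r$ and $K$ (and on $N_0$, if it is treated as unknown rather than fixed):

$$\ell(r, K, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(\log N_{t_i} - \log f(t_i)\right)^2$$

Standard errors and correlations are commonly approximated from the inverse of the negative Hessian of $\ell$ evaluated at the MLE.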

In [3]:

# Write your answer here

Question 2

Implement simulated annealing for minimizing the AIC for the baseball salary regression problem. Model your algorithm on the example given in class.
1. Compare the effects of different cooling schedules (different temperatures and different durations at each temperature).
2. Compare the effect of a proposal distribution that is discrete uniform over 2-neighborhoods versus one that is discrete uniform over 3-neighborhoods.
Implement a genetic algorithm for minimizing the AIC for the baseball salary regression problem. Model your algorithm on Example 3.5.
1. Compare the effects of using different mutation rates.
2. Compare the effects of using different generation sizes.
3. Instead of the selection mechanism used in the class example, try using independent selection of both parents with probabilities proportional to their fitness.
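
Two of the ingredients mentioned above can be sketched as follows. This is a hedged illustration rather than the class implementation: it assumes a model is represented as a boolean inclusion vector over predictors, that a k-neighborhood proposal flips exactly k entries (if the intended definition allows up to k changes, the number of flips would be drawn first), and that fitness values are positive with larger meaning better. How AIC is mapped to fitness is left open. The function names are hypothetical.

```python
import numpy as np

def propose_neighbor(mask, k, rng):
    """Flip exactly k distinct entries of a boolean inclusion vector, chosen uniformly."""
    mask = np.asarray(mask, dtype=bool)
    flip = rng.choice(mask.size, size=k, replace=False)
    proposal = mask.copy()
    proposal[flip] = ~proposal[flip]
    return proposal

def select_parents(fitness, rng):
    """Draw two parent indices independently with probability proportional to fitness."""
    fitness = np.asarray(fitness, dtype=float)
    probs = fitness / fitness.sum()
    return rng.choice(fitness.size, size=2, replace=True, p=probs)

rng = np.random.default_rng(0)
current = np.zeros(10, dtype=bool)            # hypothetical starting model: no predictors
neighbor = propose_neighbor(current, k=2, rng=rng)
parents = select_parents([1.0, 2.0, 3.0, 4.0], rng)
```

Because the two parents are drawn independently, the same individual can be selected twice, which is worth keeping in mind when comparing against other selection mechanisms.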

In [4]:

# Raw string avoids an invalid-escape warning for the whitespace regex
baseball = pd.read_table(DATA_URL + 'baseball.dat', sep=r'\s+')

In [5]:

# Write your answer here

Question 3

Use the combinatorial optimization method of your choice to obtain a solution to the traveling salesman problem for the Brazilian cities described in the lecture notes, using minimum total distance as the criterion. Use the first city listed in the dataset as "home" (i.e., the trip must start and end there). I will award 5 bonus points to the best solution!

In [6]:

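def parse_latlon(x):
    # Convert a 'deg:min:sec' string to signed decimal degrees
    d, m, s = map(float, x.split(':'))
    ms = m/60. + s/3600.
    # Check the sign on the string so that e.g. '-0:30:00' stays negative
    if x.strip().startswith('-'):
        return d - ms
    return d + ms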

cities =  pd.read_csv(DATA_URL + 'brasil_capitals.txt', 
                      names=['city','lat','lon'])[['lat','lon']].applymap(parse_latlon)
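
Any combinatorial method for this problem needs an objective that scores a candidate tour. The sketch below assumes great-circle (haversine) distance on decimal-degree coordinates; the lecture notes may define distance differently, and the function names and toy coordinates are hypothetical.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def tour_length(order, lat, lon):
    """Length of the closed tour visiting cities in `order` and returning to the start."""
    order = np.asarray(order)
    nxt = np.roll(order, -1)  # successor of each city; the last wraps to the first
    return haversine_km(lat[order], lon[order], lat[nxt], lon[nxt]).sum()

# Hypothetical coordinates for illustration only (not the Brazilian capitals)
lat = np.array([0.0, 0.0, 1.0])
lon = np.array([0.0, 1.0, 0.0])
length = tour_length([0, 1, 2], lat, lon)
```

Keeping index 0 as the first entry of `order` corresponds to starting and ending at the "home" city.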

In [7]:

# Write your answer here

Question 4

The ../data/ebola folder contains summarized reports of Ebola cases from three countries during the recent outbreak of the disease in West Africa. For each country, there are daily reports that contain various information about the outbreak in several cities in each country.

Use pandas to import these data files and create a single data frame that includes the daily totals of new cases and deaths for each country.
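
The general pattern (read each report, tag it with its country, stack, then aggregate) might look like the sketch below. Everything about the inputs is an assumption for illustration: the in-memory CSV strings, the country labels, the dates, and the column names `date`, `city`, `new_cases`, and `new_deaths` are hypothetical, and the real files will need their own parsing.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for daily report files; the real files'
# layout and column names are not described here and will differ.
reports = {
    'country_a': ["date,city,new_cases,new_deaths\n2014-08-04,x,5,2\n2014-08-04,y,1,0\n",
                  "date,city,new_cases,new_deaths\n2014-08-05,x,3,1\n"],
    'country_b': ["date,city,new_cases,new_deaths\n2014-08-04,z,7,4\n"],
}

frames = []
for country, texts in reports.items():
    for text in texts:
        daily = pd.read_csv(io.StringIO(text), parse_dates=['date'])
        daily['country'] = country  # tag each row with its source country
        frames.append(daily)

# Stack all reports, then sum over cities within each country and day
combined = pd.concat(frames, ignore_index=True)
totals = (combined
          .groupby(['country', 'date'])[['new_cases', 'new_deaths']]
          .sum())
```

The result is indexed by country and date, with one column each for new cases and new deaths.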

In [8]:

# Write your answer here