Answer all questions and submit your answers as an IPython notebook, LaTeX document, or Markdown document. Each question is worth 25 points.
This homework is due Friday, February 12, 2022.
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
DATA_URL = 'https://raw.githubusercontent.com/fonnesbeck/Bios8366/master/data/'
The data below provides counts of a flour beetle (Tribolium confusum) population at various points in time:
days = 0,8,28,41,63,79,97,117,135,154
beetles = 2,47,192,256,768,896,1120,896,1184,1024
plt.plot(days, beetles)
An elementary model for population growth is the logistic model:
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$
where $N$ is population size, $t$ is time, $r$ is a growth rate parameter, and $K$ is a parameter that represents the population carrying capacity of the environment. The solution to this differential equation is given by:
$$N_t = f(t) = \frac{KN_0}{N_0 + (K - N_0)\exp(-rt)}$$
where $N_t$ denotes the population size at time $t$.
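As a quick sanity check on the closed-form solution, the sketch below evaluates $f(t)$ with NumPy. The function name `logistic_growth` and the parameter values in the checks are illustrative assumptions, not part of the assignment.

```python
import numpy as np

def logistic_growth(t, K, r, N0):
    """Closed-form logistic solution N_t = K*N0 / (N0 + (K - N0)*exp(-r*t))."""
    t = np.asarray(t, dtype=float)
    return K * N0 / (N0 + (K - N0) * np.exp(-r * t))

# Illustrative parameter values only (not fitted to the beetle data).
start = logistic_growth(0, K=1000, r=0.1, N0=2)      # equals N0 at t = 0
limit = logistic_growth(1e4, K=1000, r=0.1, N0=2)    # approaches K for large t
```

The two checks reflect the boundary behavior of the formula: the curve starts at $N_0$ and levels off at the carrying capacity $K$.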
Fit the logistic growth model to the flour beetle data using optimization to minimize the sum of squared errors between model predictions and observed counts.
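One possible way to set this up, sketched below, is to write the sum of squared errors as a function of $(K, r)$ and scan a coarse grid to find reasonable starting values for a numerical optimizer. The grid ranges, the choice to fix $N_0 = 2$ (the first observed count), and the names `sse`, `K_start`, and `r_start` are all assumptions made for illustration; the grid search is a stand-in for choosing starting values, not the fit itself.

```python
import numpy as np

days = np.array([0, 8, 28, 41, 63, 79, 97, 117, 135, 154], dtype=float)
beetles = np.array([2, 47, 192, 256, 768, 896, 1120, 896, 1184, 1024], dtype=float)

def sse(K, r, N0=2.0):
    """Sum of squared errors between the logistic curve and the observed counts."""
    pred = K * N0 / (N0 + (K - N0) * np.exp(-r * days))
    return np.sum((beetles - pred) ** 2)

# Hypothetical search ranges, chosen by eye from the plot of the data.
K_grid = np.linspace(500, 2000, 151)
r_grid = np.linspace(0.01, 0.5, 50)
grid = np.array([[sse(K, r) for r in r_grid] for K in K_grid])

# Grid point with the smallest SSE; usable as a starting value for an optimizer.
i, j = np.unravel_index(np.argmin(grid), grid.shape)
K_start, r_start = K_grid[i], r_grid[j]
```

The pair `(K_start, r_start)` could then be handed to a routine such as `scipy.optimize.minimize` to refine the estimate.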
In many population modeling applications, an assumption of lognormality is adopted. The simplest assumption would be that the $\log(N_t)$ are independent and normally distributed with mean $\log[f(t)]$ and variance $\sigma^2$. Find the MLEs under this assumption, and provide estimates of their standard errors and of the correlation between the parameter estimates.
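For reference, the log-likelihood implied by this assumption can be written out as follows. This is a sketch that treats $N_0$ as known and writes $t_1, \dots, t_n$ for the observation times; whether to estimate $N_0$ as well is a modeling choice the problem leaves open.

$$\ell(r, K, \sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(\log N_{t_i} - \log f(t_i)\right)^2$$

For fixed $(r, K)$, the maximizing variance is
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\log N_{t_i} - \log f(t_i)\right)^2$$
so the MLEs of $r$ and $K$ minimize the sum of squared errors on the log scale. Approximate standard errors are commonly taken from the inverse of the negative Hessian of $\ell$ at the MLE, and the correlation between two estimates is the corresponding off-diagonal entry of that inverse divided by the product of the two standard errors.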
# Write your answer here
Implement simulated annealing for minimizing the AIC for the baseball salary regression problem. Model your algorithm on the example given in class.
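The sketch below shows one generic form of simulated annealing for subset selection by AIC. It is not the in-class example: it runs on synthetic data standing in for the baseball salaries, and the cooling schedule, the single-flip proposal, and the names `aic` and `anneal` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the baseball data: 8 predictors, only the first 3 matter.
n, p = 200, 8
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=n)

def aic(include):
    """AIC (up to an additive constant) of the least-squares fit on the included columns."""
    design = np.column_stack([np.ones(n), X[:, include]])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + 2 * design.shape[1]

def anneal(n_iter=2000, tau0=10.0, cooling=0.995):
    current = rng.random(p) < 0.5                 # random starting subset
    current_aic = aic(current)
    best, best_aic = current.copy(), current_aic
    tau = tau0
    for _ in range(n_iter):
        proposal = current.copy()
        k = rng.integers(p)
        proposal[k] = not proposal[k]             # flip one predictor in or out
        proposal_aic = aic(proposal)
        # Always accept improvements; accept worse moves with Metropolis probability.
        accept_prob = np.exp(min(0.0, (current_aic - proposal_aic) / tau))
        if rng.random() < accept_prob:
            current, current_aic = proposal, proposal_aic
            if current_aic < best_aic:
                best, best_aic = current.copy(), current_aic
        tau *= cooling                            # geometric cooling schedule
    return best, best_aic

best, best_aic = anneal()
```

Tracking the best subset seen so far, rather than returning the final state, guards against the chain wandering away from a good model late in the run.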
Implement a genetic algorithm for minimizing the AIC for the baseball salary regression problem. Model your algorithm on Example 3.5.
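A generic genetic algorithm for the same kind of subset-selection problem might look like the sketch below. It is not a reproduction of Example 3.5: the data are synthetic, and the rank-based selection, single-point crossover, mutation rate, and elitism step are assumptions chosen to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the baseball data: 8 predictors, only the first 3 matter.
n, p = 200, 8
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=n)

def aic(include):
    """AIC (up to an additive constant) of the least-squares fit on the included columns."""
    design = np.column_stack([np.ones(n), X[:, include]])
    resid = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return n * np.log(resid @ resid / n) + 2 * design.shape[1]

def genetic(pop_size=20, n_gen=50, mutation_rate=0.05):
    pop = rng.random((pop_size, p)) < 0.5          # each row is one candidate subset
    for _ in range(n_gen):
        scores = np.array([aic(ind) for ind in pop])
        # Rank-based selection: the lowest AIC gets the highest rank.
        ranks = np.empty(pop_size)
        ranks[np.argsort(-scores)] = np.arange(1, pop_size + 1)
        first = pop[rng.choice(pop_size, size=pop_size, p=ranks / ranks.sum())]
        second = pop[rng.choice(pop_size, size=pop_size)]
        # Single-point crossover: genes before the cut come from the first parent.
        cut = rng.integers(1, p, size=pop_size)
        children = np.where(np.arange(p) < cut[:, None], first, second)
        # Mutation: flip each gene independently with small probability.
        children = children ^ (rng.random((pop_size, p)) < mutation_rate)
        # Elitism: carry the best individual of this generation over unchanged.
        children[0] = pop[np.argmin(scores)]
        pop = children
    scores = np.array([aic(ind) for ind in pop])
    return pop[np.argmin(scores)], scores.min()

best, best_aic = genetic()
```

The elitism step makes the best AIC in the population non-increasing across generations, which simplifies checking that the search is making progress.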
baseball = pd.read_table(DATA_URL + 'baseball.dat', sep=r'\s+')
# Write your answer here
Use the combinatorial optimization method of your choice to obtain a solution to the traveling salesman problem for the Brazilian cities described in the lecture notes, using minimum total distance as the criterion. Use the first city listed in the dataset as "home" (i.e., the trip must start and end there). I will award 5 bonus points to the best solution!
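To make the objective concrete, the sketch below computes the length of a closed tour and improves it with a simple 2-opt local search that keeps city 0 as home. The great-circle (haversine) distance, the Earth radius, and the four toy coordinates are assumptions for illustration; the lecture notes may define distance differently, and 2-opt is only one of many possible methods.

```python
import numpy as np

def haversine_matrix(lat, lon, radius_km=6371.0):
    """Pairwise great-circle distances (km) from latitudes/longitudes in degrees."""
    lat, lon = np.radians(lat), np.radians(lon)
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def tour_length(order, dist):
    """Length of the closed tour that visits `order` and returns to its first city."""
    order = np.asarray(order)
    return dist[order, np.roll(order, -1)].sum()

def two_opt(dist):
    """Reverse tour segments while doing so shortens the tour; position 0 stays fixed."""
    n = len(dist)
    order = np.arange(n)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for j in range(i + 1, n):
                candidate = order.copy()
                candidate[i:j + 1] = order[i:j + 1][::-1]
                if tour_length(candidate, dist) < tour_length(order, dist) - 1e-9:
                    order, improved = candidate, True
    return order

# Toy coordinates (corners of a small square), listed in an order that crosses itself.
lat = np.array([0.0, 1.0, 0.0, 1.0])
lon = np.array([0.0, 1.0, 1.0, 0.0])
dist = haversine_matrix(lat, lon)
best_order = two_opt(dist)
```

Because segment reversals never touch index 0, the returned tour always starts and ends at the first city, matching the "home" constraint.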
def parse_latlon(x):
    d, m, s = map(float, x.split(':'))
    ms = m/60. + s/3600.
    if d<0:
        return d - ms
    return d + ms
cities = pd.read_csv(DATA_URL + 'brasil_capitals.txt',
                     names=['city','lat','lon'])[['lat','lon']].applymap(parse_latlon)
# Write your answer here
The ../data/ebola folder contains summarized reports of Ebola cases from three countries during the recent outbreak of the disease in West Africa. For each country, there are daily reports that contain various information about the outbreak in several cities in each country.
Use pandas to import these data files and create a single data frame that includes the daily totals of new cases and deaths for each country.
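The general pattern of reading many report files, tagging each with its country and date, and combining them can be sketched as below. The in-memory strings stand in for files on disk, and the column names (`variable`, `Totals`), row labels, dates, and counts are invented for illustration; the real reports have their own layouts, which differ by country.

```python
import io
import pandas as pd

# Hypothetical stand-ins for daily report files, keyed by (country, report date).
reports = {
    ('guinea', '2014-08-04'): "variable,Totals\nnew_cases,9\nnew_deaths,2\n",
    ('guinea', '2014-08-05'): "variable,Totals\nnew_cases,4\nnew_deaths,1\n",
    ('liberia', '2014-08-04'): "variable,Totals\nnew_cases,7\nnew_deaths,3\n",
}

frames = []
for (country, date), text in reports.items():
    df = pd.read_csv(io.StringIO(text))       # with real files, pass the file path instead
    df['country'] = country
    df['date'] = pd.to_datetime(date)
    frames.append(df)

# Stack all reports, then spread the variables into columns per country and day.
combined = pd.concat(frames, ignore_index=True)
daily = combined.pivot_table(index=['country', 'date'], columns='variable',
                             values='Totals', aggfunc='sum')
```

With real files, the dictionary would typically be replaced by a loop over the paths returned by something like `glob.glob`, with the country and date parsed from each folder and file name.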
# Write your answer here