We will go over mathematical, statistical, and computational material and tools relevant to the practice of Statistics.
Along the way, we will look at several papers that use Statistics to address problems in science and engineering.
The papers are from reputable journals, but you will find that the use of Statistics is, in general, quite bad. Looking carefully at the application of Statistics in these examples has several potential benefits:
The next few chapters present an introduction to Jupyter Notebook and R, some mathematical background, some linear algebra, and some probability. They are rather technical.
Before jumping into that, we will spend several hours discussing two headline-grabbing papers. The first is from Science, entitled "Accelerating extinction risk from climate change" by Mark Urban. The paper appeared in volume 348 and was published on 1 May 2015.
Here is the abstract:
Current predictions of extinction risks from climate change vary widely depending on the specific assumptions and geographic and taxonomic focus of each study. I synthesized published studies in order to estimate a global mean extinction rate and determine which factors contribute the greatest uncertainty to climate change-induced extinction risks. Results suggest that extinction risks will accelerate with future global temperatures, threatening up to one in six species under current policies. …
The paper applies "Bayesian Markov chain Monte Carlo (MCMC) random-effects meta-analysis that incorporated variation among and within studies and with each study weighted by sample size" to 131 published studies that estimated extinction risks. It arrives at an overall extinction risk of 7.9%, with a 95% confidence interval of 6.2% to 9.8%. (For an introduction to confidence intervals, see SticiGui: Confidence Intervals.)
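To make the phrase "random-effects meta-analysis" concrete, here is a minimal sketch of a classical (non-Bayesian) random-effects pooled proportion using the DerSimonian–Laird moment estimator. This is a swapped-in technique, not the Bayesian MCMC model Urban used, and it weights studies by inverse variance rather than by sample size. The function name `dersimonian_laird` and the study counts are hypothetical, invented purely for illustration; they are not data from the paper.

```python
import math

# Hypothetical inputs: (species assessed, species predicted extinct) for each study.
# Invented for illustration only; NOT data from Urban (2015).
# Counts avoid k = 0 and k = n so every binomial variance below is positive.
STUDIES = [(120, 6), (45, 5), (300, 21), (80, 4), (15, 3)]


def dersimonian_laird(studies, z=1.96):
    """Random-effects pooled proportion via the DerSimonian-Laird estimator.

    Returns (pooled estimate, (lower, upper) normal-approximation interval, tau^2).
    """
    props = [k / n for n, k in studies]
    # Within-study (binomial) sampling variance of each estimated proportion.
    variances = [p * (1 - p) / n for p, (n, _) in zip(props, studies)]

    # Fixed-effect step: inverse-variance weights and weighted mean.
    w = [1 / v for v in variances]
    sw = sum(w)
    p_fixed = sum(wi * pi for wi, pi in zip(w, props)) / sw

    # Cochran's Q measures dispersion of study estimates around the fixed-effect mean.
    q = sum(wi * (pi - p_fixed) ** 2 for wi, pi in zip(w, props))
    df = len(studies) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Moment estimate of between-study variance, truncated at zero.
    tau2 = max(0.0, (q - df) / c)

    # Random-effects weights add the between-study variance to each study's variance.
    w_star = [1 / (v + tau2) for v in variances]
    sws = sum(w_star)
    p_re = sum(wi * pi for wi, pi in zip(w_star, props)) / sws
    se = math.sqrt(1 / sws)
    return p_re, (p_re - z * se, p_re + z * se), tau2


if __name__ == "__main__":
    est, (lo, hi), tau2 = dersimonian_laird(STUDIES)
    print(f"pooled = {est:.3f}, 95% interval = ({lo:.3f}, {hi:.3f}), tau^2 = {tau2:.5f}")
```

Even in this toy version, the interval rests on assumptions (independent studies, approximately normal estimates, a common distribution of true effects) that are worth checking against how the underlying studies were actually done.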
We will read this paper together to try to understand what those numbers might mean. The paper's supplementary materials are published online alongside it in Science.
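One way to probe what a "95% confidence interval" claims is simulation. In the frequentist sense, the procedure that produces the interval should cover the true value in about 95% of repeated samples. The sketch below, which assumes simple binomial sampling and the textbook normal-approximation (Wald) interval, estimates that coverage. The function names are hypothetical, and this illustrates the general idea only; it does not reproduce the interval in Urban's paper, which came from a Bayesian model.

```python
import math
import random


def wald_interval(k, n, z=1.96):
    """Normal-approximation interval for a binomial proportion (k successes in n trials)."""
    p_hat = k / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


def coverage(p_true, n, reps=2000, seed=0):
    """Fraction of simulated samples whose Wald interval contains p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # k = number of successes in n independent Bernoulli(p_true) draws.
        k = sum(rng.random() < p_true for _ in range(n))
        lo, hi = wald_interval(k, n)
        if lo <= p_true <= hi:
            hits += 1
    return hits / reps


if __name__ == "__main__":
    print("p = 0.30, n = 200:", coverage(0.30, 200))
    print("p = 0.02, n = 30: ", coverage(0.02, 30))
```

With a moderate proportion and a large sample, the simulated coverage is close to the nominal 95%. With a rare outcome and a small sample it falls far short, largely because samples with no successes yield the degenerate interval [0, 0]. The label "95%" is a property of a procedure under assumptions, not a guarantee about any particular reported interval.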
This paper is particularly instructive because it (mis)uses so many statistical ideas and techniques, including means, confidence intervals, MCMC, random effects models, linear regression, and meta-analysis, in addition to garbling the claims in the underlying study. Moreover, even if the statistics had been done correctly, it is not at all clear what the claim shows, or even why counting species is a helpful measure of anything.
Be sure to track down and read the extinction studies that this paper cites.
The second paper we will discuss at length is "Political influences on greenhouse gas emissions from US states" by Dietz et al., which appeared in the Proceedings of the National Academy of Sciences (PNAS).
Read both papers. Track down the most important references and read them as well. Be prepared to discuss: