This course provides an overview of an extremely flexible statistical framework for describing and performing inference with a wide variety of data types: the Generalised Linear Model (GLM). Many common statistical procedures are special cases of the GLM. In the course, we focus on the construction and understanding of design matrices and the interpretation of regression weights. We mostly concentrate on the linear Gaussian model, before discussing more general cases. We also touch on how this framework relates to ANOVA-style model comparison.
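As a small illustration of the design-matrix view, the sketch below treats a two-group comparison as a linear model. The data values and variable names are invented for this example and are not taken from the course datasets. NumPy's least-squares solver stands in here for the Statsmodels routines used in the lectures.

```python
import numpy as np

# Hypothetical data: three responses from each of two groups (values invented).
group = np.array(["a", "a", "a", "b", "b", "b"])
y = np.array([1.0, 2.0, 3.0, 4.0, 6.0, 8.0])

# Design matrix: a column of ones (intercept) plus a dummy column
# that is 1 for observations in group "b" and 0 otherwise.
X = np.column_stack([np.ones(len(y)), (group == "b").astype(float)])

# Ordinary least squares estimate of the regression weights.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

print("intercept (mean of group a):", beta[0])
print("slope (mean of b minus mean of a):", beta[1])
```

With this coding, the first weight is the mean of group "a" and the second is the difference between the two group means, which is the quantity a two-sample t-test examines.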
The course was designed and presented as a six-week elective statistics course for graduate students in the neuroscience program at the University of Tübingen, in January 2015. Lectures were presented as a collection of IPython Notebooks. While the notebooks are (we hope) well documented, they are lecture materials rather than a textbook. As such, some content might not be self-explanatory.
We chose to do the course in Python because
Nevertheless, the main statistical module we use here (Statsmodels) is well behind R in its maturity (no wonder, since R is a lot older). Thankfully, learning to create and interpret design matrices using Patsy formula notation is a skill that transfers easily to R's glm routines.
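To make the formula notation concrete, here is a minimal sketch of how a Patsy formula expands into a design matrix. The variable names `group` and `dose` and their values are hypothetical, chosen only for illustration.

```python
import numpy as np
from patsy import dmatrix

# Hypothetical predictors: one categorical and one numeric (values invented).
data = {
    "group": ["a", "a", "b", "b"],
    "dose": [0.0, 1.0, 2.0, 3.0],
}

# The right-hand side of a formula describes the design matrix.
# Patsy adds an intercept column and dummy-codes the categorical predictor.
X = dmatrix("group + dose", data)

column_names = X.design_info.column_names
print(column_names)
print(np.asarray(X))
```

The categorical predictor becomes a single dummy column named `group[T.b]`, with "a" as the reference level. The same right-hand side, `group + dose`, should be accepted by R's formula interface, although R labels the resulting columns differently.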
Note two things:
Where content is erroneous, unclear or buggy, please tell us at our GitHub repository.
To demonstrate the ideas in the course, we used several datasets obtained from the OzDASL database as well as from our own research. They are provided in the git repository to facilitate self-learning.
Authors: Tom Wallis and Philipp Berens
Year: 2015
Copyright: This work is licensed under a CC-by-4.0 license. You may reuse, modify and redistribute these materials provided you give appropriate credit to the authors. All images embedded in the lecture materials were obtained from the internet and are used under "fair use" for educational purposes. The copyright for all images remains with their respective holders.
Here we provide some references for further reading. These reflect our own backgrounds in neuroscience and psychology.
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach. New York: Springer.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. New York, NY: Cambridge University Press.
Knoblauch, K., & Maloney, L. T. (2012). Modeling Psychophysical Data in R. New York: Springer.
Kruschke, J. K. (2011). Doing Bayesian Data Analysis. Academic Press / Elsevier.
Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), ??–??
Here are some notes on how we set up a Python environment for the course (packages, versions, etc.).
from IPython.display import HTML

def css_styling():
    # Read the course stylesheet and wrap it for display in the notebook.
    with open("custom_style.css", "r") as f:
        styles = f.read()
    return HTML(styles)

css_styling()