Statistics 157, Fall 2017

Nonparametric Inference and Sensitivity Auditing with Applications to Social Good

Philip B. Stark http://www.stat.berkeley.edu/~stark

Course outline:

Theory and Philosophy: pseudo-random number generation, algorithms for random sampling, elements of group theory in probability (invariance, orbits, exchangeability), permutation tests, stratified permutation tests, selecting test statistics, nonparametric inference about effect size, sampling and inference for finite populations, useful probability inequalities for statistical inference, Wald's sequential probability ratio test, cargo-cult statistics, the ontogeny of probability in applied statistics, post-normal science, sensitivity analysis and sensitivity auditing, uncertainty quantification.

Applications: election auditing, gender bias in teaching evaluations, forecasting the price of solar cells, forecasting the economic impact of climate change, predictive policing, ethical considerations in applying data science to societal problems, policy implications of models and uncertainty, communicating statistical ideas to a lay audience.

Term projects: contribute to an open-source project for permutation methods and to an open-source project for election auditing, and analyze data relevant to an important societal issue, e.g., the partial recount of the 2016 US presidential election or the 2017 Kenyan election, crime or recidivism or "predictive policing," consumer lending or college loans, alternative energy, farm subsidies, mass transportation, bicycle commuting, natural disasters, or climate. Written and oral presentations are required; some projects might involve building an interactive website to present results or enable others to use the techniques. Grades will be based in part on the quality of the writing and the programming, the relevance and acuity of the analysis, the reproducibility of the computations, and the effectiveness of the communication.

Written assignments will be submitted using GitHub. There will be emphasis on using good computational hygiene, including revision control systems, unit tests, regression tests, and coverage tests, and on documenting one's code adequately. We will occasionally have code reviews in class.
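As a rough illustration of what a unit test can look like, here is a minimal sketch using Python's built-in unittest module. The function sample_mean, the test class, and the run_tests helper are hypothetical names made up for this example; they are not part of any course package.

```python
import unittest


def sample_mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Hypothetical helper, used only to have something to test.
    """
    if len(values) == 0:
        raise ValueError("values must be non-empty")
    return sum(values) / len(values)


class TestSampleMean(unittest.TestCase):
    """Unit tests: each method checks one small, known behavior."""

    def test_known_value(self):
        # A case small enough to verify by hand: (1 + 2 + 3 + 4) / 4 = 2.5
        self.assertAlmostEqual(sample_mean([1, 2, 3, 4]), 2.5)

    def test_empty_input_raises(self):
        # Bad input should fail loudly rather than return nonsense.
        with self.assertRaises(ValueError):
            sample_mean([])


def run_tests():
    """Run the tests above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSampleMean)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests() from a notebook cell runs both tests. Re-running the same tests after every change to the code is one simple form of regression testing.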

Philosophy: Learn something, teach something, make a contribution to something you feel good about.

Prerequisites: Statistics 133, 134, 135, willingness to work with a team of peers, dedication. Students are expected to be comfortable with LaTeX, Markdown, HTML5, git, GitHub, Python, JavaScript, and Jupyter. Some projects might require jQuery and D3.

Code of conduct; attribution of work: The high academic standard at the University of California, Berkeley, is reflected in each degree awarded. Every student is expected to maintain this high standard by ensuring that all academic work reflects unique ideas or properly attributes the ideas to the original sources.

These are some basic expectations of students with regard to academic integrity: Any work submitted should be your own individual thoughts, and should not have been submitted for credit in another course unless you have prior written permission to re-use it in this course from this instructor.

All assignments must use "proper attribution," meaning that you have identified the original source and extent of words or ideas that you reproduce or use in your assignment. This includes drafts and homework assignments! If you are unclear about expectations, ask your instructor.

Do not collaborate or work with other students on assignments or projects unless the instructor gives you permission or instruction to do so.

Disability accommodations: If you need an accommodation for a disability, if you have information you wish to share with the instructor about a medical emergency, or if you need special arrangements if the building needs to be evacuated, please inform the instructor as soon as possible.

If you are not currently listed with DSP (the Disabled Students' Program) and believe you might benefit from their support, please apply online at dsp.berkeley.edu.

GitHub classroom for this course:

Resources

Software "stack" for this course

  • Jupyter
  • Python
    • SciPy
    • NumPy
    • not Pandas, for the most part
    • nose, unittest, or other test suites
  • Git/GitHub
  • Travis CI
  • Coveralls
  • LaTeX / Markdown / Pandoc

Rough weekly schedule

  1. mathematical preliminaries; talk about term projects.
    • first assignment, due 9/5
  2. mathematical preliminaries, permutations, invariance; the permute package; PRNGs
  3. PRNGs, introduction to election auditing, Wald's sequential probability ratio test
  4. nonparametric inference about the mean of finite populations
  5. risk-limiting audits
  6. guest lecture by Kristian Lum on predictive policing
  7. permutation tests, application to gender bias
  8. the 2-sample problem; probability inequalities
  9. sensitivity auditing
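
To give a flavor of the permutation-test material in the schedule above, here is a minimal sketch of a two-sample permutation test for a difference in means, written with NumPy only. It is not the API of the permute package; the function name, the default number of replications, the seed, and the choice of NumPy's default generator are all assumptions made for illustration.

```python
import numpy as np


def two_sample_permutation_test(x, y, reps=10_000, seed=12345):
    """Estimate a one-sided P-value for mean(x) - mean(y).

    Null hypothesis: the group labels are exchangeable, so every way of
    splitting the pooled data into groups of these sizes is equally likely.
    Hypothetical helper for illustration only.
    """
    # Assumption: NumPy's default generator; the course may prefer another PRNG.
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    n_x = len(x)
    observed = x.mean() - y.mean()

    hits = 0
    for _ in range(reps):
        # Randomly reassign the pooled values to the two groups.
        perm = rng.permutation(pooled)
        stat = perm[:n_x].mean() - perm[n_x:].mean()
        if stat >= observed:
            hits += 1

    # Counting the observed arrangement itself keeps the estimate above zero.
    return observed, (hits + 1) / (reps + 1)
```

For example, two_sample_permutation_test([5, 6, 7, 8], [1, 2, 3, 4]) returns the observed difference in means together with a simulated P-value. Because the P-value is estimated from random permutations, it changes with the seed and the number of replications.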