# Render our plots inline
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('ggplot')  # Make the graphs a bit prettier (pandas' old 'display.mpl_style' option has been removed)
plt.rcParams['figure.figsize'] = (15, 5)
Get our big data file from my webpage. You can download it over HTTP in your browser, with wget, or however you like. You'll need to uncompress it.
!wget http://www.columbia.edu/~mj340/HMXPC_13.zip
!unzip HMXPC_13.zip
# Check the contents of the directory
!ls
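If wget isn't available on your machine, or you'd rather stay in Python, the download-and-uncompress step can be sketched with the standard library alone. This is a hypothetical helper (`fetch_and_extract` is our name, not part of the notebook); it saves the archive under its own filename in `dest` and unpacks it there. Only use `extractall` on archives you trust.

```python
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_extract(url, dest="."):
    """Hypothetical helper: download the zip at `url` into `dest`, then unpack it.

    Returns the list of member names stored in the archive.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    # Name the local copy after the last path component of the URL,
    # e.g. ".../HMXPC_13.zip" -> "HMXPC_13.zip".
    archive = dest / Path(urllib.parse.urlparse(url).path).name
    urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # only safe for archives from a trusted source
        return zf.namelist()
```

Usage: `fetch_and_extract("http://www.columbia.edu/~mj340/HMXPC_13.zip")` downloads into the current directory and returns the names of the extracted files.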
Exploratory data analysis (EDA) seeks to reveal structure, or simple descriptions, in data. We look at numbers and graphs and try to find patterns.
- Persi Diaconis, "Theories of Data Analysis: From Magical Thinking Through Classical Statistics"
. . . proceeding via a ‘dustbowl’ empiricism is dangerous at worst and foolish at best . . . . The purely empirical approach is particularly dangerous in an age when computers and packaged programs are readily available, since there is temptation to substitute immediate empirical analysis for more analytic thought and theory building.
- Einhorn, “Alchemy in the Behavioral Sciences,” 1972
. . . we can view the techniques of EDA as a ritual designed to reveal patterns in a data set. Thus, we may believe that naturally occurring data sets contain structure, that EDA is a useful vehicle for revealing the structure. . . . If we make no attempt to check whether the structure could have arisen by chance, and tend to accept the findings as gospel, then the ritual comes close to magical thinking. ... a controlled form of magical thinking--in the guise of 'working hypothesis'--is a basic ingredient of scientific progress.
- Persi Diaconis, "Theories of Data Analysis: From Magical Thinking Through Classical Statistics"
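Diaconis's guard against magical thinking is to check whether a pattern could have arisen by chance. One standard way to do that (our choice of technique, not something the quoted papers prescribe) is a permutation test: pool the two groups, reshuffle the labels many times, and count how often a difference at least as large as the observed one appears. A minimal sketch for a difference in means, with a hypothetical function name and defaults:

```python
import numpy as np


def permutation_test(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Returns (observed_difference, p_value). The p-value is the share of random
    relabelings whose absolute mean difference is at least the observed one,
    with a +1 correction so it is never exactly zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = x.mean() - y.mean()
    pooled = np.concatenate([x, y])
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # random relabeling of the pooled data
        diff = shuffled[:len(x)].mean() - shuffled[len(x):].mean()
        if abs(diff) >= abs(observed):
            count += 1
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value
```

A small p-value says the observed gap would rarely show up if the group labels were meaningless; a large one says the "structure" is easily produced by chance.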
Pandas: first-line Python tool for EDA
Pandas: charismatic megafauna
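A typical first pandas step is a per-column inventory before any plotting. The sketch below is a hypothetical helper (`first_look` and its summary columns are our choices) that reports each column's dtype, non-null count, missing count, and number of distinct values, using only standard `DataFrame` methods.

```python
import pandas as pd


def first_look(df):
    """Hypothetical helper: one row per column of `df` with basic facts."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),  # storage type of each column
        "non_null": df.notna().sum(),    # how many values are present
        "missing": df.isna().sum(),      # how many values are NaN/None
        "n_unique": df.nunique(),        # distinct values, ignoring NaN
    })
```

Usage, with the path left as a placeholder for whichever CSV you unzipped: `df = pd.read_csv("<your unzipped CSV>")`, then `first_look(df)` alongside `df.head()` and `df.describe()`.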