Welcome! Allow me to be the first to congratulate you on your decision to take an interest in Applied Predictive Modeling with Python! This is a collection of IPython Notebooks that provides an interactive way to reproduce the examples in the awesome book Applied Predictive Modeling by Kuhn and Johnson.
If you experience any problems along the way or have any feedback at all, please reach out to me.
Best Regards,
Lei Gong
Email: LeiG.inbox@gmail.com
Twitter: @_LeiG
import numpy
import scipy
import pandas
import sklearn
import matplotlib
import rpy2
import pyearth
import statsmodels
Thanks to the authors, all the datasets needed to reproduce the examples in the book are available in .RData format from their R packages $\texttt{caret}$ and $\texttt{AppliedPredictiveModeling}$. To prepare them for use in Python, I wrote a small script, "fetch_data.py", that downloads all the datasets and converts them from .RData to .csv. Run it as follows:
%run ../fetch_data.py
Using existing datasets folder:/Users/leigong/Documents/Research/DataScience/Applied-Predictive-Modeling/datasets
Downloading AppliedPredictiveModeling from http://cran.r-project.org/src/contrib/AppliedPredictiveModeling_1.1-6.tar.gz (2 MB)
Decomposing /Users/leigong/Documents/Research/DataScience/Applied-Predictive-Modeling/datasets/AppliedPredictiveModeling_1.1-6.tar.gz
Checking that the AppliedPredictiveModeling file exists...
=> Success!
Downloading Caret from http://cran.r-project.org/src/contrib/caret_6.0-37.tar.gz (2 MB)
Decomposing /Users/leigong/Documents/Research/DataScience/Applied-Predictive-Modeling/datasets/caret_6.0-37.tar.gz
Checking that the Caret file exists...
=> Success!
Extract .RData files from the package...
Convert .RData to .csv and clean up .RData files...
=> Success!
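Once the script has finished, the converted files can be read with pandas. The helpers below are a minimal sketch, not part of "fetch_data.py": the function names `list_datasets` and `load_dataset` are hypothetical, and the sketch assumes the .csv files sit directly inside the datasets folder (adjust the glob pattern if they are nested in subfolders).

```python
from pathlib import Path

import pandas as pd


def list_datasets(folder):
    """Return the sorted names (without extension) of all .csv files in `folder`."""
    return sorted(p.stem for p in Path(folder).glob("*.csv"))


def load_dataset(folder, name):
    """Load `<folder>/<name>.csv` into a DataFrame.

    Raises FileNotFoundError with the list of available names if the
    requested file does not exist.
    """
    path = Path(folder) / f"{name}.csv"
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; available datasets: {list_datasets(folder)}"
        )
    return pd.read_csv(path)
```

From a notebook, something like `list_datasets("../datasets")` should then show which names are available to pass to `load_dataset`; the relative path here is an assumption based on the location of the script.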
Predictive modeling: the process of developing a mathematical tool or model that generates an accurate prediction.
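There are a number of common reasons why predictive models fail, e.g., inadequate pre-processing of the data, inadequate model validation, unjustified extrapolation, and over-fitting the model to the existing data.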
The trade-off between prediction and interpretation depends on the primary goal of the task. The unfortunate reality is that as we push towards higher accuracy, models become more complex and harder to interpret.
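To make this trade-off concrete, here is a small sketch on synthetic data (not one of the book's datasets). It compares a linear regression, whose two coefficients can be read directly, with a random forest, which has no such compact summary. The data-generating function, sample size, and model settings are all assumptions chosen for illustration, and the import from `sklearn.model_selection` assumes a reasonably recent scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, deliberately nonlinear relationship between two predictors and y.
rng = np.random.RandomState(0)
X = rng.uniform(-2, 2, size=(500, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# Hold out 30% of the samples to estimate out-of-sample performance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Interpretable model: one coefficient per predictor.
linear = LinearRegression().fit(X_train, y_train)

# More flexible model: an ensemble of 200 trees.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# R^2 on the held-out data for each model.
linear_r2 = linear.score(X_test, y_test)
forest_r2 = forest.score(X_test, y_test)

print("linear coefficients:", linear.coef_)
print("linear test R^2: %.3f" % linear_r2)
print("forest test R^2: %.3f" % forest_r2)
```

On data like this the forest should score noticeably higher on the held-out set, but explaining its predictions takes far more effort than reading two coefficients. On data that really is linear, the simpler model would likely do just as well.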
The foundation of an effective predictive model is laid with intuition and deep knowledge of the problem context, both of which are vital for driving decisions about model development. The process begins with relevant data.