Notebooks available online at: https://github.com/jackgolding/FullStackDataAnalysis
Viewable at: http://nbviewer.ipython.org/github/jackgolding/FullStackDataAnalysis/tree/master/
Introduction
Very quick introduction to Python Syntax
Web Scraping
Consuming APIs
Machine Learning
Application Development with Flask
From the docs
https://docs.python.org/2/faq/general.html
Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants, on the Mac, and on PCs under MS-DOS, Windows, Windows NT, and OS/2.
TL;DR
Python is a B+ language at everything. Its extensibility, together with the community's focus on scientific libraries in the 90s, is why it is used so much in data analysis.
Given that Revolution Analytics just got bought by Microsoft, I think it's appropriate to discuss Python vs R.
Why Python
Why R
Why Neither
Other languages like R and Python worth mentioning
Credit to http://www.dataschool.io/python-or-r-for-data-science/
Python 2 or 3?
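Whichever version you pick, it helps to confirm which one a notebook is actually running on. A minimal sketch using only the standard library (the helper name `python_major_version` is made up for this example and is not from the original notebooks):

```python
import sys

def python_major_version():
    """Return the major version (2 or 3) of the running interpreter."""
    return sys.version_info[0]

# Works on both Python 2 and Python 3.
print("Running Python %d.%d" % (sys.version_info[0], sys.version_info[1]))
```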
Installing Python is a pain in the butt for a few reasons:
I explain how to set up Anaconda here: https://github.com/jackgolding/FullStackDataAnalysis
Keep your Python Installations and Libraries Isolated
When you update pandas from 0.14 to 0.15 for a new project, you don't want to break all your old projects!
Create and activate a new environment in Anaconda
conda info -e
conda create -n new_environment --clone root
source activate new_environment
Use pyenv/virtualenv if you don't want to install Anaconda.
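One way to check that an environment is really isolated is to ask Python where it is running from and which version of a library it sees. This is a rough sketch, not part of the original notebooks: `describe_environment` is a hypothetical helper, and it assumes Python 3.8 or newer so that `importlib.metadata` is available.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onwards

def describe_environment(package_name="pandas"):
    """Report the active interpreter and the installed version of a package."""
    try:
        package_version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        package_version = None  # package is not installed in this environment
    return {
        "executable": sys.executable,  # path to the python binary in use
        "prefix": sys.prefix,          # root folder of the active environment
        "package": package_name,
        "package_version": package_version,
    }

info = describe_environment("pandas")
for key in ("executable", "prefix", "package", "package_version"):
    print("%s: %s" % (key, info[key]))
```

Run it once in each environment. If the environments are properly isolated, the prefix should differ, and the pandas version can differ too without one project affecting the other.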
Cross Industry Standard Process for Data Mining
Currently championed by IBM, created in the mid-90s
Leading methodology for data mining (KDD surveys in 2002, 2004, and 2007)
Puts business understanding at the front and centre
All other methodologies I have seen are waterfall-esque
from IPython.display import display, Image
Image(filename=r'./Assets/1024px-CRISP-DM_Process_Diagram.png')
# Image copyright: Kenneth Jensen, http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining#mediaviewer/File:CRISP-DM_Process_Diagram.png
What a typical project looks like for me: