#!/usr/bin/env python
# coding: utf-8
#
#
# Pandas Hands-on
# Universitat Pompeu Fabra (UPF) - Barcelona
# Massimo Quadrana
#
#
# # About me: Massimo Quadrana
#
# - PhD student at Politecnico di Milano
# - Working on Recommendation Systems
#
# - https://github.com/mquad
# - [@mquad](https://twitter.com/mquad)
#
#
# Originally licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)
#
# # Content of this talk
#
# - Why do you need pandas?
# - Basic introduction to:
#     - Data structures and basic operations
#     - Indexing and selecting data
#     - Groupby operation
#
# # Material
#
# - All materials (notebook, data, link to nbviewer): https://github.com/mquad/pandas-tutorial
# - The complete tutorial is available at https://github.com/jorisvandenbossche/pandas-tutorial
# - You need `pandas` >= 0.15.2 (the easiest way to get it is through the Anaconda distribution)
#
# # Why do you need pandas?
#
# When working with *tabular or structured data* (an R `data.frame`, a SQL table, an Excel spreadsheet, ...), pandas helps you:
#
# - Import data
# - Clean up messy data
# - Explore data, gain insight into data
# - Process and prepare your data for analysis
# - Analyse your data (together with scikit-learn, statsmodels, Keras, ...)
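#
# The workflow above can be sketched end to end on a tiny in-memory CSV. This is a minimal illustration rather than part of the original tutorial: the sales data, the column names (`store`, `region`, `units`) and the "above the mean" analysis step are all made up, and `io.StringIO` stands in for a real file path passed to `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical, made-up sales data standing in for a real file on disk.
raw_csv = io.StringIO(
    "store,region,units\n"
    "A,north,10\n"
    "B,south,\n"
    "C,north,30\n"
    "C,north,30\n"
    "D,south,5\n"
)

# Import: parse the CSV text into a DataFrame.
df = pd.read_csv(raw_csv)

# Clean: drop the exact duplicate row, then the row with no "units" value.
df = df.drop_duplicates().dropna(subset=["units"])

# Explore: summary statistics (count, mean, std, min, quartiles, max).
print(df["units"].describe())

# Prepare: derive a new column; assign() returns a new DataFrame.
df = df.assign(units_share=df["units"] / df["units"].sum())

# Analyse: a toy analysis that selects the stores above the mean number of units.
above_mean = df[df["units"] > df["units"].mean()]
print(above_mean)
```

# Each step returns a new DataFrame, so the steps can be chained or inspected one at a time.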
#
# # Pandas: data analysis in Python
#
# For data-intensive work in Python the [Pandas](http://pandas.pydata.org) library has become essential.
#
# What is ``pandas``?
#
# * Pandas can be thought of as NumPy arrays with labels for rows and columns, and better support for heterogeneous data types, but it's also much, much more than that.
# * Pandas can also be thought of as `R`'s `data.frame` in Python.
# * It is powerful for working with missing data and time series data, for reading and writing data, and for reshaping, grouping, and merging it.
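#
# The following small sketch makes the "labelled NumPy array" comparison concrete. The values, labels and column names are arbitrary examples. A `Series` attaches an index to a 1-D array. A `DataFrame` labels both rows and columns and lets each column keep its own dtype. Arithmetic between pandas objects aligns on labels rather than on positions.

```python
import numpy as np
import pandas as pd

# Plain NumPy: values are addressed by integer position only,
# and one array holds a single dtype.
arr = np.array([2.5, 3.0, 4.1])

# A Series attaches labels (an index) to the same values.
s = pd.Series(arr, index=["a", "b", "c"])
print(s["b"])  # label-based access

# A DataFrame has labels for rows and columns, and each column
# can have its own dtype (heterogeneous data).
df = pd.DataFrame(
    {
        "name": ["Ann", "Bob", "Cid"],
        "age": [31, 25, 47],
        "score": arr,
    },
    index=["r1", "r2", "r3"],
)
print(df.dtypes)            # one dtype per column: strings, integers, floats
print(df.loc["r2", "age"])  # row label + column label

# Operations align on labels, not positions: "a" and "d" have no
# partner in the other Series, so their results are NaN.
other = pd.Series([1.0, 1.0, 1.0], index=["b", "c", "d"])
print(s + other)
```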
#
# Its documentation: http://pandas.pydata.org/pandas-docs/stable/
#
# ## Key features
#
# * Fast, easy and flexible input/output for a lot of different data formats
# * Working with missing data (`.dropna()`, `pd.isnull()`)
# * Merging, joining and concatenating (`merge`, `join`, `concat`)
# * Grouping: `groupby` functionality
# * Reshaping (`stack`, `pivot`) [ADVANCED]
# * Powerful time series manipulation (resampling, timezones, ..) [ADVANCED]
# * Easy plotting
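#
# The sketch below exercises several of the features listed above on two toy frames. The frames, the `key` column and the sales data are hypothetical. The sketch also uses `pd.merge`, the column-based counterpart of `join`. Reshaping, time series and plotting are left out.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames sharing a "key" column; "x" has a missing value.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1.0, np.nan, 3.0]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# Missing data: detect with pd.isnull(), drop with .dropna().
print(pd.isnull(left["x"]))  # one boolean per row; only the second is True
clean = left.dropna()        # keeps the rows with key "a" and "c"

# Concatenation: stack frames vertically; columns are the union of both.
stacked = pd.concat([left, right], ignore_index=True)

# Merge on a column: an inner merge keeps keys present in both frames.
merged = pd.merge(left, right, on="key", how="inner")

# join aligns on the index instead of on a column.
joined = left.set_index("key").join(right.set_index("key"), how="left")

# Grouping: split rows by a key, aggregate each group, combine the results.
sales = pd.DataFrame({"region": ["north", "south", "north"], "units": [10, 5, 30]})
totals = sales.groupby("region")["units"].sum()
print(totals)
```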
#
# # Further reading
#
# - the documentation: http://pandas.pydata.org/pandas-docs/stable/
# - Wes McKinney's book "Python for Data Analysis"
# - Jake VanderPlas's Python Data Science Handbook: https://github.com/jakevdp/PythonDataScienceHandbook
# - Tom Augspurger's series on modern idiomatic pandas: https://tomaugspurger.github.io/modern-1.html
# - lots of tutorials on the internet, e.g. http://github.com/jvns/pandas-cookbook, https://github.com/brandon-rhodes/pycon-pandas-tutorial/
# In[ ]: