#!/usr/bin/env python
# coding: utf-8

# # Pandas Hands-on
#
# Universitat Pompeu Fabra (UPF) - Barcelona
#
# Massimo Quadrana

# # About me: Massimo Quadrana
#
# - PhD student at Politecnico di Milano
# - Working on Recommendation Systems
# - https://github.com/mquad
# - [@mquad](https://twitter.com/mquad)
#
# Originally Licensed under [CC BY 4.0 Creative Commons](http://creativecommons.org/licenses/by/4.0/)

# # Content of this talk
#
# - Why do you need pandas?
# - Basic introduction to:
#     - Data structures and basic operations
#     - Indexing and selecting data
#     - Groupby operation

# # Material
#
# - All materials (notebook, data, link to nbviewer): https://github.com/mquad/pandas-tutorial
# - The complete tutorial is available at https://github.com/jorisvandenbossche/pandas-tutorial
# - You need `pandas` >= 0.15.2 (easy solution is using Anaconda)

# # Why do you need pandas?
#
# When working with *tabular or structured data* (like R dataframe, SQL table, Excel spreadsheet, ...):
#
# - Import data
# - Clean up messy data
# - Explore data, gain insight into data
# - Process and prepare your data for analysis
# - Analyse your data (together with scikit-learn, statsmodels, __Keras__...)

# # Pandas: data analysis in python
#
# For data-intensive work in Python the [Pandas](http://pandas.pydata.org) library has become essential.
#
# What is ``pandas``?
#
# * Pandas can be thought of as NumPy arrays with labels for rows and columns, and better support for heterogeneous data types, but it's also much, much more than that.
# * Pandas can also be thought of as `R`'s `data.frame` in Python.
# * Powerful for working with missing data, working with time series data, for reading and writing your data, for reshaping, grouping, merging your data, ...
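#
# To make the "NumPy arrays with labels" idea concrete, here is a minimal sketch.
# The values and the labels (`a`, `b`, `x`, `y`, `name`) are made up for this
# example and do not come from the tutorial data.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: access by position only, one dtype for the whole array.
values = np.array([[1.0, 2.0], [3.0, 4.0]])

# The same values wrapped in a DataFrame with row and column labels
# (hypothetical labels, chosen only for illustration).
df = pd.DataFrame(values, index=["a", "b"], columns=["x", "y"])

# Label-based access with .loc returns the same element as positional access.
assert df.loc["b", "y"] == values[1, 1]

# Heterogeneous data: a text column can sit next to the numeric ones.
df["name"] = ["first", "second"]

# Each column keeps its own dtype.
print(df.dtypes)
```

# The point of the sketch is only that rows and columns carry names and that
# each column has its own type. The rest of the tutorial builds on this.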
#
# Pandas documentation: http://pandas.pydata.org/pandas-docs/stable/

# ## Key features
#
# * Fast, easy and flexible input/output for a lot of different data formats
# * Working with missing data (`.dropna()`, `pd.isnull()`)
# * Merging and joining (`concat`, `join`)
# * Grouping: `groupby` functionality
# * Reshaping (`stack`, `pivot`) [ADVANCED]
# * Powerful time series manipulation (resampling, timezones, ...) [ADVANCED]
# * Easy plotting

# # Further reading
#
# - The documentation: http://pandas.pydata.org/pandas-docs/stable/
# - Wes McKinney's book "Python for Data Analysis"
# - Jake VanderPlas's Python Data Science Handbook: https://github.com/jakevdp/PythonDataScienceHandbook
# - Tom Augspurger's series on modern idiomatic pandas: https://tomaugspurger.github.io/modern-1.html
# - Lots of tutorials on the internet, e.g. http://github.com/jvns/pandas-cookbook, https://github.com/brandon-rhodes/pycon-pandas-tutorial/

# In[ ]:
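
# As a rough sketch of three of the key features listed earlier (missing data,
# combining frames, and `groupby`), the cell below uses a tiny toy table. The
# `sales` and `extra` frames and their column names are invented for this
# example; they are not part of the tutorial material.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration only.
sales = pd.DataFrame({
    "city": ["Barcelona", "Milan", "Barcelona", "Milan"],
    "amount": [10.0, np.nan, 30.0, 5.0],
})

# Missing data: count missing values with pd.isnull(), drop them with .dropna().
n_missing = int(pd.isnull(sales["amount"]).sum())
clean = sales.dropna()

# Combining: append another frame row-wise with pd.concat().
extra = pd.DataFrame({"city": ["Milan"], "amount": [7.0]})
combined = pd.concat([clean, extra], ignore_index=True)

# Grouping: total amount per city.
totals = combined.groupby("city")["amount"].sum()
print(totals)
```

# Dropping rows is only one way to handle missing values; whether it is
# appropriate depends on the analysis.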