%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
try:
    import seaborn  # optional dependency
except ImportError:
    pass
pd.options.display.max_rows = 10
This dataset is borrowed from the PyCon tutorial of Brandon Rhodes (so all credit to him!). You can download the data files titles.csv and cast.csv and put them in the data/ folder, which is the relative path the cells below read from.
cast = pd.read_csv('data/cast.csv')
cast.head()
titles = pd.read_csv('data/titles.csv')
titles.head()
Why is it useful to have an index?
One reason is fast, convenient selection by label, and that is what we are going to explore here!
Setting the title column as the index:
c = cast.set_index('title')
c.head()
Instead of doing:
%%time
cast[cast['title'] == 'Hamlet']
we can now do:
%%time
c.loc['Hamlet']
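To see that both approaches select the same rows, here is a minimal sketch on a tiny made-up table. The frame toy and its values (actor_a, actor_b, ...) are placeholders for illustration, not rows from cast.csv.

```python
import pandas as pd

# Toy stand-in for the cast table; the rows are invented for illustration.
toy = pd.DataFrame({
    'title': ['Hamlet', 'Macbeth', 'Hamlet', 'Othello'],
    'year': [1948, 1971, 2000, 1995],
    'name': ['actor_a', 'actor_b', 'actor_c', 'actor_d'],
})

# Boolean mask: compare the title of every row with the target value.
by_mask = toy[toy['title'] == 'Hamlet']

# Label lookup: move 'title' into the index, then select by label.
toy_indexed = toy.set_index('title')
by_label = toy_indexed.loc['Hamlet']

print(by_mask)
print(by_label)
```

Both results hold the same two Hamlet rows. The only difference is that in by_label the title is the row label instead of a regular column.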
But you can also have multiple columns as the index, leading to a multi-index or hierarchical index:
c = cast.set_index(['title', 'year'])
c.head()
%%time
c.loc[('Hamlet', 2000),:]
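The tuple passed to .loc addresses the index levels in order. The following sketch, again on a small invented frame (toy_mi and its values are assumptions for illustration), contrasts a full tuple with a single first-level label.

```python
import pandas as pd

# Invented rows; only the shape mirrors the cast table.
toy = pd.DataFrame({
    'title': ['Hamlet', 'Hamlet', 'Hamlet', 'Macbeth'],
    'year': [1948, 2000, 2000, 1971],
    'name': ['actor_a', 'actor_b', 'actor_c', 'actor_d'],
})
toy_mi = toy.set_index(['title', 'year'])

# A full tuple selects on both levels at once: title and year.
hamlet_2000 = toy_mi.loc[('Hamlet', 2000), :]

# A single label selects on the first level only;
# the remaining level ('year') becomes the index of the result.
all_hamlets = toy_mi.loc['Hamlet']

print(hamlet_2000)
print(all_hamlets)
```

Selecting on the first level alone is often handy when you want every version of a title and still need the year available as a label.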
c2 = c.sort_index()
%%time
c2.loc[('Hamlet', 2000),:]
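The difference between the two timings most likely comes from the ordering of the index: lookups on a sorted MultiIndex are generally faster than on an unsorted one. A small sketch, assuming an invented frame, of how to check whether an index is sorted and how sort_index changes that:

```python
import pandas as pd

# Invented rows, deliberately out of order.
toy = pd.DataFrame({
    'title': ['Macbeth', 'Hamlet', 'Othello', 'Hamlet'],
    'year': [1971, 2000, 1995, 1948],
    'name': ['actor_a', 'actor_b', 'actor_c', 'actor_d'],
})
unsorted_toy = toy.set_index(['title', 'year'])

# is_monotonic_increasing reports whether the index entries are in sorted order.
print(unsorted_toy.index.is_monotonic_increasing)

# sort_index returns a new frame ordered by title, then by year.
sorted_toy = unsorted_toy.sort_index()
print(sorted_toy.index.is_monotonic_increasing)

# The lookup itself is written exactly as before.
print(sorted_toy.loc[('Hamlet', 2000), :])
```

Sorting costs some time once, so it pays off mainly when you do many lookups on the same frame.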