This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
We will download and process a dataset about attendance on Montreal's bicycle tracks. This example is largely inspired by a presentation from Julia Evans.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
We create a variable url that contains the address of a CSV (comma-separated values) data file. This standard text-based file format is used to store tabular data.
url = "https://github.com/ipython-books/cookbook-data/raw/master/bikes.csv"
Pandas provides a read_csv function that can read any CSV file. Here, we give it the URL of the file. Pandas will automatically download and parse the file, and return a DataFrame object. We need to specify a few options to make sure the dates are parsed correctly.
df = pd.read_csv(url, index_col='Date', parse_dates=True, dayfirst=True)
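To see what the date options do, here is a minimal sketch that parses a small inline CSV instead of the remote file. The sample rows and values are made up for illustration; only the column names Date and Berri1 come from the dataset above.

```python
import io
import pandas as pd

# Hypothetical sample in day/month/year format (not taken from the real file).
sample_csv = """Date,Berri1
01/02/2013,10
02/02/2013,20
13/02/2013,30
"""

sample = pd.read_csv(io.StringIO(sample_csv), index_col='Date',
                     parse_dates=True, dayfirst=True)

# With dayfirst=True, 01/02/2013 is read as 1 February, not 2 January.
first_date = sample.index[0]
print(first_date.month, first_date.day)
```

Without dayfirst=True, an ambiguous date such as 01/02/2013 would typically be interpreted month first.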
The df variable contains a DataFrame object, a Pandas data structure that holds 2D tabular data. The head(n) method displays the first n rows of this table.
df.head(2)
Each row corresponds to one day of the year and contains the number of bicycles counted on each track of the city.
We can get summary statistics of the table with the describe method.
df.describe()
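As a small illustration of what describe returns, the sketch below applies it to a toy table. The column names A and B and their values are hypothetical.

```python
import pandas as pd

# Made-up counts for two hypothetical tracks.
toy = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})
stats = toy.describe()

# Rows of the result are statistics (count, mean, std, min, quartiles, max);
# columns are the original numeric columns.
print(stats.loc['mean', 'A'])   # 2.5
print(stats.loc['max', 'B'])    # 40.0
```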
We select two columns, 'Berri1' and 'PierDup'. Then, we call the plot method.
# The styling '-' and '--' is just to make the figure
# readable in the black & white printed version of this book.
df[['Berri1', 'PierDup']].plot(figsize=(8,4),
                               style=['-', '--']);
The index attribute of the DataFrame contains the dates of all rows in the table. This index has a few date-related attributes, including weekday.
df.index.weekday
However, we would like to have names (Monday, Tuesday, etc.) instead of numbers between 0 and 6. This can be done easily. First, we create an array days with all weekday names. Then, we index it by df.index.weekday. This operation replaces every integer in the index by the corresponding name in days. The first element, Monday, has index 0, so every 0 in df.index.weekday is replaced by Monday, and so on. We assign this new index to a new column Weekday in the DataFrame.
days = np.array(['Monday', 'Tuesday', 'Wednesday',
                 'Thursday', 'Friday', 'Saturday',
                 'Sunday'])
df['Weekday'] = days[df.index.weekday]
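The lookup described above relies on NumPy integer-array indexing. Here is a minimal sketch on three dates generated with pd.date_range. These dates are an assumption chosen for illustration, not rows from the dataset.

```python
import numpy as np
import pandas as pd

days = np.array(['Monday', 'Tuesday', 'Wednesday',
                 'Thursday', 'Friday', 'Saturday',
                 'Sunday'])

# Three consecutive dates; 1 January 2013 was a Tuesday.
dates = pd.date_range('2013-01-01', periods=3)
codes = np.asarray(dates.weekday)   # integer codes, Monday=0 ... Sunday=6
names = days[codes]                 # one name per integer code

print(codes.tolist())   # [1, 2, 3]
print(names.tolist())   # ['Tuesday', 'Wednesday', 'Thursday']
```

Indexing an array with an array of integers returns a new array of the same length as the index array, which is why the result can be assigned directly to a column.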
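To get the attendance as a function of the weekday, we need to group the rows of the table by weekday. The groupby method lets us do just that. Once grouped, we can sum all rows in every group.
df_week = df.groupby('Weekday').sum()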
df_week
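The grouping step can be sketched on a toy table with made-up counts. Only the column names Weekday and Berri1 come from the recipe.

```python
import pandas as pd

# Hypothetical counts: two Mondays and two Tuesdays.
toy = pd.DataFrame({
    'Weekday': ['Monday', 'Tuesday', 'Monday', 'Tuesday'],
    'Berri1': [10, 20, 30, 40],
})

# Rows sharing the same Weekday value are collapsed into a single summed row.
toy_week = toy.groupby('Weekday').sum()

print(toy_week.loc['Monday', 'Berri1'])    # 40
print(toy_week.loc['Tuesday', 'Berri1'])   # 60
```

By default, the group labels become the index of the result and are sorted, so weekday names come out in alphabetical rather than calendar order.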
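We first reorder the table by weekday using loc (label-based indexing, which replaces the ix indexer removed in Pandas 1.0). Then, we plot the table, specifying the line width and the figure size.
df_week.loc[days].plot(lw=3, figsize=(6,4));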
plt.ylim(0); # Set the bottom axis to 0.
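Reordering rows by label can be sketched as follows, on a toy table with hypothetical totals. The sketch also shows reindex, an alternative that is not used in the recipe but that tolerates labels missing from the table.

```python
import pandas as pd

# Hypothetical weekly totals, stored in alphabetical order.
totals = pd.DataFrame({'Berri1': [5, 7, 3]},
                      index=['Friday', 'Monday', 'Tuesday'])
order = ['Monday', 'Tuesday', 'Friday']

# Passing a list of labels to loc returns the rows in that order.
ordered = totals.loc[order]
print(list(ordered.index))

# reindex accepts labels that are absent and fills their rows with NaN,
# whereas loc raises a KeyError for missing labels in recent Pandas versions.
padded = totals.reindex(order + ['Sunday'])
print(padded.loc['Sunday', 'Berri1'])
```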
from ipywidgets import interact
#from IPython.html.widgets import interact # IPython < 4.x
@interact
def plot(n=(1, 30)):
    # n is the window size (in days) of the rolling mean.
    plt.figure(figsize=(8,4));
    df['Berri1'].rolling(n).mean().dropna().plot();
    plt.ylim(0, 8000);
    plt.show();
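To make the smoothing in the widget concrete, here is a minimal sketch of a rolling mean on a toy series with made-up values. It uses the Series.rolling(n).mean() form and a fixed window of 3 instead of the slider.

```python
import pandas as pd

# Hypothetical daily counts.
counts = pd.Series([1, 2, 3, 4, 5])

# Each output value is the mean of the current value and the two before it.
# The first two positions have no full window, so they are NaN.
smoothed = counts.rolling(3).mean()

print(smoothed.tolist())            # [nan, nan, 2.0, 3.0, 4.0]
print(smoothed.dropna().tolist())   # [2.0, 3.0, 4.0]
```

A larger window averages over more days and gives a smoother curve, at the cost of more leading values dropped by dropna.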
You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).