Introducing Pandas

Before we get into the details of flow analysis, it makes sense to get used to the tools we'll be using. This notebook is designed to introduce IPython and Pandas for general data analysis tasks. This forms a basis for doing flow analysis work in this environment.

This notebook is (very lightly) adapted from Chapters 0 through 4 of the "Pandas Cookbook" by Julia Evans; all credit for the content here goes to her (and when you read "I" here, that's Ms. Evans speaking).

A quick tour of IPython Notebook

This tour is designed to be run in interactive mode, using IPython notebooks. If you're not already viewing the tutorial in IPython's notebook mode, install IPython notebook and start it from a terminal by running the command below (newer installations ship the notebook as Jupyter, where the equivalent command is jupyter notebook):

ipython notebook

First, we need to explain how to run cells. Try to run the cell below!

In [ ]:

import pandas as pd

print("Hi! This is a cell. Press the ▶ button above to run it")

You can also run a cell with Ctrl+Enter or Shift+Enter. Experiment a bit with that.

Two of the most useful things about IPython notebook are its inline help (Shift+Tab) and its tab completion.

Try the inline help first: click just after read_csv( in the cell below and press Shift+Tab 4 times, slowly.

In [ ]:

pd.read_csv(

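After the first time, you should typically see a small tooltip with the function's signature.

After the second time, the tooltip should expand to show more of the docstring.

After the fourth time, a big help box should pop up at the bottom of the screen, with the full documentation for the read_csv function.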

I find this amazingly useful. I think of this as "the more confused I am, the more times I should press Shift+Tab". Nothing bad will happen if you press it 12 times.
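Outside the notebook, much of the same information can be pulled up with Python's standard inspect module. The following is only a minimal sketch: peek is a hypothetical helper name, and json.load stands in for pd.read_csv so that the example needs nothing beyond the standard library.

```python
import inspect
import json


def peek(func):
    """Return (signature, first docstring line) for a Python callable."""
    signature = str(inspect.signature(func))
    doc = inspect.getdoc(func) or ""
    first_line = doc.splitlines()[0] if doc else ""
    return signature, first_line


# json.load is an assumption made for illustration only;
# in the notebook you could pass pd.read_csv instead.
signature, summary = peek(json.load)
print("json.load" + signature)
print(summary)
```

This roughly mirrors the first Shift+Tab press (signature plus the start of the docstring); calling help() on the function prints the full documentation, similar to the help box you get on the fourth press.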

Okay, let's try tab completion for function names!

In [ ]:

pd.r

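Click just after pd.r in the cell above and press Tab. You should see a list of the names in pd that start with r, such as read_csv.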

Writing code

Writing code in the notebook is pretty normal.

In [ ]:

def print_10_nums():
    for i in range(10):
        print(i)

In [ ]:

print_10_nums()

Saving

As of the latest stable version, the notebook autosaves. You should use the latest stable version. Really.

Magic functions

IPython has all kinds of magic functions. Here's an example that uses the %time magic to compare sum() over a list comprehension with sum() over a generator expression.

In [ ]:

%time sum([x for x in range(100000)])

In [ ]:

%time sum(x for x in range(100000))
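A single %time run can be noisy. If you want to repeat the comparison outside IPython, the standard timeit module is one option. This is a sketch rather than a benchmark: the function names sum_with_list and sum_with_generator and the repeat count of 20 are arbitrary choices made here, not part of the original notebook.

```python
import timeit

N = 100_000   # same range size as the %time cells above
REPEATS = 20  # arbitrary; more repeats give steadier averages


def sum_with_list():
    # Builds the whole list in memory, then sums it.
    return sum([x for x in range(N)])


def sum_with_generator():
    # Feeds values to sum() one at a time, without building a list.
    return sum(x for x in range(N))


# Both approaches must agree on the answer before timing means anything.
assert sum_with_list() == sum_with_generator()

list_seconds = timeit.timeit(sum_with_list, number=REPEATS)
gen_seconds = timeit.timeit(sum_with_generator, number=REPEATS)
print(f"list comprehension:   {list_seconds / REPEATS:.6f} s per call")
print(f"generator expression: {gen_seconds / REPEATS:.6f} s per call")
```

Which version comes out faster depends on your machine and Python version, so treat the numbers as something to measure rather than assume.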

The magics I use most are %time and %prun for profiling. You can run %magic to get a list of all of them, and %quickref for a reference sheet.

In [ ]:

%quickref
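The %prun magic mentioned above runs a statement through Python's profiler. In a plain script you can get a similar report with the standard cProfile and pstats modules. The sketch below assumes a made-up function, sum_of_squares, purely as something to profile.

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    # Hypothetical workload, used only to give the profiler something to measure.
    return sum(x * x for x in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = sum_of_squares(100_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Inside the notebook, %prun sum_of_squares(100_000) would be the shorter way to get a comparable table.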

Getting data into Pandas (reading from a CSV file)

Now let's get started with using Pandas. First, run the following code to set up the environment in IPython: