
Hands-on with Pandas

This notebook walks you through a few exercises for practicing data manipulation with Pandas.

As you work through it, make liberal use of the Pandas documentation, the Pandas tag on Stack Overflow, and your favorite search engine. For example, searching for a phrase like "Pandas sum all columns" will very likely turn up an answer to the question you have in mind.
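As an illustration of the kind of answer such a search leads to, here is a minimal sketch of summing every column. The DataFrame and its values are made up for this example.

```python
import pandas as pd

# Toy data, invented purely for illustration
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Sum down each column (axis=0 is the default for DataFrame.sum)
column_totals = df.sum()
print(column_totals)
```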

If you get stuck, solutions are available in the Git repository.

In [1]:

# Start with our normal batch of imports and settings
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The following is optional: set plotting styles
import seaborn
seaborn.set()

Diving Deeper into Baby Names

In the lecture, we looked at the US Social Security baby names data. Here we'll dig a little deeper into it.

Try to do the following with the Baby Names data.

0. Load the baby names data

(Here you can copy the code from the other notebook; make sure you understand what it's doing!)
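If you don't have the other notebook at hand, the following is a rough sketch of what such loading code might look like. It is not necessarily the lecture's version. It assumes the per-year files are named `yob<year>.txt`, sit in a directory called `names`, and hold headerless `name,gender,births` rows. The directory name, the column names, and both helper functions are assumptions made for this example.

```python
import os
import pandas as pd


def load_year(year, data_dir="names"):
    """Read one year's file (assumed layout: headerless name,gender,births rows)."""
    path = os.path.join(data_dir, f"yob{year}.txt")
    df = pd.read_csv(path, names=["name", "gender", "births"])
    df["year"] = year  # the year only appears in the file name, so add it as a column
    return df


def load_names(years, data_dir="names"):
    """Stack the per-year tables into one long DataFrame."""
    return pd.concat([load_year(y, data_dir) for y in years], ignore_index=True)
```

Calling `load_names(...)` with whichever years you have downloaded should then give one table with the columns `name`, `gender`, `births`, and `year`.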

In [ ]:

1. Find your own name within the data.

How many babies per year are born with your name?
What fraction of births each year have your name?

Note: there are multiple ways to do this, but the first part can be done with masking and a pivot table, while the second part might also call for a groupby.
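To show the mechanics without giving away the exercise, here is a small sketch of masking, a pivot table, and a groupby on invented data. The column names (`name`, `gender`, `births`, `year`) and all the numbers are assumptions; adapt them to however you loaded the real table.

```python
import pandas as pd

# Toy stand-in for the baby names table (values are invented)
names = pd.DataFrame({
    "name":   ["Jake", "Jake", "Mary", "Jake", "Mary", "Mary"],
    "gender": ["M", "F", "F", "M", "F", "M"],
    "births": [90, 10, 400, 150, 340, 10],
    "year":   [1990, 1990, 1990, 1991, 1991, 1991],
})

# Masking: keep only the rows for one name
mine = names[names["name"] == "Jake"]

# Pivot table: births per year (rows), split by gender (columns)
per_year = mine.pivot_table(values="births", index="year",
                            columns="gender", aggfunc="sum", fill_value=0)

# groupby: total births per year across all names (the denominator)
total_per_year = names.groupby("year")["births"].sum()

# Fraction of each year's births with this name; the division aligns on year
fraction = per_year.sum(axis=1) / total_per_year

print(per_year)
print(fraction)
```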

In [ ]:

2. Find names which have switched genders.

This is a bit tricky: you might be tempted to use a groupby and apply over the multiple indices ['year', 'gender', 'name'], but if you try this you'll find that it's very computationally intensive.

I'd suggest doing the following:

Use a pivot table, and find the total number of births for each name before some early date (say, 1920) and after some later date (say, 1980).
Compute the percentage of males for each name within those groups.
Use masking to find which names have transitioned from a low percentage to a high percentage, and vice versa.
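The three steps above can be sketched on a toy table as follows. The data is invented, the column names are assumed, and the 25% and 75% cutoffs for "low" and "high" are arbitrary choices for this example rather than anything the exercise prescribes.

```python
import pandas as pd

# Toy stand-in for the baby names table (values are invented)
names = pd.DataFrame({
    "name":   ["Leslie", "Leslie", "Leslie", "Leslie",
               "Robin", "Robin", "Robin", "Robin", "John", "John"],
    "gender": ["M", "F", "M", "F", "M", "F", "M", "F", "M", "M"],
    "births": [90, 10, 20, 80, 10, 90, 85, 15, 500, 400],
    "year":   [1910, 1910, 1990, 1990, 1910, 1910, 1990, 1990, 1910, 1990],
})


def percent_male(df):
    """Percentage of births recorded as male, per name."""
    counts = df.pivot_table(values="births", index="name",
                            columns="gender", aggfunc="sum", fill_value=0)
    # Make sure both gender columns exist even if one is absent in this slice
    counts = counts.reindex(columns=["F", "M"], fill_value=0)
    return 100 * counts["M"] / (counts["M"] + counts["F"])


# Step 1 and 2: split by period with masks, then compute percent male in each
pct = pd.DataFrame({
    "early": percent_male(names[names["year"] < 1920]),
    "late":  percent_male(names[names["year"] > 1980]),
}).dropna()  # keep only names present in both periods

# Step 3: masking with assumed cutoffs of 25% (low) and 75% (high)
male_to_female = pct[(pct["early"] > 75) & (pct["late"] < 25)]
female_to_male = pct[(pct["early"] < 25) & (pct["late"] > 75)]

print(male_to_female)
print(female_to_male)
```

Comparing `len(male_to_female)` with `len(female_to_male)` on the real data is one way to approach the question that follows.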

Is a name more likely to transition from female to male, or from male to female?

In [ ]: