This notebook will walk you through some exercises to get practice using Pandas for data manipulation.
As you use this, feel free to make ample use of the Pandas Documentation, the Pandas StackOverflow Channel, and your favorite search engine. For example, if you search phrases like "Pandas sum all columns", you're very likely to find an answer to the question you have in mind.
If you get stuck, note that solutions are available in the Git repository.
# Start with our normal batch of imports and settings
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# The following is optional: set plotting styles
import seaborn
seaborn.set()
In the lecture, we looked at the US Social Security Baby Names data. Here, let's dive a little deeper into it.
Try to do the following with the Baby Names data.
(Here you can copy the code from the other notebook; make sure you understand what it's doing!)
Note: there are multiple ways to do this, but the first part will use masking and pivot tables, while the second part might also throw in a groupby.
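As a reminder of what masking and pivot tables look like together, here is a minimal sketch on a tiny made-up table. The column names (in particular births) and all of the numbers are assumptions for illustration only; the real Baby Names data may be laid out differently.

```python
import pandas as pd

# Hypothetical toy data: the column names and values are invented for illustration
names = pd.DataFrame({
    "name":   ["Leslie", "Leslie", "Leslie", "Leslie", "Mary", "Mary"],
    "gender": ["F", "M", "F", "M", "F", "F"],
    "year":   [1900, 1900, 2000, 2000, 1900, 2000],
    "births": [10, 90, 80, 20, 500, 300],
})

# Masking: keep only the rows for a single name
leslie = names[names["name"] == "Leslie"]

# Pivot table: total births with years as rows and genders as columns
table = leslie.pivot_table(values="births", index="year",
                           columns="gender", aggfunc="sum")

# Fraction of births that are male in each year
frac_male = table["M"] / (table["M"] + table["F"])
print(frac_male)
```

The same pattern (mask first, then pivot) should carry over to the full dataset once you adjust the column names to match it.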
This is a bit tricky: you might be tempted to use a groupby and apply over the multiple indices ['year', 'gender', 'name'], but if you try this you'll find that it's very computationally intensive.
I'd suggest doing the following:
Is a name more likely to transition from female to male, or from male to female?
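One possible way to frame this question, sketched on invented numbers rather than the real data: compute the fraction of male births per name and year with a single pivot table, then count the names whose majority gender flips between an early year and a late year. The column names, the 0.5 threshold, and the choice to compare only two years are all assumptions made for this sketch, and the counts it prints say nothing about the actual dataset.

```python
import pandas as pd

# Hypothetical toy data: the birth counts are invented, not taken from the real dataset
names = pd.DataFrame({
    "name":   ["Leslie"] * 4 + ["Robin"] * 4 + ["Jordan"] * 4,
    "gender": ["F", "M"] * 6,
    "year":   [1900, 1900, 2000, 2000] * 3,
    "births": [10, 90, 80, 20,   # Leslie: mostly M in 1900, mostly F in 2000
               20, 80, 70, 30,   # Robin:  mostly M in 1900, mostly F in 2000
               60, 40, 30, 70],  # Jordan: mostly F in 1900, mostly M in 2000
})

# One pivot table over (name, year); missing combinations count as zero births
table = names.pivot_table(values="births", index=["name", "year"],
                          columns="gender", aggfunc="sum", fill_value=0)

# Fraction male per name and year, reshaped to one column per year
frac_male = (table["M"] / (table["M"] + table["F"])).unstack("year")

# Assumed definition of a "transition": the majority gender flips between the two years
first, last = frac_male[1900], frac_male[2000]
male_to_female = ((first > 0.5) & (last < 0.5)).sum()
female_to_male = ((first < 0.5) & (last > 0.5)).sum()
print(male_to_female, female_to_male)
```

Because everything here is a vectorized pivot and comparison, it avoids a per-group apply; on the real data you would likely also want to filter out rare names and compare averages over ranges of years rather than two single years.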