# Start with our normal batch of imports and settings
from __future__ import print_function, division
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns; sns.set()
Read in the data for each gender (use the `index_col` argument to set the index to the first column), then use `pd.concat` to combine them into a single dataframe.
females = pd.read_csv('data/femaleVisitsToPhysician.csv')
males = pd.read_csv('data/maleVisitsToPhysician.csv')
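A minimal sketch of the `pd.concat` step. It assumes each file is in long format, with `age` and `year` columns plus a per capita rate column; `consultations` is a hypothetical name for that column, so check the real headers. The small frames are made-up stand-ins for `females` and `males`, not the real data. Tagging each frame with `assign` before concatenating records which gender each row came from.

```python
import pandas as pd

# Made-up stand-ins for the two CSV files; the column names are assumptions
females = pd.DataFrame({"age": [20, 20, 30, 30],
                        "year": [2009, 2010, 2009, 2010],
                        "consultations": [4.1, 4.3, 4.5, 4.6]})
males = pd.DataFrame({"age": [20, 20, 30, 30],
                      "year": [2009, 2010, 2009, 2010],
                      "consultations": [2.9, 3.0, 3.2, 3.4]})

# Label each frame with its gender, then stack the rows into one dataframe.
# The original indexes are kept, so an index set via index_col survives.
visits = pd.concat([females.assign(gender="female"),
                    males.assign(gender="male")])
print(visits)
```

If the index carries no information, passing `ignore_index=True` to `pd.concat` replaces the repeated labels with a fresh 0..n-1 index.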
For each gender, the data shows the per capita consultations by age and year.
Use `pd.pivot_table` and plot the data.
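A sketch of the pivot-and-plot step on made-up long-format data, using the same hypothetical `consultations` column. Years become the rows and ages the columns, so `DataFrame.plot` draws one line per age group over time; swapping `index` and `columns` gives one line per year instead.

```python
import pandas as pd

# Made-up long-format data: one row per (gender, age, year)
visits = pd.DataFrame({
    "gender": ["female"] * 4 + ["male"] * 4,
    "age": [20, 20, 30, 30] * 2,
    "year": [2009, 2010] * 4,
    "consultations": [4.1, 4.3, 4.5, 4.6, 2.9, 3.0, 3.2, 3.4],
})

# Rows are years, columns are ages; each cell is the mean consultation rate
female_table = pd.pivot_table(visits[visits["gender"] == "female"],
                              values="consultations", index="year",
                              columns="age", aggfunc="mean")

# DataFrame.plot draws one line per column, i.e. one line per age group
ax = female_table.plot(marker="o")
ax.set_ylabel("consultations per capita")
print(female_table)
```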
Also, as you create these plots, experiment with `sns.set_palette` to get a color scheme that helps convey the information you're interested in.
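One way to use `sns.set_palette`: a sequential palette such as `"Blues"` suits ordered groups like ages, while a qualitative palette such as `"colorblind"` suits unordered groups like gender. The number of age groups below is an assumption; set it to match the data.

```python
import seaborn as sns

# Assumption: replace with the number of age groups in the real data
n_age_groups = 5

# "Blues" runs light to dark, so later columns (e.g. older ages) get darker lines
sns.set_palette("Blues", n_colors=n_age_groups)

# With no argument, color_palette() returns the currently active color cycle
current = sns.color_palette()
print(len(current))
```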
The copayment for GP visits was eliminated in 2010. Let's see whether there is any indication that this affected the rate of visits. Create a column, `with_copay`, which is `True` if the year is prior to 2010 and `False` otherwise.
Let's try to pull some information out of the data that's not obviously available.
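Back to the copayment question first: a minimal sketch of the `with_copay` column on made-up rows, assuming `year` is an integer column (if `year` ended up as the index, compare `visits.index` instead). A difference between the two group means is only a hint, since other changes over the same years could also move the visit rate.

```python
import pandas as pd

# Made-up rows standing in for the combined dataframe
visits = pd.DataFrame({
    "year": [2008, 2009, 2010, 2011],
    "consultations": [3.0, 3.1, 3.6, 3.7],
})

# Boolean column: True for years before the 2010 copayment removal
visits["with_copay"] = visits["year"] < 2010

# Compare mean visit rates between the two periods
rates = visits.groupby("with_copay")["consultations"].mean()
print(rates)
```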
Notice that the `age` column and the `year` column are intertwined: by subtracting the age from the year, we can find the birth year of the group of people recorded.
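A sketch of the birth-year calculation on made-up rows, assuming `age` is a plain number. If the files record age bands such as `20-24` instead, each band would need to be parsed first, and the result is a range of birth years. Filtering on one birth year then follows a single cohort through time.

```python
import pandas as pd

# Made-up rows; real data may use age bands rather than single ages
visits = pd.DataFrame({
    "age": [20, 30, 20, 30],
    "year": [2009, 2009, 2019, 2019],
    "consultations": [4.1, 4.5, 4.4, 4.9],
})

# People aged `age` in `year` were born around year - age
visits["birth_year"] = visits["year"] - visits["age"]

# Follow one birth cohort across years, e.g. people born in 1989
cohort = visits[visits["birth_year"] == 1989]
print(cohort)
```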
If you finish the above tasks, try this more open-ended exploration on a different dataset.
Seaborn includes a dataset representing the individuals who were on board the Titanic during its ill-fated maiden voyage. It has information about their age, gender, class, fare paid, the deck their quarters were on, whether they were traveling with someone, and whether they survived.
This is a fairly open-ended exploration, but try answering these questions:
See what sort of interesting relationships you can find between the various pieces of data.
# load the titanic data
titanic = sns.load_dataset('titanic')
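A starting point for the Titanic exploration: a small helper (the name `survival_table` is hypothetical) that tabulates survival rates with `pd.pivot_table`. Because `survived` is coded 0/1, its mean is the fraction who survived. The tiny frame below is made up and only borrows column names from seaborn's `titanic` data.

```python
import pandas as pd

def survival_table(df, index, columns):
    """Survival rate (mean of the 0/1 `survived` column) per index/columns pair."""
    # observed=True keeps only category combinations that actually occur
    return pd.pivot_table(df, values="survived", index=index,
                          columns=columns, aggfunc="mean", observed=True)

# Tiny made-up frame with the same column names as seaborn's titanic data
toy = pd.DataFrame({
    "survived": [1, 0, 1, 0, 0, 1],
    "sex": ["female", "male", "female", "female", "male", "male"],
    "class": ["First", "First", "Third", "Third", "Third", "First"],
})
table = survival_table(toy, "sex", "class")
print(table)
```

With the `titanic` dataframe loaded above, `survival_table(titanic, "sex", "class")` breaks survival down by gender and class; columns such as `who` or `alone` can be swapped in to look for other relationships.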