#!/usr/bin/env python
# coding: utf-8

# # Intro to Plotting
# 
# ## Objectives
# 
# * Plot data from a Pandas DataFrame using Pandas' plotting tools
# * Use matplotlib to take more control of figures
# * Understand matplotlib's Figures and Axes
# * Know some advantages to using seaborn

# This is a little bit of IPython magic that causes plots to show up inline in the Notebook instead of in a separate window:

# In[1]:


get_ipython().run_line_magic('matplotlib', 'inline')


# ## Plotting with Pandas
# 
# Pandas has some convenient routines for making plots straight out from data in DataFrames and Series.
# 
# We'll use the same data from the Pandas lesson:

# In[2]:


import pandas as pd


# In[3]:


monthly = pd.read_csv('precip_monthly.csv')


# Let's start with plot of the mean California rainfall over the years.
# The first step will be to use groupby ot aggregate the data by year:

# In[4]:


yearly = monthly.groupby('year').precip.mean()
yearly.head()


# That gives us a Series, which has a `.plot` method for plotting the index of the Series vs. the values:

# In[5]:


yearly.plot()


# Or we can compare regions and make a bar chart:

# In[6]:


regional = monthly.groupby('region').precip.mean()
regional


# In[7]:


regional.plot(kind='bar')


# This is a lot easier to read if it's sorted:

# In[8]:


regional.sort(inplace=False).plot(kind='bar')


# (Check this map if you're curous where these different regions are: http://www.oocities.org/watergeographer/hydromap.gif.)

# If you have a DataFrame of data with the same kind of data in each column you can use the `.plot` method to create plots with different lines (or other styles) for each column.
# But first we need to make a DataFrame with data laid out in that way.
# Let's compare average rainfall in each region over time, which requires grouping on year *and* region then calculating the mean:

# In[9]:


regional_yearly = monthly.groupby(['year', 'region']).precip.mean()
regional_yearly.head(15)


# `regional_yearly` is a Series with a two-level index (a `MultiIndex`), one for year and one for region.
# What we're after is a DataFrame with regions as columns and precip values for each region/year in the table.
# The Series `.unstack()` method can be used to pivot index labels into column labels:

# In[10]:


regional_yearly = regional_yearly.unstack(level='region')
regional_yearly.head()


# Now we can use `.plot` to get a different line on the plot for every region:

# In[11]:


regional_yearly.plot(kind='line', figsize=(10, 8), colormap='Set3', linewidth=4)


# That's a lot of lines, though.
# We can use a box plot to more clearly see regional variation, but it masks the yearly variation (as with a bar chart).

# In[12]:


regional_yearly.plot(kind='box', figsize=(10, 8), rot=45)


# ### Exercise 1
# 
# The `monthly` DataFrame has a `'pct of avg'` column that describes how the amount of measured precipitation compares to the previously recorded average.
# Use the `'pct of avg'` column to make plots like those above, and feel free to experiment!
# 
# **Note!** You can't use `.pct of avg` to refer to the `'pct of avg'` column. Use `['pct of avg']` instead.

# In[ ]:


# We'll come back to the `'pct of avg'` plot in a few minutes.

# ## matplotlib
# 
# Pandas is making the above plots using a library called [matplotlib](http://matplotlib.org/).
# The Pandas `.plot()` method is great for quickly creating plots from data in a DataFrame or Series,
# but maybe your data is in NumPy arrays or you want more control than Pandas gives you.
# In those situations you'll likely start working with matplotlib directly.

# In[13]:


import matplotlib.pyplot as plt
import matplotlib.style as style


# Plotting in matplotlib begins with getting something on which to draw.
# In matplotlib there are two levels of plot containers, axes and figures:
# 
# - Axes contain plots. Everything we've seen above is an example of a single axes containing one or more plot elements.
# - Figures contain axes. In the examples so far every figure has contained one axes, but we'll see in a moment how to make a figure containing multiple axes (often referred to as subplots).
# 
# Let's remake the first `'pct of avg'` plot from above.
# First we'll need the data.

# In[14]:


yearly = monthly.groupby('year')['pct of avg'].mean()
yearly.head()


# The index of the `yearly` Series are the x-values we want, and the values are y.
# We use the `subplots` function to make figures and axes.
# Axes have a `.plot` method that takes, among other things, arrays of x and y values to plot:

# In[15]:


fig, ax = plt.subplots()
ax.plot(yearly.index, yearly)


# A nice feature of having the figure and axes in variables is that we can continue to modify the plot by adding labels, grids, titles, etc.
# (After modifying the plot we need to echo the figure variable to get everything to show up again.)

# In[16]:


ax.set_xlabel('Year')
ax.set_ylabel('Pct of Avg Precip')
ax.set_title('California Precipitation Compared to Historical Average')
ax.grid(True)
fig


# If you wanted, you could now use the `fig` variable to save this plot to a file.

# In[17]:


fig.savefig('ca_pct_of_avg.png')


# ### A Note on Style
# 
# Many people do not like matplotlib's default plot styling.
# You can change plot styles in a number of ways, but one of the easiest is to select from matplotlib's builtin style sheets: http://matplotlib.org/gallery.html#style_sheets.
# I like the 'bmh' style, so let's activate that and remake the above plot:

# In[18]:


style.available


# In[19]:


style.use('bmh')


# In[20]:


fig, ax = plt.subplots()
ax.plot(yearly.index, yearly)
ax.set_xlabel('Year')
ax.set_ylabel('Pct of Avg Precip')
ax.set_title('California Precipitation Compared to Historical Average')
ax.grid(True)


# That plot style will remain in effect for the rest of the notebook or until we change it to something else.

# ### Markers and lines
# 
# To make it a bit easier to see where the data points are we can add markers to the plot
# using the `marker` keyword:

# In[21]:


fig, ax = plt.subplots()
ax.plot(yearly.index, yearly, marker='o')
ax.set_xlabel('Year')
ax.set_ylabel('Pct of Avg Precip')
ax.set_title('California Precipitation Compared to Historical Average')
ax.grid(True)


# We can also change whether the line is solid, dotted, dashed, etc., by setting the linestyle. For example, if we wanted to remove the line entirely:

# In[22]:


fig, ax = plt.subplots()
ax.plot(yearly.index, yearly, marker='o', linestyle='')
ax.set_xlabel('Year')
ax.set_ylabel('Pct of Avg Precip')
ax.set_title('California Precipitation Compared to Historical Average')
ax.grid(True)


# ### subplots
# 
# The second `'pct of avg'` plot we made with ten lines was a bit much,
# an alternative is to make a figure with ten subplots, one for each region.
# To make a figure with ten subplots we again use the `subplots` function,
# but tell it the number of rows and columns of subplots we want:

# In[23]:


fig, axes = plt.subplots(nrows=10, ncols=1, sharex=True, figsize=(10, 15))


# `axes` is a little different than the `ax` variable we used for the last plot

# In[24]:


axes


# Our strategy this time is loop over all the different subplots and individually create each one.
# To do that we also need to loop over the data we want to plot. Let's look at `regional_yearly` again:

# In[25]:


regional_yearly = monthly.groupby(['year', 'region'])['pct of avg'].mean().unstack(level='region')
regional_yearly.head()


# We can loop over the columns and use each one to make a plot!
# Since I want to loop over both the columns and the axes I'm going to use Python's
# [zip](https://docs.python.org/3/library/functions.html#zip) function:

# In[26]:


fig, axes = plt.subplots(nrows=10, ncols=1, sharex=True, figsize=(10, 15))
for col_name, ax in zip(regional_yearly.columns, axes):
    col = regional_yearly[col_name]
    ax.plot(col.index, col)


# So that worked, but we need to include things like axis labels and titles, 
# and it'd be nice all the subplots had the same y-scale.
# We can take care of all of that in the loop:

# In[27]:


fig, axes = plt.subplots(nrows=10, ncols=1, sharex=True, figsize=(10, 15))
for col_name, ax in zip(regional_yearly.columns, axes):
    col = regional_yearly[col_name]
    ax.plot(col.index, col)
    ax.set_xlabel('Year')
    ax.set_ylabel('pct of avg')
    ax.set_ylim(0, 250)
    ax.set_title(col_name)


# Ok, now things have gotten a bit crowded.
# As a final step we can tell matplotlib to neatly arrange everything using the `.title_layout()` method:

# In[28]:


fig.tight_layout()
fig


# ## seaborn
# 
# Seaborn is another plotting library, also built on top of matplotlib. Seaborn provides a very nice default style, as well as several functions that make it easy to create beautiful and sophisticated plots.
# 
# Note: Seaborn offers many more plotting utilities beyond the examples we'll go through here. We recommend checking out their tutorial to see all of the options that Seaborn has to offer!
# 
# http://stanford.edu/~mwaskom/software/seaborn/tutorial.html
# 
# First off, just importing Seaborn will change the default style:

# In[29]:


import seaborn as sns


# The style is a bit similar to the bmh style we set earlier, though the colors are a bit different, and there are some other changes (for example, the figure size is larger by default):

# In[30]:


yearly.plot()


# In[31]:


regional.sort(inplace=False).plot(kind='bar')


# Seaborn also includes tools to construct sophisticated plots very quickly. For example, recall the box plot that we created earlier:

# In[32]:


regional_yearly = monthly.groupby(['year', 'region'])['precip'].mean().unstack('region')
regional_yearly.plot(kind='box', figsize=(10, 8), rot=45)


# Seaborn gives us a method to create a similar type of plot called a "violin" plot. This plot provides a little bit more information than a box plot, as it additionally shows the true distribution of the data (not just the median and quantiles). This often has a shape that looks curvy, sometimes like a violin, hence the name.
# 
# First, we need to convert our data into a form that Seaborn can work with:

# In[33]:


# make the columns into an index
df = regional_yearly.stack()

# convert from Series to DataFrame
df = df.to_frame('precip')

# turn the index back into columns
df = df.reset_index()

df.head()


# Then, to create the violin plot, we tell Seaborn that it should plot the region on the x-axis, precipitation on the y-axis, and that the quartiles should be displayed inside each violin:

# In[34]:


plt.xticks(rotation=45)
sns.violinplot(x='region', y='precip', data=df, inner='quartile', linewidth=1)


# ### Grids of subplots

# Seaborn also provides an easy way to create complex plots like the subplots we had above. We can create this figure using `FacetGrid`, where we specify that each row will correspond to a different region. We then tell the grid to create a plot with `year` on the x-axis and `precip` on the y-axis for each region: 

# In[35]:


# Initialize a grid of plots with an Axes for each region
grid = sns.FacetGrid(df, row="region", aspect=3)

# Draw a line plot to show the trajectory of each random walk
grid.map(plt.plot, "year", "precip")


# Another type of figure with a grid of subplots is called a "pair plot", where the columns of a data frame are plotted against each other. In the case of the precipitation data, we could look at how the average precipitation compares between different regions. 

# In[36]:


# only use the first few regions
cols = regional_yearly.columns[:6]

# create the pairplot
grid = sns.pairplot(regional_yearly[cols], diag_kind='kde', diag_kws=dict(shade=True))

# synchronize the axis limits
grid.set(xlim=(0, 6), ylim=(0, 6))


# In[ ]: