from IPython.core.display import HTML
with open('creative_commons.txt', 'r') as f:
    html = f.read()
name = '2014-06-16-seaborn'
html = '''
<small>
<p> This post was written as an IPython notebook.
It is available for <a href='https://ocefpaf.github.com/python4oceanographers/downloads/notebooks/%s.ipynb'>download</a>
or as a static <a href='https://nbviewer.ipython.org/url/ocefpaf.github.com/python4oceanographers/downloads/notebooks/%s.ipynb'>html</a>.</p>
<p></p>
%s''' % (name, name, html)
%matplotlib inline
from matplotlib import style
style.use('ggplot')
This week I was helping a friend explore her dataset with some simple statistics and plots, so I decided to try seaborn out.
It is a really nice library that, together with pandas, makes a powerful tool for the first steps of exploring your data.
Here is a simple example of what we did.
import seaborn
import numpy as np
import matplotlib.pyplot as plt
from io import BytesIO
from pandas import read_csv
kw = dict(na_values='NaN', sep=',', encoding='utf-8',
          skipinitialspace=True, index_col=False)
df = read_csv("./data/fish.csv", **kw)
df.head()
| | Days | ID | Recovery | Extract weight | Lipid % | Weight (g) | Size (cm) | Liver weight (g) | LSI | CF | BDE 47 (ng/g) | BDE 99 (ng/g) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | A | 73.21 | 0.10 | 3.600000 | 20.09 | 12.8 | 0.14 | 0.696864 | 0.957966 | 0 | 0 |
| 1 | 0 | B | 98.24 | 0.22 | 2.272727 | 36.52 | 15.5 | 0.33 | 0.903614 | 0.980699 | 0 | 0 |
| 2 | 0 | C | 89.71 | 0.18 | 3.500000 | 28.74 | 14.7 | 0.25 | 0.869868 | 0.904763 | 0 | 0 |
| 3 | 1 | A | 78.40 | 0.13 | 1.330769 | 23.70 | 14.0 | 0.15 | 0.632911 | 0.863703 | 0 | 0 |
| 4 | 1 | B | 66.24 | 0.13 | 2.838462 | 32.80 | 15.0 | 0.20 | 0.609756 | 0.971852 | 0 | 0 |
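If you want to check what those read_csv options do without the original file, here is a minimal sketch on an in-memory sample. The three rows are made up to mimic the layout of fish.csv (they are not the real data), and the encoding option is dropped because the text is already decoded.

```python
from io import StringIO

from pandas import read_csv

# Made-up sample mimicking the layout of fish.csv (not the real data).
csv_text = """Days, ID, Recovery, BDE 47 (ng/g)
0, A, 73.21, 0
0, B, NaN, 0
1, A, 78.40, 1.5
"""

kw = dict(na_values='NaN', sep=',', skipinitialspace=True, index_col=False)
sample = read_csv(StringIO(csv_text), **kw)

# skipinitialspace=True strips the blank after each comma, so neither the
# column names nor the ID values carry a leading space.
print(sample.columns.tolist())

# The literal "NaN" is parsed as a missing value.
print(sample['Recovery'].isnull().sum())
```

Without skipinitialspace the second column would be named ' ID' (with the space), which is an easy thing to trip over later when selecting columns by name.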
Seaborn makes it easy to control the figure aesthetics with set_style and axes_style.
kw = {'axes.edgecolor': '0', 'text.color': '0', 'ytick.color': '0', 'xtick.color': '0',
      'ytick.major.size': 5, 'xtick.major.size': 5, 'axes.labelcolor': '0'}
seaborn.set_style("whitegrid", kw)
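To see what a style contains before applying it, seaborn's axes_style function returns the parameters as a dictionary. A minimal sketch, assuming a seaborn release that provides axes_style with an rc argument for overrides; the override dict is a shortened version of the one used above.

```python
import seaborn

# Shortened version of the overrides passed to set_style above.
kw = {'axes.edgecolor': '0', 'text.color': '0', 'axes.labelcolor': '0'}

# axes_style returns the style parameters without changing any global state.
style = seaborn.axes_style("whitegrid", rc=kw)

print(style['axes.edgecolor'])  # the overridden edge color
print(style['axes.grid'])       # whitegrid turns the grid on
```

Depending on the seaborn version, some keys (the tick sizes, for instance) may not be treated as part of the style and could be ignored, so it is worth printing the dictionary to confirm that your overrides took effect.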
The first plot will be a simple and naive correlation matrix. It is just one
line with seaborn.
ax = seaborn.corrplot(df, annot=False, diag_names=False)
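Under the hood this kind of plot is a picture of the pairwise correlation matrix, which pandas can compute directly. Here is a minimal sketch on made-up measurements (not the fish data). As far as I know, newer seaborn releases no longer ship corrplot, so computing the matrix yourself is also the route to take there.

```python
import numpy as np
import pandas as pd

# Made-up measurements, only to show the mechanics (not the fish data).
rng = np.random.RandomState(0)
size = rng.uniform(12, 16, 50)
toy = pd.DataFrame({
    'ID': list('AB') * 25,  # non-numeric column, left out below
    'Size (cm)': size,
    'Weight (g)': 2.0 * size + rng.normal(0, 0.5, 50),
})

# Pearson correlation between every pair of numeric columns.
corr = toy.select_dtypes(include='number').corr()
print(corr.round(2))
```

The resulting frame can then be drawn with something like seaborn.heatmap(corr, annot=True) on seaborn versions that have heatmap.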
Easy conclusion: the bigger the fish, the heavier it is ;). But seriously now,
BDE 47 is positively correlated with Days and BDE 99, which is worth
exploring. BDE 99 was part of the experiment. However, BDE 47 was not in
the fish at the beginning; it is a by-product of BDE 99 that appears as the
fish metabolize it.
We can explore this a little further. Note that we use pandas groupby to
aggregate the data by the variable "Days".
g = df.groupby('Days')
mean_df = g.mean()
g.describe().head()
Days | | BDE 47 (ng/g) | BDE 99 (ng/g) | CF | Extract weight | LSI | Lipid % | Liver weight (g) | Recovery | Size (cm) | Weight (g) |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | count | 3 | 3 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 |
0 | mean | 0 | 0 | 0.947809 | 0.166667 | 0.823449 | 3.124242 | 0.240000 | 87.053333 | 14.333333 | 28.450000 |
0 | std | 0 | 0 | 0.038974 | 0.061101 | 0.110916 | 0.739127 | 0.095394 | 12.724725 | 1.386843 | 8.218838 |
0 | min | 0 | 0 | 0.904763 | 0.100000 | 0.696864 | 2.272727 | 0.140000 | 73.210000 | 12.800000 | 20.090000 |
0 | 25% | 0 | 0 | 0.931364 | 0.140000 | 0.783366 | 2.886364 | 0.195000 | 81.460000 | 13.750000 | 24.415000 |
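To make the groupby step concrete, here is a minimal sketch on a tiny frame with made-up values in the same layout as the fish table (not the real data). Selecting the numeric column explicitly is my choice here, so that the text column ID stays out of the aggregation.

```python
import pandas as pd

# Made-up values in the same layout as the fish table (not the real data).
toy = pd.DataFrame({
    'Days': [0, 0, 1, 1],
    'ID': ['A', 'B', 'A', 'B'],
    'BDE 47 (ng/g)': [0.0, 0.0, 1.0, 3.0],
})

# Group the rows by day and pick the numeric column to summarize.
g = toy.groupby('Days')['BDE 47 (ng/g)']

print(g.mean())      # one mean per day
print(g.describe())  # count, mean, std, min, quartiles and max per day
```

Each distinct value of Days becomes one row of the result, which is what makes the table above readable as "statistics per sampling day".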
# Keyword arguments keep these calls working on newer seaborn releases.
ax = seaborn.jointplot(x="Days", y="BDE 99 (ng/g)", data=df, kind="reg")
ax = seaborn.jointplot(x="Days", y="BDE 47 (ng/g)", data=df, kind="reg")
The increase in BDE 47 is clear. BDE 99 does not decrease at the same rate
as BDE 47 increases because it was part of the fish diet.
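The line drawn by kind="reg" is a linear regression, so the trend can also be put into numbers. A minimal sketch with numpy, using made-up concentrations that rise with time (these are not the measured values):

```python
import numpy as np

# Made-up concentrations that rise with time (not the measured values).
days = np.array([0, 0, 1, 1, 2, 2, 3, 3], dtype=float)
conc = np.array([0.4, 0.6, 1.4, 1.6, 2.4, 2.6, 3.4, 3.6])

# Least-squares straight line: conc ~ slope * days + intercept.
slope, intercept = np.polyfit(days, conc, 1)

# Correlation coefficient between the two variables.
r = np.corrcoef(days, conc)[0, 1]

print(slope, intercept, r)
```

The slope is the average change in concentration per day, which is a handy number to quote next to the plot.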
Inspecting the residuals is also a one-liner.
ax = seaborn.residplot(x="Days", y="BDE 99 (ng/g)", data=df)
ax = seaborn.residplot(x="Days", y="BDE 47 (ng/g)", data=df)
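What a residual plot shows is the observed value minus the value predicted by a straight-line fit. A minimal sketch of that computation with numpy, on made-up data that deliberately curves upward (not the measured values):

```python
import numpy as np

# Made-up data with an upward curve (not the measured values).
days = np.array([0, 0, 1, 1, 2, 2, 3, 3], dtype=float)
conc = np.array([0.0, 0.2, 0.4, 0.6, 1.4, 1.6, 3.4, 3.6])

# Fit a straight line, then subtract its predictions from the observations.
slope, intercept = np.polyfit(days, conc, 1)
residuals = conc - (slope * days + intercept)

print(residuals.round(2))
```

Here the residuals are positive at both ends and negative in the middle, which is the typical sign that a straight line is too simple for the data. Residuals scattered evenly around zero would suggest the linear fit is adequate.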
Hopefully this is useful for others. Do not forget to check the seaborn docs.
HTML(html)
This post was written as an IPython notebook. It is available for download or as a static html.