Implementation of typographic and design principles in matplotlib and iPython notebook

UCSD Scientific Python User's Group, April 10th, 2013

Outline

  1. Why should I care about design?
  2. Matplotlib
    1. Default colors
      1. Lines and scatterplots
      2. Heatmaps
    2. Default fonts
    3. Removing "chartjunk"
    4. Sparklines
  3. Final notes
    1. iPython notebook
      1. Default fonts and layouts via custom profiles
    2. Bokeh plotting package: the future?

Why should I care about design?

Bad design = difficult interpretation, possible loss of information, and inability to recognize trends. I will use concepts from The Visual Display of Quantitative Information, 2nd ed., by Edward Tufte (Graphics Press, 2001).

Do not imitate this bad example from the matplotlib gallery:

Why is this so bad? The divergent 'rainbow' color scheme makes it difficult to compare values. Humans are terrible at using different hues to discriminate between different values, but all right at using saturation, such as one color running from very light to very dark.

Or this also terrible example from the gallery:

Why is this so bad? The graphics of the box distract from the true information. It would be much more effective as a plain bar chart.

We will talk about how to

Matplotlib

Default colors

Lines and scatterplots

The default colors in matplotlib are not pretty, nor are they conducive to easy comparison. They were meant to be familiar to MATLAB users but that's not a good reason for poor design choices.

In [1]:
# For setting parameters, we will need to use matplotlib (mpl) directly
import matplotlib as mpl

# This is the usual invocation of pyplot
import matplotlib.pyplot as plt

import numpy as np
import pandas as pd

# Set the random seed for consistency
np.random.seed(12)

# I happen to know that there are 7 default colors in matplotlib
for i in range(7):
    plt.plot(np.random.randn(1000).cumsum())

Ugh. It's an unfortunate mishmash of RGB+CMYK: red, green, and blue, plus cyan, magenta, yellow, and blac(k). But we already know that we can do better.

In 2003, Cynthia Brewer and colleagues released guidelines for coloring maps with sequential, divergent, and qualitative colors, and these guidelines are now available through http://colorbrewer2.org/. These colors have long been included in an existing R package, but only recently did someone bring them to Python through the package brewer2mpl, which is intended for use with matplotlib.

An example import (from the author's blog post):

import brewer2mpl
bmap = brewer2mpl.get_map('Set1', 'qualitative', 5)
colors = bmap.mpl_colors

So let's install this package.

In [18]:
! sudo easy_install brewer2mpl
Password:

(can't do interactive terminal stuff in iPython so I did this in my actual terminal)

The output:

Searching for brewer2mpl
Reading http://pypi.python.org/simple/brewer2mpl/
Reading https://github.com/jiffyclub/brewer2mpl/wiki
Best match: brewer2mpl 1.3.1
Downloading http://pypi.python.org/packages/source/b/brewer2mpl/brewer2mpl-1.3.1.zip#md5=ae1e2cfc57e7e022e0208e2b5a994292
Processing brewer2mpl-1.3.1.zip
Writing /tmp/easy_install-9BiZaj/brewer2mpl-1.3.1/setup.cfg
Running brewer2mpl-1.3.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-9BiZaj/brewer2mpl-1.3.1/egg-dist-tmp-UKWbyr
zip_safe flag not set; analyzing archive contents...
brewer2mpl.brewer2mpl: module references __file__
Adding brewer2mpl 1.3.1 to easy-install.pth file
Installed /Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/brewer2mpl-1.3.1-py2.7.egg
Processing dependencies for brewer2mpl
Finished processing dependencies for brewer2mpl
In [19]:
! cat ~/.matplotlibrc | grep color_cycle
cat: /Users/olga/.matplotlibrc: No such file or directory
In [28]:
import brewer2mpl

# brewer2mpl.get_map args: set name  set type  number of colors
bmap = brewer2mpl.get_map('Set2', 'qualitative', 7)
colors = bmap.mpl_colors
print(colors)
[(0.4, 0.7607843137254902, 0.6470588235294118), (0.9882352941176471, 0.5529411764705883, 0.3843137254901961), (0.5529411764705883, 0.6274509803921569, 0.796078431372549), (0.9058823529411765, 0.5411764705882353, 0.7647058823529411), (0.6509803921568628, 0.8470588235294118, 0.32941176470588235), (1.0, 0.8509803921568627, 0.1843137254901961), (0.8980392156862745, 0.7686274509803922, 0.5803921568627451)]

We have a list of 3-tuples of RGB decimal values, from 0 to 1, as specified in the matplotlib colors API. You may be used to seeing RGB specifications as values between 0 and 255; this is the same thing, except that each value is expressed as a fraction of 255.
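
As a small worked example of that relationship, the sketch below converts the first 'Set2' tuple from the output above to 0-255 integers and back. The helper names to_255 and to_fraction are made up for illustration; they are not part of matplotlib or brewer2mpl.

```python
# Hypothetical helpers (not part of matplotlib or brewer2mpl)
def to_255(rgb):
    """Convert an RGB tuple of 0-1 fractions to 0-255 integers."""
    return tuple(int(round(channel * 255)) for channel in rgb)

def to_fraction(rgb255):
    """Convert an RGB tuple of 0-255 integers to 0-1 fractions."""
    return tuple(channel / 255.0 for channel in rgb255)

# First 'Set2' color from the output above
first = (0.4, 0.7607843137254902, 0.6470588235294118)
as_ints = to_255(first)            # (102, 194, 165)
round_trip = to_fraction(as_ints)  # back to fractions of 255
```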

Now let's use these colors to plot. To do so, we'll have to change the default color cycle of matplotlib via the command:

mpl.rcParams['axes.color_cycle'] = colors

Now the mpl we imported earlier comes in handy!
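
A note for readers on newer matplotlib releases: the axes.color_cycle setting was deprecated in matplotlib 1.5 in favor of axes.prop_cycle, and was later removed. Below is a minimal sketch of the equivalent change. It assumes a matplotlib version that ships with the cycler package, and it writes the seven 'Set2' colors out as hex strings so that brewer2mpl is not required. The Agg backend is selected only so the sketch runs without a display; inside a notebook you would leave that line out.

```python
import matplotlib as mpl
mpl.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# The seven 'Set2' colors as hex strings
set2 = ['#66c2a5', '#fc8d62', '#8da0cb', '#e78ac3',
        '#a6d854', '#ffd92f', '#e5c494']

# Newer replacement for mpl.rcParams['axes.color_cycle'] = colors
mpl.rcParams['axes.prop_cycle'] = cycler(color=set2)

np.random.seed(12)
fig, ax = plt.subplots()
for i in range(7):
    ax.plot(np.random.randn(1000).cumsum())

# Each line picks up the next color in the cycle
line_colors = [line.get_color() for line in ax.get_lines()]
```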

In [34]:
# Set the random seed for consistency
np.random.seed(12)

# Change the default colors
mpl.rcParams['axes.color_cycle'] = colors

# I happen to know that there are 7 default colors in matplotlib
for i in range(7):
    plt.plot(np.random.randn(1000).cumsum())

Now that looks much better! Here is a cheat sheet of the ColorBrewer colors (from the cbrewer page on Mathworks website)

As for scatterplots, I prefer to show them with a very thin, grey line around each circle. So instead of no outlines, like this:

In [60]:
# Set the random seed for consistency
np.random.seed(12)

# Get the 'Set2' colors (the default color cycle is left untouched)
colors = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colors

# One cloud of points per color, with no distinct outline
for color in colors:
    plt.scatter(np.random.randn(1000), np.random.randn(1000),
                color=color)

Or an overpowering black outline that speaks louder than the plot itself,

In [58]:
# Set the random seed for consistency
np.random.seed(12)

# Get the 'Set2' colors (the default color cycle is left untouched)
colors = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colors

# One cloud of points per color, with a black outline
for color in colors:
    plt.scatter(np.random.randn(1000), np.random.randn(1000),
                color=color, edgecolors='k')

A light grey, thin outline balances both visibility and aesthetics.

In [56]:
# Set the random seed for consistency
np.random.seed(12)

# Get the 'Set2' colors (the default color cycle is left untouched)
colors = brewer2mpl.get_map('Set2', 'qualitative', 7).mpl_colors

# One cloud of points per color, with a thin grey outline
for color in colors:
    plt.scatter(np.random.randn(1000), np.random.randn(1000),
                color=color, edgecolors='grey', linewidths=0.1)

Now, to make 'Set2' our default colors, we must change our matplotlibrc file.

Let's check where ours is.

In [11]:
# For some reason, this doesn't work with mpl
import matplotlib
matplotlib.matplotlib_fname()
Out[11]:
'/Library/Frameworks/EPD64.framework/Versions/7.3/lib/python2.7/site-packages/matplotlib/mpl-data/matplotlibrc'

According to the matplotlib customization documentation, matplotlibrc files are looked for in this order:

  1. matplotlibrc in the current working directory, usually used for specific customizations that you do not want to apply elsewhere.
  2. .matplotlib/matplotlibrc, for the user’s default customizations. See .matplotlib directory location.
  3. INSTALL/matplotlib/mpl-data/matplotlibrc, where INSTALL is something like /usr/lib/python2.5/site-packages on Linux, and maybe C:\Python25\Lib\site-packages on Windows. Every time you install matplotlib, this file will be overwritten, so if you want your customizations to be saved, please move this file to your .matplotlib directory.
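
The search order above can be sketched as a short check of which of the first two candidate files exist on your machine. The function name candidate_rc_files is hypothetical, and the paths simply mirror the list above; other matplotlib versions may look in different locations, so treat the result of matplotlib.matplotlib_fname() as the authoritative answer.

```python
import os

def candidate_rc_files():
    """Return the first two matplotlibrc locations from the list above,
    in the order they are searched (hypothetical helper)."""
    return [
        os.path.join(os.getcwd(), 'matplotlibrc'),
        os.path.join(os.path.expanduser('~'), '.matplotlib', 'matplotlibrc'),
    ]

# Report which candidates are present
for path in candidate_rc_files():
    status = 'found' if os.path.isfile(path) else 'missing'
    print('%-8s %s' % (status, path))
```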

So that we can distinguish our custom matplotlibrc file, we'll make the ~/.matplotlib directory and the matplotlibrc file within it. If you haven't created this directory and file already, you will need to create them.

We will use the sample matplotlibrc file that is available from the matplotlib website.

In [15]:
%%bash
mkdir ~/.matplotlib
cd ~/.matplotlib
wget http://matplotlib.org/_static/matplotlibrc 
cat ~/.matplotlib/matplotlibrc
### MATPLOTLIBRC FORMAT

# This is a sample matplotlib configuration file - you can find a copy
# of it on your system in
# site-packages/matplotlib/mpl-data/matplotlibrc.  If you edit it
# there, please note that it will be overwritten in your next install.
# If you want to keep a permanent local copy that will not be
# overwritten, place it in HOME/.matplotlib/matplotlibrc (unix/linux
# like systems) and C:\Documents and Settings\yourname\.matplotlib
# (win32 systems).
#
# This file is best viewed in a editor which supports python mode
# syntax highlighting. Blank lines, or lines starting with a comment
# symbol, are ignored, as are trailing comments.  Other lines must
# have the format
#    key : val # optional comment
#
# Colors: for the color values below, you can either use - a
# matplotlib color string, such as r, k, or b - an rgb tuple, such as
# (1.0, 0.5, 0.0) - a hex string, such as ff00ff or #ff00ff - a scalar
# grayscale intensity such as 0.75 - a legal html color name, eg red,
# blue, darkslategray

#### CONFIGURATION BEGINS HERE

# the default backend; one of GTK GTKAgg GTKCairo GTK3Agg GTK3Cairo
# CocoaAgg FltkAgg MacOSX QtAgg Qt4Agg TkAgg WX WXAgg Agg Cairo GDK PS
# PDF SVG Template
# You can also deploy your own backend outside of matplotlib by
# referring to the module name (which must be in the PYTHONPATH) as
# 'module://my_backend'
backend      : GTKAgg

# If you are using the Qt4Agg backend, you can choose here
# to use the PyQt4 bindings or the newer PySide bindings to
# the underlying Qt4 toolkit.
#backend.qt4 : PyQt4        # PyQt4 | PySide

# Note that this can be overridden by the environment variable
# QT_API used by Enthought Tool Suite (ETS); valid values are
# "pyqt" and "pyside".  The "pyqt" setting has the side effect of
# forcing the use of Version 2 API for QString and QVariant.

# if you are running pyplot inside a GUI and your backend choice
# conflicts, we will automatically try to find a compatible one for
# you if backend_fallback is True
#backend_fallback: True

#interactive  : False
#toolbar      : toolbar2   # None | toolbar2  ("classic" is deprecated)
#timezone     : UTC        # a pytz timezone string, eg US/Central or Europe/Paris

# Where your matplotlib data lives if you installed to a non-default
# location.  This is where the matplotlib fonts, bitmaps, etc reside
#datapath : /home/jdhunter/mpldata


### LINES
# See http://matplotlib.org/api/artist_api.html#module-matplotlib.lines for more
# information on line properties.
#lines.linewidth   : 1.0     # line width in points
#lines.linestyle   : -       # solid line
#lines.color       : blue    # has no affect on plot(); see axes.color_cycle
#lines.marker      : None    # the default marker
#lines.markeredgewidth  : 0.5     # the line width around the marker symbol
#lines.markersize  : 6            # markersize, in points
#lines.dash_joinstyle : miter        # miter|round|bevel
#lines.dash_capstyle : butt          # butt|round|projecting
#lines.solid_joinstyle : miter       # miter|round|bevel
#lines.solid_capstyle : projecting   # butt|round|projecting
#lines.antialiased : True         # render lines in antialised (no jaggies)

### PATCHES
# Patches are graphical objects that fill 2D space, like polygons or
# circles.  See
# http://matplotlib.org/api/artist_api.html#module-matplotlib.patches
# information on patch properties
#patch.linewidth        : 1.0     # edge width in points
#patch.facecolor        : blue
#patch.edgecolor        : black
#patch.antialiased      : True    # render patches in antialised (no jaggies)

### FONT
#
# font properties used by text.Text.  See
# http://matplotlib.org/api/font_manager_api.html for more
# information on font properties.  The 6 font properties used for font
# matching are given below with their default values.
#
# The font.family property has five values: 'serif' (e.g. Times),
# 'sans-serif' (e.g. Helvetica), 'cursive' (e.g. Zapf-Chancery),
# 'fantasy' (e.g. Western), and 'monospace' (e.g. Courier).  Each of
# these font families has a default list of font names in decreasing
# order of priority associated with them.
#
# The font.style property has three values: normal (or roman), italic
# or oblique.  The oblique style will be used for italic, if it is not
# present.
#
# The font.variant property has two values: normal or small-caps.  For
# TrueType fonts, which are scalable fonts, small-caps is equivalent
# to using a font size of 'smaller', or about 83% of the current font
# size.
#
# The font.weight property has effectively 13 values: normal, bold,
# bolder, lighter, 100, 200, 300, ..., 900.  Normal is the same as
# 400, and bold is 700.  bolder and lighter are relative values with
# respect to the current weight.
#
# The font.stretch property has 11 values: ultra-condensed,
# extra-condensed, condensed, semi-condensed, normal, semi-expanded,
# expanded, extra-expanded, ultra-expanded, wider, and narrower.  This
# property is not currently implemented.
#
# The font.size property is the default font size for text, given in pts.
# 12pt is the standard value.
#
#font.family         : sans-serif
#font.style          : normal
#font.variant        : normal
#font.weight         : medium
#font.stretch        : normal
# note that font.size controls default text sizes.  To configure
# special text sizes tick labels, axes, labels, title, etc, see the rc
# settings for axes and ticks. Special text sizes can be defined
# relative to font.size, using the following values: xx-small, x-small,
# small, medium, large, x-large, xx-large, larger, or smaller
#font.size           : 12.0
#font.serif          : Bitstream Vera Serif, New Century Schoolbook, Century Schoolbook L, Utopia, ITC Bookman, Bookman, Nimbus Roman No9 L, Times New Roman, Times, Palatino, Charter, serif
#font.sans-serif     : Bitstream Vera Sans, Lucida Grande, Verdana, Geneva, Lucid, Arial, Helvetica, Avant Garde, sans-serif
#font.cursive        : Apple Chancery, Textile, Zapf Chancery, Sand, cursive
#font.fantasy        : Comic Sans MS, Chicago, Charcoal, Impact, Western, fantasy
#font.monospace      : Bitstream Vera Sans Mono, Andale Mono, Nimbus Mono L, Courier New, Courier, Fixed, Terminal, monospace

### TEXT
# text properties used by text.Text.  See
# http://matplotlib.org/api/artist_api.html#module-matplotlib.text for more
# information on text properties

#text.color          : black

### LaTeX customizations. See http://www.scipy.org/Wiki/Cookbook/Matplotlib/UsingTex
#text.usetex         : False  # use latex for all text handling. The following fonts
                              # are supported through the usual rc parameter settings:
                              # new century schoolbook, bookman, times, palatino,
                              # zapf chancery, charter, serif, sans-serif, helvetica,
                              # avant garde, courier, monospace, computer modern roman,
                              # computer modern sans serif, computer modern typewriter
                              # If another font is desired which can loaded using the
                              # LaTeX \usepackage command, please inquire at the
                              # matplotlib mailing list
#text.latex.unicode : False # use "ucs" and "inputenc" LaTeX packages for handling
                            # unicode strings.
#text.latex.preamble :  # IMPROPER USE OF THIS FEATURE WILL LEAD TO LATEX FAILURES
                            # AND IS THEREFORE UNSUPPORTED. PLEASE DO NOT ASK FOR HELP
                            # IF THIS FEATURE DOES NOT DO WHAT YOU EXPECT IT TO.
                            # preamble is a comma separated list of LaTeX statements
                            # that are included in the LaTeX document preamble.
                            # An example:
                            # text.latex.preamble : \usepackage{bm},\usepackage{euler}
                            # The following packages are always loaded with usetex, so
                            # beware of package collisions: color, geometry, graphicx,
                            # type1cm, textcomp. Adobe Postscript (PSSNFS) font packages
                            # may also be loaded, depending on your font settings

#text.dvipnghack : None      # some versions of dvipng don't handle alpha
                             # channel properly.  Use True to correct
                             # and flush ~/.matplotlib/tex.cache
                             # before testing and False to force
                             # correction off.  None will try and
                             # guess based on your dvipng version

#text.hinting : 'auto' # May be one of the following:
                       #   'none': Perform no hinting
                       #   'auto': Use freetype's autohinter
                       #   'native': Use the hinting information in the
                       #             font file, if available, and if your
                       #             freetype library supports it
                       #   'either': Use the native hinting information,
                       #             or the autohinter if none is available.
                       # For backward compatibility, this value may also be
                       # True === 'auto' or False === 'none'.
text.hinting_factor : 8 # Specifies the amount of softness for hinting in the
                         # horizontal direction.  A value of 1 will hint to full
                         # pixels.  A value of 2 will hint to half pixels etc.

#text.antialiased : True # If True (default), the text will be antialiased.
                         # This only affects the Agg backend.

# The following settings allow you to select the fonts in math mode.
# They map from a TeX font name to a fontconfig font pattern.
# These settings are only used if mathtext.fontset is 'custom'.
# Note that this "custom" mode is unsupported and may go away in the
# future.
#mathtext.cal : cursive
#mathtext.rm  : serif
#mathtext.tt  : monospace
#mathtext.it  : serif:italic
#mathtext.bf  : serif:bold
#mathtext.sf  : sans
#mathtext.fontset : cm # Should be 'cm' (Computer Modern), 'stix',
                       # 'stixsans' or 'custom'
#mathtext.fallback_to_cm : True  # When True, use symbols from the Computer Modern
                                 # fonts when a symbol can not be found in one of
                                 # the custom math fonts.

#mathtext.default : it # The default font to use for math.
                       # Can be any of the LaTeX font names, including
                       # the special name "regular" for the same font
                       # used in regular text.

### AXES
# default face and edge color, default tick sizes,
# default fontsizes for ticklabels, and so on.  See
# http://matplotlib.org/api/axes_api.html#module-matplotlib.axes
#axes.hold           : True    # whether to clear the axes by default on
#axes.facecolor      : white   # axes background color
#axes.edgecolor      : black   # axes edge color
#axes.linewidth      : 1.0     # edge linewidth
#axes.grid           : False   # display grid or not
#axes.titlesize      : large   # fontsize of the axes title
#axes.labelsize      : medium  # fontsize of the x any y labels
#axes.labelweight    : normal  # weight of the x and y labels
#axes.labelcolor     : black
#axes.axisbelow      : False   # whether axis gridlines and ticks are below
                              # the axes elements (lines, text, etc)
#axes.formatter.limits : -7, 7 # use scientific notation if log10
                               # of the axis range is smaller than the
                               # first or larger than the second
#axes.formatter.use_locale : False # When True, format tick labels
                                   # according to the user's locale.
                                   # For example, use ',' as a decimal
                                   # separator in the fr_FR locale.
#axes.formatter.use_mathtext : False # When True, use mathtext for scientific
                                     # notation.
#axes.unicode_minus  : True    # use unicode for the minus symbol
                               # rather than hyphen.  See
                               # http://en.wikipedia.org/wiki/Plus_and_minus_signs#Character_codes
#axes.color_cycle    : b, g, r, c, m, y, k  # color cycle for plot lines
                                            # as list of string colorspecs:
                                            # single letter, long name, or
                                            # web-style hex

#polaraxes.grid      : True    # display grid on polar axes
#axes3d.grid         : True    # display grid on 3d axes

### TICKS
# see http://matplotlib.org/api/axis_api.html#matplotlib.axis.Tick
#xtick.major.size     : 4      # major tick size in points
#xtick.minor.size     : 2      # minor tick size in points
#xtick.major.width    : 0.5    # major tick width in points
#xtick.minor.width    : 0.5    # minor tick width in points
#xtick.major.pad      : 4      # distance to major tick label in points
#xtick.minor.pad      : 4      # distance to the minor tick label in points
#xtick.color          : k      # color of the tick labels
#xtick.labelsize      : medium # fontsize of the tick labels
#xtick.direction      : in     # direction: in, out, or inout

#ytick.major.size     : 4      # major tick size in points
#ytick.minor.size     : 2      # minor tick size in points
#ytick.major.width    : 0.5    # major tick width in points
#ytick.minor.width    : 0.5    # minor tick width in points
#ytick.major.pad      : 4      # distance to major tick label in points
#ytick.minor.pad      : 4      # distance to the minor tick label in points
#ytick.color          : k      # color of the tick labels
#ytick.labelsize      : medium # fontsize of the tick labels
#ytick.direction      : in     # direction: in, out, or inout


### GRIDS
#grid.color       :   black   # grid color
#grid.linestyle   :   :       # dotted
#grid.linewidth   :   0.5     # in points
#grid.alpha       :   1.0     # transparency, between 0.0 and 1.0

### Legend
#legend.fancybox      : False  # if True, use a rounded box for the
                               # legend, else a rectangle
#legend.isaxes        : True
#legend.numpoints     : 2      # the number of points in the legend line
#legend.fontsize      : large
#legend.pad           : 0.0    # deprecated; the fractional whitespace inside the legend border
#legend.borderpad     : 0.5    # border whitespace in fontsize units
#legend.markerscale   : 1.0    # the relative size of legend markers vs. original
# the following dimensions are in axes coords
#legend.labelsep      : 0.010  # deprecated; the vertical space between the legend entries
#legend.labelspacing  : 0.5    # the vertical space between the legend entries in fraction of fontsize
#legend.handlelen     : 0.05   # deprecated; the length of the legend lines
#legend.handlelength  : 2.     # the length of the legend lines in fraction of fontsize
#legend.handleheight  : 0.7     # the height of the legend handle in fraction of fontsize
#legend.handletextsep : 0.02   # deprecated; the space between the legend line and legend text
#legend.handletextpad : 0.8    # the space between the legend line and legend text in fraction of fontsize
#legend.axespad       : 0.02   # deprecated; the border between the axes and legend edge
#legend.borderaxespad : 0.5   # the border between the axes and legend edge in fraction of fontsize
#legend.columnspacing : 2.    # the border between the axes and legend edge in fraction of fontsize
#legend.shadow        : False
#legend.frameon       : True   # whether or not to draw a frame around legend

### FIGURE
# See http://matplotlib.org/api/figure_api.html#matplotlib.figure.Figure
#figure.figsize   : 8, 6    # figure size in inches
#figure.dpi       : 80      # figure dots per inch
#figure.facecolor : 0.75    # figure facecolor; 0.75 is scalar gray
#figure.edgecolor : white   # figure edgecolor
#figure.autolayout : False  # When True, automatically adjust subplot
                            # parameters to make the plot fit the figure

# The figure subplot parameters.  All dimensions are a fraction of the
# figure width or height
#figure.subplot.left    : 0.125  # the left side of the subplots of the figure
#figure.subplot.right   : 0.9    # the right side of the subplots of the figure
#figure.subplot.bottom  : 0.1    # the bottom of the subplots of the figure
#figure.subplot.top     : 0.9    # the top of the subplots of the figure
#figure.subplot.wspace  : 0.2    # the amount of width reserved for blank space between subplots
#figure.subplot.hspace  : 0.2    # the amount of height reserved for white space between subplots

### IMAGES
#image.aspect : equal             # equal | auto | a number
#image.interpolation  : bilinear  # see help(imshow) for options
#image.cmap   : jet               # gray | jet etc...
#image.lut    : 256               # the size of the colormap lookup table
#image.origin : upper             # lower | upper
#image.resample  : False

### CONTOUR PLOTS
#contour.negative_linestyle :  dashed # dashed | solid

### Agg rendering
### Warning: experimental, 2008/10/10
#agg.path.chunksize : 0           # 0 to disable; values in the range
                                  # 10000 to 100000 can improve speed slightly
                                  # and prevent an Agg rendering failure
                                  # when plotting very large data sets,
                                  # especially if they are very gappy.
                                  # It may cause minor artifacts, though.
                                  # A value of 20000 is probably a good
                                  # starting point.
### SAVING FIGURES
#path.simplify : True   # When True, simplify paths by removing "invisible"
                        # points to reduce file size and increase rendering
                        # speed
#path.simplify_threshold : 0.1  # The threshold of similarity below which
                                # vertices will be removed in the simplification
                                # process
#path.snap : True # When True, rectilinear axis-aligned paths will be snapped to
                  # the nearest pixel when certain criteria are met.  When False,
                  # paths will never be snapped.

# the default savefig params can be different from the display params
# Eg, you may want a higher resolution, or to make the figure
# background white
#savefig.dpi        : 100      # figure dots per inch
#savefig.facecolor  : white    # figure facecolor when saving
#savefig.edgecolor  : white    # figure edgecolor when saving
#savefig.format     : png      # png, ps, pdf, svg
#savefig.bbox       : standard # 'tight' or 'standard'.
#savefig.pad_inches : 0.1      # Padding to be used when bbox is set to 'tight'

# tk backend params
#tk.window_focus   : False    # Maintain shell focus for TkAgg

# ps backend params
#ps.papersize      : letter   # auto, letter, legal, ledger, A0-A10, B0-B10
#ps.useafm         : False    # use of afm fonts, results in small files
#ps.usedistiller   : False    # can be: None, ghostscript or xpdf
                                          # Experimental: may produce smaller files.
                                          # xpdf intended for production of publication quality files,
                                          # but requires ghostscript, xpdf and ps2eps
#ps.distiller.res  : 6000      # dpi
#ps.fonttype       : 3         # Output Type 3 (Type3) or Type 42 (TrueType)

# pdf backend params
#pdf.compression   : 6 # integer from 0 to 9
                       # 0 disables compression (good for debugging)
#pdf.fonttype       : 3         # Output Type 3 (Type3) or Type 42 (TrueType)

# svg backend params
#svg.image_inline : True       # write raster image data directly into the svg file
#svg.image_noscale : False     # suppress scaling of raster data embedded in SVG
#svg.fonttype : 'path'         # How to handle SVG fonts:
#    'none': Assume fonts are installed on the machine where the SVG will be viewed.
#    'path': Embed characters as paths -- supported by most SVG renderers
#    'svgfont': Embed characters as SVG fonts -- supported only by Chrome,
#               Opera and Safari

# docstring params
#docstring.hardcopy = False  # set this when you want to generate hardcopy docstring

# Set the verbose flags.  This controls how much information
# matplotlib gives you at runtime and where it goes.  The verbosity
# levels are: silent, helpful, debug, debug-annoying.  Any level is
# inclusive of all the levels below it.  If your setting is "debug",
# you'll get all the debug and helpful messages.  When submitting
# problems to the mailing-list, please set verbose to "helpful" or "debug"
# and paste the output into your report.
#
# The "fileo" gives the destination for any calls to verbose.report.
# These objects can a filename, or a filehandle like sys.stdout.
#
# You can override the rc default verbosity from the command line by
# giving the flags --verbose-LEVEL where LEVEL is one of the legal
# levels, eg --verbose-helpful.
#
# You can access the verbose instance in your code
#   from matplotlib import verbose.
#verbose.level  : silent      # one of silent, helpful, debug, debug-annoying
#verbose.fileo  : sys.stdout  # a log filename, sys.stdout or sys.stderr

# Event keys to interact with figures/plots via keyboard.
# Customize these settings according to your needs.
# Leave the field(s) empty if you don't need a key-map. (i.e., fullscreen : '')

#keymap.fullscreen : f               # toggling
#keymap.home : h, r, home            # home or reset mnemonic
#keymap.back : left, c, backspace    # forward / backward keys to enable
#keymap.forward : right, v           #   left handed quick navigation
#keymap.pan : p                      # pan mnemonic
#keymap.zoom : o                     # zoom mnemonic
#keymap.save : s                     # saving current figure
#keymap.quit : ctrl+w                # close the current figure
#keymap.grid : g                     # switching on/off a grid in current axes
#keymap.yscale : l                   # toggle scaling of y-axes ('log'/'linear')
#keymap.xscale : L, k                # toggle scaling of x-axes ('log'/'linear')
#keymap.all_axes : a                 # enable all axes

# Control location of examples data files
#examples.directory : ''   # directory to look in for custom installation

###ANIMATION settings
#animation.writer : ffmpeg         # MovieWriter 'backend' to use
#animation.codec : mp4             # Codec to use for writing movie
#animation.bitrate: -1             # Controls size/quality tradeoff for movie.
                                   # -1 implies let utility auto-determine
#animation.frame_format: 'png'     # Controls frame format used by temp files
#animation.ffmpeg_path: 'ffmpeg'   # Path to ffmpeg binary. Without full path
                                   # $PATH is searched
#animation.ffmpeg_args: ''         # Additional arugments to pass to mencoder
#animation.mencoder_path: 'ffmpeg' # Path to mencoder binary. Without full path
                                   # $PATH is searched
#animation.mencoder_args: ''       # Additional arugments to pass to mencoder
mkdir: /Users/olga/.matplotlib: File exists
--2013-04-09 22:59:48--  http://matplotlib.org/_static/matplotlibrc
Resolving matplotlib.org... 204.232.175.78
Connecting to matplotlib.org|204.232.175.78|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 23428 (23K) [application/octet-stream]
Saving to: ‘matplotlibrc’

     0K .......... .......... ..                              100% 67.6K=0.3s

2013-04-09 22:59:49 (67.6 KB/s) - ‘matplotlibrc’ saved [23428/23428]

You'll need to edit the ~/.matplotlib/matplotlibrc file in a text editor on your own machine to change the colors. However, we can't paste in the vector of RGB tuples we created earlier, because the file expects hex color strings. We can use the mpl.colors.rgb2hex function to convert each 3-tuple to a hex string.

In [13]:
for color in colors:
    print(mpl.colors.rgb2hex(color))
#66c2a5
#fc8d62
#8da0cb
#e78ac3
#a6d854
#ffd92f
#e5c494
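
For intuition about what that conversion does, here is a minimal pure-Python sketch. The helper name rgb_to_hex is hypothetical and this is not matplotlib's implementation; it assumes each channel is a float between 0 and 1.

```python
def rgb_to_hex(rgb):
    """Convert an (r, g, b) tuple of floats in [0, 1] to a '#rrggbb' string."""
    # Scale each channel to 0-255, round to the nearest integer,
    # and format it as two lowercase hexadecimal digits.
    return '#' + ''.join('%02x' % int(round(channel * 255)) for channel in rgb)


# The first 'Set2' color, written as 8-bit channel values divided by 255
print(rgb_to_hex((102 / 255.0, 194 / 255.0, 165 / 255.0)))
```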

Before I edit the file, let's see what the file looks like on the line we're going to edit, where it says axes.color_cycle,

In [16]:
! cat ~/.matplotlib/matplotlibrc | grep axes.color_cycle
#lines.color       : blue    # has no affect on plot(); see axes.color_cycle
#axes.color_cycle    : b, g, r, c, m, y, k  # color cycle for plot lines

I edited the ~/.matplotlib/matplotlibrc file separately in a text editor.

In [17]:
! cat ~/.matplotlib/matplotlibrc | grep axes.color_cycle
#lines.color       : blue    # has no affect on plot(); see axes.color_cycle
axes.color_cycle    : 66c2a5, fc8d62, 8da0cb, e78ac3, a6d854, ffd92f, e5c494  # color cycle for plot lines

Now, in future sessions (after we restart Python and reload matplotlib), resetting the color cycle to the defaults should give us the 'Set2' colorbrewer colors. For now, we'll rely on the change we already made to mpl.rcParams, which keeps the colors the way they are.
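
Newer matplotlib releases replaced the axes.color_cycle setting with axes.prop_cycle, so the matplotlibrc line above may not be recognized there. A minimal sketch of the runtime equivalent, assuming a matplotlib version that ships the cycler package (the list name set2_hex is my own):

```python
import matplotlib as mpl
from cycler import cycler

# The seven 'Set2' hex strings printed above
set2_hex = ['#66c2a5', '#fc8d62', '#8da0cb', '#e78ac3',
            '#a6d854', '#ffd92f', '#e5c494']

# Applies to every axes created after this line, for this session only
mpl.rcParams['axes.prop_cycle'] = cycler(color=set2_hex)
```

In those versions the matching matplotlibrc entry takes the form axes.prop_cycle : cycler('color', [...]).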

Heatmaps

Let's use the same principles as before to improve this heatmap:

In [5]:
from matplotlib.colors import LogNorm
from pylab import *

#normal distribution center at x=0 and y=5
x = randn(100000)
y = randn(100000)+5

hist2d(x, y, bins=40, norm=LogNorm())
colorbar()
show()

What's so bad about this? Well, it's using a rainbow of colors to indicate a single scale - increasing from zero. Let's use one of the sequential colorbrewer palettes to improve this. I like green, so let's use that. We will tell brewer2mpl to give us a matplotlib-compatible colormap with the attribute .mpl_colormap, with the full call being,

brewer2mpl.get_map('Greens', 'sequential', 8).mpl_colormap
In [15]:
from matplotlib.colors import LogNorm
from pylab import *

#normal distribution center at x=0 and y=5
x = randn(100000)
y = randn(100000)+5

hist2d(x, y, bins=40, norm=LogNorm(), 
    cmap=brewer2mpl.get_map('Greens', 'sequential', 8).mpl_colormap)
colorbar()
show()

This is much easier to interpret, since we only have to distinguish an increase in saturation of the hue green, rather than be forced to think about multiple different hues and how their colors represent an increase in value.
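
If brewer2mpl is not installed, matplotlib itself ships the ColorBrewer sequential maps under names such as 'Greens'. A sketch of the same plot under that assumption; the non-interactive Agg backend and the fixed seed are my own choices, not part of the original cell:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

np.random.seed(12)
n_points = 100000
# Normal distribution centered at x=0 and y=5
x = np.random.randn(n_points)
y = np.random.randn(n_points) + 5

fig, ax = plt.subplots()
# cmap='Greens' selects matplotlib's built-in copy of the ColorBrewer map;
# norm=LogNorm() puts the counts on a logarithmic color scale
counts, xedges, yedges, image = ax.hist2d(x, y, bins=40, norm=LogNorm(),
                                          cmap='Greens')
fig.colorbar(image, ax=ax)
```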

Though if you just have increases from 0 to larger numbers, it may be even simpler (and better) to just use grey. Maybe not as pretty, but very easy to interpret.

In [16]:
from matplotlib.colors import LogNorm
from pylab import *

#normal distribution center at x=0 and y=5
x = randn(100000)
y = randn(100000)+5

# norm=LogNorm() tells the function to use a logscale for the z-values
hist2d(x, y, bins=40, norm=LogNorm(), 
    cmap=brewer2mpl.get_map('Greys', 'sequential', 8).mpl_colormap)
colorbar()
show()

But what if your data has positive and negative values? Then you want to use a divergent color map. I like blue-red (RdBu in reverse with these colormaps) because it has the natural interpretation of blue=cold, negative, and red=hot, positive.
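
The key detail with a diverging map is to make the color limits symmetric, so that zero lands on the neutral midpoint. A minimal sketch, assuming matplotlib's built-in 'RdBu_r' colormap (RdBu reversed) rather than brewer2mpl, and evaluating a test function directly on a regular grid; the Agg backend and variable names are my own choices:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# A function with both positive and negative values, on a regular grid
xi = np.linspace(-2, 2, 100)
yi = np.linspace(-2, 2, 200)
X, Y = np.meshgrid(xi, yi)
Z = X * np.exp(-X**2 - Y**2)

# Symmetric limits: -limit maps to dark blue, 0 to white, +limit to dark red
limit = np.abs(Z).max()
fig, ax = plt.subplots()
filled = ax.contourf(X, Y, Z, 15, cmap='RdBu_r', vmin=-limit, vmax=limit)
fig.colorbar(filled, ax=ax)
```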

The example below is from griddata_demo.py in the matplotlib gallery.

In [17]:
from numpy.random import uniform, seed
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy as np
# make up data.
#npts = int(raw_input('enter # of random points to plot:'))
seed(0)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,200)
# grid the data.
zi = griddata(x,y,z,xi,yi,interp='linear')
# contour the gridded data, plotting dots at the nonuniform data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')
CS = plt.contourf(xi,yi,zi,15,cmap=plt.cm.rainbow,
                  vmax=abs(zi).max(), vmin=-abs(zi).max())
plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5,zorder=10)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
Out[17]:
<matplotlib.text.Text at 0x10f546c50>

We'll improve on this example with a more natural, divergent colormap.

In [21]:
from numpy.random import uniform, seed
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy as np
# make up data.
#npts = int(raw_input('enter # of random points to plot:'))
seed(0)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,200)
# grid the data.
zi = griddata(x,y,z,xi,yi,interp='linear')
# contour the gridded data, plotting dots at the nonuniform data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')

# ---- This is the line we changed ---- #
CS = plt.contourf(xi,yi,zi,15,
    cmap=brewer2mpl.get_map('RdBu', 'diverging', 8, reverse=True).mpl_colormap,
                  vmax=abs(zi).max(), vmin=-abs(zi).max())

plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5,zorder=10)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
Out[21]:
<matplotlib.text.Text at 0x10ee3c6d0>

We can do other colormaps just for fun, too. What does purple and green look like?

In [23]:
from numpy.random import uniform, seed
from matplotlib.mlab import griddata
import matplotlib.pyplot as plt
import numpy as np
# make up data.
#npts = int(raw_input('enter # of random points to plot:'))
seed(0)
npts = 200
x = uniform(-2,2,npts)
y = uniform(-2,2,npts)
z = x*np.exp(-x**2-y**2)
# define grid.
xi = np.linspace(-2.1,2.1,100)
yi = np.linspace(-2.1,2.1,200)
# grid the data.
zi = griddata(x,y,z,xi,yi,interp='linear')
# contour the gridded data, plotting dots at the nonuniform data points.
CS = plt.contour(xi,yi,zi,15,linewidths=0.5,colors='k')

# ---- This is the line we changed ---- #
CS = plt.contourf(xi,yi,zi,15,
    cmap=brewer2mpl.get_map('PRGn', 'diverging', 8, reverse=True).mpl_colormap,
                  vmax=abs(zi).max(), vmin=-abs(zi).max())

plt.colorbar() # draw colorbar
# plot data points.
plt.scatter(x,y,marker='o',c='b',s=5,zorder=10)
plt.xlim(-2,2)
plt.ylim(-2,2)
plt.title('griddata test (%d points)' % npts)
Out[23]:
<matplotlib.text.Text at 0x10f9eedd0>

Default fonts

The default font shipped with matplotlib is Bitstream Vera Sans, and it's not that pretty. I much prefer Helvetica, and I wrote a tutorial on how to set Helvetica as the default sans-serif font in matplotlib. It was originally written for Mac OS X users, but the concepts can be used on any system. The basic idea is that you need to either obtain a set of Helvetica*.ttf files, or extract them from Mac OS X's Helvetica.dfont file. Unfortunately, it's fairly involved, and I will leave the reader to follow the link and use the tutorial.

Here are the before and after plots. Before:

Before setting Helvetica as the default font

After:

After setting Helvetica as the default font

Much nicer! Unfortunately, I performed this change on my old computer and didn't have time to change the defaults on this one, so we will have to suffer through Bitstream Vera Sans together.
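
Once the Helvetica .ttf files are somewhere matplotlib can find them, selecting the font comes down to two rcParams entries. A minimal sketch; the fallback names after 'Helvetica' are my own choice, so that text still renders on machines where Helvetica is missing:

```python
import matplotlib as mpl

# Ask for the generic sans-serif family, then list which fonts to try, in order
mpl.rcParams['font.family'] = 'sans-serif'
mpl.rcParams['font.sans-serif'] = ['Helvetica', 'DejaVu Sans',
                                   'Bitstream Vera Sans']
```

The same two settings can be made permanent as the font.family and font.sans-serif lines of ~/.matplotlib/matplotlibrc.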

Removing 'chartjunk'

'Chartjunk' is a term coined by Edward Tufte to describe any uninformative aspects of a graph. You can also think about the 'data-ink ratio' with the question, How is this patch of ink contributing to the interpretation of these data?
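
Tufte states the data-ink ratio as a simple fraction; in LaTeX notation:

```latex
\text{data-ink ratio} = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
```

Equivalently, it is 1 minus the proportion of the graphic that can be erased without losing information, which is the approach taken in the cells that follow.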

For example, this bar graph has an extraordinarily low 'data-ink ratio', and this unfortunate example is also from the matplotlib gallery.

In [61]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

class RibbonBox(object):

    original_image = read_png(get_sample_data("Minduka_Present_Blue_Pack.png",
                                              asfileobj=False))
    cut_location = 70
    b_and_h = original_image[:,:,2]
    color = original_image[:,:,2] - original_image[:,:,0]
    alpha = original_image[:,:,3]
    nx = original_image.shape[1]

    def __init__(self, color):
        rgb = matplotlib.colors.colorConverter.to_rgb(color)

        im = np.empty(self.original_image.shape,
                      self.original_image.dtype)


        im[:,:,:3] = self.b_and_h[:,:,np.newaxis]
        im[:,:,:3] -= self.color[:,:,np.newaxis]*(1.-np.array(rgb))
        im[:,:,3] = self.alpha

        self.im = im


    def get_stretched_image(self, stretch_factor):
        stretch_factor = max(stretch_factor, 1)
        ny, nx, nch = self.im.shape
        ny2 = int(ny*stretch_factor)

        stretched_image = np.empty((ny2, nx, nch),
                                   self.im.dtype)
        cut = self.im[self.cut_location,:,:]
        stretched_image[:,:,:] = cut
        stretched_image[:self.cut_location,:,:] = \
                self.im[:self.cut_location,:,:]
        stretched_image[-(ny-self.cut_location):,:,:] = \
                self.im[-(ny-self.cut_location):,:,:]

        self._cached_im = stretched_image
        return stretched_image



class RibbonBoxImage(BboxImage):
    zorder = 1

    def __init__(self, bbox, color,
                 cmap = None,
                 norm = None,
                 interpolation=None,
                 origin=None,
                 filternorm=1,
                 filterrad=4.0,
                 resample = False,
                 **kwargs
                 ):

        BboxImage.__init__(self, bbox,
                           cmap = cmap,
                           norm = norm,
                           interpolation=interpolation,
                           origin=origin,
                           filternorm=filternorm,
                           filterrad=filterrad,
                           resample = resample,
                           **kwargs
                           )

        self._ribbonbox = RibbonBox(color)
        self._cached_ny = None


    def draw(self, renderer, *args, **kwargs):

        bbox = self.get_window_extent(renderer)
        stretch_factor = bbox.height / bbox.width

        ny = int(stretch_factor*self._ribbonbox.nx)
        if self._cached_ny != ny:
            arr = self._ribbonbox.get_stretched_image(stretch_factor)
            self.set_array(arr)
            self._cached_ny = ny

        BboxImage.draw(self, renderer, *args, **kwargs)


if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    box_colors = [(0.8, 0.2, 0.2),
                  (0.2, 0.8, 0.2),
                  (0.2, 0.2, 0.8),
                  (0.7, 0.5, 0.8),
                  (0.3, 0.8, 0.7),
                  ]
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        bbox0 = Bbox.from_extents(year-0.4, 0., year+0.4, h)
        bbox = TransformedBbox(bbox0, ax.transData)
        rb_patch = RibbonBoxImage(bbox, bc, interpolation="bicubic")

        ax.add_artist(rb_patch)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")

    patch_gradient = BboxImage(ax.bbox,
                               interpolation="bicubic",
                               zorder=0.1,
                               )
    gradient = np.zeros((2, 2, 4), dtype=np.float)
    gradient[:,:,:3] = [1, 1, 0.]
    gradient[:,:,3] = [[0.1, 0.3],[0.3, 0.5]] # alpha channel
    patch_gradient.set_array(gradient)
    ax.add_artist(patch_gradient)


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)

    fig.savefig('ribbon_box.png')
    plt.show()

Why is this so bad? We have these superfluous present boxes to represent five numbers. However, one thing that this figure does correctly is put the value the bar graph represents just above the bar. First, let's get rid of this silly and uninformative gradient by commenting it out.

In [63]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

class RibbonBox(object):

    original_image = read_png(get_sample_data("Minduka_Present_Blue_Pack.png",
                                              asfileobj=False))
    cut_location = 70
    b_and_h = original_image[:,:,2]
    color = original_image[:,:,2] - original_image[:,:,0]
    alpha = original_image[:,:,3]
    nx = original_image.shape[1]

    def __init__(self, color):
        rgb = matplotlib.colors.colorConverter.to_rgb(color)

        im = np.empty(self.original_image.shape,
                      self.original_image.dtype)


        im[:,:,:3] = self.b_and_h[:,:,np.newaxis]
        im[:,:,:3] -= self.color[:,:,np.newaxis]*(1.-np.array(rgb))
        im[:,:,3] = self.alpha

        self.im = im


    def get_stretched_image(self, stretch_factor):
        stretch_factor = max(stretch_factor, 1)
        ny, nx, nch = self.im.shape
        ny2 = int(ny*stretch_factor)

        stretched_image = np.empty((ny2, nx, nch),
                                   self.im.dtype)
        cut = self.im[self.cut_location,:,:]
        stretched_image[:,:,:] = cut
        stretched_image[:self.cut_location,:,:] = \
                self.im[:self.cut_location,:,:]
        stretched_image[-(ny-self.cut_location):,:,:] = \
                self.im[-(ny-self.cut_location):,:,:]

        self._cached_im = stretched_image
        return stretched_image



class RibbonBoxImage(BboxImage):
    zorder = 1

    def __init__(self, bbox, color,
                 cmap = None,
                 norm = None,
                 interpolation=None,
                 origin=None,
                 filternorm=1,
                 filterrad=4.0,
                 resample = False,
                 **kwargs
                 ):

        BboxImage.__init__(self, bbox,
                           cmap = cmap,
                           norm = norm,
                           interpolation=interpolation,
                           origin=origin,
                           filternorm=filternorm,
                           filterrad=filterrad,
                           resample = resample,
                           **kwargs
                           )

        self._ribbonbox = RibbonBox(color)
        self._cached_ny = None


    def draw(self, renderer, *args, **kwargs):

        bbox = self.get_window_extent(renderer)
        stretch_factor = bbox.height / bbox.width

        ny = int(stretch_factor*self._ribbonbox.nx)
        if self._cached_ny != ny:
            arr = self._ribbonbox.get_stretched_image(stretch_factor)
            self.set_array(arr)
            self._cached_ny = ny

        BboxImage.draw(self, renderer, *args, **kwargs)


if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    box_colors = [(0.8, 0.2, 0.2),
                  (0.2, 0.8, 0.2),
                  (0.2, 0.2, 0.8),
                  (0.7, 0.5, 0.8),
                  (0.3, 0.8, 0.7),
                  ]
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        bbox0 = Bbox.from_extents(year-0.4, 0., year+0.4, h)
        bbox = TransformedBbox(bbox0, ax.transData)
        rb_patch = RibbonBoxImage(bbox, bc, interpolation="bicubic")

        ax.add_artist(rb_patch)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")

#    patch_gradient = BboxImage(ax.bbox,
#                               interpolation="bicubic",
#                               zorder=0.1,
#                               )
#    gradient = np.zeros((2, 2, 4), dtype=np.float)
#    gradient[:,:,:3] = [1, 1, 0.]
#    gradient[:,:,3] = [[0.1, 0.3],[0.3, 0.5]] # alpha channel
#    patch_gradient.set_array(gradient)
#    ax.add_artist(patch_gradient)


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)

    fig.savefig('ribbon_box.png')
    plt.show()

That was easy: we just commented out the code that builds the gradient. Next, let's get rid of these boxes and replace them with simple bars. I'm going to cut out the gradient and the box code, and add the line,

 ax.bar(year, h, color=bc)
In [85]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    box_colors = [(0.8, 0.2, 0.2),
                  (0.2, 0.8, 0.2),
                  (0.2, 0.2, 0.8),
                  (0.7, 0.5, 0.8),
                  (0.3, 0.8, 0.7),
                  ]
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
#        bbox0 = Bbox.from_extents(year-0.4, 0., year+0.4, h)
#       bbox = TransformedBbox(bbox0, ax.transData)
#        rb_patch = BboxImage(bbox, interpolation='bicubic')
#        rb_ptch = RibbonBoxImage(bbox, bc, interpolation="bicubic")

#        ax.add_artist(rb_patch)
#        ax.add_artist(bbox)

        # --- this is the line we changed --- #
        ax.bar(year, h, color=bc)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()

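But the bars are offset to the right, because bar positions each bar by its left edge here and the default width is 0.8. Let's shift them left by half a width using year-0.4, which matches the extents of the boxes in the previous graph (year-0.4 to year+0.4). Also, let's change from these hideous colors to 'Set1', another qualitative colorbrewer scheme.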

In [77]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
#    box_colors = [(0.8, 0.2, 0.2),
#                  (0.2, 0.8, 0.2),
#                  (0.2, 0.2, 0.8),
#                  (0.7, 0.5, 0.8),
#                  (0.3, 0.8, 0.7),
#                  ]
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color =bc)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()
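
An alternative to subtracting 0.4 by hand is to ask bar to center each bar on its x value with align='center' (the default in newer matplotlib releases). A small sketch with made-up heights (the numbers are placeholders, not the random data above), using the Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

years = np.arange(2004, 2009)
heights = [4000, 6500, 8200, 5100, 7300]  # placeholder values

fig, ax = plt.subplots()
# width=0.8 matches the default; align='center' puts the middle of each bar
# on its year instead of the left edge
bars = ax.bar(years, heights, width=0.8, align='center')

# The left edge of the first bar ends up half a width to the left of 2004
print(bars[0].get_x())
```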

Let's move the number up a little, by annotating at h+100 instead of h.

In [82]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
#    box_colors = [(0.8, 0.2, 0.2),
#                  (0.2, 0.8, 0.2),
#                  (0.2, 0.2, 0.8),
#                  (0.7, 0.5, 0.8),
#                  (0.3, 0.8, 0.7),
#                  ]
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color =bc)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()

Let's think some more about this data-ink ratio. What do the right and top axes really tell us? They just make a box around the plot. It looks much cleaner without them. We'll remove them with,

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
In [83]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color =bc)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()

Well, that removed the top and right axis lines, but their ticks remain. We'll restrict the ticks to the left and bottom with

ax.yaxis.set_ticks_position('left')
ax.xaxis.set_ticks_position('bottom')
In [84]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color =bc)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    
    # --- Added this line --- #
    ax.yaxis.set_ticks_position('left')
    ax.xaxis.set_ticks_position('bottom')
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()

Even better, let's remove the left axis and replace it with a white overlapping grid. This way, the reader doesn't have to move their eye back and forth to the left axis to see what value corresponds to what height. We will also remove the ticks on the x-axis, since the year name labels the position, and we don't need a tick there.

ax.spines['left'].set_visible(False)
...
ax.xaxis.set_ticks_position('none')
ax.yaxis.set_ticks_position('none')
...
ax.grid(axis = 'y', color ='white', linestyle='-')
In [96]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color =bc)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    
    # --- Added this line --- #
    ax.yaxis.set_ticks_position('none')
    ax.xaxis.set_ticks_position('none')
    
    ax.grid(axis = 'y', color ='white', linestyle='-')
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()

It would look even nicer without the black lines around the bars. We will adjust the ax.bar line to set linewidth=0,

ax.bar(year-0.4, h, color=bc, linewidth=0)
In [97]:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.image import BboxImage

from matplotlib._png import read_png
import matplotlib.colors
from matplotlib.cbook import get_sample_data

if 1:
    from matplotlib.transforms import Bbox, TransformedBbox
    from matplotlib.ticker import ScalarFormatter

    fig = plt.gcf()
    fig.clf()
    ax = plt.subplot(111)

    years = np.arange(2004, 2009)
    
    # --- changed this line --- #
    box_colors = brewer2mpl.get_map('Set1', 'qualitative', 5).mpl_colors
    
    heights = np.random.random(years.shape) * 7000 + 3000

    fmt = ScalarFormatter(useOffset=False)
    ax.xaxis.set_major_formatter(fmt)

    for year, h, bc in zip(years, heights, box_colors):
        # --- this is the line we changed --- #
        ax.bar(year-0.4, h, color=bc, linewidth=0)

        ax.annotate(r"%d" % (int(h/100.)*100),
                    (year, h+100), va="bottom", ha="center")


    ax.set_xlim(years[0]-0.5, years[-1]+0.5)
    ax.set_ylim(0, 10000)
    
    # --- Added this line --- #
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.spines['left'].set_visible(False)
    
    # --- Added this line --- #
    ax.yaxis.set_ticks_position('none')
    ax.xaxis.set_ticks_position('none')
    
    ax.grid(axis = 'y', color ='white', linestyle='-')
    
    fig.savefig('ribbon_box_no_ribbons.png')
    plt.show()

So now we have a very nice looking bar graph! All we did was keep 'erasing' chart items that weren't informative. You can use these concepts in your own graphs.
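
The end state of the cells above can be collected into one small function. This is a sketch rather than the notebook's code: the function name clean_bar_chart is hypothetical, the five 'Set1' colors are written out as hex strings so that brewer2mpl is not required, align='center' replaces the manual year-0.4 shift, and the x ticks are set explicitly instead of through ScalarFormatter.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np

# First five colors of the ColorBrewer 'Set1' qualitative scheme
SET1 = ['#e41a1c', '#377eb8', '#4daf4a', '#984ea3', '#ff7f00']


def clean_bar_chart(ax, positions, heights, colors=SET1):
    """Draw a bar chart with the uninformative elements removed."""
    # Borderless bars, centered on their x positions
    ax.bar(positions, heights, color=colors, linewidth=0, align='center')

    # Print each value, rounded down to the nearest hundred, just above its bar
    for x, h in zip(positions, heights):
        ax.annotate('%d' % (int(h / 100.) * 100), (x, h + 100),
                    va='bottom', ha='center')

    # Remove the frame lines and tick marks that carry no information
    for side in ('top', 'right', 'left'):
        ax.spines[side].set_visible(False)
    ax.xaxis.set_ticks_position('none')
    ax.yaxis.set_ticks_position('none')

    # One label per bar
    ax.set_xticks(positions)
    ax.set_xticklabels(['%d' % p for p in positions])

    # White horizontal grid lines in place of the left axis
    ax.grid(axis='y', color='white', linestyle='-')
    return ax


np.random.seed(12)
years = np.arange(2004, 2009)
heights = np.random.random(years.shape) * 7000 + 3000

fig, ax = plt.subplots()
clean_bar_chart(ax, years, heights)
ax.set_xlim(years[0] - 0.5, years[-1] + 0.5)
ax.set_ylim(0, 10000)
```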

So far we've talked about things you can do with the existing matplotlib package. Now we'll talk about packages that implement other design principles.

Sparklines

More recently, Tufte introduced the idea of 'sparklines', or 'data-words': intense, word-sized graphics. The following examples come from sparkplot and its introductory blog post. For example, if you visualize the wins (red, up) and losses (blue, down) in the Lakers' 2002 season, when they won the NBA championship, it is easy to see streaks of wins and losses, Lakers' 2002 game series. It is also easy to compare this to their 2005 performance, when they did not win the championship, Lakers' 2005 game series. This is a very nice way to visualize binary data.

Additionally, sparklines can be used to visualize a series of values. For example, this shows the number of messages sent on the message list comp.lang.py in 1994, 1994, and you can see that the minimum is zero and the maximum is 518. Compare this to the messages sent in 2004, 2004.

But you may not just be interested in the min and max, but maybe in deviations from the norm. The southern oscillation is a good indicator of El Nino, and values less than -1 usually define an El Nino weather pattern, [data: Tahiti, 1955-1992]

If you have some series data or binary data you'd like to incorporate into a sentence, Sparklines are great.
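
The basic idea is also easy to approximate in plain matplotlib. The sketch below is my own approximation, not sparkplot's API: the function name sparkline, the figure size, the marker colors, and the random-walk data are all illustrative assumptions. It draws a series in a word-sized figure with every axis element removed and marks the minimum and maximum.

```python
import matplotlib
matplotlib.use('Agg')  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def sparkline(values, figsize=(2.0, 0.3)):
    """Draw values as a word-sized line with the min and max marked."""
    values = np.asarray(values, dtype=float)
    fig, ax = plt.subplots(figsize=figsize)
    ax.plot(values, color='0.3', linewidth=0.8)

    # Mark the extremes: blue for the minimum, red for the maximum
    i_min, i_max = values.argmin(), values.argmax()
    ax.plot([i_min], [values[i_min]], 'o', color='blue', markersize=3)
    ax.plot([i_max], [values[i_max]], 'o', color='red', markersize=3)

    # A sparkline has no frame, ticks, or labels
    ax.axis('off')
    return fig, ax


np.random.seed(12)
fig, ax = sparkline(np.random.randn(100).cumsum())
```

Saving the result with fig.savefig and bbox_inches='tight' gives an image small enough to sit inline with text.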

iPython Notebook

To change your default fonts in iPython notebook, you will need to create a custom profile and a custom CSS file, which is described thoroughly in this tutorial. If you like what you see in my iPython notebook, which includes Consolas as the default code font, approximately 80-character column width, and centered cells, you may use my custom.css file:

In [1]:
# Find where my iPython directory is
! ipython locate
/Users/olga/.ipython
In [4]:
# Show the contents of my custom.css file, which I created using the above tutorial
! cat /Users/olga/.ipython/profile_customcss/static/css/custom.css
/**write your css in here**/
/* like */

<style>
    .CodeMirror{
        font-family: "Consolas", sans-serif;
    }
    
pre, code, kbd, samp {
     font-family: Consolas, monospace;
}

	div.input{
	width: 105ex;
}

div.text_cell{
	width: 105ex;
}

div.text_cell_render{
	width: 105ex;
}

    div.cell{
        max-width:750px;
        margin-left:auto;
        margin-right:auto;
    }

    h1 {
        text-align:left;
    }
</style>
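
If you would rather script this step than copy the file by hand, the sketch below writes a stylesheet into a profile's static/css directory. The helper name write_custom_css is hypothetical, and the directory layout is simply the one shown in the cat command above; other iPython versions may look for custom.css elsewhere, so treat the path as an assumption.

```python
import os


def write_custom_css(profile_dir, css_text):
    """Write css_text to <profile_dir>/static/css/custom.css; return the path."""
    css_dir = os.path.join(profile_dir, 'static', 'css')
    # Create the directory tree if it is not there yet
    if not os.path.isdir(css_dir):
        os.makedirs(css_dir)
    css_path = os.path.join(css_dir, 'custom.css')
    with open(css_path, 'w') as handle:
        handle.write(css_text)
    return css_path


# A fragment of the stylesheet shown above
CSS = """div.cell {
    max-width: 750px;
    margin-left: auto;
    margin-right: auto;
}
"""

# Example call (path taken from the output above; adjust for your machine):
# write_custom_css('/Users/olga/.ipython/profile_customcss', CSS)
```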

Bokeh

Bokeh (a photography term for the aesthetic quality of a blurred background, which focuses attention on the foreground; definition from the Bokeh GitHub readme) is a new package (started in March 2012, whereas matplotlib started in 2002) that aims to provide beautiful, interactive visualizations within the iPython framework. It uses the powerful Data Driven Documents (d3) JavaScript library to render lovely vector-based graphics using the HTML5 canvas in the browser.

I downloaded the package but couldn't get the examples to work, so I will show you the example notebook they provided. It will definitely be a package to watch! The underlying data structures in Bokeh are pandas DataFrames, so you can expect further integration with pandas and iPython in the future.

In [1]:
from bokeh.mpl import PlotClient
p = PlotClient(username='defaultuser', serverloc="http://portcon:5006",userapikey="nokey")
p.use_doc('example')
p.notebooksources()
got read write apikey