IPython notebook: visualization in context

In [1]:
%run talktools
%matplotlib inline

Data Visualization: graphics alone are not enough

What is the problem here?

(source: Media Matters)

How about here?

(source: WTF Visualizations)

How about this?

(source: FiveThirtyEight)

The Bechdel Test

"The Rule": female film characters have depth if...

  1. The film must have at least two women
  2. The women must have names
  3. The women must have a conversation at some point
  4. The conversation must be between the women themselves
  5. The conversation must not be about men
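The five rules amount to a simple checklist. As a sketch only, here is one way to encode it in Python. The data layout (a list of character dicts with `name` and `gender`, and a list of conversation dicts with `speakers` and `about_men`) is a hypothetical encoding invented for illustration, not the format FiveThirtyEight or anyone else actually used.

```python
def passes_bechdel(characters, conversations):
    """Hypothetical checker for the five rules above.

    characters:    list of {"name": str or None, "gender": "F" or "M"}
    conversations: list of {"speakers": [names], "about_men": bool}
    """
    # Rule 1: the film must have at least two women
    women = [c for c in characters if c["gender"] == "F"]
    if len(women) < 2:
        return False

    # Rule 2: the women must have names (at least two named women)
    named_women = {c["name"] for c in women if c["name"]}
    if len(named_women) < 2:
        return False

    # Rules 3-5: some conversation involves two or more named women,
    # only named women, and is not about men
    for conv in conversations:
        speakers = set(conv["speakers"])
        if len(speakers) >= 2 and speakers <= named_women and not conv["about_men"]:
            return True
    return False


cast = [
    {"name": "Alice", "gender": "F"},
    {"name": "Beth", "gender": "F"},
    {"name": "Carl", "gender": "M"},
]
print(passes_bechdel(cast, [{"speakers": ["Alice", "Beth"], "about_men": False}]))
```

In practice the hard part is not this logic but producing the annotations it consumes, which is exactly the kind of data that needs to be open for the results to be checked.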

The results: gender bias is not profitable!

What's wrong with the FiveThirtyEight results?

Probably nothing, but *how can we know that?*

The data & analysis are hidden from us.

Given the importance and influence of data-heavy media outlets like FiveThirtyEight (especially when they address topics like elections, climate change, and gender issues), shouldn't we hold them to a higher standard?

Blog Post by Brian Keegan

The Need for Openness in Data Journalism

Thesis: journalists should subject themselves to the same reproducibility and openness standards as scientists.

Keegan: "I have found this “new” brand of data journalism disappointing foremost because it wants to perform science without abiding by scientific norms."

The takeaway

As data scientists, we should be as offended by obfuscated data as designers are by obfuscating design.

Brian's Response to FiveThirtyEight

Walking the walk...

In [10]:
from IPython.display import IFrame
IFrame("http://nbviewer.ipython.org/github/brianckeegan/Bechdel/blob/master/Bechdel_test.ipynb", 800, 600)
Out[10]:

To their credit, FiveThirtyEight responded and put their data on GitHub.

This is the IPython notebook

A combination of Markdown, math, code, visualizations, and more in a single executable document.
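Under the hood, a notebook file is plain JSON: a list of cells plus some metadata. The sketch below builds a minimal notebook by hand using only the standard library. It follows the nbformat 4 layout used by current Jupyter releases; the IPython versions of this talk's era wrote an earlier version of the format, so treat the exact keys as an assumption tied to version 4.

```python
import json


def markdown_cell(text):
    # Markdown cells carry only their source text
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    # Code cells also store an execution count and a list of outputs;
    # both stay empty until the cell is actually run
    return {"cell_type": "code", "execution_count": None,
            "metadata": {}, "outputs": [], "source": code}


notebook = {
    "cells": [
        markdown_cell("We can write math using TeX: $\\Gamma(n + 1, x)$"),
        code_cell("print(1 + 1)"),
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 0,
}

# Serialize exactly as it would be written to an .ipynb file
text = json.dumps(notebook, indent=1)
print(text)
```

Because code, prose, and (once executed) outputs all live in one document, sharing the file shares the whole analysis.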

We can embed, and execute, blocks of code:

In [3]:
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

for i, f in enumerate(fibonacci()):
    print(f, end=' ')
    if i > 35:  # stop after the first 37 terms (i = 0..36)
        break
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352
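The manual counter and `break` can also be replaced by slicing the infinite generator with `itertools.islice`, which stops pulling values after a fixed number of terms. A minimal sketch producing the same 37 numbers:

```python
from itertools import islice


def fibonacci():
    # Infinite generator: yields 0, 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# islice consumes only the first 37 values, so the loop never runs forever
first_37 = list(islice(fibonacci(), 37))
print(" ".join(str(f) for f in first_37))
```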

We can write paragraphs of text

Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

We can write math using TeX

$$ \int x^n \exp[-x]\,dx = -\Gamma(n + 1, x) + C $$
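Here $\Gamma(n + 1, x)$ is the upper incomplete gamma function, so a definite integral should satisfy $\int_a^b x^n e^{-x}\,dx = \Gamma(n + 1, a) - \Gamma(n + 1, b)$. As a sanity check, the sketch below compares the two sides in plain Python for integer $n$, using the closed form $\Gamma(n + 1, x) = n!\, e^{-x} \sum_{k=0}^{n} x^k / k!$ and Simpson's rule for the integral. The values $n = 3$, $a = 0.5$, $b = 2$ are arbitrary choices.

```python
import math


def upper_gamma_int(n, x):
    """Upper incomplete gamma Gamma(n + 1, x) for integer n >= 0,
    via n! * exp(-x) * sum_{k=0}^{n} x**k / k!."""
    return math.factorial(n) * math.exp(-x) * sum(
        x ** k / math.factorial(k) for k in range(n + 1))


def simpson(f, a, b, m=1000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


n, a, b = 3, 0.5, 2.0
numeric = simpson(lambda t: t ** n * math.exp(-t), a, b)
closed = upper_gamma_int(n, a) - upper_gamma_int(n, b)
print(numeric, closed)
```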

We can write lists

  • Lorem ipsum dolor sit amet
  • consectetur adipisicing elit
  • sed do eiusmod tempor incididunt
  • ut labore et dolore magna aliqua

We can embed static figures

More importantly

we can embed figures dynamically, along with the code that generates them

In [4]:
import matplotlib.pyplot as plt
import numpy as np

x, y = np.random.normal(size=(2, 100))
s, c = np.random.random(size=(2, 100))

plt.scatter(x, y, c=c, s=1000 * s, alpha=0.3);

The core viz tool: matplotlib

Many visualization types are available; see, for example, the matplotlib gallery

In [5]:
# %load http://matplotlib.org/mpl_examples/mplot3d/bars3d_demo.py
In [6]:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for c, z in zip(['r', 'g', 'b', 'y'], [30, 20, 10, 0]):
    xs = np.arange(20)
    ys = np.random.rand(20)

    # You can provide either a single color or an array. To demonstrate this,
    # the first bar of each set will be colored cyan.
    cs = [c] * len(xs)
    cs[0] = 'c'
    ax.bar(xs, ys, zs=z, zdir='y', color=cs, alpha=0.8)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')

plt.show()

Notebooks can be shared online

IPython's "nbviewer" website: http://nbviewer.ipython.org
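nbviewer renders any publicly reachable notebook as a static web page. For notebooks on GitHub, the link follows the `github/<user>/<repo>/blob/<branch>/<path>` pattern visible in the Bechdel notebook link earlier in this talk. The helper below is a hypothetical convenience function built from that pattern; the service has since moved to nbviewer.jupyter.org, which is why the host is a parameter.

```python
def nbviewer_url(user, repo, path, branch="master",
                 host="http://nbviewer.ipython.org"):
    """Hypothetical helper: nbviewer link for a notebook in a public GitHub repo."""
    return "{}/github/{}/{}/blob/{}/{}".format(host, user, repo, branch, path)


print(nbviewer_url("brianckeegan", "Bechdel", "Bechdel_test.ipynb"))
```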

In [7]:
from IPython.display import IFrame
IFrame("http://nbviewer.ipython.org", 800, 600)
Out[7]:

Many people use IPython to blog as well

Here is an example from my own blog: a post written as an IPython notebook and published to the web

In [8]:
IFrame("http://jakevdp.github.io/blog/2013/08/28/understanding-the-fft/", 800, 600)
Out[8]:

Notebooks can even be viewed as slideshows

(I'm So Meta, Even This Acronym)

So What Is IPython?

"Tools for the entire lifecycle of a scientific idea"

  • From exploration to collaboration to publication to reproduction of results.

  • Code + Description + Data + Visualization in one place = True Openness and Reproducibility!

Picking on Nate Silver again...

Many more notebooks to explore

IPython Architecture

Message passing is kernel-agnostic
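Kernel-agnostic here means the frontend and the kernel only exchange structured messages; neither cares what language the other is written in. As a sketch, this builds an `execute_request` message shaped like the one in the current Jupyter messaging specification (protocol version 5.x, newer than the IPython release of this talk). The real wire format also adds ZeroMQ framing and an HMAC signature, both omitted here, and the `username` value is a placeholder.

```python
import json
import uuid
from datetime import datetime, timezone


def execute_request(code, session):
    """Build an execute_request message (header/parent_header/metadata/content)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "user",  # placeholder value
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},  # empty: this message is not a reply to anything
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }


msg = execute_request("1 + 1", session=uuid.uuid4().hex)
# A kernel in any language only needs to decode JSON like this
wire = json.dumps(msg)
print(wire)
```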

Client view is browser-based

A JavaScript-enabled frontend gives us huge flexibility

That is, we can write simple Python scripts whose results are displayed with JavaScript:

In [9]:
from intfact import factorizer
factorizer()

(thanks to Brian Granger and Jon Frederic)
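The `intfact` module above is the speakers' own code and is not shown here. The general mechanism it builds on can be sketched, though: when a cell's last expression is an object, IPython looks for rich-display methods such as `_repr_html_` and hands the result to the browser frontend to render (`IPython.display` also offers `HTML` and `Javascript` wrappers for raw output). The `Factorization` class below is a hypothetical stand-in, not a reimplementation of `intfact`.

```python
class Factorization(object):
    """Hypothetical example: prime factors of n with an HTML representation."""

    def __init__(self, n):
        if n < 2:
            raise ValueError("n must be >= 2")
        self.n = n
        self.factors = self._factor(n)

    @staticmethod
    def _factor(n):
        # Trial division: fine for small n, far too slow for large ones
        factors = []
        d = 2
        while d * d <= n:
            while n % d == 0:
                factors.append(d)
                n //= d
            d += 1
        if n > 1:
            factors.append(n)
        return factors

    def _repr_html_(self):
        # IPython calls this to get an HTML rendering for the notebook
        terms = " &times; ".join(str(f) for f in self.factors)
        return "<p><b>{}</b> = {}</p>".format(self.n, terms)

    def __repr__(self):
        # Plain-text fallback for terminals and other non-HTML frontends
        return "{} = {}".format(self.n, " * ".join(str(f) for f in self.factors))


print(repr(Factorization(360)))
```

Evaluating `Factorization(360)` as the last line of a notebook cell would show the HTML version; a plain terminal falls back to `__repr__`.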

We'll see more of this later...

IPython is not just for Python...

The kernel/message-passing system is generic enough that it does not require Python at all!

Via language magics and language kernels, the notebook can be used with an ever-expanding list of programming languages!
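Language magics and language kernels work differently. Full language kernels implement the messaging protocol themselves. Script-style cell magics such as `%%bash` instead pass the cell's text to an external interpreter process. The sketch below imitates that second approach with the standard library; `run_cell_in` is a hypothetical name, and this is not IPython's actual implementation.

```python
import subprocess
import sys


def run_cell_in(interpreter, source):
    """Hypothetical helper: feed `source` to an external interpreter on stdin
    and return its stdout.

    `interpreter` is an argv list such as ["bash"] or [sys.executable].
    Raises subprocess.CalledProcessError if the interpreter exits with an error.
    """
    result = subprocess.run(
        interpreter,
        input=source,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange str rather than bytes
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Python driving a separate Python process; swap in ["bash"] for a shell cell
    print(run_cell_in([sys.executable], "print(6 * 7)"), end="")
```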