%run talktools
%matplotlib inline
How about here?
(source: wtf vz)
How about this?
(source: fivethirtyeight)
"The Rule": female film characters have depth if...
The Need for Openness in Data Journalism
Thesis: journalists should hold themselves to the same standards of reproducibility and openness as scientists.
Keegan: "I have found this “new” brand of data journalism disappointing foremost because it wants to perform science without abiding by scientific norms."
As data scientists, we should be as offended by obfuscated data as designers are by obfuscating design.
Walking the walk...
from IPython.display import IFrame
IFrame("http://nbviewer.ipython.org/github/brianckeegan/Bechdel/blob/master/Bechdel_test.ipynb", 800, 600)
To their credit, FiveThirtyEight responded and put their data on GitHub.
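def fibonacci():
    # Infinite generator: yields 0, 1, 1, 2, 3, 5, ...
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

for i, f in enumerate(fibonacci()):
    print(f, end=" ")
    # Stop after the 37th value (indices 0 through 36)
    if i > 35:
        break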
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352
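The explicit counter and `break` above can also be expressed with `itertools.islice`, which takes a fixed number of items from an infinite generator. A minimal sketch in Python 3 syntax; `first_n` is a hypothetical helper name, not something from the original talk:

```python
from itertools import islice


def fibonacci():
    """Yield the Fibonacci numbers 0, 1, 1, 2, 3, ... without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


def first_n(n):
    """Return the first n Fibonacci numbers as a list (hypothetical helper)."""
    # islice stops consuming the generator after n items
    return list(islice(fibonacci(), n))


print(*first_n(10))
```

Calling `first_n(37)` should give the same 37 values as the loop above, without the manual index check.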
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
import matplotlib.pyplot as plt
import numpy as np
x, y = np.random.normal(size=(2, 100))
s, c = np.random.random(size=(2, 100))
plt.scatter(x, y, c=c, s=1000 * s, alpha=0.3);
Many visualization types are available; see, for example, the matplotlib gallery.
# %load http://matplotlib.org/mpl_examples/mplot3d/bars3d_demo.py
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
for c, z in zip(['r', 'g', 'b', 'y'], [30, 20, 10, 0]):
    xs = np.arange(20)
    ys = np.random.rand(20)

    # You can provide either a single color or an array. To demonstrate this,
    # the first bar of each set will be colored cyan.
    cs = [c] * len(xs)
    cs[0] = 'c'
    ax.bar(xs, ys, zs=z, zdir='y', color=cs, alpha=0.8)

ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.show()
IPython's "nbviewer" website: http://nbviewer.ipython.org
from IPython.display import IFrame
IFrame("http://nbviewer.ipython.org", 800, 600)
Here is an example from my own blog: a post written in IPython notebook and published to the web
IFrame("http://jakevdp.github.io/blog/2013/08/28/understanding-the-fft/", 800, 600)
From exploration to collaboration to publication to reproduction of results.
Code + Description + Data + Visualization in one place = True Openness and Reproducibility!
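One reason all of these pieces can live in one place is that a notebook file is a plain JSON document. Here is a minimal sketch of that structure, built with the standard library only. The field names follow the version 4 notebook format, which is an assumption (the talk may have used an earlier format), and `make_notebook`, `markdown_cell`, `code_cell` and the cell contents are hypothetical:

```python
import json


def markdown_cell(text):
    """Build a description (markdown) cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Build a code cell that has not been executed yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def make_notebook(cells):
    """Wrap a list of cells in a minimal version-4 notebook structure."""
    return {
        "cells": cells,
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


notebook = make_notebook([
    markdown_cell("Description of the analysis"),
    code_cell("print('hello')"),
])

# The serialized text is what would be written to an .ipynb file
serialized = json.dumps(notebook, indent=1)
```

Because the file is ordinary text, it can be versioned, diffed and shared like any other source file.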
That is, we can create simple Python scripts that display JavaScript results:
from intfact import factorizer
factorizer()
(thanks to Brian Granger and Jon Frederic)
We'll see more of this later...
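As a rough illustration of how a Python object can carry JavaScript into the notebook, here is a minimal sketch using IPython's `_repr_html_` rich-display hook. It is not how `intfact.factorizer` is implemented; the `Greeting` class and its markup are hypothetical, and whether a given frontend actually runs the embedded script depends on its trust and sanitization settings:

```python
import html
import json


class Greeting:
    """Hypothetical object whose notebook representation embeds JavaScript."""

    def __init__(self, message):
        self.message = message

    def _repr_html_(self):
        # Unique element id so several instances can coexist on one page
        element_id = "greeting-%d" % id(self)
        # json.dumps yields a valid JavaScript string literal; the replace
        # keeps a literal "</script>" in the message from closing the tag
        js_literal = json.dumps(self.message).replace("</", "<\\/")
        return (
            '<div id="{eid}">{fallback}</div>'
            '<script>'
            'document.getElementById("{eid}").textContent = {msg}.toUpperCase();'
            '</script>'
        ).format(
            eid=element_id,
            fallback=html.escape(self.message),
            msg=js_literal,
        )


greeting = Greeting("hello from Python")
markup = greeting._repr_html_()
```

In a notebook, leaving `greeting` as the last expression of a cell would hand this markup to the frontend; outside a notebook, `markup` is just a string.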
The kernel/message-passing system is generic enough that it does not need Python!
Via language magics and language kernels, the notebook can be used with an ever-expanding list of programming languages!
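To make the language independence concrete: the frontend and kernel exchange plain JSON messages, so any language that can read and write them can act as a kernel. A minimal sketch of an execute request, assuming the field layout of the current Jupyter messaging protocol (the protocol version string, the `make_execute_request` helper, and the sample values are assumptions for illustration, and the real wire format adds signing and framing that are omitted here):

```python
import json
import uuid
from datetime import datetime, timezone


def make_execute_request(code, session_id, username="demo"):
    """Build the dict for an execute_request message (hypothetical helper)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session_id,
        "username": username,
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",  # assumed protocol version
    }
    content = {
        "code": code,  # source text in whatever language the kernel speaks
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    return {
        "header": header,
        "parent_header": {},
        "metadata": {},
        "content": content,
    }


# The code string is opaque to the frontend: here it happens to be Julia
message = make_execute_request('println("hello")', session_id=uuid.uuid4().hex)
wire_text = json.dumps(message)
```

Nothing in the message says which language `code` is written in; that is left entirely to the kernel on the other end.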