Lede Program

Data and databases

Day 0

Technological determinism: "we can't rewind"

In [26]:
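# first black box: IPython's built-in helper for embedding YouTube videos
from IPython.display import YouTubeVideo
# usage: YouTubeVideo('<video id>'), where the ID is taken from the video's URL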

They took the credit for your second symphony

re-written by machine and new technology

and now I understand the problems you can see.

Oh oh -- I met your children

oh oh -- what did you tell them?

video killed the radio star

video killed the radio star

pictures came and broke your heart

we can't rewind we've gone too far

(The Buggles, 1979)

fundamental thesis

In my mind and in my car

We can't rewind we've gone too far

Pictures came and broke your heart

Put the blame on VTR

lest the point be lost: the “radio star” is stuck in a plastic tube from which she cannot escape [1:51]

metaphysics of 'Video Killed the Radio Star'

  • agency of technology
  • inevitability
  • outside our control
  • unilinear evolution
  • clear link to the normative
    • technological development means (creative) destruction and downsizing
    • necessarily so, so accept it
    • "disruption" etc.


  • NSA: We can collect communications, so we should/must/can't not do it
  • “Internet Killed the Video Star: How In-House Internet Distribution of Home Video Will Affect Profit”
  • “Video killed the radio star, but has Google killed the learning organization?”

technological determinism

roughly, a belief that technology causes social, economic and cultural transformation

  • often a belief that technology is the primary, most important cause of these changes
  • often a belief that technology has an internal dynamic, a univocal path of development

belief in technological determinism is itself a major cause

  • even if false (as it surely is), belief in the inevitability of technological change is a major political and economic argument
  • need to figure out what to do given the change, or surrender to its (non-extant) inevitability


Not "digital" literacy

Technological autonomy

Letting us direct technology critically, rather than being ruled by it.

too binary: the trick is to learn the affordances of extant technologies while appreciating their tradeoffs


Embedding the video above is very easy. All I had to do was copy part of the web address (aka URL).
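A minimal sketch of that "copy part of the web address" step, assuming a standard watch URL of the form https://www.youtube.com/watch?v=...: the part you need is the value of the `v` query parameter, which is what `YouTubeVideo` takes. The URL, the video ID, and the helper name `youtube_id` below are hypothetical placeholders, not the video used in class.

```python
from urllib.parse import urlparse, parse_qs


def youtube_id(url):
    """Hypothetical helper: return the 'v' query parameter of a YouTube watch URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each parameter name to a list of its values
    return query["v"][0]


# Hypothetical URL; substitute the address copied from your browser
url = "https://www.youtube.com/watch?v=abc123XYZ00&t=111s"
video_id = youtube_id(url)
print(video_id)  # abc123XYZ00

# In a notebook, the ID is all the black box needs:
# from IPython.display import YouTubeVideo
# YouTubeVideo(video_id)
```

Copying by hand works just as well; the point is that the black box wants only one piece of the URL, not the whole thing.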

  • no globally optimal solution in the use of pre-built technologies
  • recognize problems with solutions
  • confidence in opening black boxes IF AND WHEN called for

black box

  • we will at first use a fair number of black boxes to get you moving. These are procedures that will initially seem like rote gobbledygook. We'll get back to some of them. Others will likely remain rote unless you descend deeper into programming.
  • black boxes enable and constrain
    • think doing graphs in Excel or typography in Word

opening boxes starts with learning Python

tie together black boxes that help us

then start opening them if necessary

"Raw data" is an oxymoron.

We make data from sources: we don't find it pregiven.

Data is made, not born: fully artificial

Artificiality of data: the first moment of reflection

  • who produced this data?
  • is there a documented standard for this data? what interests produced this standard?
  • what do they record, and what don't they record?
  • how frequently? Are these sensors calibrated? Are the people drunk half the time? What sort of drunk?
  • what systems of classification are used?
  • what is thrown out, and how?

Positivism is not our friend!

Big data ideology: more data yield more knowledge

  • McCarthy: neither good science nor good philosophy

Against the repressive hypothesis

We could treat this artificiality as negative:

artificial, therefore false

Or: artificial, therefore a way to create something positive

Artificiality of data as positive critical stance

  • Biology (Bioinformatics)
  • Literary criticism (Ramsay)
  • Journalism

Triple problem of knowledge, ethics, and computational practice

technical solutions are integral to ethical and social solutions

Deceptive accessibility

First law of data accessibility

never discuss data accessibility

Second law of data accessibility

data is actually useful to data journalists and digital humanists in inverse proportion to its readability by mere mortals

In other words, if you can read it easily (and you are not a computer), then the computer probably can't read it easily.
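A tiny illustration of the second law, with made-up values: a number formatted for human eyes, with a thousands separator, is not directly readable as a number by Python, while the plain form is.

```python
human_friendly = "1,234"   # easy for people to read
machine_friendly = "1234"  # easy for the computer to read

print(int(machine_friendly) + 1)  # 1235

try:
    int(human_friendly)
except ValueError as err:
    # int() does not accept thousands separators
    print("Python can't read it:", err)

# Extra cleaning is needed before the human-friendly form is usable
cleaned = int(human_friendly.replace(",", ""))
print(cleaned)  # 1234
```

Scale this up to a PDF of nicely typeset tables and the second law starts to bite.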

Our gripes with the bad data practices of others lead us to impose a law unto ourselves:

Third law of data accessibility

It is a universal maxim to strive to produce our data findings both in formats good for human beings and in open formats usable by other computational tools.
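One way to honor the third law, sketched under assumptions: the same findings are written once as CSV (an open format that spreadsheets, databases, and other tools can load) and once as an aligned text table for people. The findings below are invented for illustration, and the output goes to in-memory strings; in practice you would write to files opened with `newline=""`.

```python
import csv
import io

# Hypothetical findings, invented for illustration only
findings = [
    {"city": "Springfield", "complaints": 1234},
    {"city": "Shelbyville", "complaints": 567},
]

# Machine-friendly: plain CSV, no formatting in the numbers
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["city", "complaints"])
writer.writeheader()
writer.writerows(findings)
csv_text = buffer.getvalue()

# Human-friendly: aligned columns with thousands separators
lines = [f"{'City':<12}{'Complaints':>12}"]
for row in findings:
    lines.append(f"{row['city']:<12}{row['complaints']:>12,}")
table_text = "\n".join(lines)

print(csv_text)
print(table_text)
```

Keeping the two outputs separate means neither audience has to undo formatting meant for the other.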


Learn to take, structure, and present data findings.

Data and database structures

We give you several examples of the major ways of getting, processing, and organizing data. More importantly, you will develop skills in getting help when confronted with any data format: from formal documentation and, often more importantly, from online communities such as Stack Overflow. Data structures enable us, aid communication, and constrain us. We'll be thinking about what produces those constraints and ways to hack them, to work against structures that limit what we are doing.


Like Gaul, roughly three parts

  • Data Structures and ways of getting at data
  • Making Structure from data
  • Storing and making structured data available


Often three parts to daily assignments

  • technical coding, scraping, munging exercises
  • examples of data journalism (GOOD AND BAD AND IN BETWEEN)
  • more methodological article or chapter

Formal work

  • Six weekly assignments, likely to be done in the IPython notebook
  • Final project
  • Participation
  • "Drills"

Final Project

  • build and document some creative use of tools
  • some digestion of the data, of your methods
  • web server capable of providing the data in a meaningful way to someone else

Python data structures: affordances and constraints

You'll remember from Ms. Ersatz's 8th grade algebra class assigning variables such as $x=1$, and from Ms. Candlestick's calculus class functions like $f(x)=x^2$. Using Python, we're going to assign lots of variables and make loads of functions. And it will be way more fun.

Assigning a value to a variable in Python is much like elementary algebra:

 x=1
 y='Hello, there'

In the first, x is set equal, for now, to one. In the second, y is set equal to a series of characters, something typically not seen in algebra. Such a series of characters is known as a string among computer types, and now among you. The quotation marks tell Python, "hey, a string is starting here," and then "yo, that string I mentioned, yeah, well, it's done."
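A short sketch of those two assignments, assuming Python 3 (Python 2 prints `<type 'int'>` rather than `<class 'int'>`). `type()` asks Python what kind of thing a variable holds; single and double quotes both mark a string; and "for now" means a variable can be reassigned.

```python
x = 1                # x is set, for now, to one
y = 'Hello, there'   # a string: the quotes mark where it starts and ends
z = "Hello, there"   # double quotes work just as well

print(type(x))  # <class 'int'>
print(type(y))  # <class 'str'>
print(y == z)   # True

x = 2     # "for now": assigning again replaces the old value
print(x)  # 2
```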

Once you've set a variable, you can begin operating with it.

The most basic use of Python is as a big calculator. The IPython notebook is a particularly elegant form of this calculator.
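A few calculator operations, assuming Python 3. In Python 2, which some older notebooks use, `7 / 2` between two integers gives 3 instead of 3.5.

```python
total = 2 + 3        # 5
product = 7 * 6      # 42
power = 2 ** 10      # 1024 (** is exponentiation, not ^)
quotient = 7 / 2     # 3.5 in Python 3 (true division)
floored = 7 // 2     # 3 (floor division)
remainder = 7 % 2    # 1 (modulo: what's left over)
print(total, product, power, quotient, floored, remainder)
# prints: 5 42 1024 3.5 3 1
```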

In [22]:
#set x to 1
x=1
#and try adding 2 to x
x+2
#click in this box and then press SHIFT and ENTER at the same time

The # just tells Python that the rest of the line after it is a comment, not a command.

We can 'calculate' with more than numbers. The process above works by analogy with strings, too. By a metaphorical extension, we can "add" them.

In [11]:
y='Hello, there'
y+' big bad wolf'
#click shift plus enter
'Hello, there big bad wolf'
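The metaphor stretches a little further. A sketch, assuming Python 3: `*` "multiplies" a string by repeating it, and `len()` counts its characters, spaces and punctuation included.

```python
y = 'Hello, there'
greeting = y + ' big bad wolf'  # + concatenates (joins) strings
echo = 'ha' * 3                 # * repeats a string
print(greeting)  # Hello, there big bad wolf
print(echo)      # hahaha
print(len(y))    # 12: the comma and the space count as characters
```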

What if you tried to add x, which is a number, and y, a bunch of characters?

In [16]:
y+x
TypeError                                 Traceback (most recent call last)
<ipython-input-16-140f49470644> in <module>()
----> 1 y+x

TypeError: cannot concatenate 'str' and 'int' objects

This is your first of many error messages.

It says:

TypeError: cannot concatenate 'str' and 'int' objects

In other words, Python says, "Hello! You can't add together a string and an integer. (Duh.)"

Generally this is seen as positive: Python prevents us from making certain kinds of mistakes and thus helps us write better code.
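Rather than guessing what we meant, Python makes us say it. A sketch, assuming Python 3 (the exact wording of the error message differs from the Python 2 message above): catch the error, then convert one value explicitly with `str()` or `int()`.

```python
x = 1
y = 'Hello, there'

try:
    y + x
except TypeError as err:
    # Python refuses to guess whether we meant text or arithmetic
    print('TypeError:', err)

# Say explicitly what we mean by converting one value's type
as_text = y + ' ' + str(x)   # number -> string, then concatenate
as_number = int('41') + x    # string of digits -> integer, then add
print(as_text)    # Hello, there 1
print(as_number)  # 42
```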

Like other programming languages, Python has different data types: some are good for integers, some for text and text-like stuff, and some for collections of data. They are artificial kinds. They let us do lots of things, save us from doing some things, and prevent us from doing others.
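A quick tour of a few built-in types, assuming Python 3, with values borrowed from the song above. Each type affords some operations and forbids others: lists are indexed by position, dictionaries by key, and asking a list for a key raises a `TypeError`.

```python
year = 1979                                   # int: whole numbers
ratio = 1.5                                   # float: numbers with a fractional part
title = 'Video Killed the Radio Star'         # str: text
words = title.lower().split()                 # list: an ordered collection
song = {'band': 'The Buggles', 'year': 1979}  # dict: labels (keys) mapped to values

for value in (year, ratio, title, words, song):
    print(type(value).__name__, value)

# Affordances...
print(words[1])      # killed (positions count from 0)
print(song['year'])  # 1979 (looked up by key)

# ...and constraints: a list can't be looked up by a label
try:
    words['band']
except TypeError as err:
    print('TypeError:', err)
```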

In [21]:
#let's try adding a non-integer number
x+2.5
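Non-integer numbers (floats) are another artificial kind: they are stored as binary approximations, so some decimal arithmetic comes out slightly off. A sketch, assuming Python 3, using the standard library's `math.isclose` to compare floats with a tolerance instead of exact equality.

```python
import math

x = 1
print(x + 2.5)       # 3.5: an int plus a float gives a float

total = 0.1 + 0.2
print(total)         # 0.30000000000000004
print(total == 0.3)  # False: 0.1 and 0.2 have no exact binary representation
print(math.isclose(total, 0.3))  # True: compare within a tolerance instead
```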