Homework assignment 2

To turn in this assignment, use the same method we used last week. (Download a copy of this notebook, fill in the blanks, and e-mail it to Dan.)

Problem set 1: Working with dictionaries

In the following code cell, I've made a dictionary mapping the names of several states to their capitals, called state_capitals.

In [ ]:
state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}
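
As a quick refresher, here's a small sketch of the two dictionary operations this problem set leans on: looking up a value with square brackets, and counting keys with len(). The fruit_colors dictionary is made up for illustration and isn't part of the assignment. The code is written so it should run under both Python 2 and Python 3.

```python
# Hypothetical example data, unrelated to state_capitals.
fruit_colors = {'banana': 'yellow', 'cherry': 'red', 'lime': 'green'}

# Square brackets look up the value stored under a key.
cherry_color = fruit_colors['cherry']

# len() counts the key/value pairs in the dictionary.
fruit_count = len(fruit_colors)

print(cherry_color)
print(fruit_count)
```

Asking for a key that isn't in the dictionary (say, fruit_colors['grape']) raises a KeyError, so check your spelling and capitalization if you see one.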

In the blank below, write an expression that evaluates to Juneau, using square brackets to get a value from the dictionary.

In [ ]:

Now, write an expression that evaluates to the number of keys in the dictionary.

In [ ]:

In the following code cell, I've made a list of strings and assigned it to a variable called cheeses:

In [ ]:
cheeses = ["cheddar", "emmental", "gouda", "brie", "camembert"]

In the blank below, I've provided the skeleton of a for loop. Replace the ??? in the for loop with a statement that will cause the for loop to fill in the blank dictionary cheese_name_lengths, such that the dictionary has a key for every string in the cheeses list, and each key maps to a value that is the length of that string. The final line of the code compares cheese_name_lengths to the known correct value for the dictionary; when you run the code cell, it should print out True.

In [ ]:
cheese_name_lengths = {}
for cheese in cheeses:
    ???
print cheese_name_lengths == {'emmental': 8, 'gouda': 5, 'cheddar': 7, 'brie': 4, 'camembert': 9}
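
If the pattern of filling a dictionary inside a for loop is unfamiliar, here's a sketch of the same idea on different data. It maps each string in a list to its first letter rather than to its length, so it's a hint, not the answer. The names animals and first_letters are invented for this example.

```python
# Hypothetical example data, unrelated to the cheeses list.
animals = ['otter', 'ibis', 'lynx']

# Start with an empty dictionary, then add one key per list item.
first_letters = {}
for animal in animals:
    # Assigning to dictionary[key] creates the key if it doesn't exist yet.
    first_letters[animal] = animal[0]

print(first_letters == {'otter': 'o', 'ibis': 'i', 'lynx': 'l'})
```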

Problem set 2: the New York Times API

This one is tough, but I have faith in you. You're smart, and capable, and the outfit you're wearing for doing homework in is great.

Get a key for the Campaign Finance API. Write a Python program in the cell below that calculates and prints out the total dollar amount of presidential campaign contributions from contributors in New York state, to any candidate, in the 2012 election cycle. (Hint: Use the Presidential State/Zip URI structure. Make use of the API tool as appropriate.) I've already filled in the appropriate import statements for you.

In [9]:
import urllib
import json

# your code here!
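
Once you have a response from the API, most of the work is walking through the parsed JSON and adding numbers up. Here's a rough sketch of that step alone. Be warned that the JSON string and its field names ("results", "amount") are invented for this example and are not the real Campaign Finance API response format; use the API tool to see what the actual response looks like.

```python
import json

# Hypothetical response body. These field names are made up for illustration
# and are NOT taken from the real API.
sample_response = '{"results": [{"name": "A", "amount": "10.50"}, {"name": "B", "amount": "4.25"}]}'

# json.loads() turns a JSON string into Python dictionaries and lists.
data = json.loads(sample_response)

# Dollar amounts stored as strings need converting before they can be summed.
total = sum(float(item["amount"]) for item in data["results"])

print(total)
```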

Problem set 3: Working with strings

In the cell below, I've created a list of strings and assigned it to a variable capitalize_me.

In [10]:
capitalize_me = ['an abacus', 'bitter beefsteak', 'comfy culottes']

In the following blank code cell, write a short program (or a single expression!) that evaluates to another list, containing copies of these strings with their first letter capitalized. In other words, your filled-in code cell should display this when you run it:

['An abacus', 'Bitter beefsteak', 'Comfy culottes']

Use string slices and the .upper() method in your solution.
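
In case slices and .upper() are still fuzzy, here's a sketch that combines them to do something slightly different: capitalizing the last letter of each string instead of the first. The list words is made up for this example.

```python
# Hypothetical example data, unrelated to capitalize_me.
words = ['kiwi', 'plum']

# s[:-1] is everything except the last character; s[-1] is the last character.
# .upper() returns an uppercase copy and leaves the original string unchanged.
loud_endings = [s[:-1] + s[-1].upper() for s in words]

print(loud_endings)
```

The same shape (one slice, plus an uppercased piece, inside a list comprehension) should get you most of the way to the exercise.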

In [12]:

Problem set 4: Regular expressions

We're going to work with the Enron e-mail subject lines in this problem set. Make sure you have a copy of the corpus downloaded to your machine by running the following code cell:

In [14]:
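import urllib
# download the corpus and save it next to this notebook
urllib.urlretrieve("https://raw.githubusercontent.com/ledeprogram/courses/master/databases/data/enronsubjects.txt", "enronsubjects.txt")
# one list item per subject line, with surrounding whitespace removed
subjects = [x.strip() for x in open("enronsubjects.txt").readlines()]
# the entire file as a single string
all_subjects = open("enronsubjects.txt").read()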

The variable subjects now contains a list, with each item in the list being a string that has a single subject line in it. The all_subjects variable contains a big string with all of the subject lines in it.

In the following cell, write a list comprehension that evaluates to a list of all subject lines that contain a US phone number (i.e., in the format 555-555-1212). Use the re.search() function to accomplish this task. (Hint: there should be 28 of them.) I've included the appropriate import statement for you.
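
Here's a sketch of the general shape, using a different pattern so it doesn't give the answer away: it keeps only the strings that contain a five-digit number (like a ZIP code). The list notes is invented for this example.

```python
import re

# Hypothetical example data, unrelated to the Enron subject lines.
notes = ['Ship to 10027 by Friday', 'Lunch?', 'Office moved to 94110']

# re.search() returns a match object if the pattern occurs anywhere in the
# string and None otherwise, so it works directly as a filter condition.
# \b marks a word boundary; \d{5} matches exactly five digits.
with_zip = [n for n in notes if re.search(r'\b\d{5}\b', n)]

print(with_zip)
```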

In [18]:
import re

Now use the re.findall() function to create an expression that evaluates to a list of just the phone numbers.
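
As a hint, here's how re.findall() behaves with the same kind of five-digit pattern, run against one big string instead of a list. The string text is made up for this example.

```python
import re

# Hypothetical example data: several lines joined into one string.
text = "Ship to 10027 by Friday\nLunch?\nOffice moved to 94110"

# re.findall() scans the whole string and returns every non-overlapping
# match as a list of strings.
zips = re.findall(r'\b\d{5}\b', text)

print(zips)
```

One thing to watch for: if your pattern contains parentheses, re.findall() returns only the parenthesized parts of each match instead of the whole match.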

In [ ]: