Homework assignment 2

To turn in this assignment, follow the same procedure as last week: download a copy of this notebook, fill in the blanks, and e-mail it to Dan.

Problem set 1: Working with dictionaries

In the following code cell, I've made a dictionary mapping the names of several states to their capitals, called state_capitals.

In [1]:
state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}

In the blank below, write an expression that evaluates to Juneau, using square brackets to get a value from the dictionary.

In [2]:
state_capitals['Alaska']
Out[2]:
'Juneau'

Now, write an expression that evaluates to the number of keys in the dictionary.

In [3]:
len(state_capitals.keys())
Out[3]:
3
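
A side note on counting keys: calling len() on the dictionary itself gives the same count, so the .keys() call is optional. Here is a minimal sketch reusing state_capitals; the names key_count_direct and key_count_via_keys are just illustrative.

```python
state_capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix'}

# len() on a dictionary counts its keys, so both expressions agree.
key_count_direct = len(state_capitals)
key_count_via_keys = len(state_capitals.keys())

print(key_count_direct == key_count_via_keys)
```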

In the following code cell, I've made a list of strings and assigned it to a variable called cheeses:

In [5]:
cheeses = ["cheddar", "emmental", "gouda", "brie", "camembert"]

In the blank below, I've provided the skeleton of a for loop. Replace the ??? in the for loop with a statement that will cause the for loop to fill in the blank dictionary cheese_name_lengths, such that the dictionary has a key for every string in the cheeses list, and each key maps to a value that is the length of that string. The final line of the code compares cheese_name_lengths to the known correct value for the dictionary; when you run the code cell, it should print out True.

In [6]:
cheese_name_lengths = {}
for cheese in cheeses:
    cheese_name_lengths[cheese] = len(cheese)
print(cheese_name_lengths == {'emmental': 8, 'gouda': 5, 'cheddar': 7, 'brie': 4, 'camembert': 9})
True
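
For comparison, the same dictionary can probably be built in one expression with a dictionary comprehension. This is an alternative sketch, not the form the exercise asks for (the exercise wants the for loop filled in).

```python
cheeses = ["cheddar", "emmental", "gouda", "brie", "camembert"]

# Each cheese name becomes a key; its value is the length of the name.
cheese_name_lengths = {cheese: len(cheese) for cheese in cheeses}

print(cheese_name_lengths == {'emmental': 8, 'gouda': 5, 'cheddar': 7, 'brie': 4, 'camembert': 9})
```

The comprehension and the loop produce equal dictionaries; the loop version is easier to extend if you later need more than one statement per item.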

Problem set 2: the New York Times API

This one is tough, but I have faith in you. You're smart, and capable, and the outfit you're wearing to do your homework is great.

Get a key for the Campaign Finance API. Write a Python program in the cell below that calculates and prints out the total dollar amount of presidential campaign contributions from contributors in New York state, to any candidate, in the 2012 election cycle. (Hint: Use the Presidential State/Zip URI structure. Make use of the API tool as appropriate.) I've already filled in the appropriate import statements for you.

In [9]:
import urllib
import json

api_key = "your api key here"

url = "http://api.nytimes.com/svc/elections/us/v3/finances/2012/president/states/NY.json?api-key=" + api_key
response_str = urllib.urlopen(url).read()
response_dict = json.loads(response_str)

sum([float(rec['total']) for rec in response_dict['results']])
Out[9]:
19022925.18
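
If you want to check the summing step without calling the API, the sketch below pulls it into a small function and runs it on a stand-in response. The names total_contributions and sample_response are hypothetical, and the two dollar amounts are invented placeholders (not real contribution data); only the 'results' and 'total' keys come from the solution above. Storing the totals as strings is also an assumption.

```python
import json

def total_contributions(response_dict):
    """Add up the 'total' field of every record under 'results'."""
    return sum(float(rec['total']) for rec in response_dict['results'])

# Invented placeholder data with the same shape the solution indexes into.
sample_response = json.loads('{"results": [{"total": "100.50"}, {"total": "25.25"}]}')

print(total_contributions(sample_response))
```

float() is applied to each value so the function works whether a total arrives as a string or as a number.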

Problem set 3: Working with strings

In the cell below, I've created a list of strings and assigned it to a variable capitalize_me.

In [11]:
capitalize_me = ['an abacus', 'bitter beefsteak', 'comfy culottes']

In the following blank code cell, write a short program (or a single expression!) that evaluates to another list, containing copies of these strings with their first letter capitalized. In other words, your filled-in code cell should display this when you run it:

['An abacus', 'Bitter beefsteak', 'Comfy culottes']

Use string slices and the .upper() method in your solution.

In [12]:
[s[0].upper() + s[1:] for s in capitalize_me]
Out[12]:
['An abacus', 'Bitter beefsteak', 'Comfy culottes']
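
You might wonder why the exercise asks for slices and .upper() rather than the built-in .capitalize() method. One reason, sketched below, is that .capitalize() also lowercases the rest of the string. The helper name capitalize_first and the test string 'an ABACUS' are my own illustrations, not part of the assignment.

```python
capitalize_me = ['an abacus', 'bitter beefsteak', 'comfy culottes']

def capitalize_first(s):
    # s[:1] (rather than s[0]) avoids an IndexError on an empty string.
    return s[:1].upper() + s[1:]

capitalized = [capitalize_first(s) for s in capitalize_me]
print(capitalized)

# .capitalize() changes the rest of the string; the slice version does not.
print('an ABACUS'.capitalize())
print(capitalize_first('an ABACUS'))
```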

Problem set 4: Regular expressions

We're going to work with the Enron e-mail subject lines in this problem set. Make sure you have a copy of the corpus downloaded to your machine by running the following code cell:

In [18]:
import urllib
urllib.urlretrieve("https://raw.githubusercontent.com/ledeprogram/courses/master/databases/data/enronsubjects.txt", "enronsubjects.txt")
subjects = [x.strip() for x in open("enronsubjects.txt").readlines()]
all_subjects = open("enronsubjects.txt").read()

The variable subjects now contains a list, with each item in the list being a string that has a single subject line in it. The all_subjects variable contains a big string with all of the subject lines in it.

In the following cell, write a list comprehension that evaluates to a list of all subject lines that contain a US phone number (i.e., in the format 555-555-1212). Use the re.search() function to accomplish this task. (Hint: there should be 28 of them.) I've included the appropriate import statement for you.

In [16]:
import re
[subj for subj in subjects if re.search(r"\d\d\d-\d\d\d-\d\d\d\d", subj)]
Out[16]:
['Call Chris 713-853-4743',
 "FW: Birgit's Contact Info: 713-222-7667",
 "Birgit's Contact Info: 713-222-7667",
 "RE: Birgit's Contact Info: 713-222-7667",
 "Birgit's Contact Info: 713-222-7667",
 "RE: Birgit's Contact Info: 713-222-7667",
 "RE: Birgit's Contact Info: 713-222-7667",
 "RE: Birgit's Contact Info: 713-222-7667",
 "FW: Birgit's Contact Info: 713-222-7667",
 "RE: Birgit's Contact Info: 713-222-7667",
 "RE: Birgit's Contact Info: 713-222-7667",
 'Terry 281-296-0573',
 'Re: 713-851-2499',
 "FW: Mark's number is 713-345-7896",
 "RE: Mark's number is 713-345-7896",
 "Mark's number is 713-345-7896",
 "RE: Mark's number is 713-345-7896",
 "Mark's number is 713-345-7896",
 'Re: Fw: KU Calendar please call for map 1-281-367-8953 or',
 'Bill F 713-528-0759',
 'Call Jonathon Fairbanks 713-850-9002w/713-703-8294c and Freddy',
 'Call Ken Kirk re CGAS lawsuit 614-888-9588',
 'Call Alisa Johnston at Dynegy 713-767-8686 re Debbie Chance',
 'Re: Set up meeting w/Teldata /Tracy Ashmore-303-571-6135',
 'Re: Kaye Ellis - 281-537-9334 (home)',
 'Interconnection Issues Discussion Paper Conf Call 1-800-937-6563,',
 "Tentative: EPSA Cost/Benefit Analysis MEETING  Julie Simon to support FERC's RTO policies.  Dial 1-800-937-6563 and ask for the Julie Simon/EPSA call.",
 'CA Pacific NW Refund Conf Call (Alvarez) 1-888-296-1938, HC:']

Now use the re.findall() function to create an expression that evaluates to a list of just the phone numbers.

In [17]:
re.findall(r"\d\d\d-\d\d\d-\d\d\d\d", all_subjects)
Out[17]:
['713-853-4743',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '713-222-7667',
 '281-296-0573',
 '713-851-2499',
 '713-345-7896',
 '713-345-7896',
 '713-345-7896',
 '713-345-7896',
 '713-345-7896',
 '281-367-8953',
 '713-528-0759',
 '713-850-9002',
 '713-703-8294',
 '614-888-9588',
 '713-767-8686',
 '303-571-6135',
 '281-537-9334',
 '800-937-6563',
 '800-937-6563',
 '888-296-1938']
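
Notice that re.findall() returns 29 numbers while the re.search() list has 28 subject lines: one subject line contains two phone numbers. The sketch below shows the same effect on a few lines copied from the output above, plus one made-up line with no number. It also writes the pattern with {n} repetition counts and compiles it once; both are optional stylistic choices, and the variable names are illustrative.

```python
import re

# \d{3} means "exactly three digits", equivalent to \d\d\d.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

sample_subjects = [
    'Call Chris 713-853-4743',
    'Call Jonathon Fairbanks 713-850-9002w/713-703-8294c and Freddy',
    'Terry 281-296-0573',
    'no phone number here',  # made-up line, not from the corpus
]

# search() answers "does this line contain a match?", one result per line.
matching_lines = [s for s in sample_subjects if phone_pattern.search(s)]

# findall() returns every match, so a line with two numbers contributes two items.
numbers = phone_pattern.findall("\n".join(sample_subjects))

print(len(matching_lines))
print(numbers)
```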