Homework Assignment #6

These problem sets focus on MongoDB and Tornado.

Problem set #1: Humongous soup

For this first problem set, we're going to build on your solution to Problem Set #2 ("Of Widgets and Pandas") in last week's homework assignment. Specifically, we're going to be working with the widgets listed on this page. You'll be creating a MongoDB database that has a document for every listed widget. But first, in the cell below, connect to your local MongoDB instance and create a variable named collection that points to a collection called widgets in the lede_program database. I've written the appropriate import statements for you, and included a line at the end that prints the full name of the collection (i.e., its database name plus the name of the collection). The cell's output should be the string lede_program.widgets.

In [17]:
import pymongo

# your code here
conn = pymongo.MongoClient("localhost")
db = conn['lede_program']
collection = db['widgets']
# end your code

print collection.full_name
lede_program.widgets

In the cell below, write a Python statement that will remove all documents from this collection. We want to start fresh! I've left in a line that prints out the number of records in the collection. This number should be 0.

In [18]:
# your code here
collection.remove()
# end your code
print collection.count()
0

Great! Now, the tough part. In the cell below, duplicate the code in the second code cell from Problem Set #2 in Assignment #5. There should be one key difference in your code, however: rather than creating an empty list and appending each document to it, you should insert each document into the widgets collection. After you've executed the code, evaluating the expression list(collection.find()) should produce something like this (your ObjectId values will be different):

[{u'_id': ObjectId('53b4c0c92735fe3ff2977816'),
  u'partno': u'C1-9476',
  u'price': 2.7,
  u'quantity': 512,
  u'widgetname': u'Skinner Widget'},
 {u'_id': ObjectId('53b4c0c92735fe3ff2977817'),
  u'partno': u'JDJ-32/V',
  u'price': 9.36,
  u'quantity': 967,
  u'widgetname': u'Widget For Furtiveness'},
  ... some widgets omitted for brevity ...
 {u'_id': ObjectId('53b4c0c92735fe3ff297781e'),
  u'partno': u'5B-941/F',
  u'price': 13.26,
  u'quantity': 919,
  u'widgetname': u'Widget For Cinema'}]

(Hint: Pay attention to types! Make sure that price is a floating-point number and quantity is an integer when inserting the document into MongoDB.) I've included some scaffolding for you, including the Beautiful Soup import statement and the code to fetch the contents of widgets.html into a variable called html_str. I've also included, at the very end, the expression to show all documents in the collection.
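
As a rough illustration of that hint, here's a minimal sketch of the conversions. It assumes the price text carries a leading dollar sign (something like $2.70) and that the quantity is plain digits. The raw dictionary and the parse_price and parse_quantity helpers are made up for this example and aren't part of the assignment scaffolding.

```python
def parse_price(text):
    """Turn a price string such as '$2.70' into a float."""
    # Assumption: the only non-numeric character is a leading dollar sign.
    return float(text.strip().lstrip("$"))


def parse_quantity(text):
    """Turn a quantity string such as '512' into an int."""
    return int(text.strip())


# Hypothetical raw cell text, as it might come out of one table row.
raw = {"partno": "C1-9476", "widgetname": "Skinner Widget",
       "price": "$2.70", "quantity": "512"}

# Build the document with the types MongoDB should store.
widget = {
    "partno": raw["partno"],
    "widgetname": raw["widgetname"],
    "price": parse_price(raw["price"]),
    "quantity": parse_quantity(raw["quantity"]),
}
```

Converting before the insert matters because MongoDB stores whatever type it's given: a price left as a string would sort alphabetically rather than numerically.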

In [19]:
from bs4 import BeautifulSoup
import urllib

html_str = urllib.urlopen("http://static.decontextualize.com/widgets.html").read()

# your code here
document = BeautifulSoup(html_str, "html.parser")
tr_tags = document.find_all("tr", attrs={'class': 'widgetinfo'})
# pair each cell's class name with the function that converts its text
converters = (("partno", str), ("widgetname", str),
              ("price", lambda x: float(x[1:])),  # skip the character before the number
              ("quantity", int))
for tr_tag in tr_tags:
    widget_dict = {}
    for field, convert in converters:
        tag = tr_tag.find("td", attrs={'class': field})
        widget_dict[field] = convert(tag.string)
    collection.insert(widget_dict)
# end your code

list(collection.find())
Out[19]:
[{u'_id': ObjectId('53b4d4002735fe3ff297781f'),
  u'partno': u'C1-9476',
  u'price': 2.7,
  u'quantity': 512,
  u'widgetname': u'Skinner Widget'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977820'),
  u'partno': u'JDJ-32/V',
  u'price': 9.36,
  u'quantity': 967,
  u'widgetname': u'Widget For Furtiveness'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977821'),
  u'partno': u'YP4-325/J',
  u'price': 5.17,
  u'quantity': 787,
  u'widgetname': u'Widget For Strawman'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977822'),
  u'partno': u'VK-486',
  u'price': 8.97,
  u'quantity': 441,
  u'widgetname': u'Manicurist Widget'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977823'),
  u'partno': u'R4K-990',
  u'price': 11.73,
  u'quantity': 320,
  u'widgetname': u'Infinite Widget'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977824'),
  u'partno': u'MZ-556/B',
  u'price': 2.35,
  u'quantity': 948,
  u'widgetname': u'Yellow-Tipped Widget'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977825'),
  u'partno': u'QV-730',
  u'price': 3.76,
  u'quantity': 59,
  u'widgetname': u'Unshakable Widget'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977826'),
  u'partno': u'T1-9731',
  u'price': 7.11,
  u'quantity': 790,
  u'widgetname': u'Self-Knowledge Widget'},
 {u'_id': ObjectId('53b4d4002735fe3ff2977827'),
  u'partno': u'5B-941/F',
  u'price': 13.26,
  u'quantity': 919,
  u'widgetname': u'Widget For Cinema'}]

Nice. Your work is how I like my burgers: well done. (It's also how I like my steak: rare, and of the highest quality.)

Finally, in the cell below, write an expression that checks to ensure that the number of documents in the collection is equal to the number of widgets in widgets.html. The cell should contain a single expression that evaluates to True.

In [20]:
len(document.find_all("tr", attrs={'class': 'widgetinfo'})) == collection.count()
Out[20]:
True
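
A side note on the three collection calls used in this problem set. Newer PyMongo releases replace remove(), insert() and count() with delete_many(), insert_one() and count_documents(). Here's a hedged sketch of the same empty, load and count sequence written against those newer names. The reload_collection helper and the InMemoryCollection stand-in are hypothetical; the stand-in exists only so the sketch runs without a MongoDB server.

```python
def reload_collection(collection, documents):
    """Empty `collection`, insert `documents`, and return the new count."""
    collection.delete_many({})              # newer name for remove()
    for doc in documents:
        collection.insert_one(dict(doc))    # newer name for insert(doc)
    return collection.count_documents({})   # newer name for count()


class InMemoryCollection(object):
    """Hypothetical stand-in that mimics only the three methods used above.

    It ignores the filter argument, so it is only valid for an empty filter.
    """

    def __init__(self):
        self.docs = []

    def delete_many(self, query):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)

    def count_documents(self, query):
        return len(self.docs)


sample = [{"partno": "C1-9476", "quantity": 512},
          {"partno": "JDJ-32/V", "quantity": 967}]
fake = InMemoryCollection()
count = reload_collection(fake, sample)
```

With a real server you'd pass pymongo.MongoClient("localhost")["lede_program"]["widgets"] in place of the stand-in.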

Problem set #2: An inquiry into the nature of widgets

This problem set focuses on exercising your ability to write expressions with PyMongo that filter, limit, and sort lists of documents in a MongoDB collection, using the .find(), .sort() and .limit() methods.

First problem. In the cell below, write a statement that performs a MongoDB query returning a list containing one document: the least expensive widget in the catalog. I.e., your code, when run, should evaluate to this (keeping in mind that your ObjectId will be different):

[{u'_id': ObjectId('53b4c0c92735fe3ff297781b'),
  u'partno': u'MZ-556/B',
  u'price': 2.35,
  u'quantity': 948,
  u'widgetname': u'Yellow-Tipped Widget'}]

In [21]:
list(collection.find().sort("price").limit(1))
Out[21]:
[{u'_id': ObjectId('53b4d4002735fe3ff2977824'),
  u'partno': u'MZ-556/B',
  u'price': 2.35,
  u'quantity': 948,
  u'widgetname': u'Yellow-Tipped Widget'}]
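
If it helps to see what .sort("price").limit(1) is asking for, here's an in-memory analogue in plain Python. It's only an illustration of the result, not a description of how MongoDB runs the query; the cheapest and priciest helpers and the three-item widgets list are made up for this example.

```python
# A small sample of the catalog, reduced to two fields.
widgets = [
    {"partno": "C1-9476", "price": 2.7},
    {"partno": "MZ-556/B", "price": 2.35},
    {"partno": "5B-941/F", "price": 13.26},
]


def cheapest(docs, n=1):
    """Return the n lowest-priced documents (ascending sort, then limit)."""
    return sorted(docs, key=lambda d: d["price"])[:n]


def priciest(docs, n=1):
    """Return the n highest-priced documents (descending sort, then limit)."""
    return sorted(docs, key=lambda d: d["price"], reverse=True)[:n]
```

The descending version corresponds to passing pymongo.DESCENDING as the second argument to .sort().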

Now, in the cell below, write an expression that returns a list of widget documents where the quantity of available widgets is greater than 900. These documents should include only a subset of the available fields, namely partno and quantity. Your code, when run, should evaluate to this (the _id field is excluded, so there are no ObjectIds to differ this time; the order of documents in the list might be different, though):

[{u'partno': u'JDJ-32/V', u'quantity': 967},
 {u'partno': u'MZ-556/B', u'quantity': 948},
 {u'partno': u'5B-941/F', u'quantity': 919}]

In [22]:
list(collection.find({"quantity": {"$gt": 900}}, {"_id": 0, "partno": 1, "quantity": 1}))
Out[22]:
[{u'partno': u'JDJ-32/V', u'quantity': 967},
 {u'partno': u'MZ-556/B', u'quantity': 948},
 {u'partno': u'5B-941/F', u'quantity': 919}]
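
The two arguments to .find() do different jobs: the first selects which documents come back, the second selects which fields each one keeps. Here's a plain-Python sketch of that split, under the assumption that every document has a quantity field. The find_in_memory helper and the sample list are hypothetical and only mirror this one query.

```python
# A handful of the widgets, with values taken from the listing above.
sample = [
    {"partno": "C1-9476", "price": 2.7, "quantity": 512},
    {"partno": "JDJ-32/V", "price": 9.36, "quantity": 967},
    {"partno": "MZ-556/B", "price": 2.35, "quantity": 948},
    {"partno": "QV-730", "price": 3.76, "quantity": 59},
    {"partno": "5B-941/F", "price": 13.26, "quantity": 919},
]


def find_in_memory(docs, min_quantity, fields):
    """Keep documents with quantity > min_quantity, reduced to `fields`."""
    return [
        dict((name, doc[name]) for name in fields)   # the projection step
        for doc in docs
        if doc["quantity"] > min_quantity            # the {"$gt": ...} filter step
    ]


plentiful = find_in_memory(sample, 900, ["partno", "quantity"])
```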

Cool. Finally, in the cell below, write an expression that returns a list of widget documents where the word "Widget" occurs at the end of the widgetname string. Use the $regex query selector. The documents in the list should include only the widgetname field, and should be sorted by the widgetname field. I.e., your code, when run, should evaluate to this (again, with the _id field excluded):

[{u'widgetname': u'Infinite Widget'},
 {u'widgetname': u'Manicurist Widget'},
 {u'widgetname': u'Self-Knowledge Widget'},
 {u'widgetname': u'Skinner Widget'},
 {u'widgetname': u'Unshakable Widget'},
 {u'widgetname': u'Yellow-Tipped Widget'}]

In [23]:
list(collection.find({"widgetname": {"$regex": "Widget$"}}, {"_id": 0, "widgetname": 1}).sort("widgetname"))
Out[23]:
[{u'widgetname': u'Infinite Widget'},
 {u'widgetname': u'Manicurist Widget'},
 {u'widgetname': u'Self-Knowledge Widget'},
 {u'widgetname': u'Skinner Widget'},
 {u'widgetname': u'Unshakable Widget'},
 {u'widgetname': u'Yellow-Tipped Widget'}]
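
You can sanity-check a pattern like Widget$ with Python's own re module before handing it to MongoDB. MongoDB's regex engine isn't the same as Python's, so treat this as a quick check rather than a guarantee, though for a simple end-of-string anchor like this one the two should agree. The names list below just repeats the widget names from the listing above.

```python
import re

names = [
    "Skinner Widget", "Widget For Furtiveness", "Widget For Strawman",
    "Manicurist Widget", "Infinite Widget", "Yellow-Tipped Widget",
    "Unshakable Widget", "Self-Knowledge Widget", "Widget For Cinema",
]

# "$" anchors the match to the end of the string, so names that merely
# start with "Widget" are skipped.
ends_with_widget = re.compile(r"Widget$")

matches = sorted(name for name in names if ends_with_widget.search(name))
```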

Problem set #3: It's a twister!

In this problem set, you'll make a very simple web API with Tornado.

This problem set works a little bit differently from the others! You'll be pasting into the cell below a program that you've written elsewhere. (As discussed in class, it's difficult to run a Tornado application inside of IPython Notebook.)

Here's how your web API should work. A request to the resource /oz should return the following response (as a JSON string):

{"result": "Toto, I've a feeling we're not in Kansas anymore."}

If the parameters pet and place are included in the query string, then the string in the response should include the strings specified as the values of those keys in place of Toto and Kansas, respectively. For example, the following request with curl, assuming your web service is running on localhost port 8000...

curl -s "http://localhost:8000/oz?pet=Fluffy&place=Brooklyn"

... should print the following response:

{"result": "Fluffy, I've a feeling we're not in Brooklyn anymore."}

I've included the basic framework for the application for you. The part you need to fill in is in the definition of the get() method.

In [ ]:
import tornado.httpserver
import tornado.ioloop
import tornado.options
import tornado.web

from tornado.options import define, options

define("port", default=8000, help="run on the given port", type=int)
tornado.options.parse_command_line()

class OzHandler(tornado.web.RequestHandler):
  def get(self):
    # your code here!
    pet = self.get_argument("pet", "Toto")
    place = self.get_argument("place", "Kansas")
    self.write({"result": pet + ", I've a feeling we're not in " + place + " anymore."})
    # end your code

application = tornado.web.Application(handlers=[(r"/oz", OzHandler)])
http_server = tornado.httpserver.HTTPServer(application)
http_server.listen(options.port)
tornado.ioloop.IOLoop.instance().start()
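
Since the Tornado program can't easily be run in the notebook, here's a hedged sketch that checks the same default-and-substitute logic using only the standard library. The oz_response function is hypothetical and isn't part of the Tornado application; it parses a raw query string itself instead of going through a request handler.

```python
import json

try:
    from urllib.parse import parse_qs  # Python 3
except ImportError:
    from urlparse import parse_qs      # Python 2


def oz_response(query_string):
    """Build the JSON body that /oz should return for a raw query string."""
    params = parse_qs(query_string)
    # parse_qs maps each key to a list of values. Take the last one, which
    # mirrors what get_argument does when a key is repeated; fall back to
    # the default when the key is missing.
    pet = params.get("pet", ["Toto"])[-1]
    place = params.get("place", ["Kansas"])[-1]
    sentence = "{0}, I've a feeling we're not in {1} anymore.".format(pet, place)
    return json.dumps({"result": sentence})


default_body = oz_response("")
custom_body = oz_response("pet=Fluffy&place=Brooklyn")
```

Keeping the string-building separate from the handler like this makes it easy to test the behavior before starting the server and reaching for curl.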

Great job! I enjoyed writing this homework assignment and I hope you enjoyed completing it.