It's about time for us to stop talking about data and start talking about databases. In particular, we're going to talk about database software called MongoDB. MongoDB is versatile database software, widely used for many kinds of applications. It's especially good for working with large amounts of data, and with data that needs to be used at scale (i.e., by many users simultaneously).
The purpose of this tutorial is to show you the basics of working with MongoDB. We'll cover how to insert documents into a MongoDB database and how to get lists of documents back from the database that match a particular set of criteria. This is barely scratching the surface of MongoDB's potential! But hopefully by the end of this session you'll understand enough about how MongoDB works to explore its more advanced features and applications on your own.
It seems like we've been doing fine so far in class just working with CSV files and web APIs. So what exactly is a database, and why might we need one? For our purposes, we can define a "database" as software whose main job is to let us store data somewhere and retrieve it later, usually in a way that pays attention to the structure of the data itself. We haven't used "databases" per se in class, opting instead to download data from CSVs and web APIs straight into our Python programs, then discard our local version of the data when we're done with it. There are several reasons we might want to put this data into a database instead:
Persistence. With the programs we've written so far in our notebooks, we download data, process it into a form that we like, draw conclusions from it and then... it disappears forever, once we close the notebook. To get that data again, we have to download and process it again, from scratch. This is fine for small amounts of data, but with larger amounts it can be very time-consuming. Having separate database software allows us to store our data in a way that persists from one notebook session to the next. Very convenient.
Sharing. Another problem with downloading and processing data on demand is that it's difficult for us to share the result of our data processing with other people. The data exists in our IPython notebook and nowhere else---there's no easy way to let someone else access it. A database like MongoDB, on the other hand, can be used by many people simultaneously. It's also easy to create a "dump" of a MongoDB database's contents and send it to a colleague, who can then reconstruct the data on their own server with a minimum amount of hassle.
Performance. Many databases, like MongoDB, boast features (like indexing, aggregation, and map-reduce) that can make accessing and processing data very fast, often faster than doing the same work in Python on our own.
MongoDB is "client/server" software, which means that the software itself runs on a server somewhere, and various clients on other computers can access it. The clients each talk to the server over the network, using a protocol specific to MongoDB. (Most database software works like this, but there are some exceptions, like SQLite, which works with files stored locally on your machine.)
We're going to write our "client" software in IPython Notebook, using a library called pymongo. The pymongo library gives us an easy way to write Python code that opens a network connection to the server, sends it commands using the MongoDB protocol, and interprets the results that come back.
The pymongo library is pre-installed on your EC2 instance, but you can install it (using pip) on other machines like so:
pip install pymongo
As a quick note: in this class, the "server" software (MongoDB itself) and the "client" software (the Python code running in your notebook) both live on the same machine (i.e., your EC2 server). When you see the word localhost below, that's what it means: localhost is a special word in Internet talk that means "connect to the same server that I'm running on." Other than that word, though, everything you'll learn here applies to connecting to MongoDB on non-localhost servers.
MongoDB is a "document-based" database. MongoDB "documents" are essentially Python dictionaries: lists of key/value pairs that describe some particular thing. Documents are stored in a structure called a "collection," which is a lot like a list of dictionaries in Python. Most of the work we do in MongoDB will be adding documents to a collection, and then asking that collection to return documents that match particular criteria.
Collections themselves are grouped into "databases," and each MongoDB server can support multiple databases.
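If it helps to picture this hierarchy in terms of Python structures we already know, here's a rough plain-Python analogy (not pymongo code, and no MongoDB involved): a server holds named databases, a database holds named collections, and a collection holds documents. The names server, lede_program, and kittens are just illustrative.

```python
# A plain-Python analogy for MongoDB's hierarchy (illustration only):
# server -> databases -> collections -> documents
server = {
    "lede_program": {                      # a database, looked up by name
        "kittens": [                       # a collection: roughly a list of dictionaries
            {"name": "Fluffy", "favorite_color": "chartreuse", "lbs": 9.5},      # a document
            {"name": "Grandpa Pants", "favorite_color": "mauve", "lbs": 14.1},   # another document
        ],
    },
}

# Reaching a document means walking down the hierarchy by name.
first_kitten = server["lede_program"]["kittens"][0]
print(first_kitten["name"])  # Fluffy
```

The bracket-by-name lookups mirror the way pymongo lets us get at databases and collections with dictionary-style syntax, as we'll do below.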
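Okay, enough prefatory material; let's get to the meat. First, we'll import pymongo and call its Connection function, which returns a new object that represents our network connection to the server. We'll pass the string localhost as the first argument, which tells Python to connect to the MongoDB server running on your own EC2 machine. (A note on versions: the code in this tutorial is written for Python 2 and an older release of pymongo. Connection was removed in pymongo 3.0; on a current installation, use pymongo.MongoClient("localhost") instead.)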
import pymongo
conn = pymongo.Connection("localhost")
print type(conn)
<class 'pymongo.connection.Connection'>
One thing you can do with a Connection object is call its .database_names() method, which returns a list of all databases on the server. (In pymongo 4, use .list_database_names() instead.)
conn.database_names()
[u'local']
You should see only one database right now: local. The local database is for MongoDB's internal use, so we won't mess with it. Instead, we'll use the Connection object to get another object that represents a new database, like so:
db = conn['lede_program']
print type(db)
<class 'pymongo.database.Database'>
Note: We haven't done anything at this point to explicitly create the lede_program database! MongoDB automatically creates databases when you first use them.
This Database object supports several interesting methods, among them .collection_names(), which shows all of the collections in this database (in pymongo 4, use .list_collection_names() instead):
db.collection_names()
[]
It's an empty list right now (except maybe for a system.indexes collection, which is for internal MongoDB use and which you can ignore for now), because we haven't made any collections yet! Using the Database object as a dictionary, we can get an object representing a collection:
collection = db['kittens']
print type(collection)
<class 'pymongo.collection.Collection'>
Now we're in business. Let's insert our first document into the collection, using the collection's .insert() method. Between the parentheses of .insert(), we supply an expression that evaluates to a Python dictionary. PyMongo converts this dictionary into a MongoDB document and adds that document to the collection. The call to .insert() evaluates to a MongoDB ObjectId object, a value generated automatically (partly from the current time) that uniquely identifies the document we just added. (In pymongo 3 and later the preferred method is .insert_one(), which returns a result object whose inserted_id attribute holds the ObjectId; the old .insert() was removed in pymongo 4.)
collection.insert({"name": "Fluffy", "favorite_color": "chartreuse", "lbs": 9.5})
ObjectId('53b2a26e2735fe2db55a9871')
Let's insert a few more records!
collection.insert({"name": "Monsieur Whiskeurs", "favorite_color": "cerulean", "lbs": 10.8})
collection.insert({"name": "Grandpa Pants", "favorite_color": "mauve", "lbs": 14.1})
collection.insert({"name": "Susan B. Meownthony", "favorite_color": "cerulean", "lbs": 9.0})
ObjectId('53b2a2702735fe2db55a9874')
Of course, inserting documents on its own is not very useful. We'd like to be able to retrieve them later. To do so, we can use the .find_one() method of a collection object. Between the parentheses of the .find_one() call, we give a Python dictionary that tells MongoDB which documents to return. The .find_one() call evaluates to a document that exactly matches all of the key/value pairs specified in the dictionary. To demonstrate:
collection.find_one({"name": "Monsieur Whiskeurs"})
{u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}
If more than one document had the value Monsieur Whiskeurs for the key name, MongoDB would have returned only the first matching document. If no documents match, this happens:
val = collection.find_one({"name": "Big Shoes"})
print val
None
... the method evaluates to None.
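To make the matching rule concrete, here's a plain-Python approximation of what .find_one() does with a simple filter like the ones above: return the first document whose values equal every key/value pair in the filter, or None if nothing matches. This only models exact-match filters and skips some of MongoDB's edge cases; the kittens list and the find_one_in_list name are made up for illustration, and the real work happens inside MongoDB.

```python
kittens = [
    {"name": "Fluffy", "favorite_color": "chartreuse", "lbs": 9.5},
    {"name": "Monsieur Whiskeurs", "favorite_color": "cerulean", "lbs": 10.8},
    {"name": "Grandpa Pants", "favorite_color": "mauve", "lbs": 14.1},
    {"name": "Susan B. Meownthony", "favorite_color": "cerulean", "lbs": 9.0},
]

def matches(doc, query):
    """True if doc has every key in query, with exactly the same value."""
    return all(key in doc and doc[key] == value for key, value in query.items())

def find_one_in_list(docs, query):
    """Return the first matching document, or None if there isn't one."""
    return next((doc for doc in docs if matches(doc, query)), None)

print(find_one_in_list(kittens, {"name": "Monsieur Whiskeurs"})["lbs"])  # 10.8
print(find_one_in_list(kittens, {"name": "Big Shoes"}))                  # None
```

Note that with a filter matching several documents, such as {"favorite_color": "cerulean"}, this model returns only the first one it comes across, just as .find_one() does.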
You may have noticed the key _id in the document above. We didn't specify that key when we created the document, so where did it come from? It turns out that unless we specify the _id key manually, MongoDB will add it automatically and give it an automatically generated, unique ObjectId object as a value.
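An ObjectId isn't just a random number. It's 12 bytes, written as 24 hex digits; the first 4 bytes are the creation time in seconds since the Unix epoch, and the last 3 bytes are a counter that goes up by one for each new ObjectId. That's why the IDs of our kittens look so similar. Here's a small plain-Python 3 sketch (no pymongo needed; the function names are made up) that decodes those two pieces from Fluffy's ID, 53b2a26e2735fe2db55a9871, and Monsieur Whiskeurs's, 53b2a2702735fe2db55a9872:

```python
from datetime import datetime, timezone

def objectid_created_at(oid_hex):
    """Return the creation time encoded in a 24-hex-digit ObjectId string."""
    # The first 8 hex digits (4 bytes) are a big-endian Unix timestamp in seconds.
    seconds = int(oid_hex[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def objectid_counter(oid_hex):
    """Return the counter stored in the last 6 hex digits (3 bytes) of an ObjectId."""
    return int(oid_hex[-6:], 16)

fluffy_id = "53b2a26e2735fe2db55a9871"
whiskeurs_id = "53b2a2702735fe2db55a9872"

print(objectid_created_at(fluffy_id))  # 2014-07-01 11:58:38+00:00
# The counter went up by one between the two inserts.
print(objectid_counter(whiskeurs_id) - objectid_counter(fluffy_id))  # 1
```

If you do have pymongo handy, an ObjectId object exposes the same creation time directly through its generation_time attribute.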
Let's do that .find_one() call again and see what else we can do with it.
doc = collection.find_one({"name": "Monsieur Whiskeurs"})
print type(doc)
print doc['favorite_color']
<type 'dict'>
cerulean
As you can see, the value returned from .find_one() is just a Python dictionary. We can use it in any of the ways we usually use Python dictionaries, for example by getting the value for one of its keys.
EXERCISE: Use the .find_one() method to print out the favorite_color value for our kitten named Grandpa Pants.
The collection object has a method .find() that allows you to access every document in the collection. It doesn't return a list, but a weird thing called a Cursor. To get data from a cursor, you either have to use it in a for loop like this:
for doc in collection.find():
    print doc
{u'favorite_color': u'chartreuse', u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'name': u'Fluffy', u'lbs': 9.5}
{u'favorite_color': u'cerulean', u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'name': u'Monsieur Whiskeurs', u'lbs': 10.8}
{u'favorite_color': u'mauve', u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'name': u'Grandpa Pants', u'lbs': 14.1}
{u'favorite_color': u'cerulean', u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'name': u'Susan B. Meownthony', u'lbs': 9.0}
... or explicitly convert it to a list, with the list() function:
documents = list(collection.find())
documents
[{u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}, {u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'favorite_color': u'mauve', u'lbs': 14.1, u'name': u'Grandpa Pants'}, {u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}]
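One practical consequence of getting a cursor rather than a list: a cursor hands you results as you iterate over it, and once it has been fully consumed it doesn't start over by itself (pymongo cursors have a .rewind() method for that; otherwise you just call .find() again). A Python generator behaves similarly, so here's a rough stand-in that uses one in place of a real cursor (fake_find is a made-up name for illustration):

```python
def fake_find(docs):
    """Stand-in for a cursor: yields documents one at a time, lazily."""
    for doc in docs:
        yield doc

kittens = [{"name": "Fluffy"}, {"name": "Grandpa Pants"}]
cursor = fake_find(kittens)

first_pass = [doc["name"] for doc in cursor]   # consumes the generator
second_pass = [doc["name"] for doc in cursor]  # nothing left to hand out
print(first_pass)   # ['Fluffy', 'Grandpa Pants']
print(second_pass)  # []
```

That's why list(collection.find()) is handy when you want to look at the results more than once: the list keeps the documents around after the cursor is used up.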
We can also pass a dictionary to .find() to tell MongoDB to only return a subset of documents, namely, only those documents that match the key/value pairs in the dictionary we put in the parentheses. For example, to fetch only those kittens whose favorite_color is cerulean:
cerulean_lovers = list(collection.find({'favorite_color': 'cerulean'}))
cerulean_lovers
[{u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}]
EXERCISE: Write a list comprehension that evaluates to a list of the names of kittens whose favorite color is cerulean.
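You can ask MongoDB how many documents are in a collection with the collection's .count() method. (In pymongo 4, .count() has been removed from both collections and cursors; use .count_documents({}) instead, putting a query dictionary in place of {} to count only matching documents.)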
collection.count()
4
It's also easy to get a list of the distinct values for a particular field, using the .distinct() method:
collection.distinct("favorite_color")
[u'chartreuse', u'cerulean', u'mauve']
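In plain-Python terms, counting a whole collection is like calling len() on a list, and .distinct() is like collecting each value of a field once. Here's a rough equivalent over a list of dictionaries; the kittens list and the distinct_values name are illustrative, and this sketch simply keeps values in the order it first sees them.

```python
kittens = [
    {"name": "Fluffy", "favorite_color": "chartreuse", "lbs": 9.5},
    {"name": "Monsieur Whiskeurs", "favorite_color": "cerulean", "lbs": 10.8},
    {"name": "Grandpa Pants", "favorite_color": "mauve", "lbs": 14.1},
    {"name": "Susan B. Meownthony", "favorite_color": "cerulean", "lbs": 9.0},
]

def distinct_values(docs, field):
    """Each value of `field` once, in first-seen order; docs lacking the field are skipped."""
    seen = []
    for doc in docs:
        if field in doc and doc[field] not in seen:
            seen.append(doc[field])
    return seen

print(len(kittens))                                # 4
print(distinct_values(kittens, "favorite_color"))  # ['chartreuse', 'cerulean', 'mauve']
```

Using a list rather than a set for seen keeps the order and also copes with values that can't go in a set, such as nested dictionaries.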
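You can remove documents from a collection with the .remove() method, passing in a dictionary that describes which documents you want to remove (every matching document is removed). In pymongo 4, where .remove() no longer exists, use .delete_one() or .delete_many() instead. For example, to remove documents where the name key has the value Fluffy: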
collection.remove({'name': 'Fluffy'})
list(collection.find())
[{u'_id': ObjectId('53b07f092735fe2d9e61325d'), u'favorite_color': u'cerulean', u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b07f092735fe2d9e61325e'), u'favorite_color': u'mauve', u'name': u'Grandpa Pants'}, {u'_id': ObjectId('53b07f092735fe2d9e61325f'), u'favorite_color': u'cerulean', u'name': u'Susan B. Meownthony'}]
You can see that Fluffy has now gone missing. You can also easily remove all documents from a collection by calling .remove() without any parameters (in pymongo 4, .delete_many({})). WARNING: Don't run this cell unless you want to remove everything you've inserted so far!
collection.remove()
list(collection.find())
[]
We can be a bit more specific about which documents we want from the collection using MongoDB query selectors. (The examples from here on assume all four kittens are in the collection; if you ran the cells above that remove documents, re-run the .insert() calls first.) Query selectors take the form of dictionaries that we pass to the .find() method. Keys in this dictionary should be the field you want to match against, and the value for such a key should be another dictionary, which has a MongoDB query operator (listed below) as its key and the value to compare against as its value. Here's an example to make this clearer, searching our collection of kittens for documents where the lbs field is greater than 10:
list(collection.find({'lbs': {'$gt': 10}}))
[{u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'favorite_color': u'mauve', u'lbs': 14.1, u'name': u'Grandpa Pants'}]
Supported comparison operators include the following (the MongoDB documentation has the full list):
$gt: greater than
$gte: greater than or equal to
$lt: less than
$lte: less than or equal to
$ne: not equal to
You can combine more than one operator for a particular field, in which case MongoDB will find documents that match all criteria:
list(collection.find({'lbs': {'$gt': 9, '$lt': 10.8}}))
[{u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}]
You can also include conditions for more than one field in the dictionary, in which case MongoDB returns only documents that match the criteria for every field. For example, here are the cerulean-loving kittens, and then only those who aren't Monsieur Whiskeurs:
list(collection.find({'favorite_color': 'cerulean'}))
[{u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}]
list(collection.find({'favorite_color': 'cerulean', 'name': {'$ne': 'Monsieur Whiskeurs'}}))
[{u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}]
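Under the hood, each operator is just a comparison, and every condition in the dictionary has to hold for a document to match. Here's a plain-Python sketch of that logic for the five operators above, using Python's operator module. It's a simplified model, not how MongoDB is implemented: it assumes every document has the fields being tested and ignores MongoDB's rules for comparing values of different types. The names OPERATORS and matches_selector are made up for this example.

```python
import operator

# Map each MongoDB comparison operator to the equivalent Python comparison.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
}

def matches_selector(doc, query):
    """True if doc satisfies every field condition in query.

    A condition is either a plain value (exact match) or a dict of
    operators, all of which must hold.
    """
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            if not all(OPERATORS[op](value, target) for op, target in condition.items()):
                return False
        elif value != condition:
            return False
    return True

kittens = [
    {"name": "Fluffy", "favorite_color": "chartreuse", "lbs": 9.5},
    {"name": "Monsieur Whiskeurs", "favorite_color": "cerulean", "lbs": 10.8},
    {"name": "Grandpa Pants", "favorite_color": "mauve", "lbs": 14.1},
    {"name": "Susan B. Meownthony", "favorite_color": "cerulean", "lbs": 9.0},
]

# Same selector as {'lbs': {'$gt': 9, '$lt': 10.8}} above.
light_ones = [k["name"] for k in kittens if matches_selector(k, {"lbs": {"$gt": 9, "$lt": 10.8}})]
print(light_ones)  # ['Fluffy']

# Same selector as the two-field query above.
not_whiskeurs = [k["name"] for k in kittens
                 if matches_selector(k, {"favorite_color": "cerulean",
                                         "name": {"$ne": "Monsieur Whiskeurs"}})]
print(not_whiskeurs)  # ['Susan B. Meownthony']
```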
Another valuable search criterion that MongoDB supports is $regex, which will return documents that match a regular expression for a particular field. For example, to find all kittens whose name ends with the letter y:
list(collection.find({'name': {'$regex': 'y$'}}))
[{u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}, {u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}]
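$regex searches anywhere in the field's value unless you anchor the pattern with ^ (start) or $ (end), as the y$ above does. Python's re.search works the same way, which makes it a handy place to try out a pattern before sending it to MongoDB. MongoDB's regular expressions are Perl-compatible, so unusual syntax might not behave identically to Python's re module, but simple patterns like these do. The names list here is just our kittens' names.

```python
import re

names = ["Fluffy", "Monsieur Whiskeurs", "Grandpa Pants", "Susan B. Meownthony"]

# Same pattern as the MongoDB query above: names ending in "y".
ends_in_y = [n for n in names if re.search("y$", n)]
print(ends_in_y)  # ['Fluffy', 'Susan B. Meownthony']

# Without an anchor, the pattern can match anywhere in the string.
has_pants = [n for n in names if re.search("Pants", n)]
print(has_pants)  # ['Grandpa Pants']

# Case-insensitive matching.
starts_with_f = [n for n in names if re.search("^f", n, re.IGNORECASE)]
print(starts_with_f)  # ['Fluffy']
```

Like Python's patterns, $regex is case-sensitive by default; the MongoDB counterpart of re.IGNORECASE is adding '$options': 'i' next to '$regex' in the same dictionary.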
EXERCISE: Write a call to .find() that returns all kittens whose favorite color begins with the letter c.
Results from .find() aren't guaranteed to come back in any particular order, so you may find it helpful to sort them. You can specify a sort order for results from the .find() method by tacking a .sort() call onto the end. It looks like this:
list(collection.find().sort('lbs'))
[{u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}, {u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}, {u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'favorite_color': u'mauve', u'lbs': 14.1, u'name': u'Grandpa Pants'}]
The parameter you pass to .sort() specifies which field the documents should be sorted by; by default, the order is ascending. Sorting in descending order takes a second argument:
list(collection.find().sort('lbs', -1))
[{u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'favorite_color': u'mauve', u'lbs': 14.1, u'name': u'Grandpa Pants'}, {u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}, {u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'favorite_color': u'cerulean', u'lbs': 9.0, u'name': u'Susan B. Meownthony'}]
(The -1 means "in reverse order," i.e., descending.) The .sort() method works even if you've specified query selectors in the call to .find():
list(collection.find({'lbs': {'$gt': 9.0}}).sort('name'))
[{u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}, {u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'favorite_color': u'mauve', u'lbs': 14.1, u'name': u'Grandpa Pants'}, {u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}]
You can also limit the number of results returned from .find() using the .limit() method, which, like .sort(), gets tacked on to the end of .find(). To return only two kittens:
list(collection.find().limit(2))
[{u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}, {u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'favorite_color': u'cerulean', u'lbs': 10.8, u'name': u'Monsieur Whiskeurs'}]
Search for all kittens weighing less than 10 pounds, limit to one result:
list(collection.find({'lbs': {'$lt': 10}}).limit(1))
[{u'_id': ObjectId('53b2a26e2735fe2db55a9871'), u'favorite_color': u'chartreuse', u'lbs': 9.5, u'name': u'Fluffy'}]
You can put a .limit() after a .sort() to get only the first few results from a sorted list of documents. So, for example, to get only the heaviest cat:
list(collection.find().sort("lbs", -1).limit(1))
[{u'_id': ObjectId('53b2a2702735fe2db55a9873'), u'favorite_color': u'mauve', u'lbs': 14.1, u'name': u'Grandpa Pants'}]
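Chaining .sort() and .limit() is the database-side version of sorting a list and slicing it. In plain Python (again only a model, using an illustrative kittens list), sorted() with reverse=True plays the role of -1 and a slice like [:1] plays the role of .limit(1). Letting MongoDB do it means only the requested documents come back over the network, which matters more as collections grow.

```python
kittens = [
    {"name": "Fluffy", "favorite_color": "chartreuse", "lbs": 9.5},
    {"name": "Monsieur Whiskeurs", "favorite_color": "cerulean", "lbs": 10.8},
    {"name": "Grandpa Pants", "favorite_color": "mauve", "lbs": 14.1},
    {"name": "Susan B. Meownthony", "favorite_color": "cerulean", "lbs": 9.0},
]

# Like collection.find().sort("lbs", -1).limit(1)
heaviest = sorted(kittens, key=lambda k: k["lbs"], reverse=True)[:1]
print([k["name"] for k in heaviest])  # ['Grandpa Pants']

# Like collection.find().sort("lbs").limit(2)
lightest_two = sorted(kittens, key=lambda k: k["lbs"])[:2]
print([k["name"] for k in lightest_two])  # ['Susan B. Meownthony', 'Fluffy']
```

If you'd rather not remember the numbers, pymongo also defines the constants pymongo.ASCENDING (1) and pymongo.DESCENDING (-1) for use with .sort().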
If we want our result to only include certain key/value pairs from the document, we can provide a second argument to the .find() method. This argument should be a dictionary whose keys are the fields we want included, and whose values are all 1. For example, to find all kittens whose favorite color is cerulean, but only return their names, we could do this:
list(collection.find({"favorite_color": "cerulean"}, {"name": 1}))
[{u'_id': ObjectId('53b2a2702735fe2db55a9872'), u'name': u'Monsieur Whiskeurs'}, {u'_id': ObjectId('53b2a2702735fe2db55a9874'), u'name': u'Susan B. Meownthony'}]
The _id field is always included by default. If we want to get rid of it, we can include the _id key in our dictionary of fields, giving it a 0 (instead of a 1):
list(collection.find({"favorite_color": "cerulean"}, {"name": 1, "_id": 0}))
[{u'name': u'Monsieur Whiskeurs'}, {u'name': u'Susan B. Meownthony'}]
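The rule for that second argument is small enough to write out in plain Python. This sketch (the project function is a made-up name, and the _id is a plain string standing in for an ObjectId) covers only the "include these fields" style used above; MongoDB also accepts projections that exclude fields by marking them 0, which this model doesn't handle.

```python
def project(doc, fields):
    """Keep only the fields marked 1; keep _id too unless it is marked 0."""
    keep = {name for name, flag in fields.items() if flag == 1}
    if fields.get("_id", 1) != 0:
        keep.add("_id")
    return {name: value for name, value in doc.items() if name in keep}

kitten = {"_id": "53b2a2702735fe2db55a9874", "name": "Susan B. Meownthony",
          "favorite_color": "cerulean", "lbs": 9.0}

print(project(kitten, {"name": 1}))
# {'_id': '53b2a2702735fe2db55a9874', 'name': 'Susan B. Meownthony'}
print(project(kitten, {"name": 1, "_id": 0}))
# {'name': 'Susan B. Meownthony'}
```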
I want to take you through a real-world example of consuming data from a source, putting it into MongoDB, then querying MongoDB to find interesting stuff in that data. Specifically, we're going to fetch a big ol' CSV of historical data about members of Congress from the "Bulk Data" section of govtrack.us. The file, whose URL appears in the code below, contains a row for every member of Congress in the history of the United States who isn't currently a sitting member.
First, let's retrieve the file to our EC2 machines, as we've done in the past with CSV files:
import urllib
urllib.urlretrieve("https://www.govtrack.us/data/congress-legislators/legislators-historic.csv",
"legislators-historic.csv")
('legislators-historic.csv', <httplib.HTTPMessage instance at 0x7f9cd91c12d8>)
Let's play with the csv library's DictReader class to see what the data looks like.
import csv
rows = csv.DictReader(open("legislators-historic.csv"))
all_rows = list(rows)
all_rows[0]
{'address': '', 'ballotpedia_id': '', 'bioguide_id': 'B000226', 'birthday': '1745-04-02', 'contact_form': '', 'cspan_id': '', 'facebook': '', 'facebook_id': '', 'first_name': 'Richard', 'gender': 'M', 'govtrack_id': '401222', 'icpsr_id': '507', 'last_name': 'Bassett', 'lis_id': '', 'opensecrets_id': '', 'party': 'Anti-Administration', 'phone': '', 'rss_url': '', 'state': 'DE', 'thomas_id': '', 'twitter': '', 'type': 'sen', 'url': '', 'votesmart_id': '', 'washington_post_id': '', 'wikipedia_id': '', 'youtube': '', 'youtube_id': ''}
What we seem to have here is a dictionary that describes a member of Congress. This happens to be one Richard Bassett, born in 1745. So that's pretty cool! I don't really know what most of these fields mean, but we'll take some guesses at them later.
So how about putting all those rows into MongoDB? Here's how it would go. It's pretty simple! I'm going to create a separate collection in our database for these legislators, called legislators.
legislators_coll = db['legislators']
Now, I'm going to loop through the rows of the table and just... insert each dictionary from DictReader straight into MongoDB. Easy!
for row in all_rows:
    legislators_coll.insert(row)
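One thing to keep in mind: csv.DictReader gives us every value as a string, so that's how the fields end up stored in MongoDB too. Numeric-looking fields therefore compare as text, and empty cells become empty strings rather than missing keys. Here's a small self-contained demonstration using an in-memory CSV whose values are taken from the first row above (just a handful of its columns). If you'd rather not store the empty fields at all, one option, our suggestion rather than something the rest of this tutorial relies on, is to drop them with a dictionary comprehension before inserting.

```python
import csv
import io

# A tiny CSV with a few of the govtrack file's columns, using Richard Bassett's values.
sample = io.StringIO(
    "last_name,first_name,birthday,icpsr_id,twitter\n"
    "Bassett,Richard,1745-04-02,507,\n"
)
row = next(csv.DictReader(sample))

print(row["icpsr_id"], type(row["icpsr_id"]))  # 507 <class 'str'>
print(repr(row["twitter"]))                    # ''

# Strings compare character by character, not numerically.
print("507" > "1000")  # True

# Optional cleanup before inserting: keep only non-empty fields.
cleaned = {key: value for key, value in row.items() if value != ""}
print(sorted(cleaned))  # ['birthday', 'first_name', 'icpsr_id', 'last_name']
```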
At this point, the number of documents in the database should match the number of rows in the CSV file. Let's make sure.
len(all_rows) == legislators_coll.count()
True
And how many exactly is that?
legislators_coll.count()
11741
Nearly twelve thousand legislators. Not exactly "big data," I admit, but hopefully you can still see the benefit of having this data in one place without having to re-download and parse the data each time we want to use it.
Let's do some queries on our data now! For example, let's count the legislators who are women.
legislators_coll.find({"gender": "F"}).count()
64
How about the number of legislators who are women and whose party is not Democrat?
legislators_coll.find({"gender": "F", "party": {"$ne": "Democrat"}}).count()
30
Let's make a list of these women, including their names, states, and birthdays:
list(legislators_coll.find(
{"gender": "F", "party": {"$ne": "Democrat"}},
{"first_name": 1, "last_name": 1, "state": 1, "birthday": 1, "_id": 0}))
[{u'birthday': u'1958-06-05', u'first_name': u'Enid', u'last_name': u'Greene Waldholtz', u'state': u'UT'}, {u'birthday': u'1938-01-27', u'first_name': u'Helen', u'last_name': u'Chenoweth-Hage', u'state': u'ID'}, {u'birthday': u'1931-02-12', u'first_name': u'Constance', u'last_name': u'Morella', u'state': u'MD'}, {u'birthday': u'1929-09-19', u'first_name': u'Marge', u'last_name': u'Roukema', u'state': u'NJ'}, {u'birthday': u'1936-07-29', u'first_name': u'Elizabeth', u'last_name': u'Dole', u'state': u'NC'}, {u'birthday': u'1941-07-29', u'first_name': u'Jennifer', u'last_name': u'Dunn', u'state': u'WA'}, {u'birthday': u'1957-04-05', u'first_name': u'Katherine', u'last_name': u'Harris', u'state': u'FL'}, {u'birthday': u'1962-04-04', u'first_name': u'Melissa', u'last_name': u'Hart', u'state': u'PA'}, {u'birthday': u'1935-01-05', u'first_name': u'Nancy', u'last_name': u'Johnson', u'state': u'CT'}, {u'birthday': u'1936-09-26', u'first_name': u'Sue', u'last_name': u'Kelly', u'state': u'NY'}, {u'birthday': u'1948-01-22', u'first_name': u'Anne', u'last_name': u'Northup', u'state': u'KY'}, {u'birthday': u'1953-06-22', u'first_name': u'Shelley', u'last_name': u'Sekula-Gibbs', u'state': u'TX'}, {u'birthday': u'1946-11-30', u'first_name': u'Barbara', u'last_name': u'Cubin', u'state': u'WY'}, {u'birthday': u'1950-06-29', u'first_name': u'Jo Ann', u'last_name': u'Davis', u'state': u'VA'}, {u'birthday': u'1949-11-20', u'first_name': u'Thelma', u'last_name': u'Drake', u'state': u'VA'}, {u'birthday': u'1949-01-27', u'first_name': u'Marilyn', u'last_name': u'Musgrave', u'state': u'CO'}, {u'birthday': u'1951-07-29', u'first_name': u'Deborah', u'last_name': u'Pryce', u'state': u'OH'}, {u'birthday': u'1960-12-30', u'first_name': u'Heather', u'last_name': u'Wilson', u'state': u'NM'}, {u'birthday': u'1943-10-05', u'first_name': u'Virginia', u'last_name': u'Brown-Waite', u'state': u'FL'}, {u'birthday': u'1954-12-09', u'first_name': u'Mary', u'last_name': u'Fallin', u'state': u'OK'}, 
{u'birthday': u'1943-07-22', u'first_name': u'Kay', u'last_name': u'Hutchison', u'state': u'TX'}, {u'birthday': u'1947-02-21', u'first_name': u'Olympia', u'last_name': u'Snowe', u'state': u'ME'}, {u'birthday': u'1956-12-14', u'first_name': u'Sandy', u'last_name': u'Adams', u'state': u'FL'}, {u'birthday': u'1937-08-15', u'first_name': u'Judy', u'last_name': u'Biggert', u'state': u'IL'}, {u'birthday': u'1961-10-24', u'first_name': u'Mary', u'last_name': u'Bono Mack', u'state': u'CA'}, {u'birthday': u'1951-05-08', u'first_name': u'Ann Marie', u'last_name': u'Buerkle', u'state': u'NY'}, {u'birthday': u'1959-12-14', u'first_name': u'Nan', u'last_name': u'Hayworth', u'state': u'NY'}, {u'birthday': u'1941-08-01', u'first_name': u'Sue', u'last_name': u'Myrick', u'state': u'NC'}, {u'birthday': u'1951-11-29', u'first_name': u'Jean', u'last_name': u'Schmidt', u'state': u'OH'}, {u'birthday': u'1950-09-16', u'first_name': u'Jo Ann', u'last_name': u'Emerson', u'state': u'MO'}]
How about the youngest five Republican legislators, as determined by their birthday?
list(legislators_coll.find(
{'party': 'Republican'},
{'first_name': 1, 'last_name': 1, 'birthday': 1, 'state': 1, '_id': 0}
).sort("birthday", -1).limit(5)
)
[{u'birthday': u'1976-11-03', u'first_name': u'Ben', u'last_name': u'Quayle', u'state': u'AZ'}, {u'birthday': u'1976-04-20', u'first_name': u'Trey', u'last_name': u'Radel', u'state': u'FL'}, {u'birthday': u'1974-07-31', u'first_name': u'Adam', u'last_name': u'Putnam', u'state': u'FL'}, {u'birthday': u'1971-06-10', u'first_name': u'Bobby', u'last_name': u'Jindal', u'state': u'LA'}, {u'birthday': u'1970-12-23', u'first_name': u'Jeff', u'last_name': u'Landry', u'state': u'LA'}]
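Sorting by birthday works even though birthday is stored as a string, because dates written as YYYY-MM-DD sort alphabetically in the same order as chronologically. One caveat worth checking: if any rows have an empty birthday, the empty string sorts before every date, so those rows would come first in an ascending sort (and last in a descending one like the query above). Here's a quick plain-Python check; the sample list reuses a few birthdays shown above plus an empty string.

```python
birthdays = ["1976-04-20", "1971-06-10", "", "1976-11-03", "1745-04-02"]

# Descending string sort, like .sort("birthday", -1)
print(sorted(birthdays, reverse=True))
# ['1976-11-03', '1976-04-20', '1971-06-10', '1745-04-02', '']

# An ascending sort puts the empty string first.
print(sorted(birthdays)[0] == "")  # True
```

To keep blank birthdays out of an ascending query, you could add a condition like {"birthday": {"$ne": ""}} to the selector.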
How about a list of all distinct parties (witness the varieties of American democracy)?
legislators_coll.distinct("party")
[u'Anti-Administration', u'', u'Pro-Administration', u'Republican', u'Federalist', u'Democratic Republican', u'Unknown', u'Adams', u'Jackson', u'Jackson Republican', u'Crawford Republican', u'Whig', u'Anti-Jacksonian', u'Adams Democrat', u'Nullifier', u'Anti Masonic', u'Anti Jacksonian', u'Jacksonian', u'Democrat', u'Anti Jackson', u'Union Democrat', u'Conservative', u'Ind. Democrat', u'Law and Order', u'American', u'Liberty', u'Free Soil', u'Independent', u'Ind. Republican-Democrat', u'Ind. Whig', u'Unionist', u'States Rights', u'Anti-Lecompton Democrat', u'Constitutional Unionist', u'Independent Democrat', u'Unconditional Unionist', u'Conservative Republican', u'Ind. Republican', u'Liberal Republican', u'National Greenbacker', u'Readjuster Democrat', u'Readjuster', u'Union', u'Union Labor', u'Populist', u'Silver Republican', u'Free Silver', u'Democratic and Union Labor', u'Progressive Republican', u'Progressive', u'Prohibitionist', u'Socialist', u'Farmer-Labor', u'Nonpartisan', u'Coalitionist', u'Popular Democrat', u'American Labor', u'New Progressive', u'Republican-Conservative', u'Democrat-Liberal', u'Democrat/Republican', u'Democrat Farmer Labor']
EXERCISE: Investigate MongoDB's $nin operator to write a MongoDB query that returns a list of the names, states, and parties of all legislators whose party is neither Republican nor Democrat.
Great work---you've learned the basics. Where to go next?
Have fun!