Notebook

From the docs

http://api.dp.la/v2 is the base URL of the DPLA API.
items and collections are the two resource types you can request.

Launch announcement for the dp.la API v2: https://cyber.law.harvard.edu/lists/arc/dpla-tech/2013-04/msg00004.html

You need an API key, which you can request with this incantation from Tom Morris:

curl -v -XPOST http://api.dp.la/v2/api_key/you@your_email.com

As Tom wrote: "If you use Gmail, check your spam folder. The key is sent immediately."
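
The same request can be made from Python. This is a minimal sketch, assuming Python 3 and only the standard library (the cells in this notebook are Python 2); `api_key_url` and `request_api_key` are hypothetical helper names, and the endpoint is the one from the curl command above.

```python
import urllib.request

API_KEY_ENDPOINT = "http://api.dp.la/v2/api_key/"


def api_key_url(email):
    """Build the URL that the curl command POSTs to."""
    return API_KEY_ENDPOINT + email


def request_api_key(email):
    """Send an empty POST; the key itself is delivered by email."""
    request = urllib.request.Request(api_key_url(email), data=b"", method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status  # HTTP status code of the request


# Example (needs network access):
# request_api_key("you@your_email.com")
```

The function returns only the HTTP status, since the key is sent to the mailbox rather than returned to the caller.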

Would be nice to compare to Europeana API: http://pro.europeana.eu/api

Use special library to parse json-ld?

Reading http://json-ld.org to get the lowdown on json-ld. Should I use one of the Python libs for json-ld? If so, which one?

OK, I'll try https://github.com/digitalbazaar/pyld because it's being actively developed:

git clone git://github.com/digitalbazaar/pyld.git
cd pyld/
python setup.py install

But I ran into installation problems, so I'll set json-ld aside for now. (There might be a fix: https://github.com/digitalbazaar/pyld/commit/1173af0db20a1a27ba2fcf15bde531c0bf1fca2b )

In [1]:

# from pyld import jsonld

In [2]:

# Goal:  feed a bunch of search terms to try to get at some collections

# API doc: https://github.com/dpla/platform/
# test data sources: http://dp.la/wiki/Platform_test_data_sources

import requests
import json
import urllib
from itertools import islice

from CREDENTIALS import DPLA_KEY

# http://api.dp.la/v2/items?api_key=YOUR_API_KEY

# Retrieve an item by ID
# http://api.dp.la/v2/items/a4e2346032cae75b0832abe064c14bcb

# Retrieve multiple items by ID
# http://api.dp.la/v2/items/a4e2346032cae75b0832abe0644e9b26,a4e2346032cae75b0832abe064c14bcb


def dpla_query(**kw_input):
    
    kwargs = {"page_size": 20, "page": 1, "sort_order":"asc", "api_key":DPLA_KEY}
    
    # fudgy -- accept an 'extras' dict for parameters whose names can't be written as keyword arguments -- e.g., spatial.coordinates
    extras = kw_input.pop('extras',{})
    kw_input.update(extras)
    
    kwargs.update(kw_input)
    kwargs = dict([(k,v) for (k,v) in kwargs.items() if v is not None])
    
    # asc vs desc
    
    # available text search fields
    text_search_fields = ("title", "description", "dplaContributor", "creator", "sourceResource.type", "publisher", "format", "rights", "contributor", "spatial")
    expected_doc_fields = ['title','description', 'creator', 'type', 'publisher', 'format', 'rights', 'contributor', 'created', 'spatial', 'temporal', 'source']
    
    # temporal fields
    # http://api.dp.la/v2/items?temporal.after=1963-11-01&temporal.before=1963-11-30
    
    # location available...not implemented here
    more_items = True
    
    # content["count"], content["start"], content["limit"]
    
    while more_items:
        
        url = "http://api.dp.la/v2/items?" + urllib.urlencode(kwargs)
        #print url
        r = requests.get(url)
        content = json.loads(r.content)
        
        if len(content.get("docs", [])):
            for doc in content["docs"]:
                yield (doc, content["count"])
            # sort_order only changes how results are ordered; paging always moves forward
            kwargs['page'] += 1
        else:
            more_items = False
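
The paging logic of `dpla_query` can be tried without an API key by separating it from the HTTP call. This is a sketch under assumptions: it is written for Python 3, `iter_docs`, `fake_fetch` and `FAKE_DOCS` are hypothetical names, and the fake response only mimics the `count` and `docs` fields used above. It always advances the page number and stops at the first page with no docs.

```python
def iter_docs(fetch_page, first_page=1):
    """Yield (doc, count) pairs page by page until a page has no docs.

    fetch_page is any callable that takes a page number and returns a
    dict shaped like the API response: {"count": int, "docs": [...]}.
    """
    page = first_page
    while True:
        content = fetch_page(page)
        docs = content.get("docs", [])
        if not docs:
            return
        for doc in docs:
            yield doc, content["count"]
        page += 1


# Stand-in for the HTTP request: five fake docs served two per page.
FAKE_DOCS = [{"id": str(n)} for n in range(5)]


def fake_fetch(page, page_size=2):
    start = (page - 1) * page_size
    return {"count": len(FAKE_DOCS), "docs": FAKE_DOCS[start:start + page_size]}


pairs = list(iter_docs(fake_fetch))
ids = [doc["id"] for doc, count in pairs]
```

Passing the fetcher in as an argument keeps the generator testable; the real version would wrap the `requests.get` call in a small function of the page number.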

In [3]:

# search terms to feed in 

SEARCH_TERMS = ["Bach", "tree", "horse", "cow", "Gore"]

# figure out collections 
collections = set()

for term in islice(SEARCH_TERMS,1):
    results = list(islice(dpla_query(q=term),100)) 
            
    for (i, (doc, count)) in enumerate(results):
        collections.add(doc.get('isPartOf', {'title':None}).get('title'))
    
print len(collections)

# for each collection let's figure out the number of items in the collection

from IPython.display import HTML
from jinja2 import Template

num_items = []

for (i, collection) in enumerate(sorted(collections)):
    if collection is not None:
        # take the total count from the first result; fall back to 0 if the query returns nothing
        first = list(islice(dpla_query(isPartOf=collection),1))
        size_collection = first[0][1] if first else 0
        url = "http://api.dp.la/v2/items?" + urllib.urlencode({'isPartOf': collection})
        num_items.append((collection, size_collection, url))
    else:
        num_items.append((None, 0, ""))
    
TABLE_TEMPLATE = """<table>
 <tr>
   <th>Collection</th>
   <th>Number of items</th>
   <th>API</th>
 </tr>
 {% for num_item in num_items %}
 <tr>
  <td>{{num_item.0}}</td>
  <td>{{num_item.1}}</td>
  <td><a href="{{num_item.2}}">{{num_item.2}}</a></td>
 </tr>
 {% endfor %}
</table>
"""
    
template = Template(TABLE_TEMPLATE)
HTML(template.render(num_items=num_items))  

Out[3]:
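
The cell above gathers collection titles into a set, which discards how often each title occurs in the sampled results. A small sketch of tallying them instead, assuming Python 3; `collection_title` and `sample_docs` are hypothetical, and the sample records only imitate the `isPartOf` field used above.

```python
from collections import Counter


def collection_title(doc):
    """Return the collection title of a doc, or None if it has no isPartOf."""
    return doc.get("isPartOf", {"title": None}).get("title")


# Made-up records for illustration; real docs come from dpla_query.
sample_docs = [
    {"id": "a", "isPartOf": {"title": "Example Collection"}},
    {"id": "b", "isPartOf": {"title": "Example Collection"}},
    {"id": "c"},
]

tally = Counter(collection_title(doc) for doc in sample_docs)
```

Note that this counts titles within the sample only; the collection size reported by the API is the `count` field of a query on that collection.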

In [4]:

r = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})
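
A parameter name such as `sourceResource.type` contains a dot, so it cannot be written as a keyword argument; that is why the call above unpacks a dict with `**`. The sketch below shows how such parameters end up in the query string. It assumes Python 3 (`urllib.parse` rather than the Python 2 `urllib.urlencode` used in this notebook); `build_items_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.dp.la/v2/items?"


def build_items_url(api_key, **params):
    """Merge defaults with caller parameters, drop None values, and encode."""
    query = {"page_size": 20, "page": 1, "api_key": api_key}
    query.update(params)
    query = {k: v for k, v in query.items() if v is not None}
    # Sort the keys so the resulting URL is deterministic.
    return BASE_URL + urlencode(sorted(query.items()))


url = build_items_url("YOUR_API_KEY", **{"q": "tiger", "sourceResource.type": "image"})
```

Setting a parameter to `None` removes it from the URL, mirroring the filter in `dpla_query`.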

In [5]:

r0 = list(islice(r,10))[0]

In [6]:

print "keys", r0[0].keys()

print "count", r0[1]
print "item_url", "http://dp.la/item/{0}".format(r0[0]['id']) 
print "id", r0[0]['id']                                                 

keys [u'_id', u'hasView', u'sourceResource', u'_rev', u'object', u'ingestDate', u'originalRecord', u'score', u'isShownAt', u'provider', u'@context', u'ingestType', u'dataProvider', u'@id', u'id']
count 717
item_url http://dp.la/item/84a6b96a7a9e3d17562c6d7c8eac4acb
id 84a6b96a7a9e3d17562c6d7c8eac4acb

In [7]:

HTML("""<a href="{0}">item</a>""".format("http://dp.la/item/{0}".format(r0[0]['id'])))

Out[7]:

item

In [8]:

# namespaces 
r0[0]['@context']

Out[8]:

{u'@vocab': u'http://purl.org/dc/terms/',
 u'LCSH': u'http://id.loc.gov/authorities/subjects',
 u'aggregatedDigitalResource': u'dpla:aggregatedDigitalResource',
 u'begin': {u'@id': u'dpla:dateRangeStart', u'@type': u'xsd:date'},
 u'collection': u'dpla:aggregation',
 u'coordinates': u'dpla:coordinates',
 u'dataProvider': u'edm:dataProvider',
 u'dpla': u'http://dp.la/terms/',
 u'edm': u'http://www.europeana.eu/schemas/edm/',
 u'end': {u'@id': u'dpla:end', u'@type': u'xsd:date'},
 u'hasView': u'edm:hasView',
 u'isShownAt': u'edm:isShownAt',
 u'name': u'xsd:string',
 u'object': u'edm:object',
 u'originalRecord': u'dpla:originalRecord',
 u'provider': u'edm:provider',
 u'sourceResource': u'edm:sourceResource',
 u'state': u'dpla:state',
 u'stateLocatedIn': u'dpla:stateLocatedIn'}
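
Without a JSON-LD library, the `@context` can still be read by hand: each key maps to a full IRI, to a `prefix:suffix` compact form, or to a dict carrying an `@id`. The following is a rough sketch of that lookup, assuming Python 3; `expand_term` is a hypothetical helper that covers only the patterns visible in the output above and is not a substitute for a real JSON-LD processor.

```python
def expand_term(term, context):
    """Resolve a key such as 'dataProvider' to a full IRI using @context."""
    if term not in context:
        # Unmapped plain terms fall back to @vocab; anything with a colon is left as-is.
        return term if ":" in term else context.get("@vocab", "") + term
    value = context[term]
    if isinstance(value, dict):  # e.g. {'@id': 'dpla:dateRangeStart', '@type': 'xsd:date'}
        value = value["@id"]
    prefix, sep, suffix = value.partition(":")
    # 'edm:dataProvider' -> look up 'edm'; 'http://...' is already a full IRI.
    if sep and prefix in context and not suffix.startswith("//"):
        return expand_term(prefix, context) + suffix
    return value


# A subset of the @context shown above.
context = {
    "@vocab": "http://purl.org/dc/terms/",
    "begin": {"@id": "dpla:dateRangeStart", "@type": "xsd:date"},
    "dataProvider": "edm:dataProvider",
    "dpla": "http://dp.la/terms/",
    "edm": "http://www.europeana.eu/schemas/edm/",
}

data_provider_iri = expand_term("dataProvider", context)
```

Prefixes that the context does not define, such as `xsd`, are returned unexpanded.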

In [9]:

r0[0]['dataProvider']

Out[9]:

u'National Archives at College Park - Still Pictures'

In [10]:

r0[0]['hasView']

Out[10]:

[{u'format': u'image/jpeg',
  u'url': u'http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515.jpeg'}]

In [11]:

r0[0]['object']

Out[11]:

u'http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515_t.jpeg'

In [12]:

results = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})
items = list([result[0] for result in islice(results,100)])

for item in items:
    print item.get('object', None)

http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15215_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15552_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15540_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12182_t.jpeg
http://media.artstor.net/imgstor/size2/yale/british/yale_british_526_8b_srgb.jpg
http://content.lib.utah.edu/utils/getthumbnail/collection/VE_Photos/id/8126
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12177_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12179_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15217_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15196_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15513_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15526_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15213_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12181_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15210_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15220_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15195_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15547_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15527_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15220_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12180_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15219_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15189_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15514_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15207_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15539_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15519_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15216_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15554_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15193_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15545_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15194_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15546_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15555_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15533_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15211_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15201_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15535_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15203_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15522_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15535_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15203_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15528_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15548_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15537_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15205_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15517_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15532_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15200_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15544_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15192_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15524_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15541_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15209_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15521_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00170_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15529_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15197_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15549_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15197_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15549_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15208_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15218_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15188_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15520_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15202_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15534_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15212_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00171_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15530_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15550_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15198_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00170_t.jpeg
http://digital.library.unlv.edu/cgi-bin/thumbnail.exe?CISOROOT=/snv&CISOPTR=289
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-12178_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00485_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00171_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-17143_t.jpeg
http://content.lib.utah.edu/utils/getthumbnail/collection/coa/id/5443
http://content.lib.utah.edu/utils/getthumbnail/collection/coa/id/5373
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-17895_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DA-SD-05-06503_t.jpeg
http://media.nara.gov/stillpix/330-cfd/1991/DF-ST-91-06474_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2007/DN-SD-07-07719_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-16908_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-17245_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00309_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00180_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00454_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00139_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-00339_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2005/DF-SD-05-00164_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2004/DF-SD-04-17141_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DM-SD-03-14162_t.jpeg
http://media.nara.gov/media/images/21/30/21-2989t.gif
http://media.nara.gov/stillpix/330-cfd/1986/DF-ST-86-04924_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-17889_t.jpeg
http://media.nara.gov/stillpix/330-cfd/2007/DN-SD-07-07722_t.jpeg
http://media.nara.gov/stillpix/330-cfd/1982/DF-ST-82-00473_t.jpeg
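
The listing above contains a few repeated thumbnail URLs (DF-SD-03-15220_t.jpeg appears twice, for example), and items without an `object` field would print as None. A sketch of filtering both while keeping the original order, assuming Python 3; `unique_in_order` is a hypothetical helper.

```python
def unique_in_order(urls):
    """Drop None values and repeats, keeping the first occurrence of each URL."""
    seen = set()
    unique = []
    for url in urls:
        if url is None or url in seen:
            continue
        seen.add(url)
        unique.append(url)
    return unique


thumbnails = unique_in_order([
    "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15220_t.jpeg",
    "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15195_t.jpeg",
    None,
    "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15220_t.jpeg",
])
```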

In [13]:

results = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})

items = list([result[0] for result in islice(results,10)])

TABLE_TEMPLATE = """
 {% for item in items %}
<img src="{{item.object}}"/>
 {% endfor %}
"""
    
template = Template(TABLE_TEMPLATE)
HTML(template.render(items=items)) 

Out[13]:

Rights challenge

In [14]:

results = dpla_query(**{'q':'tiger', 'sourceResource.type':'image'})

for result in islice(results,10):
    print result[0]['sourceResource'].get('rights')

Restrictions: Unrestricted; Use status: Unrestricted
Restrictions: Unrestricted; Use status: Unrestricted
Restrictions: Unrestricted; Use status: Unrestricted
Restrictions: Unrestricted; Use status: Unrestricted
Restrictions: Unrestricted; Use status: Unrestricted
None
Digital Image, copyright 2010 Uintah County Library
Restrictions: Unrestricted; Use status: Unrestricted
Restrictions: Unrestricted; Use status: Unrestricted
Restrictions: Unrestricted; Use status: Unrestricted
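
The rights statements are free text and sometimes missing, so a tally makes the variety easier to see. This is a minimal sketch, assuming Python 3 and that each `rights` value is a string or absent; `tally_rights` is a hypothetical helper, and the sample docs simply reproduce the ten values printed above.

```python
from collections import Counter

UNRESTRICTED = "Restrictions: Unrestricted; Use status: Unrestricted"
UINTAH = "Digital Image, copyright 2010 Uintah County Library"


def tally_rights(docs):
    """Count rights statements; docs without one are counted under None."""
    return Counter(doc.get("sourceResource", {}).get("rights") for doc in docs)


# The ten values from the output above, in the same order.
values = [UNRESTRICTED] * 5 + [None, UINTAH] + [UNRESTRICTED] * 3
sample_docs = [
    {"sourceResource": {"rights": v} if v is not None else {}} for v in values
]

rights_tally = tally_rights(sample_docs)
```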

Follow-up

Revise my code to get reliable access to object URI: http://dp.la/info/forums/topic/how-to-reliably-find-url-for-returned-object-in-api/#post-8327
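
One possible shape for that revision is a fallback chain over the URL-bearing fields seen in the records above: `object`, then `hasView`, then `isShownAt`. This is a sketch under assumptions, written for Python 3: `best_url` is a hypothetical helper, the order of preference is a guess, and it may not match what the forum thread recommends.

```python
def best_url(doc):
    """Return the first usable URL among object, hasView and isShownAt."""
    candidates = [doc.get("object")]
    views = doc.get("hasView") or []
    if isinstance(views, dict):  # assumption: hasView may be a single dict
        views = [views]
    candidates.extend(v.get("url") for v in views if isinstance(v, dict))
    candidates.append(doc.get("isShownAt"))
    for url in candidates:
        if isinstance(url, str) and url:
            return url
    return None


# Field values copied from the first 'tiger' result shown earlier.
doc = {
    "object": "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515_t.jpeg",
    "hasView": [{
        "format": "image/jpeg",
        "url": "http://media.nara.gov/stillpix/330-cfd/2003/DF-SD-03-15515.jpeg",
    }],
}

thumbnail = best_url(doc)
full_size = best_url({"hasView": doc["hasView"]})
```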

Rendered output of the collection table (Out[3]):

Collection	Number of items	API
None	0