
OU Linked Data Demo - Aggregation Operators

As with SQL, SPARQL supports a range of aggregation operators that can be used to generate summary reports over a dataset.

As usual, let's import the necessary libraries and set up the endpoint URL.

In [ ]:

from SPARQLWrapper import SPARQLWrapper, JSON
endpoint_ou="http://data.open.ac.uk/query"

Drawing on the previous OU Linked Data Demo notebook, we also bring in some helper functions.

In [ ]:

#We should perhaps show how to create a simple package in the first OU notebook that we can then just import?
def runQuery(endpoint,prefix,q):
    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(prefix+q)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()

import pandas as pd
#And some more helpers
def dict2df(results):
    ''' Flatten the SPARQL JSON query results into a pandas DataFrame, with one column per variable returned in the bindings '''
    data=[]
    for result in results["results"]["bindings"]:
        tmp={}
        for el in result:
            tmp[el]=result[el]['value']
        data.append(tmp)

    df = pd.DataFrame(data)
    return df

def dfResults(endpoint,prefix,q):
    ''' Generate a data frame containing the results of running
        a SPARQL query with a declared prefix over a specified endpoint '''
    return dict2df( runQuery( endpoint, prefix, q ) )
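One thing to watch: dict2df copies each binding's value field as a string, so a ?count produced by COUNT arrives as text such as "12" and will sort lexicographically rather than numerically. The sketch below is a stdlib-only alternative, assuming the W3C SPARQL 1.1 Query Results JSON Format (head/vars, results/bindings, and an optional datatype on each literal). The function names flatten_results and convert_term, and the sample response, are hypothetical and for illustration only.

```python
# A minimal sketch, assuming the SPARQL 1.1 JSON results format.
# flatten_results / convert_term are hypothetical helper names.
XSD = "http://www.w3.org/2001/XMLSchema#"
INTEGER_TYPES = {XSD + t for t in ("integer", "int", "long", "short", "nonNegativeInteger")}
DECIMAL_TYPES = {XSD + t for t in ("decimal", "double", "float")}

def convert_term(term):
    """Return a Python value for one RDF term from the JSON results format."""
    value = term["value"]
    datatype = term.get("datatype")
    if datatype in INTEGER_TYPES:
        return int(value)
    if datatype in DECIMAL_TYPES:
        return float(value)
    return value  # URIs and plain or language-tagged literals stay as strings

def flatten_results(results):
    """One dict per solution; variables left unbound in a solution map to None."""
    variables = results["head"]["vars"]
    return [
        {var: convert_term(binding[var]) if var in binding else None for var in variables}
        for binding in results["results"]["bindings"]
    ]

# Hypothetical response shaped like the output of the GROUP BY query
sample = {
    "head": {"vars": ["level", "count"]},
    "results": {"bindings": [
        {"level": {"type": "literal", "value": "1"},
         "count": {"type": "literal", "datatype": XSD + "integer", "value": "12"}},
        {"level": {"type": "literal", "value": "2"},
         "count": {"type": "literal", "datatype": XSD + "integer", "value": "9"}},
    ]},
}

rows = flatten_results(sample)
print(rows)  # [{'level': '1', 'count': 12}, {'level': '2', 'count': 9}]
```

If a data frame is wanted, the list of dicts can be passed straight to pd.DataFrame(rows), and the count column will then be numeric.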

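Turning to the aggregation operators themselves, we can group results using the GROUP BY operator and then apply an aggregate function such as COUNT to each group. The following query counts the modules associated with a particular location (identified by a GeoNames URI), grouped by course level.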
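As a rough illustration of what GROUP BY ?level combined with COUNT(?course) computes, the following plain-Python sketch groups a handful of (course, level) solutions by level and counts each group. The data is invented for illustration and is not real OU data; the actual query does this work on the endpoint.

```python
from collections import Counter

# Hypothetical (course, level) solutions, standing in for the rows the WHERE
# clause would match before grouping; these are not real OU data.
solutions = [
    ("course/A111", "1"),
    ("course/A112", "1"),
    ("course/B201", "2"),
    ("course/C301", "3"),
    ("course/C302", "3"),
    ("course/C303", "3"),
]

# GROUP BY ?level partitions the solutions by level; COUNT(?course) then
# yields one aggregate value per partition (every solution binds ?course).
counts = Counter(level for _course, level in solutions)

# One output row per group, mirroring SELECT ?level (COUNT(?course) AS ?count)
grouped = [{"level": level, "count": n} for level, n in sorted(counts.items())]
print(grouped)
# [{'level': '1', 'count': 2}, {'level': '2', 'count': 1}, {'level': '3', 'count': 3}]
```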

In [ ]:

prefix='''
PREFIX mlo: <http://purl.org/net/mlo/>
PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
'''

#We can rename a query variable or expression using the AS keyword within a set of brackets:
## ( EXPRESSION AS ?newname )
q='''
SELECT ?level (COUNT(?course) AS ?count)
FROM <http://data.open.ac.uk/context/course>
WHERE {
    ?course mlo:location <http://sws.geonames.org/2328926/> .
    ?course a aiiso:Module.
    #?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> "1"^^<http://www.w3.org/2001/XMLSchema#string>.
    ?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?level.
} GROUP BY ?level
'''
dfResults(endpoint_ou,prefix,q)

We can also filter the grouped results on the value of an aggregate by using a HAVING clause. Here, only course levels with at least 25 modules are returned.
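The HAVING step can be sketched in plain Python as a filter applied after grouping. The solutions and the MIN_COUNT threshold below are hypothetical stand-ins for the query's data and its >= 25 test.

```python
from collections import defaultdict

# Hypothetical (course, level) solutions; not real OU data.
solutions = [("c1", "1"), ("c2", "1"), ("c3", "1"), ("c4", "2"), ("c5", "3"), ("c6", "3")]

# Step 1 (GROUP BY ?level): collect the courses in each group.
groups = defaultdict(list)
for course, level in solutions:
    groups[level].append(course)

# Step 2 (HAVING): keep only the groups whose aggregate passes the test.
# A FILTER inside WHERE could not do this, because it is evaluated on each
# solution before any groups, and so any counts, exist.
MIN_COUNT = 2  # hypothetical; stands in for the 25 used in the query
kept = {level: len(courses) for level, courses in groups.items() if len(courses) >= MIN_COUNT}
print(kept)  # {'1': 3, '3': 2}
```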

In [ ]:

prefix='''
PREFIX mlo: <http://purl.org/net/mlo/>
PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
'''

q='''
SELECT ?level (COUNT(?course) AS ?count)
FROM <http://data.open.ac.uk/context/course>
WHERE {
    ?course mlo:location <http://sws.geonames.org/2328926/> .
    ?course a aiiso:Module.
    ?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?level.
}
GROUP BY ?level
HAVING (COUNT(?course) >= 25)
'''
dfResults(endpoint_ou,prefix,q)

As with other query languages such as SQL, using aggregation operators to generate reports means that, to a certain extent, you can push the computation onto the query engine. This minimises the amount of data that has to be transferred to, and then processed by, the client.
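To make the data-transfer point concrete, the following sketch compares a hypothetical result of 300 (course, level) rows with the three summary rows a GROUP BY/COUNT query would return. The rows and example.org URIs are invented, and json.dumps of plain dicts is only a rough proxy for the endpoint's actual, more verbose, SPARQL JSON encoding.

```python
import json

# Hypothetical: 300 modules spread evenly over three levels (not real OU data).
levels = ["1", "2", "3"]
raw_rows = [
    {"course": f"http://example.org/course/{i}", "level": levels[i % 3]}
    for i in range(300)
]

# What the client must download to do the counting itself: every solution.
raw_payload = json.dumps(raw_rows)

# What a GROUP BY/COUNT query returns instead: one row per level.
aggregated_rows = [
    {"level": lvl, "count": sum(1 for r in raw_rows if r["level"] == lvl)}
    for lvl in levels
]
aggregated_payload = json.dumps(aggregated_rows)

print(len(raw_rows), "rows vs", len(aggregated_rows), "rows")  # 300 rows vs 3 rows
print(len(raw_payload), "bytes vs", len(aggregated_payload), "bytes")
```

The exact byte counts depend on the invented URIs; the point is that the aggregated response grows with the number of groups rather than the number of matching courses.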