As with SQL, SPARQL supports a range of aggregation operators that can be used to generate summary reports over a dataset.
As usual, let's import the necessary libraries and set up the endpoint URL.
from SPARQLWrapper import SPARQLWrapper, JSON
endpoint_ou="http://data.open.ac.uk/query"
And drawing on the previous OU Linked Data Demo notebook, bring in some helper functions.
#We should perhaps show how to create a simple package in the first OU notebook that we can then just import?
def runQuery(endpoint,prefix,q):
    ''' Run a SPARQL query with a declared prefix over a specified endpoint '''
    sparql = SPARQLWrapper(endpoint)
    sparql.setQuery(prefix+q)
    sparql.setReturnFormat(JSON)
    return sparql.query().convert()
import pandas as pd
#And some more helpers
def dict2df(results):
    ''' Hack a function to flatten the SPARQL query results and return the column values '''
    data=[]
    for result in results["results"]["bindings"]:
        tmp={}
        for el in result:
            tmp[el]=result[el]['value']
        data.append(tmp)
    df = pd.DataFrame(data)
    return df
def dfResults(endpoint,prefix,q):
    ''' Generate a data frame containing the results of running
    a SPARQL query with a declared prefix over a specified endpoint '''
    return dict2df( runQuery( endpoint, prefix, q ) )
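To see what dict2df is flattening, it can help to look at the shape of a SPARQL JSON result by hand. The sketch below uses a small hand-made sample in the SPARQL 1.1 JSON results layout (the values are invented, not taken from the OU endpoint) and a hypothetical helper, flatten_bindings, that does the same flattening with plain Python lists and dicts rather than pandas.

```python
# Hand-made sample in the SPARQL 1.1 JSON results layout.
# The values are invented for illustration; they do not come from the endpoint.
sample_results = {
    "head": {"vars": ["level", "count"]},
    "results": {
        "bindings": [
            {
                "level": {"type": "literal", "value": "1"},
                "count": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "value": "12",
                },
            },
            {
                "level": {"type": "literal", "value": "2"},
                "count": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "value": "7",
                },
            },
        ]
    },
}


def flatten_bindings(results):
    """Hypothetical helper: one plain dict per result row, keeping only each 'value'."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


rows = flatten_bindings(sample_results)
print(rows)
```

Note that every value comes back as a string, including the counts, so numeric columns may need converting before you do arithmetic on them.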
SPARQL queries support a similar set of aggregation operators to SQL. For example, we can group results using the GROUP BY clause.
prefix='''
PREFIX mlo: <http://purl.org/net/mlo/>
PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
'''
#We can rename a query variable or expression using the AS keyword inside a pair of parentheses:
## ( EXPRESSION AS ?newname )
q='''
SELECT ?level (COUNT(?course) AS ?count)
FROM <http://data.open.ac.uk/context/course>
WHERE {
?course mlo:location <http://sws.geonames.org/2328926/> .
?course a aiiso:Module.
#?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> "1"^^<http://www.w3.org/2001/XMLSchema#string>.
?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?level.
} GROUP BY ?level
'''
dfResults(endpoint_ou,prefix,q)
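To make clear what the grouped query computes, here is a rough client-side equivalent in plain Python. It assumes we had fetched the ungrouped ?course and ?level bindings instead; the rows and example.org URIs below are invented stand-ins, not real results.

```python
from collections import Counter

# Invented rows standing in for ungrouped (?course, ?level) results.
rows = [
    {"course": "http://example.org/course/a", "level": "1"},
    {"course": "http://example.org/course/b", "level": "1"},
    {"course": "http://example.org/course/c", "level": "2"},
    {"course": "http://example.org/course/d", "level": "3"},
    {"course": "http://example.org/course/e", "level": "3"},
    {"course": "http://example.org/course/f", "level": "3"},
]

# Rough equivalent of:
#   SELECT ?level (COUNT(?course) AS ?count) ... GROUP BY ?level
counts = Counter(row["level"] for row in rows)

for level, count in sorted(counts.items()):
    print(level, count)
```

With GROUP BY, the endpoint does this counting for us and returns one row per level rather than one row per course.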
We can also filter the groups themselves, based on the results of aggregation operations, by using the HAVING clause.
prefix='''
PREFIX mlo: <http://purl.org/net/mlo/>
PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
'''
q='''
SELECT ?level (COUNT(?course) AS ?count)
FROM <http://data.open.ac.uk/context/course>
WHERE {
?course mlo:location <http://sws.geonames.org/2328926/> .
?course a aiiso:Module.
?course <http://data.open.ac.uk/saou/ontology#OUCourseLevel> ?level.
}
GROUP BY ?level
HAVING (COUNT(?course) >= 25)
'''
dfResults(endpoint_ou,prefix,q)
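HAVING is applied after grouping, to whole groups, whereas conditions in the WHERE block are applied to individual solutions before any grouping takes place. A minimal sketch of the same idea in plain Python follows; having_at_least is a hypothetical helper and the level values are invented.

```python
from collections import Counter


def having_at_least(levels, threshold):
    """Hypothetical helper: count items per level, then keep only the groups
    whose count meets the threshold (the HAVING step)."""
    counts = Counter(levels)  # GROUP BY ?level with COUNT(...)
    return {level: n for level, n in counts.items() if n >= threshold}


# Invented level values, one per course.
levels = ["1", "1", "2", "3", "3", "3"]
print(having_at_least(levels, 2))
```

In this sample the group for level "2" holds a single course, so it is dropped, just as levels with fewer than 25 courses are dropped by the query above.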
As with other query languages such as SQL, reporting through aggregation operators lets you push some of the computation onto the query engine. This reduces the amount of data that has to be transferred to, and then processed by, the client.