In this notebook you will learn how to create your own in-memory Linked Data RDF triplestore from one or more Turtle (.ttl) files of triples.
[See also the Getting Started With GP Data - Linked Data demo notebook in the gpdata directory for an example of how to create an RDF graph from data that originates in a tabular dataset such as a CSV file.]
The Turtle data files for this example come from http://learning-provider.data.ac.uk/ and are:
Learning Providers (basic) - 180K
University Groups and Consortia - 26K
Linkset to DBPedia - 22K
Linkset to "Gateway to Research" URIs - 22K
Using this data, we should be able to generate our own Linked Data graph describing UK higher education (HE) organisations.
#List the triple files
!ls linkedDataTriples/
We can use the rdflib package to manage a simple in-memory RDF triplestore.
import rdflib
g = rdflib.Graph()
g.parse('linkedDataTriples/groups.ttl', format='n3')
# Each statement is a (subject, predicate, object) tuple
for stmt in g:
    print(stmt)
We can also use it to run SPARQL queries over the triple store.
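def rdfQuery(graph, q):
    """Run the SPARQL query q over graph and print one result row per line."""
    ans = graph.query(q)
    for row in ans:
        # Print the bound values for this row, separated by spaces
        for el in row:
            print(el, end=" ")
        print()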
q='''
SELECT DISTINCT ?y ?z {
<http://id.learning-provider.data.ac.uk/ukprn/10007801> ?y ?z
}
'''
rdfQuery(g,q)
q='''
SELECT DISTINCT ?y ?z {
<http://id.learning-provider.data.ac.uk/group/N8_Research_Partnership> ?y ?z
}
'''
rdfQuery(g,q)
q='''
SELECT DISTINCT ?group {
?group <http://xmlns.com/foaf/0.1/member> <http://id.learning-provider.data.ac.uk/ukprn/10007801>
}
'''
rdfQuery(g,q)
r = rdflib.Graph()
r.parse('linkedDataTriples/gtr-linkset.ttl', format='n3')
q='''
SELECT DISTINCT ?y ?z {
<http://gtr.rcuk.ac.uk:80/organisation/7801F008-7C77-45E7-90E9-4345B47D138E> ?y ?z
}
'''
rdfQuery(r,q)
!head linkedDataTriples/gtr-linkset.ttl
q='''
SELECT (COUNT(*) as ?n) WHERE { ?x ?y ?z }
'''
rdfQuery(r,q)
#dbpedia-linkset.ttl groups.ttl gtr-linkset.ttl learning-providers.ttl
l = rdflib.Graph()
l.parse('linkedDataTriples/dbpedia-linkset.ttl', format='n3')
!head linkedDataTriples/dbpedia-linkset.ttl
p = rdflib.Graph()
p.parse('linkedDataTriples/learning-providers.ttl', format='n3')
!head -n 20 linkedDataTriples/learning-providers.ttl
q='''
SELECT ?y ?z {
<http://id.learning-provider.data.ac.uk/ukprn/10007801> ?y ?z
}
'''
rdfQuery(l,q)
We can add the triples contained within multiple graphs together to create a new graph.
q='''
SELECT (COUNT(*) as ?count) {
?x ?y ?z
}
'''
rdfQuery(l,q)
rdfQuery(l+g,q)
q='''
SELECT ?y ?z {
<http://id.learning-provider.data.ac.uk/ukprn/10007801> ?y ?z
}
'''
rdfQuery(l+g,q)
rdfQuery(p,q)
q='''
SELECT ?group {
<http://id.learning-provider.data.ac.uk/ukprn/10007801> ?y ?z.
?r ?y ?z.
?group <http://xmlns.com/foaf/0.1/member> ?r.
}
'''
rdfQuery(l,q)
q='''
SELECT DISTINCT ?group {
?group <http://xmlns.com/foaf/0.1/member> <http://id.learning-provider.data.ac.uk/ukprn/10007801>.
}
'''
rdfQuery(l+g,q)
!head -n 22 linkedDataTriples/dbpedia-linkset.ttl