An example of using a Python MySQL library to access the OpenWorm Semantic Database. Further down we use RDFlib to begin generating RDF from this database.
Make sure we have the right libraries installed:
!sudo pip install PyMySQL
Downloading/unpacking PyMySQL Downloading PyMySQL-0.5.tar.gz Running setup.py egg_info for package PyMySQL Installing collected packages: PyMySQL Running setup.py install for PyMySQL Successfully installed PyMySQL Cleaning up...
!sudo pip install rdflib
Requirement already satisfied (use --upgrade to upgrade): rdflib in /usr/local/lib/python2.7/dist-packages Requirement already satisfied (use --upgrade to upgrade): isodate in /usr/local/lib/python2.7/dist-packages (from rdflib) Requirement already satisfied (use --upgrade to upgrade): pyparsing<=1.5.7 in /usr/local/lib/python2.7/dist-packages (from rdflib) Requirement already satisfied (use --upgrade to upgrade): SPARQLWrapper in /usr/local/lib/python2.7/dist-packages (from rdflib) Requirement already satisfied (use --upgrade to upgrade): html5lib in /usr/local/lib/python2.7/dist-packages (from rdflib) Requirement already satisfied (use --upgrade to upgrade): six in /usr/local/lib/python2.7/dist-packages (from html5lib->rdflib) Cleaning up...
Simple example of pulling data out of the database -- no RDF yet.
#!/usr/bin/env python
import pymysql
#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')
conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
cur = conn.cursor()
#let's look at IDs and names coming from the database
cur.execute("SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10")
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
print(r)
cur.close()
conn.close()
(1L, 'Neuron') (2L, 'Interneuron') (3L, 'Sensory') (4L, 'Motor') (5L, 'Interneuron/Motor') (6L, 'Pharyngeal interneuron') (7L, 'Sensory/Interneuron/Motor') (8L, 'Sensory/Interneuron') (9L, 'Ring Motor') (10L, 'Ring/Pharynx')
Now a super simple example drawing from a basic approach to create RDF triples. This demonstrates we know how to build up a simple URIRef with an OpenWorm namespace.
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF
from rdflib import URIRef, BNode, Literal
g = Graph()
n = Namespace("http://openworm.org/entities/")
print(n['worm'])
http://openworm.org/entities/worm
Now we combine the last two examples together and we make a simple RDF document
#!/usr/bin/env python
import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib
#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')
conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
cur = conn.cursor()
cur.execute("SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10")
n = Namespace("http://openworm.org/entities/")
# print cur.description
g = Graph()
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
#first item is a number -- needs to be converted to a string
first = str(r[0])
#second item is text
second = str(r[1])
# This is the backbone of any RDF graph. The unique
# ID for each entity is encoded as a URI and every other piece of
# knowledge about that entity is connected via triples to that URI
# In this case, we connect the common name of that entity to the
# root URI via the RDFS label property.
g.add( (n[first], RDFS.label, Literal(second)) )
cur.close()
conn.close()
print g.serialize(format='turtle')
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix xml: <http://www.w3.org/XML/1998/namespace> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <http://openworm.org/entities/1> rdfs:label "Neuron" . <http://openworm.org/entities/10> rdfs:label "Ring/Pharynx" . <http://openworm.org/entities/2> rdfs:label "Interneuron" . <http://openworm.org/entities/3> rdfs:label "Sensory" . <http://openworm.org/entities/4> rdfs:label "Motor" . <http://openworm.org/entities/5> rdfs:label "Interneuron/Motor" . <http://openworm.org/entities/6> rdfs:label "Pharyngeal interneuron" . <http://openworm.org/entities/7> rdfs:label "Sensory/Interneuron/Motor" . <http://openworm.org/entities/8> rdfs:label "Sensory/Interneuron" . <http://openworm.org/entities/9> rdfs:label "Ring Motor" .
This example pulls data from the relationship table in the OpenWorm semantic database
#!/usr/bin/env python
import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib
#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')
conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
cur = conn.cursor()
#first step, grab all entities and add them to the graph
cur.execute("SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10")
n = Namespace("http://openworm.org/entities/")
# print cur.description
g = Graph()
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
#first item is a number -- needs to be converted to a string
first = str(r[0])
#second item is text
second = str(r[1])
# This is the backbone of any RDF graph. The unique
# ID for each entity is encoded as a URI and every other piece of
# knowledge about that entity is connected via triples to that URI
# In this case, we connect the common name of that entity to the
# root URI via the RDFS label property.
g.add( (n[first], RDFS.label, Literal(second)) )
#second stem, get the relationships between them and add them to the graph
cur.execute("SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship LIMIT 10")
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
print r
#all items are numbers -- need to be converted to a string
first = str(r[0])
second = str(r[1])
third = str(r[2])
g.add( (n[first], n[second], n[third]) )
cur.close()
conn.close()
print g.serialize(format='turtle')
(39L, 1516L, 1438L) (39L, 1516L, 1441L) (48L, 1516L, 1428L) (48L, 1516L, 1429L) (48L, 1516L, 1452L) (48L, 1516L, 1453L) (49L, 1516L, 1430L) (49L, 1516L, 1431L) (49L, 1516L, 1432L) (49L, 1516L, 1433L) @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix xml: <http://www.w3.org/XML/1998/namespace> . @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . <http://openworm.org/entities/1> rdfs:label "Neuron" . <http://openworm.org/entities/10> rdfs:label "Ring/Pharynx" . <http://openworm.org/entities/2> rdfs:label "Interneuron" . <http://openworm.org/entities/3> rdfs:label "Sensory" . <http://openworm.org/entities/39> <http://openworm.org/entities/1516> <http://openworm.org/entities/1438>, <http://openworm.org/entities/1441> . <http://openworm.org/entities/4> rdfs:label "Motor" . <http://openworm.org/entities/48> <http://openworm.org/entities/1516> <http://openworm.org/entities/1428>, <http://openworm.org/entities/1429>, <http://openworm.org/entities/1452>, <http://openworm.org/entities/1453> . <http://openworm.org/entities/49> <http://openworm.org/entities/1516> <http://openworm.org/entities/1430>, <http://openworm.org/entities/1431>, <http://openworm.org/entities/1432>, <http://openworm.org/entities/1433> . <http://openworm.org/entities/5> rdfs:label "Interneuron/Motor" . <http://openworm.org/entities/6> rdfs:label "Pharyngeal interneuron" . <http://openworm.org/entities/7> rdfs:label "Sensory/Interneuron/Motor" . <http://openworm.org/entities/8> rdfs:label "Sensory/Interneuron" . <http://openworm.org/entities/9> rdfs:label "Ring Motor" .
This example pulls information from the entity table and the relationship table together in the same graph and performs a very simple SPARQL query on the new graph to demonstrate it works
#!/usr/bin/env python
import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib
#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')
conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
cur = conn.cursor()
#first step, grab all entities and add them to the graph
cur.execute("SELECT DISTINCT ID, Entity FROM tblentity")
n = Namespace("http://openworm.org/entities/")
# print cur.description
g = Graph()
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
#first item is a number -- needs to be converted to a string
first = str(r[0])
#second item is text
second = str(r[1])
# This is the backbone of any RDF graph. The unique
# ID for each entity is encoded as a URI and every other piece of
# knowledge about that entity is connected via triples to that URI
# In this case, we connect the common name of that entity to the
# root URI via the RDFS label property.
g.add( (n[first], RDFS.label, Literal(second)) )
#second stem, get the relationships between them and add them to the graph
cur.execute("SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship")
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
#print r
#all items are numbers -- need to be converted to a string
first = str(r[0])
second = str(r[1])
third = str(r[2])
g.add( (n[first], n[second], n[third]) )
cur.close()
conn.close()
print("going to get results...")
qres = g.query(
"""SELECT ?subject ?predicate ?object
WHERE {
?subject ?predicate ?object.
} LIMIT 5""")
print("printing results")
print("The graph has " + str(len(g)) + " items in it")
print("Name--not necessarily in strign format: ")
print(qres.result[0])
# when done!
g.close()
going to get results... printing results The graph has 5773 items in it Name--not necessarily in strign format: (rdflib.term.URIRef(u'http://openworm.org/entities/190'), rdflib.term.URIRef(u'http://openworm.org/entities/356'), rdflib.term.URIRef(u'http://openworm.org/entities/11'))
#!/usr/bin/env python
import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib
#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')
conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
cur = conn.cursor()
#first step, grab all entities and add them to the graph
cur.execute("SELECT DISTINCT ID, Entity FROM tblentity")
n = Namespace("http://openworm.org/entities/")
# print cur.description
g = Graph()
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
#first item is a number -- needs to be converted to a string
first = str(r[0])
#second item is text
second = str(r[1])
# This is the backbone of any RDF graph. The unique
# ID for each entity is encoded as a URI and every other piece of
# knowledge about that entity is connected via triples to that URI
# In this case, we connect the common name of that entity to the
# root URI via the RDFS label property.
g.add( (n[first], RDFS.label, Literal(second)) )
#second stem, get the relationships between them and add them to the graph
cur.execute("SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship")
# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
#print r
#all items are numbers -- need to be converted to a string
first = str(r[0])
second = str(r[1])
third = str(r[2])
g.add( (n[first], n[second], n[third]) )
cur.close()
conn.close()
print("going to get results...")
qres = g.query(
"""SELECT ?predLabel ?objLabel #we want to get out the labels associated with the predicates and the objects
WHERE {
?AVALnode ?p "AVAL". #we are looking first for the 'AVALnode' that is the anchor of all information about the AVAL neuron
?AVALnode ?predicate ?object .# having identified that node, here we match any predicate and object associated with the AVALnode
?predicate rdfs:label ?predLabel .#for the predicates, look up their plain text label (because otherwise we only have URIs)
?object rdfs:label ?objLabel #for the object, look up their plain text label.
} ORDER BY ?predLabel #sort by the predicate""")
print("printing results")
print("The graph has " + str(len(g)) + " items in it\n\n")
print "AVAL has the following information stored about it: \n"
for r in qres.result:
print str(r[0]), str(r[1])
# when done!
g.close()
going to get results... printing results The graph has 5773 items in it AVAL has the following information stored about it: Neuropeptide FLP-1 Neuropeptide FLP-18 Receptor GLR-5 Receptor NMR-1 Receptor NMR-2 Receptor GLR-2 Receptor GLR-1 Receptor GGR-3 Receptor GLR-4 Receptor UNC-8 gap junction VA8 gap junction DA7 gap junction SABD gap junction PVCL gap junction AVAR gap junction URYDL gap junction URYVR gap junction VA4 gap junction AS5 gap junction VA3 gap junction VB9 gap junction DA5 gap junction SDQR gap junction VA1 gap junction VA5 gap junction VA6 gap junction DA6 gap junction PVPL gap junction VA7 gap junction AS8 gap junction VA11 gap junction PVCR gap junction AS1 gap junction DB6 gap junction AVJL gap junction RIMR gap junction VA10 gap junction VA9 gap junction SABVR gap junction FLPL gap junction AS3 gap junction DA2 gap junction VA2 gap junction DA1 gap junction LUAL gap junction AS9 gap junction DA4 gap junction AS10 gap junction AS7 gap junction AS6 is a Neuron send AS5 send AVBR send DA6 send LUAL send AS4 send AS11 send DB5 send AS3 send VA11 send VA9 send VA8 send AVDL send VA2 send DA3 send AS10 send AVHL send VA6 send VA3 send AS2 send DA9 send AS9 send AS7 send VA10 send DB6 send DA5 send VA5 send DA1 send DA2 send DA7 send AS8 send DA4 send DA8 send VA4 send VA12 send PVCL send AVAR send PVCR