Notebook

An example of using a Python MySQL library to access the OpenWorm Semantic Database. Further down we use RDFlib to begin generating RDF from this database.

Make sure we have the right libraries installed:

In [1]:

!sudo pip install PyMySQL

Downloading/unpacking PyMySQL
  Downloading PyMySQL-0.5.tar.gz
  Running setup.py egg_info for package PyMySQL
    
Installing collected packages: PyMySQL
  Running setup.py install for PyMySQL
    
Successfully installed PyMySQL
Cleaning up...

In [37]:

!sudo pip install rdflib

Requirement already satisfied (use --upgrade to upgrade): rdflib in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied (use --upgrade to upgrade): isodate in /usr/local/lib/python2.7/dist-packages (from rdflib)
Requirement already satisfied (use --upgrade to upgrade): pyparsing<=1.5.7 in /usr/local/lib/python2.7/dist-packages (from rdflib)
Requirement already satisfied (use --upgrade to upgrade): SPARQLWrapper in /usr/local/lib/python2.7/dist-packages (from rdflib)
Requirement already satisfied (use --upgrade to upgrade): html5lib in /usr/local/lib/python2.7/dist-packages (from rdflib)
Requirement already satisfied (use --upgrade to upgrade): six in /usr/local/lib/python2.7/dist-packages (from html5lib->rdflib)
Cleaning up...

Simple example of pulling data out of the database -- no RDF yet.

In [17]:

#!/usr/bin/env python

import pymysql

#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')

conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
   
cur = conn.cursor()

#let's look at IDs and names coming from the database
cur.execute("SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10")

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
   print(r)

cur.close()
conn.close()

(1L, 'Neuron')
(2L, 'Interneuron')
(3L, 'Sensory')
(4L, 'Motor')
(5L, 'Interneuron/Motor')
(6L, 'Pharyngeal interneuron')
(7L, 'Sensory/Interneuron/Motor')
(8L, 'Sensory/Interneuron')
(9L, 'Ring Motor')
(10L, 'Ring/Pharynx')

Now a super simple example drawing from a basic approach to create RDF triples. This demonstrates we know how to build up a simple URIRef with an OpenWorm namespace.

In [16]:

from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF
from rdflib import URIRef, BNode, Literal

g = Graph()

n = Namespace("http://openworm.org/entities/")

print(n['worm'])

http://openworm.org/entities/worm

Now we combine the last two examples together and we make a simple RDF document

In [1]:

#!/usr/bin/env python

import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib

#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')

conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
   
cur = conn.cursor()

cur.execute("SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10")

n = Namespace("http://openworm.org/entities/")

# print cur.description

g = Graph()

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    #first item is a number -- needs to be converted to a string
    first = str(r[0])
    #second item is text 
    second = str(r[1])
    
    # This is the backbone of any RDF graph.  The unique
    # ID for each entity is encoded as a URI and every other piece of 
    # knowledge about that entity is connected via triples to that URI
    # In this case, we connect the common name of that entity to the 
    # root URI via the RDFS label property.
    g.add( (n[first], RDFS.label, Literal(second)) )

cur.close()
conn.close()

print g.serialize(format='turtle')

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://openworm.org/entities/1> rdfs:label "Neuron" .

<http://openworm.org/entities/10> rdfs:label "Ring/Pharynx" .

<http://openworm.org/entities/2> rdfs:label "Interneuron" .

<http://openworm.org/entities/3> rdfs:label "Sensory" .

<http://openworm.org/entities/4> rdfs:label "Motor" .

<http://openworm.org/entities/5> rdfs:label "Interneuron/Motor" .

<http://openworm.org/entities/6> rdfs:label "Pharyngeal interneuron" .

<http://openworm.org/entities/7> rdfs:label "Sensory/Interneuron/Motor" .

<http://openworm.org/entities/8> rdfs:label "Sensory/Interneuron" .

<http://openworm.org/entities/9> rdfs:label "Ring Motor" .

This example pulls data from the relationship table in the OpenWorm semantic database

In [4]:

#!/usr/bin/env python

import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib

#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')

conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
   
cur = conn.cursor()

#first step, grab all entities and add them to the graph

cur.execute("SELECT DISTINCT ID, Entity FROM tblentity LIMIT 10")

n = Namespace("http://openworm.org/entities/")

# print cur.description

g = Graph()

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    #first item is a number -- needs to be converted to a string
    first = str(r[0])
    #second item is text 
    second = str(r[1])
    
    # This is the backbone of any RDF graph.  The unique
    # ID for each entity is encoded as a URI and every other piece of 
    # knowledge about that entity is connected via triples to that URI
    # In this case, we connect the common name of that entity to the 
    # root URI via the RDFS label property.
    g.add( (n[first], RDFS.label, Literal(second)) )


#second stem, get the relationships between them and add them to the graph
cur.execute("SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship LIMIT 10")

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    print r
    #all items are numbers -- need to be converted to a string
    first = str(r[0])
    second = str(r[1])
    third = str(r[2])
    
    g.add( (n[first], n[second], n[third]) )

cur.close()
conn.close()

print g.serialize(format='turtle')

(39L, 1516L, 1438L)
(39L, 1516L, 1441L)
(48L, 1516L, 1428L)
(48L, 1516L, 1429L)
(48L, 1516L, 1452L)
(48L, 1516L, 1453L)
(49L, 1516L, 1430L)
(49L, 1516L, 1431L)
(49L, 1516L, 1432L)
(49L, 1516L, 1433L)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://openworm.org/entities/1> rdfs:label "Neuron" .

<http://openworm.org/entities/10> rdfs:label "Ring/Pharynx" .

<http://openworm.org/entities/2> rdfs:label "Interneuron" .

<http://openworm.org/entities/3> rdfs:label "Sensory" .

<http://openworm.org/entities/39> <http://openworm.org/entities/1516> <http://openworm.org/entities/1438>,
        <http://openworm.org/entities/1441> .

<http://openworm.org/entities/4> rdfs:label "Motor" .

<http://openworm.org/entities/48> <http://openworm.org/entities/1516> <http://openworm.org/entities/1428>,
        <http://openworm.org/entities/1429>,
        <http://openworm.org/entities/1452>,
        <http://openworm.org/entities/1453> .

<http://openworm.org/entities/49> <http://openworm.org/entities/1516> <http://openworm.org/entities/1430>,
        <http://openworm.org/entities/1431>,
        <http://openworm.org/entities/1432>,
        <http://openworm.org/entities/1433> .

<http://openworm.org/entities/5> rdfs:label "Interneuron/Motor" .

<http://openworm.org/entities/6> rdfs:label "Pharyngeal interneuron" .

<http://openworm.org/entities/7> rdfs:label "Sensory/Interneuron/Motor" .

<http://openworm.org/entities/8> rdfs:label "Sensory/Interneuron" .

<http://openworm.org/entities/9> rdfs:label "Ring Motor" .

This example pulls information from the entity table and the relationship table together in the same graph and performs a very simple SPARQL query on the new graph to demonstrate it works

In [9]:

#!/usr/bin/env python

import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib

#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')

conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
   
cur = conn.cursor()

#first step, grab all entities and add them to the graph

cur.execute("SELECT DISTINCT ID, Entity FROM tblentity")

n = Namespace("http://openworm.org/entities/")

# print cur.description

g = Graph()

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    #first item is a number -- needs to be converted to a string
    first = str(r[0])
    #second item is text 
    second = str(r[1])
    
    # This is the backbone of any RDF graph.  The unique
    # ID for each entity is encoded as a URI and every other piece of 
    # knowledge about that entity is connected via triples to that URI
    # In this case, we connect the common name of that entity to the 
    # root URI via the RDFS label property.
    g.add( (n[first], RDFS.label, Literal(second)) )


#second stem, get the relationships between them and add them to the graph
cur.execute("SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship")

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    #print r
    #all items are numbers -- need to be converted to a string
    first = str(r[0])
    second = str(r[1])
    third = str(r[2])
    
    g.add( (n[first], n[second], n[third]) )

cur.close()
conn.close()

print("going to get results...")

qres = g.query(
     """SELECT ?subject ?predicate ?object
       WHERE {
          ?subject ?predicate ?object.
    	} LIMIT 5""")

print("printing results")


print("The graph has " + str(len(g)) + " items in it")

print("Name--not necessarily in strign format: ")
print(qres.result[0])

# when done!
g.close()

going to get results...
printing results
The graph has 5773 items in it
Name--not necessarily in strign format: 
(rdflib.term.URIRef(u'http://openworm.org/entities/190'), rdflib.term.URIRef(u'http://openworm.org/entities/356'), rdflib.term.URIRef(u'http://openworm.org/entities/11'))

In [20]:

#!/usr/bin/env python

import pymysql
from rdflib import Graph
from rdflib import Namespace
from rdflib.namespace import RDF, RDFS
from rdflib import URIRef, BNode, Literal
import urllib

#conn = pymysql.connect(host='127.0.0.1', unix_socket='/tmp/mysql.sock', user='root', passwd=None, db='mysql')

conn = pymysql.connect(host='my01.winhost.com', port=3306, user='openworm', passwd='openworm', db='mysql_31129_celegans')
   
cur = conn.cursor()

#first step, grab all entities and add them to the graph

cur.execute("SELECT DISTINCT ID, Entity FROM tblentity")

n = Namespace("http://openworm.org/entities/")

# print cur.description

g = Graph()

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    #first item is a number -- needs to be converted to a string
    first = str(r[0])
    #second item is text 
    second = str(r[1])
    
    # This is the backbone of any RDF graph.  The unique
    # ID for each entity is encoded as a URI and every other piece of 
    # knowledge about that entity is connected via triples to that URI
    # In this case, we connect the common name of that entity to the 
    # root URI via the RDFS label property.
    g.add( (n[first], RDFS.label, Literal(second)) )


#second stem, get the relationships between them and add them to the graph
cur.execute("SELECT DISTINCT EnID1, Relation, EnID2 FROM tblrelationship")

# r = cur.fetchall()
# print r
# ...or...
for r in cur.fetchall():
    #print r
    #all items are numbers -- need to be converted to a string
    first = str(r[0])
    second = str(r[1])
    third = str(r[2])
    
    g.add( (n[first], n[second], n[third]) )

cur.close()
conn.close()

print("going to get results...")

qres = g.query(
     """SELECT ?predLabel ?objLabel     #we want to get out the labels associated with the predicates and the objects
       WHERE {
          ?AVALnode ?p "AVAL".          #we are looking first for the 'AVALnode' that is the anchor of all information about the AVAL neuron
          ?AVALnode ?predicate ?object .# having identified that node, here we match any predicate and object associated with the AVALnode
          ?predicate rdfs:label ?predLabel .#for the predicates, look up their plain text label (because otherwise we only have URIs)
          ?object rdfs:label ?objLabel  #for the object, look up their plain text label.
    	} ORDER BY ?predLabel           #sort by the predicate""")       

print("printing results")


print("The graph has " + str(len(g)) + " items in it\n\n")

print "AVAL has the following information stored about it: \n"
for r in qres.result:
    print str(r[0]), str(r[1])

# when done!
g.close()

going to get results...
printing results
The graph has 5773 items in it


AVAL has the following information stored about it: 

Neuropeptide FLP-1
Neuropeptide FLP-18
Receptor GLR-5
Receptor NMR-1
Receptor NMR-2
Receptor GLR-2
Receptor GLR-1
Receptor GGR-3
Receptor GLR-4
Receptor UNC-8
gap junction VA8
gap junction DA7
gap junction SABD
gap junction PVCL
gap junction AVAR
gap junction URYDL
gap junction URYVR
gap junction VA4
gap junction AS5
gap junction VA3
gap junction VB9
gap junction DA5
gap junction SDQR
gap junction VA1
gap junction VA5
gap junction VA6
gap junction DA6
gap junction PVPL
gap junction VA7
gap junction AS8
gap junction VA11
gap junction PVCR
gap junction AS1
gap junction DB6
gap junction AVJL
gap junction RIMR
gap junction VA10
gap junction VA9
gap junction SABVR
gap junction FLPL
gap junction AS3
gap junction DA2
gap junction VA2
gap junction DA1
gap junction LUAL
gap junction AS9
gap junction DA4
gap junction AS10
gap junction AS7
gap junction AS6
is a Neuron
send AS5
send AVBR
send DA6
send LUAL
send AS4
send AS11
send DB5
send AS3
send VA11
send VA9
send VA8
send AVDL
send VA2
send DA3
send AS10
send AVHL
send VA6
send VA3
send AS2
send DA9
send AS9
send AS7
send VA10
send DB6
send DA5
send VA5
send DA1
send DA2
send DA7
send AS8
send DA4
send DA8
send VA4
send VA12
send PVCL
send AVAR
send PVCR

In [ ]: