This notebook illustrates how to use the ADME SARfari web service API from Python. Because the ADME SARfari web services are written with Cornice (https://github.com/mozilla-services/cornice), they expose a SPORE (https://github.com/SPORE/specifications) endpoint (https://www.ebi.ac.uk/chembl/admesarfari/rest/spore) that describes the available methods in JSON. A Python library such as Respire (https://github.com/spiral-project/respire) can parse that description and automatically generate callable Python methods, so we do not have to hand-code the request boilerplate.

We will cover: listing the available API methods, building some lookup dictionaries, predicting the ADME profile of a compound (Gleevec), and then, for each predicted target, looking at its high tissue expression levels, its number of bioactivity data points and its number of associated compounds.
```python
# Let's do our imports first (this notebook uses Python 2).
import respire, urllib, re
from IPython.display import HTML, JSON
```
```python
# We just need to monkey-patch Respire's URL join method in this instance,
# since it truncates the URL due to the way ADME SARfari is hosted.
def urljoin_patched(base, path):
    return base + path

respire.client.urljoin = urljoin_patched
```
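To see why the patch is needed, the sketch below compares the standard `urljoin` with plain concatenation. It assumes, for illustration only, that the base URL in the SPORE description is `http://wwwdev.ebi.ac.uk/chembl/admesarfari` (the real value comes from the SPORE document); the method paths do start with `/`, as the method listing further down shows.

```python
# Python 3 keeps urljoin in urllib.parse; Python 2 kept it in urlparse.
try:
    from urllib.parse import urljoin
except ImportError:  # Python 2
    from urlparse import urljoin

# Hypothetical base URL and a method path, for illustration only.
base = "http://wwwdev.ebi.ac.uk/chembl/admesarfari"
path = "/rest/taxids"

joined = urljoin(base, path)   # an absolute path replaces the base's whole path
concatenated = base + path     # what urljoin_patched does instead

print(joined)        # http://wwwdev.ebi.ac.uk/rest/taxids
print(concatenated)  # http://wwwdev.ebi.ac.uk/chembl/admesarfari/rest/taxids
```

Because the path is absolute, `urljoin` discards everything after the host, which is the truncation the patch avoids.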
```python
# Create our client and its associated methods
api_client = respire.client_from_url('http://wwwdev.ebi.ac.uk/chembl/admesarfari/rest/spore')
```
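Respire turns each entry of the SPORE description into a client method. As a rough illustration of what such a generated method has to do, here is a minimal sketch, not Respire's actual code, that fills the `:NAME` placeholders of a SPORE path template (such as `/rest/targetalignment/:TARGET_ID/:TAX_IDS`) from keyword-style parameters. The function name `expand_path` and the base URL `http://example.org/admesarfari` are hypothetical.

```python
import re

try:
    from urllib.parse import quote
except ImportError:  # Python 2
    from urllib import quote


def expand_path(base_url, path_template, params):
    """Replace each :NAME placeholder in path_template with params[NAME]."""
    def substitute(match):
        name = match.group(1)
        # URL-encode the value, but keep commas (assumed list separator)
        return quote(str(params[name]), safe=",")
    # Only the path template is scanned, so the "http:" in base_url is untouched
    return base_url + re.sub(r":(\w+)", substitute, path_template)


url = expand_path("http://example.org/admesarfari",
                  "/rest/targetalignment/:TARGET_ID/:TAX_IDS",
                  {"TARGET_ID": 42, "TAX_IDS": "9606,10090"})
print(url)
```

Keeping `,` unescaped is a guess that suits endpoints taking comma-separated ID lists; whether the server would also accept `%2C` is not something this notebook shows.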
```python
# Which methods are available?
# Iterate over the parsed SPORE description, pulling out each method's name,
# path and description (HEAD methods are skipped), and lay them out as an HTML table.
tc = []
ts = '<table>'
te = '</table>'
for methodname in api_client.description.methods:
    method = api_client.description.methods[methodname]
    if method['method'] != 'HEAD':
        tc.append('<tr><th>' + methodname + '</th></tr>')
        tc.append('<tr><td>' + method['path'] + '</td><td>' + method['description'] + '</td></tr>')
h = HTML(ts + "".join(tc) + te)
h
```
Method / path | Description |
---|---|
get_textsearch | |
/rest/:TEXT/search | Return a set of target ids where the search term appears |
get_blast | |
/rest/:FASTA/blast | BLAST the input sequence(s) against the ADME SARfari set of target sequences. This requires: * A URL encoded FASTA sequence (May extended to other formats via a keyword) |
get_celltypes | |
/rest/celltypes | Retrieve the list of cell types |
get_targetsequence | |
/rest/targetsequence/:TARGET_ID | Return the Target Sequence and Variation information for a particular Target ID |
post_postsimsubsdf | |
/rest/simsubsdf/:VALUE | Return a set of molecules via either a similarity or substructure search Requires: * The POST body to contain CTAB or SMILES * A similarity cut-off value (100 will perform a sub-structure search) This returns a gzipped SDF file. |
get_orthologuematrix | |
/rest/orthologuematrix/:TAX_IDS | Retrieves the orthologue mapping matrix for a specific set of Taxonomy IDs. Requires: Comma seperated list of Taxonomy IDs |
post_modelpredictor2 | |
/rest/modelpredictor2 | Run the input CTAB through the ADME SARfari SciKit/RDKit Bayesian model. This requires a URL encoded CTAB |
get_target | |
/rest/target/:TARGET_ID | Return the Target information for a particular Target ID |
get_targetalignment | |
/rest/targetalignment/:TARGET_ID/:TAX_IDS | Return the alignment information for a particular Target ID |
get_bioactivity | |
/rest/:MOLREGNO/bioactivity | Retrieve the bioactivity and assay data for a particular molregno. Requires: Molregno (Int) If the Molregno == -1 then it will bring back all records Returns: Datatables format JSON. |
post_postblast | |
/rest/blast | BLAST the input sequence(s) against the ADME SARfari set of protein databases This requires: * A FASTA sequence as the POST body |
get_targetcompounds | |
/rest/:TARGET_ID/targetcompounds | Retrieve the compound SMILES associated with an ADME SARfari target (via activity) Requires: ADME SARfari Target ID |
get_expressionmatrix | |
/rest/expressionmatrix/:TISSUE_IDS | Retrieves the tissue target expression level matrix |
get_taxids | |
/rest/taxids | Return the list of taxonomy IDs used. |
get_alignmentdendrogram | |
/rest/alignmentdendrogram/:TARGET_ID/:TAX_IDS | Return the dendrogram tree information for a particular Target ID (from the relevant orthologues) Requires a target ID and a comma seperated list of tax IDs |
get_targetinvivomatrix | |
/rest/:TARGET_ID/targetinvivomatrix | Retrieves the invivo matrix for a particular target Requires: ADME SARfari internal target id This will return an object with xcats,ycats and data elements (primarily used with the Highcharts Heatmap plugin.) |
get_tissues | |
/rest/tissues | Retrieve the list of tissues |
get_targetbioactivity | |
/rest/:TARGET_ID/targetbioactivity | Retrieve the bioactivity and assay data for a particular target Requires: ADME SARfari Target ID |
get_modelpredictor2 | |
/rest/:CTAB/modelpredictor2 | Run the input CTAB through the ADME SARfari SciKit/RDKit Bayesian model. This requires a URL encoded CTAB |
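The cells in this notebook build their HTML tables by string concatenation. A reusable alternative, sketched below as a hypothetical helper `html_table` (not part of the original notebook or of IPython), escapes each cell so that characters such as `<` or `&` in an API description cannot break the markup.

```python
try:
    from html import escape      # Python 3
except ImportError:              # Python 2
    from cgi import escape


def html_table(rows, header=None):
    """Build a well-formed <table>; each row (a sequence) becomes one <tr>."""
    parts = ["<table>"]
    if header:
        parts.append("<tr>" + "".join("<th>%s</th>" % escape(str(h)) for h in header) + "</tr>")
    for row in rows:
        # Escape every cell so '<', '>' and '&' are shown literally
        parts.append("<tr>" + "".join("<td>%s</td>" % escape(str(c)) for c in row) + "</tr>")
    parts.append("</table>")
    return "".join(parts)


markup = html_table([("get_taxids", "/rest/taxids", "Return the list of taxonomy IDs used.")],
                    header=("Method", "Path", "Description"))
print(markup)
```

In the notebook you would display the result with `HTML(html_table(rows, header))`. On Python 2, `str()` may fail on non-ASCII `unicode` values, so those would need encoding first.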
```python
# Let's build a few lookup dictionaries.
# Taxonomy ID -> organism name look-up
taxids = api_client.get_taxids()['results']
t = {}
for taxid in taxids:
    t[taxid['taxid']] = taxid['name']
taxids = t
# Tissues and cell types (dictionaries keyed by ID)
tissues = api_client.get_tissues()['results'][0]
alltissues = str(",".join(tissues.keys()))
cells = api_client.get_celltypes()['results'][0]
# Get human expression levels for all tissues (this could take a while!)
expressionlevels = api_client.get_expressionmatrix(TISSUE_IDS=alltissues)['expression_matrix']
print "Levels found:", len(expressionlevels)
```
Levels found: 459
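The taxonomy look-up loop can also be written as a dictionary comprehension. The sketch below uses made-up records that only mimic the `taxid`/`name` keys read from `get_taxids()['results']`; the tissue IDs are likewise invented.

```python
# Made-up records mimicking get_taxids()['results']
sample_results = [
    {"taxid": 9606, "name": "Human"},
    {"taxid": 10116, "name": "Rat"},
]
# taxonomy ID -> organism name
taxid_names = {rec["taxid"]: rec["name"] for rec in sample_results}

# Made-up tissue dictionary keyed by ID, joined into the comma-separated
# string passed as TISSUE_IDS; sorting only makes the order predictable.
sample_tissues = {"2": "lung", "1": "liver"}
tissue_ids = ",".join(sorted(sample_tissues))

print(taxid_names[9606])  # Human
print(tissue_ids)         # 1,2
```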
```python
# Let's take an input compound and predict its ADME profile.
# We'll use Gleevec (CHEMBL941) as our input, supplied as a V2000 CTAB (molfile).
# The format is column-based, so the whitespace inside the string must be kept intact.
gleevec_ctab = """
  SciTegic12111210002D

 37 41  0  0  0  0            999 V2000
    6.9208   -3.0042    0.0000 C   0  0
    7.5250   -2.6417    0.0000 N   0  0
    3.2167   -3.0417    0.0000 C   0  0
    5.6875   -3.0167    0.0000 C   0  0
    6.3000   -2.6542    0.0000 N   0  0
    3.8292   -2.6792    0.0000 N   0  0
    8.1417   -2.9833    0.0000 C   0  0
    0.1292   -2.0000    0.0000 N   0  0
    5.0667   -2.6667    0.0000 C   0  0
   -1.1083   -2.6917    0.0000 N   0  0
    6.9250   -3.7167    0.0000 N   0  0
    2.6000   -2.6917    0.0000 C   0  0
    4.4542   -3.0292    0.0000 C   0  0
    8.7542   -2.6125    0.0000 C   0  0
    5.6917   -3.7292    0.0000 C   0  0
    3.2250   -3.7500    0.0000 O   0  0
    9.3458   -1.5375    0.0000 N   0  0
    0.7417   -1.6417    0.0000 C   0  0
    2.6000   -1.9792    0.0000 C   0  0
    1.9875   -3.0500    0.0000 C   0  0
    5.0667   -4.0917    0.0000 C   0  0
    0.1167   -2.7167    0.0000 C   0  0
   -0.4708   -1.6417    0.0000 C   0  0
   -0.5000   -3.0542    0.0000 C   0  0
   -1.1000   -1.9792    0.0000 C   0  0
    8.1583   -3.6958    0.0000 C   0  0
    7.5458   -4.0625    0.0000 C   0  0
    1.3667   -1.9917    0.0000 C   0  0
    4.4542   -3.7417    0.0000 C   0  0
    1.9667   -1.6292    0.0000 C   0  0
    1.3750   -2.7042    0.0000 C   0  0
    8.7417   -1.9083    0.0000 C   0  0
   -1.7375   -3.0292    0.0000 C   0  0
    9.3750   -2.9625    0.0000 C   0  0
    9.9750   -1.8833    0.0000 C   0  0
    6.3042   -4.0875    0.0000 C   0  0
    9.9917   -2.5958    0.0000 C   0  0
  2  1  1  0
  3  6  1  0
  4  5  1  0
  5  1  1  0
  6 13  1  0
  7  2  2  0
  8 18  1  0
  9  4  1  0
 10 25  1  0
 11  1  2  0
 12  3  1  0
 13  9  2  0
 14  7  1  0
 15  4  2  0
 16  3  2  0
 17 32  2  0
 18 28  1  0
 19 12  2  0
 20 12  1  0
 21 15  1  0
 22  8  1  0
 23  8  1  0
 24 22  1  0
 25 23  1  0
 26  7  1  0
 27 11  1  0
 28 31  1  0
 29 21  2  0
 30 19  1  0
 31 20  2  0
 32 14  1  0
 33 10  1  0
 34 14  2  0
 35 37  2  0
 36 15  1  0
 37 34  1  0
 27 26  2  0
 29 13  1  0
 17 35  1  0
 28 30  2  0
 24 10  1  0
M  END
"""
predictions = api_client.post_modelpredictor2(data=urllib.quote(gleevec_ctab))['results']
# How many ADME targets were predicted?
print len(predictions)
```
8
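Because the V2000 CTAB format is column-based (the atom and bond counts occupy the first two 3-character fields of the counts line, which follows the name, program and comment header lines), a quick local check can catch a mangled molfile before it is sent. The helper `v2000_counts` below is a hypothetical sketch, and the two-atom molfile is a toy example.

```python
try:
    from urllib.parse import quote, unquote
except ImportError:  # Python 2
    from urllib import quote, unquote


def v2000_counts(molblock):
    """Return (n_atoms, n_bonds) read from the fixed columns of a V2000 counts line."""
    lines = molblock.split("\n")
    counts = lines[3]  # after the name, program and comment lines
    if "V2000" not in counts:
        raise ValueError("counts line does not look like V2000: %r" % counts)
    return int(counts[0:3]), int(counts[3:6])


# Toy molfile: two atoms joined by one single bond
tiny = ("\n  sketch\n\n"
        "  2  1  0  0  0  0            999 V2000\n"
        "    0.0000    0.0000    0.0000 C   0  0\n"
        "    1.0000    0.0000    0.0000 O   0  0\n"
        "  1  2  1  0\n"
        "M  END\n")

print(v2000_counts(tiny))           # (2, 1)
print(unquote(quote(tiny)) == tiny)  # True: URL encoding is lossless
```

Applied to `gleevec_ctab`, it should report the 37 atoms and 41 bonds declared in its counts line, provided the string keeps its three header lines and fixed-width layout.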
```python
# Let's view the predictions: target, name, organism and function
tc = []
ts = '<table>'
te = '</table>'
for prediction in predictions:
    tc.append("<tr><th>" + prediction['PROTEIN_ACCESSION'] + "</th><th>" + prediction['full_name'] + "</th></tr>")
    if prediction['function'] is not None:
        pfunc = prediction['function']
    else:
        pfunc = 'Unknown'
    tc.append('<tr><td>' + taxids[prediction['taxid']] + '</td><td>' + pfunc + '</td></tr>')
h = HTML(ts + "".join(tc) + te)
h
```
Target / organism | Name / function |
---|---|
CHEMBL3356 | Cytochrome P450 1A2 |
Human | Cytochromes P450 are a group of heme-thiolate monooxygenases. In liver microsomes, this enzyme is involved in an NADPH-dependent electron transport pathway. It oxidizes a variety of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics. Most active in catalyzing 2-hydroxylation. Caffeine is metabolized primarily by cytochrome CYP1A2 in the liver through an initial N3-demethylation. Also acts in the metabolism of aflatoxin B1 and acetaminophen. Participates in the bioactivation of carcinogenic aromatic and heterocyclic amines. Catalizes the N-hydroxylation of heterocyclic amines and the O-deethylation of phenacetin. |
CHEMBL5393 | ATP-binding cassette sub-family G member 2 |
Human | Xenobiotic transporter that may play an important role in the exclusion of xenobiotics from the brain. May be involved in brain-to-blood efflux. Appears to play a major role in the multidrug resistance phenotype of several cancer cell lines. When overexpressed, the transfected cells become resistant to mitoxantrone, daunorubicin and doxorubicin, display diminished intracellular accumulation of daunorubicin, and manifest an ATP-dependent increase in the efflux of rhodamine 123. |
CHEMBL340 | Cytochrome P450 3A4 |
Human | Cytochromes P450 are a group of heme-thiolate monooxygenases. In liver microsomes, this enzyme is involved in an NADPH-dependent electron transport pathway. It performs a variety of oxidation reactions (e.g. caffeine 8-oxidation, omeprazole sulphoxidation, midazolam 1''''-hydroxylation and midazolam 4-hydroxylation) of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics. Acts as a 1,8-cineole 2-exo-monooxygenase. The enzyme also hydroxylates etoposide. |
CHEMBL3397 | Cytochrome P450 2C9 |
Human | Cytochromes P450 are a group of heme-thiolate monooxygenases. In liver microsomes, this enzyme is involved in an NADPH-dependent electron transport pathway. It oxidizes a variety of structurally unrelated compounds, including steroids, fatty acids, and xenobiotics. This enzyme contributes to the wide pharmacokinetics variability of the metabolism of drugs such as S-warfarin, diclofenac, phenytoin, tolbutamide and losartan. |
CHEMBL3622 | Cytochrome P450 2C19 |
Human | Responsible for the metabolism of a number of therapeutic agents such as the anticonvulsant drug S-mephenytoin, omeprazole, proguanil, certain barbiturates, diazepam, propranolol, citalopram and imipramine. |
CHEMBL3577 | Retinal dehydrogenase 1 |
Human | Binds free retinal and cellular retinol-binding protein-bound retinal. Can convert/oxidize retinaldehyde to retinoic acid (By similarity). |
CHEMBL289 | Cytochrome P450 2D6 |
Human | Responsible for the metabolism of many drugs and environmental chemicals that it oxidizes. It is involved in the metabolism of drugs such as antiarrhythmics, adrenoceptor antagonists, and tricyclic antidepressants. |
CHEMBL6035 | Thioredoxin reductase 1, cytoplasmic |
Rat | Unknown |
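When assembling rows like these, `dict.get` with a default avoids a `KeyError` if a prediction's taxonomy ID is missing from the look-up, and a conditional expression handles a `None` function. A small sketch with made-up records that only mimic the keys used above:

```python
# Made-up look-up and predictions, mimicking the keys used in the notebook
organisms = {9606: "Human"}
sample_predictions = [
    {"PROTEIN_ACCESSION": "EXAMPLE1", "full_name": "Example protein A",
     "taxid": 9606, "function": "Binds example ligands."},
    {"PROTEIN_ACCESSION": "EXAMPLE2", "full_name": "Example protein B",
     "taxid": 12345, "function": None},
]

rows = []
for p in sample_predictions:
    organism = organisms.get(p["taxid"], "Unknown organism")  # no KeyError
    function = p["function"] if p["function"] is not None else "Unknown"
    rows.append((p["PROTEIN_ACCESSION"], p["full_name"], organism, function))

for row in rows:
    print(" | ".join(row))
```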
```python
# Now let's look at the expression levels of these targets,
# keeping only High or Strong expression levels.
tc = []
ts = '<table>'
te = '</table>'
for prediction in predictions:
    tc.append("<tr><th>" + prediction['PROTEIN_ACCESSION'] + "</th><th>" + prediction['full_name'] + "</th></tr>")
    # Walk the expression levels for every tissue and cell type
    targetexpression = []
    for humexp in expressionlevels:
        for tissue in tissues:
            percell = humexp[str(tissue)]
            for cell in cells:
                if str(cell) in percell:
                    record = percell[str(cell)]
                    if record['target_id'] == prediction['target_id']:
                        level = record['exp_level']
                        if re.match('High|Strong', level):
                            expstring = ("Tissue =", record['tissue'], ", Cell =", cells[str(cell)],
                                         " Level =", level, " Type =", record['expression_type'],
                                         " Reliability =", record['reliability'])
                            targetexpression.append(expstring)
    for exp in targetexpression:
        tc.append('<tr><td>' + " ".join(exp) + '</td></tr>')
h = HTML(ts + "".join(tc) + te)
h
```
Target / expression | Name |
---|---|
CHEMBL3356 | Cytochrome P450 1A2 |
Tissue = liver , Cell = hepatocytes Level = High Type = APE Reliability = Medium | |
CHEMBL5393 | ATP-binding cassette sub-family G member 2 |
Tissue = vulva/anal+skin , Cell = epidermal cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = tonsil , Cell = squamous epithelial cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = lung , Cell = macrophages Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = nasopharynx , Cell = respiratory epithelial cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = vagina , Cell = squamous epithelial cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = uterus,+post-menopause , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = stomach,+upper , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = testis , Cell = cells in seminiferus ducts Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = adrenal+gland , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = bone+marrow , Cell = hematopoietic cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = bronchus , Cell = respiratory epithelial cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = colon , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = cervix,+uterine , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = spleen , Cell = cells in red pulp Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = spleen , Cell = cells in white pulp Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = epididymis , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = esophagus , Cell = squamous epithelial cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = heart+muscle , Cell = myocytes Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = gallbladder , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = seminal+vesicle , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = small+intestine , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = skin , Cell = keratinocytes Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = skin , Cell = Langerhans Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = skin , Cell = fibroblasts Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = skin , Cell = melanocytes Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = skeletal+muscle , Cell = myocytes Level = Strong Type = Staining Reliability = Uncertain | |
CHEMBL340 | Cytochrome P450 3A4 |
Tissue = duodenum , Cell = glandular cells Level = Strong Type = Staining Reliability = Supportive | |
Tissue = liver , Cell = hepatocytes Level = Strong Type = Staining Reliability = Supportive | |
Tissue = small+intestine , Cell = glandular cells Level = Strong Type = Staining Reliability = Supportive | |
CHEMBL3397 | Cytochrome P450 2C9 |
Tissue = liver , Cell = hepatocytes Level = High Type = APE Reliability = High | |
CHEMBL3622 | Cytochrome P450 2C19 |
Tissue = liver , Cell = hepatocytes Level = Strong Type = Staining Reliability = Supportive | |
CHEMBL3577 | Retinal dehydrogenase 1 |
CHEMBL289 | Cytochrome P450 2D6 |
Tissue = cerebellum , Cell = cells in granular layer Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = duodenum , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = liver , Cell = hepatocytes Level = Strong Type = Staining Reliability = Uncertain | |
Tissue = small+intestine , Cell = glandular cells Level = Strong Type = Staining Reliability = Uncertain | |
CHEMBL6035 | Thioredoxin reductase 1, cytoplasmic |
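The nested loops above can be condensed into a generator. The sketch assumes, based on how the cell indexes the data, that each expression-matrix entry maps a tissue ID to a dictionary of cell ID to record, and that those are the only keys; the sample data are made up. Iterating the entry's own items, rather than every tissue/cell pair from the look-ups, skips combinations that are absent.

```python
import re

# Made-up data shaped like the structures the cell above walks:
# each entry maps tissue id -> cell id -> expression record.
sample_matrix = [
    {"1": {"10": {"target_id": 7, "tissue": "liver", "exp_level": "High",
                  "expression_type": "APE", "reliability": "Medium"},
           "11": {"target_id": 7, "tissue": "liver", "exp_level": "Low",
                  "expression_type": "APE", "reliability": "Medium"}}},
]
sample_cells = {"10": "hepatocytes", "11": "Kupffer cells"}


def high_expression(matrix, cells, target_id, pattern="High|Strong"):
    """Yield (tissue, cell name, level) for target_id records whose level matches pattern."""
    for entry in matrix:
        for per_cell in entry.values():
            for cell_id, record in per_cell.items():
                if record["target_id"] == target_id and re.match(pattern, record["exp_level"]):
                    yield record["tissue"], cells.get(cell_id, cell_id), record["exp_level"]


print(list(high_expression(sample_matrix, sample_cells, 7)))
# [('liver', 'hepatocytes', 'High')]
```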
```python
# Now let's see how many bioactivity data points we have per target
for prediction in predictions:
    try:
        activity = api_client.get_targetbioactivity(TARGET_ID=str(prediction['target_id']))['results']
        print len(activity), "activity points"
    except Exception:
        print "Error retrieving data points!"
```

```
66105 activity points
10257 activity points
Error retrieving data points!
59288 activity points
79250 activity points
Error retrieving data points!
Error retrieving data points!
Error retrieving data points!
```
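Four of the eight requests above failed without saying why, because the exception is caught and discarded. A sketch that records the reason per target instead; `count_results` and `fake_fetch` are hypothetical names, and `fake_fetch` only stands in for a real API call.

```python
def count_results(target_ids, fetch):
    """Return ({target_id: n_results}, {target_id: error message})."""
    counts, errors = {}, {}
    for target_id in target_ids:
        try:
            counts[target_id] = len(fetch(target_id)["results"])
        except Exception as exc:  # e.g. timeouts or malformed responses
            errors[target_id] = str(exc)
    return counts, errors


def fake_fetch(target_id):
    # Stand-in for a web service call; target 3 simulates a failure
    if target_id == 3:
        raise IOError("simulated timeout")
    return {"results": list(range(target_id))}


counts, errors = count_results([1, 2, 3], fake_fetch)
print(counts)  # {1: 1, 2: 2}
print(errors)  # {3: 'simulated timeout'}
```

With the real client, `fetch` could be `lambda tid: api_client.get_targetbioactivity(TARGET_ID=str(tid))`, called on the predictions' `target_id` values.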
```python
# How many compounds do we have per target?
tc = ['<tr><th>Target</th><th>Name</th><th>Compounds</th></tr>']
ts = '<table>'
te = '</table>'
for prediction in predictions:
    try:
        targetcompounds = api_client.get_targetcompounds(TARGET_ID=str(prediction['target_id']))['results']
        count = len(targetcompounds)
    except Exception:
        count = 0
        print "Error retrieving compounds!"
    tc.append('<tr><td>' + prediction['PROTEIN_ACCESSION'] + '</td><td>' + prediction['full_name'] +
              '</td><td>' + str(count) + '</td></tr>')
h = HTML(ts + "".join(tc) + te)
h
```

Target | Name | Compounds |
---|---|---|
CHEMBL3356 | Cytochrome P450 1A2 | 11791 |
CHEMBL5393 | ATP-binding cassette sub-family G member 2 | 484 |
CHEMBL340 | Cytochrome P450 3A4 | 15913 |
CHEMBL3397 | Cytochrome P450 2C9 | 11124 |
CHEMBL3622 | Cytochrome P450 2C19 | 11707 |
CHEMBL3577 | Retinal dehydrogenase 1 | 75307 |
CHEMBL289 | Cytochrome P450 2D6 | 9717 |
CHEMBL6035 | Thioredoxin reductase 1, cytoplasmic | 39319 |
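A Python detail worth keeping in mind when formatting counts like these: a trailing comma in an assignment builds a tuple, so `count = len(x), " compounds"` stores a pair rather than a string. A tiny illustration using one of the counts above:

```python
n = 11791
as_tuple = n, " compounds"    # trailing comma: this is a tuple
as_text = "%d compounds" % n  # string formatting gives plain text
print(as_tuple)  # (11791, ' compounds')
print(as_text)   # 11791 compounds
```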