This notebook provides some examples of using the myChEMBL web services.
The web services have recently been updated to the 2.x version and are not backwards compatible. The main features introduced by this latest version are:
You can call the web services in the following two ways:
Directly via URLs (see the 'Web Services' link on the myChEMBL LaunchPad for a list of the available endpoints). The advantage of using URLs is that they are language-agnostic: although the examples below use Python, any other language with a library for executing HTTP requests would do just as well.
Using the API provided by the Python package chembl_webresource_client. This has the following advantages:
For the reasons above, we recommend using the API where possible.
Note that the chembl_webresource_client module is already installed on the myChEMBL VM; if you wish to use it on other machines, it can be installed using pip.
Please note that the code below attempts to balance clarity and brevity, and is not intended to be a template for production code: error checking, for example, should be much more thorough in practice.
import logging
from collections import Counter
from operator import itemgetter
from lxml import etree
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from IPython.display import Image, display
# Python modules used for API access...
# By default, the API connects to the main ChEMBL database; set it to use the local version (i.e. myChEMBL) instead...
from chembl_webresource_client.settings import Settings
Settings.Instance().NEW_CLIENT_URL = 'http://localhost/chemblws'
from chembl_webresource_client.new_client import new_client
It's easy to get a list of available resources by invoking:
available_resources = [resource for resource in dir(new_client) if not resource.startswith('_')]
print available_resources
print len(available_resources)
['activity', 'assay', 'atc_class', 'binding_site', 'biotherapeutic', 'cell_line', 'chembl_id_lookup', 'description', 'document', 'drug_indication', 'go_slim', 'image', 'mechanism', 'metabolism', 'molecule', 'molecule_form', 'official', 'protein_class', 'similarity', 'source', 'substructure', 'target', 'target_component']
23
This means the client exposes 23 resource names via the web services. In this notebook only the most important of these are covered.
Molecule records may be retrieved in a number of ways, such as lookup of single molecules using various identifiers or searching for compounds via substructure or similarity.
# Get a molecule-handler object for API access and check the connection to the database...
molecule = new_client.molecule
molecule.set_format('json')
print "%s molecules available in myChEMBL_20" % len(molecule.all())
1592191 molecules available in myChEMBL_20
In order to retrieve a single molecule from the web services, you need to know its unique and unambiguous identifier. In the case of the molecule resource, this can be one of three types: a ChEMBL ID, a standard InChI Key, or a canonical SMILES string.
# so these three calls:
# 1. ChEMBL ID
m1 = molecule.get('CHEMBL25')
# 2. Standard InChI Key
m2 = molecule.get('BSYNRYMUTXBXSQ-UHFFFAOYSA-N')
# 3. Canonical SMILES
m3 = molecule.get('CC(=O)Oc1ccccc1C(=O)O')
# will return the same data:
m1 == m2 == m3
True
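As a rough illustration of how the three identifier types differ in shape, the sketch below guesses the type of a string from its format alone. The helper `identifier_type` and both regular expressions are hypothetical and not part of chembl_webresource_client; the InChI Key pattern assumes the standard 14-10-1 block layout, and anything that matches neither pattern is simply assumed to be SMILES.

```python
import re

# Hypothetical helper, not part of the ChEMBL client: guesses the identifier type
# from the shape of the string only.
CHEMBL_ID_RE = re.compile(r'^CHEMBL\d+$')
# A standard InChI Key has three hyphen-separated blocks of 14, 10 and 1 upper-case letters.
INCHI_KEY_RE = re.compile(r'^[A-Z]{14}-[A-Z]{10}-[A-Z]$')


def identifier_type(identifier):
    """Return 'chembl_id', 'inchi_key' or 'smiles' for the given string."""
    if CHEMBL_ID_RE.match(identifier):
        return 'chembl_id'
    if INCHI_KEY_RE.match(identifier):
        return 'inchi_key'
    # Anything else is assumed to be a SMILES string; no chemical validation is done.
    return 'smiles'


for identifier in ['CHEMBL25', 'BSYNRYMUTXBXSQ-UHFFFAOYSA-N', 'CC(=O)Oc1ccccc1C(=O)O']:
    print('%s -> %s' % (identifier, identifier_type(identifier)))
```

A check like this can be useful for validating user input before calling the web services, but it says nothing about whether the identifier actually exists in the database.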
All the main entities in the ChEMBL database have a ChEMBL ID. It is a stable identifier designed for straightforward lookup of data.
# Lapatinib, the bioactive component of the anti-cancer drug Tykerb
chembl_id = "CHEMBL554"
# Get compound record using client...
record_via_client = molecule.get(chembl_id)
record_via_client
{u'atc_classifications': [u'L01XE07'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 49603, u'chirality': u'2', u'dosed_ingredient': False, u'first_approval': 2007, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': None, u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL554', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL554', u'parent_chembl_id': u'CHEMBL554'}, u'molecule_properties': {u'acd_logd': u'6.26', u'acd_logp': u'6.30', u'acd_most_apka': None, u'acd_most_bpka': u'6.34', u'alogp': u'6.04', u'aromatic_rings': 5, u'full_molformula': u'C29H26ClFN4O4S', u'full_mwt': u'581.06', u'hba': 7, u'hbd': 2, u'heavy_atoms': 40, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'581.06', u'mw_monoisotopic': u'580.1347', u'num_alerts': 1, u'num_ro5_violations': 2, u'psa': u'114.73', u'qed_weighted': u'0.18', u'ro3_pass': u'N', u'rtb': 11}, u'molecule_structures': {u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2', u'standard_inchi': u'InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)', u'standard_inchi_key': u'BCFGMOOMADDAQU-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Lapatinib'}, {u'syn_type': u'INN', u'synonyms': u'Lapatinib'}, {u'syn_type': u'OTHER', u'synonyms': u'Lapatinib'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'GW-572016'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'GW-2016'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'LAPATINIB', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-tinib', u'usan_stem_definition': u'tyrosine kinase inhibitors', u'usan_substem': None, 
u'usan_year': 2003}
As noted above, URLs may also be used to access the data, and, although the examples here use Python, any other language with a library for executing HTTP requests would do as well.
# Import a Python module to allow URL-based access...
import requests
from urllib import quote
# Stem of URL for local version of web services...
url_stem = "http://localhost/chemblws"
# Note that, for historical reasons, the URL-based web services return XML by default, so JSON
# must be requested explicitly by appending '.json' to the URL.
# Get request object...
url = url_stem + "/molecule/" + chembl_id + ".json"
request = requests.get(url)
print url
# Check request status: should be 200 if everything went OK...
print request.status_code
http://localhost/chemblws/molecule/CHEMBL554.json
200
record_via_url = request.json()
record_via_url
{u'atc_classifications': [u'L01XE07'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 49603, u'chirality': u'2', u'dosed_ingredient': False, u'first_approval': 2007, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': None, u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL554', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL554', u'parent_chembl_id': u'CHEMBL554'}, u'molecule_properties': {u'acd_logd': u'6.26', u'acd_logp': u'6.30', u'acd_most_apka': None, u'acd_most_bpka': u'6.34', u'alogp': u'6.04', u'aromatic_rings': 5, u'full_molformula': u'C29H26ClFN4O4S', u'full_mwt': u'581.06', u'hba': 7, u'hbd': 2, u'heavy_atoms': 40, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'581.06', u'mw_monoisotopic': u'580.1347', u'num_alerts': 1, u'num_ro5_violations': 2, u'psa': u'114.73', u'qed_weighted': u'0.18', u'ro3_pass': u'N', u'rtb': 11}, u'molecule_structures': {u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2', u'standard_inchi': u'InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)', u'standard_inchi_key': u'BCFGMOOMADDAQU-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Lapatinib'}, {u'syn_type': u'INN', u'synonyms': u'Lapatinib'}, {u'syn_type': u'OTHER', u'synonyms': u'Lapatinib'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'GW-572016'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'GW-2016'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'LAPATINIB', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-tinib', u'usan_stem_definition': u'tyrosine kinase inhibitors', u'usan_substem': None, 
u'usan_year': 2003}
Note that in both cases we are getting exactly the same results:
record_via_client == record_via_url
True
When retrieved in JSON format, a record is a nested dictionary, so to get, say, a SMILES string we have to write:
smiles_from_json = record_via_client['molecule_structures']['canonical_smiles']
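Chained indexing like this raises an error if an intermediate value is missing or None (in the record above, for example, 'biotherapeutic' is None). A minimal sketch of a more defensive lookup follows; the helper `get_nested` is hypothetical, the sample record is a hand-copied excerpt of the Lapatinib record, and the 'description' key used in the second call is purely illustrative.

```python
def get_nested(record, *keys):
    """Walk nested dictionaries, returning None as soon as a level is missing or None."""
    current = record
    for key in keys:
        if not isinstance(current, dict):
            return None
        current = current.get(key)
    return current


# Excerpt of the Lapatinib record shown above (most fields omitted).
sample_record = {
    'molecule_chembl_id': 'CHEMBL554',
    'biotherapeutic': None,
    'molecule_structures': {
        'standard_inchi_key': 'BCFGMOOMADDAQU-UHFFFAOYSA-N',
    },
}

# An existing path returns the stored value...
print(get_nested(sample_record, 'molecule_structures', 'standard_inchi_key'))
# ...while a path through a None value returns None instead of raising TypeError.
# ('description' is an illustrative key name, not taken from the record above.)
print(get_nested(sample_record, 'biotherapeutic', 'description'))
```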
It is possible to retrieve data in XML format as well:
# Get compound record in XML format...
molecule.set_format('xml')
xml = molecule.get(chembl_id).encode('utf-8')
#print xml
# The XML must be parsed (e.g. using the lxml.etree module in Python) to enable extraction of the data...
root = etree.fromstring(xml).getroottree()
# Extract SMILES via xpath...
smiles_from_xml = root.xpath("/molecule/molecule_structures/canonical_smiles/text()")[0]
print smiles_from_xml
print smiles_from_xml == smiles_from_json
CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2
True
# Pretty-print XML...
print etree.tostring(root, pretty_print=True)
<molecule> <atc_classifications> <level5>L01XE07</level5> </atc_classifications> <availability_type>1</availability_type> <biotherapeutic/> <black_box_warning>1</black_box_warning> <chebi_par_id>49603</chebi_par_id> <chirality>2</chirality> <dosed_ingredient/> <first_approval>2007</first_approval> <first_in_class>0</first_in_class> <helm_notation/> <indication_class/> <inorganic_flag>0</inorganic_flag> <max_phase>4</max_phase> <molecule_chembl_id>CHEMBL554</molecule_chembl_id> <molecule_hierarchy> <molecule_chembl_id>CHEMBL554</molecule_chembl_id> <parent_chembl_id>CHEMBL554</parent_chembl_id> </molecule_hierarchy> <molecule_properties> <acd_logd>6.26</acd_logd> <acd_logp>6.30</acd_logp> <acd_most_apka/> <acd_most_bpka>6.34</acd_most_bpka> <alogp>6.04</alogp> <aromatic_rings>5</aromatic_rings> <full_molformula>C29H26ClFN4O4S</full_molformula> <full_mwt>581.06</full_mwt> <hba>7</hba> <hbd>2</hbd> <heavy_atoms>40</heavy_atoms> <molecular_species>NEUTRAL</molecular_species> <mw_freebase>581.06</mw_freebase> <mw_monoisotopic>580.1347</mw_monoisotopic> <num_alerts>1</num_alerts> <num_ro5_violations>2</num_ro5_violations> <psa>114.73</psa> <qed_weighted>0.18</qed_weighted> <ro3_pass>N</ro3_pass> <rtb>11</rtb> </molecule_properties> <molecule_structures> <canonical_smiles>CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2</canonical_smiles> <standard_inchi>InChI=1S/C29H26ClFN4O4S/c1-40(36,37)12-11-32-16-23-7-10-27(39-23)20-5-8-26-24(14-20)29(34-18-33-26)35-22-6-9-28(25(30)15-22)38-17-19-3-2-4-21(31)13-19/h2-10,13-15,18,32H,11-12,16-17H2,1H3,(H,33,34,35)</standard_inchi> <standard_inchi_key>BCFGMOOMADDAQU-UHFFFAOYSA-N</standard_inchi_key> </molecule_structures> <molecule_synonyms> <synonym> <syn_type>FDA</syn_type> <synonyms>Lapatinib</synonyms> </synonym> <synonym> <syn_type>INN</syn_type> <synonyms>Lapatinib</synonyms> </synonym> <synonym> <syn_type>OTHER</syn_type> <synonyms>Lapatinib</synonyms> </synonym> <synonym> <syn_type>RESEARCH_CODE</syn_type> 
<synonyms>GW-572016</synonyms> </synonym> <synonym> <syn_type>RESEARCH_CODE</syn_type> <synonyms>GW-2016</synonyms> </synonym> </molecule_synonyms> <molecule_type>Small molecule</molecule_type> <natural_product>0</natural_product> <oral>True</oral> <parenteral/> <polymer_flag/> <pref_name>LAPATINIB</pref_name> <prodrug>0</prodrug> <structure_type>MOL</structure_type> <therapeutic_flag>True</therapeutic_flag> <topical/> <usan_stem>-tinib</usan_stem> <usan_stem_definition>tyrosine kinase inhibitors</usan_stem_definition> <usan_substem/> <usan_year>2003</usan_year> </molecule>
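Repeated elements such as the synonyms can be collected in a similar way. The sketch below uses the standard library's xml.etree.ElementTree rather than lxml so that it runs without extra dependencies; the XML string is a hand-trimmed excerpt of the record above, not a live response.

```python
import xml.etree.ElementTree as ET

# Hand-trimmed excerpt of the Lapatinib XML record shown above.
xml_excerpt = """
<molecule>
  <molecule_chembl_id>CHEMBL554</molecule_chembl_id>
  <molecule_synonyms>
    <synonym><syn_type>INN</syn_type><synonyms>Lapatinib</synonyms></synonym>
    <synonym><syn_type>RESEARCH_CODE</syn_type><synonyms>GW-572016</synonyms></synonym>
    <synonym><syn_type>RESEARCH_CODE</syn_type><synonyms>GW-2016</synonyms></synonym>
  </molecule_synonyms>
</molecule>
"""

root = ET.fromstring(xml_excerpt.strip())

# Keep only the synonyms whose type is RESEARCH_CODE.
research_codes = [
    synonym.findtext('synonyms')
    for synonym in root.findall('./molecule_synonyms/synonym')
    if synonym.findtext('syn_type') == 'RESEARCH_CODE'
]
print(research_codes)
```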
Compound records may also be retrieved via InChI Key lookup.
# InChI Key for Lapatinib
inchi_key = "BCFGMOOMADDAQU-UHFFFAOYSA-N"
# getting molecule via client
molecule.set_format('json')
record_via_client = molecule.get(inchi_key)
# getting molecule via url
url = url_stem + "/molecule/" + inchi_key + ".json"
record_via_url = requests.get(url).json()
print url
# they are the same
print record_via_url == record_via_client
http://localhost/chemblws/molecule/BCFGMOOMADDAQU-UHFFFAOYSA-N.json
True
Compound records may also be retrieved via SMILES lookup.
The purpose of the get method is to return objects identified by their unique and unambiguous properties.
This is why SMILES provided as arguments to the get method need to be canonical.
You can still search for molecules using non-canonical SMILES; this functionality will be covered later in this notebook.
# Canonical SMILES for Lapatinib
canonical_smiles = "CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2"
# getting molecule via client
molecule.set_format('json')
record_via_client = molecule.get(canonical_smiles)
# getting molecule via url
url = url_stem + "/molecule/" + quote(canonical_smiles) + ".json"
record_via_url = requests.get(url).json()
print url
# they are the same
record_via_url == record_via_client
http://localhost/chemblws/molecule/CS%28%3DO%29%28%3DO%29CCNCc1oc%28cc1%29c2ccc3ncnc%28Nc4ccc%28OCc5cccc%28F%29c5%29c%28Cl%29c4%29c3c2.json
True
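SMILES strings contain characters such as parentheses and '=' that must be percent-encoded before they are placed in a URL, which is why quote is applied above. The sketch below wraps that step in a small helper; `molecule_url` is a hypothetical name, and the import fallback is only there so the snippet runs under both Python 2 (as used in this notebook) and Python 3.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2, as used in this notebook


def molecule_url(url_stem, identifier, fmt='json'):
    """Build a single-molecule URL, percent-encoding the identifier."""
    return '%s/molecule/%s.%s' % (url_stem, quote(identifier), fmt)


# Aspirin SMILES, as used earlier in this notebook.
aspirin_url = molecule_url('http://localhost/chemblws', 'CC(=O)Oc1ccccc1C(=O)O')
print(aspirin_url)
```

ChEMBL IDs and InChI Keys contain only letters, digits and hyphens, so passing them through the same helper leaves them unchanged.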
Multiple records may be requested at once. The get method can accept a list of homogeneous identifiers, i.e. identifiers that are all of the same type.
records1 = molecule.get(['CHEMBL6498', 'CHEMBL6499', 'CHEMBL6505'])
records2 = molecule.get(['XSQLHVPPXBBUPP-UHFFFAOYSA-N', 'JXHVRXRRSSBGPY-UHFFFAOYSA-N', 'TUHYVXGNMOGVMR-GASGPIRDSA-N'])
records3 = molecule.get(['CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2',
'Cc1cc2SC(C)(C)CC(C)(C)c2cc1\\N=C(/S)\\Nc3ccc(cc3)S(=O)(=O)N',
'CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H]3CCCN3C(=O)C(CCCCN)CCCCN)C(C)(C)C)C(=O)O'])
records1 == records2 == records3
True
The same can be done via URLs:
url1 = url_stem + "/molecule/set/%s;%s;%s" % ('CHEMBL6498', 'CHEMBL6499', 'CHEMBL6505') + ".json"
records1 = requests.get(url1).json()
url2 = url_stem + "/molecule/set/%s;%s;%s" % ('XSQLHVPPXBBUPP-UHFFFAOYSA-N', 'JXHVRXRRSSBGPY-UHFFFAOYSA-N', 'TUHYVXGNMOGVMR-GASGPIRDSA-N') + ".json"
records2 = requests.get(url2).json()
url3 = url_stem + "/molecule/set/%s;%s;%s" % (quote('CNC(=O)c1ccc(cc1)N(CC#C)Cc2ccc3nc(C)nc(O)c3c2'),
quote('Cc1cc2SC(C)(C)CC(C)(C)c2cc1\\N=C(/S)\\Nc3ccc(cc3)S(=O)(=O)N'),
quote('CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](Cc1c[nH]c2ccccc12)NC(=O)[C@H]3CCCN3C(=O)C(CCCCN)CCCCN)C(C)(C)C)C(=O)O')) + ".json"
records3 = requests.get(url3).json()
print url1
print url2
print url3
records1 == records2 == records3
http://localhost/chemblws/molecule/set/CHEMBL6498;CHEMBL6499;CHEMBL6505.json http://localhost/chemblws/molecule/set/XSQLHVPPXBBUPP-UHFFFAOYSA-N;JXHVRXRRSSBGPY-UHFFFAOYSA-N;TUHYVXGNMOGVMR-GASGPIRDSA-N.json http://localhost/chemblws/molecule/set/CNC%28%3DO%29c1ccc%28cc1%29N%28CC%23C%29Cc2ccc3nc%28C%29nc%28O%29c3c2;Cc1cc2SC%28C%29%28C%29CC%28C%29%28C%29c2cc1%5CN%3DC%28/S%29%5CNc3ccc%28cc3%29S%28%3DO%29%28%3DO%29N;CC%28C%29C%5BC%40H%5D%28NC%28%3DO%29%5BC%40%40H%5D%28NC%28%3DO%29%5BC%40H%5D%28Cc1c%5BnH%5Dc2ccccc12%29NC%28%3DO%29%5BC%40H%5D3CCCN3C%28%3DO%29C%28CCCCN%29CCCCN%29C%28C%29%28C%29C%29C%28%3DO%29O.json
True
Please note that a URL cannot be longer than 4000 characters, so the URL-based approach should not be used for very long lists of identifiers. The molecule.get call also needs to be modified slightly in that case.
# Generate a list of 300 ChEMBL IDs (N.B. not all will be valid)...
chembl_ids = ['CHEMBL{}'.format(x) for x in range(1, 301)]
# Get compound records, note `molecule_chembl_id` named parameter.
# Named parameters should always be used for longer lists
records = molecule.get(molecule_chembl_id=chembl_ids)
len(records)
168
Note that the number returned (168) is less than 300. This is because some identifiers in the range (CHEMBL1, ..., CHEMBL300) have no molecule mapped to them.
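If the URL endpoint has to be used with a long list of identifiers anyway, one option is to split the list into batches so that each URL stays under the 4000-character limit. The following is only a sketch: `batched_set_urls` is a hypothetical helper, it assumes the '/molecule/set/' URL layout shown earlier, and it expects identifiers that are already URL-safe (SMILES would need to be quoted first).

```python
def batched_set_urls(url_stem, identifiers, max_length=4000):
    """Yield '/molecule/set/...' URLs, each no longer than max_length characters.

    A single identifier that is too long on its own is still emitted in its own URL.
    """
    prefix = url_stem + '/molecule/set/'
    suffix = '.json'
    batch = []
    for identifier in identifiers:
        candidate = batch + [identifier]
        if batch and len(prefix + ';'.join(candidate) + suffix) > max_length:
            # Adding this identifier would exceed the limit: emit the current batch first.
            yield prefix + ';'.join(batch) + suffix
            batch = [identifier]
        else:
            batch = candidate
    if batch:
        yield prefix + ';'.join(batch) + suffix


many_ids = ['CHEMBL{}'.format(x) for x in range(1, 1001)]
urls = list(batched_set_urls('http://localhost/chemblws', many_ids))
print('%d URLs, longest is %d characters' % (len(urls), max(len(url) for url in urls)))
```

Each URL can then be fetched separately and the returned records combined, at the cost of several requests instead of one.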
All resources available through ChEMBL web services can be filtered. Some examples of filtering applied to molecules:
# First, filtering using the client:
# 1. Get all approved drugs
approved_drugs = molecule.filter(max_phase=4)
# 2. Get all molecules in ChEMBL with no Rule-of-Five violations
no_violations = molecule.filter(molecule_properties__num_ro5_violations=0)
# 3. Get all biotherapeutic molecules
biotherapeutics = molecule.filter(biotherapeutic__isnull=False)
# 4. Return molecules with molecular weight <= 300
light_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300)
# 5. Return molecules with molecular weight <= 300 AND pref_name ends with nib
light_nib_molecules = molecule.filter(molecule_properties__mw_freebase__lte=300).filter(pref_name__iendswith="nib")
# Secondly, filtering using the URL endpoint:
# 1. Get all approved drugs
url_1 = url_stem + "/molecule.json?max_phase=4"
url_approved_drugs = requests.get(url_1).json()
# 2. Get all molecules in ChEMBL with no Rule-of-Five violations
url_2 = url_stem + "/molecule.json?molecule_properties__num_ro5_violations=0"
url_no_violations = requests.get(url_2).json()
# 3. Get all biotherapeutic molecules
url_3 = url_stem + "/molecule.json?biotherapeutic__isnull=false"
url_biotherapeutics = requests.get(url_3).json()
# 4. Return molecules with molecular weight <= 300
url_4 = url_stem + "/molecule.json?molecule_properties__mw_freebase__lte=300"
url_light_molecules = requests.get(url_4).json()
# 5. Return molecules with molecular weight <= 300 AND pref_name ends with nib
url_5 = url_stem + "/molecule.json?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib"
url_light_nib_molecules = requests.get(url_5).json()
print url_1
print url_2
print url_3
print url_4
print url_5
http://localhost/chemblws/molecule.json?max_phase=4
http://localhost/chemblws/molecule.json?molecule_properties__num_ro5_violations=0
http://localhost/chemblws/molecule.json?biotherapeutic__isnull=false
http://localhost/chemblws/molecule.json?molecule_properties__mw_freebase__lte=300
http://localhost/chemblws/molecule.json?molecule_properties__mw_freebase__lte=300&pref_name__iendswith=nib
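The filter URLs above all follow the same pattern: the resource name, '.json', and a query string of field/lookup pairs joined with '&'. A minimal sketch of generating them programmatically is shown below; `filter_url` is a hypothetical helper, and the filters are passed as a list of pairs so that their order in the URL is predictable.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2, as used in this notebook


def filter_url(url_stem, resource, filters):
    """Build a filtered-list URL from a sequence of (field__lookup, value) pairs."""
    return '%s/%s.json?%s' % (url_stem, resource, urlencode(filters))


# Molecular weight <= 300 AND pref_name ends with 'nib' (same as url_5 above).
light_nib_url = filter_url(
    'http://localhost/chemblws',
    'molecule',
    [('molecule_properties__mw_freebase__lte', 300), ('pref_name__iendswith', 'nib')],
)
print(light_nib_url)
```

Using urlencode also takes care of percent-encoding any filter values that contain spaces or other reserved characters.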
There are some important differences between the filtering results returned by the client and those generated using the URL endpoint. Let's have a look at them.
# First off, they are not the same thing:
print approved_drugs == url_approved_drugs
# Not surprisingly, the URL endpoint produced JSON data, which has been parsed into a Python dict:
print type(url_approved_drugs)
# Whereas the client has returned an object of type `QuerySet`
print type(approved_drugs)
False
<type 'dict'>
<class 'chembl_webresource_client.query_set.QuerySet'>
# Let's examine what data the Python dict contains:
url_approved_drugs
{u'molecules': [{u'atc_classifications': [u'C02CA01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 8364, u'chirality': u'2', u'dosed_ingredient': False, u'first_approval': 1976, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antihypertensive', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL2', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL2', u'parent_chembl_id': u'CHEMBL2'}, u'molecule_properties': {u'acd_logd': u'2.09', u'acd_logp': u'2.14', u'acd_most_apka': None, u'acd_most_bpka': u'6.52', u'alogp': u'2.11', u'aromatic_rings': 3, u'full_molformula': u'C19H21N5O4', u'full_mwt': u'383.40', u'hba': 7, u'hbd': 1, u'heavy_atoms': 28, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'383.40', u'mw_monoisotopic': u'383.1594', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'106.94', u'qed_weighted': u'0.74', u'ro3_pass': u'N', u'rtb': 4}, u'molecule_structures': {u'canonical_smiles': u'COc1cc2nc(nc(N)c2cc1OC)N3CCN(CC3)C(=O)c4occc4', u'standard_inchi': u'InChI=1S/C19H21N5O4/c1-26-15-10-12-13(11-16(15)27-2)21-19(22-17(12)20)24-7-5-23(6-8-24)18(25)14-4-3-9-28-14/h3-4,9-11H,5-8H2,1-2H3,(H2,20,21,22)', u'standard_inchi_key': u'IENZQIKPVFGBNW-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE', u'synonyms': u'CP-12299'}, {u'syn_type': u'FDA', u'synonyms': u'Prazosin'}, {u'syn_type': u'BAN', u'synonyms': u'Prazosin'}, {u'syn_type': u'INN', u'synonyms': u'Prazosin'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'PRAZOSIN', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-azosin', u'usan_stem_definition': u'antihypertensives (prazosin type)', u'usan_substem': None, u'usan_year': 1968}, {u'atc_classifications': [u'N07BA01'], u'availability_type': u'2', u'biotherapeutic': None, 
u'black_box_warning': u'0', u'chebi_par_id': 17688, u'chirality': u'1', u'dosed_ingredient': True, u'first_approval': 1984, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Smoking Cessation Adjunct', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL3', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL3', u'parent_chembl_id': u'CHEMBL3'}, u'molecule_properties': {u'acd_logd': u'-0.62', u'acd_logp': u'0.57', u'acd_most_apka': None, u'acd_most_bpka': u'8.00', u'alogp': u'1.24', u'aromatic_rings': 1, u'full_molformula': u'C10H14N2', u'full_mwt': u'162.23', u'hba': 2, u'hbd': 0, u'heavy_atoms': 12, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'162.23', u'mw_monoisotopic': u'162.1157', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'16.13', u'qed_weighted': u'0.62', u'ro3_pass': u'Y', u'rtb': 1}, u'molecule_structures': {u'canonical_smiles': u'CN1CCC[C@H]1c2cccnc2', u'standard_inchi': u'InChI=1S/C10H14N2/c1-12-7-3-5-10(12)9-4-2-6-11-8-9/h2,4,6,8,10H,3,5,7H2,1H3/t10-/m0/s1', u'standard_inchi_key': u'SNICXCGAKADSCV-JTQLQIEISA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Habitrol'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nicoderm CQ'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nicotine'}, {u'syn_type': u'USAN', u'synonyms': u'Nicotine'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nicotrol'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Prostep'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nicotrol Inhaler'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nicotrol NS'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Nicoderm'}, {u'syn_type': u'MERCK_INDEX', u'synonyms': u'Nicotine'}, {u'syn_type': u'FDA', u'synonyms': u'Nicotine'}, {u'syn_type': u'USP', u'synonyms': u'Nicotine'}], u'molecule_type': u'Small molecule', u'natural_product': u'1', u'oral': True, u'parenteral': False, u'polymer_flag': True, u'pref_name': u'NICOTINE', u'prodrug': u'0', u'structure_type': u'MOL', 
u'therapeutic_flag': True, u'topical': True, u'usan_stem': None, u'usan_stem_definition': None, u'usan_substem': None, u'usan_year': 1985}, {u'atc_classifications': [u'S02AA16', u'J01MA01', u'S01AE01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 7731, u'chirality': u'0', u'dosed_ingredient': True, u'first_approval': 1990, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antibacterial', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL4', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL4', u'parent_chembl_id': u'CHEMBL4'}, u'molecule_properties': {u'acd_logd': u'-0.39', u'acd_logp': u'1.86', u'acd_most_apka': u'5.19', u'acd_most_bpka': u'7.37', u'alogp': u'-1.37', u'aromatic_rings': 1, u'full_molformula': u'C18H20FN3O4', u'full_mwt': u'361.37', u'hba': 7, u'hbd': 1, u'heavy_atoms': 26, u'molecular_species': u'ACID', u'mw_freebase': u'361.37', u'mw_monoisotopic': u'361.1438', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'73.31', u'qed_weighted': u'0.65', u'ro3_pass': u'N', u'rtb': 2}, u'molecule_structures': {u'canonical_smiles': u'CC1COc2c(N3CCN(C)CC3)c(F)cc4C(=O)C(=CN1c24)C(=O)O', u'standard_inchi': u'InChI=1S/C18H20FN3O4/c1-10-9-26-17-14-11(16(23)12(18(24)25)8-22(10)14)7-13(19)15(17)21-5-3-20(2)4-6-21/h7-8,10H,3-6,9H2,1-2H3,(H,24,25)', u'standard_inchi_key': u'GSDSWSVVBLHKDQ-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Floxin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Floxin Otic'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'HOE-280'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Ocuflox'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Ofloxacin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Visiren'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Tarivid'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'DL-8280'}, {u'syn_type': u'BAN', u'synonyms': u'Ofloxacin'}, {u'syn_type': u'FDA', u'synonyms': 
u'Ofloxacin'}, {u'syn_type': u'INN', u'synonyms': u'Ofloxacin'}, {u'syn_type': u'JAN', u'synonyms': u'Ofloxacin'}, {u'syn_type': u'USP', u'synonyms': u'Ofloxacin'}, {u'syn_type': u'USAN', u'synonyms': u'Ofloxacin'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'OFLOXACIN', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': u'-oxacin', u'usan_stem_definition': u'antibacterials (quinolone derivatives)', u'usan_substem': None, u'usan_year': 1984}, {u'atc_classifications': [u'J01MB02'], u'availability_type': u'0', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 100147, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1964, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antibacterial', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL5', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL5', u'parent_chembl_id': u'CHEMBL5'}, u'molecule_properties': {u'acd_logd': u'-1.54', u'acd_logp': u'0.03', u'acd_most_apka': u'3.45', u'acd_most_bpka': u'6.12', u'alogp': u'1.18', u'aromatic_rings': 1, u'full_molformula': u'C12H12N2O3', u'full_mwt': u'232.24', u'hba': 5, u'hbd': 1, u'heavy_atoms': 17, u'molecular_species': u'ACID', u'mw_freebase': u'232.24', u'mw_monoisotopic': u'232.0848', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'70.50', u'qed_weighted': u'0.78', u'ro3_pass': u'N', u'rtb': 2}, u'molecule_structures': {u'canonical_smiles': u'CCN1C=C(C(=O)O)C(=O)c2ccc(C)nc12', u'standard_inchi': u'InChI=1S/C12H12N2O3/c1-3-14-6-9(12(16)17)10(15)8-5-4-7(2)13-11(8)14/h4-6H,3H2,1-2H3,(H,16,17)', u'standard_inchi_key': u'MHWLWQUZZRMNGJ-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Nalidixic Acid'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Neggram'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': 
u'WIN-18320'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Neg Gram'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Uroneg'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Wintomylon'}, {u'syn_type': u'OTHER', u'synonyms': u'Nalidixane'}, {u'syn_type': u'BAN', u'synonyms': u'Nalidixic Acid'}, {u'syn_type': u'FDA', u'synonyms': u'Nalidixic Acid'}, {u'syn_type': u'INN', u'synonyms': u'Nalidixic Acid'}, {u'syn_type': u'JAN', u'synonyms': u'Nalidixic Acid'}, {u'syn_type': u'USP', u'synonyms': u'Nalidixic Acid'}, {u'syn_type': u'USAN', u'synonyms': u'Nalidixic Acid'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'NALIDIXIC ACID', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': None, u'usan_stem_definition': None, u'usan_substem': None, u'usan_year': 1962}, {u'atc_classifications': [u'M01AB51', u'M02AA23', u'C01EB03', u'S01BC01', u'M01AB01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 49662, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1965, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Anti-Inflammatory', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL6', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL6', u'parent_chembl_id': u'CHEMBL6'}, u'molecule_properties': {u'acd_logd': u'0.98', u'acd_logp': u'4.25', u'acd_most_apka': u'3.96', u'acd_most_bpka': None, u'alogp': u'4.24', u'aromatic_rings': 3, u'full_molformula': u'C19H16ClNO4', u'full_mwt': u'357.79', u'hba': 4, u'hbd': 1, u'heavy_atoms': 25, u'molecular_species': u'ACID', u'mw_freebase': u'357.79', u'mw_monoisotopic': u'357.0768', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'68.53', u'qed_weighted': u'0.76', u'ro3_pass': u'N', u'rtb': 4}, u'molecule_structures': {u'canonical_smiles': 
u'COc1ccc2c(c1)c(CC(=O)O)c(C)n2C(=O)c3ccc(Cl)cc3', u'standard_inchi': u'InChI=1S/C19H16ClNO4/c1-11-15(10-18(22)23)16-9-14(25-2)7-8-17(16)21(11)19(24)12-3-5-13(20)6-4-12/h3-9H,10H2,1-2H3,(H,22,23)', u'standard_inchi_key': u'CGIGDMFJXJATDK-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Indo-Lemmon'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Indocin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Indocin SR'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Indomethacin'}, {u'syn_type': u'BAN', u'synonyms': u'Indometacin'}, {u'syn_type': u'DCF', u'synonyms': u'Indometacin'}, {u'syn_type': u'INN', u'synonyms': u'Indometacin'}, {u'syn_type': u'JAN', u'synonyms': u'Indometacin'}, {u'syn_type': u'FDA', u'synonyms': u'Indomethacin'}, {u'syn_type': u'USP', u'synonyms': u'Indomethacin'}, {u'syn_type': u'USAN', u'synonyms': u'Indomethacin'}, {u'syn_type': u'OTHER', u'synonyms': u'Indomethacin'}, {u'syn_type': u'JAN', u'synonyms': u'Indometacin Farnesil'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'INDOMETHACIN', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': None, u'usan_stem_definition': None, u'usan_substem': None, u'usan_year': 1963}, {u'atc_classifications': [u'J01CG01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 9321, u'chirality': u'1', u'dosed_ingredient': False, u'first_approval': 1986, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Inhibitor (beta-lactamase); Synergist (penicillin/cephalosporin),Synergist (penicillin/cephalosporin); Inhibitor (beta-lactamase)', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL403', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL403', u'parent_chembl_id': u'CHEMBL403'}, u'molecule_properties': {u'acd_logd': u'-0.73', u'acd_logp': u'1.27', 
u'acd_most_apka': u'12.48', u'acd_most_bpka': u'13.48', u'alogp': u'-0.67', u'aromatic_rings': 0, u'full_molformula': u'C8H11NO5S', u'full_mwt': u'233.24', u'hba': 5, u'hbd': 1, u'heavy_atoms': 15, u'molecular_species': u'BASE', u'mw_freebase': u'233.24', u'mw_monoisotopic': u'233.0358', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'100.13', u'qed_weighted': u'0.61', u'ro3_pass': u'N', u'rtb': 1}, u'molecule_structures': {u'canonical_smiles': u'CC1(C)[C@@H](N2[C@@H](CC2=O)S1(=O)=O)C(=O)O', u'standard_inchi': u'InChI=1S/C8H11NO5S/c1-8(2)6(7(11)12)9-4(10)3-5(9)15(8,13)14/h5-6H,3H2,1-2H3,(H,11,12)/t5-,6+/m1/s1', u'standard_inchi_key': u'FKENQMMABCRJMK-RITPCOANSA-N'}, u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE', u'synonyms': u'CP-45899'}, {u'syn_type': u'FDA', u'synonyms': u'Sulbactam'}, {u'syn_type': u'BAN', u'synonyms': u'Sulbactam'}, {u'syn_type': u'INN', u'synonyms': u'Sulbactam'}], u'molecule_type': u'Small molecule', u'natural_product': u'1', u'oral': False, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'SULBACTAM', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-bactam', u'usan_stem_definition': u'beta-lactamase inhibitors', u'usan_substem': None, u'usan_year': 1980}, {u'atc_classifications': [u'J01CG02'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 9421, u'chirality': u'1', u'dosed_ingredient': True, u'first_approval': 1993, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Inhibitor (beta-lactamase)', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL404', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL404', u'parent_chembl_id': u'CHEMBL404'}, u'molecule_properties': {u'acd_logd': u'-3.13', u'acd_logp': u'0.60', u'acd_most_apka': u'2.33', u'acd_most_bpka': u'0.85', u'alogp': u'-1.19', u'aromatic_rings': 1, u'full_molformula': u'C10H12N4O5S', u'full_mwt': u'300.29', 
u'hba': 7, u'hbd': 1, u'heavy_atoms': 20, u'molecular_species': u'ACID', u'mw_freebase': u'300.29', u'mw_monoisotopic': u'300.0528', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'130.84', u'qed_weighted': u'0.69', u'ro3_pass': u'N', u'rtb': 3}, u'molecule_structures': {u'canonical_smiles': u'C[C@]1(Cn2ccnn2)[C@@H](N3[C@@H](CC3=O)S1(=O)=O)C(=O)O', u'standard_inchi': u'InChI=1S/C10H12N4O5S/c1-10(5-13-3-2-11-12-13)8(9(16)17)14-6(15)4-7(14)20(10,18)19/h2-3,7-8H,4-5H2,1H3,(H,16,17)/t7-,8+,10+/m1/s1', u'standard_inchi_key': u'LPQZKKCYTLCDGQ-WEDXCCLWSA-N'}, u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE', u'synonyms': u'CL-298741'}, {u'syn_type': u'FDA', u'synonyms': u'Tazobactam'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'YTR-830H'}, {u'syn_type': u'BAN', u'synonyms': u'Tazobactam'}, {u'syn_type': u'INN', u'synonyms': u'Tazobactam'}, {u'syn_type': u'USAN', u'synonyms': u'Tazobactam'}], u'molecule_type': u'Small molecule', u'natural_product': u'1', u'oral': False, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'TAZOBACTAM', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-bactam', u'usan_stem_definition': u'beta-lactamase inhibitors', u'usan_substem': None, u'usan_year': 1989}, {u'atc_classifications': [u'S02AA15', u'S03AA07', u'J01MA02', u'S01AE03'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 100241, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1987, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antibacterial', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL8', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL8', u'parent_chembl_id': u'CHEMBL8'}, u'molecule_properties': {u'acd_logd': u'-0.29', u'acd_logp': u'1.63', u'acd_most_apka': u'6.43', u'acd_most_bpka': u'8.68', u'alogp': u'-1.27', u'aromatic_rings': 1, u'full_molformula': u'C17H18FN3O3', 
u'full_mwt': u'331.34', u'hba': 6, u'hbd': 2, u'heavy_atoms': 24, u'molecular_species': u'ZWITTERION', u'mw_freebase': u'331.34', u'mw_monoisotopic': u'331.1332', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'72.88', u'qed_weighted': u'0.67', u'ro3_pass': u'N', u'rtb': 3}, u'molecule_structures': {u'canonical_smiles': u'OC(=O)C1=CN(C2CC2)c3cc(N4CCNCC4)c(F)cc3C1=O', u'standard_inchi': u'InChI=1S/C17H18FN3O3/c18-13-7-11-14(8-15(13)20-5-3-19-4-6-20)21(10-1-2-10)9-12(16(11)22)17(23)24/h7-10,19H,1-6H2,(H,23,24)', u'standard_inchi_key': u'MYSWGUAQZAJSOK-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE', u'synonyms': u'BAY-Q-3939'}, {u'syn_type': u'OTHER', u'synonyms': u'Ciloxan'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Cipro'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Ciprofloxacin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Ciloxan'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Velmonit'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Ciprobay'}, {u'syn_type': u'BAN', u'synonyms': u'Ciprofloxacin'}, {u'syn_type': u'FDA', u'synonyms': u'Ciprofloxacin'}, {u'syn_type': u'INN', u'synonyms': u'Ciprofloxacin'}, {u'syn_type': u'USP', u'synonyms': u'Ciprofloxacin'}, {u'syn_type': u'USAN', u'synonyms': u'Ciprofloxacin'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'CIPROFLOXACIN', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': u'-oxacin', u'usan_stem_definition': u'antibacterials (quinolone derivatives)', u'usan_substem': None, u'usan_year': 1987}, {u'atc_classifications': [u'J01MA06', u'S01AE02'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 100246, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1986, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antibacterial', u'inorganic_flag': u'0', 
u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL9', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL9', u'parent_chembl_id': u'CHEMBL9'}, u'molecule_properties': {u'acd_logd': u'-0.66', u'acd_logp': u'1.74', u'acd_most_apka': u'0.16', u'acd_most_bpka': u'8.68', u'alogp': u'-1.41', u'aromatic_rings': 1, u'full_molformula': u'C16H18FN3O3', u'full_mwt': u'319.33', u'hba': 6, u'hbd': 2, u'heavy_atoms': 23, u'molecular_species': u'ZWITTERION', u'mw_freebase': u'319.33', u'mw_monoisotopic': u'319.1332', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'72.88', u'qed_weighted': u'0.67', u'ro3_pass': u'N', u'rtb': 3}, u'molecule_structures': {u'canonical_smiles': u'CCN1C=C(C(=O)O)C(=O)c2cc(F)c(cc12)N3CCNCC3', u'standard_inchi': u'InChI=1S/C16H18FN3O3/c1-2-19-9-11(16(22)23)15(21)10-7-12(17)14(8-13(10)19)20-5-3-18-4-6-20/h7-9,18H,2-6H2,1H3,(H,22,23)', u'standard_inchi_key': u'OGJPXUAPXNRGGI-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Chibroxin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Noroxin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Baccidal'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Quinabic'}, {u'syn_type': u'OTHER', u'synonyms': u'Noroxin'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'MK-366'}, {u'syn_type': u'BAN', u'synonyms': u'Norfloxacin'}, {u'syn_type': u'FDA', u'synonyms': u'Norfloxacin'}, {u'syn_type': u'INN', u'synonyms': u'Norfloxacin'}, {u'syn_type': u'JAN', u'synonyms': u'Norfloxacin'}, {u'syn_type': u'USP', u'synonyms': u'Norfloxacin'}, {u'syn_type': u'USAN', u'synonyms': u'Norfloxacin'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'NORFLOXACIN', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': u'-oxacin', u'usan_stem_definition': u'antibacterials (quinolone derivatives)', u'usan_substem': None, u'usan_year': 1984}, {u'atc_classifications': 
[u'N06BA01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 2679, u'chirality': u'0', u'dosed_ingredient': True, u'first_approval': 1955, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Stimulant (central)', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL405', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL405', u'parent_chembl_id': u'CHEMBL405'}, u'molecule_properties': {u'acd_logd': u'-0.65', u'acd_logp': u'1.79', u'acd_most_apka': None, u'acd_most_bpka': u'9.94', u'alogp': u'1.63', u'aromatic_rings': 1, u'full_molformula': u'C9H13N', u'full_mwt': u'135.21', u'hba': 1, u'hbd': 1, u'heavy_atoms': 10, u'molecular_species': u'BASE', u'mw_freebase': u'135.21', u'mw_monoisotopic': u'135.1048', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'26.02', u'qed_weighted': u'0.66', u'ro3_pass': u'Y', u'rtb': 2}, u'molecule_structures': {u'canonical_smiles': u'CC(N)Cc1ccccc1', u'standard_inchi': u'InChI=1S/C9H13N/c1-8(10)7-9-5-3-2-4-6-9/h2-6,8H,7,10H2,1H3', u'standard_inchi_key': u'KWTSXDURSIMDCE-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Amphetamine'}, {u'syn_type': u'USAN', u'synonyms': u'Amphetamine'}, {u'syn_type': u'INN', u'synonyms': u'Amphetamine'}, {u'syn_type': u'INN', u'synonyms': u'Amfetamine'}, {u'syn_type': u'FDA', u'synonyms': u'Amphetamine resin complex'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'AMPHETAMINE', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': None, u'usan_stem_definition': None, u'usan_substem': None, u'usan_year': None}, {u'atc_classifications': [u'C05AE02', u'C01DA08', u'C01DA58'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 6061, u'chirality': u'1', u'dosed_ingredient': True, 
u'first_approval': 1986, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Vasodilator (coronary)', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL6622', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL6622', u'parent_chembl_id': u'CHEMBL6622'}, u'molecule_properties': {u'acd_logd': u'0.95', u'acd_logp': u'0.95', u'acd_most_apka': None, u'acd_most_bpka': None, u'alogp': u'3.41', u'aromatic_rings': 0, u'full_molformula': u'C6H8N2O8', u'full_mwt': u'236.14', u'hba': 8, u'hbd': 0, u'heavy_atoms': 16, u'molecular_species': None, u'mw_freebase': u'236.14', u'mw_monoisotopic': u'236.0281', u'num_alerts': 2, u'num_ro5_violations': 0, u'psa': u'128.56', u'qed_weighted': u'0.54', u'ro3_pass': u'N', u'rtb': 4}, u'molecule_structures': {u'canonical_smiles': u'[O-][N+](=O)O[C@H]1CO[C@@H]2[C@@H](CO[C@H]12)O[N+](=O)[O-]', u'standard_inchi': u'InChI=1S/C6H8N2O8/c9-7(10)15-3-1-13-6-4(16-8(11)12)2-14-5(3)6/h3-6H,1-2H2/t3-,4+,5-,6-/m1/s1', u'standard_inchi_key': u'MOYKHGMNXAOIAT-JGWLITMVSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Isordil'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Dilatrate-Sr'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Isosorbide Dinitrate'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Sorbitrate'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Dilatrate'}, {u'syn_type': u'OTHER', u'synonyms': u'Sorbide Nitrate'}, {u'syn_type': u'BAN', u'synonyms': u'Isosorbide Dinitrate'}, {u'syn_type': u'FDA', u'synonyms': u'Isosorbide Dinitrate'}, {u'syn_type': u'INN', u'synonyms': u'Isosorbide Dinitrate'}, {u'syn_type': u'JAN', u'synonyms': u'Isosorbide Dinitrate'}, {u'syn_type': u'USP', u'synonyms': u'Isosorbide Dinitrate'}, {u'syn_type': u'USAN', u'synonyms': u'Isosorbide Dinitrate'}], u'molecule_type': u'Small molecule', u'natural_product': u'1', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'ISOSORBIDE DINITRATE', u'prodrug': u'1', u'structure_type': 
u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': None, u'usan_stem_definition': None, u'usan_substem': None, u'usan_year': 1966}, {u'atc_classifications': [u'N06AA02'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 47499, u'chirality': u'2', u'dosed_ingredient': False, u'first_approval': 1959, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antidepressant', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL11', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL11', u'parent_chembl_id': u'CHEMBL11'}, u'molecule_properties': {u'acd_logd': u'4.24', u'acd_logp': u'4.24', u'acd_most_apka': u'9.89', u'acd_most_bpka': u'4.79', u'alogp': u'4.39', u'aromatic_rings': 2, u'full_molformula': u'C19H24N2', u'full_mwt': u'280.41', u'hba': 2, u'hbd': 0, u'heavy_atoms': 21, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'280.41', u'mw_monoisotopic': u'280.1939', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'6.48', u'qed_weighted': u'0.82', u'ro3_pass': u'N', u'rtb': 4}, u'molecule_structures': {u'canonical_smiles': u'CN(C)CCCN1c2ccccc2CCc3ccccc13', u'standard_inchi': u'InChI=1S/C19H24N2/c1-20(2)14-7-15-21-18-10-5-3-8-16(18)12-13-17-9-4-6-11-19(17)21/h3-6,8-11H,7,12-15H2,1-2H3', u'standard_inchi_key': u'BCGWQEUPMDMJNV-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'FDA', u'synonyms': u'Imipramine'}, {u'syn_type': u'OTHER', u'synonyms': u'Janimine'}, {u'syn_type': u'OTHER', u'synonyms': u'Pramine'}, {u'syn_type': u'OTHER', u'synonyms': u'Presamine'}, {u'syn_type': u'OTHER', u'synonyms': u'Tofranil'}, {u'syn_type': u'OTHER', u'synonyms': u'Tofranil-PM'}, {u'syn_type': u'BAN', u'synonyms': u'Imipramine'}, {u'syn_type': u'INN', u'synonyms': u'Imipramine'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'IMIPRAMINE', u'prodrug': u'0', u'structure_type': 
u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-pramine', u'usan_stem_definition': u'antidepressants (imipramine type)', u'usan_substem': None, u'usan_year': None}, {u'atc_classifications': [u'C03BA11'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 5893, u'chirality': u'0', u'dosed_ingredient': True, u'first_approval': 1983, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antihypertensive; Diuretic', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL406', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL406', u'parent_chembl_id': u'CHEMBL406'}, u'molecule_properties': {u'acd_logd': u'1.96', u'acd_logp': u'1.96', u'acd_most_apka': u'9.35', u'acd_most_bpka': u'1.87', u'alogp': u'2.66', u'aromatic_rings': 2, u'full_molformula': u'C16H16ClN3O3S', u'full_mwt': u'365.83', u'hba': 4, u'hbd': 2, u'heavy_atoms': 24, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'365.83', u'mw_monoisotopic': u'365.0601', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'100.88', u'qed_weighted': u'0.87', u'ro3_pass': u'N', u'rtb': 3}, u'molecule_structures': {u'canonical_smiles': u'CC1Cc2ccccc2N1NC(=O)c3ccc(Cl)c(c3)S(=O)(=O)N', u'standard_inchi': u'InChI=1S/C16H16ClN3O3S/c1-10-8-11-4-2-3-5-14(11)20(10)19-16(21)12-6-7-13(17)15(9-12)24(18,22)23/h2-7,9-10H,8H2,1H3,(H,19,21)(H2,18,22,23)', u'standard_inchi_key': u'NDDAHWYSQHTHNT-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Indapamide'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Lozol'}, {u'syn_type': u'BAN', u'synonyms': u'Indapamide'}, {u'syn_type': u'FDA', u'synonyms': u'Indapamide'}, {u'syn_type': u'INN', u'synonyms': u'Indapamide'}, {u'syn_type': u'JAN', u'synonyms': u'Indapamide'}, {u'syn_type': u'USP', u'synonyms': u'Indapamide'}, {u'syn_type': u'USAN', u'synonyms': u'Indapamide'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, 
u'parenteral': False, u'polymer_flag': False, u'pref_name': u'INDAPAMIDE', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-pamide', u'usan_stem_definition': u'diuretics (sulfamoylbenzoic acid derivatives)', u'usan_substem': None, u'usan_year': 1979}, {u'atc_classifications': [u'V03AB25'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 5103, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1991, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antagonist (to benzodiazepine)', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL407', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL407', u'parent_chembl_id': u'CHEMBL407'}, u'molecule_properties': {u'acd_logd': u'2.15', u'acd_logp': u'2.15', u'acd_most_apka': None, u'acd_most_bpka': u'0.86', u'alogp': u'1.58', u'aromatic_rings': 2, u'full_molformula': u'C15H14FN3O3', u'full_mwt': u'303.29', u'hba': 4, u'hbd': 0, u'heavy_atoms': 22, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'303.29', u'mw_monoisotopic': u'303.1019', u'num_alerts': 1, u'num_ro5_violations': 0, u'psa': u'64.43', u'qed_weighted': u'0.81', u'ro3_pass': u'N', u'rtb': 3}, u'molecule_structures': {u'canonical_smiles': u'CCOC(=O)c1ncn2c1CN(C)C(=O)c3cc(F)ccc23', u'standard_inchi': u'InChI=1S/C15H14FN3O3/c1-3-22-15(21)13-12-7-18(2)14(20)10-6-9(16)4-5-11(10)19(12)8-17-13/h4-6,8H,3,7H2,1-2H3', u'standard_inchi_key': u'OFBIFZUFASYYRE-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Flumazenil'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'Ro-151788'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Romazicon'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Anexate'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Flumazepil'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Mazicon'}, {u'syn_type': u'BAN', u'synonyms': u'Flumazenil'}, {u'syn_type': u'FDA', 
u'synonyms': u'Flumazenil'}, {u'syn_type': u'INN', u'synonyms': u'Flumazenil'}, {u'syn_type': u'USP', u'synonyms': u'Flumazenil'}, {u'syn_type': u'USAN', u'synonyms': u'Flumazenil'}, {u'syn_type': u'OTHER', u'synonyms': u'Flumazepil'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'Ro-151788000'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': False, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'FLUMAZENIL', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-azenil', u'usan_stem_definition': u'benzodiazepine receptor agonists/antagonists (benzodiazepine derivatives)', u'usan_substem': None, u'usan_year': 1987}, {u'atc_classifications': [u'A10BG01'], u'availability_type': u'0', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 9753, u'chirality': u'0', u'dosed_ingredient': True, u'first_approval': 1997, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antidiabetic', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL408', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL408', u'parent_chembl_id': u'CHEMBL408'}, u'molecule_properties': {u'acd_logd': u'3.65', u'acd_logp': u'4.69', u'acd_most_apka': u'6.35', u'acd_most_bpka': None, u'alogp': u'5.68', u'aromatic_rings': 2, u'full_molformula': u'C24H27NO5S', u'full_mwt': u'441.54', u'hba': 6, u'hbd': 2, u'heavy_atoms': 31, u'molecular_species': u'ACID', u'mw_freebase': u'441.54', u'mw_monoisotopic': u'441.1610', u'num_alerts': 0, u'num_ro5_violations': 1, u'psa': u'110.16', u'qed_weighted': u'0.62', u'ro3_pass': u'N', u'rtb': 5}, u'molecule_structures': {u'canonical_smiles': u'Cc1c(C)c2OC(C)(COc3ccc(CC4SC(=O)NC4=O)cc3)CCc2c(C)c1O', u'standard_inchi': u'InChI=1S/C24H27NO5S/c1-13-14(2)21-18(15(3)20(13)26)9-10-24(4,30-21)12-29-17-7-5-16(6-8-17)11-19-22(27)25-23(28)31-19/h5-8,19,26H,9-12H2,1-4H3,(H,25,27,28)', u'standard_inchi_key': 
u'GXPHKUHSUJUWKP-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'RESEARCH_CODE', u'synonyms': u'GR-92132X'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Prelay'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Rezulin'}, {u'syn_type': u'OTHER', u'synonyms': u'Rezulin'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'CI-991'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'CS-045'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'Gr92132X'}, {u'syn_type': u'BAN', u'synonyms': u'Troglitazone'}, {u'syn_type': u'FDA', u'synonyms': u'Troglitazone'}, {u'syn_type': u'INN', u'synonyms': u'Troglitazone'}, {u'syn_type': u'USAN', u'synonyms': u'Troglitazone'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'TROGLITAZONE', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-glitazone', u'usan_stem_definition': u'PPST agonists (thiazolidene derivatives)', u'usan_substem': None, u'usan_year': 1995}, {u'atc_classifications': [u'L02BB03'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': None, u'chirality': u'0', u'dosed_ingredient': True, u'first_approval': 1995, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antineoplastic', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL409', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL409', u'parent_chembl_id': u'CHEMBL409'}, u'molecule_properties': {u'acd_logd': u'4.14', u'acd_logp': u'4.14', u'acd_most_apka': u'11.49', u'acd_most_bpka': None, u'alogp': u'2.93', u'aromatic_rings': 2, u'full_molformula': u'C18H14F4N2O4S', u'full_mwt': u'430.37', u'hba': 5, u'hbd': 2, u'heavy_atoms': 29, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'430.37', u'mw_monoisotopic': u'430.0610', u'num_alerts': 2, u'num_ro5_violations': 0, u'psa': u'115.64', u'qed_weighted': u'0.54', u'ro3_pass': u'N', 
u'rtb': 6}, u'molecule_structures': {u'canonical_smiles': u'CC(O)(CS(=O)(=O)c1ccc(F)cc1)C(=O)Nc2ccc(C#N)c(c2)C(F)(F)F', u'standard_inchi': u'InChI=1S/C18H14F4N2O4S/c1-17(26,10-29(27,28)14-6-3-12(19)4-7-14)16(25)24-13-5-2-11(9-23)15(8-13)18(20,21)22/h2-8,26H,10H2,1H3,(H,24,25)', u'standard_inchi_key': u'LKJPYSCBVHEWIU-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Casodex'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Cosudex'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Calutide'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Kalumid'}, {u'syn_type': u'OTHER', u'synonyms': u'Casodex'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Bicalutamide'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'ICI-176334'}, {u'syn_type': u'BAN', u'synonyms': u'Bicalutamide'}, {u'syn_type': u'FDA', u'synonyms': u'Bicalutamide'}, {u'syn_type': u'INN', u'synonyms': u'Bicalutamide'}, {u'syn_type': u'USAN', u'synonyms': u'Bicalutamide'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'BICALUTAMIDE', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-lutamide', u'usan_stem_definition': u'non-steroid antiandrogens', u'usan_substem': None, u'usan_year': 1994}, {u'atc_classifications': [u'N05BA01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 49575, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1963, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Sedative-Hypnotic', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL12', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL12', u'parent_chembl_id': u'CHEMBL12'}, u'molecule_properties': {u'acd_logd': u'2.80', u'acd_logp': u'2.80', u'acd_most_apka': None, u'acd_most_bpka': u'3.40', u'alogp': u'3.17', u'aromatic_rings': 2, u'full_molformula': 
u'C16H13ClN2O', u'full_mwt': u'284.74', u'hba': 2, u'hbd': 0, u'heavy_atoms': 20, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'284.74', u'mw_monoisotopic': u'284.0716', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'32.67', u'qed_weighted': u'0.81', u'ro3_pass': u'N', u'rtb': 1}, u'molecule_structures': {u'canonical_smiles': u'CN1C(=O)CN=C(c2ccccc2)c3cc(Cl)ccc13', u'standard_inchi': u'InChI=1S/C16H13ClN2O/c1-19-14-8-7-12(17)9-13(14)16(18-10-15(19)20)11-5-3-2-4-6-11/h2-9H,10H2,1H3', u'standard_inchi_key': u'AAOVKJBEBIDNHE-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Diastat'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Diastat Acudial'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Diazepam'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Diazepam Intensol'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Dizac'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Q-Pam'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Valium'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Valrelease'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Apozepam'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'E-Pam'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Paxel'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Relanium'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Scriptopam'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Serenack'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Stesolid'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Tranimul'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Vivol'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'LA-III'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': u'WY-3467'}, {u'syn_type': u'BAN', u'synonyms': u'Diazepam'}, {u'syn_type': u'FDA', u'synonyms': u'Diazepam'}, {u'syn_type': u'INN', u'synonyms': u'Diazepam'}, {u'syn_type': u'JAN', u'synonyms': u'Diazepam'}, {u'syn_type': u'USP', u'synonyms': u'Diazepam'}, {u'syn_type': u'USAN', u'synonyms': u'Diazepam'}, {u'syn_type': u'RESEARCH_CODE', u'synonyms': 
u'Ro-52807'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'DIAZEPAM', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': u'-azepam', u'usan_stem_definition': u'antianxiety agents (diazepam type)', u'usan_substem': None, u'usan_year': 1963}, {u'atc_classifications': [u'C07AB02', u'C07AB52'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'1', u'chebi_par_id': 6904, u'chirality': u'0', u'dosed_ingredient': True, u'first_approval': 1978, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Anti-Adrenergic (beta-receptor),Antihypertensive,Antihypertensive; Anti-Anginal', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL13', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL13', u'parent_chembl_id': u'CHEMBL13'}, u'molecule_properties': {u'acd_logd': u'-0.47', u'acd_logp': u'1.63', u'acd_most_apka': u'13.89', u'acd_most_bpka': u'9.43', u'alogp': u'1.76', u'aromatic_rings': 1, u'full_molformula': u'C15H25NO3', u'full_mwt': u'267.36', u'hba': 4, u'hbd': 2, u'heavy_atoms': 19, u'molecular_species': u'BASE', u'mw_freebase': u'267.36', u'mw_monoisotopic': u'267.1834', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'50.72', u'qed_weighted': u'0.72', u'ro3_pass': u'N', u'rtb': 9}, u'molecule_structures': {u'canonical_smiles': u'COCCc1ccc(OCC(O)CNC(C)C)cc1', u'standard_inchi': u'InChI=1S/C15H25NO3/c1-12(2)16-10-14(17)11-19-15-6-4-13(5-7-15)8-9-18-3/h4-7,12,14,16-17H,8-11H2,1-3H3', u'standard_inchi_key': u'IUBSYMUCCVWXPE-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'OTHER', u'synonyms': u'Lopressidone'}, {u'syn_type': u'FDA', u'synonyms': u'Metoprolol'}, {u'syn_type': u'OTHER', u'synonyms': u'Toprol-XL'}, {u'syn_type': u'BAN', u'synonyms': u'Metoprolol'}, {u'syn_type': u'INN', u'synonyms': u'Metoprolol'}, {u'syn_type': u'USAN', u'synonyms': 
u'Metoprolol'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'METOPROLOL', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': u'-olol', u'usan_stem_definition': u'beta-blockers (propranolol type)', u'usan_substem': None, u'usan_year': 1976}, {u'atc_classifications': [u'G03CB02', u'G03CC05', u'L02AA01'], u'availability_type': u'0', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 41922, u'chirality': u'2', u'dosed_ingredient': True, u'first_approval': 1973, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Estrogen', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL411', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL411', u'parent_chembl_id': u'CHEMBL411'}, u'molecule_properties': {u'acd_logd': u'5.33', u'acd_logp': u'5.33', u'acd_most_apka': u'10.19', u'acd_most_bpka': None, u'alogp': u'5.14', u'aromatic_rings': 2, u'full_molformula': u'C18H20O2', u'full_mwt': u'268.35', u'hba': 2, u'hbd': 2, u'heavy_atoms': 20, u'molecular_species': u'NEUTRAL', u'mw_freebase': u'268.35', u'mw_monoisotopic': u'268.1463', u'num_alerts': 1, u'num_ro5_violations': 1, u'psa': u'40.46', u'qed_weighted': u'0.75', u'ro3_pass': u'N', u'rtb': 4}, u'molecule_structures': {u'canonical_smiles': u'CC\\C(=C(\\CC)/c1ccc(O)cc1)\\c2ccc(O)cc2', u'standard_inchi': u'InChI=1S/C18H20O2/c1-3-17(13-5-9-15(19)10-6-13)18(4-2)14-7-11-16(20)12-8-14/h5-12,19-20H,3-4H2,1-2H3/b18-17+', u'standard_inchi_key': u'RGLYKWWBQGJZGM-ISLYRVAYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'TRADE_NAME', u'synonyms': u'Diethylstilbestrol'}, {u'syn_type': u'USAN', u'synonyms': u'Diethylstilbestrol'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Stilbestrol'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Stilbetin'}, {u'syn_type': u'TRADE_NAME', u'synonyms': u'Stilboesterol'}, {u'syn_type': u'OTHER', 
u'synonyms': u'Cyren A'}, {u'syn_type': u'OTHER', u'synonyms': u'Fonatol'}, {u'syn_type': u'OTHER', u'synonyms': u'Estrobene'}, {u'syn_type': u'OTHER', u'synonyms': u'Palestrol'}, {u'syn_type': u'OTHER', u'synonyms': u'Synestrin'}, {u'syn_type': u'OTHER', u'synonyms': u'Estromenin'}, {u'syn_type': u'OTHER', u'synonyms': u'Estrogenine'}, {u'syn_type': u'OTHER', u'synonyms': u'Stilbestrol'}, {u'syn_type': u'OTHER', u'synonyms': u'Synthestrin'}, {u'syn_type': u'OTHER', u'synonyms': u'Stilboestrol'}, {u'syn_type': u'OTHER', u'synonyms': u'New-Estranol 1'}, {u'syn_type': u'OTHER', u'synonyms': u'Stilbestroform'}, {u'syn_type': u'BAN', u'synonyms': u'Diethylstilbestrol'}, {u'syn_type': u'FDA', u'synonyms': u'Diethylstilbestrol'}, {u'syn_type': u'INN', u'synonyms': u'Diethylstilbestrol'}, {u'syn_type': u'USP', u'synonyms': u'Diethylstilbestrol'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': True, u'polymer_flag': False, u'pref_name': u'DIETHYLSTILBESTROL', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': True, u'usan_stem': u'-estr-', u'usan_stem_definition': u'estrogens', u'usan_substem': None, u'usan_year': None}, {u'atc_classifications': [u'C02BB01'], u'availability_type': u'1', u'biotherapeutic': None, u'black_box_warning': u'0', u'chebi_par_id': 6706, u'chirality': u'0', u'dosed_ingredient': False, u'first_approval': 1956, u'first_in_class': u'0', u'helm_notation': None, u'indication_class': u'Antihypertensive', u'inorganic_flag': u'0', u'max_phase': 4, u'molecule_chembl_id': u'CHEMBL267936', u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL267936', u'parent_chembl_id': u'CHEMBL267936'}, u'molecule_properties': {u'acd_logd': u'-0.62', u'acd_logp': u'0.57', u'acd_most_apka': None, u'acd_most_bpka': u'8.00', u'alogp': u'2.18', u'aromatic_rings': 0, u'full_molformula': u'C11H21N', u'full_mwt': u'167.29', u'hba': 1, u'hbd': 1, u'heavy_atoms': 12, u'molecular_species': u'NEUTRAL', 
u'mw_freebase': u'167.29', u'mw_monoisotopic': u'167.1674', u'num_alerts': 0, u'num_ro5_violations': 0, u'psa': u'12.03', u'qed_weighted': u'0.63', u'ro3_pass': u'Y', u'rtb': 1}, u'molecule_structures': {u'canonical_smiles': u'CNC1(C)C2CCC(C2)C1(C)C', u'standard_inchi': u'InChI=1S/C11H21N/c1-10(2)8-5-6-9(7-8)11(10,3)12-4/h8-9,12H,5-7H2,1-4H3', u'standard_inchi_key': u'IMYZQPCYWPFTAG-UHFFFAOYSA-N'}, u'molecule_synonyms': [{u'syn_type': u'BAN', u'synonyms': u'Mecamylamine'}, {u'syn_type': u'INN', u'synonyms': u'Mecamylamine'}], u'molecule_type': u'Small molecule', u'natural_product': u'0', u'oral': True, u'parenteral': False, u'polymer_flag': False, u'pref_name': u'MECAMYLAMINE', u'prodrug': u'0', u'structure_type': u'MOL', u'therapeutic_flag': True, u'topical': False, u'usan_stem': None, u'usan_stem_definition': None, u'usan_substem': None, u'usan_year': None}], u'page_meta': {u'limit': 20, u'next': u'/chemblws/molecule.json?max_phase=4&limit=20&offset=20', u'offset': 0, u'previous': None, u'total_count': 2879}}
The dictionary contains two top-level keys:
molecules - an array
page_meta - a dictionary
This means that a request to the URL endpoint returns a single page rather than the whole result set.
A page consists of one portion of the data (the molecules array) and some meta-information about the page and the whole result set (the page_meta dictionary).
# The default size of a single page is 20 results:
len(url_approved_drugs['molecules'])
20
# But it can be extended up to 1000 results by providing the `limit` argument:
url = url_stem + "/molecule.json?max_phase=4&limit=200"
bigger_page = requests.get(url).json()
print url
print len(bigger_page['molecules'])
http://localhost/chemblws/molecule.json?max_phase=4&limit=200
200
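Query strings like the one above can also be assembled programmatically. The following is a minimal sketch and not part of the web services themselves: `build_page_url` is a hypothetical helper, and the cap of 1000 simply mirrors the page-size limit mentioned above. `limit` and `offset` are the same query parameters that appear in the URLs returned by the service.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

MAX_PAGE_SIZE = 1000  # upper bound on `limit` stated above


def build_page_url(url_stem, resource, filters, limit=20, offset=0):
    """Hypothetical helper: build the URL for one page of results.

    `filters` is a list of (name, value) pairs, so the order of the
    query parameters is predictable.
    """
    limit = max(1, min(limit, MAX_PAGE_SIZE))  # keep the page size within bounds
    params = list(filters) + [('limit', limit)]
    if offset:
        params.append(('offset', offset))
    return "%s/%s.json?%s" % (url_stem, resource, urlencode(params))


example_url = build_page_url("http://localhost/chemblws", "molecule",
                             [('max_phase', 4)], limit=200)
print(example_url)
```

Passing the filters as a list of pairs rather than a dictionary keeps the generated URL identical from run to run, which makes it easier to compare against the URLs shown in this notebook.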
# Let's see what data is provided in the `page_meta` dictionary:
url_approved_drugs['page_meta']
{u'limit': 20, u'next': u'/chemblws/molecule.json?max_phase=4&limit=20&offset=20', u'offset': 0, u'previous': None, u'total_count': 2879}
It gives the following information:
limit - the current page size (the actual number of results can be smaller if the whole result set is smaller than the page size, or if we are looking at the last page)
offset - the position of the first element of the current page relative to the first element of the whole result set
next - URL pointing to the next page (if it exists)
previous - URL pointing to the previous page (if it exists)
total_count - number of elements in the whole result set
This means that in order to get the whole result set we need to loop through the pages:
# Getting all approved drugs using url endpoint
localhost = "http://localhost/"
url_approved_drugs = requests.get(localhost + "chemblws/molecule.json?max_phase=4&limit=1000").json()
results = url_approved_drugs['molecules']
while url_approved_drugs['page_meta']['next']:
    url_approved_drugs = requests.get(localhost + url_approved_drugs['page_meta']['next']).json()
    results += url_approved_drugs['molecules']
print len(results)
print len(results) == url_approved_drugs['page_meta']['total_count']
2879 True
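The paging loop above can be wrapped in a generator so that calling code never sees page boundaries. This is only a minimal sketch, not part of the web services or the client: `iter_results` and `fetch_json` are hypothetical names, and the small in-memory `fake_pages` dictionary stands in for the server so that the example runs offline. Against a live server, `fetch_json` could be a function that requests the given path and returns the decoded JSON.

```python
def iter_results(fetch_json, first_path, key="molecules"):
    """Yield every record of a paginated result set.

    fetch_json: callable taking a path and returning the decoded JSON page
    first_path: path of the first page
    key:        name of the array holding the records on each page
    """
    path = first_path
    while path:
        page = fetch_json(path)
        for record in page[key]:
            yield record
        # 'next' is None on the last page, which ends the loop
        path = page["page_meta"]["next"]


# Offline stand-in for the server: one page of two records, one page of one
fake_pages = {
    "/molecule.json?limit=2": {
        "molecules": [{"id": "A"}, {"id": "B"}],
        "page_meta": {"next": "/molecule.json?limit=2&offset=2", "total_count": 3},
    },
    "/molecule.json?limit=2&offset=2": {
        "molecules": [{"id": "C"}],
        "page_meta": {"next": None, "total_count": 3},
    },
}

all_records = list(iter_results(fake_pages.__getitem__, "/molecule.json?limit=2"))
ids = [record["id"] for record in all_records]
```

Because the function is a generator, a caller that only needs the first few records can stop iterating early and no further pages are requested.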
With the client-generated results, we no longer have to worry about pagination:
# The QuerySet object returned by the client is a lazily-evaluated iterator
# This means that it's ready to use and it will try to reduce the amount of server requests
# All results are cached as well so they are fetched from server only once.
approved_drugs = molecule.filter(max_phase=4)
# Getting the length of the whole result set is easy:
print len(approved_drugs)
# So is getting a single element:
print approved_drugs[123]
# Or a chunk of elements:
print approved_drugs[2:5]
# Or using it in loops or list comprehensions:
drug_smiles = [drug['molecule_structures']['canonical_smiles'] for drug in approved_drugs if drug['molecule_structures']]
print len(drug_smiles)
2879 {u'max_phase': 4, u'usan_stem': u'-ifen(e)', u'parenteral': False, u'dosed_ingredient': False, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 41774, u'first_approval': 1977, u'atc_classifications': [u'L02BA01'], u'prodrug': u'1', u'molecule_structures': {u'standard_inchi_key': u'NKANXQFJJICGDU-QPLCGJKRSA-N', u'canonical_smiles': u'CC\\C(=C(/c1ccccc1)\\c2ccc(OCCN(C)C)cc2)\\c3ccccc3', u'standard_inchi': u'InChI=1S/C26H29NO/c1-4-25(21-11-7-5-8-12-21)26(22-13-9-6-10-14-22)23-15-17-24(18-16-23)28-20-19-27(2)3/h5-18H,4,19-20H2,1-3H3/b26-25-'}, u'chirality': u'2', u'usan_substem': None, u'pref_name': u'TAMOXIFEN', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL83', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 1, u'mw_freebase': u'371.51', u'psa': u'12.47', u'full_mwt': u'371.51', u'ro3_pass': u'N', u'num_alerts': 1, u'acd_logd': u'2.79', u'full_molformula': u'C26H29NO', u'hba': 2, u'molecular_species': u'NEUTRAL', u'mw_monoisotopic': u'371.2249', u'heavy_atoms': 28, u'aromatic_rings': 3, u'alogp': u'6.32', u'acd_most_apka': u'9.52', u'qed_weighted': u'0.43', u'acd_most_bpka': u'8.29', u'hbd': 0, u'acd_logp': u'3.72', u'rtb': 8}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'antiestrogens of the clomifene and tamoxifen groups', u'natural_product': u'0', u'black_box_warning': u'1', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'ICI-46474', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'Nolvadex', u'syn_type': u'OTHER'}, {u'synonyms': u'Soltamox', u'syn_type': u'OTHER'}, {u'synonyms': u'Tamoxifen', u'syn_type': u'FDA'}, {u'synonyms': u'Tamoxifen', u'syn_type': u'BAN'}, {u'synonyms': u'Tamoxifen', u'syn_type': u'INN'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL83', u'parent_chembl_id': u'CHEMBL83'}, u'indication_class': u'Anti-Estrogen', u'usan_year': 1976, u'first_in_class': u'0', u'topical': False, u'oral': 
True} [{u'max_phase': 4, u'usan_stem': u'-azosin', u'parenteral': False, u'dosed_ingredient': False, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 8364, u'first_approval': 1976, u'atc_classifications': [u'C02CA01'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'IENZQIKPVFGBNW-UHFFFAOYSA-N', u'canonical_smiles': u'COc1cc2nc(nc(N)c2cc1OC)N3CCN(CC3)C(=O)c4occc4', u'standard_inchi': u'InChI=1S/C19H21N5O4/c1-26-15-10-12-13(11-16(15)27-2)21-19(22-17(12)20)24-7-5-23(6-8-24)18(25)14-4-3-9-28-14/h3-4,9-11H,5-8H2,1-2H3,(H2,20,21,22)'}, u'chirality': u'2', u'usan_substem': None, u'pref_name': u'PRAZOSIN', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL2', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 0, u'mw_freebase': u'383.40', u'psa': u'106.94', u'full_mwt': u'383.40', u'ro3_pass': u'N', u'num_alerts': 0, u'acd_logd': u'2.09', u'full_molformula': u'C19H21N5O4', u'hba': 7, u'molecular_species': u'NEUTRAL', u'mw_monoisotopic': u'383.1594', u'heavy_atoms': 28, u'aromatic_rings': 3, u'alogp': u'2.11', u'acd_most_apka': None, u'qed_weighted': u'0.74', u'acd_most_bpka': u'6.52', u'hbd': 1, u'acd_logp': u'2.14', u'rtb': 4}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'antihypertensives (prazosin type)', u'natural_product': u'0', u'black_box_warning': u'0', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'CP-12299', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'Prazosin', u'syn_type': u'FDA'}, {u'synonyms': u'Prazosin', u'syn_type': u'BAN'}, {u'synonyms': u'Prazosin', u'syn_type': u'INN'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL2', u'parent_chembl_id': u'CHEMBL2'}, u'indication_class': u'Antihypertensive', u'usan_year': 1968, u'first_in_class': u'0', u'topical': False, u'oral': True}, {u'max_phase': 4, u'usan_stem': None, u'parenteral': False, u'dosed_ingredient': True, u'molecule_type': u'Small 
molecule', u'biotherapeutic': None, u'chebi_par_id': 17688, u'first_approval': 1984, u'atc_classifications': [u'N07BA01'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'SNICXCGAKADSCV-JTQLQIEISA-N', u'canonical_smiles': u'CN1CCC[C@H]1c2cccnc2', u'standard_inchi': u'InChI=1S/C10H14N2/c1-12-7-3-5-10(12)9-4-2-6-11-8-9/h2,4,6,8,10H,3,5,7H2,1H3/t10-/m0/s1'}, u'chirality': u'1', u'usan_substem': None, u'pref_name': u'NICOTINE', u'polymer_flag': True, u'molecule_chembl_id': u'CHEMBL3', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 0, u'mw_freebase': u'162.23', u'psa': u'16.13', u'full_mwt': u'162.23', u'ro3_pass': u'Y', u'num_alerts': 0, u'acd_logd': u'-0.62', u'full_molformula': u'C10H14N2', u'hba': 2, u'molecular_species': u'NEUTRAL', u'mw_monoisotopic': u'162.1157', u'heavy_atoms': 12, u'aromatic_rings': 1, u'alogp': u'1.24', u'acd_most_apka': None, u'qed_weighted': u'0.62', u'acd_most_bpka': u'8.00', u'hbd': 0, u'acd_logp': u'0.57', u'rtb': 1}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': None, u'natural_product': u'1', u'black_box_warning': u'0', u'availability_type': u'2', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'Habitrol', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicoderm CQ', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicotine', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicotine', u'syn_type': u'USAN'}, {u'synonyms': u'Nicotrol', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Prostep', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicotrol Inhaler', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicotrol NS', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicoderm', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Nicotine', u'syn_type': u'MERCK_INDEX'}, {u'synonyms': u'Nicotine', u'syn_type': u'FDA'}, {u'synonyms': u'Nicotine', u'syn_type': u'USP'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL3', u'parent_chembl_id': u'CHEMBL3'}, u'indication_class': 
u'Smoking Cessation Adjunct', u'usan_year': 1985, u'first_in_class': u'0', u'topical': True, u'oral': True}, {u'max_phase': 4, u'usan_stem': u'-oxacin', u'parenteral': True, u'dosed_ingredient': True, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': 7731, u'first_approval': 1990, u'atc_classifications': [u'S02AA16', u'J01MA01', u'S01AE01'], u'prodrug': u'0', u'molecule_structures': {u'standard_inchi_key': u'GSDSWSVVBLHKDQ-UHFFFAOYSA-N', u'canonical_smiles': u'CC1COc2c(N3CCN(C)CC3)c(F)cc4C(=O)C(=CN1c24)C(=O)O', u'standard_inchi': u'InChI=1S/C18H20FN3O4/c1-10-9-26-17-14-11(16(23)12(18(24)25)8-22(10)14)7-13(19)15(17)21-5-3-20(2)4-6-21/h7-8,10H,3-6,9H2,1-2H3,(H,24,25)'}, u'chirality': u'0', u'usan_substem': None, u'pref_name': u'OFLOXACIN', u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL4', u'therapeutic_flag': True, u'molecule_properties': {u'num_ro5_violations': 0, u'mw_freebase': u'361.37', u'psa': u'73.31', u'full_mwt': u'361.37', u'ro3_pass': u'N', u'num_alerts': 1, u'acd_logd': u'-0.39', u'full_molformula': u'C18H20FN3O4', u'hba': 7, u'molecular_species': u'ACID', u'mw_monoisotopic': u'361.1438', u'heavy_atoms': 26, u'aromatic_rings': 1, u'alogp': u'-1.37', u'acd_most_apka': u'5.19', u'qed_weighted': u'0.65', u'acd_most_bpka': u'7.37', u'hbd': 1, u'acd_logp': u'1.86', u'rtb': 2}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': u'antibacterials (quinolone derivatives)', u'natural_product': u'0', u'black_box_warning': u'1', u'availability_type': u'1', u'inorganic_flag': u'0', u'molecule_synonyms': [{u'synonyms': u'Floxin', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Floxin Otic', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'HOE-280', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'Ocuflox', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Visiren', u'syn_type': u'TRADE_NAME'}, {u'synonyms': u'Tarivid', u'syn_type': u'TRADE_NAME'}, {u'synonyms': 
u'DL-8280', u'syn_type': u'RESEARCH_CODE'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'BAN'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'FDA'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'INN'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'JAN'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'USP'}, {u'synonyms': u'Ofloxacin', u'syn_type': u'USAN'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL4', u'parent_chembl_id': u'CHEMBL4'}, u'indication_class': u'Antibacterial', u'usan_year': 1984, u'first_in_class': u'0', u'topical': True, u'oral': True}] 2460
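The last number printed above (2460) is smaller than the total count (2879) because the list comprehension skips records whose `molecule_structures` entry is empty. The sketch below reproduces that filter on a few hand-made records. The two SMILES are copied from records shown in this notebook, the record without a structure is invented for illustration, and `split_by_structure` is a hypothetical helper.

```python
# Illustrative records only: the second one has no structure
sample_records = [
    {"pref_name": "NICOTINE",
     "molecule_structures": {"canonical_smiles": "CN1CCC[C@H]1c2cccnc2"}},
    {"pref_name": "EXAMPLE WITHOUT STRUCTURE",
     "molecule_structures": None},
    {"pref_name": "MECAMYLAMINE",
     "molecule_structures": {"canonical_smiles": "CNC1(C)C2CCC(C2)C1(C)C"}},
]


def split_by_structure(records):
    """Return (list of SMILES, names of records that have no SMILES)."""
    with_smiles, without = [], []
    for rec in records:
        structures = rec.get("molecule_structures")
        if structures and structures.get("canonical_smiles"):
            with_smiles.append(structures["canonical_smiles"])
        else:
            without.append(rec.get("pref_name"))
    return with_smiles, without


smiles_list, missing = split_by_structure(sample_records)
```

Keeping the names of the skipped records makes it easy to check which entries were dropped.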
As with filtering, it is also possible to order the result set, using a parameter called order_by:
# Sort approved drugs by molecular weight ascending (from lightest to heaviest) and get the first (lightest) element
lightest_drug = molecule.filter(max_phase=4).order_by('molecule_properties__mw_freebase')[0]
lightest_drug['pref_name']
u'AMMONIA N 13'
# Sort approved drugs by molecular weight descending (from heaviest to lightest) and get the first (heaviest) element
heaviest_drug = molecule.filter(max_phase=4).order_by('-molecule_properties__mw_freebase')[0]
heaviest_drug['pref_name']
u'INSULIN LISPRO PROTAMINE RECOMBINANT'
# Do the same using url endpoint
url_1 = url_stem + "/molecule.json?max_phase=4&order_by=molecule_properties__mw_freebase"
lightest_drug = requests.get(url_1).json()['molecules'][0]
print url_1
print lightest_drug['pref_name']
url_2 = url_stem + "/molecule.json?max_phase=4&order_by=-molecule_properties__mw_freebase"
heaviest_drug = requests.get(url_2).json()['molecules'][0]
print url_2
print heaviest_drug['pref_name']
http://localhost/chemblws/molecule.json?max_phase=4&order_by=molecule_properties__mw_freebase AMMONIA N 13 http://localhost/chemblws/molecule.json?max_phase=4&order_by=-molecule_properties__mw_freebase INSULIN LISPRO PROTAMINE RECOMBINANT
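The ordering above is done by the server. If records that have already been downloaded need to be re-sorted locally, note that in the outputs shown in this notebook `mw_freebase` arrives as a string (for example u'371.51'), so it is safer to convert it to a number first; sorting the raw strings compares them character by character. A minimal sketch, using weights copied from records above plus one hand-made record without properties; `mw_key` is a hypothetical helper.

```python
# Molecular weights copied from records shown earlier in this notebook
records = [
    {"pref_name": "TAMOXIFEN", "molecule_properties": {"mw_freebase": "371.51"}},
    {"pref_name": "NICOTINE", "molecule_properties": {"mw_freebase": "162.23"}},
    {"pref_name": "PRAZOSIN", "molecule_properties": {"mw_freebase": "383.40"}},
    # Hand-made record to show how missing properties are handled
    {"pref_name": "NO PROPERTIES", "molecule_properties": None},
]


def mw_key(record):
    """Numeric sort key; records without a weight sort last."""
    props = record.get("molecule_properties") or {}
    value = props.get("mw_freebase")
    return float(value) if value is not None else float("inf")


by_weight = sorted(records, key=mw_key)
names = [record["pref_name"] for record in by_weight]

# Pitfall: as strings, "162.23" sorts before "99.50"
string_order = sorted(["99.50", "162.23"])
```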
It is possible to filter molecules by SMILES:
# Atorvastatin...
smiles = "CC(C)c1c(C(=O)Nc2ccccc2)c(c3ccccc3)c(c4ccc(F)cc4)n1CC[C@@H](O)C[C@@H](O)CC(=O)O"
# By default, the type of search used is 'exact search', which means that only compounds with exactly the same SMILES string will be picked:
result = molecule.filter(molecule_structures__canonical_smiles=smiles)
print len(result)
# This is equivalent to:
result1 = molecule.filter(molecule_structures__canonical_smiles__exact=smiles)
print len(result1)
# For convenience, we have a shortcut call:
result2 = molecule.filter(smiles=smiles)
print len(result2)
# Checking if they are all the same:
print result[0]['pref_name'] == result1[0]['pref_name'] == result2[0]['pref_name']
# And because SMILES strings are unique in ChEMBL, this is similar to:
result3 = molecule.get(smiles)
print result[0]['pref_name'] == result3['pref_name']
1 1 1 True True
There are, however, other filtering operators that can be applied to SMILES; the most important one is called flexmatch, which will return all structures described by the given SMILES string, even if it is a non-canonical SMILES.
# Flexmatch will look for structures that match given SMILES, ignoring stereo:
records = molecule.filter(molecule_structures__canonical_smiles__flexmatch=smiles)
print len(records)
for record in records:
    print("{:15s} : {}".format(record["molecule_chembl_id"], record['molecule_structures']['canonical_smiles']))
1 CHEMBL1487 : CC(C)c1c(C(=O)Nc2ccccc2)c(c3ccccc3)c(c4ccc(F)cc4)n1CC[C@@H](O)C[C@@H](O)CC(=O)O
Unlike with the exact string match, it is possible to retrieve multiple records when a SMILES is used for the flexmatch lookup (i.e. it is potentially one-to-many instead of one-to-one as the ID lookups are). This is due to the nature of flexmatch.
In our case two structures are returned, CHEMBL1487 (Atorvastatin) and CHEMBL1207181, which is the same structure as the former but with one of the two stereocentres undefined.
# The same can be achieved using url endpoint:
url_1 = url_stem + "/molecule.json?molecule_structures__canonical_smiles=" + quote(smiles)
url_2 = url_stem + "/molecule.json?molecule_structures__canonical_smiles__exact=" + quote(smiles)
url_3 = url_stem + "/molecule.json?smiles=" + quote(smiles)
url_4 = url_stem + "/molecule.json?molecule_structures__canonical_smiles__flexmatch=" + quote(smiles)
exact_match = requests.get(url_1).json()
explicit_exact_match = requests.get(url_2).json()
convenient_shortcut = requests.get(url_3).json()
flexmatch = requests.get(url_4).json()
print url_1
print len(exact_match['molecules'])
print url_2
print len(explicit_exact_match['molecules'])
print url_3
print len(convenient_shortcut['molecules'])
print url_4
print len(flexmatch['molecules'])
print exact_match == explicit_exact_match
http://localhost/chemblws/molecule.json?molecule_structures__canonical_smiles=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O 1 http://localhost/chemblws/molecule.json?molecule_structures__canonical_smiles__exact=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O 1 http://localhost/chemblws/molecule.json?smiles=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O 1 http://localhost/chemblws/molecule.json?molecule_structures__canonical_smiles__flexmatch=CC%28C%29c1c%28C%28%3DO%29Nc2ccccc2%29c%28c3ccccc3%29c%28c4ccc%28F%29cc4%29n1CC%5BC%40%40H%5D%28O%29C%5BC%40%40H%5D%28O%29CC%28%3DO%29O 1 True
The URL-based example above used the HTTP GET method, which means the SMILES are passed via the URL. This can cause problems where the SMILES includes the '/', '\' or '#' characters, for example:
# CHEMBL477889
smiles = "[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"
url = url_stem + "/molecule/" + smiles + ".json"
result = requests.get(url)
print url
print result.ok
print result.status_code
http://localhost/chemblws/molecule/[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-].json False 404
There are two solutions to this problem:
Percent-encode the SMILES with the urllib.quote function
Send the SMILES in the body of a POST request with the X-HTTP-Method-Override: GET header
# Method one:
url = url_stem + "/molecule/" + quote(smiles) + ".json"
result_by_get = requests.get(url)
print url
print result_by_get.ok
print result_by_get.status_code
http://localhost/chemblws/molecule/%5BNa%2B%5D.CO%5BC%40%40H%5D%28CCC%23C%5CC%3DC/CCCC%28C%29CCCCC%3DC%29C%28%3DO%29%5BO-%5D.json True 200
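The effect of percent-encoding can be checked without contacting the server. The sketch below uses only the standard library; the import is written so that it works on Python 2 (`urllib.quote`, as used in this notebook) and on Python 3 (`urllib.parse.quote`). Note that `quote` leaves '/' unencoded by default, as in the URL printed above; passing `safe=""` encodes it as well. Whether a particular server setup needs the stricter form is not something this notebook establishes.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote        # Python 2

# Raw string so that the backslash is kept literally
smiles = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"

default_encoded = quote(smiles)          # '/' is left as it is
strict_encoded = quote(smiles, safe="")  # '/' becomes %2F

print(default_encoded)
print(strict_encoded)
```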
# Method two:
url = url_stem + "/molecule.json"
result_by_post = requests.post(url, data={"smiles": smiles}, headers={"X-HTTP-Method-Override": "GET"})
print result_by_post.ok
print result_by_post.status_code
True 200
print smiles
print result_by_post.json()
print result_by_get.json() == result_by_post.json()['molecules'][0]
[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-] {u'page_meta': {u'previous': None, u'total_count': 1, u'offset': 0, u'limit': 20, u'next': None}, u'molecules': [{u'max_phase': 0, u'usan_stem': None, u'parenteral': False, u'dosed_ingredient': False, u'molecule_type': u'Small molecule', u'biotherapeutic': None, u'chebi_par_id': None, u'first_approval': None, u'atc_classifications': [], u'prodrug': u'-1', u'molecule_structures': {u'standard_inchi_key': u'RLSXKIUQYFNFBI-PJZMSVRGSA-M', u'canonical_smiles': u'[Na+].CO[C@@H](CCC#C\\C=C/CCCC(C)CCCCC=C)C(=O)[O-]', u'standard_inchi': u'InChI=1S/C20H32O3.Na/c1-4-5-6-12-15-18(2)16-13-10-8-7-9-11-14-17-19(23-3)20(21)22;/h4,7-8,18-19H,1,5-6,10,12-17H2,2-3H3,(H,21,22);/q;+1/p-1/b8-7-;/t18?,19-;/m0./s1'}, u'chirality': u'-1', u'usan_substem': None, u'pref_name': None, u'polymer_flag': False, u'molecule_chembl_id': u'CHEMBL477889', u'therapeutic_flag': False, u'molecule_properties': {u'num_ro5_violations': 1, u'mw_freebase': u'320.47', u'psa': u'46.53', u'full_mwt': u'342.45', u'ro3_pass': u'N', u'num_alerts': 3, u'acd_logd': u'2.30', u'full_molformula': u'C20H31NaO3', u'hba': 3, u'molecular_species': u'ACID', u'mw_monoisotopic': u'320.2351', u'heavy_atoms': 23, u'aromatic_rings': 0, u'alogp': u'6.02', u'acd_most_apka': u'4.31', u'qed_weighted': u'0.23', u'acd_most_bpka': u'0.16', u'hbd': 1, u'acd_logp': u'5.31', u'rtb': 15}, u'structure_type': u'MOL', u'helm_notation': None, u'usan_stem_definition': None, u'natural_product': u'-1', u'black_box_warning': u'0', u'availability_type': u'-1', u'inorganic_flag': u'-1', u'molecule_synonyms': [{u'synonyms': u'(Z)-Stellettic Acid B Sodium Salt', u'syn_type': u'OTHER'}], u'molecule_hierarchy': {u'molecule_chembl_id': u'CHEMBL477889', u'parent_chembl_id': u'CHEMBL477888'}, u'indication_class': None, u'usan_year': None, u'first_in_class': u'-1', u'topical': False, u'oral': False}]} True
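What the second method sends can be inspected without sending anything. This sketch builds (but does not send) an equivalent request with the standard library instead of `requests`, purely to show what travels where: the SMILES goes into the form-encoded body, the URL stays free of problematic characters, and the override header asks the server to treat the POST as a GET. The URL is the local one used throughout this notebook; the imports cover both Python 2 and Python 3.

```python
try:  # Python 3
    from urllib.request import Request
    from urllib.parse import urlencode
except ImportError:  # Python 2
    from urllib2 import Request
    from urllib import urlencode

smiles = r"[Na+].CO[C@@H](CCC#C\C=C/CCCC(C)CCCCC=C)C(=O)[O-]"
url = "http://localhost/chemblws/molecule.json"

# Form-encode the body; the SMILES never appears in the URL itself
body = urlencode({"smiles": smiles}).encode("ascii")
request = Request(url, data=body, headers={"X-HTTP-Method-Override": "GET"})

method = request.get_method()      # a request carrying a body defaults to POST
full_url = request.get_full_url()  # unchanged, no SMILES in it
```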
As well as ID lookups, the web services may also be used to perform substructure searches. Currently, only SMILES-based searches are supported, although this could change if there is a need for more powerful search abilities (e.g. SMARTS searching).
# Lapatinib contains the following core...
query = "c4ccc(Nc2ncnc3ccc(c1ccco1)cc23)cc4"
Chem.MolFromSmiles(query)
# Perform substructure search on query using client
substructure = new_client.substructure
records = substructure.filter(smiles=query)
# ... and using raw url-endpoint
url = url_stem + "/substructure/" + quote(query) + ".json"
result = requests.get(url).json()
print url
print result['page_meta']['total_count']
http://localhost/chemblws/substructure/c4ccc%28Nc2ncnc3ccc%28c1ccco1%29cc23%29cc4.json 82
mols = [Chem.MolFromSmiles(x['molecule_structures']['canonical_smiles']) for x in records[:6]]
legends = [str(x["molecule_chembl_id"]) for x in records[:6]]
Draw.MolsToGridImage(mols, legends=legends, subImgSize=(400, 200), useSVG=False)
The web services may also be used to perform SMILES-based similarity searches.
# Lapatinib
smiles = "CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2"
# Perform similarity search on molecule using client...
# Note that a percentage similarity must be supplied.
similarity = new_client.similarity
res = similarity.filter(smiles=smiles, similarity=85)
len(res)
7
# ... and using raw url-endpoint
url = url_stem + "/similarity/" + quote(smiles) + "/85.json"
result = requests.get(url).json()
print url
print result['page_meta']['total_count']
http://localhost/chemblws/similarity/CS%28%3DO%29%28%3DO%29CCNCc1oc%28cc1%29c2ccc3ncnc%28Nc4ccc%28OCc5cccc%28F%29c5%29c%28Cl%29c4%29c3c2/85.json 7
mols = [Chem.MolFromSmiles(x['molecule_structures']['canonical_smiles']) for x in res[:6]]
legends = [str(x["molecule_chembl_id"]) for x in res[:6]]
Draw.MolsToGridImage(mols, legends=legends, subImgSize=(400, 200), useSVG=False)
The versions (e.g. salt forms) for a parent compound may be retrieved for a ChEMBL ID. Keep in mind that a parent structure is one that has had salt/solvate components removed; it corresponds to the bioactive moiety and its use facilitates structure searching, comparison etc. A compound without salt/solvate components is its own parent.
# Neostigmine (a parent)...
chembl_id = "CHEMBL278020"
records = new_client.molecule_form.get(chembl_id)['molecule_forms']
records
[{u'molecule_chembl_id': u'CHEMBL278020', u'parent': u'True'}, {u'molecule_chembl_id': u'CHEMBL54126', u'parent': u'False'}, {u'molecule_chembl_id': u'CHEMBL211471', u'parent': u'False'}]
The ChEMBL ID lookup service may now be used to get the full records for the salt forms...
for chembl_id in [x["molecule_chembl_id"] for x in records if x["parent"] == 'False']:
    record = new_client.molecule.get(chembl_id)
    print("{:10s} : {}".format(chembl_id, record['molecule_structures']['canonical_smiles']))
CHEMBL54126 : [Br-].CN(C)C(=O)Oc1cccc(c1)[N+](C)(C)C CHEMBL211471 : COS(=O)(=O)[O-].CN(C)C(=O)Oc1cccc(c1)[N+](C)(C)C
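In the molecule_form output above, the `parent` flag is returned as the string 'True' or 'False' rather than as a boolean, which is why the loop compares against 'False'. A plain truthiness test would treat both values as true, because any non-empty string is truthy. A small sketch using the records shown above; `split_forms` is a hypothetical helper.

```python
# Records copied from the molecule_form output above
forms = [
    {"molecule_chembl_id": "CHEMBL278020", "parent": "True"},
    {"molecule_chembl_id": "CHEMBL54126", "parent": "False"},
    {"molecule_chembl_id": "CHEMBL211471", "parent": "False"},
]


def split_forms(records):
    """Return (parent IDs, non-parent IDs), reading 'parent' as text."""
    parents, others = [], []
    for rec in records:
        is_parent = str(rec["parent"]).lower() == "true"
        (parents if is_parent else others).append(rec["molecule_chembl_id"])
    return parents, others


parents, salts = split_forms(forms)

# Pitfall: the string 'False' is truthy, so this keeps all three records
naive = [f["molecule_chembl_id"] for f in forms if f["parent"]]
```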
The mechanisms of action of marketed drugs may be retrieved.
Note that this data may not be recorded for the parent structure, but rather for one of its versions. For example, the marketed drug, Tykerb, containing the active ingredient Lapatinib (CHEMBL554) is actually the ditosylate monohydrate (CHEMBL1201179).
# Molecule forms for Lapatinib are used here...
for chembl_id in (x["molecule_chembl_id"] for x in new_client.molecule_form.get("CHEMBL554")['molecule_forms']):
    print("The recorded mechanisms of action of '{}' are...".format(chembl_id))
    mechanism_records = new_client.mechanism.filter(molecule_chembl_id=chembl_id)
    if mechanism_records:
        for mech_rec in mechanism_records:
            print("{:10s} : {}".format(mech_rec["molecule_chembl_id"], mech_rec["mechanism_of_action"]))
    print("-" * 50)
The recorded mechanisms of action of 'CHEMBL554' are... -------------------------------------------------- The recorded mechanisms of action of 'CHEMBL1201179' are... CHEMBL1201179 : Receptor protein-tyrosine kinase erbB-2 inhibitor CHEMBL1201179 : Epidermal growth factor receptor erbB1 inhibitor -------------------------------------------------- The recorded mechanisms of action of 'CHEMBL3526325' are... -------------------------------------------------- The recorded mechanisms of action of 'CHEMBL3542341' are... --------------------------------------------------
The webservice may be used to obtain a PNG image of a compound.
# Lapatinib ditosylate monohydrate (Tykerb)
chembl_id = "CHEMBL1201179"
png = new_client.image.get(chembl_id)
Image(png)
All bioactivity records for a compound may be retrieved via its ChEMBL ID.
# Lapatinib
chembl_id = "CHEMBL554"
records = new_client.activity.filter(molecule_chembl_id=chembl_id)
len(records), records[:2]
(1749, [{u'document_journal': u'Bioorg. Med. Chem. Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': False, u'uo_units': u'UO_0000065', u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL554', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibitory activity against epidermal growth factor receptor', u'record_id': 15408, u'document_chembl_id': u'CHEMBL1146682', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 190221, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL203', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2004, u'target_pref_name': u'Epidermal growth factor receptor erbB1', u'assay_chembl_id': u'CHEMBL674106', u'published_value': u'0.01', u'published_relation': u'=', u'standard_value': u'10', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'uM', u'pchembl_value': u'8.00', u'published_type': u'IC50', u'activity_comment': None}, {u'document_journal': u'Bioorg. Med. Chem. 
Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': False, u'uo_units': u'UO_0000065', u'canonical_smiles': u'CS(=O)(=O)CCNCc1oc(cc1)c2ccc3ncnc(Nc4ccc(OCc5cccc(F)c5)c(Cl)c4)c3c2', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL554', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibitory activity against ERBB2 receptor kinase', u'record_id': 15408, u'document_chembl_id': u'CHEMBL1146682', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 190222, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL1824', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2004, u'target_pref_name': u'Receptor protein-tyrosine kinase erbB-2', u'assay_chembl_id': u'CHEMBL682312', u'published_value': u'0.009', u'published_relation': u'=', u'standard_value': u'9', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'uM', u'pchembl_value': u'8.05', u'published_type': u'IC50', u'activity_comment': None}])
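The `pchembl_value` field in these records is the negative base-10 logarithm of the activity value expressed in molar units, so the two IC50 values above (10 nM and 9 nM) correspond to 8.00 and 8.05. A minimal sketch of that conversion; `pchembl_from_nm` is a hypothetical helper and it assumes the value has already been standardised to nM, as `standard_units` indicates here.

```python
import math


def pchembl_from_nm(standard_value_nm):
    """Convert a value in nM (number or numeric string) to -log10(molar)."""
    molar = float(standard_value_nm) * 1e-9
    return -math.log10(molar)


# standard_value entries from the two records above
values = [round(pchembl_from_nm(v), 2) for v in ("10", "9")]
```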
The webservices may also be used to obtain information on biological targets, i.e. the entities, such as proteins, cells or organisms, with which compounds interact.
# Like with any other resource type, a complete list of targets can be requested using the client:
records = new_client.target.all()
len(records)
11019
records[:4]
[{u'target_components': [{u'component_id': 434, u'accession': u'O43451', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'MGA ', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'MGAML', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'MGAM', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Maltase-glucoamylase, intestinal', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Maltase', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Glucoamylase', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Alpha-glucosidase', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Glucan 1,4-alpha-glucosidase', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'3.2.1.20', u'syn_type': u'EC_NUMBER'}, {u'component_synonym': u'3.2.1.3', u'syn_type': u'EC_NUMBER'}]}], u'target_chembl_id': u'CHEMBL2074', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Maltase-glucoamylase', u'species_group_flag': False, u'organism': u'Homo sapiens'}, {u'target_components': [{u'component_id': 294, u'accession': u'O60706', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'SUR2', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'ABCC9', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'ATP-binding cassette sub-family C member 9', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Sulfonylurea receptor 2', u'syn_type': u'UNIPROT'}]}], u'target_chembl_id': u'CHEMBL1971', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Sulfonylurea receptor 2', u'species_group_flag': False, u'organism': u'Homo sapiens'}, {u'target_components': [{u'component_id': 124, u'accession': u'O76074', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'PDE5', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'PDE5A', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u"cGMP-specific 3',5'-cyclic phosphodiesterase", u'syn_type': u'UNIPROT'}, {u'component_synonym': u'cGMP-binding 
cGMP-specific phosphodiesterase', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'CGB-PDE', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'3.1.4.35', u'syn_type': u'EC_NUMBER'}]}], u'target_chembl_id': u'CHEMBL1827', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Phosphodiesterase 5A', u'species_group_flag': False, u'organism': u'Homo sapiens'}, {u'target_components': [{u'component_id': 167, u'accession': u'O95180', u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'CACNA1H', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Voltage-dependent T-type calcium channel subunit alpha-1H', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Low-voltage-activated calcium channel alpha1 3.2 subunit', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Voltage-gated calcium channel subunit alpha Cav3.2', u'syn_type': u'UNIPROT'}]}], u'target_chembl_id': u'CHEMBL1859', u'target_type': u'SINGLE PROTEIN', u'pref_name': u'Voltage-gated T-type calcium channel alpha-1H subunit', u'species_group_flag': False, u'organism': u'Homo sapiens'}]
# Count target types...
counts = Counter([x["target_type"] for x in records if x["target_type"]])
for targetType, n in sorted(counts.items(), key=itemgetter(1), reverse=True): print("{:30s} {:-4d}".format(targetType, n))
SINGLE PROTEIN 6262 ORGANISM 2136 CELL-LINE 1598 PROTEIN COMPLEX 263 TISSUE 242 PROTEIN FAMILY 231 SELECTIVITY GROUP 97 PROTEIN COMPLEX GROUP 50 NUCLEIC-ACID 29 SMALL MOLECULE 25 PROTEIN-PROTEIN INTERACTION 21 UNKNOWN 18 SUBCELLULAR 10 METAL 9 PROTEIN NUCLEIC-ACID COMPLEX 6 OLIGOSACCHARIDE 6 MACROMOLECULE 5 CHIMERIC PROTEIN 4 LIPID 2 PHENOTYPE 2 NO TARGET 1 ADMET 1 UNCHECKED 1
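The sort-by-count step above can also be written with `Counter.most_common`, which returns the (value, count) pairs already ordered from most to least frequent. A short sketch on a hand-made list of target types (the list is illustrative, not real data):

```python
from collections import Counter

# Illustrative values only; None stands for a record without a target type
target_types = ["SINGLE PROTEIN", "CELL-LINE", "SINGLE PROTEIN", None,
                "ORGANISM", "SINGLE PROTEIN", "CELL-LINE"]

counts = Counter(t for t in target_types if t)
ranked = counts.most_common()  # all pairs, most frequent first

for name, n in ranked:
    print("{:30s} {:4d}".format(name, n))
```

Passing a number, as in `counts.most_common(5)`, limits the result to the top entries, which also suits the "assays with most recorded bioactivities" listing later in this notebook.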
Data on any target type may be obtained via a lookup of its ChEMBL ID.
# Receptor protein-tyrosine kinase erbB-2
chembl_id = "CHEMBL1824"
record = new_client.target.get(chembl_id)
record
{u'organism': u'Homo sapiens', u'pref_name': u'Receptor protein-tyrosine kinase erbB-2', u'species_group_flag': False, u'target_chembl_id': u'CHEMBL1824', u'target_components': [{u'accession': u'P04626', u'component_id': 120, u'component_type': u'PROTEIN', u'target_component_synonyms': [{u'component_synonym': u'HER2 ', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'MLN19', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'NEU', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'NGL', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'ERBB2', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Receptor tyrosine-protein kinase erbB-2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Metastatic lymph node gene 19 protein', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Proto-oncogene Neu', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Proto-oncogene c-ErbB-2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'Tyrosine kinase-type cell surface receptor HER2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'p185erbB2', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'CD_antigen=CD340', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'MLN 19', u'syn_type': u'UNIPROT'}, {u'component_synonym': u'2.7.10.1', u'syn_type': u'EC_NUMBER'}]}], u'target_type': u'SINGLE PROTEIN'}
Remember that all targets have ChEMBL IDs, not just proteins...
# SK-BR-3, a cell line over-expressing erbB-2
chembl_id = "CHEMBL613834"
record = new_client.target.get(chembl_id)
record
{u'organism': u'Homo sapiens', u'pref_name': u'SK-BR-3', u'species_group_flag': False, u'target_chembl_id': u'CHEMBL613834', u'target_components': [], u'target_type': u'CELL-LINE'}
Data on protein targets may also be obtained using a UniProt ID.
# UniProt ID for erbB-2, a target of Lapatinib
uniprot_id = "P04626"
records = new_client.target.filter(target_components__accession=uniprot_id)
print [(x['target_chembl_id'], x['pref_name']) for x in records]
[(u'CHEMBL1824', u'Receptor protein-tyrosine kinase erbB-2'), (u'CHEMBL2363049', u'Epidermal growth factor receptor'), (u'CHEMBL2111431', u'Epidermal growth factor receptor and ErbB2 (HER1 and HER2)')]
All bioactivities for a target may be retrieved.
# Receptor protein-tyrosine kinase erbB-2
chembl_id = "CHEMBL1824"
records = new_client.activity.filter(target_chembl_id=chembl_id)
len(records)
5959
# Show assays with most recorded bioactivities...
for assay, n in sorted(Counter((x["assay_chembl_id"], x["assay_description"]) for x in records).items(), key=itemgetter(1), reverse=True)[:5]:
    print("{:-4d} {:14s} {}".format(n, *assay))
1742 CHEMBL1909205  DRUGMATRIX: Protein Tyrosine Kinase, ERBB2 (HER2) enzyme inhibition (substrate: Poly(Glu:Tyr))
 583 CHEMBL1963780  PUBCHEM_BIOASSAY: Navigating the Kinome. (Class of assay: other) Panel member name: ERBB2
 369 CHEMBL1962280  GSK_PKIS: ERBB2 mean inhibition at 0.1 uM [Nanosyn]
 369 CHEMBL1962281  GSK_PKIS: ERBB2 mean inhibition at 1 uM [Nanosyn]
  83 CHEMBL874202   Inhibition of human epidermal growth factor receptor-2 (HER-2) autophosphorylation
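The assay ranking above packs counting, sorting and slicing into one expression. A more readable equivalent is sketched below using `Counter.most_common`. It runs offline: the `records` list is a hypothetical stand-in for the result of `new_client.activity.filter(...)`, with made-up assay IDs and descriptions, and `top_assays` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical stand-in records; real ones would come from the activity resource.
records = (
    [{'assay_chembl_id': 'CHEMBL_A', 'assay_description': 'Assay A'}] * 3 +
    [{'assay_chembl_id': 'CHEMBL_B', 'assay_description': 'Assay B'}] * 2 +
    [{'assay_chembl_id': 'CHEMBL_C', 'assay_description': 'Assay C'}]
)


def top_assays(activity_records, n=5):
    """Return the n most frequent (assay ID, description) pairs with counts."""
    counts = Counter(
        (r['assay_chembl_id'], r['assay_description']) for r in activity_records
    )
    return counts.most_common(n)


for (assay_id, description), count in top_assays(records, n=2):
    print("{:4d} {:14s} {}".format(count, assay_id, description))
```

`most_common(n)` returns the pairs already sorted by descending count, so no explicit `sorted` call or `itemgetter` is needed.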
The approved drugs for a target may be retrieved.
# Receptor protein-tyrosine kinase erbB-2
chembl_id = "CHEMBL1824"
# Mechanism records link drugs to the target; collect the molecule IDs they refer to...
mechanisms = new_client.mechanism.filter(target_chembl_id=chembl_id)
compound_ids = [x['molecule_chembl_id'] for x in mechanisms]
approved_drugs = new_client.molecule.filter(molecule_chembl_id__in=compound_ids).filter(max_phase=4)
for record in approved_drugs:
    print("{:10s} : {}".format(record["molecule_chembl_id"], record["pref_name"]))
CHEMBL1201179 : LAPATINIB DITOSYLATE
CHEMBL1201585 : TRASTUZUMAB
CHEMBL1743082 : TRASTUZUMAB EMTANSINE
CHEMBL2007641 : PERTUZUMAB
CHEMBL2105712 : AFATINIB DIMALEATE
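The lookup above is a two-step join: mechanism records give molecule IDs, and the molecule resource is then filtered on `max_phase=4`. The same logic can be sketched in plain Python, which is handy when the records are already in memory. This is an offline illustration only: the two lists are hypothetical stand-ins (two IDs are borrowed from the output above, and `CHEMBL_HYPOTHETICAL` with `max_phase` 2 is invented), and both helper names are assumptions.

```python
def unique_in_order(ids):
    """Drop repeated IDs while keeping first-seen order."""
    seen = set()
    result = []
    for chembl_id in ids:
        if chembl_id not in seen:
            seen.add(chembl_id)
            result.append(chembl_id)
    return result


def approved_only(molecules, approved_phase=4):
    """Keep molecules whose max_phase equals the approved phase."""
    return [m for m in molecules if m['max_phase'] == approved_phase]


# Hypothetical stand-ins for mechanism and molecule records.
mechanisms = [
    {'molecule_chembl_id': 'CHEMBL1201585'},
    {'molecule_chembl_id': 'CHEMBL_HYPOTHETICAL'},
    {'molecule_chembl_id': 'CHEMBL1201585'},  # the same drug may appear twice
    {'molecule_chembl_id': 'CHEMBL2007641'},
]
molecules = [
    {'molecule_chembl_id': 'CHEMBL1201585', 'pref_name': 'TRASTUZUMAB', 'max_phase': 4},
    {'molecule_chembl_id': 'CHEMBL_HYPOTHETICAL', 'pref_name': 'EXAMPLE', 'max_phase': 2},
    {'molecule_chembl_id': 'CHEMBL2007641', 'pref_name': 'PERTUZUMAB', 'max_phase': 4},
]

compound_ids = unique_in_order(m['molecule_chembl_id'] for m in mechanisms)
wanted = set(compound_ids)
approved = approved_only([m for m in molecules if m['molecule_chembl_id'] in wanted])
for record in approved:
    print("{:10s} : {}".format(record['molecule_chembl_id'], record['pref_name']))
```

De-duplicating the IDs first keeps the list passed to an `__in` filter as short as possible.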
Information about assays may also be retrieved by the web services.
Details of an assay may be retrieved via its ChEMBL ID.
# Inhibitory activity against epidermal growth factor receptor
chembl_id = "CHEMBL674106"
record = new_client.assay.get(chembl_id)
record
{u'assay_category': None, u'assay_cell_type': None, u'assay_chembl_id': u'CHEMBL674106', u'assay_organism': None, u'assay_strain': None, u'assay_subcellular_fraction': None, u'assay_tax_id': None, u'assay_test_type': None, u'assay_tissue': None, u'assay_type': u'B', u'assay_type_description': u'Binding', u'bao_format': u'BAO_0000357', u'cell_chembl_id': None, u'confidence_description': u'Homologous single protein target assigned', u'confidence_score': 8, u'description': u'Inhibitory activity against epidermal growth factor receptor', u'document_chembl_id': u'CHEMBL1146682', u'relationship_description': u'Homologous protein target assigned', u'relationship_type': u'H', u'src_assay_id': None, u'src_id': 1, u'target_chembl_id': u'CHEMBL203'}
All bioactivity records for an assay may be requested.
records = new_client.activity.filter(assay_chembl_id=chembl_id)
len(records), records[:2]
(16, [{u'document_journal': u'Bioorg. Med. Chem. Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': False, u'uo_units': u'UO_0000065', u'canonical_smiles': u'Oc1ccc2ncnc(Nc3ccc(OCc4ccccc4)cc3)c2c1', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL14932', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibitory activity against epidermal growth factor receptor', u'record_id': 15404, u'document_chembl_id': u'CHEMBL1146682', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 183887, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL203', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2004, u'target_pref_name': u'Epidermal growth factor receptor erbB1', u'assay_chembl_id': u'CHEMBL674106', u'published_value': u'98', u'published_relation': u'=', u'standard_value': u'98', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'nM', u'pchembl_value': u'7.01', u'published_type': u'IC50', u'activity_comment': None}, {u'document_journal': u'Bioorg. Med. Chem. Lett.', u'bao_endpoint': u'BAO_0000190', u'potential_duplicate': False, u'uo_units': u'UO_0000065', u'canonical_smiles': u'COCCOc1cc2ncnc(Nc3cccc(c3)C#C)c2cc1OCCOC', u'assay_type': u'B', u'standard_flag': True, u'molecule_chembl_id': u'CHEMBL553', u'target_organism': u'Homo sapiens', u'assay_description': u'Inhibitory activity against epidermal growth factor receptor', u'record_id': 15410, u'document_chembl_id': u'CHEMBL1146682', u'bao_format': u'BAO_0000357', u'standard_units': u'nM', u'activity_id': 186628, u'standard_type': u'IC50', u'target_chembl_id': u'CHEMBL203', u'data_validity_comment': None, u'standard_relation': u'=', u'document_year': 2004, u'target_pref_name': u'Epidermal growth factor receptor erbB1', u'assay_chembl_id': u'CHEMBL674106', u'published_value': u'0.001', u'published_relation': u'=', u'standard_value': u'1', u'qudt_units': u'http://www.openphacts.org/units/Nanomolar', u'published_units': u'uM', u'pchembl_value': u'9.00', u'published_type': u'IC50', u'activity_comment': None}])
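Note that the numeric fields in an activity record (`standard_value`, `pchembl_value`) arrive as strings, and that `pchembl_value` is the negative base-10 logarithm of the standardised value expressed in molar units. The sketch below checks this against the two records shown above. It assumes `standard_units` is nM, as it is here; the helper name `pchembl_from_nm` is hypothetical.

```python
import math


def pchembl_from_nm(value_nm):
    """Convert a nanomolar value (string or number) to -log10(molar)."""
    return -math.log10(float(value_nm) * 1e-9)


# (molecule_chembl_id, standard_value, pchembl_value) taken from the output above.
rows = [
    ('CHEMBL14932', '98', '7.01'),
    ('CHEMBL553', '1', '9.00'),
]

for molecule_id, standard_value, reported in rows:
    computed = pchembl_from_nm(standard_value)
    print("{:12s} {:>4s} nM  computed {:.2f}  reported {}".format(
        molecule_id, standard_value, computed, reported))
```

For the second record the published value was 0.001 uM; standardisation converts it to 1 nM before the pChEMBL value is derived.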
As noted previously, there are many other resources that can be useful. They are not covered in great detail in this document, but a few examples may be helpful.
# Documents - retrieve all publications from volume 5 that were published after 1985.
print new_client.document.filter(doc_type='PUBLICATION').filter(year__gt=1985).filter(volume=5)
[{u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(94)00451-K', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'15', u'last_page': u'18', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128135', u'issue': u'1'}, {u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(94)00456-P', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'19', u'last_page': u'24', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128136', u'issue': u'1'}, {u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(94)00452-L', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'25', u'last_page': u'30', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1150533', u'issue': u'1'}, {u'doc_type': u'PUBLICATION', u'doi': u'10.1016/0960-894X(94)00449-P', u'title': None, u'journal': u'Bioorg. Med. Chem. Lett.', u'year': 1995, u'volume': u'5', u'first_page': u'35', u'last_page': u'38', u'pubmed_id': None, u'authors': None, u'document_chembl_id': u'CHEMBL1128137', u'issue': u'1'}, '...(remaining elements truncated)...']
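Chained `filter` calls such as the document query above correspond to query-string parameters when the web services are called directly via URL. The sketch below only builds such a URL and makes no request. The path shape `<base>/document.json?...` is an assumption and should be checked against the 'Web Services' endpoint list on the LaunchPad; the helper name `build_query_url` is hypothetical.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

# Base URL of the local web services, as configured earlier in this notebook.
BASE_URL = 'http://localhost/chemblws'


def build_query_url(resource, filters, base_url=BASE_URL):
    """Build a filter URL; the '<resource>.json' path shape is an assumption."""
    return "{}/{}.json?{}".format(base_url, resource, urlencode(filters))


# A list of pairs (rather than a dict) keeps the parameter order predictable.
url = build_query_url('document', [
    ('doc_type', 'PUBLICATION'),
    ('year__gt', 1985),
    ('volume', 5),
])
print(url)
```

The resulting string can be passed to any HTTP client, in Python or in another language.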
# Cell lines:
print new_client.cell_line.get('CHEMBL3307242')
{u'efo_id': u'EFO_0002312', u'cell_id': 2, u'cell_source_tissue': u'Lyphoma', u'cellosaurus_id': u'CVCL_2676', u'cell_description': u'P3HR-1', u'cell_source_tax_id': 9606, u'cell_source_organism': u'Homo sapiens', u'cell_chembl_id': u'CHEMBL3307242', u'clo_id': u'CLO_0008331', u'cell_name': u'P3HR-1'}
# Protein class:
print new_client.protein_class.filter(l6="CAMK protein kinase AMPK subfamily")
[{u'protein_class_id': 409, u'l6': u'CAMK protein kinase AMPK subfamily', u'l7': None, u'l4': u'CAMK protein kinase group', u'l5': u'CAMK protein kinase CAMK1 family', u'l2': u'Kinase', u'l3': u'Protein Kinase', u'l1': u'Enzyme', u'l8': None}]
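A protein class record stores its hierarchy in the keys `l1` to `l8`, with unused levels set to `None`. A minimal offline sketch of turning such a record into an ordered classification path follows; it uses a copy of the record above, and the helper name `classification_path` is hypothetical.

```python
# Copy of the protein class record shown above.
record = {
    'protein_class_id': 409,
    'l1': 'Enzyme',
    'l2': 'Kinase',
    'l3': 'Protein Kinase',
    'l4': 'CAMK protein kinase group',
    'l5': 'CAMK protein kinase CAMK1 family',
    'l6': 'CAMK protein kinase AMPK subfamily',
    'l7': None,
    'l8': None,
}


def classification_path(protein_class, max_level=8):
    """Return the populated levels l1..l8 in order, skipping None values."""
    levels = (protein_class.get('l{}'.format(i)) for i in range(1, max_level + 1))
    return [level for level in levels if level is not None]


path = classification_path(record)
print(' > '.join(path))
```

Reading the keys by number avoids relying on dictionary ordering, which is arbitrary in the printed output above.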
# Source:
print new_client.source.filter(src_short_name="ATLAS")
[{u'src_description': u'Gene Expression Atlas Compounds', u'src_short_name': u'ATLAS', u'src_id': 26}]
# Target component:
print new_client.target_component.get(375)
{u'component_id': 375, u'go_slims': [{u'go_id': u'GO:0003676'}, {u'go_id': u'GO:0004518'}, {u'go_id': u'GO:0006259'}, {u'go_id': u'GO:0009058'}, {u'go_id': u'GO:0016740'}, {u'go_id': u'GO:0016779'}, {u'go_id': u'GO:0034641'}, {u'go_id': u'GO:0042802'}, {u'go_id': u'GO:0043167'}], u'component_type': u'PROTEIN', u'sequence': u'PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKRKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYPGIRVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLRTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKTPKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYALGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVLFLDGID', u'accession': u'Q72547', u'target_component_synonyms': [{u'component_synonym': u'pol', u'syn_type': u'GENE_SYMBOL'}, {u'component_synonym': u'Reverse transcriptase/RNaseH', u'syn_type': u'UNIPROT'}], u'protein_classifications': [{u'protein_classification_id': 646}], u'tax_id': 11676, u'organism': u'Human immunodeficiency virus 1', u'targets': [{u'target_chembl_id': u'CHEMBL247'}], u'description': u'Reverse transcriptase/RNaseH'}
# ChEMBL ID Lookup: check if CHEMBL1 is a molecule, assay or target:
print new_client.chembl_id_lookup.get("CHEMBL1")['entity_type']
COMPOUND
# ATC class:
print new_client.atc_class.get('H03AA03')
{u'level1_description': u'SYSTEMIC HORMONAL PREPARATIONS, EXCL. ', u'level1': u'H', u'level2': u'H03', u'level3': u'H03A', u'level4': u'H03AA', u'level5': u'H03AA03', u'who_id': u'who1673', u'level4_description': u'Thyroid hormones', u'who_name': u'combinations of levothyroxine and liothyronine', u'level3_description': u'THYROID PREPARATIONS', u'level2_description': u'THYROID THERAPY'}
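The `level1` to `level5` fields in the ATC record above are successive prefixes of the full seven-character code (1, 3, 4, 5 and 7 characters). A small sketch of deriving them from the code alone, without calling the web services; the helper name `atc_levels` is hypothetical.

```python
def atc_levels(code):
    """Split a full level-5 ATC code into its five hierarchical prefixes."""
    if len(code) != 7:
        raise ValueError("expected a 7-character level-5 ATC code")
    # Prefix lengths for levels 1 to 5.
    return [code[:length] for length in (1, 3, 4, 5, 7)]


levels = atc_levels('H03AA03')
for number, prefix in enumerate(levels, start=1):
    print("level{}: {}".format(number, prefix))
```

The prefixes match the `level1` to `level5` values in the record above.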