
Beaker: Using RDKit without RDKit¶

myChEMBL team, ChEMBL group, EMBL-EBI.¶

There are cases when one needs to use a chemoinformatics toolkit without installing it. Examples include a computer on which the user lacks sufficient privileges, a JavaScript web widget or a mobile phone application. Another, very pragmatic, reason could simply be a lack of the technical knowledge or experience needed to install such a toolkit, or the need to quickly check some chemical properties without spending too much time on installation.

In such cases, our tool called Beaker can be very helpful. For less advanced users, Beaker can be seen as part of the public web services provided by ChEMBL. Just as one can use ChEMBL web services to look up the details of a compound by its ID, one can now call the same web services to convert molfiles to SMILES, depict them, calculate fingerprints, etc.

The only requirement for using this functionality is an internet connection. If you have the myChEMBL VM, a working internet connection is not required, because the web services are preloaded on the machine. This means you can use them straight away, and the rest of this notebook shows how to do so in Python.

To access the web services from Python, we will use the official ChEMBL Python client, called "chembl_webresource_client".

Configuration¶

In [1]:

# First of all we have to import some useful libraries:

# json for converting between Python dicts and JSON strings
import json

# lxml.etree for pretty-printing XML documents
from lxml import etree

# IPython helpers for displaying images, SVG and JavaScript
from IPython.display import Image, display
from IPython.display import SVG
from IPython.display import Javascript

# By default our Python client uses the public instance of the web services, which requires an internet connection.
# We want to use the local instance provided with myChEMBL, so we need some additional configuration.
# You should skip this step when using the client outside of myChEMBL.
from chembl_webresource_client.settings import Settings
Settings.Instance().UTILS_SPORE_URL = 'http://localhost/utils/spore'

# Finally, we import the utils (aka Beaker) part of the ChEMBL web services, and we are ready to go!
import chembl_webresource_client.utils as utils_mod
utils = utils_mod.utils
print dir(utils)

['MMFFctab23D', 'MMFFsmiles23D', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'addHs', 'atomIsInRing', 'atomRings', 'bondIsInRing', 'bondRings', 'breakBonds', 'call_spore_function', 'canonicalizeSmiles', 'cipStereoInfo', 'clean', 'ctab23D', 'ctab2image', 'ctab2inchi', 'ctab2inchiKey', 'ctab2json', 'ctab2smarts', 'ctab2smiles', 'ctab2svg', 'ctab2xyz', 'description', 'descriptors', 'getNumAtoms', 'getNumBonds', 'hydrogenize', 'image2ctab', 'image2smiles', 'inchi2ctab', 'inchi2inchiKey', 'inchi2svg', 'kekulize', 'logP', 'mcs', 'molExport', 'molWt', 'neutralise', 'numAtomRings', 'numBondRings', 'numRings', 'official', 'reactionConverter', 'reactionExport', 'removeHs', 'rules', 'sanitize', 'sdf2SimilarityMap', 'sdf2fps', 'session', 'smiles23D', 'smiles2SimilarityMap', 'smiles2ctab', 'smiles2image', 'smiles2inchi', 'smiles2inchiKey', 'smiles2json', 'smiles2svg', 'sssr', 'standardise', 'status', 'symmSSSR', 'tpsa', 'unsalt']
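Many of the method names listed above follow a "<source>2<target>" pattern, so the list itself can be turned into a quick overview of the available conversions. The sketch below is only an illustration: `conversion_table` is a hypothetical helper (not part of the client), and it works on a hand-copied subset of the names so that it runs without a Beaker instance.

```python
from collections import defaultdict


def conversion_table(method_names):
    """Group "<source>2<target>" method names by their source format."""
    table = defaultdict(list)
    for name in method_names:
        if "2" not in name:
            continue  # not a converter, e.g. "addHs" or "kekulize"
        # Split on the first "2" only, so "smiles23D" becomes ("smiles", "3D").
        source, target = name.split("2", 1)
        table[source].append(target)
    return dict((source, sorted(targets)) for source, targets in table.items())


# A subset of the names printed above, copied by hand.
names = ["ctab2image", "ctab2inchi", "ctab2inchiKey", "ctab2smiles",
         "inchi2ctab", "inchi2inchiKey", "smiles23D", "smiles2ctab",
         "smiles2image", "smiles2inchi", "addHs", "kekulize"]

table = conversion_table(names)
for source in sorted(table):
    print("%s -> %s" % (source, ", ".join(table[source])))
```

In the notebook itself, passing `dir(utils)` instead of `names` should give the full picture, although prefixed names such as `MMFFctab23D` would then appear under their own source key.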

Format conversion¶

In [2]:

# We will start with converting SMILES to molfile

# Let's take the SMILES of aspirin:
smiles = 'O=C(Oc1ccccc1C(=O)O)C'

# And this is how we do the conversion, simple!
ctab = utils.smiles2ctab(smiles)

# And here we can see the result:
print ctab

     RDKit          2D

 13 13  0  0  0  0  0  0  0  0999 V2000
   -3.0122    1.1850    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.6987   -0.2818    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2716   -0.7438    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.1580    0.2612    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4715    1.7281    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.6420    2.7330    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.0691    2.2711    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3827    0.8043    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.2691   -0.2007    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5826   -1.6676    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0097   -2.1295    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.4690   -2.6725    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
   -3.8123   -1.2868    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  2  0
  7  8  1  0
  8  9  2  0
  9 10  1  0
 10 11  2  0
 10 12  1  0
  2 13  1  0
  9  4  1  0
M  END
$$$$
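Because the molfile above uses the fixed-width V2000 layout, a few facts can be read from it with plain string handling. The following is a minimal sketch, assuming a V2000 counts line whose first two three-character fields hold the atom and bond counts; `molfile_counts` is a hypothetical helper, and the example record is hand-written rather than Beaker output.

```python
def molfile_counts(molfile):
    """Return (num_atoms, num_bonds) read from a V2000 counts line."""
    for line in molfile.splitlines():
        # The counts line is the one that ends with the V2000 version stamp.
        if line.rstrip().endswith("V2000"):
            return int(line[0:3]), int(line[3:6])
    raise ValueError("no V2000 counts line found")


# A tiny hand-written record (C=O skeleton), used only for illustration.
example = (
    "\n"
    "     RDKit          2D\n"
    "\n"
    "  2  1  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "    1.2000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "  1  2  2  0\n"
    "M  END\n"
    "$$$$\n"
)
print(molfile_counts(example))
```

Applied to the aspirin ctab above, the same function should report 13 atoms and 13 bonds, matching the counts line in the printed output.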

In [3]:

# OK, now that we have our molfile (ctab), let's convert it back to SMILES.
# By default, the computed SMILES will be canonical:
smi_file = utils.ctab2smiles(ctab)
print smi_file
# The result is *.smi-formatted text with a header line, so to get only the SMILES we extract the relevant part:
canonical_smiles = smi_file.split()[2]
print canonical_smiles

SMILES Name 
CC(=O)Oc1ccccc1C(=O)O 0

CC(=O)Oc1ccccc1C(=O)O
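Indexing with `.split()[2]` works for a single record but silently ignores any further ones. A slightly more explicit way to read the returned text is sketched below; `parse_smi` is a hypothetical helper, and it assumes the first non-empty line is always the "SMILES Name" header seen in the output above.

```python
def parse_smi(smi_text):
    """Return a list of (smiles, name) tuples from SMILES text with a header line."""
    lines = [line for line in smi_text.splitlines() if line.strip()]
    records = []
    # Skip the first non-empty line, assumed to be the "SMILES Name" header.
    for line in lines[1:]:
        fields = line.split(None, 1)
        name = fields[1].strip() if len(fields) > 1 else ""
        records.append((fields[0], name))
    return records


# Text copied from the output shown above.
sample_smi = "SMILES Name \nCC(=O)Oc1ccccc1C(=O)O 0\n"
print(parse_smi(sample_smi))
```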

In [4]:

# Having our aspirin molfile (ctab), we can compute its InChI:
inchi = utils.ctab2inchi(ctab)
print inchi

InChI=1S/C9H8O4/c1-6(10)13-8-5-3-2-4-7(8)9(11)12/h2-5H,1H3,(H,11,12)

In [5]:

# And, of course, an InChIKey from the InChI:
inchiKey = utils.inchi2inchiKey(inchi)
print inchiKey

BSYNRYMUTXBXSQ-UHFFFAOYSA-N

In [6]:

# It's also possible to convert the InChI back to a molfile (ctab2):
ctab2 = utils.inchi2ctab(inchi)

# And a molfile to smiles:
smiles2 = utils.ctab2smiles(ctab2).split()[2]

# Let's check if we've got the same canonical SMILES after this round trip:
canonical_smiles == smiles2

Out[6]:

True

Marvin utils¶

Marvin 4 JS is a JavaScript port of a very popular compound editor. To provide extended functionality, such as importing from and exporting to different formats, providing stereochemistry information or performing other calculations, Marvin 4 JS relies on web services. The specification is open, so anyone can provide their own version of web services compatible with Marvin. Beaker provides a set of methods conforming to that specification.

In [7]:

# Again, let's start from the SMILES for Aspirin
smiles = 'O=C(Oc1ccccc1C(=O)O)C'

# We will convert it to *.mrv format
mrv = json.loads(utils.molExport(structure=smiles, parameters="mrv"))['structure']

# Since *.mrv files are XML-based, we can pretty-print the result:
root = etree.fromstring(mrv).getroottree()
print etree.tostring(root, pretty_print=True)

<cml>
  <MDocument>
    <MChemicalStruct>
      <molecule molID="m1">
        <atomArray>
          <atom elementType="O" id="a1" x2="-2.99879997601" y2="1.1797333239"/>
          <atom elementType="C" id="a2" x2="-2.68669331184" y2="-0.280559997756"/>
          <atom elementType="O" id="a3" x2="-1.26597332321" y2="-0.740506660743"/>
          <atom elementType="C" id="a4" x2="-0.157359998741" y2="0.260026664586"/>
          <atom elementType="C" id="a5" x2="-0.469466662911" y2="1.72031998624"/>
          <atom elementType="C" id="a6" x2="0.639146661553" y2="2.72085331157"/>
          <atom elementType="C" id="a7" x2="2.05986665019" y2="2.26109331524"/>
          <atom elementType="C" id="a8" x2="2.37215998102" y2="0.800613326928"/>
          <atom elementType="C" id="a9" x2="1.26354665656" y2="-0.199733331735"/>
          <atom elementType="C" id="a10" x2="1.57565332073" y2="-1.66021332005"/>
          <atom elementType="O" id="a11" x2="2.99637330936" y2="-2.11997331637"/>
          <atom elementType="O" id="a12" x2="0.467039996264" y2="-2.66074664538"/>
          <atom elementType="C" id="a13" x2="-3.7953066363" y2="-1.28109332308"/>
        </atomArray>
        <bondArray>
          <bond atomRefs2="a1 a2" order="2"/>
          <bond atomRefs2="a2 a3" order="1"/>
          <bond atomRefs2="a3 a4" order="1"/>
          <bond atomRefs2="a4 a5" order="2"/>
          <bond atomRefs2="a5 a6" order="1"/>
          <bond atomRefs2="a6 a7" order="2"/>
          <bond atomRefs2="a7 a8" order="1"/>
          <bond atomRefs2="a8 a9" order="2"/>
          <bond atomRefs2="a9 a10" order="1"/>
          <bond atomRefs2="a10 a11" order="2"/>
          <bond atomRefs2="a10 a12" order="1"/>
          <bond atomRefs2="a2 a13" order="1"/>
          <bond atomRefs2="a9 a4" order="1"/>
        </bondArray>
      </molecule>
    </MChemicalStruct>
  </MDocument>
</cml>

In [8]:

# OK, now let's do the opposite. Starting with an mrv (cml) file, let's compute its stereo information:
cml = '''<cml>
             <MDocument>
                 <MChemicalStruct>
                     <molecule molID="m1">
                         <atomArray>
                             <atom id="a1" elementType="C" x2="-3.1249866416667733" y2="-0.5015733293207466"/>
                             <atom id="a2" elementType="C" x2="-4.458533297665067" y2="-1.2715733231607467"/>
                             <atom id="a3" elementType="C" x2="-4.458533297665067" y2="-2.81175997750592"/>
                             <atom id="a4" elementType="C" x2="-3.1249866416667733" y2="-3.58175997134592"/>
                             <atom id="a5" elementType="C" x2="-1.7912533190033066" y2="-2.81175997750592"/>
                             <atom id="a6" elementType="C" x2="-1.7912533190033066" y2="-1.2715733231607467"/>
                             <atom id="a7" elementType="C" x2="-0.45751999633984003" y2="-0.5013866626555733"/>
                             <atom id="a8" elementType="O" x2="-0.45751999633984003" y2="1.0384266583592534"/>
                             <atom id="a9" elementType="C" x2="0.87583999299328" y2="-1.2713866564955734"/>
                             <atom id="a10" elementType="C" x2="0.87583999299328" y2="-2.8113866441755735"/>
                         </atomArray>
                         <bondArray>
                             <bond atomRefs2="a1 a2" order="2"/>
                             <bond atomRefs2="a2 a3" order="1"/>
                             <bond atomRefs2="a3 a4" order="2"/>
                             <bond atomRefs2="a4 a5" order="1"/>
                             <bond atomRefs2="a5 a6" order="2"/>
                             <bond atomRefs2="a6 a1" order="1"/>
                             <bond atomRefs2="a6 a7" order="1"/>
                             <bond atomRefs2="a7 a9" order="1"/>
                             <bond atomRefs2="a9 a10" order="1"/>
                             <bond atomRefs2="a7 a8" order="1">
                                 <bondStereo>W</bondStereo>
                             </bond>
                         </bondArray>
                     </molecule>
                 </MChemicalStruct>
             </MDocument>
         </cml>
'''

# According to the Marvin 4 JS web services specification, the result is JSON:
stereo_info = json.loads(utils.cipStereoInfo(structure=cml))
print stereo_info

{u'headers': {u'tetraHedral': {u'source': u'CALCULATOR', u'type': u'COMPLEX', u'name': u'tetraHedral'}, u'doubleBond': {u'source': u'CALCULATOR', u'type': u'COMPLEX', u'name': u'doubleBond'}}, u'tetraHedral': [{u'atomIndex': 6, u'chirality': u'S'}], u'doubleBond': []}
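The response above is an ordinary nested structure once decoded, so the stereo labels can be pulled out with a dictionary lookup. The sketch below re-creates the printed response as a JSON string so that it runs offline; the variable names are illustrative only. Index 6 corresponds to atom `a7` in the input, which suggests (but does not prove) that `atomIndex` is zero-based.

```python
import json

# JSON equivalent of the response printed above, copied from the notebook output.
response = """
{"headers": {"tetraHedral": {"source": "CALCULATOR", "type": "COMPLEX", "name": "tetraHedral"},
             "doubleBond": {"source": "CALCULATOR", "type": "COMPLEX", "name": "doubleBond"}},
 "tetraHedral": [{"atomIndex": 6, "chirality": "S"}],
 "doubleBond": []}
"""

decoded = json.loads(response)

# Map each tetrahedral centre's atom index to its CIP label.
centres = dict((c["atomIndex"], c["chirality"]) for c in decoded["tetraHedral"])
print(centres)
```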

Standardisation¶

Standardiser is a tool written by Francis Atkinson, designed to provide a simple way of standardising molecules as a prelude to e.g. molecular modelling exercises. Thanks to Beaker, we don't have to install standardiser to use it.

In [9]:

# The first method provided by standardiser breaks bonds to Group I and II metal atoms.
# Before using it, we have to convert our input SMILES string to ctab:
mol = utils.smiles2ctab("[Na]OC(=O)c1ccccc1")

# Now we can apply the function
br = utils.breakBonds(mol)

# In order to get our result back in SMILES format we have to make a conversion:
smiles = utils.ctab2smiles(br).split()[2]

# And here is the result:
print smiles

# We can even use Beaker to render input and output:
[display(Image(utils.smiles2image("[Na]OC(=O)c1ccccc1"))), display(Image(utils.smiles2image("[Na+].O=C([O-])c1ccccc1")))]

O=C([O-])c1ccccc1.[Na+]

Out[9]:

[None, None]

In [10]:

# The second method neutralizes charges by adding/removing protons
# Again, we have to convert SMILES to ctab first, then apply the method and convert result back to SMILES:
mol = utils.smiles2ctab("C(C(=O)[O-])(Cc1n[n-]nn1)(C[NH3+])(C[N+](=O)[O-])")
ne = utils.neutralise(mol)
smiles = utils.ctab2smiles(ne).split()[2]

# Now we can print the result
print smiles

# And render input and output
[display(Image(utils.smiles2image("C(C(=O)[O-])(Cc1n[n-]nn1)(C[NH3+])(C[N+](=O)[O-])"))), display(Image(utils.smiles2image("NCC(Cc1nn[nH]n1)(C[N+](=O)[O-])C(=O)O")))]

NCC(Cc1nn[nH]n1)(C[N+](=O)[O-])C(=O)O

Out[10]:

[None, None]

In [11]:

# The third method applies many structure-normalisation transformations

# Invoking it in the standard way:
mol = utils.smiles2ctab("Oc1nccc2cc[nH]c(=N)c12")
ru = utils.rules(mol)
smiles = utils.ctab2smiles(ru).split()[2]

# Printing the results:
print smiles

# Rendering input and output:
[display(Image(utils.smiles2image("Oc1nccc2cc[nH]c(=N)c12"))), display(Image(utils.smiles2image("Nc1nccc2cc[nH]c(=O)c12")))]

Nc1nccc2cc[nH]c(=O)c12

Out[11]:

[None, None]

In [12]:

# The fourth method can be used to discard any salt/solvate components

# We already know what to do:
mol = utils.smiles2ctab("[Na+].OC(=O)Cc1ccc(CN)cc1.OS(=O)(=O)C(F)(F)F")
un = utils.unsalt(mol)
smiles = utils.ctab2smiles(un).split()[2]

# printing results:
print smiles

# rendering input and output:
[display(Image(utils.smiles2image("[Na+].OC(=O)Cc1ccc(CN)cc1.OS(=O)(=O)C(F)(F)F"))), display(Image(utils.smiles2image("NCc1ccc(CC(=O)O)cc1")))]

NCc1ccc(CC(=O)O)cc1

Out[12]:

[None, None]

In [13]:

# The last method from the Standardiser package combines the previous four into one:
mol = utils.smiles2ctab("[Na]OC(=O)Cc1ccc(C[NH3+])cc1.c1nnn[n-]1.O")
st = utils.standardise(mol)
smiles = utils.ctab2smiles(st).split()[2]
print smiles
[display(Image(utils.smiles2image("[Na]OC(=O)Cc1ccc(C[NH3+])cc1.c1nnn[n-]1.O"))), display(Image(utils.smiles2image("NCc1ccc(CC(=O)O)cc1")))]

NCc1ccc(CC(=O)O)cc1

Out[13]:

[None, None]
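All five standardisation cells repeat the same three steps: convert SMILES to ctab, apply a method, convert back to SMILES. One way to factor that out is sketched below. `apply_via_ctab` is a hypothetical helper, and `FakeClient` is an offline stand-in that performs no real chemistry; it exists only so that the example can run without a Beaker instance.

```python
def apply_via_ctab(client, method_name, smiles):
    """Convert SMILES to ctab, apply a ctab -> ctab method, and return SMILES."""
    ctab = client.smiles2ctab(smiles)
    transformed = getattr(client, method_name)(ctab)
    # ctab2smiles returns text with a "SMILES Name" header, hence the split.
    return client.ctab2smiles(transformed).split()[2]


class FakeClient(object):
    """Offline stand-in used only to exercise the helper."""

    def smiles2ctab(self, smiles):
        return "CTAB:" + smiles

    def unsalt(self, ctab):
        # Pretend the longest dot-separated fragment is the parent compound.
        fragments = ctab[len("CTAB:"):].split(".")
        return "CTAB:" + max(fragments, key=len)

    def ctab2smiles(self, ctab):
        return "SMILES Name \n%s 0\n" % ctab[len("CTAB:"):]


print(apply_via_ctab(FakeClient(), "unsalt", "[Na+].OC(=O)Cc1ccc(CN)cc1"))
```

In the notebook, the same helper would presumably be called with the real client, for example `apply_via_ctab(utils, "standardise", "[Na]OC(=O)Cc1ccc(C[NH3+])cc1.c1nnn[n-]1.O")`.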

Descriptors calculation¶

In [14]:

# We will now calculate a number of chemical descriptors

# As previously, we will start with the aspirin SMILES:
aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')

# The first descriptor will be the number of heavy atoms:
num_atoms = json.loads(utils.getNumAtoms(aspirin))[0]
print "num atoms = %s" % num_atoms

# Molecular weight:
mol_wt = json.loads(utils.molWt(aspirin))[0]
print "mol_wt = %s" % mol_wt

# LogP:
log_p = json.loads(utils.logP(aspirin))[0]
print "log_p = %s" % log_p

# TPSA:
tpsa = json.loads(utils.tpsa(aspirin))[0]
print "tpsa = %s" % tpsa

# Or we can just calculate all those descriptors (and more!) at once: 
descriptors = json.loads(utils.descriptors(aspirin))[0]
print descriptors

num atoms = 13
mol_wt = 180.159
log_p = 1.3101
tpsa = 63.6
{u'fr_C_O_noCOO': 1, u'MaxEStateIndex': 10.611948223733938, u'Chi4v': 0.8871712192374142, u'fr_Ar_COO': 1, u'Chi4n': 0.8871712192374142, u'SMR_VSA4': 0.0, u'fr_urea': 0, u'fr_para_hydroxylation': 1, u'fr_barbitur': 0, u'fr_Ar_NH': 0, u'fr_halogen': 0, u'fr_dihydropyridine': 0, u'fr_priamide': 0, u'Chi0n': 6.981359543650051, u'fr_Al_COO': 0, u'fr_guanido': 0, u'MinPartialCharge': -0.4775395271554559, u'fr_furan': 0, u'fr_morpholine': 0, u'fr_term_acetylene': 0, u'SlogP_VSA6': 24.26546827384644, u'fr_amidine': 0, u'fr_benzodiazepine': 0, u'ExactMolWt': 180.042258736, u'SlogP_VSA1': 4.736862953800049, u'MolWt': 180.15899999999996, u'NumHDonors': 1, u'fr_hdrzine': 0, u'NumAromaticRings': 1, u'fr_quatN': 0, u'NumSaturatedHeterocycles': 0, u'NumAliphaticHeterocycles': 0, u'fr_benzene': 1, u'fr_phos_acid': 0, u'fr_sulfone': 0, u'VSA_EState10': 0.0, u'fr_aniline': 0, u'fr_N_O': 0, u'fr_sulfonamd': 0, u'fr_thiazole': 0, u'TPSA': 63.60000000000001, u'fr_piperzine': 0, u'SMR_VSA10': 11.938610575903699, u'PEOE_VSA13': 0.0, u'PEOE_VSA12': 0.0, u'PEOE_VSA11': 0.0, u'PEOE_VSA10': 11.3129633249809, u'BalabanJ': 3.0435273546341013, u'fr_lactone': 0, u'Chi3v': 1.3711546649445034, u'Chi2n': 2.3949556783206725, u'EState_VSA10': 9.589074368143644, u'EState_VSA11': 0.0, u'HeavyAtomMolWt': 172.09499999999997, u'Chi0': 9.844934982691242, u'Chi1': 6.109060905280622, u'NumAliphaticRings': 0, u'MolLogP': 1.3100999999999998, u'fr_nitro': 0, u'fr_Al_OH': 0, u'fr_azo': 0, u'NumAliphaticCarbocycles': 0, u'fr_C_O': 2, u'fr_ether': 1, u'fr_phenol_noOrthoHbond': 0, u'RingCount': 1, u'fr_alkyl_halide': 0, u'NumValenceElectrons': 68, u'fr_aryl_methyl': 0, u'MinEStateIndex': -1.1140277777777772, u'HallKierAlpha': -1.8399999999999999, u'fr_C_S': 0, u'fr_thiocyan': 0, u'fr_NH0': 0, u'VSA_EState4': 0.0, u'fr_nitroso': 0, u'VSA_EState6': 0.0, u'VSA_EState7': 0.0, u'VSA_EState1': 0.0, u'VSA_EState2': 0.0, u'VSA_EState3': 0.0, u'fr_HOCCN': 0, u'BertzCT': 343.2228677267164, u'SlogP_VSA12': 0.0, 
u'VSA_EState9': 40.166666666666664, u'SlogP_VSA10': 0.0, u'SlogP_VSA11': 5.749511833283905, u'fr_COO': 1, u'NHOHCount': 1, u'fr_unbrch_alkane': 0, u'NumSaturatedRings': 0, u'MaxPartialCharge': 0.33900378687731025, u'fr_methoxy': 0, u'fr_amide': 0, u'SlogP_VSA8': 0.0, u'SlogP_VSA9': 0.0, u'SlogP_VSA4': 0.0, u'SlogP_VSA5': 17.281725875459443, u'NumAromaticCarbocycles': 1, u'SlogP_VSA7': 0.0, u'fr_Imine': 0, u'SlogP_VSA2': 17.045137970744406, u'SlogP_VSA3': 4.794537184071822, u'fr_phos_ester': 0, u'fr_NH2': 0, u'MinAbsPartialCharge': 0.33900378687731025, u'SMR_VSA3': 0.0, u'NumHeteroatoms': 4, u'fr_NH1': 0, u'fr_ketone_Topliss': 0, u'fr_SH': 0, u'LabuteASA': 74.75705264447721, u'fr_thiophene': 0, u'Chi3n': 1.3711546649445034, u'fr_imidazole': 0, u'fr_nitrile': 0, u'SMR_VSA2': 0.0, u'SMR_VSA1': 19.432464716784395, u'SMR_VSA7': 29.828919765543436, u'SMR_VSA6': 0.0, u'EState_VSA8': 4.736862953800049, u'EState_VSA9': 5.106527394840706, u'EState_VSA6': 12.13273413692322, u'fr_nitro_arom': 0, u'SMR_VSA9': 5.749511833283905, u'EState_VSA5': 19.056471336613846, u'EState_VSA2': 11.3129633249809, u'fr_Ndealkylation2': 0, u'fr_Ndealkylation1': 0, u'EState_VSA1': 11.938610575903699, u'PEOE_VSA14': 11.938610575903699, u'Kappa3': 2.297415032519928, u'Ipc': 729.6807528797516, u'fr_diazo': 0, u'Kappa2': 3.7092512583454584, u'fr_Ar_N': 0, u'fr_Nhpyrrole': 0, u'EState_VSA7': 0.0, u'MolMR': 44.71030000000002, u'VSA_EState5': 0.0, u'EState_VSA4': 0.0, u'fr_COO2': 1, u'fr_prisulfonamd': 0, u'fr_oxime': 0, u'SMR_VSA8': 0.0, u'fr_isocyan': 0, u'EState_VSA3': 0.0, u'Chi2v': 2.3949556783206725, u'HeavyAtomCount': 13, u'fr_aldehyde': 0, u'SMR_VSA5': 6.923737199690624, u'NumHAcceptors': 3, u'fr_lactam': 0, u'fr_allylic_oxid': 0, u'VSA_EState8': 0.0, u'fr_oxazole': 0, u'fr_piperdine': 0, u'fr_Ar_OH': 0, u'NumRadicalElectrons': 0, u'fr_sulfide': 0, u'fr_alkyl_carbamate': 0, u'NOCount': 4, u'Chi1n': 3.6174536478673316, u'MaxAbsEStateIndex': 10.611948223733938, u'PEOE_VSA7': 12.13273413692322, 
u'PEOE_VSA6': 12.13273413692322, u'PEOE_VSA5': 0.0, u'PEOE_VSA4': 0.0, u'PEOE_VSA3': 4.794537184071822, u'PEOE_VSA2': 4.794537184071822, u'PEOE_VSA1': 9.843390348640755, u'NumSaturatedCarbocycles': 0, u'fr_imide': 0, u'FractionCSP3': 0.1111111111111111, u'Chi1v': 3.6174536478673316, u'fr_Al_OH_noTert': 0, u'fr_epoxide': 0, u'fr_hdrzone': 0, u'fr_isothiocyan': 0, u'NumAromaticHeterocycles': 0, u'fr_bicyclic': 0, u'Kappa1': 9.249605734767023, u'MinAbsEStateIndex': 0.01601851851851821, u'fr_phenol': 0, u'fr_ester': 1, u'PEOE_VSA9': 0.0, u'fr_azide': 0, u'PEOE_VSA8': 6.923737199690624, u'fr_pyridine': 0, u'fr_tetrazole': 0, u'fr_ketone': 0, u'fr_nitro_arom_nonortho': 0, u'Chi0v': 6.981359543650051, u'fr_ArN': 0, u'NumRotatableBonds': 2, u'MaxAbsPartialCharge': 0.4775395271554559}
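With so many descriptors returned at once, it is often convenient to pick out a handful and derive something from them. As one possible illustration, the sketch below counts Lipinski rule-of-five violations from four of the keys shown above. `lipinski_violations` is a hypothetical helper, and the input dict holds values copied by hand from the aspirin output.

```python
def lipinski_violations(descriptors):
    """Count rule-of-five violations from a dict of RDKit-style descriptors."""
    checks = [
        descriptors["MolWt"] > 500,
        descriptors["MolLogP"] > 5,
        descriptors["NumHDonors"] > 5,
        descriptors["NumHAcceptors"] > 10,
    ]
    # True counts as 1, so the sum is the number of violated rules.
    return sum(checks)


# Values copied from the aspirin output above.
aspirin_descriptors = {
    "MolWt": 180.15899999999996,
    "MolLogP": 1.3100999999999998,
    "NumHDonors": 1,
    "NumHAcceptors": 3,
}
print(lipinski_violations(aspirin_descriptors))
```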

Fingerprints¶

In [15]:

# As well as descriptors, we can compute fingerprints.
# The output will be in FPS format. You can use the optional "type" argument to choose the type of fingerprint.
# This can be "morgan", "pair" or "maccs". The default is "morgan".

aspirin = utils.smiles2ctab('O=C(Oc1ccccc1C(=O)O)C')
fingerprints = utils.sdf2fps(aspirin)
print fingerprints

#FPS1
#num_bits=2048
#software=RDKit/2016.03.1
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000020000000000000000001000000000000000000000000000000000000000000000004000000008000000000000000000000000000800000000000000000000000002000000000000000000000000000020008800002000000010000000000000000000000008000000000000000000000000000000000000000000000100000000400000080000010000000000000000000000000000000000000010000000000000000000000000002004000008000000000000000000000000002000000002000000000000004008000000000000000	BSYNRYMUTXBXSQ-UHFFFAOYSA-N
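The FPS text above consists of `#`-prefixed header lines followed by one record per line: a hex-encoded fingerprint, a tab, and an identifier. The sketch below shows one way to read such text and compare two fingerprints with the Tanimoto coefficient. The helper names are hypothetical, and the sample record is a tiny hand-made 16-bit fingerprint rather than real Beaker output.

```python
def parse_fps(fps_text):
    """Return (header_dict, [(identifier, hex_fingerprint), ...]) from FPS text."""
    header, records = {}, []
    for line in fps_text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            # Header lines look like "#key=value"; the "#FPS1" line has no "=".
            if "=" in line:
                key, value = line[1:].split("=", 1)
                header[key] = value
            continue
        hex_fp, identifier = line.split("\t")[:2]
        records.append((identifier, hex_fp))
    return header, records


def popcount(hex_fp):
    """Number of bits set in a hex-encoded fingerprint."""
    return bin(int(hex_fp, 16)).count("1")


def tanimoto(hex_a, hex_b):
    """Tanimoto coefficient of two hex-encoded fingerprints of equal length."""
    a, b = int(hex_a, 16), int(hex_b, 16)
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / float(union) if union else 0.0


# A hand-made 16-bit example, used only for illustration.
sample = "#FPS1\n#num_bits=16\n0180\tmol1\n"
header, records = parse_fps(sample)
print("%s bits, %s has %d set" % (header["num_bits"], records[0][0], popcount(records[0][1])))
print(tanimoto("0180", "0100"))
```

Bit counts and Tanimoto values do not depend on the bit order within each byte, which is why the sketch can treat the hex string as one large integer.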

Compound Images¶

In [16]:

# In addition to computing compound images in raster format (png), Beaker supports vector formats as well.
# We will first introduce a JSON-based format. You can, for example, use the `smiles2json` method to generate a JSON object
# describing the visual representation. In order to render it, you can use the raphael.js library and its
# `paper.add` method:

aspirin = 'O=C(Oc1ccccc1C(=O)O)C'
print utils.smiles2json(aspirin)
code = """
window.define = undefined;
$.getScript('https://cdnjs.cloudflare.com/ajax/libs/raphael/2.1.0/raphael-min.js', function(){
    var target = $(':focus').parent('div');
    var paper = Raphael(target, 320, 200);
    paper.add(%s);
    $(paper.canvas).delay( 2000 ).fadeOut( 400 );
});
"""
Javascript(code % utils.smiles2json(aspirin))

[{"path": "M0,0L200,0L200,200L0,200Z", "type": "path", "fill": "rgb(255, 255, 255)"}, {"height": 10.554066799834093, "width": 10.554066799834093, "stroke": "rgb(255, 255, 255)", "y": 67.30090812747672, "x": 32.7215256071902, "type": "rect", "fill": "rgb(255, 255, 255)"}, {"font-size": 10.554066799834093, "text": "O", "stroke": "rgb(255, 0, 0)", "y": 72.57794152739376, "x": 37.99855900710725, "type": "text"}, {"height": 10.554066799834093, "width": 10.554066799834093, "stroke": "rgb(255, 255, 255)", "y": 113.10291896469946, "x": 74.05581960245203, "type": "rect", "fill": "rgb(255, 255, 255)"}, {"font-size": 10.554066799834093, "text": "O", "stroke": "rgb(255, 0, 0)", "y": 118.3799523646165, "x": 79.33285300236908, "type": "text"}, {"height": 10.554066799834093, "width": 10.554066799834093, "stroke": "rgb(255, 255, 255)", "y": 146.00938941490907, "x": 181.0, "type": "rect", "fill": "rgb(255, 255, 255)"}, {"font-size": 10.554066799834093, "text": "O", "stroke": "rgb(255, 0, 0)", "y": 151.28642281482612, "x": 186.27703339991706, "type": "text"}, {"height": 10.554066799834093, "width": 21.108133599668186, "stroke": "rgb(255, 255, 255)", "y": 158.90492980192218, "x": 99.55901339796274, "type": "rect", "fill": "rgb(255, 255, 255)"}, {"font-size": 10.554066799834093, "text": "HO", "stroke": "rgb(255, 0, 0)", "y": 164.18196320183924, "x": 110.11308019779683, "type": "text"}, {"path": "M36.9974015195,80.517288044L39.9304856489,94.2399588128", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M39.9304856489,94.2399588128L42.8635697783,107.962629582", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M42.1578737437,79.4142885306L45.0909578731,93.1369592993", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M45.0909578731,93.1369592993L48.0240420025,106.859630068", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M45.4438058904,107.411129825L59.7498127464,112.041535924", 
"stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M59.7498127464,112.041535924L74.0558196025,116.671942024", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M84.6098864023,113.617663042L95.1932719398,104.066625532", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M95.1932719398,104.066625532L105.776657477,94.5155880224", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M105.776657477,94.5155880224L98.3314127172,59.6824011404", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M110.192605281,89.9292700787L104.236409473,62.0627205731", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M98.3314127172,59.6824011404L124.775216484,35.8180367982", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M124.775216484,35.8180367982L158.664265012,46.7868579225", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M126.539110798,41.9355187005L153.65034962,50.7105756", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M158.664265012,46.7868579225L166.109506233,81.62004622", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M166.109506233,81.62004622L139.665706005,105.484410916", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M159.929664526,80.0888823807L138.774624343,99.1803741375", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M139.665706005,105.484410916L147.110950057,140.31759886", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M146.298444628,142.827898703L160.6044529,147.458305472", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M160.6044529,147.458305472L174.910461171,152.08871224", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M147.923455486,137.807299016L162.229463757,142.437705785", "stroke": 
"rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M162.229463757,142.437705785L176.535472029,147.068112553", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M147.110950057,140.31759886L136.527565227,149.868636242", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M136.527565227,149.868636242L125.944180398,159.419673624", "stroke": "rgb(255, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M45.4438058904,107.411129825L19.0,131.275494167", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}, {"path": "M139.665706005,105.484410916L105.776657477,94.5155880224", "stroke": "rgb(0, 0, 0)", "stroke-width": 1.2, "type": "path"}]

Out[16]:

In [17]:

# The most popular vector graphics format is the XML-based SVG. This is how we can render a compound as an SVG image:
benzene = 'c1ccccc1'
svg = utils.smiles2svg(benzene)

# pretty-printing the SVG, just to prove this is a vector graphic:
root = etree.fromstring(svg).getroottree()
print etree.tostring(root, pretty_print=True)

# And finally displaying it:
SVG(svg)

<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="200pt" height="200pt" viewBox="0 0 200 200" version="1.1">
<g id="surface294">
<rect x="0" y="0" width="200" height="200" style="fill:rgb(100%,100%,100%);fill-opacity:1;stroke:none;"/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 127 100 L 113.5 123.382812 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 122.1875 100.339844 L 111.386719 119.042969 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 113.5 123.382812 L 86.5 123.382812 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 86.5 123.382812 L 73 100 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 88.613281 119.042969 L 77.8125 100.339844 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 73 100 L 86.5 76.617188 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 86.5 76.617188 L 113.5 76.617188 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 89.199219 80.617188 L 110.800781 80.617188 "/>
<path style="fill:none;stroke-width:1.2;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;" d="M 113.5 76.617188 L 127 100 "/>
</g>
</svg>

Out[17]:

In [18]:

# And finally our old friends - raster images:
aspirin = 'O=C(Oc1ccccc1C(=O)O)C'
img = utils.smiles2image(aspirin)
Image(img)

Out[18]:

Maximum Common Substructure¶

In [19]:

# This is how to find a maximum common substructure (MCS) of three molecules:
smiles = ["O=C(NCc1cc(OC)c(O)cc1)CCCC/C=C/C(C)C", "CC(C)CCCCCC(=O)NCC1=CC(=C(C=C1)O)OC", "c1(C=O)cc(OC)c(O)cc1"]

# converting our molecules' SMILES to molfiles:
mols = [utils.smiles2ctab(smile) for smile in smiles]

# joining molfiles to create a SDF file:
sdf = ''.join(mols)

# and finally computing MCS
result = utils.mcs(sdf)

# and displaying results:
print result

[#6]:1(:[#6]:[#6](:[#6](:[#6]:[#6]:1)-[#8])-[#8]-[#6])-[#6]
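Joining the molfiles with `''.join(mols)` works here because each ctab returned by `smiles2ctab` already ends with a `$$$$` record terminator, as seen in the earlier output. If molfiles come from another source, that may not hold. The sketch below is a more defensive join; `molfiles_to_sdf` is a hypothetical helper, and the records are placeholders rather than real molfiles.

```python
def molfiles_to_sdf(molfiles):
    """Join molfile strings into SDF text, one '$$$$'-terminated record each."""
    records = []
    for molfile in molfiles:
        body = molfile.rstrip("\n")
        # Add the record terminator only if the molfile does not already end with it.
        if not body.endswith("$$$$"):
            body += "\n$$$$"
        records.append(body + "\n")
    return "".join(records)


# Placeholder records; real input would be the ctabs returned by smiles2ctab.
with_terminator = "mol A\nM  END\n$$$$\n"
without_terminator = "mol B\nM  END\n"
sdf_text = molfiles_to_sdf([with_terminator, without_terminator])
print(sdf_text.count("$$$$"))
```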

Computing 3D coordinates¶

In [20]:

# It's very easy to compute a molfile with 3D coordinates:
aspirin = 'O=C(Oc1ccccc1C(=O)O)C'
mol_3D = utils.smiles23D(aspirin)
print mol_3D

     RDKit          3D

 21 21  0  0  0  0  0  0  0  0999 V2000
    2.4376    0.6881   -1.4041 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.3845   -0.1969   -0.5077 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.1672   -0.5490    0.0891 O   0  0  0  0  0  0  0  0  0  0  0  0
   -0.0101    0.2162    0.0456 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0378    1.6234    0.0045 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.1394    2.3726   -0.0087 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3762    1.7311    0.0254 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4414    0.3379    0.0831 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.2652   -0.4345    0.1029 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.3587   -1.9150    0.1530 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4684   -2.6234   -0.3874 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4224   -2.5526    0.7899 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.6147   -0.9634   -0.1596 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.9839    2.1470    0.0163 H   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0923    3.4536   -0.0391 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.2876    2.3148    0.0083 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.4158   -0.1328    0.0875 H   0  0  0  0  0  0  0  0  0  0  0  0
   -3.1218   -2.0424    1.3144 H   0  0  0  0  0  0  0  0  0  0  0  0
    4.5068   -0.3053   -0.2249 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.7327   -1.8128   -0.8640 H   0  0  0  0  0  0  0  0  0  0  0  0
    3.5345   -1.3564    0.8755 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0
  2  3  1  0
  3  4  1  0
  4  5  2  0
  5  6  1  0
  6  7  2  0
  7  8  1  0
  8  9  2  0
  9 10  1  0
 10 11  2  0
 10 12  1  0
  2 13  1  0
  9  4  1  0
  5 14  1  0
  6 15  1  0
  7 16  1  0
  8 17  1  0
 12 18  1  0
 13 19  1  0
 13 20  1  0
 13 21  1  0
M  END
$$$$
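Compared with the 2D molfile earlier in the notebook, this one has non-zero z coordinates and explicit hydrogens (21 atoms instead of 13). Both facts can be checked by reading the fixed-width atom block directly. The following is a minimal sketch assuming the V2000 column layout; `read_atoms` is a hypothetical helper, and the example keeps only the first two atom lines from the output above.

```python
def read_atoms(molfile):
    """Return [(symbol, x, y, z), ...] from the atom block of a V2000 molfile."""
    lines = molfile.splitlines()
    for i, line in enumerate(lines):
        if line.rstrip().endswith("V2000"):
            num_atoms = int(line[0:3])
            atom_lines = lines[i + 1:i + 1 + num_atoms]
            break
    else:
        raise ValueError("no V2000 counts line found")
    atoms = []
    for line in atom_lines:
        # Fixed-width fields: three 10-character coordinates, then the element symbol.
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    return atoms


# Shortened record built from the first two atom lines of the output above.
example_3d = (
    "\n"
    "     RDKit          3D\n"
    "\n"
    "  2  1  0  0  0  0  0  0  0  0999 V2000\n"
    "    2.4376    0.6881   -1.4041 O   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "    2.3845   -0.1969   -0.5077 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "  1  2  2  0\n"
    "M  END\n"
    "$$$$\n"
)
atoms = read_atoms(example_3d)
is_3d = any(z != 0.0 for (_, _, _, z) in atoms)
print(atoms)
print(is_3d)
```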

Optical Structure Recognition - converting an image to a structure¶

OSRA is an open source tool that performs Optical Structure Recognition: it can be used to convert an image containing one or more compounds into their structures in smi or mol format. Beaker uses OSRA to implement the "image2ctab" method.

In [21]:

# This time we start with a SMILES string that is already in canonical form.
# Note that it encodes 2-isopropoxybenzoic acid, not aspirin:
query = 'CC(C)Oc1ccccc1C(=O)O'

# Let's convert it to an image:
im = utils.smiles2image(query)

# And use OSRA to convert the image to a molfile:
mol = utils.image2ctab(im)

# We can now convert the molfile to SMILES:
smiles = utils.ctab2smiles(mol).split()[2]

# And check if we get the same SMILES string back:
smiles == query

Out[21]:

True

Kekulisation¶

In [22]:

# The last piece of Beaker functionality is kekulisation:

# This time we will start with molfile:
aromatic='''
  Mrv0541 08191414212D

  6  6  0  0  0  0            999 V2000
   -1.7679    1.5616    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4823    1.1491    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.4823    0.3241    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7679   -0.0884    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0534    0.3241    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0534    1.1491    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  4  0  0  0  0
  1  6  4  0  0  0  0
  2  3  4  0  0  0  0
  3  4  4  0  0  0  0
  4  5  4  0  0  0  0
  5  6  4  0  0  0  0
M  END

'''

# Kekulising is trivial:
kek = utils.kekulize(aromatic)

# Rendering the result
Image(utils.ctab2image(kek))

Out[22]: