Working with Reactions

In [1]:
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import AllChem
from rdkit.Chem import Draw

The building blocks

Start by reading in a set of building blocks from ZINC

http://zinc.docking.org/subsets/zbb

In [2]:
import gzip,random
inlines = gzip.open('data/zbb.smi.gz').readlines()
random.seed(42)
random.shuffle(inlines)
indata = '\n'.join(inlines[:10000])
In [3]:
suppl = Chem.SmilesMolSupplier()
suppl.SetData(indata)
ms = [x for x in suppl if x is not None]
In [4]:
ms[0]
Out[4]:
In [5]:
from rdkit.Chem import FunctionalGroups
fgs = FunctionalGroups.BuildFuncGroupHierarchy()
In [6]:
fgs
Out[6]:
[<rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2a9510>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2a9890>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2a9a50>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2a9b90>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2b5050>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2b51d0>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2b5390>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2b5490>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2b5650>,
 <rdkit.Chem.FunctionalGroups.FGHierarchyNode at 0x10e2b5a90>]
In [7]:
[x.name for x in fgs]
Out[7]:
['Acid Chloride',
 'Carboxylic acid',
 'Sulfonyl Chloride',
 'Amine',
 'Boronic Acid',
 'Isocyanate',
 'Alcohol',
 'Aldehyde',
 'Halogen',
 'Azide']
In [8]:
[x.label for x in fgs]
Out[8]:
['AcidChloride',
 'CarboxylicAcid',
 'SulfonylChloride',
 'Amine',
 'BoronicAcid',
 'Isocyanate',
 'Alcohol',
 'Aldehyde',
 'Halogen',
 'Azide']
In [9]:
[x.smarts for x in fgs]
Out[9]:
['C(=O)Cl',
 'C(=O)[O;H,-]',
 '[$(S-!@[#6])](=O)(=O)(Cl)',
 '[N;!H0;$(N-[#6]);!$(N-[!#6;!#1]);!$(N-C=[O,N,S])]',
 '[$(B-!@[#6])](O)(O)',
 '[$(N-!@[#6])](=!@C=!@O)',
 '[O;H1;$(O-!@[#6;!$(C=!@[O,N,S])])]',
 '[CH;D2;!$(C-[!#6;!#1])]=O',
 '[$([F,Cl,Br,I]-!@[#6]);!$([F,Cl,Br,I]-!@C-!@[F,Cl,Br,I]);!$([F,Cl,Br,I]-[C,S](=[O,S,N]))]',
 '[N;H0;$(N-[#6]);D2]=[N;D2]=[N;D1]']
In [10]:
[x.label for x in fgs[1].children]
Out[10]:
['CarboxylicAcid.Aromatic',
 'CarboxylicAcid.Aliphatic',
 'CarboxylicAcid.AlphaAmino']

The functional groups are in a hierarchy. For our purposes a flat datastructure is better. Convert to a dictionary keyed by functional group name:

In [11]:
from collections import namedtuple
nt = namedtuple('pattern','smarts mol')
def flattenFgs(fgs,res):
    if not fgs:
        return
    for x in fgs:
        res[x.label]=nt(x.smarts,x.pattern)
        flattenFgs(x.children,res)
allFgDefs={}
flattenFgs(fgs,allFgDefs)
In [13]:
allFgNames=sorted(allFgDefs.keys())
allFgNames
Out[13]:
['AcidChloride',
 'AcidChloride.Aliphatic',
 'AcidChloride.Aromatic',
 'Alcohol',
 'Alcohol.Aliphatic',
 'Alcohol.Aromatic',
 'Aldehyde',
 'Aldehyde.Aliphatic',
 'Aldehyde.Aromatic',
 'Amine',
 'Amine.Aliphatic',
 'Amine.Aromatic',
 'Amine.Cyclic',
 'Amine.Primary',
 'Amine.Primary.Aliphatic',
 'Amine.Primary.Aromatic',
 'Amine.Secondary',
 'Amine.Secondary.Aliphatic',
 'Amine.Secondary.Aromatic',
 'Azide',
 'Azide.Aliphatic',
 'Azide.Aromatic',
 'BoronicAcid',
 'BoronicAcid.Aliphatic',
 'BoronicAcid.Aromatic',
 'CarboxylicAcid',
 'CarboxylicAcid.Aliphatic',
 'CarboxylicAcid.AlphaAmino',
 'CarboxylicAcid.Aromatic',
 'Halogen',
 'Halogen.Aliphatic',
 'Halogen.Aromatic',
 'Halogen.Bromine',
 'Halogen.Bromine.Aliphatic',
 'Halogen.Bromine.Aromatic',
 'Halogen.NotFluorine',
 'Halogen.NotFluorine.Aliphatic',
 'Halogen.NotFluorine.Aromatic',
 'Isocyanate',
 'Isocyanate.Aliphatic',
 'Isocyanate.Aromatic',
 'SulfonylChloride',
 'SulfonylChloride.Aliphatic',
 'SulfonylChloride.Aromatic']
In [16]:
allFgs={}
for fgn in allFgNames:
    patt = allFgDefs[fgn]
    allFgs[fgn]=[m for m in ms if m.HasSubstructMatch(patt.mol)]
    print '%s: Found %d '%(fgn,len(allFgs[fgn]))        
AcidChloride: Found 62 
AcidChloride.Aliphatic: Found 21 
AcidChloride.Aromatic: Found 41 
Alcohol: Found 1137 
Alcohol.Aliphatic: Found 864 
Alcohol.Aromatic: Found 304 
Aldehyde: Found 210 
Aldehyde.Aliphatic: Found 21 
Aldehyde.Aromatic: Found 189 
Amine: Found 3643 
Amine.Aliphatic: Found 2366 
Amine.Aromatic: Found 1427 
Amine.Cyclic: Found 1069 
Amine.Primary: Found 908 
Amine.Primary.Aliphatic: Found 127 
Amine.Primary.Aromatic: Found 781 
Amine.Secondary: Found 857 
Amine.Secondary.Aliphatic: Found 203 
Amine.Secondary.Aromatic: Found 664 
Azide: Found 1 
Azide.Aliphatic: Found 0 
Azide.Aromatic: Found 1 
BoronicAcid: Found 0 
BoronicAcid.Aliphatic: Found 0 
BoronicAcid.Aromatic: Found 0 
CarboxylicAcid: Found 2052 
CarboxylicAcid.Aliphatic: Found 1325 
CarboxylicAcid.AlphaAmino: Found 128 
CarboxylicAcid.Aromatic: Found 736 
Halogen: Found 2828 
Halogen.Aliphatic: Found 477 
Halogen.Aromatic: Found 2483 
Halogen.Bromine: Found 607 
Halogen.Bromine.Aliphatic: Found 118 
Halogen.Bromine.Aromatic: Found 499 
Halogen.NotFluorine: Found 2135 
Halogen.NotFluorine.Aliphatic: Found 407 
Halogen.NotFluorine.Aromatic: Found 1774 
Isocyanate: Found 8 
Isocyanate.Aliphatic: Found 0 
Isocyanate.Aromatic: Found 8 
SulfonylChloride: Found 0 
SulfonylChloride.Aliphatic: Found 0 
SulfonylChloride.Aromatic: Found 0 

Note: It would be much, much more efficient to take advantage of the hierarchy in the functional groups to do this.

Reactions from Reaction SMARTS

Pick a popular class from Stephen Roughley's paper (Roughley, S. & Jordan, A. The Medicinal Chemist’s Toolbox: An Analysis of Reactions Used in the Pursuit of Drug Candidates. J. Med. Chem. 54, 3451–3479 (2011).).

Since there are no boronic acids in the BB library from ZINC I can't do a Suzuki coupling, so I'll use N-arylation instead.

Start with a very rough definition of the reaction:

In [17]:
rxn = AllChem.ReactionFromSmarts('[a:1]-[Br,I].[N;H1;D2;$(N(-[#6])-[#6]);!$(N-[!#6;!#1]);!$(N-C=[O,N,S]):2]>>[a:1]-[N:2]')

Take a look at the reaction using the new (and very crude) function Draw.ReactionToImage()

In [18]:
Draw.ReactionToImage(rxn)
Out[18]:

Now grab the building blocks we're going to use and look at them.

In [19]:
halogens = allFgs['Halogen.NotFluorine.Aromatic']
amines = allFgs['Amine.Secondary']
In [20]:
Draw.MolsToGridImage(halogens[:20],molsPerRow=4,legends=[x.GetProp('_Name') for x in halogens])
Out[20]:
In [21]:
Draw.MolsToGridImage(amines[:20],molsPerRow=4,legends=[x.GetProp('_Name') for x in amines])
Out[21]:

There's lots of stuff in there that looks like it may interfere, so let's do a bit more filtering

In [22]:
halogens = [x for x in halogens if len(x.GetSubstructMatches(allFgDefs['Halogen.NotFluorine'].mol))==1]
halogens = [x for x in halogens if not x.HasSubstructMatch(allFgDefs['Amine'].mol)]
len(halogens)
Out[22]:
955
In [23]:
Draw.MolsToGridImage(halogens[:20],molsPerRow=4,legends=[x.GetProp('_Name') for x in halogens])
Out[23]:
In [24]:
amines = [x for x in amines if len(x.GetSubstructMatches(allFgDefs['Amine'].mol))==1]
amines = [x for x in amines if not x.HasSubstructMatch(allFgDefs['Halogen.NotFluorine'].mol)]
len(amines)
Out[24]:
485
In [25]:
Draw.MolsToGridImage(amines[:20],molsPerRow=4,legends=[x.GetProp('_Name') for x in amines])
Out[25]:

Enumerating the library

The convenience function AllChem.EnumerateLibraryFromReaction() makes it very easy to enumerate libraries of molecules using a reaction and some lists of building blocks.

The return value is a generator object that encodes the entire library. The members of the library are not actually generated until you iterate through the generator object. Consequently, the command below executes in milliseconds even though the underlying library contains 955x485 (>460K) members.

In [39]:
products = AllChem.EnumerateLibraryFromReaction(rxn,(halogens,amines))
products
Out[39]:
<generator object EnumerateLibraryFromReaction at 0x111e4de10>

The results are tuples with one entry per product of the reaction.

This reaction has a single product, so there's only one entry in the tuple.

In [40]:
products.next()
Out[40]:
(<rdkit.Chem.rdchem.Mol at 0x111e2cc80>,)

pull out a block of products:

In [41]:
first20 = [products.next()[0] for x in range(20)]
Draw.MolsToGridImage(first20,molsPerRow=4)
Out[41]:

The above molecules all contain some ugliness because they have not been sanitized after being constructed. This is easy to fix:

In [42]:
[Chem.SanitizeMol(x) for x in first20]
Draw.MolsToGridImage(first20,molsPerRow=4)
Out[42]:

We can always go back and get more (until we get to the end of the 450K library):

In [38]:
next20 = [products.next()[0] for x in range(20)]
[Chem.SanitizeMol(x) for x in next20]
Draw.MolsToGridImage(next20,molsPerRow=4)
Out[38]: