A quick note on the use of one of the RDKit convenience tools.
The function AllChem.ConstrainedEmbed()
allows you to generate 3D conformations of a molecule that are constrained to have the coordinates of a subset of atoms match those in a reference molecule. This allows, for example, generating 3D conformers for a series of compounds with a common scaffold geometry.
The data I will use are a set of compounds from the well-known Sutherland 3D QSAR paper[1]. I'm using the pre-processed version that Paolo Tosco created for the Open3DAlign[2] validation set.
The original files can be found here: http://sourceforge.net/projects/open3dalign/files/validation_suite/open3dalign_validation_suite.tar.bz2/download
from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw,PyMol,MCS
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print rdBase.rdkitVersion
2014.03.1pre
Take the thermolysin data set, since it's got a nice flexible core bit.
ms = [x for x in Chem.SDMolSupplier('../data/open3dalign_validation_suite/therm_data.sdf',removeHs=False)]
ms2 = [x for x in Chem.SDMolSupplier('../data/open3dalign_validation_suite/therm_data.sdf')]
_=[AllChem.Compute2DCoords(x) for x in ms2]
Draw.MolsToGridImage(ms2[:12],molsPerRow=4)
Use the MCS code to pull out a core:
res=MCS.FindMCS(ms2,threshold=0.9,completeRingsOnly=True)
p = Chem.MolFromSmarts(res.smarts)
p
Get the molecules that match that core definition:
matchingMols = [x for x in ms if x.HasSubstructMatch(p)]
print len(ms),len(matchingMols)
76 69
v = PyMol.MolViewer()
Get a set of 3D coordinates of the core from one of the molecules:
core1 = AllChem.DeleteSubstructs(AllChem.ReplaceSidechains(Chem.RemoveHs(matchingMols[1]),p),Chem.MolFromSmiles('*'))
core1.UpdatePropertyCache()
v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.GetPNG(preDelay=2)
Demonstrate that the input conformations can be reasonably aligned to that core:
v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
for m in matchingMols[:10]:
nm = Chem.Mol(m)
AllChem.AlignMol(core1,nm)
v.ShowMol(nm,nm.GetProp('_Name'),showOnly=False)
v.GetPNG(preDelay=2)
Try generating a random conformation and seeing how well it aligns:
nm = Chem.Mol(matchingMols[3])
AllChem.EmbedMolecule(nm,randomSeed=0xF00D)
AllChem.MMFFOptimizeMolecule(nm)
rms = AllChem.AlignMol(core1,nm)
print 'RMS:',rms
v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.ShowMol(nm,'straight embed',showOnly=False)
v.GetPNG(preDelay=2)
RMS: 0.958738145983
The poor performance isn't much of a surprise given how flexible the molecules are.
AllChem.ConstrainedEmbed()
provides a solution to this problem: it generates conformations that match the template conformation.
The function works by:
The RMS of the final alignment is available in a molecule property named "EmbedRMS".
Here's an illustration of the results:
nm = Chem.Mol(matchingMols[3])
AllChem.ConstrainedEmbed(nm,core1)
rms = float(nm.GetProp('EmbedRMS'))
print 'RMS:',rms
v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.ShowMol(nm,'constrained embed',showOnly=False)
v.GetPNG(preDelay=2)
RMS: 0.0404323330462
By default the force field used for the structure refinment is UFF. This can be changed to MMFF:
nm = Chem.Mol(matchingMols[3])
GetFF=lambda x,confId=-1:AllChem.MMFFGetMoleculeForceField(x,AllChem.MMFFGetMoleculeProperties(x),confId=confId)
AllChem.ConstrainedEmbed(nm,core1,getForceField=GetFF)
rms = float(nm.GetProp('EmbedRMS'))
print 'RMS:',rms
v.ShowMol(core1,molB=Chem.MolToMolBlock(core1,kekulize=False),name='core')
v.SetDisplayStyle('core','sticks')
v.ShowMol(nm,'constrained embed',showOnly=False)
v.GetPNG(preDelay=2)
RMS: 0.0407469103071