Inspired by this RDKit-discuss question from Liz Wylie: http://www.mail-archive.com/rdkit-discuss@lists.sourceforge.net/msg03676.html
from rdkit import Chem
from rdkit.Chem import MCS
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole
from rdkit import rdBase
print rdBase.rdkitVersion
2014.03.1pre
Liz wanted to use custom atom types in the MCS code, yet still be able to get a readable SMILES for the MCS.
Here's a demonstration of the standard behavior of the MCS code for Liz's example.
smis=["COc1ccc(C(Nc2nc3c(ncn3COCC=O)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1",
"COc1ccc(C(Nc2nc3c(ncn3COC(CO)(CO)CO)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1"]
ms = [Chem.MolFromSmiles(x) for x in smis]
Draw.MolsToGridImage(ms,subImgSize=(300,300))
Start by defining a simple atom-type hash that combines atomic num and hybridization
def label(a):
return 100*int(a.GetHybridization())+a.GetAtomicNum()
The easiest way to use custom atom types is to set (bogus) isotope labels on the atoms in a copy of each molecule:
nms = [Chem.Mol(x) for x in ms]
for nm in nms:
for at in nm.GetAtoms():
at.SetIsotope(label(at))
And now run the MCS on the copy using the "istope" mode and print the results:
mcs=MCS.FindMCS(nms,atomCompare='isotopes')
print mcs.smarts
[406*]-[308*]-[306*]:1:[306*]:[306*]:[306*](:[306*]:[306*]:1)-[406*](-[306*]:1:[306*]:[306*]:[306*]:[306*]:[306*]:1)(-[306*]:1:[306*]:[306*]:[306*]:[306*]:[306*]:1)-[307*]-[306*]:1:[307*]:[306*]:2:[306*](:[306*](:[307*]:1)=[308*]):[307*]:[306*]:[307*]:2-[406*]-[408*]-[406*]
That's what we asked for, but it's not exactly readable.
We can get to a more readable form in a two step process
This works because we know that the atom indices in the copies and the original molecules are the same.
Start by getting the match
mcsp = Chem.MolFromSmarts(mcs.smarts)
match = nms[0].GetSubstructMatch(mcsp)
And now use Chem.MolFragmentToSmiles to generate the actual SMILES. This function generates the SMILES for a user-specified subset of a molecule.
smi=Chem.MolFragmentToSmiles(ms[0],atomsToUse=match,isomericSmiles=True,canonical=False)
print smi
COc1ccc(C(Nc2nc3c(ncn3COC)c(=O)[nH]2)(c2ccccc2)c2ccccc2)cc1
core = Chem.MolFromSmiles(smi)
core
Note that in this particular case, the custom atom types don't make a difference in the MCS that is found.