#functions to help visualise output
from rdkit import Chem
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
from rdkit.Chem import AllChem
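def depict(smiles_or_smirks):
    # A string containing ">>" is treated as a reaction (SMIRKS);
    # anything else is treated as a molecule SMILES.
    if ">>" in smiles_or_smirks:
        rxn = AllChem.ReactionFromSmarts(smiles_or_smirks)
        return Draw.ReactionToImage(rxn)
    else:
        return Chem.MolFromSmiles(smiles_or_smirks)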
def showFile(in_file):
    # Draw the first 10 molecules of a "SMILES ID" file as a grid.
    mols = []
    ids = []
    with open(in_file, "r") as fo:
        for n, line in enumerate(fo):
            if n >= 10:
                break
            smi, mol_id = line.rstrip().split()
            mols.append(Chem.MolFromSmiles(smi))
            ids.append(mol_id)
    return Draw.MolsToGridImage(mols, molsPerRow=5, legends=ids)
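def showMMPs(in_file):
    # Draw the first 9 pairs of an MMP CSV file. Each grid cell shows both
    # molecules of a pair, with the transformation SMIRKS as the legend.
    mols = []
    trans = []
    with open(in_file, "r") as fo:
        for n, line in enumerate(fo):
            if n >= 9:
                break
            m1, m2, id1, id2, t, c = line.rstrip().split(",")
            # Join the two SMILES with "." so both are drawn in one cell.
            mols.append(Chem.MolFromSmiles("%s.%s" % (m1, m2)))
            trans.append(t)
    return Draw.MolsToGridImage(mols, molsPerRow=3, subImgSize=(300, 300), legends=trans)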
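def showLine(in_string):
    # Draw one line of MMP output. Fields, counted from the end of the line:
    # left SMILES, right SMILES, left ID, right ID, SMIRKS, context.
    f = in_string.split(",")
    rxn = f[-2].split(">>")
    mols = [
        Chem.MolFromSmiles(f[-6]),    # left molecule
        Chem.MolFromSmiles(f[-5]),    # right molecule
        Chem.MolFromSmiles(rxn[0]),   # left-hand side of the transformation
        Chem.MolFromSmiles(rxn[1]),   # right-hand side of the transformation
        Chem.MolFromSmiles(f[-1]),    # shared context
    ]
    # Legends follow the same order as mols: the left ID (f[-4]) labels the
    # left molecule and the right ID (f[-3]) labels the right molecule.
    ids = [f[-4], f[-3], "LHS", "RHS", "CONTEXT"]
    return Draw.MolsToGridImage(mols, molsPerRow=6, legends=ids)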
The program to generate the MMPs from a set is divided into two parts: fragmentation and indexing.
Before running the programs, make sure your input set of SMILES:
If your SMILES set doesn't satisfy the conditions above, the programs are likely to fail or, in the case of canonicalisation, to miss MMPs involving H atom substitution.
The file sample.smi has been canonicalised using RDKit.
cd t1_files/
ls
*Note:* You need a leading ! to run a Linux command in this IPython notebook. It is not needed on the command line.
!wc -l sample.smi
!head sample.smi
showFile('sample.smi')
Example fragmentation command:
python $RDBASE/Contrib/mmpa/rfrag.py <SMILES_FILE >FRAGMENT_OUTPUT
Program help (use the command: rfrag.py -h)
Program that fragments a user input set of smiles.
The program enumerates every single,double and triple acyclic single bond cuts in a molecule.
USAGE: ./rfrag.py <file_of_smiles
Format of smiles file: SMILES ID (space separated)
Output: whole mol smiles,ID,core,context
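Since the fragmentation script expects one space-separated "SMILES ID" pair per line, a quick format check before running it can save a failed run. The following is a minimal sketch using only the Python standard library; the function name `check_smiles_lines` and the sample lines are made up for illustration, and the check only looks at the number of fields, not at whether the SMILES are chemically valid.

```python
def check_smiles_lines(lines):
    """Return (line_number, text) for every line that is not 'SMILES ID'."""
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        # Exactly two whitespace-separated fields are expected: SMILES and ID.
        if len(fields) != 2:
            bad.append((number, line.rstrip("\n")))
    return bad


# Hypothetical input: the second line has no ID, the third has an extra field.
sample = ["CCO 1\n", "c1ccccc1\n", "CCN 3 extra\n"]
problems = check_smiles_lines(sample)
print(problems)
```

To check a real file, pass an open file handle instead of the list, for example `check_smiles_lines(open("sample.smi"))`.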
Let's run an example:
!python $RDBASE/Contrib/mmpa/rfrag.py <sample.smi >out_fragmented.txt
Let's have a look at the output:
!head out_fragmented.txt
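Each line of the fragmentation output has the four comma-separated fields listed in the program help (whole molecule SMILES, ID, core, context). If you want to inspect the file from Python, a small parser along these lines may help. This is only a sketch: the `Fragment` tuple and `parse_fragment_line` are names invented here, and the example line is hand-written for illustration rather than copied from real rfrag.py output.

```python
from collections import namedtuple

# Field names follow the "Output:" line of the rfrag.py help text.
Fragment = namedtuple("Fragment", ["smiles", "mol_id", "core", "context"])


def parse_fragment_line(line):
    """Split one fragmentation output line into its four fields."""
    smiles, mol_id, core, context = line.rstrip("\n").split(",")
    return Fragment(smiles, mol_id, core, context)


# Hand-written illustrative line (assumed layout, not real program output).
example = "CCOC,mol1,[*:1]O[*:2],[*:1]CC.[*:2]C"
frag = parse_fragment_line(example)
print(frag.mol_id, frag.core, frag.context.split("."))
```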
Example indexing command:
python $RDBASE/Contrib/mmpa/indexing.py <FRAGMENT_OUTPUT >MMP_OUTPUT.CSV
Format of output: SMILES_OF_LEFT_MMP,SMILES_OF_RIGHT_MMP,ID_OF_LEFT_MMP,ID_OF_RIGHT_MMP,SMIRKS_OF_TRANSFORMATION,SMILES_OF_CONTEXT
Program help (use the command: indexing.py -h)
Usage: indexing.py [options]

Program to generate MMPs

Options:
  -h, --help            show this help message and exit
  -s, --symmetric       Output symmetrically equivalent MMPs, i.e output both
                        cmpd1,cmpd2, SMIRKS:A>>B and cmpd2,cmpd1, SMIRKS:B>>A
  -m MAXSIZE, --maxsize=MAXSIZE
                        Maximum size of change (in heavy atoms) allowed in
                        matched molecular pairs identified. DEFAULT=10.
                        Note: This option overrides the ratio option if both
                        are specified.
  -r RATIO, --ratio=RATIO
                        Maximum ratio of change allowed in matched molecular
                        pairs identified. The ratio is: size of change /
                        size of cmpd (in terms of heavy atoms). DEFAULT=0.3.
                        Note: If this option is used with the maxsize option,
                        the maxsize option will be used.
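The interaction between the maxsize and ratio options can be easier to follow as code. The sketch below restates the rules from the help text; it is not the code used by indexing.py. The function name `change_allowed` is made up, the limits are assumed to be inclusive, and it is assumed that the maxsize default of 10 applies when neither option is given.

```python
def change_allowed(change_atoms, cmpd_atoms, maxsize=None, ratio=None):
    """Decide whether a change of `change_atoms` heavy atoms is kept.

    change_atoms: heavy atoms in the part of the molecule that changes
    cmpd_atoms:   heavy atoms in the whole compound
    """
    if maxsize is not None:
        # maxsize overrides ratio when both are specified.
        return change_atoms <= maxsize
    if ratio is not None:
        # ratio = size of change / size of compound, in heavy atoms.
        return change_atoms / cmpd_atoms <= ratio
    # Assumption: with no options, the maxsize default of 10 is used.
    return change_atoms <= 10


print(change_allowed(3, 20, maxsize=3))              # within the -m 3 limit
print(change_allowed(3, 20, ratio=0.1))              # 3/20 exceeds -r 0.1
print(change_allowed(12, 20, maxsize=15, ratio=0.1)) # maxsize takes precedence
```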
Let's run some examples:
Default settings:
!python $RDBASE/Contrib/mmpa/indexing.py <out_fragmented.txt >out_mmps_default.csv
!head out_mmps_default.csv
showLine("Cc1cccn2cc(-c3cccc(S(=O)(=O)N4CCCCC4)c3)nc12,Cc1cccn2cc(-c3ccc(S(=O)(=O)N4CCCCC4)cc3)nc12,2963575,1156028,[*:1]c1cccc([*:2])c1>>[*:1]c1ccc([*:2])cc1,[*:1]c1cn2cccc(C)c2n1.[*:2]S(=O)(=O)N1CCCCC1")
showMMPs("out_mmps_default.csv")
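For analysis beyond drawing the pairs, the CSV output can be loaded with the standard library. The sketch below follows the six-column format given above; the `MMP` tuple and `read_mmps` are names invented for this example, and the single input line is the pair already displayed with showLine.

```python
import csv
import io
from collections import Counter, namedtuple

# Column order follows the "Format of output" line for indexing.py.
MMP = namedtuple(
    "MMP",
    ["left_smiles", "right_smiles", "left_id", "right_id", "smirks", "context"],
)


def read_mmps(handle):
    """Read MMP records from an open CSV file handle, skipping blank lines."""
    return [MMP(*row) for row in csv.reader(handle) if row]


line = (
    "Cc1cccn2cc(-c3cccc(S(=O)(=O)N4CCCCC4)c3)nc12,"
    "Cc1cccn2cc(-c3ccc(S(=O)(=O)N4CCCCC4)cc3)nc12,"
    "2963575,1156028,"
    "[*:1]c1cccc([*:2])c1>>[*:1]c1ccc([*:2])cc1,"
    "[*:1]c1cn2cccc(C)c2n1.[*:2]S(=O)(=O)N1CCCCC1\n"
)
pairs = read_mmps(io.StringIO(line))

# Count how often each transformation occurs in the loaded records.
transform_counts = Counter(pair.smirks for pair in pairs)
print(pairs[0].left_id, pairs[0].right_id, transform_counts.most_common(1))
```

With a real file, replace the `io.StringIO` object with `open("out_mmps_default.csv", newline="")`.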
Output MMPs where maximum size of change is 3 heavy atoms:
!python $RDBASE/Contrib/mmpa/indexing.py -m 3 <out_fragmented.txt
showLine("Nc1ccc(-c2cc3ccccc3oc2=O)cc1,O=c1oc2ccccc2cc1-c1ccc(O)cc1,310860,4055843,[*:1]N>>[*:1]O,[*:1]c1ccc(-c2cc3ccccc3oc2=O)cc1")
Output MMPs where no more than 10% of the compound has changed (and using the symmetric option):
!python $RDBASE/Contrib/mmpa/indexing.py -r 0.1 <out_fragmented.txt
showLine("Nc1ccc(-c2cc3ccccc3oc2=O)cc1,O=c1oc2ccccc2cc1-c1ccc(O)cc1,310860,4055843,[*:1]N>>[*:1]O,[*:1]c1ccc(-c2cc3ccccc3oc2=O)cc1")
!python $RDBASE/Contrib/mmpa/indexing.py -r 0.1 -s <out_fragmented.txt >out_mmps_sym.csv
!head out_mmps_sym.csv
showMMPs("out_mmps_sym.csv")
Take a look at the following SMIRKS:
depict("[*:2]c1ccc([*:1])o1>>[*:1]c1ccc([*:2])cc1")
depict("[*:1]c1ccc([*:2])o1>>[*:2]c1ccc([*:1])cc1")
Notice that the SMIRKS strings are different but the change they describe is the same (as the positions on the furan and benzene rings are symmetrically equivalent).
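A plain string comparison therefore cannot tell whether two SMIRKS describe the same change. For this particular pair, one way to see the equivalence is that swapping the attachment-point labels 1 and 2 consistently on both sides turns the first string into the second. The sketch below demonstrates only that relabelling; `swap_labels` is a made-up helper and is not the canonicalisation routine used by the MMP scripts.

```python
import re


def swap_labels(smirks):
    """Swap attachment-point labels 1 and 2 everywhere in a SMIRKS string."""
    swap = {"1": "2", "2": "1"}
    return re.sub(
        r"\[\*:([12])\]",
        lambda match: "[*:%s]" % swap[match.group(1)],
        smirks,
    )


first = "[*:2]c1ccc([*:1])o1>>[*:1]c1ccc([*:2])cc1"
second = "[*:1]c1ccc([*:2])o1>>[*:2]c1ccc([*:1])cc1"

print(first == second)               # the raw strings differ
print(swap_labels(first) == second)  # identical after relabelling
```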
The algorithm used to canonicalise SMIRKS is as follows:
The MMP identification script uses a SMIRKS canonicalisation routine so that the same change always has the same output SMIRKS. To canonicalise a SMIRKS generated elsewhere so that it is in the same format as the one the MMP identification scripts produce, use the command:
cansmirk.py <SMIRKS_FILE >SMIRKS_OUTPUT_FILE
Format of SMIRKS_FILE (space or comma separated): SMIRKS ID
Format of output: CANONICALISED_SMIRKS ID
Note: The script will NOT deal with SMARTS characters, so the SMIRKS must contain valid SMILES for left and right hand sides.
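Before passing a file to cansmirk.py it can be worth checking that each line really has the "SMIRKS ID" layout with a non-empty left and right side. The sketch below does that with the standard library only; `parse_smirks_line` and the IDs `t1` and `t2` are invented for illustration, and the check does not verify that either side is valid SMILES.

```python
import re


def parse_smirks_line(line):
    """Split a 'SMIRKS ID' line (space or comma separated) into its parts."""
    fields = re.split(r"[ ,]+", line.strip())
    if len(fields) != 2:
        raise ValueError("expected 'SMIRKS ID', got: %r" % line)
    smirks, smirks_id = fields
    lhs, separator, rhs = smirks.partition(">>")
    if not separator or not lhs or not rhs:
        raise ValueError("SMIRKS needs both sides around '>>': %r" % smirks)
    return lhs, rhs, smirks_id


# Hypothetical lines; both separators described above are accepted.
print(parse_smirks_line("[*:1]N>>[*:1]O t1"))
print(parse_smirks_line("[*:1]C(=O)O>>[*:1]C(N)=O,t2"))
```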
Let's try an example:
!head sample_smirks.txt
!python $RDBASE/Contrib/mmpa/cansmirk.py <sample_smirks.txt
If you want to apply a SMIRKS/transform generated by the programs above to a compound, use the mol_transform.py program with the following command:
python $RDBASE/Contrib/mmpa/mol_transform.py -f TRANSFORM_FILE <SMILES_FILE >OUTPUT_FILE
If you want to use a set of SMIRKS generated elsewhere, please make sure they have been canonicalised using the cansmirk.py command.
Program help:

Usage: mol_transform.py [options]

Program to apply transformations to a set of input molecules

Options:
  -h, --help            show this help message and exit
  -f TRANSFORM_FILE, --file=TRANSFORM_FILE
                        The file containing the transforms to apply to your
                        input SMILES

Example command: mol_transform.py -f TRANSFORM_FILE <SMILES_FILE

Format of smiles file: SMILES ID <space or comma separated>
Format of transform file: transform <one per line>
Output: SMILES,ID,Transfrom,Modified_SMILES
Let's go through an example:
!head sample_smirks_mol_trans.txt
depict("[*:1]C(=O)O>>[*:1]C(N)=O")
!head sample_smiles_mol_trans.smi
depict("O=C(O)c1ccc(NC(=O)C2COc3ccccc3O2)cc1")
Let's run the command:
!python $RDBASE/Contrib/mmpa/mol_transform.py -f sample_smirks_mol_trans.txt <sample_smiles_mol_trans.smi
depict("NC(=O)c1ccc(NC(=O)C2COc3ccccc3O2)cc1")
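When many transforms are applied to many molecules, it can be convenient to collect the modified SMILES per input compound. The sketch below groups mol_transform.py output by ID, following the four-column format from the program help. `group_products` is a made-up name, and the ID `mol1` in the example row is a placeholder, since the real ID is whatever appears in your SMILES file.

```python
import csv
import io
from collections import defaultdict


def group_products(handle):
    """Map each input ID to a list of (transform, modified SMILES) tuples."""
    products = defaultdict(list)
    for row in csv.reader(handle):
        if not row:
            continue
        smiles, mol_id, transform, modified = row
        products[mol_id].append((transform, modified))
    return dict(products)


# Row assembled from the example above; "mol1" is a placeholder ID.
output = (
    "O=C(O)c1ccc(NC(=O)C2COc3ccccc3O2)cc1,mol1,"
    "[*:1]C(=O)O>>[*:1]C(N)=O,"
    "NC(=O)c1ccc(NC(=O)C2COc3ccccc3O2)cc1\n"
)
by_id = group_products(io.StringIO(output))
print(by_id["mol1"])
```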