In our benchmarking paper, we focused on simulating one of the classic virtual screening use cases: you have a set of diverse actives (e.g. from an HTS experiment) and want to identify the next compounds that should be tested. When we did the model fusion paper we also wanted to simulate a second use case: you have a small set of similar actives (e.g. from a paper or patent) and want to identify the next compounds that should be tested. To accomplish this we added a second type of dataset to the benchmarking platform. For want of a better term we called these "Data sets II" (I was advocating for "leave one paper out"; it's probably good that Nikolas and Sereina prevailed).
In the paper itself we didn't have the space to do a great job of describing these datasets and why they are interesting. I'm going to try to at least partially remedy that here. It also gives me a chance to play around with an idea for pulling the scaffold out of a set of compounds.
To build datasets II, Sereina started with the 50 ChEMBL targets we had identified as being difficult enough to be interesting for the machine-learning exercise. For each of those targets, here are the steps:
This process left us with 37 targets. Each target has between 4 and 37 papers, and each paper has between 10 and 112 actives.
Given that the documents in ChEMBL are mainly from med chem papers, and knowing the content of the typical med chem paper, we asserted that each paper that makes up these datasets is likely to contain data about one or two chemical series along with a small number of reference compounds. I'll show here that, at least for a random selection of sets, this is actually true.
The approach is a simple one: pull the compounds and calculate the MCS that covers at least 80% of the compounds in the dataset. This arbitrary cutoff allows the reference compound(s) to be ignored. To allow for the small changes that are sometimes made to scaffolds, I use generic atom types for the MCS-finding and then, in a post-processing step, assign the atom types that are preserved across all compounds matching the MCS.
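The post-processing step can be illustrated without any chemistry toolkit. The sketch below is a hypothetical, toolkit-free stand-in: each match is represented as a list of atomic numbers, one per MCS atom, and the helper `consensus_atomic_numbers` (a name introduced here purely for illustration) keeps an element only where every matching compound agrees, marking the position as 0 (a dummy "any" atom) otherwise. The example data is made up.

```python
def consensus_atomic_numbers(matches):
    """Return one atomic number per MCS position, or 0 where the matches disagree.

    `matches` is a list of equal-length lists; matches[i][j] is the atomic
    number of the atom in compound i that was mapped onto MCS atom j.
    """
    if not matches:
        return []
    consensus = [-1] * len(matches[0])  # -1 means "not seen yet"
    for match in matches:
        for pos, anum in enumerate(match):
            if consensus[pos] == -1:
                consensus[pos] = anum
            elif consensus[pos] != anum:
                consensus[pos] = 0  # conflict: fall back to a generic atom
    return consensus


# Hypothetical data: three compounds matched onto a four-atom MCS.
example_matches = [
    [6, 6, 7, 8],   # C C N O
    [6, 6, 7, 16],  # C C N S
    [6, 7, 7, 8],   # C N N O
]
print(consensus_atomic_numbers(example_matches))  # [6, 0, 7, 0]
```

Only the first and third positions keep a specific element; the other two stay generic because at least one compound differs there.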
The datasets are on GitHub: https://github.com/rdkit/benchmarking_platform
import numpy # used by MCS_Report below
import pandas as pd
from rdkit.Chem.Draw import IPythonConsole
from rdkit.Chem import Draw
Draw.DrawingOptions.elemDict[0]=(0.,0.,0.) # draw dummy atoms in black
from rdkit.Chem import PandasTools
from rdkit.Chem import AllChem as Chem
from rdkit.Chem import DataStructs
from rdkit.Chem import MCS
import cPickle
from rdkit import rdBase
print(rdBase.rdkitVersion)
import time
print time.asctime()
2014.03.1pre
Tue Feb 4 17:54:32 2014
import glob
pkls = glob.glob('/home/glandrum/Code/benchmarking_platform/compounds/ChEMBL_II/*.pkl')
print len(pkls)
37
Get target information from our local ChEMBL copy:
%load_ext sql
%config SqlMagic.feedback = False
import os
tgts={}
for pkl in pkls:
    fn = os.path.split(pkl)[-1]
    basen = os.path.splitext(fn)[0]
    tgtn=int(basen.split('_')[-1])
    data = %sql postgresql://localhost/chembl_16 \
        select tid,pref_name,organism from target_dictionary where tid=:tgtn ;
    tgts[data[0][0]]=(pkl,data[0][1],data[0][2])
for idx in sorted(tgts.keys()):
    pkl,tgt,species = tgts[idx]
    print idx,tgt,species
15 Carbonic anhydrase II Homo sapiens
25 Glucocorticoid receptor Homo sapiens
43 Beta-2 adrenergic receptor Homo sapiens
51 Serotonin 1a (5-HT1a) receptor Homo sapiens
61 Muscarinic acetylcholine receptor M1 Homo sapiens
65 Cytochrome P450 19A1 Homo sapiens
72 Dopamine D2 receptor Homo sapiens
87 Cannabinoid CB1 receptor Homo sapiens
90 Dopamine D4 receptor Homo sapiens
93 Acetylcholinesterase Homo sapiens
100 Norepinephrine transporter Homo sapiens
107 Serotonin 2a (5-HT2a) receptor Homo sapiens
108 Serotonin 2c (5-HT2c) receptor Homo sapiens
114 Adenosine A1 receptor Homo sapiens
121 Serotonin transporter Homo sapiens
126 Cyclooxygenase-2 Homo sapiens
130 Dopamine D3 receptor Homo sapiens
165 HERG Homo sapiens
259 Cannabinoid CB2 receptor Homo sapiens
10188 MAP kinase p38 alpha Homo sapiens
10193 Carbonic anhydrase I Homo sapiens
10260 Vanilloid receptor Homo sapiens
10280 Histamine H3 receptor Homo sapiens
10434 Tyrosine-protein kinase SRC Homo sapiens
10980 Vascular endothelial growth factor receptor 2 Homo sapiens
11140 Dipeptidyl peptidase IV Homo sapiens
11365 Cytochrome P450 2D6 Homo sapiens
11489 11-beta-hydroxysteroid dehydrogenase 1 Homo sapiens
11534 Cathepsin S Homo sapiens
11575 C-C chemokine receptor type 2 Homo sapiens
11631 Sphingosine 1-phosphate receptor Edg-1 Homo sapiens
12209 Carbonic anhydrase XII Homo sapiens
12252 Beta-secretase 1 Homo sapiens
12952 Carbonic anhydrase IX Homo sapiens
13001 Matrix metalloproteinase-2 Homo sapiens
17045 Cytochrome P450 3A4 Homo sapiens
19905 Melanin-concentrating hormone receptor 1 Homo sapiens
sets = cPickle.load(file(tgts[11631][0]))
The data is organized in a dictionary with one entry per paper.
docid,cmpds = sets.iteritems().next()
data = %sql postgresql://localhost/chembl_16 \
select journal,volume,first_page,year from docs where doc_id=:docid ;
print ', '.join(str(x) for x in data[0])
Bioorg. Med. Chem. Lett., 22, 144, 2012
ids,smis= zip(*cmpds)
mols = [Chem.MolFromSmiles(x) for x in smis]
Draw.MolsToGridImage(mols,molsPerRow=5,legends=ids)
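For readers without the pickle files at hand, here is a small sketch of the layout the code above relies on: a dictionary keyed by document id whose values are lists of (compound id, SMILES) pairs. The ids and SMILES below are made up for illustration and are not taken from the datasets, and `summarize_sets` is a hypothetical helper that is not part of the benchmarking platform.

```python
# Hypothetical stand-in for one unpickled target file:
# {doc_id: [(compound_id, smiles), ...], ...}
toy_sets = {
    1001: [("cmpd-1", "CCO"), ("cmpd-2", "CCN"), ("cmpd-3", "CCC")],
    1002: [("cmpd-4", "c1ccccc1"), ("cmpd-5", "c1ccncc1")],
}


def summarize_sets(sets):
    """Return (number of papers, smallest paper size, largest paper size)."""
    sizes = [len(cmpds) for cmpds in sets.values()]
    return len(sets), min(sizes), max(sizes)


for docid, cmpds in sorted(toy_sets.items()):
    ids, smis = zip(*cmpds)  # split the pairs into two parallel tuples
    print(docid, ids, smis)

print(summarize_sets(toy_sets))  # (2, 2, 3)
```

The `zip(*cmpds)` idiom is the same one used above to separate the compound ids (used as legends) from the SMILES (used to build molecules).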
Define an MCS-based approach to find the scaffold from the paper:
def MCS_Report(ms,printSmarts=True,atomCompare='any',**kwargs):
    """
    the "convert to specific" algorithm used isn't perfect because it doesn't deal correctly
    with possible symmetries in the MCS-molecule match, but it's at least a start.
    """
    mcs = MCS.FindMCS(ms,atomCompare=atomCompare,timeout=60,**kwargs)
    nAts = numpy.array([x.GetNumAtoms() for x in ms])
    print 'Mean nAts %.1f, mcs nAts: %d'%(nAts.mean(),mcs.numAtoms)
    if printSmarts:
        print mcs.smarts
    mcsM = Chem.MolFromSmarts(mcs.smarts)
    # find the common atoms
    if atomCompare == 'any':
        mcsM2 = Chem.MolFromSmiles(mcs.smarts,sanitize=False)
        atNums=[-1]*mcsM.GetNumAtoms()
        for m in ms:
            match = m.GetSubstructMatch(mcsM)
            if not match:
                continue
            for qidx,midx in enumerate(match):
                anum = m.GetAtomWithIdx(midx).GetAtomicNum()
                if atNums[qidx]==-1:
                    atNums[qidx]=anum
                elif anum!=atNums[qidx]:
                    atNums[qidx]=0
        for idx,atnum in enumerate(atNums):
            if atnum>0:
                mcsM.GetAtomWithIdx(idx).SetAtomicNum(atnum)
    mcsM.UpdatePropertyCache(False)
    Chem.SetHybridization(mcsM)
    img=Draw.MolToImage(mcsM,kekulize=False)
    return mcsM,img
The function prints out the mean number of atoms per molecule in the input along with the size of the MCS. This is intended to help assess whether or not the MCS is actually a scaffold.
mcs,img = MCS_Report(mols,threshold=0.8)
img
Mean nAts 30.6, mcs nAts: 29
[*]-[*]-1=[*](-[*]-[*]-[*]:2:[*]:[*](:[*]:[*]:[*]-1:2)-[*]-[*]-[*]-[*]-[*]:1:[*]:[*]:[*]:[*]:[*]:1)-[*]-[*]-1-[*]-[*](-[*]-1)-[*](-[*])=[*]
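One rough way to use those two numbers is to look at their ratio. The sketch below is only an illustration: `mcs_coverage`, `looks_like_scaffold`, and the 0.5 cutoff are assumptions introduced here, not part of the original analysis. The first pair of numbers comes from the output above; the second pair is made up for contrast.

```python
def mcs_coverage(mean_n_atoms, mcs_n_atoms):
    """Fraction of the average molecule that is covered by the MCS."""
    return mcs_n_atoms / float(mean_n_atoms)


def looks_like_scaffold(mean_n_atoms, mcs_n_atoms, cutoff=0.5):
    """True if the MCS covers at least `cutoff` of the average molecule.

    The default cutoff is an arbitrary choice for this sketch.
    """
    return mcs_coverage(mean_n_atoms, mcs_n_atoms) >= cutoff


print(round(mcs_coverage(30.6, 29), 2))  # 0.95
print(looks_like_scaffold(30.6, 29))     # True
print(looks_like_scaffold(30.0, 12))     # False (made-up numbers)
```

A ratio close to 1 suggests the paper's compounds share most of their structure; a low ratio is a hint that the MCS is a small fragment and the set deserves a closer look.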
Now let's look at all the papers for a target:
def processTarget(tid):
    pkl,target,species=tgts[tid]
    print '-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*'
    print '\t',target,species
    sets = cPickle.load(file(pkl))
    print '\t\t %d papers'%(len(sets))
    alldata=[]
    for docid,cmpds in sets.iteritems():
        data = %sql postgresql://localhost/chembl_16 \
            select journal,volume,first_page,year from docs where doc_id=:docid ;
        ref=', '.join(str(x) for x in data[0])
        print '----------------'
        print ref
        ids,smis= zip(*cmpds)
        mols = [Chem.MolFromSmiles(x) for x in smis]
        mcs,img = MCS_Report(mols,printSmarts=False,threshold=0.8,atomCompare='any')
        alldata.append([docid,ref,mcs,img,mols])
    return alldata
alldata=processTarget(11631)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Sphingosine 1-phosphate receptor Edg-1 Homo sapiens
         7 papers
----------------
Bioorg. Med. Chem. Lett., 22, 144, 2012
Mean nAts 30.6, mcs nAts: 29
----------------
Bioorg. Med. Chem. Lett., 22, 1779, 2012
Mean nAts 31.2, mcs nAts: 29
----------------
Bioorg. Med. Chem. Lett., 14, 3351, 2004
Mean nAts 24.6, mcs nAts: 23
----------------
J. Med. Chem., 48, 6169, 2005
Mean nAts 29.4, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 21, 1390, 2011
Mean nAts 26.5, mcs nAts: 12
----------------
Bioorg. Med. Chem., 15, 663, 2007
Mean nAts 24.9, mcs nAts: 16
----------------
Bioorg. Med. Chem. Lett., 16, 3564, 2006
Mean nAts 31.4, mcs nAts: 15
alldata=processTarget(13001)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Matrix metalloproteinase-2 Homo sapiens
         10 papers
----------------
Bioorg. Med. Chem. Lett., 21, 2820, 2011
Mean nAts 37.8, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 19, 3445, 2009
Mean nAts 36.1, mcs nAts: 33
----------------
Bioorg. Med. Chem. Lett., 11, 1009, 2001
Mean nAts 27.8, mcs nAts: 23
----------------
Bioorg. Med. Chem., 15, 6170, 2007
Mean nAts 30.8, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 15, 4961, 2005
Mean nAts 38.7, mcs nAts: 35
----------------
Bioorg. Med. Chem. Lett., 16, 3096, 2006
Mean nAts 34.1, mcs nAts: 25
----------------
J. Med. Chem., 45, 4954, 2002
Mean nAts 26.4, mcs nAts: 22
----------------
Bioorg. Med. Chem. Lett., 8, 2087, 1998
Mean nAts 33.5, mcs nAts: 29
----------------
Bioorg. Med. Chem. Lett., 21, 1376, 2011
Mean nAts 37.3, mcs nAts: 35
----------------
Bioorg. Med. Chem., 15, 791, 2007
Mean nAts 25.6, mcs nAts: 23
alldata=processTarget(87)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Cannabinoid CB1 receptor Homo sapiens
         32 papers
----------------
Bioorg. Med. Chem., 16, 6489, 2008
Mean nAts 28.0, mcs nAts: 27
----------------
J. Med. Chem., 52, 4496, 2009
Mean nAts 31.2, mcs nAts: 27
----------------
J. Med. Chem., 51, 6970, 2008
Mean nAts 33.8, mcs nAts: 32
----------------
Bioorg. Med. Chem. Lett., 15, 783, 2005
Mean nAts 38.5, mcs nAts: 34
----------------
Bioorg. Med. Chem. Lett., 22, 547, 2012
Mean nAts 32.2, mcs nAts: 24
----------------
Bioorg. Med. Chem., 16, 7510, 2008
Mean nAts 29.9, mcs nAts: 24
----------------
J. Med. Chem., 48, 7343, 2005
Mean nAts 30.5, mcs nAts: 18
----------------
Bioorg. Med. Chem. Lett., 17, 2706, 2007
Mean nAts 29.7, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 17, 3652, 2007
Mean nAts 24.7, mcs nAts: 22
----------------
J. Med. Chem., 48, 7486, 2005
Mean nAts 26.7, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 20, 1448, 2010
Mean nAts 31.9, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 19, 5195, 2009
Mean nAts 34.2, mcs nAts: 31
----------------
Bioorg. Med. Chem., 17, 5549, 2009
Mean nAts 31.5, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 20, 26, 2010
Mean nAts 34.1, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 19, 309, 2009
Mean nAts 26.8, mcs nAts: 17
----------------
Bioorg. Med. Chem. Lett., 17, 3925, 2007
Mean nAts 25.1, mcs nAts: 22
----------------
J. Med. Chem., 51, 1904, 2008
Mean nAts 28.4, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 16, 731, 2006
Mean nAts 29.1, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 17, 673, 2007
Mean nAts 32.0, mcs nAts: 30
----------------
Bioorg. Med. Chem. Lett., 19, 2591, 2009
Mean nAts 31.9, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 17, 6299, 2007
Mean nAts 27.4, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 15, 645, 2005
Mean nAts 32.1, mcs nAts: 23
----------------
J. Med. Chem., 52, 3001, 2009
Mean nAts 24.0, mcs nAts: 11
----------------
Bioorg. Med. Chem. Lett., 20, 1278, 2010
Mean nAts 30.6, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 15, 4794, 2005
Mean nAts 30.8, mcs nAts: 23
----------------
J. Med. Chem., 51, 5397, 2008
Mean nAts 34.0, mcs nAts: 29
----------------
Bioorg. Med. Chem. Lett., 20, 608, 2010
Mean nAts 34.9, mcs nAts: 33
----------------
Bioorg. Med. Chem. Lett., 21, 182, 2011
Mean nAts 27.3, mcs nAts: 25
----------------
J. Med. Chem., 53, 1338, 2010
Mean nAts 37.4, mcs nAts: 15
----------------
Bioorg. Med. Chem. Lett., 18, 3695, 2008
Mean nAts 32.6, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 18, 2830, 2008
Mean nAts 27.5, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 20, 453, 2010
Mean nAts 35.5, mcs nAts: 28
alldata=processTarget(11534)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Cathepsin S Homo sapiens
         11 papers
----------------
Bioorg. Med. Chem. Lett., 16, 5107, 2006
Mean nAts 35.7, mcs nAts: 30
----------------
J. Med. Chem., 50, 591, 2007
Mean nAts 30.5, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 17, 5525, 2007
Mean nAts 43.2, mcs nAts: 40
----------------
Bioorg. Med. Chem. Lett., 20, 2370, 2010
Mean nAts 41.2, mcs nAts: 33
----------------
Bioorg. Med. Chem. Lett., 16, 1975, 2006
Mean nAts 35.3, mcs nAts: 31
----------------
Bioorg. Med. Chem. Lett., 16, 2209, 2006
Mean nAts 42.6, mcs nAts: 39
----------------
Bioorg. Med. Chem. Lett., 20, 4507, 2010
Mean nAts 24.4, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 18, 5280, 2008
Mean nAts 32.2, mcs nAts: 30
----------------
Bioorg. Med. Chem. Lett., 20, 2375, 2010
Mean nAts 43.0, mcs nAts: 40
----------------
Bioorg. Med. Chem. Lett., 20, 2379, 2010
Mean nAts 45.2, mcs nAts: 37
----------------
Bioorg. Med. Chem. Lett., 19, 6131, 2009
Mean nAts 38.4, mcs nAts: 35
alldata=processTarget(11140)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Dipeptidyl peptidase IV Homo sapiens
         23 papers
----------------
Bioorg. Med. Chem. Lett., 18, 5435, 2008
Mean nAts 32.2, mcs nAts: 26
----------------
Bioorg. Med. Chem., 19, 172, 2011
Mean nAts 31.2, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 19, 6340, 2009
Mean nAts 29.6, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 17, 49, 2007
Mean nAts 25.8, mcs nAts: 20
----------------
Bioorg. Med. Chem. Lett., 15, 4770, 2005
Mean nAts 16.3, mcs nAts: 15
----------------
Bioorg. Med. Chem. Lett., 20, 4395, 2010
Mean nAts 26.1, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 16, 6226, 2006
Mean nAts 28.3, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 20, 6273, 2010
Mean nAts 27.8, mcs nAts: 26
----------------
J. Med. Chem., 50, 6450, 2007
Mean nAts 30.5, mcs nAts: 22
----------------
Bioorg. Med. Chem. Lett., 18, 2362, 2008
Mean nAts 20.5, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 17, 5806, 2007
Mean nAts 29.8, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 18, 3158, 2008
Mean nAts 32.9, mcs nAts: 16
----------------
Bioorg. Med. Chem. Lett., 21, 1366, 2011
Mean nAts 35.0, mcs nAts: 29
----------------
Eur. J. Med. Chem., 45, 4953, 2010
Mean nAts 30.6, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 20, 1109, 2010
Mean nAts 26.2, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 17, 2005, 2007
Mean nAts 25.5, mcs nAts: 15
----------------
Bioorg. Med. Chem. Lett., 19, 1991, 2009
Mean nAts 26.7, mcs nAts: 25
----------------
Bioorg. Med. Chem., 19, 4953, 2011
Mean nAts 25.7, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 20, 7246, 2010
Mean nAts 30.4, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 14, 5151, 2004
Mean nAts 34.1, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 18, 2409, 2008
Mean nAts 27.7, mcs nAts: 20
----------------
Bioorg. Med. Chem. Lett., 15, 2253, 2005
Mean nAts 29.2, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 21, 3809, 2011
Mean nAts 29.4, mcs nAts: 23
alldata=processTarget(114)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Adenosine A1 receptor Homo sapiens
         33 papers
----------------
J. Med. Chem., 54, 5205, 2011
Mean nAts 28.2, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 19, 1399, 2009
Mean nAts 32.2, mcs nAts: 28
----------------
Bioorg. Med. Chem., 17, 5259, 2009
Mean nAts 34.4, mcs nAts: 28
----------------
J. Med. Chem., 49, 2861, 2006
Mean nAts 25.2, mcs nAts: 21
----------------
J. Med. Chem., 50, 4061, 2007
Mean nAts 24.3, mcs nAts: 19
----------------
J. Med. Chem., 50, 5676, 2007
Mean nAts 26.0, mcs nAts: 20
----------------
J. Med. Chem., 51, 4449, 2008
Mean nAts 21.7, mcs nAts: 19
----------------
Bioorg. Med. Chem., 16, 8546, 2008
Mean nAts 30.3, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 15, 3081, 2005
Mean nAts 25.1, mcs nAts: 20
----------------
J. Med. Chem., 51, 1719, 2008
Mean nAts 28.3, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 20, 5241, 2010
Mean nAts 25.0, mcs nAts: 20
----------------
J. Med. Chem., 51, 7099, 2008
Mean nAts 27.1, mcs nAts: 24
----------------
J. Med. Chem., 48, 6887, 2005
Mean nAts 19.6, mcs nAts: 16
----------------
Bioorg. Med. Chem. Lett., 20, 1697, 2010
Mean nAts 23.1, mcs nAts: 21
----------------
Bioorg. Med. Chem., 16, 2741, 2008
Mean nAts 26.3, mcs nAts: 19
----------------
J. Med. Chem., 49, 3682, 2006
Mean nAts 29.8, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 19, 2664, 2009
Mean nAts 23.7, mcs nAts: 15
----------------
J. Med. Chem., 49, 7373, 2006
Mean nAts 30.8, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 18, 1397, 2008
Mean nAts 29.6, mcs nAts: 25
----------------
Bioorg. Med. Chem., 18, 2491, 2010
Mean nAts 23.7, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 16, 3642, 2006
Mean nAts 35.0, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 20, 4140, 2010
Mean nAts 29.8, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 17, 6779, 2007
Mean nAts 30.9, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 21, 1933, 2011
Mean nAts 35.9, mcs nAts: 27
----------------
J. Med. Chem., 50, 828, 2007
Mean nAts 23.2, mcs nAts: 22
----------------
J. Med. Chem., 45, 770, 2002
Mean nAts 31.1, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 16, 5993, 2006
Mean nAts 22.2, mcs nAts: 18
----------------
J. Med. Chem., 49, 273, 2006
Mean nAts 28.9, mcs nAts: 23
----------------
J. Med. Chem., 52, 3994, 2009
Mean nAts 35.6, mcs nAts: 23
----------------
J. Med. Chem., 48, 7932, 2005
Mean nAts 20.6, mcs nAts: 15
----------------
Bioorg. Med. Chem. Lett., 15, 609, 2005
Mean nAts 36.0, mcs nAts: 36
----------------
Bioorg. Med. Chem. Lett., 18, 2813, 2008
Mean nAts 33.7, mcs nAts: 29
----------------
J. Med. Chem., 51, 5875, 2008
Mean nAts 27.8, mcs nAts: 23
alldata=processTarget(10980)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Vascular endothelial growth factor receptor 2 Homo sapiens
         24 papers
----------------
Bioorg. Med. Chem. Lett., 14, 351, 2004
Mean nAts 27.6, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 16, 1440, 2006
Mean nAts 29.2, mcs nAts: 16
----------------
J. Med. Chem., 51, 3814, 2008
Mean nAts 31.8, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 16, 1913, 2006
Mean nAts 30.0, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 16, 4371, 2006
Mean nAts 24.5, mcs nAts: 17
----------------
J. Med. Chem., 50, 4453, 2007
Mean nAts 36.2, mcs nAts: 31
----------------
J. Med. Chem., 52, 278, 2009
Mean nAts 28.3, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 16, 2158, 2006
Mean nAts 22.9, mcs nAts: 18
----------------
Bioorg. Med. Chem., 15, 3635, 2007
Mean nAts 35.3, mcs nAts: 17
----------------
Bioorg. Med. Chem. Lett., 16, 4266, 2006
Mean nAts 38.0, mcs nAts: 28
----------------
J. Med. Chem., 51, 1231, 2008
Mean nAts 29.0, mcs nAts: 26
----------------
J. Med. Chem., 50, 611, 2007
Mean nAts 37.8, mcs nAts: 28
----------------
J. Med. Chem., 50, 627, 2007
Mean nAts 34.4, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 15, 1429, 2005
Mean nAts 27.9, mcs nAts: 22
----------------
J. Med. Chem., 49, 2143, 2006
Mean nAts 27.8, mcs nAts: 23
----------------
Bioorg. Med. Chem., 18, 7150, 2010
Mean nAts 34.3, mcs nAts: 31
----------------
J. Med. Chem., 47, 6363, 2004
Mean nAts 24.8, mcs nAts: 20
----------------
Bioorg. Med. Chem. Lett., 20, 3356, 2010
Mean nAts 34.5, mcs nAts: 33
----------------
Bioorg. Med. Chem. Lett., 13, 2973, 2003
Mean nAts 27.1, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 21, 2106, 2011
Mean nAts 29.6, mcs nAts: 22
----------------
Bioorg. Med. Chem. Lett., 17, 1246, 2007
Mean nAts 28.5, mcs nAts: 26
----------------
J. Med. Chem., 43, 2310, 2000
Mean nAts 26.8, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 16, 1726, 2006
Mean nAts 28.2, mcs nAts: 25
----------------
Bioorg. Med. Chem. Lett., 12, 3537, 2002
Mean nAts 28.5, mcs nAts: 22
alldata=processTarget(12252)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Beta-secretase 1 Homo sapiens
         6 papers
----------------
Bioorg. Med. Chem. Lett., 16, 3635, 2006
Mean nAts 44.2, mcs nAts: 37
----------------
Bioorg. Med. Chem. Lett., 20, 603, 2010
Mean nAts 41.9, mcs nAts: 38
----------------
Bioorg. Med. Chem., 18, 630, 2010
Mean nAts 27.7, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 18, 414, 2008
Mean nAts 41.7, mcs nAts: 38
----------------
J. Med. Chem., 51, 3313, 2008
Mean nAts 41.6, mcs nAts: 31
----------------
Bioorg. Med. Chem. Lett., 20, 1885, 2010
Mean nAts 38.2, mcs nAts: 25
alldata=processTarget(19905)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Melanin-concentrating hormone receptor 1 Homo sapiens
         33 papers
----------------
Bioorg. Med. Chem., 15, 2092, 2007
Mean nAts 31.8, mcs nAts: 17
----------------
Bioorg. Med. Chem. Lett., 16, 5270, 2006
Mean nAts 32.3, mcs nAts: 29
----------------
J. Med. Chem., 53, 5576, 2010
Mean nAts 40.4, mcs nAts: 31
----------------
Bioorg. Med. Chem. Lett., 16, 4262, 2006
Mean nAts 38.6, mcs nAts: 34
----------------
Bioorg. Med. Chem. Lett., 19, 5339, 2009
Mean nAts 34.3, mcs nAts: 31
----------------
Bioorg. Med. Chem. Lett., 17, 832, 2007
Mean nAts 35.7, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 16, 5066, 2006
Mean nAts 36.7, mcs nAts: 29
----------------
Bioorg. Med. Chem., 18, 7365, 2010
Mean nAts 40.1, mcs nAts: 35
----------------
Bioorg. Med. Chem. Lett., 16, 4237, 2006
Mean nAts 33.2, mcs nAts: 26
----------------
Bioorg. Med. Chem., 15, 3896, 2007
Mean nAts 40.1, mcs nAts: 35
----------------
Bioorg. Med. Chem. Lett., 16, 5445, 2006
Mean nAts 28.9, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 16, 5207, 2006
Mean nAts 34.7, mcs nAts: 18
----------------
Bioorg. Med. Chem., 19, 883, 2011
Mean nAts 30.4, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 15, 4174, 2005
Mean nAts 32.2, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 16, 4450, 2006
Mean nAts 40.5, mcs nAts: 32
----------------
Bioorg. Med. Chem. Lett., 19, 5186, 2009
Mean nAts 30.9, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 15, 5293, 2005
Mean nAts 29.9, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 19, 4589, 2009
Mean nAts 30.7, mcs nAts: 27
----------------
J. Med. Chem., 49, 2294, 2006
Mean nAts 36.9, mcs nAts: 31
----------------
Bioorg. Med. Chem. Lett., 15, 3439, 2005
Mean nAts 32.7, mcs nAts: 30
----------------
J. Med. Chem., 49, 6569, 2006
Mean nAts 31.5, mcs nAts: 29
----------------
Bioorg. Med. Chem. Lett., 19, 4274, 2009
Mean nAts 36.4, mcs nAts: 29
----------------
Bioorg. Med. Chem. Lett., 14, 5075, 2004
Mean nAts 33.8, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 16, 3674, 2006
Mean nAts 39.6, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 16, 3668, 2006
Mean nAts 38.7, mcs nAts: 32
----------------
Bioorg. Med. Chem., 19, 5539, 2011
Mean nAts 31.8, mcs nAts: 29
----------------
J. Med. Chem., 49, 7095, 2006
Mean nAts 35.2, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 22, 427, 2012
Mean nAts 37.7, mcs nAts: 36
----------------
Bioorg. Med. Chem. Lett., 17, 2365, 2007
Mean nAts 37.6, mcs nAts: 31
----------------
Bioorg. Med. Chem. Lett., 16, 1070, 2006
Mean nAts 33.9, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 15, 3701, 2005
Mean nAts 30.2, mcs nAts: 18
----------------
Bioorg. Med. Chem. Lett., 15, 3696, 2005
Mean nAts 35.2, mcs nAts: 30
----------------
Bioorg. Med. Chem., 19, 6261, 2011
Mean nAts 31.8, mcs nAts: 28
alldata=processTarget(90)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Dopamine D4 receptor Homo sapiens
         4 papers
----------------
Bioorg. Med. Chem. Lett., 14, 4847, 2004
Mean nAts 27.3, mcs nAts: 21
----------------
J. Med. Chem., 43, 4563, 2000
Mean nAts 25.0, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 9, 585, 1999
Mean nAts 23.4, mcs nAts: 16
----------------
Bioorg. Med. Chem. Lett., 15, 5253, 2005
Mean nAts 28.4, mcs nAts: 21
alldata=processTarget(93)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Acetylcholinesterase Homo sapiens
         8 papers
----------------
J. Med. Chem., 51, 3588, 2008
Mean nAts 32.9, mcs nAts: 22
----------------
J. Med. Chem., 49, 7540, 2006
Mean nAts 36.8, mcs nAts: 23
----------------
J. Med. Chem., 52, 2724, 2009
Mean nAts 28.4, mcs nAts: 27
----------------
J. Med. Chem., 51, 7308, 2008
Mean nAts 42.1, mcs nAts: 18
----------------
Bioorg. Med. Chem., 15, 575, 2007
Mean nAts 32.4, mcs nAts: 22
----------------
J. Med. Chem., 37, 1996, 1994
Mean nAts 23.5, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 21, 2505, 2011
Mean nAts 27.4, mcs nAts: 8
----------------
Bioorg. Med. Chem. Lett., 5, 2077, 1995
Mean nAts 31.1, mcs nAts: 28
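The second-to-last one (Bioorg. Med. Chem. Lett., 21, 2505, 2011) didn't work very well: the MCS has only 8 atoms for compounds that average 27.4 atoms. Let's look at the compounds: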
pkl,target,species=tgts[93]
sets = cPickle.load(file(pkl))
l = list(sets.iteritems())
docid,cmpds = l[-2]
ids,smis= zip(*cmpds)
mols = [Chem.MolFromSmiles(x) for x in smis]
Draw.MolsToGridImage(mols,molsPerRow=5)
Looks like a pathology in the MCS-finding algorithm.
alldata=processTarget(10280)
Draw.MolsToGridImage([x[2] for x in alldata],molsPerRow=5,kekulize=False)
-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*-*
    Histamine H3 receptor Homo sapiens
         23 papers
----------------
Bioorg. Med. Chem. Lett., 22, 186, 2012
Mean nAts 25.6, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 20, 3295, 2010
Mean nAts 27.3, mcs nAts: 23
----------------
J. Med. Chem., 48, 6482, 2005
Mean nAts 26.8, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 20, 6246, 2010
Mean nAts 22.2, mcs nAts: 11
----------------
Bioorg. Med. Chem. Lett., 17, 2566, 2007
Mean nAts 27.8, mcs nAts: 25
----------------
J. Med. Chem., 51, 5423, 2008
Mean nAts 31.8, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 20, 6226, 2010
Mean nAts 27.1, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 18, 5101, 2008
Mean nAts 26.8, mcs nAts: 21
----------------
Bioorg. Med. Chem. Lett., 17, 3130, 2007
Mean nAts 30.2, mcs nAts: 20
----------------
Bioorg. Med. Chem. Lett., 21, 6126, 2011
Mean nAts 28.8, mcs nAts: 23
----------------
Bioorg. Med. Chem., 18, 5441, 2010
Mean nAts 20.1, mcs nAts: 18
----------------
Bioorg. Med. Chem. Lett., 19, 4232, 2009
Mean nAts 29.2, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 20, 2359, 2010
Mean nAts 29.7, mcs nAts: 26
----------------
Bioorg. Med. Chem. Lett., 18, 5032, 2008
Mean nAts 32.7, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 21, 5384, 2011
Mean nAts 27.7, mcs nAts: 24
----------------
Bioorg. Med. Chem., 17, 3037, 2009
Mean nAts 22.9, mcs nAts: 19
----------------
Bioorg. Med. Chem. Lett., 20, 4210, 2010
Mean nAts 26.3, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 20, 5713, 2010
Mean nAts 27.5, mcs nAts: 24
----------------
Bioorg. Med. Chem. Lett., 18, 39, 2008
Mean nAts 29.7, mcs nAts: 27
----------------
Bioorg. Med. Chem. Lett., 17, 1047, 2007
Mean nAts 29.8, mcs nAts: 28
----------------
Bioorg. Med. Chem. Lett., 16, 3162, 2006
Mean nAts 25.2, mcs nAts: 23
----------------
Bioorg. Med. Chem. Lett., 22, 1504, 2012
Mean nAts 25.4, mcs nAts: 18
----------------
Bioorg. Med. Chem. Lett., 17, 1443, 2007
Mean nAts 25.7, mcs nAts: 22