Summary
Recent research has shown that applying data fusion rules in fingerprint-based similarity searching can improve results over traditional searches. Group fusion scores, which combine the similarities to multiple reference compounds, have proved particularly effective at increasing enrichment rates over searches based on a single reference structure. In this paper, the effectiveness of data fusion with multiple reference compounds for increasing similarity search recall rates was investigated using 44 biological targets and four different 2D fingerprinting systems, including a new 2D typed triangle fingerprinting system introduced here. Scaffold-hopping ability under the data fusion rules was investigated using eight different classes of scaffolds active against cGMP phosphodiesterase isoform 5 (PDE5). An approach that uses the reference group to rank and visualize important fingerprint bits, termed reverse fingerprinting, was presented and used to score and visualize important pharmacophore features within sample active molecules. Finally, similarity statistics within the reference groups were investigated and compared to recall rates.
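As an illustration of the group fusion idea described above, the following minimal Python sketch scores each database compound against every reference compound and then combines those scores with either the maximum (MAX) or the average (AVE) fusion rule. It is a sketch under stated assumptions, not the paper's implementation: fingerprints are represented as sets of on-bit indices, the Tanimoto coefficient is assumed as the similarity measure, and the names `tanimoto`, `group_fusion`, `rank_database` and the toy compounds are hypothetical.

```python
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices.

    Assumption: the summary does not name the similarity coefficient;
    Tanimoto is used here only because it is a common choice.
    """
    if not a and not b:
        return 0.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def group_fusion(candidate: frozenset, references: list, rule: str = "MAX") -> float:
    """Fuse the similarities of one candidate to all reference compounds."""
    scores = [tanimoto(candidate, ref) for ref in references]
    if rule == "MAX":
        return max(scores)   # maximum fusion rule
    if rule == "AVE":
        return mean(scores)  # average fusion rule
    raise ValueError(f"unknown fusion rule: {rule}")


def rank_database(database: dict, references: list, rule: str = "MAX") -> list:
    """Return compound names sorted by decreasing fused score."""
    return sorted(
        database,
        key=lambda name: group_fusion(database[name], references, rule),
        reverse=True,
    )


# Hypothetical toy data: two reference actives and three database compounds.
references = [frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})]
database = {
    "cpd_a": frozenset({1, 2, 3, 4}),
    "cpd_b": frozenset({3, 4}),
    "cpd_c": frozenset({7, 8}),
}

if __name__ == "__main__":
    for rule in ("MAX", "AVE"):
        print(rule, rank_database(database, references, rule))
```

With a group count of n = 1 both rules reduce to an ordinary single-reference search. The two rules differ once the reference group is structurally diverse: MAX rewards a close match to any one reference, whereas AVE rewards moderate similarity to the group as a whole.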
Abbreviations
- GpiDAPH: graph pi-donor-acceptor-polar-hydrophobe fingerprints
- TGT: typed graph triangle fingerprints
- PCH: polar-charged-hydrophobe fingerprints
- MACCS: 166 public MACCS keys
- n: group count (# of reference structures)
- ROC: receiver operator characteristic curve
- AVE: average fusion rule
- MAX: maximum fusion rule
- vHTS: virtual high-throughput screening
- Ck: bit coverage in the training group
- Tk: bit importance
- wL: pharmacophore fragment score
- fk i: bit position
Cite this article
Williams, C. Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Mol Divers 10, 311–332 (2006). https://doi.org/10.1007/s11030-006-9039-z
DOI: https://doi.org/10.1007/s11030-006-9039-z