Similarity Searching Using 2D Structural Fingerprints

  • Protocol
Chemoinformatics and Computational Chemical Biology

Part of the book series: Methods in Molecular Biology (MIMB, volume 672)

Abstract

This chapter reviews the use of molecular fingerprints for chemical similarity searching. The fingerprints encode the presence of 2D substructural fragments in a molecule, and the similarity between a pair of molecules is a function of the number of fragments that they have in common. Although this is a very simple way of estimating the degree of structural similarity between two molecules, it has been found to provide an effective and efficient tool for searching large chemical databases. The review describes the historical development of similarity searching since it was first described in the mid-1980s; reviews the many different coefficients, representations, and weightings that can be combined to form a similarity measure; describes quantitative measures of the effectiveness of similarity searching; and concludes by looking at current developments based on the use of data fusion and machine learning techniques.
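
As an illustration of how a similarity measure can be built from the number of fragments two molecules have in common, the sketch below uses the Tanimoto coefficient, one widely used choice among the many coefficients available. It is a minimal sketch, not the chapter's own implementation: each fingerprint is assumed to be a set of integer fragment identifiers, and the function names, molecule names, and fragment identifiers are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient c / (a + b - c).

    a and b are the numbers of fragments present in each molecule and
    c is the number of fragments they have in common.
    """
    common = len(fp_a & fp_b)
    union = len(fp_a) + len(fp_b) - common
    # Two empty fingerprints share nothing; return 0.0 by convention here.
    return common / union if union else 0.0


def rank_by_similarity(
    query: Set[int], database: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score every database molecule against the query, most similar first."""
    scores = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: the integers stand for fragment identifiers.
query_fp = {1, 2, 3, 4}
database_fps = {
    "mol_A": {1, 2, 3, 4},   # identical fragment set
    "mol_B": {2, 3, 4, 5},   # three fragments in common
    "mol_C": {7, 8, 9},      # no fragments in common
}

ranking = rank_by_similarity(query_fp, database_fps)
```

In a similarity search of this kind, the top of `ranking` holds the database molecules that share the most fragments with the query relative to their combined size.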


References

  1. Rouvray, D. H. (1990) The evolution of the concept of molecular similarity, in Concepts and Applications of Molecular Similarity (Johnson, M. A., and Maggiora, G. M., Eds.), pp 15–42, John Wiley, Chichester.

    Google Scholar 

  2. Bender, A., and Glen, R. C. (2004) Molecular similarity: a key technique in molecular informatics. Organic and Biomolecular Chemistry 2, 3204–3218.

    Article  PubMed  CAS  Google Scholar 

  3. Dean, P. M., (Ed.) (1994) Molecular Similarity in Drug Design, Chapman and Hall, Glasgow.

    Google Scholar 

  4. Downs, G. M., and Willett, P. (1995) Similarity searching in databases of chemical structures. Reviews in Computational Chemistry 7, 1–66.

    Google Scholar 

  5. Maldonado, A. G., Doucet, J. P., Petitjean, M., and Fan, B.-T. (2006) Molecular similarity and diversity in chemoinformatics: from theory to applications. Molecular Diversity 10, 39–79.

    Article  PubMed  CAS  Google Scholar 

  6. Nikolova, N., and Jaworska, J. (2003) Approaches to measure chemical similarity – a review. Quantitative Structure-Activity Relationships and Combinatorial Science 22, 1006–1026.

    Google Scholar 

  7. Sheridan, R. P., and Kearsley, S. K. (2002) Why do we need so many chemical similarity search methods? Drug Discovery Today 7, 903–911.

    Article  PubMed  Google Scholar 

  8. Alvarez, J., and Shoichet, B., (Eds.) (2005) Virtual Screening in Drug Discovery, CRC Press, Boca Raton.

    Google Scholar 

  9. Bajorath, J. (2002) Integration of virtual and high-throughput screening. Nature Reviews Drug Discovery 1, 882–894.

    Article  PubMed  CAS  Google Scholar 

  10. Böhm, H.-J., and Schneider, G., (Eds.) (2000) Virtual Screening for Bioactive Molecules, Wiley-VCH, Weinheim.

    Google Scholar 

  11. Klebe, G., (Ed.) (2000) Virtual Screening: An Alternative or Complement to High Throughput Screening, Kluwer, Dordrecht.

    Google Scholar 

  12. Lengauer, T., Lemmen, C., Rarey, M., and Zimmermann, M. (2004) Novel technologies for virtual screening. Drug Discovery Today 9, 27–34.

    Article  PubMed  CAS  Google Scholar 

  13. Oprea, T. I., and Matter, H. (2004) Integrating virtual screening in lead discovery. Current Opinion in Chemical Biology 8, 349–358.

    Article  PubMed  CAS  Google Scholar 

  14. Gedeck, P., Rhode, B., and Bartels, C. (2006) QSAR – how good is it in practice? Comparison of descriptor sets on an unbiased cross section of corporate data sets. Journal of Chemical Information and Modeling 46, 1924–1936.

    Article  PubMed  CAS  Google Scholar 

  15. McGaughey, G. B., Sheridan, R. P., Bayly, C. I., Culberson, J. C., Kreatsoulas, C., Lindsley, S., Maiorov, V., Truchon, J.-F., and Cornell, W. D. (2007) Comparison of topological, shape, and docking methods in virtual screening. Journal of Chemical Information and Modeling 47, 1504–1519.

    Article  PubMed  CAS  Google Scholar 

  16. Sheridan, R. P. (2007) Chemical similarity searches: when is complexity justified? Expert Opinion on Drug Discovery 2, 423–430.

    Article  CAS  Google Scholar 

  17. Sheridan, R. P., McGaughey, G. B., and Cornell, W. D. (2008) Multiple protein structures and multiple ligands: effects on the apparent goodness of virtual screening results. Journal of Computer-Aided Molecular Design 22, 257–265.

    Article  PubMed  CAS  Google Scholar 

  18. Talevi, A., Gavernet, L., and Bruno-Blanch, L. E. (2009) Combined virtual screening strategies. Current Computer-Aided Drug Design 5, 23–37.

    Article  CAS  Google Scholar 

  19. Warren, G. L., Andrews, C. W., Capelli, A.-M., Clarke, B., LaLonde, J., Lambert, M. H., Lindvall, M., Nevins, N., Semus, S. F., Senger, S., Tedesco, G., Wall, I. D., Woolven, J. M., Peishoff, C. E., and Head, M. S. (2006) A critical assessment of docking programs and scoring functions. Journal of Medicinal Chemistry 49, 5912–5931.

    Article  PubMed  CAS  Google Scholar 

  20. Wilton, D., Willett, P., Lawson, K., and Mullier, G. (2003) Comparison of ranking methods for virtual screening in lead-discovery programs. Journal of Chemical Information and Computer Sciences 43, 469–474.

    Article  PubMed  CAS  Google Scholar 

  21. Bajorath, J., (Ed.) (2004) Chemoinformatics Concepts, Methods and Tools for Drug Discovery, Humana Press, Totowa NJ.

    Google Scholar 

  22. Gasteiger, J., and Engel, T., (Eds.) (2003) Chemoinformatics: A Textbook, Wiley-VCH, Weinheim.

    Google Scholar 

  23. Leach, A. R., and Gillet, V. J. (2007) An Introduction to Chemoinformatics, 2nd edition, Kluwer, Dordrecht.

    Book  Google Scholar 

  24. Gasteiger, J., (Ed.) (2003) Handbook of Chemoinformatics, Wiley-VCH, Weinheim.

    Google Scholar 

  25. Johnson, M. A., and Maggiora, G. M., (Eds.) (1990) Concepts and Applications of Molecular Similarity. John Wiley, New York.

    Google Scholar 

  26. Willett, P. (2009) Similarity methods in chemoinformatics. Annual Review of Information Science and Technology 43, 3–71.

    Article  Google Scholar 

  27. Eckert, H., and Bajorath, J. (2007) Molecular similarity analysis in virtual screening: foundations, limitation and novel approaches. Drug Discovery Today 12, 225–233.

    Article  PubMed  CAS  Google Scholar 

  28. Willett, P. (2006) Similarity-based virtual screening using 2D fingerprints. Drug Discovery Today 11, 1046–1053.

    Article  PubMed  CAS  Google Scholar 

  29. Hagadone, T. R. (1992) Molecular substructure similarity searching – efficient retrieval in two-dimensional structure databases. Journal of Chemical Information and Computer Sciences 32, 515–521.

    Article  CAS  Google Scholar 

  30. Senger, S. (2009) Using Tversky similarity searches for core hopping: finding the needles in the haystack. Journal of Chemical Information and Modeling 49, 1514–1524.

    Article  PubMed  CAS  Google Scholar 

  31. Willett, P. (1985) An algorithm for chemical superstructure searching. Journal of Chemical Information and Computer Sciences 25, 114–116.

    Article  CAS  Google Scholar 

  32. Carhart, R. E., Smith, D. H., and Venkataraghavan, R. (1985) Atom pairs as molecular-features in structure activity studies – definition and applications. Journal of Chemical Information and Computer Sciences 25, 64–73.

    Article  CAS  Google Scholar 

  33. Willett, P., Winterman, V., and Bawden, D. (1986) Implementation of nearest-neighbour searching in an online chemical structure search system. Journal of Chemical Information and Computer Sciences 26, 36–41.

    Article  CAS  Google Scholar 

  34. Adamson, G. W., and Bush, J. A. (1973) A method for the automatic classification of chemical structures. Information Storage and Retrieval 9, 561–568.

    Article  CAS  Google Scholar 

  35. Willett, P., Barnard, J. M., and Downs, G. M. (1998) Chemical similarity searching. Journal of Chemical Information and Computer Sciences 38, 983–996.

    Article  CAS  Google Scholar 

  36. Wilkins, C. L., and Randic, M. (1980) A graph theoretical approach to structure-property and structure-activity correlation. Theoretica Chimica Acta 58, 45–68.

    Article  CAS  Google Scholar 

  37. Patterson, D. E., Cramer, R. D., Ferguson, A. M., Clark, R. D., and Weinberger, L. E. (1996) Neighbourhood behaviour: a useful concept for validation of “molecular diversity” descriptors. Journal of Medicinal Chemistry 39, 3049–3059.

    Article  PubMed  CAS  Google Scholar 

  38. Dixon, S. L., and Merz, K. M. (2001) One-dimensional molecular representations and similarity calculations: methodology and validation. Journal of Medicinal Chemistry 44, 3795–3809.

    Article  PubMed  CAS  Google Scholar 

  39. Papadatos, G., Cooper, A. W. J., Kadirkamanathan, V., Macdonald, S. J. F., McLay, I. M., Pickett, S. D., Pritchard, J. M., Willett, P., and Gillet, V. J. (2009) Analysis of neighborhood behaviour in lead optimisation and array design. Journal of Chemical Information and Modeling 49, 195–208.

    Article  PubMed  CAS  Google Scholar 

  40. Perekhodtsev, G. D. (2007) Neighbourhood behavior: validation of two-dimensional molecular similarity as a predictor of similar biological activities and docking scores. QSAR and Combinatorial Science 26, 346–351.

    Article  CAS  Google Scholar 

  41. Willett, P., and Winterman, V. (1986) A comparison of some measures of inter-molecular structural similarity. Quantitative Structure-Activity Relationships 5, 18–25.

    Article  CAS  Google Scholar 

  42. Willett, P. (1987) Similarity and Clustering in Chemical Information Systems, Research Studies Press, Letchworth.

    Google Scholar 

  43. Brown, R. D., and Martin, Y. C. (1996) Use of structure-activity data to compare structure-based clustering methods and descriptors for use in compound selection. Journal of Chemical Information and Computer Sciences 36, 572–584.

    Article  CAS  Google Scholar 

  44. Brown, R. D., and Martin, Y. C. (1997) The information content of 2D and 3D structural descriptors relevant to ligand-receptor binding. Journal of Chemical Information and Computer Sciences 37, 1–9.

    Article  CAS  Google Scholar 

  45. Martin, Y. C., Kofron, J. L., and Traphagen, L. M. (2002) Do structurally similar molecules have similar biological activities? Journal of Medicinal Chemistry 45, 4350–4358.

    Article  PubMed  CAS  Google Scholar 

  46. Steffen, A., Kogej, T., Tyrchan, C., and Engkvist, O. (2009) Comparison of molecular fingerprint methods on the basis of biological profile data. Journal of Chemical Information and Modeling 49, 338–347.

    Article  PubMed  CAS  Google Scholar 

  47. Sheridan, R. P., Feuston, B. P., Maiorov, V. N., and Kearsley, S. K. (2004) Similarity to molecules in the training set is a good discriminator for prediction accuracy in QSAR. Journal of Chemical Information and Computer Sciences 44, 1912–1928.

    Article  PubMed  CAS  Google Scholar 

  48. He, L., and Jurs, P. C. (2005) Assessing the reliability of a QSAR model’s predictions. Journal of Molecular Graphics and Modelling 23, 503–523.

    Article  PubMed  CAS  Google Scholar 

  49. Bostrom, J., Hogner, A., and Schmitt, S. (2006) Do structurally similar ligands bind in a similar fashion? Journal of Medicinal Chemistry 49, 6716–6725.

    Article  PubMed  CAS  Google Scholar 

  50. Paolini, G. V., Shapland, R. H. B., van Hoorn, W. P., Mason, J. S., and Hopkins, A. L. (2006) Global mapping of pharmacological space. Nature Biotechnology 24, 805–815.

    Article  PubMed  CAS  Google Scholar 

  51. Schuffenhauer, A., Floersheim, P., Acklin, P., and Jacoby, E. (2003) Similarity metrics for ligands reflecting the similarity of the target proteins. Journal of Chemical Information and Computer Sciences 43, 391–405.

    Article  PubMed  CAS  Google Scholar 

  52. Hert, J., Keiser, M. J., Irwin, J. J., Oprea, T. I., and Shoichet, B. K. (2008) Quantifying the relationship among drug classes. Journal of Chemical Information and Modeling 48, 755–765.

    Article  PubMed  CAS  Google Scholar 

  53. Keiser, M. J., Roth, B. L., Armbruster, B. N., Ernsberger, P., Irwin, J. J., and Shoichet, B. K. (2007) Relating protein pharmacology by ligand chemistry. Nature Biotechnology 25, 197–206.

    Article  PubMed  CAS  Google Scholar 

  54. Cleves, A. E., and Jain, A. N. (2006) Robust ligand-based modeling of the biological targets of known drugs. Journal of Medicinal Chemistry 49, 2921–2938.

    Article  PubMed  CAS  Google Scholar 

  55. Stahura, F. L., and Bajorath, J. (2002) Bio- and chemo-informatics beyond data management: crucial challenges and future opportunities. Drug Discovery Today 7, S41–S47.

    Article  PubMed  CAS  Google Scholar 

  56. Kubinyi, H. (1998) Similarity and dissimilarity: a medicinal chemist’s view. Perspectives in Drug Discovery and Design 911, 225–232.

    Article  Google Scholar 

  57. Maggiora, G. M. (2006) On outliers and activity cliffs – why QSAR often disappoints. Journal of Chemical Information and Modeling 46, 1535.

    Article  PubMed  CAS  Google Scholar 

  58. Peltason, L., and Bajorath, J. (2007) SAR index: quantifying the nature of structure-activity relationships. Journal of Medicinal Chemistry 50, 5571–5578.

    Article  PubMed  CAS  Google Scholar 

  59. Todeschini, R., and Consonni, V. (2002) Handbook of Molecular Descriptors, Wiley-VCH, Weinheim.

    Google Scholar 

  60. Glen, R. C., and Adams, S. E. (2006) Similarity metrics and descriptor spaces – which combinations to choose? QSAR and Combinatorial Science 25, 1133–1142.

    Article  CAS  Google Scholar 

  61. Godden, J. W., Xue, L., Kitchen, D. B., Stahura, F. L., Schermerhorn, E. J., and Bajorath, J. (2002) Median partitioning: a novel method for the selection of representative subsets from large compound pools. Journal of Chemical Information and Computer Sciences 42, 885–893.

    Article  PubMed  CAS  Google Scholar 

  62. Godden, J. W., Furr, J. R., Xue, L., Stahura, F. L., and Bajorath, J. (2004) Molecular similarity analysis and virtual screening by mapping of consensus positions in binary-tansformed chemical descriptor spaces with variable dimensionality. Journal of Chemical Information and Computer Sciences 44, 21–29.

    Article  PubMed  CAS  Google Scholar 

  63. Kier, L. B., and Hall, H. L. (1986) Molecular Connectivity in Structure-Activity Analysis, Wiley, New York.

    Google Scholar 

  64. Lowell, H., Hall, H. L., and Kier, L. B. (2001) Issues in representation of molecular structure: the development of molecular connectivity. Journal of Molecular Graphics and Modelling 20, 4–18.

    Article  Google Scholar 

  65. Estrada, E., and Uriarte, E. (2001) Recent advances on the use of topological indices in drug discovery research. Current Medicinal Chemistry 8, 1573–1588.

    Article  PubMed  CAS  Google Scholar 

  66. Raymond, J. W., and Willett, P. (2002) Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases. Journal of Computer-Aided Molecular Design 16, 59–71.

    Article  PubMed  CAS  Google Scholar 

  67. Rarey, M., and Dixon, J. S. (1998) Feature trees: a new molecular similarity measure based on tree matching. Journal of Computer-Aided Molecular Design 12, 471–490.

    Article  PubMed  CAS  Google Scholar 

  68. Rarey, M., and Stahl, M. (2001) Similarity searching in large combinatorial chemistry spaces. Journal of Computer-Aided Molecular Design 15, 497–520.

    Article  PubMed  CAS  Google Scholar 

  69. Barker, E. J., Buttar, D., Cosgrove, D. A., Gardiner, E. J., Gillet, V. J., Kitts, P., and Willett, P. (2006) Scaffold-hopping using clique detection applied to reduced graphs. Journal of Chemical Information and Modeling 46, 503–511.

    Article  PubMed  CAS  Google Scholar 

  70. Stiefl, N., Watson, I. A., Baumann, K., and Zaliani, A. (2006) ErG: 2D pharmacophore descriptions for scaffold hopping. Journal of Chemical Information and Modeling 46, 208–220.

    Article  PubMed  CAS  Google Scholar 

  71. Mason, J. S., Morize, I., Menard, P. R., Cheney, D. L., Hulme, C., and Labaudiniere, R. F. (1999) New 4-point pharmacophore method for molecular similarity and diversity applications: overview of the method and applications, including a novel approach to the design of combinatorial libraries containing privileged substructures. Journal of Medicinal Chemistry 42, 3251–3264.

    Article  PubMed  CAS  Google Scholar 

  72. Mount, J., Ruppert, J., Welch, W., and Jain, A. N. (1999) Icepick: a flexible surface-based system for molecular diversity. Journal of Medicinal Chemistry 42, 60–66.

    Article  PubMed  CAS  Google Scholar 

  73. Cheeseright, T., Mackey, M., Rose, S., and Vinter, A. (2006) Molecular field extrema as descriptors of biological activity: definition and validation. Journal of Chemical Information and Modeling 46, 6650–6676.

    Article  CAS  Google Scholar 

  74. Mestres, J., Rohrer, D. C., and Maggiora, G. M. (1997) MIMIC: a molecular-field matching program. Exploiting applicability of molecular similarity approaches. Journal of Computational Chemistry 18, 934–954.

    Article  CAS  Google Scholar 

  75. Ballester, P. J., and Richards, W. G. (2007) Ultrafast shape recognition to search compound databases for similar molecular shapes. Journal of Computational Chemistry 28, 1711–1723.

    Article  PubMed  CAS  Google Scholar 

  76. Rush, T. S., Grant, J. A., Mosyak, L., and Nicholls, A. (2005) A shape-based 3-D scaffold hopping method and its application to a bacterial protein-protein interaction. Journal of Medicinal Chemistry 48, 1489–1495.

    Article  PubMed  CAS  Google Scholar 

  77. Barnard, J. M. (1993) Substructure searching methods – old and new. Journal of Chemical Information and Computer Sciences 33, 532–538.

    Article  CAS  Google Scholar 

  78. Brown, N. (2009) Chemoinformatics – an introduction for computer scientists. ACM Computing Surveys.

    Google Scholar 

  79. Adamson, G. W., Cowell, J., Lynch, M. F., McLure, A. H. W., Town, W. G., and Yapp, A. M. (1973) Strategic considerations in the design of screening systems for substructure searches of chemical structure files. Journal of Chemical Documentation 13, 153–157.

    Article  CAS  Google Scholar 

  80. Durant, J. L., Leland, B. A., Henry, D. R., and Nourse, J. G. (2002) Re-optimisation of MDL keys for use in drug discovery. Journal of Chemical Information and Modeling 42, 1273–1280.

    Article  CAS  Google Scholar 

  81. Hodes, L. (1976) Selection of descriptors according to discrimination and redundancy – application to chemical-structure searching. Journal of Chemical Information and Computer Sciences 16, 88–93.

    Article  PubMed  CAS  Google Scholar 

  82. Bender, A., Mussa, H. Y., Glen, R. C., and Reiling, S. (2004) Molecular similarity searching using atom environments: information-based feature selection and a naive Bayesian classifier. Journal of Chemical Information and Computer Sciences 44, 170–178.

    Article  PubMed  CAS  Google Scholar 

  83. Bender, A., Jenkins, J. L., Scheiber, J., Sukuru, S. C. K., Glick, M., and Davies, J. W. (2009) How similar are similarity searching methods? A principal components analysis of molecular descriptor space. Journal of Chemical Information and Modeling 49, 108–119.

    Article  PubMed  CAS  Google Scholar 

  84. Ewing, T. J. A., Baber, J. C., and Feher, F. (2006) Novel 2D fingerprints for ligand-based virtual screening. Journal of Chemical Information and Modeling 46, 2423–2431.

    Article  PubMed  CAS  Google Scholar 

  85. Fechner, U., Paetz, J., and Schneider, G. (2005) Comparison of three holographic fingerprint descriptors and their binary counterparts. QSAR and Combinatorial Science 24, 961–967.

    Article  CAS  Google Scholar 

  86. Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2004) Comparison of topological descriptors for similarity-based virtual screening using multiple bioactive reference structures. Organic and Biomolecular Chemistry 2, 3256–3266.

    Article  PubMed  CAS  Google Scholar 

  87. Schneider, G., Neidhart, W., Giller, T., and Schmid, G. (1999) “Scaffold-hopping” by topological pharmacophore search: a contribution to virtual screening. Angewandte Chemie-International Edition 38, 2894–2896.

    Article  CAS  Google Scholar 

  88. Böhm, H.-J., Flohr, A., and Stahl, M. (2004) Scaffold hopping. Drug Discovery Today: Technologies 1, 217–224.

    Article  CAS  Google Scholar 

  89. Brown, N., and Jacoby, E. (2006) On scaffolds and hopping in medicinal chemistry. Mini-Reviews in Medicinal Chemistry 6, 1217–1229.

    Article  PubMed  CAS  Google Scholar 

  90. Schneider, G., Schneider, P., and Renner, S. (2006) Scaffold-hopping: how far can you jump? QSAR and Combinatorial Science 25, 1162–1171.

    Article  CAS  Google Scholar 

  91. Martin, Y. C., and Muchmore, S. (2009) Beyond QSAR: lead hopping to different structures. QSAR & Combinatorial Science 28, 797–801.

    Article  CAS  Google Scholar 

  92. Eckert, H., and Bajorath, J. (2006) Determination and mapping of activity-specific descriptor value ranges for the identification of active compounds. Journal of Medicinal Chemistry 49, 2284–2293.

    Article  PubMed  CAS  Google Scholar 

  93. Xue, L., Godden, J. W., Stahura, F. L., and Bajorath, J. (2003) Design and evaluation of a molecular fingerprint involving the transformation of property descriptor values into a binary classification scheme. Journal of Chemical Information and Computer Sciences 43, 1151–1157.

    Article  PubMed  CAS  Google Scholar 

  94. Briem, H., and Lessel, U. F. (2000) In vitro and in silico affinity fingerprints: finding similarities beyond structural classes. Perspectives in Drug Discovery and Design 20, 231–244.

    Article  CAS  Google Scholar 

  95. Kauvar, L. M., Higgins, D. L., Villar, H. O., Sportsman, J. R., Engqvist-Goldstein, A., Bukar, R., Bauer, K. E., Dilley, H., and Rocke, D. M. (1995) Predicting ligand binding to proteins by affinity fingerprinting. Chemistry & Biology 2, 107–118.

    Article  CAS  Google Scholar 

  96. Ormerod, A., Willett, P., and Bawden, D. (1989) Comparison of fragment weighting schemes for substructural analysis, Quantitative Structure-Activity Relationships 8, 115–129.

    Article  CAS  Google Scholar 

  97. Goldman, B. B., and Walters, W. P. (2006) Machine learning in computational chemistry. Annual Reports in Computational Chemistry 2, 127–140.

    Article  CAS  Google Scholar 

  98. Moock, T. E., Grier, D. L., Hounshell, W. D., Grethe, G., Cronin, K., Nourse, J. G., and Theodosiou, J. (1988) Similarity searching in the organic reaction domain. Tetrahedron Computer Methodology 1, 117–128.

    Article  CAS  Google Scholar 

  99. Downs, G. M., Poirrette, A. R., Walsh, P., and Willett, P. (1993) Evaluation of similarity searching methods using activity and toxicity data, in Chemical Structures 2. The International Language of Chemistry. (Warr, W. A., Ed.), pp 409–421, Springer Verlag, Berlin.

    Google Scholar 

  100. Azencott, C.-A., Ksikes, A., Swamidass, S. J., Chen, J. H., Ralaivola, L., and Baldi, P. (2007) One- to four-dimensional kernels for virtual screening and the prediction of physical, chemical and biological properties. Journal of Chemical Information and Modeling 47, 965–974.

    Article  PubMed  CAS  Google Scholar 

  101. Chen, X., and Reynolds, C. H. (2002) Performance of similarity measures in 2D fragment-based similarity searching: comparison of structural descriptors and similarity coefficients. Journal of Chemical Information and Computer Sciences 42, 1407–1414.

    Article  PubMed  CAS  Google Scholar 

  102. Olah, M., Bologa, C., and Oprea, T. I. (2004) An automated PLS search for biologically relevant QSAR descriptors. Journal of Computer-Aided Molecular Design 18, 437–449.

    Article  PubMed  CAS  Google Scholar 

  103. Arif, S. M., Holliday, J. D., and Willett, P. (2009) Analysis and use of fragment occurrence data in similarity-based virtual screening. Journal of Computer-Aided Molecular Design 23, 655–668.

    Article  PubMed  CAS  Google Scholar 

  104. Everitt, B. S., Landau, S., and Leese, M. (2001) Cluster Analysis, 4th edition, Edward Arnold, London.

    Google Scholar 

  105. Gower, J. C. (1982) Measures of similarity, dissimilarity and distance, in Encyclopaedia of Statistical Sciences (Kotz, S., Johnson, N. L., and Read, C. B., Eds.), pp 397–405, John Wiley, Chichester.

    Google Scholar 

  106. Hubálek, Z. (1982) Coefficients of association and similarity, based on binary (presence-absence) data: an evaluation. Biological Reviews of the Cambridge Philosophical Society 57, 669–689.

    Article  Google Scholar 

  107. Flower, D. R. (1988) On the properties of bit string based measures of chemical similarity. Journal of Chemical Information and Computer Sciences 38, 379–386.

    Google Scholar 

  108. Dixon, S. L., and Koehler, R. T. (1999) The hidden component of size in two-dimensional fragment descriptors: side effects on sampling in bioactive libraries. Journal of Medicinal Chemistry 42, 2887–2900.

    Article  PubMed  CAS  Google Scholar 

  109. Fligner, M. A., Verducci, J. S., and Blower, P. E. (2002) A modification of the Jaccard-Tanimoto similarity index for diverse selection of chemical compounds using binary strings. Technometrics 44, 110–119.

    Article  Google Scholar 

  110. Godden, J. W., Xue, L., and Bajorath, J. (2000) Combinatorial preferences affect molecular similarity/diversity calculations using binary fingerprints and Tanimoto coefficients. Journal of Chemical Information and Computer Sciences 40, 163–166.

    Article  PubMed  CAS  Google Scholar 

  111. Tversky, A. (1977) Features of similarity. Psychological Review 84, 327–352.

    Article  Google Scholar 

  112. Bradshaw, J. (1997) Introduction to Tversky similarity measure, in MUG ‘97 – 11th Annual Daylight User Group Meeting Laguna Beach CA.

    Google Scholar 

  113. Maggiora, G. M., Mestres, J., Hagadone, T. R., and Lajiness, M. S. (1997) Asymmetric similarity and molecular diversity, in 213th National Meeting of the American Chemical Society, April 13–17, 1997, San Francisco, CA.

    Google Scholar 

  114. Chen, X., and Brown, F. K. (2006) Asymmetry of chemical similarity. ChemMedChem 2, 180–182.

    Article  CAS  Google Scholar 

  115. Wang, Y., Eckert, H., and Bajorath, J. (2007) Apparent asymmetry in fingerprint similarity searching is a direct consequence of differences in bit densities and molecular size. ChemMedChem 2, 1037–1042.

    Article  PubMed  CAS  Google Scholar 

  116. Wang, Y., and Bajorath, J. (2008) Balancing the influence of molecular complexity on fingerprint similarity searching. Journal of Chemical Information and Modeling 48, 75–84.

    Article  PubMed  CAS  Google Scholar 

  117. Wang, Y., and Bajorath, J. (2009) Development of a compound-class directed similarity coefficient that accounts for molecular complexity effects in fingerprint searching. Journal of Chemical Information and Modeling 49, 1369–1376.

    Article  PubMed  CAS  Google Scholar 

  118. Varin, T., Bureau, R., Mueller, C., and Willett, P. (2009) Clustering files of chemical structures using the Székely-Rizzo generalisation of Ward’s method. Journal of Molecular Graphics and Modelling 28, 187–195.

    Article  PubMed  CAS  Google Scholar 

  119. Gower, J. C., and Legendre, P. (1986) Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification 5, 5–48.

    Article  Google Scholar 

  120. Edgar, S. J., Holliday, J. D., and Willett, P. (2000) Effectiveness of retrieval in similarity searches of chemical databases: a review of performance measures. Journal of Molecular Graphics and Modelling 18, 343–357.

    Article  PubMed  CAS  Google Scholar 

  121. Willett, P. (2004) The evaluation of molecular similarity and molecular diversity methods using biological activity data. Methods in Molecular Biology 275, 51–63.

    Article  PubMed  CAS  Google Scholar 

  122. Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., and Sheridan, R. P. (1996) Chemical similarity using physicochemical property descriptors. Journal of Chemical Information and Computer Sciences 36, 118–127.

    Article  CAS  Google Scholar 

  123. Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2004) Comparison of fingerprint-based methods for virtual screening using multiple bioactive reference structures. Journal of Chemical Information and Computer Sciences 44, 1177–1185.

    Article  PubMed  CAS  Google Scholar 

  124. Cuissart, B., Touffet, F., Crémilleux, B., Bureau, R., and Rault, S. (2002) The maximum common substructure as a molecular depiction in a supervised classification context: experiments in quantitative structure/biodegradability relationships. Journal of Chemical Information and Computer Sciences 42, 1043–1052.

    Article  PubMed  CAS  Google Scholar 

  125. Triballeau, N., Acher, F., Brabet, I., Pin, J.-P., and Bertrand, H.-O. (2005) Virtual screening workflow development guided by the “Receiver Operating Characteristic” curve approach. Application to high-throughput docking on metabotropic glutamate receptor type 4. Journal of Medicinal Chemistry 48, 2534–2547.

    Article  PubMed  CAS  Google Scholar 

  126. Truchon, J.-F., and Bayly, C. I. (2007) Evaluating virtual screening methods: good and bad metrics for the “early recognition” problem. Journal of Chemical Information and Modeling 47, 488–508.

    Article  PubMed  CAS  Google Scholar 

  127. Jain, A. N., and Nicholls, A. (2008) Recommendations for evaluation of computational methods. Journal of Computer-Aided Molecular Design 22, 133–139.

    Article  PubMed  CAS  Google Scholar 

  128. Nicholls, A. (2008) What do we know and when do we know it? Journal of Computer-Aided Molecular Design 22, 239–255.

    Article  PubMed  CAS  Google Scholar 

  129. Good, A. C., Hermsmeier, M. A., and Hindle, S. A. (2004) Measuring CAMD technique performance: a virtual screening case study in the design of validation experiments. Journal of Computer-Aided Molecular Design 18, 529–536.

    Article  PubMed  CAS  Google Scholar 

  130. Willett, P. (2006) Data fusion in ligand-based virtual screening. QSAR and Combinatorial Science 25, 1143–1152.

    Article  CAS  Google Scholar 

  131. Feher, M. (2006) Consensus scoring for protein-ligand interactions. Drug Discovery Today 11, 421–428.

    Article  PubMed  CAS  Google Scholar 

  132. Ginn, C. M. R., Turner, D. B., Willett, P., Ferguson, A. M., and Heritage, T. W. (1997) Similarity searching in files of three-dimensional chemical structures: evaluation of the EVA descriptor and combination of rankings using data fusion. Journal of Chemical Information and Computer Sciences 37, 23–37.

  133. Ginn, C. M. R., Willett, P., and Bradshaw, J. (2000) Combination of molecular similarity measures using data fusion. Perspectives in Drug Discovery and Design 20, 1–16.

  134. Sheridan, R. P., Miller, M. D., Underwood, D. J., and Kearsley, S. K. (1996) Chemical similarity using geometric atom pair descriptors. Journal of Chemical Information and Computer Sciences 36, 128–136.

  135. Holliday, J. D., Hu, C.-Y., and Willett, P. (2002) Grouping of coefficients for the calculation of inter-molecular similarity and dissimilarity using 2D fragment bit-strings. Combinatorial Chemistry and High-Throughput Screening 5, 155–166.

  136. Salim, N., Holliday, J. D., and Willett, P. (2003) Combination of fingerprint-based similarity coefficients using data fusion. Journal of Chemical Information and Computer Sciences 43, 435–442.

  137. Whittle, M., Gillet, V. J., Willett, P., Alex, A., and Loesel, J. (2004) Enhancing the effectiveness of virtual screening by fusing nearest neighbor lists: a comparison of similarity coefficients. Journal of Chemical Information and Computer Sciences 44, 1840–1848.

  138. Xue, L., Stahura, F. L., Godden, J. W., and Bajorath, J. (2001) Fingerprint scaling increases the probability of identifying molecules with similar activity in virtual screening calculations. Journal of Chemical Information and Computer Sciences 41, 746–753.

  139. Williams, C. (2006) Reverse fingerprinting, similarity searching by group fusion and fingerprint bit importance. Molecular Diversity 10, 311–332.

  140. Zhang, Q., and Muegge, I. (2006) Scaffold hopping through virtual screening using 2D and 3D similarity descriptors: ranking, voting, and consensus scoring. Journal of Medicinal Chemistry 49, 1536–1548.

  141. Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2005) Enhancing the effectiveness of similarity-based virtual screening using nearest-neighbour information. Journal of Medicinal Chemistry 48, 7049–7054.

  142. Hert, J., Willett, P., Wilton, D. J., Acklin, P., Azzaoui, K., Jacoby, E., and Schuffenhauer, A. (2006) New methods for ligand-based virtual screening: use of data-fusion and machine-learning techniques to enhance the effectiveness of similarity searching. Journal of Chemical Information and Modeling 46, 462–470.

  143. Gardiner, E. J., Gillet, V. J., Haranczyk, M., Hert, J., Holliday, J. D., Malim, N., Patel, Y., and Willett, P. (2009) Turbo similarity searching: effect of fingerprint and dataset on virtual-screening performance. Statistical Analysis and Data Mining 2, 103–114.

  144. Baber, J. C., Shirley, W. A., Gao, Y., and Feher, M. (2006) The use of consensus scoring in ligand-based virtual screening. Journal of Chemical Information and Modeling 46, 277–288.

  145. Whittle, M., Gillet, V. J., Willett, P., and Loesel, J. (2006) Analysis of data fusion methods in virtual screening: theoretical model. Journal of Chemical Information and Modeling 46, 2193–2205.

  146. Whittle, M., Gillet, V. J., Willett, P., and Loesel, J. (2006) Analysis of data fusion methods in virtual screening: similarity and group fusion. Journal of Chemical Information and Modeling 46, 2206–2219.

  147. Cramer, R. D., Redl, G., and Berkoff, C. E. (1974) Substructural analysis. A novel approach to the problem of drug design. Journal of Medicinal Chemistry 17, 533–535.

  148. Capelli, A. M., Feriani, A., Tedesco, G., and Pozzan, A. (2006) Generation of a focused set of GSK compounds biased toward ligand-gated ion-channel ligands. Journal of Chemical Information and Modeling 46, 659–664.

  149. Cosgrove, D. A., and Willett, P. (1998) SLASH: a program for analysing the functional groups in molecules. Journal of Molecular Graphics and Modelling 16, 19–32.

  150. Medina-Franco, J. L., Petit, J., and Maggiora, G. M. (2006) Hierarchical strategy for identifying active chemotype classes in compound databases. Chemical Biology & Drug Design 67, 395–408.

  151. Schreyer, S. K., Parker, C. N., and Maggiora, G. M. (2004) Data shaving: a focused screening approach. Journal of Chemical Information and Computer Sciences 44, 470–479.

  152. Hassan, M., Brown, R. D., Varma-O’Brien, S., and Rogers, D. (2006) Cheminformatics analysis and learning in a data pipelining environment. Molecular Diversity 10, 283–299.

  153. Rogers, D., Brown, R. D., and Hahn, M. (2005) Using extended-connectivity fingerprints with Laplacian-modified Bayesian analysis in high-throughput screening follow-up. Journal of Biomolecular Screening 10, 682–686.

  154. Xia, X. Y., Maliski, E. G., Gallant, P., and Rogers, D. (2004) Classification of kinase inhibitors using a Bayesian model. Journal of Medicinal Chemistry 47, 4463–4470.

  155. Bender, A., Mussa, H. Y., Glen, R. C., and Reiling, S. (2004) Similarity searching of chemical databases using atom environment descriptors: evaluation of performance. Journal of Chemical Information and Computer Sciences 44, 1708–1718.

  156. Vogt, M., Nisius, B., and Bajorath, J. (2009) Predicting the similarity search performance of fingerprints and their combination with molecular property descriptors using probabilistic and information theoretic modeling. Statistical Analysis and Data Mining 2, 123–134.

  157. Vogt, M., and Bajorath, J. (2008) Bayesian screening for active compounds in high-dimensional chemical spaces combining property descriptors and molecular fingerprints. Chemical Biology & Drug Design 71, 8–14.

  158. Wang, Y., and Bajorath, J. (2008) Bit silencing in fingerprints enables the derivation of compound class-directed similarity metrics. Journal of Chemical Information and Modeling 48, 1754–1759.

  159. Vogt, I., and Bajorath, J. (2007) Analysis of a high-throughput screening data set using potency-scaled molecular similarity algorithms. Journal of Chemical Information and Modeling 47, 367–375.

  160. Geppert, H., Horvath, T., Gärtner, T., Wrobel, S., and Bajorath, J. (2008) Support-vector-machine-based ranking significantly improves the effectiveness of similarity searching using 2D fingerprints and multiple reference compounds. Journal of Chemical Information and Modeling 48, 742–746.

  161. Shemetulskis, N. E., Weininger, D., Blankley, C. J., Yang, J. J., and Humblet, C. (1996) Stigmata: an algorithm to determine structural commonalities in diverse datasets. Journal of Chemical Information and Computer Sciences 36, 862–871.

  162. Tovar, A., Eckert, H., and Bajorath, J. (2007) Comparison of 2D fingerprint methods for multiple-template similarity searching on compound activity classes of increasing structural diversity. ChemMedChem 2, 208–217.

  163. Hessler, G., Zimmermann, M., Matter, H., Evers, A., Naumann, T., Lengauer, T., and Rarey, M. (2005) Multiple-ligand-based virtual screening: methods and applications of the MTree approach. Journal of Medicinal Chemistry 48, 6575–6584.

Copyright information

© 2010 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Willett, P. (2010). Similarity Searching Using 2D Structural Fingerprints. In: Bajorath, J. (eds) Chemoinformatics and Computational Chemical Biology. Methods in Molecular Biology, vol 672. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-60761-839-3_5

  • DOI: https://doi.org/10.1007/978-1-60761-839-3_5

  • Publisher Name: Humana Press, Totowa, NJ

  • Print ISBN: 978-1-60761-838-6

  • Online ISBN: 978-1-60761-839-3

  • eBook Packages: Springer Protocols
