Skip to main content

Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms

  • Chapter
Rational Drug Design

Part of the book series: The IMA Volumes in Mathematics and its Applications ((IMA,volume 108))

Abstract

This paper discusses the use of graph-isomorphism algorithms for substructure and maximal common substructure searching in databases of chemical and biological molecules. Subgraph isomorphism algorithms are used in substructure searchingwhereone wishes to identify all molecules in a database that contain a user-defined pattern, with an initial screening search being used to eliminate a large fraction of the database from the time-consuming graph-matching search. A maximal common sub-graph isomorphism algorithm provides an effective way of identifying the structural features common to a pair of molecules. The application of these isomorphism techniques to database searching is illustrated by reference to 2D and 3D small molecules, 3D protein structures and carbohydrate sequences.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  • Artymiuk, P. J., Grindley, H. M., MacKenzie, A. B., Rice, D. W., Ujah, E. C. and Willett, P. (1995). PROTEP: a program for graph-theoretic similarity searching of the 3-D structures in the Protein Data Bank. In: Carbo, R. (editor) Molecular Similarity and Reactivity: from Quantum Chemical to Phenomenological Approaches. Netherlands: Kluwer Academic Publishers.

    Google Scholar 

  • Artymiuk, P. J., Grindley, H. M., Poirrette, A. R.RiceD. W., Ujah, E. C. and Willett, P. (1994a). Identification of β-sheet motifs, of ϕ-loops and of patterns of amino-acid residues in three-dimensional protein structures using a subgraph-isomorphism algorithm. Journal of Chemical Information and Computer Sciences 34, 54–62.

    Google Scholar 

  • Artymiuk, P. J., Poirrette, A. R., Grindley, H. M., Rice, D. W. and Willett, P. (1994b). A graph-theoretic approach to the identification of three-dimensional patterns of amino acid sidechains in protein structures. Journal of Molecular Biology 243, 327–344.

    Article  Google Scholar 

  • Artymiuk, P. J., Poirrette, A. R., Rice, D. W. and Willett, P. (1996). Biotin carboxylase comes into the fold. Nature Structural Biology 3, 128–132.

    Article  Google Scholar 

  • Artymiuk, P. J., Rice, D. W., Mitchell, E. M. and Willett, P. (1990). Structural resemblance between the families of bacterial signal-transduction proteins and ofGproteins revealed by graph theoretical techniques. Protein Engineering, 4, 39–43.

    Article  Google Scholar 

  • Ash, J. E., Warr, W. A. and Willett, P. (editors) (1991). Chemical Structure Systems. Chichester: Ellis Horwood.

    Google Scholar 

  • Babel, L. (1991). Finding maximum cliques in arbitray and special graphs. Computing 46, 321–341.

    Article  MathSciNet  MATH  Google Scholar 

  • Barnard, J. M. (1993). Substructure searching methods: old and new. Journal of Chemical Information and Computer Sciences, 33, 532–538.

    Google Scholar 

  • Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jnr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, M. and Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. Journal of Molecular Biology 112, 535–542.

    Article  Google Scholar 

  • Brint, A. T. and Willett, P. (1987a). Pharmacophoric pattern matching in files of 3D chemical structures: comparison of geometric searching algorithms. Journal of Molecular Graphics 5, 49–56.

    Article  Google Scholar 

  • Brint, A. T. and Willett, P. (1987b). Algorithms for the identification of three-dimensional maximal common substructures. Journal of Chemical Information and Cornputer Sciences 27, 152–158.

    Google Scholar 

  • Brint, A. T. and Willett, P. (1988). Upperbound procedures for the identification of similar three-dimensional chemical structures. Journal of Computer-Aided Molecular Design 2, 311–320.

    Article  Google Scholar 

  • Bron, C. and Kerbosch, J. (1973). Algorithm 457. Finding all cliques of an undirected graph. Communications of the ACM 16, 575–577.

    Article  MATH  Google Scholar 

  • Brown, R. D., Jones, G., Willett, P. and Glen, R. C. (1994). Matching two-dimensional chemical graphs using genetic algorithms. Journal of Chemical Information and Computer Sciences 34, 63–70.

    Google Scholar 

  • Bruno, I. J., Kemp, N. M., Artymiuk, P. J. and Willett, P. (1997). Representation and searching of carbohydrate structures using graph-theoretic techniques, submitted for publication.

    Google Scholar 

  • Bures, M. G. (1997) Intergration of molecular modelling and database searching. In: Martin, Y. C. and Willett, P. (editors) (1997) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications. Washington DC: American Chemical Society, in the press.

    Google Scholar 

  • Bures, M. G., Danaher, E., DeLazzer, J. and Martin, Y. C. (1994). New molecular modelling tools using three-dimensional chemical substructures. Journal of Chemical Information and Computer Sciences 34, 218–223.

    Google Scholar 

  • Burt, C., Richards, W. H. and Huxley, P. (1990). The application of molecular similarity calculations. Journal of Computational Chemistry 11, 1139–1146.

    Article  Google Scholar 

  • Clark, D. E., Jones, G., Willett, P., Kenny, P. W. and Glen, R. C. (1994). Pharmacophoric pattern matching in files of three-dimensional chemical structures: comparison of conformational-searching algorithms for flexible searching. Journal of Chemical Information and Computer Sciences, 34, 197–206.

    Google Scholar 

  • Clark, D. E., Willett, P. and Kenny, P. W. (1992). Pharmacophoric pattern matching in files of three-dimensional chemical structures: use of smoothed bounded-distance matrices for the representation and searching of conformationally-flexible molecules. Journal of Molecular Graphics 10, 194–204.

    Article  Google Scholar 

  • Crandell, C. W. and Smith, D. H. (1983). Computer-assisted examination of compounds for common three-dimensional substructures. Journal of Chemical Information and Computer Sciences 23, 186–197.

    Google Scholar 

  • Cringean, J. K., Pepperrell, C. A., Poirrette, A. R. and Willett, P. (1990). Selection of screens for three-dimensional substructure searching. Tetrahedron Computer Methodology 3, 37–46.

    Article  Google Scholar 

  • Crippen, G. M. and Havel, T. F. (1988). Distance Geometry and Molecular Conformation. Letchworth: Research Studies Press.

    MATH  Google Scholar 

  • Doubet, S., Bock, K., Smith, D., Darvill, A. and Albersheim, P. (1989). The Complex Carbohydrate Structure Database. Trends in Biochemical Sciences 14, 475–477.

    Article  Google Scholar 

  • Downs, G. M., Lynch, M. F., Willett, P., Manson, G. A. and Wilson, G. A. (1988). Transputer implementations of chemical substructure searching algorithms. Tetrahedron Computer Methodology 1, 207–217.

    Article  Google Scholar 

  • Downs, G. M. and Willett, P. (1996) Similarity searching in databases of chemical structures. Reviews in Computational Chemistry 7, 1–66.

    Article  Google Scholar 

  • Fan, C., Moews, P. C., Shi, Y., Walsh, C. T. and Knox, J. R. (1995). A common fold for peptide synthetases cleaving ATP to ADP - glutathione synthetase and DAlanine-D-Alanine ligase of Escherichia Coli. Proceedings of the National Academy of Sciences of the USA, 92, 1172–1176.

    Article  Google Scholar 

  • Feizi, T. and Bundle, D. (1996) Carbohydrates and glycoconjugates. The coming age for oligosaccharide ligands and databases for saccharide structures. slCurrent Opinion in Structural Biology, 6, 659–662.

    Article  Google Scholar 

  • Fisanick, W., Cross, K. P. and Rusinko, A. (1992). Similarity searching on CAS Registry substances. 1. Global molecular property and generic atom triangle geometric searching. Journal of Chemical Information and Computer Sciences 32, 664–674.

    Google Scholar 

  • Gardiner, E. (1996) An Evaluation of a New Clique-Detection Procedure for Matching Chemical Structures. MSc dissertation, University of Sheffield.

    Google Scholar 

  • Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: a Guide to the Theory of NP-Completeness. San Francisco: W. H. Freeman.

    MATH  Google Scholar 

  • Gati, G. (1979). Further annotated bibliography on the isomorphism disease. Journal of Graph Theory 3, 95–109.

    Article  MathSciNet  MATH  Google Scholar 

  • Gibson, T. J. and Argos, P. (1990). Protruding domain of tomato bushy stunt virus coat protein is a hitherto unrecognized class of jellyroll conformation. Journal of Molecular Biology 212, 7–9.

    Article  Google Scholar 

  • Good, A. C., Hodgkin, E. E. and Richards, W. G. (1992). The utilisation of Gaussian functions for the rapid evaluation of molecular similarity. Journal of Chemical Information and Computer Science 32, 188–191.

    Article  Google Scholar 

  • Good, A. C. and Mason, J. S. (1996). Three-dimensional structure database searches. Reviews in Computational Chemistry 7, 67–117.

    Article  Google Scholar 

  • Gund, P. (1977). Three-dimensional pharmacophoric pattern searching. Progress in Molecular and Subcellular Biology 5, 117–143.

    Article  Google Scholar 

  • Hagadone, T. R. (1992). Molecular subsimilarity searching: efficient retrieval in two-dimensional structure databases. Journal of Chemical Information and Computer Sciences, 32, 515–521.

    Google Scholar 

  • Hodes, L. (1976). Selection of descriptors according to discrimination and redundancy. Application to chemical substructure searching. Journal of Chemical Information and Computer Sciences 16, 88–93.

    Google Scholar 

  • Hurst, T. (1994). Flexible 3D searching: the directed tweak technique. Journal of Chemical Information and Computer Sciences 34, 190–196.

    Google Scholar 

  • Hutchinson, E. G. and Thornton, J. M. (1990). HERA — A program to draw schematic diagrams of protein secondary structures. Proteins: Structure, Function, and Genetics, 8, 203–212.

    Article  Google Scholar 

  • Jakes, S. E. and Willett, P. (1986). Pharmacophoric pattern matching in files of 3-D chemical structures: selection of inter-atomic distance screens. Journal of Molecular Graphics 4, 12–20.

    Article  Google Scholar 

  • Jones, G., Willett, P. and Glen, R. C. (1995). A genetic algorithm for flexible molecular overlay and pharmacophore detection. Journal of Computer-Aided Molecular Design 9, 532–549.

    Article  Google Scholar 

  • Kubinyi, H. (1993) (editor). 3D QSAR in Drug Design. Leiden: ESCOM.

    Google Scholar 

  • Kuhl, F. S., Crippen, G. M. and Friesen, D. K. (1984). A combinatorial algorithm for calculating ligand binding. Journal of Computational Chemistry 5, 24–34.

    Article  Google Scholar 

  • Leach, A. R. (1991). A survey of methods for searching the conformational space of small and medium-sized molecules. Reviews in Computational Chemistry, 2, 1–55.

    MathSciNet  Google Scholar 

  • Levi, G. (1972). A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo, 9, 341–352.

    Article  Google Scholar 

  • McGregor, J. J. (1982). Backtrack search algorithms and the maximal common subgraph problem. Software Practice and Experience, 12, 23–34.

    Article  MATH  Google Scholar 

  • Manaut, F., Sanz, F., Jose, J. and Milesi, M. (1991). Automatic search for maximum similarity between molecular electrostatic potential distributions. Journal of Computer-Aided Molecular Design 5, 371–380.

    Article  Google Scholar 

  • Martin, Y. C., Bures, M. G., Danaher, E. A., DeLazzer, J., Lico, I. and Pavlik, P. A. (1993). A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists. Journal of Computer-Aided Molecular Design 7, 83–102.

    Article  Google Scholar 

  • Martin, Y. C. and Willett, P. (editors) (1997) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications. Washington DC: American Chemical Society, in the press.

    Google Scholar 

  • Mitchell, E. M., Artymiuk, P. J., Rice, D. W. and Willett, P. (1990). Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212, 151–166.

    Article  Google Scholar 

  • Morgan, H. L. (1965). The generation of a unique machine description for chemical structures: a technique developed at Chemical Abstracts Service. Journal of Chemical Documentation 5, 107–113.

    Article  Google Scholar 

  • Nilakantan, R., Bauman, N. and Venkataraghavan, R. (1993). A new method for rapid characterisation of molecular shape: applications in drug design. Journal of Chemical Information and Computer Sciences 33, 79–85.

    Google Scholar 

  • Pardalos, P. M. and Xue, J. (1994). The maximum clique problem. Journal of Global Optimization 4, 301–328.

    Article  MathSciNet  MATH  Google Scholar 

  • Pearlman, R. S. (1987). Rapid generation of high quality approximate 3D molecular structures. Chemical Design Automation News, 2, 1–6.

    Google Scholar 

  • Pepperrell, C. A. and Willett, P. (1991). Techniques for the calculation of three-dimensional structural similarity using inter-atomic distances. Journal of Computer-Aided Molecular Design 5, 455–474.

    Article  Google Scholar 

  • Pepperrell, C. A., Willett, P. and Taylor, R. (1990). Implementation and use of an atom-mapping procedure for similarity searching in databases of 3-D chemical structures. Tetrahedron Computer Methodology 3, 575–593.

    Article  Google Scholar 

  • Poirrette, A. R., Willett, P. and Allen, F. H. (1991). Pharmacophoric pattern matching in files of three-dimensional chemical structures: characterisation and use of generalised valence angle screens. Journal of Molecular Graphics 9, 203–217.

    Article  Google Scholar 

  • Poirrette, A. R., Willett, P. and Allen, F. H. (1993). Pharmacophoric pattern matching in files of three-dimensional chemical structures: characterisation and use of generalised torsion angle screens. Journal of Molecular Graphics 11, 2–14.

    Article  Google Scholar 

  • Ray, L. C. and Kirsch, R. A. (1957). Finding chemical records by digital computers. Science 126, 814–819.

    Article  Google Scholar 

  • Read, R. C. and Corneil, D. G. (1977). The graph isomorphism disease. Journal of Graph Theory 1, 339–363.

    Article  MathSciNet  MATH  Google Scholar 

  • Richard, A. M. (1991). Quantitative comparison of molecular electrostatic potentials for structure-activity studies. Journal of Computational Chemistry 12, 959–969.

    Article  Google Scholar 

  • Richardson, J. S. (1977). β-sheet topology and the relatedness of proteins. Nature, 268,495–500.

    Article  Google Scholar 

  • Sadowski, J. and Gasteiger, J. (1993). From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chemical Reviews 93, 2567–2581.

    Article  Google Scholar 

  • Sheridan, R. P., Nilakantan, R., Rusinko, A., Bauman, N., Haraki, K. S. and Venkataraghavan, R. (1989). 3DSEARCH: a system for three-dimensional substructure searching. Journal of Chemical Information and Computer Sciences 29, 255–260.

    Google Scholar 

  • Stewart, J. J. M. (1990). MOPAC: a semi-empirical molecular orbital program. Journal of Computer-Aided Molecular Design 4, 1–105.

    Article  Google Scholar 

  • Sussenguth, E. H. (1965). A graph-theoretic algorithm for matching chemical structures. Journal of Chemical Documentation, 5, 36–43.

    Article  Google Scholar 

  • Thorner, D. A., Willett, P., Wright, P. M. and Taylor, R. (1997). Similarity searching in files of three-dimensional chemical structures: representation and searching of molecular electrostatic potentials using field-graphs. Journal of Computer-Aided Molecular Design 11, 163–174.

    Article  Google Scholar 

  • Thorner, D. A., Wild, D. J, Willett, P. and Wright, P. M. (1996). Similarity searching in files of three-dimensional chemical structures: flexible field-based searching of molecular electrostatic potentials. Journal of Chemical Information and Computer Sciences 36, 900–908.

    Google Scholar 

  • Trinajstic, N. (editor) (1983).Chemical Graph Theory. Chichester: Ellis Horwood.

    Google Scholar 

  • Ullmann, J. R. (1976). An algorithm for subgraph isomorphism. Journal of the Association for Computing Machinery, 23, 31–42.

    Article  MathSciNet  Google Scholar 

  • Vinter, J. G. and Gardner, M. (editors) (1994). Molecular Modelling and Drug Design. London: Macmillan.

    Google Scholar 

  • von Itzstein, M., Wu, W.-Y., Kok, G. B., Pegg, M. S., Dyason, J. C., Jin, B., Phan, T. V., Smythe, M. L., White, H. F., Oliver, S. W., Colman, P. M., Varghese, J. N., Ryan, D. M., Woods, J. M., Bethell, R. C., Hotham, V. J., Cameron, J. M. and Penn, C. R. (1993). Rational design of potent sialidase-based inhibitors of influenza-virus replication. Nature 363, 418–423.

    Article  Google Scholar 

  • Warr, W. A. and Willett, P. (1997). The principles and practice of 3D database searching. In: Martin, Y. C. and Willett, P. (editors) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications. Washington DC: American Chemical Society, in the press.

    Google Scholar 

  • Wild, D. J and Willett, P. (1996). Similarity searching in files of three-dimensional chemical structures: alignment of Molecular Electrostatic Potentials with a genetic algorithm. Journal of Chemical Information and Computer Sciences, 36, 159–167.

    Google Scholar 

  • Willett, P. (1986). Modern Approaches to Chemical Reaction Searching. Aldershot: Gower.

    Google Scholar 

  • Willett, P. (1991). Three-Dimensional Chemical Structure Handling. Taunton: Research Studies Press.

    Google Scholar 

  • Wilson, R. (1972). Introduction to Graph Theory. Edinburgh: Oliver and Boyd.

    MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 1999 Springer Science+Business Media New York

About this chapter

Cite this chapter

Willett, P. (1999). Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms. In: Truhlar, D.G., Howe, W.J., Hopfinger, A.J., Blaney, J., Dammkoehler, R.A. (eds) Rational Drug Design. The IMA Volumes in Mathematics and its Applications, vol 108. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1480-9_3

Download citation

  • DOI: https://doi.org/10.1007/978-1-4612-1480-9_3

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7159-8

  • Online ISBN: 978-1-4612-1480-9

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics