Abstract
This paper discusses the use of graph-isomorphism algorithms for substructure and maximal common substructure searching in databases of chemical and biological molecules. Subgraph isomorphism algorithms are used in substructure searchingwhereone wishes to identify all molecules in a database that contain a user-defined pattern, with an initial screening search being used to eliminate a large fraction of the database from the time-consuming graph-matching search. A maximal common sub-graph isomorphism algorithm provides an effective way of identifying the structural features common to a pair of molecules. The application of these isomorphism techniques to database searching is illustrated by reference to 2D and 3D small molecules, 3D protein structures and carbohydrate sequences.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Artymiuk, P. J., Grindley, H. M., MacKenzie, A. B., Rice, D. W., Ujah, E. C. and Willett, P. (1995). PROTEP: a program for graph-theoretic similarity searching of the 3-D structures in the Protein Data Bank. In: Carbo, R. (editor) Molecular Similarity and Reactivity: from Quantum Chemical to Phenomenological Approaches. Netherlands: Kluwer Academic Publishers.
Artymiuk, P. J., Grindley, H. M., Poirrette, A. R.RiceD. W., Ujah, E. C. and Willett, P. (1994a). Identification of β-sheet motifs, of ϕ-loops and of patterns of amino-acid residues in three-dimensional protein structures using a subgraph-isomorphism algorithm. Journal of Chemical Information and Computer Sciences 34, 54–62.
Artymiuk, P. J., Poirrette, A. R., Grindley, H. M., Rice, D. W. and Willett, P. (1994b). A graph-theoretic approach to the identification of three-dimensional patterns of amino acid sidechains in protein structures. Journal of Molecular Biology 243, 327–344.
Artymiuk, P. J., Poirrette, A. R., Rice, D. W. and Willett, P. (1996). Biotin carboxylase comes into the fold. Nature Structural Biology 3, 128–132.
Artymiuk, P. J., Rice, D. W., Mitchell, E. M. and Willett, P. (1990). Structural resemblance between the families of bacterial signal-transduction proteins and ofGproteins revealed by graph theoretical techniques. Protein Engineering, 4, 39–43.
Ash, J. E., Warr, W. A. and Willett, P. (editors) (1991). Chemical Structure Systems. Chichester: Ellis Horwood.
Babel, L. (1991). Finding maximum cliques in arbitray and special graphs. Computing 46, 321–341.
Barnard, J. M. (1993). Substructure searching methods: old and new. Journal of Chemical Information and Computer Sciences, 33, 532–538.
Bernstein, F. C., Koetzle, T. F., Williams, G. J. B., Meyer, E. F. Jnr., Brice, M. D., Rodgers, J. R., Kennard, O., Shimanouchi, M. and Tasumi, M. (1977). The Protein Data Bank: a computer-based archival file for macromolecular structures. Journal of Molecular Biology 112, 535–542.
Brint, A. T. and Willett, P. (1987a). Pharmacophoric pattern matching in files of 3D chemical structures: comparison of geometric searching algorithms. Journal of Molecular Graphics 5, 49–56.
Brint, A. T. and Willett, P. (1987b). Algorithms for the identification of three-dimensional maximal common substructures. Journal of Chemical Information and Cornputer Sciences 27, 152–158.
Brint, A. T. and Willett, P. (1988). Upperbound procedures for the identification of similar three-dimensional chemical structures. Journal of Computer-Aided Molecular Design 2, 311–320.
Bron, C. and Kerbosch, J. (1973). Algorithm 457. Finding all cliques of an undirected graph. Communications of the ACM 16, 575–577.
Brown, R. D., Jones, G., Willett, P. and Glen, R. C. (1994). Matching two-dimensional chemical graphs using genetic algorithms. Journal of Chemical Information and Computer Sciences 34, 63–70.
Bruno, I. J., Kemp, N. M., Artymiuk, P. J. and Willett, P. (1997). Representation and searching of carbohydrate structures using graph-theoretic techniques, submitted for publication.
Bures, M. G. (1997) Intergration of molecular modelling and database searching. In: Martin, Y. C. and Willett, P. (editors) (1997) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications. Washington DC: American Chemical Society, in the press.
Bures, M. G., Danaher, E., DeLazzer, J. and Martin, Y. C. (1994). New molecular modelling tools using three-dimensional chemical substructures. Journal of Chemical Information and Computer Sciences 34, 218–223.
Burt, C., Richards, W. H. and Huxley, P. (1990). The application of molecular similarity calculations. Journal of Computational Chemistry 11, 1139–1146.
Clark, D. E., Jones, G., Willett, P., Kenny, P. W. and Glen, R. C. (1994). Pharmacophoric pattern matching in files of three-dimensional chemical structures: comparison of conformational-searching algorithms for flexible searching. Journal of Chemical Information and Computer Sciences, 34, 197–206.
Clark, D. E., Willett, P. and Kenny, P. W. (1992). Pharmacophoric pattern matching in files of three-dimensional chemical structures: use of smoothed bounded-distance matrices for the representation and searching of conformationally-flexible molecules. Journal of Molecular Graphics 10, 194–204.
Crandell, C. W. and Smith, D. H. (1983). Computer-assisted examination of compounds for common three-dimensional substructures. Journal of Chemical Information and Computer Sciences 23, 186–197.
Cringean, J. K., Pepperrell, C. A., Poirrette, A. R. and Willett, P. (1990). Selection of screens for three-dimensional substructure searching. Tetrahedron Computer Methodology 3, 37–46.
Crippen, G. M. and Havel, T. F. (1988). Distance Geometry and Molecular Conformation. Letchworth: Research Studies Press.
Doubet, S., Bock, K., Smith, D., Darvill, A. and Albersheim, P. (1989). The Complex Carbohydrate Structure Database. Trends in Biochemical Sciences 14, 475–477.
Downs, G. M., Lynch, M. F., Willett, P., Manson, G. A. and Wilson, G. A. (1988). Transputer implementations of chemical substructure searching algorithms. Tetrahedron Computer Methodology 1, 207–217.
Downs, G. M. and Willett, P. (1996) Similarity searching in databases of chemical structures. Reviews in Computational Chemistry 7, 1–66.
Fan, C., Moews, P. C., Shi, Y., Walsh, C. T. and Knox, J. R. (1995). A common fold for peptide synthetases cleaving ATP to ADP - glutathione synthetase and DAlanine-D-Alanine ligase of Escherichia Coli. Proceedings of the National Academy of Sciences of the USA, 92, 1172–1176.
Feizi, T. and Bundle, D. (1996) Carbohydrates and glycoconjugates. The coming age for oligosaccharide ligands and databases for saccharide structures. slCurrent Opinion in Structural Biology, 6, 659–662.
Fisanick, W., Cross, K. P. and Rusinko, A. (1992). Similarity searching on CAS Registry substances. 1. Global molecular property and generic atom triangle geometric searching. Journal of Chemical Information and Computer Sciences 32, 664–674.
Gardiner, E. (1996) An Evaluation of a New Clique-Detection Procedure for Matching Chemical Structures. MSc dissertation, University of Sheffield.
Garey, M. R. and Johnson, D. S. (1979). Computers and Intractability: a Guide to the Theory of NP-Completeness. San Francisco: W. H. Freeman.
Gati, G. (1979). Further annotated bibliography on the isomorphism disease. Journal of Graph Theory 3, 95–109.
Gibson, T. J. and Argos, P. (1990). Protruding domain of tomato bushy stunt virus coat protein is a hitherto unrecognized class of jellyroll conformation. Journal of Molecular Biology 212, 7–9.
Good, A. C., Hodgkin, E. E. and Richards, W. G. (1992). The utilisation of Gaussian functions for the rapid evaluation of molecular similarity. Journal of Chemical Information and Computer Science 32, 188–191.
Good, A. C. and Mason, J. S. (1996). Three-dimensional structure database searches. Reviews in Computational Chemistry 7, 67–117.
Gund, P. (1977). Three-dimensional pharmacophoric pattern searching. Progress in Molecular and Subcellular Biology 5, 117–143.
Hagadone, T. R. (1992). Molecular subsimilarity searching: efficient retrieval in two-dimensional structure databases. Journal of Chemical Information and Computer Sciences, 32, 515–521.
Hodes, L. (1976). Selection of descriptors according to discrimination and redundancy. Application to chemical substructure searching. Journal of Chemical Information and Computer Sciences 16, 88–93.
Hurst, T. (1994). Flexible 3D searching: the directed tweak technique. Journal of Chemical Information and Computer Sciences 34, 190–196.
Hutchinson, E. G. and Thornton, J. M. (1990). HERA — A program to draw schematic diagrams of protein secondary structures. Proteins: Structure, Function, and Genetics, 8, 203–212.
Jakes, S. E. and Willett, P. (1986). Pharmacophoric pattern matching in files of 3-D chemical structures: selection of inter-atomic distance screens. Journal of Molecular Graphics 4, 12–20.
Jones, G., Willett, P. and Glen, R. C. (1995). A genetic algorithm for flexible molecular overlay and pharmacophore detection. Journal of Computer-Aided Molecular Design 9, 532–549.
Kubinyi, H. (1993) (editor). 3D QSAR in Drug Design. Leiden: ESCOM.
Kuhl, F. S., Crippen, G. M. and Friesen, D. K. (1984). A combinatorial algorithm for calculating ligand binding. Journal of Computational Chemistry 5, 24–34.
Leach, A. R. (1991). A survey of methods for searching the conformational space of small and medium-sized molecules. Reviews in Computational Chemistry, 2, 1–55.
Levi, G. (1972). A note on the derivation of maximal common subgraphs of two directed or undirected graphs. Calcolo, 9, 341–352.
McGregor, J. J. (1982). Backtrack search algorithms and the maximal common subgraph problem. Software Practice and Experience, 12, 23–34.
Manaut, F., Sanz, F., Jose, J. and Milesi, M. (1991). Automatic search for maximum similarity between molecular electrostatic potential distributions. Journal of Computer-Aided Molecular Design 5, 371–380.
Martin, Y. C., Bures, M. G., Danaher, E. A., DeLazzer, J., Lico, I. and Pavlik, P. A. (1993). A fast new approach to pharmacophore mapping and its application to dopaminergic and benzodiazepine agonists. Journal of Computer-Aided Molecular Design 7, 83–102.
Martin, Y. C. and Willett, P. (editors) (1997) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications. Washington DC: American Chemical Society, in the press.
Mitchell, E. M., Artymiuk, P. J., Rice, D. W. and Willett, P. (1990). Use of techniques derived from graph theory to compare secondary structure motifs in proteins. Journal of Molecular Biology, 212, 151–166.
Morgan, H. L. (1965). The generation of a unique machine description for chemical structures: a technique developed at Chemical Abstracts Service. Journal of Chemical Documentation 5, 107–113.
Nilakantan, R., Bauman, N. and Venkataraghavan, R. (1993). A new method for rapid characterisation of molecular shape: applications in drug design. Journal of Chemical Information and Computer Sciences 33, 79–85.
Pardalos, P. M. and Xue, J. (1994). The maximum clique problem. Journal of Global Optimization 4, 301–328.
Pearlman, R. S. (1987). Rapid generation of high quality approximate 3D molecular structures. Chemical Design Automation News, 2, 1–6.
Pepperrell, C. A. and Willett, P. (1991). Techniques for the calculation of three-dimensional structural similarity using inter-atomic distances. Journal of Computer-Aided Molecular Design 5, 455–474.
Pepperrell, C. A., Willett, P. and Taylor, R. (1990). Implementation and use of an atom-mapping procedure for similarity searching in databases of 3-D chemical structures. Tetrahedron Computer Methodology 3, 575–593.
Poirrette, A. R., Willett, P. and Allen, F. H. (1991). Pharmacophoric pattern matching in files of three-dimensional chemical structures: characterisation and use of generalised valence angle screens. Journal of Molecular Graphics 9, 203–217.
Poirrette, A. R., Willett, P. and Allen, F. H. (1993). Pharmacophoric pattern matching in files of three-dimensional chemical structures: characterisation and use of generalised torsion angle screens. Journal of Molecular Graphics 11, 2–14.
Ray, L. C. and Kirsch, R. A. (1957). Finding chemical records by digital computers. Science 126, 814–819.
Read, R. C. and Corneil, D. G. (1977). The graph isomorphism disease. Journal of Graph Theory 1, 339–363.
Richard, A. M. (1991). Quantitative comparison of molecular electrostatic potentials for structure-activity studies. Journal of Computational Chemistry 12, 959–969.
Richardson, J. S. (1977). β-sheet topology and the relatedness of proteins. Nature, 268,495–500.
Sadowski, J. and Gasteiger, J. (1993). From atoms and bonds to three-dimensional atomic coordinates: automatic model builders. Chemical Reviews 93, 2567–2581.
Sheridan, R. P., Nilakantan, R., Rusinko, A., Bauman, N., Haraki, K. S. and Venkataraghavan, R. (1989). 3DSEARCH: a system for three-dimensional substructure searching. Journal of Chemical Information and Computer Sciences 29, 255–260.
Stewart, J. J. M. (1990). MOPAC: a semi-empirical molecular orbital program. Journal of Computer-Aided Molecular Design 4, 1–105.
Sussenguth, E. H. (1965). A graph-theoretic algorithm for matching chemical structures. Journal of Chemical Documentation, 5, 36–43.
Thorner, D. A., Willett, P., Wright, P. M. and Taylor, R. (1997). Similarity searching in files of three-dimensional chemical structures: representation and searching of molecular electrostatic potentials using field-graphs. Journal of Computer-Aided Molecular Design 11, 163–174.
Thorner, D. A., Wild, D. J, Willett, P. and Wright, P. M. (1996). Similarity searching in files of three-dimensional chemical structures: flexible field-based searching of molecular electrostatic potentials. Journal of Chemical Information and Computer Sciences 36, 900–908.
Trinajstic, N. (editor) (1983).Chemical Graph Theory. Chichester: Ellis Horwood.
Ullmann, J. R. (1976). An algorithm for subgraph isomorphism. Journal of the Association for Computing Machinery, 23, 31–42.
Vinter, J. G. and Gardner, M. (editors) (1994). Molecular Modelling and Drug Design. London: Macmillan.
von Itzstein, M., Wu, W.-Y., Kok, G. B., Pegg, M. S., Dyason, J. C., Jin, B., Phan, T. V., Smythe, M. L., White, H. F., Oliver, S. W., Colman, P. M., Varghese, J. N., Ryan, D. M., Woods, J. M., Bethell, R. C., Hotham, V. J., Cameron, J. M. and Penn, C. R. (1993). Rational design of potent sialidase-based inhibitors of influenza-virus replication. Nature 363, 418–423.
Warr, W. A. and Willett, P. (1997). The principles and practice of 3D database searching. In: Martin, Y. C. and Willett, P. (editors) Designing Bioactive Molecules: Three-Dimensional Techniques and Applications. Washington DC: American Chemical Society, in the press.
Wild, D. J and Willett, P. (1996). Similarity searching in files of three-dimensional chemical structures: alignment of Molecular Electrostatic Potentials with a genetic algorithm. Journal of Chemical Information and Computer Sciences, 36, 159–167.
Willett, P. (1986). Modern Approaches to Chemical Reaction Searching. Aldershot: Gower.
Willett, P. (1991). Three-Dimensional Chemical Structure Handling. Taunton: Research Studies Press.
Wilson, R. (1972). Introduction to Graph Theory. Edinburgh: Oliver and Boyd.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 1999 Springer Science+Business Media New York
About this chapter
Cite this chapter
Willett, P. (1999). Matching of Chemical and Biological Structures Using Subgraph and Maximal Common Subgraph Isomorphism Algorithms. In: Truhlar, D.G., Howe, W.J., Hopfinger, A.J., Blaney, J., Dammkoehler, R.A. (eds) Rational Drug Design. The IMA Volumes in Mathematics and its Applications, vol 108. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1480-9_3
Download citation
DOI: https://doi.org/10.1007/978-1-4612-1480-9_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7159-8
Online ISBN: 978-1-4612-1480-9
eBook Packages: Springer Book Archive