Abstract
Mining chemical compounds in silico has drawn increasing attention from both academia and the pharmaceutical industry because of its effectiveness in aiding the drug discovery process. Since graphs are the natural representation of chemical compounds, most mining algorithms focus on mining chemical graphs. Chemical graph mining approaches have many applications in drug discovery, including structure-activity relationship (SAR) model construction and bioactivity classification, similar-compound search and retrieval from chemical compound databases, and target identification from phenotypic assays. Solving such problems in silico by studying and mining chemical graphs can give medicinal chemists, biologists, and toxicologists novel perspectives. Moreover, since large-scale chemical graph mining is usually employed at the early stages of drug discovery, it has the potential to speed up the entire drug discovery process. In this chapter, we discuss various problems and algorithms related to mining chemical graphs and describe some of the state-of-the-art chemical graph mining methodologies and their applications.
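As an illustration of treating compounds as graphs for similar-compound search, here is a minimal sketch under stated assumptions. It encodes a compound as a labeled graph (atoms as vertices labeled by element, bonds as edges labeled by bond order, hydrogens implicit). It then extracts labeled simple paths of up to two bonds as binary features and ranks a toy database by the Tanimoto coefficient. The names `MolGraph`, `path_fragments`, and `rank_by_similarity` are hypothetical. The path features are a simplified stand-in for path-based fingerprints, not the descriptors or methods described in the chapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """A compound as a labeled, undirected graph (hydrogens left implicit)."""
    atoms: dict[int, str]  # atom index -> element symbol (vertex label)
    bonds: dict[tuple[int, int], int] = field(default_factory=dict)  # (i, j) -> bond order (edge label)

    def add_bond(self, i: int, j: int, order: int = 1) -> None:
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i: int):
        """Yield (neighbor index, bond order) pairs for atom i."""
        for (a, b), order in self.bonds.items():
            if a == i:
                yield b, order
            elif b == i:
                yield a, order


def path_fragments(mol: MolGraph, max_bonds: int = 2) -> set[str]:
    """Collect every labeled simple path of up to max_bonds bonds as a string feature."""
    frags: set[str] = set()

    def extend(path: list[int], label: list[str]) -> None:
        # A path read in either direction is the same fragment: keep the smaller reading.
        canon = min(tuple(label), tuple(reversed(label)))
        frags.add("-".join(canon))
        if len(path) - 1 >= max_bonds:
            return
        for nxt, order in mol.neighbors(path[-1]):
            if nxt not in path:  # simple paths only: never revisit an atom
                extend(path + [nxt], label + [str(order), mol.atoms[nxt]])

    for start, element in mol.atoms.items():
        extend([start], [element])
    return frags


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) coefficient between two binary feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query: MolGraph, database: dict[str, MolGraph]) -> list[tuple[str, float]]:
    """Return database entries sorted from most to least similar to the query."""
    q = path_fragments(query)
    scored = [(name, tanimoto(q, path_fragments(mol))) for name, mol in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy heavy-atom graphs: ethanol C-C-O, methanol C-O, acetic acid CH3-C(=O)-OH.
ethanol = MolGraph({0: "C", 1: "C", 2: "O"})
ethanol.add_bond(0, 1)
ethanol.add_bond(1, 2)

methanol = MolGraph({0: "C", 1: "O"})
methanol.add_bond(0, 1)

acetic_acid = MolGraph({0: "C", 1: "C", 2: "O", 3: "O"})
acetic_acid.add_bond(0, 1)
acetic_acid.add_bond(1, 2, order=2)
acetic_acid.add_bond(1, 3)

if __name__ == "__main__":
    for name, score in rank_by_similarity(ethanol, {"methanol": methanol, "acetic acid": acetic_acid}):
        print(f"{name}: {score:.3f}")
```

With ethanol as the query, acetic acid scores 0.625 and methanol 0.600. Every ethanol fragment also occurs in acetic acid, so acetic acid ranks first. This shows how strongly the choice of graph-derived features shapes what "similar" means in retrieval.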
Copyright information
© 2010 Springer-Verlag US
About this chapter
Cite this chapter
Wale, N., Ning, X., Karypis, G. (2010). Trends in Chemical Graph Data Mining. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_19
DOI: https://doi.org/10.1007/978-1-4419-6045-0_19
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science (R0)