Trends in Chemical Graph Data Mining

Wale, Nikil; Ning, Xia; Karypis, George

doi:10.1007/978-1-4419-6045-0_19


Part of the book series: Advances in Database Systems (ADBS, volume 40)


Abstract

Mining chemical compounds in silico has drawn increasing attention from both academia and the pharmaceutical industry because of its effectiveness in aiding the drug discovery process. Since graphs are the natural representation of chemical compounds, most mining algorithms focus on mining chemical graphs. Chemical graph mining approaches have many applications in drug discovery, including structure-activity relationship (SAR) model construction and bioactivity classification, similar-compound search and retrieval from chemical compound databases, and target identification from phenotypic assays. Solving such problems in silico by studying and mining chemical graphs can provide novel perspectives to medicinal chemists, biologists, and toxicologists. Moreover, since large-scale chemical graph mining is usually employed at the early stages of drug discovery, it has the potential to speed up the entire drug discovery process. In this chapter, we discuss various problems and algorithms related to mining chemical graphs and describe some of the state-of-the-art chemical graph mining methodologies and their applications.
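As a rough illustration of the graph view of compounds and of similar-compound search, here is a minimal sketch under stated assumptions, not the authors' method. Each compound is a labeled graph (heavy atoms as vertices, bonds as undirected edges, bond orders ignored). Its descriptor is the set of labeled simple paths with up to `max_len` bonds. Database compounds are ranked against a query by the Tanimoto coefficient of these sets. The helper names (`make_graph`, `path_fingerprint`, `tanimoto`, `rank`), the path-based descriptor, and the toy molecules are hypothetical choices made for illustration.

```python
# Hypothetical sketch: path-based descriptors + Tanimoto ranking for
# similar-compound search over labeled chemical graphs (Python 3.9+).

# A compound as a labeled graph: atom labels indexed by vertex id, plus an
# undirected adjacency list (hydrogens omitted, bond orders ignored).
Graph = tuple[list[str], dict[int, list[int]]]


def make_graph(atoms: list[str], bonds: list[tuple[int, int]]) -> Graph:
    """Build the adjacency list for an undirected, vertex-labeled graph."""
    adj: dict[int, list[int]] = {i: [] for i in range(len(atoms))}
    for u, v in bonds:
        adj[u].append(v)
        adj[v].append(u)
    return atoms, adj


def path_fingerprint(g: Graph, max_len: int = 3) -> set[str]:
    """Collect the label strings of all simple paths with up to max_len bonds."""
    atoms, adj = g
    features: set[str] = set()

    def dfs(path: list[int]) -> None:
        labels = [atoms[i] for i in path]
        forward = "-".join(labels)
        backward = "-".join(reversed(labels))
        # A path and its reverse describe the same substructure,
        # so keep the lexicographically smaller string as its canonical form.
        features.add(min(forward, backward))
        if len(path) - 1 == max_len:
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:  # simple paths only: never revisit an atom
                dfs(path + [nxt])

    for start in range(len(atoms)):
        dfs([start])
    return features


def tanimoto(a: set[str], b: set[str]) -> float:
    """Tanimoto (Jaccard) coefficient |A & B| / |A | B| of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def rank(query: Graph, database: dict[str, Graph],
         max_len: int = 3) -> list[tuple[str, float]]:
    """Return (name, similarity) pairs sorted from most to least similar."""
    q = path_fingerprint(query, max_len)
    scores = [(name, tanimoto(q, path_fingerprint(g, max_len)))
              for name, g in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ethanol = make_graph(["C", "C", "O"], [(0, 1), (1, 2)])
    database = {
        "propanol": make_graph(["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),
        "methylamine": make_graph(["C", "N"], [(0, 1)]),
    }
    for name, score in rank(ethanol, database):
        print(f"{name}\t{score:.3f}")  # propanol 0.714, then methylamine 0.143
```

Path enumeration grows quickly with `max_len` on large or highly connected graphs, and practical fingerprinting schemes commonly hash such features into fixed-length bit strings. The set-based version here trades that efficiency for readability.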


Similar content being viewed by others

Graph-Based Methods for Rational Drug Design

Boosting Similar Compounds Searches via Correlated Subgraph Analysis

Pharmacophoric-constrained heterogeneous graph transformer model for molecular property prediction



Author information

Authors and Affiliations

Computer Science & Engineering, University of Minnesota, Twin Cities, US
Nikil Wale, Xia Ning & George Karypis


Corresponding author

Correspondence to Nikil Wale.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, U.S.A.
Charu C. Aggarwal
Microsoft Research Asia, Zhichun Road 49, Beijing, 100080, China, People's Republic
Haixun Wang


About this chapter

Cite this chapter

Wale, N., Ning, X., Karypis, G. (2010). Trends in Chemical Graph Data Mining. In: Aggarwal, C., Wang, H. (eds) Managing and Mining Graph Data. Advances in Database Systems, vol 40. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-6045-0_19


DOI: https://doi.org/10.1007/978-1-4419-6045-0_19
Published: 18 January 2010
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-6044-3
Online ISBN: 978-1-4419-6045-0
eBook Packages: Computer Science, Computer Science (R0)
