Abstract
Elucidating the mechanisms of metabolic pathways helps us understand the cascade of enzyme-catalyzed reactions that convert substances into final products. This has implications for predicting how newly synthesized compounds will affect a person’s metabolism and, hence, for the development of novel treatments to improve health. The study of metabolomic pathways, together with protein engineering, may also aid in the extraction, at scale, of natural products to be used as drugs and drug precursors. Several approaches have been used to correlate protein annotations with metabolic pathways in order to derive pathways directly related to specific organisms. These range from association rule-mining techniques to machine learning methods such as decision trees, naïve Bayes, logistic regression, and ensemble methods.
In this chapter, we review the use of machine learning for metabolic pathway analyses, with a step-by-step focus on the use of deep learning to predict the association of compounds (metabolites) with their respective metabolomic pathway classes. This prediction could help explain the interactions of small molecules in organisms. Inspired by the work of Baranwal et al. (2019), we demonstrate how to build and train a deep neural network model to perform multi-label prediction. We consider two different types of molecular fingerprints as features (inputs to the model). The output of the model is the set of metabolic pathway classes (from the KEGG dataset) in which the input molecule participates. We walk through the various steps of this process, including data collection, feature engineering, model selection, training, and evaluation. This model-building and evaluation process may be easily transferred to other domains of interest. All the source code used in this chapter is publicly available at https://github.com/jp-um/machine_learning_for_metabolomic_pathway_analyses.
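To make the idea of multi-label prediction concrete, the following is a minimal sketch and not the chapter's model. It assumes hypothetical 6-bit toy fingerprints and two hypothetical pathway classes, and it replaces the deep network with a single layer of independent sigmoid outputs trained by stochastic gradient descent on binary cross-entropy. All names (`train`, `predict_proba`, `predict_labels`) and all data values are illustrative assumptions; only the Python standard library is used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, biases, x):
    # One independent probability per pathway class (multi-label,
    # so the outputs are NOT normalised to sum to 1).
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def train(X, Y, epochs=1000, lr=0.2, seed=0):
    # Stochastic gradient descent on binary cross-entropy,
    # one sigmoid output unit per label.
    rng = random.Random(seed)
    n_features, n_labels = len(X[0]), len(Y[0])
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
        for _ in range(n_labels)
    ]
    biases = [0.0] * n_labels
    for _ in range(epochs):
        for x, y in zip(X, Y):
            p = predict_proba(weights, biases, x)
            for k in range(n_labels):
                # Gradient of the cross-entropy loss w.r.t. the logit.
                err = p[k] - y[k]
                for j in range(n_features):
                    weights[k][j] -= lr * err * x[j]
                biases[k] -= lr * err
    return weights, biases


def predict_labels(weights, biases, x, threshold=0.5):
    # A molecule is assigned to every class whose probability
    # reaches the threshold, so it may receive several labels.
    return [int(p >= threshold) for p in predict_proba(weights, biases, x)]


# Hypothetical toy data: each row is a 6-bit "fingerprint".
X = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
]
# Hypothetical labels: membership in two pathway classes.
# The last molecule belongs to both classes.
Y = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
    [1, 1],
]

weights, biases = train(X, Y)
predictions = [predict_labels(weights, biases, x) for x in X]
print(predictions)
```

The design point this illustrates is that each output unit is thresholded independently, which is what allows a single molecule to be assigned to more than one pathway class. In practice the fingerprints would be computed from molecular structures and the single layer would be replaced by a deeper network built with a deep learning library.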
References
Nicholson JK, Lindon JC (2008) Systems biology: metabonomics. Nature 455:1054
Fiehn O (2002) Metabolomics – the link between genotypes and phenotypes. Plant Mol Biol 48:155
Holmes E, Wilson ID, Nicholson JK (2008) Metabolic phenotyping in health and disease. Cell 134:714
Vermeersch KA, Styczynski MP (2013) Applications of metabolomics in cancer research. J Carcinog 12:9
Kraj A, Drabik A, Silberring (2010) Nowe podejście w oznaczaniu i identyfikacji mikroorganizmów (Polish). Wydawnictwa Uniwersytetu Warszawskiego, Warszawa, pp 1-4–15-18
Bu Q, Huang YN, Yan GY, Cen XB, Zhao YL (2012) Metabolomics: a revolution for novel cancer marker identification. Comb Chem High Throughput Screen 15:266
Spratlin JL, Serkova NJ, Eckhardt SG (2009) Clinical applications of metabolomics in oncology: a review. Clin Cancer Res 15:431
Gika HG, Theodoridis GA, Plumb RS, Wilson ID (2014) Current practice of liquid chromatography-mass spectrometry in metabolomics and metabonomics. J Pharm Biomed Anal 87:12
Blekherman G, Laubenbacher R, Cortes DF, Mendes P, Torti FM et al (2011) Bioinformatics tools for cancer metabolomics. Metabolomics 7:329
Ellis DI, Dunn WB, Griffin JL, Allwood JW, Goodacre R (2007) Metabolic fingerprinting as a diagnostic tool. Pharmacogenomics 8:1243
Drexler DM, Reily MD, Shipkova PA (2011) Metabolomics guides rational development of a simplified cell culture medium for drug screening against Trypanosoma brucei. Anal Bioanal Chem 399:2645
Schuhmacher R, Krska R, Weckwerth W, Goodacre R (2013) Metabolomics and metabolite profiling. Anal Bioanal Chem 405:5003
McCulloch W, Pitts W (1943) A logical calculus of the ideas immanent in nervous activity. Bull Math Biophys 5:115–133
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436
Cambiaghi A, Ferrario M, Masseroli M (2016) Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration. Brief Bioinform 18(3):498–510
Smith R, Ventura D, Prince JT (2013) LC-MS alignment in theory and practice: a comprehensive algorithmic review. Brief Bioinform 16(1):104–117
Alonso A, Marsal S, Julia A (2015) Analytical methods in untargeted metabolomics: state of the art in 2015. Front Bioeng Biotechnol 3:23
Nguyen DH, Nguyen CH, Mamitsuka H (2018) Recent advances and prospects of computational methods for metabolite identification: a review with emphasis on machine learning approaches. Brief Bioinform 20(6):2028–2043
Puchades-Carrasco L, Palomino-Schatzlein M, Perez-Rambla C et al (2015) Bioinformatics tools for the analysis of NMR metabolomics studies focused on the identification of clinically relevant biomarkers. Brief Bioinform 17(3):541–552
Baranwal M, Magner A, Elvati P et al (2020) A deep learning architecture for metabolomic pathway prediction. Bioinformatics 36(8):2547–2553
Pomyen Y, Wanichthanarak K, Poungsombat P et al (2020) Deep metabolome: applications of deep learning in metabolomics. Comput Struct Biotechnol J 18:2818–2825
Chollet F (2017) Deep learning with Python. Manning Publications Co
Chollet F, Allaire JJ (2018) Deep learning with R. Manning Publications Co
Kim P (2017) MATLAB deep learning: with machine learning, neural networks and artificial intelligence. Apress
Abadi M, Barham P, Chen J et al (2016) TensorFlow: a system for large-scale machine learning. Proc 12th USENIX Symposium on Operating Systems Design and Implementation
Chollet F. Keras. https://keras.io. Accessed 6th Jan 2022
Paszke A, Gross S, Massa F et al (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inform Proc Syst 32:8024–8035
Pedregosa F, Varoquaux G, Gramfort A et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
KEGG Pathway Database. Available at: https://www.genome.jp/kegg/pathway.html. Accessed 6th Jan 2022
Good AC, Oprea TI (2008) Optimization of CAMD techniques 3. Virtual screening enrichment studies: a help or hindrance in tool selection? J Comput-aided Mol Des 22:169–178
RDKit: Open-source cheminformatics, available at https://www.rdkit.org. Accessed 6th Jan 2022
Weininger D (1988) SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci 28(1):31–36
Butina D (1999) Unsupervised data base clustering based on Daylight’s fingerprint and Tanimoto similarity: a fast and automated way to cluster small and large data sets. J Chem Inf Comput Sci 39(4):747–750
Ballester PJ, Richards WG (2007) Ultrafast shape recognition for similarity search in molecular databases. Proc R Soc A Math Phys Eng Sci 463:1307–1321
Rogers D, Hahn M (2010) Extended-connectivity fingerprints. J Chem Inf Model 50(5):742–754
Durant JL, Leland BA, Henry DR, Nourse JG (2002) Reoptimization of MDL keys for use in drug discovery. J Chem Inf Comput Sci 42:1273–1280
Szymanski P, Kajdanowicz T (2017) A scikit-based Python environment for performing multi-label classification. arXiv:1702.01460
Kingma DP, Ba J (2015) Adam: a method for stochastic optimization. arXiv:1412.6980
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bonetta Valentino, R., Ebejer, JP., Valentino, G. (2023). Machine Learning Using Neural Networks for Metabolomic Pathway Analyses. In: Selvarajoo, K. (eds) Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology. Methods in Molecular Biology, vol 2553. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2617-7_17
DOI: https://doi.org/10.1007/978-1-0716-2617-7_17
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2616-0
Online ISBN: 978-1-0716-2617-7
eBook Packages: Springer Protocols