Abstract
Integrative methods are continuously evolving to extract accurate insights from data generated by experiments on various biological systems. Multi-omics data can be integrated with predictive machine learning (ML) algorithms to produce highly accurate results. This protocol chapter describes the steps required to apply ML-based multi-omics integration methods to biological datasets, covering both the analysis and the visual interpretation of the results.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Niranjan, V., Uttarkar, A., Kaul, A., Varghese, M. (2023). A Machine Learning-Based Approach Using Multi-omics Data to Predict Metabolic Pathways. In: Selvarajoo, K. (eds) Computational Biology and Machine Learning for Metabolic Engineering and Synthetic Biology. Methods in Molecular Biology, vol 2553. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2617-7_19
DOI: https://doi.org/10.1007/978-1-0716-2617-7_19
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2616-0
Online ISBN: 978-1-0716-2617-7
eBook Packages: Springer Protocols