Machine Learning from Omics Data

Rex, René

doi:10.1007/978-1-0716-1787-8_18

Part of the book series: Methods in Molecular Biology (MIMB, volume 2390)

Abstract

Machine learning (ML) already accelerates discoveries in many scientific fields and drives several new products. Recently, growing sample sizes have enabled the use of ML approaches in larger omics studies. This work provides a guide through a typical ML analysis of an omics dataset. As an example, this chapter demonstrates how to build a model that predicts Drug-Induced Liver Injury (DILI) from the transcriptomics data contained in the LINCS L1000 dataset. Each section covers best practices and pitfalls, from data exploration through model training, including hyperparameter search, to validation and analysis of the final model. The code to reproduce the results is available at https://github.com/Evotec-Bioinformatics/ml-from-omics.
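
The workflow outlined in the abstract (hold out data for final validation, search hyperparameters by cross-validation on the remainder, then score the chosen model once) can be illustrated with a minimal sketch. This is not the chapter's code: the synthetic "expression profiles", the k-nearest-neighbour classifier, and every function name below are assumptions chosen only to keep the example dependency-free. The chapter's own implementation is in the repository linked above.

```python
import random
from statistics import mean


def make_synthetic_profiles(n_samples=120, n_genes=20, seed=0):
    """Hypothetical stand-in for expression profiles (NOT LINCS L1000 data).

    Two classes; the first five 'genes' are shifted upward for class 1.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(1.0 * label if g < 5 else 0.0, 1.0) for g in range(n_genes)]
        X.append(row)
        y.append(label)
    return X, y


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if 2 * votes > k else 0


def accuracy(X_train, y_train, X_eval, y_eval, k):
    """Fraction of evaluation samples classified correctly."""
    hits = sum(knn_predict(X_train, y_train, x, k) == t for x, t in zip(X_eval, y_eval))
    return hits / len(y_eval)


def cross_validate(X, y, k, n_folds=5):
    """Mean accuracy over n_folds folds for one hyperparameter value k."""
    scores = []
    for f in range(n_folds):
        held_out = set(range(f, len(X), n_folds))
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(X)) if i not in held_out]
        X_va = [X[i] for i in sorted(held_out)]
        y_va = [y[i] for i in sorted(held_out)]
        scores.append(accuracy(X_tr, y_tr, X_va, y_va, k))
    return mean(scores)


X, y = make_synthetic_profiles()

# 1. Set aside a test set that the hyperparameter search never sees.
indices = list(range(len(X)))
random.Random(1).shuffle(indices)
n_test = len(indices) // 5
test_idx, dev_idx = indices[:n_test], indices[n_test:]
X_dev, y_dev = [X[i] for i in dev_idx], [y[i] for i in dev_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]

# 2. Grid search over the single hyperparameter k (odd values avoid vote ties).
grid = [1, 3, 5, 7, 9]
cv_scores = {k: cross_validate(X_dev, y_dev, k) for k in grid}
best_k = max(cv_scores, key=cv_scores.get)

# 3. Evaluate the selected configuration once on the untouched test set.
test_accuracy = accuracy(X_dev, y_dev, X_test, y_test, best_k)
print(f"best k = {best_k}, test accuracy = {test_accuracy:.2f}")
```

Keeping the test split outside the search loop is the point of the sketch: the cross-validation scores guide model selection, while the final number is computed on data that played no part in that choice.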

Acknowledgments

I am deeply grateful to my wife Sophia Rex and others for proofreading the final version of the manuscript. Additionally, I acknowledge the indispensable support of my parents Carmen and Michael Rex during the ongoing pandemic. Furthermore, I thank Thomas Siegmund, who is my superior at Evotec, for his constant encouragement and for giving me the opportunity to create this work.

Author information

Authors and Affiliations

Evotec International GmbH, Göttingen, Germany
René Rex

Corresponding author

Correspondence to René Rex.

Editor information

Editors and Affiliations

Computational Drug Discovery, Evotec (UK) Ltd., Abingdon, Oxfordshire, UK
Alexander Heifetz

About this protocol

Cite this protocol

Rex, R. (2022). Machine Learning from Omics Data. In: Heifetz, A. (eds) Artificial Intelligence in Drug Design. Methods in Molecular Biology, vol 2390. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1787-8_18

DOI: https://doi.org/10.1007/978-1-0716-1787-8_18
Published: 04 November 2021
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1786-1
Online ISBN: 978-1-0716-1787-8
eBook Packages: Springer Protocols
