Machine Learning from Omics Data

  • Protocol
Artificial Intelligence in Drug Design

Part of the book series: Methods in Molecular Biology (MIMB, volume 2390)

Abstract

Machine learning (ML) already accelerates discoveries in many scientific fields and drives several new products. Recently, growing sample sizes have enabled the use of ML approaches in larger omics studies. This chapter provides a guide through a typical ML analysis of an omics dataset. As an example, it demonstrates how to build a model that predicts drug-induced liver injury (DILI) from transcriptomics data in the LINCS L1000 dataset. Each section covers best practices and pitfalls, from data exploration through model training, including hyperparameter search, to validation and analysis of the final model. The code to reproduce the results is available at https://github.com/Evotec-Bioinformatics/ml-from-omics.

Acknowledgments

I am deeply grateful to my wife Sophia Rex and others for proofreading the final version of the manuscript. Additionally, I acknowledge the indispensable support of my parents Carmen and Michael Rex during the ongoing pandemic. Furthermore, I thank Thomas Siegmund, who is my superior at Evotec, for his constant encouragement and for giving me the opportunity to create this work.

Author information

Corresponding author

Correspondence to René Rex.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Rex, R. (2022). Machine Learning from Omics Data. In: Heifetz, A. (eds) Artificial Intelligence in Drug Design. Methods in Molecular Biology, vol 2390. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1787-8_18

  • DOI: https://doi.org/10.1007/978-1-0716-1787-8_18

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-1786-1

  • Online ISBN: 978-1-0716-1787-8

  • eBook Packages: Springer Protocols
