Abstract
Deep learning has emerged as a powerful tool for solving complex problems in biology, including the reconstruction of gene regulatory networks. These networks consist of transcription factors and their regulatory associations with target genes. Although deep learning methods are useful for studying gene expression and regulation, they remain largely inaccessible to many biologists, mainly because they require programming skills and a nuanced understanding of the underlying algorithms. This chapter presents a deep learning protocol that uses TensorFlow and the Keras API in R/RStudio, with the aim of making deep learning accessible to researchers without specialized expertise. The protocol focuses on the genome-wide prediction of regulatory interactions between transcription factors and genes, leveraging publicly available gene expression data together with well-established benchmarks. It covers the key phases of the workflow: data preprocessing, design of neural network architectures, iterative model training and validation, and prediction of novel regulatory interactions. It also provides guidance on tuning the parameters of deep learning models. By following this protocol, researchers should gain a comprehensive understanding of how to apply deep learning techniques to predict regulatory interactions. The protocol is readily adaptable to diverse research problems, enabling scientists to harness deep learning effectively in their own investigations.
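The preprocessing phase casts regulatory-interaction prediction as supervised learning over transcription factor and gene pairs. The sketch below shows one simple way to build such a dataset. It is only an illustration. The abstract does not specify the chapter's feature design, and the chapter itself works in R with Keras rather than Python. The sketch makes four assumptions: each gene's expression profile is z-scored across samples; a pair's feature vector is the TF profile concatenated with the target-gene profile; pairs listed in a benchmark count as positives; and all other pairs are treated as negatives. The names `zscore`, `build_pair_dataset`, `expr`, `tfs`, and `benchmark` are hypothetical.

```python
import math
from itertools import product


def zscore(values):
    """Standardize one gene's profile to mean 0, SD 1 (population SD)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        return [0.0] * n  # a flat profile carries no information
    return [(v - mean) / sd for v in values]


def build_pair_dataset(expr, tfs, benchmark):
    """Turn an expression matrix into labeled TF-gene pair examples (hypothetical helper).

    expr: dict mapping gene name -> expression values (same samples, same order)
    tfs: iterable of transcription factor names present in expr
    benchmark: set of (tf, target) pairs assumed to be known interactions
    Returns (pairs, X, y): pair names, feature vectors, and 0/1 labels.
    """
    scaled = {gene: zscore(vals) for gene, vals in expr.items()}
    pairs, X, y = [], [], []
    # Genome-wide: every TF is paired with every other gene in the matrix
    for tf, gene in product(sorted(tfs), sorted(expr)):
        if tf == gene:
            continue  # skip self-pairs
        pairs.append((tf, gene))
        # Assumed feature design: TF profile followed by target-gene profile
        X.append(scaled[tf] + scaled[gene])
        # Assumption: pairs absent from the benchmark are labeled negative
        y.append(1 if (tf, gene) in benchmark else 0)
    return pairs, X, y


if __name__ == "__main__":
    # Toy data (hypothetical): three genes measured in four samples
    expr = {
        "TF1": [1.0, 2.0, 3.0, 4.0],
        "G1": [2.0, 4.0, 6.0, 8.0],
        "G2": [5.0, 5.0, 5.0, 5.0],
    }
    pairs, X, y = build_pair_dataset(expr, tfs=["TF1"], benchmark={("TF1", "G1")})
    for pair, label in zip(pairs, y):
        print(pair, label)
```

The resulting `X` and `y` are the kind of matrix and label vector a feed-forward network would be trained on. In real data, known interactions are far outnumbered by unlabeled pairs, so the resulting class imbalance usually has to be addressed before training.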
Copyright information
© 2024 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Muley, V.Y. (2024). Deep Learning for Predicting Gene Regulatory Networks: A Step-by-Step Protocol in R. In: Mandal, S. (eds) Reverse Engineering of Regulatory Networks. Methods in Molecular Biology, vol 2719. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-3461-5_15
DOI: https://doi.org/10.1007/978-1-0716-3461-5_15
Published:
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-3460-8
Online ISBN: 978-1-0716-3461-5
eBook Packages: Springer Protocols