Abstract
Since its inception, deep learning has revolutionized machine learning and data-driven science. One data-driven science transformed by deep learning is genomics. In the past decade, numerous genomics studies have adopted deep learning, with applications ranging from predicting regulatory elements to classifying cancer. Despite its strong performance in these applications, deep learning is not without drawbacks, and a prominent shortcoming is its lack of interpretability. The main objective of this study is therefore to address this obstacle in deep learning cancer classification. Here we apply a feature importance scoring method, gradient-weighted class activation mapping (Grad-CAM), to a quasi-recurrent neural network model that classifies cancer from FASTA sequencing data. We formulate a nucleotide-to-genomic-region Grad-CAM scoring methodology and validate its use for the chosen model. This enables Grad-CAM scoring to be used for feature importance in deep learning cancer classification. Our results identify potential novel candidate genes, genomic elements, and mechanisms for future cancer research.
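The abstract does not spell out the scoring details, so the following Python sketch is illustrative only. It shows the standard Grad-CAM computation (channel weights from position-averaged gradients, a weighted sum of feature maps, then a ReLU) applied to a one-dimensional sequence feature map, followed by one plausible way to map layer-resolution scores back to nucleotides and summarize them per genomic region. It assumes the layer activations and their gradients with respect to the class score have already been extracted from the model. The linear interpolation, the mean-per-region aggregation, the function names `grad_cam_1d`, `to_nucleotides` and `region_scores`, and the example intervals are hypothetical choices, not the authors' method.

```python
import numpy as np

def grad_cam_1d(activations: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM for a 1D (sequence) feature map.

    activations, gradients: shape (K, L_layer), i.e. K channels over L_layer
    positions of the chosen layer; gradients are d(class score)/d(activations).
    Returns a non-negative importance map of length L_layer.
    """
    # Channel weights: average the gradients over sequence positions.
    alphas = gradients.mean(axis=1)               # shape (K,)
    # Weighted sum of channels, then ReLU to keep only positive evidence.
    return np.maximum(alphas @ activations, 0.0)  # shape (L_layer,)

def to_nucleotides(cam: np.ndarray, seq_len: int) -> np.ndarray:
    """Hypothetical step: linearly interpolate a layer-resolution map
    back to one score per input nucleotide."""
    layer_pos = np.linspace(0, seq_len - 1, num=cam.size)
    return np.interp(np.arange(seq_len), layer_pos, cam)

def region_scores(nt_scores, regions):
    """Hypothetical aggregation: mean nucleotide score within each
    non-empty half-open (name, start, end) interval."""
    return {name: float(nt_scores[start:end].mean()) for name, start, end in regions}

# Toy usage with random activations and a made-up classification head.
rng = np.random.default_rng(0)
K, L_layer, seq_len = 4, 10, 40
A = rng.random((K, L_layer))
w = np.array([1.0, -0.5, 2.0, 0.0])  # hypothetical class weights of a linear head
# For y = sum_k w_k * mean_l A[k, l] (global average pooling + linear layer),
# the gradient is dy/dA[k, l] = w_k / L_layer at every position.
grads = np.repeat((w / L_layer)[:, None], L_layer, axis=1)

cam = grad_cam_1d(A, grads)
nt = to_nucleotides(cam, seq_len)
regions = [("region_A", 0, 20), ("region_B", 20, 40)]  # hypothetical intervals
print(region_scores(nt, regions))
```

The toy head (global average pooling followed by a linear layer) is used only because its gradient is analytic, which keeps the sketch free of a deep learning framework; for a real quasi-recurrent network the gradients would instead be obtained by backpropagation.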
Yue Yang (Alan) Teo and Artem Danilevsky are equal contributors
Copyright information
© 2021 Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Teo, Y.Y.(A.), Danilevsky, A., Shomron, N. (2021). Overcoming Interpretability in Deep Learning Cancer Classification. In: Shomron, N. (ed) Deep Sequencing Data Analysis. Methods in Molecular Biology, vol 2243. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-1103-6_15
DOI: https://doi.org/10.1007/978-1-0716-1103-6_15
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-1102-9
Online ISBN: 978-1-0716-1103-6
eBook Packages: Springer Protocols