Abstract
The massive amount of experimental DNA and RNA sequence information provides an encyclopedia for cell biology, but one that requires computational tools for efficient interpretation. The ability to write and apply simple computing scripts takes the investigator beyond the limits of online analysis tools, making it possible to interrogate laboratory data more broadly and to integrate them with available datasets to test and challenge hypotheses. Here we describe robust prototypic bash and C++ scripts, with metrics and methods for validation, that we have made publicly available to address the roles of non-B DNA-forming motifs in eliciting genetic instability and to query The Cancer Genome Atlas. Importantly, the methods presented provide practical data interpretation tools to examine fundamental relationships and to enable insights into correlations between alterations in gene expression patterns and patient outcome. The example source code described is simple and can be efficiently modified, elaborated, and applied to other relationships and areas of investigation.
Acknowledgments
This work was supported by National Institutes of Health grants P01 CA092584 and R35 CA220430, by Cancer Prevention and Research Institute of Texas grant RP180813, and by a Robert A. Welch Chemistry Chair to J.A.T. The research used Bridges/Bridges2 at the Pittsburgh Supercomputing Center through the Extreme Science and Engineering Discovery Environment (XSEDE), which are supported by National Science Foundation grants ACI-1445606 and ACI-1548562, and the Texas Advanced Computing Center, which is supported by National Science Foundation grant ACI-1134872.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Bacolla, A., Tainer, J.A. (2022). Robust Computational Approaches to Defining Insights on the Interface of DNA Repair with Replication and Transcription in Cancer. In: Mosammaparast, N. (eds) DNA Damage Responses. Methods in Molecular Biology, vol 2444. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2063-2_1
DOI: https://doi.org/10.1007/978-1-0716-2063-2_1
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2062-5
Online ISBN: 978-1-0716-2063-2
eBook Packages: Springer Protocols