Abstract
Cancer develops primarily because of somatic alterations in the genome. Advances in sequencing have enabled large-scale studies across many tumor types, with an emphasis on discovering alterations in protein-coding genes. However, the protein-coding exome comprises less than 2% of the human genome. Here we analyze the complete genome sequences of 863 human tumors from The Cancer Genome Atlas and other sources to systematically identify noncoding regions that are recurrently mutated in cancer. We use new frequency- and sequence-based approaches to comprehensively scan the genome for noncoding mutations with potential regulatory impact. These methods identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter. SDHD promoter mutations are frequent in melanoma and are associated with reduced gene expression and poor prognosis. The non-protein-coding cancer genome remains largely unexplored, and our findings represent a step toward targeting the entire genome for clinical purposes.
Acknowledgements
The authors thank TCGA as well as numerous other groups who have made their data available for public analysis. The authors additionally thank the CGHub team, members of the Memorial Sloan Kettering Cancer Center Bioinformatics Core, G. Rätsch and J. Chodera for the setup and administration of high-performance computing resources. This work was supported in part by a TCGA GDAC grant from the US National Institutes of Health for N.W., N.S., C.S. and W.L. (NCI U24-CA143840) and grants from the Danish Research Council and the Carlsberg Foundation for A.J.
Author information
Contributions
Project planning and design: N.W., A.J., N.S., C.S. and W.L. Method design and data analysis: N.W., A.J. and W.L. Manuscript writing and figures: N.W., A.J. and W.L. All authors reviewed and edited the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 Number of mutations per cancer type.
Total number of mutations per sample (including coding regions), grouped by cancer type. Supplementary Table 2 provides definitions for the cancer type abbreviations. The red line represents the 500,000-mutation limit; the five samples above that line were excluded from further analysis.
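The filter described in this caption can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the sample identifiers and counts are invented, and the function name `filter_hypermutated` is hypothetical. Only the 500,000-mutation limit comes from the caption.

```python
# Hypothetical sketch of the sample filter described for Supplementary Figure 1.
# Only the 500,000 limit is taken from the caption; everything else is assumed.
MUTATION_LIMIT = 500_000


def filter_hypermutated(mutation_counts, limit=MUTATION_LIMIT):
    """Split samples into (kept, removed) by total mutation count.

    Samples strictly above `limit` are removed, mirroring the caption's
    "samples above that line".
    """
    kept = {s: n for s, n in mutation_counts.items() if n <= limit}
    removed = {s: n for s, n in mutation_counts.items() if n > limit}
    return kept, removed


# Invented example data: sample ID -> total somatic mutations.
counts = {"sample_A": 12_340, "sample_B": 742_118, "sample_C": 500_000}
kept, removed = filter_hypermutated(counts)
print(sorted(kept))
print(sorted(removed))
```

Treating a count exactly at the limit as retained is an assumption; the caption only states that samples above the line were filtered.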
Supplementary Figure 2 TERT expression analysis.
Expression differences between wild-type patients (without a regulatory mutation) and patients with a TERT promoter mutation in melanoma (top left), low-grade glioma (top right), glioblastoma (bottom left) and across all tumor types with TERT promoter mutations (bottom right). The wild-type (WT) set of samples in each panel consists of all samples without a promoter mutation. The pan-cancer analysis excluded non-diploid samples from the wild-type cohort.
Supplementary Figure 3 PLEKHS1 promoter mutations are in a palindromic sequence.
Palindromic sequence containing mutations in the PLEKHS1 promoter (mutated positions are highlighted in red) and potential secondary structure for this sequence.
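In DNA, a palindromic sequence is one that equals its own reverse complement, so a single strand can pair with itself and form a hairpin. The check below is a minimal sketch of that definition. The test sequences are textbook toy examples, not the PLEKHS1 promoter sequence (which is not reproduced here), and the helper names are hypothetical.

```python
# Minimal sketch: test whether a DNA sequence is a reverse-complement palindrome.
# The sequences used below are toy examples, NOT the PLEKHS1 promoter sequence.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same as its reverse complement."""
    return seq.upper() == reverse_complement(seq)


print(is_palindromic("GAATTC"))   # EcoRI recognition site, a classic palindrome
print(is_palindromic("GATTACA"))  # odd length, so it cannot be one
```

A perfect palindrome is the strictest case. Hairpin-forming regions often have a short unpaired loop between the two arms, which this exact-match check would not detect.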
Supplementary Figure 4 PLEKHS1 expression analysis.
Expression differences between wild-type patients (without a regulatory mutation) and patients with a PLEKHS1 promoter mutation in bladder cancer (left) and in samples from all tumor types (right). The pan-cancer analysis excluded non-diploid samples from the wild-type cohort.
Supplementary Figure 5 WDR74 expression analysis.
Expression differences between wild-type patients and patients with a WDR74 promoter mutation for samples across all tumor types. The wild-type cohort included only samples that were diploid at the WDR74 locus.
Supplementary Figure 6 ELF1 transcription factor binding site.
Human ELF1 transcription factor binding site from the Jaspar database, showing the highly conserved TTCC core response element.
Supplementary Figure 7 Expression correlation between SDHD and transcription factors.
Left, mRNA expression levels of SDHD and ETS1 are not correlated in patients with an SDHD promoter mutation (P = 0.21) or in patients without one (P = 0.69). Right, mRNA expression levels of SDHD and EHF are not correlated in patients with an SDHD promoter mutation (P = 0.051) or in patients without one (P = 0.77).
Supplementary Figure 8 Hotspot analysis on individual tumor types compared to a pan-cancer cohort.
Hotspot analysis on six subsets of the total data set (astr, brca, brain (lgg + gbm), lihc, luad and medu) identifies potential cancer-specific candidates (NCOA7 in brca and ANKRD30BL in lihc), although the majority of hits are more apparent in a pan-cancer setting (WDR74).
Supplementary Figure 9 Regional recurrence analysis on individual tumor types compared to a pan-cancer cohort.
Regional recurrence analysis on six subsets of the total data set (astr, brca, brain (lgg + gbm), lihc, luad and medu) identifies potential cancer-specific candidates, such as the PADI2-PADI4 enhancer.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–9 (PDF 413 kb)
Supplementary Tables 1–18
Supplementary Tables 1–18 (XLSX 253 kb)
About this article
Cite this article
Weinhold, N., Jacobsen, A., Schultz, N. et al. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat Genet 46, 1160–1165 (2014). https://doi.org/10.1038/ng.3101
DOI: https://doi.org/10.1038/ng.3101
- Springer Nature America, Inc.