Genome-wide analysis of noncoding regulatory mutations in cancer

  • Analysis

From Nature Genetics

Cancer primarily develops because of somatic alterations in the genome. Advances in sequencing have enabled large-scale sequencing studies across many tumor types, emphasizing the discovery of alterations in protein-coding genes. However, the protein-coding exome comprises less than 2% of the human genome. Here we analyze the complete genome sequences of 863 human tumors from The Cancer Genome Atlas and other sources to systematically identify noncoding regions that are recurrently mutated in cancer. We use new frequency- and sequence-based approaches to comprehensively scan the genome for noncoding mutations with potential regulatory impact. These methods identify recurrent mutations in regulatory elements upstream of PLEKHS1, WDR74 and SDHD, as well as previously identified mutations in the TERT promoter. SDHD promoter mutations are frequent in melanoma and are associated with reduced gene expression and poor prognosis. The non-protein-coding cancer genome remains widely unexplored, and our findings represent a step toward targeting the entire genome for clinical purposes.

Figure 1: Summary of data and methods.
Figure 2: Hotspot analysis.
Figure 3: Regional recurrence analysis.
Figure 4: Transcription factor analysis.

The authors thank TCGA as well as numerous other groups who have made their data available for public analysis. The authors additionally thank the CGHub team, members of the Memorial Sloan Kettering Cancer Center Bioinformatics Core, G. Rätsch and J. Chodera for the setup and administration of high-performance computing resources. This work was supported in part by a TCGA GDAC grant from the US National Institutes of Health for N.W., N.S., C.S. and W.L. (NCI U24-CA143840) and grants from the Danish Research Council and the Carlsberg Foundation for A.J.

Author information

Project planning and design: N.W., A.J., N.S., C.S. and W.L. Method design and data analysis: N.W., A.J. and W.L. Manuscript writing and figures: N.W., A.J. and W.L. All authors reviewed and edited the final manuscript.

Corresponding author

Correspondence to William Lee.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Number of mutations per cancer type.

Total number of mutations per sample (including coding regions), grouped by cancer type. Supplementary Table 2 provides definitions for the cancer type abbreviations. The red line represents the 500,000-mutation limit, and the 5 samples above that line were filtered from further analysis.
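The per-sample filter described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `filter_hypermutated` and the input layout (one sample ID per mutation call) are assumptions. Only the 500,000-mutation cutoff comes from the caption.

```python
from collections import Counter

# Cutoff stated in the caption; samples above it were filtered out.
MUTATION_LIMIT = 500_000


def filter_hypermutated(mutation_sample_ids, limit=MUTATION_LIMIT):
    """Split samples into (kept, dropped) by total mutation count.

    `mutation_sample_ids` is a hypothetical input: an iterable holding
    one sample ID per mutation call (coding and noncoding alike).
    """
    counts = Counter(mutation_sample_ids)
    # "Above that line" is read here as strictly greater than the limit.
    dropped = {sample for sample, n in counts.items() if n > limit}
    kept = set(counts) - dropped
    return kept, dropped


# Toy usage with a small limit so the behavior is easy to see.
calls = ["s1", "s1", "s1", "s2", "s3", "s3"]
kept, dropped = filter_hypermutated(calls, limit=2)
```

With the toy limit of 2, sample `s1` (three calls) is dropped and `s2` and `s3` are kept.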

Supplementary Figure 2 TERT expression analysis.

Expression differences between wild-type patients (without a regulatory mutation) and patients with a TERT promoter mutation for melanoma (top left), low-grade glioma (top right), glioblastoma (bottom left) and all tumor types with a TERT promoter mutation (bottom right). The wild-type (WT) set of samples in each panel consists of all samples without a promoter mutation. The pan-cancer analysis excluded non-diploid samples from the wild-type cohort.

Supplementary Figure 3 PLEKHS1 promoter mutations are in a palindromic sequence.

Palindromic sequence containing mutations in the PLEKHS1 promoter (mutated positions are highlighted in red) and potential secondary structure for this sequence.
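In DNA, a palindromic sequence is one that equals its own reverse complement, which is what allows a single strand to fold back on itself. The check below is a minimal sketch of that definition. The helper names are illustrative, and the example sequences are generic test strings, not the PLEKHS1 promoter sequence. It tests exact palindromes only and does not model hairpins with an unpaired spacer.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands (5' to 3')."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


# Generic examples: GAATTC is a well-known palindromic site; GATTACA is not.
palindrome_hit = is_palindromic("GAATTC")
palindrome_miss = is_palindromic("GATTACA")
```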

Supplementary Figure 4 PLEKHS1 expression analysis.

Expression differences between wild-type patients (without a regulatory mutation) and patients with a PLEKHS1 promoter mutation for bladder cancer (left) and samples from all tumor types (right). The pan-cancer analysis excluded non-diploid samples from the wild-type cohort.

Supplementary Figure 5 WDR74 expression analysis.

Expression differences between wild-type patients and patients with a WDR74 promoter mutation for samples across all tumor types. The wild-type cohort included only samples that were diploid at the WDR74 locus.

Supplementary Figure 6 ELF1 transcription factor binding site.

Human ELF1 transcription factor binding site from the Jaspar database, showing the highly conserved TTCC core response element.
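A simple way to see where a core element such as TTCC occurs is to scan a sequence on both strands, since a site on the reverse strand appears as GGAA on the forward strand. The sketch below does only an exact string match for the core. It is not a position weight matrix scan and is not the sequence-based method used in the paper. The function name and example sequence are made up for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_core_sites(seq, core="TTCC"):
    """Return (position, strand) pairs for exact matches of `core`.

    Positions are 0-based offsets on the forward strand; "-" marks a
    match to the reverse complement of the core (GGAA for TTCC).
    """
    seq = seq.upper()
    core = core.upper()
    rc_core = core.translate(_COMPLEMENT)[::-1]
    hits = []
    for i in range(len(seq) - len(core) + 1):
        window = seq[i:i + len(core)]
        if window == core:
            hits.append((i, "+"))
        elif window == rc_core:
            hits.append((i, "-"))
    return hits


# Made-up example sequence containing one site on each strand.
sites = find_core_sites("ACTTCCGGGAAT")
```

A mutation that creates or destroys one of these exact matches would change the list returned for the reference and mutated sequences, which is one hedged way to flag candidate binding-site changes.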

Supplementary Figure 7 Expression correlation between SDHD and transcription factors.

Left, mRNA expression levels of SDHD and ETS1 are not correlated in patients with an SDHD promoter mutation (P = 0.21) or in patients without one (P = 0.69). Right, mRNA expression levels of SDHD and EHF are not correlated in patients with an SDHD promoter mutation (P = 0.051) or in patients without one (P = 0.77).

Supplementary Figure 8 Hotspot analysis on individual tumor types compared to a pan-cancer cohort.

Hotspot analysis on six subsets of the total data set (astr, brca, brain (lgg + gbm), lihc, luad and medu) identifies potential cancer-specific candidates (NCOA7 in brca and ANKRD30BL in lihc), although the majority of hits are more apparent in a pan-cancer setting (WDR74).

Supplementary Figure 9 Regional recurrence analysis on individual tumor types compared to a pan-cancer cohort.

Regional recurrence analysis on six subsets of the total data set (astr, brca, brain (lgg + gbm), lihc, luad and medu) identifies potential cancer-specific candidates, such as the PADI2-PADI4 enhancer.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–9 (PDF 413 kb)

Supplementary Tables 1–18

Supplementary Tables 1–18 (XLSX 253 kb)

Weinhold, N., Jacobsen, A., Schultz, N. et al. Genome-wide analysis of noncoding regulatory mutations in cancer. Nat Genet 46, 1160–1165 (2014).

