Abstract
Chromosomal instability (CIN) results in the accumulation of large-scale losses, gains and rearrangements of DNA1. The broad genomic complexity caused by CIN is a hallmark of cancer2; however, there is no systematic framework to measure different types of CIN and their effect on clinical phenotypes pan-cancer. Here we evaluate the extent, diversity and origin of CIN across 7,880 tumours representing 33 cancer types. We present a compendium of 17 copy number signatures that characterize specific types of CIN, with putative aetiologies supported by multiple independent data sources. The signatures predict drug response and identify new drug targets. Our framework refines the understanding of impaired homologous recombination, which is one of the most therapeutically targetable types of CIN. Our results illuminate a fundamental structure underlying genomic complexity in human cancers and provide a resource to guide future CIN research.
Main
CIN has complex consequences, including loss or amplification of driver genes, focal rearrangements, extrachromosomal DNA, micronuclei formation and activation of innate immune signalling1. This leads to associations with disease stage, metastasis, poor prognosis and therapeutic resistance3. The causes of CIN are also diverse and include mitotic errors, replication stress, homologous recombination deficiency (HRD), telomere crisis and breakage fusion bridge cycles, among others1,4.
Because of the diversity of these causes and consequences, CIN is generally used as an umbrella term. Measures of CIN either divide tumours into broad categories of high or low CIN5, are restricted to a single aetiology such as HRD6, are limited to a particular genomic feature such as whole-chromosome-arm changes7, or can only be quantified in specific cancer types8,9. As a result, there is no systematic framework to comprehensively characterize the diversity, extent and origins of CIN pan-cancer, or to define how different types of CIN within a tumour relate to clinical phenotypes. Here we present a robust analysis framework to quantitatively measure different types of CIN across cancer types.
Deconstructing CIN
We derived 7,880 high-quality absolute copy number profiles across 33 tumour types using single-nucleotide polymorphism (SNP) array data from The Cancer Genome Atlas (TCGA) (Extended Data Fig. 1a). Extending our previously developed framework for quantifying signatures of CIN in ovarian cancer8, we determined that 6,335 of the 7,880 samples (80%) had detectable CIN and were suitable for pan-cancer detection of copy number signatures (Extended Data Fig. 1b). This estimate was consistent with previous pan-cancer estimates of CIN10 (Extended Data Fig. 1c–e).
Using these 6,335 genome-wide copy number profiles, we computed distributions of five fundamental copy number features previously demonstrated to encode patterns of copy number changes that represent different underlying causes of CIN8 (Extended Data Fig. 2a and Supplementary Methods). These features included: the copy number change between a segment and the neighbouring segment; segment length; breakpoint count per 10 Mb; breakpoint count per chromosome arm; and length of chains of oscillating copy number states. Only segments that deviated from a normal, diploid state were considered for the segment size and changepoint features. We did not include a feature representing the copy number of a segment to avoid redundant signatures that encode the same aetiology across different ploidy backgrounds.
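As a rough illustration of how such feature distributions might be collected from a segmented profile, the Python sketch below computes simplified versions of three of the five features (segment length, copy number change relative to the neighbouring segment, and breakpoint count per 10 Mb). It is not the implementation used in this study. The tuple layout, the function name `extract_features`, the use of the left-hand neighbour for the changepoint feature and the omission of bins without breakpoints are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

DIPLOID = 2             # copy number of a normal, diploid segment
BIN_SIZE = 10_000_000   # 10 Mb windows for the breakpoint-density feature


def extract_features(segments):
    """Collect three simplified copy number features for one tumour.

    `segments` is a list of (chrom, start, end, copy_number) tuples,
    sorted by position within each chromosome (hypothetical layout).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, cn in segments:
        by_chrom[chrom].append((start, end, cn))

    segment_sizes, changepoints = [], []
    breakpoints_per_bin = Counter()

    for chrom, segs in by_chrom.items():
        for i, (start, end, cn) in enumerate(segs):
            prev_cn = segs[i - 1][2] if i > 0 else None
            # Only non-diploid segments contribute to size and changepoint.
            if cn != DIPLOID:
                segment_sizes.append(end - start)
                if prev_cn is not None:
                    changepoints.append(abs(cn - prev_cn))
            # A breakpoint is any position where the copy number changes.
            if prev_cn is not None and cn != prev_cn:
                breakpoints_per_bin[(chrom, start // BIN_SIZE)] += 1

    return {
        "segment_size": segment_sizes,
        "changepoint": changepoints,
        # Bins without breakpoints are omitted in this sketch.
        "breakpoints_per_10mb": dict(breakpoints_per_bin),
    }


example = [
    ("1", 0, 5_000_000, 2),
    ("1", 5_000_000, 12_000_000, 3),
    ("1", 12_000_000, 30_000_000, 2),
    ("2", 0, 40_000_000, 4),
]
features = extract_features(example)
```

The remaining two features (breakpoint count per chromosome arm and length of oscillating chains) would need centromere positions and a chain-detection rule, which are not specified here.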
We applied mixture modelling to define distinct components for each cohort-wide feature distribution, identifying a total of 43 mixture components across the 5 features (Extended Data Fig. 2b, c and Supplementary Methods). Conceptually, these components represent the basic building blocks for defining CIN processes. We used these mixture components to encode each tumour genome by probabilistically assigning copy number events to these components, resulting in a 6,335 × 43 dimensional matrix. We then applied a Bayesian implementation of non-negative matrix factorization to identify copy number signatures (Extended Data Figs. 2d and 3a, b). We first used the complete matrix and found 10 pan-cancer copy number signatures, then used subsets of the matrix representing individual cancer types with at least 100 samples, and found an additional 7 signatures (Extended Data Fig. 3b–e and Supplementary Methods). We merged both sets of signatures and computed their activities using linear combination decomposition to yield a pan-cancer compendium of 17 copy number signatures and their activities in tumours across the 33 cancer types (Extended Data Figs. 3f, g and 4 and Supplementary Figs. 1 and 2).
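The probabilistic encoding step can be sketched as follows. For a single feature, each copy number event receives a posterior probability of belonging to each mixture component, and these probabilities are summed over all events in a tumour. Concatenating the resulting vectors across the five features would give one 43-element row of the input matrix. This is a minimal sketch assuming Gaussian components with known weights, means and standard deviations; the component parameters and the function names are hypothetical, and the distributions actually fitted per feature are described in the Supplementary Methods.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def encode_events(values, components):
    """Sum posterior component responsibilities over all events.

    `values` holds the observed feature values for one tumour and
    `components` is a list of (weight, mean, sd) tuples (assumed known).
    """
    totals = [0.0] * len(components)
    for x in values:
        likelihoods = [w * gaussian_pdf(x, m, s) for w, m, s in components]
        norm = sum(likelihoods)
        if norm == 0.0:
            continue  # event too far from every component to be assigned
        for k, lik in enumerate(likelihoods):
            totals[k] += lik / norm
    return totals


# Hypothetical two-component mixture for one feature.
components = [(0.5, 1.0, 0.5), (0.5, 4.0, 1.0)]
row = encode_events([1.0, 1.2, 4.5], components)
```

Because each event distributes a total probability of one across the components, the entries of `row` sum to the number of events that could be assigned.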
We validated this approach by correctly identifying signatures in a collection of simulated cancer genomes with copy number changes caused by five well-studied mutational processes (Supplementary Figs. 3–6 and Supplementary Methods). We used a second simulation study to derive signature-specific activity thresholds, to test the stability of signature definitions and to test the stability of signature activities (Methods, Extended Data Fig. 5 and Supplementary Fig. 7). We then tested the robustness of our approach across different high-throughput technologies comparing signature definitions and activities across five platforms: SNP 6.0 without matched normal, whole-genome sequencing (WGS) downsampled to SNP 6.0 positions, WGS downsampled to shallow WGS, on-target whole-exome sequencing (WES) and off-target WES. Quantification of signature activity was robust across all platforms. Signature identification was possible across the WGS platforms but performance deteriorated for WES (Extended Data Fig. 6).
Putative causes underlying each signature
To determine the putative causes underlying each of the 17 signatures (named CX1 to CX17), we developed a data integration framework and assigned a confidence score to each signature aetiology based on the quality and extent of supporting data (Extended Data Fig. 7). To propose putative aetiologies, we used the patterns of copy number change encoded by the signature (Extended Data Fig. 4, Supplementary Figs. 8 and 9 and Methods) and signature associations with known cancer driver mutations (Extended Data Fig. 8a and Supplementary Figs. 10–17). We used these driver gene associations as markers for putative pathways involved in the aetiologies and assumed the same pathway deregulation for samples where no driver gene was mutated (similar to how BRCAness is defined in the absence of BRCA1 or BRCA2 mutation11). In many cases, the signature pattern was already suggestive of a mechanism (for example, whole-chromosome missegregation). Once a putative cause was proposed, we sought additional supporting data (Fig. 1, Extended Data Figs. 8 and 9 and Supplementary Methods) including: data from two additional patient cohorts and their clinical metadata (approximately 1,900 patients from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project and approximately 400 patients from the International Cancer Genome Consortium (ICGC) project); five types of mutational signatures (single-base substitution (SBS), insertion–deletion (ID), doublet base substitutions, ovarian copy number and rearrangement); 14 molecular features (somatic point mutations, gene expression, cell cycle score, aneuploidy score, whole-chromosome copy number aberrations (CNAs), tandem duplications, loss of heterozygosity, chromothripsis, kataegis, whole-genome duplication status, telomere length and elongation machinery activity, extrachromosomal DNA and centrosome amplification score (CA20)); and 11 DNA repair-specific features (germline BRCA1/2 mutations, BRCA1 and RAD51C hypermethylation data, HRDetect response, HRD score (Myriad myChoice), TP53 inactivation score, telomeric imbalances score, large-scale state transition score, loss of heterozygosity score, DNA repair proficiency score, protein expression score for 23 DNA-damage repair genes and PCAWG structural variants with associated microhomologies). Here we provide a synthesis of the data supporting the putative aetiologies (summarized in Fig. 2).
Mitotic signatures
CX1, CX6 and CX14 all encoded patterns related to whole-arm or whole-chromosome changes and significantly correlated with direct counts of whole-chromosome changes (Supplementary Fig. 18). This suggested putative causes resulting in chromosome missegregation during mitosis. In agreement with this hypothesis, CX14 had significantly higher activity in tumours with inactivating mutations in CIC12; CX1 with mutations in CIC12, VHL13 and PBRM1 (ref. 14); and CX6 with mutations in CUL1 (ref. 15) and RAC1 (ref. 16) (Extended Data Fig. 8a). Each of the three signatures correlated with downregulation of telomerase activity (Supplementary Fig. 19b), with CX1 also being negatively correlated with telomere length (Supplementary Fig. 19a) and associated with a lack of TERC and TERT amplification and expression (Supplementary Fig. 19c–e, and 20). Therefore, telomere shortening may have a key role in the mechanisms underlying these signatures4. CX1 positively correlated with the ‘clock-like’ SBS1 signature, suggesting that these errors might also be mediated via a natural ageing process such as age-related telomere attrition4 (Extended Data Fig. 8 and Supplementary Fig. 21).
Signatures of impaired homologous recombination
CX2, CX3 and CX5 all exhibited patterns that had previously been shown to associate with impaired homologous recombination (IHR): CX2 showed a pattern of short-to-medium-sized, oscillating changes associated with tandem duplications17; CX5 showed medium-sized events associated with tandem duplication17; and CX3 showed long, single-copy changes with associated loss of heterozygosity18,19 (Extended Data Fig. 4 and Supplementary Figs. 18 and 22). All three signatures were observed at significantly higher levels in tumours with somatic BRCA1 mutation, independently of each other (Extended Data Figs. 8a and 9a and Supplementary Table 12). This suggested varying roles for disruption of HR as underlying causes11. Several lines of evidence supported the link between these signatures and HR: increased CX2, CX3 and CX5 activity across germline-mutated BRCA1 carriers (and BRCA2 carriers for CX3); higher activity in cases with methylated RAD51C (except CX5)20 (Extended Data Fig. 9a); correlation with tandem duplication scores17 (Supplementary Fig. 22), rearrangement signatures 1, 3 and 5 (ref. 21) (Supplementary Fig. 23), SBS3 signature and ID6 (Supplementary Fig. 21), centrosome amplification score22 (Supplementary Fig. 24), and ovarian copy number signatures 3 and 7 (ref. 8) (Supplementary Fig. 25); association with loss of heterozygosity18, chromothripsis23 (except CX3) and kataegis24 (Supplementary Fig. 18); increased utilization of theta-mediated end joining and single-strand annealing backup repair pathways visible as microhomologies at breakpoints11 (Supplementary Fig. 26); as well as correlation with seven HRD metrics25 (Extended Data Fig. 9d). The strength of these associations increased from CX2 to CX5 and to CX3. This suggested an increasing spectrum of CIN complexity associated with disruptions in HR-mediated repair.
Indeed, CX2 appears to be only associated with disruption of HR, whereas CX5 and CX3 have associations that indicate the involvement of replication stress (via amplification and overexpression of MAPK1 (ref. 26), PPP2R1A27 and U2AF1 (ref. 28)). The larger copy number changes observed for CX5 and CX3 suggest faster cell cycling and breaks carried through to mitosis11, which was supported by strong correlation with cell cycle scores (Extended Data Fig. 9b) and increased CNAs estimated to occur during mitosis (Supplementary Fig. 27 and Supplementary Methods). Further associations were observed for CX3, including missense mutations in ERCC2 (ref. 29) and downregulation of key nucleotide excision repair (NER) genes suggesting defects in NER (Extended Data Fig. 9c and Supplementary Fig. 28), as well as TP53 mutation suggesting impaired damage sensing30 (Extended Data Fig. 8a). These CX3 associations are reminiscent of what has been termed BRCAness or HRD11. However, CX5, and especially CX2, appear to represent a more moderate impairment of HR. Therefore, we use the term IHR for the aetiology underlying all three signatures rather than HRD.
Whole-genome duplication signature
CX4 encompassed a unique pattern of copy number change with neighbouring segments separated by two copy changes (Extended Data Fig. 4), a pattern commonly used to define the presence of a whole-genome duplication (WGD) event31. CX4 was also associated with whole-chromosome changes (Extended Data Fig. 8b), a feature commonly observed in tetraploid cells due to increased mitotic errors32. The specific cause of WGD (endoreduplication, errors in cytokinesis or cell fusion33) was not evident from our data; however, this signature had high activity in tumours with PIK3R2, AKT1 and MAPK1 mutations, suggesting that tolerance to WGD may be mediated by PI3K–AKT activation34,35 (Extended Data Fig. 8a).
Signature of impaired non-homologous end joining
CX10 displayed a pattern of clustered and oscillating copy number changes (Extended Data Fig. 4). Its activity was significantly higher in tumours with inactivating mutations in FBXW7 and correlated with FBXW7-mutant-mediated tandem duplication class 1/2 (Extended Data Fig. 8 and Supplementary Fig. 22), suggesting impaired non-homologous end joining17,36 as a putative cause. A significant increase in the proportion of breakpoints with microhomologies in samples with this signature was indicative of a lack of blunt-end joining, which is a hallmark of non-homologous end joining (Supplementary Fig. 29a).
Signatures of amplification
CX8, CX9, CX11 and CX13 encoded patterns of low-level, mid-level, mid-level and high-level amplifications, respectively (Extended Data Fig. 4). Higher activity of CX8 in the context of amplification and overexpression of U2AF1 (ref. 28) and MAPK1 (ref. 26), and for CX9 ERBB3 (ref. 37) (Extended Data Fig. 8), suggested replication stress as a putative cause. All four signatures were associated with increased cell cycle score (Supplementary Fig. 30), reinforcing replication stress as a causal factor. In addition, CX8, CX9 and CX13 were associated with APOBEC mutagenesis (SBS2 and/or SBS13 signatures; Supplementary Fig. 21a), and CX9 and CX11 were associated with ID signatures 1 and 2 (ref. 38) (Supplementary Fig. 21). CX9 copy number changes were not part of oscillating chains; however, the remaining amplification signatures were. CX13 was strongly associated with extrachromosomal DNA circularization and amplification events (Supplementary Fig. 31); however, the specific mechanism causing the extrachromosomal DNA was not evident.
Unknown aetiologies
CX7, CX12, CX15, CX16 and CX17 did not have patterns of copy number change or associations clearly indicative of a putative cause (Extended Data Figs. 4 and 8a). Therefore, these signatures currently have unknown aetiologies.
Cross-signature observations
Many covariates demonstrated associations with multiple signatures. Chromothripsis was linked with seven different signatures (Extended Data Fig. 8), suggesting that many potential aetiologies underpin these complex rearrangements. Replication stress was associated with eight signatures, highlighting it as a major source of CIN (Fig. 2). Different signatures showed a bias for occurrence before WGD (CX1, CX2, CX7 and CX15) or after WGD (CX3, CX5, CX6, CX8, CX9, CX13 and CX17), demonstrating the importance of WGD events in modulating CIN (Extended Data Fig. 8b and Supplementary Fig. 18e, f). Finally, signatures of APOBEC mutagenesis and kataegis were associated with six signatures, highlighting these as a common feature of CIN39 (Extended Data Fig. 8b and Supplementary Figs. 18 and 21).
Drug response prediction and drug target identification
The putative signature aetiologies implicated canonical cancer pathways as some of the major drivers of CIN. Many of these pathways have been the focus of targeted therapy development. Therefore, given that our signatures can be readily measured in tumours from patients, we explored their utility for therapy response prediction and drug target identification. We integrated data from 297 cancer cell lines, including copy number profiling, genome-wide clustered regularly interspaced short palindromic repeat (CRISPR–Cas9) knockout screens, genome-wide RNA interference (RNAi) screens and the profiling relative inhibition simultaneously in mixtures (PRISM) drug repurposing screen (Supplementary Methods). We assessed correlations between signature activities, gene essentiality and sensitivity to drug perturbation of the gene (Fig. 3a).
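The screening logic described above can be sketched in a few lines of Python. For one signature, the activity across cell lines is correlated with the essentiality score of each gene and with the response to a drug targeting that gene, and genes are retained when both correlations are strong and point in the same direction. This is an illustrative sketch only: the use of Pearson correlation, the fixed threshold `min_abs_r` in place of a significance test, the concordant-sign requirement, and the gene labels `GENE_A` and `GENE_B` are assumptions and do not reproduce the statistical procedure in the Supplementary Methods.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / math.sqrt(vx * vy)


def concordant_targets(activity, essentiality, drug_response, min_abs_r=0.5):
    """Return genes whose genetic and drug perturbation both track activity.

    `activity` is one signature's activity per cell line; `essentiality`
    and `drug_response` map gene -> per-cell-line scores in the same order.
    """
    hits = {}
    for gene in essentiality.keys() & drug_response.keys():
        r_gene = pearson(activity, essentiality[gene])
        r_drug = pearson(activity, drug_response[gene])
        strong = abs(r_gene) >= min_abs_r and abs(r_drug) >= min_abs_r
        if strong and r_gene * r_drug > 0:
            hits[gene] = (r_gene, r_drug)
    return hits


# Hypothetical data for five cell lines.
activity = [0.1, 0.2, 0.4, 0.6, 0.9]
essentiality = {
    "GENE_A": [-0.1, -0.3, -0.5, -0.8, -1.1],
    "GENE_B": [-0.5, -0.4, -0.6, -0.5, -0.4],
}
drug_response = {
    "GENE_A": [-0.2, -0.2, -0.6, -0.7, -1.0],
    "GENE_B": [0.3, -0.1, 0.2, -0.2, 0.1],
}
hits = concordant_targets(activity, essentiality, drug_response)
```

In this toy example only `GENE_A` is retained, because both its essentiality and its drug response decrease as signature activity increases.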
We identified 40 genes where copy number signature activity was significantly correlated with both genetic and drug perturbation of the target (Fig. 3b and Supplementary Table 56). Among these, several revealed promising new therapeutic avenues for targeting CIN. CX4 (associated with PI3K–AKT activation) was correlated with response to inhibition of CCND1 via arcyriaflavin-A, which may indicate a therapeutic strategy for reversing tolerance to WGD40. CX5, a signature of IHR, predicted response to olaparib via inhibition of PARP1. Given that this signature was also correlated with RNAi knockdown of PARP1, this may represent a biomarker that is specific to the inhibition of regular protein function rather than PARP trapping41. CX9 (associated with replication stress) was correlated with response to multiple kinase inhibitors targeting genes involved in major mitogenic pathways (EGFR, JAK1, MET, PRKCA and PIK3CA), suggesting that a multikinase inhibitor approach may be suitable for targeting replication stress. Correlation of CX13 (also associated with replication stress) with response to inhibition of CDK4 may potentially represent a biomarker-led approach for improving CDK4/6 inhibitor-mediated tumour sensitization to immune checkpoint blockade42.
Copy number signature correlations with gene essentiality scores from both CRISPR and RNAi perturbation screens identified 104 target genes with druggable structures that currently have no targeted therapies in the clinic (Supplementary Table 57). These represent putative synthetic lethal drug targets, 49 of which had evidence of being implicated in CIN-related mechanisms (Fig. 3c). A number of these show promising links between the signature aetiology and potential consequence of target inhibition. CX1 activity was correlated with perturbation of ACTL6A (involved in the SWI/SNF complex) and TERF1 (involved in telomere maintenance), both of which are required for faithful chromosome segregation during mitosis4,43. The combined dysregulation of mitosis and telomere elongation machinery associated with CX1 suggests that inhibiting either one of these genes might be a promising therapeutic strategy by creating synthetic lethality. Indeed, inhibition of both genes has been previously suggested to induce cell lethality by generating excessive CIN44. CX9 was correlated with perturbation of BUB1B, a spindle assembly checkpoint gene recently identified as therapeutically relevant in CIN-high cells measured via WGD status45 and an aneuploidy score7. This association with CX9 suggests that the spindle assembly checkpoint may have a crucial role in tolerating mid-level amplifications, and reducing levels of BUB1B may induce excessive and catastrophic chromosome missegregation46. Finally, CX11, which was strongly associated with CDK4 amplification, was correlated with inhibition of GNL2, which in turn impedes the formation of the cyclin D1–CDK4 complex47.
Predicting platinum sensitivity
The aetiologies of the three IHR signatures suggested a model of increasing CIN complexity (Fig. 4a and Extended Data Fig. 9). IHR alone gives rise to CX2, a signature of small copy number changes indicative of tandem duplication. IHR plus replication stress leads to CX5, which involves larger CNAs. Finally, IHR plus replication stress, impaired damage sensing and impaired NER gives rise to CX3 with the largest CNAs that are strongly associated with loss of heterozygosity. Our results did not reveal whether the different levels of complexity developed in a stepwise manner or by independent processes.
Disruption of both HR11 and NER48 has been shown to confer sensitivity to platinum-based chemotherapy. Given that only CX3 was associated with disruption of NER, we hypothesized that the IHR signatures may demonstrate differing abilities to predict platinum sensitivity. As patients with ovarian cancer are routinely treated with platinum-based chemotherapy, we tested the ability of all three signatures to predict overall survival, and hence platinum sensitivity, using a Cox proportional hazards model (Fig. 4b and Supplementary Fig. 32). CX2 showed no association with platinum sensitivity, CX5 was predictive of resistance and CX3 was predictive of sensitivity.
Given that these IHR signatures were able to dissect platinum response, we further hypothesized that they could be used in combination to provide better predictors of platinum sensitivity. As CX2 was not predictive, we used it as a baseline for capturing non-predictive IHR-related genomic changes, and required that the predictive CX3 activity exceed it to potentially confer sensitivity. This resulted in a simple classification rule: ‘if CX3 activity is greater than CX2 activity, then predict sensitivity’ (Fig. 4c). This interpretable classifier yielded significant separation in overall survival across cohorts of BRCA1 germline mutant ovarian cancers, ovarian cancers from the TCGA cohort, an independent validation cohort and an oesophageal cancer cohort (also routinely treated with platinum-based chemotherapy) (Fig. 4d, Extended Data Fig. 10 and Supplementary Figs. 33–36). Other classifiers using all three IHR signatures, including more complex machine learning methods, did not outperform this decision rule (Supplementary Fig. 37). Furthermore, this simple classifier had comparable performance to more complex state-of-the-art HRD predictors, which rely on additional data beyond copy number, applied to cohorts of ovarian, oesophageal and breast cancers (Extended Data Fig. 10c, d). By applying this classifier to the whole TCGA ovarian cohort, we estimate that 27% of ovarian tumours might be platinum sensitive. Applying the classifier pan-cancer, we estimate that 8% of all tumours might be sensitive.
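The decision rule is simple enough to state directly in code. The Python sketch below applies it to a small set of hypothetical activity values and reports the fraction of samples predicted to be sensitive. The function names, the dictionary layout and the example activities are illustrative assumptions, and any scaling or normalization of activities applied before the comparison in the original analysis is not reproduced here.

```python
def predict_platinum_sensitive(cx2_activity, cx3_activity):
    """Apply the rule: predict sensitivity if CX3 activity exceeds CX2."""
    return cx3_activity > cx2_activity


def sensitive_fraction(samples):
    """Fraction of samples predicted sensitive.

    `samples` maps a sample identifier to a (CX2, CX3) activity pair.
    """
    if not samples:
        return 0.0
    calls = [predict_platinum_sensitive(cx2, cx3) for cx2, cx3 in samples.values()]
    return sum(calls) / len(calls)


# Hypothetical activities for four tumours.
cohort = {
    "tumour_1": (0.10, 0.35),
    "tumour_2": (0.40, 0.05),
    "tumour_3": (0.20, 0.20),
    "tumour_4": (0.00, 0.15),
}
fraction = sensitive_fraction(cohort)
```

Ties are treated as not sensitive in this sketch, because the rule requires CX3 activity to be strictly greater than CX2 activity.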
Discussion
Here we present a robust analysis framework for CIN in human cancers built on a pan-cancer analysis across 33 cancer types. This resource advances the field in two ways: it untangles CIN according to characteristic genomic patterns and underlying causes, and defines copy number signatures as new biomarkers to quantitatively measure different types of CIN. Our approach complements previous landscape studies of the genetic consequences of CIN49, which generally focused on recurrent somatic copy number events at individual loci. By contrast, copy number signatures8,9 uncover mechanistic biases in the patterns of alterations across all chromosomes.
In its current form, the signature methodology cannot account for selection pressures on CNAs. For single-nucleotide variant signatures, passenger mutations provide strong signals for detection. However, for CNAs, the distinction between driver and passenger mutations is less clear. For example, large homozygous deletions are likely to be subject to strong negative selection, whereas other CNAs can be subject to strong positive selection. This has implications for the ability to detect signatures of CIN. Those processes that generate CNAs under positive selection will be easier to detect than those that generate CNAs under negative selection. Quantitatively, the relationship between signature detection and selection is not yet well understood and will depend on genomic background. For example, negative selection will be weaker in whole-genome duplicated samples (approximately 50% of tumours) and in tumours that have lost their ability to sense DNA damage (for example, via TP53 mutation).
To maximize sample size, we used SNP 6.0 technology data from the TCGA collection. This technology is well established for copy number analysis, but has lower resolution than WGS. As further WGS data become available, there will be an opportunity to refine our signatures and increase their resolution. In their current form, we have demonstrated that the signatures are widely applicable across technologies, including inexpensive assays such as shallow WGS that can be easily applied in a clinical setting to formalin-fixed tumour material50. However, it is important to note that the bulk-DNA samples that we analysed do not show dynamics of CIN, and future work is needed to extend our approach to multiple samples or single cells from the same patient to show how patterns of CIN change over time. Further work is also required to quantify copy number signature activity at specific genomic loci, as our method currently only supports signature quantification at a whole-genome level.
The 17 copy number signatures and their putative aetiologies provide a valuable resource for furthering our understanding of CIN. For example, CX1 represents the most prevalent type of CIN across tumours: chromosome missegregation. Aetiology analysis of CX1 pointed at multiple different mitotic defects giving rise to this signature. This suggests that, despite diversity in the potential causes of mitotic defects, these all result in the same change in genome structure1. These missegregation events typically result in large copy number changes, potentially disrupting the function of many genes; however, our signature analysis reveals that these changes only represent, on average, 4% of the total number of copy number changes observed in a tumour (Supplementary Fig. 38). By contrast, CX2 accounts for 23% of the copy number changes observed in a tumour. This highlights the power of our compendium of signatures to quantify and disentangle the causes and functional effect that different types of CIN have on tumour genomes. Our results also highlight the potential of our signatures to improve the treatment of patients with extreme CIN tumours. Platinum-based chemotherapy is currently considered a broad-spectrum cytotoxic chemotherapy and is routinely used to treat cancers with extreme CIN. However, here we showed that platinum response can be robustly dissected using different signatures of IHR. By developing the IHR signatures into a companion diagnostic assay, platinum-based therapies could potentially be administered in a more targeted manner, allowing resistant patients to avoid their toxic side effects, and healthcare systems to reduce the cost burden of ineffectual treatment. Similarly for other signatures, our analysis of drug response across cell lines reinforces their potential to be developed into companion diagnostics for improved patient stratification during clinical trials.
The signature compendium presented here is an important resource to guide future studies into a deeper understanding of the origins and diversity of CIN and how to therapeutically target different types of CIN.
Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.
Data availability
All data used in this study were obtained from publicly available sources and are described in detail in Supplementary Table 1, section ‘Data and Code’ in the Supplementary Methods. Some raw data have restricted access (TCGA dbGaP accession number: phs000178.v11.p8; ICGC EGA accession number: EGAS00001001692). Access can be obtained by applying to the relevant Data Access Committees (TCGA or ICGC). The authors declare that all other data supporting the findings of this study, including the source data for all figures, are publicly available without restrictions and are also available in the Supplementary Information and the GitHub repositories. All data supporting the analysis of our copy number signatures are navigable via our web portal (https://markowetz.cruk.cam.ac.uk/cincompendium/).
Code availability
The code is publicly accessible via our hub repository (https://github.com/markowetzlab/Drews2022_CIN_Compendium), which describes how the CIN signatures were derived and how to reproduce the figures and tables in this publication. The repository also contains the publicly accessible data and intermediary results used and produced in this study. The hub repository links to other repositories containing the code for specialized tasks.
References
Bakhoum, S. F. & Cantley, L. C. The multifaceted role of chromosomal instability in cancer and its microenvironment. Cell 174, 1347–1360 (2018).
Hanahan, D. & Weinberg, R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
Tijhuis, A. E., Johnson, S. C. & McClelland, S. E. The emerging links between chromosomal instability (CIN), metastasis, inflammation and tumour immunity. Mol. Cytogenet. 12, 17 (2019).
Chakravarti, D., LaBella, K. A. & DePinho, R. A. Telomeres: history, health, and hallmarks of aging. Cell 184, 306–322 (2021).
Bakhoum, S. F. et al. Chromosomal instability drives metastasis through a cytosolic DNA response. Nature 553, 467–472 (2018).
Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
Cohen-Sharir, Y. et al. Aneuploidy renders cancer cells vulnerable to mitotic checkpoint inhibition. Nature 590, 486–491 (2021).
Macintyre, G. et al. Copy number signatures and mutational processes in ovarian carcinoma. Nat. Genet. 50, 1262–1270 (2018).
Steele, C. D. et al. Undifferentiated sarcomas develop through distinct evolutionary pathways. Cancer Cell 35, 441–456.e8 (2019).
Ben-David, U. & Amon, A. Context is everything: aneuploidy in cancer. Nat. Rev. Genet. 21, 44–62 (2020).
Stok, C., Kok, Y. P., van den Tempel, N. & van Vugt, M. A. T. M. Shaping the BRCAness mutational landscape by alternative double-strand break repair, replication stress and mitotic aberrancies. Nucleic Acids Res. 49, 4239–4257 (2021).
Takemon, Y. et al. Multi-omic analyses reveal a role for mammalian CIC in cell cycle regulation and mitotic fidelity. Preprint at bioRxiv https://www.biorxiv.org/content/10.1101/533323v2 (2019).
Hell, M. P., Duda, M., Weber, T. C., Moch, H. & Krek, W. Tumor suppressor VHL functions in the control of mitotic fidelity. Cancer Res. 74, 2422–2431 (2014).
Brownlee, P. M., Chambers, A. L., Cloney, R., Bianchi, A. & Downs, J. A. BAF180 promotes cohesion and prevents genome instability and aneuploidy. Cell Rep. 6, 973–981 (2014).
Silverman, J. S., Skaar, J. R. & Pagano, M. SCF ubiquitin ligases in the maintenance of genome stability. Trends Biochem. Sci. 37, 66–73 (2012).
Godinho, S. A. & Pellman, D. Causes and consequences of centrosome abnormalities in cancer. Phil. Trans. R. Soc. B 369, 20130467 (2014).
Menghi, F. et al. The tandem duplicator phenotype is a prevalent genome-wide cancer configuration driven by distinct gene mutations. Cancer Cell 34, 197–210.e5 (2018).
Abkevich, V. et al. Patterns of genomic loss of heterozygosity predict homologous recombination repair defects in epithelial ovarian cancer. Br. J. Cancer 107, 1776–1782 (2012).
Popova, T. et al. Ploidy and large-scale genomic instability consistently identify basal-like breast carcinomas with BRCA1/2 inactivation. Cancer Res. 72, 5454–5462 (2012).
The Cancer Genome Atlas Research Network. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609–615 (2011).
Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
Ogden, A., Rida, P. C. G. & Aneja, R. Prognostic value of CA20, a score based on centrosome amplification-associated genes, in breast tumors. Sci. Rep. 7, 262 (2017).
Piazza, A. & Heyer, W.-D. Homologous recombination and the formation of complex genomic rearrangements. Trends Cell Biol. 29, 135–149 (2019).
Guirouilh-Barbat, J., Lambert, S., Bertrand, P. & Lopez, B. S. Is homologous recombination really an error-free process? Front. Genet. 5, 175 (2014).
Knijnenburg, T. A. et al. Genomic and molecular landscape of DNA damage repair deficiency across The Cancer Genome Atlas. Cell Rep. 23, 239–254.e6 (2018).
Saavedra, H. I., Fukasawa, K., Conn, C. W. & Stambrook, P. J. MAPK mediates RAS-induced chromosome instability. J. Biol. Chem. 274, 38083–38090 (1999).
Perl, A. L. et al. Protein phosphatase 2A controls ongoing DNA replication by binding to and regulating cell division cycle 45 (CDC45). J. Biol. Chem. 294, 17043–17059 (2019).
Chen, L. et al. The augmented R-loop is a unifying mechanism for myelodysplastic syndromes induced by high-risk splicing factor mutations. Mol. Cell 69, 412–425.e6 (2018).
Li, Q. et al. ERCC2 helicase domain mutations confer nucleotide excision repair deficiency and drive cisplatin sensitivity in muscle-invasive bladder cancer. Clin. Cancer Res. 25, 977–988 (2019).
Menon, V. & Povirk, L. Involvement of p53 in the repair of DNA double strand breaks: multifaceted roles of p53 in homologous recombination repair (HRR) and non-homologous end joining (NHEJ). Subcell. Biochem. 85, 321–336 (2014).
Dentro, S. C. et al. Characterizing genetic intra-tumor heterogeneity across 2,658 human cancer genomes. Cell 184, 2239–2254.e39 (2021).
Dewhurst, S. M. et al. Tolerance of whole-genome doubling propagates chromosomal instability and accelerates cancer genome evolution. Cancer Discov. 4, 175–185 (2014).
Davoli, T. & de Lange, T. The causes and consequences of polyploidy in normal development and cancer. Annu. Rev. Cell Dev. Biol. 27, 585–610 (2011).
Berenjeno, I. M. et al. Oncogenic PIK3CA induces centrosome amplification and tolerance to genome doubling. Nat. Commun. 8, 1773 (2017).
Darp, R., Vittoria, M. A., Ganem, N. J. & Ceol, C. J. Oncogenic BRAF induces whole-genome doubling through suppression of cytokinesis. Preprint at bioRxiv https://doi.org/10.1101/2021.04.08.439023 (2021).
Zhang, Q. et al. FBXW7 facilitates nonhomologous end-joining via K63-linked polyubiquitylation of XRCC4. Mol. Cell 61, 419–433 (2016).
Citri, A., Skaria, K. B. & Yarden, Y. The deaf and the dumb: the biology of ErbB-2 and ErbB-3. Exp. Cell Res. 284, 54–65 (2003).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Venkatesan, S. et al. Induction of APOBEC3 exacerbates DNA replication stress and chromosomal instability in early breast and lung cancer evolution. Cancer Discov. 11, 2456–2473 (2021).
Crockford, A. et al. Cyclin D mediates tolerance of genome-doubling in cancers with functional p53. Ann. Oncol. 28, 149–156 (2017).
Ray Chaudhuri, A. & Nussenzweig, A. The multifaceted roles of PARP1 in DNA repair and chromatin remodelling. Nat. Rev. Mol. Cell Biol. 18, 610–621 (2017).
Goel, S. et al. CDK4/6 inhibition triggers anti-tumour immunity. Nature 548, 471–475 (2017).
Brownlee, P. M., Meisenberg, C. & Downs, J. A. The SWI/SNF chromatin remodelling complex: Its role in maintaining genome stability and preventing tumourigenesis. DNA Repair 32, 127–133 (2015).
Kops, G. J. P., Foltz, D. R. & Cleveland, D. W. Lethality to human cancer cells through massive chromosome loss by inhibition of the mitotic checkpoint. Proc. Natl Acad. Sci. USA 101, 8699–8704 (2004).
Quinton, R. J. et al. Whole-genome doubling confers unique genetic vulnerabilities on tumour cells. Nature 590, 492–497 (2021).
Janssen, A., Kops, G. J. P. L. & Medema, R. H. Elevating the frequency of chromosome mis-segregation as a strategy to kill tumor cells. Proc. Natl Acad. Sci. USA 106, 19108–19113 (2009).
Datta, D. et al. Nucleolar GTP-binding protein-1 (NGP-1) promotes G1 to S phase transition by activating cyclin-dependent kinase inhibitor p21 Cip1/Waf1. J. Biol. Chem. 290, 21536–21552 (2015).
Martin, L. P., Hamilton, T. C. & Schilder, R. J. Platinum resistance: the role of DNA repair pathways. Clin. Cancer Res. 14, 1291–1295 (2008).
Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
Scheinin, I. et al. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res. 24, 2022–2032 (2014).
Acknowledgements
We thank M. Eldridge for setting up the online resource and A. Berman for bug fixes. R.M.D., P.S.S., D.-L.C., J.D.B. and F.M. are funded by Cancer Research UK (core grants C14303/A17197, A22905 and A19274) and the Cambridge Cancer Centre (grant C9685/A25117). G.M. and B.H. are hosted by the Centro Nacional de Investigaciones Oncológicas (CNIO), which is supported by the Instituto de Salud Carlos III and recognized as a ‘Severo Ochoa’ Centre of Excellence (ref. CEX2019-000891-S) by the Spanish Ministry of Science and Innovation (MCIN/AEI/ 10.13039/501100011033). G.M. and B.H. were also supported by a Spanish Ministry of Science and Innovation grant PID2019-111356RA-I00 (MCIN/AEI/ 10.13039/501100011033). M.T. was supported as a postdoctoral researcher of the F.R.S.-FNRS. L.M.G. was supported by the Wellcome Trust PhD programme in Mathematical Genomics and Medicine (grant number RG92770). M.S. was supported by the Horizon 2020 (H2020) Integrated Training Network CONTRA (grant 766030-CONTRA-H2020-MSCA-ITN-2017). This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202) and the Wellcome Trust (FC001202). P.V.L. is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of The Francis Crick Institute.
Author information
Contributions
G.M. and F.M. contributed equally to this work. R.M.D., G.M. and F.M. conceived and designed the study. R.M.D., B.H., D.-L.C., M.S. and G.M. developed the methodology of the study. R.M.D., B.H., M.T., K.H., T.L., P.S.S., L.M.G., L.L., M.S. and G.M. developed the software for the study. R.M.D., M.T., K.H., T.L., P.S.S., D.-L.C. and G.M. contributed to the validation of the method and results. R.M.D., B.H., L.M.G., D.-L.C., L.L. and G.M. contributed to the formal analysis presented in this study. R.M.D., B.H., K.H., T.L., P.S.S., L.M.G. and P.V.L. provided access to data and contributed to gathering, processing and curating data. R.M.D., J.D.B., P.V.L., G.M. and F.M. wrote the original draft. R.M.D., B.H., G.M. and F.M. produced and contributed to the visualizations of the study. R.M.D., G.M. and F.M. supervised the project. All authors had access to all of the data in the study. All authors contributed to the review and the editing of the manuscript. All authors approved the manuscript before the initial submission and all other resubmissions.
Ethics declarations
Competing interests
J.D.B., G.M. and F.M. are co-founders, directors and shareholders of Tailor Bio Ltd. Cancer Research UK, the University of Cambridge and the Spanish National Cancer Research Centre (CNIO) are in the process of applying for a patent (application GB2114203.9) covering the copy number signature methodology and drug target associations, which lists R.M.D., B.H., G.M. and F.M. as inventors. G.M., F.M. and J.D.B. are listed on a patent on using copy number signatures to predict response to doxorubicin treatment in ovarian cancer (PCT/EP2021/065058).
Peer review
Peer review information
Nature thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Workflow of sample filtering and detectable chromosomal instability (dCIN).
a, REMARK diagram showing flow of samples through the study. b, For each copy number feature of the previous ovarian signatures: a histogram of the number of events per sample in the TCGA ovarian cohort that could not be assigned to an ovarian copy number signature. Red dotted line indicates the 0.95 quantile. c, Scatterplot of cancer types comparing our estimate of detectable CIN (Supplementary Methods) to estimates reported in the Mitelman database. d,e, Boxplots comparing our estimate of detectable CIN with aneuploidy score and four CNA-specific metrics. Boxes represent the interquartile range (IQR) with the median as a bolded line. The whiskers extend to the largest/smallest value no further than 1.5 * IQR from the hinge. Outliers beyond the end of the whiskers are marked individually as points. Results of two-sided Welch’s t-test shown on top of the boxplots.
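The boxplot convention and the Welch's t-test mentioned in this legend can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the quartile convention (`method="inclusive"`) is an assumption that may differ slightly from the hinges used by the plotting software, and only the test statistic and degrees of freedom are computed, not the p-value.

```python
import math
import statistics

def tukey_box_stats(values):
    """Quartiles, whisker ends and outliers under the 1.5 * IQR rule."""
    xs = sorted(values)
    # Assumption: inclusive quartiles; plotting libraries may use hinges instead.
    q1, med, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    lower = min(x for x in xs if x >= lo_fence)
    upper = max(x for x in xs if x <= hi_fence)
    outliers = [x for x in xs if x < lo_fence or x > hi_fence]
    return {"q1": q1, "median": med, "q3": q3,
            "lower": lower, "upper": upper, "outliers": outliers}

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

A two-sided p-value would then come from the t distribution with `df` degrees of freedom, which needs a statistics library beyond the Python standard library.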
Extended Data Fig. 2 Overview of copy number features and signature identification.
a, A schematic showing the 5 fundamental copy number features that were computed using 6,335 samples with detectable CIN (dCIN). Note, a feature capturing absolute copy number is not included in our method. b, A schematic showing how mixture modelling is used to split the genome-wide feature distributions into smaller components by either Variational Bayes Gaussian mixture models or Finite Poisson mixture models. The actual number of resulting components is listed below each feature distribution. These components represent basic building blocks of each feature distribution. c, An example of how the probability of a CNA belonging to a mixture component (posterior probability) is calculated and how these are summed. d, (Right) The resulting 43-dimensional feature vectors for each sample, after all posterior probabilities are summed for each component. (Left) A schematic of how the sum-of-posterior matrix for all 6,335 samples was split into two matrices by a Bayesian implementation of non-negative matrix factorisation (NMF), resulting in a signature catalogue and an activity catalogue.
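The posterior summation described in panels c and d can be sketched as follows for a single feature modelled with Gaussian components. This is an illustrative sketch only: the function name and the component parameters are hypothetical, and the Poisson mixture case mentioned in panel b is not covered.

```python
from statistics import NormalDist

def sum_of_posteriors(values, components):
    """Sum, per mixture component, the posterior probabilities of all events.

    values:     feature values of one sample (for example, segment sizes)
    components: list of (weight, mean, sd) tuples; hypothetical parameters
    """
    totals = [0.0] * len(components)
    for x in values:
        # Weighted density of x under each component.
        dens = [w * NormalDist(mu, sd).pdf(x) for w, mu, sd in components]
        norm = sum(dens)
        if norm == 0.0:
            continue  # x is numerically unassignable to any component
        # Posterior = weighted density normalised over all components.
        for k, d in enumerate(dens):
            totals[k] += d / norm
    return totals
```

Concatenating these per-feature vectors over all features would give one row of the sum-of-posterior matrix; under this sketch, each feature's entries add up to the number of events in the sample.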
Extended Data Fig. 3 Schematic of the signature compendium identification.
a, From the complete input matrix, 10 pan-cancer signatures were identified. b, For the 20 cancer types with over 100 samples each, 128 cancer-type enriched signatures (CTES) were identified. c, All CTES that had a cosine similarity over 0.74 with any pan-cancer signature were removed. d, From the groups of CTES that had cosine similarities over 0.74 to each other, the signature with activities in the largest number of samples was taken as a representative signature. e, We performed non-negative least squares on each pair of pan-cancer specific signatures to each CTES. Any CTES for which a combination showed a reconstruction error below 0.1 was removed. f, The sets of 10 pan-cancer and 7 CTES were joined into a compendium of 17 signatures. g, Using linear combination decomposition, the signature activities were calculated for the 6,335 TCGA samples.
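The filtering step in panel c can be sketched as a cosine-similarity comparison against the 0.74 cut-off given in the legend. The function names and the dictionary layout are assumptions made for illustration; the non-negative least squares step in panel e is not shown.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two signature definition vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def drop_redundant(ctes, pan_cancer, threshold=0.74):
    """Keep only CTES that do not exceed the threshold with any pan-cancer signature.

    ctes, pan_cancer: dicts mapping a signature name to its definition vector
    (hypothetical layout).
    """
    return {
        name: sig
        for name, sig in ctes.items()
        if all(cosine_similarity(sig, p) <= threshold
               for p in pan_cancer.values())
    }
```

The same `cosine_similarity` helper could be reused for the grouping in panel d, where CTES are compared with each other rather than with the pan-cancer set.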
Extended Data Fig. 4 Signature interpretation matrix.
Displayed on the left are the five features, their mixture components and component means. The heatmap on the right shows the signature interpretation values, which combine information from the sum-of-posterior matrix, signature activity matrix and the signature definition matrix (Supplementary Methods). Only components that are positively correlated with signature activity levels are displayed. Interpretation values are normalised per feature and signature.
Extended Data Fig. 5 Monte Carlo simulation results for determining signature-specific noise thresholds.
a, Each plot (1 per signature) shows the interquartile range of sample signature activities after the introduction of noise in the copy number features using a Monte Carlo simulation. Samples are ordered by their observed signature activity (red line). b, Schematic showing how we fitted a Gaussian distribution to the simulated values of all samples with an observed signature activity of 0 (red line). The horizontal black line represents the 0.95 quantile of the fitted Gaussian and forms the basis of our signature-specific noise threshold, where values below this line are not distinguishable from 0. c, Plot of the signature-specific thresholds for the 17 copy number signatures.
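The thresholding idea in panel b can be sketched as fitting a Gaussian by sample mean and standard deviation and reading off its 0.95 quantile. This is a simplified illustration: the function names are hypothetical, the moment-based fit is an assumption, and the noise simulation itself is not reproduced.

```python
from statistics import NormalDist, mean, stdev

def noise_threshold(simulated_activities, quantile=0.95):
    """Quantile of a Gaussian fitted to simulated activities of samples
    whose observed activity for this signature is 0."""
    mu = mean(simulated_activities)
    sigma = stdev(simulated_activities)
    if sigma == 0:
        return mu  # degenerate case: all simulated values identical
    return NormalDist(mu, sigma).inv_cdf(quantile)

def apply_threshold(activity, threshold):
    """Treat activities below the threshold as indistinguishable from 0."""
    return activity if activity >= threshold else 0.0
```

One threshold would be computed per signature, matching the per-signature values plotted in panel c.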
Extended Data Fig. 6 Signature stability across different copy number profiling technologies.
Across the same set of 478 tumours, we compared the SNP6-array based copy number profiles and signatures to copy number profiles and signatures derived using different copy number profiling technologies. The columns contain results for the different technologies and the rows contain results for comparison between copy number profiles (top), signature activities (middle) and signature definitions (bottom, limited to pan-cancer signatures). For each comparison we show results for a range of penalties for ASCAT’s piecewise constant fitting or ASCAT.sc’s circular binary segmentation. (*): For settings marked with a star it was not possible to derive solutions for K = 10; instead, the optimal K (lower than 10) was chosen.
Extended Data Fig. 7 Workflow for determining signature aetiology and confidence rating.
a, Flowchart showing how an association between a mutated gene and signature activity was used to derive a hypothesis for a putative aetiology. b, Flowchart representing the decision-making process leading to the assignment of a 3-star confidence rating. c, Example of the star rating process for CX3.
Extended Data Fig. 8 Summary of associations between signatures and other covariates.
a, Main panel shows significant associations between copy number signatures and mutated genes. Gene annotations are summarised in the panels below. Boxes with a red line indicate significant associations that were not considered when determining signature aetiologies. In these cases the enrichment arose through amplification of a gene that also resided in an ecDNA amplicon; such amplification could be a consequence of the signature rather than a cause, potentially producing a spurious correlation with amplification signatures (CX8, CX9, CX11, CX13). b, Each row shows highly significant associations between signatures and different covariates. Unless otherwise specified, only positive correlations are shown.
Extended Data Fig. 9 Impaired homologous recombination signatures and their associations.
a, Boxplots summarise signature activities of different patient groups (rows) defined by their driver gene mutation status. Ovarian samples are coloured in dark green and breast in orange. Boxes represent the interquartile range (IQR) with the median as a bolded line. The whiskers extend to the largest/smallest value no further than 1.5 * IQR from the hinge. Outliers beyond the end of the whiskers are marked individually as points. Significance tested with two-sided Welch’s t-test between WT BRCA1/2 and each of the categories and corrected for multiple testing using the Benjamini–Hochberg method. Statistically significant comparisons are shown to the right of the boxplots with stars denoting significance (q < 0.05) and arrows denoting the two groups used for the statistical test. (BRCA1/2 = BRCA1 and BRCA2; WT = wild type; LOH = loss of heterozygosity). b, Boxplots (with same characteristics as in a) summarise the scaled signature activities of 5,466 TCGA samples split by low, medium and high cell cycle scores. The brackets and stars (q < 0.05) show where there was a significant increase from low to medium to high cell cycle groups, tested with Welch’s t-test and corrected for multiple testing with the Benjamini–Hochberg method. c, Volcano plots showing the results of a correlation between signature activity and expression of genes involved in nucleotide excision repair (NER). Each dot represents a gene; coloured dots show significant correlations. d, Spearman correlation coefficient (y-axis) of correlation between signature activities and seven common metrics of HRD (listed at top). Individual coefficients are displayed for impaired homologous recombination (IHR) signatures and the distribution of coefficients from remaining signatures is represented by boxplots (with same characteristics as in a).
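The Benjamini-Hochberg correction used for the q-values in this legend can be written in a few lines. The sketch below is a generic implementation of the standard step-up procedure, not code from the study, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

A comparison would then be starred when its q-value falls below 0.05, the cut-off stated in the legend.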
Extended Data Fig. 10 Performance of classifiers for predicting platinum sensitivity.
a, Kaplan-Meier estimator showing the overall survival probabilities of TCGA ovarian cancer patients split into two groups using our CX3/CX2 classifier. b, Hazard ratios and their 95% confidence interval obtained from a Cox proportional hazards model trained on our CX3/CX2 classification predicting overall survival of TCGA ovarian cancer patients. The model also corrected for age and cancer stage of the patients. P-value represents the significance of a Wald test. c,d, Median survival and hazard ratios generated for five cancer cohorts from the TCGA, PCAWG and ICGC projects using predictions from three classifiers (our CX3/CX2 classifier, HRDetect and Myriad myChoice based on the HRD score). Improvements in median survival tested by log-rank test (Kaplan-Meier survival analysis), with the minus symbol representing the predicted resistant group and the plus symbol the predicted sensitive group. Hazard ratios, their 95% confidence interval, and Wald test significance of the predicted sensitive group compared to the predicted resistant group are obtained from Cox proportional hazards models correcting for stage and age of patients, except for HRDetect where tumour stage was omitted as the models did not converge if included. The number and proportion of patients predicted to be sensitive (with HRD) and resistant (without HRD) by each classifier are listed on the right.
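For readers unfamiliar with the Kaplan-Meier estimator referred to in panel a, a minimal sketch follows. It is a generic textbook implementation with hypothetical function names, not the survival analysis code used in the study, and it does not cover the Cox proportional hazards models.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times:  follow-up durations
    events: 1 if death was observed, 0 if the patient was censored
    Returns [(time, survival probability)] at each time with an observed death.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

def median_survival(curve):
    """First time at which survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

Running the estimator separately on the predicted sensitive and predicted resistant groups would give the two curves and the median survival values that a log-rank test then compares.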
Supplementary information
Supplementary Methods
This file contains full descriptions of all methods, links to software, additional results and Supplementary Tables 1–13.
Supplementary Figures
This file contains Supplementary Figs. 1–54.
Supplementary Tables
This zip folder contains Supplementary Tables 14–64 and their accompanying legends.
About this article
Cite this article
Drews, R.M., Hernando, B., Tarabichi, M. et al. A pan-cancer compendium of chromosomal instability. Nature 606, 976–983 (2022). https://doi.org/10.1038/s41586-022-04789-9
DOI: https://doi.org/10.1038/s41586-022-04789-9