Main

CIN has complex consequences, including loss or amplification of driver genes, focal rearrangements, extrachromosomal DNA, micronuclei formation and activation of innate immune signalling1. This leads to associations with disease stage, metastasis, poor prognosis and therapeutic resistance3. The causes of CIN are also diverse and include mitotic errors, replication stress, homologous recombination deficiency (HRD), telomere crisis and breakage fusion bridge cycles, among others1,4.

Because of the diversity of these causes and consequences, CIN is generally used as an umbrella term. Measures of CIN either divide tumours into broad categories of high or low CIN5, are restricted to a single aetiology such as HRD6, are limited to a particular genomic feature such as whole-chromosome-arm changes7, or can only be quantified in specific cancer types8,9. As a result, there is no systematic framework to comprehensively characterize the diversity, extent and origins of CIN pan-cancer, or to define how different types of CIN within a tumour relate to clinical phenotypes. Here we present a robust analysis framework to quantitatively measure different types of CIN across cancer types.

Deconstructing CIN

We derived 7,880 high-quality absolute copy number profiles across 33 tumour types using single-nucleotide polymorphism (SNP) array data from The Cancer Genome Atlas (TCGA) (Extended Data Fig. 1a). Extending our previously developed framework for quantifying signatures of CIN in ovarian cancer8, we determined that 6,335 of the 7,880 samples (80%) had detectable CIN and were suitable for pan-cancer detection of copy number signatures (Extended Data Fig. 1b). This estimate was consistent with previous pan-cancer estimates of CIN10 (Extended Data Fig. 1c–e).

Using these 6,335 genome-wide copy number profiles, we computed distributions of five fundamental copy number features previously demonstrated to encode patterns of copy number changes that represent different underlying causes of CIN8 (Extended Data Fig. 2a and Supplementary Methods). These features included: the copy number change between a segment and the neighbouring segment; segment length; breakpoint count per 10 Mb; breakpoint count per chromosome arm; and length of chains of oscillating copy number states. Only segments that deviated from a normal, diploid state were considered for the segment size and changepoint features. We did not include a feature representing the copy number of a segment to avoid redundant signatures that encode the same aetiology across different ploidy backgrounds.
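
As an illustration of how three of these features could be extracted from a segmented copy number profile, the following is a minimal sketch, not the implementation described in the Supplementary Methods. The segment representation (chromosome, start, end, absolute copy number) and all function names are assumptions made for this example; the per-arm breakpoint count and the oscillating-chain length are omitted.

```python
from collections import Counter

# Assumed (hypothetical) segment record: (chromosome, start_bp, end_bp, copy_number).
# Segments are expected to be sorted by chromosome and start position.

def changepoints(segments, normal_cn=2):
    """Copy number change between a non-diploid segment and its left neighbour."""
    out = []
    for prev, cur in zip(segments, segments[1:]):
        same_chrom = prev[0] == cur[0]
        if same_chrom and cur[3] != normal_cn:
            out.append(abs(cur[3] - prev[3]))
    return out

def segment_lengths(segments, normal_cn=2):
    """Lengths in bp of segments that deviate from the normal diploid state."""
    return [end - start for _, start, end, cn in segments if cn != normal_cn]

def breakpoints_per_window(segments, window=10_000_000):
    """Number of breakpoints (internal segment boundaries) in each 10 Mb window."""
    counts = Counter()
    for prev, cur in zip(segments, segments[1:]):
        if prev[0] == cur[0]:
            # The breakpoint sits at the start coordinate of the right-hand segment.
            counts[(cur[0], cur[1] // window)] += 1
    return dict(counts)
```

Each function returns the raw per-sample values whose cohort-wide distributions would then be modelled; windows with no breakpoints are simply absent from the returned dictionary.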

We applied mixture modelling to define distinct components for each cohort-wide feature distribution, identifying a total of 43 mixture components across the 5 features (Extended Data Fig. 2b, c and Supplementary Methods). Conceptually, these components represent the basic building blocks for defining CIN processes. We used these mixture components to encode each tumour genome by probabilistically assigning copy number events to these components, resulting in a 6,335 × 43 dimensional matrix. We then applied a Bayesian implementation of non-negative matrix factorization to identify copy number signatures (Extended Data Figs. 2d and 3a, b). We first used the complete matrix and found 10 pan-cancer copy number signatures, then used subsets of the matrix representing individual cancer types with at least 100 samples, and found an additional 7 signatures (Extended Data Fig. 3b–e and Supplementary Methods). We merged both sets of signatures and computed their activities using linear combination decomposition to yield a pan-cancer compendium of 17 copy number signatures and their activities in tumours across the 33 cancer types (Extended Data Figs. 3f, g and 4 and Supplementary Figs. 1 and 2).
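
The probabilistic encoding step can be sketched as follows. This is a simplified illustration that assumes every component is a Gaussian described by a hypothetical (weight, mean, standard deviation) triple; the component families and parameters actually used are given in the Supplementary Methods and are not reproduced here.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def responsibilities(x, components):
    """Posterior probability that value x was generated by each component.

    components: list of (weight, mean, sd) triples (assumed Gaussian).
    """
    dens = [w * gaussian_pdf(x, m, s) for w, m, s in components]
    total = sum(dens)
    if total == 0.0:
        # Assumption for this sketch: fall back to a uniform split when every
        # density underflows to zero for an extreme value.
        return [1.0 / len(components)] * len(components)
    return [d / total for d in dens]

def encode_sample(values, components):
    """Sum the posteriors over all events of one feature in one sample."""
    vec = [0.0] * len(components)
    for x in values:
        for k, p in enumerate(responsibilities(x, components)):
            vec[k] += p
    return vec
```

Concatenating the vectors returned by `encode_sample` for each of the five features would give one row of the sample-by-component matrix, which is the input to the matrix factorization.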

We validated this approach by correctly identifying signatures in a collection of simulated cancer genomes with copy number changes caused by five well-studied mutational processes (Supplementary Figs. 3–6 and Supplementary Methods). We used a second simulation study to derive signature-specific activity thresholds, to test the stability of signature definitions and to test the stability of signature activities (Methods, Extended Data Fig. 5 and Supplementary Fig. 7). We then tested the robustness of our approach across different high-throughput technologies by comparing signature definitions and activities across five platforms: SNP 6.0 without matched normal, whole-genome sequencing (WGS) downsampled to SNP 6.0 positions, WGS downsampled to shallow WGS, on-target whole-exome sequencing (WES) and off-target WES. Quantification of signature activity was robust across all platforms. Signature identification was possible across the WGS platforms but performance deteriorated for WES (Extended Data Fig. 6).

Putative causes underlying each signature

To determine the putative causes underlying each of the 17 signatures (named CX1 to CX17), we developed a data integration framework and assigned a confidence score to each signature aetiology based on the quality and extent of supporting data (Extended Data Fig. 7). To propose putative aetiologies, we used the patterns of copy number change encoded by the signature (Extended Data Fig. 4, Supplementary Figs. 8 and 9 and Methods) and signature associations with known cancer driver mutations (Extended Data Fig. 8a and Supplementary Figs. 10–17). We used these driver gene associations as markers for putative pathways involved in the aetiologies and assumed the same pathway deregulation for samples where no driver gene was mutated (similar to how BRCAness is defined in the absence of BRCA1 or BRCA2 mutation11). In many cases, the signature pattern was already suggestive of a mechanism (for example, whole-chromosome missegregation). Once a putative cause was proposed, we sought additional supporting data (Fig. 1, Extended Data Figs. 8 and 9 and Supplementary Methods) including: data from two additional patient cohorts and their clinical metadata (approximately 1,900 patients from the Pan-Cancer Analysis of Whole Genomes (PCAWG) project and approximately 400 patients from the International Cancer Genome Consortium (ICGC) project); five types of mutational signatures (single-base substitution (SBS), insertion–deletion (ID), doublet base substitutions, ovarian copy number and rearrangement); 14 molecular features (somatic point mutations, gene expression, cell cycle score, aneuploidy score, whole-chromosome copy number aberrations (CNAs), tandem duplications, loss of heterozygosity, chromothripsis, kataegis, whole-genome duplication status, telomere length and elongation machinery activity, extrachromosomal DNA and centrosome amplification score (CA20)); and 11 DNA repair-specific features (germline BRCA1/2 mutations, BRCA1 and RAD51C hypermethylation data, HRDetect response, HRD score (Myriad myChoice), TP53 inactivation score, telomeric imbalances score, large-scale state transition score, loss of heterozygosity score, DNA repair proficiency score, protein expression score for 23 DNA-damage repair genes and PCAWG structural variants with associated microhomologies). Here we provide a synthesis of the data supporting the putative aetiologies (summarized in Fig. 2).

Fig. 1: Study overview.

This schematic summarizes our robust analysis framework, which uses copy number to derive pan-cancer copy number signatures and provide insights. On the left and right are lists of the datasets used to support the signature aetiologies and insights. CCLE, Cancer Cell Line Encyclopedia; DBS, doublet-base substitution; ecDNA, extrachromosomal DNA; RS, rearrangement signature.

Fig. 2: Proposed aetiologies and prevalence of copy number signatures.

A summary of the pan-cancer frequency, proposed aetiology (where possible), aetiology confidence rating, pattern of copy number change and distribution across cancer types is provided for each signature. Signatures are labelled on the basis of pan-cancer prevalence, with signature CX1 having the highest pan-cancer frequency. Confidence measures for each signature aetiology are indicated by a star rating. The heatmap shows the signature frequency for each of the 33 cancer types. NHEJ, non-homologous end joining; IHR, impaired homologous recombination; ACC, adrenocortical carcinoma; BLCA, bladder urothelial carcinoma; BRCA, breast invasive carcinoma; CESC, cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL, cholangiocarcinoma; COAD, colon adenocarcinoma; DLBC, lymphoid neoplasm diffuse large B-cell lymphoma; ESCA, oesophageal carcinoma; GBM, glioblastoma multiforme; HNSC, head and neck squamous cell carcinoma; KICH, kidney chromophobe; KIRC, kidney renal clear cell carcinoma; KIRP, kidney renal papillary cell carcinoma; LAML, acute myeloid leukemia; LGG, brain lower grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; LUSC, lung squamous cell carcinoma; MESO, mesothelioma; OV, ovarian serous cystadenocarcinoma; PAAD, pancreatic adenocarcinoma; PCPG, pheochromocytoma and paraganglioma; PRAD, prostate adenocarcinoma; READ, rectum adenocarcinoma; SARC, sarcoma; SKCM, skin cutaneous melanoma; STAD, stomach adenocarcinoma; TGCT, testicular germ cell tumours; THCA, thyroid carcinoma; THYM, thymoma; UCEC, uterine corpus endometrial carcinoma; UCS, uterine carcinosarcoma; UVM, uveal melanoma.

Mitotic signatures

CX1, CX6 and CX14 all encoded patterns related to whole-arm or whole-chromosome changes and significantly correlated with direct counts of whole-chromosome changes (Supplementary Fig. 18). This suggested putative causes resulting in chromosome missegregation during mitosis. In agreement with this hypothesis, CX14 had significantly higher activity in tumours with inactivating mutations in CIC12; CX1 with mutations in CIC12, VHL13 and PBRM1 (ref. 14); and CX6 with mutations in CUL1 (ref. 15) and RAC1 (ref. 16) (Extended Data Fig. 8a). Each of the three signatures correlated with downregulation of telomerase activity (Supplementary Fig. 19b), with CX1 also being negatively correlated with telomere length (Supplementary Fig. 19a) and associated with a lack of TERC and TERT amplification and expression (Supplementary Figs. 19c–e and 20). Therefore, telomere shortening may have a key role in the mechanisms underlying these signatures4. CX1 positively correlated with the ‘clock-like’ SBS1 signature, suggesting that these errors might also be mediated via a natural ageing process such as age-related telomere attrition4 (Extended Data Fig. 8 and Supplementary Fig. 21).

Signatures of impaired homologous recombination

CX2, CX3 and CX5 all exhibited patterns that had previously been shown to associate with impaired homologous recombination (IHR): CX2 showed a pattern of short-to-medium-sized, oscillating changes associated with tandem duplications17; CX5 showed medium-sized events associated with tandem duplication17; and CX3 showed long-sized, single-copy changes with associated loss of heterozygosity18,19 (Extended Data Fig. 4 and Supplementary Figs. 18 and 22). All three signatures were observed at significantly higher levels in tumours with somatic BRCA1 mutation, independently of each other (Extended Data Figs. 8a and 9a and Supplementary Table 12). This suggested varying roles for disruption of HR as underlying causes11. Several lines of evidence supported the link between these signatures and HR: increased CX2, CX3 and CX5 activity across germline-mutated BRCA1 carriers (and BRCA2 carriers for CX3); higher activity in cases with methylated RAD51C (except CX5)20 (Extended Data Fig. 9a); correlation with tandem duplication scores17 (Supplementary Fig. 22), rearrangement signatures 1, 3 and 5 (ref. 21) (Supplementary Fig. 23), SBS3 signature and ID6 (Supplementary Fig. 21), centrosome amplification score22 (Supplementary Fig. 24), and ovarian copy number signatures 3 and 7 (ref. 8) (Supplementary Fig. 25); association with loss of heterozygosity18, chromothripsis23 (except CX3) and kataegis24 (Supplementary Fig. 18); increased utilization of theta-mediated end joining and single-strand annealing backup repair pathways visible as microhomologies at breakpoints11 (Supplementary Fig. 26); as well as correlation with seven HRD metrics25 (Extended Data Fig. 9d). The strength of these associations increased from CX2 to CX5 and to CX3. This suggested an increasing spectrum of CIN complexity associated with disruptions in HR-mediated repair. Indeed, CX2 appears to be only associated with disruption of HR, whereas CX5 and CX3 have associations that indicate the involvement of replication stress (via amplification and overexpression of MAPK1 (ref. 26), PPP2R1A27 and U2AF1 (ref. 28)). The larger copy number changes observed for CX5 and CX3 suggest faster cell cycling and breaks carried through to mitosis11, which was supported by strong correlation with cell cycle scores (Extended Data Fig. 9b) and increased CNAs estimated to occur during mitosis (Supplementary Fig. 27 and Supplementary Methods). Further associations were observed for CX3, including missense mutations in ERCC2 (ref. 29) and downregulation of key nucleotide excision repair (NER) genes suggesting defects in NER (Extended Data Fig. 9c and Supplementary Fig. 28), as well as TP53 mutation suggesting impaired damage sensing30 (Extended Data Fig. 8a). These CX3 associations are reminiscent of what has been termed BRCAness or HRD11. However, CX5, and especially CX2, appear to represent a more moderate impairment of HR. Therefore, we use the term IHR for the aetiology underlying all three signatures rather than HRD.

Whole-genome duplication signature

CX4 encompassed a unique pattern of copy number change with neighbouring segments separated by two copy changes (Extended Data Fig. 4), a pattern commonly used to define the presence of a whole-genome duplication (WGD) event31. CX4 was also associated with whole-chromosome changes (Extended Data Fig. 8b), a feature commonly observed in tetraploid cells due to increased mitotic errors32. The specific cause of WGD (endoreduplication, errors in cytokinesis or cell fusion33) was not evident from our data; however, this signature had high activity in tumours with PIK3R2, AKT1 and MAPK1 mutations, suggesting that tolerance to WGD may be mediated by PI3K–AKT activation34,35 (Extended Data Fig. 8a).

Signature of impaired non-homologous end joining

CX10 displayed a pattern of clustered and oscillating copy number changes (Extended Data Fig. 4). Its activity was significantly higher in tumours with inactivating mutations in FBXW7 and correlated with FBXW7-mutant-mediated tandem duplication class 1/2 (Extended Data Fig. 8 and Supplementary Fig. 22), suggesting impaired non-homologous end joining17,36 as a putative cause. A significant increase in the proportion of breakpoints with microhomologies in samples with this signature was indicative of a lack of blunt-end joining, which is a hallmark of non-homologous end joining (Supplementary Fig. 29a).

Signatures of amplification

CX8, CX9, CX11 and CX13 encoded patterns of low-level, mid-level, mid-level and high-level amplifications, respectively (Extended Data Fig. 4). Higher activity of CX8 in the context of amplification and overexpression of U2AF1 (ref. 28) and MAPK1 (ref. 26), and for CX9 ERBB3 (ref. 37) (Extended Data Fig. 8), suggested replication stress as a putative cause. All four signatures were associated with increased cell cycle score (Supplementary Fig. 30), reinforcing replication stress as a causal factor. In addition, CX8, CX9 and CX13 were associated with APOBEC mutagenesis (SBS2 and/or SBS13 signatures; Supplementary Fig. 21a), and CX9 and CX11 were associated with ID signatures 1 and 2 (ref. 38) (Supplementary Fig. 21). CX9 copy number changes were not part of oscillating chains; however, the remaining amplification signatures were. CX13 was strongly associated with extrachromosomal DNA circularization and amplification events (Supplementary Fig. 31); however, the specific mechanism causing the extrachromosomal DNA was not evident.

Unknown aetiologies

CX7, CX12, CX15, CX16 and CX17 did not have patterns of copy number change or associations clearly indicative of a putative cause (Extended Data Figs. 4 and 8a). Therefore, these signatures currently have unknown aetiologies.

Cross-signature observations

Many covariates demonstrated associations with multiple signatures. Chromothripsis was linked with seven different signatures (Extended Data Fig. 8), suggesting that many potential aetiologies underpin these complex rearrangements. Replication stress was associated with eight signatures, highlighting it as a major source of CIN (Fig. 2). Different signatures showed a bias for occurrence before WGD (CX1, CX2, CX7 and CX15) or after WGD (CX3, CX5, CX6, CX8, CX9, CX13 and CX17), demonstrating the importance of WGD events in modulating CIN (Extended Data Fig. 8b and Supplementary Fig. 18e, f). Finally, signatures of APOBEC mutagenesis and kataegis were associated with six signatures, highlighting these as a common feature of CIN39 (Extended Data Fig. 8b and Supplementary Figs. 18 and 21).

Drug response prediction and drug target identification

The putative signature aetiologies implicated canonical cancer pathways as some of the major drivers of CIN. Many of these pathways have been the focus of targeted therapy development. Therefore, given that our signatures can be readily measured in tumours from patients, we explored their utility for therapy response prediction and drug target identification. We integrated data from 297 cancer cell lines, including copy number profiling, genome-wide clustered regularly interspaced short palindromic repeat (CRISPR–Cas9) knockout screens, genome-wide RNA interference (RNAi) screens and the profiling relative inhibition simultaneously in mixtures (PRISM) drug repurposing screen (Supplementary Methods). We assessed correlations between signature activities, gene essentiality and sensitivity to drug perturbation of the gene (Fig. 3a).
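
The correlation step can be illustrated with a minimal sketch. It assumes a Pearson correlation purely for illustration, as the statistic used is specified in the Supplementary Methods rather than here, and it omits significance testing and multiple-testing correction. The dictionary layouts, the function names and the minimum of three shared cell lines are all hypothetical choices for this example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant input
    return sxy / math.sqrt(sxx * syy)

def correlate_signature_with_genes(activity, essentiality, min_lines=3):
    """Correlate one signature's activity with each gene's essentiality score.

    activity:     {cell_line: signature activity}
    essentiality: {gene: {cell_line: essentiality score}}
    Only cell lines present in both inputs are used for each gene.
    """
    result = {}
    for gene, scores in essentiality.items():
        shared = sorted(set(activity) & set(scores))
        if len(shared) < min_lines:
            continue  # too few shared cell lines to estimate a correlation
        result[gene] = pearson([activity[c] for c in shared],
                               [scores[c] for c in shared])
    return result
```

The same function could be reused with drug response values (for example, area under the dose response curve) in place of essentiality scores, so that a gene is retained only when both genetic and drug perturbation correlate with the signature.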

Fig. 3: Signatures as biomarkers for drug response and discovery of novel drug targets.

a, A schematic showing how response biomarkers and novel drug targets were found by correlating signature activities with gene essentiality determined by CRISPR–Cas9 or RNAi screens, and with response to drug perturbations measured as the area under the dose response curve, across 297 cell lines. The Venn diagram shows the overlap of significant correlations for each of the signature to target gene associations. The colour of the circles in the Venn diagram matches the schematic above, and the shaded areas indicate which results relate to b and c. b, A summary of the significant associations between copy number signatures and drug response to 44 therapies. Each signature on the right is linked to a therapy on the left if the signature is predictive of response to CRISPR and/or RNAi perturbation of a target gene, and treatment with a therapy that targets that gene. HDAC, histone deacetylase; PR, progesterone receptor; sGC, soluble guanylate cyclase. c, A summary of the significant associations between copy number signatures and target gene perturbation. Each signature on the left is linked to a target gene on the right if the signature is predictive of response to CRISPR and RNAi perturbation of the target gene. The listed targets were filtered for druggability according to their structure or by ligand-based approaches (n = 104) and their previous known association with CIN (n = 49). ROS, reactive oxygen species; IHR, impaired homologous recombination.

We identified 40 genes where copy number signature activity was significantly correlated with both genetic and drug perturbation of the target (Fig. 3b and Supplementary Table 56). Among these, several revealed promising new therapeutic avenues for targeting CIN. CX4 (associated with PI3K–AKT activation) was correlated with response to inhibition of CCND1 via arcyriaflavin-A, which may indicate a therapeutic strategy for reversing tolerance to WGD40. CX5, a signature of IHR, predicted response to olaparib via inhibition of PARP1. Given that this signature was also correlated with RNAi knockdown of PARP1, this may represent a biomarker that is specific to the inhibition of regular protein function rather than PARP trapping41. CX9 (associated with replication stress) was correlated with response to multiple kinase inhibitors targeting genes involved in major mitogenic pathways (EGFR, JAK1, MET, PRKCA and PIK3CA), suggesting that a multikinase inhibitor approach may be suitable for targeting replication stress. Correlation of CX13 (also associated with replication stress) with response to inhibition of CDK4 may potentially represent a biomarker-led approach for improving CDK4/6 inhibitor-mediated tumour sensitization to immune checkpoint blockade42.

Copy number signature correlations with gene essentiality scores from both CRISPR and RNAi perturbation screens identified 104 target genes with druggable structures that currently have no targeted therapies in the clinic (Supplementary Table 57). These represent putative synthetic lethal drug targets, 49 of which had evidence of being implicated in CIN-related mechanisms (Fig. 3c). A number of these show promising links between the signature aetiology and potential consequence of target inhibition. CX1 activity was correlated with perturbation of ACTL6A (involved in the SWI/SNF complex) and TERF1 (involved in telomere maintenance), both of which are required for faithful chromosome segregation during mitosis4,43. The combined dysregulation of mitosis and telomere elongation machinery associated with CX1 suggests that inhibiting either one of these genes might be a promising therapeutic strategy by creating synthetic lethality. Indeed, inhibition of both genes has been previously suggested to induce cell lethality by generating excessive CIN44. CX9 was correlated with perturbation of BUB1B, a spindle assembly checkpoint gene recently identified as therapeutically relevant in CIN-high cells measured via WGD status45 and an aneuploidy score7. This association with CX9 suggests that the spindle assembly checkpoint may have a crucial role in tolerating mid-level amplifications, and reducing levels of BUB1B may induce excessive and catastrophic chromosome missegregation46. Finally, CX11, which was strongly associated with CDK4 amplification, was correlated with inhibition of GNL2, which in turn impedes the formation of the cyclin D1–CDK4 complex47.

Predicting platinum sensitivity

The aetiologies of the three IHR signatures suggested a model of increasing CIN complexity (Fig. 4a and Extended Data Fig. 9). IHR alone gives rise to CX2, a signature of small copy number changes indicative of tandem duplication. IHR plus replication stress leads to CX5, which involves larger CNAs. Finally, IHR plus replication stress, impaired damage sensing and impaired NER gives rise to CX3 with the largest CNAs that are strongly associated with loss of heterozygosity. Our results did not reveal whether the different levels of complexity developed in a stepwise manner or by independent processes.

Fig. 4: Predicting platinum sensitivity using IHR signatures.

a, A proposed model of increasing CIN complexity for IHR signatures based on the signature aetiologies. b, Results for each IHR signature after training a Cox proportional hazards model to predict overall survival across 545 ovarian cancers treated with platinum-based chemotherapy. Hazard ratios, their 95% confidence interval and Wald test significance are reported. The dashed line indicates a hazard ratio of 1. c, A schematic of the clinical classifier built on CX3 and CX2 activities of ovarian cancer samples with germline BRCA1 mutations. d, Results of survival analyses after applying the classifier from c to assign patients into predicted sensitive (plus symbol) or predicted resistant (minus symbol) groups. Each row displays results for each of the four cancer cohorts from the TCGA and PCAWG projects. Differences in median survival are indicated by the arrow, with P values from a log-rank test appearing below (Kaplan–Meier survival analysis). Hazard ratios and their 95% confidence interval of the predicted sensitive group compared to the predicted resistant group are obtained from Cox proportional hazards models correcting for stage and age of patients. The P value represents the corresponding Wald test. AU, Australian Project at the ICGC/PCAWG consortium; OV, ovarian cancer; ESAD, oesophageal cancer; UK, British project at the ICGC/PCAWG consortium.

Disruption of both HR11 and NER48 has been shown to confer sensitivity to platinum-based chemotherapy. Given that only CX3 was associated with disruption of NER, we hypothesized that the IHR signatures may demonstrate differing abilities to predict platinum sensitivity. As patients with ovarian cancer are routinely treated with platinum-based chemotherapy, we tested the ability of all three signatures to predict overall survival, and hence platinum sensitivity, using a Cox proportional hazards model (Fig. 4b and Supplementary Fig. 32). CX2 showed no association with platinum sensitivity, CX5 was predictive of resistance and CX3 was predictive of sensitivity.

Given that these IHR signatures were able to dissect platinum response, we further hypothesized that they could be used in combination to provide better predictors of platinum sensitivity. As CX2 was not predictive, we used it as a baseline for capturing non-predictive IHR-related genomic changes, and required that the predictive CX3 activity exceed it to potentially confer sensitivity. This resulted in a simple classification rule: ‘if CX3 activity is greater than CX2 activity, then predict sensitivity’ (Fig. 4c). This interpretable classifier achieved significant separation of overall survival across cohorts of BRCA1 germline mutant ovarian cancers, ovarian cancers from the TCGA cohort, an independent validation cohort and an oesophageal cancer cohort (also routinely treated with platinum-based chemotherapy) (Fig. 4d, Extended Data Fig. 10 and Supplementary Figs. 33–36). Other classifiers using all three IHR signatures, including more complex machine learning methods, did not outperform this decision rule (Supplementary Fig. 37). Furthermore, this simple classifier had comparable performance to more complex state-of-the-art HRD predictors, which rely on additional data beyond copy number, applied to cohorts of ovarian, oesophageal and breast cancers (Extended Data Fig. 10c, d). By applying this classifier to the whole TCGA ovarian cohort, we estimate that 27% of ovarian tumours might be platinum sensitive. Applying the classifier pan-cancer, we estimate that 8% of all tumours might be sensitive.
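
The decision rule quoted above is simple enough to express directly. The sketch below is illustrative only: it assumes that the two activities have already been placed on a comparable scale (any normalization applied before the comparison is not shown), and the function names and input layout are hypothetical.

```python
def predict_platinum_sensitive(cx3_activity, cx2_activity):
    """Apply the rule 'if CX3 activity is greater than CX2 activity, predict sensitivity'.

    Returns True for predicted sensitive and False for predicted resistant.
    """
    return cx3_activity > cx2_activity

def fraction_predicted_sensitive(samples):
    """Fraction of samples predicted sensitive.

    samples: iterable of (cx3_activity, cx2_activity) pairs, one per tumour.
    """
    calls = [predict_platinum_sensitive(cx3, cx2) for cx3, cx2 in samples]
    if not calls:
        raise ValueError("no samples provided")
    return sum(calls) / len(calls)
```

Applied to a cohort, `fraction_predicted_sensitive` gives the kind of cohort-level proportion reported in the text; ties (equal activities) are treated as predicted resistant because the rule requires CX3 to exceed CX2.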

Discussion

Here we present a robust analysis framework for CIN in human cancers built on a pan-cancer analysis across 33 cancer types. This resource advances the field in two ways: it untangles CIN according to characteristic genomic patterns and underlying causes, and defines copy number signatures as new biomarkers to quantitatively measure different types of CIN. Our approach complements previous landscape studies of the genetic consequences of CIN49, which generally focused on recurrent somatic copy number events at individual loci. By contrast, copy number signatures8,9 uncover mechanistic biases in the patterns of alterations across all chromosomes.

In its current form, the signature methodology cannot account for selection pressures on CNAs. For single-nucleotide variant signatures, passenger mutations provide strong signals for detection. However, for CNAs, the distinction between driver and passenger mutations is less clear. For example, large homozygous deletions are likely to be subject to strong negative selection, whereas other CNAs can be subject to strong positive selection. This has implications for the ability to detect signatures of CIN. Those processes that generate CNAs under positive selection will be easier to detect than those that generate CNAs under negative selection. Quantitatively, the relationship between signature detection and selection is not yet well understood and will depend on genomic background. For example, negative selection will be weaker in whole-genome duplicated samples (approximately 50% of tumours) and in tumours that have lost their ability to sense DNA damage (for example, via TP53 mutation).

To maximize sample size, we used SNP 6.0 technology data from the TCGA collection. This technology is well established for copy number analysis, but has lower resolution than WGS. As further WGS data become available, there will be an opportunity to refine our signatures and increase their resolution. In their current form, we have demonstrated that the signatures are widely applicable across technologies, including inexpensive assays such as shallow WGS that can be easily applied in a clinical setting to formalin-fixed tumour material50. However, it is important to note that the bulk-DNA samples that we analysed do not show dynamics of CIN, and future work is needed to extend our approach to multiple samples or single cells from the same patient to show how patterns of CIN change over time. Further work is also required to quantify copy number signature activity at specific genomic loci, as our method currently only supports signature quantification at a whole-genome level.

The 17 copy number signatures and their putative aetiologies provide a valuable resource for furthering our understanding of CIN. For example, CX1 represents the most prevalent type of CIN across tumours: chromosome missegregation. Aetiology analysis of CX1 pointed at multiple different mitotic defects giving rise to this signature. This suggests that, despite diversity in the potential causes of mitotic defects, these all result in the same change in genome structure1. These missegregation events typically result in large copy number changes, potentially disrupting the function of many genes; however, our signature analysis reveals that these changes only represent, on average, 4% of the total number of copy number changes observed in a tumour (Supplementary Fig. 38). By contrast, CX2 accounts for 23% of the copy number changes observed in a tumour. This highlights the power of our compendium of signatures to quantify and disentangle the causes and functional effect that different types of CIN have on tumour genomes. Our results also highlight the potential of our signatures to improve the treatment of patients with extreme CIN tumours. Platinum-based chemotherapy is currently considered a broad-spectrum cytotoxic chemotherapy and is routinely used to treat cancers with extreme CIN. However, here we showed that platinum response can be robustly dissected using different signatures of IHR. By developing the IHR signatures into a companion diagnostic assay, platinum-based therapies could potentially be administered in a more targeted manner, allowing resistant patients to avoid their toxic side effects, and healthcare systems to reduce the cost burden of ineffectual treatment. Similarly for other signatures, our analysis of drug response across cell lines reinforces their potential to be developed into companion diagnostics for improved patient stratification during clinical trials.

The signature compendium presented here is an important resource to guide future studies into a deeper understanding of the origins and diversity of CIN and how to therapeutically target different types of CIN.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.