Introduction

Endometrial cancer is the most common invasive gynaecological cancer in women with more than a quarter of a million new cases each year (Ferlay et al. 2010). Moreover, the incidence of endometrial cancer has increased significantly in western countries over the past 20 years (Evans et al. 2011). The majority of endometrial cancers (80–90 %) are histologically classified as endometrioid subtype at the time of presentation (Clement and Young 2002) and are primarily treated by surgical resection. Patients are vulnerable to treatment-related morbidity due to the high prevalence of comorbidities, such as obesity (Dowdy et al. 2012).

A small proportion of endometrial cancer cases (5–25 %, depending on selection criteria) are due to high-risk germline mutations in one of the mismatch repair genes MLH1, MSH2, MSH6, and PMS2 (Buchanan et al. 2014), often presenting with early age of cancer onset, and strong family history of cancer as part of Lynch Syndrome. Evidence also exists for the contribution of other DNA repair-related genes to endometrial cancer risk, including POLD1 (Palles et al. 2013). Candidate gene and genome-wide searches for single-nucleotide polymorphisms (SNPs) associated with modest risk of endometrial cancer have to date convincingly identified one locus at 17q12 (HNF1B) (Spurdle et al. 2011), with evidence for risk loci at 15q21 (CYP19A1) (Setiawan et al. 2009) and 1q42 (near CAPN9) (Long et al. 2012). Thus, the genetic changes underlying the disease remain unknown for most endometrial cancer affected women.

DNA copy number variation (CNV) in the human genome is increasingly recognised as a major source of genetic variation that may influence cancer risk (Krepischi et al. 2012b; Kuiper et al. 2010). Indeed, cancer-predisposing CNVs are known to occur in important cancer-associated genes and/or pathways in numerous cancers, including the mismatch repair genes MLH1, MSH2, MSH6 and PMS2 in Lynch Syndrome affected families (Thompson et al. 2014). There is also evidence that CNV frequency and/or size across the genome may play an important role in disease development. Increased germline CNV load has been shown to be associated with earlier age of cancer onset in TP53 mutation carriers (Shlien et al. 2008), with predisposition to colorectal cancer in high-risk families with or without mutations in mismatch repair genes (Talseth-Palmer et al. 2013; Yang et al. 2014), and with breast cancer in families that do not carry BRCA1 or BRCA2 mutations (Pylkas et al. 2012); germline deletions have also been associated with ovarian cancer in women carrying a mutation in BRCA1 (Yoshihara et al. 2011). However, these associations have only been found in patients from high-risk families. Despite a number of reported population-based genome-wide association studies of individual CNV genotypes across various cancers (Craddock et al. 2010; Krepischi et al. 2012a; Long et al. 2013), we are unaware of any study assessing the role of CNV load in cancer predisposition at the population level. Further, many CNV studies published to date have utilised convenience “control” groups with little or no demographic and clinical annotation, limiting interpretation of the results.

It has been shown that the extent of copy number change can correlate with the expression level of the affected gene (Stranger et al. 2007). Similarly, copy number changes may disrupt expression of small non-coding RNA regions, namely microRNAs (miRNAs), which have a role in key cellular processes including development, cellular proliferation and apoptosis, and small nucleolar RNAs (snoRNAs) and Cajal body-specific RNAs (scaRNAs), which are involved in post-transcriptional processing of other non-coding RNAs (Marcinkowska et al. 2011). Finally, alterations to CpG islands by CNVs have been shown to directly influence gene expression levels by changing methylation patterns (Robinson et al. 2010) and modifying transcription factor binding sites (Elango and Yi 2011; Zhang et al. 2009). Thus, the functional consequences of CNVs are an important consideration in disease risk analyses.

In this study, we utilised SNP array data from a previously reported genome-wide association study of endometrioid endometrial cancer cases (Spurdle et al. 2011) and compared results to those from a well-annotated comparable control group, to assess whether CNVs across the genome show an altered frequency in endometrial cancer cases compared with healthy controls. Our analysis considered the likely functional importance of CNVs by assessing their location relative to genes and gene regulatory elements, and also CNV frequency in the cohort.

Materials and methods

Study cohorts

Detailed descriptions of the case and control sample sets utilised in this study have been previously reported (McEvoy et al. 2010; Spurdle et al. 2011). Briefly, 1,343 endometrioid endometrial cancer cases with self-reported European ancestry were selected for genome-wide genotyping from the Australian National Endometrial Cancer Study (ANECS) or the Studies of Epidemiology and Risk factors in Cancer Heredity study (SEARCH) in the UK (Spurdle et al. 2011). Control samples (n = 528) were female participants in the Hunter Community Study (HCS), a population-based cohort aged 55–86 years, predominantly of European ancestry and residing in the Hunter Region in New South Wales, Australia (McEvoy et al. 2010). Control individuals with report of any cancer were excluded from the analysis.

Genotyping, identification of CNVs and quality control

All DNA samples were genotyped with the Human610-Quad BeadChip (Illumina, Inc, San Diego, CA, USA) with ~610,000 markers, as described previously (Spurdle et al. 2011; Talseth-Palmer et al. 2013). Data for each array were normalised using GenomeStudio 2011.1 software (Illumina). Probe information, including genomic location, signal intensity (Norm R), allele frequency (Norm θ), log R ratios (LRRs), and B allele frequencies (BAFs), was calculated for each sample and exported from GenomeStudio. All samples had a call rate >95 %. Results from parallel quantitative PCR validation studies (see below: Quantitative PCR CNV validation section, Online Resource 1) demonstrated that PennCNV (Wang et al. 2007) had the highest accuracy in identifying multiplex ligation-dependent probe amplification (MLPA)- and/or quantitative PCR (qPCR)-detected gene deletions of MMR and other selected genes identified in this cohort, with 13/13 copy number variants called by PennCNV confirmed using these technologies, but lower rates of validation for CNVs called using other algorithms [QuantiSNP (Colella et al. 2007), CNV Partition (http://www.illumina.com/software/illumina_connect.ilmn), Gnosis (http://www.cnvision.org)] (Online Resource 1A). Based on this observation, CNV calls were then generated using the PennCNV program (version 2009 Aug 27), using the default program parameters, library files and genomic wave adjustment.

Case and control individuals were subject to quality control measures as previously defined, including measures of heterozygosity, relatedness, and non-European ancestry (Spurdle et al. 2011; Talseth-Palmer et al. 2013). Additional quality control procedures were performed for this copy number analysis to remove poor-quality array data (Online Resource 2) using the following exclusion criteria: log R ratio standard deviation >0.28, B allele frequency drift >0.01, waviness factor deviating from 0 by >0.04 and/or with the number of CNV calls exceeding 70. To reduce false positives, CNV calls were excluded if they contained <5 probes and/or were ≥1,000 kb in size. A total of 1,209 cases and 528 female controls passed quality control and were included in the analysis.
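The exclusion criteria above can be expressed as two small predicates, one per sample and one per CNV call. The following is a minimal sketch only: the record types and field names (SampleQC, CNVCall, lrr_sd and so on) are hypothetical and are not taken from PennCNV output, and CNV size is assumed to be computed from inclusive base-pair coordinates.

```python
from dataclasses import dataclass

# Sample-level thresholds quoted in the text.
MAX_LRR_SD = 0.28      # log R ratio standard deviation
MAX_BAF_DRIFT = 0.01   # B allele frequency drift
MAX_ABS_WF = 0.04      # waviness factor, as absolute deviation from 0
MAX_CNV_CALLS = 70     # CNV calls per sample

# Call-level thresholds quoted in the text.
MIN_PROBES = 5
MAX_SIZE_BP = 1_000_000  # 1,000 kb; calls of this size or larger are excluded


@dataclass
class SampleQC:
    """Hypothetical per-sample QC summary."""
    lrr_sd: float
    baf_drift: float
    waviness: float
    n_cnv_calls: int


@dataclass
class CNVCall:
    """Hypothetical CNV call; coordinates assumed inclusive."""
    chrom: str
    start: int
    end: int
    n_probes: int


def sample_passes(s: SampleQC) -> bool:
    """True if the array data meet none of the sample exclusion criteria."""
    return (
        s.lrr_sd <= MAX_LRR_SD
        and s.baf_drift <= MAX_BAF_DRIFT
        and abs(s.waviness) <= MAX_ABS_WF
        and s.n_cnv_calls <= MAX_CNV_CALLS
    )


def call_passes(c: CNVCall) -> bool:
    """True if the call has at least 5 probes and is smaller than 1,000 kb."""
    size_bp = c.end - c.start + 1
    return c.n_probes >= MIN_PROBES and size_bp < MAX_SIZE_BP
```

Applying sample_passes before call_passes mirrors the order in which the criteria are described: poor-quality arrays are removed first, then individual calls are filtered.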

Quantitative PCR- and multiplex ligation-dependent probe amplification-CNV validations

Experimental validation of predicted CNVs using MLPA and/or qPCR was carried out, prior to (Buchanan et al. 2014) or during this study (Online Resource 1), for a subset of 13 CNVs predicted, using the four CNV calling algorithms, to overlap the known Lynch Syndrome genes MLH1, MSH2, MSH6 and PMS2, or 7 other selected genes (Online Resource 1A). During this study, we predicted a total of nine deletions [including four in MMR genes previously identified using MLPA testing (Buchanan et al. 2014)], and four duplications (including one duplication in MSH6). MLPA for MMR genes was carried out with SALSA kits P003, P003-B2, P008, P072-B2, and P248-A2 (MRC-Holland, Amsterdam, the Netherlands), and qPCR validation studies for all genes tested were performed using the Roche Light Cycler 480 (LC480) (Hoffmann-La Roche Ltd, Basel, Switzerland) and Platinum SYBR Green qPCR SuperMix-UDG (Invitrogen, California, United States of America). Primers (Online Resource 3) were designed to target coding regions overlapping CNVs using previously documented parameters (D’Haene et al. 2010). Genomic regions overlapping ZNF80 (D’Haene et al. 2010) and ALB (Meijerink et al. 2001) were targeted as internal references for normalisation. Each sample was run with four independent replicates, and at least three non-CNV carrying controls were assayed alongside. Normalised copy number values were calculated using the Lightcycler 480 Gene Scanning software. qPCR analysis validated seven deletions and three duplications, including all four MMR gene CNVs previously identified using MLPA. Only CNVs predicted by PennCNV were consistently validated by qPCR (Online Resource 1A, B, C).

Identification of genes, CpG islands and small nuclear RNAs overlapping CNVs and defining rare CNVs

To avoid examining multiple isoforms from genes, we annotated 39,544 UCSC RefSeq (NCBI36/Hg18) transcripts using the SOURCE database (Diehn et al. 2003) and defined the genomic intervals for a total of 18,791 unique genes. Thus, each gene interval encompassed the start and end of all associated RefSeq transcripts (Online Resource 4). CNVs and gene regions that were estimated to overlap by at least one base pair were identified in a genome-wide scan using Intersect and Join tools from the Galaxy web server (Blankenberg et al. 2010; Giardine et al. 2005; Goecks et al. 2010).
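The two steps described above, collapsing transcripts into one interval per gene and then testing CNVs for an overlap of at least one base pair, can be sketched as follows. This is an illustration only, not the Galaxy Intersect/Join workflow used in the study; the function names and the tuple layouts are hypothetical, and coordinates are assumed to be inclusive.

```python
def gene_intervals(transcripts):
    """Collapse transcripts into one span per gene.

    transcripts: iterable of (gene, chrom, start, end) tuples.
    Returns {(gene, chrom): (min_start, max_end)} so that each interval
    covers the start and end of all transcripts of that gene.
    """
    spans = {}
    for gene, chrom, start, end in transcripts:
        key = (gene, chrom)
        if key in spans:
            s, e = spans[key]
            spans[key] = (min(s, start), max(e, end))
        else:
            spans[key] = (start, end)
    return spans


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one base pair."""
    return a_start <= b_end and b_start <= a_end


def genes_hit(cnv, intervals):
    """Return the sorted gene names whose interval overlaps the CNV.

    cnv: (chrom, start, end) tuple.
    """
    chrom, start, end = cnv
    return sorted(
        gene
        for (gene, g_chrom), (g_start, g_end) in intervals.items()
        if g_chrom == chrom and overlaps(start, end, g_start, g_end)
    )
```

A linear scan over all genes is adequate for a sketch; a genome-wide analysis would normally sort the intervals or use an interval tree.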

Since CNV calls do not typically conform to discrete genomic regions in different individuals, we used the genome coordinates of 18,791 RefSeq gene (NCBI36/Hg18) boundaries to define a CNV region (Online Resource 4). Each of these regions, therefore, represented a cluster of one or more CNVs overlapping a well-characterised gene in the human genome and was used to measure the frequency of CNVs in our study cohort. Rare gene-overlapping CNVs were defined as those with frequency <1 % of the total cohort. To identify rare CNVs overlapping CpG islands and small RNA genomic regions, a similar approach was carried out using coordinates of CpG islands and sno/miRNA [UCSC Genome Browser (NCBI36/hg18); http://genome.ucsc.edu/cgi-bin/hgGateway] instead of gene regions. The CpG island track used by UCSC Genome Browser defined 20,338 unique coordinates across the genome using the following criteria: GC content of ≥50 %, length >200 bp, ratio of observed to expected CpG dinucleotides >0.6. The data track for sno/miRNA from UCSC Genome Browser (NCBI36/hg18) defined 1,120 regions across the genome.
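The rarity definition can be illustrated with a short sketch. It assumes that region frequency is the number of distinct individuals carrying a CNV in that region divided by the combined cohort size (1,209 cases plus 528 controls = 1,737); the function name and input layout are hypothetical.

```python
COHORT_SIZE = 1209 + 528  # cases plus controls passing quality control


def rare_regions(carriers_by_region, cohort_size=COHORT_SIZE, max_freq=0.01):
    """Return the regions whose CNV carrier frequency is below max_freq.

    carriers_by_region: {region_name: set of sample IDs carrying a CNV
    that overlaps the region}.
    """
    return {
        region
        for region, carriers in carriers_by_region.items()
        if len(carriers) / cohort_size < max_freq
    }
```

With 1,737 individuals, 17 carriers correspond to a frequency of about 0.98 % and 18 carriers to about 1.04 %, so under this assumption a region is rare when it is affected in at most 17 individuals.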

Statistical analysis of CNV load

R version 2.14.2 (http://www.r-project.org/) was used to perform statistical analyses. t tests with the Satterthwaite adjustment for unequal variances were conducted to establish the level of significance associated with the difference in CNV carrier frequencies between the cases and controls. The mid-P exact test was used as a measure of association between CNV load and disease status. P values <0.05 were considered significant.
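For readers unfamiliar with these tests, the sketch below shows the quantities involved, using only the Python standard library. It is not the R code used in the study. The first function returns the unequal-variance t statistic and the Satterthwaite degrees of freedom; the second computes a one-sided (upper-tail) mid-P value for a 2 × 2 table from the hypergeometric distribution. Whether the study used one-sided or two-sided mid-P values is not stated, so the one-sided form is an assumption made here for simplicity.

```python
from math import comb, sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Unequal-variance t statistic and Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of mean(x)
    vy = variance(y) / len(y)  # squared standard error of mean(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def mid_p_upper(a, b, c, d):
    """One-sided mid-P value for the 2 x 2 table [[a, b], [c, d]].

    a: carriers among cases, b: non-carriers among cases,
    c: carriers among controls, d: non-carriers among controls.
    Returns P(X > a) + 0.5 * P(X = a), where X is hypergeometric
    with all table margins fixed.
    """
    n_cases, n_controls, n_carriers = a + b, c + d, a + c
    total = comb(n_cases + n_controls, n_carriers)

    def pmf(k):
        return comb(n_cases, k) * comb(n_controls, n_carriers - k) / total

    upper = sum(pmf(k) for k in range(a + 1, min(n_carriers, n_cases) + 1))
    return upper + 0.5 * pmf(a)
```

Converting the t statistic to a P value additionally requires the t distribution with df degrees of freedom, which statistical packages such as R provide.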

Results

CNV frequency in endometrial cancer cases and controls

After stringent quality control measures were applied, genotype data from Illumina 610 K SNP arrays suitable for genome-wide CNV scans were available for 1,209 cases and 528 female controls (Online Resource 2). Using the PennCNV algorithm, a total of 30,663 and 13,399 CNV calls were generated for cases and controls, respectively (Table 1), ranging in size from 0.5 kb to 998 kb. The average number of CNVs observed per individual did not differ significantly between cases and controls (25.36 vs 25.38, P = 0.97). Likewise, no difference was observed when considering deletions and gains separately (Table 1).

Table 1 Frequency of CNVs in endometrial cancer cases and controls, and overlapping functional and regulatory regions

The average number of CNVs predicted to overlap genes was marginally greater in cases versus controls (8.56 vs 8.17, P = 0.04), with this effect attributable to deletions (case average = 5.07 vs control average = 4.70, P = 0.01) rather than DNA gains (case average = 3.48 vs control average = 3.48, P = 0.98) (Table 1). There was no statistically significant difference between cases and controls for deletions or gains of CpG islands. Increased frequency in cases versus controls for CNVs overlapping sno/miRNA regions (0.10 vs 0.04, P = 6 × 10⁻⁵) was attributable to a 2.53-fold increase of deletions overlapping sno/miRNA regions. The average number of sno/miRNAs disrupted by deletions was 4.3-fold higher in cases compared to controls (P = 0.001) (Table 1). Additionally, 34 miRNA regions predicted to be deleted in at least one case sample were not found in controls, with the most frequently affected regions encompassing hsa-mir-661 (n = 14 cases) and hsa-mir-203 (n = 11 cases) (Online Resource 5). For copy number variants overlapping sno/miRNAs but not genes, there was no significant difference between cases and controls when assessing the number of deletions (P = 0.09) or duplications (P = 0.61); however, there was limited power due to small CNV numbers (Online Resource 6). The average size of CNVs did not differ between cases and controls (Online Resource 7).

Rare CNVs in endometrial cancer cases and controls

To examine the prevalence of rare CNVs overlapping genes in our study cohort, CNVs overlapping RefSeq gene regions in 18 or more individuals (≥1 % frequency in the study cohort) were excluded; all remaining rare CNVs either fully or partially overlapped at least one RefSeq gene region. The number of rare deletions per individual that overlapped at least one gene was 1.73-fold greater in cases compared to controls (1.63 vs 0.94; P = 8 × 10⁻¹⁰), but no statistically significant difference was seen for copy number gains between the two groups (P = 0.69) (Table 2). Likewise, the average number of RefSeq genes predicted to be disrupted by a rare genomic deletion was 2.26-fold higher in cases compared to controls (2.76 vs 1.22; P = 4 × 10⁻¹⁰). Thus, on average, rare deletions in endometrial cancer cases are disrupting one additional gene compared to controls.

Table 2 Frequency of rare CNVs and overlapping genes, CpG islands or sno/miRNAs in endometrial cancer cases and controls

A similar approach was used to identify rare CNVs (<1 % frequency) overlapping CpG islands. The average number of rare deletions predicted to fully or partially overlap at least one CpG island was 1.96-fold greater in cases than controls (0.78 vs 0.40; P = 1 × 10⁻⁷), and the number of gains overlapping CpG islands was 1.16-fold greater in cases than controls (0.72 vs 0.62; P = 0.05) (Table 2). Likewise, the average number of CpG islands predicted to be disrupted by a rare genomic deletion was 3.38-fold higher in cases compared to controls (3.01 vs 0.89; P = 2 × 10⁻⁷), and the average number of CpG islands impacted by rare genomic copy number gains was 1.41-fold higher in cases compared to controls (2.08 vs 1.47; P = 3 × 10⁻³). Results of this analysis suggest that rare CNVs in endometrial cancer cases are disrupting two additional CpG islands compared to controls. Interestingly, 64 % of these CpG islands are located within genes also disrupted by rare CNVs (data not shown). To separate the contribution of CpG islands falling within the coordinates of a RefSeq gene from the observed results, the loading effect of CNVs overlapping only apparently intergenic CpG islands was assessed. There were more rare deletions, but not significantly more rare gains, disrupting intergenic CpG islands in cases compared to controls (deletions P = 0.001; gains P = 0.31) (Online Resource 8). Further, intergenic CpG islands were disrupted significantly more often in cases than controls by both rare deletions (0.88 vs 0.55; P = 0.002) and rare DNA gains (1.09 vs 0.77; P = 0.008).

Rare deletions (<1 % frequency) overlapping sno/miRNAs occurred at a frequency 7.69-fold higher in cases than controls (0.07 vs 0.01; P = 3 × 10⁻⁹). This equated to a 13.90-fold increase in the number of sno/miRNAs disrupted by rare CNVs (0.16 vs 0.01; P = 6 × 10⁻⁴), and there were 191 sno/miRNAs disrupted in cases while only six sno/miRNAs were affected in controls (Table 2). That is, in absolute counts, over 30 sno/miRNAs were disrupted by rare deletions in cases for every single disruption event in controls.

DNA repair genes and known cancer susceptibility genes disrupted by CNVs

Given the role of many known high-risk cancer susceptibility genes in DNA repair, we extended our analysis to identify rare CNVs overlapping genes acting in the DNA repair pathway. One hundred and seventy-six DNA repair genes used in this analysis were sourced through an updated version of an online inventory (http://sciencepark.mdanderson.org/labs/wood/dna_repair_genes.html; updated 4th March 2013), and included the MMR genes MLH1, MSH2, MSH6, and PMS2. There were more cases than controls carrying rare deletions overlapping DNA repair genes (19 vs 1, P = 0.007, OR = 8.40, 95 % CI 1.54–177.0) (Table 3), and this finding remained nominally significant even after removing four samples with deletions overlapping MLH1, MSH2, and MSH6 (15 cases versus 1 control, P = 0.03, OR = 6.64, 95 % CI 1.18–141.60) (Online Resource 9). A total of 24 DNA repair genes (including the MMR genes MLH1, MSH2, and MSH6) overlapped rare CNVs across 39 cases (out of 1,209; 3.2 %) compared to eight genes across ten controls (out of 528; 1.9 %). There was no evidence for an increase in rare CNV gains overlapping DNA repair genes in cases versus controls. There was evidence for an increase in CNV deletions in cases with CNV disruption of DNA repair genes (n = 15) compared to cases without (n = 1,190) (Online Resource 10); this difference was observed overall (P = 0.01) and for CNVs overlapping each of the genomic features assessed (P = 0.01 for genes, P = 0.005 for CpG islands, P = 0.05 for sno/miRNAs). Mean size of CNVs overlapping genes and sno/miRNAs was greater in cases without DNA repair gene disruption (P = 0.04); however, no size difference was observed for all CNVs or for CNVs overlapping CpG islands (Online Resource 11).
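The odds ratios reported above can be approximated from the carrier counts. The sketch below computes the simple sample (cross-product) odds ratio; the function name is hypothetical, and the denominator of 1,205 cases for the second comparison is an assumption based on removing the four MMR deletion carriers from the 1,209 cases.

```python
def sample_odds_ratio(case_carriers, n_cases, control_carriers, n_controls):
    """Cross-product odds ratio for carrier status in cases vs controls."""
    a = case_carriers
    b = n_cases - case_carriers
    c = control_carriers
    d = n_controls - control_carriers
    return (a * d) / (b * c)


# All rare deletions overlapping DNA repair genes: 19 of 1,209 cases, 1 of 528 controls.
print(round(sample_odds_ratio(19, 1209, 1, 528), 2))  # 8.41

# After removing four MMR deletion carriers (assumed 15 of 1,205 cases).
print(round(sample_odds_ratio(15, 1205, 1, 528), 2))  # 6.64
```

The first value differs slightly from the reported 8.40, which may reflect a different estimator (for example, a conditional maximum likelihood estimate); the source does not specify which estimator was used.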

Table 3 Association between endometrial cancer and rare CNVs overlapping DNA repair genes

Discussion

Relatively little is known about inherited factors that influence endometrial cancer risk, and to our knowledge this is the first genome-wide study to explore the role of germline CNV load in endometrial cancer. We used the PennCNV algorithm for assessing CNVs detected on the Illumina platform, and compared frequencies of CNVs stratified by functional annotation and frequency in a well-characterised set of endometrial cancer cases and controls. More deletions than gains were observed in both cases and controls in our study, consistent with the fact that probe intensity differences for most DNA gains are typically smaller than those for deletions; hemizygous deletions will reduce the probe intensity by half (2:1 ratio) compared to a 1.5-fold increase in probe intensity (3:2 ratio) for duplications.
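The asymmetry between deletions and gains is easiest to see on the log R ratio scale. The sketch below assumes an idealised model in which probe intensity is directly proportional to copy number, so the expected log R ratio is the base-2 logarithm of the observed copy number over the normal diploid copy number; real array data are noisier and typically show compressed values.

```python
import math


def expected_lrr(copies, normal=2):
    """Idealised log R ratio for a region with the given copy number."""
    return math.log2(copies / normal)


print(expected_lrr(1))            # hemizygous deletion: -1.0
print(round(expected_lrr(3), 3))  # single-copy gain: 0.585
```

Under this model a hemizygous deletion shifts the signal by a full unit, whereas a single-copy gain shifts it by only about 0.58, which makes gains harder to distinguish from noise.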

For large-scale studies where experimental validation of every predicted CNV is impractical, such as this load analysis, accuracy of CNV calling algorithms is of great importance. PennCNV was designed for use with Illumina array data (Wang et al. 2007), and has been reported in several studies to have a high true positive call rate for CNVs called using a five probe minimum (Dellinger et al. 2010; Marenne et al. 2011). Specificity of 98 % was observed for PennCNV in a controlled study validating CNVs by MLPA analysis in paired samples (Marenne et al. 2011). In support of this, we have found PennCNV to demonstrate high concordance with MLPA-identified and/or qPCR-verified gene deletions in our study. Another important consideration when characterising CNVs, in particular rare CNVs, is the control group selected for comparison. A number of published reports define rare CNVs as those that do not occur in the Database of Genomic Variants (DGV). However, DGV is a compilation of validated and unvalidated CNVs found in studies of individuals from various populations of different ethnicities, and using a variety of array and sequencing platforms (Iafrate et al. 2004). Importantly, there is no medical information available for most of the samples in DGV which precludes filtering for non-diseased individuals to serve as appropriate controls, and the database includes variants overlapping known cancer susceptibility genes (e.g. MSH2 and MSH6) (Iafrate et al. 2004). Therefore, DGV is not an appropriate control dataset to define rare CNVs of clinical importance.

Rare CNV load has been reported to be relevant for pancreatic (Al-Sukhni et al. 2012), breast (Krepischi et al. 2012a) and ovarian (Yoshihara et al. 2011) cancers, in studies that have not considered regulatory regions or defined rare CNVs using a large, well-annotated control dataset. In this study, we provide evidence for a role of germline structural alterations in endometrial cancer risk. We found that overall CNV load of deletions or DNA gains did not differ significantly between cases and controls, but cases presented with an excess of rare germline copy number deletions disrupting genes, CpG islands and sno/miRNAs.

The observation that effects were most obvious when considering rare deletions of functionally important gene regions is supported in the literature. In particular, Krepischi et al. (2012a) postulate that overtly deleterious germline CNVs are removed from the population and those remaining are a result of less stringent selection or alternatively they indicate inefficient DNA repair and apoptosis in relevant individuals. This is corroborated by our finding of an elevated frequency of germline loss of a DNA repair gene in cases versus controls and demonstrates that genes in DNA repair pathways other than the four mismatch repair genes are also worthy of further investigation in future endometrial cancer genetic studies. It was also supported by our observation that cases with CNV disruption of DNA repair genes were themselves more likely to demonstrate increased CNV deletion load. These results are also consistent with whole genome and exome sequencing studies of other cancers, many of which provide evidence for a role for DNA repair gene variants in cancer predisposition (Thompson et al. 2012).

The number of CpG islands disrupted by rare deletions was approximately threefold higher in cases versus controls. In some instances, the CpG island gain/loss occurs in tandem with gain/loss of nearby gene(s). However, CpG island loss/gain does not always directly impact the exonic coding sequence, and there are several other ways by which CpG island disruption could contribute to gene dysregulation and tumourigenesis. First, disruption may directly alter transcription factor binding sites and thus gene regulation. For example, deletion of a CpG island in the promoter for AMACR results in gene upregulation and promotes colon carcinogenesis, whereas simultaneous deletion of another two promoter-located CpG islands was associated with the opposite effect (Zhang et al. 2009). Furthermore, length of CpG islands has been found to correlate with gene expression levels (Elango and Yi 2011). Second, deletions or gains of CpG islands could contribute to endometrial cancer development via abnormal methylation, given that CNVs overlapping CpG islands are reported to directly influence methylation patterns (Robinson et al. 2010) and that abnormal methylation patterns are frequently observed in endometrial tumours (Banno et al. 2006; Ghabreau et al. 2004).

This study found a 2.5-fold increase in deletions overlapping sno/miRNA regions in cases versus controls. Deletions of sno/miRNAs are anticipated to disrupt regulation of multiple genes simultaneously and thus have widespread downstream effects. Both upregulation and downregulation of miRNAs have been implicated in gynaecological cancer development (Torres et al. 2011), but the germline loss of miRNAs is yet to be characterised in endometrial cancer. Our findings are consistent with literature describing the added complexity of cellular processes and miRNA regulation arising from normal cycling of endometrial tissue (Kuokkanen et al. 2010), and with the well-documented dysregulation of miRNAs in endometrial cancer development (Torres et al. 2011).

The miRNA, hsa-miR-661, deleted in 14 cases and no controls, is a relatively well-characterised regulatory molecule. Transfection studies of the MCF-7 wildtype TP53 breast cancer cell line showed that miR-661 siRNA-mediated inhibition leads to decreased expression of MDM2 and MDM4, both negative regulators of TP53, in a tumour-suppressive manner. In addition, increased expression of hsa-miR-661 was reported to be associated with better outcome for breast cancer patients (Hoffman et al. 2014). hsa-miR-661 expression in MDA-MB-231 breast cancer cells was reported to decrease cell motility, invasiveness, and anchorage, and decrease tumour formation in nude mice (Reddy et al. 2009). In our case cohort, germline loss of hsa-miR-661 is a recurring event, supporting a potential role of miR-661 as a tumour suppressor in endometrial cancer predisposition.

The miRNA hsa-miR-203 was deleted in 11 cases and no controls and is known to be intricately involved in regulating endometrial cell cycling. It is downregulated in the late proliferative endometrium and upregulated in the mid-term secretion phase (Kuokkanen et al. 2010). Literature to date describes a decrease of hsa-miR-203 expression in most malignancies, including hormone-related tumours (Viticchie et al. 2011; Wang et al. 2012). There are only two reports that examine the expression of hsa-miR-203 in normal and malignant endometrium. Huang et al. (2014) found that hsa-miR-203 was significantly hypermethylated in tumour samples, correlating with MLH1 methylation status, microsatellite instability, and decreased expression of pro-oncogene SOX4 in Ishikawa cells. Conversely, a smaller study by Chung et al. (2009) reported increased expression of hsa-miR-203 in endometrial adenocarcinomas compared to normal endometrium samples. The germline loss of miR-203 in multiple cases but not controls in our study supports a role of hsa-miR-203 as a tumour suppressor in endometrial cancer predisposition, consistent with most previous studies on hormonal cancers.

Apart from the CpG island analysis of rare CNVs, there was no evidence that DNA gains were involved in endometrial cancer predisposition. While this suggests that duplications in general are not as disruptive to gene function as deletions, it may also reflect that SNP arrays are not able to define the genomic location of duplicated material. For example, duplicated genomic material may lie in tandem, within itself, or be inserted at another part of the genome in either a benign or disruptive manner. Thus, the functional relevance of DNA gains would be better assessed using alternative technologies, possibly by determining if they are associated with altered expression of the gene(s) implicated, or by directly assessing their effects on gene transcription. We also acknowledge that detection of CNVs in our study was limited by the resolution of the microarray used for genotyping. Future analyses performed at higher resolution with CNV-specific arrays, or with next-generation sequencing as this becomes more affordable, would improve the potential for detecting causal variants along the genome, both deletions and duplications.

Our results implicate rare germline deletions of functional and regulatory regions as possible mechanisms conferring risk in endometrial cancer. As such, this study provides a baseline for future validation studies that consider the functional relevance of predicted CNVs in assessing the role of CNV load in cancer predisposition. This study has also identified specific regulatory elements as candidates for further investigation in endometrial cancer predisposition.