Introduction

Ductal carcinoma in situ (DCIS) of the breast is a risk factor as well as a precursor lesion for invasive breast cancer (IBC). The current standard of treatment for DCIS involves surgery in combination with radiation therapy and/or endocrine therapy [1,2,3,4]. However, this treatment paradigm was developed based on the natural history of DCIS and IBC in the premammography screening era. In recent decades, DCIS detection rates have increased significantly due to contemporary, advanced screening imaging modalities [5]. Concurrently, multiple studies have demonstrated that only 13–52% of patients with DCIS eventually develop subsequent IBC [3, 6], suggesting that many patients with DCIS may not require extensive treatment, and raising the concern that some DCIS patients are being overtreated.

Therefore, a precision medicine approach to stratify the risk of developing IBC among DCIS patients is critically needed. The discovery of genomic features that correlate with high-risk and low-risk DCIS would be important for tailored IBC screening and prevention. A number of studies, including ours, have examined the changes in the genome during the progression of breast neoplasia to IBC. Their findings suggest that both single nucleotide variations (SNVs) and copy number alterations (CNAs) are acquired through a series of genomic events that occur over the course of IBC development [7,8,9,10,11].

In our prior studies, we examined genomic changes in hyperplasia, DCIS, and IBC using targeted sequencing [8], whole genome sequencing [7], and fluorescence in situ hybridization (FISH) [12]. These studies identified recurrent genomic changes in preinvasive neoplasia, both CNAs and SNVs, that are also found in IBC. Some events, such as PIK3CA mutations, can occur quite early in the neoplastic timeline [13], whereas others, such as ERBB2 amplification, occur later [14]. Nevertheless, we hypothesize that no single genomic feature correlates with the transition from DCIS to IBC; instead, a constellation of features or higher-order features, such as genome complexity or gene pathway alterations, likely drives progression.

In this study, we investigated the mutational profiles of DCIS cases that do not develop IBC over a long follow-up interval (median 13 years), and DCIS cases that are initially associated with IBC or later develop IBC. We hypothesize that by combining molecular signatures with clinicopathologic features, we can elucidate the biology of breast cancer progression, and risk-stratify patients with DCIS.

Materials and methods

Patient population

Available cases were identified in the Department of Pathology at Stanford University Hospital (SUH) from 2000 to June 2011. DCIS cases with adequate tissue for research sampling and confirmed follow-up were categorized as follows: (1) DCIS-only group: DCIS without development of IBC over a median follow-up of 13 years (mean 11 years); or (2) DCIS + IBC group: DCIS with concurrent or subsequent IBC. No restriction was imposed on the size of the associated invasive cancer or on the detection method (screen-detected vs. palpable mass, etc.). Surgical samples with sufficient tissue were collected with Health Insurance Portability and Accountability Act (HIPAA)-compliant Stanford University Institutional Review Board (IRB) approval (Protocol number 32496). Clinical data were obtained from the Oncoshare breast cancer research database, which has been described previously [15, 16].

Generation of targeted-capture libraries

Hematoxylin- and eosin-stained slides were reviewed to confirm the diagnosis of DCIS, and areas with abundant DCIS were selected for sampling. For cases with DCIS and invasive carcinoma in the same specimen, we carefully selected areas of abundant DCIS away from the invasive component. DCIS samples were obtained by taking three to ten 2-mm cores from the corresponding areas of the paraffin blocks (from cases on tissue microarrays TA 239, 419, 420, and 445). The tissue in each core is approximately 2–3 mm thick, so we do not anticipate major contamination from invasive carcinoma deeper in the core. Only tumor samples were analyzed; no paired normal samples were included. DNA was extracted using the RecoverAll Total Nucleic Acid Isolation kit (Ambion). Targeted libraries of genomic DNA were generated using the Agilent SureSelect XT kit and the Agilent Automation Systems NGS system. Additional information is provided in the supplementary materials and methods.

Sequencing data analysis

The sequencing data were analyzed using a custom pipeline. In brief, reads were aligned to the hg19 human reference genome assembly using BWA [17]. Duplicate reads were marked using Picard Tools v1.118 (http://broadinstitute.github.io/picard). Cases with fewer than 92 M aligned reads were discarded. Insertion–deletion realignment and base quality score recalibration were performed using GATK v3.3-0 [18]. Somatic variants were called with an ensemble approach using four variant callers: MuTect [19], VarScan2 [19], VarDict [20], and FreeBayes [21]; calls reported by at least two of the four callers were accepted. Variants were annotated using ANNOVAR [22] and custom scripts. The full list of nonsynonymous variants is provided in Supplementary Table S6. Bedtools coverage was used to generate a coverage histogram for each feature in the BED file and a summary histogram across all features. These histograms were plotted in R to show, for each sample, the percentage of capture regions covered at any given depth. The sequencing data were uploaded to the NCBI Sequence Read Archive.
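The two-of-four consensus rule and the coverage summary described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' custom pipeline: it assumes each caller's output has already been parsed into a set of normalized (chrom, pos, ref, alt) tuples and that the depth histogram has been read into a dictionary; the function names `consensus_calls` and `fraction_at_or_above` are hypothetical.

```python
from collections import Counter

# Illustrative variant key (chrom, pos, ref, alt). A real pipeline would first
# normalize records (e.g. left-align indels, split multi-allelic sites).
Variant = tuple[str, int, str, str]


def consensus_calls(calls_by_caller: dict[str, set[Variant]],
                    min_callers: int = 2) -> set[Variant]:
    """Keep variants reported by at least `min_callers` distinct callers."""
    votes: Counter = Counter()
    for calls in calls_by_caller.values():
        votes.update(set(calls))  # each caller votes at most once per variant
    return {v for v, n in votes.items() if n >= min_callers}


def fraction_at_or_above(hist: dict[int, int], depth: int) -> float:
    """Fraction of targeted bases covered by at least `depth` reads.

    `hist` maps read depth -> number of bases at exactly that depth, e.g.
    the depth and base-count columns of the summary ("all") rows written
    by `bedtools coverage -hist`.
    """
    total = sum(hist.values())
    covered = sum(n for d, n in hist.items() if d >= depth)
    return covered / total if total else 0.0
```

For example, a variant reported by both MuTect and VarScan2 is retained, whereas one reported only by FreeBayes is dropped. Counting each caller at most once per variant keeps duplicate records from a single caller from inflating the vote.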

Fluorescence in situ hybridization (FISH)

FISH was performed as previously described [12]; additional details are provided in the Supplementary Materials and methods. Total test-probe (green) signal counts at 1q32.1, 8q24.21, and 11q13.11 were compared with control-probe (red) counts at 2q37.3, a locus that is infrequently altered in breast cancer [23]. Signals were evaluated by two parameters: signals per cell and the ratio of test-probe to control-probe signals. A case was scored as having a gain at a locus if the test-to-control probe ratio was greater than 1.5 or the number of test signals was greater than three per cell.
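The scoring rule can be written as a small function. This is a sketch under stated assumptions: per-cell test and control counts are assumed to come from the same scored nuclei, the ratio is computed from total counts (as the text describes total green versus red counts), and "more than three test signals per cell" is interpreted as the mean number of test signals per cell. The function name `fish_gain` is hypothetical.

```python
def fish_gain(test_signals: list[int], control_signals: list[int],
              ratio_cutoff: float = 1.5, per_cell_cutoff: float = 3.0) -> bool:
    """Score one locus as gained (True) or not (False).

    test_signals / control_signals: per-cell counts of the test probe (green,
    e.g. 1q32.1) and the 2q37.3 control probe (red) in the same scored nuclei.
    ASSUMPTIONS: the ratio uses total counts, and the per-cell criterion uses
    the mean number of test signals per cell.
    """
    if not test_signals or len(test_signals) != len(control_signals):
        raise ValueError("expected non-empty, paired per-cell counts")
    total_test, total_control = sum(test_signals), sum(control_signals)
    ratio = total_test / total_control if total_control else float("inf")
    mean_test_per_cell = total_test / len(test_signals)
    # Gain if EITHER criterion is met (ratio > 1.5 OR > 3 test signals per cell).
    return ratio > ratio_cutoff or mean_test_per_cell > per_cell_cutoff
```

Either criterion alone suffices: four cells with four test and four control signals each (ratio 1.0) still score as a gain on the per-cell criterion, whereas three test and two control signals per cell (ratio exactly 1.5, mean exactly 3) meet neither strict threshold.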

Statistical analysis

Baseline characteristics of the two DCIS groups (DCIS-only and DCIS + IBC) were summarized using descriptive statistics. Student's t test was used to assess whether mutation burden differed between the DCIS-only and DCIS + IBC groups.
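For reference, the two-sample Student's t statistic compares mean mutation burden between the groups using a pooled variance estimate, t = (mean1 - mean2) / (s_p * sqrt(1/n1 + 1/n2)). The sketch below computes the statistic and its degrees of freedom with the Python standard library; the per-case burden counts it takes as input are hypothetical, and the p-value would then be read from a t distribution with n1 + n2 - 2 degrees of freedom (for example, `scipy.stats.ttest_ind` with its default `equal_var=True` returns both). Note that R's `t.test()` defaults to Welch's unequal-variance test unless `var.equal = TRUE` is set, so the pooled form shown here is an assumption about which variant was used.

```python
import math
from statistics import mean, variance


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom.

    a, b: per-case mutation counts for the two groups (hypothetical inputs).
    """
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    df = n1 + n2 - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups have zero variance")
    return (mean(a) - mean(b)) / se, df
```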

Two main multivariable logistic regression models were fit to characterize the association between DCIS progression to IBC and clinical and demographic features. The first model fit DCIS progression to IBC as a function of categorical copy number status, age at diagnosis, race/ethnicity, DCIS nuclear grade, tumor size, margins, surgery type, and an indicator for lack of PIK3CA mutation in the kinase domain (PIK3CA-KD). The second model additionally adjusted for ER status and HER2 status. Odds ratios and 95% confidence intervals were reported. Further details on additional models are available in the supplementary materials and methods. All statistical analyses were performed in R (version 3.2.2; R Foundation for Statistical Computing, Vienna, Austria) [24].
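In these models, each coefficient is a log odds ratio, so the reported odds ratios are exponentiated coefficients. The sketch below shows that conversion together with a Wald 95% interval, exp(beta ± 1.96 × SE); the function name and inputs are hypothetical. It is an illustration, not necessarily how the reported intervals were obtained: in R, `confint()` on a fitted `glm` returns profile-likelihood intervals, whereas `confint.default()` returns Wald intervals, and the two can differ noticeably in small samples.

```python
import math


def odds_ratio_wald(beta: float, se: float, z: float = 1.959964) -> tuple[float, float, float]:
    """Convert a logistic-regression coefficient (log odds ratio) and its
    standard error into (OR, lower, upper) with a Wald confidence interval.

    z = 1.959964 is the standard-normal quantile for a two-sided 95% interval.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

Because the interval is symmetric on the log-odds scale, it is skewed on the odds-ratio scale, extending further above the estimate than below it; this is one reason odds-ratio intervals from small samples, such as 1.05–25.27 around 4.52, look lopsided.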

Results

Patient population and demographic data

Cases of DCIS with documented follow-up in two cohorts were identified for genomic studies: (1) DCIS without development of IBC (DCIS-only group) or (2) DCIS with concurrent or subsequent IBC (DCIS + IBC group). A total of 125 DCIS cases were successfully analyzed, including 65 cases (52%) in the DCIS-only group and 60 cases (48%) in the DCIS + IBC group. In the DCIS + IBC group, 5 cases had IBC detected at a later time point. All patients were female, with a median age at DCIS diagnosis of 51 years (range 29–89 years, Table 1). DCIS was characterized histologically and immunophenotypically; more than half (77 cases, 62%) had high-grade nuclei. The median tumor size was 2.4 cm (range 0.4–13.0 cm). Most cases (72%) had tumor margins of 0.2 cm or greater. Sixty-one percent of surgeries were mastectomies. Eighty-one cases (65%) were positive for estrogen receptor (ER). Seventy-two cases (58%) were negative for HER2 (immunohistochemistry score 0, 1+, or 2+), and 27 cases (21.6%) were positive for HER2 (immunohistochemistry score 3+). The univariable analysis of these parameters for association with IBC is presented in Supplementary Table S2, and detailed clinicopathologic data are provided in Supplementary Table S7.

Table 1 Baseline features of the DCIS cases

The genomic profile of DCIS is similar to IBC

Targeted sequencing for mutations common in IBC was performed on 125 DCIS cases. The targeted regions include SNPs and coding exons of known breast cancer- and pan-cancer-related genes, including APC, AR, ATM, BAP1, BRAF, CCND1, CHD1, CDKN1A, CTNNB1, DICER1, DNMT3A, EGFR, ERBB2, FOXA1, GATA3, IDH1/2, KRAS, MED12, MYB, NF1, NOTCH1, PIK3CA, PTEN, RB1, VHL, and WT1 (full list in Supplementary Materials and methods). The average number of reads obtained per case was 5,604,491. Additional quality control data are presented in Supplementary Table S5 and Supplementary Fig. 1. Calls with coverage of fewer than 30 reads were excluded. To focus on mutations with tumor-suppressive or oncogenic effects, we limited the analysis to recurrently mutated positions identified in the TCGA dataset as frequently mutated in IBC. The most commonly mutated genes were PIK3CA (34.4%) and TP53 (18.4%) (Fig. 1a). We observed no significant difference in mutational burden in known breast cancer-associated genes between the two groups of DCIS cases (p > 0.05 for all genes) (Fig. 1b). We further analyzed variants based on their locations within the functional domains of each gene [25,26,27]. We identified “hotspots” of somatic mutations in the helical and kinase domains of PIK3CA in both the DCIS-only and DCIS + IBC groups (Fig. 2). PIK3CA kinase domain mutations (PIK3CA-KD mutations) were significantly enriched in the DCIS-only group (p = 0.029). Analysis of other PIK3CA domains found no statistically significant differences in recurrent mutations outside the kinase domain. Similar domain analyses of the other frequently mutated genes, TP53 and GATA3, identified no additional predictive mutational profiles.
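The depth filter, hotspot restriction, and domain assignment described above can be sketched as follows. This is an illustration under assumptions, not the authors' code: the hotspot set stands in for the TCGA-derived recurrent positions, each call is simplified to a hypothetical (gene, residue, depth) tuple, and the PIK3CA domain boundaries are approximate amino-acid ranges assumed for illustration (published annotations differ by a few residues). Under common annotations, E542 and E545 fall in the helical domain and H1047 in the kinase domain.

```python
# ASSUMPTION: approximate PIK3CA (p110-alpha) domain boundaries, in amino-acid
# residues, chosen for illustration; annotation sources differ slightly.
PIK3CA_DOMAINS = {
    "C2": (330, 487),
    "helical": (517, 694),
    "kinase": (797, 1068),
}
MIN_DEPTH = 30  # calls covered by fewer than 30 reads are excluded


def pik3ca_domain(residue: int) -> str:
    """Name of the (assumed) PIK3CA domain containing `residue`, else 'other'."""
    for name, (start, end) in PIK3CA_DOMAINS.items():
        if start <= residue <= end:
            return name
    return "other"


def filter_hotspot_calls(calls, hotspots):
    """Keep (gene, residue, depth) calls with depth >= MIN_DEPTH whose
    (gene, residue) is in the hypothetical TCGA-derived `hotspots` set."""
    return [c for c in calls if c[2] >= MIN_DEPTH and (c[0], c[1]) in hotspots]


def lacks_pik3ca_kd(case_calls) -> int:
    """1 if a case has no retained PIK3CA kinase-domain mutation, else 0."""
    return int(not any(gene == "PIK3CA" and pik3ca_domain(res) == "kinase"
                       for gene, res, _depth in case_calls))
```

The `lacks_pik3ca_kd` value corresponds to the indicator covariate described in Statistical analysis: 1 when a case carries no retained PIK3CA-KD mutation.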

Fig. 1

Genomic landscape of DCIS. a Distribution of known recurrent IBC-associated variants displayed as a bar graph, with the number of cases denoted above the bars. b The same variants displayed as a heat map; in the DCIS + IBC group, cases with subsequent IBC events are highlighted in boxes

Fig. 2

Distribution of PIK3CA mutations in two groups of DCIS cases. The number in the circles depicts the number of DCIS cases harboring the mutation in that particular position. Of note, one of the DCIS-only cases exhibited two PIK3CA-KD variants

In addition to SNVs and small insertion–deletion mutations identified by targeted exon sequencing, we also interrogated larger-scale copy number alterations in selected “hotspot” genomic regions in the two DCIS groups (Tables 1 and S1). Copy number at three chromosomal loci, 1q32, 8q24, and 11q13, was measured by FISH, based on prior genomic data on invasive breast cancer [28] and DCIS [12]. The current study cohort consists of 73 cases (58.4%) from the previously published cohort [12] and 52 new cases that had not been analyzed before. Consistent with our prior data, 1q32 gain was the most frequent in the cohort (59.0%), followed by 8q24 (48.2%) and 11q13 (32.5%).

Multivariable analysis demonstrates strong correlation between PIK3CA-KD mutation and risk of progression

We performed multivariable analysis to examine the association between IBC and variables including PIK3CA-KD mutation status, copy number gains, age, race, nuclear grade, tumor size, margins, and surgery type (Table 2). After removing cases with missing data, 97 complete DCIS cases remained. We identified a statistically significant association between lack of PIK3CA-KD mutation and increased risk of IBC (p < 0.05): patients without PIK3CA-KD mutations had 4.52-fold higher odds of IBC (95% confidence interval 1.05–25.27) than patients with the mutation. This association was also statistically significant in the univariable analysis (Supplementary Table S2). When the DCIS + IBC group was subdivided into cases with synchronous or subsequent IBC (Supplementary Table S8), a similar trend was observed, although the DCIS + subsequent IBC subgroup is too small to provide adequate power for a conclusion.

Table 2 Multivariable associations with DCIS progression to IBC (n = 97)

When additionally controlling for ER and HER2 status (complete case number = 81, Table 3), the association between lack of PIK3CA-KD mutation and IBC risk remained statistically significant: DCIS without PIK3CA-KD mutations had 10.22-fold higher odds of progression to IBC than DCIS with PIK3CA-KD mutations (p < 0.05, Table 3). Similar results were observed when DCIS nuclear grades were grouped into a two-tiered system (low-grade vs non-low-grade DCIS; high-grade vs non-high-grade DCIS) (data not shown). In addition, the inverse association between any PIK3CA mutation and IBC risk became statistically significant (OR 4.66, p < 0.05, data not shown).

Table 3 Multivariable associations with DCIS progression to IBC, including ER and HER2 status (n = 81)

The presence of genomic copy number gain (1q32 only, 8q24 only, or two or all three gains) was also associated with increased risk of progression to IBC (Table 3). In addition, HER2 overexpression showed a trend toward an inverse association with IBC risk that did not reach statistical significance (OR 0.28, p < 0.1). This trend became statistically significant (OR 0.24, p < 0.05) when any PIK3CA mutation, rather than PIK3CA-KD mutation, was modeled in the multivariable analysis (data not shown).

ER status and DCIS nuclear grade are important pathologic features that are routinely examined in the clinical setting. We therefore investigated interactions between ER status or nuclear grade and PIK3CA mutation status. The association between lack of PIK3CA-KD mutations and progression to IBC was not modified by ER status (Supplementary Table S3), whereas the association between lack of any PIK3CA mutation (PIK3CA-KD or other PIK3CA mutations) and progression to IBC was modified by ER status (p < 0.05, Supplementary Table S4). There was a statistically significant interaction between DCIS nuclear grade and the presence of PIK3CA-KD mutation in the association with IBC (data not shown): the association between PIK3CA-KD mutation and lack of progression to IBC was stronger among patients with high-grade DCIS.
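One way to formalize the grade interaction is a logistic model with a product term. The equations below assume, for illustration, a binary high-grade indicator H_i and code K_i = 1 for presence of a PIK3CA-KD mutation (the reverse of the lack-of-mutation indicator used in the models, which flips the signs of β_1 and β_3 and shifts β_0 and β_2); x_i collects the remaining covariates and Y_i = 1 denotes DCIS + IBC. The exact parameterization used in the study is not specified here.

```latex
\begin{aligned}
\operatorname{logit} \Pr(Y_i = 1) &= \beta_0 + \beta_1 K_i + \beta_2 H_i + \beta_3 K_i H_i + \boldsymbol{\gamma}^{\top} \mathbf{x}_i, \\
\mathrm{OR}_{K \mid H = 0} &= e^{\beta_1}, \qquad \mathrm{OR}_{K \mid H = 1} = e^{\beta_1 + \beta_3}.
\end{aligned}
```

Under this coding, testing the interaction amounts to testing H_0: β_3 = 0, and a stronger protective association in high-grade DCIS corresponds to e^{β_1 + β_3} lying further below 1 than e^{β_1}, that is, β_3 < 0.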

Discussion

While the genomic landscape of IBC has been extensively studied [29,30,31,32,33], the molecular profile of DCIS is still under investigation. Limited DCIS genomic profiling data have been published, using either next-generation sequencing or array CGH [9, 34,35,36,37,38,39]. To the best of our knowledge, this is the largest genomic profiling cohort of DCIS that includes longitudinal clinical follow-up of cases that did not progress. As in prior studies, PIK3CA, TP53, and GATA3 are among the most commonly mutated genes in DCIS, and chromosome 1q and 8q copy number gains are frequent [34,35,36, 39]. Notably, these previous reports found no recurrent mutation in a single gene that could stratify the risk of progression to IBC; however, they did not perform analyses based on protein domains.

We demonstrated the novel finding that somatic mutations in the PIK3CA kinase domain have predictive value for DCIS progression to IBC: mutations in this domain are associated with a lower risk of concurrent or subsequent IBC. Previous studies have investigated the role of PIK3CA mutations in in situ breast cancers. In a small cohort comparing lobular carcinoma in situ (LCIS) without invasive lobular carcinoma (ILC) to LCIS with associated ILC, the presence of PIK3CA mutations was not correlated with progression [40]. Another study showed that in ER-positive/HER2-negative DCIS, PIK3CA “hotspot mutations” were more prevalent in DCIS associated with IBC than in DCIS alone [41]. However, the “hotspot mutations” queried in that study included mutations in the C2, helical, and kinase domains. This difference could account for the different conclusion drawn in our study, as we demonstrated that PIK3CA kinase domain mutations specifically are associated with lack of progression.

The finding that activating mutations in PIK3CA and overexpression of the HER2 oncogene are correlated with a tendency not to progress to IBC is unexpected. Alterations of these two oncogenes are quite prevalent in IBC, and in culture and animal model systems they exert biologic effects that promote neoplastic growth, invasion, or metastasis [42,43,44]. Given conventional models in which oncogenic mutations drive cancer progression, it is surprising that two prominent oncogenes would be inversely correlated with the progression of DCIS to IBC.

One possible explanation for our findings is that alterations in specific pathways (such as HER2 and PIK3CA) allow cells to overcome immediate biological constraints during tumorigenesis, rather than specifically promoting the DCIS-to-IBC transition. Once these initial biologic challenges have been overcome, the influence of these alterations on progression may diminish. Their persistence at high frequency in invasive carcinoma may be due in part to the genomic difficulty of removing these somatic changes, or to a lack of biological pressure to do so. In fact, PIK3CA mutations are extremely common in hyperplastic lesions of the breast [45]. HER2 is often amplified in DCIS and other noninvasive breast lesions, with a higher rate of amplification in preinvasive lesions than in IBC [46,47,48,49]. A large Swedish cohort study with long-term follow-up showed that HER2-positive DCIS has a lower risk of progression to IBC than HER2-negative DCIS [50], and similar results have been reported independently [51]. For PIK3CA, previous investigators found that in a subset of paired samples of DCIS alone and DCIS with IBC, PIK3CA mutations were present in the DCIS alone but not in the DCIS with IBC, or were present at a lower alternate allele frequency in the IBC component [37, 41, 52, 53]. PIK3CA-KD mutations (exon 20 mutations) have also been observed in preneoplastic lesions (usual ductal hyperplasia, columnar cell change, or atypical ductal hyperplasia) whose paired IBC lesions lack such mutations [45]. These results and our current findings suggest that selection for HER2 amplification or PIK3CA mutations may address neoplastic challenges that arise well before the transition from in situ to invasive cancer.

Notably, the genomic features we found to correlate with risk of progression consisted of aneuploidies or large amplicons. Although a number of known and suspected oncogenes lie in the chromosomal regions with recurrent copy number alterations, it is not clear whether a single driver event is responsible for these somatic changes. We speculate that higher-order effects of gross changes in genomic composition (e.g., copy number gains that likely have widespread effects on cellular function and genomic instability), rather than a more specific effect on a particular gene or pathway, may influence progression. Similar observations have been made in other studies [9, 54]. In our prior whole genome sequencing study, clonally related progression was marked by recurrent aneuploidy events at the earliest stages and successive DNA copy number events throughout progression to invasion [7].

Our study has some limitations. First, although some associations in the multivariable analyses were significant, those with wide confidence intervals should be interpreted with caution, and an independent cohort is required to validate these findings. Second, more than half (57%) of the DCIS-only patients underwent mastectomy, which could substantially decrease the rate of subsequent IBC events; indeed, only 5 patients in our cohort subsequently developed invasive disease. Third, without paired normal controls, we could not reliably distinguish germline from somatic events. However, we focused on “hotspot” mutations with known effects on protein function, such as PIK3CA kinase domain mutations, which are readily recognized. Finally, several other genes, such as TP53, GATA3, and MAPK3, are mutated at frequencies that are likely to be relevant for a clinical classifier, but beyond these, most genes are mutated in less than 1% of cases and are thus unlikely to be useful as clinical biomarkers of progression.

The strength of our cohort is its long-term, comprehensive follow-up data, which ensure that DCIS-only cases were indeed free of invasive or metastatic events. Such cases are clinically considered “low risk,” and it is important to identify these patients, who carry a low risk of progression and could consider forgoing extensive treatment. In this study, we hypothesized that the molecular signatures of “low-risk” DCIS are distinct from those of “high-risk” DCIS, defined as DCIS with either synchronous or subsequent IBC. We acknowledge that the risk of progression in the “high-risk” group may be heterogeneous, and that DCIS with synchronous IBC may carry an even higher risk and differ in pathologic mechanism and molecular profile from DCIS with subsequent IBC. This is an interesting and important, but separate, hypothesis that is beyond the scope of the current study.

Future studies are necessary to provide a deeper understanding of this novel finding of the predictive value of PIK3CA-KD mutations in DCIS progression. Previous literature has suggested that different PIK3CA variants could cause different downstream signaling pathway alterations and biological functions in in vitro systems [55, 56]. One key goal is to map the occurrence and evolution of these genomic alterations in early neoplastic and precursor lesions, as well as in paired invasive and metastatic samples. This would help clarify the roles of PIK3CA-KD mutations and copy number gains in breast cancer progression. It is also imperative to combine transcriptomic and proteomic analyses to interrogate the downstream effects of PIK3CA-KD mutations.

Our novel finding that PIK3CA-KD mutations are associated with relative lack of DCIS progression to IBC, together with other traditional and novel risk factors, adds to our understanding of the sequence and mechanisms of breast cancer progression. These data also begin to demonstrate the feasibility of an integrated clinicopathologic-molecular risk classifier for DCIS. For example, women with low nuclear grade, ER+, PIK3CA-KD-mutant DCIS may have a particularly low risk of progression. Larger studies with more complete histologic, immunohistochemical, proteomic, molecular, treatment, and long-term follow-up data are necessary to build precise risk-assessment models to counsel patients, tailor therapy, and reduce overtreatment of the more indolent forms of DCIS.