Introduction

Ductal carcinoma in situ (DCIS) is a non-obligate precursor of invasive breast cancer that has been shown to originate from the terminal duct lobular units (TDLUs) [1]. The detection of DCIS has increased significantly since the 1970s owing to advances in stereotactic biopsy and the adoption of mammographic screening, to the point that DCIS currently accounts for approximately 25% of all diagnosed breast cancers.

The increase in incidence has spurred a greater understanding of the natural history and the clinical and pathologic features of DCIS. It is now recognized that this disease is heterogeneous and that up to 40% of DCIS lesions may progress to invasive disease if untreated [2]. Although there have been some advances in the development of tests to identify the patients at greatest risk of progression to invasive disease, these assays have proven difficult to apply to the management of individual patients [2]. Understanding which patients are at greatest risk of progression is germane to identifying those who need further adjuvant treatment after surgical excision. It has been posited that the development of optimal biomarkers for assessing the risk of DCIS progression requires an understanding of the genomic and/or epigenetic alterations that are necessary and/or sufficient for DCIS cells to breach the myoepithelial layer and basement membrane.

Several biologically plausible hypotheses have been proposed for the progression of early lesions to invasive carcinoma, many of which are applicable to the progression from DCIS to invasive ductal carcinoma (IDC). We [3•, 4, 5••] and others [6, 7••] have previously described three hypothetical models of DCIS progression: (1) a convergent phenotype model, wherein multiple genomic and epigenomic events across independent subclones converge onto similar pathways that drive progression of the disease as a whole; (2) an evolutionary bottleneck model, wherein only one subclone acquires the combination of genomic alterations necessary for progression; and (3) co-migration of multiple clones, whereby an invasive phenotype results from multiple clones that escape from the duct and invade the surrounding tissue [6, 7••]. In the latter context, it is plausible that the ability to invade was acquired rather early in the development of the DCIS, or even that the molecular changes that gave rise to the DCIS were sufficient for the acquisition of an invasive phenotype.

In this review, we describe some of the extrinsic and intrinsic molecular factors associated with an increased risk of progression from DCIS to IDC and discuss potential markers that may be useful in predicting DCIS behavior. We focus in particular on how genomic data obtained through massively parallel sequencing and single-cell sequencing approaches have improved our understanding of the genetic heterogeneity of DCIS and IDC, and on the insights these data have provided into this complex path towards progression.

Clinical and Pathological Features of DCIS

Although DCIS was first described a century ago [8], it was the seminal work of Wellings and Jensen that postulated DCIS as a precursor lesion of IDC [9]. This was further supported by numerous clinical observations showing a spatial and temporal association between DCIS and IDC. Not only is DCIS often found in close proximity to IDC at the time of diagnosis, but longitudinal studies of patient outcomes after surgical management of DCIS show a 20–50% risk of developing IDC in the same breast quadrant as the initial DCIS [10, 11]. Collins et al. [12] also reviewed biopsies originally diagnosed as benign and identified 13 cases with previously undetected DCIS, of which 46% (6 of 13) progressed to invasive carcinoma 5 to 18 years after the initial biopsy.

DCIS is not a single disease; rather, akin to invasive breast cancer, it represents a heterogeneous group of cancers with likely distinct cells of origin, histopathologic features, molecular alterations, and clinical behavior [13]. From a histopathologic perspective, DCIS lesions vary in architecture, nuclear grade, presence of comedo necrosis, multifocality, and size. All of these factors can help stratify the aggressiveness of these lesions and, consequently, their risks of recurrence and progression [13] (Fig. 1). Although historical classification systems attempted to stratify DCIS on the basis of growth patterns and architectural features, contemporary systems group lesions primarily on the basis of nuclear atypia, which appears to correlate to some extent with the subsequent risk of invasion [14, 15]. It should be noted, however, that women with DCIS of any nuclear grade have been shown to be at risk of invasive cancer [15]. DCIS has been classified into the same intrinsic subtypes as invasive breast carcinoma on the basis of the expression of estrogen receptor (ER), progesterone receptor (PR), and HER2, and of basal markers such as EGFR and cytokeratin 5/6 [16,17,18,19]; however, the relative incidence and prognostic implications of these subtypes in DCIS are not as clear [18, 20].

Fig. 1

Progression of ductal carcinoma in situ to invasive carcinoma from a histopathologic perspective. Representative micrographs and schematic representation of progressive stages of breast cancer including in situ carcinoma, microinvasive carcinoma and invasive carcinoma

Randomized clinical trials have not only clarified important aspects of the natural history and the pathologic factors involved in DCIS progression, but have also demonstrated that the risk of progression to IDC can be mitigated, at least in part, by radiotherapy and chemoprevention. The first clinical trial addressing DCIS therapy, NSABP B-17, began in 1985 and demonstrated a reduction in ipsilateral DCIS from 15.4% to 9% with surgery plus radiotherapy [21, 22]. Other clinical trials (EORTC 10853 and UK/ANZ) have provided similar evidence that radiotherapy improves local control following excision [23, 24]. In addition to their clinical and therapeutic insights, these trials have offered a wealth of samples for investigating the pathologic features associated with recurrence [21, 22]. DCIS with high nuclear grade, a predominantly solid architecture, and extensive comedo necrosis was identified as a subgroup with poor prognosis [25]. The NSABP B-24 trial tested the addition of tamoxifen to radiotherapy in women who underwent breast-conserving surgery for DCIS; its design did not require ER testing of the DCIS. Tamoxifen was found to reduce the risk of ipsilateral and contralateral invasive cancer and DCIS [26]. Some years later, Allred et al. [27] analyzed the ER status of the NSABP B-24 cases and reported a significant decrease in breast cancer events after 10 years of follow-up that was almost exclusively confined to ER-positive lesions; current practice is therefore to recommend tamoxifen to patients with ER-positive DCIS. HER2 is overexpressed in approximately 35% of DCIS, and preliminary results of the NSABP B-43 trial regarding the use of trastuzumab for radiosensitization of DCIS have been encouraging [24, 28, 29]. Although these studies help stratify risk, they fail to identify which patients can be spared these treatments.

In an attempt to establish a method to predict which patients are at risk of relapse or of developing invasive cancer from DCIS, Oncotype DX, a multigene expression assay used in invasive breast cancer, was pared down to 15 genes and tested in patients with DCIS from the ECOG E5194 trial [2, 30], a single-arm observational study evaluating DCIS outcomes following excision with 3 mm margins and no radiotherapy. This assay was found to be effective in predicting early but not late recurrences, and so far there is no evidence that the score predicts benefit from radiotherapy. In addition, it remains to be determined whether this multigene assay provides information above and beyond that offered by centralized and standardized pathologic and immunohistochemical assessment of DCIS [31]. The Prelude DCISionRT biological signature is another commercially available test that estimates the individual risk of recurrence and the benefit of radiotherapy after breast-conserving therapy [32]. The risk signature incorporates four clinicopathologic factors (age, size, margin status, and palpability) and seven immunohistochemically assessed biomarkers of hormone receptor and HER2 status, stress response, and proliferation (including PR, COX2, FOXA1, HER2, Ki67, and p16). This test has been validated in the SweDCIS study population and, so far, appears to provide predictive information regarding radiotherapy benefit [33]. DCISionRT is also being evaluated in the USA (NCT03448926), with an estimated primary completion date of February 2023.

DCIS Progression: Intrinsic Factors

Genetic Alterations and Gene Expression Profiles

Genomic analyses of paired, almost invariably synchronous, DCIS and invasive breast cancer samples have been performed to investigate whether recurrent somatic genetic alterations are restricted to, or overrepresented in, the invasive disease. These studies have demonstrated that the somatic genetic alterations of DCIS are remarkably similar to those of IDC [34, 35••]. At the gene level, as in invasive cancer, recurrent mutations in PIK3CA, TP53, and GATA3 have been identified in DCIS, demonstrating that, at diagnosis, DCIS is already a genetically advanced disease and that the driver genetic alterations identified in early-stage invasive breast cancers are already present at the DCIS stage [36••]. These studies have failed to identify somatic mutations that would constitute a common driver of progression from in situ to invasive disease. This is not surprising, as invasion may constitute a convergent phenotype. Analyses of pathways rather than individual genes may therefore provide further insight into progression. For example, the Myc [37,38,39] and PI3K/AKT pathways have been shown to be important in the progression of a subset of ER-positive/HER2-negative DCIS, although these findings explain only a minority of cases [40•].

Given that breast cancer is considered a C-class tumor (i.e., a tumor type in which copy number alterations (CNAs) play a pivotal role in the biology of the disease) [41], CNAs have been postulated as potential drivers of DCIS progression. Quantitative multigene fluorescence in situ hybridization (QM-FISH) analysis of CNAs in 30 genes in 66 cases of synchronous DCIS and IDC revealed frequent amplification of MDMx, CCNE2, ERBB2, IGF1R, CKS1BP7, and MYC and frequent deletion of TP53, CHEK1, RB1, CDH1, CHEK2, and NEK9 [39]. This study demonstrated that the levels of genomic alterations in DCIS are similar to those of invasive carcinoma, suggesting that, in most cases, CNAs occur before the acquisition of an invasive phenotype and may not contribute to the development of invasion [39]. Burkhardt et al. used FISH to assess both pure DCIS and DCIS with associated IDC and found no significant differences in CNAs affecting HER2, ESR1, CCND1, or MYC [42]. A meta-analysis of 26 studies did not identify significant differences in CNAs between DCIS and IDC, further suggesting that, if CNAs are a significant part of DCIS progression, they constitute an early event in the process [43]. Moreover, no particular CNA was necessary or sufficient for progression to IDC. This is not surprising, given that CNAs appear to arise in bursts of evolution that take place rather early in the development of breast cancer [44].

Our group sought to investigate the patterns of progression from DCIS to invasive breast cancer using single-cell sequencing analysis [5••]. Our study revealed that both the DCIS and the synchronous invasive carcinoma harbored truncal, clonal genetic alterations; subclonal events, however, were also found in both the DCIS and invasive components, and examples of both clonal selection and multiclonal invasion were observed. This again implies either that invasion and progression need not be driven by CNAs, as some previous studies suggested [42, 45, 46], or that the ability to invade was an early event followed by genomic instability and the generation of multiple genetically heterogeneous DCIS subclones [5••].

An analysis that combined immunohistochemistry-based intrinsic subtyping with array-based comparative genomic hybridization of 22 DCIS and 30 IDC [47] identified 9 breast cancer-related genes, including TP53 and GATA3, that contributed most strongly to separating DCIS into two progression-risk clusters. Cluster A (rapidly progressive) showed a greater number of gene and chromosome CNAs, a larger IDC/DCIS ratio, a higher frequency of non-luminal subtypes, and a higher nuclear grade than cluster B. These observations may help triage cases by progression risk to more appropriate treatment [47]. Taken together, these observations suggest that progression from DCIS to invasive breast carcinoma is not driven by highly recurrent genetic alterations in DCIS cells [4] and that some degree of stochasticity likely plays a role in the acquisition of molecular alterations prior to invasion.

Doebar et al. described distinct gene expression profiles in pure DCIS compared with DCIS with synchronous IDC [35••]. In addition, there was high genomic concordance between synchronous DCIS and IDC (52 of 92 mutations were present in both components); the remaining 40 mutations, however, were restricted to the invasive component. In a subset of patients, the proportion of tumor cells carrying these mutations was higher in the invasive component than in the DCIS component. The same authors had previously described genes that were highly expressed in DCIS with IDC but not in pure DCIS [48]. Validation in a large cohort would be required to confirm whether these differentially expressed genes (PLAU, COL1A1, SCGB1D2, S100A7, KRT81, NOTCH3, CXCL14, and EGFR) could be used to predict progression.
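The concordance figures above reduce to simple set arithmetic once the mutations of each component have been called. The Python sketch below is illustrative only: the mutation identifiers are hypothetical placeholders (not the variants reported in [35••]), and it treats a mutation as absent from a component whenever it was not called there, ignoring sequencing depth and the proportion of tumor cells carrying each mutation, both of which affect real comparisons.

```python
def mutation_concordance(dcis: set[str], idc: set[str]) -> dict[str, float]:
    """Split the mutations of a synchronous DCIS/IDC pair into shared and
    component-restricted groups (a simplified presence/absence view)."""
    union = dcis | idc
    shared = dcis & idc
    summary = {
        "total": len(union),
        "shared": len(shared),
        "dcis_only": len(dcis - idc),
        "idc_only": len(idc - dcis),
    }
    # Fraction of all mutations detected in the pair that occur in both components.
    summary["fraction_shared"] = len(shared) / len(union) if union else 0.0
    return summary


# Hypothetical pair mirroring the counts quoted above: 52 mutations in both
# components and 40 additional mutations called only in the invasive component.
dcis_calls = {f"mut_{i}" for i in range(52)}
idc_calls = {f"mut_{i}" for i in range(92)}
pair_summary = mutation_concordance(dcis_calls, idc_calls)
print(pair_summary)
```

With these inputs the shared fraction is 52/92, about 0.57. A variant-level comparison of cell fractions, as in the cited study, would require allele frequencies and purity estimates in addition to presence/absence calls.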

Clonality and Genomic Intratumor Heterogeneity

Subclonal genetic heterogeneity has been independently identified in both the DCIS and invasive components, and the prevalence of individual subclones has been shown to differ between synchronous DCIS and invasive lesions [4, 5••, 7••]. In a pairwise analysis of mutations in DCIS and synchronous IDC, our group previously observed that 3 of 13 cases harbored genomic differences that could potentially propagate an invasive phenotype, a finding supportive of the convergent phenotype hypothesis [3•]. Phylogenetic studies have suggested that the ancestral relationships between IDC and DCIS are complex and varied. When Newburger et al. assessed multiple lesions from six cases of IDC, they found that the genomes of some ancestral cells had a strong predisposition to generate cancerous progeny, as seen by their ability to produce independent subclones with an invasive phenotype. Comparison of the numbers of single-nucleotide variants and the degree of aneuploidy among the progeny indicated that these events accumulated gradually, rather than as part of an acute genomic assault resulting in invasion, suggesting that other factors are at play [49, 50].

Observational studies have shown that some genetic alterations are not consistently shared between synchronous DCIS and IDC. TP53 mutations, for example, are more prevalent in the invasive component than in the DCIS [51], whereas the opposite was found for GATA3 mutations [36]. These data suggest that TP53 mutations may be implicated in DCIS progression, whereas GATA3 mutations may be selected against during progression.

Single-cell sequencing approaches have contributed substantially to our understanding of the progression from in situ to invasive breast cancer [5••]. Our study by Martelotto et al. described and validated a method for whole-genome sequencing-based copy number analysis of single cells derived from formalin-fixed paraffin-embedded (FFPE) tissue. In two cases of synchronously diagnosed DCIS and invasive breast cancer, the two components were separately microdissected and nuclei were prepared from each. In the first case, hierarchical clustering of all sequenced nuclei revealed, among other clonal events, gains of chromosome 1q and losses of chromosomes 11q and 22, as well as focal amplifications on chromosome 17q encompassing the ERBB2 (17q12-q21.2), PPM1D, and BCAS3 (17q22-q23.2) loci [5••]. Subclonal events, such as losses of chromosomes 1p and 8p and alterations on chromosomes 5 and 8, were found in both the in situ and invasive components, and six distinct but highly related subpopulations were likewise identified in both components. These findings are consistent with the notion that the ability to invade was acquired early in the in situ disease and was followed by genomic instability and the development of multiple genetically heterogeneous DCIS subclones that progressed to invasive disease in parallel. In the second case, clonal alterations included losses of chromosomes 13q (RB1) and 17p (TP53) as well as focal amplifications on chromosomes 8p11.2-p12 (FGFR1) and 11q13.3-q13.4 (CCND1). Subclonal events included a gain spanning 20p–20q, loss of chromosome 9, and segmental losses at 3p21.31–p12.3 and 3q21.2–q24. These and other subclonal alterations defined several distinct clusters, each restricted to either the DCIS or the invasive component.
In this case, the phylogeny reconstructed from the identified subclones suggested that intra-tumor genetic heterogeneity arose early in disease development and that progression from DCIS to invasive carcinoma might have occurred through the selection of a minor DCIS subclone. Interestingly, the subclone selected through this putative evolutionary bottleneck harbored a homozygous deletion of PTEN [5••]. Subsequent FISH analyses confirmed the single-cell sequencing observations. These findings suggest that the mode of DCIS progression may differ from patient to patient.
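The distinction drawn above between clonal (truncal) events, present in every tumor nucleus, and subclonal events that are either shared by both components or restricted to one of them can be expressed as simple bookkeeping once CNAs have been called per nucleus. The Python sketch below assumes that calling step has already been done and that each nucleus is labelled with the component it was microdissected from; the event names and cells are hypothetical, and the sketch is not the hierarchical clustering and phylogenetic reconstruction used in [5••].

```python
from collections import Counter, defaultdict


def summarize_cna_events(cells):
    """Label each CNA event across single tumor nuclei.

    cells: list of (component, set_of_events) tuples, one per nucleus.
    Returns {event: (label, {component: fraction_of_nuclei_with_event})}.
    """
    nuclei_per_component = Counter(component for component, _ in cells)
    hits = defaultdict(Counter)  # event -> Counter of components carrying it
    for component, events in cells:
        for event in events:
            hits[event][component] += 1

    summary = {}
    for event, per_component in hits.items():
        fractions = {
            component: per_component[component] / n
            for component, n in nuclei_per_component.items()
        }
        if all(f == 1.0 for f in fractions.values()):
            label = "clonal"  # present in every nucleus of every component
        elif all(f > 0 for f in fractions.values()):
            label = "subclonal, shared"
        else:
            present = [c for c, f in fractions.items() if f > 0]
            label = "subclonal, restricted to " + "/".join(present)
        summary[event] = (label, fractions)
    return summary


# Hypothetical nuclei loosely modelled on the second case described above:
# clonal 13q and 17p losses and CCND1 amplification, a made-up DCIS-only event,
# and a PTEN deletion that is minor in the DCIS but present in all invasive nuclei.
truncal = {"13q_loss", "17p_loss", "CCND1_amp"}
example_cells = [
    ("DCIS", truncal),
    ("DCIS", truncal | {"hypothetical_dcis_event"}),
    ("DCIS", truncal | {"hypothetical_dcis_event"}),
    ("DCIS", truncal | {"PTEN_homdel"}),
    ("IDC", truncal | {"PTEN_homdel"}),
    ("IDC", truncal | {"PTEN_homdel"}),
]
for event, (label, fractions) in sorted(summarize_cna_events(example_cells).items()):
    print(event, label, fractions)
```

In this toy input the PTEN deletion is carried by one of four DCIS nuclei but by every invasive nucleus, the kind of prevalence shift an evolutionary bottleneck would produce; under multiclonal invasion, several distinct subclones would instead be expected to be shared between the two components.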

To further address these hypotheses, paired DCIS and invasive carcinoma samples were assessed [7••] using topographic single-cell sequencing (TSCS), a technique that spatially maps single-cell copy number profiles, combined with high-depth exome sequencing. In each of the 10 patients analyzed, the authors identified 1–5 clones with abnormal CNA profiles in both the DCIS and invasive components, and no additional CNA events were acquired during invasion. Interestingly, the prevalence of individual subclones varied between the in situ and invasive components, but the level of clonal diversity was not significantly different [7••]. The authors also found that, on average, 87.4% of non-synonymous somatic mutations, including those in known breast cancer genes such as TP53 and PIK3CA, were concordant between the in situ and invasive components, suggesting that they were acquired within the ducts before invasion. On the basis of these findings, they proposed a model of multiclonal progression in which genetic heterogeneity develops within the duct system prior to invasion. The TSCS study thus provided evidence that genomic heterogeneity is maintained during progression from DCIS to invasive carcinoma [7••]. Epigenetic or other non-genomic biologic factors are therefore likely to be important for invasive progression [52] and should be addressed together with genomics in broad studies encompassing normal breast tissue, pure DCIS, and DCIS with synchronous invasive carcinoma.
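The TSCS observation that subclone prevalence can shift between components while overall clonal diversity stays similar is easier to see with a concrete diversity measure. The sketch below uses the Shannon index, H = −Σ p_i ln p_i over subclone proportions p_i, as one common choice; it is assumed here for illustration and is not claimed to be the exact statistic used in [7••]. The subclone counts are hypothetical.

```python
import math


def subclone_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-subclone cell counts into proportions of the sampled cells."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no cells assigned to any subclone")
    return {clone: n / total for clone, n in counts.items()}


def shannon_diversity(counts: dict[str, int]) -> float:
    """Shannon index H = -sum(p * ln p) over subclones with at least one cell."""
    return -sum(p * math.log(p) for p in subclone_proportions(counts).values() if p > 0)


# Hypothetical cell counts per CNA subclone in the two components of one tumor:
# every subclone is present in both, but their prevalence is reshuffled.
dcis_counts = {"clone_1": 30, "clone_2": 50, "clone_3": 20}
idc_counts = {"clone_1": 50, "clone_2": 20, "clone_3": 30}

print(subclone_proportions(dcis_counts), subclone_proportions(idc_counts))
print(shannon_diversity(dcis_counts), shannon_diversity(idc_counts))
```

Both components yield the same H (about 1.03) even though the dominant subclone differs, mirroring the finding that individual subclone prevalence varied while clonal diversity did not differ significantly. Comparing diversity across patients would still require an appropriate statistical test.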

Extrinsic Factors in DCIS Progression

All components of the tumor microenvironment, from the extracellular matrix (ECM), fibroblasts, myoepithelial cells (MECs), lymphatics, and vascular endothelium to inflammatory cells, play a role in progression, and gene expression profiling has revealed substantial changes in these components during the transition from DCIS to IDC [52]. Invasive lesions have been reported to express several matrix metalloproteinases (MMP2, MMP11, and MMP14), which are known effectors of progression in several cancer types [53]. Epigenetic changes in the tumor microenvironment also play a role in this process. Phenotypically aberrant stromal cells and MECs surrounding the tumor lose their normal function and may create a more permissive environment for the transition from in situ to invasive carcinoma [54].

DNA methylation is the most studied epigenetic alteration in DCIS. Park et al. evaluated changes in promoter CpG island hypermethylation during progression from pre-invasive lesions to invasive breast cancer and found six new methylation markers of breast cancer, namely DLEC1, GRIN2B, HOXA1, MT1G, SFRP4, and TMEFF2, in addition to APC, GSTP1, HOXA10, IGF2, RARB, RASSF1A, RUNX3, SCGB3A1 (HIN-1), and SFRP1 [55]. Promoter CpG island methylation changed significantly in pre-invasive lesions and was similar in DCIS and IDC, suggesting that CpG island methylation of tumor-related genes is an early event in breast cancer development [55].

A comparison of the DNA methylomes of 569 breast tissue samples, including 50 from cancer-free women and 84 from matched normal-cancer pairs, revealed dramatic differences between normal tissue from cancer-free women and normal tissue adjacent to cancer, with widespread DNA methylation field effects [56]. The methylation changes in adjacent normal tissue were strongly enriched at regulatory elements bound by CTCF and RAD21, proteins critical to chromatin looping, suggesting that chromatin remodeling in adjacent normal tissue may be critical to DCIS progression. The authors proposed a model in which clonal epigenetic reprogramming towards reduced differentiation in normal tissue is an essential step in breast carcinogenesis [56]. Although epigenetic alterations have been described during tumor progression, much remains to be clarified about the mechanisms and the pathophysiological role of methylation in the invasion process.

The Role of Myoepithelial Cells

In addition to the basement membrane (BM), normal breast ducts have a MEC layer between the epithelium and the BM. When DCIS cells acquire an invasive phenotype, they infiltrate through both the MEC and BM layers. Normal MECs and an intact MEC layer are seen in benign lesions, and MECs have been reported to exert natural tumor-suppressor functions, such as maintenance of the BM and of epithelial polarity, and expression of tumor-suppressor proteins (p63, p73, WT-1, maspin, and laminin 1). During progression from in situ to invasive disease, the interactions between intraductal malignant cells and MECs, which would normally exert tumor-suppressive effects, eventually allow progression to invasive disease [57,58,59]. There is evidence that MECs lining ducts colonized by DCIS are phenotypically aberrant, lacking some of their differentiation markers and displaying up-regulation of genes related to angiogenesis and invasion [60]. Accordingly, normal MECs were shown to suppress tumor growth and invasion in the absence of detectable genomic alterations in the tumor epithelial cells. The elimination of mediators of myoepithelial differentiation-related pathways led to MEC loss and progression to invasive disease [59]. Interestingly, the sensitivity of some MEC markers on immunohistochemistry is lower in DCIS-associated MECs than in normal MECs, presumably reflecting genetic alterations in these cells [61]. Rakha et al. described gene expression and epigenetic differences between MECs isolated from DCIS and those isolated from normal breast tissue [62]. Hu et al. demonstrated the role of MECs and fibroblasts in the progression of DCIS to IDC using a mouse model of DCIS, highlighting the importance of p63, sonic hedgehog signaling, TGFβ, and cell adhesion in myoepithelial cell differentiation, without which DCIS had an enhanced propensity to progress [59].
Other studies have implicated cell type-specific markers in DCIS, namely CXCL14 in myoepithelial cells and CXCL12 in fibroblasts [63]. Further exploration of such differences is clearly of clinical importance, given their potential as biomarkers of invasive progression.

The Role of Immune Cells

The immune system plays a role in tumor progression, probably by sculpting the immunogenic phenotype of tumors as they develop. The recognition that immunity plays a dual role in the complex interactions between tumors and the host prompted a refinement of the concept of cancer immunosurveillance, now regarded as a critical step in tumor evolution [64, 65].

Although the tumor-infiltrating lymphocyte (TIL) score was not found to be associated with recurrence in breast cancer, there was a trend for recurrent lesions to have fewer TILs than the primary tumors. This loss of TILs in recurrent lesions may reflect suppression of anti-tumor adaptive immune responses during recurrence [52].

Studies analyzing leukocytes and the prognostic value of TILs in breast cancer have focused on invasive tumors rather than DCIS, and DCIS and other premalignant lesions have been relatively neglected in evaluations of the role of lymphocytes in progression to invasive disease. In genomic characterizations of DCIS, gene expression signatures suggesting the presence of activated T cells were identified in a subset of tumors [66]. In addition, regulatory T cells (Tregs) appeared to increase during tumor progression, suggesting that they could be a marker of progression risk [67]. These findings are not supported by other studies, which showed that subsets of CD8+ T cells (GZMB+CD8+ and Ki67+CD8+) were very common within DCIS and decreased in locally invasive disease, implying that CD8+ cells could be better biomarkers than Tregs for predicting invasive progression [65].

It has been reported that, in triple-negative breast cancers (TNBCs), the invasive components have a higher fraction of T cells than DCIS of triple-negative phenotype, implying that more T cells intermingle with cancer cells in TNBCs but not in HER2-positive breast cancers [65]. The same authors found the opposite pattern in HER2-positive tumors, and they also observed higher numbers of T cells in HER2-positive than in HER2-negative DCIS, and in high-grade than in low-grade DCIS. These findings suggest that HER2-positive DCIS and high-grade DCIS may be associated with a more immunogenic environment [52, 65].

The expression of proteins relevant to the use of immunotherapy in breast cancer patients has also been investigated in DCIS. PD-L1 expression can be detected in both immune cells, in particular CD68-positive cells, and tumor cells, and has been found to display varying expression patterns in breast cancers. In studies of immune escape during DCIS progression, single-cell sequencing analysis of individual T cell subtypes in pure DCIS and in synchronous DCIS-IDC, combined with functional studies, may help clarify the role of these cells in DCIS progression.

Conclusions

Our understanding of the molecular processes involved in DCIS progression to invasive breast cancer has been fundamentally transformed in the last decade; however, our ability to predict the behavior of individual cases of DCIS remains limited. Treatment, comprising surgery with or without sentinel lymph node biopsy followed by radiotherapy and endocrine therapy, is still based on clinicopathological and immunohistochemical parameters.

The progression of DCIS to invasive disease is a complex biological phenomenon that involves considerable genomic and epigenetic alterations of the tumor and its microenvironment (Fig. 2), and the mechanisms of progression may vary from patient to patient (e.g., clonal selection vs multiclonal invasion). Despite these challenges, the implementation of cost-effective single-cell DNA, RNA, and ATAC sequencing approaches, coupled with the formation of international consortia of investigators tackling the biology of DCIS and the validation of biomarkers to predict its clinical behavior, will likely result in approaches to the management of DCIS that are biology-driven rather than empirical.

Fig. 2

Hypothetical models of invasion of ductal carcinoma in situ (DCIS). a Invasive ductal carcinoma can originate de novo or from non-obligate precursors, following an independent evolutionary pattern. b The evolutionary bottleneck model postulates that, in a heterogeneous lesion with multiple subclones harboring private mutations, a single subclone is selected and, with the support of the microenvironment, escapes the basement membrane to constitute the invasive carcinoma. c Multiple subclones, with the support of the microenvironment, can escape the basement membrane and co-migrate into the adjacent tissue, constituting the invasive carcinoma. DCIS, ductal carcinoma in situ; IDC, invasive ductal carcinoma; Treg, regulatory T cells