Introduction

The multistep model of breast cancer progression is based on the morphological evolution from atypical ductal hyperplasia (ADH) to ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC) [1–3]. Gene expression profiling studies have failed to demonstrate significant differences between these stages of progression and have instead shown that multiple samples from an individual patient cluster closer to one another than with other samples of their respective stage of progression (ADH, DCIS and IDC) [4–7]. These and other studies (combining genomic, gene expression and immunohistochemical data) suggest that the molecular phenotype is already established at the ADH/DCIS stage and does not change considerably as DCIS progresses to invasive cancer [2, 8–10]. Nevertheless, some subtle alterations have been identified and relate to processes such as epithelial–mesenchymal transition, extracellular matrix (ECM) remodelling, and proliferation [5, 6, 11].

It is becoming increasingly apparent that the tumour microenvironment plays a key role in defining tumour behaviour and patient outcome [12]. Gene expression changes occur in cancer-associated stroma and are known to be implicated in prognosis as well as in cancer progression [5, 13–16]. Specifically, Ma et al. [5] provided strong evidence from gene expression profiling that the stroma co-evolves with the epithelial compartments during progression.

Gene expression profiling of breast cancer progression has so far only been studied using fresh frozen (FF) material [5] owing to the highest quality of RNA being available from this sample type. However, formalin-fixed paraffin-embedded tissue (FFPE) archives also provide a valuable resource for clinical research with wide availability of samples, particularly for different stages of progression. Gene expression profiling of FFPE clinical samples is also now feasible due to recent technological advances [17, 18]. Here we present the application of whole-genome cDNA-mediated annealing, selection, extension and ligation assay (WG-DASL, Illumina) [17, 18] to the study of the transition of DCIS to IDC in the context of both epithelial and stromal compartments.

Materials and methods

Tissue samples

FF and archival FFPE materials were accessed locally with approval from the Human Research Ethics Committee of the Royal Brisbane and Women’s Hospital (RBWH), Uniting HealthCare (The Wesley Hospital) and The University of Queensland. FFPE blocks from 17 patients specifically diagnosed with concurrent IDC and DCIS between 2007 and 2009 were selected for gene expression profiling (Table 1). All cases were pure invasive ductal carcinoma no special type (IDC NST) (n = 14) or mixed ductal lobular (n = 3; ductal component exclusively studied), of histological grade 2 or 3 [20, 21]. Only IDCs of a solid or glandular pattern with minimal intervening stroma were used. Three types of breast interlobular stroma were studied (Fig. 1): IDC-S (stroma within 3 mm of IDC), DCIS-S (stroma within 3 mm of DCIS) and BC-NS (stroma > 10 mm from IDC and/or DCIS). The BC-NS was obtained from either the same paraffin block as the IDC/DCIS lesion (n = 5) or an alternative block (n = 11). Additional cohorts of samples used, including reduction mammoplasties (RM, free from inflammation and fibrocystic change), are detailed in Online Resource Table S1.

Table 1 Clinical and pathological features of the cohort and WG-DASL outcomes
Fig. 1

Epithelial and stromal components microdissected and analysed in this study. a Schematic detailing the two epithelial compartments (DCIS, IDC) and the three different cancer-associated stromal compartments selected (hatched). The respective distances from the epithelial lesions (DCIS or IDC) are given. b–d Haematoxylin and eosin stain of representative cases delineating regions microdissected: b focus of DCIS and associated stroma (within 3 mm; DCIS-S); c morphologically normal stroma dissected at a distance of >10 mm (BC-NS) from the lesion, in this instance, IDC; d focus of IDC and associated stroma (within 3 mm; IDC-S)

RNA extraction and quantitative real time-PCR (qRT-PCR)

Manual microdissection of epithelial (IDC and DCIS) and stromal compartments was performed by a pathologist using a stereomicroscope and a sterile needle. Up to 15 tissue sections for DCIS and IDC and 40–50 sections for the corresponding stromal samples were required. RNA was isolated using the High Pure RNA Paraffin kit (Roche Diagnostics Australia Pty Ltd., Castle Hill, NSW, Australia). cDNA synthesis was performed using 250 ng of total RNA and SuperScript™ III (Invitrogen Australia Pty Ltd., Mulgrave, VIC, Australia). TaqMan Gene Expression Assays (Applied Biosystems, Mulgrave, VIC, Australia; COL11A1, Hs00266273_m1; COL17A1, Hs00990073_m1; SFRP1, Hs00610060_m1; SOX10, Hs00366918_m1; MMP13, Hs00233992_m1; ABCB1, Hs01067802_m1; MRAP2, Hs00536621_m1; and USP19, Hs01103458_m1) were selected for qRT-PCR. Relative quantification was performed using the comparative ΔΔCt method [22], with expression normalised to the endogenous control (RPL13A, Hs03043885_g1). The reference sample was pooled from three RMs and consisted of equal proportions of epithelial and stromal material. The Mann–Whitney U test was used to determine significance (P < 0.05).
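The comparative ΔΔCt calculation can be sketched as follows. This is an illustrative example only, not the analysis code used in the study: the function name and the Ct values are hypothetical, and the formula assumes approximately 100 % amplification efficiency for both the target and the endogenous control (RPL13A here).

```python
def relative_quantity(ct_target_sample, ct_control_sample,
                      ct_target_reference, ct_control_reference):
    """Return RQ = 2^(-ddCt) for one target gene in one sample.

    Hypothetical helper: assumes ~100 % PCR efficiency for both assays.
    """
    # dCt: target Ct normalised to the endogenous control within each sample
    d_ct_sample = ct_target_sample - ct_control_sample
    d_ct_reference = ct_target_reference - ct_control_reference
    # ddCt: test sample relative to the pooled reference sample
    dd_ct = d_ct_sample - d_ct_reference
    return 2.0 ** (-dd_ct)


# Made-up Ct values: the target appears two cycles later (after normalisation)
# in the test sample than in the pooled reference, i.e. fourfold lower.
rq = relative_quantity(26.0, 20.0, 24.0, 20.0)
print(rq)
```

An RQ below 1 indicates lower expression than in the pooled reference, and an RQ above 1 indicates higher expression.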

Gene expression profiling and data analysis

Whole-genome DASL (Illumina, Scoresby, VIC, Australia) was performed as previously described [17]. Briefly, 200 ng of RNA from samples meeting quality criteria (sufficient RNA and a Ct value of < 29, as determined by qRT-PCR for RPL13A) was processed using the MCS2 reagent and hybridised to the Human Ref8_V3_BeadChips. RNA from all components of case 5 failed quality control, so the whole case was omitted from WG-DASL (see Table 1). Data were collated using GenomeStudio (Illumina). The lumi [23] and limma [24] Bioconductor software packages, run in R (version 2.13.0), were used to perform quantile normalisation and to determine differential expression, respectively. Genes with a Benjamini and Hochberg adjusted P value < 0.05 and a positive B-statistic (a greater than 50 % chance of being differentially expressed) were considered differentially expressed. Genespring GX 10.0.2 (Agilent Technologies, Mulgrave, VIC, Australia) was used for data visualisation and gene ontology (GO) analysis using P < 0.05. Normalised data can be accessed from GEO (GSE35019).
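The selection rule described above can be illustrated with a short sketch. The study itself used limma in R; the Python below is only a stand-in that reproduces the arithmetic of the Benjamini and Hochberg adjustment and of the B-statistic threshold. The function names and the input values are hypothetical. Because the B-statistic is the log-odds of differential expression, B > 0 corresponds to a posterior probability above 50 %.

```python
import math


def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def posterior_probability(b_statistic):
    """Convert a log-odds B-statistic into a probability."""
    return 1.0 / (1.0 + math.exp(-b_statistic))


def select_probes(p_values, b_statistics, alpha=0.05):
    """Indices of probes with adjusted P < alpha and a positive B-statistic."""
    adjusted = bh_adjust(p_values)
    return [i for i, (q, b) in enumerate(zip(adjusted, b_statistics))
            if q < alpha and b > 0]


# Made-up values for four probes
p = [0.01, 0.04, 0.03, 0.20]
b = [1.2, 0.5, -0.3, 2.0]
print(select_probes(p, b))
```

In this toy input only the first probe satisfies both criteria; the others fail on the adjusted P value even where the B-statistic is positive.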

Immunohistochemistry (IHC) and immunofluorescence (IF) and analysis

IHC for SFRP1 (Abnova, Taipei City, Taiwan) and SOX10 (Santa Cruz Biotechnology Inc., Santa Cruz, CA, USA) was performed on FF and FFPE sections, respectively. The MACH 1 Universal HRP-Polymer kit (Biocare Medical, Concord, CA, USA) was used for detection. Further details are described in Online Resources. Percentage and intensity (weak = 1+, moderate = 2+, strong = 3+) of positive cytoplasmic and/or nuclear staining in normal and tumour epithelial cells were scored. Dual IF for COL11A1 and CK8/18 was performed, scored and analysed as described (Online Resource Methods and Figures S1 and S2).

Results

WG-DASL gene expression profiling

Of the 87 samples analysed by WG-DASL (Table 1), 69 passed the average signal intensity (>250) and P95 (>800) criteria described in [17]. Five IDC samples were run in duplicate and showed r² values between 0.89 and 0.97 (Online Resource Figure S3); 22,723 probes showed a reliable detection score (P < 0.01) in at least one sample. Unsupervised hierarchical clustering of probes that differed twofold from the mean (10,870 probes) showed that, apart from seven samples, there was a clear separation between stromal and epithelial samples (Fig. 2a). In total, 64 probes representing 58 genes were differentially expressed between IDC and DCIS (Table 2), of which 42 and 16 genes were up- and down-regulated in DCIS compared to IDC, respectively. Clustering of these 58 genes in the epithelial samples (DCIS and IDC) showed near complete segregation of the IDC and DCIS sample types (Fig. 2b; Online Resource Figure S4 details a K-means clustering analysis of these data).
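As a rough illustration of the sample-level quality filter, the sketch below applies the two thresholds quoted above to a list of probe intensities. It is a hypothetical example: the exact definitions of the average signal and P95 metrics follow [17] and GenomeStudio and may differ from the nearest-rank percentile assumed here, and the intensities are invented.

```python
def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct (assumed definition)."""
    ordered = sorted(values)
    # Ceiling of pct * n / 100 using integer arithmetic
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def passes_qc(intensities, min_mean=250.0, min_p95=800.0):
    """True if a sample exceeds both the mean-signal and P95 thresholds."""
    mean_signal = sum(intensities) / len(intensities)
    p95 = percentile_nearest_rank(intensities, 95)
    return mean_signal > min_mean and p95 > min_p95


# Invented probe intensities for two samples
good_sample = [100.0] * 18 + [2000.0, 3000.0]
poor_sample = [100.0] * 20
print(passes_qc(good_sample), passes_qc(poor_sample))
```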

Fig. 2

WG-DASL expression analysis of the epithelial and stromal compartments of DCIS and IDC patient samples. a Hierarchical clustering of 69 successful samples using the 10,870 probes that demonstrated a twofold change (up or down) across the sample set. Individual genes are arranged in rows and samples in columns. IDC and DCIS epithelial samples (black) tended to cluster separately from the stromal samples (white). b Supervised clustering analysis of the epithelial sample cohort using the 64 probes (58 genes) that were differentially expressed between DCIS and IDC (IDC grade 3, purple; IDC grade 2, pale blue; DCIS high-grade, yellow; DCIS intermediate (Int) grade, pink). c Unsupervised clustering of the stromal samples using all probes that differed more than twofold from the mean across the dataset (IDC-S, red; DCIS-S, blue; BC-NS, maroon; RM-NS, grey)

Table 2 Differentially expressed genes identified during progression

Unsupervised clustering (Fig. 2c) separated normal stroma from healthy patients (RM-NS) and breast cancer patients (BC-NS) into the two major arms of the dendrogram. No obvious stratification was observed between the three types of stroma from breast cancer patients (BC-NS, IDC-S, DCIS-S) and no significantly differentially expressed genes were identified in any of these comparisons (i.e. IDC-S vs BC-NS; IDC-S vs DCIS-S; DCIS-S vs BC-NS). However, pairwise comparison of the cancer stromal types with the RM-NS samples showed that the same genes were consistently expressed at lower levels in the cancer stroma. Indeed, ten genes were differentially expressed between IDC-S and RM-NS and five of these were also differentially expressed between DCIS-S and RM-NS (Table 2). Furthermore, two of these genes, USP19 and MRAP2, were significantly differentially expressed between the two most ‘normal’ stromal samples, RM-NS and BC-NS, while ABCB1 was down-regulated in all cancer-associated stromal samples compared to RM-NS.

Verification of microarray data by qRT-PCR

To validate the WG-DASL findings, qRT-PCR relative quantitation was performed for a selection of biologically interesting transcripts (COL17A1, COL11A1, SFRP1, SOX10, MMP13, ABCB1, USP19, MRAP2). MMP13, also known as collagenase-3, has been widely studied in breast cancer, where it has been shown to promote the DCIS to IDC transition [25, 26]. COL11A1 encodes a minor collagen found in many tissues; its expression has been shown to be dysregulated in cancers such as breast and colon [27–30]. COL17A1 encodes a transmembrane protein that is expressed in normal breast and involved in cell adhesion. SOX10, a transcription factor of the SRY-related high mobility group (HMG) box family, is known to be expressed in normal myoepithelial cells [31]. The SFRP1 gene, located at 8p11-21, is frequently altered or down-regulated in sporadic breast cancer and is implicated in cancer progression through its involvement with Wnt signalling [32]. The melanocortin-2 receptor accessory protein 2 (MRAP2) gene encodes an accessory protein for G protein-coupled melanocortin receptors, which regulates adrenocorticotropic hormone signalling [33], while USP19 is a de-ubiquitinating enzyme induced in skeletal muscle atrophy. ABCB1, also known as multidrug resistance gene 1 (MDR1), encodes a glycoprotein (PgP) that acts as an efflux pump, protecting cells from xenobiotics [34], and has been implicated in breast cancer chemoresistance [35].

Three of five genes that were differentially expressed in DCIS versus IDC WG-DASL comparisons (SOX10, COL11A1 and MMP13) were confirmed as being differentially expressed by qRT-PCR (P < 0.05, Fig. 3a, b). There was considerable variation in expression levels between samples, as expected for human tumour samples. While WG-DASL and qRT-PCR data for COL17A1 expression were inconsistent, SFRP1 showed a consistent trend, and significance was achieved when the two highest expressing IDC samples were removed (P ≤ 0.01). These two samples are the only ‘triple-negative’ tumours in the cohort (Table 1) and this phenotype is known to be associated with overexpression of SFRP1 (Online Resource Figure S5, [36]). COL11A1 was significantly up-regulated in IDC compared to both DCIS (P < 0.0001) and its surrounding stroma (IDC-S; P = 0.012, Fig. 3b). IDC-S exhibited variable expression levels of COL11A1, but overall showed significantly higher expression than DCIS-S (P = 0.007). DCIS-S and BC-NS stromal samples, and those from reduction mammoplasties, showed consistently low levels of COL11A1 expression relative to IDC and IDC-S, with no significant difference between the RM-NS and BC-NS samples (Fig. 3b).

Fig. 3

Validation of gene expression changes in the epithelial and stromal components of DCIS and IDC lesions. Selected transcripts were validated using qRT-PCR; the data are normalised to RPL13A, relative to a pooled normal reference and presented as an RQ (relative quantitation) value. Statistical significance was calculated using the Mann–Whitney U test (GraphPad Prism version 5) and is indicated only where significant (*P ≤ 0.03, **P ≤ 0.01). a Expression changes in the epithelial compartment of IDC and DCIS samples are presented for MMP13, SFRP1, SOX10 and COL17A1. b COL11A1 expression data for all sample types are presented. c Expression changes in ABCB1, MRAP2 and USP19 in the four stromal sample types (RM-NS, IDC-S, DCIS-S and BC-NS)

For the genes exclusively targeted in the stromal compartment, variable expression was detected. Two of four transcripts validated, again reflecting variability in human clinical samples; DCIS-S sample material was also limiting, which prevented strong correlations from being made (Fig. 3c). ABCB1 stromal transcripts showed concordance with the WG-DASL data, being significantly down-regulated in cancer stroma samples relative to the RM-NS (IDC-S, P < 0.001 and BC-NS, P < 0.05). MRAP2 qRT-PCR data did not replicate the WG-DASL data; however, the cancer stroma showed a trend towards bimodality of expression, with divergent groups of high- and low-expressing samples. Exclusion of these ‘high-expressors’ in BC-NS and DCIS-S resulted in significantly lower levels of expression relative to the RM-NS, as observed by WG-DASL. USP19 expression was significantly different between the RM-NS and BC-NS samples (P = 0.008) yet contradicted the WG-DASL data; however, the presence of a single high-expressing BC-NS sample likely skewed the statistics towards significance.

Verification of microarray data by IHC and IF

To validate expression changes at the protein level, we performed IHC for SFRP1 and SOX10, and IF for COL11A1 in independent cohorts of cases (Online Resource Table S2). SFRP1, evaluated in frozen tissue sections (Fig. 4a–c), was expressed in the cytoplasm of luminal cells of all normal TDLUs assessed (median percentage of cells stained 100 %; range 20–100 %). A significant decrease in expression was observed in DCIS (median 30 %; range 10–40 %; P = 0.004) and in IDC (median 20 %; range 0–60 %; P = 0.0001) relative to the normal. There was no significant difference between DCIS and IDC. SOX10 was strongly expressed in normal myoepithelial cells of TDLUs and in those surrounding DCIS; however, expression was not detected in normal luminal epithelial cells, nor in 104/105 DCIS and invasive cancers (Fig. 4).

Fig. 4

Immunohistochemical analysis of SFRP1 and SOX10 expression in NE, DCIS and IDC. a–c SFRP1 protein expression is observed in the normal luminal epithelial cells (arrowhead) as well as in stromal cells surrounding the TDLU; SFRP1 expression is reduced in DCIS (b) and in IDC (arrow in c) relative to NE (arrowhead in c). d–f SOX10 is a specific marker of myoepithelial cells of normal terminal ductal lobular units (arrowhead in d, f), and of DCIS (arrowhead in e), whereas luminal epithelial cells of normal TDLUs (d, f) and neoplastic epithelium of DCIS (e) and IDC (arrow in f) lack SOX10 expression

We performed IF staining on frozen sections to assess COL11A1 expression in the different breast compartments; epithelial localisation of COL11A1 was confirmed by co-staining with CK8/18 (Online Resource Figure S2). High levels were observed in the normal epithelium (NE) from both breast cancer (BC-NE) and healthy patients (RM-NE; mean signal intensity, point analysis: 74.7 and 63.9, respectively). COL11A1 expression was higher in IDC compared to DCIS (mean signal intensity, point analysis: 73.9 versus 51.5, respectively). Although this observation agreed with the mRNA data, it was not significant, likely owing to limited sample size (Fig. 5). While COL11A1 was also expressed in both normal and neoplastic stroma, this expression was significantly lower than that of the epithelial compartments (BC-NE, DCIS and IDC, P ≤ 0.001). Fibroblasts were also shown to express COL11A1 (Online Resource Figure S6).

Fig. 5

Dual immunofluorescence for COL11A1 and CK8/18 in breast cancer progression. Expression of COL11A1 was assessed in epithelial and stromal compartments of frozen sections from breast cancer patients containing normal TDLUs, DCIS and IDC and reduction mammoplasties containing normal TDLUs. Co-staining was performed with the epithelial marker CK8/18 (red) and COL11A1 (green) to highlight the presence of co-expression (see merged image, right-hand panel). a–c Breast cancer case with normal TDLU (a), DCIS (b) and IDC (c) showing reduced expression of COL11A1 in DCIS relative to IDC. d, e Breast cancer case with DCIS (d) and IDC (e) where COL11A1 was expressed to similar levels between the epithelial compartments. f Reduction mammoplasty case showing high COL11A1 expression in epithelium of normal TDLU. The left-hand panel shows the DAPI counterstain

Gene ontology analysis and meta-analysis

The list of genes differentially expressed between DCIS and IDC (n = 58) was subjected to GO analysis in an effort to identify the cellular pathways and processes that might be involved in the transition to invasive cancer (Online Resource Table S3). An interesting feature of this list was that 11/58 genes were associated with the ECM. To investigate whether stromal contamination was driving this enrichment for ECM terms, we analysed a published list of genes that were also differentially expressed between concurrent DCIS and IDC [6]. That study used FF material and laser capture microdissection (LCM) to obtain pure populations of neoplastic epithelial cells with minimal or no stromal contamination [6]. We attributed GO terms to the ‘Schuetz’ gene list [6] and also found an enrichment for functional terms related to ECM (Online Resource Table S3) in this exclusively epithelial analysis. Specifically, eight probes were shared between the Schuetz list and our own, accounting for three genes: COL11A1, COL5A2 and MMP13. Comparing our differentially expressed gene list with another recent analysis of the DCIS to IDC transition using an LCM-based approach [11], we found 11 genes to be shared. Intriguingly, these 11 genes included six different collagens and three high molecular weight keratins (COL10A1, COL11A1, COL12A1, COL17A1, COL5A2, COL8A1, GPC6, KRT14, KRT17, KRT5, MYH11). We did not find any overlap between our stromal list and that of Knudsen et al. [11], nor with Hannemann et al. [37], who reported a 35-gene signature that can distinguish DCIS from IDC.

Discussion

We have applied gene expression profiling technology (WG-DASL, using MCS2 version reagents) to FFPE tissues to analyse the breast epithelial and stromal compartments in the context of tumour progression. Technically, the WG-DASL was successful, with five replicate pairs correlating well, ~80 % of samples passing internal control criteria, and epithelial and stromal samples stratifying as expected following unsupervised hierarchical clustering. Previous WG-DASL studies have focused on the technical feasibility of the assay, for example through FF versus FFPE comparisons [18, 38, 39] and the validation of candidate genes (e.g. ER [40], Her2 [41]) related to subtyping of breast cancers. Nevertheless, gene expression profiling of archival FFPE samples remains extremely challenging, particularly for discovery approaches to understand important biological and/or clinical scenarios, as has been attempted here with respect to tumour progression. We were able to validate 5/9 differentially expressed genes using qRT-PCR and immunological techniques, although we observed considerable variation in the expression levels of some transcripts across the clinical samples. It is difficult to conclude confidently whether this validation rate is appropriate, given the relatively limited number of published WG-DASL studies and that the starting material is FFPE, and therefore of highly variable quality. We would advocate validation of a larger panel of transcripts; however, in this instance, clinical material was limiting.

The putative tumour suppressor secreted frizzled-related protein 1 (SFRP1) [42] was identified by WG-DASL as being down-regulated during progression from DCIS to IDC, consistent with other reports [5, 43, 44]. However, this pattern of expression was validated by qRT-PCR only when the two high-expressing IDC samples were removed from the analysis. These ‘outliers’ were the only triple-negative tumours in the cohort studied. Together with our meta-analysis of publicly available microarray data (Online Resource Figure S5), this finding supports reports that loss of SFRP1 expression is associated with hormone receptor positivity [44] and, conversely, that SFRP1 expression is a key phenotypic marker in some basal-like tumours [36]. Interestingly, the gene resides on chromosome 8p11-12 within the complex and variable amplification that is identified in ~10–15 % of breast cancers [45, 46]. Despite being part of this amplicon, SFRP1 gene expression is actually down-regulated due to gene methylation, leading to cancer progression through activation of Wnt signalling pathways [32]. The role of SFRP1 as a tumour suppressor in progression may therefore be restricted to certain tumour subtypes (luminal and HER2 related) and possibly a proportion of basal-like/triple-negative tumours.

Contrary to previous reports [4, 5, 7], we did not observe an enrichment of cell cycle-related genes differing between DCIS and IDC compartments. This observation, however, may be a reflection of study design; the earlier studies used a mixture of low- and high-grade tumours, whereas the current study used a more homogeneous cohort of mostly grade 3 tumours. It is perhaps as a consequence of this that we found just 58 differentially expressed genes between these epithelial components, highlighting the overall similarity between matched cases of DCIS and IDC. This is exemplified by the fact that a number of the genes ‘down-regulated’ in IDC compared to DCIS in this study are specific myoepithelial markers (e.g. SOX10, KRT14, KRT5, KRT17; Table 2) and as such were derived from only a small population of DCIS-associated myoepithelial cells (Fig. 4). The high molecular weight cytokeratins are well-established myoepithelial markers, whereas SOX10 was only recently described as a specific marker of normal myoepithelial cells [31]. The presence of SOX10-positive DCIS-associated myoepithelial cells, which were included in the microdissection of the DCIS samples, would account for the apparent SOX10 overexpression seen in DCIS relative to IDC by WG-DASL and qRT-PCR analyses.

Despite the lack of overlap in gene lists between the current study and previous reports [4, 5, 37], there remain some common gene families and biological processes that feature in the transition of DCIS to IDC. For instance, in the current study a number of the genes differentially expressed between DCIS and IDC epithelia are related to the ECM (e.g. COL17A1, COL5A2, COL22A1, COL8A1, COL12A1, COL10A1, COL11A1, MMP13, GPC6, KLK5, FREM1). Comparisons with reported studies [6, 11, 30, 47], including those that exclusively enriched for epithelial cells using LCM, supported our finding that the epithelial compartment is producing ECM-related components. Specifically, we showed by gene expression profiling, qRT-PCR and IF that COL11A1 is produced by the IDC epithelial cells at higher levels than in DCIS and the immediately adjacent stroma (IDC-S < 3 mm), suggesting that COL11A1, among other ECM proteins, might play a role in local invasion of breast cancer cells. In support of this, altered expression of COL11A1 has frequently been associated with tumour development and/or progression by others. Significant differential expression of this gene was recently reported to be the top hit in an array-based comparison of DCIS and IDC stroma and epithelium [11]. Turashvili et al. [47] found COL11A1 to be differentially up-regulated in invasive ductal and lobular carcinomas compared to NE (>sixfold change). Additionally, COL11A1 epithelial expression has been demonstrated at the protein level, not only in breast tissue [29] but also in colorectal epithelium [27]. The co-expression and concomitant up-regulation of COL11A1 with COL5A2 have previously been shown in colorectal cancer compared to its precursor lesion [28, 29], which is in agreement with our data showing up-regulation of both these genes in IDC relative to DCIS. Importantly, COL11A1 was found to be the top ranked gene in a late-stage ovarian, colorectal and breast cancer meta-analysis and is therefore a defining gene for a metastatic signature [48].

Gene expression changes in the cancer-associated stroma have proven important in predicting clinical outcome in breast cancer patients [13]. Other data indicate that the dynamic interaction between the epithelium and stroma plays an important role in governing the behaviour of a tumour [12]. Overall, we observed relatively few gene expression changes in the stromal samples along progression from normal through to IDC. This is consistent with recent reports where fewer gene expression changes were identified in the stromal compartments relative to the epithelial compartments during progression [11]. We exclusively used the interlobular stroma (surrounding the tumour, as opposed to intervening stroma) in order to avoid epithelial contamination within our stromal samples, and this may explain the lack of correlation between our tumour–stroma gene lists and the two previously reported [5, 11]. This may also indicate that the majority of the stromal change occurs immediately adjacent to neoplastic epithelial cells, where direct tumour–stroma interaction is most prominent. It must also be noted that these two published stromal signatures [5, 11] do not significantly overlap with each other, despite both being generated from FF samples. We found a set of five genes (USP19, MRAP2, EFCBP1, ABCB1 and FADH1) to be differentially expressed only when compared with normal stromal samples from reduction mammoplasties (RM-NS) using WG-DASL. Further, the most ‘normal’ stromal samples (BC-NS and RM-NS) did not cluster together in the hierarchical clustering, suggesting that subtle gene expression changes exist between normal stroma from healthy and cancer patients.

In summary, we have applied the WG-DASL assay for exploring gene expression pattern changes to address the clinically important scenario of progression from DCIS to IDC and the role played by both the epithelial and stromal compartments in this process. We found that the majority of expression changes during progression occurred within the epithelial cell compartments with relatively little change occurring in the stroma. Consistent with previous reports was the enrichment in biological processes related to ECM remodelling in IDC compared to DCIS, and specifically this related to the elevated expression of genes such as COL11A1, COL5A2 and MMP13 in epithelial cells of IDC. These genes might therefore be playing a crucial role in facilitating the invasion of neoplastic cells into and through the surrounding stroma.