Introduction

Over the years, genomic studies have shown that breast cancer cannot be considered a single disease. Instead, breast cancer is regarded as a heterogeneous disease comprising at least five distinguishable entities [1–3] with distinct biological features, clinical behaviors, and responses to therapy [3–8]. Several approaches have been developed to classify breast cancer into molecular subtypes. Initially, samples were classified by hierarchical cluster analysis using intrinsic gene lists [1–3]. However, this approach was not suitable for classifying individual samples, as it could only be applied to large cohorts of patients. To overcome this limitation, several single-sample predictors were developed [9, 10]. Among them, the clinically applicable PAM50 assay classifies tumors into one of the following categories: luminal A, luminal B, Her2-enriched, basal-like, and normal-like [10]. More recently, a new intrinsic subtype, known as claudin-low, has been identified and a predictor for it has been reported [11, 12]. Claudin-low tumors are characterized by low expression of cell–cell junction genes and high enrichment for mesenchymal and stem cell-like features [11, 12].

The intrinsic subtypes can also be approximated by pathology-based surrogate definitions using immunohistochemistry (IHC) and/or FISH [13, 14]. In this scheme, luminal A tumors express estrogen receptor (ER) and/or progesterone receptor (PR) and have low expression of the proliferation marker Ki-67. Luminal B tumors express ER and/or PR and either overexpress ERBB2 or show high Ki-67 [15]. HER2 tumors are ER-negative and PR-negative and overexpress ERBB2. Finally, tumors that express none of ER, PR, or ERBB2 are classified as triple negative (TN). However, this immunohistochemical approach does not perfectly match the intrinsic subtypes [16].

In this paper, we evaluate the prognostic and predictive information yielded by the genomic and IHC-FISH-based subtype classifications in a cohort of patients enrolled in a clinical trial.

Methods

Study population

Ninety-four patients diagnosed with locally advanced breast cancer were included in this study. These patients participated in a neoadjuvant clinical trial (registered at http://www.clinicaltrials.gov; identifier NCT00123929) as described previously [6, 17, 18]. The study was approved by the institutional review board of the Hospital Clínico San Carlos, Madrid, Spain, and informed consent was obtained from every patient before enrollment. Briefly, eligibility criteria included women aged 18 to 78 years; clinical stage IIB, IIIA, or IIIB breast cancer; and palpable breast tumors not amenable to breast-conserving surgery. Patients were randomly assigned to receive four cycles of either neoadjuvant doxorubicin (75 mg/m² body surface area) or docetaxel (Taxotere, Sanofi-Aventis, Spain; 100 mg/m² body surface area with G-CSF support) every 3 weeks, followed by surgery. Clinical response was evaluated according to RECIST criteria by comparing pre- and post-chemotherapy magnetic resonance imaging. After surgery, patients crossed over to receive four cycles of the opposite drug, plus radiation therapy. Patients with HER2-positive tumors also received adjuvant trastuzumab for 1 year, and patients whose tumors were positive for hormone receptors (HR) received tamoxifen, aromatase inhibitors, or a sequence of both for at least 5 years.

RNA isolation and microarray expression profiling

Total RNA was extracted from pretreatment tumor biopsies using the Qiagen RNeasy Mini Kit (Qiagen Inc., Valencia, CA) following the manufacturer's instructions. Only samples with >80 % tumor cells were used. Gene expression microarrays were hybridized with tumor total RNA as previously described [18]. Briefly, Whole Human Genome Oligo 4 × 44K Microarrays (Agilent Technologies, Santa Clara, CA) were hybridized and scanned on a GenePix 4000B scanner (Molecular Devices Corporation, Sunnyvale, CA). Genes were filtered by requiring Lowess-normalized intensity values >10 in both channels, and only genes with values in 70 % or more of the samples were retained. Gene expression values were then median-centered across all samples. Breast cancer molecular subtypes were identified using the PAM50 and claudin-low predictor (CLP) subtype predictors as previously described [10, 12]. Briefly, the CLP calculates the Euclidean distances from each sample to a claudin-low centroid and an “others” centroid and assigns the class of the nearest centroid. Both centroids were calculated using specific gene lists as described before [12]. Samples identified as claudin-low were considered claudin-low regardless of their PAM50 call [12]. The primary microarray data presented in this study are available in the Gene Expression Omnibus repository under accession number GSE21997.
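The filtering, median-centering, and CLP assignment described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the published pipeline: the function names, the data layout (a dict mapping each gene to one `(log_ratio, intensity_ch1, intensity_ch2)` tuple per sample), skipping genes that are missing in a sample when computing distances, and the toy centroids are all hypothetical. The actual CLP gene lists and centroids are those described in [12], and the PAM50 call itself is taken as an input rather than reimplemented.

```python
import math
from statistics import median


def filter_and_center(data, min_intensity=10.0, min_fraction=0.70):
    """Filter spots and genes, then median-center each retained gene.

    data: gene -> list of (log_ratio, intensity_ch1, intensity_ch2), one per sample.
    A measurement is kept only if BOTH channel intensities exceed min_intensity;
    a gene is kept only if it has kept values in >= min_fraction of samples.
    Filtered-out measurements are stored as None.
    """
    centered = {}
    for gene, spots in data.items():
        values = [ratio if (ch1 > min_intensity and ch2 > min_intensity) else None
                  for ratio, ch1, ch2 in spots]
        present = [v for v in values if v is not None]
        if len(present) < min_fraction * len(values):
            continue  # too many missing values: drop the gene
        gene_median = median(present)
        centered[gene] = [None if v is None else v - gene_median for v in values]
    return centered


def sample_profile(matrix, j):
    """Expression profile (gene -> value) of sample j from a gene -> values matrix."""
    return {gene: values[j] for gene, values in matrix.items()}


def euclidean_to_centroid(profile, centroid):
    """Euclidean distance over centroid genes; genes missing in the sample are
    skipped (an assumption made for this sketch)."""
    return math.sqrt(sum((profile[g] - c) ** 2
                         for g, c in centroid.items()
                         if profile.get(g) is not None))


def clp_call(profile, claudin_low_centroid, others_centroid):
    """Nearest-centroid call: 'claudin-low' or 'others' (ties go to 'others')."""
    d_cl = euclidean_to_centroid(profile, claudin_low_centroid)
    d_ot = euclidean_to_centroid(profile, others_centroid)
    return "claudin-low" if d_cl < d_ot else "others"


def final_subtype(pam50_call, clp):
    """A claudin-low CLP call overrides the PAM50 call."""
    return "claudin-low" if clp == "claudin-low" else pam50_call
```

With real data, both centroids would span the CLP gene list, and `final_subtype` would combine each sample's CLP call with the output of the PAM50 predictor.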

IHC, FISH, and tumor grading

Paraffin-embedded tumor samples from core biopsy specimens were evaluated by immunohistochemical analysis for ER (clone 1D5, 1:35; DakoCytomation, Glostrup, Denmark), PR (clone PgR 636, 1:50; DakoCytomation), Ki-67 (clone MIB-1, 1:75; DakoCytomation), EGFR (clone EGFR.25, 1:50; Leica Microsystems), and CK5/6 (clone D5/16B4, prediluted; Master Diagnostica). After incubation with the primary antibodies, immunohistochemical staining was performed using the Autostainer Link 48 (DakoCytomation, Carpinteria, CA). EGFR and CK5/6 were considered positive with any degree of positive staining. ER and PR were considered positive when 1 % or more of cells were stained. All tumor slides were reviewed in a blinded fashion by the study pathologist to reassess tumor histotype, histological grade, and the presence of lymphocytic infiltration.

ERBB2 amplification was assessed by FISH using a centromere enumeration probe for chromosome 17 (CEP17), labeled in green, and a locus-specific identifier ERBB2 probe, labeled in orange (Vysis-Abbott, Downers Grove, IL). Slides were prepared according to the manufacturer's instructions. A minimum of 100 nuclei were counted per case, and a positive result was defined as an ERBB2 gene/chromosome 17 ratio of 2.2 or greater.
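As an arithmetic illustration of the FISH criterion, the sketch below computes the ERBB2/CEP17 ratio from per-nucleus signal counts. It assumes, since the text does not say, that the ratio is the total ERBB2 signal count divided by the total CEP17 signal count over all scored nuclei; the function name is hypothetical.

```python
def erbb2_fish_status(erbb2_counts, cep17_counts, threshold=2.2, min_nuclei=100):
    """Return (ERBB2/CEP17 ratio, amplified?) from per-nucleus signal counts.

    Assumption for this sketch: ratio = total ERBB2 signals / total CEP17 signals.
    """
    if len(erbb2_counts) != len(cep17_counts):
        raise ValueError("need one ERBB2 and one CEP17 count per nucleus")
    if len(erbb2_counts) < min_nuclei:
        raise ValueError(f"at least {min_nuclei} nuclei must be scored")
    total_cep17 = sum(cep17_counts)
    if total_cep17 == 0:
        raise ValueError("no CEP17 signals counted")
    ratio = sum(erbb2_counts) / total_cep17
    return ratio, ratio >= threshold
```

For example, 100 nuclei with 5 ERBB2 and 2 CEP17 signals each give a ratio of 2.5, which would be scored as amplified.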

Tumors were classified into molecular subtypes based on IHC/FISH parameters as previously described by Hugh et al. [19]. Luminal A tumors were those with positive staining for ER and/or PR, HER2-negative status, and Ki-67 ≤13 %. Luminal B tumors were defined by positive staining for ER and/or PR and either HER2-positive status or Ki-67 ≥14 %. HER2 tumors were ER- and PR-negative and HER2-positive. TN tumors were ER-, PR-, and HER2-negative. TN tumors were further subdivided into core basal phenotype (CBP) tumors (CK5/6- and/or EGFR-positive) and five-negative phenotype (5NP) tumors (CK5/6-negative and EGFR-negative), as described by Nielsen et al. [13].
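The surrogate rules above amount to a small decision procedure, sketched below. Assumptions not stated in the source: inputs are percentages of stained cells for ER, PR, and Ki-67 and booleans for HER2 (FISH), CK5/6, and EGFR; ER/PR positivity uses the ≥1 % cut point from the IHC methods; and Ki-67 values strictly between 13 % and 14 % are treated as low (<14 %). The function name and output labels are hypothetical.

```python
def ihc_fish_subtype(er_pct, pr_pct, her2_pos, ki67_pct, ck56_pos, egfr_pos):
    """Pathology-based surrogate subtype following the rules in the text."""
    hr_pos = er_pct >= 1 or pr_pct >= 1        # ER and/or PR positive (>= 1 % cells)
    if hr_pos:
        if her2_pos or ki67_pct >= 14:          # HER2+ or high proliferation
            return "luminal B"
        return "luminal A"                      # HER2- and Ki-67 below 14 %
    if her2_pos:
        return "HER2"                           # ER-, PR-, HER2+
    # Triple negative: split by basal markers
    if ck56_pos or egfr_pos:
        return "TN-CBP"                         # core basal phenotype
    return "TN-5NP"                             # five-negative phenotype
```

Checking hormone receptor status first mirrors the text, where HER2 positivity moves an ER/PR-positive tumor to luminal B rather than to the HER2 group.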

Statistical analysis

The association between categorical variables was tested with the chi-squared test or Fisher's exact test, as appropriate. The Kruskal–Wallis test was used to compare non-normally distributed quantitative variables across more than two groups. Kappa (κ) coefficients and their 95 % confidence intervals (95 % CI) were used to assess the agreement between genomic and pathology-based subtype assessment; samples assigned to the normal-like subtype were excluded from this analysis. The strength of agreement was considered slight for κ values between 0.00 and 0.20; fair, 0.21–0.40; moderate, 0.41–0.60; good, 0.61–0.80; and almost perfect, 0.81–1.00.
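Cohen's κ compares the observed agreement p_o with the agreement p_e expected by chance from each classification's marginal frequencies, κ = (p_o − p_e)/(1 − p_e). A minimal sketch follows. It assumes that both classifications have already been mapped onto a common set of labels (the text does not specify how IHC-FISH categories were matched to PAM50 + CLP subtypes); the function names are hypothetical, and the 95 % CI is not computed here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two paired categorical classifications."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two non-empty label lists of equal length")
    p_obs = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # chance agreement from the product of marginal proportions
    p_exp = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa undefined: both classifications use one identical label")
    return (p_obs - p_exp) / (1 - p_exp)


def agreement_strength(kappa):
    """Map kappa to the verbal scale used in this study. Values below 0 fall
    outside that scale; labeling them 'below chance' is an assumption."""
    if kappa < 0:
        return "below chance"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "almost perfect"
```

For instance, `agreement_strength(0.551)` returns `"moderate"`, the category used for the κ reported in the Results.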

Survival curves were estimated by the Kaplan–Meier method. Likelihood ratio statistics for subtypes assessed by gene expression or by pathology-based definitions were also evaluated after accounting for clinical–pathological variables (age at diagnosis and tumor stage). Models were first conditioned on one subtype classification, and the significance of adding the other was then tested.
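The Kaplan–Meier estimator multiplies, at each distinct event time t, the conditional survival 1 − d/n, where d is the number of events at t and n the number still at risk just before t. The sketch below is a plain illustration with hypothetical function names; following the usual convention, subjects censored at an event time are counted as at risk at that time. With time in years, 5-year mortality as reported in the Results corresponds to 1 − S(5).

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    Returns a list of (event_time, survival) steps.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    steps = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:   # group tied times
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            steps.append((t, survival))
        at_risk -= deaths + censored               # leave the risk set after t
    return steps


def survival_at(steps, t):
    """Step-function value S(t): survival at the last event time <= t (1.0 before any)."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s
```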

Hazard ratios (HRs) for overall survival (OS) and relapse-free survival (RFS) from Cox proportional hazards models were adjusted for tumor stage and histological grade. Three models were compared for the prediction of survival outcomes: (1) clinical variables alone (histological grade, tumor stage, and neoadjuvant treatment outcome); (2) PAM50 + CLP subtype classification plus clinical variables; and (3) IHC-FISH-based subtype classification plus clinical variables. Harrell's C-index (c) was used to compare the discriminative ability of the models. Over all usable pairs of patients, the C-index estimates the probability that predicted and observed survival are concordant, that is, that the patient with the higher predicted risk experiences the event first; c = 0.5 corresponds to random predictions and c = 1 to a perfectly discriminating model. P < 0.05 was considered statistically significant. Statistical analyses were performed using Stata 11.0 and SPSS 15.0.
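Harrell's C-index can be computed directly from its definition by counting usable pairs. In this minimal sketch (hypothetical function name), a pair is usable when the patient with the shorter follow-up had the event; pairs with tied follow-up times are skipped for simplicity, and ties in predicted risk count as one half. Software implementations may treat tied times differently, so values can differ slightly.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index.

    times: follow-up times; events: 1 = event observed, 0 = censored;
    risk_scores: higher value = higher predicted risk (e.g., a Cox linear predictor).
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # only an observed event can anchor a pair
        for j in range(n):
            if times[j] > times[i]:       # j was still event-free when i had the event
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A model whose risk scores are uninformative (for example, identical for all patients) yields c = 0.5, matching the interpretation given above.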

Results

Clinical features of breast cancer molecular subtypes

The main clinical characteristics of the study population are presented in Table 1. The median age at breast cancer diagnosis was 51 years (range 27–77 years). Notably, the median age at diagnosis for claudin-low tumors (61 years; range 38–76) was significantly higher than that of the other subtypes (P = 0.016). Regarding histology, most tumors (83 %) were invasive ductal carcinomas, and invasive lobular carcinomas were mostly of luminal subtypes. We also identified one medullary carcinoma, assigned to the basal-like subtype; one micropapillary carcinoma, classified as luminal B; and two mucinous carcinomas, assigned to the luminal A and normal-like subtypes. Two-way contingency table analysis showed a significant association between histological grade and tumor subtype (P = 0.016). Finally, the proportion of tumors showing lymphoid infiltration varied according to molecular subtype (P = 0.027); as presented in Table 1, lymphoid infiltration was more frequent in Her2-enriched and claudin-low tumors.

Table 1 Clinical–pathological characteristics of the study population according to intrinsic subtype

Agreement between genomic and IHC-FISH-based subtype assessment

The observed agreement between the pathology-based and PAM50 classifications was 68 %, with κ = 0.551 (95 % CI 0.467–0.641), indicating moderate agreement.

Figure 1a illustrates the intrinsic subtype distribution within TN tumors and within CBP tumors (i.e., triple negative and CK5/6+ and/or EGFR+). Only 57 % of TN tumors were basal-like, whereas 27 % were claudin-low. Similarly, only 56 % of CBP tumors were basal-like, and 25 % were claudin-low. These findings indicate that the TN and CBP surrogates for basal-like tumors include a non-negligible percentage of samples that are not true basal-like. Conversely, 2 of 17 (12 %) basal-like tumors were misclassified as luminal B by the pathology-based classification (Fig. 1b). In addition, 56 % of luminal A tumors assessed by the pathology-based classification were true luminal A tumors by PAM50 + CLP, and only 44 % of luminal B tumors assessed by the IHC-FISH-based classification were true luminal B tumors by PAM50 + CLP (Fig. 1c).

Fig. 1

a Intrinsic subtype distribution within TN tumors and CBP tumors. b Distribution of IHC-FISH-based classification subtypes within the basal-like tumors. c Intrinsic subtype distribution within luminal A and luminal B tumors assessed by the IHC-FISH-based classification

Prognostic information provided by breast cancer molecular subtypes

The median follow-up time was 3.9 years (range 0.8–8.8 years). Disease recurrence was noted in 38 (40.4 %) women, and 22 (23.4 %) deaths were recorded. Luminal A tumors had the lowest 5-year mortality rate according to both PAM50 + CLP and the IHC-FISH-based classification (0 and 5.6 %, respectively; Fig. 2). In contrast, basal-like and TN tumors showed the worst outcome, with 67.3 % and 40.1 % 5-year mortality, respectively. The Kaplan–Meier curves for OS and RFS according to PAM50 + CLP and the IHC-FISH-based classification are displayed in Fig. 2. Breast cancer subtypes assessed by both methodologies were prognostic for OS and RFS in Kaplan–Meier analysis (P < 0.05 by log-rank and Breslow tests).

Fig. 2

Kaplan–Meier RFS and OS curves for breast cancer molecular subtypes assessed by PAM50 + CLP and by IHC-FISH-based classification

In univariate analysis, statistically significant differences in OS were observed according to tumor stage, PAM50 + CLP, and pathology-based classification. Regarding RFS, significant HRs were observed for histological grade, stage, PAM50 + CLP, and IHC-FISH-based classification (P < 0.05) (Supplemental Table 1).

Multivariable Cox models were constructed to test the independent prognostic value of PAM50 + CLP and the IHC-FISH-based classification against standard clinical variables, including histological grade and tumor stage. Pathological complete response (pCR) was included in the Cox model for RFS, as pCR is an established independent predictor of survival. Because no cancer-related deaths were recorded among patients who achieved a pCR to neoadjuvant treatment, this variable did not converge in the OS model (Supplemental Table 2).

In the multivariable analysis, the basal-like subtype had significantly worse OS and RFS (HR = 5.89, 95 % CI 2.40–14.42, P < 0.001 and HR = 2.28, 95 % CI 1.09–4.75, P = 0.023, respectively). Similarly, the TN phenotype was an independent predictor of poor prognosis (HR = 2.74, 95 % CI 1.09–6.86, P = 0.031 for OS and HR = 2.09, 95 % CI 1.01–4.30, P = 0.046 for RFS).

To compare the amount of independent prognostic information provided by each subtype classification, we estimated the likelihood ratio statistic after accounting for clinical–pathological variables (age at diagnosis and tumor stage). The PAM50 + CLP predictor added significantly more prognostic information than the pathology-based classification in terms of OS (Fig. 3).
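Conditioning on one classification and testing the other is a likelihood ratio test between nested Cox models: LR = 2(ℓ_full − ℓ_reduced), compared with a χ² distribution whose degrees of freedom equal the number of added parameters (k − 1 for a k-category subtype variable coded with dummy variables). The sketch below assumes the two maximized partial log-likelihoods have already been obtained from Cox fits; the function names and the log-likelihood values in the example are hypothetical. The χ² tail probability uses the standard closed forms for integer degrees of freedom so that only the Python standard library is needed.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # even df: exp(-x/2) * sum_{k=0}^{df/2-1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * root   # r = 1 term
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def lr_test(loglik_reduced, loglik_full, df):
    """Likelihood ratio test for nested models: returns (LR statistic, P value)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, df)
```

For example, with hypothetical log-likelihoods of −100.0 (reduced) and −96.0 (full) and two added parameters, `lr_test(-100.0, -96.0, 2)` gives LR = 8.0 and P = e⁻⁴ ≈ 0.018.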

Fig. 3

Relapse-free survival (a, b) and overall survival (c, d) likelihood ratio statistics of subtypes defined by gene-expression or pathology-based definitions. Models were first conditioned on one predictor, and then the significance of the other was tested

Additionally, we calculated Harrell’s C-index to evaluate the ability of the Cox models to discriminate between deceased and surviving patients and between relapsed and non-relapsed patients. Table 2 shows the C-index for OS and RFS for the model based on clinical variables alone, the model including PAM50 + CLP and the clinical variables, and the model including the pathology-based classification and the clinical variables. All models showed a statistically significant ability to discriminate between patients with different prognoses (c > 0.5). The model of clinical variables alone yielded the lowest C-index; however, the C-index of the clinical model did not differ significantly from that of either model including subtype assessment.

Table 2 C-index to predict OS and RFS and their confidence intervals (CI) by the model of clinical variables alone, PAM50 + CLP with the clinical variables, and IHC-FISH-based classification and clinical variables

Chemosensitivity of triple negative tumors

Among the 94 patients, 54 (57.4 %) were treated preoperatively with single-agent doxorubicin and 40 (42.6 %) with single-agent docetaxel. Patients with progressive disease by RECIST criteria (5 %) did not follow the adjuvant protocol and were treated according to the standard of care. There were 14 deaths in the doxorubicin arm and 8 in the docetaxel arm. In the overall group, OS did not differ significantly between patients preoperatively treated with docetaxel and those treated with doxorubicin (P = 0.580) (Fig. 4a). However, when stratifying by subtype, OS differed significantly according to preoperative chemotherapy within the TN subgroup (P < 0.05). As shown in Fig. 4b, TN tumors had worse OS when treated with neoadjuvant doxorubicin (adjusted HR = 5.98, 95 % CI 1.25–28.67, P = 0.025). Similarly, the basal-like subtype appeared to have a higher risk of death when treated with preoperative doxorubicin, a difference that approached statistical significance (adjusted HR = 5.02, 95 % CI 0.96–26.38, P = 0.057) (Fig. 4c). In contrast, OS in claudin-low tumors did not differ according to preoperative treatment (P = 0.784) (Fig. 4d), indicating that, unlike basal-like tumors, whose outcome appears to depend on the preoperative drug, claudin-low tumors might not behave this way. Indeed, claudin-low tumors treated with neoadjuvant doxorubicin had significantly better OS than basal-like tumors treated with neoadjuvant doxorubicin (adjusted HR = 0.16, 95 % CI 0.04–0.69, P = 0.014).

Fig. 4

a OS Kaplan–Meier plot according to treatment branch. b OS Kaplan–Meier plot according to treatment branch within TN tumors. c OS Kaplan–Meier plot according to treatment branch within basal-like tumors. d OS Kaplan–Meier plot according to treatment branch within claudin-low tumors

Discussion

In this study, we evaluated the clinical, prognostic, and predictive significance of the intrinsic subtypes in a cohort of patients enrolled in a clinical trial. Patients' response to treatment has been previously reported by our group [6]. Here, we corroborate those results with survival data and, importantly, show for the first time that claudin-low and basal-like tumors might respond differently to neoadjuvant chemotherapy.

First, our data show that the agreement between genomic and pathology-based subtype assessment is suboptimal (κ = 0.551, 95 % CI 0.467–0.641). Remarkably, our data suggest that the TN phenotype and CBP are not good surrogates for the basal-like subtype. CBP has been proposed as a suitable proxy for basal-like tumors [13, 20]. However, for the CBP definition, the basal-like subtype was assessed as described by Sorlie et al. [2], and therefore the claudin-low subtype, which had not yet been identified, was not considered. CBP tumors have been reported to have significantly worse outcome than 5NP tumors [20], which has led to speculation that 5NP tumors might be enriched for claudin-low tumors. Our results do not support this, as 5NP tumors were mostly basal-like (data not shown). Nonetheless, our cohort is rather small, and further paired microarray-IHC studies are warranted. It should also be noted that the claudin-low centroid calculation takes into account the expression of hundreds of genes. Thus, unlike the PAM50 assay, the CLP is not suitable for FFPE samples, and obtaining sufficient high-quality RNA might be a limiting step for claudin-low subtype identification in large cohorts.

Similarly, the assignment of the luminal B subtype by PAM50 does not equate with the clinical subgroup. Some studies have drawn attention to the fact that the reproducibility of luminal B identification is not as good as would be desired [21, 22]. This might be due to heterogeneity within the luminal B subtype, and additional sub-classification of this subtype is likely as genomic knowledge advances. Indeed, Curtis et al. [23] have reported novel subgroups, predominantly in luminal tumors.

Additionally, we have comprehensively described the clinical and histological characteristics of the intrinsic subtypes. In this regard, we found that claudin-low tumors are diagnosed in older patients (P = 0.016). This is particularly pronounced when comparing the age at diagnosis of basal-like and claudin-low tumors (47 vs. 61 years, respectively), supporting that, within TN tumors, claudin-low tumors are clinically distinct from basal-like tumors. To our knowledge, this observation has not been reported elsewhere. If it is validated in larger studies, age at diagnosis might help clinicians anticipate the tumor subtype within TN tumors.

Numerous studies have analyzed the prognostic information provided by breast cancer molecular subtypes [3, 4, 9, 12–16]. As expected, tumor subtype in our study showed prognostic significance in terms of OS and RFS. In multivariable analysis incorporating the standard clinical variables, both PAM50 + CLP and the IHC-FISH-based classification remained significant. In addition, the amount of prognostic information in terms of OS provided by PAM50 + CLP was greater than that provided by the pathology-based classification. However, we failed to demonstrate an improvement in the C-index of the combined models over the clinical model, possibly because of the small sample size of the study.

Several studies have highlighted that treatment outcome could be affected by tumor subtype [6, 7]. We previously showed that triple negative/basal-like tumors had better pCR rates when treated with neoadjuvant docetaxel, whereas no significant differences were seen in the remaining intrinsic subtypes [6]. Therefore, subtyping not only provides prognostic information but could also be useful for treatment selection. Subsequently, we demonstrated that the pathological assessment of response to neoadjuvant chemotherapy accurately predicted OS and RFS [17]. Consistent with this, our results now suggest that basal-like as well as TN tumors have better survival when treated with neoadjuvant docetaxel than with doxorubicin. Several studies have reported that adding taxanes to anthracycline-based chemotherapy reduces the likelihood of recurrence and death [24, 25]. However, it is not clear whether this benefit arises because multi-drug combinations act synergistically or because the spectrum of sensitive tumors broadens as drugs are added to chemotherapy combinations. In view of our results, it can be hypothesized that basal-like and TN tumors might not benefit from doxorubicin. If so, treating these patients with anthracycline-containing combinations might expose them to toxicities of considerable clinical concern without a significant benefit.

According to our results, claudin-low tumors appear to have different chemosensitivity from basal-like tumors, and therefore identification of this subtype may be of clinical importance. Unlike in basal-like tumors, OS in claudin-low tumors was not affected by the neoadjuvant treatment arm. Moreover, claudin-low tumors had a significantly lower risk of death than basal-like tumors when first treated with doxorubicin. The value of our findings is strengthened by the prospective and planned nature of the study, as well as by its design (comparative, single-drug arms). However, the study cohort included few patients, and this observation should be validated in larger cohorts.

In this study, we considered the outcome of patients following crossover to the opposite drug, although patients showing progressive disease did not fully adhere to the protocol. It is well established that neoadjuvant tumor response predicts patient survival [17, 26]. On the other hand, it can be hypothesized that the sequence of treatment might affect survival in certain subtypes. Unfortunately, our results are inconclusive in this regard, and further studies exploring this issue would be of particular interest.

In conclusion, accurate stratification of breast cancer molecular subtypes is of clinical interest, as it could enable personalized approaches and thus enhance the efficacy of therapy, eliminate ineffective treatments, and reduce therapy costs. Here, we report significant differences in clinical–pathological features and chemosensitivity between claudin-low and basal-like tumors, suggesting that these tumors are not only different at the genomic level but might also be clinically different entities. Nonetheless, the study cohort is rather small, and larger trials are needed to validate our observations.