Introduction

It remains controversial which methodology best informs the choice of specific treatment regimens for patients with early breast cancer. What is undisputed, however, is the existence of molecular subgroups within breast cancers, with different activated pathways, outcomes, and responses to therapy [1, 2]. The number of molecular subgroups and the corresponding targeted therapies are under intensive investigation [3]. Nevertheless, the existence of at least the following four molecular subgroups is generally accepted: Luminal A, Luminal B, HER-2-enriched, and Basal-like [4].

Immunohistochemistry (IHC) may be used as a surrogate to stratify patients according to these subgroups, but several molecular assays that aim to identify the molecular subgroups directly have now come to market. These gene expression-based assays, such as BluePrint, measure a greater number of genes than the currently used pathological criteria (IHC/FISH) [5]. Discrepancy between IHC/FISH and molecular subtyping is expected, as the true biological profile of a tumor is more accurately identified by evaluating the expression level of more genes. Indeed, the two methods measure different things: whereas ER, PgR, and HER-2 are measured individually at the protein level by IHC, BluePrint was developed to capture the underlying biologic pathways [5]. For ER, for instance, mutations have been described that render the receptor dysfunctional, so that the tumor is not Luminal-regulated even though IHC identifies it as ER-positive. BluePrint subtyping was in line with the genotype, classifying these tumors as non-Luminal-type [6].

Treatment allocation thus far has been based on a sub-optimal way of capturing the tumor’s biological pathway, and thus its ability to respond to a certain treatment. The goal of capturing the true biological profile of the tumor is to improve treatment allocation. For instance, non-Luminal-driven tumors are not responsive to endocrine treatment.

The 5-year outcomes of the international, prospective, randomized, phase III MINDACT study have been reported, and the results provided level IA evidence for the clinical utility of the MammaPrint signature when used in addition to standard clinical–pathological criteria for selecting patients for adjuvant chemotherapy [7]. With central pathology assessments for ER, PgR, HER-2, and Ki67, as well as central mRNA expression data for 5806 patients (87% of patient samples), these data offer an opportunity to better understand the seemingly counterintuitive discrepancies between molecular and IHC/FISH-based subtyping [8].

The primary outcome of MINDACT showed that MammaPrint could identify 46% of clinically assessed high-risk patients as molecularly Low Risk of recurrence, for whom adjuvant chemotherapy could be avoided with a trade-off of 1.5% in distant metastasis-free survival (DMFS) at 5 years [7]. The clinical assessment of risk was based on Adjuvant! Online, an easy-to-implement, relatively accurate, and homogeneous way of assessing clinicopathologic risk that was often used in daily clinical practice at the time MINDACT was developed. However, Adjuvant! Online has several limitations, and clinical–pathological assessment of risk has evolved since the MINDACT trial was developed: it now includes more defined thresholds for PgR as well as Ki67 assessment to stratify Luminal A versus Luminal B tumors. This sub-study of the MINDACT trial allows for comparison of the molecular and the pathological classifications of breast carcinoma.

Patients and methods

Patients

Female patients (n = 6693) with histologically proven operable invasive breast cancer and 0–3 positive lymph nodes were enrolled in MINDACT between February 2007 and July 2011. For further details see [9, 10]. The protocol was approved by independent ethics committees and medical authorities of participating countries. All patients provided written informed consent. The study was conducted in accordance with the Declaration of Helsinki and good clinical practice guidelines. Here, we compare outcome based on molecular subtyping (MS) with outcome based on surrogate pathological subtyping (PS) as endorsed by the 2013 St. Gallen Consensus.

Tumor samples

Prior to enrollment for randomization and stratification, local pathology categorization of hormone receptor status was determined and a frozen core biopsy (3–6 mm) of the surgical tumor sample was sent to Agendia NV (Amsterdam, The Netherlands) for microarray analyses. A representative diagnostic paraffin tissue block of each tumor was sent from each participating center to the European Institute of Oncology (IEO) (Milan, Italy) for central pathology re-assessment. Combined MammaPrint and BluePrint readout was available for 6688 patients. Central pathology results were unavailable for 865 patients because the sample had not been sent for central assessment. Among the 5823 patients with central pathology laboratory results, 17 had incomplete data, leaving 5806 samples for comparative analyses.
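The sample accounting in this paragraph can be checked with a few lines of arithmetic. The sketch below only restates the counts quoted above; the variable names are illustrative and are not taken from the study.

```python
# Counts quoted in the text; variable names are illustrative only.
enrolled = 6693                 # patients enrolled in MINDACT
with_molecular_readout = 6688   # combined MammaPrint and BluePrint readout available
no_central_pathology = 865      # sample not sent for central assessment
incomplete_data = 17            # central pathology data incomplete

with_central_pathology = with_molecular_readout - no_central_pathology
analysable = with_central_pathology - incomplete_data
share_of_enrolled = analysable / enrolled

print(with_central_pathology)        # 5823
print(analysable)                    # 5806
print(f"{share_of_enrolled:.1%}")    # 86.7%
```

The final share rounds to the 87% of patient samples quoted in the Introduction.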

mRNA microarray assessment

Microarray analysis for obtaining the 80-gene BluePrint subtype and 70-gene MammaPrint profiles was performed at the centralized Agendia laboratory (Amsterdam, the Netherlands) on frozen tumor samples, blinded to clinical and pathological data. Frozen sections of each sample were stained with hematoxylin and eosin and analyzed by an experienced breast pathologist. To ensure sufficient tumor volume for microarray analysis, all samples included in this study had a tumor cell percentage of at least 30%. RNA isolation, labeling, and hybridization were performed as described previously [11]. RNA was co-hybridized with a standard reference to custom-designed diagnostic chips, each containing oligonucleotide probes for the profiles in triplicate or more. Fluorescence intensities on scanned images were quantified and normalized using Feature Extraction software (Agilent Technologies, Santa Clara, USA). BluePrint determines the correlation index of each sample’s 80-gene profile with each of three distinct molecular subtyping centroids: Luminal, HER2-enriched, and Basal-type. MammaPrint sub-stratifies Luminal-type samples into Luminal A (MammaPrint Low Risk) and Luminal B (MammaPrint High Risk) [5].
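The two-step logic described here (assignment to the best-matching centroid, followed by a MammaPrint split of Luminal-type samples) can be illustrated with a minimal sketch. This is not the commercial algorithm: the use of Pearson correlation as the "correlation index", the four-gene toy centroids, and the function names `pearson` and `molecular_subtype` are all assumptions made for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def molecular_subtype(profile, centroids, mammaprint_high_risk):
    """Pick the centroid with the highest correlation (assumed index),
    then split Luminal-type samples by the MammaPrint result."""
    scores = {name: pearson(profile, centroid) for name, centroid in centroids.items()}
    best = max(scores, key=scores.get)
    if best == "Luminal":
        return "Luminal B" if mammaprint_high_risk else "Luminal A"
    return best

# Toy 4-gene centroids, invented for illustration; the real profile has 80 genes.
toy_centroids = {
    "Luminal": [1.0, 0.8, -0.5, -0.9],
    "HER2-enriched": [-0.2, 1.0, 0.9, -0.7],
    "Basal-type": [-1.0, -0.6, 0.4, 1.0],
}

luminal_like = [0.9, 0.7, -0.4, -1.1]
basal_like = [-1.0, -0.5, 0.5, 1.0]
print(molecular_subtype(luminal_like, toy_centroids, mammaprint_high_risk=False))
print(molecular_subtype(basal_like, toy_centroids, mammaprint_high_risk=True))
```

In this sketch the MammaPrint result only affects samples whose best-matching centroid is Luminal, mirroring the sub-stratification described above.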

Central IHC/FISH assessment

In the central laboratory, ER and PgR status were assessed on FFPE tissue blocks by IHC using the ER/PgR PharmDX kit (Dako, Glostrup, Denmark). Tumors were classified as ER- or PgR-positive when ≥1% of invasive tumor cells showed definite nuclear staining, irrespective of staining intensity [12]. Additional analyses were done for ER and PgR using ≥10% as the threshold for classifying tumors as positive. HER-2 expression was evaluated with the HercepTest kit (Dako) and scored as 0, 1+, 2+, or 3+, according to the FDA scoring system. Tumors scored as 2+ or 3+ were re-tested with FISH using the PathVysion HER-2 DNA probe kit (Vysis-Abbott, Chicago, USA). Cases were considered HER-2-positive if scored 3+ by IHC and/or amplified by FISH (ratio ≥2).
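The positivity rules stated above can be written as simple decision functions. The following is a minimal sketch of those rules only; the function and parameter names are hypothetical, and details of the actual laboratory workflow are not modeled.

```python
def hormone_receptor_positive(percent_stained, threshold=1.0):
    """ER or PgR positivity: the percentage of invasive tumor cells with definite
    nuclear staining is at or above the threshold (1% by default, 10% in the
    additional analyses)."""
    return percent_stained >= threshold

def needs_fish(ihc_score):
    """Tumors scored 2+ or 3+ by IHC were re-tested with FISH."""
    return ihc_score >= 2

def her2_positive(ihc_score, fish_ratio=None):
    """HER-2-positive if IHC 3+ and/or amplified by FISH (ratio >= 2).
    fish_ratio is None when FISH was not performed."""
    amplified = fish_ratio is not None and fish_ratio >= 2.0
    return ihc_score == 3 or amplified

print(hormone_receptor_positive(5))                 # positive at the 1% threshold
print(hormone_receptor_positive(5, threshold=10))   # negative at the 10% threshold
print(her2_positive(2, fish_ratio=2.3))             # IHC 2+, FISH amplified
```

Passing the threshold as a parameter makes it straightforward to repeat the classification at the alternative 10% cut-off used in the additional analyses.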

Ki67 protein status was assessed in FFPE tissue blocks by IHC using the MIB-1 monoclonal antibody (Dako). Tumors were classified as Ki67-positive when ≥14% invasive tumor cells showed definite nuclear staining, irrespective of staining intensity. Additional analyses were done for Ki67 using ≥20% as cut-off for classifying tumors as positive, as well as several potential cut-offs for positivity in order to determine the optimal threshold for Ki67 positivity (described below in statistical analysis section).

Statistical analysis

This translational research compared distant metastasis-free survival (DMFS) between patients classified by molecular subtyping (MS) and by pathological subtyping (PS) as endorsed by the 2013 St. Gallen Consensus. DMFS was defined as the time until the first distant metastatic recurrence or death from any cause. The primary hypothesis was that among PS Luminal patients, those with HER-2+ or Basal-type tumors by MS would have a decreased DMFS compared to MS Luminal patients. At α = 5% with 220 events, the study has 80% power to demonstrate this for HR = 2.44. Reported hazard ratios were adjusted for chemotherapy (yes/no) and endocrine therapy (yes/no) administration. Five-year DMFS estimates were obtained using the Kaplan–Meier method. Agreement is depicted in cross tables for the clinical–pathological subtypes as defined by St. Gallen 2013 [4]. For the Luminal A versus B sub-stratification analysis, agreement statistics for the 14% and 20% cut-points for Ki67 are reported as percentage of concordance and Cohen’s κ coefficient [13]. In addition, we evaluated 100 possible cut-off values for Ki67 (from 1 to 100%), and for each cut-off, the sensitivity and specificity were calculated with respect to MammaPrint. Statistical calculations were conducted using SAS® 9.4 (SAS Institute Inc., Cary, USA).
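For readers less familiar with the agreement statistics mentioned above, the following minimal sketch computes the percentage of concordance and Cohen's κ from a cross table. It is an illustration in Python, not the SAS code used in the study; the function name and the 2 × 2 toy table are invented.

```python
def concordance_and_kappa(table):
    """table[i][j] = number of cases placed in category i by method 1
    and in category j by method 2 (square table)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Invented toy cross table (rows: method 1, columns: method 2).
toy_table = [[40, 10],
             [20, 30]]
observed, kappa = concordance_and_kappa(toy_table)
print(f"concordance = {observed:.2f}, kappa = {kappa:.2f}")
```

In the toy table, 70% of cases agree, but half of that agreement is expected by chance alone, so κ is considerably lower than the raw concordance.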

Results

According to Pathological Subtyping (using the 4 categories of the St. Gallen 2013 surrogate definitions), 47% of tumors were classified as Luminal A, 34% as Luminal B (HER2−), 10% as HER2+, and 9% as Triple Negative. According to BluePrint and MammaPrint (Molecular) subtyping, 63% of tumors were Luminal A, 20% Luminal B, 6% HER2-enriched, and 11% Basal-type (Table 1; also depicted as pie charts in Fig. 1). Treatment allocation for all patients, classified according to Molecular and Pathological Subtyping, is shown in Tables 2 and 3, respectively.

Table 1 Molecular subtypes (BluePrint/MammaPrint) versus central pathology St. Gallen 2013 subtypes (four categories)
Fig. 1 Reclassification based on molecular subtyping

Table 2 Adjuvant treatment for patients classified according to molecular subtyping
Table 3 Adjuvant treatment for patients classified according to pathological subtyping

The primary hypothesis was not met. PS luminal patients (total n = 4718) classified as HER-2+ (n = 30) or Basal-type (n = 99) by MS did not have a significantly different 5-year DMFS (88.0% for HER-2+ and 90.2% for Basal) compared with the MS Luminal classified patients (95.9%): HR 1.40; 95% CI 0.75–2.60 (p = 0.294).

Comparing the two subtyping methodologies, 1738 (30%) of tumors had a different classification. The most pronounced differences were as follows:

  (1) Among tumors classified as Luminal B by Pathological Subtyping (PS), Molecular Subtyping (MS) classified 54% as Luminal A.

  (2) Among tumors classified as HER-2+ by PS, MS classified 38% as Luminal (A and B) and 5% as Basal-type.

  (3) Among tumors classified as TN by PS, MS classified 5% as Luminal (A and B).

Secondary exploratory analyses comparing total patient population for both assessments indicate several differences:

Luminal A versus B cases by the two types of classification (i)

Looking in more detail at the classification of Luminal patients, MS identifies 3657 patients (63%) as Luminal A, while PS identifies 2747 patients (47%) as Luminal A. Treatment allocation for the Luminal A and B patients according to MS and PS is depicted in Tables 2 and 3, respectively.

When comparing the 5-year DMFS for the Luminal A patients as classified by both methods, the 5-year DMFS for MS was 96.7%, while DMFS was 97.2% for PS (Fig. 2a, b).

Fig. 2 a DMFS by molecular subtype. MS classified 63% of patients as Luminal A disease. b DMFS by clinical subtype. PS identified 47% of patients as Luminal A

Low agreement with ‘optimal’ cut-off for Ki67

The St. Gallen 2013 consensus changed the surrogate definition of Luminal A/B relative to 2011 by increasing the Ki67 threshold from ≥14% to ≥20% and including a PgR threshold of <20% in the Luminal B definition. The St. Gallen 2013 surrogate definitions of Luminal A and B are in better concordance with MammaPrint/BluePrint (71%; 95% CI 69–72) than the St. Gallen 2011 definitions (60%; 95% CI 58–61); however, the agreement with the MammaPrint/BluePrint classification is only “fair” (kappa 0.35; 95% CI 0.32–0.37), with about one-third of cases discordant. Among 100 possible cut-off values (from 1 to 100%), the ‘optimal’ cut-off for Ki67 with respect to MammaPrint is 18% (Supplementary Fig. 1). An updated pathological definition of intrinsic molecular subtypes has been proposed, which includes an additional stratification for patients with “intermediate” (14 to 19%) or “high” (≥20%) Ki67 positivity, stratified by PgR expression (negative or low versus high) [14]. With this definition, overall concordance increased further (76.2%; 95% CI 75.0–77.5); however, the comparison still did not reach satisfactory agreement (kappa 0.43; 95% CI 0.40–0.46) (Supplementary Table 1).
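The cut-off scan can be sketched as follows. The text does not state the criterion by which the 'optimal' cut-off was selected; maximizing Youden's J (sensitivity + specificity − 1) is used here purely as an assumption, and the function names and toy data are invented for illustration.

```python
def sens_spec(ki67_values, high_risk_flags, cutoff):
    """Sensitivity and specificity of 'Ki67 >= cutoff' with MammaPrint
    High Risk taken as the reference."""
    tp = fp = tn = fn = 0
    for ki67, high_risk in zip(ki67_values, high_risk_flags):
        test_positive = ki67 >= cutoff
        if test_positive and high_risk:
            tp += 1
        elif test_positive and not high_risk:
            fp += 1
        elif not test_positive and high_risk:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

def best_cutoff(ki67_values, high_risk_flags, cutoffs=range(1, 101)):
    """Cut-off maximizing Youden's J (an assumed criterion); on ties the
    lowest cut-off is returned."""
    def youden(cutoff):
        sensitivity, specificity = sens_spec(ki67_values, high_risk_flags, cutoff)
        return sensitivity + specificity - 1
    return max(cutoffs, key=youden)

# Invented toy data: Ki67 (%) and MammaPrint High Risk (True) / Low Risk (False).
toy_ki67 = [5, 8, 12, 15, 22, 30, 35, 10]
toy_high_risk = [False, False, False, False, True, True, True, True]

cutoff = best_cutoff(toy_ki67, toy_high_risk)
print(cutoff, sens_spec(toy_ki67, toy_high_risk, cutoff))
```

Both groups must be non-empty for the ratios to be defined; a production analysis would also report confidence intervals, which this sketch omits.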

Pathologically assessed HER-2 cases classified as Luminal by BluePrint (ii)

Among HER-2+ patients by PS (n = 557), MS classified 38% as Luminal (A and B) and 5% as Basal-type. The relatively large group of clinical HER2+ cases that are BluePrint Luminal suggests that tumor expression of the Luminal profile is dominant compared with the expression of the HER2 profile.

The 5-year DMFS was 91.9% (95% CI 87.0–95.0) for PS HER-2+ patients classified as Luminal by MS and 95.4% (95% CI 92.4–97.3) for PS HER-2+ patients classified as HER-2-type by MS. One could hypothesize that the Luminal-classified HER-2+ patients would benefit less from anti-HER-2 treatment, given the tumor’s dominant underlying genotype. However, subgroup analysis shows that the 5-year DMFS is higher (96.2%) for the subgroup of patients treated with anti-HER-2 treatment (n = 80) than for the subgroup not treated with anti-HER-2 treatment (89.2%; n = 131). Note that treatment was not based on either molecular or pathological subtyping; therefore, caution is warranted in interpreting these results.

Disparities in clinical triple-negative cases and Basal-type cases by BluePrint (iii)

In PS TN cancers, MS identified a small subgroup of 24 out of 531 patients (5%) as Luminal-type (Luminal A n = 14; Luminal B n = 10) with 5-year DMFS of 100% versus 71.4% for MS HER-2+ or 90.1% for MS Basal-type (Fig. 3).

Fig. 3 DMFS in Triple-Negative (TN) cancers, re-classified with Molecular Subtyping. * Firth’s method was used, since classical estimation failed due to 0 events in the Luminal group

Of the 625 MS Basal-type patients, 99 are PS Luminal HER2-negative; two-thirds of these patients have low centrally assessed IHC PgR expression and one-third have low centrally assessed ER expression (≥1 and <10%). The 5-year DMFS of these patients (90.2%; 95% CI 82.0–94.8) is in line with that of the MS Basal-type patients who are also Triple Negative according to central pathology (90.1%; 95% CI 87.0–92.5).

Discussion

Marked differences were observed between BluePrint and MammaPrint (microarray-based) breast cancer subtypes (MS) and centrally re-assessed pathological surrogates (PS) (based on ER, PgR, HER2, and Ki67). Among PS Luminal patients, those with molecularly defined Luminal-type tumors did not have a statistically different DMFS compared with those with molecularly defined HER-2 and Basal-type tumors. However, when comparing molecular with pathological subtyping, the greatest discordance is seen in the subgrouping of Luminal patients: MS re-stratified 54% of patients with a Luminal B PS subtype to a low-risk Luminal A-type group without compromising outcome. One possible criticism of the MINDACT trial is that Adjuvant! Online was used for clinical–pathological risk assessment, which does not include some important factors such as the level of positivity of ER, PgR, or Ki67. The current study shows that an improved clinical classification, using centrally assessed pathological markers including high-quality Ki67, might still overestimate the number of patients assigned to adjuvant chemotherapy.

The surrogate pathology-assessed definition of Luminal A and B tumors is largely based on the Ki67 labeling index. However, Ki67 measurements lack inter-observer and inter-laboratory reproducibility: in a recent international Ki67 reproducibility study, substantial variability in Ki67 scoring was observed among some of the world’s most experienced laboratories. The authors concluded that Ki67 values and cut-offs for clinical decision-making cannot be transferred between laboratories without standardizing scoring methodology because the analytical validity is limited [15]. Misinterpretation of the Ki67 labeling index may result in a lost opportunity for patients to receive chemotherapy or may result in patients being over-treated. In 2011, the International Ki67 in Breast Cancer working group published recommendations for Ki67 assessment in breast cancer. The guideline aimed for better analysis, reporting, and use of Ki67 in order to minimize inter-laboratory variability and improve inter-study comparability of Ki67 results [16]. However, an ‘intermediate’ Ki67 labeling index (10–30%) remains a challenge, and well-validated methodologies to evaluate the ‘grey zone’ around the Ki67 cut-off points may allow more accurate risk estimation and therefore better clinical management [17]. In the St. Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer 2015, a majority of the Panel was prepared to accept a threshold value of Ki67 within the range of 20–29% to distinguish ‘luminal B-like’ disease, though about one-fifth of the Panel felt that Ki67 should not be used at all for this distinction. Only a quarter of the Panel believed that subtype determination could be replaced by risk scores derived from multi-parameter molecular markers [18].

Our study shows that the optimal threshold for Ki67 best correlating with the results of MammaPrint is 18%. This is slightly higher than the 13.25% threshold based on the 50-gene PAM50 classifier to identify tumors as being either luminal A or luminal B [19].

The group with the least discordance between pathological and molecular classification is the Basal-type/triple-negative group. Conventionally classified triple-negative tumors using central IHC/FISH pathology were Basal-type by Molecular Subtyping in 94% of cases, which is similar to a previously reported study [8]. Our study results indicate that patients with tumors not confirmed to be Basal-type by BluePrint have an outcome not different from those with Luminal tumors, albeit based on limited patient numbers. This is exploratory evidence of the potential to improve prognostication among PS TN patients by using MS.

The present study confirms that approximately 1 in 50 IHC ER-positive breast cancer patients are classified as Basal-type by Molecular Subtyping and High Risk by MammaPrint. This was previously explained by a relatively high expression of the dominant negative ERα splice variant ERD7 in ER-positive/Basal-type tumors as compared to ER-positive/Luminal-type tumors (p < 0.0001) [6]. Expression of the dominant negative ERα variant ERD7 provides a rationale as to why tumors are BluePrint Basal-type while staining ER-positive by IHC; the BluePrint test appears to measure ER activity independent of the ERα mRNA expression level itself. These tumors may lack a functional response to estrogen and consequently may not respond to endocrine therapy.

The percentage of triple-negative tumors classified as Basal-type with BluePrint (94%; 500 out of 531 patients) is higher than that reported for PAM50, for which prevalences of 73% and 80% have been reported [3, 20]. The discordance can be explained by a reclassification of some of these tumors to the HER2-enriched group by PAM50, thus leading to dilution of ‘anti-HER2 treatment sensitivity identification’. BluePrint/MammaPrint molecular subtyping classifies less than 1% of clinical Luminal/HER2-negative and 1% of triple-negative patients as HER2-type, allowing the predictive sensitivity for anti-HER2 treatment to be significant between molecularly and pathologically identified subgroups [21]. A difference between the PAM50 and BluePrint molecular subtyping profiles can be expected because the two profiles were developed in inherently different ways: PAM50 was based on unsupervised clustering, while BluePrint was developed using supervised training, leading to a functional subtyping profile [5].

A large proportion (38%) of pathologically assessed HER2-positive cases are Luminal by Molecular Subtyping, indicating that tumor expression of the Luminal profile is dominant compared with the expression of the HER2 profile. No treatment implications for this patient group can be drawn based on the current analysis, since patients have not been treated according to the Molecular classification, and therefore, for the time being, anti-HER2 therapies are indicated for all pathologically assessed HER-2-positive cases.

The main implication of this study is the higher percentage of patients assigned to the low-risk Luminal A-type group by the molecular subtyping, where adjuvant chemotherapy is usually not indicated. Patients with discordant Basal-type results need special consideration, since the current analysis provides evidence for outcomes to be in line with the molecular classification. Thus, when disparity exists between the two classification methods, there may be a role for molecular classification when determining treatment allocation.

In conclusion, the current study shows that the primary outcome of the MINDACT study is not limited to the use of Adjuvant! Online for the clinical assessment of risk: when compared with a more contemporary classification method, including high-quality assessment of Ki67, the molecular classification may be able to identify a larger group of patients with a low risk of recurrence.