Introduction

Breast cancer is a heterogeneous disease with four major molecular subtypes: luminal A, luminal B, human epidermal growth factor receptor 2 (HER2)-enriched, and basal-like [1, 2]. These subtypes vary in their genomic, clinical, and pathologic features, and have important implications for treatment [3, 4] and clinical outcome [5]. In particular, subtype classification is clinically relevant for predicting recurrence risk and survival. Patients with basal-like tumors have a poorer prognosis than patients with luminal subtypes, although patients with luminal B subtype have significantly worse clinical outcome than those with luminal A subtype [1, 5, 6, 7].

Breast cancer subtype classification based on immunohistochemical (IHC) surrogate methods is widely used in clinical practice in accordance with St. Gallen International Breast Cancer Consensus recommendations [8, 9]. Molecular subtyping derived using tumor grade and IHC is highly correlated with intrinsic subtypes [6, 10, 11], and is a practical and cost-effective alternative to gene expression profiling [8]. While tumor grade is a valuable prognostic factor in breast cancer [12], it may not be optimal for distinguishing luminal A from luminal B subtypes because of heterogeneity among moderately differentiated (grade 2) tumors [13, 14].

Ki67—also known as Ki67 antigen or MKI67 (marker of proliferation Ki67)—is a marker of proliferation expressed exclusively during active phases of the cell cycle [15, 16]. Ki67 is commonly assessed by IHC in clinical settings and has been correlated with clinical outcome [17]. However, use of Ki67 in the clinical management of breast cancer patients is limited by the lack of analytic validity in its assessment [18]. Ki67 scoring reproducibility is only moderate when manual scoring methods are used [19], and thus, there is currently no consensus on the optimal Ki67 cut point for molecular subtyping and prediction of breast cancer prognosis [18].

Using prospective data from the Nurses’ Health Study cohort, we systematically evaluated the robustness of Ki67 staining by Definiens digital image analysis (DIA). In addition, we examined the prognostic value of using Ki67 at various cut points to distinguish luminal tumors for distant recurrence, breast cancer-specific and overall mortality, adjusting for established prognostic clinico-pathologic and lifestyle factors.

Materials and methods

Study population

The Nurses’ Health Study (NHS), established in 1976, is an ongoing prospective cohort study of 121,701 female registered nurses aged 30–55 at enrollment. Biennial questionnaires are used to collect data on lifestyle factors and health outcomes, including breast cancer, with a follow-up rate of over 90% [20]. Return of questionnaires was considered implied consent. Incident breast cancer cases were ascertained by biennial questionnaire and the National Death Index, and confirmed by medical record review [21]. Informed consent was obtained from all participants to collect and use tissue specimens for research. This study was approved by the Human Subjects Committee at Brigham and Women’s Hospital (Boston, MA).

Breast cancer tissue block collection and selection

The collection of archived formalin-fixed paraffin-embedded (FFPE) breast cancer blocks from participants diagnosed with primary incident breast cancer began in 1993 and currently includes 30 years of follow-up (1976–2006). Tissue microarray (TMA) construction was performed as previously described [21, 22]. Participants were eligible for this study if they were diagnosed with non-metastatic primary invasive breast cancer between 1976 and 2006 with no previous history of cancer and had FFPE breast cancer tissue available with pathologist-confirmed tumor on the TMA. We identified 3284 tumors and excluded cases with in situ breast cancer (n = 339), stage IV disease (n = 57), diagnosis before 1976 (n = 1), and previous non-skin cancer diagnosis (n = 234). Our final sample included 2653 breast tumors.

Immunohistochemical analysis

We previously performed IHC staining and scoring for ER-α, PR, HER2, cytokeratin 5/6 (CK5/6), and epidermal growth factor receptor (EGFR) on 5 μm paraffin sections from TMA blocks [22, 23]. Ki67 immunostaining was optimized in the BWH Specialized Histopathology Core and performed on a Dako Autostainer (Dako Corporation, Carpinteria, CA, USA). Briefly, tissue sections were deparaffinized in xylene and rehydrated through a graded ethanol series. After heat-induced inactivation of endogenous peroxidase activity and antigen retrieval in citrate buffer (pH 6.1), tissue sections were incubated with Ki67 antibody (1:250 dilution of clone SP6 antibody from VP-RM04, Vector, Burlingame, CA, USA). SP6 clone from VP-RM04 has been used previously in large studies of FFPE TMA breast tumor tissue [10]. In addition, SP6 performs better than MIB1 in image analysis on FFPE TMA breast tumor tissue due to its reduced background [24].

Scoring of Ki67

Nuclear staining of Ki67 was assessed in up to three cores per breast tumor. The percentage of Ki67 positive tumor cells and the intensity of Ki67 staining were measured using DIA with the Definiens Tissue Studio package (Definiens Tissue Studio software, Munich, Germany; Scanner: Pannoramic SCAN by 3DHISTECH; Scanner software: Pannoramic Scanner by 3DHISTECH (Version 1.17); Scanner viewer: Pannoramic Viewer by 3DHISTECH (Version 1.15.4)). DIA was trained to distinguish malignant breast epithelial cells from non-malignant cells (e.g., stroma, lymphocytes and normal breast cells) based on nuclear size, contour, and other presets. For each tumor, the sum of Ki67-positive tumor cells in all cores was divided by the total number of detectable tumor cell nuclei in all cores to create a continuous Ki67 score. We dichotomized Ki67 score at various cut points—6.7% (median), 10, 14, 20, 25, and 30%—to generate different definitions of Ki67 positivity. Ki67 histological score, which sums the weighted proportion of Ki67-positive tumor cells in three levels of staining intensity (low/medium/high), correlated nearly perfectly with Ki67 score (Spearman ρ = 0.99). In a representative subset of tumors (n = 159), we validated DIA Ki67 continuous score with manual (visual estimate of the percent positive tumor cells) Ki67 continuous score ascertained by an expert pathologist (LCC) and found strong agreement between methods (Spearman ρ = 0.86).
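The pooled score and the dichotomization described above can be illustrated with a minimal sketch. It is not the Definiens pipeline: the function names and the example nuclei counts are hypothetical, and applying "greater than or equal to" at every cut point is an assumption extrapolated from the ≥14% definition of Ki67 high used in the Results.

```python
from typing import Dict, Iterable, Tuple

# Cut points (percent) listed in the text; 6.7 is the cohort median.
CUT_POINTS = (6.7, 10, 14, 20, 25, 30)


def ki67_score(cores: Iterable[Tuple[int, int]]) -> float:
    """Pooled Ki67 score (percent) for one tumor.

    Each core is (ki67_positive_tumor_nuclei, total_tumor_nuclei).
    Positive nuclei are summed over all cores and divided by the summed
    total, so cores with more detected nuclei carry more weight.
    """
    cores = list(cores)
    positive = sum(p for p, _ in cores)
    total = sum(t for _, t in cores)
    if total == 0:
        raise ValueError("no tumor nuclei detected in any core")
    return 100.0 * positive / total


def dichotomize(score: float, cut_points=CUT_POINTS) -> Dict[float, bool]:
    """Map each cut point to True (Ki67 high) or False (Ki67 low).

    Assumption: 'high' means score >= cut point at every cut point.
    """
    return {cp: score >= cp for cp in cut_points}


# Hypothetical tumor with three cores: 90 positive of 1000 nuclei overall.
example_cores = [(30, 400), (45, 350), (15, 250)]
score = ki67_score(example_cores)   # 9.0
calls = dichotomize(score)          # high only at the 6.7% cut point
```

Pooling counts across cores differs from averaging per-core percentages: for the hypothetical cores above, the per-core values are 7.5%, about 12.9%, and 6.0%, whose unweighted mean (about 8.8%) is not the pooled 9.0%.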

Classification of breast cancer molecular phenotype

Five breast cancer molecular subtypes were defined by immunostaining for ER-α, PR, HER2, cytokeratin 5/6 (CK5/6) and epidermal growth factor receptor (EGFR), and histologic grade in the primary definition [23] or Ki67 in the secondary definition, for this study. Luminal A cases were ER+ and/or PR+ and HER2− and grade 1 or 2 (low or intermediate grade). Luminal B cases were ER+ and/or PR+ and HER2− and grade 3 (high grade), or ER+ and/or PR+ and HER2+ with any grade. HER2-enriched cases were ER− and PR− and HER2+. Basal-like cases were ER− and PR− and HER2−, and CK5/6+ and/or EGFR+. Unclassified cases were negative for all five markers. Separately, we defined triple-negative breast cancer (TNBC) as ER−, PR−, and HER2− in subanalyses.
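The grade-based (primary) definition above amounts to a small decision rule, sketched here for illustration. The function and argument names are hypothetical, marker status is assumed to be already dichotomized to booleans, and raising an error when grade is missing is an assumption; the text does not say how incomplete cases were handled.

```python
from typing import Optional


def classify_subtype(er: bool, pr: bool, her2: bool,
                     ck56: bool, egfr: bool,
                     grade: Optional[int] = None) -> str:
    """Grade-based molecular subtype following the rules stated in the text."""
    if er or pr:                      # ER+ and/or PR+
        if her2:
            return "luminal B"        # HER2+ with any grade
        if grade in (1, 2):
            return "luminal A"        # HER2-, low or intermediate grade
        if grade == 3:
            return "luminal B"        # HER2-, high grade
        # Assumption: grade is required to split HER2- luminal tumors.
        raise ValueError("grade 1, 2, or 3 is required for ER+/PR+, HER2- tumors")
    if her2:
        return "HER2-enriched"        # ER-, PR-, HER2+
    if ck56 or egfr:
        return "basal-like"           # ER-, PR-, HER2-, CK5/6+ and/or EGFR+
    return "unclassified"             # negative for all five markers


def is_tnbc(er: bool, pr: bool, her2: bool) -> bool:
    """Triple-negative breast cancer: ER-, PR-, and HER2-."""
    return not (er or pr or her2)


# Example: ER+, PR-, HER2-, grade 2 is luminal A under the grade-based rule.
subtype = classify_subtype(er=True, pr=False, her2=False,
                           ck56=False, egfr=False, grade=2)
```

Note that TNBC is defined independently of CK5/6 and EGFR, so it spans both the basal-like and the unclassified groups.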

Clinical outcomes

Distant recurrence, breast cancer-specific mortality, and overall mortality were the primary outcomes. Women with incident invasive breast cancer who reported subsequent cancer of the lung, liver, bone, or brain were considered to have breast cancer recurrence. Women who died from breast cancer and did not report recurrence were considered to have recurred two years prior to death [25].
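As an illustration of the recurrence rule above, the following sketch assigns a recurrence date per participant. The function name and the use of calendar dates are assumptions made for the example (the source does not state the time unit used), and the sketch does not check the imputed date against the date of diagnosis.

```python
from datetime import date
from typing import Optional


def recurrence_date(reported_recurrence: Optional[date],
                    death: Optional[date],
                    died_of_breast_cancer: bool) -> Optional[date]:
    """Return the date of distant recurrence, or None if no recurrence.

    A reported recurrence is used as-is. A woman who died from breast
    cancer without a reported recurrence is treated as having recurred
    two years before death.
    """
    if reported_recurrence is not None:
        return reported_recurrence
    if died_of_breast_cancer and death is not None:
        try:
            return death.replace(year=death.year - 2)
        except ValueError:
            # Death on 29 February and the target year is not a leap year.
            return death.replace(year=death.year - 2, day=28)
    return None


# Example: breast cancer death on 15 June 2000 with no reported recurrence.
imputed = recurrence_date(None, date(2000, 6, 15), True)  # date(1998, 6, 15)
```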

Exposures

Ki67 percent positivity (continuous score), Ki67 high and Ki67 low (dichotomous), and luminal breast cancer subtypes defined with Ki67 at various cut points were the primary exposure variables. Luminal subtype classification based on Ki67 cut points was compared to classification using tumor grade.

Covariates

Information was collected on age at diagnosis (continuous), and on several risk factors prior to diagnosis including birth index (continuous) [26], oral contraceptive (OC) use (categorical), menopausal status and menopausal hormone (MH) use (categorical), BMI (categorical), and smoking status (categorical). Weight change (categorical) and physical activity (categorical) were assessed >12 months after diagnosis [25, 27]. Clinico-pathological features and treatment factors included tumor stage (categorical), ER/PR status (categorical), chemotherapy (yes/no), radiotherapy (yes/no), and hormone therapy (yes/no).

Statistical analysis

Spearman correlations and Wilcoxon two-sample tests were used to assess the statistical significance of staining agreement among tumor cores. Associations of Ki67 with tumor features, breast cancer risk factors, and molecular subtypes defined using tumor grade were evaluated using chi-square (χ²) tests and Kruskal–Wallis tests to assess significance.

We used multivariable Cox regression models to estimate hazard ratios (HR) and 95% confidence intervals (CI) for the association between luminal subtypes defined with various Ki67 cut points and clinical outcomes, and for the relationship between Ki67 score (and Ki67 at the 14% cut point) and clinical outcomes in all breast cancer and ER+ breast cancer.

All analyses were conducted with SAS version 9.3 (Cary, NC, USA). All statistical tests were two-sided and p-values <0.05 were considered statistically significant.

Results

Table 1 shows the distribution of clinico-pathologic features in 2653 breast tumors according to Ki67 positivity at the 14% cut point. Women with Ki67 high (≥14% positive nuclei) tumors tended to be older at diagnosis (p < 0.0001). Compared with Ki67 low (<14% positive nuclei) tumors, Ki67 high tumors were larger in size, higher grade, higher stage, and more likely to be ER−, HER2+, CK5/6+, and EGFR+ (p < 0.0001). EGFR+ tumors had a mean Ki67 score of 19.0% compared to 9.4% for EGFR− tumors. ER+ and PR+ tumors had a lower Ki67 score (p < 0.0001). ER+ tumors had a mean Ki67 score of 9.9% compared to 17.5% for ER− tumors. Significant associations were observed at all cut points for Ki67 positivity (data not shown), suggesting that the relationship between Ki67 and these breast tumor features is robust.

Table 1 Distribution of clinico-pathological features of 2653 incident invasive breast tumors by Ki67 positivity, Nurses’ Health Study, 1976–2006

Figure 1 shows representative images for IHC staining used for manual scoring (top row) and DIA (bottom row) for Ki67 in breast tumor tissue specimens at Ki67 scores of 1, 5, 10, 14, 20, and 50%.

Fig. 1 Ki67 staining in breast tumor tissue specimens using immunohistochemistry (IHC). Representative Ki67 staining images at various percentages of tumor positivity. Top panel is IHC staining image used for manual scoring, bottom panel is Definiens digital analysis image at 20× magnification. Ki67 staining was scored continuously as the percentage of Ki67-positive tumor cells relative to the total number of detected nuclei.

Molecular subtypes were defined for 2555 cases. Mean Ki67 score varied significantly across breast cancer subtypes (p < 0.0001) (Table 2). Mean Ki67 score was higher in grade-defined luminal B (12.6%), HER2-enriched (17.9%), and basal-like (20.6%) subtypes compared to luminal A (8.9%).

Table 2 Mean Ki67 score in 2555 incident invasive breast tumors by molecular subtype defined with tumor grade, Nurses’ Health Study, 1976–2006

Next, luminal subtype classification based on Ki67 cut points was compared to classification using tumor grade. Reclassification occurred when a case defined as luminal A by grade was classified as luminal B using Ki67, or vice versa (Table 3). The extent of reclassification varied by Ki67 cut point, ranging from 18.8 to 34.7%. At the Ki67 14% cut point, 24.5% of luminal cases (n = 496) were reclassified, with 47.0% of these being reclassified from luminal A to luminal B. Among the reclassified luminal B cases (n = 233), 72% were moderately differentiated (grade 2).
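The reclassification percentages above follow from a simple cross-tabulation of the two labels assigned to each luminal tumor. The sketch below is illustrative only: the function name and the toy label pairs are hypothetical, while the counts in the final lines are the ones reported in the text for the 14% cut point.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def reclassification(pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Summarize (grade_label, ki67_label) pairs, each label 'A' or 'B'."""
    counts = Counter(pairs)
    a_to_b = counts[("A", "B")]   # luminal A by grade, luminal B by Ki67
    b_to_a = counts[("B", "A")]   # luminal B by grade, luminal A by Ki67
    total = sum(counts.values())
    moved = a_to_b + b_to_a
    return {
        "total": total,
        "reclassified": moved,
        "a_to_b": a_to_b,
        "b_to_a": b_to_a,
        "percent_reclassified": 100.0 * moved / total if total else 0.0,
    }


# Toy data (hypothetical): two of five tumors change label.
toy = reclassification([("A", "A"), ("A", "B"), ("B", "A"),
                        ("B", "B"), ("A", "A")])

# Arithmetic check on the counts reported for the 14% cut point.
total_luminal, reclassified, a_to_b = 2025, 496, 233
percent_reclassified = round(100 * reclassified / total_luminal, 1)  # 24.5
percent_a_to_b = round(100 * a_to_b / reclassified, 1)               # 47.0
b_to_a = reclassified - a_to_b  # 263; derived here, not stated in the text
```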

Table 3 Luminal subtype reclassification comparing subtypes defined with tumor grade to subtypes defined with Ki67 at various cut points in 2025 incident invasive luminal breast tumors, Nurses’ Health Study, 1976–2006

After adjusting for clinico-pathologic features, lifestyle prognostic factors, and treatment, there was a modest increased risk of breast cancer-specific death comparing luminal B to luminal A breast cancer that was consistent across Ki67 cut points (Table 4). The association appeared to be strongest for Ki67 cut points ≤20% (6.7% cut point: HR 1.38, 95% CI 1.13–1.70, p = 0.002; 10% cut point: HR 1.32, 95% CI 1.07–1.63, p = 0.009; 14% cut point: HR 1.38, 95% CI 1.11–1.72, p = 0.004; 20% cut point: HR 1.28, 95% CI 1.01–1.62, p = 0.04). We observed suggestive increased risks of distant recurrence comparing luminal B to luminal A breast cancer (6.7% cut point: HR 1.23, 95% CI 1.01–1.50, p = 0.04; 10% cut point: HR 1.19, 95% CI 0.97–1.45, p = 0.09; 14% cut point: HR 1.22, 95% CI 0.99–1.51, p = 0.06; 20% cut point: HR 1.17, 95% CI 0.93–1.46, p = 0.19; 25% cut point: HR 1.17, 95% CI 0.92–1.49, p = 0.19; 30% cut point: HR 1.24, 95% CI 0.96–1.59, p = 0.10). We also observed a slight increased risk of all-cause death comparing luminal B to luminal A breast cancer at lower Ki67 cut points (6.7% cut point: HR 1.18, 95% CI 1.02–1.36, p = 0.03; 10% cut point: HR 1.13, 95% CI 0.97–1.31, p = 0.12; 14% cut point: HR 1.17, 95% CI 1.00–1.37, p = 0.05; 20% cut point: HR 1.08, 95% CI 0.91–1.28, p = 0.36; 25% cut point: HR 1.07, 95% CI 0.89–1.28, p = 0.46; 30% cut point: HR 1.11, 95% CI 0.92–1.33, p = 0.28). Strikingly, there were no statistically significant associations of luminal B (compared to luminal A) defined using tumor grade with risk of distant recurrence (HR 1.18, 95% CI 0.96–1.44, p = 0.11), breast cancer-specific mortality (HR 1.16, 95% CI 0.94–1.43, p = 0.16), or overall mortality (HR 1.05, 95% CI 0.91–1.22, p = 0.49).
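A reported hazard ratio, its 95% confidence interval, and its p-value can be cross-checked against one another, which is a quick way to read the estimates above. The sketch below assumes a Wald interval that is symmetric on the log-hazard scale; it does not reproduce the Cox models fitted in SAS, and because the inputs are rounded to two decimals the recovered p-values are approximate. The function name is hypothetical.

```python
import math


def wald_p_from_hr(hr: float, lower: float, upper: float,
                   z_crit: float = 1.959964) -> float:
    """Approximate two-sided p-value from a hazard ratio and its 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE), so the standard
    error of log(HR) is the log-scale CI width divided by 2 * z_crit.
    """
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = math.log(hr) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Breast cancer-specific mortality, luminal B vs luminal A (values from the text).
p_14 = wald_p_from_hr(1.38, 1.11, 1.72)  # roughly 0.004, matching the reported p
p_20 = wald_p_from_hr(1.28, 1.01, 1.62)  # roughly 0.04, matching the reported p
```

Under this assumption the recovered values agree with the reported p-values at the 14% and 20% cut points to the precision shown in the text.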

Table 4 Age-adjusted and multivariable-adjusted relative hazards of clinical outcomes for luminal B (vs luminal A) breast cancer molecular subtypes defined with Ki67 at various cut points or tumor grade, Nurses’ Health Study, 1976–2006

We also examined the prognostic value of Ki67 at various cut points according to ER status. In multivariable models, there was no difference in risk of distant recurrence comparing ER+/Ki67 low tumors to other ER/Ki67 subtypes (ER+/Ki67 high, ER−/Ki67 low and ER−/Ki67 high; data not shown). We observed a modest increased risk of breast cancer-specific mortality comparing ER+/Ki67 low to ER−/Ki67 high tumors defined at Ki67 6.7, 10, and 14% cut points (14% cut point: HR 1.54, 95% CI 1.16–2.02, p = 0.002). We also observed a modest increased risk of overall mortality comparing ER+/Ki67 low to ER−/Ki67 high tumors defined with the Ki67 6.7 and 10% cut points (10% cut point: HR 1.26, 95% CI 1.04–1.53, p = 0.02). There was no difference in risk of distant recurrence, breast cancer-specific mortality or overall mortality comparing ER−/Ki67 low to ER−/Ki67 high tumors, or comparing TNBC/Ki67 high to TNBC/Ki67 low tumors.

Finally, we explored the relationship of Ki67 score (continuous) with clinical outcomes (Supplementary Table 1). In multivariable models, Ki67 score was not associated with clinical outcomes in all tumors but was associated with breast cancer-specific mortality in ER+ tumors (HR 2.94, 95% CI 1.32–6.54, p = 0.008). Further adjustment for tumor grade slightly attenuated this association (HR 2.75, 95% CI 1.22–6.21, p = 0.02).

Discussion

Breast cancer subtype classification is a valuable clinical tool for prognosis and clinical management of breast cancer patients, and this study establishes the Ki67 14% cut point as a predictor of breast cancer-specific mortality in luminal subtypes, independent of risk factors for breast cancer survival, clinico-pathological features, and treatment.

Median Ki67 in all breast cancer (6.7%) was lower than the cut points of 14 and 20% often cited in the literature to distinguish luminal subtypes. This difference is likely due to two factors. First, 50% of our cases were luminal A tumors, which we demonstrated have the lowest mean Ki67 among the subtypes. The distribution of subtypes in this large population of women was not enriched for luminal subtypes, and is similar to other population-based cohorts [28,29,30] although classification methodologies vary. Second, manual reading tended to overestimate Ki67 staining (62% of cases), which could explain higher mean Ki67 scores and a higher cut-off value in studies that use manual Ki67 scoring. The significantly higher mean Ki67 scores in HER2+ tumors, EGFR+ tumors, and CK5/6+ tumors support the theory that increased proliferative capacity may explain in part their aggressive behavior and poor prognosis [6]. Mean Ki67 was lower in luminal breast cancers than in HER2-enriched and basal-like breast cancers, consistent with previous studies [10].

Smaller studies have found that mean Ki67 varied significantly between HER2+ luminal B and HER2− luminal B breast cancer [31]. There was no apparent difference in mean Ki67 score between these two subsets of luminal B in our well-powered study, suggesting that the extent of proliferative activity in these subsets is similar.

Breast cancer may be classified into molecular subtypes using a panel of immunohistochemical markers with tumor grade in clinical settings, but gene expression profiling is the gold standard. Because we do not have gene expression profiling on our breast tumors, we did not aim to validate cut points for subtyping. Instead, we used molecular subtypes defined with tumor grade to assess whether various cut points of Ki67 classified luminal breast cancers reasonably well, and to identify features associated with reclassified tumors. Our results are consistent with previous findings that Ki67 at 14% is a good marker for luminal subtype classification. Importantly, the vast majority of reclassified tumors were luminal A tumors of intermediate grade. In ER+ breast cancer, low grade (grade 1) and high grade (grade 3) tumors have been found to be strongly associated with a gene expression grade index based mostly on cell cycle regulation and proliferation. In contrast, intermediate grade (grade 2) tumors are highly variable in their gene expression grade index [14]. In this study, Ki67 appears to distinguish different groups of moderately differentiated (grade 2) luminal tumors based on their variation in proliferative activity. Thus, Ki67 staining may provide a relatively simple and clinically applicable method to refine the classification of ER+ tumors with intermediate grade.

This study is among the first to evaluate Ki67 by DIA, and the first large-scale study to examine the relationship between Ki67 and breast cancer clinical outcomes, adjusting for breast cancer prognostic factors. We observed a small but consistent increased risk of distant recurrence in luminal B compared to luminal A tumors at the Ki67 6.7, 10, and 14% cut points, consistent with previous studies demonstrating that Ki67 may be a valuable clinical marker for predicting breast cancer recurrence in luminal breast cancer [10, 32]. Ki67 predicts recurrence in subgroups of TNBC [33, 34], and it could plausibly predict worse breast cancer-specific mortality in ER− breast cancer. However, there was no difference in breast cancer-specific mortality between ER− tumors or TNBC tumors according to Ki67 positivity at any cut point in our study (data not shown). Our data support the clinical utility of Ki67 in predicting recurrence in luminal breast cancer, but suggest that it may not be as informative in ER− breast cancer, which is consistent with other studies [35].

Although our data suggest that there may be a small increased risk of overall mortality comparing luminal B to luminal A, these results were inconsistent across Ki67 cut points. There was no association between Ki67 positivity (≥14%) and overall mortality in ER+ tumors in multivariable models; further investigation with time-varying treatment data may be warranted.

Importantly, there was a significant increased risk of breast cancer-specific mortality comparing luminal B to luminal A defined with the Ki67 14% cut point, but not with tumor grade. These data suggest that the Ki67 14% cut point better distinguishes luminal subtypes that differ in breast cancer prognosis. Although higher Ki67 cut points have recently been suggested [9, 36], we have shown that manual IHC scoring tends to overestimate Ki67 positivity in breast tumor specimens in this study. Identifying distinct luminal breast cancers based on proliferative activity may lead to improved clinical management of breast cancer patients, including enhanced prediction of prognosis. Although the Ki67 14% cut point was not data-derived in our study, we have shown that this cut point may have independent prognostic value for breast cancer-specific mortality. Whether the Ki67 14% cut point is a marker for two distinct luminal subtypes with different underlying prognoses, or a surrogate for luminal tumor aggressiveness and response to chemotherapy, cannot be determined within the scope of this study. A recent study found that Ki67 positivity in normal mammary epithelial cells predicts breast cancer risk among premenopausal women, which argues that Ki67 may play an early role in the etiology of luminal breast cancer [37].

Although we did not have gene expression profiling to benchmark our results, Ki67 in combination with ER, PR, and HER2 has previously been shown to be a cost-effective and robust biomarker panel for classifying luminal tumors. More recently, PR ≥20% has been proposed to distinguish luminal A versus luminal B tumors [38], but we do not currently have manual scoring or DIA at the 20% cut point. Another limitation is that information on neoadjuvant endocrine treatment, type of chemotherapy, and duration of regimen is not known. Therefore, we were not able to explore the potential predictive value of Ki67 for treatment response and there is the possibility of some residual confounding by treatment. This study has several strengths, including the use and validation of DIA to assess Ki67, which is an important step towards standardizing Ki67 assessment for clinical use. In addition, our study includes a large sample size of breast tumors, which provides sufficient statistical power, particularly for luminal subtype analyses. Another strength is that we were able to assess the independent prognostic value of Ki67 in breast cancer with adjustment for breast cancer prognostic factors.

In one of the largest prospective cohort studies to date examining the utility of Ki67 for luminal breast cancer classification and prognosis, we have demonstrated that DIA is a robust method for accurately quantitating Ki67 in breast tumors. Further, our data suggest that the previously established Ki67 14% cut point has prognostic value for luminal tumors independent of clinico-pathological features and breast cancer prognostic factors. Overall, our study provides additional support for the clinical relevance of using Ki67 in a molecular marker panel for luminal subtype classification and breast cancer prognosis.