Introduction

Prognostic factors are the clinicopathological variables associated with final outcomes (usually overall survival [OS]) that are used to estimate the risk of death in early breast cancer (BC) after surgery. Among them, tumor size, nodal status, and histological grade are the variables previously validated in clinical practice [1–5]. These 3 parameters form the basis of the Nottingham Prognostic Index, which was derived from a retrospective, multivariate regression analysis and splits patients into good, moderate, and poor prognostic groups [6]. Proliferation markers, expressed as the percentage of cells in the cell cycle, have been developed and used as discriminants of more aggressive malignant phenotypes, and are usually assessed by immunohistochemical (IHC) staining of the cell cycle antigen Ki-67. However, the optimal approach to the assessment and interpretation of Ki-67 in clinical practice is still a matter of debate among pathologists. In particular, the cut-off adopted for the discrimination of cancers with a good versus a poor prognosis is still widely discussed. Ideally, a prognostic variable should identify a disease with (or without) enough of a risk of death or relapse to require further adjuvant medical treatment to improve survival rates. This information could be particularly useful for patients with low-risk disease who are candidates for chemotherapy (CT).

The first meta-analysis published on the issue was a review of 46 studies by de Azambuja et al., who collected the data of 12,000 patients [7]. Patients were regarded as presenting positive tumors for the expression of Ki-67/MIB-1 according to cut-off points defined by the authors. This meta-analysis concluded that a high Ki-67/MIB-1 labeling index confers a higher risk of relapse and a worse survival rate in patients with early BC. The limitation of this analysis is that a discriminant cut-off point was not established, and the majority of the included studies reported hazard ratios (HRs) calculated as a univariate analysis. As a consequence, a strong and true independent prognostic value of Ki-67 could not be established.

A more comprehensive, independent, prognostic validation of Ki-67 is thus required, particularly when it comes to evaluating clinical outcomes in ER+ BC. We report a systematic review and meta-analysis of the independent significance of Ki-67 expression in terms of clinical outcomes in early BC. We also assess, where possible, the influence of Ki-67 in ER+ BC and according to different cut-off points (10–20 % vs. 20–25 % vs. ≥25–30 %). Finally, a metaregression analysis according to ER+ and nodal status was performed.

Materials and methods

The analysis in this paper was conducted in line with the Preferred Reporting Items for Systematic Reviews and Meta-analyses guidelines [8].

Search methods and criteria for selecting studies for this review

An electronic search of PubMed, EMBASE, SCOPUS, the Web of Science, CINAHL, and the Cochrane Register of Controlled Trials was performed. The search terms included ((Ki[All Fields] AND 67[All Fields]) OR (“Ki-67 antigen”[MeSH Terms] OR (“Ki-67”[All Fields] AND “antigen”[All Fields]) OR “Ki-67 antigen”[All Fields] OR “mib 1”[All Fields]) OR (“Ki-67 antigen”[MeSH Terms] OR (“Ki-67”[All Fields] AND “antigen”[All Fields]) OR “Ki-67 antigen”[All Fields] OR “mib 1”[All Fields]) OR Ki-67[All Fields] OR “proliferative marker”[All Fields]) AND ((“breast neoplasms”[MeSH Terms] OR (“breast”[All Fields] AND “neoplasms”[All Fields]) OR “breast neoplasms”[All Fields] OR (“breast”[All Fields] AND “cancer”[All Fields]) OR “breast cancer”[All Fields]) OR (“breast neoplasms”[MeSH Terms] OR (“breast”[All Fields] AND “neoplasms”[All Fields]) OR “breast neoplasms”[All Fields] OR (“breast”[All Fields] AND “carcinoma”[All Fields]) OR “breast carcinoma”[All Fields])) AND (“mortality”[Subheading] OR “mortality”[All Fields] OR “survival”[All Fields] OR “survival”[MeSH Terms]) AND ((hazard[All Fields] AND (“Ratio (Oxf)”[Journal] OR “ratio”[All Fields])) OR HR[All Fields]) AND (multivariate[All Fields] OR (cox[All Fields] AND (“regression (psychology)”[MeSH Terms] OR (“regression”[All Fields] AND “(psychology)”[All Fields]) OR “regression (psychology)”[All Fields] OR “regression”[All Fields]))). The citation lists of the retrieved articles were screened manually to ensure the sensitivity of the search strategy.

The inclusion criteria for the primary analysis were as follows: 1) studies published as full articles, and in the English language, on (at least 10) adult patients with resected non-metastatic BC that reported either the prognostic impact of Ki-67 evaluated with IHC or the mRNA content in the RNA extracted from frozen or formalin-fixed paraffin-embedded (FFPE) tissue and 2) the availability of HRs and 95 % confidence intervals (CI) for OS- or BC-specific survival (BCSS). For a secondary analysis, studies providing HRs for disease-free survival (DFS) or relapse-free survival (RFS) were also included. Duplicate publications were excluded. Two reviewers (FP, MC) independently evaluated all the titles identified by the search strategy. The results were then pooled, and all potentially relevant publications were retrieved in full. The same two reviewers then evaluated the complete articles for eligibility. To avoid the inclusion of duplicated or overlapping data, we compared author names and the institutions where the patients were recruited. Then, if substantial doubts remained, the more recent study was included in the analysis.

Data extraction

The following details were extracted: name of the first author, type of study, year of publication, number of patients included in the analysis, rate of ER+ BCs, cut-off defining high Ki-67 expression, technique and antibody used for the Ki-67 staining, median follow-up, HRs for OS, DFS or BCSS as applicable, and the covariates used for the multivariate analysis of OS. The HRs were only extracted from multivariable analyses.

Data collection and statistical analysis

The meta-analysis was initially conducted for all the included studies for each of the endpoints of interest. OS was the primary outcome of interest and DFS the secondary outcome considered. “High” Ki-67 was defined according to the cut-off chosen by each author. For the primary outcome, subgroup analyses were conducted only for ER+ BC and for the different cut-offs adopted in the papers (10–20 % vs. 20–25 % vs. ≥25 %, ≥20 % vs. <20 %, and ≥25 % vs. <25 %), and only if at least 3 papers were available for each subgroup. The extracted data were aggregated into a meta-analysis using the RevMan 5.3 software (Cochrane Collaboration, Copenhagen, Denmark). Estimates of HRs were weighted and pooled using the generic inverse variance method with either a random effects or a fixed effects model, according to the heterogeneity [9]. Trends across subgroups were tested using metaregression with the rate of Ki-67 expression and the ER+ and pN0 rates as the modifiers of interest, each treated as a continuous variable. The regression equation estimates the percentage increase in the risk of death predicted for any given increase in the Ki-67, ER+, and pN0 rates of each study.
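The pooling step described above can be sketched as follows. This is a minimal illustration, not the RevMan implementation: it assumes each study contributes an HR with a 95 % CI, recovers the standard error from the CI width on the log scale, and uses the DerSimonian-Laird estimator for the random effects model. The function name and the example inputs are hypothetical and are not taken from Table 2.

```python
import math

def pool_hazard_ratios(studies, z=1.96):
    """Pool (hr, ci_low, ci_high) tuples with generic inverse variance.

    Returns fixed-effect and DerSimonian-Laird random-effects estimates,
    plus Cochran's Q and I^2 (in percent).
    """
    # Work on the log scale; recover each SE from the width of the 95 % CI.
    y = [math.log(hr) for hr, _, _ in studies]
    se = [(math.log(hi) - math.log(lo)) / (2 * z) for _, lo, hi in studies]

    w = [1 / s ** 2 for s in se]  # fixed-effect (inverse variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)

    # Cochran's Q and I^2 quantify between-study heterogeneity.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(studies) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # DerSimonian-Laird estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    w_re = [1 / (s ** 2 + tau2) for s in se]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))

    return {
        "hr_fixed": math.exp(fixed),
        "hr_random": math.exp(pooled),
        "ci_random": (math.exp(pooled - z * se_re), math.exp(pooled + z * se_re)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }

# Hypothetical inputs for illustration only (not taken from Table 2).
example = [(1.20, 0.90, 1.60), (2.10, 1.50, 2.94), (1.50, 1.10, 2.05)]
result = pool_hazard_ratios(example)
```

When the estimated between-study variance is zero, the random effects weights reduce to the fixed effects weights, so the two pooled estimates coincide.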

Publication bias was evaluated using Begg’s test (rank correlation), Egger’s test (linear regression), and a funnel plot. Rosenthal’s fail-safe N test was used to compute the number of missing studies (with a mean effect of zero) that would need to be added to the analysis to yield an overall nonsignificant effect (P > 0.05). A higher N meant more robust results. Heterogeneity was assessed with the Cochran Q and I² statistics. All the statistical tests were 2-sided, and statistical significance was defined as P being less than 0.05.
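As an illustration of the fail-safe N idea, the sketch below applies Rosenthal's formula, N = (ΣZ)² / z_crit² − k, where each study's Z score is its log HR divided by its standard error. It assumes a critical value of 1.96 to match the 2-tailed threshold quoted in this paper (Rosenthal's original formulation used the 1-tailed value of 1.645); the function names and example inputs are hypothetical.

```python
import math

def study_z_scores(studies, z=1.96):
    """Return one Z score per (hr, ci_low, ci_high) tuple."""
    scores = []
    for hr, lo, hi in studies:
        se = (math.log(hi) - math.log(lo)) / (2 * z)  # SE of the log HR
        scores.append(math.log(hr) / se)
    return scores

def fail_safe_n(studies, z_crit=1.96):
    """Number of zero-effect studies needed to make the combined Z
    (Stouffer's method) drop to z_crit."""
    scores = study_z_scores(studies)
    k = len(scores)
    n = sum(scores) ** 2 / z_crit ** 2 - k
    return max(0.0, n)

# Hypothetical inputs for illustration only (not taken from Table 2).
example = [(1.20, 0.90, 1.60), (2.10, 1.50, 2.94), (1.50, 1.10, 2.05)]
n_missing = fail_safe_n(example)
```

The formula follows from setting the combined score ΣZ / √(k + N) equal to the critical value and solving for N.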

Results

Figure 1 shows the flow diagram of the studies included in our meta-analysis. Forty-one studies published between 1996 and 2015, covering 64,196 patients, were included [10–50]. Table 1 presents the studies’ main data. The population of patients in each study varied from 92 to 20,023 cases, and the follow-up time ranged from 28 to 188 months. The MIB-1 antibody was applied to detect Ki-67 expression with IHC methods in n = 23 studies. The IHC methods were adopted in all trials except 1 that used a TMA-based analysis. A scoring system was not reported in most trials; in the n = 11 that did report one, Ki-67 was evaluated after counting at least 250 (n = 1), 250–500 (n = 1), 500 (n = 4), 1000 (n = 3), and 2000 (n = 2) nuclei. The cut-offs chosen were >5 % (n = 3); >10–11 % (n = 9); >14–15 % (n = 5); >20 % (n = 15); >25 % (n = 1); and >30 % (n = 3); different cut-offs were tested in n = 3 trials, and in n = 2 studies the cut-off was not reported. More than 50 % of papers adopted Ki-67 cut-offs ≥14 %, the value first proposed to distinguish luminal B from luminal A BC in the paper by Cheang et al. in 2009 [53]. In the papers published before 2009, lower Ki-67 thresholds were, in fact, used (Table 2).

Fig. 1 Flow diagram of studies included in the meta-analysis

Table 1 Characteristics of included studies
Table 2 Hazard ratios for high versus low Ki-67 levels and covariates used for multivariate analysis

Twenty-five and 27 publications had available data for the OS and DFS analyses, respectively.

Meta-analysis of overall survival

Overall, n = 25 studies were available for the OS analysis (in n = 6 studies, the BCSS rate was available instead of OS). The pooled HR for high versus low Ki-67 was 1.57 (95 % CI 1.33–1.87, P < 0.00001; Fig. 2). The heterogeneity was high (P < 0.00001, I² = 76 %), and so a random effects model was used.

Fig. 2 Meta-analysis of OS for high versus low Ki-67 staining

Meta-analysis of disease-free survival

Overall, n = 29 studies were available for the DFS analysis (in n = 1 and n = 8 studies, event-free survival (EFS) and RFS, respectively, were reported instead of DFS). The pooled HR for high versus low Ki-67 was 1.50 (95 % CI 1.34–1.69, P < 0.00001; Fig. 3). The heterogeneity was high (P < 0.00001, I² = 82 %), and so a random effects model was used.

Fig. 3 Meta-analysis of DFS for high versus low Ki-67 staining

Sensitivity analysis

Twenty-three studies were considered in the subgroup analysis (n = 2 were excluded because they did not define the cut-off level for high Ki-67 expression). In studies where the cut-off value for Ki-67 was ≥10 and <20 % (n = 9 studies), the pooled HR for OS was 1.28 (95 % CI 1.00–1.64, P = 0.05; Fig. 4). The heterogeneity was high (P = 0.0003, I² = 72 %), and so a random effects model was used. In n = 10 studies, the cut-off for high Ki-67 was ≥20 but <25 %, and the pooled HR for OS was 1.44 (95 % CI 1.13–1.83, P = 0.004; Fig. 4). The heterogeneity was high (P = 0.01, I² = 58 %), and so a random effects model was used. Finally, where the cut-off used to split high versus low Ki-67 was ≥25 % (n = 5 studies), the pooled HR for OS was 2.05 (95 % CI 1.66–2.53, P < 0.00001; Fig. 4) with low heterogeneity (P = 0.83). The difference between the last subgroup and the others was statistically significant (P = 0.01; Fig. 4). The difference between the 2 subgroups that used the lower cut-offs (10–20 and 20–25 %) was not significant (P = 0.56).
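A test for subgroup differences of this kind can be approximated from the pooled subgroup results alone. The sketch below is an assumption about the general approach, not a reproduction of the RevMan calculation: it treats the three pooled HRs and their 95 % CIs reported above as independent estimates, computes a between-subgroup Q statistic, and refers it to a chi-square distribution. The function name is hypothetical, and the closed-form P value is implemented only for 2 or 3 subgroups.

```python
import math

def subgroup_difference_test(subgroups, z=1.96):
    """Chi-square test for differences between pooled subgroup estimates.

    subgroups: list of (hr, ci_low, ci_high) tuples, one per subgroup.
    Returns (q_between, degrees_of_freedom, p_value).
    """
    y = [math.log(hr) for hr, _, _ in subgroups]
    # Inverse variance weights, with the SE recovered from the CI width.
    w = [(2 * z / (math.log(hi) - math.log(lo))) ** 2 for _, lo, hi in subgroups]
    mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q_between = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, y))
    df = len(subgroups) - 1
    if df == 1:
        p = math.erfc(math.sqrt(q_between / 2))  # chi-square upper tail, 1 df
    elif df == 2:
        p = math.exp(-q_between / 2)             # chi-square upper tail, 2 df
    else:
        raise ValueError("closed-form P value implemented for 2 or 3 subgroups only")
    return q_between, df, p

# Pooled subgroup results as reported in the text above.
cutoff_subgroups = [
    (1.28, 1.00, 1.64),  # cut-off >=10 and <20 %
    (1.44, 1.13, 1.83),  # cut-off >=20 and <25 %
    (2.05, 1.66, 2.53),  # cut-off >=25 %
]
q, df, p = subgroup_difference_test(cutoff_subgroups)
```

With these inputs the P value should land close to the reported P = 0.01, although small discrepancies are expected because the published CIs are rounded.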

Fig. 4 OS meta-analysis of studies according to different Ki-67 cut-offs (10–20, 20–25, and ≥25 %)

When studies were grouped by cut-off (<20 % vs. ≥20 %), the HRs were 1.31 and 1.64 (P = 0.005 and <0.00001, respectively; data not shown), although the difference between these 2 subgroups was not significant (P = 0.12). When the studies were split according to cut-offs <25 % vs. ≥25 %, the HRs were 1.38 and 2.05 (P = 0.0004 and <0.00001, respectively), and the difference between these subgroups was statistically significant (P = 0.005; Fig. 5).

Fig. 5 OS meta-analysis of studies according to cut-off ≥ and <25 %

In 6 studies where only ER+ tumors were analyzed, the HR for high versus low Ki-67 was significant (HR = 1.51, 95 % CI 1.25–1.81, P < 0.0001) and the heterogeneity was low (P = 0.28, I² = 20 %; Fig. 6).

Fig. 6 OS meta-analysis of studies with ER+ patients only

Metaregression analysis confirmed that the increased risk of death associated with a high Ki-67 level does not depend on the rate of ER+ status or the rate of pN0 BCs in each study (P = 0.38 and P = 0.31, respectively). Conversely, the regression equation confirmed that for each 10 % increase in the Ki-67 level there is a significant 19 % increase in the risk of death (P = 0.05).
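To make the 19 % figure concrete, the relation can be written out under two assumptions that the text does not state explicitly: that the metaregression was fitted as a linear model on the log HR scale, and that the "10 % increase" refers to 10 percentage points of Ki-67. Here x denotes the Ki-67 level in percentage points, and α and β are the (hypothetical) intercept and slope of that model.

```latex
\ln \mathrm{HR}(x) = \alpha + \beta x,
\qquad
e^{10\beta} = 1.19
\;\Rightarrow\;
\beta = \frac{\ln 1.19}{10} \approx 0.0174 \text{ per percentage point},
\qquad
\frac{\mathrm{HR}(x + \Delta)}{\mathrm{HR}(x)} = e^{\beta \Delta} = 1.19^{\Delta / 10}.
```

Under the same assumptions, a difference of 20 percentage points would correspond to a relative increase of 1.19² ≈ 1.42, that is, roughly a 42 % higher risk of death.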

We also considered any potential publication biases in the studies analyzed for the primary endpoint. Note that only 3 studies lie to the left of the funnel and 1 to the right (Fig. 7). Moderate asymmetry was observed upon visual inspection of funnel plots; however, quantitative assessment by Begg’s test (P = 0.39) and Egger’s test (P = 0.00003) suggested that there was only modest publication bias. Rosenthal’s fail-safe N was 297, meaning that 297 ‘null’ studies would need to be located and included in order for the combined 2-tailed P value to exceed 0.05. Therefore, the result was relatively robust.

Fig. 7 Funnel plot for publication bias of the OS meta-analysis: Begg’s test was not significant (P = 0.39), indicating that no significant bias was observed (circles are the studies, almost all of which lie around the vertical line representing the pooled log hazard ratio of the meta-analysis)

Discussion

Prognostic factors play an important role in the decision-making process concerning adjuvant treatment in medical oncology. Genomic tools are now available to estimate the prognosis in early-stage disease with low and intermediate risks of relapse, and to quantify the added benefit of adjuvant CT when associated with endocrine therapy. The prognostic role of Ki-67 staining in pathology reports is still a matter of debate, and is not conventionally accepted. This meta-analysis shows that a high Ki-67 level (with a cut-off of at least 10 %), evaluated using IHC methods, is associated with a more than 50 % increase in the risk of death among patients with early BC, particularly in those with ER+ disease, where the risk of death increases by a similar magnitude. Furthermore, a higher Ki-67 labeling index is associated with a greater risk of recurrence (64 % increased risk). The association between the level of Ki-67 and prognosis examined in this paper was only evaluated in studies that calculated HRs using a multivariate Cox regression analysis, where Ki-67 was adjusted with respect to common prognostic variables (e.g., stage, grade, and ER status). The prognostic value of Ki-67 is also confirmed in ER+ BC studies, which is information that could potentially aid decision making about postoperative treatment. Metaregression also indicated that the prognostic effect of Ki-67 is not influenced by ER expression or nodal status. Indeed, Ki-67 becomes valuable when clinicians estimate a prognosis and have to decide the value of adjuvant CT in early-stage disease, particularly in luminal B BC.

In 2008, a similar meta-analysis was published by Stuart-Harris and colleagues [51], who analyzed the independent prognostic value of Ki-67 for OS and DFS in 13 and 14 studies, respectively. The HRs in that research were 1.73 and 1.84, and both were significant. However, that review only covered trials published up to 2004. In this paper, we include more, and more recent, trials, adding further information to current knowledge. Indeed, our study includes an assessment of the cut-off point that is potentially able to separate high- from low-risk patients. A proliferative marker like Ki-67 is useful, for example, in distinguishing luminal A-like from luminal B-like tumors, but the appropriate cut-off point is still a matter of debate among oncologists. At the 2015 St. Gallen Breast Cancer Conference, a median cut-off value within the range of 20–29 % above or below which the disease can be defined as “luminal B-like” was proposed and accepted by the majority of panelists. However, a fifth of them did not consider Ki-67 to be a useful marker with which to distinguish luminal A-like from luminal B-like tumors [52].

The question of the best cut-off point to adopt for Ki-67 in clinical practice is a matter of broad discussion, and a consensus is far from being reached on the original proposal of using a threshold ≥ 14 % to distinguish luminal B from luminal A tumors [53]. ESMO guidelines, for example, refer to a cut-off of 20 % for both Ki-67 and the progesterone receptor PgR to define luminal B-like, HER-2-negative BC that is suitable for adjuvant CT [54]. However, the guidelines state that laboratory-specific cut-off points can be used to distinguish between low and high values for Ki-67 and PgR. Quality assurance programs are also essential for laboratories reporting these results. Conversely, NCCN guidelines do not currently recommend the assessment of Ki-67 [55]. The meta-analysis in this paper shows that Ki-67 is best regarded as a continuum, because all cut-off points above 10 % are associated with a poorer prognosis. We cannot define a precise cut-off point above or below which the prognosis is very different. However, if we split studies with a cut-off >25 % and compare them with those that used a cut-off below this figure, the prognostic information is greatest in the former (HR = 2.05 vs. 1.38) and the difference between them is significant. Alternatively, if we split studies with a cut-off ≥ vs. <20 %, both groups are associated with a significantly increased risk of death (HR 1.64 vs. 1.31), although the subgroup difference is not significant. Attempts to standardize Ki-67 assessments were made by Dowsett et al. [56] and Polley et al., who evaluated the factors contributing to inter-laboratory discordance that can make it difficult to obtain a useful cut-off point for clinical decision making [57]. The same authors made similar attempts in 2015, and stated that before Ki-67 could be recommended for clinical use, further research was needed to standardize how it is assessed and correlate it with outcomes [58].

In addition to its prognostic importance, several authors have also demonstrated the value of Ki-67 for predicting the benefit of adjuvant therapy in high-risk luminal B-like and node-negative patients. Penault-Llorca showed that patients whose tumors had Ki-67 levels >20 % benefited from the addition of docetaxel (HR 0.51 vs. 1.03 for those with Ki-67 < 20 %) [59]. Similarly, Criscitiello and colleagues found a significant benefit from the addition of CT to endocrine therapy in luminal B-like BC with a Ki-67 > 32 % [60]. Viale et al. [11], through a centralized analysis of Ki-67 by IHC in tumor blocks of patients enrolled in the BIG 1-98 trial, confirmed the prognostic role of Ki-67 in a population treated with endocrine therapy alone. They also observed a greater benefit of letrozole compared to tamoxifen in BC patients whose tumors had Ki-67 levels >11 % (HR = 0.53 vs. 0.81 in those with Ki-67 ≤ 11 %). The significance of Ki-67 in highly proliferative tumors like triple-negative BC is, however, unknown, because almost all these BCs have a very proliferative phenotype.

Our meta-analysis has some intrinsic limitations. First, the population data were extracted from published papers and individual patient data were unavailable. Second, the cut-offs used in the series were chosen conventionally by the authors, and there is no definitive consensus among pathologists to date on the optimal Ki-67 threshold for accurately defining high- versus low-risk patients, particularly in luminal B-like disease. Third, each cut-off interval defined in this meta-analysis (10–20, 20–25 and >25 %) was chosen somewhat arbitrarily, and the greater effect magnitude in studies with higher cut-offs does not mean that this represents the ideal point at which to split off patients with the poorest prognosis. Also, scoring methods (nuclei counts) were not specified in most trials, and this methodological concern could have contributed to the observed differences and heterogeneity among cut-offs. Finally, the Ki-67 cut-off, as calculated from this meta-analysis, does not allow a greater or lesser benefit from adjuvant therapy to be anticipated. It is merely a prognostic variable, which can inform the discussion with the patient about the risk of relapse and death, but not about the relative advantage of any systemic therapy. A Ki-67 labeling index should probably be used as a continuous score, with the highest values associated with the greatest risk of death, as supported by the metaregression analysis. Nevertheless, this meta-analysis confirms the independent prognostic significance of Ki-67 in early BC, in both pN0 and pN+ stages, and particularly in the ER+ subgroup. An arbitrary cut-off of at least 25 % seems to be associated with better discriminatory power compared with lower cut-off points (10–20 and 20–25 %).

The present data are derived from studies published mainly in the last decade that encompass more than 60,000 BC patients, and are more robust and up to date compared to the work examined in similar previous meta-analyses.

Using IHC to assess the Ki-67 labeling index is less expensive and more widely available than the genomic tests that are currently used to identify patients with high-risk lymph node-negative BC who may benefit from adjuvant CT. Some publications have already evaluated and identified the concordance between the Ki-67 labeling index and the risk of recurrence estimated by Oncotype Dx® [61–64]. Recently, a Magee equation using histological variables including Ki-67 was able to predict correctly the final Oncotype Dx® score in a series of 283 cases of BC [65]. Similarly, the IHC4 score, which considers ER, PgR, HER2, and Ki-67, was able to independently predict prognosis as well as Oncotype Dx® [66, 67]. Accordingly, the Ki-67 labeling index may be a useful marker where molecular assays are not technically feasible or available.

In conclusion, this meta-analysis of a series of more than 60,000 patients confirms that the proliferative marker Ki-67 has an independent prognostic value in terms of survival and relapse in patients with early-stage BC, and should be routinely assessed by pathologists.

A Ki-67 threshold of at least 25 % of immunostained cells is associated with the most powerful outcome prognostication; however, this does not remove the need for prospective validation.