Introduction

Identification of key prognostic and predictive factors in breast cancer treatment has been vital in designing appropriate therapeutic strategies and developing better patient management algorithms. Traditionally, hormone receptor (HR) expression and human epidermal growth factor receptor 2 (HER2/neu) amplification are routinely assessed for prognostication and prediction of treatment response [1]. Selection of patients who are more likely to benefit from the use of adjuvant chemotherapy requires better markers of treatment response and efficacy. Although proliferation has been recognized to be of potential importance, its clinical use has been limited by the lack of standard guidelines or firm recommendations for appropriate testing method, quantitation, reporting, and validation of clinical relevance [2••, 3–6].

To date, immunohistochemical staining for Ki67 (MIB-1) has emerged as an easy and rapid tool to assess the proliferation index of a tumor. Ki67 has been successfully used as a risk stratification tool not only to guide therapeutic decisions but also to define therapeutic endpoints for neoadjuvant treatment [6, 7••, 8]. With the advent of molecular profiling, proliferation-related genes have become an integral part of genomic tests such as Oncotype Dx [9]. Ki67 is also a useful supplement to the HR and HER2/neu markers in immunohistochemically assigning tumors to approximate molecular classification subtypes [1, 8]. The International Ki67 in Breast Cancer Working Group of the Breast International Group and North American Breast Cancer Group has recognized deficiencies in the analysis of Ki67 and proposed a set of guidelines for immunohistochemical evaluation of Ki67, aimed at reducing pre-analytical and analytical variation and harmonizing methodology to enhance the clinical utility of Ki67 [2••]. A follow-up reproducibility study highlighted major variations that existed even among experienced laboratories and breast pathology experts [7••]. This raises the question: are we really ready for prime time use of Ki67?

Ki67 as a Clinically Useful Biomarker

Malignancy is characterized by autonomous growth secondary to uncontrolled proliferation. Higher proliferation rates have been equated with aggressive behavior in tumors [2••]. Proliferation can be assessed by visually counting mitotic figures, by immunohistochemical staining of tissue sections for various proliferation markers, by measuring incorporation of nucleotides into DNA, or by flow cytometric analysis of the fraction of cells in S-phase [2••, 10–12]. Immunohistochemical staining for Ki67 using the MIB-1 antibody is emerging as a rapid and effective assay to evaluate proliferation. A number of studies in the recent literature highlight the role of Ki67 as a prognostic and predictive marker [3, 13].

Ki67 as a Prognostic Marker

As early as the 1980s, immunohistochemical staining for Ki67 demonstrated an association of Ki67 expression levels with poor differentiation and early recurrence [14]. A recent review by Yerushalmi and colleagues [3] cited studies suggesting that there is relatively robust evidence of Ki67 being a prognostic marker for breast cancer. In their meta-analysis, which included 43 studies and 15,790 patients, Stuart-Harris et al. [15] showed shorter overall survival for patients with high Ki67 (HR = 1.73). De Azambuja et al. [13] showed worse disease-free survival (DFS) [HR = 1.93, P <0.001] and overall survival (OS) [HR = 1.95, P <0.001] in patients with positive Ki67 expression. The significantly worse prognosis was noted in both node-positive and node-negative populations. Other individual studies also suggest that Ki67 is useful in assessing the prognosis of estrogen receptor (ER)-positive early breast cancers, indicating its usefulness when used in combination with other biomarkers [6, 16–19].

In a recent study, Feeley et al. [6] confirmed that Ki67 can be used to segregate ER-positive, node-negative breast cancers into prognostically meaningful subgroups to help guide therapy. A cohort of 359 ER-positive, HER2-negative, node-negative tumors was categorized as Luminal A or Luminal B based on low Ki67 (<14 %) or high Ki67 (≥14 %), respectively. Survival analysis revealed that Luminal B tumors had significantly worse disease-free survival than Luminal A tumors (log rank P = 0.0164). Similar results were obtained on univariate Cox regression analysis (RR = 2.0, 95 % CI, 1.12–3.58, P = 0.0187).

Cuzick et al. developed an immunohistochemical score (IHC4) incorporating immunohistochemical stains for Ki67 together with estrogen receptor (ER), progesterone receptor (PgR), and human epidermal growth factor receptor 2 (HER2) [19]. Their comparative study suggested that the score has prognostic value at least equivalent to that of the mRNA-based, 21-gene Genomic Health recurrence score in a cohort of 1125 ER-positive tumors. In a study of 1017 patients, the IHC4 score was also compared with PAM50 and Oncotype Dx, and provided relatively similar prognostic information. The value of such immunohistochemistry-based scores lies in the fact that, if appropriately performed, they may provide an easy and cheap alternative to the expensive gene-based assays used for prognostication in breast cancer [19, 20].

Ki67 as a Predictive Marker

Studies of Ki67 as a predictive marker are limited. The Breast International Group (BIG) trial suggested that high Ki67 may predict a benefit for an adjuvant taxane-based regimen compared with a nontaxane regimen and for letrozole compared with adjuvant tamoxifen [3]. It has also been hypothesized that higher proliferation rates are consistently related to chemotherapy response in tumors [3]. Thus, Ki67 levels may be helpful in selecting cases that are more likely to respond to treatment. As a corollary, some studies show that ER-negative breast cancers, which tend to have high Ki67 indexes, appear to respond better to chemotherapy than ER-positive tumors. Using a cut-off value of 20 %, addition of docetaxel to fluorouracil and epirubicin chemotherapy was reported to be beneficial in ER-positive tumors with high Ki67 levels [21]. In a recent Japanese study [22], pretherapeutic Ki67 levels were evaluated on core biopsies and compared with postoperative levels after neoadjuvant chemotherapy in 121 patients. In the multivariate analysis, Ki67 was a significant independent predictor of pathologic complete response (pCR) in ER-positive, hormone-sensitive tumors. Furthermore, in the subgroup analysis, patients with high Ki67 levels showed significantly improved pCR rates in Luminal-type breast cancers.

Ki67 in Clinical Trials

Ki67 has also been advocated for use as an endpoint marker at the end of neoadjuvant treatment in clinical trials. The reduction in Ki67 levels from pre-therapy to post-therapy has been used to determine response to treatment. In the Immediate Preoperative Anastrozole, Tamoxifen, or Combined with Tamoxifen (IMPACT) study, which compared neoadjuvant anastrozole, tamoxifen, and the combination of anastrozole and tamoxifen, and in the P024 study, differences in the degree of suppression of Ki67 levels between the study arms correlated with differences in recurrence [2••, 18]. After 4 months of neoadjuvant endocrine therapy with either letrozole or tamoxifen, the authors of the P024 study observed that Ki67 was independently associated with recurrence-free and overall survival, along with pathologic tumor stage, node status, and ER status, in a multivariate analysis.
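The idea of comparing pre-therapy and post-therapy Ki67 can be made concrete with a small calculation. The sketch below expresses suppression as a simple proportional fall from baseline; this definition and the function name ki67_percent_reduction are assumptions for illustration and are not taken from the IMPACT or P024 analyses.

```python
def ki67_percent_reduction(baseline: float, on_treatment: float) -> float:
    """Relative fall in Ki67 index from baseline, in percent.

    Both arguments are Ki67 indexes in percent (e.g. 20.0 for 20 %).
    The proportional-change definition is an illustrative assumption.
    """
    if baseline <= 0:
        raise ValueError("baseline Ki67 must be positive")
    return 100.0 * (baseline - on_treatment) / baseline


# Hypothetical patient: Ki67 falls from 20 % before therapy to 5 % after.
print(ki67_percent_reduction(20.0, 5.0))
```

A negative return value would indicate that Ki67 rose on treatment rather than fell.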

Peri-Operative Endocrine Therapy for Individualization Care (POETIC), a randomized trial, uses Ki67 levels as a marker of benefit from presurgical nonsteroidal aromatase inhibitors. The study, which recruited about 4350 postmenopausal women with early-stage, hormone-sensitive breast cancer, will evaluate the advantage of measuring 2-week Ki67 rather than pretreatment Ki67 [2••, 23].

Limitations of Ki67 as a Biomarker

The biggest concern about the use of Ki67 as a clinical marker for prognostication and prediction of response is the lack of standardization of staining techniques, coupled with extremely poor reproducibility. Although investigators from many of the co-operative breast cancer groups from North America and Europe, designated the “International Ki67 in Breast Cancer Working Group”, agreed that immunohistochemical measurement of Ki67 is the current assay of choice for measuring and monitoring proliferation, they recognized that there was poor agreement on the precise clinical uses of Ki67, as well as substantial heterogeneity and variable levels of validity in methods of assessment [2••]. Experts at the St. Gallen consensus also agreed that standard cut-offs for risk stratification of Ki67 were not reliably established and that laboratory-specific values should be used [8]. Further, Stuart-Harris et al. [15] did not consider it worthwhile to include Ki67 as part of the work-up, as it did not provide any advantage over the Nottingham Prognostic Index (NPI) and Adjuvant! Online. The meta-analysis by de Azambuja et al. [13] could not prove Ki67 to be an independent prognostic factor, given the limitations of the study design. Let us review some limitations of Ki67 as a biomarker.

Pre-Analytical Variation

Unlike assays for ER, PgR, and HER2 receptor status, no definite guidelines are in place for Ki67 assessment. The evaluation of Ki67 is laboratory specific. Several pre-analytical issues, such as type of specimen, time to fixative, time of fixation, temperature of fixation, and specimen archival techniques, are bound to affect Ki67 measurements. Emerging data suggest that Ki67 staining is relatively tolerant of typical pre-analytical variability [2••]. Pinhel et al. [24] reported that Ki67 staining in immediately fixed fresh core-cut biopsies did not differ significantly from that in the subsequently fixed main surgical resection specimen. In a study by Bai and colleagues [25], no significant difference in Ki67 staining was observed between the core-cut and surgical samples. However, they cited the power of their study as a potential limitation. Differences in the appearance of stained nuclei were reported in these studies: the more rapidly fixed cores consistently showed well-circumscribed, uniform staining, as against variable staining in whole tissue sections [7••]. Overnight delay before fixation, freezing the specimen for frozen section analysis before fixation, use of ethanol or Bouin fixative rather than neutral buffered formalin, and use of EDTA or acid decalcification protocols were pre-analytical factors found to decrease Ki67 labelling [7••].

Analytical Variations

Assessment of the Ki67 index is highly variable and is observer- as well as laboratory-dependent. The International Ki67 in Breast Cancer Working Group, realizing the need for standardization of Ki67 assessment, recommended guidelines to overcome the pre-analytical variability and improve reproducibility in Ki67 testing [2••]. Subsequently, a reproducibility study was undertaken involving 8 eminent laboratories. Each participating laboratory received 100 breast core samples, 1 set stained by a central laboratory and 1 set to be stained locally by the participating laboratory. Two sets of experiments were conducted, 1 examining interlaboratory variability and another examining intralaboratory variability. The study observed significant interlaboratory variability and, worryingly, striking heterogeneity in the interpretation of even centrally stained samples [7••]. The possible reasons cited were discordance in selecting regions for quantification, the counting method, and subjectivity in assessing staining intensity [7••]. Though intralaboratory variability existed, it was much lower than interlaboratory variability.

Variations in Assessment of Ki67

Ki67 quantification is further complicated by the method used, which can range from a rapid visual estimate, to either manual or computerized analysis of digitized images [4, 7••, 26].

Gudlaugsson and colleagues [4] compared the effect of different techniques for measuring Ki67 proliferation on its reproducibility as a prognostic factor. In this study, the Ki67 index was assessed by 3 techniques, which were then compared: quick scan rapid estimate, ocular-square-guided counts, and computerized image analysis.

In the quick scan method, the pathologist gave a global estimate of the percentage of Ki67-positive cells by scanning the slide rapidly. This can be considered a crude technique of Ki67 estimation. The ocular-square-guided technique is more systematic: the slide is screened to select areas with high Ki67, and the positively staining nuclei are then counted using a grid. The Ki67 index is expressed as the number of tumor cells with positively staining nuclei as a percentage of the total number of tumor cells counted. The group showed that these practices produced only modest inter- and intraobserver agreement.
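The definition of the Ki67 index given above amounts to a single ratio. A minimal sketch follows; the function name ki67_index and the example counts are hypothetical and serve only to illustrate the arithmetic.

```python
def ki67_index(positive_nuclei: int, total_tumor_cells: int) -> float:
    """Ki67 index in percent: positive tumor nuclei / all tumor cells counted."""
    if total_tumor_cells <= 0:
        raise ValueError("at least one tumor cell must be counted")
    if not 0 <= positive_nuclei <= total_tumor_cells:
        raise ValueError("positive nuclei must lie between 0 and the total count")
    return 100.0 * positive_nuclei / total_tumor_cells


# Hypothetical count: 140 positively staining nuclei among 1000 tumor cells.
print(ki67_index(140, 1000))
```

Only invasive tumor cells belong in the denominator, so the result depends on which areas and cells the observer chooses to count.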

In their comparative study, Gudlaugsson and colleagues [4] suggested that automated techniques were superior to manual methods. Another group [26] showed that a misclassification rate of 5 %–7 % was achieved with the use of digitized image analysis compared with 11 %–18 % by visual assessment, further bolstering the case for automation in Ki67 estimation. While computerized methods can provide more accurate quantitation of strongly staining areas, weak staining can be read as negative by computer-aided programs. Thus, optimization of parameters and appropriate “training” of the programs can reduce the false negative rate. In our experience, the selection of appropriate areas of tumor by a trained pathologist is essential, as the computer can have difficulty in distinguishing benign tissue and in situ carcinoma from invasive carcinoma. Once tumor areas are selected and the parameters properly optimized, digital imaging analysis can be a useful adjunct in obtaining reproducible quantitation of Ki67 expression levels [27].

Lack of Standardized Cut-Off Values

It is common to dichotomize patients based on cut-off values for Ki67 [28]. Many cut-offs have been used in various clinical trials and studies, though in most the cut-off levels have ranged from 10 %–20 % [8]. The working group did not recommend any ideal cut-off value for clinical use, since various studies use different cut-off levels of Ki67 and no quality assurance schemes are in place [8]. In the meta-analysis by de Azambuja et al. [13], the cut-off value for Ki67 ranged from 3.5 % to 34 % in the 35 eligible studies that had sufficient data for hazard ratio (HR) computation. Yerushalmi and colleagues [3] noted that each study group chose its own endpoint value of Ki67 and used a different set of parameters for its multivariate analysis, which created hurdles for meta-analysis and prevented a comprehensive understanding of the role of Ki67 in daily practice. They observed that in studies evaluating Ki67 as a predictive marker, the cut-off points ranged from as low as 1 % to as high as 40 %. A few studies did not have a cut-off point at all, while some other studies categorized Ki67 values into different scoring groups. The expert panel at the 2013 St. Gallen consensus further commented on this variability. Although a cut-off value of 14 % has been proposed by the consensus group to distinguish Luminal A-type (Ki67 < 14 %) from Luminal B-type in the surrogate IHC-based subtyping of breast cancer, the panel voted that a threshold of ≥ 20 % was clearly indicative of ‘high’ Ki67 status [8].
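The practical consequence of differing cut-offs is that the same tumor can be labelled differently from one study to the next. The sketch below dichotomizes a single hypothetical Ki67 value at three cut-offs mentioned in this section (10 %, 14 %, and 20 %); the function name ki67_status and the convention that a value equal to the cut-off counts as high are assumptions chosen to match the ≥ 14 % and ≥ 20 % thresholds quoted above.

```python
def ki67_status(index_percent: float, cutoff_percent: float) -> str:
    """Dichotomize a Ki67 index: 'high' if at or above the cut-off, else 'low'."""
    return "high" if index_percent >= cutoff_percent else "low"


# One hypothetical tumor with a Ki67 index of 15 %, read at three cut-offs.
tumor_ki67 = 15.0
for cutoff in (10.0, 14.0, 20.0):
    print(f"cut-off {cutoff:g} %: {ki67_status(tumor_ki67, cutoff)}")
```

Here the tumor is classed as high at the 10 % and 14 % cut-offs but low at 20 %, which is the kind of reclassification that complicates comparison across studies.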

Thoughts and Future Direction

Though the majority of studies have shown Ki67 to be a marker of prognosis and prediction, the methodologies for Ki67 estimation as well as the type of study population have varied between individual studies. This does not provide a level ground for comparison among the studies. A PubMed search was performed to find recent studies evaluating Ki67 as a prognostic and/or predictive marker in breast cancer. Studies published in 2013 and 2014 were selected if they had adequate data on the methodology of Ki67 estimation and patient demographics, and included univariate or multivariate analysis of Ki67. These studies have not been included in any of the previous meta-analyses. The studies are summarized in Table 1.

Table 1 Recent studies evaluating Ki67 as a prognostic and/or predictive marker

Clearly, there is a lack of standardization in the methods used for Ki67 evaluation and quantitation. Hence, patients assigned to a particular group in 1 study may not be categorized in the same group in another study, making comparisons between patient populations difficult across studies. When Caldarella et al. [29] tried to apply cut-off values for Ki67 as suggested by Denkert et al. [30] to their cohort of 1475 cases from the Tuscan Cancer Registry, they altered the categories, thereby impacting their study results. Optimization of cut-off points for Ki67 will, hence, be a challenge. The results from individual studies should be interpreted with great caution.

Serious concerns have also been raised over the reproducibility of the Ki67 assay, which need to be resolved before it can be used for the intended purpose. According to the Evaluation of Genomic Applications in Practice and Prevention initiative, an assay cannot have clinical utility unless its analytical validity has been demonstrated. However, the most recent studies [7••] have painted a contrasting picture, revealing unacceptably poor analytical validity for IHC staining and Ki67 estimation. The reasons for this poor interlaboratory reproducibility include pre-analytical and analytical factors. Quality control measures, similar to those for evaluation of breast tumor levels of estrogen receptor and HER2, are clearly required.

Evaluation of Recommendations from International Ki67 in Breast Cancer Working Group

The recommendations by the International Ki67 in Breast Cancer Working Group are a good first step toward standardization. Fixation with neutral buffered formalin for 4–48 hours has been shown to be adequate, and antigenicity can be preserved, potentially for decades, following proper paraffin embedding [2••]. McCormick et al. [31] observed that immunohistochemical staining with MIB-1 was superior to that for other proliferation markers, such as anti-proliferating cell nuclear antigen (PCNA) or KiS1, producing consistently better performance across a wide range of dilutions. In another study, the same group showed that MIB-1 was a robust marker of cell proliferation and that the morphologic and cell distribution was identical to that of Ki67, even in formalin-fixed paraffin-embedded tissue [31]. Given this proven track record, the MIB-1 monoclonal antibody has been recommended and is considered the “gold standard” to assess Ki67. Since negative nuclei are important in determining the overall population to be evaluated, counterstaining should be adequately and appropriately optimized during immunohistochemical staining.

A study showed that core biopsies show better consistency and uniformity of nuclear Ki67 staining compared with surgical specimens [24]. This could relate to shorter time-to-fixation intervals as well as better fixation of the relatively smaller core biopsies compared with larger specimens. However, given the heterogeneous nature of breast cancer, Ki67 evaluation on core biopsies may not be entirely representative, resulting in falsely high or low estimates. The current recommendation is to assess the whole section on the slide and record the overall average score. In a homogeneously staining sample, the recommendation is to count at least 3 randomly selected high-power fields. Based on the evaluated studies, the working group recommends that pathologists score at least 1000 cells, with a 500-cell count as the absolute minimum for reporting, to achieve precision. A recent study noted that there was dilution of Ki67 values when static methods, i.e., a set number of cells (200, 400, or 1000), were used for evaluation. The authors instead suggested a stepwise counting strategy for Ki67 estimation [28]. These data, however, require validation.
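The gain in precision from counting more cells can be illustrated with a simple binomial approximation. The sketch below assumes that each counted cell is an independent draw with the same probability of staining positive, which heterogeneous tumors do not strictly satisfy, and uses a Wald (normal approximation) interval; the function name ki67_ci_half_width and the 1.96 multiplier for an approximate 95 % interval are illustrative choices and are not part of the working group recommendation.

```python
import math


def ki67_ci_half_width(index_percent: float, cells_counted: int, z: float = 1.96) -> float:
    """Approximate half-width (in percentage points) of a Wald interval
    for a Ki67 index estimated from a given number of counted cells.

    Assumes independent, identically distributed cells (binomial sampling).
    """
    if cells_counted <= 0:
        raise ValueError("cells_counted must be positive")
    p = index_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / cells_counted)


# Hypothetical tumor with a true Ki67 index of 20 %.
for n in (200, 500, 1000):
    print(n, round(ki67_ci_half_width(20.0, n), 2))
```

Because the half-width shrinks with the square root of the number of cells, doubling the count from 500 to 1000 narrows the interval by a factor of about 1.4 rather than 2. Sampling error of this kind is separate from the observer and staining variability discussed earlier.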

Finally, no cut-off value was suggested by the recommendation group, citing lack of consensus. It has been suggested that, rather than a universal cut-off value, the cut-off should differ depending on the clinical outcome of interest [32]. This can be one way of addressing the issue of varying population cohorts. Cserni et al. [32] found that whether Ki67 was estimated or quantified meticulously by counting cells, the values tended to cluster around numbers ending in ‘0’ and ‘5’, suggesting that a cut-off point ending in these numbers would be more realistic and practical. Nishimura et al. [17] found that survival outcomes in tumors with Ki67 > 20 % and ≤ 50 % were similar to those in tumors with Ki67 > 50 %, but differed significantly from those in tumors with Ki67 < 20 %. A Ki67 of ≥ 20 % also significantly correlated with poor outcome in studies by Penault-Llorca et al. [21] and Weisner et al. [33]. A 20 % cut-off value, as suggested by the St. Gallen consensus experts, may be helpful to meaningfully stratify patients and can be advocated for use in clinical practice.

Conclusions

In numerous studies and clinical trials, Ki67 has been shown to have potential application as a prognostic and predictive marker and as a guide to therapeutic strategies. However, extreme variability in reported cut-off points for risk stratification, together with evaluation results that are driven by individual laboratories, limits its clinical relevance. In the absence of standardization, clinicians and researchers should exercise caution when comparing the results of different studies. Lack of reproducibility among experts, despite recommendations, further undermines its use as a clinical tool. It may be prudent to consider automated techniques, but only in the context of proper selection of tumor areas by a trained pathologist and optimization of the quantitation parameters. Ki67 is an important and promising biomarker; however, remedial steps to address the current limitations in its analysis are required before advocating its use in routine clinical practice.