Introduction

The proliferative activity of malignant tumors is an important prognostic factor that plays a role in planning surgical and oncological treatment. In breast carcinomas it is also used to distinguish between tumors with low and high proliferation [1–10]. This is especially relevant for ER-positive, HER2-negative tumors, which are more likely to respond to systemic chemotherapy when their proliferation is high than when it is low [6, 11–14]. There are different ways to evaluate proliferative activity. Beresford et al. have shown many methods to be problematic and have recommended the use of Ki-67 as a standard proliferation marker [15]. This protein is expressed in all phases of the cell cycle except G0 [16, 17]. This feature makes it the best marker to be detected by immunohistochemistry in different malignant tumors, including breast carcinoma. Proliferating tumor cells show a positive nuclear reaction with anti-Ki-67 antibodies. The pathology report generally states the percentage of positive tumor cells (Ki-67 labeling index, LI). Different studies have demonstrated that a high Ki-67 LI indicates an increased risk of recurrence, metastasis, and faster progression of the disease [6, 11, 12, 18–23]. Worldwide, the Ki-67 LI is determined either by estimation or by counting, but the methods used differ [10, 24]. Given this clinical role, reproducibility is important but, in our opinion, quite problematic. The quality, type and size of the tissue, fixation time and human error can all impair reproducibility. In 2009, a >30 % Ki-67 LI cut-off value was accepted at the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer for high proliferation and for recommending adjuvant chemotherapy in endocrine-responsive breast carcinomas [13, 25].
Two years later, a >14 % cut-off was used to delineate, by a surrogate approach, a subset of luminal B carcinomas that could also be treated with systemic chemotherapy along with hormonal treatment [5, 14]. Other Ki-67 LI cut-off values have also been proposed for the indication of chemotherapy in ER-positive patients [19, 22, 26–28]. In addition, there are no standardized methods for eliminating the previously mentioned factors that influence the Ki-67 LI, although efforts have been made towards standardization [5].

In a previous study, we highlighted the limitations and suboptimal reproducibility of counting Ki-67-positive tumor cells under the microscope [29]. We also underlined the differences between individual methods of evaluating the LI. In the present study, we aimed to investigate how the use of a standardized, partially digitalized counting method could affect the reproducibility of determining the Ki-67 LI.

Materials and Methods

Pretreatment diagnostic core biopsy samples of patients scheduled for neoadjuvant chemotherapy for breast cancer were analyzed in the study. The biopsies were taken from patients with operable T2 ≥ 3 cm or T3-4 and/or N1-2 and M0 breast cancer. For better comparability, the cases were identical to those of a previous study assessing the interobserver and intraobserver reproducibility of Ki-67 LI assessment by routine microscopy [29]. The tumor samples were fixed in buffered formalin and embedded in paraffin. Samples were routinely stained with hematoxylin and eosin (HE) and routinely immunostained for estrogen receptor, progesterone receptor, HER-2 and topoisomerase II-alpha. For the purposes of the study, samples were also immunostained for Ki-67 with the following 3 antibodies: SP6 (monoclonal rabbit antibody, Hisztopatologia Kft., Pécs, Hungary), B56 (monoclonal mouse antibody, Hisztopatologia Kft., Pécs, Hungary) and MIB-1 (monoclonal mouse antibody, Dako, Glostrup, Denmark). Wet antigen retrieval consisted of pretreating all samples in a microwave oven in citrate buffer at pH 6 for 30, 30 and 50 min in the case of MIB-1, B56 and SP6, respectively. All antibodies were diluted at 1:100. Expression of Ki-67 was detected using Dako EnVision FLEX/HRP, DAB + Chromogen (Dako, Glostrup, Denmark).

Microphotographs of each immunostained core-biopsy sample were taken with a ×20 objective. A hot-spot area was photographed in all cases where such a hot spot could be identified. The pictures were inserted into a Microsoft PowerPoint file. One photograph was analyzed in each case. Four different investigators first determined the Ki-67 LI by estimating the proportion of stained cells with 5 % precision in the same areas (i.e. the same digital image displayed on a screen). No counting was involved in this assessment. The time needed for the evaluation was recorded for each series of cases and each investigator. In a second round, a uniform grid composed of equidistant parallel horizontal lines was laid over all digital images previously used for estimation (Fig. 1). The observers were asked to count the tumor cells crossed by or touching the lines. The lines of the grid can be followed, and the touching or crossed cells can be recorded (counted) continuously without the risk of double counting or omitting single cells. Both immunohistochemically negative and positive nuclei were counted. Non-cancerous cells (stromal elements, lymphocytes, etc.) were ignored as much as possible. The ratio of positive cells was derived from these values. In further analyses, rounded values (to the next integer) were used. Evaluation time was also recorded for this method. In all cases, the participating pathologists were asked to consider as positive any cell with a brown (stained) rather than blue (unstained) hue.
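The derivation of the counted LI from the grid counts can be sketched as follows. This is a minimal illustration rather than the tool used in the study: the function name `labeling_index` is hypothetical, the example counts are invented, and rounding halves upward is an assumption about how the values were rounded to integers.

```python
def labeling_index(positive: int, negative: int) -> int:
    """Ki-67 LI in percent from nuclei counted along the grid lines.

    positive: stained (brown) tumor nuclei touching or crossed by a line
    negative: unstained (blue) tumor nuclei touching or crossed by a line
    """
    total = positive + negative
    if total == 0:
        raise ValueError("no tumor cells counted on the grid lines")
    # Assumption: halves are rounded upward. Python's built-in round()
    # would instead round halves to the nearest even integer.
    return int(100 * positive / total + 0.5)


# Invented example: 23 stained and 60 unstained nuclei on one image.
example_li = labeling_index(23, 60)
print(example_li)
```

Only tumor nuclei enter either count, so stromal cells and lymphocytes have to be excluded before the function is called.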

Fig. 1

An example of the digital pictures analyzed. a Image used for estimating the Ki-67 LI by eyeballing; b the same image with parallel grid lines laid over it, which delineate the stained and unstained cells (those touching the lines or crossed by them) to be considered when counting

Comparisons were performed between the estimated and counted values of each investigator. Different investigators’ values were also compared with each other.

Kappa statistics were used to evaluate the interobserver reproducibility of estimation and counting. The following categories were used (based on the values mentioned by the St Gallen International Expert Consensus on the Primary Therapy of Early Breast Cancer in 2009): 0–15 %, 16–30 % and >30 %. Besides this categorization, Ki-67 values were divided into four quarters (0–25 %, 26–50 %, 51–75 % and 76–100 % Ki-67 LI). Kappa values were calculated according to Fleiss [30] and were interpreted as reflecting slight (0–0.2), fair (0.21–0.4), moderate (0.41–0.6), substantial (0.61–0.8) and almost perfect (>0.8) agreement between observations according to Landis and Koch [31].
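To make the categorization and the multi-rater kappa concrete, a minimal sketch is given below. It assumes the standard Fleiss formulation for a fixed number of raters per case; the function names and the five example cases are hypothetical and are not data from the study.

```python
from collections import Counter


def three_tier(li: int) -> str:
    """Assign an integer Ki-67 LI (percent) to the three categories used here."""
    if li <= 15:
        return "0-15"
    if li <= 30:
        return "16-30"
    return ">30"


def fleiss_kappa(ratings):
    """Fleiss' kappa; ratings[i] lists the category each rater gave to case i."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    categories = sorted({c for row in ratings for c in row})
    counts = [Counter(row) for row in ratings]
    # Observed agreement: proportion of agreeing rater pairs, averaged over cases.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_cases
    # Chance agreement from the overall share of each category.
    p_e = sum(
        (sum(c[k] for c in counts) / (n_cases * n_raters)) ** 2 for k in categories
    )
    if p_e == 1:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_bar - p_e) / (1 - p_e)


# Invented LI values (percent) from 4 raters for 5 cases.
example_li = [
    [10, 15, 12, 20],
    [35, 40, 30, 45],
    [25, 20, 30, 25],
    [5, 5, 10, 5],
    [60, 70, 65, 80],
]
kappa = fleiss_kappa([[three_tier(v) for v in case] for case in example_li])
print(round(kappa, 2))
```

The four-quarter classification can be analyzed in the same way by swapping the categorizing function.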

Spearman and Pearson correlation analyses were also used to compare the intra- and interobserver estimated and calculated values. Coefficients were categorized as follows: values between 0.9 and 1 show excellent, 0.75–0.9 good, 0.5–0.75 moderate and 0.25–0.5 weak correlation, whereas values between 0.0 and 0.25 reflect a lack of correlation. Comparisons were made for each antibody alone (30 values per observer) and for the three different antibodies combined (90 values per observer) for each pair of investigators. The analyses were performed both for the estimated and the calculated values, using the SPSS 15.0 for Windows software package (SPSS Inc., Chicago, Illinois).
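For readers who wish to reproduce this type of comparison outside SPSS, the two coefficients and the descriptive bands can be sketched in plain Python. The function names and the example values are hypothetical. Because the bands quoted above share their boundaries, the sketch assumes that each lower bound belongs to the higher band, and it treats anything below 0.25 as a lack of correlation.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def describe(r):
    """Descriptive band for a coefficient (lower bounds inclusive, an assumption)."""
    if r >= 0.9:
        return "excellent"
    if r >= 0.75:
        return "good"
    if r >= 0.5:
        return "moderate"
    if r >= 0.25:
        return "weak"
    return "lack of correlation"


# Invented estimated and counted LI values of one observer for 5 cases.
estimated = [10, 20, 30, 40, 50]
counted = [12, 18, 33, 41, 48]
r_pearson = pearson(estimated, counted)
r_spearman = spearman(estimated, counted)
print(describe(r_pearson), describe(r_spearman))
```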

Results

Digital images of 30 core-biopsy samples were analyzed. The mean ± SE age of the patients represented was 46 ± 2 (range: 26–70) years. Samples included 28 invasive ductal carcinomas of no special type, and 2 invasive lobular carcinomas. Altogether 720 evaluations were made by 4 independent pathologists with special interest in breast pathology (GC, AV, EC, BK). Mean ± SE estimated and counted Ki-67 LI values provided by the 4 pathologists are shown in Table 1. The calculated Ki-67 LI was based on the assessment of an average of 75–91 cells. There were no major differences in the cells counted from the same set of digital images by different investigators. The range of cells counted on the grid marked images was 9 to 194, as a single image with the same grid was used for each tumor and core biopsy.

Table 1 Mean ± SE Ki-67 values obtained by estimation and calculation for each investigator

Ki-67 values for individual cases and individual observers are presented in Fig. 2. The graphs demonstrate a very good overlap between investigators, both for the estimated and the counted Ki-67 values. Furthermore, the similar shapes of the graphs for the estimated and calculated Ki-67 values suggest a good overlap between the two methods of assessment. These impressions were also substantiated by the statistical analyses.

Fig. 2

Proportion of Ki-67 stained cells as determined by different investigators. Cases 1–30 are samples immunostained with the MIB-1 antibody, cases 31–60 are those stained with B56 and cases 61–90 are the ones stained with SP6. a Calculation-based values; b Estimation-based values

Good to excellent correlation was observed with both the Pearson and the Spearman methods when the estimated and the calculated Ki-67 values of each investigator were compared across all antibodies (90 cases per observer) (Table 2). The 90 assessments (all antibodies included) of each pair of investigators were also compared both for the estimated and the calculated Ki-67 LIs. The inter-observer correlation coefficients demonstrate an excellent correlation (Table 3).

Table 2 Intraobserver correlations between the estimated and the calculated Ki-67 LIs for each investigator
Table 3 Interobserver correlations for both the estimated (light gray cells) and the calculated (white cells) Ki-67 LIs

Similar correlation analyses were repeated for each antibody separately (i.e. only 30 cases with the same antibody per observer), in order to see whether different antibodies were associated with different correlations. To compare the effect of the antibodies on the intra-observer correlation of estimated and calculated Ki-67 LIs, as a basic approach, the 4 correlation coefficients of the observers were averaged. In the case of SP6, the mean ± SE values of the Pearson and Spearman correlation coefficients for the estimated Ki-67 LIs were 0.855 ± 0.044 and 0.857 ± 0.035, respectively. These values were 0.922 ± 0.004 and 0.926 ± 0.008 in the case of B56, and 0.904 ± 0.012 and 0.879 ± 0.026 in the case of MIB-1, respectively. These results suggest that SP6 might have at least a trend towards a slightly weaker intra-observer correlation than the others. In the inter-observer analyses by antibody type, comparing the calculated values (6 pairs of investigators), the mean ± SE values of the correlation coefficients derived from the Pearson and Spearman tests were 0.871 ± 0.044 and 0.881 ± 0.039, respectively, in the case of SP6. In the case of B56 these values were 0.967 ± 0.005 and 0.947 ± 0.005, respectively, while for MIB-1 they were 0.957 ± 0.006 and 0.960 ± 0.006, respectively. For the estimated values, the following results were found using the Pearson and Spearman tests (mean ± SE): 0.942 ± 0.008 and 0.943 ± 0.009 for SP6, 0.947 ± 0.008 and 0.962 ± 0.006 for B56, and 0.926 ± 0.012 and 0.896 ± 0.013 for MIB-1, respectively. These results also suggest that SP6 might be associated with less concordance between observers, but only in the case of the calculated Ki-67 LIs, whereas in the case of MIB-1 the estimation of Ki-67 LIs showed lower correlations.
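The averaging step described above can be illustrated with a short sketch. It assumes that SE denotes the standard error of the mean (sample standard deviation divided by the square root of the number of coefficients); the four coefficients are invented placeholders, as the per-observer values are not listed in the text. The sketch also shows why 4 investigators yield 6 pairs.

```python
import math
from itertools import combinations


def mean_se(values):
    """Mean and standard error of the mean (assumed meaning of 'SE')."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return mean, math.sqrt(variance / n)


# Initials of the four pathologists as given in the Results.
observers = ["GC", "AV", "EC", "BK"]
pairs = list(combinations(observers, 2))  # every unordered pair of investigators

# Invented intra-observer coefficients, one per observer.
example_coefficients = [0.90, 0.92, 0.88, 0.94]
mean, se = mean_se(example_coefficients)
print(f"{mean:.3f} ± {se:.3f} over {len(example_coefficients)} observers; {len(pairs)} pairs")
```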

When all Ki-67 values (30 cases stained with 3 antibodies, i.e. 90 values per observer) estimated by eyeballing were considered, reproducibility of the proliferative activity was substantial both for the classification into 4 equal quarters (kappa: 0.68) and the classification into three categories (kappa: 0.65). The kappas were 0.67 and 0.73, respectively in case of the calculated Ki-67 values, all corresponding to substantial agreement.

Examining the antibodies one by one, using four categories and estimated Ki-67 values, the kappas were 0.65, 0.69 and 0.64 for the MIB-1, B56 and SP6 antibodies, respectively. For the three-tiered estimated Ki-67 categories, kappa values were 0.59, 0.69 and 0.67, respectively. For the calculated Ki-67 LIs, kappas were 0.66, 0.71 and 0.65, respectively, for the four categories and 0.90, 0.60 and 0.69, respectively, for the three categories. The agreement on the Ki-67 LIs obtained with the different antibodies was therefore almost always substantial, with two instances suggesting moderate reproducibility but falling just short of the substantial agreement category, and one instance of almost perfect agreement.
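The summary in the last sentence can be checked by applying the Landis and Koch bands quoted in the Methods to the twelve kappa values reported in this paragraph. The sketch below is only an illustration; the function and variable names are hypothetical, and negative kappas, which the quoted bands do not cover, are flagged separately.

```python
from collections import Counter


def landis_koch(kappa):
    """Agreement band for a kappa value, using the bands quoted in the Methods."""
    if kappa < 0:
        return "below the quoted bands"
    if kappa > 0.8:
        return "almost perfect"
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    if kappa > 0.2:
        return "fair"
    return "slight"


# Kappas from the text, in the order MIB-1, B56, SP6.
reported = {
    ("estimated", "four categories"): [0.65, 0.69, 0.64],
    ("estimated", "three categories"): [0.59, 0.69, 0.67],
    ("calculated", "four categories"): [0.66, 0.71, 0.65],
    ("calculated", "three categories"): [0.90, 0.60, 0.69],
}

summary = Counter(landis_koch(k) for values in reported.values() for k in values)
print(dict(summary))
```

With these bands, 0.59 and 0.60 fall in the moderate range, 0.90 in the almost perfect range, and the remaining nine values in the substantial range.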

The mean time to evaluate the Ki-67 LI on a single digital image was calculated on the basis of the time used for the investigation of 30 biopsy samples stained by a given antibody. By eyeballing this time ranged between 18 and 50 s per investigator, and this range was between 90 and 180 s when the cells were counted, and the Ki-67 LI was derived from the calculated proportion of stained and all tumor cells.

Discussion

Proliferation assessed by Ki-67 immunostaining is a recognized prognosticator of breast carcinomas [2–4, 11]. Determining the proliferative activity of breast cancer is an important task for the pathologist because it is a factor considered in therapeutic decision making, especially when chemotherapy is needed, since most chemotherapeutic agents act on proliferating cells. Higher proliferation may result in better chemosensitivity and may also reflect better response to specific hormonal agents (e.g. letrozole versus tamoxifen) [11]. Immunostaining with Ki-67 monoclonal antibodies is the most widely used assessment of proliferative activity today. Pathologists have to try to determine this value as accurately as possible. Several technical issues may affect the final LI, such as the processing of the tissues, the areas of the tumor considered and the intensity of staining considered positive, but even without these, reproducibility has been found to be less than optimal in several studies, including our previous work assessing proliferation on the same set of needle core biopsies [29, 32]. In our previous article, we found that Ki-67 LI values were significantly influenced by the investigator even if unified rules were used during the evaluation. Not only the inter-observer but also the intra-observer agreement was found to be poor to moderate [29]. Although an international consensus recommends the examination of at least 500, but optimally 1,000, cells for deriving the Ki-67 LI [5], this practice is rarely followed, and counting about 100 cells or estimating the overall stained proportion of tumor cells are common methods of assessing proliferation. Lack of time is one of the most important reasons for not counting a high number of cells.

The presented results suggest that choosing a limited area of the tumor, as represented by a digital image, and using a grid to help choose which cells to count improves the reproducibility of determining the Ki-67 LI. Indeed, the inter-observer agreement on the Ki-67 LI reached on the real slides of the same cases, derived from about 100 cells in the area with the highest staining proportion, was only fair on the basis of the overall kappa values <0.4 [29], but changed to substantial (overall kappa >0.6) for the digital images. This improvement in reproducibility was achieved by counting somewhat fewer cells on average than the 100 cells in the previous investigation. However, the present study did not assess how many cells need to be evaluated to reflect the proliferative activity of a tumor on the basis of a needle core biopsy; it concentrated only on reproducibility issues. It may well be that several digital images would be required to reflect tumor proliferation. We expect similarly acceptable reproducibility with 2, 3 or more images. On the downside, such an improvement comes at a cost in time. Making digital images of given areas of a tumor histology slide and adding a standard grid to the image may be fast in some settings, but may also take too much time to be affordable. The evaluation itself is also somewhat time-consuming, requiring 2 to 3 min per digital image, depending on the cell density. Therefore, the finding that a rough estimate of the stained proportion of tumor cells may be as reproducible as the calculated LI is of interest. A similar estimation-based method is generally used for establishing the percentage of estrogen or progesterone receptor positive cells.

Varga et al., in a very carefully designed study, showed that better reproducibility could be achieved by estimation than by accurate counting [32]. Our results are in keeping with this observation, as good to excellent correlation was found between the estimated and the counted Ki-67 LI values, and the overall kappa values also suggested substantial reproducibility for the eyeballing-based estimation of the Ki-67 LI. Eyeballing obviously requires less time, as supported by our data.

Most of the studies investigating the reproducibility of Ki-67-based proliferation assessment have been performed on surgical samples. We stress that core-biopsy samples were used in our studies. In theory, this decreases tumor sample heterogeneity, and because of the small size of the sample, different investigators are more likely to examine the same area. Therefore, Ki-67 assessment on core biopsy samples could theoretically result in better reproducibility, but this was contradicted by our previous work, which documented only fair to poor inter-observer agreement [29]. As concerns the use of different antibodies, MIB-1 is the most widely used and generally recommended one [5], but some other antibodies are also used for the estimation of proliferation. Our results suggest that the type of antibody may also affect the consistency of both estimating and calculating the Ki-67 LIs.

Conclusion

The use of a simple digital technology, i.e. taking microphotographs of proliferating areas of breast cancers and adding a grid to the pictures to better delineate which cells to consider in the count, makes it possible for different investigators to examine the same area and the same cells when determining the Ki-67 LI. This can significantly improve reproducibility. We found that calculating the LI on the basis of such grid-labeled digital images results in better reproducibility than the frustratingly low one found on the basis of counting stained cells on the histology slides of the same core biopsy specimens. However, we also found that estimating the proportion of Ki-67 stained cells on the same digital images is not only faster than counting the stained and unstained cells, but also results in acceptable and substantial reproducibility, and that the estimated and counted values correlate strongly. Therefore, estimation should not be considered an inadequate method of establishing the Ki-67 LI simply because it is not based on objective numbers.