
1 Introduction

The biological material required for immunochemical examination is taken from the affected breast tissue by needle biopsy. The material is then subjected to immunochemical processing, fixation and staining. Ki-67 positive nuclei are stained by diaminobenzidine (DAB) and appear brown; negative nuclei appear blue due to the hematoxylin (H) dye. Finally, the glass slide with the stained material is scanned and evaluated by a pathologist. The Ki-67 index is calculated as the ratio of the number of brown-colored nuclei to the number of all nuclei. Individual cases of breast cancer are classified using a fixed cut-off point for the Ki-67 index. The 2011 St. Gallen International Breast Cancer Conference agreed on a cut-off value of 14% based on the study by Cheang et al. [5]. The recommendation changed in 2013, when the majority of the St. Gallen panel suggested a threshold of 20%. Finally, the 2015 consensus suggested that the cut-off value should be tuned separately for each laboratory based on the median value of local results.

The main problem in determining the Ki-67 index is that many pathologists do not have time to count the cell nuclei visually. The process is extremely time-consuming because it requires choosing the areas with the highest proliferative rate (hotspots) and then manually marking a few hundred nuclei. Counting by visual inspection is also highly subjective because it depends on interpretation. Many studies report that inter-observer variability in determining the Ki-67 index is high [16].

Automatic nuclei counting would spare pathologists this tedious and largely unnecessary work. It would also free the process from the subjective assessment of the pathologist. However, the accuracy of nuclei segmentation is critical to the performance of Computer-Aided Cytology (CAC). Unfortunately, immunochemical images of cytological material are challenging for existing object detection methods. In hotspots, nuclei usually form complex, random and heterogeneous structures such as clumps and nests, and they often overlap or touch. A number of research centers are working intensively on efficient algorithms for object segmentation. The most common approaches are based on image thresholding, color deconvolution, data clustering, watershed, active contours and deep learning [4, 8]. One of the well-known systems for determining the Ki-67 index is ImmunoRatio [15]. This system estimates the Ki-67 index as the ratio of the brown-colored area of the image to the combined brown- and blue-colored area. The staining components (DAB and H) are separated using the H+DAB color deconvolution technique and intensity thresholding [12]. For ImmunoRatio to be effective, it must be appropriately calibrated for data from a specific laboratory. Many reports in the literature describe the high accuracy of this system [18]. Saha et al. proposed a deep learning framework for hotspot detection and proliferation scoring [13]. They achieved 93% precision, 88% recall and a 91% F-score. Their work also contains a comprehensive list of methods used for estimating the Ki-67 index. In [1], a classifier-based algorithm was proposed for Ki-67 scoring in breast cancer (BC) tissue microarray images. The proposed approach reached 90% classification accuracy with a kappa concordance of 0.64. Xing et al. applied a boundary delineation algorithm to localize tumor and nontumor cells for automatic Ki-67 counting [17]. Their algorithm shows promising performance in comparison with other popular Ki-67 scoring techniques. Their paper also gives an overview of other methods used to estimate the Ki-67 index.

The aim of our research is to develop and test a new method of Ki-67 scoring based on stochastic geometry. Stochastic geometry is a branch of probability theory that deals with the analysis of random spatial patterns [6]. Its methods are successfully applied wherever heterogeneous structures with a random spatial distribution occur [7, 10]. Such a distribution can be observed for cytological material in immunochemical images.

The idea of our approach is to transform an H+DAB stained image into a model built from ellipses that vary in location, size and orientation. The goal is to find the configuration of ellipses that fits the image data best without violating a priori preferences regarding the distribution of nuclei. The next step is to divide the ellipses into two classes according to the color of the actual nuclei. This step is carried out with the help of color deconvolution, which reveals the DAB-stained and H-stained areas. The ellipses are classified as blue-colored (H) objects by default, and their assignment is changed if they touch a brown-colored (DAB) area. Finally, the nuclei models are counted to determine the Ki-67 index.

To verify the effectiveness of the proposed approach, it was applied to estimate the Ki-67 index for cytological images of breast cancer. The results were compared with reference results obtained by manual segmentation and with the results produced by ImmunoRatio.

The remainder of this paper is organized as follows. Section 2 describes the methods applied to determine the Ki-67 index. The results of the experiments are presented in Sect. 3. Concluding remarks are given in Sect. 4.

Fig. 1. Overview of the method

2 Method

2.1 Method Overview

Before models of the actual immunochemical images can be built, the images must be pre-processed to separate the areas stained by H and DAB. To achieve this, the RGB image is subjected to H+DAB deconvolution using the procedure implemented in the ImageJ plugin [12]. Three separate intensity images are created as a result. The first represents how much H has been deposited in the nuclei, the second how much DAB has been deposited, and the third the residuals (see Fig. 1). For further processing, we use the H density and DAB density images. In the next step, binary masks of the H area and the DAB area are determined by intensity thresholding. The H mask is combined with the DAB mask using the logical AND operator. The result is the H+DAB mask, which the stochastic geometry algorithm uses to detect elliptical objects that resemble nuclei. The DAB mask alone cannot be used to segment Ki-67 positive nuclei because in some cases only a small part of a nucleus is colored by DAB, yet the nucleus should still be counted as Ki-67 positive. Therefore, the H+DAB mask is used to segment all nuclei using the stochastic geometry approach, and the DAB mask is then used to indicate Ki-67 positive nuclei by checking which ellipses overlap the area of the DAB mask. The entire procedure is illustrated in Fig. 1.

The crucial step of the proposed strategy is to find a configuration of ellipses that fits the H+DAB mask. The stochastic geometry approach tries to cover as many pixels of the nuclei area of the H+DAB mask as possible using a collection of ellipses of different sizes and orientations [2, 9]. At the same time, the algorithm tries to avoid covering background pixels. Of course, many different collections of ellipses fit a given H+DAB mask well. To choose the best configuration, we need prior knowledge about preferable configurations of nuclei. For this purpose, we control pairwise interactions between ellipses to limit the number of overlaps between objects. Thus, configurations with a large number of overlapping ellipses are less likely than those with fewer overlaps.

2.2 Marked Point Process

In the considered approach, nuclei segmentation in an H+DAB image boils down to finding a configuration of ellipses varying in size, location and orientation that precisely covers the H+DAB mask and follows the distribution given by a prior model. To find such a configuration, we assume that the image generation process can be described by a marked point process (MPP). The crucial element needed to reconstruct the input image is therefore the conditional probability mass function (pmf) \(p(\mathbf x |\mathbf y )\) that governs this process:

$$\begin{aligned} p(\mathbf x |\mathbf y ) \propto f(\mathbf y |\mathbf x )p(\mathbf x ), \end{aligned}$$
(1)

where the likelihood term \(f(\mathbf y |\mathbf x )\) evaluates the consistency of the ellipse configuration \(\mathbf x =\{x_1,\ldots ,x_n\}\) with respect to the H+DAB mask \(\mathbf y \), and the prior term \(p(\mathbf x )\) reflects constraints on pairwise interactions between ellipses within configuration \(\mathbf x \) [2]:

$$\begin{aligned} p(\mathbf x )=\alpha \beta ^{n(\mathbf x )}\prod _{x_{i} \sim x_{j}}h(x_{i},x_{j}), \end{aligned}$$
(2)

where \(\alpha ,\beta > 0\) are constants, \(n(\mathbf x )\) is the number of ellipses in configuration \(\mathbf x \), \(h\) is the interaction function and \(\sim \) is a symmetric and reflexive relation describing ellipse overlaps.

Fig. 2. Probability distributions for \(S(x)\) and \(S\setminus S(x)\) regions

If we assume that variables representing mask values \(y_{t} \in \{0,1\}\) are conditionally independent given configuration \(\mathbf x \) then \(f(\mathbf y |\mathbf x )\) takes the following form:

$$\begin{aligned} f(\mathbf y |\mathbf x ) = \prod _{t \in S(x)} b(y_t;p_{N}) \prod _{t \in S \setminus S(x)} b(y_t;p_{B}), \end{aligned}$$
(3)

where \(b(y_t;p_{N})\) and \(b(y_t;p_{B})\) are Bernoulli pmf’s:

$$\begin{aligned} b(y_{t};p_{N})=\left\{ \begin{array}{ll} 1-p_{N} &{} \text {if } y_{t} = 0\\ p_{N} &{} \text {if } y_{t} = 1, \end{array} \right. \end{aligned}$$
(4)
$$\begin{aligned} b(y_{t};p_{B})=\left\{ \begin{array}{ll} 1-p_{B} &{} \text {if } y_{t} = 1\\ p_{B} &{} \text {if } y_{t} = 0. \end{array} \right. \end{aligned}$$
(5)

They are used to evaluate the likelihood of pixels on H+DAB mask \(\mathbf y \) within nuclei region S(x):

$$\begin{aligned} S(x) = \bigcup _{i=1}^{n}S(x_i), \end{aligned}$$
(6)

and background region \(S \setminus S(x)\), respectively, where \(S\) is the pixel lattice of the H+DAB mask and \(S(x_{i})\) is the silhouette of the ellipse \(x_{i}\). Parameter \(p_{N}\) is the probability that a pixel within the nuclei region \(S(x)\) is an actual nucleus pixel on the H+DAB mask \(\mathbf y \), and \(p_{B}\) is the probability that a pixel within the background region \(S\setminus S(x)\) is an actual background pixel. Both parameters were chosen arbitrarily and are shown in Fig. 2.
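To make Eqs. (3)-(6) concrete, the sketch below evaluates \(\ln f(\mathbf y |\mathbf x )\) for a small mask. It is an illustration only: the mask is a plain list of lists, the union of silhouettes \(S(x)\) is passed in as a set of pixel coordinates, and the default values of `p_n` and `p_b` are placeholders, since the values actually used are given only in Fig. 2.

```python
import math


def bernoulli_nucleus(y, p_n):
    """Eq. (4): pmf of mask value y for a pixel inside the nuclei region S(x)."""
    return p_n if y == 1 else 1.0 - p_n


def bernoulli_background(y, p_b):
    """Eq. (5): pmf of mask value y for a pixel in the background region."""
    return p_b if y == 0 else 1.0 - p_b


def log_likelihood(mask, covered, p_n=0.9, p_b=0.9):
    """Return ln f(y|x) from Eq. (3).

    mask    -- 2D list of 0/1 values (the H+DAB mask y)
    covered -- set of (row, col) pixels in the union of silhouettes S(x)
    p_n/p_b -- placeholder probabilities, not the values used in the paper
    """
    total = 0.0
    for row, line in enumerate(mask):
        for col, y in enumerate(line):
            if (row, col) in covered:
                total += math.log(bernoulli_nucleus(y, p_n))
            else:
                total += math.log(bernoulli_background(y, p_b))
    return total
```

A configuration whose silhouettes coincide with the foreground of the mask scores higher than one that covers background pixels, which is the behavior the likelihood term is meant to encode.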

Finally, we choose the Strauss process to implement the interaction model [3, 14]:

$$\begin{aligned} p(\mathbf x )=\alpha \beta ^{n(\mathbf x )}\gamma ^{r(\mathbf x )}, \end{aligned}$$
(7)

where \(\alpha ,\beta ,\gamma > 0\) are constants, \(n(\mathbf x )\) is the number of ellipses in the configuration and \(r(\mathbf x )\) is the number of pairwise overlaps in configuration \(\mathbf x \). For \(0<\gamma <1\), the model exhibits repulsion between ellipses, which prevents an excessive number of overlaps in ellipse configurations.
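The Strauss prior of Eq. (7) is easy to evaluate in log form. The sketch below assumes that each ellipse is represented by its rasterized silhouette (a set of pixel coordinates) and that two ellipses overlap when their silhouettes share at least one pixel; both choices are ours, made for illustration. The normalizing constant \(\alpha \) is dropped because it does not affect comparisons between configurations.

```python
from itertools import combinations


def count_overlaps(silhouettes):
    """r(x): number of unordered ellipse pairs whose silhouettes share a pixel."""
    return sum(1 for a, b in combinations(silhouettes, 2) if a & b)


def strauss_log_prior(silhouettes, ln_beta, ln_gamma):
    """ln p(x) from Eq. (7), up to the additive constant ln(alpha)."""
    n = len(silhouettes)
    r = count_overlaps(silhouettes)
    return n * ln_beta + r * ln_gamma
```

With \(\ln (\gamma ) < 0\), every additional overlapping pair lowers the log prior, which is the repulsive behavior described above.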

2.3 Optimization

Finding a proper configuration of ellipses is an optimization problem over \(\varOmega \), the set of all possible configurations \(\mathbf x \). To solve the problem of nuclei segmentation, we must find in \(\varOmega \) the configuration \(\mathbf x \) that fits the image best without contravening the prior interaction constraints. In the Bayesian framework, this can be viewed as a maximum a posteriori (MAP) estimation problem:

$$\begin{aligned} \hat{\mathbf{x }} = {\mathop {\hbox {arg max}}\limits _\mathbf{x }} \,f(\mathbf y |\mathbf x )p(\mathbf x ). \end{aligned}$$
(8)

Unfortunately, direct sampling from \(f(\mathbf y |\mathbf x )\) and \(p(\mathbf x )\) is not straightforward. However, the problem becomes much more tractable if we work with the following log-ratio [2, 9]:

$$\begin{aligned} \begin{aligned} w = \ln \Bigg ( \frac{f(\mathbf y |\mathbf x _{k+1})p(\mathbf x _{k+1})}{f(\mathbf y |\mathbf x _{k})p(\mathbf x _{k})} \Bigg ) \\ =\sum _{t \in S_{N}} \Big (\ln \big (b(y_t;p_{N})\big )- \ln \big (b(y_t;p_{B})\big )\Big )\\ +\ln \big (\gamma \big )\big (r(\mathbf x _{k+1} )-r(\mathbf x _{k})\big )+\ln \big (\beta \big ), \end{aligned} \end{aligned}$$
(9)

where \(\mathbf x _{k}\) is the current configuration, \(\mathbf x _{k+1}\) is the new prospective configuration and \(S_{N} = \big (S(\mathbf x _{k+1})\cup S(\mathbf x _{k})\big ) \setminus \big (S(\mathbf x _{k+1})\cap S(\mathbf x _{k})\big )\) is the set of pixels whose region assignment differs between the two configurations. As written, Eq. (9) corresponds to adding an ellipse; when an ellipse is deleted, the likelihood sum and the \(\ln (\beta )\) term change sign. If we restrict new configurations \(\mathbf x _{k+1}\) to those obtained by adding a single ellipse \(u\) to, or deleting a single ellipse \(u\) from, the current configuration \(\mathbf x _{k}\), it becomes possible to apply a steepest ascent procedure to find a local maximum. The algorithm always chooses the new configuration \(\mathbf x _{k+1}\) that maximizes the log-ratio \(w\). Therefore, the posterior probability never decreases at any stage and eventual convergence is guaranteed. However, the algorithm usually gets stuck in the nearest local maximum. The pseudocode of this procedure is presented in Algorithm 1.
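The procedure can be sketched in a few lines. The code below is our own unoptimized illustration and not a transcription of Algorithm 1: it assumes that candidate ellipses are given as pixel sets lying inside the mask lattice, that a move is accepted only while \(w > 0\), and that the search starts from the empty configuration. The default probabilities are placeholders.

```python
import math


def pixel_gain(y, p_n, p_b):
    """ln b(y; p_N) - ln b(y; p_B): gain when one pixel moves into S(x)."""
    nucleus = p_n if y == 1 else 1.0 - p_n
    background = p_b if y == 0 else 1.0 - p_b
    return math.log(nucleus) - math.log(background)


def steepest_ascent(mask, candidates, ln_beta, ln_gamma, p_n=0.9, p_b=0.9):
    """Greedy MAP search using single-ellipse additions and deletions.

    mask       -- dict mapping (row, col) to 0/1 (the H+DAB mask y)
    candidates -- dict mapping an ellipse id to its silhouette (set of pixels)
    Returns the set of selected ellipse ids.
    """
    config = set()
    while True:
        best_w, best_move = 0.0, None
        for u, sil in candidates.items():
            others = [candidates[v] for v in config if v != u]
            covered = set().union(*others)
            # Pixels whose region changes if u is added or deleted (S_N).
            changed = sil - covered
            overlaps = sum(1 for o in others if o & sil)
            w = (sum(pixel_gain(mask[t], p_n, p_b) for t in changed)
                 + ln_gamma * overlaps + ln_beta)
            if u in config:
                w = -w  # deleting u reverses the sign of every term
            if w > best_w:
                best_w, best_move = w, u
        if best_move is None:
            return config  # no single move increases the posterior
        config ^= {best_move}  # add u if absent, delete it if present
```

Because each accepted move strictly increases the posterior and the number of configurations is finite, the loop terminates. Recomputing the covered region for every candidate is wasteful and is kept here only for readability.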

Algorithm 1 (pseudocode of the steepest ascent procedure)

2.4 Ki-67 Scoring

The Ki-67 index is computed as the ratio of the number of brown-colored nuclei to the number of all nuclei. The steepest ascent procedure segments the nuclei and returns a configuration of ellipses that approximates the actual nuclei, but it cannot distinguish Ki-67 positive nuclei from Ki-67 negative nuclei. To tackle this problem, we mark all ellipses that overlap the DAB mask as Ki-67 positive (see Fig. 1). The total number of nuclei is simply the number of ellipses in the found configuration \(\hat{\mathbf{x }}\). The estimate of the Ki-67 index is computed as the ratio of the number of positive ellipses to the number of all ellipses.
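The scoring rule amounts to a short counting routine. In the sketch below, ellipses and the DAB mask are represented as sets of pixel coordinates, and an ellipse counts as positive if it shares at least one pixel with the DAB mask. The `classify` helper and its treatment of a value exactly at the cut-off are our assumptions.

```python
def ki67_index(silhouettes, dab_mask):
    """Fraction of ellipses that overlap the DAB mask.

    silhouettes -- list of sets of (row, col) pixels, one per detected ellipse
    dab_mask    -- set of (row, col) pixels classified as DAB-stained
    """
    if not silhouettes:
        raise ValueError("no nuclei detected")
    positive = sum(1 for sil in silhouettes if sil & dab_mask)
    return positive / len(silhouettes)


def classify(index, cutoff):
    """Label a case relative to a cut-off given as a fraction (e.g. 0.14).

    Treating index == cutoff as "high" is an assumption of this sketch.
    """
    return "high" if index >= cutoff else "low"
```

For example, a configuration of four ellipses of which one touches the DAB mask gives an index of 0.25, which exceeds both the 14% and the 20% cut-off points.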

Fig. 3. Input images (Ki-67 positive nuclei have darker color)

Fig. 4. Segmentation results (Ki-67 positive nuclei are marked with a darker color)

3 Results

The proposed approach was applied to estimate the Ki-67 index in 20 test cases of breast cancer. The immunochemical examinations were obtained for 20 patients from the University Hospital in Zielona Góra, Poland. The immunochemical slides were digitized into virtual slides using the Olympus VS120 Virtual Microscopy System. Selected fragments of these slides (size \(500 \times 500\) pixels) were used in the experimental studies (see Fig. 3). Each test image contains between 100 and 400 nuclei. All images were manually marked to obtain the reference results of Ki-67 scoring. The accuracy of the proposed method was compared with that of the ImmunoRatio system.

To compute the Ki-67 index we need segmented nuclei. In our approach, stochastic geometry is responsible for extracting nuclei models in the form of ellipses. The method requires an atlas of predefined nuclei models. In this experiment, we used an atlas comprising 240 ellipses that vary in size and orientation (\(10 \times 4 \times 6\) parameter combinations). Each ellipse is described by 3 parameters: major axis length \(r_{M} \in \{6,\ldots ,15\}\), ratio of minor axis length to major axis length \(r_{R} \in \{0.5, 0.65, 0.8, 1\}\) and orientation \(o \in \{0^{\circ }, 30^{\circ }, 60^{\circ }, 90^{\circ }, 120^{\circ }, 150^{\circ }\}\). The other parameters that must be defined for stochastic geometry are \(\beta \) and \(\gamma \). Experiments showed that the best results were obtained for \(\ln (\beta ) = -600\) and \(\ln (\gamma ) = -700\).
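The atlas can be enumerated directly from the three parameter ranges. The sketch below assumes that \(r_{M}\) takes the ten integer values from 6 to 15, which is consistent with the stated total of 240 ellipses; the names are illustrative.

```python
from itertools import product

MAJOR_AXES = range(6, 16)                 # r_M = 6, 7, ..., 15 (pixels)
AXIS_RATIOS = (0.5, 0.65, 0.8, 1.0)       # r_R = minor / major
ORIENTATIONS = (0, 30, 60, 90, 120, 150)  # o in degrees


def build_atlas():
    """Return every (r_M, r_R, o) combination as a list of tuples."""
    return list(product(MAJOR_AXES, AXIS_RATIOS, ORIENTATIONS))


atlas = build_atlas()
print(len(atlas))  # 240
```

Note that for \(r_{R} = 1\) the six orientations describe the same circle, so the atlas contains some geometrically identical entries.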

Fig. 5. Segmentation results for image no. 10

Table 1. Results of automatic detection methods in comparison with manual segmentation
Table 2. Confusion matrix (cut-off equal to 14%)
Table 3. Confusion matrix (cut-off equal to 20%)

Segmentation results for all images are presented in Fig. 4. An illustrative example of segmentation results for stochastic geometry, ImmunoRatio and manual segmentation is presented in Fig. 5. Based on the segmentation results, we estimated the Ki-67 index for all test samples. Table 1 summarizes the results obtained for stochastic geometry, the ImmunoRatio system and manual segmentation. The St. Gallen consensus defined the cut-off points for classifying Ki-67 proliferative activity at 14% in 2011 and at 20% in 2013. We used these cut-off points to classify the results of stochastic geometry, ImmunoRatio and manual segmentation. Classification results based on automatic segmentation were compared with the reference results using confusion matrices (see Tables 2 and 3). Moreover, we computed Cohen’s kappa coefficient \(\kappa \) to measure the agreement between the automatic and manual approaches. Stochastic geometry showed high agreement with the reference data, with \(\kappa _{14} = 0.74\) and \(\kappa _{20} = 1\) for the 14% and 20% cut-off points, respectively. ImmunoRatio performed worse, with \(\kappa _{14} = 0.47\) and \(\kappa _{20} = 0.8\), respectively.
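For reference, Cohen's kappa can be computed from a confusion matrix as follows. This is a generic sketch of the standard formula, and the matrices used to exercise it should be taken from Tables 2 and 3; any numbers passed in below that are not from those tables are made up for illustration.

```python
def cohens_kappa(matrix):
    """Cohen's kappa for a square confusion matrix.

    matrix[i][j] is the number of cases that the reference assigned to
    class i and the automatic method assigned to class j.
    """
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / total ** 2
    return (observed - expected) / (1.0 - expected)
```

Perfect agreement, for example `cohens_kappa([[10, 0], [0, 10]])`, gives 1.0. The function is undefined when the expected agreement equals 1, which happens when all cases fall into a single class for both raters.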

4 Conclusions

We proposed a novel method based on stochastic geometry for estimating the Ki-67 index. The preliminary experimental results are satisfactory. On the 20 test images, the method agreed with the manual reference more closely than the approach used in the ImmunoRatio system. Unfortunately, the number of test images is relatively small, so the statistical significance of the results is limited. More reliable tests require a large number of ground truth images. However, manual counting is tedious and requires the involvement of highly qualified medical personnel. For this reason, we decided that the reference database will be built continuously and the experimental results will be updated as the database expands.

The proposed segmentation method belongs to a group of methods with a high demand for computing power. However, we managed to implement the algorithm with intensive use of the convolution operation, which allowed us to run the method in parallel on a GPU. Moreover, Eq. 9 shows that, owing to the iterative nature of the proposed segmentation method, only a small part of the prospective ellipses must be evaluated in consecutive steps of the algorithm. Only in the first step do all prospective ellipses have to be evaluated. Thanks to these optimizations, the time needed to process an image containing approximately 250 nuclei was reduced to at most 2 min.
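The text does not spell out how convolution enters the implementation. One plausible reading, sketched below as an assumption, is that the number of mask pixels under an ellipse template at every position is obtained by cross-correlating the mask with the rasterized template, which turns the evaluation of all placements of one atlas entry into a single filtering pass. The pure-Python loop only illustrates the operation; a GPU implementation would use an optimized convolution routine.

```python
def coverage_map(mask, template):
    """Count mask pixels under the template for every placement that fits.

    mask     -- 2D list of 0/1 values
    template -- 2D list of 0/1 values (rasterized ellipse silhouette)
    Returns a 2D list indexed by the position of the template's top-left corner.
    """
    th, tw = len(template), len(template[0])
    out_h = len(mask) - th + 1
    out_w = len(mask[0]) - tw + 1
    return [
        [
            sum(mask[r + i][c + j] * template[i][j]
                for i in range(th) for j in range(tw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]
```

Since the likelihood sum in Eq. 9 depends only on how many foreground and background pixels a new ellipse would cover, such a map (together with the template area) is enough to score every placement in the first step of the search.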

Future work will concentrate on building more sophisticated models of nuclei based on additional information from input images such as edges and textures [11]. This will allow us to segment the overlapping cell nuclei more accurately.