
1 Introduction

Automated nuclei detection and segmentation are well-studied problems in digital pathology and cytology: many methods have been discussed in the related literature, and new methodologies continue to be investigated [8, 13, 19]. Although detection and segmentation of nuclei in cytology images are considered simpler than in histology [8], they remain challenging because of the high variability in breast fine needle aspiration cytology (FNAC) images. This variability arises from the composition of the aspirated sample, the underlying disease condition, the quality and variability of slide preparation, and imperfections in the data acquisition process. The specific challenges are (i) other tissue elements and nuclear cluster topology mimicking the visual and morphological characteristics of nuclei; (ii) closely separated, touching, or overlapping nuclei; (iii) loss of contrast between nuclei and their background; and (iv) slide preparation artifacts such as over- or under-staining of the samples and non-specific staining of the cytoplasm by Hematoxylin (in the commonly used Hematoxylin and Eosin (H&E) staining technique).

This paper presents a multistage process for segmentation of isolated and closely separated nuclei from high magnification breast fine needle aspiration cytology (FNAC) images. The method introduces a new technique for creating a grayscale representation of the chromatic and contrast staining properties of the nuclear material, which enables accurate segmentation of the nuclei. The grayscale representation also reduces the effect that other tissue components mimicking nuclear characteristics have on the detection and segmentation of nuclei. Additionally, it gives the segmentation process the ability to handle the challenging situation of non-specific staining of the cytoplasm by Hematoxylin. The segmentation process comprises the stages of (i) image pre-processing; (ii) pixel transformation to create a grayscale nuclear differential image (NDI); (iii) pre-segmentation of nuclear regions by automatic thresholding [14]; (iv) morphological filtering to suppress regions inconsistent with the morphological properties of isolated nuclei [16]; and (v) refined segmentation of the remaining objects by active contours without edges [1].

The remainder of this paper is organized as follows. Section 2 presents details of the process for segmentation of nuclei. Section 3 presents details of the experimental setup used to validate and benchmark the performance of the segmentation system. Results of the experiments are discussed in Sect. 4, and concluding remarks are presented in Sect. 5.

2 Proposed Segmentation Process

A block diagram of the proposed multi-stage segmentation process is shown in Fig. 1, and the functional steps are described in detail in the following subsections.

Fig. 1.

Block diagram of the multi-stage isolated nuclei segmentation process. The image pre-processing block consists of vignetting correction and auto white balance procedures. The refined segmentation of the pre-segmented nuclei can be performed using various techniques such as Snakes, Level sets, Fast marching, Random walker, etc.

2.1 Image Pre-processing

Most images acquired with digital microscopy systems contain defects caused by the imperfections and limitations of system elements such as the light source, optics, and camera electronics. External factors such as fungal growth or dust accumulation on the optics also affect the quality of the acquired images. Thus, before the nuclei in an image are segmented, the image is conditioned by non-uniform luminance correction [20] and fast auto white balance correction for color constancy [2, 3].

2.2 Creation of the Nuclear Differential Image

This section presents a simple method of NDI creation that combines the visual characteristic of color saturation with stain quantity information. In the H&E staining technique, Hematoxylin binds with nucleic acids and gives the nuclear region its characteristic deep blue-purple color, whereas Eosin stains proteins non-specifically and gives a magenta-red color to the cytoplasm [17]. The differential image creation process is therefore based on the knowledge that the presence of Hematoxylin stain at a point in the smear, together with high color saturation in its image, implies the presence of nuclear material at that point. In many cell samples, using only the Hematoxylin stain-separated image for segmentation cannot provide the desired contrast between the nuclei and their background, especially when the cytoplasm is stained non-specifically. To handle this scenario, we define a nonlinear combination of the quantity of Hematoxylin stain, the quantity of Eosin stain, and the color saturation to assign a gray level value to each pixel in the NDI as

$$\begin{aligned} \text {f}_{D}(i,j) = \min ((\text {f}_{QH}(i,j) -\varepsilon \text {f}_{QE}(i,j)), \text {f}_s(i,j)) \end{aligned}$$
(1)

where \(\text {f}_{QH}(i,j)\), \(\text {f}_{QE}(i,j)\) and \(\text {f}_s(i,j)\) represent the quantity of Hematoxylin stain, the quantity of Eosin stain, and the color saturation at location (i, j), and \(\varepsilon \) is a factor that defines the fraction of the measured Eosin stain quantity to be subtracted from the measured Hematoxylin stain quantity to compensate for non-specific staining of the cytoplasm by Hematoxylin. The values of the stain quantities \(\text {f}_{QH}(\cdot )\), \(\text {f}_{QE}(\cdot )\) and the color saturation \(\text {f}_{s}(\cdot )\) are defined in the range [0, 255].
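As an illustration of Eq. (1), the following minimal NumPy sketch evaluates the NDI pixel by pixel from three precomputed arrays. The function name, the toy input values, and the final clipping of negative values to [0, 255] are assumptions made for this sketch; the text does not state how negative differences are handled.

```python
import numpy as np

def nuclear_differential_image(f_qh, f_qe, f_s, eps=0.0):
    """Evaluate Eq. (1) per pixel: min(f_QH - eps * f_QE, f_s)."""
    f_qh = np.asarray(f_qh, dtype=np.float64)
    f_qe = np.asarray(f_qe, dtype=np.float64)
    f_s = np.asarray(f_s, dtype=np.float64)
    # Subtract a fraction eps of the Eosin quantity from the Hematoxylin quantity.
    compensated = f_qh - eps * f_qe
    # Keep the smaller of the compensated stain quantity and the saturation.
    ndi = np.minimum(compensated, f_s)
    # Assumption: negative values are clipped so the result stays in [0, 255].
    return np.clip(ndi, 0.0, 255.0)

# Toy 2x2 inputs (illustrative values only, not measured data).
f_qh = np.array([[200.0, 120.0], [30.0, 90.0]])
f_qe = np.array([[10.0, 100.0], [20.0, 200.0]])
f_s = np.array([[180.0, 150.0], [40.0, 60.0]])

ndi_plain = nuclear_differential_image(f_qh, f_qe, f_s, eps=0.0)
ndi_comp = nuclear_differential_image(f_qh, f_qe, f_s, eps=0.5)
```

With \(\varepsilon = 0\) the Eosin term drops out and the NDI reduces to the minimum of the Hematoxylin quantity and the saturation; a positive \(\varepsilon \) lowers the response of pixels that carry both stains.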

For a microscopic image \(\mathbf f \) of the stained cytological sample, acquired under known illumination (in this case a white balanced image), the presence of the stains can be quantified by the stain separation method described in [17], which can be written in equation form as

$$\begin{aligned} \mathbf f _{Q}(i,j) = \mathbf D \times \mathbf f _{OD}(i,j) \end{aligned}$$
(2)

where \(\mathbf f _{Q}(i,j)\) is a column vector representing the quantities of the constituent stains, \(\mathbf D \) is the deconvolution matrix for stain separation, and \(\mathbf f _{OD}(i,j)\) is a column vector of the optical densities of the red, green and blue wavelengths at location (i, j).
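A minimal sketch of Eq. (2) follows, assuming Beer-Lambert optical densities computed against a white level of 255 and a deconvolution matrix obtained by inverting a matrix of stain optical density vectors. The numerical stain vectors below are illustrative placeholders of the kind commonly used in color deconvolution, not values taken from this paper, and the rescaling of the resulting quantities to [0, 255] is left out.

```python
import numpy as np

# Assumption: illustrative stain optical-density vectors over (R, G, B).
# Rows: Hematoxylin, Eosin, and a placeholder third component.
_raw = np.array([
    [0.65, 0.70, 0.29],
    [0.07, 0.99, 0.11],
    [0.27, 0.57, 0.78],
])
STAIN_OD = _raw / np.linalg.norm(_raw, axis=1, keepdims=True)

def stain_quantities(rgb, stain_od=STAIN_OD, white=255.0):
    """Return per-pixel stain quantities f_Q = D x f_OD for an (H, W, 3) image."""
    rgb = np.asarray(rgb, dtype=np.float64)
    # Optical density per channel; intensities are clamped to avoid log(0).
    f_od = -np.log10(np.clip(rgb, 1.0, white) / white)
    # Each pixel satisfies f_OD = stain_od.T @ f_Q, hence D = inv(stain_od.T).
    d = np.linalg.inv(stain_od.T)
    # Apply D to the column vector of every pixel at once.
    return f_od @ d.T

# Round trip on one synthetic pixel with known stain quantities.
q_true = np.array([1.0, 0.5, 0.0])
rgb_pixel = 255.0 * 10.0 ** (-(q_true @ STAIN_OD))
q_est = stain_quantities(rgb_pixel.reshape(1, 1, 3))
```

The round trip recovers the quantities used to synthesize the pixel, which is a quick consistency check on the orientation of the deconvolution matrix.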

The NDI thus created represents nuclei as high-intensity regions on a dark background, such that the contrast across the nuclear boundary is larger than in the input color image or its grayscale version. Stain separation methods can already handle the presence of tissue elements such as red blood cells and overlapping cellular components; this method extends that ability to non-specific staining of the cytoplasm. In the system, the value of \(\varepsilon \) can be set heuristically for a tissue sample based on the expected degree of non-specific staining. Illustrative examples comparing results of the differential image generation process are shown in Fig. 2.

Fig. 2.

Comparison of NDIs obtained using different values of \(\varepsilon \) with the input image and its grayscale version. The first example illustrates the advantage of combining Hematoxylin, Eosin and color saturation information over using just the Hematoxylin and color saturation information when non-specific staining of the cytoplasm by Hematoxylin is suspected. The second example illustrates a condition in which the Hematoxylin and color saturation information alone are sufficient to obtain a high contrast NDI. (Color figure online)

2.3 Noise Filtering

The NDI creation process described here uses only pixel-level information and therefore commonly produces noisy local variations. To avoid local minima that can affect the segmentation process, and to aid accurate segmentation, the luminance similarity-aware weighted-local-difference median filter (LAWLDMF) [5] is employed. The filtered NDI is then used for the segmentation of nuclei.

2.4 Pre-segmentation of Nuclei

In the NDI, nuclei show up as bright blobs with a darker background. In this scenario, the nuclei can be easily segmented by use of thresholding techniques. For this purpose, the segmentation system uses Otsu’s automatic threshold selection technique [14].
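The thresholding step can be sketched as follows. This is a plain NumPy rendering of the standard between-class variance criterion, given as an illustration under the assumption of an 8-bit NDI; the function name and the toy image are hypothetical, and it is not the implementation used in the experiments.

```python
import numpy as np

def otsu_threshold(image, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Pixels with value > t are treated as foreground (nuclei).
    """
    values = np.clip(np.asarray(image), 0, levels - 1).astype(np.int64).ravel()
    hist = np.bincount(values, minlength=levels).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # probability of the class <= t
    mu = np.cumsum(prob * np.arange(levels))   # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    # Thresholds that leave one class empty carry no information.
    sigma_b[denom == 0] = 0.0
    return int(np.argmax(sigma_b))

# Toy NDI: a dark background (10) with a bright 5x10 blob (200).
toy_ndi = np.full((10, 10), 10, dtype=np.uint8)
toy_ndi[:5, :] = 200
t = otsu_threshold(toy_ndi)
pre_segmented = toy_ndi > t
```

Because the NDI is designed to be roughly bimodal (bright nuclei, dark background), a single global threshold of this kind is a reasonable first estimate of the nuclear regions.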

2.5 Morphological Filtering of the Connected Components

A pre-segmented image contains multiple connected components representing region masks for the single, touching and overlapping nuclei present in the input image. It may also contain objects that do not belong to real nuclear regions; these can be removed by morphological filtering. Here, only the pre-segmented objects that closely match the size and shape (round/elliptical) of isolated nuclei are retained, using a method similar to the one proposed by Pietka et al. [16]. The rules for filtering and selection of the objects are as follows:

  (i) Remove all connected components on the edge of the image.

  (ii) From the remaining connected components, retain those whose size is within the probable range for isolated nuclei, or that are larger but closely match the shape of isolated nuclei (i.e. retain compact, rounded objects of low eccentricity).

Here, the size of an object is measured in number of pixels, eccentricity is defined as the ratio of the distance between the foci of the fitted ellipse to its major axis length, and compactness is defined as the ratio of the object area (size) to its squared perimeter in pixels.
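The two rules above can be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the helper names are hypothetical, the eccentricity is estimated from the second moments of the pixel coordinates, and the default limits (a 20-pixel margin read as distance from the image edge, an area range of [40, 700] pixels, an eccentricity limit of 0.9) are borrowed from Sect. 3.2 as assumptions. Compactness is omitted because no threshold for it is stated.

```python
import numpy as np

def label_components(mask):
    """Label 8-connected components by iterative flood fill (0 = background)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=np.int32)
    rows, cols = mask.shape
    count = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue
        count += 1
        labels[r0, c0] = count
        stack = [(r0, c0)]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < rows and 0 <= cc < cols
                            and mask[rr, cc] and not labels[rr, cc]):
                        labels[rr, cc] = count
                        stack.append((rr, cc))
    return labels, count

def eccentricity(rr, cc):
    """Focal distance over major-axis length of the moment-matched ellipse."""
    if rr.size < 2:
        return 0.0
    cov = np.cov(np.vstack([rr, cc]).astype(np.float64))
    minor, major = np.linalg.eigvalsh(cov)  # eigenvalues in ascending order
    if major <= 0:
        return 0.0
    return float(np.sqrt(min(1.0, max(0.0, 1.0 - minor / major))))

def filter_components(mask, border=20, area_range=(40, 700), max_ecc=0.9):
    """Keep components away from the image edge that look like isolated nuclei."""
    labels, count = label_components(mask)
    rows, cols = labels.shape
    keep = np.zeros(labels.shape, dtype=bool)
    for k in range(1, count + 1):
        rr, cc = np.nonzero(labels == k)
        # Rule (i): drop components on or near the image edge.
        if (rr.min() < border or cc.min() < border
                or rr.max() >= rows - border or cc.max() >= cols - border):
            continue
        # Rule (ii): plausible size, or larger but sufficiently round.
        area = rr.size
        in_range = area_range[0] <= area <= area_range[1]
        large_round = area > area_range[1] and eccentricity(rr, cc) < max_ecc
        if in_range or large_round:
            keep[rr, cc] = True
    return keep

# Toy mask with five objects (illustrative only).
toy = np.zeros((120, 200), dtype=bool)
toy[40:50, 40:50] = True     # area 100: kept
toy[60:63, 60:63] = True     # area 9: too small
toy[0:10, 100:110] = True    # touches the image edge
toy[80:85, 30:180] = True    # area 750, strongly elongated
toy[25:55, 100:130] = True   # area 900, low eccentricity: kept
kept = filter_components(toy)
```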

2.6 Active Contour Models Based Refined Segmentation

Although the pre-segmentation is close to the actual nuclear regions, it is not accurate and requires further refinement. The segmentation system described here uses the 'active contour without edges', or Chan-Vese level sets, technique [1] to refine the pre-segmentation. The output of the morphological filtering process provides initialization masks, which are then evolved by the Chan-Vese model to obtain the refined segmentation.

Active contour without edges: The level sets model introduced in [1] is an energy minimization approach to segmentation. The model assumes that a grayscale image \(\text {f}\) is formed by two regions of approximately piecewise-constant intensities with distinct values \(\text {f}^{in}\) and \(\text {f}^{out}\). Thus, if \(C_0\) is the boundary of the object of interest, then \(\text {f}\approx \text {f}^{in}\) inside \(C_0\) and \(\text {f}\approx \text {f}^{out}\) outside \(C_0\). The level sets approach then minimizes the fitting function \(\mathbb {E}(c_1, c_2, C)\) defined below to find \(C_0\)

$$\begin{aligned} \mathbb {E}(c_1, c_2, C) =~&\mu \cdot \text {Length}(C) + \nu \cdot \text {Area}(\text {inside}(C))\\ &+ \lambda _1\cdot \int _{\text {inside}(C)}\left| \text {f}(i,j) - c_1 \right| ^2 di~dj\\ &+ \lambda _2\cdot \int _{\text {outside}(C)}\left| \text {f}(i,j) - c_2 \right| ^2 di~dj \end{aligned}$$
(3)

where C is a variable curve that can be initialized arbitrarily or to an initial estimate of the region boundary obtained by pre-segmentation, and \(c_1\) and \(c_2\) are the average intensities inside and outside C, respectively.
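To make the terms of Eq. (3) concrete, the sketch below evaluates a discrete version of the energy for a given binary region mask. It only scores a candidate region and does not evolve the contour. The function name, the toy image, and the approximation of \(\text {Length}(C)\) by the number of 4-neighbor pixel pairs that straddle the boundary are assumptions of this sketch; the default weights are the values reported in Sect. 3.2.

```python
import numpy as np

def chan_vese_energy(f, inside, mu=0.9, nu=0.0, lam1=1.0, lam2=1.0):
    """Discrete Chan-Vese energy of Eq. (3) for a binary 'inside' mask."""
    f = np.asarray(f, dtype=np.float64)
    inside = np.asarray(inside, dtype=bool)
    outside = ~inside
    # c1 and c2 are the mean intensities inside and outside the contour.
    c1 = f[inside].mean() if inside.any() else 0.0
    c2 = f[outside].mean() if outside.any() else 0.0
    # Assumption: contour length approximated by counting neighboring
    # pixel pairs (vertical and horizontal) that lie on opposite sides.
    length = (np.count_nonzero(inside[1:, :] != inside[:-1, :])
              + np.count_nonzero(inside[:, 1:] != inside[:, :-1]))
    area = np.count_nonzero(inside)
    fit_in = np.sum((f[inside] - c1) ** 2)
    fit_out = np.sum((f[outside] - c2) ** 2)
    return mu * length + nu * area + lam1 * fit_in + lam2 * fit_out

# Toy image: a bright 10x10 square (200) on a dark background (20).
image = np.full((20, 20), 20.0)
image[5:15, 5:15] = 200.0

exact = np.zeros((20, 20), dtype=bool)
exact[5:15, 5:15] = True
shifted = np.zeros((20, 20), dtype=bool)
shifted[5:15, 7:17] = True

e_exact = chan_vese_energy(image, exact)
e_shifted = chan_vese_energy(image, shifted)
```

For the mask that matches the square exactly, both fitting terms vanish and only the length term remains; the shifted mask mixes bright and dark pixels on both sides of the contour and therefore has a much higher energy.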

2.7 Post-segmentation Morphological Filtering of the Connected Components

The connected components obtained after refined segmentation process are filtered based on the morphological attributes as defined in Sect. 2.5. Figure 3 shows the processed images at different stages of the complete segmentation process.

Fig. 3.

The images depict different stages of the segmentation process: (a) input image, (b) pre-processed image, (c) NDI, (d) pre-segmented image after morphometric filtering, and (e) nuclear regions mask obtained after the entire segmentation process.

3 Experimental Setup

The experiments in this paper focus on the evaluation of refined segmentation strategies. The segmentation quality of an algorithm can be evaluated by comparing its results with suitably defined ground truth using objective measures, and by visual evaluation by experts. The Chan-Vese level sets model is benchmarked with objective measures against three nucleus segmentation methods commonly used in digital pathology applications: Snakes [10], the Fast marching method [18], and Random walks [6]. This is followed by visual verification by a cytopathologist. The experiments also study the impact of the pre-processing and noise filtering techniques on segmentation quality; these results are presented with the other results in Sect. 4.

3.1 Image Dataset and Ground Truth Segmentation

For development and experimental validation of the image segmentation system, an image dataset was prepared from the slide archives of our institute. The cell samples used for imaging were obtained from routine FNAC performed by an expert cytologist on patients with breast masses. The slides were prepared by the wet fixation and H&E staining methods commonly used for primary diagnosis from FNAC [4]. The slides were imaged using a Leica DM750 microscope with a \(40{\times }\) magnification objective having a numerical aperture of 0.65. The microscope is fitted with a Leica DFC295 camera, attached via a \(0.5{\times }\) optocoupler and housing a \(1024\times 768\) pixel resolution color image sensor. During image acquisition, the camera was programmed to provide RGB coded pixel data without any image pre-processing. Focus and field illumination settings were user defined, with variations within the acceptable range over which images retain their diagnostic value for a human expert. The performance of the segmentation techniques is tested using a set of 21 randomly selected benign/malignant breast FNAC sample images of size \(1024\times 768\) pixels. Since each image contains a large number of nuclei and manual segmentation of all of them is impractical, 213 nuclear regions were manually marked by an expert to create the ground truth for objective evaluation.

3.2 Configuration of the Segmentation System

Pre-processing, NDI generation, noise filtering, and morphological filtering: For auto white balance and LAWLDMF noise filtering, the best settings reported in the relevant literature were applied. For NDI creation, \(\varepsilon \) was set to 0 in these experiments. The morphological filtering step discards connected components within 20 pixels of the image boundary, retains components of any shape with an area in the range [40, 700] pixels, and retains components with an area greater than 700 pixels only if their eccentricity is less than 0.9.

Refined segmentation algorithms: Each of the compared refined segmentation methods has multiple configuration parameters that can be tuned to obtain the best performance on an individual test image. However, such per-image tuning is impractical in real-world use. Thus, in this study, each method uses a single set of parameters (specific to that technique) for the entire dataset, and aggregate values of the objective measures are compared. Our own implementations of Snakes and Chan-Vese level sets were used, along with the publicly available segmentation program for the Random walks technique [7]. The fixed settings of each algorithm were chosen to produce the visually and objectively most accurate segmentation results on the dataset. To this end, the behavior of the compared methods was studied over the following parameters, and the setting that achieved the best F-score (averaged over the dataset) was selected: (i) the number of iterations for Snakes and Chan-Vese level sets (4 to 104 iterations at intervals of 4, 26 experiments each), with the other parameters held constant (\(\mu = 0.9\), \(\nu = 0\), \(\lambda _1 = \lambda _2 = 1\)); (ii) the threshold value used to compute the gradient difference weight for the Fast marching method (varied from 4 to 24 at intervals of 4, 6 experiments); and (iii) the weighting parameter used in the Random walks method (varied from 10 to 100 at intervals of 10, 10 experiments).
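The parameter grids and the selection rule described above can be written down compactly as follows. The selection helper and the scoring function are hypothetical placeholders for this sketch: in the actual study the score would be the F-score averaged over the dataset, whereas the toy scorer here is synthetic and does not represent measured results.

```python
# Parameter grids as described in the text (end points inclusive).
iteration_grid = list(range(4, 105, 4))     # Snakes and Chan-Vese: 4, 8, ..., 104
fm_threshold_grid = list(range(4, 25, 4))   # Fast marching: 4, 8, ..., 24
rw_weight_grid = list(range(10, 101, 10))   # Random walks: 10, 20, ..., 100

def select_best(grid, mean_f_score):
    """Return the grid value with the highest dataset-averaged F-score."""
    return max(grid, key=mean_f_score)

def toy_score(n_iterations):
    """Synthetic stand-in for a measured mean F-score (NOT real data)."""
    return 1.0 - abs(n_iterations - 40) / 100.0

best_iterations = select_best(iteration_grid, toy_score)
```

The grid lengths (26, 6 and 10) match the number of experiments quoted for each method.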

3.3 Objective Measures for Performance Evaluation

If marked ground truth is available for a connected component segmented by a technique, the two regions are compared to compute the precision [12, 15], recall [12, 15], F-Score, and Jaccard similarity coefficient [9, 11].
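A minimal sketch of these measures for a pair of binary masks is given below, assuming the usual pixel-wise definitions in terms of true positives, false positives and false negatives; the function name and the toy masks are illustrative.

```python
import numpy as np

def segmentation_scores(pred, truth):
    """Pixel-wise precision, recall, F-score and Jaccard coefficient."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # correctly segmented pixels
    fp = np.count_nonzero(pred & ~truth)   # over-segmented pixels
    fn = np.count_nonzero(~pred & truth)   # under-segmented pixels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2.0 * precision * recall / denom if denom else 0.0
    jaccard = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall,
            "f_score": f_score, "jaccard": jaccard}

# Toy example: a 6x6 ground truth region and a prediction shifted by 2 columns.
truth = np.zeros((10, 10), dtype=bool)
truth[2:8, 2:8] = True
pred = np.zeros((10, 10), dtype=bool)
pred[2:8, 4:10] = True
scores = segmentation_scores(pred, truth)
```

In this reading, low precision corresponds to over-segmentation (pixels segmented outside the ground truth) and low recall to under-segmentation (ground truth pixels that were missed).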

4 Results and Discussion

Figures 4 and 5 show the F-Score and Jaccard coefficient (J), respectively, obtained over the parameter sets described in Sect. 3.2. The tuning curves describe the behavior of the algorithms as their input parameters change and can be used to determine appropriate parameter values. The F-Score and Jaccard coefficient curves of the Chan-Vese level sets and Snakes show an almost monotonic increase in segmentation accuracy over the initial increments in the number of iterations and saturate after that. The curves of the Fast marching and Random walks methods show very little change as their parameters vary. The best performance was obtained at 76 iterations for the Chan-Vese level sets algorithm, at 104 iterations for Snakes, at a gray level difference threshold of 12 for the Fast marching method, and at a weighting parameter value of 90 for the Random walks method. The figures do not show an experiment-wise comparison between the various configurations of the compared methods.

Objective measure results for the configurations leading to the best performance of the compared methods are given in Table 1. It can be observed that the Snakes technique performs worst with respect to both over-segmentation (precision) and under-segmentation (recall). Comparison of the Chan-Vese level sets and Random walks methods reveals that the latter has a lower rate of over-segmentation than the former, whereas the under-segmentation rates show the opposite behavior. Overall, the Chan-Vese level sets method has the most balanced performance and the highest F-Score among the compared methods. The same behavior is observed for the Jaccard coefficient.

Fig. 4.

Plot for average F-score for different configurations of the compared refined segmentation algorithms.

Fig. 5.

Plot for average Jaccard coefficient for different configurations of the compared refined segmentation algorithms

Table 1. Accuracy for the compared fine-segmentation techniques with respective optimal configuration parameters

Figure 6 shows segmentation results of the compared algorithms on multiple nuclear regions from three sample images of the test dataset for which ground truth is available. The settings used for each segmentation technique correspond to the configurations leading to the highest F-Score values. The images also compare the obtained segmentations with the ground truth. Consistent with the objective measures, the Snakes and Fast marching techniques show a greater tendency to both over- and under-segment than the Chan-Vese level sets and Random walks methods. Although the Random walks method performs comparably to the Chan-Vese method, it commonly produces a jagged nuclear boundary, which can affect the performance of feature extraction techniques that quantify the state of the nuclear membrane. The Chan-Vese level sets method, on the other hand, includes the boundary length (\(\text {Length}\)) term in the energy minimization function, which results in a smoother region boundary; this is desirable at least for small lengths.

Results for segmentation of the nuclei by the level sets method in various breast FNAC conditions are shown in Fig. 7, where the boundaries of the segmented nuclei are overlaid on the input images. The images highlight the performance of the proposed segmentation system on various types of cell samples, including benign and malignant conditions and difficult-to-segment cell clustering and cytoplasm conditions. The results of the segmentation technique have been visually verified by an expert cytopathologist.

Fig. 6.

Visualizations of the outputs of the compared methods, highlighting over- and under-segmentation with respect to the ground truth. The inputs and corresponding outputs are shown in shaded rectangular boxes, arranged column-wise from left to right as input, ground truth, and output visualizations for the compared methods. Cropped regions of three test images are shown in the first column. The ground truth segmentation of the nuclear regions is depicted in the second column: the top image shows the inside boundary as a colored closed contour overlaid on the input image, and the bottom image shows the corresponding binary nuclear region mask. In the ground truth mask, non-black pixels belong to a nuclear region and black pixels belong to the background. For each compared method, the top image of its results column shows the over-segmented and under-segmented pixels of the segmentation mask, each marked in its own color, overlaid on the input image; the bottom image shows the corresponding segmentation mask with the same color coding. (Color figure online)

Fig. 7.

Segmentation of nuclei by the level sets method in various breast FNAC conditions. The inside boundaries of the segmented nuclei are overlaid in green on the input images. Images (a) and (b) are high magnification images of benign samples: image (a) shows a sheet of benign-looking nuclei, and image (b) shows the segmentation performance in the presence of debris on the slide. Images (c) and (d) correspond to malignant samples with scattered nuclei of variable size and shape; image (d) shows the segmentation performance in a sample with abundant eosinophilic cytoplasm. (Color figure online)

Table 2. Accuracy for the compared fine-segmentation techniques without pre-processing of the images

4.1 Impact of Pre-processing and Noise Filtering Techniques

To study the impact of the pre-processing and noise filtering techniques on segmentation quality, these steps were bypassed and the segmentation accuracy was recorded for the algorithm configurations that gave the best performance with the steps enabled. The aggregate results over the dataset are presented in Table 2. All the compared methods except the Fast marching method show a degradation in segmentation accuracy and an increased variance in the objective measures. The Chan-Vese level sets method remains the best performing segmentation algorithm.

5 Conclusions

This paper presented a two-stage segmentation process (pre-segmentation followed by refined segmentation) for nuclei in high magnification microscopy images of H&E stained breast FNAC samples. The system integrates image pre-processing and segmentation techniques to achieve the high segmentation accuracy desired for application in CAD systems. Although the segmentation process can use various segmentation techniques, the Chan-Vese level sets method provides the most balanced performance among the compared methods, with the Random walks method a close second. Owing to the inclusion of the \(\text {Length}\) term in the energy minimization function, the Chan-Vese level sets method provides a smoother region boundary than the Random walks method and is more suitable for the estimation of nuclear morphometric features. The pre-processing steps used here contribute to improved accuracy and consistency across the tested refined segmentation techniques. Beyond the complete integration of the pre-processing and segmentation techniques, the novelty of the system lies in combining image color properties with the Hematoxylin and Eosin stain-separated images to synthesize the NDI and using it for accurate segmentation of the nuclei. The NDI reduces nucleus segmentation to the simpler problem of separating bright, high-intensity regions from a dark background. The use of the NDI also gives the system the ability to handle the presence of tissue elements such as red blood cells, voluminous eosinophilic cytoplasm covering nuclei, and other common debris, and further augments its capability to handle non-specific staining of the cytoplasm by Hematoxylin.