
1 Introduction

Digital pathology is the process of creating high-resolution images from digitized histology slides. It has gained considerable significance owing to the increasing accessibility of whole-slide imaging (WSI) scanners [1]. These digitized images allow various image analysis techniques to be applied for applications such as identification, segmentation and classification. Existing methodologies have demonstrated their ability not only to reduce the labor and difficulty involved in providing accurate quantification, but also to serve as a second opinion that assists pathologists and reduces inter-observer variability [2, 3].

Deep learning is a machine learning paradigm for feature learning, in which an appropriate feature space is extracted exclusively from the data itself. A significant property of deep learning methods is that a model trained on a rich training set generalizes well to unseen data, removing the need for manually engineered features. This makes deep learning well suited to analyzing huge data archives (e.g., TCGA, which includes petabytes of digital tissue slide images).

1.1 Related Work

Various deep learning models have been proposed for cell nuclei segmentation. Song et al. (2014) [4] proposed a CNN-based method for the segmentation of cervical nuclei and cytoplasm: a CNN was applied for nuclei detection, followed by coarse segmentation based on the Sobel edge operator, morphological operations and thresholding. Xing et al. (2016) [5] generated nuclei probability maps by applying a two-class CNN to digitized histopathology images; to handle overlapping nuclei, they constructed a robust shape model (a dictionary of nuclei shapes) and applied a repulsive deformable model at the local level. Kumar et al. (2017) [6] instead proposed a three-class CNN that predicts not only the nuclei and background but also the boundary of each nucleus. This yielded significantly better results than the two-class formulation, but the post-processing step was time consuming. The first FCN for semantic segmentation was presented by Long et al. (2015) [7]. Their results showed that the FCN can achieve state-of-the-art segmentation performance, and its inference step produces the corresponding segmentation mask significantly faster. For nuclei segmentation in histopathology images, Naylor et al. (2017) [8] used an FCN to obtain the nuclei probability map and then applied the watershed method to split touching nuclei, but the nuclei boundaries predicted by this method were not accurate when compared with the ground truth.

Research in deep learning is advancing rapidly, and new architectures are being developed at a remarkable pace. Given the importance of cell nuclei segmentation, a number of approaches have been presented to solve this problem, most of them based on U-Net [9]. U-Net is the most common architecture for medical image segmentation; designed specifically for biomedical images, it won the Cell Tracking Challenge in 2015 [9]. Several U-Net-based approaches have been presented to address nuclei segmentation. Cui et al. (2018) [10] proposed a method, inspired by U-Net, to predict nuclei and their contours simultaneously in H&E-stained images. By predicting the contour of each nucleus and applying a sophisticated weight map in the loss function, they were able to split touching and overlapping nuclei accurately with a simple, parameter-free post-processing step. Caicedo et al. (2019) [11] trained a U-Net model to predict nuclei and their boundaries, weighting the boundary class 10 times more heavily in the loss function. The winning solutions of the 2018 Kaggle Data Science Bowl [12] were built on U-Net and Mask R-CNN. The first-place solution, by [ods.ai] topcoders [13], used a U-Net-style encoder-decoder architecture with encoders initialized from pretrained weights; for post-processing, a combination of watershed and morphological operations was applied. The third-place solution, by the Deep Retina team [14], is based on a single Mask R-CNN model using Matterport's Mask R-CNN [15] as its code base. Kong et al. (2020) [16] used two-stage stacked U-Nets, where stage 1 performs nuclei segmentation and stage 2 tackles the problem of overlapping nuclei. Zhao et al. (2020) [17] used U-Net++, a modification of the U-Net [9] architecture that combines U-Nets of different depths. Pan et al. [18] proposed AS-UNet, an extension of U-Net consisting of three parts: an encoder module, a decoder module and an atrous convolutional module. Their results showed that nuclei could be segmented effectively.

1.2 Nuclei Segmentation

Nuclei segmentation is an important problem because the arrangement of nuclei is correlated with clinical outcome [19] and nuclear morphology plays a vital role in many cancer grading schemes [20, 21]. However, the task faces many challenges and difficulties. Those associated with image acquisition include the presence of noise, background clutter [5] and blurriness [22]; those inherent to the biological data include nucleus occlusion [5], touching or overlapping nuclei [5], variations in shape [22] and texture (differences in chromatin distribution) [10], and differences in nuclear appearance across pathologies [23]; and experimental variations include non-uniform sample preparation [24], differing illumination conditions and the use of different staining methods [24]. The review on segmentation [25] shows that detecting these nuclei is not difficult; the present challenge is accurately finding the borders of nuclei, particularly touching nuclei.

1.3 Dataset

The dataset from the Kaggle 2018 Data Science Bowl (DSB) challenge is used. It includes 871 images with 37,333 manually annotated nuclei. The images represent 31 experiments with 22 cell types, 15 different resolutions and 5 visually indistinguishable groups of images. The dataset comprises 2D light microscopy images acquired with different staining methods, including DAPI, Hoechst and H&E, and cells of different sizes displaying structures from a variety of organs and animal models. Of the 31 experiments, 16 are used for training (670 samples) and first-stage evaluation (65 samples), and 15 for second-stage evaluation (106 samples).

2 Proposed Methods

The methodology employed in the experiment is shown in Fig. 1. It has three steps: image pre-processing, nuclei segmentation and post-processing.

2.1 Image Pre-processing

During data collection, various factors introduce large imaging differences among the images in the dataset, which affect the segmentation results; a pre-processing step is therefore necessary before segmentation. First, since most of the images in the dataset are grayscale and only a few are colored, the color images are converted to grayscale. Second, because the contrast between background and nuclei is low in some images, histogram equalization is applied to better distinguish the nuclei from the background. The images are then filtered with a Gaussian smoothing filter to improve the signal-to-noise ratio. Before training the network, the images are resized and normalized. To counteract overfitting in the CNN, data augmentation is performed using translations, rotations, horizontal/vertical flipping and zoom. A sketch of this pipeline is given below.
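The following is a minimal sketch of this pre-processing pipeline. The Gaussian kernel size, target resolution and augmentation ranges are illustrative assumptions, as the paper does not report these values.

```python
import cv2
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def preprocess(image, target_size=(256, 256)):
    """Grayscale conversion, histogram equalization, Gaussian smoothing,
    resizing and normalization, as described above. Assumes an 8-bit
    input image; kernel size and target size are illustrative."""
    if image.ndim == 3:                          # colored image -> grayscale
        image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    image = cv2.equalizeHist(image)              # raise nuclei/background contrast
    image = cv2.GaussianBlur(image, (3, 3), 0)   # improve signal-to-noise ratio
    image = cv2.resize(image, target_size)
    return image.astype(np.float32) / 255.0      # normalize to [0, 1]

# Augmentation to reduce overfitting: translations, rotations,
# horizontal/vertical flips and zoom (ranges are assumptions).
augmenter = ImageDataGenerator(
    rotation_range=45,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
    vertical_flip=True,
)
```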

2.2 Nuclei Segmentation

The U-Net architecture is chosen to segment the nuclei from the images in the dataset because of its simplicity and effectiveness for image segmentation. U-Net is inspired by the FCN; however, it has more up-sampling layers than the FCN, making it symmetric, as represented in Fig. 1.

Fig. 1. Methodology

The U-Net architecture consists of two paths [9]. The first is the down-sampling, contracting path, known as the encoder. The encoder is composed of convolution and pooling layers, which extract high-level features from the image. In the encoder, the spatial size of the image decreases while the depth increases; the max pooling operations shrink the spatial information while enlarging the receptive field, discarding less important pixels. The encoder thus generates feature maps that are low-resolution representations of the input image. The second is the up-sampling, expanding path, also known as the decoder. This path converts the low-resolution representation back into a high-resolution, pixel-wise segmentation of the original image. At each layer of the expanding path, the height and width of the feature maps are doubled and the depth is halved. At this step, spatial information from the contracting path is re-injected into the expanding path, an operation represented by the horizontal gray arrows in Fig. 1. The sketch below illustrates this structure.
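Below is a minimal Keras Functional API sketch of this encoder-decoder structure, shown with only two resolution levels for brevity. The paper does not report the exact depth, filter counts or input shape, so base_filters and input_shape here are assumptions.

```python
from tensorflow.keras import layers, models

def conv_block(x, filters):
    """Two 3x3 convolutions with ReLU, as in the original U-Net [9]."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_shape=(256, 256, 1), base_filters=16):
    inputs = layers.Input(shape=input_shape)

    # Contracting path (encoder): resolution halves, depth doubles.
    c1 = conv_block(inputs, base_filters)
    p1 = layers.MaxPooling2D(2)(c1)
    c2 = conv_block(p1, base_filters * 2)
    p2 = layers.MaxPooling2D(2)(c2)

    # Bottleneck: low-resolution, high-level feature maps.
    b = conv_block(p2, base_filters * 4)

    # Expanding path (decoder): height/width double, depth halves;
    # skip connections (the gray arrows in Fig. 1) re-inject spatial detail.
    u2 = layers.Conv2DTranspose(base_filters * 2, 2, strides=2, padding="same")(b)
    u2 = layers.concatenate([u2, c2])
    c3 = conv_block(u2, base_filters * 2)
    u1 = layers.Conv2DTranspose(base_filters, 2, strides=2, padding="same")(c3)
    u1 = layers.concatenate([u1, c1])
    c4 = conv_block(u1, base_filters)

    # Sigmoid output gives a per-pixel nuclei probability map.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return models.Model(inputs, outputs)
```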

2.3 Post-processing

After nuclei segmentation, the watershed transform, combined with morphological operations, is used to separate large objects and thereby handle touching/overlapping nuclei, as sketched below.
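The following is a minimal sketch of marker-controlled watershed post-processing on the predicted probability map. The probability threshold, minimum object size and peak distance are illustrative assumptions, not values reported in the paper.

```python
import numpy as np
from scipy import ndimage
from skimage.feature import peak_local_max
from skimage.morphology import remove_small_objects
from skimage.segmentation import watershed

def split_touching_nuclei(prob_map, threshold=0.5, min_size=20):
    """Separate touching nuclei in a predicted probability map using
    morphological cleanup and a marker-controlled watershed."""
    mask = prob_map > threshold
    mask = remove_small_objects(mask, min_size=min_size)   # morphological cleanup
    distance = ndimage.distance_transform_edt(mask)        # distance to background
    coords = peak_local_max(distance, min_distance=5,
                            labels=ndimage.label(mask)[0]) # one peak per nucleus
    markers = np.zeros(distance.shape, dtype=int)
    markers[tuple(coords.T)] = np.arange(1, len(coords) + 1)
    return watershed(-distance, markers, mask=mask)        # one label per nucleus
```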

3 Experimental Results and Discussion

The model is implemented with the Keras Functional API on the TensorFlow framework. Training took 90 min, with each step taking 2 s, on an NVIDIA RTX 2080 Ti.

3.1 Hyperparameters

The model was trained for 50 epochs with a batch size of 16, the Adam optimizer, binary cross-entropy as the loss function, the ReLU activation function at the convolution layers, a sigmoid activation function at the output layer and a learning rate of 1e−5. Table 1 lists the hyperparameters.

Table 1. Hyperparameter setting
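Under these settings, the training configuration can be sketched as follows. X_train and y_train are hypothetical placeholders for the pre-processed images and their binary nuclei masks, and the validation split is an assumption; build_unet refers to the earlier sketch.

```python
from tensorflow.keras.optimizers import Adam

# Hyperparameters from Table 1: Adam at 1e-5, binary cross-entropy,
# 50 epochs, batch size 16.
model = build_unet()
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=50, batch_size=16, validation_split=0.1)
```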

3.2 Evaluation Metrics

The evaluation metrics used were Precision, Recall, F1 score and IoU, calculated as shown in Eqs. (1)–(4). TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives, and $y_t$ and $y_p$ denote the ground-truth and predicted masks, respectively [18].

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}$$
(1)
$$\mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$
(2)
$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
(3)
$$\mathrm{IoU} = \frac{|y_t \cap y_p|}{|y_t| + |y_p| - |y_t \cap y_p|}$$
(4)
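As a concrete illustration, the sketch below computes these pixel-wise metrics from a pair of binary masks; the function name is ours, not from the paper, and it assumes the masks are non-empty.

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    """Pixel-wise Precision, Recall, F1 and IoU from Eqs. (1)-(4),
    given binary ground-truth and predicted masks."""
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    tp = np.logical_and(y_true, y_pred).sum()   # true positives
    fp = np.logical_and(~y_true, y_pred).sum()  # false positives
    fn = np.logical_and(y_true, ~y_pred).sum()  # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)                   # |yt ∩ yp| / |yt ∪ yp|
    return precision, recall, f1, iou
```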
Table 2. Performance comparison of the model over various classical networks

Table 2 compares the evaluation metric values with the state-of-the-art model [18]. The model in [18] was applied to the MOD dataset, a multi-organ dataset of 30 H&E images at 1000 × 1000 resolution with 21,000 manually annotated nuclei, drawn from seven organs: kidney, breast, colon, stomach, prostate, liver and bladder. The comparison shows that the proposed method performs better than the models in [18].

Table 3. Performance of the proposed method

Table 3 shows the results of the proposed method when applied to the Kaggle dataset. Compared with the results on the multi-organ dataset in Table 2, the behavior differs noticeably, showing that the proposed method behaves differently on different datasets; in particular, it performs better on the multi-organ dataset.

3.3 Segmentation Result

The segmentation results of the model are shown in Fig. 2 for a sample of images from the dataset. In Fig. 2, the nucleus positions in the predicted images closely match those in the original images, indicating that the model's behavior is accurate.

Fig. 2. Segmentation result: (a) original image, (b) ground-truth mask, (c) predicted image

4 Conclusion

The dataset used in the experiment is diverse, with varying sizes, shapes and colors, and includes multi-organ data. The method behaves differently when applied to different datasets, and the results show that it is more promising on the multi-organ dataset. Since the proposed method is a semantic segmentation network, touching nuclei would otherwise be recognized as a single object. The model is applied to the first-stage evaluation of the dataset. The experiments show that it clearly segments images with non-touching nuclei, while the problem of touching and/or overlapping nuclei is handled by the post-processing step.