Introduction

Histopathology is one of the most reliable methods for diagnosing cancer, and early detection of the disease is a pressing need. In a histopathology image, hematoxylin stains the cell nuclei, while eosin stains the cytoplasm and other connective tissue with a pink shade. According to the pathologist report [1], the grade of cancer is determined by tissue morphology, while the stage of cancer in the body is decided by tumor size, location, and spread. Most diagnostic information, such as the stage and grade of cancer, the mitotic rate, and the lymph node status of the cancerous tissue, is obtained by analyzing the morphology, location, and spread of the hematoxylin-stained nuclei. The traditional way to diagnose cancer is for pathologists to examine the morphological structure and distribution of nuclei manually: after an image of the sample tissue is acquired, pathologists decide whether tissue regions are cancerous and assess the malignancy level. Because of the complexity of histopathology images, such manual analysis demands considerable time and effort.

Preparation of histopathology slides is the first stage of the automatic segmentation pipeline and involves tissue collection, fixation, embedding, sectioning, staining, and visualization. Collecting samples from the human body and preserving them with a fixation material is the primary step in slide preparation. The tissue is then implanted in blocks (embedding), and these blocks are cut into thin sections, since a thinner section is usually required for diagnosis (sectioning). Color is added to the thin tissue sections to make the nuclei more distinct (staining), and finally the stained slide is examined under a microscope or digitized by a whole-slide imaging (WSI) scanner; many pathologists prefer the microscope because it provides faster focusing and scanning. As a segmentation task, our focus is to separate the hematoxylin-stained nuclei from the eosin-stained cytoplasm and other connective tissue. Nuclei segmentation approaches can be categorized into conventional techniques and CNN-based deep learning techniques. Conventional image segmentation includes discontinuity-based approaches, similarity-based approaches, global clustering, superpixel segmentation, watershed segmentation, active contour techniques, etc. A summary of conventional image segmentation methods is presented in Table 1.

Table 1 Conventional (non-deep-learning) image segmentation methods

These methods have performed well in biomedical applications, but they struggle with complex histology images, where conventional techniques often result in under-segmentation or over-segmentation. To perform the segmentation task, we collected three datasets: a kidney cancer histopathology image dataset, the Triple-Negative Breast Cancer (TNBC) dataset, and the Multi-Organ Nuclei Segmentation (MoNuSeg) dataset. In this study, we use a deep learning framework in which features are learned automatically by the model, which then predicts the segmented image. Deep learning architectures learn model parameters that separate the nuclei in histology images. A deep learning framework is driven by large amounts of data, an activation function, an effective optimization algorithm, and a powerful loss function. The activation function maps the input to a meaningful value, and the optimizer is responsible for the fast convergence of the model parameters. The loss function measures the difference between the true and predicted values in regression problems, and between the true and predicted probability maps in classification problems. Many recent studies report that deep learning-based segmentation models provide better accuracy than traditional hand-crafted feature extraction methods. Although deep learning algorithms have proven their potential, the automatic segmentation of H&E-stained histopathology images still faces the following challenges:

  1. It is difficult to deal with the clumped morphological appearance of nuclei in histopathology images, as a single histopathological slide may contain thousands of nuclei.

  2. A sufficient amount of high-quality labeled data, verified by an experienced medical practitioner, is required.

  3. Computational cost is an important issue in deep learning models.

To address the challenges associated with nuclei segmentation from histopathological images, our contributions are as follows:

  1. A highly accurate deep learning architecture for nuclei segmentation from H&E-stained histopathology images.

  2. To enrich features at five distinct stages, the proposed model makes effective use of residual connections throughout the network and an atrous spatial pyramid pooling (ASPP) layer at the bottleneck.

  3. Compared to benchmark segmentation models composed of deep and thin paths, the proposed network is wide and deep, effectively leveraging the strengths of both residual learning and encoder–decoder architectures.

  4. The proposed model is carefully evaluated on three publicly available histopathology datasets and outperforms five state-of-the-art deep learning models in terms of quality metrics.

The rest of the manuscript is organized as follows. The related work section provides a concise review of the most recent benchmark approaches. A detailed mathematical analysis of the proposed model is presented in the proposed architecture section. The experimentation process is described in the implementation and training section. Experimental results and comparisons are presented in the results section. Conclusions and future work are given in the conclusion section.

Related work

Deep learning techniques are a good option whenever sufficient data are available. For a segmentation task, recovering the relevant information that is lost in pooling operations as the network goes deeper is of prime importance. UNet [11] is a landmark contribution in the field of biomedical image segmentation: its encoder–decoder structure captures both the context and the location of objects in the image, and it achieves very good prediction results across different segmentation applications. For gland segmentation, Hao Chen et al. [12] proposed a deep contour-aware network (DCAN) that is able to separate overlapping objects; DCAN pools multilevel features from its hierarchical architecture and fuses them together, which improves the model's performance, and its contour-based design yields better visualization of object boundaries. Since pooling operations lose some relevant information during segmentation, Fisher Yu et al. [13] merge multiscale information by aggregating the outputs of different dilation rates to regain resolution; the resulting architecture is well suited to dense prediction because of its expanded receptive field. SegNet [14], designed for semantic segmentation of road and indoor scenes, has encoder, decoder, and pixel-wise classification stages; reusing max-pooling indices on the decoder side to upsample the low-resolution feature maps preserves exact boundary information, which is essential for road and indoor scene understanding. [15] proposed a fully convolutional model designed for semantic segmentation; it is inspired by AlexNet, a classification model, and extends it to the segmentation task by performing both classification and localization. The method of [16] is fast and capable of processing large batches of data in a reasonable amount of time; their deep learning architecture effectively combines the concepts of ResNet and UNet. Recognition of close and overlapping nuclei has been a major concern; this issue was addressed in [17] by predicting an eroded version of the annotated nuclei and formulating a global loss function for the segmentation of overlapping nuclei. To increase segmentation accuracy, [18] made a meaningful modification to the UNet architecture by introducing an attention gate that merges only relevant features before concatenation; notably, the attention module can be integrated with any other segmentation model without much computational overhead. Different dilation rates are largely equivalent to different sparse kernels without extra computational cost, an idea used effectively in [19] to enlarge the field of view. Convolution strategies thus have a large impact on the efficiency of the system.

The idea of dimension-wise convolution was implemented by [26] to improve performance on segmentation and classification tasks. To extract semantic features of nuclei at different stages, [22] incorporated an improved version of atrous spatial pyramid pooling in an encoder–decoder network, used a concave point detection method to segment touching nuclei, and achieved better performance; their ASPPU module, combined with binary cross-entropy loss and dice loss, better handles class imbalance problems. A recent concept by [23] is UNet++ for medical image segmentation: UNet++ modifies the skip connections to provide flexibility in fusing multilayered features in the decoder path, and the network shares a common encoder across varying depths. Liver histopathology images contain thousands of nuclei, many of which overlap; [24] therefore incorporated residual and attention mechanisms into an encoder–decoder architecture, with residual blocks to recover meaningful information, attention blocks for correct localization of objects in the decoder section, and a bottleneck layer for maximal extraction of deeper-layer features. From the existing work above, it is clear that segmentation of nuclei from histopathological images remains a challenging problem. A brief comparison of existing deep learning approaches is presented in Table 2. The novelty of the proposed model lies in the manner in which we extend the ideas of UNet [11] and ASPPU-Net [22] so that the proposed network is efficient and capable of extracting intermediate features.

Table 2 Deep learning image segmentation methods

Proposed architecture

In a deep learning framework, the segmentation strategy, as in [11, 18, 24], involves an encoder path, where the model learns the 'what' information (context) in the image, and a decoder path, where the model learns the 'where' information (location). The encoder path consists of repeated applications of standard convolution layers plus an additional path parallel to the main network, realized through residual connections, to minimize the information loss caused by pooling. Each stage of the downsampling path has a (\(2\times 2\)) max-pooling layer that reduces the spatial size of the image. Features from the same level are concatenated with the corresponding upsampling layers; in the upsampling path, features of different shapes and sizes are cropped and merged into the next layer. Our wide network uses a powerful decoder that aggregates same-level spatial features and collects the maximum possible residual information. At the final stage, a (\(1\times 1\)) convolution maps the (\(512\times 512\times 16\)) feature map to (\(512\times 512\times 1\)). To improve the performance of the network shown in Fig. 1, we introduce an additional information retrieval module, used in [22] and called the ASPPU bottleneck path, in which a CNN applies multiple dilation rates to extract more relevant features. The dilation rate is an additional parameter that enlarges the area visualized by the resulting feature map. Different dilation rates are applied to the same layer and the results are concatenated, which allows the model to take advantage of multiscale feature extraction. This yields very good results because hierarchical information of varying sizes can be identified in the same layer.
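As an illustration, this multiscale fusion at the bottleneck can be sketched in a few lines of Keras. The filter count, dilation rates, and input shape below are illustrative assumptions, not the exact configuration of the proposed network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp_block(x, filters=256, rates=(1, 2, 4)):
    """ASPP-style bottleneck: parallel dilated convolutions over the same
    input, concatenated so the layer sees several receptive fields at once."""
    branches = [
        layers.Conv2D(filters, 3, padding="same", dilation_rate=r,
                      activation="relu")(x)
        for r in rates
    ]
    y = layers.Concatenate()(branches)
    # a 1x1 convolution fuses the multiscale branches back to `filters` maps
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(y)

# Example: bottleneck feature map of an encoder (shape is illustrative)
inp = layers.Input(shape=(32, 32, 256))
out = aspp_block(inp)
```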

Fig. 1
figure 1

Proposed high-resolution deep transferred ASPPU-Net

Standard convolution layer

Convolution layers consist of a set of learnable parameters. Our input image has three color channels (RGB) and a dimension of \(512\times 512\times 3\), on which we apply a filter of size (\(3\times 3\)). For a standard 2D convolution, if x(p,q) is the input feature map and h(p,q) is the filter kernel, then the output y(p,q) is expressed by Eq. (1).

$$\begin{aligned} y(p,q)=\sum _{j=-\infty }^{\infty }\sum _{k=-\infty }^{\infty }x(j,k)h(p-j,q-k). \end{aligned}$$
(1)

When we apply a kernel of \(K\times K\) to an image of \(N\times N\) with padding P and stride S, the output size is given by Eq. (2).

$$\begin{aligned} N\times N \rightarrow \left( \frac{N-K+2P}{S}+1 \right) \left( \frac{N-K+2P}{S}+1 \right) . \end{aligned}$$
(2)

Without padding and with a unit stride, the relation between the input and output image sizes is given by Eq. (3).

$$\begin{aligned} N\times N \rightarrow \left( N-K+1 \right) \left( N-K+1 \right) . \end{aligned}$$
(3)
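As a quick check of Eqs. (2) and (3), the output size can be computed directly; the snippet below is a minimal illustration:

```python
def conv_output_size(n, k, p=0, s=1):
    # Eq. (2): spatial size after a KxK convolution on an NxN input
    # with padding P and stride S
    return (n - k + 2 * p) // s + 1

# Eq. (3): no padding, unit stride -> N - K + 1
print(conv_output_size(512, 3))        # 510
# "same" padding (P = 1 for K = 3) preserves the spatial size
print(conv_output_size(512, 3, p=1))   # 512
```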

High-resolution layer

The proposed architecture addresses the degradation of information in deeper networks by introducing a deep and wide residual network that is easier to train and optimize. The residual connection is realized by creating an additional path, parallel to the main encoder–decoder path of the network, that restores the flow of information in the deep network. Instead of deep and thin encoder and decoder paths, our network has wide and deep paths that effectively leverage the strengths of both residual learning and encoder–decoder architectures. The proposed architecture has a high-resolution encoder (Fig. 2a), an ASPP bottleneck path for multilevel feature extraction (Fig. 2b), and an effective decoder (Fig. 2c).
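A minimal Keras sketch of such a residual block is given below; the layer ordering and filter handling are illustrative assumptions rather than the exact design of the proposed encoder:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convolutions with an identity shortcut: the parallel skip
    path restores information the main path may lose."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:
        # 1x1 projection so the shapes match before addition
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)
```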

Fig. 2
figure 2

High-resolution encoder path, ASPP bottleneck path, and decoder path

Activation function

A linear function has limited capability to model complex inputs. Since the outcome of the convolution layer is a linear operation, a nonlinear operation is needed to map complex inputs to meaningful outputs. A nonlinear activation function allows the network to learn complex data, approximate almost any function, and provide accurate predictions; it also helps the model generalize to a variety of data and differentiate outputs. The Rectified Linear Unit (ReLU) is the most popular activation function in deep learning models. ReLU resolves several problems faced by the sigmoid and tanh activation functions and mitigates the vanishing gradient problem. The ReLU activation function simply outputs zero for any negative input, but for any positive value it returns that value, like a linear function. It is computationally economical compared to sigmoid and tanh. Mathematically, the ReLU activation function and its derivative are expressed by Eqs. (4) and (5).

$$\begin{aligned} f(x)= & {} {\left\{ \begin{array}{ll} 0 &{} \text { if } x< 0 \\ x &{} \text { if } x\ge 0 \end{array}\right. } \end{aligned}$$
(4)
$$\begin{aligned} f'(x)= & {} {\left\{ \begin{array}{ll} 0 &{} \text { if } x< 0 \\ 1 &{} \text { if } x\ge 0 \end{array}\right. }. \end{aligned}$$
(5)
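A minimal NumPy illustration of Eqs. (4) and (5):

```python
import numpy as np

def relu(x):
    # Eq. (4): zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Eq. (5): derivative is 0 for x < 0 and 1 for x >= 0
    return (x >= 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 1. 1. 1.]
```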

Pooling layer

Pooling makes the model invariant to location, scale, and rotation, and acts as an additional layer. Within each kernel window, only the highest value, which typically corresponds to an object region, is propagated to the next layer, which makes the detection task easier. Applying a kernel of size K = (\(2\times 2\)) with stride \(S=2\) to a (\(4\times 4\)) two-dimensional image yields the feature map shown in Fig. 3.
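A minimal NumPy sketch of this operation (the input values below are illustrative, not those of Fig. 3):

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max-pooling with stride 2: each output pixel keeps the
    largest value in its 2x2 window."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 0],
                [1, 8, 3, 4]])
print(max_pool_2x2(img))
# [[6 4]
#  [8 9]]
```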

Fig. 3
figure 3

Max-pooling with kernel \(K=2\) and stride \(S=2\)

Batch normalization

Because many operations occur between layers, a slight change in the input of a deep network can lead to large changes at later layers. The distribution of each layer's input changes during training, since the network weights keep changing; if the weights change randomly in successive iterations, it becomes difficult for the network to adjust to each layer's input. Batch normalization [27] ensures that the distribution does not change too much. During training, the gradient of the loss plays an important role in convergence. A visualization of internal covariate shift in a deep network is shown in Fig. 4. Batch normalization can be thought of as an additional layer, and it works well in very deep networks. It has the following advantages: (a) faster convergence, (b) works as a regularizer, (c) avoids internal covariate shift, and (d) enables training of deeper networks.

Fig. 4
figure 4

Internal covariate shift

Let \(T_{1}\), \(T_{2}\) be two transformations at a particular layer L, as shown in Eq. (6). The two layers are characterized by weights \(W_{1}\) and \(W_{2}\). If \(W_{1}\) changes, \(T_{1}\) changes and the input to \(T_{2}\) changes; if \(W_{2}\) changes, L changes. If these changes are random and large, deep neural networks have convergence problems.

$$\begin{aligned}&L=T_{2}\left( T_{1}\left( U,W_{1} \right) ,W_{2} \right) \end{aligned}$$
(6)
$$\begin{aligned}&\mathrm{Batch}\, \mathrm{Norm}\left( y_{o} \right) ^{\left( m \right) }= \beta _{1}\left( \frac{\left( y_{o} \right) ^{\left( m \right) }-\mu \left( y_{0} \right) }{\sigma \left( y_{0} \right) } \right) +\beta _{2} \end{aligned}$$
(7)

Batch normalization normalizes each activation independently by controlling the mean and standard deviation of the layer's output. The process is expressed in Eq. (7), where \(y_{o}^{(m)}\) is the value of the output \(y_{o}\) for the mth input of a batch, and \(\beta _{1}, \beta _{2}\) are trainable parameters.
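A minimal NumPy sketch of Eq. (7); the small constant eps is an assumption added for numerical stability:

```python
import numpy as np

def batch_norm(y, beta1=1.0, beta2=0.0, eps=1e-5):
    # Eq. (7): normalize each activation over the batch, then apply the
    # trainable scale (beta1) and shift (beta2)
    mu = y.mean(axis=0)
    sigma = y.std(axis=0)
    return beta1 * (y - mu) / (sigma + eps) + beta2

batch = np.random.randn(32, 16) * 5.0 + 3.0   # 32 samples, 16 activations
out = batch_norm(batch)
print(out.mean(), out.std())                   # ~0 and ~1
```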

Fig. 5
figure 5

Visualization of the receptive field with multiple dilation rates

Internal covariate shift

Santurkar et al. [28] analyzed the effect of internal covariate shift, questioning whether reducing it is the main reason batch normalization speeds up training. Their experiments also suggest that, for larger learning rates, training may not converge without batch normalization. The impact of internal covariate shift on optimization is measured at the \(k\)th layer of an n-layer network with parameters \(W_{1:n}\) and \(W'_{1:n}\) as \(\left\| \nabla _{W_{k}}\pounds \left( W_{1:n} \right) -\nabla _{W_{k}}\pounds \left( W'_{1:k-1},W_{k:n} \right) \right\| \). The analysis relies on the following properties of convex functions. Let D be a convex subset of \(\mathbb {R}^{n}\), \(X,Y\in D\), and let \(f:D\mapsto \mathbb {R}\) be a convex function; then

  1. f is L-Lipschitz if \(\left| f\left( Y \right) -f\left( X \right) \right| \le L\left\| Y-X\right\| \;\;\; \forall X,Y\in D\);

  2. f is \(\beta \)-smooth if its gradient is \(\beta \)-Lipschitz, i.e., \(\left\| \nabla _{X}f-\nabla _{Y}f \right\| \le \beta \left\| X-Y\right\| \;\;\; \forall X,Y\in D\).

Dilated convolution layer

The mathematical expression of a 2-D dilated convolution is given by Eq. (8).

$$\begin{aligned} y(p,q)=\sum _{i=1}^{P}\sum _{j=1}^{Q}x(p+r\cdot i,q+r\cdot j)f(i,j) \end{aligned}$$
(8)

y(p,q) is the output of the dilated convolution, x(p,q) is the input feature map, f(i,j) is the filter kernel of length P and width Q, and r is the dilation rate (\(r>1\)).

Dilated convolution, or atrous convolution, is a simple way to increase the field of view by placing spaces between the elements of the kernel. The rate of dilation is controlled by the parameter r: a dilation rate r inserts (\(r-1\)) spaces between the elements of the kernel, and r equal to 1 corresponds to no spaces, i.e., standard convolution. A kernel of size K dilated by a factor r has the effective size given in Eq. (9).

$$\begin{aligned} \hat{K}= K+\left( K-1 \right) \left( r-1 \right) \end{aligned}$$
(9)

Dilated convolution is similar to standard convolution applied to the input with different gaps.

Visualization of the receptive field with different dilation rates is shown in Fig. 5. Applying dilation rate 1 to the input 2D image (Fig. 5a) is equivalent to a standard convolution with a \(3\times 3\) kernel. Dilation rate 2 produces a receptive field of \(7\times 7\) by skipping one pixel (Fig. 5b), and dilation rate 4 produces a receptive field of \(15\times 15\) by skipping three pixels (Fig. 5c). The aim of this work is to extract more relevant features by applying different dilation rates in the same layer. In this architecture, the maximum amount of relevant information is retrieved by applying three different dilation rates after each pooling layer. Dilated convolution [29] is a generalization of standard convolution that allows us to control the resolution of features computed by a deep CNN in order to capture multi-level features.
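The effective kernel size of Eq. (9) and the stacked receptive fields of Fig. 5 can be verified with a short snippet:

```python
def effective_kernel(k, r):
    # Eq. (9): a KxK kernel dilated by rate r spans K + (K-1)(r-1) pixels
    return k + (k - 1) * (r - 1)

# Receptive field of stacked 3x3 layers with rates 1, 2, 4 (cf. Fig. 5)
rf = 1
for r in (1, 2, 4):
    rf += effective_kernel(3, r) - 1
    print(f"rate {r}: effective kernel {effective_kernel(3, r)}, "
          f"receptive field {rf}x{rf}")
# rate 1: effective kernel 3, receptive field 3x3
# rate 2: effective kernel 5, receptive field 7x7
# rate 4: effective kernel 9, receptive field 15x15
```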

Table 3 Experimental performance comparison of different architectures with three datasets

Implementation and training

The experiment was implemented in Python with the Keras and TensorFlow frameworks; Google Colab notebooks with GPU support were the resources used to conduct the experiments. We used three histopathological datasets from the literature [17, 30, 31]. (a) Kidney dataset: this dataset of \(400\times 400\)-pixel images, used in [29], consists of 730 H&E renal cell carcinoma (RCC) histology images created from 10 WSIs of TCGA. (b) Triple-Negative Breast Cancer (TNBC) dataset: the TNBC dataset [30] consists of 33 H&E-stained breast tissue images of dimension \(512\times 512\) pixels, collected from seven different patients. (c) MoNuSeg dataset: this dataset, first used in [31], is composed of 30 H&E-stained histology images of size \(1000\times 1000\) pixels from seven organs. We partitioned the data into training, validation, and test sets: after creating patches, we used 80% of the images for training and the remaining 20% for validation and testing. We applied data augmentation, namely horizontal and vertical flips, to the training samples of the TNBC and MoNuSeg datasets. Adam [32] was the optimization method, and binary cross-entropy, as used in [17], was the loss function in this study. The reported results of all deep learning models are averages of three independently conducted trials with randomly initialized weights in each trial. The final quality metrics were computed as the average over all images in the test dataset.
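A minimal sketch of this training setup is shown below; build_model, the dataset tensors, the batch size, and the number of epochs are placeholders/assumptions, while the optimizer, loss, and flip augmentation follow the description above:

```python
import tensorflow as tf

def augment(image, mask):
    # Horizontal and vertical flips applied identically to image and mask,
    # as described for the TNBC and MoNuSeg training sets
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_left_right(image), tf.image.flip_left_right(mask)
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_up_down(image), tf.image.flip_up_down(mask)
    return image, mask

# train_images/train_masks and val_ds are placeholders for the patched data
train_ds = (tf.data.Dataset.from_tensor_slices((train_images, train_masks))
            .shuffle(512)
            .map(augment)
            .batch(8))                                # batch size: assumption

model = build_model()                                 # hypothetical constructor
model.compile(optimizer=tf.keras.optimizers.Adam(),  # Adam [32]
              loss="binary_crossentropy")            # loss as in [17]
model.fit(train_ds, validation_data=val_ds, epochs=100)  # epochs: assumption
```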

Table 4 Network computational complexity in millions

Performance metrics

Jaccard coefficient It is a commonly used measure of the overlap between two sets, quantifying the similarity or dissimilarity of binary data. The Jaccard coefficient is well suited to evaluating the performance of nuclei segmentation from histopathology images in a deep learning framework.

$$\begin{aligned} \mathrm{Jaccard}(A,B)= \frac{\left| A\cap B \right| }{\left| A\cup B \right| } \end{aligned}$$

If we take the Jaccard coefficient of a set with itself, the intersection equals the union, so the coefficient is one.

$$\begin{aligned} \mathrm{Jaccard}(A,B)= 1. \end{aligned}$$

If two sets are disjoint and have no members in common, then Jaccard coefficient will be zero.

$$\begin{aligned} \mathrm{Jaccard}(A,B)= 0. \end{aligned}$$

For any two sets, the Jaccard coefficient always lies between zero and one.

$$\begin{aligned} 0\le \mathrm{Jaccard}(A,B)\le 1. \end{aligned}$$

Jaccard distance/ Jaccard loss = (1 – Jaccard coefficient).
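A minimal NumPy implementation of the Jaccard coefficient and the corresponding loss:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard coefficient of two binary masks: |A intersect B| / |A union B|."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:                      # both masks empty: identical sets
        return 1.0
    return np.logical_and(a, b).sum() / union

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
gt   = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]])
print(jaccard(pred, gt))        # 2 / 4 = 0.5
print(1.0 - jaccard(pred, gt))  # Jaccard distance / loss = 0.5
```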

Fig. 6
figure 6figure 6figure 6

Row-wise visual segmentation comparison of different models on three datasets

Aggregated Jaccard index (AJI) The concept of AJI was proposed in [31]. AJI measures segmentation performance better than the global Jaccard index by incorporating the concept of connected components.

$$\begin{aligned} \mathrm{AJI}= \frac{\sum _{k=1}^{M}\left| G_{k}\bigcap P_{S}^{k} \right| }{\sum _{k=1}^{M}\left| G_{k}\bigcup P_{S}^{k} \right| +\sum _{R\epsilon U}^{}\left| P_{R} \right| } \end{aligned}$$

where \( G_{k} \) is the kth nucleus of a ground truth containing M nuclei, and \(P_{S}^{k}\) is the Sth connected component in the prediction result having the highest Jaccard index with that ground-truth nucleus; each index S can be used at most once. U represents the set of connected components in the prediction result without a corresponding ground-truth nucleus, and \( P_{R} \) denotes such an unmatched predicted component. AJI is a connected-component-based method that improves on the pixel-based global Jaccard index; a higher AJI value indicates a better segmentation method.
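A simplified NumPy sketch of AJI under the above definition, assuming labeled instance masks (0 = background, e.g., produced by scipy.ndimage.label from binary predictions) and a greedy best-match in which each predicted component is used at most once:

```python
import numpy as np

def aji(gt, pred):
    """Aggregated Jaccard Index on labeled instance masks (0 = background).
    Each ground-truth nucleus is greedily matched to the unused predicted
    component with the highest Jaccard index; unmatched predictions are
    added to the denominator as false-positive area."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]
    used, inter_sum, union_sum = set(), 0, 0
    for g in gt_ids:
        g_mask = gt == g
        best = (0.0, None, 0, g_mask.sum())   # (jaccard, pred id, inter, union)
        for p in pred_ids:
            if p in used:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            if inter / union > best[0]:
                best = (inter / union, p, inter, union)
        inter_sum += best[2]
        union_sum += best[3]
        if best[1] is not None:
            used.add(best[1])
    # predicted components never matched count as extra false-positive area
    fp_area = sum((pred == p).sum() for p in pred_ids if p not in used)
    return inter_sum / (union_sum + fp_area)

# Toy example with two ground-truth nuclei and two predicted components
gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 2],
               [0, 0, 0, 2]])
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 2],
                 [0, 0, 2, 2]])
print(aji(gt, pred))   # 5/7 ~ 0.714
```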

Accuracy Accuracy is a good measure only when the dataset is symmetric, i.e., when the numbers of false positives and false negatives are almost the same.

$$\begin{aligned} \mathrm{Accuracy}= \frac{[\mathrm{TP}+\mathrm{TN}]}{[\mathrm{Total}\; \mathrm{predicted}\; \mathrm{pixels}]} \end{aligned}$$

Precision It tells, out of the total predicted positive observations, how many samples are true positives.

$$\begin{aligned} \mathrm{Precision}= \frac{\mathrm{TP}}{[\mathrm{Total}\; \mathrm{positive}\; \mathrm{prediction}]} \end{aligned}$$

Recall/Sensitivity It is the ratio of correctly predicted positive observations to all observations in the actual positive class.

$$\begin{aligned} \mathrm{Recall/Sensitivity}=\frac{\mathrm{TP}}{[\mathrm{TP}+\mathrm{FN}]} \end{aligned}$$

F1 Score The F1 score captures the effect of both precision and recall by taking their harmonic mean. It is a well-suited measure of how much information is retrieved, as used in [33]. The range of the F1 score is [0, 1]; the greater the F1 score, the better the performance of the model.

$$\begin{aligned} \mathrm{F1} \mathrm{Score} =2 \times \, \frac{[\mathrm{Recall}\times \, \mathrm{Precision}]}{[\mathrm{Recall}+\mathrm{Precision}]} \end{aligned}$$
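The four pixel-wise metrics above can be computed together from the binary masks; a minimal NumPy sketch (eps is an assumption to avoid division by zero):

```python
import numpy as np

def pixel_metrics(pred, gt, eps=1e-7):
    """Pixel-wise accuracy, precision, recall, and F1 from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return accuracy, precision, recall, f1
```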

Results

Comparison of methods

Table 3 compares the proposed architecture with five other segmentation architectures on the three datasets. Performance is measured in terms of the F1 score, the AJI score, the total number of trainable parameters (which reflects training time and complexity), and floating-point operations (FLOPs). The FLOPs value relates to the computing power demanded of the hardware (e.g., a GPU): the smaller the value, the faster the computation. Here, our F1 and AJI scores are averages over all images in the test set of each dataset. Table 4 compares the total number of trainable parameters and FLOPs with the other segmentation architectures.

Fig. 7
figure 7

Training accuracy, validation accuracy, training loss, validation loss curve on three datasets

Performance comparison of different architectures

Visual comparisons of the images predicted by the different models on the Kidney, TNBC, and MoNuSeg datasets are shown in Fig. 6, with two test images from each dataset for the five state-of-the-art models and the proposed model. To discuss segmentation accuracy, we take one sample test image from the kidney dataset, shown in row 1 of Fig. 6; its ground truth contains 57 annotated nuclei. U-Net [11] clearly detected 53 nuclei, missed two, and merged two into a cluster; it also detected four additional ducts that are not present in the ground truth. SegNet [14] did not detect anything spurious, but of the 57 nuclei it detected only 47 accurately, detected three partially, and missed seven; a strength of the SegNet architecture is that it separates overlapping nuclei. Attention U-Net [18], an extended version of the original U-Net, clearly detected 49 nuclei, merged two, partially detected five, missed one, and produced five spurious detections. Dist [17] detected 49 nuclei almost identical to the ground truth, missed three, partially detected three, and merged two. ASPPU-Net [22] produced 7 spurious shapes and merged 2 of the 57 nuclei.

With the proposed model, 55 of the 57 nuclei are clearly identified, no spurious objects are detected, and the morphology of the detected nuclei closely matches the ground truth, which is also reflected in the performance metrics reported in the tables. These results indicate that our architecture retrieves more information than the others. Training and validation accuracy and loss curves on the three datasets are shown in Fig. 7; these plots indicate that the proposed model learns its parameters well for prediction on each of the mentioned datasets.

Conclusion

Automatic segmentation of H&E-stained cell nuclei from histopathology images is a prerequisite in digital pathology. In this paper, a CNN-based architecture called high-resolution deep transferred ASPPU-Net was proposed to address the automatic nuclei segmentation of histopathology images with a widely varied spectrum and a large number of artifacts. The implemented network effectively leverages the strengths of both residual learning and encoder–decoder architectures by incorporating wide and deep network paths that strengthen the intermediate features. Promising results were obtained owing to the effective use of the wide and deep network with ASPP at the bottleneck layer. To demonstrate the merit of the proposed architecture, we used the most widely preferred performance metrics, the F1 score and the AJI score, in experiments on three different publicly available datasets. The proposed model outperformed state-of-the-art models by a considerable margin in terms of F1 score and AJI score and works effectively on all three histopathological datasets.

Although the proposed model produced excellent results, the segmentation of overlapping nuclei remains a challenge for some histopathology images, and the reported results are still sub-optimal for clinical use. Furthermore, vanished and blurry boundaries of detected nuclei are another issue, and the problems of over-segmentation and under-segmentation for nuclei in complex histopathology images have not been solved completely. These issues will be the focus of our future work.