Introduction

Histopathology is one of the most reliable methods for diagnosing cancer, and early detection of the disease is a pressing need. In a histopathology image, hematoxylin stains the cell nuclei, while eosin stains the cytoplasm and other connective tissue with a pink shade. According to the pathologist report [1], the grade of cancer is determined by tissue morphology, while the stage of cancer in the body is decided by tumor size, location, and spread. Most diagnostic information, such as the stage and grade of cancer, the mitotic rate, and the lymph node status of the cancerous tissue, is obtained by analyzing the morphology, location, and spread of the hematoxylin-stained nuclei. The traditional way to diagnose cancer is for pathologists to examine the morphological structure and distribution of nuclei manually: after an image of the sample tissue is acquired, pathologists decide whether tissue regions are cancerous and assess the malignancy level. Because of the complexity of histopathology images, such manual analysis demands considerable time and effort.

Preparation of histopathology slides is the first stage of the automatic segmentation pipeline and involves tissue collection, fixation, embedding, sectioning, staining, and visualization. Collecting samples from the human body and preserving them with a fixation material is the primary step in slide preparation. The tissue is then implanted in blocks (embedding), and these blocks are cut into thin sections, since a thinner section is usually required for diagnosis (sectioning). Color is added to the thin tissue sections to make the nuclei more distinct (staining), and finally the stained slide is examined under a microscope or digitized by a whole-slide imaging (WSI) scanner; many pathologists prefer the microscope because it provides faster focusing and scanning. As a segmentation task, our focus is to separate the hematoxylin-stained nuclei from the eosin-stained cytoplasm and other connective tissue. Nuclei segmentation approaches can be categorized into conventional techniques and CNN-based deep learning techniques. Conventional image segmentation includes discontinuity-based approaches, similarity-based approaches, global clustering, superpixel segmentation, watershed segmentation, active contour techniques, etc. A summary of conventional image segmentation methods is presented in Table 1.

Table 1 Conventional (non-deep-learning) image segmentation methods

These methods have performed well in biomedical applications, but they struggle with complex histology images, where conventional techniques often result in under-segmentation or over-segmentation. To perform the segmentation task, we collected three datasets: a kidney cancer histopathology image dataset, the Triple-Negative Breast Cancer (TNBC) dataset, and the Multi-Organ Nuclei Segmentation (MoNuSeg) dataset. In this study, we use a deep learning framework in which features are learned automatically by the model, which then predicts the segmented image. Deep learning architectures learn model parameters that separate the nuclei in histology images. A deep learning framework is driven by large amounts of data, an activation function, an effective optimization algorithm, and a powerful loss function. The activation function maps the input to a meaningful value, and the optimizer is responsible for the fast convergence of the model parameters. The loss function measures the difference between the true and predicted values in regression problems, and between the true and predicted probability maps in classification problems. Many recent studies report that deep learning-based segmentation models provide better accuracy than traditional hand-crafted feature extraction methods. Although deep learning algorithms have proven their potential, the automatic segmentation of H&E-stained histopathology images still faces the following challenges:

  1. It is difficult to deal with the clumped morphological appearance of nuclei in histopathology images, as a single histopathological slide may contain thousands of nuclei.

  2. A sufficient amount of high-quality labeled data, verified by an experienced medical practitioner, is required.

  3. Computational cost is an important issue in deep learning models.

To address the challenges associated with nuclei segmentation from histopathological images, our contributions are as follows:

  1. A highly accurate deep learning architecture for nuclei segmentation from H&E-stained histopathology images.

  2. To enrich features at five distinct stages, the proposed model makes effective use of residual connections throughout the network and an atrous spatial pyramid pooling (ASPP) layer at the bottleneck.

  3. Compared to benchmark segmentation models composed of deep and thin paths, the proposed network is wide and deep, effectively leveraging the strengths of both residual learning and encoder–decoder architectures.

  4. The proposed model is carefully evaluated on three publicly available histopathology datasets and outperforms five state-of-the-art deep learning models in terms of quality metrics.

The rest of the manuscript is organized as follows. The related work section provides a concise review of the most recent benchmark approaches. A detailed mathematical analysis of the proposed model is presented in the proposed architecture section. The experimentation process is described in the implementation and training section. Experimental results and comparisons are presented in the results section. Conclusions and future work are given in the conclusion section.

Related work

Deep learning techniques are a good option whenever sufficient data are available. For a segmentation task, recovering the relevant information that is lost in pooling operations as the network goes deeper is of prime importance. UNet [11] is a landmark contribution in the field of biomedical image segmentation: its encoder–decoder structure captures both the context and the location of objects in the image, and it achieves very good prediction results across different segmentation applications. For gland segmentation, Hao Chen et al. [12] proposed a deep contour-aware network (DCAN) that is able to separate overlapping objects; DCAN pools multilevel features from its hierarchical architecture and fuses them together, which improves the model's performance, and its contour-based design yields better visualization of object boundaries. Since pooling operations lose some relevant information during segmentation, Fisher Yu et al. [13] merge multiscale information by aggregating the outputs of different dilation rates to regain resolution; the resulting architecture is well suited to dense prediction because of its expanded receptive field. SegNet [14], designed for semantic segmentation of road and indoor scenes, has encoder, decoder, and pixel-wise classification stages; reusing max-pooling indices on the decoder side to upsample the low-resolution feature maps preserves exact boundary information, which is essential for road and indoor scene understanding. [15] proposed a fully convolutional model designed for semantic segmentation; it is inspired by AlexNet, a classification model, and extends it to the segmentation task by performing both classification and localization. The method of [16] is fast and capable of processing large batches of data in a reasonable amount of time; their deep learning architecture effectively combines the concepts of ResNet and UNet. Recognition of close and overlapping nuclei has been a major concern; this issue was addressed in [17] by predicting an eroded version of the annotated nuclei and formulating a global loss function for the segmentation of overlapping nuclei. To increase segmentation accuracy, [18] made a meaningful modification to the UNet architecture by introducing an attention gate that merges only relevant features before concatenation; notably, the attention module can be integrated with any other segmentation model without much computational overhead. Different dilation rates are largely equivalent to different sparse kernels without extra computational cost, an idea used effectively in [19] to enlarge the field of view. Convolution strategies thus have a large impact on the efficiency of the system.

The idea of dimension-wise convolution was implemented by [26] to improve performance on segmentation and classification tasks. To extract semantic features of nuclei at different stages, [22] incorporated an improved version of atrous spatial pyramid pooling in an encoder–decoder network, used a concave point detection method to segment touching nuclei, and achieved better performance; their ASPPU module, combined with binary cross-entropy loss and dice loss, better handles class imbalance problems. A recent concept by [23] is UNet++ for medical image segmentation: UNet++ modifies the skip connections to provide flexibility in fusing multilayered features in the decoder path, and the network shares a common encoder across varying depths. Liver histopathology images contain thousands of nuclei, many of which overlap; [24] therefore incorporated residual and attention mechanisms into an encoder–decoder architecture, with residual blocks to recover meaningful information, attention blocks for correct localization of objects in the decoder section, and a bottleneck layer for maximal extraction of deeper-layer features. From the existing work above, it is clear that segmentation of nuclei from histopathological images remains a challenging problem. A brief comparison of existing deep learning approaches is presented in Table 2. The novelty of the proposed model lies in the manner in which we extend the ideas of UNet [11] and ASPPU-Net [22] so that the proposed network is efficient and capable of extracting intermediate features.

Table 2 Deep learning image segmentation methods

Proposed architecture

In a deep learning framework, the segmentation strategy, as in [11, 18, 24], involves an encoder path, where the model learns the 'what' information (context) in the image, and a decoder path, where the model learns the 'where' information (location). The encoder path consists of repeated applications of standard convolution layers plus an additional path parallel to the main network, realized through residual connections, to minimize the information loss caused by pooling. Each stage of the downsampling path has a (\(2\times 2\)) max-pooling layer that reduces the spatial size of the image. Features from the same level are concatenated with the corresponding upsampling layers; in the upsampling path, features of different shapes and sizes are cropped and merged into the next layer. Our wide network uses a powerful decoder that aggregates same-level spatial features and collects the maximum possible residual information. At the final stage, a (\(1\times 1\)) convolution maps the (\(512\times 512\times 16\)) feature map to (\(512\times 512\times 1\)). To improve the performance of the network shown in Fig. 1, we introduce an additional information retrieval module, used in [22] and called the ASPPU bottleneck path, in which a CNN applies multiple dilation rates to extract more relevant features. The dilation rate is an additional parameter that enlarges the area visualized by the resulting feature map. Different dilation rates are applied to the same layer and the results are concatenated, which allows the model to take advantage of multiscale feature extraction. This yields very good results because hierarchical information of varying sizes can be identified in the same layer.
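As an illustration, this multiscale fusion at the bottleneck can be sketched in a few lines of Keras. The filter count, dilation rates, and input shape below are illustrative assumptions, not the exact configuration of the proposed network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def aspp_block(x, filters=256, rates=(1, 2, 4)):
    """ASPP-style bottleneck: parallel dilated convolutions over the same
    input, concatenated so the layer sees several receptive fields at once."""
    branches = [
        layers.Conv2D(filters, 3, padding="same", dilation_rate=r,
                      activation="relu")(x)
        for r in rates
    ]
    y = layers.Concatenate()(branches)
    # a 1x1 convolution fuses the multiscale branches back to `filters` maps
    return layers.Conv2D(filters, 1, padding="same", activation="relu")(y)

# Example: bottleneck feature map of an encoder (shape is illustrative)
inp = layers.Input(shape=(32, 32, 256))
out = aspp_block(inp)
```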

Fig. 1
figure 1

Proposed high-resolution deep transferred ASPPU-Net

Standard convolution layer

Convolution layers consist of a set of learnable parameters. Our input image has three color channels (RGB) and a dimension of \(512\times 512\times 3\), on which we apply a filter of size (\(3\times 3\)). For a standard 2D convolution, if x(p,q) is the input feature map and h(p,q) is the filter kernel, then the output y(p,q) is expressed by Eq. (1).

$$\begin{aligned} y(p,q)=\sum _{j=-\infty }^{\infty }\sum _{k=-\infty }^{\infty }x(j,k)h(p-j,q-k). \end{aligned}$$
(1)

When we apply a kernel of \(K\times K\) to an image of \(N\times N\) with padding P and stride S, the output size is given by Eq. (2).

$$\begin{aligned} N\times N \rightarrow \left( \frac{N-K+2P}{S}+1 \right) \left( \frac{N-K+2P}{S}+1 \right) . \end{aligned}$$
(2)

Without padding and with a unit stride, the relation between the input and output image sizes is given by Eq. (3).

$$\begin{aligned} N\times N \rightarrow \left( N-K+1 \right) \left( N-K+1 \right) . \end{aligned}$$
(3)
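As a quick check of Eqs. (2) and (3), the output size can be computed directly; the snippet below is a minimal illustration:

```python
def conv_output_size(n, k, p=0, s=1):
    # Eq. (2): spatial size after a KxK convolution on an NxN input
    # with padding P and stride S
    return (n - k + 2 * p) // s + 1

# Eq. (3): no padding, unit stride -> N - K + 1
print(conv_output_size(512, 3))        # 510
# "same" padding (P = 1 for K = 3) preserves the spatial size
print(conv_output_size(512, 3, p=1))   # 512
```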

High-resolution layer

The proposed architecture addresses the degradation of information in deeper networks by introducing a deep and wide residual network that is easier to train and optimize. The residual connection is realized by creating an additional path, parallel to the main encoder–decoder path of the network, that restores the flow of information in the deep network. Instead of deep and thin encoder and decoder paths, our network has wide and deep paths that effectively leverage the strengths of both residual learning and encoder–decoder architectures. The proposed architecture has a high-resolution encoder (Fig. 2a), an ASPP bottleneck path for multilevel feature extraction (Fig. 2b), and an effective decoder (Fig. 2c).
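A minimal Keras sketch of such a residual block is given below; the layer ordering and filter handling are illustrative assumptions rather than the exact design of the proposed encoder:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters):
    """Two 3x3 convolutions with an identity shortcut: the parallel skip
    path restores information the main path may lose."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:
        # 1x1 projection so the shapes match before addition
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.Activation("relu")(y)
```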

Fig. 2
figure 2

High-resolution encoder path, ASPP bottleneck path, and decoder path

Activation function

A linear function has limited capability to model complex inputs. Since the outcome of the convolution layer is a linear operation, a nonlinear operation is needed to map complex inputs to meaningful outputs. A nonlinear activation function allows the network to learn complex data, approximate almost any function, and provide accurate predictions; it also helps the model generalize to a variety of data and differentiate outputs. The Rectified Linear Unit (ReLU) is the most popular activation function in deep learning models. ReLU resolves several problems faced by the sigmoid and tanh activation functions and mitigates the vanishing gradient problem. The ReLU activation function simply outputs zero for any negative input, but for any positive value it returns that value, like a linear function. It is computationally economical compared to sigmoid and tanh. Mathematically, the ReLU activation function and its derivative are expressed by Eqs. (4) and (5).

$$\begin{aligned} f(x)= & {} {\left\{ \begin{array}{ll} 0 &{} \text { if } x< 0 \\ x &{} \text { if } x\ge 0 \end{array}\right. } \end{aligned}$$
(4)
$$\begin{aligned} f'(x)= & {} {\left\{ \begin{array}{ll} 0 &{} \text { if } x< 0 \\ 1 &{} \text { if } x\ge 0 \end{array}\right. }. \end{aligned}$$
(5)
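A minimal NumPy illustration of Eqs. (4) and (5):

```python
import numpy as np

def relu(x):
    # Eq. (4): zero for negative inputs, identity otherwise
    return np.maximum(0.0, x)

def relu_grad(x):
    # Eq. (5): derivative is 0 for x < 0 and 1 for x >= 0
    return (x >= 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 1. 1. 1.]
```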

Pooling layer

Pooling makes the model invariant to location, scale, and rotation, and acts as an additional layer. Within each kernel window, only the highest value, which typically corresponds to an object region, is propagated to the next layer, which makes the detection task easier. Applying a kernel of size K = (\(2\times 2\)) with stride \(S=2\) to a (\(4\times 4\)) two-dimensional image yields the feature map shown in Fig. 3.
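A minimal NumPy sketch of this operation (the input values below are illustrative, not those of Fig. 3):

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max-pooling with stride 2: each output pixel keeps the
    largest value in its 2x2 window."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 0],
                [1, 8, 3, 4]])
print(max_pool_2x2(img))
# [[6 4]
#  [8 9]]
```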

Fig. 3
figure 3

Max-pooling with kernel \(K=2\) and stride \(S=2\)

Batch normalization

Because many operations occur between layers, a slight change in the input of a deep network can lead to large changes at later layers. The distribution of each layer's input changes during training, since the network weights keep changing; if the weights change randomly in successive iterations, it becomes difficult for the network to adjust to each layer's input. Batch normalization [27] ensures that the distribution does not change too much. During training, the gradient of the loss plays an important role in convergence. A visualization of internal covariate shift in a deep network is shown in Fig. 4. Batch normalization can be thought of as an additional layer, and it works well in very deep networks. It has the following advantages: (a) faster convergence, (b) works as a regularizer, (c) avoids internal covariate shift, and (d) enables training of deeper networks.

Fig. 4
figure 4

Internal covariate shift

Let \(T_{1}\), \(T_{2}\) be two transformations at a particular layer L, as shown in Eq. (6). The two layers are characterized by weights \(W_{1}\) and \(W_{2}\). If \(W_{1}\) changes, \(T_{1}\) changes and the input to \(T_{2}\) changes; if \(W_{2}\) changes, L changes. If these changes are random and large, deep neural networks have convergence problems.

$$\begin{aligned}&L=T_{2}\left( T_{1}\left( U,W_{1} \right) ,W_{2} \right) \end{aligned}$$
(6)
$$\begin{aligned}&\mathrm{Batch}\, \mathrm{Norm}\left( y_{o} \right) ^{\left( m \right) }= \beta _{1}\left( \frac{\left( y_{o} \right) ^{\left( m \right) }-\mu \left( y_{0} \right) }{\sigma \left( y_{0} \right) } \right) +\beta _{2} \end{aligned}$$
(7)

Batch normalization normalizes each activation independently by controlling the mean and standard deviation of the layer's output. The process is expressed in Eq. (7), where \(y_{o}^{(m)}\) is the value of the output \(y_{o}\) for the mth input of a batch, and \(\beta _{1}, \beta _{2}\) are trainable parameters.
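A minimal NumPy sketch of Eq. (7); the small constant eps is an assumption added for numerical stability:

```python
import numpy as np

def batch_norm(y, beta1=1.0, beta2=0.0, eps=1e-5):
    # Eq. (7): normalize each activation over the batch, then apply the
    # trainable scale (beta1) and shift (beta2)
    mu = y.mean(axis=0)
    sigma = y.std(axis=0)
    return beta1 * (y - mu) / (sigma + eps) + beta2

batch = np.random.randn(32, 16) * 5.0 + 3.0   # 32 samples, 16 activations
out = batch_norm(batch)
print(out.mean(), out.std())                   # ~0 and ~1
```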

Fig. 5
figure 5

Visualization of the receptive field with multiple dilation rates

Internal covariate shift

Santurkar et al. [28] analyzed the effect of internal covariate shift, questioning whether reducing it is the main reason batch normalization speeds up training. Their experiments also suggest that, for larger learning rates, training may not converge without batch normalization. The impact of internal covariate shift on optimization is measured at the \(k\)th layer of an n-layer network with parameters \(W_{1:n}\) and \(W'_{1:n}\) as \(\left\| \nabla _{W_{k}}\pounds \left( W_{1:n} \right) -\nabla _{W_{k}}\pounds \left( W'_{1:k-1},W_{k:n} \right) \right\| \). The analysis relies on the following properties of convex functions. Let D be a convex subset of \(\mathbb {R}^{n}\), \(X,Y\in D\), and let \(f:D\mapsto \mathbb {R}\) be a convex function; then

  1. f is L-Lipschitz if \(\left| f\left( Y \right) -f\left( X \right) \right| \le L\left\| Y-X\right\| \;\;\; \forall X,Y\in D\);

  2. f is \(\beta \)-smooth if its gradient is \(\beta \)-Lipschitz, i.e., \(\left\| \nabla _{X}f-\nabla _{Y}f \right\| \le \beta \left\| X-Y\right\| \;\;\; \forall X,Y\in D\).

Dilated convolution layer

The mathematical expression of a 2-D dilated convolution is given by Eq. (8).

$$\begin{aligned} y(p,q)=\sum _{i=1}^{P}\sum _{j=1}^{Q}x(p+r\cdot i,q+r\cdot j)f(i,j) \end{aligned}$$
(8)

y(p,q) is the output of the dilated convolution, x(p,q) is the input feature map, f(i,j) is the filter kernel of length P and width Q, and r is the dilation rate (\(r>1\)).

Dilated convolution, or atrous convolution, is a simple way to increase the field of view by placing spaces between the elements of the kernel. The rate of dilation is controlled by the parameter r: a dilation rate r inserts (\(r-1\)) spaces between the elements of the kernel, and r equal to 1 corresponds to no spaces, i.e., standard convolution. A kernel of size K dilated by a factor r has the effective size given in Eq. (9).

$$\begin{aligned} \hat{K}= K+\left( K-1 \right) \left( r-1 \right) \end{aligned}$$
(9)

Dilated convolution is similar to standard convolution applied to the input with different gaps.

Visualization of the receptive field with different dilation rates is shown in Fig. 5. Applying dilation rate 1 to the input 2D image (Fig. 5a) is equivalent to a standard convolution with a \(3\times 3\) kernel. Dilation rate 2 produces a receptive field of \(7\times 7\) by skipping one pixel (Fig. 5b), and dilation rate 4 produces a receptive field of \(15\times 15\) by skipping three pixels (Fig. 5c). The aim of this work is to extract more relevant features by applying different dilation rates in the same layer. In this architecture, the maximum amount of relevant information is retrieved by applying three different dilation rates after each pooling layer. Dilated convolution [29] is a generalization of standard convolution that allows us to control the resolution of features computed by a deep CNN in order to capture multi-level features.
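The effective kernel size of Eq. (9) and the stacked receptive fields of Fig. 5 can be verified with a short snippet:

```python
def effective_kernel(k, r):
    # Eq. (9): a KxK kernel dilated by rate r spans K + (K-1)(r-1) pixels
    return k + (k - 1) * (r - 1)

# Receptive field of stacked 3x3 layers with rates 1, 2, 4 (cf. Fig. 5)
rf = 1
for r in (1, 2, 4):
    rf += effective_kernel(3, r) - 1
    print(f"rate {r}: effective kernel {effective_kernel(3, r)}, "
          f"receptive field {rf}x{rf}")
# rate 1: effective kernel 3, receptive field 3x3
# rate 2: effective kernel 5, receptive field 7x7
# rate 4: effective kernel 9, receptive field 15x15
```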

Table 3 Experimental performance comparison of different architectures with three datasets

Implementation and training

The experiment was implemented in Python with the Keras and TensorFlow frameworks; Google Colab notebooks with GPU support were the resources used to conduct the experiments. We used three histopathological datasets from the literature [17, 30, 31]. (a) Kidney dataset: this dataset of \(400\times 400\)-pixel images, used in [29], consists of 730 H&E renal cell carcinoma (RCC) histology images created from 10 WSIs of TCGA. (b) Triple-Negative Breast Cancer (TNBC) dataset: the TNBC dataset [30] consists of 33 H&E-stained breast tissue images of dimension \(512\times 512\) pixels, collected from seven different patients. (c) MoNuSeg dataset: this dataset, first used in [31], is composed of 30 H&E-stained histology images of size \(1000\times 1000\) pixels from seven organs. We partitioned the data into training, validation, and test sets: after creating patches, we used 80% of the images for training and the remaining 20% for validation and testing. We applied data augmentation, namely horizontal and vertical flips, to the training samples of the TNBC and MoNuSeg datasets. Adam [32] was the optimization method, and binary cross-entropy, as used in [17], was the loss function in this study. The reported results of all deep learning models are averages of three independently conducted trials with randomly initialized weights in each trial. The final quality metrics were computed as the average over all images in the test dataset.
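A minimal sketch of this training setup is shown below; build_model, the dataset tensors, the batch size, and the number of epochs are placeholders/assumptions, while the optimizer, loss, and flip augmentation follow the description above:

```python
import tensorflow as tf

def augment(image, mask):
    # Horizontal and vertical flips applied identically to image and mask,
    # as described for the TNBC and MoNuSeg training sets
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_left_right(image), tf.image.flip_left_right(mask)
    if tf.random.uniform(()) > 0.5:
        image, mask = tf.image.flip_up_down(image), tf.image.flip_up_down(mask)
    return image, mask

# train_images/train_masks and val_ds are placeholders for the patched data
train_ds = (tf.data.Dataset.from_tensor_slices((train_images, train_masks))
            .shuffle(512)
            .map(augment)
            .batch(8))                                # batch size: assumption

model = build_model()                                 # hypothetical constructor
model.compile(optimizer=tf.keras.optimizers.Adam(),  # Adam [32]
              loss="binary_crossentropy")            # loss as in [17]
model.fit(train_ds, validation_data=val_ds, epochs=100)  # epochs: assumption
```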

Table 4 Network computational complexity in millions

Performance metrics

Jaccard coefficient It is a commonly used measure of the overlap between two sets, quantifying the similarity or dissimilarity of binary data. The Jaccard coefficient is well suited to evaluating the performance of nuclei segmentation from histopathology images in a deep learning framework.

$$\begin{aligned} \mathrm{Jaccard}(A,B)= \frac{\left| A\cap B \right| }{\left| A\cup B \right| } \end{aligned}$$

If we take the Jaccard coefficient of a set with itself, the intersection equals the union, so the coefficient is one.

$$\begin{aligned} \mathrm{Jaccard}(A,B)= 1. \end{aligned}$$

If two sets are disjoint and have no members in common, then Jaccard coefficient will be zero.

$$\begin{aligned} \mathrm{Jaccard}(A,B)= 0. \end{aligned}$$

For any two sets, the Jaccard coefficient always lies between zero and one.

$$\begin{aligned} 0\le \mathrm{Jaccard}(A,B)\le 1. \end{aligned}$$

Jaccard distance/ Jaccard loss = (1 – Jaccard coefficient).
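A minimal NumPy implementation of the Jaccard coefficient and the corresponding loss:

```python
import numpy as np

def jaccard(a, b):
    """Jaccard coefficient of two binary masks: |A intersect B| / |A union B|."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:                      # both masks empty: identical sets
        return 1.0
    return np.logical_and(a, b).sum() / union

pred = np.array([[1, 1, 0], [0, 1, 0], [0, 0, 0]])
gt   = np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]])
print(jaccard(pred, gt))        # 2 / 4 = 0.5
print(1.0 - jaccard(pred, gt))  # Jaccard distance / loss = 0.5
```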

Fig. 6
figure 6figure 6figure 6

Row-wise visual segmentation comparison of different models on three datasets

Aggregated Jaccard index (AJI) The concept of AJI was proposed in [31]. AJI measures segmentation performance better than the global Jaccard index by incorporating the concept of connected components.

$$\begin{aligned} \mathrm{AJI}= \frac{\sum _{k=1}^{M}\left| G_{k}\bigcap P_{S}^{k} \right| }{\sum _{k=1}^{M}\left| G_{k}\bigcup P_{S}^{k} \right| +\sum _{R\epsilon U}^{}\left| P_{R} \right| } \end{aligned}$$

where \( G_{k} \) is the kth nucleus of a ground truth containing M nuclei, and \(P_{S}^{k}\) is the Sth connected component in the prediction result having the highest Jaccard index with that ground-truth nucleus; each index S can be used at most once. U represents the set of connected components in the prediction result without a corresponding ground-truth nucleus, and \( P_{R} \) denotes such an unmatched predicted component. AJI is a connected-component-based method that improves on the pixel-based global Jaccard index; a higher AJI value indicates a better segmentation method.
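A simplified NumPy sketch of AJI under the above definition, assuming labeled instance masks (0 = background, e.g., produced by scipy.ndimage.label from binary predictions) and a greedy best-match in which each predicted component is used at most once:

```python
import numpy as np

def aji(gt, pred):
    """Aggregated Jaccard Index on labeled instance masks (0 = background).
    Each ground-truth nucleus is greedily matched to the unused predicted
    component with the highest Jaccard index; unmatched predictions are
    added to the denominator as false-positive area."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [i for i in np.unique(pred) if i != 0]
    used, inter_sum, union_sum = set(), 0, 0
    for g in gt_ids:
        g_mask = gt == g
        best = (0.0, None, 0, g_mask.sum())   # (jaccard, pred id, inter, union)
        for p in pred_ids:
            if p in used:
                continue
            p_mask = pred == p
            inter = np.logical_and(g_mask, p_mask).sum()
            union = np.logical_or(g_mask, p_mask).sum()
            if inter / union > best[0]:
                best = (inter / union, p, inter, union)
        inter_sum += best[2]
        union_sum += best[3]
        if best[1] is not None:
            used.add(best[1])
    # predicted components never matched count as extra false-positive area
    fp_area = sum((pred == p).sum() for p in pred_ids if p not in used)
    return inter_sum / (union_sum + fp_area)

# Toy example with two ground-truth nuclei and two predicted components
gt = np.array([[1, 1, 0, 0],
               [1, 1, 0, 2],
               [0, 0, 0, 2]])
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 2],
                 [0, 0, 2, 2]])
print(aji(gt, pred))   # 5/7 ~ 0.714
```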

Accuracy Accuracy is a good measure only when the dataset is symmetric, i.e., when the numbers of false positives and false negatives are almost the same.

$$\begin{aligned} \mathrm{Accuracy}= \frac{[\mathrm{TP}+\mathrm{TN}]}{[\mathrm{Total}\; \mathrm{predicted}\; \mathrm{pixels}]} \end{aligned}$$

Precision It tells, out of the total predicted positive observations, how many samples are true positives.

$$\begin{aligned} \mathrm{Precision}= \frac{\mathrm{TP}}{[\mathrm{Total}\; \mathrm{positive}\; \mathrm{prediction}]} \end{aligned}$$

Recall/Sensitivity It is the ratio of correctly predicted positive observations to all observations in the actual positive class.

$$\begin{aligned} \mathrm{Recall/Sensitivity}=\frac{\mathrm{TP}}{[\mathrm{TP}+\mathrm{FN}]} \end{aligned}$$

F1 Score The F1 score captures the effect of both precision and recall by taking their harmonic mean. It is a well-suited measure of how much information is retrieved, as used in [33]. The range of the F1 score is [0, 1]; the greater the F1 score, the better the performance of the model.

$$\begin{aligned} \mathrm{F1} \mathrm{Score} =2 \times \, \frac{[\mathrm{Recall}\times \, \mathrm{Precision}]}{[\mathrm{Recall}+\mathrm{Precision}]} \end{aligned}$$
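The four pixel-wise metrics above can be computed together from the binary masks; a minimal NumPy sketch (eps is an assumption to avoid division by zero):

```python
import numpy as np

def pixel_metrics(pred, gt, eps=1e-7):
    """Pixel-wise accuracy, precision, recall, and F1 from binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return accuracy, precision, recall, f1
```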

Results

Comparison of methods

Table 3 compares the proposed architecture with five other segmentation architectures on the three datasets. Performance is measured in terms of the F1 score, the AJI score, the total number of trainable parameters (which reflects training time and complexity), and floating-point operations (FLOPs). The FLOPs value relates to the computing power demanded of the hardware (e.g., a GPU): the smaller the value, the faster the computation. Here, our F1 and AJI scores are averages over all images in the test set of each dataset. Table 4 compares the total number of trainable parameters and FLOPs with the other segmentation architectures.

Fig. 7
figure 7

Training accuracy, validation accuracy, training loss, validation loss curve on three datasets

Performance comparison of different architectures

Visual comparisons of the images predicted by the different models on the Kidney, TNBC, and MoNuSeg datasets are shown in Fig. 6, with two test images from each dataset for the five state-of-the-art models and the proposed model. To discuss segmentation accuracy, we take one sample test image from the kidney dataset, shown in row 1 of Fig. 6; its ground truth contains 57 annotated nuclei. U-Net [11] clearly detected 53 nuclei, missed two, and merged two into a cluster; it also detected four additional ducts that are not present in the ground truth. SegNet [14] did not detect anything spurious, but of the 57 nuclei it detected only 47 accurately, detected three partially, and missed seven; a strength of the SegNet architecture is that it separates overlapping nuclei. Attention U-Net [18], an extended version of the original U-Net, clearly detected 49 nuclei, merged two, partially detected five, missed one, and produced five spurious detections. Dist [17] detected 49 nuclei almost identical to the ground truth, missed three, partially detected three, and merged two. ASPPU-Net [22] produced 7 spurious shapes and merged 2 of the 57 nuclei.

With the proposed model, 55 of the 57 nuclei are clearly identified, no spurious objects are detected, and the morphology of the detected nuclei closely matches the ground truth, which is also reflected in the performance metrics reported in the tables. These results indicate that our architecture retrieves more information than the others. Training and validation accuracy and loss curves on the three datasets are shown in Fig. 7; these plots indicate that the proposed model learns its parameters well for prediction on each of the mentioned datasets.

Conclusion

Automatic segmentation of H&E-stained cell nuclei from histopathology images is a prerequisite in digital pathology. In this paper, a CNN-based architecture called high-resolution deep transferred ASPPU-Net was proposed to address the automatic nuclei segmentation of histopathology images with a widely varied spectrum and a large number of artifacts. The implemented network effectively leverages the strengths of both residual learning and encoder–decoder architectures by incorporating wide and deep network paths that strengthen the intermediate features. Promising results were obtained owing to the effective use of the wide and deep network with ASPP at the bottleneck layer. To demonstrate the merit of the proposed architecture, we used the most widely preferred performance metrics, the F1 score and the AJI score, in experiments on three different publicly available datasets. The proposed model outperformed state-of-the-art models by a considerable margin in terms of F1 score and AJI score and works effectively on all three histopathological datasets.

Although the proposed model produced excellent results, the segmentation of overlapping nuclei remains a challenge for some histopathology images, and the reported results are still sub-optimal for clinical use. Furthermore, vanished and blurry boundaries of detected nuclei are another issue, and the problems of over-segmentation and under-segmentation for nuclei in complex histopathology images have not been solved completely. These issues will be the focus of our future work.