Abstract
Automated medical image segmentation plays an important role in many clinical applications, but it is a very challenging task due to complex background texture, the lack of clear boundaries and significant shape and texture variation between images. Many researchers have proposed encoder–decoder architectures with skip connections that combine low-level feature maps from the encoder path with high-level feature maps from the decoder path to segment medical images automatically. The skip connections have been shown to be effective in recovering fine-grained details of the target objects and may facilitate gradient back-propagation. However, not all the feature maps transmitted by those connections contribute positively to network performance. In this paper, to adaptively select the useful information that passes through those skip connections, we propose a novel 3D network with a self-supervised function, named the selective information passing network (SIP-Net). We evaluate the proposed model on the MICCAI Prostate MR Image Segmentation 2012 Grand Challenge dataset, TCIA Pancreas CT-82 and the MICCAI 2017 Liver Tumor Segmentation Challenge dataset. The experimental results across these datasets show that our model achieves improved segmentation results and outperforms other state-of-the-art methods. The source code of this work is available at https://github.com/ahukui/SIPNet.
1 Introduction
Medical image segmentation is an essential part of medical image analysis. Accurate segmentation of medical images provides valuable information for computer-aided diagnosis and treatment of cancers and other diseases [1]. For instance, segmentation of the liver and liver tumors plays an important role in hepatocellular carcinoma diagnosis [2], and accurate prostate segmentation is useful for treatment planning and therapeutic procedures for prostate cancer [3,4,5]. However, automated medical image segmentation is very challenging for several reasons. Taking prostate segmentation as an example: first, many slices, especially at the apex and base, contain only a small part of the target tissue and therefore lack clear boundaries, which can cause automated segmentation to fail. Second, imaging artifacts are often distributed randomly across the whole image and negatively affect segmentation. Third, the tissue can vary widely in size and shape among slices, which adds to the complexity of segmentation. Fourth, the complex background and fuzzy boundaries also make segmentation challenging. Furthermore, unlike natural image datasets, available medical image datasets are limited in size. Figure 1 shows examples of prostate MR images: in Fig. 1a, imaging artifacts are located in the prostate region; in Fig. 1b, the prostate region lacks a clear boundary; and in Fig. 1c, the prostate and surrounding tissues have similar intensity distributions. All of these phenomena pose challenges for automated medical image segmentation.
To overcome these challenges, various methods have been developed for medical image segmentation over the past few decades, including machine learning-based methods [6,7,8,9,10,11,12], level sets [13], atlas-based methods [14,15,16], super-pixel segmentation [17] and active shape models [18, 19]. Recently, deep convolutional neural networks (CNNs) have become the dominant machine learning approach owing to their superior performance. CNNs have achieved state-of-the-art performance in many fields, including computer vision [12, 20,21,22,23], natural language processing (NLP) [24,25,26] and medical image analysis [27]. The superiority of CNNs [28] can be partially attributed to their ability to learn hierarchical representations of the data.
However, medical image segmentation requires higher accuracy than natural image segmentation, so many excellent networks designed for natural images, such as VGG [29] and FCN [30], cannot be used directly. To obtain accurate segmentation results and overcome the problems specific to medical imaging, dedicated models have been proposed for medical image analysis. For instance, Milletari et al. [31] proposed a network architecture based on volumetric CNNs, which can segment prostate volumes quickly and accurately. Yu et al. [32] proposed a novel volumetric CNN with mixed long and short residual connections for automated prostate segmentation. Gibson et al. [33] proposed a network called DenseVNet, which can accurately segment the pancreas, esophagus, stomach, liver, spleen, gallbladder, left kidney and duodenum. Li et al. [34] proposed a novel hybrid densely connected U-Net for liver and tumor segmentation. What these medical image segmentation networks have in common is an encoder–decoder architecture with skip connections that combine low-level feature maps from the encoder path with high-level feature maps from the decoder path. These skip connections are effective in recovering fine-grained details of the target objects and help gradient back-propagation. However, since a large amount of information can pass through these skip connections, do all the transmitted feature maps always contribute positively to network performance?
To answer this question, we analyzed the behavior of the classical U-Net [35] with and without the long skip connections on the task of prostate segmentation. The segmentation results are shown in Fig. 2. Compared with the ground truth, U-Net generally obtains finer details and higher accuracy. However, the segmentation result of the fully convolutional network (FCN) [30] is smoother, whereas U-Net picks up non-prostate regions where those areas are highly inhomogeneous. To make the long skip connections select useful information and further improve medical image segmentation performance, in this paper we propose a novel 3D convolutional network, named SIP-Net. SIP-Net adopts densely connected residual blocks (DRBs) and attention-focused modules (AMs). The contributions of this work are summarized as follows.
- Inspired by the attention mechanism, we integrate attention-focused modules into our model so that the long connections transmit mainly useful features and the negative impact of noisy feature maps is reduced. This makes the long connections focus more on the regions to be segmented, so that irrelevant noise features from the background and surrounding tissues can be suppressed during feature transmission.
- At the same time, to overcome the small size of medical image datasets, we seamlessly integrate three different types of connections into our model. Together with the attention-focused modules, these connections improve the training efficiency and feature extraction capability of the network by enhancing information propagation and encouraging feature reuse.
- To reduce the computational load and, more importantly, the number of network parameters so as to alleviate potential overfitting, we design a modified dense block to construct a deeper network, which has more than 90 convolutional layers but fewer parameters. Our experimental results show that the proposed model is effective in addressing complex backgrounds, fuzzy boundaries and large shape variations.
The remainder of the paper is organized as follows. Section 2 provides a brief survey of related works. Section 3 describes the details of the proposed 3D segmentation network. In Sect. 4, experiments on segmenting prostate MR images, pancreas CT images and liver CT images are performed to validate the proposed model. The performance of the proposed method is further discussed through ablation studies in Sect. 5. Finally, Sect. 6 concludes the paper.
2 Related works
In this section, we give a brief review of deep learning techniques for semantic image segmentation. We first review the methods for natural image segmentation and then discuss the ones specialized for medical image segmentation.
2.1 Deep learning for semantic segmentation
Semantic segmentation is a critical component of image understanding. The task of semantic segmentation is to assign a categorical label to every pixel in an image. Over the past few years, deep learning-based methods, in particular convolutional neural networks (CNNs), have remarkably improved results in pixel-wise semantic segmentation tasks. This success can be attributed to the hierarchical representations learned by CNNs. Fully convolutional networks (FCNs) [30], which are trained end-to-end to perform pixels-to-pixels segmentation, mark a major milestone in CNN-based semantic segmentation. Since then, FCNs and their many extensions have dominated the field of semantic image segmentation. For instance, Li et al. [36] extended the FCN model to instance-aware semantic segmentation, significantly improving segmentation performance in both accuracy and efficiency.
At the same time, researchers have developed deeper and more powerful CNN models to extract more discriminative and complex feature representations. For example, Simonyan et al. [37] proposed a 19-layer network, the well-known VGG-19 model, to investigate the effect of the depth of CNNs on their accuracy in large-scale image recognition. He et al. [29] presented a residual learning framework to ease the training of very deep networks. Based on this framework, the authors proposed a 101-layer model (ResNet-101) and a 152-layer model (ResNet-152), which won first place in several tracks of the ILSVRC & COCO 2015 competitions. Soon after that, Wu et al. [38] proposed a method for high-performance semantic image segmentation based on deep residual networks, which achieved state-of-the-art performance.
2.2 Deep learning for medical image segmentation
Recently, deep CNNs have also become the dominant approach for medical image segmentation. Many researchers have employed various CNN models to segment images from different medical imaging modalities. In our previous work, we proposed a deeply supervised CNN model [39] that employs additional supervised layers and utilizes residual information to segment the prostate from MR images. To exploit information from different views of volumetric images without using 3D convolutions, Mortazi et al. [40] proposed a multi-view CNN with an adaptive fusion strategy to segment structures from cardiac MR images. Han [41] proposed a 2.5D model to segment liver tumors, which takes a stack of adjacent slices as input and produces the segmentation map corresponding to the center slice.
To fully exploit 3D spatial information in volumetric images, a few studies employed 3D convolutions to build CNNs. For example, Li et al. [34] proposed a novel hybrid densely connected U-Net for liver and tumor segmentation. The model consists of a 2D Dense-U-Net and a 3D counterpart, which extract intra-slice features and hierarchically aggregate 3D context in the spirit of the auto-context algorithm [42]. Chen et al. [43] extended deep residual learning to 3D for volumetric brain segmentation. Their model also seamlessly integrates low-level image appearance features, implicit shape information and high-level context to further improve 3D segmentation performance. Recently, Yu et al. [44] proposed a novel densely connected volumetric CNN, which adopts a 3D fully convolutional architecture to automatically segment cardiac and vascular structures from 3D cardiac MR images.
Compared with 2D networks, these 3D networks achieved better segmentation performance. However, 3D CNNs have many more parameters and a much higher computational complexity than 2D networks. Given the limited size of typical medical image datasets, such networks are difficult to train and easily suffer from overfitting. Therefore, there is still a strong need to push the potential of CNNs by effectively extracting information from limited training data to improve segmentation performance, while reducing network complexity to avoid overfitting.
3 Methods
In this section, we first give an overview of the proposed SIP-Net and then discuss each module of the model in detail.
3.1 SIP-Net
To fully use the 3D spatial contextual information of volumetric data for accurate medical image segmentation, we propose a 3D CNN with densely connected residual blocks (DRBs) and attention-focused modules (AMs), named SIP-Net. The overall structure is shown in Fig. 3. SIP-Net contains two paths: a down-sampling path and an up-sampling path. The down-sampling path consists of one convolutional block, three DRBs and three average pooling layers. The pooling layers use a stride of 2, which gradually reduces the resolution of the feature maps and increases the receptive field of the convolutional layers. To obtain accurate segmentation results at the original image resolution, an up-sampling path containing three deconvolutional layers and three DRBs is implemented. The deconvolutional layers gradually up-sample the feature maps until they reach the original input size. The overall illustration and detailed structure of the proposed network are shown in Fig. 3a and Table 1, respectively.
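To make the resolution changes concrete, the sketch below traces the spatial size of the feature maps through three stride-2 pooling layers and three up-sampling deconvolutions for the 16 × 96 × 96 training patch used in Sect. 4.2. It assumes that every pooling layer halves and every deconvolutional layer doubles each spatial dimension, and it ignores channels; the function name `feature_map_sizes` is hypothetical.

```python
def feature_map_sizes(patch, levels=3):
    """Trace (d, w, h) through the down-sampling and up-sampling paths.

    Assumption (illustrative only): each pooling layer has stride 2 in all three
    dimensions and each deconvolutional layer up-samples by a factor of 2.
    """
    down = [tuple(patch)]
    for _ in range(levels):
        d, w, h = down[-1]
        if d % 2 or w % 2 or h % 2:
            raise ValueError(f"size {down[-1]} is not divisible by 2")
        down.append((d // 2, w // 2, h // 2))  # stride-2 pooling halves each dimension
    up = [down[-1]]
    for _ in range(levels):
        d, w, h = up[-1]
        up.append((d * 2, w * 2, h * 2))  # each deconvolution doubles each dimension
    return down, up

down, up = feature_map_sizes((16, 96, 96))
print(down)  # [(16, 96, 96), (8, 48, 48), (4, 24, 24), (2, 12, 12)]
print(up)    # [(2, 12, 12), (4, 24, 24), (8, 48, 48), (16, 96, 96)]
```

Under this assumption, a patch must be divisible by 2³ = 8 in every dimension for the up-sampled output to match the input size, which holds for the 16 × 96 × 96 prostate patches and the 64 × 96 × 96 pancreas patches.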
In SIP-Net, we could simply have used long connections to link the blocks at the same resolution level in the down-sampling and up-sampling paths. However, our study shows that simply adding long connections may produce noisy segmentations, because noise features are passed along as well, as shown in Fig. 2. To make the network focus more on the region to be segmented and reduce the negative influence of the background and surrounding tissues, we employ an attention mechanism in our model. Inspired by the attention mechanism in the residual attention network [45], three attention-focused modules are used in the up-sampling path. They suppress irrelevant noise from the background and surrounding tissues, retain the features from the down-sampling path that are useful for segmentation, and make the up-sampling path focus more on the areas to be segmented.
In addition, to encourage the attention-focused modules to act effectively as information-passing filters, we also integrate a deep supervision mechanism [46] for the attention-focused modules. An additional supervision layer is added after each deconvolutional layer. Each of the three additional supervision layers consists of one up-sampling layer that enlarges the feature map to the original size and one convolutional layer that produces the segmentation output, as shown in Fig. 3a. These additional supervision layers bring two advantages. First, they help supervise the attention-focused modules to produce accurate attention masks that guide information passing. Second, they can accelerate network convergence during training because of the shorter back-propagation paths from the additional supervision outputs.
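The sketch below shows one way the main output and the three auxiliary outputs of the additional supervision layers could be combined into a single training loss. The text does not specify the loss function or the weighting of the auxiliary terms, so the soft Dice loss, the helper names (`soft_dice_loss`, `deeply_supervised_loss`) and the default auxiliary weight of 0.5 are all assumptions for illustration.

```python
import numpy as np

def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft Dice between a probability map and a binary mask of the same shape."""
    inter = np.sum(prob * target)
    denom = np.sum(prob) + np.sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)

def deeply_supervised_loss(main_prob, aux_probs, target, aux_weight=0.5):
    """Main loss plus weighted auxiliary losses (hypothetical weighting scheme).

    aux_probs: outputs of the additional supervision layers, each already
    up-sampled to the original size, as the supervision layers do.
    """
    loss = soft_dice_loss(main_prob, target)
    for aux in aux_probs:
        loss += aux_weight * soft_dice_loss(aux, target)
    return loss

rng = np.random.default_rng(0)
target = (rng.random((16, 96, 96)) > 0.5).astype(np.float32)
perfect = target.copy()
print(deeply_supervised_loss(perfect, [perfect] * 3, target))  # ≈ 0.0
```

Because every auxiliary output is compared with the same full-resolution ground truth, gradients reach the attention-focused modules through a short path, which is the convergence benefit described above.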
In total, SIP-Net has more than 100 layers, including convolutional layers, pooling layers, layers in dense blocks, transition layers, dropout layers and deconvolutional layers. The dense blocks contain different numbers of BN-ReLU-Conv(1 × 1 × 1)-BN-ReLU-Conv(3 × 3 × 3) layers with a growth rate of \(k = 32\). The transition layer is implemented as a BN-ReLU-Conv(1 × 1 × 1) layer. After each Conv(3 × 3 × 3) layer, a dropout layer with a dropout rate of 0.3 is added to help mitigate potential overfitting. The designs of the DRB and AM are shown in Fig. 3b, c, and the details are given in the rest of this section.
3.2 Densely connected residual block (DRB)
Let \(x_l\) be the output of the lth convolutional layer, which can be considered as the result of applying a nonlinear transformation \(H_l\), defined as a convolution followed by batch normalization and a rectified linear unit (ReLU), in the lth layer, and let \(x_0\) denote the input data sample passed to the CNN. For a classical CNN layer with a straightforward connection, \(x_l\) can be modeled as
\[ x_l = H_l(x_{l-1}), \]
where \(x_{l-1}\) is the output of the \((l-1)\)th layer. However, as a network goes deeper, it can suffer from the degradation problem, in which the gradient may vanish or explode. This leads to large training errors, and training may fail to converge.
To alleviate this problem by promoting information propagation within the network, we propose a new block that combines the dense block [47] with a residual connection, as shown in Fig. 3b. In a dense block, each layer is directly connected to all subsequent layers: the feature maps produced by all preceding layers are concatenated and used as the input to the next layer. Consequently, the lth layer receives the feature maps produced by layers \(0, 1, \ldots, l-1\) as inputs, and its output is defined as
\[ x_l = H_l([x_0, x_1, \ldots, x_{l-1}]), \]
where \([x_0, x_1, \ldots, x_{l-1}]\) represents the concatenation of the feature maps.
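To reduce the number of feature maps and efficiently fuse the features from the dense layers, a transition layer is added at the end of each dense block. The transition layer consists of a 1 × 1 × 1 convolutional layer, batch normalization and a ReLU. Taking as input the concatenation of the block input and the outputs of all \(L\) dense layers in the block, the transition layer produces
\[ x_t = H_t([x_0, x_1, \ldots, x_L]), \]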
where \(H_t\) is the nonlinear transformation of the transition layer. To further promote information propagation and make the network easier to optimize, we also add a residual connection to the block.
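The data flow of a DRB can be sketched in NumPy as follows. To keep the sketch short and runnable, each \(H_l\) is replaced by a 1 × 1 × 1 convolution with random weights followed by a ReLU (no batch normalization, no 3 × 3 × 3 convolution, no dropout), and the residual connection is assumed to add the block input \(x_0\) to the transition output, with the transition projecting back to the input channel count so the addition is well defined. The layer count of 4 and the helper names (`pointwise_conv`, `drb`) are hypothetical; only the growth rate \(k = 32\) comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def pointwise_conv(x, out_channels):
    """Stand-in for a convolution: a 1x1x1 conv with random weights.

    x has shape (C, D, H, W); the real DRB uses 3x3x3 convolutions plus BN.
    """
    w = rng.standard_normal((out_channels, x.shape[0])) / np.sqrt(x.shape[0])
    return np.einsum("oc,cdhw->odhw", w, x)

def relu(x):
    return np.maximum(x, 0.0)

def drb(x0, num_layers=4, growth_rate=32):
    """Dense block + transition + residual connection (assumed form)."""
    features = [x0]
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=0)                   # [x_0, ..., x_{l-1}]
        features.append(relu(pointwise_conv(concat, growth_rate)))  # x_l has k channels
    concat = np.concatenate(features, axis=0)                       # [x_0, ..., x_L]
    x_t = relu(pointwise_conv(concat, x0.shape[0]))                 # transition H_t
    return x0 + x_t, concat.shape[0]                                # residual: x_0 + x_t

x0 = rng.standard_normal((64, 4, 12, 12))
out, concat_channels = drb(x0)
print(out.shape)        # (64, 4, 12, 12)
print(concat_channels)  # 64 + 4 * 32 = 192
```

The channel count entering the transition layer grows linearly with the number of dense layers (input channels plus \(L \cdot k\)), which is why the transition layer is needed to compress the features before they leave the block.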
3.3 Attention-focused module (AM)
To make the network focus more on the region to be segmented and to reduce noise features from the surrounding region, we introduce an attention-focused module in our model. The structure of the attention-focused module is shown in Fig. 3c; it consists of a sigmoid layer and an element-wise multiplication layer. The output of the AM is the element-wise product of the input feature maps and the attention masks. The attention masks are produced by the sigmoid layer:
\[ M_t(x) = \frac{1}{1 + \exp(-H_t(x))}, \]
where \(M_t(x)\) denotes the attention mask, whose values lie in [0, 1], and \(H_t(x)\) denotes the feature map from the long connection.
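A minimal NumPy sketch of the attention-focused module is shown below. It assumes, as the text suggests, that the mask is computed from the long-connection feature map \(H_t(x)\) and multiplied element-wise back onto that same feature map; the function name `attention_module` is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_module(h_t):
    """Attention-focused module sketch.

    h_t: feature map H_t(x) arriving over the long connection, any shape.
    Assumption: the mask is computed from h_t and applied back onto h_t.
    """
    mask = sigmoid(h_t)      # M_t(x), values strictly between 0 and 1
    return mask * h_t, mask  # element-wise multiplication

h = np.array([[-4.0, 0.0, 4.0]])
out, mask = attention_module(h)
print(mask)  # approximately [[0.018 0.5 0.982]]
print(out)   # approximately [[-0.072 0. 3.928]]
```

Under this reading, strongly negative responses are almost completely suppressed while strongly positive ones pass nearly unchanged, and the module computes \(x \cdot \sigma(x)\), the same form as the SiLU (swish) activation. If the mask were instead computed from a separate branch, the two arguments would differ.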
4 Experiments
To evaluate the performance of our model, we applied it to image segmentation on the MICCAI Prostate MR Image Segmentation 2012 Grand Challenge dataset, TCIA Pancreas CT-82 and the MICCAI 2017 Liver Tumor Segmentation (LiTS) Challenge dataset.
4.1 Implementation details
The proposed method is implemented using the open-source deep learning library Keras. Our network is trained end-to-end with stochastic gradient descent (SGD). In the training phase, the learning rate is initially set to 0.0001 and decreased with a weight decay of 10e−6. The momentum is set to 0.9. Experiments are carried out on an NVIDIA GTX 1080 Ti GPU with 11 GB of memory.
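The text does not say exactly which mechanism "weight decay" refers to. In Keras 2.x, the legacy `decay` argument of the SGD optimizer implemented time-based learning-rate decay, lr / (1 + decay · iterations), together with a momentum update of the form v ← m·v − lr·g, w ← w + v. The pure-Python sketch below assumes that reading and also assumes that "10e−6" means 10⁻⁶; both are interpretations, not statements from the paper, and the helper names are hypothetical.

```python
def keras_time_based_lr(initial_lr, decay, iteration):
    """Learning rate after `iteration` updates under time-based decay."""
    return initial_lr / (1.0 + decay * iteration)

def sgd_momentum_step(w, v, grad, lr, momentum=0.9):
    """One SGD-with-momentum update: v <- momentum * v - lr * grad; w <- w + v."""
    v = momentum * v - lr * grad
    return w + v, v

initial_lr = 1e-4
decay = 1e-6  # assumption: "10e-6" read as 1e-6
for it in (0, 100_000, 1_000_000):
    print(it, keras_time_based_lr(initial_lr, decay, it))
```

With these values, the learning rate halves only after one million updates, so the decay acts as a gentle annealing rather than a step schedule.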
4.2 Prostate segmentation from MR image
We first evaluated our method on the MICCAI 2012 Prostate MR Image Segmentation (PROMISE12) challenge dataset. It contains 50 transversal T2-weighted MR images of the prostate with corresponding ground truth segmentations, which were checked and corrected by a radiology resident with more than 6 years of experience in prostate MRI. These images are a representative set of the types of MR images acquired in different hospitals: they come from multiple vendors and have different acquisition protocols and variations in voxel size, dynamic range, position, field of view and anatomic appearance. To evaluate the algorithms, the organizers provide 30 test MR images, whose ground truth is held out.
Before training the network, we resampled all MR volumes to a fixed resolution of 0.625 × 0.625 × 1.5 mm and normalized them to zero mean and unit variance. To facilitate network training, we applied data augmentation operations including rotation, scaling and flipping. During training, we adopted a random cropping strategy, in which sub-volumes of 16 × 96 × 96 (\(d\times w \times h\)) voxels are randomly cropped from the training data in every iteration. In the testing phase, similar to [32, 44], we used overlapping sliding windows to crop sub-volumes and averaged the probability maps of these sub-volumes to obtain the whole-volume prediction. The sub-volume size was also 16 × 96 × 96, and the stride was 8 × 48 × 48. Due to GPU memory limitations, we used a mini-batch size of 4. SIP-Net had 3.16M parameters, and the prediction time was approximately 1 min per MR volume.
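The overlapping sliding-window inference described above can be sketched as follows, with the 16 × 96 × 96 window and 8 × 48 × 48 stride as defaults. How the original implementation handles the volume borders is not stated; here the last window in each dimension is shifted to end exactly at the volume boundary, which is an assumption. `toy_predict` is a hypothetical stand-in for the trained network, and the function names are illustrative.

```python
import numpy as np

def window_starts(size, window, stride):
    """Start indices of windows covering [0, size) with the given stride.

    Assumed convention: the last window is shifted to end exactly at `size`.
    """
    if size <= window:
        return [0]
    starts = list(range(0, size - window + 1, stride))
    if starts[-1] != size - window:
        starts.append(size - window)
    return starts

def sliding_window_predict(volume, predict_fn, window=(16, 96, 96), stride=(8, 48, 48)):
    """Average the probability maps of overlapping sub-volumes into a whole-volume map.

    Assumes the volume is at least as large as the window in every dimension
    (or has been padded beforehand).
    """
    prob_sum = np.zeros(volume.shape, dtype=np.float64)
    count = np.zeros(volume.shape, dtype=np.float64)
    for z in window_starts(volume.shape[0], window[0], stride[0]):
        for y in window_starts(volume.shape[1], window[1], stride[1]):
            for x in window_starts(volume.shape[2], window[2], stride[2]):
                sl = (slice(z, z + window[0]), slice(y, y + window[1]), slice(x, x + window[2]))
                prob_sum[sl] += predict_fn(volume[sl])  # accumulate sub-volume probabilities
                count[sl] += 1.0                        # how many windows cover each voxel
    return prob_sum / count

def toy_predict(patch):
    """Hypothetical stand-in for the network: voxel-wise sigmoid."""
    return 1.0 / (1.0 + np.exp(-patch))

vol = np.random.default_rng(0).standard_normal((40, 200, 200))
prob = sliding_window_predict(vol, toy_predict)
```

Because `toy_predict` acts voxel by voxel, the averaged map equals applying it to the whole volume directly, which makes the bookkeeping easy to check. With a real network, overlapping windows average out the less reliable predictions near patch borders.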
Several sample results of our method are shown in Fig. 4. Our model accurately segments the prostate and produces smooth, continuous prostate boundaries. We also performed a quantitative evaluation. The evaluation metrics used in the PROMISE12 challenge include the Dice similarity coefficient (DSC), the percentage of the absolute difference between the volumes (aRVD), the average over the shortest distances between the boundary points of the volumes (ABD) and the Hausdorff distance (HD). All metrics are calculated in 3D. In addition to evaluating these metrics over the entire prostate, the challenge organizers also calculated the boundary measures specifically for the apex and base of the prostate, because those parts are difficult to segment and at the same time very important for many clinical applications. The apex and base are determined by dividing the prostate into three approximately equal-sized parts along the axial direction (the first third as the apex and the last third as the base). An overall score that takes all the criteria into account is then computed to rank the algorithms.
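Two of the volume-overlap metrics and the apex/mid-gland/base split can be sketched in NumPy as shown below. This is an illustration of the definitions given above, not the challenge's official evaluation code: the helper names are hypothetical, two empty masks are assumed to have a DSC of 100, and the lowest slice index along the chosen axis is assumed to be the apex end, which in practice depends on image orientation. ABD and HD are omitted because they require surface-distance computations.

```python
import numpy as np

def dice(seg, ref):
    """Dice similarity coefficient (DSC) between two binary masks, in percent."""
    seg, ref = seg.astype(bool), ref.astype(bool)
    denom = seg.sum() + ref.sum()
    if denom == 0:
        return 100.0  # assumed convention: two empty masks agree perfectly
    return 100.0 * 2.0 * np.logical_and(seg, ref).sum() / denom

def abs_rel_volume_diff(seg, ref):
    """aRVD: |V_seg - V_ref| as a percentage of V_ref (voxel counts; voxel volume cancels)."""
    v_seg = float(seg.astype(bool).sum())
    v_ref = float(ref.astype(bool).sum())
    return 100.0 * abs(v_seg - v_ref) / v_ref

def split_apex_mid_base(ref, axis=0):
    """Split the reference prostate's extent along `axis` (assumed axial) into thirds.

    Assumption: the lowest slice index is the apex end.
    Returns the slice indices of the apex, mid-gland and base parts.
    """
    other_axes = tuple(a for a in range(ref.ndim) if a != axis)
    occupied = np.flatnonzero(ref.astype(bool).any(axis=other_axes))
    apex, mid, base = np.array_split(occupied, 3)
    return apex, mid, base

# Toy example: the segmentation misses one column of the reference box.
ref = np.zeros((9, 10, 10), dtype=np.uint8)
ref[:, 2:8, 2:8] = 1   # 9 * 6 * 6 = 324 voxels
seg = np.zeros_like(ref)
seg[:, 2:8, 2:7] = 1   # 9 * 6 * 5 = 270 voxels, all inside ref
apex, mid, base = split_apex_mid_base(ref)
print(dice(seg, ref))                  # about 90.91
print(abs_rel_volume_diff(seg, ref))   # about 16.67
print(dice(seg[apex], ref[apex]))      # DSC restricted to the apex third
```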
The results of our method and its competitors are shown in Table 2, where only the top 10 teams are listed. Note that all results reported in this section were obtained directly from the challenge website. As can be seen from the table, our overall performance was the best, ranking first among all teams (as of May 22, 2018) with a score of 89.18. Our model achieved the best performance in several measures, not only for the whole prostate but also in the base and apex areas, which demonstrates the effectiveness of the proposed 3D model with DRBs and AM-modulated long connections.
4.3 Pancreas segmentation
The proposed model was also evaluated on another publicly available dataset, TCIA Pancreas CT-82. This dataset contains 82 contrast-enhanced 3D CT scans with 512 × 512 pixels per slice, varying pixel sizes and slice thicknesses between 1.5 and 2.5 mm, acquired on Philips and Siemens MDCT scanners [52]. The dataset is commonly used to benchmark CT pancreas segmentation frameworks. In our experiments, the 82 scans were randomly split into 62 for training and 20 for testing. Before training, we resampled all volumes to a fixed resolution of 1.0 mm × 1.0 mm × 1.0 mm and normalized them to zero mean and unit variance. We again applied data augmentation, including rotation, scaling and flipping, and the same random cropping strategy, with sub-volumes of 64 × 96 × 96 (\(d \times w \times h\)) voxels randomly cropped from the training data in every iteration. In the testing phase, we used overlapping sliding windows to crop sub-volumes and averaged the probability maps of these sub-volumes to obtain the whole-volume prediction. The sub-volume size was also 64 × 96 × 96, and the stride was 32 × 48 × 48. The network architecture was the same as that used for prostate segmentation. The prediction time was approximately 1 min per CT volume.
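Resampling to a fixed spacing changes the voxel grid size in proportion to the ratio of old to new spacing. The hypothetical helper below only computes the target grid shape. The interpolation scheme used in the paper is not stated, rounding to the nearest integer is an assumed convention, and the example scan (200 slices of 2.0 mm with 0.7 mm pixels) is made up for illustration.

```python
def resampled_shape(shape, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Voxel grid size after resampling to `target_spacing`.

    shape, spacing and target_spacing use the same axis order; spacings are in mm.
    Rounding to the nearest integer is an assumed convention.
    """
    return tuple(int(round(n * s / t)) for n, s, t in zip(shape, spacing, target_spacing))

# Hypothetical CT scan: 200 slices of 2.0 mm, 512 x 512 pixels of 0.7 mm.
print(resampled_shape((200, 512, 512), (2.0, 0.7, 0.7)))  # (400, 358, 358)
```

The same helper with `target_spacing=(1.5, 0.625, 0.625)` gives the grid size for the prostate MR resampling in Sect. 4.2 (in slice, row, column order).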
To evaluate the proposed architecture, we compared it with other state-of-the-art CT pancreas segmentation methods. The results are summarized in Table 3. Our model achieved a DSC of 83.9 ± 4.51 for the pancreas, outperforming the other state-of-the-art methods. Several example segmentation results are shown in Fig. 5; our model can accurately segment the pancreas from CT images. It is worth noting that we employ only a single model to segment the pancreas and do not require multiple CNN models as in [48].
4.4 Liver segmentation
We also tested our model on the competitive MICCAI 2017 LiTS Challenge dataset, which contains 131 contrast-enhanced 3D abdominal CT scans with radiologists' hand-drawn ground truth for training and 70 further scans for testing, whose ground truth is not released. Since the data were acquired at different clinical sites with different scanners and protocols, the scans vary widely in in-plane resolution (0.55–1.0 mm) and slice spacing (0.45–6.0 mm). Before training, we truncated the intensity values of all scans to the range [−200, 200] to remove irrelevant details and then normalized each volume. In addition to the 3D model, we also evaluated a 2D model with the same network structure to examine the influence of the number of parameters (Table 4).
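The intensity pre-processing can be sketched as below. The text says only that each volume is normalized after truncation; this sketch assumes the same per-volume zero-mean, unit-variance normalization used for the pancreas data, and the function name `preprocess_ct` is hypothetical.

```python
import numpy as np

def preprocess_ct(volume, low=-200.0, high=200.0):
    """Clip CT intensities to [low, high], then z-score the volume.

    Assumption: per-volume zero-mean, unit-variance normalization.
    """
    clipped = np.clip(volume.astype(np.float32), low, high)  # truncate the intensity range
    std = clipped.std()
    return (clipped - clipped.mean()) / (std if std > 0 else 1.0)  # guard against constant input

out = preprocess_ct(np.array([-1000.0, -200.0, 0.0, 200.0, 3000.0]))
print(out)  # approximately [-1.118 -1.118 0. 1.118 1.118]
```

Clipping first means that air, bone and contrast outliers cannot dominate the mean and standard deviation, so the normalization reflects the soft-tissue range where the liver lies.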
During training, we randomly cropped patches of 224 × 224 × 16 voxels for the 3D model (224 × 224 pixels for the 2D model) from the training data in every iteration. In the testing phase, we used overlapping sliding windows to crop sub-volumes and averaged the probability maps of these sub-volumes to obtain the whole-volume prediction. The window size was also 224 × 224 × 16 voxels for the 3D model (224 × 224 pixels for the 2D model), and the stride was 112 × 112 × 8 for the 3D model (112 × 112 for the 2D model). The number of parameters of SIP-Net (2D) was 1.43M, and the prediction time was approximately 2 min per CT volume.
There were more than 60 submissions to the MICCAI LiTS Challenge. The segmentation performance of the teams is listed on the leaderboard, and we were among the top seven teams (as of November 15, 2018, team Qikui_sigma-RPI). We compared the performance of our model with two published top-performing models: H-DenseUNet [34] and CascadedResNet [53]. H-DenseUNet employed a simple ResNet to process the original data, which makes the network dependent on the quality of this pre-processing. In addition, H-DenseUNet employed 3D convolutional layers inside the model, with many more parameters and thus increased training difficulty. CascadedResNet, on the other hand, achieved good results but took approximately 7 days to train on two Titan X GPUs. Our method performs similarly to these two approaches, with negligible differences, but can be trained much more efficiently. Comparing the 2D and 3D models reveals that the 2D model can even obtain better performance. This suggests that the network architecture is key to the performance gain and that a larger number of network parameters may lead to a performance decrease.
5 Discussions
In this section, we provide in-depth discussions of the effects of some of our proposed components.
5.1 Ablation study of network structure
To evaluate the effectiveness of the residual connections in the dense blocks, the long connections and the attention-focused modules used in our model, we performed a set of ablation experiments on the prostate MR image dataset. We randomly selected 10 patients for validation and used the remaining 40 patients for training.
To analyze the learning behavior of our model, we created four configurations: using only dense blocks (D-Net), using only DRBs (DR-Net), using DRBs and long connections (DRL-Net), and using DRBs, long connections and attention-focused modules (SIP-Net). Figures 6 and 7 present the training and validation losses of the different networks. The models with residual connections, long connections or attention-focused modules converge faster and achieve lower validation loss than the model with only dense blocks, which indicates that these components improve the training efficiency and performance of the models. Figure 7 further shows that the long connections can accelerate convergence and alleviate the risk of overfitting on limited training data.
Table 5 shows the performance of our model with different connections and blocks. Adding residual connections, long connections and attention-focused modules yields better Dice scores than the network with only dense blocks. The network with residual connections and dense blocks performs marginally better than the one with only dense blocks, which suggests that enhanced information propagation inside each block can improve performance. The model with long connections performs better than the one without, suggesting that enhancing information propagation both locally and globally inside the model, and combining the two, can further improve performance. The network with attention-focused modules achieves the best performance in the ablation experiments, indicating that the attention-focused modules further improve the model.
To demonstrate how efficiently the proposed method uses training data, we compared its performance with that of FCN, which is in effect our model without DRBs and AMs, using different amounts of training data. In this experiment, we used 40%, 50%, 60%, 70% and 80% of the data for training, respectively, and reserved up to 20% of the data for testing. To avoid potential data distribution bias, for each setting we randomly selected five different subsets from the entire dataset for training and testing. The average performance over the five runs for each setting is shown in Fig. 8. When only 40% of the data were used for training, the proposed method and FCN achieved similar performance; as expected, performance is poor when training data are so limited. As the size of the training set increases, both methods perform better. However, Fig. 8 shows that the proposed method improves at a much faster rate, owing to the proposed DRB and AM modules. Eventually, the proposed model needs less than 70% of the data for training to outperform FCN trained with the full 80%. This experiment demonstrates that the proposed structures can help deep CNNs train more efficiently with a small number of training images, a highly desirable property for medical imaging applications, where labeled data are usually scarce.
5.2 Analysis of attention-focused modules
To further analyze the function of the attention-focused modules, we visualized the attention masks generated in the up-sampling path. Four input images were selected: from the base, the mid-gland, the apex and outside of the prostate. As shown in Fig. 9, the attention masks have much higher weights in the prostate region than in the non-prostate region, and the shape of the attention mask is very close to the ground truth. The higher weights inside the attention masks help to locate the prostate region, and the shape of the attention mask volume is likewise close to the ground truth. This suggests that the attention masks help the network pay more attention to the prostate region and suppress features from the non-prostate region, leading to better segmentation.
5.3 Effects of batch size
To evaluate the influence of batch size on the segmentation results, we compared the performance of our proposed model under various batch sizes. The prostate MR image dataset was used again: 10 patients were randomly selected for validation and the remaining 40 patients were used for training. The segmentation performance is listed in Table 6. Batch size has only a slight effect on the segmentation results, and the model performed best with a batch size of 4.
6 Conclusions
In this paper, we first show that not all the feature maps transmitted by skip connections contribute positively to the network performance. To adaptively select the information passed through those skip connections, we propose a novel network, named SIP-Net, which performs this selection with the proposed attention-focused modules. Besides the skip connections between the down-sampling and up-sampling paths, which further improve the propagation of context and gradient information in both the forward and backward passes and help address the vanishing-gradient problem, SIP-Net also makes the model focus on the region of interest. Extensive experiments on the publicly available MICCAI Prostate MR Image Segmentation 2012 Grand Challenge dataset, TCIA Pancreas CT-82 and MICCAI 2017 Liver Tumor Segmentation (LiTS) Challenge dataset demonstrate that our proposed method produces more accurate boundaries and achieves superior results compared with other state-of-the-art methods.
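The general idea of selecting what passes through a skip connection can be illustrated with a soft gate. The sketch below is a generic sigmoid gating scheme, assumed for illustration only; it is not the internal design of SIP-Net's attention-focused modules, which this section does not detail. The function `gated_skip`, the single-channel mask, and the 1×1×1 projection weights `w` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_skip(skip, decoder, w, b=0.0):
    """Soft-gate encoder skip features with a mask predicted from decoder features.

    skip:    encoder feature map, shape (C_s, D, H, W)
    decoder: decoder feature map at the same spatial size, shape (C_d, D, H, W)
    w:       weights of a hypothetical 1x1x1 convolution, shape (C_d,)
    Returns the gated skip features and the mask of shape (1, D, H, W).
    """
    # A 1x1x1 convolution to one channel is a weighted sum over the channel axis.
    logits = np.tensordot(w, decoder, axes=([0], [0])) + b  # (D, H, W)
    mask = sigmoid(logits)[np.newaxis]                      # (1, D, H, W), values in (0, 1)
    # Broadcasting applies the same spatial mask to every skip channel.
    return skip * mask, mask
```

In a U-Net-style decoder, the gated features would then typically be concatenated with the decoder features (for example `np.concatenate([gated, decoder], axis=0)`), so low-weight regions contribute little of the encoder's detail to the next stage.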
References
Zhu Y, Zhuang F, Wang J, Chen J, Shi Z, Wu W, He Q (2019) Multi-representation adaptation network for cross-domain image classification. Neural Netw 119:214–221
Heimann T, Van Ginneken B, Styner MA, Arzhaeva Y, Aurich V, Bauer C, Beck A, Becker C, Beichel R, Bekes G et al (2009) Comparison and evaluation of methods for liver segmentation from CT datasets. IEEE Trans Med Imaging 28(8):1251–1265
Liao S, Gao Y, Oto A, Shen D (2013) Representation learning: a unified deep learning framework for automatic prostate MR segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 254–261
Zhu Q, Du B, Yan P (2019) Boundary-weighted domain adaptive neural network for prostate MR image segmentation. IEEE Trans Med Imaging 39(3):753–763
Zhu Q, Du B, Turkbey B, Choyke P, Yan P (2018) Exploiting interslice correlation for MRI prostate image segmentation, from recursive neural networks aspect. Complexity 2018(1):10–18
Du B, Wei Q, Liu R (2019) An improved quantum-behaved particle swarm optimization for endmember extraction. IEEE Trans Geosci Remote Sens 57(8):6003–6017
Wu J, Hong Z, Pan S, Zhu X, Zhang C, Cai Z (2014) Multi-graph learning with positive and unlabeled bags. In: Proceedings of the 2014 SIAM international conference on data mining. SIAM, pp 217–225
Bi X, Wang H (2019) Early Alzheimer's disease diagnosis based on EEG spectral images using deep learning. Neural Netw 114:119–135
Li X, Du B, Xu C, Zhang Y, Zhang L, Tao D (2020) Robust learning with imperfect privileged information. Artif Intell 282:103246
Wu J, Zhu X, Zhang C, Philip SY (2014) Bag constrained structure pattern mining for multi-graph classification. IEEE Trans Knowl Data Eng 26(10):2382–2396
Liu F, Xue S, Wu J, Zhou C, Hu W, Paris C, Nepal S, Yang J, Yu PS (2020) Deep learning for community detection: progress, challenges and opportunities. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence, IJCAI-20. Survey track, pp 4981–4987
Wu J, Cai Z, Zeng S, Zhu X (2013) Artificial immune system for attribute weighted naive Bayes classification. In: The 2013 international joint conference on neural networks (IJCNN). IEEE, pp 1–8
Qin X, Li X, Liu Y, Lu H, Yan P (2014) Adaptive shape prior constrained level sets for bladder MR image segmentation. IEEE J Biomed Health Inform 18(5):1707–1716
Huo Y, Liu J, Xu Z, Harrigan RL, Assad A, Abramson RG, Landman BA (2018) Robust multicontrast MRI spleen segmentation for splenomegaly using multi-atlas segmentation. IEEE Trans Biomed Eng 65(2):336–343
McIntosh C, Purdie TG (2016) Contextual atlas regression forests: multiple-atlas-based automated dose prediction in radiation therapy. IEEE Trans Med Imaging 35(4):1000–1012
Zhu H, Cheng H, Yang X, Fan Y (2016) Metric learning for label fusion in multi-atlas based image segmentation. In: 13th international symposium on biomedical imaging (ISBI). IEEE, pp 1338–1341
Gao Q, Asthana A, Tong T, Rueckert D et al (2014) Multi-scale feature learning on pixels and super-pixels for seminal vesicles MRI segmentation. In: Medical imaging 2014: image processing international society for optics and photonics, vol 9034, p 903407
Yan P, Xu S, Turkbey B, Kruecker J (2011) Adaptively learning local shape statistics for prostate segmentation in ultrasound. IEEE Trans Biomed Eng 58(3):633–641
Gloger O, Tönnies K, Laqua R, Völzke H (2015) Fully automated renal tissue volumetry in MR volume data using prior-shape-based segmentation in subject-specific probability maps. IEEE Trans Biomed Eng 62(10):2338–2351
Luo F, Zhang L, Du B, Zhang L (2020) Dimensionality reduction with enhanced hybrid-graph discriminant learning for hyperspectral image classification. IEEE Trans Geosci Remote Sens 58(8):5336–5353. https://doi.org/10.1109/TGRS.2020.2963848
Lu X, Chen Y, Li X (2018) Hierarchical recurrent neural hashing for image retrieval with hierarchical convolutional features. IEEE Trans Image Process 27(1):106–120
Wu J, Pan S, Zhu X, Cai Z (2014) Boosting for multi-graph classification. IEEE Trans Cybern 45(3):416–429
Luo F, Zhang L, Zhou X, Guo T, Cheng Y, Yin T (2019) Sparse-adaptive hypergraph discriminant analysis for hyperspectral image classification. IEEE Geosci Remote Sens Lett 17(6):1082–1086. https://doi.org/10.1109/LGRS.2019.2936652
Lai S, Xu L, Liu K, Zhao J (2015) Recurrent convolutional neural networks for text classification. AAAI 333:2267–2273
Goldberg Y (2016) A primer on neural network models for natural language processing. J Artif Intell Res 57:345–420
Johnson R, Zhang T (2015) Semi-supervised convolutional neural networks for text categorization via region embedding. In: Advances in neural information processing systems, pp 919–927
Kim B-C, Yoon JS, Choi J-S, Suk H-I (2019) Multi-scale gradual integration CNN for false positive reduction in pulmonary nodule detection. Neural Netw 115:1–10
Wu J, Zhu X, Zhang C, Cai Z (2013) Multi-instance multi-graph dual embedding learning. In: 2013 IEEE 13th international conference on data mining. IEEE, pp 827–836
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Milletari F, Navab N, Ahmadi S-A (2016) V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth international conference on 3D vision (3DV). IEEE, pp 565–571
Yu L, Yang X, Chen H, Qin J, Heng P-A (2017) Volumetric convnets with mixed residual connections for automated prostate segmentation from 3D MR images. In: AAAI, pp 66–72
Gibson E, Giganti F, Hu Y, Bonmati E, Bandula S, Gurusamy K, Davidson B, Pereira SP, Clarkson MJ, Barratt DC (2018) Automatic multi-organ segmentation on abdominal CT with dense V-networks. IEEE Trans Med Imaging 37(8):1822–1834. https://doi.org/10.1109/TMI.2018.2806309
Li X, Chen H, Qi X, Dou Q, Fu C-W, Heng P-A (2018) H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes. IEEE Trans Med Imaging 37(12):2663–2674. https://doi.org/10.1109/TMI.2018.2845918
Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 234–241
Li Y, Qi H, Dai J, Ji X, Wei Y (2016) Fully convolutional instance-aware semantic segmentation. arXiv preprint arXiv:1611.07709
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Wu Z, Shen C, van den Hengel A (2016) High-performance semantic segmentation using very deep fully convolutional networks. arXiv preprint arXiv:1604.04339
Zhu Q, Du B, Turkbey B, Choyke PL, Yan P (2017) Deeply-supervised CNN for prostate segmentation. In: 2017 International joint conference on neural networks (IJCNN). IEEE, pp 178–184
Mortazi A, Karim R, Rhode K, Burt J, Bagci U (2017) Cardiacnet: segmentation of left atrium and proximal pulmonary veins from MRI using multi-view CNN. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 377–385
Han X (2017) Automatic liver lesion segmentation using a deep convolutional neural network method. arXiv preprint arXiv:1704.07239
Tu Z (2008) Auto-context and its application to high-level vision tasks. In: IEEE conference on computer vision and pattern recognition, 2008. IEEE, pp 1–8
Chen H, Dou Q, Yu L, Heng P-A (2016) Voxresnet: deep voxelwise residual networks for volumetric brain segmentation. arXiv preprint arXiv:1608.05895
Yu L, Cheng J-Z, Dou Q, Yang X, Chen H, Qin J, Heng P-A (2017) Automatic 3D cardiovascular MR segmentation with densely-connected volumetric convnets. In: International conference on medical image computing and computer-assisted intervention, pp 287–295
Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H, Wang X, Tang X (2017) Residual attention network for image classification. arXiv preprint arXiv:1704.06904
Lee C-Y, Xie S, Gallagher P, Zhang Z, Tu Z (2015) Deeply-supervised nets. In: Artificial intelligence and statistics, pp 562–570
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: CVPR, vol 1, p 3
Roth HR, Lu L, Lay N, Harrison AP, Farag A, Sohn A, Summers RM (2018) Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation. Med Image Anal 45:94–107
Cai J, Lu L, Xie Y, Xing F, Yang L (2017) Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function. arXiv preprint arXiv:1707.04912
Zhou Y, Xie L, Shen W, Wang Y, Fishman EK, Yuille AL (2017) A fixed-point model for pancreas segmentation in abdominal CT scans. In: International conference on medical image computing and computer-assisted intervention. Springer, pp 693–701
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B et al (2018) Attention U-Net: learning where to look for the pancreas. arXiv preprint arXiv:1804.03999
Roth HR, Farag A, Turkbey EB, Lu L, Liu J, Summers RM (2016) Data from pancreas-CT. Cancer Imaging Arch. https://doi.org/10.7937/K9/TCIA.2016.tNB1kqBU
Bi L, Kim J, Kumar A, Feng D (2017) Automatic liver lesion detection using cascaded deep residual networks. arXiv preprint arXiv:1704.02703
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhu, Q., Li, L., Hao, J. et al. Selective information passing for MR/CT image segmentation. Neural Comput & Applic 35, 13007–13020 (2023). https://doi.org/10.1007/s00521-020-05407-3
DOI: https://doi.org/10.1007/s00521-020-05407-3