Abstract
Abdominal multi-organ segmentation is fast becoming a key instrument in preoperative diagnosis. Using the results of abdominal CT image segmentation for three-dimensional reconstruction is an intuitive and accurate method for surgical planning. In this paper, we propose a stable, fast, three-stage automatic segmentation method for 13 abdominal organs: liver, spleen, pancreas, right kidney, left kidney, stomach, gallbladder, esophagus, aorta, inferior vena cava, right adrenal gland, left adrenal gland, and duodenum. Our method consists of preprocessing the CT data, segmenting the organs, and post-processing the segmentation outputs. On the test set, the average DSC is about 0.766. The average running time and GPU memory consumption per case are 81.42 s and 1953 MB, respectively.
1 Introduction
Abdominal multi-organ segmentation is of great significance in medical diagnosis and research. Through pixel-level segmentation of CT or MRI and three-dimensional reconstruction of the segmentation results, doctors can obtain more intuitive information about patients’ abdominal organs [3, 4, 10, 17, 20]. In recent years, automatic medical image segmentation algorithms have made great breakthroughs. Methods based on deep learning have achieved excellent performance in this task [9, 12, 18, 19]. Deep learning based on neural networks can achieve fast segmentation and effectively addresses the low accuracy and long running time of earlier image segmentation approaches [8, 15]. Recent research mainly focuses on network structures and segmentation frameworks. At present, the most widely used network structure is the encoder-decoder structure similar to U-Net [14], such as 3D U-Net [1] and V-Net [13]; among segmentation frameworks, nnU-Net [7] has also achieved excellent results. For example, in the MICCAI 2019 KiTS19 challenge, the accuracy of nnU-Net using 3D U-Net on the kidney segmentation task was very close to that of human annotators, while the time required to complete a segmentation was far less than that of manual segmentation. Deep learning-based methods not only surpass traditional algorithms but also approach the accuracy of manual segmentation. However, previously published methods are of limited use on low-configuration devices.
In this paper, we propose a stable three-stage automatic segmentation method for 13 abdominal organs: liver, spleen, pancreas, right kidney, left kidney, stomach, gallbladder, esophagus, aorta, inferior vena cava, right adrenal gland, left adrenal gland, and duodenum. Our method completes the whole segmentation task, including preprocessing the CT data, segmenting the organs, and post-processing the segmentation outputs, with low GPU memory usage.
2 Methods
2.1 Preprocessing
In the preprocessing stage, we first standardize the voxel spacing of the CT scans. Because the amount of available GPU memory is limited, the patch size that can be processed by 3D CNNs is typically quite small. Thus, the target spacing, which directly determines the total size of the images in voxels, also determines how much contextual information the CNN can capture within its patch. We resample all the data to a voxel spacing of \(4.4 \times 2.5 \times 2.5\) mm for the first step and \(3.0 \times 1.6 \times 1.6\) mm for the second step. After spacing standardization, we limit the maximum in-plane resolution to \(128 \times 176\) pixels for the first step and \(230 \times 300\) pixels for the second step. This prevents data with a large original spacing from becoming too large after spacing standardization, which would significantly increase the segmentation time.
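The effect of the target spacing and the in-plane limit on the image size can be illustrated with a short calculation. The sketch below is only an illustration under stated assumptions: the axis order (z, y, x), the assignment of the two in-plane limits to y and x, and the function name `resampled_shape` are not given in the paper, and the sketch computes only the resulting grid size, not the interpolation or the way the limit is enforced.

```python
def resampled_shape(shape, spacing, target_spacing, max_in_plane):
    """Return the voxel grid size after resampling to target_spacing.

    shape, spacing and target_spacing are (z, y, x) tuples (assumed order);
    spacings are in mm. max_in_plane is the assumed (y, x) size limit.
    """
    # The physical extent is preserved, so
    # new_size = old_size * old_spacing / new_spacing.
    new_shape = [
        max(1, round(n * s / t))
        for n, s, t in zip(shape, spacing, target_spacing)
    ]
    # Limit the in-plane size so that scans covering a large physical
    # extent do not become too large after spacing standardization.
    new_shape[1] = min(new_shape[1], max_in_plane[0])
    new_shape[2] = min(new_shape[2], max_in_plane[1])
    return tuple(new_shape)


# Hypothetical input scan: 100 slices of 512 x 512 pixels, 2.5 x 0.8 x 0.8 mm.
coarse = resampled_shape((100, 512, 512), (2.5, 0.8, 0.8), (4.4, 2.5, 2.5), (128, 176))
fine = resampled_shape((100, 512, 512), (2.5, 0.8, 0.8), (3.0, 1.6, 1.6), (230, 300))
print(coarse, fine)
```

For this hypothetical scan, the first-step grid is several times smaller than the second-step grid, which is what allows the first network to see the whole abdomen within a small patch.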
2.2 Proposed Method
To verify the impact of the segmentation pipeline strategy on the results, we used an improved 3D U-Net as the segmentation network. The network architecture is illustrated in Fig. 1. The network includes an encoding path and a decoding path, each of which has four resolution levels. Each level of the encoding path contains two \(3 \times 3 \times 3\) convolution layers, and the convolution layers are followed by a ReLU layer and a \(2 \times 2 \times 2\) max-pooling layer with a stride of 2. In the decoding path, each level also contains two \(3 \times 3 \times 3\) convolution layers, and the convolution layers are followed by a ReLU layer and an upsampling layer. The sum of the Dice loss and the cross-entropy loss is used as the loss function. We used adaptive moment estimation (Adam) as the optimizer. The batch size was set to 2. The networks were initialized using Kaiming normal initialization. We set the initial learning rate to 1e-3 and multiplied it by 0.99 after every 5 epochs until it reached 1e-6.
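The learning rate schedule described above can be written as a closed-form function of the epoch. This is a minimal sketch, assuming 0-indexed epochs and a hard floor at the minimum value; the function name `learning_rate` and its parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-3, gamma=0.99, step=5, min_lr=1e-6):
    """Learning rate at a 0-indexed epoch under stepwise exponential decay.

    The rate is multiplied by gamma once every `step` epochs and is never
    allowed to fall below min_lr.
    """
    decayed = base_lr * gamma ** (epoch // step)
    return max(decayed, min_lr)


for epoch in (0, 4, 5, 50, 500):
    print(epoch, learning_rate(epoch))
```

With a multiplier as close to 1 as 0.99, the decay is slow, so the rate stays near its initial value over the first tens of epochs.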
The pipeline of our method consists of three stages: global locating, organ locating, and organ segmentation. Each stage generates a segmentation result for the complete CT, and the second and third stages operate on the result of the preceding stage. As shown in Fig. 2, in the global locating stage, we first cut the original CT into several ROIs and then segment each ROI with the first trained neural network. In the organ locating stage, we first locate the region of the abdominal organs in the whole CT according to the results of the first stage, and then we save this region at a higher resolution and segment it with the second trained network. In the organ segmentation stage, we locate and crop each organ according to the results of the second stage and then use the corresponding network to finely segment each organ. Finally, we superimpose the segmentation result of each organ at the corresponding position to generate the final segmentation result.
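The locate-and-crop step shared by the second and third stages can be sketched as follows. This is a minimal illustration, assuming the coarse result is a binary mask stored as nested lists indexed as `mask[z][y][x]`; the function names `bounding_box` and `crop` and the `margin` parameter are hypothetical and not taken from the paper.

```python
def bounding_box(mask, margin=0):
    """Return half-open bounds ((z0, z1), (y0, y1), (x0, x1)) of nonzero voxels.

    mask is a 3D nested list indexed as mask[z][y][x]. The box is enlarged
    by `margin` voxels on each side and clipped to the volume. Returns None
    if the mask is empty.
    """
    coords = [
        (z, y, x)
        for z, plane in enumerate(mask)
        for y, row in enumerate(plane)
        for x, value in enumerate(row)
        if value
    ]
    if not coords:
        return None
    shape = (len(mask), len(mask[0]), len(mask[0][0]))
    box = []
    for axis in range(3):
        values = [c[axis] for c in coords]
        lo = max(min(values) - margin, 0)
        hi = min(max(values) + 1 + margin, shape[axis])
        box.append((lo, hi))
    return tuple(box)


def crop(volume, box):
    """Cut the sub-volume described by `box` out of a nested-list volume."""
    (z0, z1), (y0, y1), (x0, x1) = box
    return [[row[x0:x1] for row in plane[y0:y1]] for plane in volume[z0:z1]]


# Toy coarse mask of size 4 x 5 x 6 with two foreground voxels.
mask = [[[0] * 6 for _ in range(5)] for _ in range(4)]
mask[1][2][3] = 1
mask[2][3][4] = 1
box = bounding_box(mask, margin=1)
region = crop(mask, box)
```

Keeping the box in the coordinates of the full volume is what allows the refined result to be placed back at the corresponding position at the end of the pipeline.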
To further improve the robustness of the network on different data, we adopt a semi-supervised training strategy. Since no research has shown that using more unlabeled data in semi-supervised learning is always better, we keep the number of unlabeled cases comparable to the number of labeled cases. During training, we use 40 labeled cases and 50 randomly selected unlabeled cases as the training set for the global locating and organ locating stages. We train the model on the labeled data alone for the first 50 epochs and then introduce the unlabeled data. After every five epochs, we use the trained model to segment the unlabeled data and use the results as labels for training. Because the first two stages segment the complete CT, unlike the third stage, we apply the semi-supervised learning strategy only to the first two stages.
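The training schedule described above amounts to a pseudo-labeling loop. The sketch below shows only the control flow, under stated assumptions: epochs are 0-indexed, pseudo-labels are first generated when the unlabeled data are introduced, and `train_one_epoch` and `predict` are hypothetical callables standing in for the actual network training and inference code.

```python
def run_semi_supervised(labeled, unlabeled, train_one_epoch, predict,
                        num_epochs, warmup_epochs=50, refresh_every=5):
    """Run a pseudo-labeling schedule and return the epochs at which
    pseudo-labels were regenerated.

    labeled: list of (image, label) pairs.
    unlabeled: list of images without labels.
    train_one_epoch(data): trains on a list of (image, label) pairs.
    predict(image): returns the current model's segmentation of an image.
    """
    pseudo_labeled = []
    refresh_epochs = []
    for epoch in range(num_epochs):
        after_warmup = epoch >= warmup_epochs
        if after_warmup and (epoch - warmup_epochs) % refresh_every == 0:
            # Re-segment the unlabeled data with the current model and
            # use the predictions as training labels.
            pseudo_labeled = [(image, predict(image)) for image in unlabeled]
            refresh_epochs.append(epoch)
        train_one_epoch(labeled + pseudo_labeled)
    return refresh_epochs


# Toy run with stand-in callables that only record the training set size.
sizes = []
labeled_cases = [(i, 0) for i in range(40)]
unlabeled_cases = list(range(50))
refreshed = run_semi_supervised(
    labeled_cases, unlabeled_cases,
    train_one_epoch=lambda data: sizes.append(len(data)),
    predict=lambda image: 0,
    num_epochs=61,
)
print(refreshed, sizes[49], sizes[50])
```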
2.3 Post-processing
In the post-processing stage, we splice together the segmentation results of the networks. We keep the region with the largest volume and remove the rest to eliminate isolated, incorrectly predicted labels. To improve the efficiency of our method, we clear the cache and delete the used feature maps and the model from the GPU after each step. As a result, the maximum GPU memory usage is 1953 MB.
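Keeping only the largest region can be implemented with a connected-component search. The following is a minimal sketch, assuming a binary mask stored as nested lists indexed as `mask[z][y][x]` and 6-connectivity between voxels; the paper does not state the connectivity or the implementation, and the function name `keep_largest_component` is hypothetical.

```python
from collections import deque


def keep_largest_component(mask):
    """Return a copy of a binary 3D mask with only its largest
    6-connected foreground component kept."""
    depth, height, width = len(mask), len(mask[0]), len(mask[0][0])
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    seen = set()
    best = []
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if not mask[z][y][x] or (z, y, x) in seen:
                    continue
                # Breadth-first search over one connected component.
                component = []
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                while queue:
                    cz, cy, cx = queue.popleft()
                    component.append((cz, cy, cx))
                    for dz, dy, dx in neighbors:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        inside = (0 <= nz < depth and 0 <= ny < height
                                  and 0 <= nx < width)
                        if inside and mask[nz][ny][nx] and (nz, ny, nx) not in seen:
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
                if len(component) > len(best):
                    best = component
    cleaned = [[[0] * width for _ in range(height)] for _ in range(depth)]
    for z, y, x in best:
        cleaned[z][y][x] = 1
    return cleaned


# One slice with a 3-voxel region and an isolated voxel in the corner.
noisy = [[[1, 1, 0, 0, 1],
          [1, 0, 0, 0, 0],
          [0, 0, 0, 0, 0]]]
cleaned = keep_largest_component(noisy)
```

In a multi-organ setting, such a filter would be applied to the binary mask of each label separately, which is also an assumption of this sketch.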
3 Experiments
3.1 Dataset and Evaluation Measures
The FLARE2022 dataset is curated from more than 20 medical groups under license permission, including MSD [16], KiTS [5, 6], AbdomenCT-1K [11], and TCIA [2]. The training set includes 50 labelled CT scans with pancreas disease and 2000 unlabelled CT scans with liver, kidney, spleen, or pancreas diseases. The validation set includes 50 CT scans with liver, kidney, spleen, or pancreas diseases. The testing set includes 200 CT scans, of which 100 cases have liver, kidney, spleen, or pancreas diseases and the other 100 cases have uterine corpus endometrial, urothelial bladder, stomach, sarcomas, or ovarian diseases. All the CT scans only have image information, and the center information is not available.
The evaluation measures consist of two accuracy measures: Dice Similarity Coefficient (DSC) and Normalized Surface Dice (NSD), and three running efficiency measures: running time, area under GPU memory-time curve, and area under CPU utilization-time curve. All measures will be used to compute the ranking. Moreover, the GPU memory consumption has a 2 GB tolerance.
3.2 Implementation Details
Environment Settings. The environments and requirements are presented in Table 1.
Training Protocols. The training protocols are presented in Table 2.
3.3 Resource Consumption
The resource consumption during inference is presented in Table 3.
4 Results and Discussion
As the accuracy metric, we use the average DSC between the predicted mask and the ground-truth mask. For two masks A and B, the metric is given by (1):
\[\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{1}\]
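For binary masks, the DSC is twice the number of overlapping foreground voxels divided by the total number of foreground voxels in the two masks. The sketch below computes it for masks given as flat sequences of 0/1 values; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions, not details from the paper.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks.

    mask_a and mask_b are flat sequences of 0/1 values of equal length.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


prediction = [1, 1, 0, 0]
ground_truth = [1, 0, 1, 0]
print(dice_coefficient(prediction, ground_truth))  # 0.5
```

The average DSC reported for a case is then the mean of this value over the organ labels.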
4.1 Quantitative Results on Validation Set
Table 4 compares the segmentation results on the 13 organs across the three stages. In the global locating, organ locating, and organ segmentation stages, our method achieves an average DSC of 0.63, 0.73, and 0.77, respectively. The highest DSC among the three stages is highlighted in Table 4. It is apparent from this table that the DSC in stage 3 is considerably higher than in the previous stages.
As the models used in the first and second stages were trained in a semi-supervised manner with unlabeled data, we also tested the effect of the unlabeled data. Table 5 shows the DSC of our method with and without unlabeled data. It can be observed that the accuracy of our method improves when unlabeled data are used.
4.2 Qualitative Results on Validation Set
Figure 3 shows three examples with good segmentation results on CT slices in the validation set. Figure 4 shows voxel-based renderings of the results for three examples in the validation set. In these results, the performance of our method is generally stable.
As shown in Fig. 5, there are also examples with poor segmentation results on CT slices in the validation set. In the first case, part of the right kidney tumor and the pancreas were not correctly recognized. This is because the training set contains few cases with kidney tumors, and the boundary between the pancreas and the surrounding tissues is not particularly distinct. In the second case, our method performs poorly on the spleen and stomach. The gray value of the stomach is abnormally high in the CT image, which not only led to incorrect recognition of the stomach but also overwrote the correct label of the spleen. In the third case, a typical liver recognition error occurred. Because such features are rare in the training data, the network habitually takes the lung boundary as the criterion for determining the liver region.
5 Conclusion
We propose a three-stage automatic segmentation method for 13 abdominal organs based on an improved 3D U-Net. The results show that the average DSC of our method is 0.77 on the official validation leaderboard and that the accuracy of our method on large organs is better than on small organs. The three-stage method is fast, but it is difficult to achieve higher accuracy due to the limitation of the feature map size. Future work will focus on improving accuracy with methods that use fewer stages, in which the segmentation speed can be further improved.
References
1. Çiçek, Ö., Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9901, pp. 424–432. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46723-8_49
2. Clark, K., et al.: The cancer imaging archive (TCIA): maintaining and operating a public information repository. J. Digit. Imaging 26(6), 1045–1057 (2013)
3. Couteaux, V., et al.: Kidney cortex segmentation in 2D CT with U-Nets ensemble aggregation. Diagn. Interv. Imaging 100(4), 211–217 (2019)
4. Fu, Y., et al.: A novel MRI segmentation method using CNN-based correction network for MRI-guided adaptive radiotherapy. Med. Phys. 45(11), 5129–5137 (2018)
5. Heller, N., et al.: The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: results of the KiTS19 challenge. Med. Image Anal. 67, 101821 (2021)
6. Heller, N., et al.: An international challenge to use artificial intelligence to define the state-of-the-art in kidney and kidney tumor segmentation in CT imaging 38(6), 626 (2020)
7. Isensee, F., Jäger, P.F., Kohl, S.A., Petersen, J., Maier-Hein, K.H.: Automated design of deep learning methods for biomedical image segmentation. arXiv preprint arXiv:1904.08128 (2020)
8. Kim, D.Y., Park, J.W.: Computer-aided detection of kidney tumor on abdominal computed tomography scans. Acta Radiol. 45(7), 791–795 (2004)
9. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
10. Li, J., Zhu, S.A., Bin, H.: Medical image segmentation techniques. J. Biomed. Eng. 23(4), 891–894 (2006)
11. Ma, J., et al.: AbdomenCT-1K: is abdominal organ segmentation a solved problem? In: IEEE Transactions on Pattern Analysis and Machine Intelligence (2021)
12. Micheli-Tzanakou, E.: Artificial neural networks: an overview. Netw. Comput. Neural Syst. 22(1–4), 208–230 (2011)
13. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE, Stanford, CA, USA (2016)
14. Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
15. Shelhamer, E., Long, J., Darrell, T.: Fully convolutional networks for semantic segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 640–651 (2017)
16. Simpson, A.L., et al.: A large annotated medical image dataset for the development and evaluation of segmentation algorithms. arXiv preprint arXiv:1902.09063 (2019)
17. Yang, Y., Jiang, H., Sun, Q.: A multiorgan segmentation model for CT volumes via full convolution-deconvolution network. BioMed. Res. Int. 2017, 6941306 (2017)
18. Zarándy, Á., Rekeczky, C., Szolgay, P., Chua, L.O.: Overview of CNN research: 25 years history and the current trends. In: 2015 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 401–404. IEEE, Lisbon, Portugal (2015)
19. Zhang, J., Zong, C., et al.: Deep neural networks in machine translation: an overview. IEEE Intell. Syst. 30(5), 16–25 (2015)
20. Zhao, C., Carass, A., Lee, J., He, Y., Prince, J.L.: Whole brain segmentation and labeling from CT using synthetic MR images. In: Wang, Q., Shi, Y., Suk, H.-I., Suzuki, K. (eds.) MLMI 2017. LNCS, vol. 10541, pp. 291–298. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67389-9_34
Acknowledgements
The authors of this paper declare that the segmentation method they implemented for participation in the FLARE 2022 challenge has not used any pre-trained models nor additional datasets other than those provided by the organizers. The proposed solution is fully automatic without any manual intervention. This work was supported by Natural Science Foundation of China (Grant No. 62173014) and Natural Science Foundation of Beijing Municipality (Grant No. L192057).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lv, Y., Ning, Y., Wang, J. (2022). Coarse to Fine Automatic Segmentation of Abdominal Multiple Organs. In: Ma, J., Wang, B. (eds) Fast and Low-Resource Semi-supervised Abdominal Organ Segmentation. FLARE 2022. Lecture Notes in Computer Science, vol 13816. Springer, Cham. https://doi.org/10.1007/978-3-031-23911-3_20
DOI: https://doi.org/10.1007/978-3-031-23911-3_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-23910-6
Online ISBN: 978-3-031-23911-3