Abstract
We propose a new adversarial network, named voxel-GAN, to mitigate the imbalanced-data problem in brain tumor semantic segmentation, where the majority of voxels belong to a healthy region and few belong to a tumor or non-healthy region. We introduce a 3D conditional generative adversarial network (cGAN) comprising two components: a segmentor and a discriminator. The segmentor is trained on 3D brain MR or CT images to learn segmentation labels at the voxel level, while the discriminator is trained to distinguish whether a segmentor output comes from the ground truth or is generated artificially. The segmentor and discriminator networks are trained simultaneously with a new weighted adversarial loss to mitigate the imbalanced-training-data issue. We show evidence that the proposed framework is applicable to different types of brain images of varied sizes. In our experiments on the BraTS-2018 and ISLES-2018 benchmarks, we find improved results, demonstrating the efficacy of our approach.
1 Introduction
Brain imaging studies using magnetic resonance imaging (MRI) or computed tomography (CT) provide important information for disease diagnosis and treatment planning [6]. One of the major challenges in brain tumor segmentation is imbalanced training data, in which the majority of voxels are healthy and only a few are non-healthy or tumor. A model learned from class-imbalanced training data is biased towards the majority class. The predicted results of such networks have low sensitivity, i.e., they often fail to correctly predict the non-healthy classes. In medical applications, the cost of misclassifying the minority class can be higher than that of misclassifying the majority class. For example, the risk of not detecting a tumor can be much higher than that of referring a healthy subject to a doctor.
The problem of class imbalance has recently been addressed in disease classification, tumor recognition, and tumor segmentation. Two types of approaches have been proposed in the literature: data-level and algorithm-level approaches.
At the data level, the objective is to balance the class distribution by re-sampling the data space [21], e.g., through SMOTE (Synthetic Minority Over-sampling Technique) on the positive class [10] or by under-sampling the negative class [19]. However, these approaches often remove important samples from, or add redundant samples to, the training set.
Algorithm-level solutions address the class-imbalance problem by modifying the learning algorithm to alleviate the bias towards the majority class. Examples are cascade training [8, 33, 36] and training with a cost-sensitive function [40], such as the Dice coefficient loss [12, 35, 38] and the asymmetric similarity loss [16], which modify the training data distribution with regard to the misclassification cost.
Here, we study the advantage of mixing an adversarial loss with weighted categorical cross-entropy and weighted \(\ell _1\) losses in order to mitigate the negative impact of class imbalance. Moreover, we train voxel-GAN simultaneously with semantic segmentation masks and inverse-class-frequency segmentation masks, named complementary segmentation labels. Assume Y is the true segmentation label annotated by an expert and \(\bar{Y}\) is the complementary label, where \(P(\bar{Y} = i\mid Y=j), i\ne j \in \{0,1,...,c-1\},\) and c is the number of semantic segmentation class labels. The complementary label \(\bar{Y}\) is a reverse label for the background labels. Our network is then trained with both the true segmentation mask Y and the complementary segmentation mask \(\bar{Y}\) at the same time.
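As a minimal sketch of the complementary-label idea described above (the exact construction is our reading of the definition; the function name and the cyclic multi-class choice are illustrative assumptions, with the binary case simply flipping background and lesion):

```python
# Sketch of complementary-label construction. For the binary case,
# Ybar is the voxel-wise reverse of the background label; for the
# multi-class case, any mapping with Ybar != Y per voxel satisfies
# the definition (here: the next label cyclically, one possible choice).

def complementary_mask(y, num_classes=2):
    """Return a complementary mask Ybar with Ybar[i] != Y[i] per voxel."""
    if num_classes == 2:
        return [1 - v for v in y]
    return [(v + 1) % num_classes for v in y]

flat_mask = [0, 0, 1, 0, 1]           # toy flattened voxel labels
comp = complementary_mask(flat_mask)  # every entry differs from flat_mask
```

The network would then see both `flat_mask` and `comp` during training, as the text states.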
Automating brain tumor segmentation is a challenging task due to the high diversity in the appearance of tissues among different patients and, in many cases, the similarity between healthy and non-healthy tissues. Numerous automatic approaches have been developed to speed up medical image segmentation [6, 25]. We can roughly divide the current automated algorithms into two categories: those based on generative models and those based on discriminative models.
Generative probabilistic approaches build the model based on prior domain knowledge about the appearance and spatial distribution of the different tissue types. Traditionally, generative probabilistic models have been popular, with simple conditionally independent Gaussian models [13] or Bayesian learning [32] used for tissue appearance. In contrast, discriminative probabilistic models directly learn the relationship between the local features of images and segmentation labels without any domain knowledge. Traditional discriminative approaches such as SVMs [2, 9], random forests [23], and guided random walks [11] have been used in medical image segmentation. Deep neural networks (DNNs) are among the most popular discriminative approaches, where the machine learns a hierarchical representation of features without any handcrafted features [22]. In the field of medical image segmentation, Ronneberger et al. [37] presented a fully convolutional neural network, named UNet, for segmenting neuronal structures in electron microscopic stacks.
Recently, GANs [15] have gained a lot of momentum in the research community. Mirza and Osindero [26] extended the GAN framework to the conditional setting by making both the generator and the discriminator class-conditional. Conditional GANs have the advantage of providing better representations for multi-modal data generation, since there is control over the modes of the data being generated. This makes cGANs suitable for the image semantic segmentation task, where we condition on an observed image and generate a corresponding output image.
Unlike previous works on cGANs [18, 27, 34, 36, 41], we map 3D MR or CT images to 3D semantic segmentations. Summarizing, the main contributions of this paper are:
- We introduce voxel-GAN, a new adversarial framework that improves semantic segmentation accuracy.
- Our proposed method mitigates imbalanced training data with biased complementary labels in the task of semantic segmentation.
- We study the effect of different losses and architectural choices that improve semantic segmentation.
The rest of the paper is organized as follows: in the next section, we explain our proposed method for learning brain tumor segmentation, while the detailed experimental results are presented in Sect. 3. We conclude the paper and give an outlook on future research in Sect. 4.
2 voxel-GAN
In a conventional generative adversarial network, a generative model G tries to learn a mapping from a random noise vector z to an output image y; \(G: z \rightarrow y\). Meanwhile, a discriminative model D estimates the probability that a sample comes from the training data \(x_{real}\) rather than from the generator \(x_{fake}\). The GAN objective function is the two-player mini-max game of Eq. (1).
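Eq. (1) itself does not survive in this extract; the standard GAN mini-max objective it refers to, in the notation of the surrounding text (our reconstruction), is:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\big[\log D(y)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```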
Similar to the conditional GAN [26], in our proposed voxel-GAN the segmentor network takes 3D multi-modal MR or CT images x and a Gaussian vector z and outputs a 3D semantic segmentation. The discriminator takes the segmentor output S(x, z) and the ground truth \(y_{seg}\) annotated by an expert, and outputs a confidence value D(x) of whether a 3D input x is real or synthetic. The training procedure follows the two-player mini-max game shown in Eq. (2).
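Eq. (2) is likewise missing from this extract; the conditional form consistent with the description, with the segmentor S playing the generator's role and both networks conditioned on the input image x (our reconstruction), would be:

```latex
\min_S \max_D \; V(D, S) =
  \mathbb{E}_{x,\, y_{seg}}\big[\log D(x, y_{seg})\big]
  + \mathbb{E}_{x,\, z}\big[\log\big(1 - D(x, S(x, z))\big)\big]
```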
In this work, similar to Isola et al. [18], we use Gaussian noise z in the generator alongside the input data x. As discussed by Isola et al. [18], when training a conditional generative model of the conditional distribution P(y|x), it is desirable that the model can produce more than one sample y for each input x. When the generator G takes, in addition to the input image x, a random vector z, then G(x, z) can generate as many different values for each x as there are values of z. Especially for medical image segmentation, the diversity of image acquisition methods (e.g., MRI, fMRI, CT, ultrasound), their settings (e.g., echo time, repetition time), geometry (2D vs. 3D), and differences in hardware (e.g., field strength, gradient performance) can result in variations in the appearance of body organs and tumour shapes [17]; thus learning with a random vector z alongside the input image x makes the network robust against noise and improves the output samples. This has been confirmed by our experimental results on datasets with a large range of variation.
To mitigate the problem of imbalanced training samples, the segmentor loss is weighted as in Eq. (3) to reduce the effect of class voxel frequencies over the whole training dataset.
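Eq. (3) is not reproduced in this extract; a minimal sketch of one common reading of such a weighting (inverse class frequency, so rare lesion voxels get large weights and frequent healthy voxels small ones; function names and the normalization are our assumptions) is:

```python
import math

def class_weights(label_counts):
    """Inverse-frequency class weights, normalized to sum to 1 (our choice)."""
    total = sum(label_counts)
    inv = [total / c for c in label_counts]
    s = sum(inv)
    return [w / s for w in inv]

def weighted_cross_entropy(probs, labels, weights):
    """Mean of -w[y] * log p(y) over voxels.

    probs:  per-voxel class-probability lists
    labels: per-voxel true class index
    """
    eps = 1e-8
    losses = [-weights[y] * math.log(p[y] + eps) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

# 99% healthy vs. 1% lesion: the lesion class dominates the weighting.
w = class_weights([9900, 100])
```

With these counts, errors on the lesion class cost roughly 99 times more than errors on the healthy class, which is the attenuation of healthy-voxel impact the text describes.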
The segmentor loss in Eq. (4) is mixed with an \(\ell _1\) term to minimize the absolute difference between the predicted and ground-truth values. Previous studies [36, 41] on cGANs have shown the success of mixing the cGAN objective with an \(\ell _1\) distance. The \(\ell _1\) objective takes into account CNN feature differences between the predicted and ground-truth segmentations, resulting in less noise and smoother boundaries.
The final objective function for semantic segmentation of brain tumors, \(\mathcal {L}_{seg}\), is computed from the adversarial loss and an additional segmentor \(\ell _1\) loss as follows:
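The equation itself does not survive in this extract; one form consistent with the surrounding text, with \(\mathcal{L}_{adv}\) the weighted adversarial loss from Eq. (3) and \(\lambda\) a mixing coefficient (the symbol \(\lambda\) and the exact form are our assumptions), is:

```latex
\mathcal{L}_{seg} =
  \mathcal{L}_{adv}(S, D)
  + \lambda \, \mathbb{E}_{x,\, z}\big[\, \lVert y_{seg} - S(x, z) \rVert_1 \big]
```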
2.1 Segmentor Network
As shown in Fig. 1, the segmentor consists of two 3D fully convolutional encoder-decoder networks that predict a label for each voxel. The first encoder takes \(64 \times 64 \times 64\) multi-modal MRI or CT images simultaneously as different input channels. The last decoder outputs 3D images of size \(64 \times 64 \times 64\). Similar to UNet [37], we add skip connections between each layer i and layer \(n-i\), where n is the total number of layers in each encoder and decoder part. Each skip connection simply concatenates all channels at layer i with those at layer \(n-i\). Moreover, we concatenate the bottleneck features and the last convolutional decoder layer to capture a better feature representation.
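At the level of tensor shapes, the skip connection described above can be sketched as a channel-wise concatenation (sizes and names here are illustrative, not the paper's exact layer dimensions):

```python
import numpy as np

# UNet-style skip connection: encoder features at layer i are concatenated
# channel-wise with decoder features at layer n - i of matching spatial size.
rng = np.random.default_rng(0)
enc_i = rng.normal(size=(1, 16, 16, 16, 32))    # (batch, D, H, W, channels)
dec_ni = rng.normal(size=(1, 16, 16, 16, 32))   # same spatial resolution

skip = np.concatenate([enc_i, dec_ni], axis=-1)  # channel count doubles
```

The bottleneck-to-last-decoder concatenation the text mentions would follow the same pattern, merging local (bottleneck) and global (up-sampled) features.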
2.2 Discriminator Network
As depicted in Fig. 1, the discriminator is a 3D fully convolutional encoder network which classifies whether a predicted voxel label belongs to the right class. Similar to pix2pix [18], we use a PatchGAN discriminator configured for voxel-wise analysis. More specifically, the discriminator is trained to minimize the average negative cross-entropy between the predicted and true labels.
The segmentor and discriminator networks are then trained through backpropagation as a two-player mini-max game. We use categorical cross-entropy [29] as the adversarial loss. As mentioned before, we weight the loss to attenuate only the impact of healthy voxels at training and testing time.
3 Experiments
We validated the performance of our voxel-GAN on two recent medical imaging challenges with real patient data from MICCAI 2018: the MRI brain tumor segmentation challenge (BraTS-2018) [3,4,5, 25] and the CT brain lesion segmentation challenge (ISLES-2018) [20, 24].
3.1 Datasets and Pre-processing
The first experiment is carried out on real patient data from the BraTS2018 challenge [3,4,5, 25]. BraTS2018 released data in three subsets (train, validation, and test) comprising 289, 68, and 191 MR images respectively, in four multi-site modalities (T1, T2, T1ce, and FLAIR), with annotations provided only for the training set. The challenge is semantic segmentation of complex and heterogeneously located tumor(s) in highly imbalanced data. Pre-processing is an important step to bring all subjects to similar distributions; we applied z-score normalization to the four modalities, computing the mean and standard deviation of the brain intensities. We also applied the bias field correction introduced by Nyúl et al. [30]. Figure 2 shows a 2D slice of the pre-processed images (our network takes 3D images).
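The z-score normalization step can be sketched as follows. We assume, as is common for skull-stripped MR volumes, that background voxels are zero and statistics are computed over brain voxels only (that restriction is our assumption; the text only states that brain-intensity mean and standard deviation are used):

```python
import numpy as np

def zscore_brain(volume):
    """Per-modality z-score normalization over non-zero (brain) voxels."""
    mask = volume > 0
    brain = volume[mask]
    mu, sigma = brain.mean(), brain.std()
    out = volume.astype(np.float64).copy()
    out[mask] = (brain - mu) / (sigma + 1e-8)  # eps guards constant regions
    return out
```

After this step the brain voxels of each modality have approximately zero mean and unit variance, bringing subjects to similar distributions as described.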
In the second experiment, we used the ISLES2018 benchmark, which contains 94 computed tomography (CT) and MRI training cases in six modalities (CT, 4DPWI, CBF, CBV, MTT, and Tmax) together with annotated ground-truth files. The examined patients were suffering from different brain cancers. The challenging part is binary segmentation of imbalanced labels. Here, pre-processing is carried out in a slice-wise fashion. We windowed the Hounsfield unit (HU) values to the range [30, 100] to capture soft tissue and contrast. Furthermore, we applied histogram equalization to increase the contrast for better differentiation of abnormal lesion tissue.
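The HU windowing step can be sketched as a clip followed by a rescale. The window bounds [30, 100] come from the text; the rescale to [0, 1] is our assumption about the subsequent normalization:

```python
import numpy as np

def window_hu(ct, lo=30.0, hi=100.0):
    """Clip CT intensities to the soft-tissue window and rescale to [0, 1]."""
    clipped = np.clip(ct, lo, hi)
    return (clipped - lo) / (hi - lo)

# Air (~-1000 HU) and bone (>100 HU) are saturated; soft tissue is spread out.
windowed = window_hu(np.array([-100.0, 30.0, 65.0, 200.0]))
```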
To prevent overfitting, we applied data augmentation to each dataset, including random cropping, re-sizing, scaling, rotation between \(-10\) and 10 degrees, and Gaussian noise, at both training and testing time.
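Two of the listed transforms (random crop and additive Gaussian noise) can be sketched with array operations alone; rotation, scaling, and re-sizing need an image-processing library and are omitted here. The crop size and noise level are illustrative assumptions:

```python
import numpy as np

def augment(volume, crop=48, sigma=0.05, rng=None):
    """Random 3D crop followed by additive Gaussian noise (a minimal sketch)."""
    if rng is None:
        rng = np.random.default_rng()
    d, h, w = volume.shape
    z = rng.integers(0, d - crop + 1)
    y = rng.integers(0, h - crop + 1)
    x = rng.integers(0, w - crop + 1)
    patch = volume[z:z + crop, y:y + crop, x:x + crop]
    return patch + rng.normal(0.0, sigma, size=patch.shape)

aug = augment(np.zeros((64, 64, 64)), rng=np.random.default_rng(0))
```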
3.2 Implementation
Configuration: Our proposed method is implemented with the Keras library [7] on a TensorFlow back-end [1] supporting 3D convolutional networks, and is publicly available. All training and experiments were conducted on a workstation equipped with multiple GPUs. The learning rate was initially set to 0.0001. The Adadelta optimizer, which continues learning even after many updates have been made, is used for both the segmentor and the discriminator. The model is trained for up to 200 epochs on each dataset separately.
Network Architecture: In this work, the segmentor network is a modified UNet architecture: we designed two UNet networks that share their bottlenecks and the last fully convolutional layer of the decoder part. The UNet architecture allows low-level features to shortcut across the network. Motivated by previous studies on interpreting encoder-decoder networks [31], which show that the bottleneck features carry local information while the fully convolutional up-sampling decoder represents global features, we concatenate the bottlenecks and the last fully convolutional layer to capture the more important features.
Our discriminator is a fully convolutional Markovian PatchGAN classifier [18], which only penalizes structure at the scale of image patches. Unlike the PatchGAN discriminator introduced by Isola et al. [18], which classifies each \(N \times N\) patch as real or fake, we achieved better results for voxel-level semantic segmentation with patches of size 1 \(\times \) 1 \(\times \) d, i.e., \(N = 1\) and \(d = 64, 32, 16,\) and 8. We used categorical cross-entropy [29] as the adversarial loss in combination with an \(\ell _1\) loss in the generator network.
As shown in Fig. 3, the datasets are highly imbalanced: minority voxels with lesion labels are not trained as well as majority voxels with non-lesion labels. Therefore, we weight the non-lesion classes to be on the same average as the lesion or tumor classes. Tables 1 and 2 report our results with and without loss weighting on BraTS2018.
3.3 Evaluation
We followed the evaluation criteria introduced by the BraTS and the ISLES challenge organizers.
Segmentation of brain tumors or lesions from medical images is of high interest for surgical planning and treatment monitoring. As mentioned by Menze et al. [25], the goal of segmentation is to delineate different tumor structures such as the active tumor core, enhanced tumor, and whole tumor regions.
Figure 4 shows a good trade-off between Dice and sensitivity at training and validation time, which indicates success in tackling the imbalanced data.
From Table 1, the proposed voxel-GAN achieves better results in terms of Dice than the 2D-cGAN. One likely explanation is that the voxel-GAN architecture is trained on 3D convolutional features and the segmentor loss is weighted for imbalanced data.
Unlike previous works [14, 28, 39], we train from scratch, and even after 200 epochs our results are not as good as those of the top-ranked teams. As Table 1 shows, the two top-ranked teams used ensembles of pre-trained models. Ensemble networks provide a good solution for imbalanced data by modifying the training data distribution with regard to the different misclassification costs. In future work, we will focus on training voxel-GAN with one segmentor from scratch and several different pre-trained discriminators.
4 Conclusion
In this paper, we presented a new 3D conditional generative adversarial architecture, named voxel-GAN, that mitigates the issue of imbalanced data in brain lesion and tumor segmentation. To this end, we proposed a segmentor network and a discriminator network, where the former segments the voxel labels and the latter classifies whether the segmented output is real or fake. Moreover, we analyzed the effects of different losses and architectural choices that help to improve semantic segmentation results. We validated our framework on CT ISLES2018 and MRI BraTS-2018 images for lesion and tumor semantic segmentation. In the future, we plan to investigate an ensemble network based on voxel-GAN with many pre-trained discriminator networks for the semantic segmentation task.
References
Abadi, M., et al.: Tensorflow: a system for large-scale machine learning. In: OSDI, vol. 16, pp. 265–283 (2016)
Afshin, M., et al.: Regional assessment of cardiac left ventricular myocardial function via MRI statistical features. IEEE Trans. Med. Imaging 33(2), 481–494 (2014)
Bakas, S., Akbari, H.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-GBM collection. Cancer Imaging Arch. 286 (2017). https://doi.org/10.7937/K9/TCIA.2017.KLXWJJ1Q
Bakas, S., et al.: Segmentation labels and radiomic features for the pre-operative scans of the TCGA-LGG collection. Cancer Imaging Arch. 286 (2017). https://doi.org/10.7937/K9/TCIA.2017.GJQ7R0EF
Bakas, S., et al.: Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nat. Sci. Data (2017). https://doi.org/10.1038/sdata.2017.117
Bakas, S., Reyes, M., Menze, B., et al.: Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BraTS challenge. arXiv preprint arXiv:1811.02629 (2018)
Chollet, F., et al.: Keras (2015)
Christ, P.F., et al.: Automatic liver and tumor segmentation of CT and MRI volumes using cascaded fully convolutional neural networks. CoRR abs/1702.05970 (2017). http://arxiv.org/abs/1702.05970
Ciecholewski, M.: Support vector machine approach to cardiac SPECT diagnosis. In: Aggarwal, J.K., Barneva, R.P., Brimkov, V.E., Koroutchev, K.N., Korutcheva, E.R. (eds.) IWCIA 2011. LNCS, vol. 6636, pp. 432–443. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-21073-0_38
Douzas, G., Bacao, F.: Effective data generation for imbalanced learning using conditional generative adversarial networks. Expert Syst. Appl. 91, 464–471 (2018)
Eslami, A., Karamalis, A., Katouzian, A., Navab, N.: Segmentation by retrieval with guided random walks: application to left ventricle segmentation in MRI. Med. Image Anal. 17(2), 236–253 (2013)
Fidon, L., et al.: Generalised wasserstein dice score for imbalanced multi-class segmentation using holistic convolutional networks. In: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (eds.) BrainLes 2017. LNCS, vol. 10670, pp. 64–76. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75238-9_6
Fischl, B., et al.: Sequence-independent segmentation of magnetic resonance images. Neuroimage 23, S69–S84 (2004)
Gholami, A., et al.: A novel domain adaptation framework for medical image segmentation. CoRR abs/1810.05732 (2018). http://arxiv.org/abs/1810.05732
Goodfellow, I.J., et al.: Generative Adversarial Networks. ArXiv e-prints (2014)
Hashemi, S.R., Salehi, S.S.M., Erdogmus, D., Prabhu, S.P., Warfield, S.K., Gholipour, A.: Tversky as a loss function for highly unbalanced image segmentation using 3D fully convolutional deep networks. CoRR abs/1803.11078 (2018). http://arxiv.org/abs/1803.11078
Inda, M.d.M., Bonavia, R., Seoane, J., et al.: Glioblastoma multiforme: a look inside its heterogeneous nature. Cancers 6(1), 226–239 (2014)
Isola, P., Zhu, J.Y., Zhou, T., Efros, A.A.: Image-to-image translation with conditional adversarial networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017
Jang, J., et al.: Medical image matching using variable randomized undersampling probability pattern in data acquisition. In: 2014 International Conference on Electronics, Information and Communications (ICEIC), pp. 1–2, January 2014. https://doi.org/10.1109/ELINFOCOM.2014.6914453
Kistler, M., Bonaretti, S., Pfahrer, M., Niklaus, R., Büchler, P.: The virtual skeleton database: an open access repository for biomedical research and collaboration. J. Med. Internet Res. 15(11) (2013)
Kohli, M.D., Summers, R.M., Geis, J.R.: Medical image data and datasets in the era of machine learning—whitepaper from the 2016 C-MIMI meeting dataset session. J. Digit. Imaging 30(4), 392–399 (2017)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Mahapatra, D.: Automatic cardiac segmentation using semantic information from random forests. J. Digit. Imaging 27(6), 794–804 (2014)
Maier, O., et al.: ISLES 2015-a public evaluation benchmark for ischemic stroke lesion segmentation from multispectral MRI. Med. Image Anal. 35, 250–269 (2017)
Menze, B.H., et al.: The multimodal brain tumor image segmentation benchmark (BraTS). IEEE Trans. Med. Imaging 34(10), 1993–2024 (2015)
Mirza, M., Osindero, S.: Conditional generative adversarial nets. CoRR abs/1411.1784 (2014)
Moeskops, P., Veta, M., Lafarge, M.W., Eppenhof, K.A.J., Pluim, J.P.W.: Adversarial training and dilated convolutions for brain MRI segmentation. CoRR abs/1707.03195 (2017). http://arxiv.org/abs/1707.03195
Myronenko, A.: 3D MRI brain tumor segmentation using autoencoder regularization. arXiv preprint arXiv:1810.11654 (2018)
Nasr, G.E., Badr, E., Joun, C.: Cross entropy error function in neural networks: forecasting gasoline demand. In: FLAIRS Conference, pp. 381–384 (2002)
Nyúl, L.G., Udupa, J.K., Zhang, X.: New variants of a method of MRI scale standardization. IEEE Trans. Med. Imaging 19(2), 143–150 (2000)
Palade, V., Neagu, D.-C., Patton, R.J.: Interpretation of trained neural networks by rule extraction. In: Reusch, B. (ed.) Fuzzy Days 2001. LNCS, vol. 2206, pp. 152–161. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45493-4_20
Pohl, K.M., Fisher, J., Grimson, W.E.L., Kikinis, R., Wells, W.M.: A Bayesian model for joint segmentation and registration. NeuroImage 31(1), 228–239 (2006)
Rezaei, M., Yang, H., Meinel, C.: Instance tumor segmentation using multitask convolutional neural network. In: 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, July 2018. https://doi.org/10.1109/IJCNN.2018.8489105
Rezaei, M., et al.: A conditional adversarial network for semantic segmentation of brain tumor. In: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (eds.) BrainLes 2017. LNCS, vol. 10670, pp. 241–252. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75238-9_21
Rezaei, M., Yang, H., Meinel, C.: Deep neural network with l2-norm unit for brain lesions detection. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds.) ICONIP 2017. LNCS, vol. 10637, pp. 798–807. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-70093-9_85
Rezaei, M., Yang, H., Meinel, C.: Whole heart and great vessel segmentation with context-aware of generative adversarial networks. In: Maier, A., Deserno, T., Handels, H., Maier-Hein, K., Palm, C., Tolxdorff, T. (eds.) Bildverarbeitung für die Medizin 2018. Informatik aktuell, pp. 353–358. Springer, Heidelberg (2018). https://doi.org/10.1007/978-3-662-56537-7_89
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Jorge Cardoso, M.: Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Cardoso, M., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 240–248. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_28
Wang, G., Li, W., Ourselin, S., Vercauteren, T.: Automatic brain tumor segmentation using convolutional neural networks with test-time augmentation. arXiv preprint arXiv:1810.07884 (2018)
Xu, J., Schwing, A.G., Urtasun, R.: Tell me what you see and i will show you where it is. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3190–3197 (2014)
Xue, Y., Xu, T., Zhang, H., Long, L.R., Huang, X.: SegAN: adversarial network with multi-scale L1 loss for medical image segmentation. CoRR abs/1706.01805 (2017)
© 2019 Springer Nature Switzerland AG
Rezaei, M., Yang, H., Meinel, C. (2019). voxel-GAN: Adversarial Framework for Learning Imbalanced Brain Tumor Segmentation. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2018. Lecture Notes in Computer Science(), vol 11384. Springer, Cham. https://doi.org/10.1007/978-3-030-11726-9_29