1 Introduction

The significant growth of medical imaging applications over the last decade has been matched by a corresponding increase in image segmentation and classification research. This growth has encouraged researchers in clinical fields to develop models whose segmentation resembles the manual process used in clinical practice [1, 2, 28, 30]. To this end, machine learning-based brain segmentation, in which brain images are divided into multiple tissues, has emerged because it makes brain image segmentation more accurate [3, 4].

Many brain image segmentation models have been proposed in the literature. A common technique is to use two-stage models, which fuse global information with local information generated in two subsequent stages to achieve acceptable segmentation results. In general, multi-stage designs achieve better results because they help solve the information loss problem [5,6,7,8].

Many studies [10,11,12, 15, 26, 27] have proposed techniques to improve the accuracy of brain image segmentation so that the results come close to the manual reference. Recently, deep learning algorithms have started to be used for brain image segmentation. However, there is still a lack of available data to train deep learning models. To address this issue, adversarial learning and few-shot learning techniques have been developed to perform well when only a few labeled images are available [9, 13]. For example, Mondal et al. [9] proposed a few-shot 3D multi-modal image segmentation method using a GAN model, which consists of a U-net, a generator, and an encoder. Fake images were first generated using the generator, then used along with labeled and unlabeled data to train the discriminator, which in turn distinguishes between generated and true data. The encoder was used to compute the predicted noise mean and log-variance. Despite the merits of this model, its results were not significantly better than those of previous state-of-the-art models.

While previous techniques enabled neural networks to produce acceptable segmentation output, very few models address the segmentation of infant brain images into White Matter (WM), Grey Matter (GM), and Cerebrospinal Fluid (CSF). As an example, Dolz et al. [14] proposed a model to segment infant brain images, which was evaluated using the MICCAI iSEG Grand Challenge dataset. The model utilized direct connections between layers from the same and different paths to improve the learning process. However, that model did not consider deeper networks with fewer filters per layer. Moreover, individual weights from dense connections were not investigated.

Therefore, in this paper, we propose \(MRI-GAN\), a novel Generative Adversarial Network (GAN) model that segments MRI brain images into WM, GM, and CSF. Our model enables the generation of more labeled data from existing labeled and unlabeled data. To do this, we employ an MRI encoder together with a ground truth encoder to compress the features and convert them into low-dimensional MRI and tissue vectors. Each encoder is capable of compressing one or more inputs. In summary, this paper makes the following contributions:

  • Novel MRI-GAN Model: Introduces a new GAN model for segmenting brain MRI images into WM, GM, and CSF tissues.

  • Data Augmentation: Enables data modeling from labeled and unlabeled data, addressing limited annotated datasets.

  • Integrated Encoders: Uses MRI and ground truth encoders for efficient feature compression and vector conversion.

  • Improved Accuracy: Achieves higher Dice coefficients than the Standard GAN baseline for WM, GM, and CSF segmentation.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the \(MRI-GAN\) model. Section 4 describes our setup materials and methods. Section 5 presents and discusses our experimental results. Finally, Section 6 concludes the paper and suggests possible future work.

2 Related Work

This section reviews the work related to our study.

2.1 Generative Adversarial Network for Brain Segmentation

GANs have shown promising results in both medical image diagnostics [20] and brain image segmentation [19, 23]. A standard GAN has two parts: a generator, which generates data, and a discriminator, which distinguishes between generated data and real data. Much research on brain image segmentation has been conducted using GANs. For example, Cirillo et al. [21] proposed a 3D volume-to-volume GAN to segment images of brain tumors. Their model achieved a 94% result when the generator loss was weighted five times higher than the discriminator loss. The model was evaluated on the BraTS 2013 dataset. Their model outperformed previous models with an overall accuracy of 66%. Delannoy et al. [22] proposed a super-resolution and segmentation framework using GANs for neonatal brain MRI images. The framework is composed of (a) a generator network that estimates the corresponding high-resolution (HR) image for a given input image and (b) a discriminator network D that distinguishes real HR and segmentation images. Their model outperformed previous models with an overall accuracy of 83%.

2.2 Encoder/Decoder

The encoder/decoder model emerged more than a decade ago as a concept to describe an image [5]. A well-known encoder/decoder study is the auto encoder/decoder [17], which investigated the encoder and decoder model based on pixel-wise classification. In addition, this model enabled the use of nonlinear upsampling and a smaller number of parameters for training, which requires higher computational power than other deep learning architectures. However, many studies that performed encoding/decoding considered mapping a dense block into a standard encoder/decoder model. We expect that applying encoder/decoder models in a GAN model will provide more accurate segmentation results for brain images. To achieve this, we first develop a new encoder-decoder model that compresses the features of the inputs and also maps the tissue information to the decoder. Our results show that the \(MRI-GAN\) model produces segmentations that are fairly close to the manual reference, with a significant reduction in training time compared to state-of-the-art models. Furthermore, the Dice coefficient is applied to better demonstrate the significance of the \(MRI-GAN\) model.

3 Proposed Model

This section describes the structure of our proposed GAN model.

3.1 Encoder/Decoder

Our \(MRI-GAN\) model consists of a generator and a discriminator. Fig. 1 shows our proposed GAN model. The MRI encoder, ground truth encoder, tissue mapping, boundary detection network, and decoder together form the generator of the \(MRI-GAN\) model. The MRI encoder and the ground truth encoder take an MRI image and its ground truth, respectively, and convert them into MRI and ground truth vectors. The boundary detection network provides more information about the boundary. The output of the decoder is a GT image, where GT denotes the image generated by the generator.

Fig. 1. Illustration of our proposed GAN model

3.2 Mapping

The decoder upscales the MRI code into a 3D geometry using SpiralBlocks that are conditioned on the ground code through Adaptive Instance Normalization (AdaIN) [29]. Given a sample x that is passing through the network, AdaIN first normalizes the activations in each channel of x to zero mean \(\mu \) and unit standard deviation \(\sigma \). The activations are then scaled and shifted on a per-channel basis. We use a mapping function R that maps a ground code y into (\(\mu \), \(\sigma \)) parameters for every channel of each AdaIN layer. This gives the following equation:

$$\begin{aligned} AdaIN(x,y)= R_{\sigma }(y) \frac{x-\mu (x)}{\sigma (x)}+R_{\mu }(y) \end{aligned}$$
(1)

where R is a learned affine function composed of multiple fully connected layers, taking the ground latent code as input. Since the AdaIN transformation operates on whole channels, the ground code alters global appearance information while the local features are determined by the MRI code.
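As a rough illustration of Eq. (1), the following sketch applies AdaIN to a toy activation tensor stored as plain Python lists, one list per channel. The helper name `adain`, the stabilizing constant `eps`, and the fixed per-channel scale and shift values (standing in for \(R_{\sigma }(y)\) and \(R_{\mu }(y)\)) are assumptions made for illustration only; the learned mapping R is not modeled, and an actual implementation would normally operate on tensors in a deep learning framework.

```python
from statistics import fmean, pstdev


def adain(x, scale, shift, eps=1e-5):
    """Apply Eq. (1) channel by channel.

    x     -- list of channels, each a flat list of activations
    scale -- per-channel values standing in for R_sigma(y)
    shift -- per-channel values standing in for R_mu(y)
    eps   -- small constant (an assumption) that avoids division by zero
    """
    out = []
    for channel, s, b in zip(x, scale, shift):
        mu = fmean(channel)
        sigma = pstdev(channel, mu)  # population standard deviation
        out.append([s * (v - mu) / (sigma + eps) + b for v in channel])
    return out


# Two channels with four activations each (toy values).
x = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 30.0, 30.0]]
y_scale = [2.0, 0.5]   # hypothetical outputs of R_sigma(y)
y_shift = [1.0, -1.0]  # hypothetical outputs of R_mu(y)

styled = adain(x, y_scale, y_shift)
```

After the transformation, each channel has a mean close to its shift value and a standard deviation close to its scale value, regardless of the statistics of the input channel. This is why the ground code controls global, per-channel appearance while the spatial pattern within a channel still comes from the MRI code.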

3.3 Loss Function

Discriminator Loss Function. The discriminator in the \(MRI-GAN\) model has a labeled data loss, an unlabeled data loss, and a GT image loss (fake loss). We formulate the overall discriminator loss of \(MRI-GAN\) as follows:

$$\begin{aligned} \begin{aligned} \text{ l}_{\text {discriminator}} = \lambda _{\text {labeled}} l_{\text {labeled}} + \lambda _{\text {unlabeled}} l_{\text {unlabeled}} + \lambda _{\text {fake}} l_{\text {fake}}, \end{aligned} \end{aligned}$$
(2)

where \(\lambda _{\text {labeled}}\), \(\lambda _{\text {unlabeled}}\), and \(\lambda _{\text {fake}}\) are hyper-parameters. We set the hyper-parameters in Equation (2) to \(\lambda _{\text {labeled}}=1.0\), \(\lambda _{\text {unlabeled}}=1.0\), and \(\lambda _{\text {fake}}=2.0\).
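The weighting in Eq. (2) can be sketched as follows. The three weights are the values stated above; the function name `discriminator_loss` and the per-batch loss values are hypothetical and serve only to show the arithmetic.

```python
# Weights reported in the text for Eq. (2).
LAMBDA_LABELED = 1.0
LAMBDA_UNLABELED = 1.0
LAMBDA_FAKE = 2.0


def discriminator_loss(l_labeled, l_unlabeled, l_fake):
    """Weighted sum of the three discriminator loss terms, Eq. (2)."""
    return (LAMBDA_LABELED * l_labeled
            + LAMBDA_UNLABELED * l_unlabeled
            + LAMBDA_FAKE * l_fake)


# Hypothetical per-batch loss values, chosen only to show the arithmetic.
total = discriminator_loss(l_labeled=0.5, l_unlabeled=0.25, l_fake=0.125)
```

With these weights, the fake term contributes twice as much per unit of loss as the labeled and unlabeled terms.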

We used the loss functions proposed by Mondal et al. [9], where \(P_{\text {model}}\) refers to the probability distribution of the data. More details about the loss functions can be found in [9].

(3)
(4)
(5)
$$\begin{aligned} Z_{i}(x) = \sum \limits _{k=1}^{K} \exp [l_{i,k}(x)] \end{aligned}$$
(6)
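The quantity in Eq. (6) is a sum of exponentials over the K class outputs of one voxel. A minimal sketch of how it could be evaluated is given below, assuming that \(l_{i,k}(x)\) denotes the k-th class output (logit) for voxel i. The function name `log_partition` and the toy logits are hypothetical, and computing the logarithm of \(Z_{i}(x)\) with the maximum subtracted first is a standard numerical precaution rather than something stated in the text.

```python
import math


def log_partition(logits):
    """Return log Z_i(x) for one voxel, with Z_i(x) as in Eq. (6).

    logits -- the K class outputs l_{i,1}(x), ..., l_{i,K}(x) for voxel i
    The maximum is subtracted before exponentiating so that large logits
    do not overflow; mathematically the result is unchanged.
    """
    m = max(logits)
    return m + math.log(sum(math.exp(l - m) for l in logits))


# Toy logits for a voxel with K = 3 classes (hypothetical values).
voxel_logits = [0.0, 0.0, 0.0]
z = math.exp(log_partition(voxel_logits))
```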

Generator Loss Function. We propose a novel generator loss that induces the generator G to produce realistic data. Let x and z denote real data and noise, respectively.

(7)

In our paper, we consider f(x) to contain the activation of the last layer.

$$\begin{aligned} L(G) =\mid \mid C-x\mid \mid ^2_{2}, \end{aligned}$$
(8)

By minimizing this loss, we force the generator to generate realistic data that matches our data and the corresponding K classes of real data, which are defined as \(classes=\{1,...,K\}\).

4 Setup Materials and Methods

This section describes the setup materials and methods used in our paper.

4.1 Datasets

MICCAI iSEG Dataset. The MICCAI iSEG organizers (Footnote 1) introduced a publicly available evaluation framework for comparing different segmentation models of WM, GM, and CSF on T1-weighted (T1) and T2-weighted (T2) images. The MICCAI iSEG dataset contains 10 images (i.e., subject-1 up to subject-10), each comprising a T1-weighted image (subject T1), a T2-weighted image (subject T2), and a manual segmentation label. All these images are used as a training set. The dataset also contains 13 images (i.e., subject-11 up to subject-23), which are used as a testing set. An example of the MICCAI iSEG dataset (T1, T2, and manual reference contour) is shown in Fig. 2.

Fig. 2. An example of the MICCAI iSEG dataset (T1, T2, and manual reference contour)

Table 1 shows the parameters used to generate T1 and T2. The images are based on two different relaxation times, the longitudinal relaxation time and the transverse relaxation time, which are used to generate T1 and T2. The images have been interpolated, registered, and skull-stripped by the MICCAI iSEG organizers.

Table 1. Parameters used to generate T1 and T2

MRBrains Dataset. The MRBrains dataset contains 20 adult images for the segmentation of (a) cortical gray matter, (b) basal ganglia, (c) white matter, (d) white matter lesions, (e) peripheral cerebrospinal fluid, (f) lateral ventricles, (g) cerebellum, and (h) brain stem on T1, T2, and FLAIR. Five images (i.e., 2 male and 3 female) are provided as a training set and 15 images are provided as a testing set. For segmentation evaluation, these structures were merged into gray matter (\(a-b\)), white matter (\(c-d\)), and cerebrospinal fluid (\(e-f\)). The cerebellum and brain stem were excluded from the evaluation.
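The merging of the eight structures into three evaluation classes can be sketched as a simple lookup. The integer codes below are assumptions for illustration: the structures (a) to (h) are numbered 1 to 8 in the order listed above, 0 is taken to be background, and the output codes for CSF, GM, and WM as well as the `IGNORE` marker are hypothetical. The actual label values should be checked against the dataset documentation.

```python
# Output codes for the merged classes (hypothetical values).
BACKGROUND, CSF, GM, WM, IGNORE = 0, 1, 2, 3, -1

# Assumed input codes: 0 = background, 1..8 = structures (a)..(h).
MERGE_MAP = {
    0: BACKGROUND,
    1: GM,      # (a) cortical gray matter
    2: GM,      # (b) basal ganglia
    3: WM,      # (c) white matter
    4: WM,      # (d) white matter lesions
    5: CSF,     # (e) peripheral cerebrospinal fluid
    6: CSF,     # (f) lateral ventricles
    7: IGNORE,  # (g) cerebellum, excluded from the evaluation
    8: IGNORE,  # (h) brain stem, excluded from the evaluation
}


def merge_labels(labels):
    """Map a flat sequence of fine-grained labels to the merged classes."""
    return [MERGE_MAP[v] for v in labels]


# One voxel of each input label (toy example).
voxels = [0, 1, 2, 3, 4, 5, 6, 7, 8]
merged = merge_labels(voxels)
```

Voxels mapped to `IGNORE` would then be left out when the evaluation metric is computed.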

4.2 Experimental Setup

The experiments on the proposed model were conducted using Python on a PC with an NVIDIA GPU running Ubuntu 16.04. Training \(MRI-GAN\) took 30 hours in total, whereas testing took 5 minutes.

4.3 Segmentation Evaluation

Dice Coefficient (DC). To better highlight the significance of our proposed \(MRI-GAN\) model, we evaluate its performance using the Dice Coefficient (DC) metric [18], which has been used to compare state-of-the-art segmentation models. We use \(V_{\text {ref}}\) for the reference segmentation and \(V_{\text {auto}}\) for the automated segmentation. The DC is given by the following equation:

$$\begin{aligned} \text {DC}(V_{\text {ref}}, V_{\text {auto}}) = \frac{2 |V_{\text {ref}} \cap V_{\text {auto}}|}{|V_{\text {ref}}| + |V_{\text {auto}}|}, \end{aligned}$$
(9)

where DC values range from 0 to 1, with 1 indicating a perfect overlap and 0 indicating a complete mismatch.
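For concreteness, the DC of Eq. (9) can be computed per tissue class as in the following sketch. The function name `dice_coefficient`, the flat-list representation of the label volumes, the toy label values, and the convention of returning 1.0 when a class is absent from both segmentations are assumptions for illustration and are not taken from our evaluation code.

```python
def dice_coefficient(ref, auto, label):
    """Dice overlap of Eq. (9) for one tissue label.

    ref, auto -- flat sequences of voxel labels of equal length
    label     -- the tissue class to evaluate
    """
    if len(ref) != len(auto):
        raise ValueError("segmentations must have the same number of voxels")
    ref_mask = [r == label for r in ref]
    auto_mask = [a == label for a in auto]
    intersection = sum(r and a for r, a in zip(ref_mask, auto_mask))
    size_sum = sum(ref_mask) + sum(auto_mask)
    if size_sum == 0:
        # Assumed convention: a class absent from both counts as agreement.
        return 1.0
    return 2.0 * intersection / size_sum


# Toy reference and automated segmentations with labels 0..3.
ref = [1, 1, 2, 2, 3, 3, 3, 0]
auto = [1, 2, 2, 2, 3, 3, 0, 0]
scores = {label: dice_coefficient(ref, auto, label) for label in (1, 2, 3)}
```

In this toy example, label 1 has one shared voxel out of reference and automated sizes 2 and 1, giving a DC of 2/3, while labels 2 and 3 each give a DC of 0.8.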

5 Results and Discussion

We train and test the \(MRI-GAN\) model on two datasets covering different ages: adults and infants. Table 2 presents the results of the \(MRI-GAN\) model for segmenting CSF, GM, and WM using the MICCAI iSEG dataset. Our \(MRI-GAN\) model achieves a DC value of 93% in CSF segmentation. In contrast, the DC value achieved by the Standard GAN for CSF segmentation is 86%, which is 7 percentage points lower. In addition, our \(MRI-GAN\) model achieves DC values of 94% and 92% in segmenting GM and WM, respectively. The Standard GAN model, in contrast, achieves a DC value of 80% (14 percentage points lower) for GM segmentation and 81% (11 percentage points lower) for WM segmentation. These results highlight the accuracy gains achieved by the \(MRI-GAN\) model compared to the Standard GAN.

Table 2. Dice Coefficient (DC) results of the segmentation achieved on the MICCAI iSEG dataset. The best performance for each tissue class is highlighted in bold.

Table 3 presents the results achieved using the MRBrains dataset. We observe that our \(MRI-GAN\) model achieves a DC value of 91% on CSF segmentation, 90% on GM segmentation, and 95% on WM segmentation. Such results are superior to the results achieved by the Standard GAN model.

Fig. 3 shows a sample visualized result of our \(MRI-GAN\) model on a subject used as part of the validation set. The images show that the segmentation achieved by the \(MRI-GAN\) model is fairly close to the manual reference (ground truth) contour provided by the MICCAI iSEG organizers.

Table 3. Dice Coefficient (DC) results of the segmentation achieved on the MRBrains dataset. The best performance for each tissue class is highlighted in bold.

Our evaluation results show that the proposed model not only outperforms two baselines (Standard GAN and 3D, FCN + MIL+G+K [15]) on the three tissues, but also outperforms Multi-stage [24] on some tissues. Compared with Multi-stage [24], the proposed \(MRI-GAN\) model improved the results for GM and WM on the MICCAI iSEG dataset and for WM on the MRBrains dataset. We acknowledge that our model may not perform well in all cases and still has limitations due to the small number of images available, which we aim to address in future work.

Fig. 3. A sample visualized result from the MICCAI iSEG dataset

6 Conclusion

In this paper, we proposed \(MRI-GAN\), a novel Generative Adversarial Network (GAN) model that performs segmentation of MRI brain images. Our model makes segmentation more accurate by applying encoder and decoder algorithms separately, which demonstrated a significant increase in the accuracy of brain image segmentation results. We first extracted and compressed the features of the MRI encoder and ground truth encoder inputs, and then mapped the information to the decoder. Our experimental results show that the \(MRI-GAN\) model is a viable solution for brain segmentation as it achieves a significant improvement in the accuracy of brain segmentation compared to the standard GAN model while taking a shorter training time.

Directions for Future Work. Our model opens a number of possible directions for future work. We aim to investigate its performance in segmenting more brain tissues and to consider pathological brain images, such as those with tumours or edema. Pathological brain images are not included in this study due to the lack of data.

7 Declarations

7.1 Competing Interests

The authors declare that they have no known competing financial interests.

7.2 Consent for Publication

Not applicable.

7.3 Availability of Data and Materials

The data that support the findings of this study are available from MICCAI grand challenge on 6-month infant brain MRI segmentation [1] and MRBrainS and are publicly available (see Footnote 1).