1 Introduction

Brain tissues grow rapidly in the early stages of human life. Over the past two decades, brain segmentation has relied on manual segmentation, which is extremely expensive and time-consuming [33]. For example, segmenting 15–20 infant brain images may require 9–11 hours. Achieving accurate tissue segmentation of infant brain images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) is important to (a) measure abnormal early brain development, (b) monitor its progression, and (c) evaluate treatment outcomes [2]. However, the low contrast and unclear boundaries between WM and GM make accurate segmentation difficult. Moreover, different experts may produce different segmentation results for the same image.

Much research has studied the segmentation of brain images using automated models, including atlas-based, statistical, and deep learning models.

Deep learning models, convolutional neural networks (CNNs) in particular, have recently been used to perform automated segmentation of infant brains [3]. Previous models have achieved acceptable segmentation performance. However, prior studies have paid little attention to separating actual brain pixels from the image background. Failing to perform such a separation may (a) distort brain segmentation models and (b) add overhead to model training and inference. Therefore, developing robust models that segment brain regions is important for improving pathology detection and disease diagnosis.

In this paper, we extend our work on improving the performance of brain segmentation using a fully CNN model [22] by expanding our (a) analysis and discussion of the achieved results and (b) review of advanced studies on the application of deep learning models to brain segmentation. In particular, our proposed model employs (i) a multi-instance loss method to separate actual brain pixels from the background and (ii) Gabor filter banks and K-means clustering to provide informative segmentation details that complement machine-learned features. To cope with the scarcity of training data typical of medical imaging applications [11], we use full images as input to our model and apply max pooling and mean pooling to process the data. To evaluate our model, we use infant and adult datasets and measure the performance of our model using dice coefficients.

Unlike the state-of-the-art models, our results are evaluated by the MICCAI iSEG organizers (experts in medical imaging) [34]. Our model achieves dice coefficients ranging between 87.4% and 94.1% (i.e., an improvement of up to 11% over the results achieved by five state-of-the-art models). Moreover, our model is 1.2x–2.6x faster than the state-of-the-art models. These results indicate that our model is practically more accurate and efficient for the segmentation of both infant and adult brain images.

The rest of this paper is organized as follows. Section 2 presents the work related to brain segmentation. Section 3 presents the methods used in this paper. Section 4 presents our experimental design. Section 5 presents our results and discussion. Section 6 discusses threats to the validity of our results. Finally, Sect. 7 concludes the paper and suggests future work.

2 Related Work

This section presents previous studies related to our work. First, we review machine learning models used for brain segmentation. Second, we describe deep learning models for brain segmentation in detail. Finally, we offer remarks on the related work.

2.1 Machine Learning for Brain Segmentation

The main objective of a brain segmentation model is to overcome the low contrast and unclear boundaries between the white matter and gray matter in brain images. Machine learning has been used extensively for this purpose. Although some previous models targeted specific infantile stages [29] (e.g., using multiple modalities [14]), other models targeted the whole first year of life (< 12 months). The images used by previous models are either T1 or T2 MRI images.

In general, most state-of-the-art methods focused on three tissue types: white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). These models required specific training datasets for each tissue type. More importantly, only a few organizations (e.g., the iSEG organizers) provide training and testing datasets [35].

Dolz et al. [11] proposed a 3D fully convolutional network for subcortical brain structure segmentation. Bao and Chung [2] improved the model proposed by Dolz et al. by using a multi-scale structured CNN with label consistency. Badrinarayanan et al. [1] proposed CNN models with residual connections to segment white matter hyperintensities from T1 and FLAIR images; their models outperformed previous models, with an overall dice coefficient of 0.75 and an average surface distance (H95) of 27.26. Fechter et al. [12] also used fully convolutional networks for brain segmentation; using five datasets, they achieved dice coefficients ranging between 0.82 and 0.91. Visser et al. [33] proposed CNN models for brain segmentation using a multi-modal method and subcortical segmentation. de Brebisson and Montana [7] proposed a random walker approach driven by a 3D fully convolutional network to segment different tissue classes; their model was also capable of segmenting the esophagus in CT images. Khaled et al. proposed two brain tissue segmentation models, one using FCN+MIL+G+K [22] and another using a multi-stage GAN model [23]. They evaluated their models on both infant and adult brain images and obtained good segmentation results, with dice coefficients of up to 94% for segmenting GM and WM.

2.2 Deep Learning for Brain Segmentation

Dakai et al. [6] presented a fully convolutional network (FCN) with residual connections for brain segmentation. A residual block consists of two convolutional layers at the same resolution. Residual learning is implemented through a shortcut connection that bypasses the nonlinear layers with an identity mapping. This recasts the nonlinear layers to fit a residual function with respect to their input, thus alleviating overfitting. To better capture finer-scale details, a fine-to-coarse downsampling path and a coarse-to-fine upsampling path with shortcut connections were used. The model was evaluated on T1 and FLAIR images from the MICCAI 2017 dataset. Despite the good segmentation results achieved, the model focused only on white matter hyperintensities (WMHs), which are commonly found in the brains of healthy elders and patients diagnosed with small vessel disease and other neurological disorders. Figure 1 shows the workflow of the proposed WMH segmentation model; a code sketch of such a residual block follows Fig. 1.

Fig. 1: Illustration of the FCN brain segmentation model with residual connections (from [6])
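As an illustration of the residual blocks described above, the following is a minimal Keras sketch (our own, not code from [6]); it assumes the input tensor already has `filters` channels so the identity shortcut matches the block output.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters=64):
    """Two same-resolution convolutions plus an identity shortcut."""
    shortcut = x  # identity mapping that bypasses the nonlinear layers
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    # The stacked layers learn a residual F(x); the block outputs F(x) + x.
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)
```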

Fig. 2: Illustration of the 3D U-Net brain segmentation model (from [26])

Zhengyang et al. [26] proposed a global aggregation block within a 3D U-Net for brain segmentation. The global aggregation block aggregates global information from feature maps of any size without introducing more parameters. The model also contains an up-sampling global aggregation block, which helps alleviate information loss. Each position of the output feature maps depends on all positions of the input feature maps, achieving global information fusion within a single block. The model was used to segment three tissues: gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), and it surpassed previous models on infant brain image segmentation. Figure 2 shows an illustration of the proposed global aggregation block.

Khedher et al. [24] proposed a computer-aided diagnosis (CAD) system for brain segmentation. The model aims to distinguish between elderly normal controls (NC), Alzheimer's disease (AD), and mild cognitive impairment (MCI) subjects. Dimension-reduction techniques were used to reduce the information contained in brain images: principal component analysis (PCA) projects the original high-dimensional brain images onto a lower-dimensional subspace, and partial least squares (PLS) maximizes the covariance between the different sets of variables. The model was evaluated on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, which contains 188 AD patients, 401 MCI patients, and 229 control subjects. Brain tissue segmentation was performed for gray matter (GM) and white matter (WM). Figure 3 shows a detailed schema of the proposed model; a sketch of the dimension-reduction pipeline follows the figure.

Fig. 3: Illustration of the computer-aided diagnosis (CAD) system for brain segmentation (from [24])
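A hedged sketch of the dimension-reduction pipeline described above, using scikit-learn; the array shapes and component counts are illustrative, not those of [24].

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
X = rng.random((100, 5000))            # 100 subjects, 5000 voxel features (toy data)
y = rng.integers(0, 2, 100)            # toy labels, e.g., NC vs. AD

X_pca = PCA(n_components=20).fit_transform(X)      # project to a lower-dimensional subspace
pls = PLSRegression(n_components=5).fit(X_pca, y)  # maximize covariance with the labels
X_pls = pls.transform(X_pca)                       # features used by the classifier
```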

Muhammad et al. [31] proposed a convolutional neural network (CNN) for segmenting brain lesions with considerable mass effect. The model builds on DeepMedic (a publicly available toolbox) and improves its performance by incorporating global spatial information (GSI) into the network. In addition, a 2D CNN was used instead of the 3D CNN of the original study due to the anisotropy of MR images. The model also combines spatial information at the convolution level of the CNN. The model was evaluated on segmenting white matter hyperintensities (WMHs) in comparison with a deep Boltzmann machine (DBM). Figure 4 shows a diagram of the CNN architectures used in this model.

Fig. 4: Illustration of the CNN brain lesion segmentation model with considerable mass effect (from [31])

Yongchao et al. [37] proposed a semi-automated brain segmentation framework based on mathematical morphology and max-tree representations of brain images. The model was designed to segment white matter hyperintensities (WMHs). To construct the max-tree, the model employs an immersion algorithm based on the union-find process, proceeding from the root to the leaves. The model achieved accurate results with excellent reproducibility and minimal manual corrections. Besides 1.5T images, which correspond to usual clinical practice, the model also performed well on images acquired at 3T, suggesting good generalizability. Figure 5 shows the pipeline used for segmenting different neonatal brain tissues.

Fig. 5: Illustration of the brain segmentation model with max-tree and mathematical morphology (from [37])

Fig. 6: Illustration of the hyper-densely connected 3D CNN brain segmentation model (from [8])

Fig. 7: Illustration of the 11-layer deep, three-dimensional CNN brain segmentation model (from [25])

Fig. 8: Illustration of the brain segmentation model using fuzzy c-means and k-means clustering (from [38])

Dolz et al. [8] proposed a hyper-densely connected 3D convolutional neural network for brain segmentation. Two models were built by adding direct connections from each layer to all subsequent layers in a feedforward manner, which made training easier and more accurate. These direct connections improve the flow of information as well as gradients throughout the network. The model was evaluated on the MICCAI iSEG dataset, containing 10 images for training and 13 images for testing, and its results were verified by the MICCAI iSEG organizers. Figure 6 shows a section of the proposed HyperDenseNet.

Kamnitsas et al. [25] proposed an 11-layer deep, three-dimensional convolutional neural network for brain lesion segmentation. The model applies post-processing with a fully connected conditional random field, which helps mitigate false positives. The model was evaluated on three datasets and performed best on the ISLES 2015 and BRATS 2015 benchmarks. Figure 7 shows the baseline CNN, consisting of four layers with \(5^3\) kernels for feature extraction, leading to a receptive field of size \(17^3\).

Zhang et al. [38] proposed a model for the automatic segmentation of white matter (WM) and gray matter (GM) using fuzzy c-means and k-means clustering, evaluated on 50 DICOM brain images. The model uses both intensity values and statistical features. The results showed that statistical feature-based clustering outperformed intensity-based clustering. Figure 8 shows the block diagram of the model.

2.3 Remarks on Related Work

Despite the research invested in brain segmentation, we observe that previous models were trained on images in which actual brain pixels are intermixed with the image background, which can negatively affect segmentation accuracy. Therefore, in this paper, we propose to separate brain pixels from the background to improve the overall performance of brain segmentation. We then use a fully CNN model and supply it with additional hand-crafted features (Gabor filter banks and K-means clustering). In summary, our proposed method achieves the following:

  • accelerating model training;

  • producing more accurate segmentation results;

  • improving information and gradients flow throughout the entire network; and

  • reducing the risk of overfitting.

Furthermore, what makes our work distinct from previous studies is that our results are evaluated by the MICCAI iSEG organizers.

3 Methodology

This section presents the methods we use to process brain images, extract additional features, and build brain segmentation models. Figure 9 shows an overview of our proposed model. Table 1 lists all the symbols we refer to in this paper.

Table 1 List of symbols referred to in this paper

3.1 The Proposed CNN Model

In our proposed model, we use two paths, where each path has six groups of layers, as follows (a code sketch of one path follows the list):

  • \(1\mathrm{st}\) group of layers consists of two layers, each containing 90 filters. Each filter in a layer is applied to the input images; the outcome of this process is known as a feature map. Feature maps are fed into the second group of layers.

  • \(2\mathrm{nd}\) group of layers consists of two layers, each containing 120 filters. Our kernel size is \(3\times 3\times 3\), which allows the network to learn more complex features with a reduced risk of overfitting. Feature maps from the second group of convolutional layers are upsampled through a deconvolution layer.

  • \(3\mathrm{rd}\) group of layers consists of two convolutional layers, each with 120 filters.

  • \(4\mathrm{th}\) and \(5\mathrm{th}\) groups of layers consist of deconvolution layers. Since we employ four classes (i.e., WM, GM, CSF, and background), the last deconvolution layer has four filters (i.e., one filter per class). Convolution layers are used after each deconvolution operation.

  • The last layer performs classification using softmax units.
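The following Keras sketch shows one of the two paths under stated assumptions: the group structure, filter counts, kernel size, and four-class softmax follow the text, while the input size, pooling factor, and deconvolution strides are illustrative choices of ours.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_group(x, filters):
    # Two 3x3x3 convolutions, each followed by PReLU (see Sect. 3.1).
    for _ in range(2):
        x = layers.Conv3D(filters, 3, padding="same",
                          kernel_initializer="he_normal")(x)
        x = layers.PReLU(shared_axes=[1, 2, 3])(x)
    return x

inputs = layers.Input(shape=(64, 64, 64, 1))   # one modality; volume size is illustrative

x = conv_group(inputs, 90)                     # group 1: two layers, 90 filters each
x = layers.MaxPooling3D(2)(x)                  # downsampling (factor 2 is an assumption)
x = conv_group(x, 120)                         # group 2: two layers, 120 filters each
x = conv_group(x, 120)                         # group 3: two layers, 120 filters each
x = layers.Conv3DTranspose(120, 3, strides=2, padding="same")(x)  # group 4: deconvolution
x = layers.Conv3D(120, 3, padding="same", activation="relu")(x)   # convolution after deconvolution
x = layers.Conv3DTranspose(4, 3, padding="same")(x)  # group 5: four filters, one per class
outputs = layers.Softmax()(x)                  # last layer: softmax classification
model = Model(inputs, outputs)
```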

Fig. 9: Proposed fully CNN model with multi-instance loss and Gabor filter bank

Overfitting is a major problem in deep neural networks. Jonathan et al. [19] reported that deconvolution layers perform upsampling by learning to deconvolve the input feature map. Badrinarayanan et al. [1] reported that index upsampling uses max-pooling indices to upsample feature maps (without learning) and then convolves them with a bank of trainable filters. We experimented with both upsampling strategies on our data and observed, as shown in Fig. 10a, that the deconvolution layer performs better than index upsampling. Therefore, we choose the deconvolution layer to upsample the input feature map to a higher spatial resolution. After each convolution layer, we use PReLU [16] as the activation function, which (a) introduces only a small number of additional parameters, equal to the number of channels, and (b) helps prevent overfitting.

As is common with deep models, the weights are initialized randomly. Prior work on restricted Boltzmann machines (RBMs) showed that the equivalence between RBMs and infinite directed nets with tied weights suggests an efficient learning algorithm for multi-layer networks in which the weights are not tied. He et al. [15] further reported that deep models may have convergence difficulties and proposed a weight initialization strategy to improve the accuracy of deep neural networks. Figure 10b shows that the initialization strategy proposed by He et al. performs better than the other two strategies. Therefore, in our model, we use the initialization strategy of He et al., which maintains the variance of responses in each layer.

Fig. 10: a Two upsampling strategies; b three initialization strategies

A careful selection of the learning rate can lead to better performance. However, a learning rate that is too small slows down training, whereas one that is too large may prevent the optimizer from converging. To this end, we experimented with different learning rates to find what suits our data and topology, starting from a learning rate taken from a group of comparable models. First, we performed multiple runs, changing the learning rate by alternating factors of 3 and 10 (i.e., 0.01, 0.003, 0.001, 0.0003, and so on). Once we reached an acceptable estimate of the sweet spot, we tweaked the final digit to reach an optimal value. Second, we increased the initial learning rate by a factor of 10 until the model no longer converged. Similarly, we performed experiments to identify the lowest number of epochs needed to train our model. Finally, we set the initial learning rate to 0.01 and reduce it by a factor of 10 after every 10 epochs.
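The final schedule can be expressed as a standard Keras callback; the following is a sketch of the rule stated above (0.01, divided by 10 every 10 epochs), not our exact training script.

```python
import tensorflow as tf

def schedule(epoch, lr):
    # epochs 0-9 -> 0.01, epochs 10-19 -> 0.001, epochs 20-29 -> 0.0001, ...
    return 0.01 * (0.1 ** (epoch // 10))

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule)
# model.fit(x_train, y_train, epochs=30, callbacks=[lr_callback])
```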

Dropout and normalization techniques are also used to reduce overfitting and other gradient-related problems in neural network models [15]. During forward propagation, activations are passed from one layer to the next, and these activations may not fit a single distribution. Moreover, during training, each layer has to adapt to a new input distribution at every step, which slows down training (the internal covariate shift). Fixing the distribution of layer inputs eliminates the internal covariate shift and yields faster and better model training. Therefore, in our model, we compute the batch mean and batch variance to normalize the inputs/outputs of each layer. In batch normalization, layer outputs are normalized to fit a single distribution with a standard deviation of 1 and a mean of 0. Dropout randomly sets the activations of a fraction of neurons (the dropout rate) to 0, forcing the surviving neurons to participate in learning at the next layer. In our model, we applied batch normalization according to the strategy proposed in [16]. Note that we do not preprocess the T1 and T2 input images.
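For clarity, this NumPy sketch shows the standard computation batch normalization performs; gamma and beta are the learnable scale and shift, and the code is illustrative rather than taken from our implementation.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    mean = x.mean(axis=0)                     # batch mean
    var = x.var(axis=0)                       # batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # mean 0, standard deviation 1
    return gamma * x_hat + beta               # learnable scale and shift
```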

3.2 Loss Methods

In our proposed model, we use stochastic gradient descent with two loss methods: (i) multi-instance loss at the intermediate stages and (ii) cross-entropy loss at the final stage.

Fig. 11: Pink area marks the receptive field of one pixel in Layer 2; blue area marks the receptive field of one pixel in Layer 3

3.2.1 Multi-instance Loss

Multi-instance learning describes learning from examples organized in bags: each learning example is a bag of instances rather than a single feature vector, and each bag is associated with a label. In the training examples, a single object has several feature vectors (instances), and only one of those feature vectors may be responsible for the classification of the object [13]. In traditional supervised learning, the aim is to find a model that predicts the value of a target variable, \(y \in \{0, 1\}\), for a given instance \(x \in \mathbb {R}^D\). In multi-instance learning, there is a bag of instances \(X = \{x_1, \ldots , x_K\}\) instead of a single instance, and a single label \(Y\) is associated with the bag, while the instance-level labels \(y_k \in \{0, 1\}\) remain unknown during training.

The multi-instance loss is inspired by the standard multi-instance learning assumption: (a) if at least one instance in a bag is positive, then the bag is positive, and (b) if all instances in a bag are negative, then the bag is negative. In our model, the third and fourth layers can be treated as a multi-instance problem. We use two loss functions: one for positive pixels (i.e., inside the MRI image) and another for negative pixels (i.e., outside the MRI image). The I-loss function (inside the MRI image) is given by the following equation:

$$\begin{aligned} LOS1=\sum \limits _{{\text {i,j}},\, R_{\text {i,j}}=1}\log \left( 1+\exp \left( -R_{\text {i,j}}\, H^m_{\text {i,j}}\right) \right) , \end{aligned}$$
(1)

where \(R_{\text {i,j}}\) is the ground truth provided by the dataset organizers; \(R_{\text {i,j}}=1\) if pixel \({\text {(i,j)}}\) is inside the MRI image. The novelty here is that each pixel \({\text {(i,j)}}\) in \(R_{\text {i,j}}\) has a receptive field, i.e., a certain region of the image. If the receptive field contains at least one positive pixel (inside the MRI image), then \({\text {(i,j)}}\) should be positive; otherwise, \({\text {(i,j)}}\) is negative. Figure 11 depicts the receptive field: the pink area marks the receptive field of one pixel in Layer 2, whereas the blue area marks the receptive field of one pixel in Layer 3. The bag-level predictive map \(H^m_{\text {i,j}}\) is obtained by max pooling over the feature maps. The Ou-loss function (outside the MRI image) is given by the following equation:

$$\begin{aligned} LOS2=\sum \limits _{{\text {i,j}},\, R_{\text {i,j}}=-1}\log \left( 1+\exp \left( -R_{\text {i,j}}\, H^a_{\text {i,j}}\right) \right) , \end{aligned}$$
(2)

where \(R_{\text {i,j}} =-1\) if pixel \({\text {(i,j)}}\) is outside the MRI image, and the bag-level predictive map \(H^a_{\text {i,j}}\) is obtained by average pooling over the feature maps. The total multi-instance loss is given by the following equation:

$$\begin{aligned} MIL=LOS1+ \parallel W \parallel _2 LOS2, \end{aligned}$$
(3)

where the MIL ensures a proper differentiation between actual brain pixels and the background, and \(\parallel W \parallel _2\) denotes the L2 norm of the weights in the neural network, given by the following equation:

$$\begin{aligned} \parallel W \parallel _2 = \sqrt{W^2_1+W^2_2+\cdots +W^2_n}. \end{aligned}$$
(4)
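To make Eqs. (1)–(4) concrete, the following NumPy sketch evaluates the multi-instance loss for given ground-truth and bag-level maps; the variable names are ours.

```python
import numpy as np

def multi_instance_loss(R, H_max, H_avg, W):
    """R: ground truth (+1 inside the MRI image, -1 outside);
    H_max, H_avg: bag-level maps from max and average pooling;
    W: flattened network weights."""
    los1 = np.sum(np.log1p(np.exp(-R * H_max)) * (R == 1))   # Eq. (1): inside pixels
    los2 = np.sum(np.log1p(np.exp(-R * H_avg)) * (R == -1))  # Eq. (2): outside pixels
    w_norm = np.sqrt(np.sum(W ** 2))                         # Eq. (4): L2 norm of weights
    return los1 + w_norm * los2                              # Eq. (3): total MIL
```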

3.2.2 Cross-Entropy Loss

Loss functions are crucial in machine learning pipelines, but knowing which loss function to use can be challenging. Cross-entropy loss is commonly used as a cost function when training classifiers and as a measure of the performance of a classification model. In our model, we use the softmax function to convert the output of the classification layer into normalized probability values and compute the cross-entropy loss against the ground-truth labels.
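As a minimal illustration of this step, the sketch below applies softmax to one pixel's logits and computes the cross-entropy against the true class; it is the textbook formulation, not code from our implementation.

```python
import numpy as np

def softmax_cross_entropy(logits, target_class):
    probs = np.exp(logits - logits.max())   # subtract max for numerical stability
    probs /= probs.sum()                    # normalized probability values
    return -np.log(probs[target_class])     # cross-entropy for the true class

loss = softmax_cross_entropy(np.array([2.0, 0.5, 0.1, -1.0]), target_class=0)
```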

3.3 Gabor Filters

Due to the low contrast and lack of clear boundaries between WM and GM, machine-learned features alone are not sufficient for accurate segmentation. The Gabor filter is a strong tool for describing textures in images. Figure 12 shows the process of obtaining the Gabor filter bank. Gabor features are obtained by convolving the image with the filters and are supplied to our model as human-designed features to improve segmentation results [17]. The Gabor filter is given by:

$$\begin{aligned} \begin{aligned}&G(x,y; \lambda ,\theta ,\psi ,\sigma ,\gamma )\\&\quad = \exp \left( -\frac{x'^2+\gamma ^2 y'^2}{2\sigma ^2}\right) \exp \left( i\left( 2\pi \frac{x'}{\lambda }+\psi \right) \right) , \end{aligned} \end{aligned}$$
(5)

where \( \sigma \) is the standard deviation of the Gaussian envelope, \( \psi \) is the phase shift, \( \lambda \) is the wavelength of the sinusoid, \( \theta \) is the spatial orientation of the filter, and \( \gamma \) is the spatial aspect ratio. The terms \(x'\) and \(y'\) are given by the following equations:

$$\begin{aligned} x'&=x\cos (\theta )+y\sin (\theta ), \end{aligned}$$
(6)
$$\begin{aligned} y'&=y\cos (\theta )-x\sin (\theta ), \end{aligned}$$
(7)

where the filter sizes range from 0.3 to 1.5 and the wavelength coefficients of the sinusoid are 0.8, 1.0, 1.2, and 1.5.
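The following is a sketch of building such a filter bank with OpenCV and convolving it with an image slice. The sigma range and wavelengths follow the values quoted above; the kernel size, aspect ratio, phase, and number of orientations are assumptions of ours.

```python
import cv2
import numpy as np

image = np.random.rand(128, 128).astype(np.float32)   # toy MRI slice

responses = []
for sigma in (0.3, 0.7, 1.1, 1.5):                    # filter sizes from 0.3 to 1.5
    for lambd in (0.8, 1.0, 1.2, 1.5):                # wavelengths of the sinusoid
        for theta in np.arange(0, np.pi, np.pi / 4):  # 4 orientations (assumed)
            # args: ksize, sigma, theta, lambda, gamma, psi
            kernel = cv2.getGaborKernel((9, 9), sigma, theta, lambd, 0.5, 0)
            responses.append(cv2.filter2D(image, cv2.CV_32F, kernel))
```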

Fig. 12: Gabor filter bank

3.4 K-means

K-means is a technique for clustering a dataset into k groups. In our model, we merge the Gabor filter responses and apply the K-means clustering algorithm to cluster pixels with similar features.
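Continuing the Gabor sketch above, pixels can be clustered by stacking the filter responses into per-pixel feature vectors; k = 4 mirrors the four classes (WM, GM, CSF, background) and is an assumption for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# `responses` and `image` come from the Gabor filter bank sketch above.
features = np.stack(responses, axis=-1).reshape(-1, len(responses))
labels = KMeans(n_clusters=4, n_init=10).fit_predict(features)
label_map = labels.reshape(image.shape)   # per-pixel cluster assignments
```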

4 Experiments

This section presents our experimental design and evaluation.

4.1 Datasets

In our work, we use two different datasets of brain images: the MICCAI iSEG dataset and the MRBrains dataset. We describe each of these datasets in the following subsections.

4.1.1 MICCAI iSEG Dataset

The aim of the evaluation framework introduced by the MICCAI iSEG organizers is to compare models for segmenting WM, GM, and CSF on T1 and T2 images. The MICCAI iSEG dataset contains 10 subjects, named subject-1 through subject-10, each comprising a T1-weighted image, a T2-weighted image, and a manual segmentation label; these are used as the training set. The dataset also contains 13 subjects, named subject-11 through subject-23, used as the testing set. An example from the MICCAI iSEG dataset (T1, T2, and manual reference contour) is shown in Fig. 13. Two different relaxation times (i.e., the longitudinal relaxation time and the transverse relaxation time) are used to generate the T1 and T2 images (Table 2). The dataset has been interpolated, registered, and skull-stripped by the MICCAI iSEG organizers. We present the evaluation equations in Sect. 4.2.

4.1.2 MRBrains Dataset

The MRBrains dataset contains 20 adult subjects for the segmentation of (a) cortical gray matter, (b) basal ganglia, (c) white matter, (d) white matter lesions, (e) peripheral cerebrospinal fluid, (f) lateral ventricles, (g) cerebellum, and (h) brainstem on T1, T2, and FLAIR images. Five subjects (2 male and 3 female) are provided as the training set, and 15 subjects as the testing set. For the evaluation of the segmentation, these structures are merged into gray matter (a-b), white matter (c-d), and cerebrospinal fluid (e-f); the cerebellum and brainstem are excluded from the evaluation.

Fig. 13: Example of the MICCAI iSEG dataset (T1, T2, and manual reference contour)

Table 2 Parameters used to generate T1 and T2

4.2 Segmentation Evaluation

To better demonstrate the significance of our model, we submitted our results for evaluation by the MICCAI iSEG organizers [34], who used the Dice Coefficient (DC) metric to evaluate our model.

4.2.1 Dice Coefficient (DC)

We use \(V_{\text {ref}}\) for the reference segmentation and \(V_{\text {auto}}\) for the automated segmentation. The DC is given by the following equation:

$$\begin{aligned} \text {{ DC}}(V_{\text {ref}}, V_{\text {auto}}) = \frac{2 | V_{\text {ref}} \bigcap V_{\text {auto}} |}{|V_{\text {ref}}| + |V_{\text {auto}}|}, \end{aligned}$$
(8)

where DC values lie in the range [0, 1], with 1 corresponding to perfect overlap and 0 indicating total mismatch.
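For reference, Eq. (8) for a pair of binary masks reduces to a few lines of NumPy:

```python
import numpy as np

def dice(v_ref, v_auto):
    """Dice coefficient of two boolean segmentation masks (Eq. 8)."""
    intersection = np.logical_and(v_ref, v_auto).sum()
    return 2.0 * intersection / (v_ref.sum() + v_auto.sum())
```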

4.2.2 Comparing Our Results with the State-of-the-Art

To demonstrate the significance of our model, we compare our results with those of five state-of-the-art models. We choose these five models because (a) they have been considered baselines for comparing brain segmentation models in the literature [34, 25, 5, 30, 27] and (b) all their implementation details are publicly available.

4.3 Experiment Environment

We implement our proposed model in Python using TensorFlow, on a computer with an NVIDIA GPU running the Ubuntu 16.04 operating system. We train and test our model on each of the two datasets independently.

5 Results and Discussion

This section discusses the evaluation results of our model compared to the state-of-the-art models.

5.1 Analysis of the Results

Our model is trained and tested on two datasets covering different ages (i.e., infants and adults). Table 3 presents the results of our model for segmenting CSF, GM, and WM on the MICCAI iSEG dataset. Our model achieves a DC value of 94.1% for CSF segmentation, whereas the DC values achieved by the state-of-the-art models range between 83.5% and 91.5%; our proposed model thus improves CSF segmentation by 2.6%–10.6%. In addition, our model achieves DC values of 90.2% and 89.7% for GM and WM segmentation, respectively, whereas the state-of-the-art models achieved DC values in the range of 85.2%–88.6% for GM and 80.6%–88.7% for WM. We therefore observe a significant improvement of 1.5%–9.6% in segmenting GM and WM. Such results highlight the gain achieved by separating actual brain pixels from the background and by the additional features used in our model.

Table 3 Segmentation accuracy in dice coefficient (DC) achieved on the MICCAI iSEG dataset

Table 4 compares the results achieved by our model on the MRBrains dataset. We observe that our model achieves a DC value of 87.4% for CSF segmentation, 90.6% for GM segmentation, and 90.1% for WM segmentation. Such results surpass those achieved by the state-of-the-art models. Therefore, we argue that our model performs better on segmenting both infant and adult brain structures.

Table 4 Segmentation accuracy in dice coefficient (DC) achieved on the MRBrains dataset

5.2 Role of MIL

Our results on the two datasets show the performance of our model with and without the multi-instance loss (MIL). Looking at the segmentation results on the MICCAI iSEG dataset in Table 3, we observe that MIL contributed an improvement of about \(6\%\) to the segmentation of CSF and \(7\%\) to the segmentation of WM. However, for segmenting GM, we observe only a \(1\%\) improvement. The segmentation results on the MRBrains dataset show a similar pattern, i.e., higher improvements for CSF and WM but a lower one for GM. The smaller improvement for GM is likely due to the mixed features between GM and WM regions caused by the low contrast of the images.

5.3 Role of G+K

Our results on the two datasets show the performance of our model with and without the Gabor filter (G) and K-means (K). Looking at Table 4, we observe that G and K contributed improvements of about \(3\%\) to the segmentation of CSF and \(3\%\) to the segmentation of GM. In contrast to MIL, G+K showed less improvement in segmenting WM than GM, which is, once more, likely due to the unclear boundaries between these particular brain tissues. In future work, we aim to investigate this further and explore the reasons behind the variation in segmentation performance across different brain tissues.

5.4 Visualized Segmentation Results

Figure 14 shows a sample visualized result of our model on the subject used for validation. We observe that our model performs well for brain segmentation, especially for brain tissues. In Fig. 15, we show an example of our segmentation results and how they compare to the ground truth. We observe that the segmentation results achieved by our model are fairly close to the manual reference contour provided by the MICCAI iSEG organizers [34]. As expected, much of the improvement in our model is gained at the tissue boundaries (i.e., between GM and WM). Moreover, we observe that the use of the multi-instance loss and Gabor filter banks enables our model to handle thin regions better than using the original brain images alone.

Fig. 14: A sample of our model results on the subject used for validation. a 10 epochs, b 20 epochs, c 30 epochs

Fig. 15: a T1, b T2, c manual reference contour, and d our model result on the subject used for validation

5.5 Execution Time

Table 5 presents the execution time (in minutes) of our proposed model compared to the state-of-the-art models. We observe that our proposed model executes faster than the state-of-the-art models, indicating that it is more efficient and practical for use in real-time systems. Moreover, the addition of MIL+G+K did not negatively affect the execution time of our model, further suggesting its practicality.

Table 5 Average execution time (in minutes) and standard deviation (SD) in the MRBrains dataset

5.6 Discussion

Accuracy on two different datasets Our model is evaluated on two completely different datasets of brain images, one of infants and one of adults. Each of these datasets contains a limited number of low-contrast images. Yet, our model achieves high segmentation accuracy for brain tissues, outperforming the state-of-the-art models in this context. Future work should investigate the performance of these models on more datasets with larger numbers of brain images.

Multi-instance loss + Gabor filter bank + K-means clustering Our model adopts multi-instance loss, Gabor filter bank, and K-means clustering mechanisms to support the segmentation task. These additions have a positive effect on the effectiveness of our model, as our evaluation shows the improvements they introduce. We believe that adopting more sophisticated mechanisms, such as different loss functions or different network architectures, can further improve brain segmentation.

Model complexity It can be argued that our model has become more complex with the additional layers and filtering used to extract more informative features. However, our model shows better efficiency, expressed by faster execution times compared to the state-of-the-art models. This shows that our model maintains a balance between accuracy and efficiency, making it more practical in real-world scenarios. Still, future work is encouraged to optimize our model further, making it more accurate while preserving its complexity and efficiency.

Imperfect segmentation performance Even though our model surpassed the state-of-the-art models, the results are still not perfect. This is likely due to the limited number of MRI images, which are typically of low contrast. Low contrast can make the information of different brain tissues unclear or mixed, producing misleading features for each tissue. Future work is recommended to explore more advanced ways of identifying the exact boundary of each brain region to achieve higher segmentation performance.

6 Threats to Validity

This section discusses the validity threats of our results and how we address them in our study.

6.1 External Validity

Threats to external validity concern the generalizability of our results. One could argue that our datasets do not contain enough samples. We mitigate this threat by using two datasets that (a) contain both infant and adult brain data and (b) were previously used by prior studies. In addition, we compare our model with five prior models on the same datasets. Furthermore, we use small kernels, a deconvolution layer (to upsample the input), PReLU, dropout, and normalization methods to reduce the risk of overfitting. Hence, any potential deficiency in the data should affect all the implemented models equally. Nevertheless, our model achieves higher performance than the state-of-the-art models.

6.2 Internal Validity

Threats to internal validity concern experimental errors and bias. Our model is constructed using data extracted from medical images of low contrast. To mitigate this threat, we use the multi-instance loss method to reduce potential noise in the data by separating actual brain pixels from the background. This method has improved both the efficiency and the accuracy of our model. In addition, our results have been evaluated by medical experts (i.e., the organizers of the MICCAI iSEG dataset).

7 Conclusion

We proposed a fully convolutional neural network (CNN) model for improving brain segmentation, supported by (i) separating brain pixels from the background using the multi-instance loss method and (ii) adding features using a Gabor filter bank and K-means clustering. This paper expands the analysis and discussion of the performance of our model and provides a detailed review of related work on the application of deep learning to brain segmentation.

Our results were evaluated by the MICCAI iSEG organizers, who found them to be fairly close to the manual reference. In addition, we compared our model to five state-of-the-art baseline models and observed an improvement of up to 11%. In particular, we achieve dice coefficients ranging between 87.4% and 94.1%. These results indicate that the adoption of the multi-instance loss method and Gabor filter banks significantly improves segmentation results. We argue that our model is more efficient and accurate in practice for both infant and adult brain segmentation.

Despite the promising segmentation results achieved by our model, we believe that further improvements are possible. For example, conditional random fields (i.e., statistical modeling methods used to predict sequences in pattern recognition and machine learning) could be applied. We plan to add a conditional random field to brain segmentation models to investigate whether better segmentation performance can be gained. We also plan to perform brain boundary detection to identify the exact extent of brain regions and make feature extraction more precise.