1 Introduction

Gliomas are the most common brain tumors; they arise from the glial cells of the brain. They are categorized into two types: low-grade gliomas (LGG), which exhibit benign tendencies, and high-grade gliomas (HGG), which are malignant and more aggressive. Gliomas comprise 80.7% of all malignant brain tumors [1] and 26.5% of all primary brain tumors. In 2018 alone, around 80,000 new cases of malignant and non-malignant primary brain tumors and other central nervous system (CNS) tumors were reported to be diagnosed, of which around 32% were primary malignant tumors and other CNS tumors. Around 17,000 deaths were reported in the US during 2018 due to the same diseases. The 5-year relative survival rate reported for 2018 following a diagnosis of primary malignant brain and other CNS tumors is only 34.9%, while it is 90.47% for cases with primary non-malignant brain tumors [2]. Therefore, it is essential to segment brain tumor images effectively in order to understand the details and administer proper treatment.

The motivation behind image segmentation [13] is to partition an image into different regions according to given conditions; it is a key task in several medical applications such as abnormality detection, post-surgical evaluation, and surgical planning. Segmentation of brain tumors can be quite sophisticated and challenging, and precise segmentation of MRI brain tumors is extremely necessary for the detection of edema, enhancing tumor, and necrotic tissues. Manually segmenting brain tumors from the large number of clinically produced MRI images for cancer diagnosis is a challenging and time-consuming job. Due to the poor contrast, unknown noise, intensity inhomogeneity, and unclear borders that are usually present in clinical images, manual segmentation does not give reliable results. Having achieved outstanding results, automatic segmentation [3, 4] using deep learning techniques has become popular. Deep learning methods [5, 6] also allow effective processing and objective assessment of large volumes of MRI data.

1.1 Related Work

Several researchers are working towards efficient detection of MRI brain tumors, and a number of recent and significant works related to this study are briefly presented in this section. Cuadra et al. [7] have proposed a technique to deform a brain atlas in the presence of large space-occupying tumors, based on an a priori model of lesion growth that assumes the lesion expands from a starting point. Ronneberger et al. [8] have proposed a network and training strategy that relies on data augmentation to make more efficient use of the available labelled samples. Ali et al. [9] have given an overview of MRI brain tumor segmentation using DL techniques and explain how these methods give far superior results compared to traditional methods in terms of effective processing and evaluation of large input image data. Pereira et al. [10] proposed a reliable and automatic segmentation technique based on CNNs, which explores small 3 × 3 filters to allow a deeper architecture, and utilized intensity normalization as a pre-processing step together with data augmentation, which proved to be very effective. Dong et al. [11] proposed, and assessed on the BraTS 2015 dataset, a reliable and fully automatic segmentation technique using a U-net-based FCN for effective tumor measurement; cross-validation demonstrated that their technique delivers promising segmentation results. Havaei et al. [12] have proposed a fully automatic brain tumor segmentation technique specifically designed for low- and high-grade glioblastomas in MR images, using a novel CNN architecture whose convolutional implementation of the final fully connected layer allows a 40-fold speedup, together with a two-phase training procedure that addresses the challenge of tumor label imbalance. A cascade of CNNs was also used, and the method ranked among the top performers in the 2013 BraTS challenge. Kamnitsas et al. [13] proposed a dual-pathway, 11-layer deep 3D CNN for brain lesion segmentation, designed after a thorough assessment of the limitations of existing networks.

Li et al. [14] have proposed an automatic glioma segmentation algorithm that first uses spatial fuzzy c-means clustering to estimate the ROI in multimodal MR brain tumor images; a few seed points are then extracted from the ROI for a region-growing algorithm based on a novel "affinity". High values of the Dice coefficient, sensitivity, positive prediction value, Euclidean distance, and Hausdorff distance metrics are obtained, ranking first when the performance metrics are compared with state-of-the-art methods. Myronenko [15] has proposed automated segmentation of volumetric MR brain tumor images, which is required for diagnosis, monitoring, and treatment planning of the disease. He described a semantic segmentation model based on an encoder-decoder architecture for extracting the tumor sub-regions from 3D MRIs, and won first place in the 2018 BraTS challenge. Iqbal et al. [16] presented an extended version of multiple neural network layers connected with peer-level feeding of convolutional feature maps in sequential order, for segmentation of brain tumors in multi-spectral MRI using CNNs on the BraTS dataset. Isensee et al. [17] have shown the adequacy of a well-trained U-net in the context of the BraTS 2018 challenge, improving segmentation performance with only minor adjustments to the network design. A large patch size was used to build the training dataset, and the proposed network was trained on this dataset using a Dice loss function; competitive Dice scores were achieved on the validation data, placing second. Mehta et al. [18] presented multi-modal brain tumor segmentation using a modified 3D U-net design. The modification provides a stronger gradient flow, which helps the network learn its parameters effectively and produce better segmentation. The 2018 BraTS training dataset was used to train the network, and good segmentation results were obtained.

Stawiaski [19] has presented a DenseNet, based on a densely connected convolutional network encoder, for automatic segmentation of brain tumors in 3D multi-modal MRI images, attaining average DSC scores of 0.79, 0.90, and 0.85 for ET, WT, and TC, respectively, when evaluated on the BraTS 2018 challenge. Feng et al. [20] proposed a 3D U-net with diverse hyper-parameters for segmentation of MRI brain tumors, together with a linear model built from extracted imaging and non-imaging features for patient survival prediction; average Dice scores of 0.7917, 0.9094, and 0.8362 were achieved for ET, WT, and TC, respectively. McKinley et al. [21] introduced a novel family of classifiers using the DeepSCAN architecture, in which densely connected dilated convolution blocks are integrated into a shallow down/up-sampling U-net-style connection framework; these networks were trained on the 2018 Multimodal BraTS database. Yang et al. [22] proposed a deep learning system combining a small-kernel two-path CNN (SK-TPCNN) and a random forest algorithm for the segmentation of brain tumors in MRI. The SK-TPCNN structure combines small convolution kernels with large convolution kernels to improve the nonlinear mapping capacity, prevent over-fitting, and increase the diversity of features; the algorithm was validated and evaluated on the BraTS 2015 challenge. Hu et al. [23] proposed a novel technique based on a multi-cascaded CNN (MCCNN) and fully connected conditional random fields (CRFs) for brain tumor segmentation. They trained three segmentation models using image patches obtained from three different orientations (sagittal, coronal, and axial), and the final result is obtained by combining all three.

2 BraTS Database

The scope of the BraTS [24] challenge is to identify the best specialized segmentation techniques for the analysis of brain tumors in multimodal MRI sequence scans. The BraTS 2018 database comprises multi-institutional pre-operative MRI sequence scans and concentrates on the segmentation of heterogeneous brain tumors, particularly gliomas, as well as on establishing the clinical importance of the segmentation task.

2.1 Segmentation Task of Gliomas

The labels given in the BraTS database are '1' for necrotic (NCR) and non-enhancing tumor (NET), '2' for edema (ED), '4' for enhancing tumor (ET), also called active tumor (AT), and '0' for everything else. In this work, three glioma sub-regions are used to evaluate the segmentation method: ET, WT, and TC. The enhancing tumor is identified as the area of hyper-intensity in the T1Gd [6] (T1c) image when contrasted with the T1-weighted image, and additionally when contrasted with healthy white matter in the T1c image. The ET is segmented from label 4 in the T1c modality of the database. The tumor core is segmented using the combination of labels 1 and 4 in the T1c image; the TC therefore comprises the enhancing tumor together with the necrotic and non-enhancing regions. The NCR and NET parts of the tumor core are typically hypo-intense in T1c when contrasted with T1. The whole tumor consists of labels 1, 2, and 4 and is the complete extent of the tumor; it is segmented mainly from T2 and FLAIR. It is utilized to check the extension of the edema and to separate it from ventricles and other necrotic structures.
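To make this mapping concrete, the following minimal numpy sketch derives the three sub-region masks from a BraTS ground truth label volume; the function name is illustrative, not part of the BraTS tooling.

```python
# A minimal numpy sketch of the label-to-sub-region mapping; `seg` is
# assumed to be the integer label volume from a BraTS ground truth file.
import numpy as np

def subregion_masks(seg):
    et = (seg == 4)               # enhancing tumor: label 4
    tc = np.isin(seg, [1, 4])     # tumor core: labels 1 and 4
    wt = np.isin(seg, [1, 2, 4])  # whole tumor: labels 1, 2, and 4
    return et, tc, wt
```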

2.2 Modalities of MRI Brain Tumor Sequence

All MRI sequences are available as NIfTI files (.nii.gz) in the BraTS database. They are described as (i) the T1-weighted native image (T1), acquired in axial, sagittal, or coronal orientation, (ii) the T1-weighted post-contrast-enhancement image (T1c), (iii) the T2-weighted native image (T2), and (iv) the T2-weighted FLAIR volumes. In general, the difference between T1 and T2 images lies in the cerebrospinal fluid (CSF), which appears dark in T1 imaging and bright in T2 imaging. Another commonly used MRI sequence is the T2-weighted FLAIR, which resembles a T2-weighted image; abnormalities remain bright, but the CSF signal is attenuated and rendered dark in FLAIR, so abnormalities and CSF are very easy to differentiate in FLAIR imaging. Gadolinium (Gad) is a contrast-enhancing agent for T1-weighted imaging, injected during the MRI scan; it changes pixel intensities by shortening T1, so Gad appears very bright in T1 images. T1c images are particularly helpful for observing breakdown of the blood–brain barrier and vascular structures.

3 Methodology and Model Architecture

In this section, the algorithm and methodology used to solve the problem are described, and the architecture of the proposed model is explained.

3.1 Algorithm Description

[Algorithm figure: the two-stage glioma sub-region segmentation procedure described in Sect. 3.2]
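Since the algorithm is presented as a figure in the original, a hedged Python sketch of the two-stage procedure described in Sect. 3.2 is given below; the model objects, the 0.5 thresholds, and the single-patch simplification are illustrative assumptions, not the exact implementation.

```python
# A hedged sketch of the two-stage procedure described in Sect. 3.2,
# assuming 2D numpy slices and trained Keras models; model objects,
# thresholds, and the single-patch simplification are assumptions.
import numpy as np

def segment_glioma_slice(t2, flair, t1c, full_model, et_model, tc_model):
    # Stage 1: whole-tumor mask from T2 + FLAIR (23-layer model).
    x = np.stack([t2, flair], axis=-1)[None]          # (1, 240, 240, 2)
    wt_mask = full_model.predict(x)[0, ..., 0] > 0.5
    empty = np.zeros_like(wt_mask)
    if not wt_mask.any():
        return wt_mask, empty, empty                  # no tumor found
    # Center point of the whole-tumor prediction.
    ys, xs = np.nonzero(wt_mask)
    cy, cx = int(ys.mean()), int(xs.mean())
    # Stage 2: crop a 64 x 64 T1c patch at the center (larger tumors
    # would yield several patches; one patch is shown for brevity).
    y0 = min(max(cy - 32, 0), t1c.shape[0] - 64)
    x0 = min(max(cx - 32, 0), t1c.shape[1] - 64)
    patch = t1c[y0:y0 + 64, x0:x0 + 64][None, ..., None]
    tc_patch = tc_model.predict(patch)[0, ..., 0] > 0.5   # 18-layer model
    et_patch = et_model.predict(patch)[0, ..., 0] > 0.5   # 18-layer model
    # Paste the patch predictions back into full-size masks.
    tc_mask, et_mask = empty.copy(), empty.copy()
    tc_mask[y0:y0 + 64, x0:x0 + 64] = tc_patch
    et_mask[y0:y0 + 64, x0:x0 + 64] = et_patch
    return wt_mask, tc_mask, et_mask
```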

3.2 Methodology

The MRI images in the BraTS database are already skull-stripped and resampled to 1 mm³ isotropic resolution. The intensity of the same tissue nevertheless varies across an image because of bias field distortion. The N4ITK algorithm (the successor to N3) is used to correct the intensity non-uniformities caused by the inhomogeneity of the scanner's magnetic field. SimpleITK is used to read the NIfTI-format data and convert it to numpy array format. The data size for each subject is 240 × 240 × 155; only the 60th–130th axial slices are picked as training data, as the remaining parts of the brain are very unlikely to contain any tumor. The mean and standard deviation are used to zero-mean normalize the slices.
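A minimal sketch of this preprocessing chain is given below, assuming SimpleITK's standard N4 interface; the Otsu foreground mask is an assumption, since the text does not specify how the correction mask is obtained.

```python
# A sketch of the preprocessing chain, assuming SimpleITK's standard N4
# interface; the Otsu mask choice is an assumption.
import SimpleITK as sitk
import numpy as np

def preprocess(nifti_path):
    image = sitk.ReadImage(nifti_path)
    mask = sitk.OtsuThreshold(image, 0, 1, 200)           # foreground mask
    image = sitk.Cast(image, sitk.sitkFloat32)
    corrector = sitk.N4BiasFieldCorrectionImageFilter()   # bias field fix
    corrected = corrector.Execute(image, mask)
    volume = sitk.GetArrayFromImage(corrected)            # (155, 240, 240)
    slices = volume[60:131]           # keep the 60th-130th axial slices
    return (slices - slices.mean()) / (slices.std() + 1e-8)  # zero-mean
```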

The deep learning models utilized to segment the MRI brain tumor are based on the U-net and VGG16 architectures. Multi-modal MRI brain tumor images of size 240 × 240 × 3 are applied as input; the database images are in the nii (Neuroimaging Informatics Technology Initiative) format generally used to represent brain imaging data. In this methodology, the brain tumor sub-regions ET, TC, and WT are segmented separately. Owing to the complexity of medical images, applying a single image modality and using a single model are not enough to segment the sub-regions of an MRI brain tumor, since the tumor core is the internal portion of the edema and the ET is part of the TC. To solve this problem, the full tumor is predicted first and the center point of the prediction is calculated; the center point is then used to crop out the training data for ET and TC. The number of cropped patches depends on the size of the full tumor, and overlapping parts are also cropped for data augmentation; the crop size is fixed at 64 × 64. The T1c image is cropped around the center point of the full tumor prediction; if the tumor extent is larger than 64 × 64, more than one patch is cropped, as sketched below. The 64 × 64 training data is then fed to another deep learning model for training and prediction. The tumor core and enhancing tumor predictions are pasted back into the original full tumor prediction according to its center point.
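The overlapping multi-patch rule can be sketched as follows, assuming numpy masks; the stride (i.e. the amount of overlap) is an assumption, since the text states only that overlapping parts are also cropped.

```python
# A sketch of the overlapping-patch rule for tumors larger than 64 x 64,
# assuming numpy; the stride value is an assumption.
import numpy as np

def patch_origins(wt_mask, size=64, stride=32):
    """Yield top-left corners of size x size patches covering the
    bounding box of the whole-tumor mask, with overlap between patches."""
    ys, xs = np.nonzero(wt_mask)
    y_stop = max(ys.max() - size + 1, ys.min())
    x_stop = max(xs.max() - size + 1, xs.min())
    for y in range(ys.min(), y_stop + 1, stride):
        for x in range(xs.min(), x_stop + 1, stride):
            yield y, x
```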

Figure 1 presents the proposed approach in flowchart form. Instead of using all MRI modalities, T2 and FLAIR data are used for full tumor segmentation with a 23-layer DL model, and only the T1c modality is used for enhancing tumor and tumor core segmentation with an 18-layer DL model, which accelerates training. The whole tumor is primarily obtained by segmenting the T2-weighted images and is utilized to check the extension of the edema in T2-weighted FLAIR and to discriminate it against ventricles and other necrotic structures. Enhancing tumor and tumor core are both segmented by evaluating the hyper-intensities in T1-weighted contrast-enhanced images, together with the patches obtained earlier from the full tumor prediction.

Fig. 1 Segmentation of glioma sub-regions using the proposed methodology

3.3 Deep Learning Model Architecture

The segmentation of the brain tumor sub-regions uses two models. A 23-layer deep learning model is used to segment the full tumor, trained with the T2-weighted and T2 FLAIR modalities; Fig. 2 shows the proposed DL model architecture. This segmentation result is used as one input, and T1c images are the other; these are applied to train the 18-layer DL model to segment ET and TC separately, as shown in Fig. 1. Note that not all modalities of the data are considered at a time; this is the first difference from the original U-net. The proposed 18-layer DL model is obtained by removing the C4 block from the contracting path and the E1 block from the expanding path.

The proposed DL models can be thought of as auto-encoders, where a contracting path tries to learn the features of the image and an expanding path attempts to utilize these features to reconstruct the image with low-dimensional data such as the ground truth. Convolution and pooling layers are stacked in the contracting path, whereas up-sampling or transposed convolution layers are incorporated in the expanding path. High-resolution features from the contracting path are concatenated with the up-sampled outputs of the various stages in order to localize; these are known as skip connections [25, 26]. A further difference between the original U-net and the proposed DL model is that a batch normalization layer follows each convolution layer to keep the gradient magnitudes controlled, speed up convergence, and reduce internal covariate shift, so that the network's parameters do not change too rapidly during backpropagation. Convolution layers with 3 × 3 filters are used with 'same' padding to retain the output size of each convolution layer, which also differs from the original U-net. Following the VGG16 architecture, two convolution/batch-normalization layer pairs are stacked to extend the receptive field at the lowest resolution. After the two convolution and batch normalization layers, a 2 × 2 max-pooling layer with a stride of 2 down-samples the image size by half, and the number of feature channels is doubled at every down-sampling step. Each step in the expanding path up-samples the feature map with a 2 × 2 transposed convolution that halves the number of feature channels, concatenates it with the (cropped) feature map from the corresponding contracting path stage, and applies two 3 × 3 convolutions. Since each model performs binary segmentation, the final layer is a 1 × 1 convolution with one filter, producing a binary prediction in which 1 denotes tumor and 0 non-tumor.
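A minimal Keras sketch of such a U-net-style model with batch normalization after every convolution is given below; the depth (three down-sampling steps) and filter counts are illustrative and do not reproduce the exact 23-layer or 18-layer configurations.

```python
# A minimal Keras sketch of the U-net-style model with batch normalization;
# depth and filter counts are illustrative assumptions.
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    # Two 3 x 3 convolutions with 'same' padding, each followed by batch
    # normalization (the stated difference from the original U-net).
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
    return x

def build_unet(input_shape=(240, 240, 2), base_filters=32):
    inputs = layers.Input(input_shape)
    skips, x = [], inputs
    # Contracting path: 2 x 2 max-pooling halves the size, channels double.
    for i in range(3):
        x = conv_block(x, base_filters * 2 ** i)
        skips.append(x)
        x = layers.MaxPooling2D(2, strides=2)(x)
    x = conv_block(x, base_filters * 8)                  # bottleneck
    # Expanding path: 2 x 2 transposed convolutions halve the channels;
    # skip connections concatenate high-resolution features to localize.
    for i in reversed(range(3)):
        x = layers.Conv2DTranspose(base_filters * 2 ** i, 2, strides=2,
                                   padding="same")(x)
        x = layers.concatenate([x, skips[i]])
        x = conv_block(x, base_filters * 2 ** i)
    # Final 1 x 1 convolution with one filter for binary tumor prediction.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return Model(inputs, outputs)
```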

Fig. 2 Architecture of the proposed deep learning model

4 Performance Evaluations

Performance measures [27] such as accuracy, error rate, sensitivity, precision, F1-measure, Dice similarity coefficient (DSC), and Jaccard similarity coefficient (JSC) are used to evaluate the performance of the proposed DL models. These metrics are mostly evaluated on the basis of the confusion matrix shown in Table 1. The DSC and JSC coefficients are taken to measure the surface similarity of the glioma sub-regions.

Table 1 Confusion matrix

True positives (TP) are the cases when a tumor (1) data point of the ground truth image is correctly labeled as a tumor (1) data point in the segmented image.

True negatives (TN) are the cases when a non-tumor (0) data point of the ground truth image is correctly labeled as a non-tumor (0) data point in the segmented image.

False positives (FP) are the cases when a non-tumor (0) data point of the ground truth image is wrongly labeled as a tumor (1) data point in the segmented image.

False negatives (FN) are the cases when a tumor (1) data point of the ground truth image is wrongly labeled as a non-tumor (0) data point in the segmented image.

Accuracy is the ratio of the number of correctly labeled data points to the total number of data points:

$$Accuracy = \frac{(TP + TN)}{{(TP + TN + FP + FN)}}*100$$
(1)

Precision is expressed as the ratio of the number of true positives to the number of positive calls; it is also called the positive predictive rate (PPR):

$$Precision = \frac{TP}{TP + FP}$$
(2)


Recall or sensitivity is defined as the probability of a positive test given that the patient has a tumor. It is also called the true positive rate:

$$Sensitivity = \frac{TP}{{TP + FN}}$$
(3)

The F1 score expresses the accuracy of a test and is interpreted as the harmonic mean of precision and recall; the F1 score reaches its best value at (or close to) 1 and its worst value at (or close to) 0:

$$F1\,score = \frac{2 * Precision * Recall}{Precision + Recall}$$
(4)

The error rate is expressed as the ratio of incorrectly classified data points to the total number of data points; it equals 100% minus the accuracy:

$$Error\,rate = \frac{FP + FN}{{TP + TN + FP + FN}}*100$$
(5)

DSC is a measure of overlap between the predicted image and the ground truth image:

$$Dice = \frac{2 * TP}{2 * TP + FP + FN}$$
(6)

JSC is another widely used overlap measurement, defined as the surface overlap between the segmented image and its corresponding ground truth image:

$$Jaccard = \frac{TP}{{TP + FP + FN}}$$
(7)
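For reference, the metrics of Eqs. (1)-(7) can be computed from binary masks with a short numpy routine; the function name is illustrative, and a non-empty prediction and ground truth are assumed (otherwise some denominators are zero).

```python
# A short numpy routine computing Eqs. (1)-(7) from binary masks;
# assumes at least one positive pixel in prediction and ground truth.
import numpy as np

def segmentation_metrics(pred, truth):
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)      # tumor predicted as tumor
    tn = np.sum(~pred & ~truth)    # non-tumor predicted as non-tumor
    fp = np.sum(pred & ~truth)     # non-tumor predicted as tumor
    fn = np.sum(~pred & truth)     # tumor predicted as non-tumor
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy_pct": 100.0 * (tp + tn) / total,            # Eq. (1)
        "precision": precision,                               # Eq. (2)
        "sensitivity": recall,                                # Eq. (3)
        "f1": 2 * precision * recall / (precision + recall),  # Eq. (4)
        "error_rate_pct": 100.0 * (fp + fn) / total,          # Eq. (5)
        "dice": 2 * tp / (2 * tp + fp + fn),                  # Eq. (6)
        "jaccard": tp / (tp + fp + fn),                       # Eq. (7)
    }
```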

5 Results and Discussion

The proposed deep learning models are trained with the BraTS 2018 database, which consists of MRI scans of 285 glioma patients. Among them, 210 cases are high-grade glioma (HGG, grades III-IV) patients, that is, malignant brain tumor images, and the remaining 75 are low-grade glioma (LGG, grades I-II) cases representing benign brain tumor images. The data provided in the BraTS database is supervised and preprocessed, and ground truths for the training dataset are also provided. The BraTS 2018 training dataset is split into training and testing datasets with a ratio of 70:30; the training dataset therefore consists of 200 patients and the testing dataset of 85 patients. Each patient has four MRI modalities, and each modality comprises 155 volumetric slices. The four imaging modalities in the BraTS database are T1, T1c, T2, and T2 FLAIR, and all are available in sagittal, axial, and coronal views; the axial-view imaging modalities of the database are considered for experimentation. The entire work is implemented in Python using Keras with a TensorFlow backend and executed on the Google Colaboratory platform using the defined datasets.

The popular medical image segmentation architectures U-net and VGG16 are used as the basis of the proposed deep learning model. In this work, the brain tumor sub-regions ET, TC, and WT are detected and segmented. Data augmentation is applied to the original dataset to produce more training data virtually; its purpose is to enhance the network's efficiency by providing more training data. The augmentation includes simple transformations such as rotation, flipping, shifting, shear, brightness adjustment, elastic distortion, and zooming. These operations produce displacement fields over the images, slightly distort the global tumor shape in the horizontal direction, and generate more training data, as sketched below.
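A sketch of such an augmentation pipeline is given below, assuming Keras' ImageDataGenerator; the parameter values are assumptions, and elastic distortion is not built in, so it would require a custom preprocessing function.

```python
# A sketch of the augmentation step, assuming Keras' ImageDataGenerator;
# parameter values are assumptions, elastic distortion needs custom code.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=15,            # rotation
    width_shift_range=0.1,        # shifting
    height_shift_range=0.1,
    shear_range=0.1,              # shear operation
    zoom_range=0.1,               # zooming
    brightness_range=(0.9, 1.1),  # brightness
    horizontal_flip=True,         # flipping
    fill_mode="nearest",
)
```

For segmentation, the image and its mask must receive identical transforms; this is commonly achieved by driving two such generators with a shared random seed.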

In preprocessing, the data is normalized to zero mean, guarding against the zero-standard-deviation problem. In general, the cross-entropy loss evaluates the class label prediction for each pixel vector individually and averages over all pixels. This can be a problem if the classes in the database have unbalanced representation in the image, as training is then dominated by the most prevalent class. Instead of the cross-entropy loss, a soft Dice coefficient loss is used as the cost function; it is evaluated for each class separately and then averaged to yield a final score. Stochastic gradient descent with momentum is considered as the optimizer to minimize the cost function, and the adaptive moment estimator is adopted to adapt the training parameters. The hyper-parameters of the training process are a learning rate of 0.0001 and a maximum of 60 epochs.
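A minimal soft Dice loss sketch in TensorFlow for binary masks is shown below; the smoothing constant is an assumption that stabilizes the ratio when a class is absent from a slice.

```python
# A minimal soft Dice loss sketch for binary masks; the smoothing
# constant is an assumption to avoid division by zero.
import tensorflow as tf

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true * y_pred)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) + smooth)
    return 1.0 - dice
```

The loss plugs into training in the usual way, e.g. `model.compile(optimizer=..., loss=soft_dice_loss)`.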

The proposed model for extracting the sub-regions of the MRI brain tumor is trained on 70% of the BraTS training dataset, containing both HGG and LGG cases. The model and methodology are evaluated on ten randomly selected HGG patient cases from the testing dataset (the 30% split of the BraTS training dataset). HGG cases are considered for evaluation because they are aggressive, life-threatening tumors. For each patient, the evaluation covers the three sub-regions extracted from the MRI brain tumor, and the evaluation time per patient is approximately 26 s. The sub-regions considered are the ET, TC, and WT: the whole tumor includes labels 1, 2, and 4; the tumor core is the combination of labels 1 and 4; and the ET is represented by label 4 only. The WT is the complete extent of the tumor; it is primarily segmented from the T2 images, and the extension of the edema is validated with the T2-weighted FLAIR images.

Figures 3, 4, 5, 6 and 7 show the MR images of five randomly chosen patients used to extract the sub-regions of the MRI brain tumors with the DL models. In each figure, the first row (a–d) shows the T1, T2, FLAIR, and T1c images; the second row (e–h) shows the ground truths of the whole tumor (full), tumor core (core), enhancing tumor (et), and all sub-regions combined (all); and the third row (i–l) shows the predictions of the whole tumor (full), tumor core (core), enhancing tumor (et), and all sub-regions combined (all).

Fig. 3 Brain tumor segmentation of patient Brats18_2013_3_1 using deep learning models. First row (a–d): T1, T2, FLAIR, and T1-contrast images; second row (e–h): ground truths of WT (full), TC (core), ET (enhancing tumor), and all sub-regions combined (all); third row (i–l): predictions of WT (full), TC (core), ET (enhancing tumor), and all sub-regions combined (all)

Fig. 4 Brain tumor segmentation of patient Brats18_2013_5_1 using DL models

Fig. 5 Brain tumor segmentation of patient Brats18_CBICA_AQU_1 using DL models

Fig. 6 Brain tumor segmentation of patient Brats18_CBICA_BHK_1 using DL models

Fig. 7 Brain tumor segmentation of patient Brats18_TCIA01_147_1 using DL models

Data from ten patients is chosen randomly to extract the sub-regions ET, TC, and WT and to evaluate the proposed models. The performance metrics accuracy, error rate, sensitivity, and precision are considered, along with two further measurements of the structural overlap between the predicted sub-regions (in the segmented image) and the ground truth sub-regions (in the ground truth image). Figure 4 shows effective sub-region segmentation of the MRI brain tumor using the proposed methodology of Fig. 1. The ET, TC, and WT are extracted with accuracies of 96.68%, 98.88%, and 98.37%, respectively. These results assist the radiologist, and a doctor can determine the exact size, shape, and location of the brain tumor sub-regions ET, TC, and WT.

Table 2 shows the results of the DL model evaluation in terms of accuracy, error rate, precision, sensitivity, and F1-score. The model has an accuracy of 99.74%, an error rate of 0.26%, a precision of 0.9776, a sensitivity of 0.9572, and an F1-score of 0.9670. Considering all these results, it is concluded that the deep learning model performs strongly by keeping false negatives low in the prediction phase. Table 3 reports the sub-regions of the MRI brain tumor in terms of the structural overlap similarity metrics, the Dice coefficient and the Jaccard coefficient; these show how structurally similar each extracted sub-region is to the corresponding sub-region of the MRI brain tumor ground truth. The results show that the extracted sub-regions ET, TC, and WT are very close to the ground truths, with average Dice scores of 0.9152, 0.9281, and 0.9670 and average Jaccard scores of 0.8471, 0.8835, and 0.9374, respectively.

Table 2 Performance evaluation metrics of DL models
Table 3 Comparison of DSC and JSC for ET, TC, and WT

The first three models in Table 4 were trained on the BraTS 2018 database; they took the first three places in the BraTS 2018 challenge, and their Dice scores for MRI brain tumor sub-region segmentation are tabulated. For the BraTS 2017 challenge, the Dice scores obtained by Pereira et al., Havaei et al., Kamnitsas et al., and Dong et al., who ranked among the top performers, are also tabulated in Table 4. The proposed DL models perform better, attaining average Dice scores of 0.92, 0.91, and 0.96 for TC, ET, and WT, respectively. Figures 8, 9, and 10 plot the Dice scores of TC, ET, and WT versus the various models.

Table 4 Comparison of dice similarity coefficient of deep learning models with other models
Fig. 8 Plot of tumor core versus various models in terms of DSC

Fig. 9 Plot of enhancing tumor versus various models in terms of DSC

Fig. 10 Plot of whole tumor versus various models in terms of DSC

6 Conclusions

In this paper, brain tumor segmentation of glioma sub-regions is performed using two models: a 23-layer deep learning model segments the whole tumor, and an 18-layer DL model segments the ET and TC. These models are constructed on the basis of the U-net and VGG16 architectures. The whole tumor is mainly obtained by segmenting the T2-weighted images and is utilized to check the extension of the edema in T2-weighted FLAIR and to distinguish it from ventricles and other necrotic structures. Enhancing tumor and tumor core are both segmented by evaluating the hyper-intensities in T1-weighted contrast-enhanced images. The proposed models and method produce excellent results in terms of DSC, JSC, accuracy, sensitivity, and precision for the ET, TC, and WT sub-regions of the MRI brain tumor. The results on the testing dataset of the BraTS 2018 database show that the proposed DL models and methodology achieve average DSCs of 0.9670, 0.9281, and 0.9152 and JSCs of 0.9374, 0.8835, and 0.8471 for WT, TC, and ET, respectively. The proposed 23-layer model attains an accuracy of 99.74%, an error rate of 0.26%, a precision of 0.9776, a sensitivity of 0.9572, and an F1-score of 0.9670. The experimental results show that the ET, TC, and WT regions are extracted and located effectively. In clinical practice, this system can assist the radiologist in detecting the size, location, and shape of a brain tumor precisely, so that the radiologist or doctor can make consistent decisions and plan the best possible treatment, improving the patient survival rate.