Introduction

Early and accurate detection of brain tumor grade has a direct impact not only on the patient’s estimated survival but also on treatment planning and tumor growth evaluation. Among the primary brain tumors of the central nervous system (CNS), gliomas are considered the most aggressive [1]. In the revised fourth edition of its classification, published in 2016 [2], the World Health Organization (WHO) considers two categories of glioma tumors: low-grade (LG) gliomas and high-grade (HG) glioblastomas. LG gliomas tend to exhibit benign tendencies. However, they have a uniform recurrence rate and could increase in grade over time. HG gliomas are undifferentiated and carry a worse prognosis [3]. MRI is one of the main modalities used to image brain tumors for diagnosis and evaluation. Accurate identification of the tumor grade is a critical phase for various neuroimaging explorations [4], but it is a time-consuming task. Consequently, several machine learning [5,6,7] and deep learning-based approaches [8, 9] have quickly evolved during the past few years, illustrating that today’s medicine depends heavily on advanced information technologies. Several approaches have been proposed in the literature to classify brain tumors through MR imaging. They can be divided into two categories: supervised and unsupervised approaches. (1) Supervised approaches rely on labeled data: they learn from labeled training data to predict results for unseen data. (2) Unsupervised approaches are machine learning approaches that do not require supervising the model; rather, the model works on its own to collect information from unlabeled data.
The study in [10] reported that the best classification accuracy for glioma brain tumors was achieved using a 3D discrete wavelet transform (DWT) for feature extraction combined with a random forest (RF) classifier. A comparative study with several classifiers, such as multi-layer perceptron (MLP) [11], radial basis function (RBF) [12], and naive Bayes [13], was carried out to assess the performance of the 3D DWT [10]. The support vector machine (SVM) has also been widely used for MR image classification [14, 15]. The study in [16] investigates a hybrid system based on a genetic algorithm (GA) and an SVM with a Gaussian RBF kernel. GA optimization is used to select the most relevant features. Experimental results show that the use of GA improves the SVM’s classification accuracy.

Despite the significant potential of supervised approaches in glioma brain tumor classification, they require specific expertise for optimal feature extraction and selection [17]. Over the past several years, unsupervised approaches [18] have gained researchers’ interest not only for their strong performance but also because their automatically generated features could reduce the error rate. Deep learning (DL)-based methods have emerged as one of the most prominent families of methods for medical image analysis, including classification [19], reconstruction [20], and segmentation [21]. Recently, Iram et al. [22] discussed the use of a pre-trained VGG-16 model [23] for feature extraction. The feature maps are fed to a long short-term memory (LSTM) recurrent neural network [24] to classify brain tumors into high/low grade. The authors argue that pre-trained CNN models for feature extraction perform better when cascaded with LSTM and achieve higher accuracy compared with GoogleNet [25], ResNet [26], and AlexNet [27]. Apparent limitations of 2D CNNs for brain tumor MRI classification have been discussed in recent work [28]. As a solution, a voxelwise residual network (VoxResNet), based on a 3D CNN architecture, has been proposed for the identification of white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF) [28]. Experimental results confirm the efficiency of brain tissue classification from volumetric 3D MR images. This algorithm was ranked first in the MR brain challenge in 2017, outperforming state-of-the-art 2D CNN methods. The authors in [29] proposed an end-to-end three-dimensional convolutional neural network (3D CNN) with gated multi-modal unit (GMU) fusion to integrate information both across the three dimensions and across multiple modalities. The whole MRI volumes are fed directly to 3D convolutional kernels using different MRI directions (sagittal, axial, and coronal).
Inspired by the success of such architectures, we were motivated to implement a 3D CNN for brain glioma tumor classification.

Accurate glioma brain tumor classification is considered a challenging task because of the highly inhomogeneous composition of tumor regions. In fact, the tumor region includes edema, necrosis, and enhancing/non-enhancing tumor. Furthermore, the intensity profiles of some tumor sub-regions may overlap with those of healthy tissues.

In this paper, we propose a novel glioma brain tumor grade classification approach based on a deep three-dimensional convolutional neural network (3D CNN) in order to distinguish between HG and LG tumors. The principal research contributions of this paper are:

  • The proposed approach presents an automated multi-scale 3D CNN-based architecture for brain MRI glioma tumor classification on the basis of the World Health Organization (WHO) standards.

  • A preprocessing method [30] for volumetric MR images is used to improve the performance of the CNN by overcoming major problems in MR images such as data heterogeneity and low contrast.

  • By building a deep architecture from small 3D kernels, the proposed architecture has the potential to extract more local and global contextual features with highly discriminative power for glioma classification, with reduced computational and memory requirements.

  • We have applied a data augmentation technique to generate new patches from the original ones to overcome the lack of data and to tackle the large variation of brain tumors heterogeneity.

  • Comparative results on the MICCAI “BraTS-2018” challenge dataset show that the proposed approach yields the highest classification accuracy compared with recent supervised/unsupervised state-of-the-art methods.

The remainder of this paper is organized as follows: the “Proposed Approach” section details the proposed approach. The “Results” section presents and validates the experimental results. The “Discussion” section compares the proposed approach with state-of-the-art methods. Finally, conclusions are drawn in the “Conclusion” section.

Proposed Approach

The proposed approach investigates a fully 3D deep CNN architecture for automatic MRI glioma brain tumor grading. A 2D deep learning model learns an increasingly complex hierarchy of features by implementing many layers of trainable filters and optional pooling operations. The majority of these methods do not entirely examine the volumetric information in MR images but explore only two-dimensional slices. These slices are either considered independently or combined as three orthogonal 2D patches to merge the contextual information [21]. Hence, our proposed approach, based on 3D convolutional filters, takes advantage of more powerful contextual features that deal with the large variations of brain tissues [13].

Furthermore, to boost the performance of the proposed model, we adopted a preprocessing approach based on intensity normalization followed by a contrast enhancement technique for MR images. Such a process is not conventional in CNN-based classification approaches. The flowchart of the proposed approach is illustrated in Fig. 1.

Fig. 1 Flowchart of the proposed approach for glioma brain tumor classification

The proposed approach includes essentially the following steps:

  1. Data preprocessing: Normalization and contrast enhancement are applied to the T1-Gado MR scans to enhance image quality, followed by a resizing of the input images to reduce the required memory.

  2. Data augmentation: A simple flipping method is applied to compensate for the lack of data and ensure efficient CNN training.

  3. 3D CNN architecture design and optimization: The hyper-parameters, such as the number of convolution layers, pooling layers, and fully connected layers (FCLs), are set.

  4. Model training: The proposed model is trained using the augmented dataset and the enhanced MR images.

Preprocessing

One of the major difficulties in MRI analysis is dealing with thermal noise and with the artifacts caused by the magnetic field and by small patient motions during the scanning process. Noise in the acquired MRI scans could corrupt fine details, blur tumor edges, and even decrease the spatial resolution of the images [31]. It could thus seriously degrade the performance of CNN-based methods by making feature extraction more complicated [32]. For this reason, denoising and contrast enhancement [33] techniques for MR images [4] have recently gained a lot of interest and have been widely investigated to improve the quality of the data before MRI brain tumor analysis such as classification and/or segmentation [28, 34, 35].

Since the MRI scans could be collected from multiple institutions, their intensities may vary significantly. Therefore, intensity normalization [21], based on a linear transformation to the range [0, 1] through the min-max normalization technique, is used to reduce intensity inhomogeneity.
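The min-max normalization described above can be sketched as follows. This is a minimal illustration, assuming the rescaling is applied per volume with NumPy; the function name `min_max_normalize`, the `eps` guard, and the random toy volume are assumptions made for the example and are not taken from the paper.

```python
import numpy as np

def min_max_normalize(volume, eps=1e-8):
    """Linearly rescale the voxel intensities of one MR volume to [0, 1]."""
    volume = np.asarray(volume, dtype=np.float32)
    v_min, v_max = volume.min(), volume.max()
    # eps avoids a division by zero for a constant volume (assumption).
    return (volume - v_min) / max(v_max - v_min, eps)

# Toy volume standing in for a T1-Gado scan (random values, not real data).
rng = np.random.default_rng(0)
scan = rng.uniform(0.0, 4000.0, size=(8, 8, 4))
normalized = min_max_normalize(scan)
print(normalized.min(), normalized.max())
```

Because the transformation is linear, the relative contrast between tissues within a scan is preserved; only the intensity range is made comparable across scanners.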

In this work, our preprocessing consists of three steps, as illustrated in Fig. 2. First, we apply intensity normalization to the whole T1-Gado MRI scans, followed by an MRI contrast enhancement method developed in a previous work [30]. Finally, we resize the input MR images for memory optimization purposes: the size of the MR images in the BraTS database is 250 × 250 × 155, and the size considered here is 112 × 112 × 94. The images are resized using the cubic B-spline method [36].

Fig. 2 Preprocessing steps

Data Augmentation

In computer vision, data augmentation is a key factor that is very effective in training very deep learning-based methods [37]. A variety of data augmentation strategies have been proposed in the literature [38] for deep learning in medical imaging, such as random crops, rotations, shears, and flips. A recent study [39] shows that some augmentation strategies capture medical image features more effectively than others, leading to better accuracy. This study reports that flipping is the optimal data augmentation strategy for medical image classification, leading to more discriminative feature maps compared with other techniques. Based on this study, we adopt only horizontal and vertical flipping to generate new patches for each image in the training dataset.
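A flipping-based augmentation of a volumetric scan can be sketched as below. This is only an illustration under stated assumptions: the mapping of "horizontal" and "vertical" to array axes 1 and 0 depends on how the volumes are stored, and the helper name `flip_augment` is hypothetical.

```python
import numpy as np

def flip_augment(volume):
    """Return the original volume plus its horizontal and vertical flips."""
    horizontal = np.flip(volume, axis=1)  # mirror left-right (assumed axis)
    vertical = np.flip(volume, axis=0)    # mirror up-down (assumed axis)
    return [volume, horizontal, vertical]

# Small integer volume so that the effect of each flip is easy to inspect.
volume = np.arange(2 * 3 * 4).reshape(2, 3, 4)
augmented = flip_augment(volume)
print(len(augmented), augmented[1].shape)
```

Each flip keeps the volume shape unchanged, so the augmented samples can be fed to the same network input without further resizing.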

Proposed Architecture: Deep Multi-scale 3D Convolutional Neural Network

The proposed architecture, illustrated in Table 1, is built with eight convolutional layers and three fully connected (FC) layers. For accurate brain glioma tumor classification, our proposed CNN model is based essentially on two principal components:

  1. Unlike a 2D CNN architecture, which does not entirely examine the volumetric information in MR images but explores only two-dimensional slices, we adopt 3D convolutional layers that provide detailed feature maps exploiting the entire volumetric spatial information to incorporate both local and global contextual information.

  2. A deep network architecture that produces better-quality local optima. The additional nonlinearity in such an architecture could yield highly discriminative power [40]. As a result of the richer structures captured by deeper models, deeper architectures have previously shown their effectiveness for natural image classification. On 3D networks, the impact could be even more drastic [40].

Table 1 Proposed architecture

However, a 3D CNN is computationally and memory intensive, as it requires more trainable parameters than its 2D variant. As a solution, we use exclusively 3 × 3 kernels at each convolutional layer; these are faster to convolve and allow stacking more layers with fewer weights. Meanwhile, pooling layers are used to reduce the size of the intermediate layers. Another adopted solution to deal with memory constraints is to use a reduced number of filters per layer, especially in the first two layers of the network, where the features have higher dimensionality (only 32 filters in the first layer and 64 in the second layer).

Only eight convolutional layers are stacked, to prevent the extracted features from becoming too abstract as the network depth increases. The next subsections detail the proposed CNN architecture and the adopted hyper-parameters.
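The trade-off between stacked small kernels and a single large kernel can be checked with a short calculation. The sketch below assumes cubic 3 × 3 × 3 kernels, stride-1 convolutions, and equal channel counts in all layers; these are simplifying assumptions for illustration, and biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Receptive field (per axis) of stacked stride-1 convolutions."""
    return num_layers * (kernel - 1) + 1

def weights_per_channel_pair(num_layers, kernel=3, dims=3):
    """Weights per input/output channel pair, ignoring biases."""
    return num_layers * kernel ** dims

for layers, big in [(2, 5), (3, 7)]:
    field = stacked_receptive_field(layers)
    small = weights_per_channel_pair(layers)
    large = weights_per_channel_pair(1, kernel=big)
    print(f"{layers} stacked 3^3 layers: field {field}, "
          f"{small} weights vs {large} for one {big}^3 kernel")
```

Under these assumptions, two stacked 3 × 3 × 3 layers cover the same field as one 5 × 5 × 5 kernel with 54 instead of 125 weights per channel pair, and three stacked layers match a 7 × 7 × 7 kernel with 81 instead of 343, while also adding one nonlinearity per layer.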

Activation function

The activation function is responsible for the nonlinear transformation of the data. The rectified linear unit (ReLU) is used as the activation function in the proposed model. It is defined in Eq. (1), where f(i) represents the output of a neuron for an input “i”.

$$ f(i)=\max \left(0,i\right) $$
(1)

We adopt “ReLU” for its better performance, considering its ability to train deep CNNs faster than the classical “sigmoid” or “hyperbolic tangent” functions; the latter is given by Eq. (2).

$$ f(i)=\tanh \left(i\right) $$
(2)
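Eqs. (1) and (2) and their derivatives can be compared numerically. The following NumPy sketch is an illustration only; the helper names and the sample inputs are assumptions. It shows one commonly cited reason why ReLU trains faster: its gradient stays at 1 for any positive input, whereas the tanh gradient vanishes for large inputs.

```python
import numpy as np

def relu(i):
    """Eq. (1): f(i) = max(0, i)."""
    return np.maximum(0.0, i)

def relu_grad(i):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return (np.asarray(i) > 0).astype(float)

def tanh_grad(i):
    """Derivative of Eq. (2): 1 - tanh(i)^2."""
    return 1.0 - np.tanh(i) ** 2

inputs = np.array([-3.0, 0.0, 0.5, 5.0])
print(relu(inputs), relu_grad(inputs), tanh_grad(inputs))
```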

Pooling

Pooling is a down-sampling strategy in CNNs. There are essentially two conventional forms of pooling: max pooling [41] and average pooling [42]. Average pooling considers all elements in a pooling region, even those with low magnitude. Combined with the ReLU activation function, average pooling down-weights strong activations, because the average takes many zero elements into account. With hyperbolic tangent activation functions the situation could be even worse: strong positive and negative activations could cancel each other out, leading to smaller pooled responses [43]. Max pooling does not present such drawbacks. For this reason, max pooling is used in this work, since it extracts the most relevant features for classification, such as tumor edges [21]. In max pooling, a max filter is applied to non-overlapping sub-regions of the initial representation (Fig. 3).

Fig. 3 Max pooling concept with 2 × 2 filters and stride 2
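The difference between the two pooling forms can be illustrated on a small feature map. The sketch below uses 2 × 2 windows with stride 2, as in Fig. 3, on a 2D array for readability; the function name `pool2d` and the toy activations are assumptions, and the same idea extends to 3D windows.

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Pool non-overlapping size x size windows (stride equal to size)."""
    h, w = feature_map.shape
    # Crop so that height and width are multiples of the window size.
    cropped = feature_map[: h - h % size, : w - w % size]
    blocks = cropped.reshape(h // size, size, w // size, size)
    reduce = np.max if mode == "max" else np.mean
    return reduce(blocks, axis=(1, 3))

# Sparse activations, as typically produced by ReLU (toy values).
activations = np.array([[0., 0., 1., 0.],
                        [0., 8., 0., 0.],
                        [0., 0., 0., 2.],
                        [4., 0., 0., 0.]])
max_pooled = pool2d(activations, mode="max")
avg_pooled = pool2d(activations, mode="mean")
print(max_pooled)   # [[8. 1.] [4. 2.]]
print(avg_pooled)   # [[2. 0.25] [1. 0.5]]
```

In this toy case the strong activation 8 is kept by max pooling but reduced to 2 by average pooling, because the three zeros in its window enter the average.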

Regularization

For the fully connected (FC) layers, we use dropout [44] as a regularizer to boost the generalization ability and to prevent overfitting. Dropout stochastically removes nodes of the network with a given probability during training. All FC-layer nodes are thereby forced to learn better representations of the data, while nodes are prevented from co-adapting to each other. All nodes are used at test time. Dropout can thus be seen as an ensemble of different networks and a form of bagging, because each network is trained on a portion of the training data.
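The training-time and test-time behavior of dropout can be sketched as follows. This example uses the "inverted" formulation, in which kept activations are rescaled during training so that nothing needs to change at test time; the rate of 0.5, the function name, and the toy activations are assumptions, since the dropout probability is not stated here.

```python
import numpy as np

def dropout(activations, rate, training, rng):
    """Inverted dropout: zero nodes with probability `rate` while training."""
    if not training:
        return activations  # all nodes are used at test time
    keep_mask = rng.random(activations.shape) >= rate
    # Rescale the kept nodes so the expected activation is unchanged.
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
fc_output = np.ones((4, 8))          # toy FC-layer activations
train_out = dropout(fc_output, rate=0.5, training=True, rng=rng)
test_out = dropout(fc_output, rate=0.5, training=False, rng=rng)
print(train_out[0], test_out[0])
```

Each training step samples a different mask, which is what gives dropout its ensemble-like interpretation.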

Training Step

For efficient training, the following parameters are discussed: optimizer, loss function, and initialization.

Optimizer

Adam is an optimization algorithm that can be used as a substitute for the classical stochastic gradient descent (SGD) procedure to update network weights. This optimizer combines the advantages of two extensions of stochastic gradient descent: the adaptive gradient algorithm (AdaGrad) and Root Mean Square Propagation (RMSProp) [45].
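A single Adam update can be written in a few lines. The sketch below follows the standard formulation of the algorithm with its commonly used default decay rates (0.9 and 0.999) and a learning rate of 0.001; the toy quadratic loss and the function name `adam_step` are assumptions for illustration, not the training code of the proposed model.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameters `theta` at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy problem: minimize theta**2, whose gradient is 2 * theta.
theta = np.array([1.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 4):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
print(theta)
```

Dividing by the square root of the second-moment estimate gives each weight its own effective step size, so the first update has a magnitude close to the learning rate regardless of the gradient scale.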

Loss Function

In this research, categorical cross-entropy is employed as a loss function. This function is used to compare the predictions’ distribution with the true one according to Eq. (3) where ŷ and y represent respectively the predicted and the target values.

$$ L\left(y,\hat{y}\right)=-\sum \limits_{j=0}^M\sum \limits_{i=0}^N\left({y}_{ij}\log \left({\hat{y}}_{ij}\right)\right) $$
(3)
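Eq. (3) can be evaluated directly on one-hot targets. The following NumPy sketch is illustrative: the class order (HG, LG), the toy predictions, and the clipping constant `eps` are assumptions. It sums over samples and classes as in Eq. (3); deep learning libraries typically average over the batch instead.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-7):
    """Eq. (3): minus the sum of y * log(y_hat) over samples and classes."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.sum(y_true * np.log(y_pred)))

# Two toy samples with one-hot targets; assumed class order: (HG, LG).
y_true = np.array([[1.0, 0.0], [0.0, 1.0]])
y_pred = np.array([[0.9, 0.1], [0.2, 0.8]])
loss = categorical_cross_entropy(y_true, y_pred)
print(loss)
```

Only the predicted probability of the true class contributes to each sample's term, so the loss here equals −(log 0.9 + log 0.8).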

Initialization

Glorot normal initialization, also called Xavier normal, is used since it is one of the most commonly recommended initialization schemes for deep CNN architectures [21]. It keeps the activations and the gradients under control. In the uniform variant of this scheme, samples are drawn from a uniform distribution within [−limit, limit], where limit is given by Eq. (4) and fin and fout represent respectively the number of input and output units in the weight tensor; the normal variant draws samples from a zero-centered truncated normal distribution with the same variance, 2/(fin + fout).

$$ \mathrm{limit}=\sqrt{\frac{6}{f_{in}+{f}_{out}}} $$
(4)
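The bound of Eq. (4) corresponds to the uniform variant of Glorot initialization; the normal variant uses a standard deviation of sqrt(2/(fin + fout)), which gives the same variance. The sketch below computes both for a hypothetical first-layer 3 × 3 × 3 kernel with 1 input channel and 32 filters. The channel count and helper names are assumptions, and the fan computation (kernel volume times channel count) follows the convention used by Keras for convolutional kernels.

```python
import numpy as np

def glorot_uniform_limit(fan_in, fan_out):
    """Eq. (4): bound of the uniform Glorot distribution."""
    return np.sqrt(6.0 / (fan_in + fan_out))

def glorot_normal_stddev(fan_in, fan_out):
    """Standard deviation used by the normal Glorot variant."""
    return np.sqrt(2.0 / (fan_in + fan_out))

# Hypothetical kernel: 3x3x3 window, 1 input channel, 32 filters.
kernel_shape = (3, 3, 3, 1, 32)
fan_in = 3 * 3 * 3 * 1
fan_out = 3 * 3 * 3 * 32
limit = glorot_uniform_limit(fan_in, fan_out)
stddev = glorot_normal_stddev(fan_in, fan_out)

rng = np.random.default_rng(0)
weights = rng.uniform(-limit, limit, size=kernel_shape)
print(limit, stddev, weights.shape)
```

A uniform distribution on [−limit, limit] has variance limit²/3, which is why the two variants share the variance 2/(fin + fout).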

Results

This section presents the experimental results in order to justify the choice of hyper-parameters and to validate the impact of the main contributions of the proposed approach. The proposed 3D CNN was built in a Python 3.4 environment using the Keras library with a TensorFlow backend, on a workstation with an Intel i7 2.60 GHz CPU and 19.5 GB of RAM, equipped with an NVIDIA GeForce GTX 1080 Ti GPU with 11 GB of memory.

Dataset

The evaluation was carried out on the dataset of the multi-modal Brain Tumor Segmentation Challenge (BraTS 2018) [46], held in conjunction with the MICCAI conference. This challenge is essentially organized to compare current state-of-the-art methods for the multi-modal segmentation task. Nevertheless, the provided annotation into HG and LG glioma tumors, approved by experienced neuroradiologists, inspired us to use this database for classification. Each subject case in BraTS-2018 has four volumetric MRI scans: the native (T1), the post-contrast T1-weighted (T1-Gado), the T2-weighted (T2), and the fluid-attenuated inversion recovery (FLAIR). All data have been previously skull-stripped, co-registered to the same anatomical template, and interpolated to the same resolution (1 mm³). The BraTS-2018 training dataset comprises 284 subjects, including 209 HG and 75 LG glioma tumors. The validation data comprise 67 glioma tumors of mixed grades. Neuroradiologists have radiologically assessed the complete original TCIA glioma collections, and the dataset has been updated with more routine clinically acquired 3T MRI scans. BraTS-2018 is available through the Image Processing Portal of the CBICA@UPenn (IPP, ipp.cbica.upenn.edu). Figure 4 illustrates an HG and an LG subject case from the BraTS dataset.

Fig. 4 a High-grade (HG) glioma subject case. b Low-grade (LG) glioma subject case

Evaluation Metric

The accuracy has been used as an evaluation metric to assess the proposed approach’s efficiency. The classification accuracy is defined as follows:

$$ ACC=\frac{TP+ TN}{TP+ TN+ FP+ FN} $$
(5)

where the true positives (TP) are the HG cases that are correctly classified as HG, the true negatives (TN) are the LG cases that are correctly classified as LG, the false positives (FP) are the LG cases that are wrongly classified as HG, and the false negatives (FN) are the HG cases that are wrongly classified as LG.
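Eq. (5) can be computed from predicted and true labels as sketched below, taking HG as the positive class. The label lists are toy values invented for the example and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive="HG"):
    """Count TP, TN, FP, FN with `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Eq. (5): fraction of correctly classified cases."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no samples to evaluate")
    return (tp + tn) / total

# Toy labels, not results from the paper.
y_true = ["HG", "HG", "HG", "LG", "LG"]
y_pred = ["HG", "HG", "LG", "LG", "HG"]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(*counts))  # (2, 1, 1, 1) 0.6
```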

Validation of the Proposed Approach

In this section, we first assess the effect of each processing step on the final classification accuracy. Then, we validate the use of a deep architecture through a comparative study with shallower network architectures.

Preprocessing’s Validation

The quality of the training dataset has a direct impact on CNN performance. The publicly available dataset (BraTS) is highly heterogeneous, since it has been collected from multiple sites with different scanner technologies and acquisition parameter settings. This may affect the quality of the MR scans and could seriously limit the classification performance of the CNN model. To reduce data heterogeneity, several researchers, such as Pereira et al. [21], reported a substantial gain (4.6%) in the overall accuracy of a CNN-based architecture after applying intensity normalization to the same dataset for brain tumor segmentation. Inspired by the positive effect of intensity normalization on the performance of CNNs for segmentation [47], we applied the min-max normalization technique to normalize voxel intensities.

For contrast enhancement, we applied an adaptive contrast stretching technique based on the statistical information of the original image [30]. This technique preserves the tumor’s edges as well as the significant features of the original image. It has achieved encouraging results in enhancing the contrast of the region of interest (ROI) in MR images without excessive noise amplification over the entire image. The final step of our proposed preprocessing is resizing the brain MR input image in order to reduce the computational complexity.

To gauge the impact of the suggested preprocessing on MRI glioma grading, we evaluate the accuracy with and without the preprocessing step. Figure 5 illustrates the obtained accuracy as a function of the number of epochs.

Fig. 5 Preprocessing impact on classification accuracy

According to the obtained results, the accuracy is clearly better when the preprocessing is applied, which confirms the effectiveness of the applied data preprocessing for efficient model training.

Data Augmentation’s Validation

Medical imaging benchmarks are often imbalanced, which is a serious problem, especially when a deep CNN is used for fully automatic classification, as it could lead to erroneous guidance for tumor grade diagnosis [39]. For instance, in the dataset used here, the number of LG glioma subject cases is much lower than the number of HG glioma subject cases. Data augmentation techniques are used to balance the number of samples in each class. Moreover, data augmentation is a common solution to alleviate overfitting in deeper networks. To evaluate the impact of this process on the performance of the proposed model, we computed the classification accuracy with and without data augmentation. Figure 6 illustrates the effect of data augmentation on the classification accuracy as a function of the number of epochs.

Fig. 6 Impact of the data augmentation technique on classification accuracy

As illustrated in Fig. 6, the use of data augmentation clearly improves the classification accuracy, from 0.8246 to 0.964. For this reason, we have adopted data augmentation in the proposed classification approach.

Deep Network Architecture’s Validation

To validate the real effects of the proposed deep CNN architecture on the classification accuracy, we replaced each convolutional layer of the proposed model with larger kernels that have the equivalent effective receptive field. Two variants of kernel size have been tested using the proposed architecture:

  • (5 × 5) kernels, keeping the same number of feature maps as in the proposed architecture.

  • (7 × 7) kernels, where we increased the CNN’s capacity by increasing the number of feature maps, namely, 64 in the first convolutional layer, 64 in the second, and 128 in the third and the fourth layers.

As illustrated in Figs. 7 and 8, the proposed architecture yields higher accuracy on the testing and validation datasets compared with the shallower networks using (5 × 5) and (7 × 7) kernels. These results confirm that our proposed architecture, using small (3 × 3) kernels, could capture more details than large kernels, even when the number of feature maps is increased. One could conclude that the proposed architecture has the advantage of maintaining the effective receptive fields of bigger kernels while reducing the number of weights and allowing more non-linear transformations of the data. For this reason, we have adopted a deep CNN with (3 × 3) kernels.

Fig. 7 Comparison of deep versus large kernels (5 × 5) based CNN architecture

Fig. 8 Comparison of deep versus (7 × 7) large kernels with augmented features maps

Hyper-parameters’ Validation

In this section, we study the effects of the hyper-parameters on the classification accuracy, more specifically the effects of the pooling, the activation function, the optimizer, and the initializer.

Pooling

We investigate average pooling versus max pooling. As illustrated in Fig. 9a, max pooling performs better than average pooling. For this reason, we have adopted max pooling in the proposed CNN architecture.

Fig. 9 Hyper-parameters’ validation. a Max pooling validation. b ReLU activation function validation. c Adam optimizer validation. d Glorot normal validation

Activation function

A comparative study of three activation functions, ReLU, SELU, and tanh, has been performed. As shown in Fig. 9b, the ReLU activation function outperforms the two other activation functions.

Optimizers

The Adam optimizer has been used to learn our network weights. Moreover, a second optimizer, the stochastic gradient descent (SGD), has been also tested in order to assess the classification performance. During the experiments, the initial learning rate for both optimizers was set to 0.001. As illustrated in Fig. 9c, the Adam optimizer provides much better performance compared with SGD optimizer.

Initializer

In order to justify our choice of initializer, we have compared the performance of two different techniques: the Glorot normal and the Glorot uniform. As shown in Fig. 9d, the Glorot normal yields higher accuracy, which is why this initializer is chosen in the proposed approach.

Discussion

In order to assess the performance of the proposed approach, a comparative study has been performed with both hand-crafted and deep learning-based approaches from the state of the art. Table 2 reports the classification accuracy obtained with the proposed approach as well as with the supervised approaches when applied to the BraTS challenge dataset for brain glioma classification.

Table 2 Comparative study with supervised-based methods for brain MRI glioma classification (supervised method versus proposed approach)

The approach proposed by [48] aims to classify glioma tumors into HG and LG. The features have been extracted by a fusion process between three modalities (MRI T1, T1-contrast, and FLAIR) based on the histogram, the shape, and the gray-level co-occurrence matrix (GLCM), and only forty-five significant features have been selected using LASSO method. The final classification is done by logistic regression (LR) using the LASSO score. The method achieved an accuracy of 89.81%.

The algorithm proposed by Cho et al. [49] investigated two types of classifiers, mainly SVM and RF, to distinguish between HG and LG through brain MRI. The evaluation shows that the RF classifier achieved the best performance (89%) compared with the SVM classifier (87%). For feature extraction, the authors used shape and textural features. The study in [10] investigates the classification of brain glioma tumors into four classes (necrosis, edema, enhancing, and non-enhancing tumors). A 3D DWT has been used for feature extraction. A comparative study has been performed to evaluate the performance of different classifiers such as naive Bayes (NB), MLP with one hidden layer, and MLP with backpropagation, and the obtained accuracy was 60%, 70%, 76%, 80%, and 88%, respectively. The RF classifier thus achieved the best performance.

Table 3 illustrates a second comparative study between the proposed approach and the state-of-the-art CNN-based methods using the same dataset (BraTS).

Table 3 Comparative study with CNN-based approaches

The 3D CNN is rarely explored in MRI processing. To the best of our knowledge, only Iram et al. [22] have developed a 3D-based CNN model for MRI glioma grading. The feature maps are extracted from the volumetric MR images and then fed into a long short-term memory (LSTM) network along the temporal direction to classify brain tumors into HG and LG gliomas. However, this method is semi-automatic and does not sufficiently explore the 3D volumetric contextual information, whereas automatic classification is highly desired in neurological practice.

Among 2D CNN approaches, we have investigated two 2D CNN-based architectures. The first one, proposed by Pan et al. in 2015 [51], explores a pre-trained CNN model, mainly LeNet-5. Nevertheless, this approach suffers from the limited representation power of shallower CNN networks. On the other hand, the approach proposed by Ge et al. in 2018 [50] offers competitive results for high- versus low-grade glioma classification. To enhance the obtained performance, they deployed a deep architecture exploiting multi-modality (multi-stream) fusion, using six convolutional layers followed by three FC layers, together with data augmentation. However, the authors do not provide any comparative study because of dataset heterogeneity.

We could conclude that the present study offers a powerful approach for accurate glioma tumor classification, outperforming several recent CNN architectures. Based on a fully automatic 3D deep CNN, it could harness the complementary volumetric contextual information and thus offers better results.

Conclusion

We presented in this study a multi-scale 3D CNN framework for automatic glioma tumor grade classification in which, instead of patching the MR image, the whole 3D volumetric MRI sequences are passed to the network. The evaluation shows that the proposed architecture can learn highly distinctive features to separate LG from HG subject cases, compared with competitors using either variants of 2D CNNs or relatively shallower networks. Furthermore, the preprocessing step significantly reduces the dataset heterogeneity created by multi-scanner technologies and acquisition protocols; meanwhile, the intensity normalization helps to correct the large inhomogeneity of glioma tissues. We found that data augmentation, through a flipping technique only, could significantly improve the overall accuracy, especially when the dataset does not provide enough MR scans to train a deep CNN.

The comparative study with supervised and unsupervised state-of-the-art methods, using the same dataset, shows that the proposed approach outperforms several well-known CNN-based architectures for glioma MRI classification. In future work, we plan to investigate the recent technique of “capsule networks (CapsNet)” for MRI brain tumor classification.