1 Introduction

The use of medical images has greatly increased our knowledge of biological processes in medical research. Unfortunately, inaccurate diagnosis of brain tumors is the leading cause of death among affected people [5]. Different types of brain tumors therefore need to be diagnosed accurately, which is essential for proper treatment. Manual evaluation and examination of medical images is time-consuming and prone to errors or omissions. Computer-based analysis and diagnosis can therefore help radiologists in brain tumor diagnosis.

In computer-based methods, image segmentation is a fundamental way to extract useful information from medical images and to identify abnormalities. The medical community uses X-ray, Computed Tomography (CT) and Magnetic Resonance Imaging (MRI) modalities for cancer diagnosis and treatment planning. MRI does not expose the patient to harmful radiation and also provides excellent soft-tissue contrast. Furthermore, MRI highlights different types of soft tissue through different imaging sequences, such as T1-weighted, contrast-enhanced T1-weighted (T1CE), T2-weighted and Fluid-Attenuated Inversion Recovery (FLAIR) images.

In this study, the aim is to develop a computer-based method for automated segmentation of brain tumors from MRI and to predict the survival time of patients affected by glioma. Glioma is usually classified into two categories, namely Low Grade Glioma (LGG) and High Grade Glioma (HGG). LGG is slow-growing and generally treated by surgery or radiation therapy, while HGG is fast-growing and has a poor prognosis, with an average survival of 15 months [6].

2 Related Work

The task of brain tumor segmentation has recently been dominated by 3D CNN based architectures, which have the additional advantage of using depth information from brain MRI volumes. One of the early works in this direction was proposed by Kamnitsas et al. [7]; it uses 3D patches extracted from brain MRI volumes and dual-pathway CNNs to extract features at different scales. Although it used depth information and multiple scales, it had low inference efficiency. Wang et al. [8] proposed a cascaded architecture that divides the overall multi-class problem into three binary problems and designed multiple CNN based architectures for them. It has good generalization capabilities and achieved second place in the BraTS 2017 challenge. An efficient system based on 2D and 3D CNN features was proposed by Mlynarski et al. [9]. They combined the features extracted by 2D models with a 3D CNN, but the use of multiple networks gives it a high computational overhead.

In [10], Chen et al. proposed a deep convolutional symmetric neural network for brain tumor segmentation. Their model consists of a Deep Convolutional Neural Network (DCNN) with symmetric masks added in different layers. They focused mainly on run-time, segmenting MRI volumes in seconds, but their scores for enhancing tumor, tumor core and whole tumor are 0.58, 0.68 and 0.85, respectively. Pereira et al. [11] proposed a model that mixes information with a linear expansion of features, so that only relevant features are obtained. The model is computationally expensive because it trains two networks (a binary FCN and a multi-class FCN). They achieved Dice scores of 0.86, 0.76 and 0.69 for whole tumor, tumor core and enhancing tumor, respectively. Their model performs worse on tumor core than our model.

In this study, we specifically evaluate edema (label 2) on FLAIR, and the necrotic and non-enhancing tumor core (label 1) together with the enhancing tumor (label 4) on T1CE volumes, which provide more information about the tumor region and lead to better overall survival (OS) prediction. Most of the work in the literature on OS prediction is based on radiomic features extracted from MR images. Sanghania et al. used clinical, shape, texture and volumetric features for 2-class (short, long) and 3-class (short, medium, long) survival group prediction. A Support Vector Machine classification based Recursive Feature Elimination (SVM-RFE) method is used for feature selection [12].

A statistical analysis of multi-scale texture features for the characterization of tumor regions was performed for progression-free survival and OS prediction by Ahmad Chaddad et al. [13]. Recently, Sun et al. addressed the problems of brain tumor segmentation and 3-class OS prediction using an ensemble of 3D CNNs and a Random Forest classifier, respectively. Radiomic features are used for the survival time prediction [14]. Furthermore, Sanghania et al. evaluated the effectiveness of tumor shape features for OS prognosis in glioma patients, using the abnormal regions of FLAIR and T1CE MR images of patients [15]. They concluded that 3D shape based radiomic features are the ones most associated with OS prognosis.

3 Methodology

This article addresses two tasks. In task 1, brain tumor segmentation is performed for the two tumor grades, HGG and LGG, using U-Net [16]. In task 2, radiomic features are extracted using the segmented tumor and the MRI scans. Furthermore, high-variance features, along with age, are selected using Recursive Feature Elimination (RFE) and fed to a Random Forest (RF) for the prediction of the patient's OS. The overall system diagram is shown in Fig. 1.

Fig. 1. Proposed system diagram: (a) MRI scans are given as input to the segmentation model, (b) brain tumor segmentation of HGG and LGG is done using the 3DU-Net model, (c) segmented tumor results are gathered, (d) radiomic features based on intensity, texture and shape are extracted from the segmented tumor, (e) significant features are selected using Recursive Feature Elimination (RFE), (f) a Random Forest (RF) regression model is used to predict the overall survival time of the patient.

3.1 Data Preprocessing

Data preprocessing is an essential step before training a deep neural network. One MRI volume has dimension \( 240 \times 240 \times 155 \times 5 \). To make the training process efficient, the irrelevant border of each slice is cropped to \( 224 \times 224 \). The cropping is done automatically by ignoring n pixels on each side along the height and m pixels on each side along the width, where n = 8 and m = 8. The values of n and m were determined by visualizing various slices of the MRI volumes, which showed the central \( 224 \times 224 \) area of a slice as brain and the remainder as background. The \( 224 \times 224 \) slice is thus obtained by discarding the index ranges 0:8 and 232:240 along both the height and the width of the \( \left[ {240,240,:,:} \right] \) slice, i.e. keeping \( \left[ {8:232,8:232,:,:} \right] \). Furthermore, Z-score normalization [17] is used to normalize each image modality within the brain region; it expresses each intensity as the number of standard deviations below or above the mean value. Min-Max normalization was also tested, but it performed poorly with this U-Net model. The preprocessing process is shown in Fig. 2.
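The cropping and normalization steps described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, background voxels are assumed to be exactly zero (skull-stripped scans), and the first four channels are assumed to hold the image modalities with the label in the last one.

```python
import numpy as np

def crop_border(volume, n=8, m=8):
    """Drop n pixels per side along the height and m per side along the width."""
    height, width = volume.shape[:2]
    return volume[n:height - n, m:width - m, ...]

def zscore_in_brain(modality):
    """Z-score one modality using brain voxels only; background stays 0."""
    brain = modality != 0  # assumption: background voxels are exactly 0
    out = np.zeros(modality.shape, dtype=np.float32)
    if brain.any():
        values = modality[brain]
        out[brain] = (values - values.mean()) / (values.std() + 1e-8)
    return out

# Demo on random data; the depth is 4 instead of 155 only to keep it light.
rng = np.random.default_rng(0)
volume = rng.random((240, 240, 4, 5))
cropped = crop_border(volume)
# Assumption: channels 0..3 are the image modalities, channel 4 is the label.
normalized = np.stack(
    [zscore_in_brain(cropped[..., c]) for c in range(4)], axis=-1
)
print(cropped.shape, normalized.shape)  # (224, 224, 4, 5) (224, 224, 4, 4)
```

Restricting the statistics to non-zero voxels keeps the large background area from dominating the mean and standard deviation.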

Fig. 2. Process diagram of preprocessing for brain tumor segmentation.

3.2 Brain Tumor Segmentation

The segmentation task on the BraTS 2019 dataset is challenging mainly because of the data imbalance among classes. Five patients were selected at random from the dataset to illustrate the class imbalance problem; their statistics for each type of glioma are given in Table 1. The class imbalance is more severe for LGG, and it has been observed that segmentation networks perform poorly on LGG volumes.

To segment HGG and LGG volumes, a slice-based 2D U-Net is used, with slight modifications to the image reconstruction in the decoder path, as shown in Fig. 3. A Batch Normalization (BN) layer is used after every hidden convolution layer. To preserve useful information during training, Leaky ReLU (LReLU) is used in all hidden layers. A stride of 2 is used in the hidden convolution layers, which acts as the down-sampling operation in the encoder pathway. In the decoder path, a ConvTranspose layer followed by dropout is used for up-sampling, instead of the UpSampling layer used in [16]. The UpSampling layer simply doubles the input dimensions, whereas the ConvTranspose layer performs a transposed (inverse) convolution and, unlike the UpSampling layer, has learnable parameters. The addition of the dropout layer significantly reduced over-fitting and provides better generalization.
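To make the down-sampling and up-sampling arithmetic concrete, the sketch below traces the spatial size of a \( 224 \times 224 \) slice through stride-2 convolutions and transposed convolutions. The kernel sizes, padding and the depth of four levels are illustrative assumptions; the text does not state them.

```python
def conv_out(size, kernel=3, stride=2, padding=1):
    """Output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=2, stride=2, padding=0):
    """Output length of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

# Assumed depth of four levels, starting from the cropped 224-pixel slice.
encoder = [224]
for _ in range(4):
    encoder.append(conv_out(encoder[-1]))

decoder = [encoder[-1]]
for _ in range(4):
    decoder.append(conv_transpose_out(decoder[-1]))

print(encoder)  # [224, 112, 56, 28, 14]
print(decoder)  # [14, 28, 56, 112, 224]
```

With these settings each encoder level halves the resolution and each transposed convolution doubles it again, so the skip connections of a U-Net line up level by level.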

Fig. 3. U-Net architecture with slight changes. After each convolution layer in the down-sampling path, a BN layer is added, and in the up-sampling path a dropout layer is used after BN to prevent over-fitting.

Table 1. Class distribution of five patients according to LGG and HGG

The data imbalance problem has been reduced by using the specialized Tversky loss function [18], which is based on the Tversky index shown in Eq. 1. This loss function generalizes the Dice score and outperforms the simple Dice loss. The Dice loss treats false positives (FPs) and false negatives (FNs) equally, which results in low recall and high precision values. The Tversky loss focuses more on FNs, which provides a better tradeoff between precision and recall. This loss function is used in the segmentation task. It is defined as follows:

$$ S\left( {P,G;\alpha ,\beta } \right) = \frac{{\left| {P \cap G} \right|}}{{\left| {P \cap G} \right| + \alpha \left| {P\backslash G} \right| + \beta \left| {G\backslash P} \right|}} $$
(1)

Here, P denotes the predictions and G the ground truth, and \( \alpha \) and \( \beta \) are the parameters that control the magnitude of the penalties for FPs and FNs, respectively. The voxel-wise form of the Tversky loss function is given in Eq. 2.

$$ T\left( {\alpha ,\beta } \right) = \frac{{\sum\nolimits_{i = 1}^{N} {p_{0i} g_{0i} } }}{{\sum\nolimits_{i = 1}^{N} {p_{0i} g_{0i} } + \alpha \sum\nolimits_{i = 1}^{N} {p_{0i} g_{1i} } + \beta \sum\nolimits_{i = 1}^{N} {p_{1i} g_{0i} } }} $$
(2)

Here, \( p_{0i} \) and \( p_{1i} \) are the predicted probabilities that voxel i is a lesion voxel and a non-lesion voxel, respectively. \( g_{0i} \) is 1 for a lesion voxel and 0 for a non-lesion voxel, and vice versa for \( g_{1i} \).
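The voxel-wise definition can be sketched in NumPy for a single binary (lesion vs. non-lesion) map. The function names are hypothetical, and the default weights \( \alpha = 0.3 \), \( \beta = 0.7 \) are illustrative assumptions only, since the values used in this work are not stated here; the loss is taken as one minus the index.

```python
import numpy as np

def tversky_index(p0, g0, alpha=0.3, beta=0.7, eps=1e-7):
    """Tversky index for lesion probabilities p0 and binary ground truth g0."""
    p0 = np.asarray(p0, dtype=np.float64).ravel()
    g0 = np.asarray(g0, dtype=np.float64).ravel()
    p1, g1 = 1.0 - p0, 1.0 - g0      # non-lesion probability / indicator
    tp = np.sum(p0 * g0)             # predicted lesion, truly lesion
    fp = np.sum(p0 * g1)             # predicted lesion, truly non-lesion
    fn = np.sum(p1 * g0)             # predicted non-lesion, truly lesion
    return (tp + eps) / (tp + alpha * fp + beta * fn + eps)

def tversky_loss(p0, g0, alpha=0.3, beta=0.7):
    """Loss to minimise: 1 - Tversky index."""
    return 1.0 - tversky_index(p0, g0, alpha, beta)

pred = np.array([1.0, 0.0, 0.0, 0.0])   # one lesion voxel predicted
truth = np.array([1.0, 1.0, 1.0, 0.0])  # three lesion voxels in truth
print(tversky_index(pred, truth, 0.5, 0.5))  # Dice-equivalent weighting
print(tversky_index(pred, truth, 0.3, 0.7))  # FNs penalised more heavily
```

With \( \alpha = \beta = 0.5 \) the index reduces to the Dice score; raising \( \beta \) above \( \alpha \) lowers the score of predictions that miss lesion voxels, which is the behaviour described for imbalanced classes.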

3.3 Overall Survival Time Prediction

To establish the clinical relevance of task 1 (segmentation), this section focuses on OS time prediction by analyzing radiomic features with Machine Learning (ML) algorithms. The segmentation task produces segmented tumor labels for the preoperative MR scans, which are used for feature extraction in the OS prediction task. Radiomic features are extracted from the FLAIR and T1CE modalities.

In the next step, the extracted features are given to RFE for the selection of significant features. These features are then analyzed by the RF model to predict the patient's OS time. Survivors fall into three categories on the basis of their survival time: long-term survivors (>15 months), short-term survivors (<10 months) and mid-term survivors, who live between 10 and 15 months. The overall process of OS prediction is described below.
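The three survival categories can be expressed as a small helper. The function name is hypothetical, and treating exactly 10 and exactly 15 months as mid-term is an assumption, since the text only gives strict inequalities for the short and long groups.

```python
def survival_group(months):
    """Map a survival time in months to the short / mid / long category."""
    if months < 10:
        return "short"
    if months > 15:
        return "long"
    return "mid"  # assumption: the boundaries 10 and 15 count as mid-term

for months in (4, 10, 12.5, 15, 20):
    print(months, survival_group(months))
```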

Radiomic Feature Extraction.

The location and geometry of the tumor play an important role in determining the survival days [19]. The four raw MRI volumes described in Sect. 1, along with the segmented tumor, are used for feature extraction. In total, 3393 features are extracted using Pyradiomics [20], a Python-based radiomic feature extraction library. The brain tumor has three labels: Necrotic and Non-Enhancing Tumor core (NCR/NET, label 1), peritumoral Edema (ED, label 2) and Enhancing Tumor (ET, label 4) [21]. The segmented tumor, together with the FLAIR and T1CE modalities, is used to extract appropriate radiomic/imaging features. Specifically, ED (label 2) is used with the FLAIR modality, and NCR/NET (label 1) and ET (label 4) are used with T1CE. The label types and modality types need to be defined in the settings file of the Pyradiomics library.

Pyradiomics extracts features from 2D and 3D images by calculating the mean intensity of each image and region pair. We used the available features of every feature type: Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), Gray Level Dependence Matrix (GLDM), First Order features, Neighboring Gray Tone Difference Matrix (NGTDM) [12] and Shape features. We extracted radiomic features from the MRI volumes and from the abnormal regions of the segmented tumor according to tumor type (i.e. edema, label 2, on FLAIR; the necrotic and non-enhancing tumor core, label 1, and the enhancing tumor, label 4, on T1CE volumes). The size of the tumor and its ratio to the entire shape were extracted as well. Other image features, namely the mean, skewness, variance, standard deviation, histogram and entropy of the tumor intensities, were also extracted.

Feature Selection.

To eliminate irrelevant or less important features, we performed feature selection in three steps. First, to remove redundant features, several experiments were performed using multiple combinations of the extracted features, and their impact on the result was observed. We then selected 76 significant features using RFE and fed them to the regression model. RFE [22] is a feature selection technique that fits a model on the features and repeatedly removes the least relevant ones until the required number of features, as specified in the program, remains.

Regression Model.

We have implemented an RF regression model for the prediction of overall survival using the significant features extracted and selected in the previous steps. RF is an ensemble-based approach for regression, classification and many other tasks, depending on the shape and type of the data. The RF regression model is used for the prediction task because the target variable, survival, is continuous, so support vector machines and other classification models are not appropriate for this regression problem. RF builds a number of decision trees at training time, each on a random sample of the data, and combines their outputs to make the final prediction. The model achieved an accuracy of 0.31, as shown in Table 3. The overall survival prediction pipeline for regression is shown in Fig. 4. MRI volumes and the segmented tumor are used as input (Fig. 4(a)). Radiomic features based on intensity, texture and shape are extracted from the segmented tumor (Fig. 4(b)). In feature selection (Fig. 4(c)), 76 important features are selected using RFE, including the clinical features age and resection status (Gross Total Resection, GTR). After dividing the data into 80% train and 20% test, the RF regression model is used to predict the overall survival time of the patient (Fig. 4(d)), and 5-fold cross validation is then performed to validate the model performance (Fig. 4(e)). Finally, the model returns the predicted survival of patients in days (Fig. 4(f)). The segmentation results for all tumor types are shown in Table 2.

Table 2. Result of U-Net on BraTS 2019 data

Fig. 4. Overall survival prediction pipeline: (a) segmented tumor as input, (b) radiomic feature extraction from the segmented tumor, (c) important feature selection using RFE, (d) RF regression model for OS prediction, (e) 5-fold cross validation to validate the model performance, (f) predicted result in days.
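Steps (c) to (f) of Fig. 4 can be sketched with scikit-learn on synthetic data. This is an illustration under several assumptions, not the authors' implementation: the feature matrix is random, a Random Forest is assumed as the estimator inside RFE (the text does not name one), and the feature counts are scaled down from 3393 and 76 to keep the example fast.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the radiomic feature table (hypothetical sizes).
rng = np.random.default_rng(0)
n_patients, n_features, n_selected = 100, 40, 8
X = rng.normal(size=(n_patients, n_features))
# Survival in days, driven by two features plus noise (made-up relation).
y = 400 + 120 * X[:, 0] - 80 * X[:, 1] + rng.normal(scale=20, size=n_patients)

# (d) 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# (c) Recursive feature elimination, dropping 4 features per iteration.
selector = RFE(
    estimator=RandomForestRegressor(n_estimators=30, random_state=0),
    n_features_to_select=n_selected,
    step=4,
)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# (d) Random Forest regression on the selected features.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train_sel, y_train)

# (e) 5-fold cross validation on the training split (MSE per fold).
cv_mse = -cross_val_score(
    model, X_train_sel, y_train, cv=5, scoring="neg_mean_squared_error"
)

# (f) Predicted survival in days for the held-out patients.
predicted_days = model.predict(X_test_sel)
print(int(selector.support_.sum()), cv_mse.shape, predicted_days.shape)
```

For simplicity the selector is fitted once before cross validation; wrapping RFE and the regressor in a single pipeline would instead repeat the selection inside every fold.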

4 Results

In this section, the experimental results are presented in tabular format. The authors in [1,2,3,4] used the BraTS 2018 dataset (a subset of BraTS 2019) for brain tumor segmentation. In this paper, however, the BraTS 2019 dataset is used to train and test the models [23]. The multimodal brain MRI scans of 335 patients, comprising 259 HGG (Glioblastoma, GBM) and 76 LGG cases, are used for training. Each patient has 4 image modalities, FLAIR, T1, T1CE and T2, along with a ground truth label. We used an 80:20 split, with 80 percent of the data as the training set and 20 percent as the test set. The data has been pre-processed using standard preprocessing methods. All provided scans are skull-stripped, follow the same anatomical pattern, and are re-sampled to a resolution of \( 1\,{\text{mm}}^{3} \). The shape of each volume is \( 240 \times 240 \times 155 \times 5 \).

The experiments for the OS prediction task are performed on the volumes of 76 patients from the training dataset and 29 patients from the validation dataset, yielding accuracies of 0.31 and 0.27, respectively, as shown in Table 3.

Table 3. Results summary for the prediction of patient’s survival

4.1 Evaluation Metrics

The most common evaluation metric for the segmentation task is the Dice Similarity Coefficient (DSC); for survival prediction, the metrics are accuracy, Mean Squared Error (MSE), Median Squared Error (MedianSE), Standard Deviation of Squared Errors (stdSE) and SpearmanR.

$$ DSC\left( {A,B} \right) = \frac{{2\!\left| {A \cap B} \right|}}{\left| A \right| + \left|B\right| } $$
(3)

The DSC measures the similarity between the ground truth and the labels predicted by the model, as presented in Eq. (3). Afterwards, we calculate the average of those values to find the overall Dice score.
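Eq. (3) can be sketched in NumPy as below. The grouping of labels into whole tumor (labels 1, 2, 4), tumor core (labels 1, 4) and enhancing tumor (label 4) follows the usual BraTS convention and is assumed to apply here; the function names and the score of 1.0 for two empty masks are likewise assumptions.

```python
import numpy as np

# Standard BraTS region definitions (assumed to match this work).
REGIONS = {
    "whole_tumor": (1, 2, 4),
    "tumor_core": (1, 4),
    "enhancing_tumor": (4,),
}

def dice(pred_mask, true_mask):
    """DSC = 2|A n B| / (|A| + |B|) for two boolean masks."""
    pred_mask = np.asarray(pred_mask, dtype=bool)
    true_mask = np.asarray(true_mask, dtype=bool)
    denom = pred_mask.sum() + true_mask.sum()
    if denom == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(2.0 * np.logical_and(pred_mask, true_mask).sum() / denom)

def region_dice(pred_labels, true_labels):
    """Dice score per tumor region plus their average."""
    scores = {
        name: dice(np.isin(pred_labels, labels), np.isin(true_labels, labels))
        for name, labels in REGIONS.items()
    }
    scores["mean"] = float(np.mean(list(scores.values())))
    return scores

pred = np.array([1, 2, 4, 0])
truth = np.array([1, 1, 4, 0])
print(region_dice(pred, truth))
```

In this toy case the whole tumor and enhancing tumor masks agree exactly, while the tumor core score is 0.8 because one core voxel was predicted as edema.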

$$ MSE = \frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {Y_{i} - \hat{Y}_{i} } \right)^{2} } $$
(4)

$$ Accuracy = \frac{{\left( {TN + TP} \right)}}{{\left( {TN + FP + FN + TP} \right)}} $$
(5)

$$ SpearmanR = \rho_{{rg_{Y} ,rg_{{\hat{Y}}} }} = \frac{{cov\left( {rg_{Y} ,rg_{{\hat{Y}}} } \right)}}{{\sigma_{{rg_{Y} }} \sigma_{{rg_{{\hat{Y}}} }} }} $$
(6)

$$ SE(median) = 1.2533 \times SE(\mu) $$
(7)

The MSE in Eq. (4) measures how close the predictions are to the actual values. It is calculated by averaging the squared differences between the actual and predicted values. Accuracy in Eq. (5) is the proportion of true results (TN + TP) among all results, where TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative. The Spearman rank correlation measures the correlation between the ranks of two variables. In Eq. (6), \( rg_{Y} \) is the rank of variable Y and \( rg_{{\hat{Y}}} \) is the rank of variable \( \hat{Y} \), where Y is the actual and \( \hat{Y} \) the predicted value. It is calculated by dividing the covariance of the two rank variables by the product of their standard deviations.

In Eq. (7), SE (median) is the standard error of the median and \( SE(\mu) \) is the standard error of the mean. The smaller the MedianSE, the more support there is for the corresponding model. The MedianSE is preferred to the mean squared error for model evaluation because of large outliers in the patients' survival data [24]. StdSE is another evaluation metric for error in regression; the standard deviation indicates how well the mean represents the sample data.
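The regression metrics can be sketched in NumPy as follows. This is one possible reading rather than the evaluation code used here: MedianSE and stdSE are taken literally from their names in this section as the median and the standard deviation of the squared errors, Eq. (7) is not implemented, and ties in the Spearman ranks are broken by order of appearance instead of being averaged.

```python
import numpy as np

def squared_errors(y_true, y_pred):
    """Element-wise squared differences between actual and predicted values."""
    return (np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2

def mse(y_true, y_pred):
    """Eq. (4): mean of the squared errors."""
    return float(np.mean(squared_errors(y_true, y_pred)))

def median_se(y_true, y_pred):
    """Median of the squared errors (assumed reading of MedianSE)."""
    return float(np.median(squared_errors(y_true, y_pred)))

def std_se(y_true, y_pred):
    """Standard deviation of the squared errors (assumed reading of stdSE)."""
    return float(np.std(squared_errors(y_true, y_pred)))

def ranks(values):
    """Ranks 1..n; ties broken by order of appearance (simplification)."""
    order = np.argsort(values, kind="stable")
    result = np.empty(len(values), dtype=float)
    result[order] = np.arange(1, len(values) + 1)
    return result

def spearman_r(y_true, y_pred):
    """Eq. (6): covariance of the ranks over the product of their std devs."""
    rg_true = ranks(np.asarray(y_true, float))
    rg_pred = ranks(np.asarray(y_pred, float))
    cov = np.mean((rg_true - rg_true.mean()) * (rg_pred - rg_pred.mean()))
    return float(cov / (rg_true.std() * rg_pred.std()))

actual_days = [100, 200, 300, 400]     # made-up survival times
predicted_days = [110, 190, 330, 400]  # made-up predictions
print(mse(actual_days, predicted_days))         # 275.0
print(median_se(actual_days, predicted_days))   # 100.0
print(spearman_r(actual_days, predicted_days))
```

The single 30-day miss raises the MSE to 275 while the median squared error stays at 100, which illustrates why a median-based error is less sensitive to outliers in survival data.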

5 Conclusion

In this study, we have proposed a U-Net based dense segmentation system that achieves good Dice scores for HGG volumes from the multi-modal BraTS 2019 dataset. Our primary results on the test set are promising, with Dice scores of 0.84 for whole tumor, 0.8 for tumor core and 0.63 for enhancing tumor in glioma segmentation. However, the network performance degrades for LGG volumes, which needs to be investigated further. The analysis shows that the data imbalance problem is more serious in LGG volumes, so special care must be taken when performing automatic segmentation of LGG volumes. In future, we plan to design separate networks for HGG and LGG segmentation. For the survival prediction task, RFE eliminated low-variance features and selected the more relevant ones, which were then fed to the RF regression model for the prediction of the patient's OS. This model gives accuracies of 0.31 and 0.27, with MSEs of 198540.65 and 208540.6568, on the training set and test set, respectively. The OS prediction task has scope for improvement, which will be addressed in future by statistically analysing the features extracted from the region of interest. Furthermore, the relation of these features to OS will be investigated, which may lead to improved prognosis of survival.