
1 Introduction

Gliomas are the most frequent primary brain tumors and have the highest mortality rate [1, 3, 4, 18]. They can be categorized into low-grade gliomas (LGG) and high-grade gliomas (HGG). HGG is the more aggressive form of the disease, with a median survival of two years or less. The slower-growing low-grade variants, such as low-grade astrocytomas and oligodendrogliomas, usually allow a life expectancy of several years [15]. MRI is the basic modality commonly used in brain structure analysis: it provides images with high soft-tissue contrast and high spatial resolution, and is useful for evaluating unknown health risks [2, 6, 15].

In recent years, many automatic approaches have been proposed for accurate brain tumor segmentation, and these works can be roughly categorized into machine learning methods, deep learning methods, and methods combining both. Machine learning methods are based on probabilistic models that can learn brain tumor patterns which do not follow a specific model, such as the Conditional Random Field (CRF), Random Forest (RF), and Support Vector Machine (SVM). Deep learning methods learn feature representations in a data-driven way [7]; examples include the convolutional neural network (CNN), the parallelized long short-term memory network (LSTM), and the fully convolutional network (FCN). In addition, some authors combined a probabilistic model (CRF, RF, or SVM) with a deep learning method to develop novel hybrid methods [5, 10, 12].

The fully convolutional network (FCN), a variant of the CNN, gained great interest in the segmentation competition of PASCAL VOC 2012. Deep convolutional models with substantially enlarged depth advanced the state-of-the-art performance on segmentation tasks: they alleviate the optimization degradation issue by approximating the objective function with residual functions instead of simply stacking layers, where residual blocks are skip connections between layers of the network. FCN-based approaches are the pioneering work of deep learning in medical image segmentation, although their segmentation results are not yet good enough.

End-to-end methods, which combine encoding and decoding layers, achieved success in pixel-level image segmentation. Compared to a primary convolutional neural network, an end-to-end method avoids many duplicate computations. The U-Net architecture, based on fully convolutional layers, has been successfully applied to medical image segmentation [9, 17, 19, 21]. This model is a popular and efficient network for brain tumor segmentation. Naser and Deen [16] proposed a new approach to glioma segmentation: they combined a U-Net model for convolutional segmentation, a pre-trained VGG16 model for transfer learning, and a fully connected classifier for tumor grading. For clinical usage, the challenge is how to pursue the best segmentation accuracy within limited computational budgets. Li et al. [13] proposed a multi-modality aggregation network (MMAN), which is able to extract multi-scale features of brain tissues and harness complementary information from multi-modality MRI images for fast and accurate segmentation. They applied dilated convolutional layers with different kernel sizes to obtain large-scale features without adding too many parameters and computational costs. Ding et al. [8] developed a novel multi-path adaptive fusion network, applying the skip-connection idea of ResNets to the dense block so as to effectively preserve and propagate more low-level visual features. Liu et al. [14] investigated the performance of U-Net models on brain tumor, stroke, white matter hyperintensities (WMHs), eye, cardiac, liver, musculoskeletal, skin cancer, and neuronal pathology data. They reported the different extended U-shaped networks and analyzed their pros and cons.

In this work, inspired by the groundbreaking U-Net proposal, we focus on building a U-Net architecture with residual convolutional blocks and evaluate the performance of different residual blocks. In addition, keeping gradients independent and identically distributed is a key element of our design. We aim to achieve better segmentation scores in the BraTS 2020 challenge.

2 Method

2.1 Pre-processing

In this work, we applied cropping and random-slicing methods. As for cropping, due to GPU memory limitations, we cropped the zero-pixel regions of the MRI images before training; the zero-pixel area at the image boundary does not help to improve segmentation accuracy. The original MRI images have an array size of \(155\,\times \,240\,\times \,240\). Our model employs the max-pooling function four times, so every spatial dimension must be divisible by 16 (\(2^4\)). Considering the factors above, we set the size of the 3D MRI images to \(144\,\times \,192\,\times \,192\); for multimodal 3D images, it is \(4\,\times \,144\,\times \,192\,\times \,192\).
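
The following is a minimal sketch of this cropping step. It assumes a simple center crop (we remove zero-pixel border regions, so the actual offsets depend on the brain's position in the scan); the function and variable names are illustrative:

```python
import numpy as np

def crop_volume(volume, target=(144, 192, 192)):
    """Crop the spatial dimensions of an MRI volume from 155 x 240 x 240 to
    144 x 192 x 192, making each dimension divisible by 16 (2**4) for the
    four max-pooling stages. A center crop is used here as an assumption."""
    d, h, w = volume.shape[-3:]
    td, th, tw = target
    sd, sh, sw = (d - td) // 2, (h - th) // 2, (w - tw) // 2
    return volume[..., sd:sd + td, sh:sh + th, sw:sw + tw]

# A four-modality scan of shape (4, 155, 240, 240) becomes (4, 144, 192, 192).
```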

For each MRI image, we crop nine slices randomly; this step effectively prevents overfitting during the training stage. We randomly take 9 consecutive 3D sequences of length 16 along the first dimension of the MRI image for training. After random cropping, the array size of the MRI image is \(9\,\times \,16\,\times \,192\,\times \,192\); for multimodal 3D images, it is \(4\,\times \,9\,\times \,16\,\times \,192\,\times \,192\). We repeat this operation for each epoch during training, so the sequences fed to the neural network generally differ per image and per epoch. This randomization gives the model strong generalization power, especially on limited training data, and we ensure that all pixels of the brain are seen during training.
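
A sketch of this random-slicing step, with start positions redrawn every epoch; the axis ordering of the output array is our assumption:

```python
import numpy as np

def random_slices(volume, n_slices=9, length=16):
    """Draw 9 random consecutive sub-sequences of length 16 along the first
    spatial dimension. For a (4, 144, 192, 192) multimodal input, the result
    has shape (9, 4, 16, 192, 192); starts are redrawn each epoch."""
    depth = volume.shape[-3]
    starts = np.random.randint(0, depth - length + 1, size=n_slices)
    return np.stack([volume[..., s:s + length, :, :] for s in starts])
```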

In addition, we employed z-score normalization of the medical images [11]. It is accomplished by linearly transforming the original intensities using their mean and standard deviation, defined as:

$$\begin{aligned} z=\frac{x-\mu }{\sigma } \end{aligned}$$
(1)

where \(\mu \) is the pixel-level mean of the MRI sequence and \(\sigma \) is the pixel-level standard deviation of the MRI sequence.
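
A minimal sketch of Eq. (1) applied to one MRI sequence; the small epsilon guard against zero variance is our addition:

```python
import numpy as np

def z_score(volume, eps=1e-8):
    """Z-score normalization (Eq. 1): subtract the pixel-level mean of the
    sequence and divide by its pixel-level standard deviation."""
    mu = volume.mean()
    sigma = volume.std()
    return (volume - mu) / (sigma + eps)
```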

2.2 Architecture

We build our deep learning architecture as shown in Fig. 1. It is an end-to-end, pixel-to-pixel deep learning method. Each layer in this model holds a five-dimensional array of size bs \(\times \) c \(\times \) h \(\times \) w \(\times \) d, where bs is the batch-size dimension, c is the channel dimension holding the multimodal (Flair, T1, T1c and T2) sequences, and h, w, d are the spatial dimensions. Each convolutional layer and de-convolutional layer contains batch normalization and an activation function.

Fig. 1. Our proposed model using res-block-1 in the encoding stage. Res-block-1 is shown in Fig. 2.

In building this architecture, we refined three primary residual blocks, named res-block-1, res-block-2, and res-block-3, and employed them in the encoding stage. A residual block is a kind of skip-connection architecture that avoids vanishing gradients as network depth increases, because the gradient is effectively transferred to the shallow layers during training. We apply the randomized leaky rectified linear unit (RReLU) as the activation function of the neural network.

Fig. 2. Res-block-1.

We designed the res-block-1 block using a dual-path convolution, an addition operation, and an RReLU function. Referring to Fig. 2, we employed two convolutional layers with batch normalization in the main path, where RReLU is adopted after the first convolutional layer. In the skip path, we employed a convolutional layer and a batch normalization layer. The two paths are combined by weighted addition, after which the output features are activated by an RReLU function.
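
A PyTorch sketch of our reading of res-block-1; the \(1\,\times \,1\,\times \,1\) kernel in the skip path and the plain (unit-weighted) addition are assumptions, as the exact kernel size and weighting are not specified here:

```python
import torch.nn as nn

class ResBlock1(nn.Module):
    """Dual-path residual block: Conv-BN-RReLU-Conv-BN in the main path,
    Conv-BN in the skip path, addition, then a final RReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.main = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.RReLU(1 / 8, 1 / 3),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm3d(out_ch),
        )
        self.skip = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=1),  # assumed 1x1x1 projection
            nn.BatchNorm3d(out_ch),
        )
        self.act = nn.RReLU(1 / 8, 1 / 3)

    def forward(self, x):
        return self.act(self.main(x) + self.skip(x))
```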

Fig. 3. Res-block-2.

Referring to Fig. 3, we designed the res-block-2 block using the same dual-path architecture as res-block-1. In res-block-2, we moved the RReLU function to the first position, so that the output features computed along the dual paths are added to each other and the fused features are passed to the next neural unit.
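
Res-block-2 differs only in where the RReLU sits; a sketch under the same assumptions, reusing the layers of the ResBlock1 class defined above:

```python
class ResBlock2(ResBlock1):
    """Same dual-path layout as res-block-1, but the RReLU is applied first
    and the fused (added) features pass on without a trailing activation."""
    def forward(self, x):
        x = self.act(x)  # activation moved to the first position
        return self.main(x) + self.skip(x)
```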

Fig. 4. Res-block-3.

Referring to Fig. 4, res-block-3 has a single convolutional main path with a primary weighted skip connection, and its last convolutional layer is connected after the weighted addition operation.
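
A sketch of our reading of res-block-3, with the final convolution applied after the addition; the layer placement is an interpretation of Fig. 4:

```python
class ResBlock3(nn.Module):
    """Single-convolution main path plus skip path; the last convolutional
    layer is applied after the weighted addition (our reading of Fig. 4)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.RReLU(1 / 8, 1 / 3),
        )
        self.skip = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=1),
            nn.BatchNorm3d(out_ch),
        )
        self.conv2 = nn.Sequential(
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.RReLU(1 / 8, 1 / 3),
        )

    def forward(self, x):
        return self.conv2(self.conv1(x) + self.skip(x))
```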

In this model, we apply RReLU as the activation function of the neural network [20], which is defined as:

$$\begin{aligned} RReLU(x) = \left\{ \begin{array}{ll} x & x > 0 \\ ax & \text {otherwise} \end{array}\right. \end{aligned}$$
(2)

where a is randomly sampled from the uniform distribution U(L, R), with L the lower bound and R the upper bound of the distribution. We set L = 1/8 and R = 1/3.
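
In PyTorch this corresponds directly to the built-in module; lower = 1/8 and upper = 1/3 happen to be PyTorch's defaults:

```python
import torch.nn as nn

rrelu = nn.RReLU(lower=1 / 8, upper=1 / 3)  # equivalent to nn.RReLU()
```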

As for the loss function, which computes the training loss used in back-propagation, we used the categorical cross-entropy. The loss function can be described as:

$$\begin{aligned} loss(x, class)&= -\log \left( \frac{\exp (x(class))}{\sum _j \exp (x(j))} \right) \nonumber \\&= -x(class) + \log \left( \sum _j \exp (x(j)) \right) \end{aligned}$$
(3)

which combines LogSoftmax and NLLLoss in one single class. The NLLLoss term (\(-x(class)\)), the negative log-likelihood loss, is useful for training a classification problem with multiple classes, and log-probabilities are easily obtained in a neural network by adding a LogSoftmax layer as the last layer. The categorical cross-entropy function is thus well suited to multi-class classification problems.
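
A brief sketch of Eq. (3) in PyTorch, whose CrossEntropyLoss combines LogSoftmax and NLLLoss; the five-class voxel-wise setup follows Sect. 3.2, while the batch and spatial sizes here are illustrative:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                # LogSoftmax + NLLLoss
logits = torch.randn(2, 5, 16, 192, 192)         # (bs, classes, d, h, w) raw scores
target = torch.randint(0, 5, (2, 16, 192, 192))  # one class index per voxel
loss = criterion(logits, target)
```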

In the encoding stage, we employ many convolutional layers to extract features from the MRI images. We set the parameters of the convolutional function to a kernel size of 3, a stride of 1, and padding of 1. The channels in the encoding stage are 32, 64, 128, 256, and 512, respectively. We use the max-pooling function for down-sampling so that the model obtains deep features and learns its segmentation ability from them.

In the decoding stage, we employ transposed convolutional layers for up-sampling, which makes the output 3D images the same size as the input 3D images. The transposed convolution is effective and very easy to implement.
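
A sketch of one matching down-/up-sampling pair, reusing the ResBlock1 module defined earlier; the 32/64 channel pair is one step of the 32-64-128-256-512 progression, and the kernel and stride of the transposed convolution are assumptions:

```python
import torch.nn as nn

# One encoder step: residual feature extraction, then max-pooling halves
# each spatial dimension.
encoder_step = nn.Sequential(ResBlock1(32, 64), nn.MaxPool3d(kernel_size=2))

# The matching decoder step: a transposed convolution doubles each spatial
# dimension, restoring the resolution lost by the pooling above.
decoder_step = nn.ConvTranspose3d(64, 32, kernel_size=2, stride=2)
```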

3 Experiments

Our method was evaluated on the BraTS 2020 dataset.

3.1 Dataset

The BraTS 2020 dataset contains four modalities for every patient: Flair, T1, T1c, and T2. We trained our model on the BraTS 2020 training set, which contains 369 MRI scans including high-grade and low-grade brain tumors. The validation set contains 125 glioblastoma scans and the testing set contains 166 glioblastoma scans. The BraTS challenge has always focused on the evaluation of state-of-the-art methods for brain tumor segmentation in multimodal magnetic resonance imaging scans. Metrics for this challenge are computed through the online evaluation platform, as the ground-truth labels are not publicly available. Every glioma needs to be segmented pixel by pixel into four meaningful regions: the enhancing tumor (ET), the tumor core (TC), the whole tumor (WT), and normal tissue.

3.2 Setup

Some of the hyper-parameters of the architecture are shown in Table 1. We approached brain tumor segmentation as a multi-class classification problem, segmenting normal tissue, necrosis, edema, non-enhancing tumor, and enhancing tumor from MR images. However, the given MR images are not suitable for feeding into the neural network directly, as the redundant data would cost large amounts of GPU memory, so we cropped the effective parts of the images in three dimensions; the same process was applied to the label set. Additionally, in brain tumor segmentation, the numbers of necrosis and enhancing tumor samples in the training set are small. We normalized all images at the pixel level using z-score (zero-mean) normalization, which makes the input data follow a normal distribution and speeds up training. The learning rate was decreased linearly each epoch during the training stage. Our model was developed using PyTorch, and we trained it for 40 h on four Nvidia RTX 2080 Ti GPUs.
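
A sketch of the linear learning-rate decay; the base rate, epoch count, optimizer choice, and stand-in model are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

model = nn.Conv3d(4, 5, kernel_size=3, padding=1)  # stand-in for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
epochs = 200
# Scale the learning rate linearly from 1.0 toward 0 over the run.
scheduler = LambdaLR(optimizer, lr_lambda=lambda e: 1.0 - e / epochs)

for epoch in range(epochs):
    # ... one training epoch ...
    scheduler.step()
```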

3.3 Evaluation

The evaluation of brain tumor segmentation consists of three measures: the Dice similarity coefficient (DSC), sensitivity, and specificity. The DSC measures the spatial overlap between the automatic segmentation and the label. It is defined as:

$$\begin{aligned} DSC=\frac{2TP}{FP+2TP+FN} \end{aligned}$$
(4)

where FP, FN, and TP are the false positive, false negative, and true positive detections, respectively. Sensitivity, also called the true positive rate or probability of detection, measures the proportion of positives that are correctly identified as such:

$$\begin{aligned} Sensitivity=\frac{TP}{TP+FN} \end{aligned}$$
(5)

A larger sensitivity denotes a higher proximity between the abnormal tissue in the label and in the predicted segmentation. Finally, specificity, also called the true negative rate, measures the proportion of negatives that are correctly identified. It is defined as:

$$\begin{aligned} Specificity=\frac{TN}{TN+FP} \end{aligned}$$
(6)

where TN is the number of true negative detections. A larger specificity denotes a higher proximity between the normal tissue in the label and in the predicted segmentation.
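
A minimal sketch computing Eqs. (4)-(6) for one binary tumor region from boolean prediction and label masks:

```python
import numpy as np

def segmentation_metrics(pred, label):
    """Voxel-wise DSC, sensitivity, and specificity for one region (e.g. WT);
    pred and label are boolean arrays of the same shape."""
    tp = np.sum(pred & label)
    fp = np.sum(pred & ~label)
    fn = np.sum(~pred & label)
    tn = np.sum(~pred & ~label)
    dsc = 2 * tp / (fp + 2 * tp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return dsc, sensitivity, specificity
```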

Table 1. Hyper-parameters of our proposed model. The weight and bias initialization settings apply to each convolutional layer. The randomized leaky ReLU uses its default parameter setting.
Fig. 5. Our predicted MRI images using the res-block-1 block.

3.4 Result

We evaluated our proposed model on the validation set with three different residual blocks and compared it with other state-of-the-art methods. Lastly, we report the segmentation results on the BraTS 2020 testing dataset. The performance of our model is presented in Fig. 5.

Referring to Table 2, Res-Block-1 achieves better segmentation performance than the others, although Res-Block-2 gains a slight advantage in the Dice metric of the WT region. In addition, all the residual blocks score well on segmentation of the WT region. However, there are diverse ways to design residual blocks in U-shaped models, and further experimental investigation is needed to estimate the performance of these decoding methods in medical image segmentation.

Table 2. Segmentation results of Res-Block-1, Res-Block-2, and Res-Block-3 on the BraTS 2020 validation dataset
Table 3. Segmentation results of our proposed model, DeepLab, and the U-Net model on the BraTS 2020 validation dataset

We compared our model with the DeepLab and U-Net models, applying the same pre-processing methods, in a quantitative study focused on the performance of the deep neural models. The results are reported in Table 3. The most difficult tasks in brain tumor segmentation are marking the tumor core region for LGG and the enhancing tissues for HGG. Comparing with these two classical end-to-end models in Table 3, our proposed model outperforms them in the Dice metrics.

Table 4. Segmentation results of our proposed model on the BraTS 2020 testing dataset

The BraTS challenge testing results are reported in Table 4. In segmenting the tumor core and the enhancing tumor, our proposed method performs better on the testing set.

4 Conclusion

In this work, we proposed a U-shaped architecture using residual blocks and evaluated the performance of different residual blocks within it. The residual block is an effective building block for deep feature-extraction networks. In brain tumor segmentation, there are many 2D and 3D deep-learning models; as computer hardware develops, the architectures become more and more complex and the segmentation results become more and more precise. Our approach is a powerful tool for studies of 3D medical images of brain tumors, and our proposed model is an effective deep-learning model, especially for 3D brain tumor segmentation.