
1 Introduction

Magnetic resonance imaging (MRI) is a medical modality used to guide diagnosis and treatment planning. To do so, the images or slices must be segmented in order to detect and characterize lesions, as well as to visualize and quantify the severity of the pathology. Based on their experience and knowledge, medical specialists interpret this type of image subjectively; in other words, a manual segmentation is performed. This task is long, painstaking and subject to human variability.

Brain tumor segmentation is one of the crucial steps for surgery planning and treatment evaluation. Despite the great amount of effort put into this challenging problem over the past two decades, brain tumor segmentation remains one of the most difficult tasks in medical image analysis. This is due both to the intrinsically heterogeneous nature of the tumor tissue and to the extrinsic problem of unsatisfactory image quality in clinical MRI scans. For example, the tumor mass of a glioma patient, the most common brain tumor, often consists of peritumoral edema, a necrotic core, and enhancing and non-enhancing tumor core. In addition to this complicated tissue pattern, the MRI images can be further corrupted by a slowly varying bias field, motion artifacts, etc. Brain MRIs in most cases do not have well-defined boundaries between the elements that compose them; in addition, they include non-soft tissues as well as artifacts that can hinder segmentation.

Despite all these inherent conditions, numerous automatic algorithms and techniques have been introduced in the state of the art. Among the approaches designed exclusively to segment brain tissues, those based on the fuzzy clustering paradigm and its variants stand out [1,2,3,4, 28]. With the same purpose, hybrid methods combining different machine learning paradigms and optimization algorithms have also been presented, e.g. [5,6,7]. On the other hand, methods designed to segment brain tumors or other abnormalities have also been introduced, among which one can refer to [8,9,10,11,12,13,14, 29]. For this task, the proposals based on Deep Learning are the most novel in the state of the art and achieve the best results. The majority of these proposals yield high performance in image processing tasks, specifically on brain magnetic resonance images. Nevertheless, a closer analysis shows that most of these methods suffer from one or more limitations, such as: the need for training, special handcrafted (local or global) features, sensitivity to initialization, many parameters that require tuning, multiple processing stages, or being designed to segment only T1-weighted brain MRI images.

In this research paper, we concentrate on the segmentation of brain tissues and tumors in magnetic resonance images. For that purpose we introduce a system consisting of a cascade of U-Net models enhanced with an information fusion approach; this proposal can be considered an extension of the one presented in [29], as well as a formalization of its theoretical concepts.
The introduced proposal has the following special features in contrast with those mentioned above: 1) it is able to segment MRIs with different relaxation times, such as T1, T2, T1ce and Flair; 2) it is easily adaptable to segment brain tumors; 3) it does not require the initialization of any parameter, such as the number of regions into which the slice will be segmented; 4) it does not require any preprocessing stage to improve the segmentation quality of each slice; and 5) it does not need multiple processing stages to increase its performance. The rest of this paper is organized as follows. A mathematical formulation of U-Net is introduced in detail in Sect. 2. The parallel and cascade architectures of Convolutional Neural Networks are introduced in Sect. 3. Experimental results and a comparative analysis with other current methods in the literature are presented in Sect. 4. In the final section the conclusions are drawn and future work is outlined.

2 U-Net: Convolutional Networks for Biomedical Image Segmentation

Deep neural networks have shown remarkable success in domains such as image classification [6, 15], object detection and localization [17,18,19], language processing [20, 21], speech recognition [22] and medical image processing [16, 23, 24], among many others. In the deep learning paradigm, Convolutional Neural Networks (CNNs) stand out as the major architecture; they deploy convolution operations on hidden layers for weight sharing and parameter reduction, in order to extract local information from grid-like input data. In simple words, they process the information by means of hierarchical layers in order to learn representations and features of the data at increasing levels of complexity.

For biomedical image segmentation, U-Net is one of the most important architectures introduced in the state of the art [25]. It is a fully convolutional neural network model originally designed to perform a binary segmentation, i.e., to separate the main object from the background of the image. The network is divided into two parts. In the first part (Contracting Path or Encoder), the images are down-sampled by means of \(3\times 3\) convolutions, each followed by a rectified linear unit (ReLU), and \(2\times 2\) max-pooling layers. The second part (Expanding Path or Decoder) consists of \(2\times 2\) up-sampling (deconvolution) layers followed by convolutions; the final output corresponds to the specific class of objects to be segmented. The processing comprises nineteen convolutions (\(\mathcal {C}\)), four subsamplings (\(\mathcal {S}\)), four upsamplings (\(\mathcal {U}\)) and four mergings (\(\mathcal {M}\)). The variant of the model that we propose to segment brain tumors from BraTS 2017 is shown in Fig. 1.

Fig. 1. U-Net model.

In the Contracting Path: Layer 1 is a convolution layer which takes an input image X of size \(240\times 240\) and convolves it with 32 filters of size \(3\times 3\), producing 32 feature maps of \(240\times 240\). To make the output of this linear operation nonlinear, it is passed through a Rectified Linear Unit (ReLU) activation function \(\sigma (x)=\max (0,x)\), in this way:

$$\begin{aligned} \mathcal {C}1_{i,j}^{k}=\sigma \left( \sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k}*X_{i+m,j+n}+b^{k} \right) \end{aligned}$$
(1)

where \(\mathcal {C}1^{k}\) stands for the k-th output feature map of the \(\mathcal {C}1\) layer, (m, n) are the indices of the k-th kernel (filter), and (i, j) are the indices of the output. \(\mathcal {C}1^{k}\) is convolved with 32 kernels of size \(3\times 3\); in the same way, the output is rectified:

$$\begin{aligned} \mathcal {C}2_{i,j}^{k}=\sigma \left( \sum _{d=0}^{31}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}1_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(2)
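To make the computation in (1)–(2) concrete, the following NumPy sketch applies one \(3\times 3\) kernel to a single-channel input and rectifies the result. The function name, the single-channel restriction and the "valid" boundary handling are our simplifications; the actual layers use 32 kernels over all input channels with size-preserving padding.

```python
import numpy as np

def conv2d_relu(X, w, b):
    # Eq. (1) for one kernel and one input channel: slide the 3x3
    # kernel w over X, add the bias b, then rectify with max(0, x).
    H, W = X.shape
    out = np.zeros((H - 2, W - 2))          # 'valid' region only
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(w * X[i:i + 3, j:j + 3]) + b
    return np.maximum(out, 0.0)             # ReLU
```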

In Layer 2, the output of the convolution layer \(\mathcal {C}2^{k}\) is fed to the max-pooling layer \(\mathcal {S}1^{k}=\mathrm {maxPool}\left( \mathcal {C}2^{k} \right) \). For each feature map in \(\mathcal {C}2^{k}\), max-pooling performs the following operation:

$$\begin{aligned} \mathcal {S}1_{i,j}^{k}=\max \begin{pmatrix} \mathcal {C}2_{2i,2j}^{k} & \mathcal {C}2_{2i+1,2j}^{k}\\ \mathcal {C}2_{2i,2j+1}^{k} & \mathcal {C}2_{2i+1,2j+1}^{k} \end{pmatrix} \end{aligned}$$
(3)

where (i, j) are the indices of the k-th output feature map, and k is the feature map index. The output consists of 32 feature maps of size \(120\times 120\), i.e., one-half the size of the input image (X/2). \(\mathcal {S}1^{k}\) is convolved with 64 kernels of size \(3\times 3\), and the result is rectified:

$$\begin{aligned} \mathcal {C}3_{i,j}^{k}=\sigma \left( \sum _{d=0}^{63}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {S}1_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(4)

\(\mathcal {C}3^{k}\) is convolved with 64 filters of size \(3\times 3\), and the result is then rectified:

$$\begin{aligned} \mathcal {C}4_{i,j}^{k}=\sigma \left( \sum _{d=0}^{63}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}3_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(5)
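Before moving on, a minimal sketch of the max-pooling operation of (3), applied per feature map; the reshape trick below assumes even spatial dimensions, which holds for the \(240\times 240\) inputs of this model.

```python
import numpy as np

def max_pool2x2(C):
    # Eq. (3): take the maximum of each non-overlapping 2x2 block,
    # halving both spatial dimensions (e.g. 240x240 -> 120x120).
    H, W = C.shape
    return C.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))
```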

Layers 3, 4 and 5 follow a process similar to Layer 2: a max-pooling filtering followed by two convolutional stages, each with its respective linear rectification. In brief, for Layer 3:

$$\begin{aligned} \mathcal {S}2_{i,j}^{k}=\max \begin{pmatrix} \mathcal {C}4_{2i,2j}^{k} & \mathcal {C}4_{2i+1,2j}^{k}\\ \mathcal {C}4_{2i,2j+1}^{k} & \mathcal {C}4_{2i+1,2j+1}^{k} \end{pmatrix} \end{aligned}$$
(6)

The result, of size X/4, is convolved with 128 kernels of size \(3\times 3\) in order to obtain 128 feature maps:

$$\begin{aligned} \mathcal {C}5_{i,j}^{k}=\sigma \left( \sum _{d=0}^{127}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {S}2_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(7)
$$\begin{aligned} \mathcal {C}6_{i,j}^{k}=\sigma \left( \sum _{d=0}^{127}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}5_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(8)

For Layer 4, the max-pooling filtering on \(\mathcal {C}6^{k}\) is calculated as:

$$\begin{aligned} \mathcal {S}3_{i,j}^{k}=\max \begin{pmatrix} \mathcal {C}6_{2i,2j}^{k} & \mathcal {C}6_{2i+1,2j}^{k}\\ \mathcal {C}6_{2i,2j+1}^{k} & \mathcal {C}6_{2i+1,2j+1}^{k} \end{pmatrix} \end{aligned}$$
(9)

The result, of size X/8, is convolved with 256 kernels of size \(3\times 3\) to obtain the same number of rectified feature maps:

$$\begin{aligned} \mathcal {C}7_{i,j}^{k}=\sigma \left( \sum _{d=0}^{255}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {S}3_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(10)
$$\begin{aligned} \mathcal {C}8_{i,j}^{k}=\sigma \left( \sum _{d=0}^{255}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}7_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(11)

The last layer in the contracting path takes \(\mathcal {C}8^{k}\) as input and performs a down-sampling to reduce the feature maps to a \(15\times 15\) size (i.e. X/16); the filter is given as:

$$\begin{aligned} \mathcal {S}4_{i,j}^{k}=\max \begin{pmatrix} \mathcal {C}8_{2i,2j}^{k} & \mathcal {C}8_{2i+1,2j}^{k}\\ \mathcal {C}8_{2i,2j+1}^{k} & \mathcal {C}8_{2i+1,2j+1}^{k} \end{pmatrix} \end{aligned}$$
(12)

A pair of \(3\times 3\) kernel banks is considered to extract the deepest features: \(\mathcal {S}4^{k}\) is convolved with 512 kernels, and the result is once again convolved with another 512 kernels; in both cases a linear rectification is applied to suppress negative values. These operations are given as:

$$\begin{aligned} \mathcal {C}9_{i,j}^{k}=\sigma \left( \sum _{d=0}^{511}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {S}4_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(13)
$$\begin{aligned} \mathcal {C}10_{i,j}^{k}=\sigma \left( \sum _{d=0}^{511}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}9_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(14)

In the Expanding Path: Layer 6 develops an un-pooling process by means of nearest-neighbor interpolation with an up-sampling factor of \(\uparrow _{2}\) for rows and columns. In short, it repeats each row and column of the k-th feature map twice:

$$\begin{aligned} \begin{bmatrix} \mathcal {U}1_{2i,2j}^{k} & \mathcal {U}1_{2i,2j+1}^{k}\\ \mathcal {U}1_{2i+1,2j}^{k} & \mathcal {U}1_{2i+1,2j+1}^{k} \end{bmatrix} = \begin{bmatrix} \mathcal {C}10_{i,j}^{k} & \mathcal {C}10_{i,j}^{k}\\ \mathcal {C}10_{i,j}^{k} & \mathcal {C}10_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(15)

\(\mathcal {U}1^{k}\) increases the size of the k-th feature map to X/8. The outputs of Layers 6 and 4 are then merged by a concatenation process, i.e.:

$$\begin{aligned} \mathcal {M}1_{i,j}^{k}= \begin{bmatrix} \mathcal {U}1_{i,j}^{k}; \;\mathcal {C}8_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(16)
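A sketch of the un-pooling of (15) and the concatenation of (16) on NumPy arrays; the (height, width, channels) layout is an assumption of ours.

```python
import numpy as np

def upsample2x(C):
    # Eq. (15): nearest-neighbour un-pooling, repeating every row and
    # column of the feature maps twice.
    return np.repeat(np.repeat(C, 2, axis=0), 2, axis=1)

def merge(U, C):
    # Eq. (16): skip connection, concatenating the up-sampled decoder
    # maps U with the encoder maps C along the channel axis.
    return np.concatenate([U, C], axis=-1)
```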

\(\mathcal {M}1^{k}\) consists of 768 feature maps. In Layer 7, \(\mathcal {M}1^{k}\) is first convolved with 256 kernels of size \(3\times 3\), and each result is rectified to suppress negative values:

$$\begin{aligned} \mathcal {C}11_{i,j}^{k}=\sigma \left( \sum _{d=0}^{767}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {M}1_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(17)
$$\begin{aligned} \mathcal {C}12_{i,j}^{k}=\sigma \left( \sum _{d=0}^{255}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}11_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(18)

Afterwards, the output of (18) is up-sampled by a factor of \(\uparrow _{2}\) in order to increase the feature maps to a size of X/4:

$$\begin{aligned} \begin{bmatrix} \mathcal {U}2_{2i,2j}^{k} & \mathcal {U}2_{2i,2j+1}^{k}\\ \mathcal {U}2_{2i+1,2j}^{k} & \mathcal {U}2_{2i+1,2j+1}^{k} \end{bmatrix} = \begin{bmatrix} \mathcal {C}12_{i,j}^{k} & \mathcal {C}12_{i,j}^{k}\\ \mathcal {C}12_{i,j}^{k} & \mathcal {C}12_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(19)

The outputs of Layers 7 and 3 are concatenated as:

$$\begin{aligned} \mathcal {M}2_{i,j}^{k}= \begin{bmatrix} \mathcal {U}2_{i,j}^{k}; \;\mathcal {C}6_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(20)

\(\mathcal {M}2^{k}\) consists of 384 feature maps. Layers 8 and 9 follow a processing similar to Layer 7, but with a decreasing number of feature maps. In this regard, for Layer 8:

$$\begin{aligned} \mathcal {C}13_{i,j}^{k}=\sigma \left( \sum _{d=0}^{383}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {M}2_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(21)
$$\begin{aligned} \mathcal {C}14_{i,j}^{k}=\sigma \left( \sum _{d=0}^{127}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}13_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(22)
$$\begin{aligned} \begin{bmatrix} \mathcal {U}3_{2i,2j}^{k} & \mathcal {U}3_{2i,2j+1}^{k}\\ \mathcal {U}3_{2i+1,2j}^{k} & \mathcal {U}3_{2i+1,2j+1}^{k} \end{bmatrix} = \begin{bmatrix} \mathcal {C}14_{i,j}^{k} & \mathcal {C}14_{i,j}^{k}\\ \mathcal {C}14_{i,j}^{k} & \mathcal {C}14_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(23)

\(\mathcal {U}3^{k}\) increases the size of all feature maps to X/2. \(\mathcal {U}3^{k}\) is concatenated with \(\mathcal {C}4^{k}\); the output comprises 192 feature maps:

$$\begin{aligned} \mathcal {M}3_{i,j}^{k}= \begin{bmatrix} \mathcal {U}3_{i,j}^{k}; \;\mathcal {C}4_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(24)

For Layer 9:

$$\begin{aligned} \mathcal {C}15_{i,j}^{k}=\sigma \left( \sum _{d=0}^{191}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {M}3_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(25)
$$\begin{aligned} \mathcal {C}16_{i,j}^{k}=\sigma \left( \sum _{d=0}^{63}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}15_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(26)
$$\begin{aligned} \begin{bmatrix} \mathcal {U}4_{2i,2j}^{k} & \mathcal {U}4_{2i,2j+1}^{k}\\ \mathcal {U}4_{2i+1,2j}^{k} & \mathcal {U}4_{2i+1,2j+1}^{k} \end{bmatrix} = \begin{bmatrix} \mathcal {C}16_{i,j}^{k} & \mathcal {C}16_{i,j}^{k}\\ \mathcal {C}16_{i,j}^{k} & \mathcal {C}16_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(27)

The last up-sampling process \(\mathcal {U}4^{k}\) increases the size of all feature maps to X. \(\mathcal {U}4^{k}\) is concatenated with \(\mathcal {C}2^{k}\):

$$\begin{aligned} \mathcal {M}4_{i,j}^{k}= \begin{bmatrix} \mathcal {U}4_{i,j}^{k}; \;\mathcal {C}2_{i,j}^{k} \end{bmatrix} \end{aligned}$$
(28)

\(\mathcal {M}4^{k}\) comprises 96 feature maps; these are convolved with 32 kernels of size \(3\times 3\):

$$\begin{aligned} \mathcal {C}17_{i,j}^{k}=\sigma \left( \sum _{d=0}^{95}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {M}4_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(29)
$$\begin{aligned} \mathcal {C}18_{i,j}^{k}=\sigma \left( \sum _{d=0}^{31}\sum _{m=0}^{2}\sum _{n=0}^{2} w_{m,n}^{k,d}*\mathcal {C}17_{i+m,j+n}^{d}+b^{k} \right) \end{aligned}$$
(30)

The last convolution layer convolves \(\mathcal {C}18^{k}\) with a single kernel of size \(1\times 1\), and the ReLU activation function is replaced by a sigmoid, \(\sigma (x)=\dfrac{1}{1+e^{-x}}\); this function highlights the region of interest (the region to be segmented) in the segmented image \(\check{X}\), since it squashes the output into the range [0, 1]. In this way:

$$\begin{aligned} \check{X}_{i,j}=\sigma \left( \sum _{d=0}^{31} w^{d}*\mathcal {C}18_{i,j}^{d}+b \right) \end{aligned}$$
(31)

The segmented image \(\check{X}\) has a size of \(240\times 240\), the same as the input image X. Expressions (1) to (31) let us depict how CNNs follow a hierarchical processing to extract specific features in order to detect and segment specific regions; in this case, brain tumors.
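Expressions (1)–(31) translate almost line by line into a deep learning framework. The following Keras sketch reproduces the filter counts of Fig. 1 (32–512 in the encoder, a \(1\times 1\) sigmoid output); the padding mode, initializers and single-channel input are reasonable assumptions rather than details fixed by the text.

```python
from tensorflow.keras import layers, Model

def unet(input_size=(240, 240, 1)):
    """Sketch of the U-Net of Fig. 1 and Eqs. (1)-(31)."""
    inputs = layers.Input(input_size)
    skips, x = [], inputs
    for f in (32, 64, 128, 256):                     # contracting path
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        skips.append(x)                              # C2, C4, C6, C8
        x = layers.MaxPooling2D(2)(x)                # S1..S4
    x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)  # C9
    x = layers.Conv2D(512, 3, padding="same", activation="relu")(x)  # C10
    for f, skip in zip((256, 128, 64, 32), reversed(skips)):  # expanding
        x = layers.UpSampling2D(2, interpolation="nearest")(x)  # U1..U4
        x = layers.Concatenate()([x, skip])          # M1..M4
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
        x = layers.Conv2D(f, 3, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # Eq. (31)
    return Model(inputs, outputs)
```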

3 Deep Learning Systems for Brain Image Segmentation

3.1 Proposed Parallel System for Brain Tissues Segmentation

Conventionally, it may be assumed that different tissues can be found in an MRI slice: (1) White Matter (WM), (2) Gray Matter (GM), (3) Cerebrospinal Fluid (CSF) and (4) Abnormalities (ABN). Nevertheless, it should be clarified that, depending on the slice, not all regions may be present, and the magnitude of their presence will vary. In order to recognize and segment the soft tissues automatically, we suggest the system depicted in Fig. 2; it basically comprises four U-Net models, each trained to work on a specific soft tissue. After each tissue segmentation, it is necessary to build the joint-area representation by means of a method that appropriately determines the target and background regions. The following rules [26] make it possible to fuse the segmented tissues and the background, as well as the detected regions (a code sketch of these rules follows the list):

1. If \(R^{1}\) and \(R^{2}\) do not intersect, two regions are formed in the joint-area representation, that is, \(R_{1}^{3}=R^{1}\) and \(R_{2}^{3}=R^{2}\).

2. If \(R^{1}\) and \(R^{2}\) partly intersect, three regions are formed in the joint-area representation, that is, \(R_{1}^{3}=R^{1}\cap R^{2}\), \(R_{2}^{3}=R^{1} - R_{1}^{3}\) and \(R_{3}^{3}=R^{2} - R_{1}^{3}\).

3. If one area is completely included in the other, such as \(R^{1} \subset R^{2}\), two regions are formed in the joint-area representation, that is, \(R_{1}^{3}=R^{1}\) and \(R_{2}^{3}=R^{2}-R_{1}^{3}\).
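As an illustration, the three rules can be expressed on binary masks as follows; this is our own sketch, assuming True marks a region pixel.

```python
import numpy as np

def joint_area(R1, R2):
    # Sketch of the three fusion rules on boolean masks R1 and R2.
    inter = R1 & R2
    if not inter.any():                          # rule 1: disjoint
        return [R1, R2]
    if (R1 & ~R2).any() and (R2 & ~R1).any():    # rule 2: partial overlap
        return [inter, R1 & ~inter, R2 & ~inter]
    # rule 3: containment, e.g. R1 subset of R2
    inner, outer = (R1, R2) if not (R1 & ~R2).any() else (R2, R1)
    return [inner, outer & ~inner]
```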

Fig. 2. Parallel deep learning system for tissues segmentation.

The operation of the proposed scheme is quite intuitive: first, any slice of a study is entered into the system; then a binary segmentation is produced by each U-Net model. That is, each model identifies the pixels that correspond to the tissue for which it was trained and segments them. After that, the binary segmented images are merged in order to obtain the final segmentation. Two remarks must be stated: (1) depending on the slice number, different tissues may appear; if the input image does not contain a specific tissue, the U-Net in charge of segmenting it will return the label corresponding to the image background. (2) If the study corresponds to a healthy patient, there will be no abnormality or tumor; as in the previous remark, the result should be the label of the image background. This adaptive capacity allows the proposed scheme to segment all slices of a complete medical study, automatically and without human assistance.
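A minimal inference sketch of this parallel operation follows; the network handles, the 0.5 binarization threshold and the label order are assumptions of ours.

```python
import numpy as np

def segment_tissues(slice_img, nets, thr=0.5):
    # Parallel scheme of Fig. 2: one U-Net per tissue (WM, GM, CSF, ABN).
    # An absent tissue yields an empty mask, so its label never appears.
    labels = np.zeros(slice_img.shape, dtype=np.uint8)   # 0 = background
    for label, net in enumerate(nets, start=1):
        prob = net.predict(slice_img[None, ..., None])[0, ..., 0]
        labels[prob > thr] = label
    return labels
```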

3.2 Proposed Cascade System for Brain Tumors Segmentation

To address automatic brain tumor segmentation in multimodal MRI of high-grade (HG) glioma patients, a variation of the previously proposed system should be considered, since the information provided by the tissues is not sufficient to detect and segment a brain tumor correctly. In this regard, the modalities T1, T1 contrast-enhanced (T1ce), T2 and Flair must be taken into account. As stated in [27], the Enhancing Tumor (ET) structure is visible in T1ce, whereas the Tumor Core (TC) is visible in T2, and the Whole Tumor (WT) in FLAIR. In view of this, the proposal to detect and segment these sub-regions is illustrated in Fig. 3.

Fig. 3. Cascade deep learning system for tumor segmentation.

As can be seen, there is a U-Net trained to segment each specific glioma sub-region; the binary outputs are then fused to reconstruct the whole brain tumor, based on the three rules stated in Subsect. 3.1. It is necessary to point out that although both methods are based on the same deep neural network, they cannot be used interchangeably, since the first model must be trained to identify and segment the soft tissues of the brain, while the second is trained to detect and segment the sub-regions of the gliomas and thus reconstruct a brain tumor. As is well known, CNNs learn from the input information, but toward a specific objective.
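As an illustration, a minimal inference sketch of the cascade; the network handles, the modality pairing per [27] and the threshold are assumptions of ours.

```python
def segment_tumor(t1ce, t2, flair, net_et, net_tc, net_wt, thr=0.5):
    # Cascade of Fig. 3: ET from T1ce, TC from T2, WT from FLAIR.
    et = net_et.predict(t1ce[None, ..., None])[0, ..., 0] > thr
    tc = net_tc.predict(t2[None, ..., None])[0, ..., 0] > thr
    wt = net_wt.predict(flair[None, ..., None])[0, ..., 0] > thr
    # Fuse the binary maps with the rules of Subsect. 3.1; for nested
    # sub-regions the joint area reduces to the union of the masks.
    return wt | tc | et
```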

4 Experimental Setup

4.1 Data

In this research paper, the BraTS 2017 database was considered. Specifically, 210 pre-operative MRI scans of subjects with glioblastoma (GBM/HGG) were used for training, and 75 scans of subjects with lower grade glioma (LGG) were used for testing the proposed system. Each study has MRIs in the modalities T1, T1ce, T2 and Flair, as well as their respective ground truth images; for each modality there are 155 8-bit images with a size of \(240\times 240\) pixels.

4.2 Training

In order to obtain the best results in the test phase, the following settings are suggested for the BraTS 2017 database: a) 8-bit gray scale, b) TIFF image format, c) Adaptive Moment Estimation (ADAM) optimization method, d) 1000 epochs and e) a learning rate of 0.001.
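In Keras terms, this configuration might look as follows; the loss function, batch size and placeholder tensors are assumptions of ours, with the real training data being the 8-bit TIFF slices described above.

```python
import numpy as np
from tensorflow.keras.optimizers import Adam

model = unet()                                      # sketch from Sect. 2
model.compile(optimizer=Adam(learning_rate=0.001),  # ADAM, lr = 0.001
              loss="binary_crossentropy",           # assumed loss
              metrics=["accuracy"])
# Placeholders standing in for the training slices and binary masks.
x_train = np.zeros((4, 240, 240, 1), dtype=np.float32)
y_train = np.zeros((4, 240, 240, 1), dtype=np.float32)
model.fit(x_train, y_train, epochs=1000, batch_size=2)
```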

4.3 Evaluation

In order to evaluate the image segmentation performance, as well as its robustness, quantitatively and objectively, three metrics were considered in this study. To measure the segmentation accuracy we used the Misclassification Ratio (MCR), which is given by:

$$\begin{aligned} MCR = \dfrac{\text {misclassified pixels}}{\text {overall number of pixels}} \times 100 \end{aligned}$$
(32)

where the values range from 0 to 100, and a minimum value means a better segmentation. The Dice Similarity Coefficient is used to quantify the overlap between the segmented results and the ground truth; it is expressed in terms of true positives (TP), false positives (FP) and false negatives (FN) as:

$$\begin{aligned} Dice = \dfrac{2\cdot TP}{2\cdot TP + FP + FN} \end{aligned}$$
(33)

where \(TP + FP + TN + FN\) equals the number of brain tissue pixels in a brain MR image. For this metric, a higher value means better agreement with the ground truth. In addition to the stated metrics, the Intersection-Over-Union (IOU) metric was also considered. It is defined by:

$$\begin{aligned} IOU = \dfrac{TP}{TP + FP + FN} \end{aligned}$$
(34)

The IOU metric takes values in [0, 1] with a value of 1 indicating a perfect segmentation.
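All three metrics can be computed directly from the confusion counts of two binary masks; a minimal sketch, assuming boolean prediction and ground-truth arrays:

```python
import numpy as np

def evaluate(pred, gt):
    # Eqs. (32)-(34): MCR in percent, Dice and IOU in [0, 1].
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    mcr = 100.0 * np.sum(pred != gt) / pred.size
    dice = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return mcr, dice, iou
```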

5 Results and Discussion

5.1 Tissues Segmentation

A convincing way to assess the true performance of the proposed method is to subject it to the tissue segmentation of real brain magnetic resonance images. In this regard, the first experiment deals with the segmentation of images in the modalities T1, T1ce, T2 and Flair taken from the BraTS 2017 database; specifically, the low-grade glioma study Brats17_TCIA_420_1.

The performance of the proposed Parallel and Cascade Deep Learning Systems (for convenience identified as P-DLS and C-DLS, respectively) is compared with other methods designed to segment brain tissues, mentioned previously in the introductory section: the Chaotic Firefly Integrated Fuzzy C-Means (C-FAFCM) [2], the Discrete Cosine Transform Based Local and Nonlocal FCM (DCT-LNLFCM) [4], the Generalized Rough Intuitionistic Fuzzy C-Means (GRIFCM) [3] and the Particle Swarm Optimization - Kernelized Fuzzy Entropy Clustering with Spatial Information and Bias Correction (PSO-KFECSB) [7]. All of them were implemented in the MATLAB R2018a environment, while for ours we used CUDA+CuDNN+TensorFlow+Keras, i.e., conventional frameworks and libraries for Deep Learning, as well as an Nvidia Titan X GPU.

The quantitative evaluation was done considering the MCR, Dice and IOU metrics; a summary is presented in Table 1. The numerical results reveal a superior performance of the proposed segmentation method in all the metrics considered, as well as in all modalities.

Table 1. Average performance on Brats17_TCIA_420_1 study.
Fig. 4. Tissues segmentation sample results of the Brats17_TCIA_420_1 study.

A sample image and the segmentations provided by all the algorithms evaluated in this experiment are depicted in Fig. 4. Only the proposed algorithm was able to segment images in all the different modalities. The other methods exhibited loss of information in the segmented regions, and in some cases they were not even able to segment the images into the four established regions. In brief, a good segmentation of the images in these modalities can guarantee the identification and segmentation of brain tumors.

5.2 Tumor Segmentation

The second experiment deals with multimodal brain tumor segmentation, with the aim of detecting and segmenting the sub-regions ET, WT and TC. For this purpose, the proposed cascade system is used (see Fig. 3). In order to compare the system performance, several methods based on deep learning are considered; particularly, Multi-dimensional Gated Recurrent Units for Brain Tumor Segmentation (MD-GRU-BTS) [10], Masked V-Net (MV-N) [11], the 3D Deep Detection-Classification Model (3DDD-CM) [12], Ensembles of Multiple Models and Architectures (EMMA) [13] and Deep U-Net (DU-N) [14]. The quantitative evaluation is made on the studies CBICA_ATX_1 and TCIA_639_1 considering the metrics established previously; a summary of the obtained results is given in Table 2 and Table 3, respectively.

Table 2. Average performance on the CBICA_ATX_1 study.
Table 3. Average performance on the TCIA_639_1 study.

The numerical results reveal a superior performance of the proposed segmentation method in all the metrics considered for the three tumor sub-regions. In brief, for the CBICA_ATX_1 study the proposed system obtained \(9.585\le MCR \le 11.511\), \(0.835\le Dice \le 0.876\) and \(0.846\le IOU \le 0.889\); whilst for the other study the results showed a slight change, i.e. \(10.119\le MCR \le 11.877\), \(0.826\le Dice \le 0.875\) and \(0.829\le IOU \le 0.869\). Although all the methods evaluated are based on deep neural networks, the information fusion that we consider in our proposal helped to increase its performance. To corroborate and illustrate the quantitative results, Fig. 5 shows sample 71 of the first study and sample 127 of the second, where a segmentation closer to the ground truth can be observed. It can also be seen that some of the comparative methods could not correctly identify and segment the three sub-regions considered.

Fig. 5. Tumor segmentation sample results of the CBICA_ATX_1 and TCIA_639_1 studies.

6 Conclusions and Future Improvements

Based on the U-Net CNN, a parallel system was introduced to detect and segment tissues in MRIs of different modalities; following the same idea, a cascade version was proposed to detect and segment brain tumors. Both systems were enhanced by means of three fusion rules in order to do a better job. The first proposal was able to segment different modalities of real MRI images degraded by inherent noise. In turn, the second proposal was able to detect and segment the ET, WT and TC sub-regions of a brain tumor, with a performance superior to all the comparative methods. As future work, we will consider proposing a deep neural network that can perform both tasks simultaneously.