1 Introduction

Magnetic Resonance Imaging (MRI) is a radiology technique widely used in medical imaging. MRI scans produce detailed three-dimensional images of the interior of the body for medical analysis, allowing analysts and doctors to inspect human anatomy thoroughly. MRI has many applications, such as brain tumor segmentation and classification, cardiac segmentation, prostate segmentation, and tissue segmentation. MRI segmentation requires an extensive amount of human labor: segmenting scans manually can take hours and is prone to errors. Conventional feature extraction methods are also tedious, as significant features must be hand-picked for learning the model. A brain MRI depicting a brain tumor is shown in Fig. 1.

Fig. 1 Brain MRI of a diffuse midline glioma [14]

For this process, researchers mostly prefer deep learning approaches, particularly Convolutional Neural Networks (CNNs), which can learn features automatically and efficiently. A properly trained CNN delivers precise and accurate segmentation, which lowers the cost of medical imaging, and a CNN-based model can detect anomalies in an MRI image more precisely than the human eye. CNNs are simply neural networks that use convolution layers built on the numerical operation of convolution.

A typical CNN classification process is summarized in Fig. 2. First, an input MRI image is fed into the CNN. The image then goes through multiple stages of the model: segmentation, feature extraction, feature selection, and classification, performed by different layers in the network [1]. Finally, the CNN predicts the class of the image.

Fig. 2 A standard CNN process

Owing to their suitability for advanced image processing and fast computation, CNNs are widely used nowadays. However, researchers face a few challenges which they have tried to overcome. One important challenge relates to data: CNNs require enormous amounts of data for training. Various data augmentation techniques (such as flipping, translation, and rotation) have been applied [2, 3]. A patch-based approach [3, 4] has also been used to resolve the issue of insufficient data samples. Some models in the literature used only positive samples on account of a limited number of training samples [5]. Other techniques, such as fine-tuning and pre-training [6, 7], are also widely used to improve CNN performance in the medical domain. MRI pre-processing methods such as intensity normalization and adaptive contrast enhancement [8] have also been implemented in the literature to deal with noise and intensity variation in MRI scans.

Traditionally, 2D CNNs are used for 2D images, but applying them to 3D images is challenging because they capture only two-dimensional spatial information and ignore information in the third dimension. Using 3D convolutions solves this problem [9]. Several 3D CNNs have accordingly been presented in the literature alongside 2D CNNs.

Architecting the CNN is the most important factor for image segmentation or classification in MRI medical applications. Various architectures have been developed in the literature, either by implementing variants of traditional CNN architectures or by proposing architectures different from the classical ones [10].

P. Mohamed Shakeel et al. [11] analyzed a machine learning based backpropagation neural network (MLBPNN) utilizing infrared sensor imaging technology. The imaging sensor was coordinated through a wireless infrared imaging sensor created to transmit tumor thermal information to an expert clinician to monitor the patient's condition. Peng Liang et al. [12] combined the concept of wireless networks with an optimized deep learning algorithm to compute and analyze the data produced by an MRI image segmentation approach for cervical cancer.

In this survey, the analysis of MRI data, different dimensionalities of CNN models, and various CNN architectures are discussed.

2 Convolutional Neural Networks

A CNN is a supervised deep learning algorithm: an artificial neural network that preserves spatial relationships in the data and has fewer connections between layers. The image grid structure is fed through layers that preserve these relationships. In the first few layers, CNNs look for low-level features such as edges and corners; deeper in the network, more complex high-level features are recognized by combining patterns detected in the earlier layers. The most basic use case of a CNN is to classify images into several classes by looking at these patterns [13].

A CNN has three different kinds of layers: the convolutional layer, the pooling layer, and the fully connected (FC) layer. It has multiple Convolution + ReLU and pooling layers connected one after another that perform the feature extraction task, whereas the FC layer acts as a classifier. Each of these layers has different parameters that can be optimized and performs a different task on the MRI input data [14]. A CNN can be created by stacking different convolution, pooling, and fully connected layers [15]. Figure 3 illustrates the different layers of a CNN.

Fig. 3 Different layers in CNN [69]

2.1 Convolution Layer

The convolution layer is the first layer in a CNN and is known as the feature extractor layer. Image features (such as edges) are extracted within this layer and used later in the network. This layer takes a set of filters, each a 2D matrix, and an input image represented as a 2D matrix of pixels. For example, convolving a 3 × 3 filter with a 5 × 5 input image produces a 3 × 3 output image. The output image is known as a feature map.

Convolution operation includes:

  1. The filter is slid across the input image via a sliding window.

  2. An element-wise product of the filter and the overlapped image region is computed.

  3. All the element-wise products are summed; the result is the destination pixel of the output image.

  4. The above steps are repeated for all locations; a feature map is generated for each convolution operation.

Different filters can perform different operations such as edge detection, blurring, and sharpening. After applying all the filters to an input image, a tensor of feature maps is obtained, as illustrated in the sketch below.
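To make steps (1)–(4) concrete, the following minimal NumPy sketch implements the sliding-window operation described above (strictly speaking, the cross-correlation that CNN layers compute); the toy 5 × 5 image and the Sobel-style edge filter are illustrative assumptions, not taken from any surveyed model:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid (no padding, stride 1) convolution as used in CNNs:
    a sliding element-wise product followed by a sum."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    oh, ow = ih - kh + 1, iw - kw + 1     # 5x5 input, 3x3 filter -> 3x3 map
    feature_map = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            region = image[y:y + kh, x:x + kw]            # sliding window
            feature_map[y, x] = np.sum(region * kernel)   # product + sum
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 "image"
sobel_x = np.array([[-1., 0., 1.],                # horizontal edge detector
                    [-2., 0., 2.],
                    [-1., 0., 1.]])
print(convolve2d(image, sobel_x))                 # 3x3 feature map
```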

The rectified linear unit (ReLU) is an activation function applied to every value of the feature map generated by the convolution operation described above; it extends the convolution layer. This function increases the non-linearity of the CNN by replacing all negative values from the convolution with zero, while all positive values remain the same, and it provides better feature extraction. The result is a set of rectified feature maps. This combination is also referred to as Convolution + ReLU, which performs convolution followed by the activation function.

2.2 Pooling Layer

A lot of the information contained in a convolution layer's output is redundant. For instance, if we use an edge-detecting filter and discover a strong edge at a certain location, chances are that we will also discover fairly strong edges at locations one pixel away from the first one. In that case we are not finding anything new. Pooling layers remedy this issue [13].

This layer is generally placed between two convolution layers. It reduces the size of the feature maps while retaining important information. It keeps down the number of features from the convolutional layer and generates a condensed form of the feature maps. Each feature map is down-sampled by this layer using a specific function.

Pooling function can be of different types:

  1. Max

  2. Average

  3. Min

The max pooling function is most widely used as it works better than the others. For example, a 2 × 2 max filter traverses a 4 × 4 input image, takes the maximum value within each region, and places it at the corresponding pixel of the 2 × 2 output image [13]. Similarly, average pooling extracts the average value in a filter region, and min pooling takes the minimum value in a filter region.
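A minimal NumPy sketch of this 2 × 2 max pooling example (stride 2, toy values); average or min pooling follow by swapping the reduction, as noted in the comment:

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Down-sample a feature map by taking the maximum of each
    size x size region, moving the window by `stride` pixels."""
    h, w = feature_map.shape
    oh, ow = (h - size) // stride + 1, (w - size) // stride + 1
    pooled = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            region = feature_map[y * stride:y * stride + size,
                                 x * stride:x * stride + size]
            pooled[y, x] = region.max()  # use .mean() / .min() for other types
    return pooled

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 1., 2.],
                 [7., 2., 9., 0.],
                 [4., 8., 3., 1.]])
print(max_pool2d(fmap))   # 2x2 output: [[6., 4.], [8., 9.]]
```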

2.3 Fully Connected Layer

The purpose of this dense layer is to use the high-level features extracted by the preceding layers (convolution and pooling) to classify the input image. This stage involves flattening, which transforms the 2D pooled feature maps into a 1D feature vector before classification. The generated feature vector is fed into the FC layer for processing. Every neuron of the previous layer is connected to the next layer, hence "fully connected", unlike in the convolution or pooling layers.

Finally, an activation function such as sigmoid or softmax is used to classify the output or predict the class labels. This is the last layer of the network in a classification task.

In some modern CNN models [16,17,18,19], the dropout technique is applied to the dense layers to reduce over-fitting by randomly dropping neurons during the training phase.
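Putting Sections 2.1–2.3 together, the following PyTorch sketch stacks Convolution + ReLU, pooling, flattening, a dropout-regularized dense layer, and a softmax output. All sizes, channel counts, and the two-class setting are illustrative assumptions, not taken from any surveyed model:

```python
import torch
import torch.nn as nn

class SimpleMRIClassifier(nn.Module):
    """Minimal CNN sketch assuming single-channel 2D MRI slices
    resized to 64 x 64 and a hypothetical binary task."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # Convolution + ReLU
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 32x32 -> 16x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # 2D maps -> 1D vector
            nn.Linear(32 * 16 * 16, 128),
            nn.ReLU(),
            nn.Dropout(p=0.5),                           # dropout on dense layer
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleMRIClassifier()
logits = model(torch.randn(4, 1, 64, 64))   # batch of 4 slices
probs = torch.softmax(logits, dim=1)        # class probabilities
print(probs.shape)                          # torch.Size([4, 2])
```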

3 CNN for MRI Analysis

This section presents a broad survey of CNNs based on MRI input data and processing, model dimensionality, and model architecture.

3.1 MRI Data and Processing

CNNs are highly dependent on the type of input images fed into the network. Difficult datasets are small, imbalanced, or heterogeneous. Multiple MRI datasets are publicly available for various purposes, such as BRATS for brain tumor detection. The four most commonly used MR modalities are T1-weighted (T1w), T2-weighted (T2w), T1-Gadolinium (T1Gd), and Fluid Attenuated Inversion Recovery (FLAIR).

Annotating MR images is a challenging task, and MR sequence types or modalities are often labeled incorrectly. Sara Ranjbar et al. [16] demonstrated automatic annotation of MRI sequence types using a deep CNN that can successfully detect such patterns. Current works struggle with contrast-enhanced modalities and generate good results only on non-enhanced T1w MRI scans. Jens Kleesiek et al. [20] presented a 3D CNN for brain extraction that addresses this issue and can deal with non-enhanced and contrast-enhanced T1w, T2w, and FLAIR contrasts.

A few recent studies investigated CNNs with multiple modalities as input. For instance, Jose Dolz et al. [21] proposed two strategies, early and late fusion, to deal with the problem of low-contrast brain MRI images. Similarly, Zeju Li et al. [22] also carried out early and late fusion with nearby slices, and experiments confirmed that early fusion can enhance segmentation results while late fusion keeps the model sensitive even when the tumor has low contrast. In early fusion, the modalities are merged at the input; in late fusion, an independent channel is employed for every modality. Brain tumors generally have poor contrast, making it tough to differentiate healthy tissue from tumor. Mina Ghaffari et al. [23] integrated multimodal MR images to obtain more comprehensive information and to overcome the poor contrast of brain tumor MR images. Specialized MR modalities such as fMRI and sMRI, which carry complementary brain information, have also been combined: Liang Zou et al. [9] showed that a multi-modality CNN using fMRI and sMRI together as input achieves higher accuracy than a single-modality CNN for the analysis of Attention Deficit Hyperactivity Disorder (ADHD), and Yan Wang et al. [24] proposed a multi-modality CNN using fMRI and DTI together as input that is effective for multimodal MRI analysis of Alzheimer's Disease.
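The two fusion strategies can be sketched as follows in PyTorch, assuming four co-registered 2D modalities (e.g. T1w, T1Gd, T2w, FLAIR) of size 64 × 64; layer sizes are illustrative and not taken from the cited papers:

```python
import torch
import torch.nn as nn

class EarlyFusion(nn.Module):
    """Modalities merged at the input: one network, 4 input channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

    def forward(self, t1, t1gd, t2, flair):
        x = torch.cat([t1, t1gd, t2, flair], dim=1)  # stack along channels
        return self.net(x)

class LateFusion(nn.Module):
    """One independent branch per modality, features merged afterwards."""
    def __init__(self):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(1, 8, 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
            for _ in range(4))
        self.head = nn.Conv2d(4 * 8, 32, 1)          # fuse branch features

    def forward(self, t1, t1gd, t2, flair):
        feats = [b(m) for b, m in zip(self.branches, (t1, t1gd, t2, flair))]
        return self.head(torch.cat(feats, dim=1))

mods = [torch.randn(2, 1, 64, 64) for _ in range(4)]  # batch of 2 per modality
print(EarlyFusion()(*mods).shape, LateFusion()(*mods).shape)
```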

Data augmentation, a technique to increase the variety of training data by applying operations such as flipping, rotation, translation, and reflection to images, has been applied by several researchers. Besides generating new samples, it also helps reduce over-fitting and the class imbalance issues coming from the dataset [25]. By making use of data augmentation methods, one can achieve satisfactory segmentation results on small datasets [26]. It also increases the generalizability of the segmentation outcomes [23]. Intensity inhomogeneity in MR scans is another challenge; it may be caused by spontaneous movements of the subject while scanning. N. Khalili et al. [27] proposed a data augmentation technique in which intensity inhomogeneity artifacts are introduced synthetically into the training data to cope with such artifacts. This method can potentially take over pre-processing steps such as bias field correction, complements CNN segmentation performance, and also helps in dealing with low-quality data. Insufficient training data may result in poor CNN performance: Subhasis Banerjee et al. [24] observed that training a CNN model on brain MRI without data augmentation causes a drop in accuracy as it leads to over-fitting, and Zhiqiang Tian et al. [7] stated that CNNs perform better with more training data. It is observed that by applying data augmentation operations to the dataset, the performance of a CNN can be improved (Table 1). Guy Amit et al. [28] suggested that with adequate data augmentation, domain-specifically trained networks can outperform generic classifiers. Hiba Mzoughi et al. [2] confirmed experimentally that applying only the flipping technique already yields promising CNN classification accuracy, outperforming recent unsupervised and supervised state-of-the-art approaches for the classification of glioma brain tumors. Muhammad Sajjad et al. [1] also improved accuracy by employing extensive data augmentation for the classification of multi-grade brain tumors, with experiments showing convincing performance compared to existing approaches. Richard Ha et al. [17] employed data augmentation to limit over-fitting and improved accuracy in predicting axillary lymph node metastasis from a breast MRI dataset. However, aggressive data augmentation may also degrade CNN performance in the case of fixed-budget training [5], and augmentation strategies that can ruin positional details of brain MRI scans need to be avoided as they impact classifier performance. Xu Han et al. [29] added salt-and-pepper noise to increase the number of training samples in preference to using rotation or flipping. Most researchers applied pre-processing in their CNN-based classification or segmentation tasks.
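A minimal sketch of the augmentation operations named above (flipping, rotation, translation) using torchvision; the parameter values and the "mri_slices/" path are illustrative assumptions, not taken from any surveyed paper:

```python
import torchvision.transforms as T

augment = T.Compose([
    T.RandomHorizontalFlip(p=0.5),                    # flipping
    T.RandomRotation(degrees=10),                     # small random rotation
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),  # translation up to 10%
    T.ToTensor(),                                     # PIL image -> tensor
])
# Applied on the fly during training, e.g. (hypothetical folder layout):
#   dataset = torchvision.datasets.ImageFolder("mri_slices/", transform=augment)
```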

Table 1 Summary of MRI Data and Processing

Pre-processing methods such as intensity normalization are applied to cope with low-contrast MRI images and with the heterogeneity caused by multi-site, multi-scanner acquisition of MRI images. Combined with data augmentation, intensity normalization as a pre-processing step has been shown to be very powerful for brain tumor classification [2, 30] and to reduce over-fitting [31]. It also enhances image quality by removing noise. MRIs often suffer from an inconsistent intensity problem known as bias field distortion, which may affect CNN performance; for a consistent intensity range, bias field distortion correction can be performed on the input MRI scans [32,33,34,35]. Data pre-processing is carried out before feeding images to the network for training. MRI scans are also sometimes corrupted by Rician noise during acquisition, so low-level image processing tasks such as image denoising are necessary for efficient disease diagnosis. Prasun Chandra Tripathi et al. [36] proposed a CNN-DMRI model for denoising MRI scans, and experimental findings on synthetic as well as real MRI with unseen noise levels suggested that this model achieves promising results. R.R. Janghel et al. [37] suggested that applying an image pre-processing technique before sending data to the network for feature extraction increases CNN performance; the pre-processing algorithm proposed in that work converts 3D fMRI to 2D fMRI, which saves computation cost and generates useful features for early diagnosis of Alzheimer's Disease. Generally, a dataset contains MRI scans of varying sizes, whereas a CNN requires a fixed input image size. Image resizing is a pre-processing approach that deals with this: all the images in the dataset are resized to the same size [29, 38, 39]. Table 2 shows issues of different MRI data processing techniques. Details of frequently used MRI datasets are provided in Table 3.
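Two of these steps, z-score intensity normalization and resizing to a fixed input shape, can be sketched as follows; the target shape and the foreground threshold are illustrative assumptions:

```python
import numpy as np
from skimage.transform import resize

def preprocess(volume, target_shape=(128, 128, 128)):
    """Normalize intensities over foreground voxels, then resize."""
    foreground = volume[volume > 0]                  # ignore background voxels
    volume = (volume - foreground.mean()) / (foreground.std() + 1e-8)
    return resize(volume, target_shape, order=1,     # trilinear interpolation
                  preserve_range=True, anti_aliasing=True)

scan = np.random.rand(160, 192, 144).astype(np.float32)  # stand-in MRI volume
print(preprocess(scan).shape)                             # (128, 128, 128)
```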

Table 2 Issues or limitations of MRI data processing techniques
Table 3 MRI Datasets

3.2 CNN Dimensionality (2D/2.5D/3D)

CNNs can be classified on the basis of the dimension of the input patch. Three approaches are used in the literature: 2D, 2.5D, and 3D. The total number of publications for each CNN dimensionality from 2015 to 2020 is given in Table 4.

Table 4 Dimensions of different CNN models studied

The overall idea behind a 2D CNN is to use 2D convolution filters to classify MRI scans based on individual slices. Another approach is the 2.5D CNN, which can cope with some amount of spatial information; this approach is rarely used in the literature. 2.5D CNNs can strike a good balance between performance and computational cost by using 2D as well as 3D convolutions. Yunzhe Xue et al. [40] designed a new 2.5D CNN architecture that takes 2D slices as input and looks at all three orientations with their three normalizations; the outputs of this 2D CNN architecture are combined into a 3D volume, which is then passed through a 3D CNN for post-processing to segment stroke lesions in brain MRI images. A further option is the 3D CNN, which offers better overall performance and copes with richer spatial information than a 2.5D CNN. In a 3D CNN, 3D convolutional kernels are applied to whole volumetric patches passed to the network. 2D CNNs can only use single slices as inputs and cannot supply inter-slice information; 3D CNNs remedy this by leveraging inter-slice information, which can lead to improved performance.

Recently, 2D CNNs have been extended to 3D CNN architectures to extract more learnable parameters and complicated features [25]. Hiba Mzoughi et al. [2] offered a 3D CNN approach for brain tumor classification that merges both local and global contextual information with reduced weights; it additionally generates more discriminative feature maps than the 2D CNN approach, which captures only two-dimensional spatial information. Liang Zou et al. [41] proposed a 3D CNN for ADHD classification using rs-fMRI which can learn 3D local patterns and can boost classification accuracy even with less training data. A comparative study by Wei Feng et al. [42] indicates that a 3D CNN has the potential to capture the 3D context of MRI scans, whereas a 2D CNN can only filter 2D local patterns; the 3D CNN was found superior for the detection of Alzheimer's Disease. In addition, Jose Dolz et al. [21] also confirmed a performance improvement of 3D CNNs over 2D CNNs. Masaru Ueda et al. [31] proposed a 3D CNN model to make full use of the potential of volume data in age estimation from brain MRI. Conversely, the extended 2D CNN model proposed by Mariana Pereira et al. [43] provided the best performance when compared to 3D CNN-based ones. Different data dimensionality thus influences the final performance of a CNN [44]. Experimental results have also shown that performing slice-by-slice 2D segmentation followed by 3D reconstruction is effective for both 2D and 3D tasks [45]: 3D context information can be exploited with the assistance of relatively inexpensive 2D convolution filters. Raghav Mehta et al. [46] introduced a CNN model that utilizes only 2D convolutions operating on 3D images, making the model memory efficient. Bijen Khagi et al. [47] also presented an idea for classifying 3D MRI images while using the 2D features generated from the CNN framework. Vinutha N et al. [48] suggested that such a model performs well when the training sample size is large and the feature vector is reduced. Seyed Sadegh et al. [49] proposed a CNN with three parallel 2D pathways that indirectly grasp 3D image information without the requirement for computationally expensive 3D convolutions. To efficiently combine the strengths of both long-range 2D context and short-range 3D context, a hybrid 2D-3D CNN architecture is also possible: Pawel et al. [50] introduced one such model, in which 3D features are imported in the portion of the network where the feature maps are comparatively small, and experimental results showed that this model performed better than a standard 2D or 3D CNN. Processing 3D images in a network is nevertheless tedious. 3D CNNs utilize numerous parameters, which increases their computational expense; compared to a 2D CNN, a 3D CNN performs more complex computations, which increases the memory load due to the added dimension. A 3D CNN may also overfit the data and needs more computational resources to avoid it. It is observed that, owing to high speed and low computational cost, most authors have implemented 2D CNNs (Fig. 4) rather than 3D CNNs, even though the latter is proved to be a better feature extractor [4]. To reduce this expensive computational cost, Qi Dou et al. [51] proposed a cascaded framework of 3D CNNs in order to speed up processing and remove redundant computations. Table 5 provides limitations of the 2D CNN and 3D CNN techniques.
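The dimensionality trade-off can be illustrated with a small PyTorch sketch, assuming single-channel input; the shapes are illustrative. A 2D layer sees one slice, while a 3D layer sees a volumetric patch and also convolves across slices, at the cost of more parameters and memory:

```python
import torch
import torch.nn as nn

conv2d = nn.Conv2d(1, 32, kernel_size=3, padding=1)   # 2D: slice-by-slice
conv3d = nn.Conv3d(1, 32, kernel_size=3, padding=1)   # 3D: volumetric patch

slice2d = torch.randn(1, 1, 128, 128)                 # (batch, ch, H, W)
volume3d = torch.randn(1, 1, 64, 128, 128)            # (batch, ch, D, H, W)
print(conv2d(slice2d).shape)                          # [1, 32, 128, 128]
print(conv3d(volume3d).shape)                         # [1, 32, 64, 128, 128]

# Parameter gap per layer: 3x3 kernel -> 9 weights, 3x3x3 -> 27 weights
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv2d), count(conv3d))                   # 320 vs 896
```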

Fig. 4 Dimension-wise different CNN models from 2015 to 2020

Table 5 Issues or limitations of CNN Dimensionalities

3.3 CNN Architecture

A basic CNN architecture has the following components:

  1. The first layer is the input layer, where data is fed into the CNN for classification.

  2. The convolutional layer extracts features from the input images by applying various filters.

  3. The pooling layer, placed after the convolution layer, scales down the dimensions of the input image.

  4. The fully connected layer recognizes patterns among the features, operating on the flattened output generated by the previous layers.

  5. The last layer is the output layer, which determines the class of the image.

The speed and accuracy of a CNN for performing various tasks depend on how the CNN is architected: how the layers are designed and the network parameters used in each layer. Table 6 gives a brief description of some CNN architectures employed by researchers in MRI medical imaging. Traditional CNN architectures such as LeNet-5, AlexNet, GoogleNet, VGG, U-Net, and ResNet, which achieved good results at the ILSVRC [52], were adopted by several researchers through transfer learning: a model developed on a large dataset is reused for a different but related task. In simpler words, people fine-tune pre-existing CNN models trained on non-domain datasets and retrain them on domain-specific datasets. Saman Sarraf et al. [53, 54] classified Alzheimer's disease using two traditional CNN architectures, LeNet-5 and GoogleNet, where an accuracy of 98.84% was achieved with structural MRI data and an accuracy of 99.99% with fMRI. V. Bhanumathi et al. [55] analyzed the performance of AlexNet, VGGNet, and GoogleNet classifiers on brain tumor images and concluded that GoogleNet performs 10% more accurately than AlexNet and VGGNet. Muhammed Talo et al. [56] compared the performance of the pre-trained models AlexNet, VGG-16, ResNet-18, ResNet-34, and ResNet-50 for multi-class brain disease detection, and experimental results showed that the deeper ResNet-50 attained the highest accuracy of 95.23% among the five models. Deeper architectures with more layers, such as ResNet-50, perform well in classification tasks compared to shallower architectures with fewer layers, such as AlexNet. Mahjabeen Tamanna Abed et al. [57] used three different deep networks, VGG19, ResNet50, and InceptionV3, via transfer learning and concluded that deep neural networks are genuinely powerful at producing accurate predictions from complicated datasets. However, Subhasis Banerjee et al. [24] inferred that going deeper with convolutions did not help improve CNN performance. AlexNet was proven to be a good feature extractor and an efficient transfer learning framework for the classification of brain tumors [39]. Sérgio Pereira et al. [30] also found that shallower architectures give low performance. Bijen Khagi et al. [47] compared accuracies of scratch-trained CNN features and AlexNet CNN features for three-class Alzheimer's disease classification; even though the results were not promising, it could be concluded that AlexNet performs better than a scratch-trained CNN with softmax classification based on probability scores. In contrast, Guy Amit et al. [28] showed that a small CNN can learn features more accurately than the pre-trained, domain non-specific VGGNet for the classification of breast MRI lesions. Sajjad et al. [1] utilized the pre-trained 19-layer VGG-19 model for multi-grade tumor classification and achieved an accuracy of 95.58%. Similarly, Rachna Jain et al. [58] used a pre-trained VGG-16 as a feature extractor for three-way classification of Alzheimer's Disease and demonstrated that VGG-16 was able to extract important features even though it was trained on non-domain-specific images. R.R. Janghel et al. [37] also employed VGG-16 as a feature extractor for Alzheimer's Disease, generating output quickly and accurately due to its lower complexity.
It can be concluded that fine-tuning a pre-trained model saves the huge labor required to build a model from scratch. It is also advantageous in reducing computational costs.
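A minimal transfer-learning sketch in the spirit of the studies above: a torchvision ResNet-50 pre-trained on ImageNet (non-domain data) is adapted to a hypothetical 4-class brain MRI task. The class count and the freeze-everything-but-the-head policy are assumptions, not a specific surveyed recipe:

```python
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
for param in model.parameters():
    param.requires_grad = False                    # freeze pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, 4)      # new domain-specific head
# Only model.fc is trained; unfreezing later layers fine-tunes more deeply.
```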

Table 6 Description of some CNN Architectures

Various researchers have fine-tuned only a few layers of traditional pre-trained CNN architectures. Sara Ranjbar et al. [16] used a variation of the VGG architecture instead of the full architecture for the annotation of magnetic resonance imaging sequence types. Mariana Pereira et al. [43] utilized two layers of the ResNet34 architecture instead of the full pre-trained model and proposed an extended 2D model for multiclass Alzheimer's Disease classification, achieving an accuracy of 68.6% and performing well compared to state-of-the-art AD classification methods. Further, Seyed Sadegh et al. [49] used a U-Net CNN architecture and augmented it with the auto-context algorithm for brain extraction. Mina Ghaffari et al. [23] compared a modified U-Net with a generic U-Net and showed that the proposed model accomplished higher brain tumor segmentation accuracy than the generic U-Net; this was made feasible by replacing the skip connections with specific DenseNet blocks for transmitting semantic information from the encoder component to the decoder component. Another modified U-Net approach is to use multiple U-Nets in parallel to optimize the capacity of the model. Additionally, Sadegh Charmchi et al. [59] optimized the U-Net architecture and proposed a modified shallow U-Net for left ventricle segmentation using MRI. Yunzhe Xue et al. [40] employed 9 end-to-end U-Nets in a multi-path and multi-modal system to capture contextual information of brain MRI scans from all three planes and their normalizations.

Another CNN model broadly used in the literature is the Fully Convolutional Network (FCN). Unlike a conventional CNN, it does not contain dense or fully connected layers at the end of the network. Phi Vu Tran et al. [60] proposed the first FCN model for pixel-wise labeling in cardiac MRI and achieved state-of-the-art segmentation accuracy on multiple metrics. The first study to fine-tune a pre-trained FCN model was performed by Zhiqiang Tian et al. [7] for prostate segmentation, and it was found that a fine-tuned FCN gives promising segmentation results. A different FCN architecture based on an encoder-decoder, known as SegNet, has been used for pixel-wise segmentation tasks. Bijen Khagi et al. [61] segmented closely associated brain MRI scans based on pixel labeling with promising outcomes using the encoder-decoder architecture of SegNet, suggesting that, as with the semantic segmentation of outdoor scene images, CNNs can be fruitful in medical MRI segmentation as well. Similarly, Fang Liu et al. [62] applied SegNet for high-resolution pixel-wise multi-class tissue classification and showed that using SegNet as the heart of the segmentation method yields accurate and fast cartilage and bone segmentation. Prasun Chandra Tripathi et al. [36] employed an encoder-decoder structure that performs up-sampling and down-sampling of images effectively to preserve important features of the images while denoising MRI scans.
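The defining property of such networks, no dense layers, so the output is a per-pixel class map at the input's spatial size, can be shown with a toy encoder-decoder sketch in the FCN/SegNet spirit; depth and channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # down-sample: H/2 x W/2
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                        # H/4 x W/4
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),  # up-sample
            nn.ConvTranspose2d(16, 16, 2, stride=2), nn.ReLU(),
            nn.Conv2d(16, num_classes, 1),          # per-pixel class scores
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

seg = TinyEncoderDecoder()(torch.randn(1, 1, 128, 128))
print(seg.shape)   # torch.Size([1, 2, 128, 128]): one score map per class
```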

A few hybrid CNN models have also been presented in the literature. Dinthisrang Daimary et al. [63] proposed three hybrid CNN models, Res-SegNet, U-SegNet, and Seg-UNet, to combine the characteristics of SegNet, ResNet, and U-Net, which are among the most popular CNN models for semantic segmentation, and showed that the hybrid models achieved high accuracy compared to the other CNN models for brain tumor segmentation. Another hybrid model uses the classical ResNet architecture as a base: Ahmet Çinar et al. [64] developed a hybrid model by replacing the final 5 layers of ResNet50 with 10 CNN layers, and this improved model achieved better results than classical architectures such as InceptionV3, GoogleNet, AlexNet, ResNet50, and DenseNet201. Raheleh Hashemzehi et al. [65] also proposed a new hybridization of a convolutional neural network (CNN) and neural autoregressive distribution estimation (NADE), which extracts prominent features for image classification and was found to be a beneficial tool for brain tumor detection. Hari Mohan Rai et al. [38] proposed a combination of U-Net and LeNet known as LU-Net, having fewer layers with some modifications, for the detection of brain tumors; owing to its less complex architecture, it was found to be superior to LeNet and VGG-16 models, achieving an overall accuracy of 98%.

Many researchers have implemented ensemble learning with CNNs, in which the predictions of multiple trained base learners are combined into a single output. Jose Dolz et al. [21] demonstrated the benefit of combining multiple image modalities as input, employing either early or late fusion, with an overall performance enhancement over a single CNN. Li Sun et al. [8] utilized an ensemble of three different CNNs for brain tumor segmentation, and evaluation results suggest that the ensemble method can reduce model bias and performs better than the individual ones. Mostefa Ben Naceur et al. [66] also adopted the ensemble learning technique by proposing EnsembleNet, which takes advantage of a parallel architecture to reduce time and was found to generate highly accurate results for the segmentation of brain tumors. Similarly, Pierrick Coupé et al. [67] used an ensemble of many CNNs and proposed a model called AssemblyNet, which includes two assemblies of U-Nets. Reza Rasti et al. [68] developed a new model called ME-CNN, a mixture ensemble of CNNs, and experimental results for breast cancer diagnosis indicate competitive classification performance in contrast with existing single-classifier as well as ensemble methods.
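A minimal sketch of ensembling by prediction averaging, assuming three already-trained classifiers with the same output space; the plain-averaging rule is illustrative and not a specific surveyed method (several of the works above use more elaborate fusion schemes):

```python
import torch

def ensemble_predict(models, x):
    """Combine base-learner predictions by averaging class probabilities."""
    probs = [torch.softmax(m(x), dim=1) for m in models]  # each base learner
    return torch.stack(probs).mean(dim=0)                 # combined prediction

# Usage, assuming model_a, model_b, model_c are trained nn.Modules:
#   avg_probs = ensemble_predict([model_a, model_b, model_c], batch)
#   predicted_class = avg_probs.argmax(dim=1)
```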

4 Conclusion and Future Scope

This survey is centered on the viability of CNNs for MRI analysis, which is used in various medical applications. It conveys useful approaches to understanding the CNN and MRI domains and their different techniques. The analysis of MRI input data and pre-processing is discussed, and it can be concluded that pre-processing techniques and data augmentation can enhance the overall performance of a CNN model and can cope with poor-quality MRI and scarce data. Furthermore, CNN dimensionalities and architectures are analyzed: 2D CNNs are broadly implemented in the literature due to their low computation cost, while 3D CNNs can be used when cost is not an issue. A comparative analysis of the various CNN architectures adopted for medical applications is also summarized in this study.

In the future, we will investigate more CNN-based techniques adopted in the literature and their effect on the performance of CNN models for MRI data. CNNs can also be analyzed for other medical imaging modalities such as CT scans or X-rays.