1 Introduction

The brain is the most sophisticated organ in the human body. As the control center of the nervous system, it governs our behavior, so brain diseases are among the deadliest of all diseases. Early diagnosis can greatly improve patients' chances of survival. Currently, the diagnosis of brain disease relies heavily on medical imaging. Magnetic resonance imaging (MRI), computed tomography (CT) and X-ray are common imaging modalities in clinical diagnosis. MRI is non-invasive and radiation-free, and it provides clearer images of soft tissue than CT and X-ray. Therefore, it is the first choice for brain disease diagnosis.

Recently, automated medical image analysis has become a hot research topic; it requires both medical expertise and machine learning. Computer-aided diagnosis (CAD) systems can assist doctors and physicians in making decisions based on medical images. From the viewpoint of artificial intelligence, abnormal brain detection can be regarded as an image recognition and classification problem. A general framework for image classification usually consists of feature extraction and classifier training. During the last two decades, researchers and practitioners have proposed numerous methods to detect abnormal brains automatically from MRI.

These abnormal brain detection methods can be classified into two groups: traditional machine learning and deep learning. In classical machine learning, the image features are usually handcrafted, and much attention is paid to classifier training and optimization. In [1], the authors proposed to use the discrete wavelet transform (DWT) for feature extraction and employed two classification algorithms for brain MRI classification: the self-organizing map (SOM) neural network and the support vector machine (SVM). The SOM and SVM yielded accuracies of 94% and 98%, respectively. El-Dahshan and Hosny [2] proposed a hybrid pathological brain detection method. They first employed DWT to extract features from brain MRI; thereafter, principal component analysis (PCA) was leveraged to reduce the feature dimension. Finally, a feedforward back-propagation neural network (BPNN) and k-nearest neighbors (k-NN) were selected as the classification algorithms, achieving 97% and 98% accuracy, respectively. Kalbkhani and Shayesteh [3] suggested combining DWT and generalized autoregressive conditional heteroscedasticity (GARCH) for feature generation. PCA and linear discriminant analysis (LDA) were utilized to remove redundant features. Finally, they trained k-NN and SVM to identify the types of the brain MRIs; their approach could not only distinguish abnormal from healthy brains but also recognize seven different brain abnormalities. Saritha and Paul Joseph [4] first performed DWT on brain MRIs and extracted entropies from the DWT sub-bands. They then used spider-web plots to generate features based on wavelet entropy, and a probabilistic neural network (PNN) was trained for image classification, achieving good results. El-Dahshan and Mohsen [5] put forward a brain tumor detection system based on brain MRI. A feedback pulse-coupled neural network was trained to segment the brain tumors before classification; DWT with PCA was then employed to generate image features, and a BPNN served as the classification algorithm to label each image as abnormal or healthy. The proposed method achieved 99% accuracy on both training and testing samples. Bahadure and Ray [6] put forward a brain tumor segmentation and recognition method. The signal-to-noise ratio of the raw images was improved by pre-processing. They then compared several image segmentation methods, including watershed segmentation, fuzzy c-means clustering, the discrete cosine transform and the Berkeley wavelet transform, and found that the Berkeley wavelet transform performed best. Morphological operations were applied to the segmented images, and a set of texture and statistical features was calculated to form the feature vector. Finally, a genetic algorithm (GA) was employed for feature selection and classification, and their system achieved an overall accuracy of 92.03%. Gudigar and Raghavendra [7] investigated two image decomposition methods: bidimensional empirical mode decomposition and variational mode decomposition. Supervised neighborhood projection embedding and a bispectral feature extractor were then used to generate the feature vector, and an SVM was trained as the classifier. Experimental results suggested that variational mode decomposition was better than bidimensional empirical mode decomposition, with 90.68% accuracy. Acharya and Fernandes [8] proposed an Alzheimer's disease detection system based on brain MRI. A number of image transforms, including the wavelet transform and its variants, were employed to extract features. Student's t test was then utilized for feature selection, and k-NN was trained for identification and recognition. There are other reports showing the success of AI and signal processing methods in handling various tasks [9,10,11,12,13].

On the other hand, deep learning techniques usually generate image features in an automated manner, and deep learning has become an important tool for image classification in the last five years. Convolutional neural networks (CNNs) have brought substantial improvements to image-based machine learning tasks. We no longer need image transforms or decomposition methods to extract handcrafted features, because a CNN provides a unified framework that implements feature extraction and classification automatically and simultaneously. Consequently, many deep learning-based abnormal brain detection approaches have been proposed recently. Nayak and Das [14] proposed a multilayer extreme learning machine (ELM) autoencoder with leaky rectified linear units to classify brain MRIs; the ELM autoencoders were stacked to form a deep ELM in their multi-class classification experiment. Deepak and Ameer [15] proposed a brain tumor classification approach that can distinguish three types of tumors: glioma, meningioma and pituitary tumors. They used a pre-trained GoogLeNet and transfer learning to implement the classification: the last three layers of the pre-trained GoogLeNet were modified while the parameters of the early layers remained unchanged, so training only determined the weights of the last three layers. Their method achieved good classification performance in experiments. Han and Rundo [16] proposed a data augmentation method for brain tumor detection, because medical image datasets are usually small; their generative adversarial network (GAN)-based brain MRI augmentation algorithm improved the classification accuracy. Lu and Lu [17] combined AlexNet and transfer learning for detecting abnormalities in brain MRI. They used a pre-trained AlexNet, modified its last several layers and fine-tuned the whole modified network on the brain MRIs. Their method achieved 100% accuracy on the testing set.

From the above analysis, we can see that in abnormal brain detection systems based on classical machine learning, feature extraction relies on manually chosen image transforms and sometimes requires further feature selection and dimension reduction. However, classifier training is generally faster than in deep learning methods because the classifier structures are simpler and have far fewer parameters. Deep learning methods are capable of generating image features automatically: with the convolution and pooling operations in a CNN, features are learned gradually from low level to high level. Nevertheless, training deep CNN models is time-consuming.

The contribution of this study is that we combine classical machine learning and deep learning techniques to obtain both fast training and automated feature learning. We improved the performance of a pre-trained AlexNet by introducing batch normalization (BN) layers, and the improved AlexNet was fine-tuned on our brain MRI dataset. Thereafter, we replaced its last several layers with an ELM structure and proposed a searching approach to find the optimal number of layers to be replaced. To obtain better classification results, we optimized the weights and biases of the ELM with a novel chaotic bat algorithm, testing four different chaotic maps. All evaluation results were obtained by 5 × hold-out validation. Experimental results suggest that our system achieves good classification performance. Furthermore, our method provides a general framework that can be used in other image classification tasks.

The rest of this paper is organized as follows. Section 2 presents the brain MRIs used in our experiments. Section 3 explains the methods in detail. Section 4 describes the experimental environment and settings, and the experimental results and discussion are given in Sect. 5. Finally, Sect. 6 provides the conclusion and future work.

2 Material

The brain MRIs used in this study were obtained from the Whole Brain Atlas of Harvard Medical School (website: http://www.med.harvard.edu/AANLIB/). The key slices were selected by radiologists with over ten years of experience. Our original dataset contains 177 abnormal samples but only 28 healthy controls. To balance the two classes, we first randomly select 14 samples from each class to form the testing set, with the remaining samples forming the training set. The 14 healthy samples in the training set are then oversampled to 168 normal samples by copying each image eleven more times. In this way, both the training set and the testing set are roughly balanced. The abnormal samples cover cerebrovascular, neoplastic, degenerative, and inflammatory or infectious diseases. Some samples from our dataset are presented in Fig. 1.

Fig. 1 Some samples of our dataset
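
For clarity, the split and oversampling described above can be summarized in a short sketch (illustrative Python; the list variables, loading step and random seed are assumptions, not the authors' code):

```python
# Illustrative sketch of the split and oversampling described in Sect. 2
# (not the authors' code; image loading and variable names are assumed).
import random

def split_and_balance(abnormal, healthy, n_test=14, n_copies=11, seed=0):
    """abnormal: list of 177 images, healthy: list of 28 images."""
    rng = random.Random(seed)
    rng.shuffle(abnormal)
    rng.shuffle(healthy)

    # 14 samples per class form the testing set.
    test = abnormal[:n_test] + healthy[:n_test]

    # The rest form the training set: 163 abnormal and 14 healthy images.
    train_abnormal = abnormal[n_test:]
    train_healthy = healthy[n_test:]

    # Each healthy training image is copied 11 more times: 14 * 12 = 168.
    train_healthy = train_healthy * (1 + n_copies)

    return train_abnormal + train_healthy, test
```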

3 Methods

We propose a novel abnormal brain detection algorithm based on classical machine learning and deep learning techniques. First, a pre-trained AlexNet is modified and fine-tuned on our brain MRI dataset. Then, we substitute the last several layers of the modified AlexNet with an extreme learning machine. Finally, the extreme learning machine is optimized by a novel chaotic bat algorithm to obtain better generalization ability.

3.1 Improved AlexNet

AlexNet is one of the most well-known deep CNN structures, proposed by Krizhevsky and Sutskever [18]. AlexNet achieved high classification accuracy on the ImageNet dataset, which was a significant breakthrough in the machine learning field. Since then, researchers have put more time and effort into deep learning models, and various deep CNNs have been proposed, such as ResNet [19], GoogLeNet [20] and VGG [21], along with numerous training and optimization algorithms.

In this study, we propose to use batch normalization (BN) to improve the robustness of AlexNet for abnormal brain detection. The distribution of brain MRIs is complex because of the high variance among human brains. As a result, the input distributions of the layers in AlexNet differ from layer to layer, which makes parameter training hard and time-consuming and requires careful initialization. BN was invented to overcome this internal covariate shift. The intuition behind BN is simple: as CNNs are trained in mini-batch mode, BN applies a normalization transform to the activations of each layer so that their means and variances remain fixed. For a random variable x and its values in a mini-batch S:

$$S = \left[ {x_{1} ,x_{2} ,x_{3} , \ldots ,x_{n} } \right]$$
(1)

The mean μS and variance \(\sigma_{S}^{2}\) of x can be obtained by:

$$\mu_{S} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} x_{i}$$
(2)
$$\sigma_{S}^{2} = \frac{1}{n}\mathop \sum \limits_{i = 1}^{n} \left( {x_{i} - \mu_{S} } \right)^{2}$$
(3)

So, the normalized values \(\hat{x}_{i}\) can be obtained by:

$$\hat{x}_{i} = \frac{{x_{i} - \mu_{S} }}{{\sqrt {\sigma_{S}^{2} + \varepsilon } }}$$
(4)

where \(\varepsilon\) denotes a small constant that increases numerical stability. Nevertheless, the normalized activations may not be the desired representation of a layer in some cases, so a learnable transformation is added to the result:

$$y_{i} = \gamma \hat{x}_{i} + \alpha$$
(5)

where γ and α are two learnable scale and shift parameters.

With BN, the training of deep CNNs is accelerated and the gradients are less dependent on the initial parameter values. Furthermore, BN serves as a form of regularization, which improves the generalization ability of deep networks.
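
As an illustration of Eqs. (2)-(5), the following sketch normalizes a mini-batch of activations and applies the learnable scale and shift (a NumPy illustration, not the authors' MATLAB implementation; the mini-batch size of 40 in the example mirrors the setting in Sect. 4.2):

```python
# Sketch of the batch normalization transform in Eqs. (2)-(5).
import numpy as np

def batch_norm(x, gamma, alpha, eps=1e-5):
    """x: activations of one mini-batch, shape (n, features)."""
    mu = x.mean(axis=0)                     # Eq. (2): mini-batch mean
    var = x.var(axis=0)                     # Eq. (3): mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # Eq. (4): normalization
    return gamma * x_hat + alpha            # Eq. (5): learnable scale and shift

# Example: normalize a mini-batch of 40 activation vectors of dimension 256.
x = np.random.randn(40, 256) * 3.0 + 5.0
y = batch_norm(x, gamma=np.ones(256), alpha=np.zeros(256))
print(y.mean(axis=0)[:3], y.std(axis=0)[:3])  # approximately 0 and 1
```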

3.2 ELM

The improved AlexNet can yield good classification performance, but its classification depends on the last several layers (mostly fully connected layers). We propose to replace these layers with a more efficient classifier, the extreme learning machine, to further improve the detection accuracy. ELM is a training algorithm for the single-hidden-layer feedforward network (SLFN), proposed by Huang and Zhu [22]. An SLFN contains only three layers, namely the input layer, hidden layer and output layer, as shown in Fig. 2. Here, w and β are the input and output weights, respectively, b denotes the biases of the hidden nodes, and x and o represent the input and output, respectively.

Fig. 2 Architecture of SLFN

The advantage of ELM is that it is trained without iteration, which makes it converge much faster than the traditional BPNN [23, 24], while its generalization ability is also promising [25]. The training algorithm contains only three steps. Given a training set M:

$${\mathbf{M}} = \left[ {\left( {\varvec{x}_{1} ,\varvec{t}_{1} } \right),\left( {\varvec{x}_{2} ,\varvec{t}_{2} } \right),\left( {\varvec{x}_{3} ,\varvec{t}_{3} } \right), \ldots ,\left( {\varvec{x}_{\varvec{n}} ,\varvec{t}_{\varvec{n}} } \right)} \right]$$
(6)

where \(\varvec{x}_{i}\) represents the input vector and \(\varvec{t}_{i}\) denotes the corresponding label. ELM first initializes the input weights w and biases b with random values. Then, the hidden-layer output matrix H can be calculated:

$${\mathbf{H}} = \left[ {h_{ji} } \right]_{n \times \hat{N}} ,\quad h_{ji} = g\left( {\varvec{w}_{i} \varvec{x}_{j} + b_{i} } \right),\quad j = 1, \ldots ,n,\;i = 1, \ldots ,\hat{N}$$
(7)

where \(\hat{N}\) denotes the number of hidden nodes and \(g\)(·) denotes the activation function of the hidden layer. Finally, the target is for the ELM output to equal the actual sample labels:

$${\mathbf{H\beta }} = {\mathbf{T}}$$
(8)

where \({\mathbf{T}} = \left( {\varvec{t}_{1} ,\varvec{t}_{2} ,\varvec{t}_{3} , \ldots ,\varvec{t}_{\varvec{n}} } \right)^{\varvec{T}}\). The output weights β can then be obtained by the Moore–Penrose pseudo-inverse:

$${\varvec{\upbeta}} = {\mathbf{H}}^{\dag } {\mathbf{T}}$$
(9)

where \({\mathbf{H}}^{\dag }\) represents the Moore–Penrose pseudo-inverse of H. The training algorithm is summarized in Table 1.

Table 1 Training algorithm of ELM

From the above analysis, it is clear that ELM training is simple to implement, so ELM is widely used in practical applications such as recognition [26], prediction [27] and clustering [28]. Therefore, we employ ELM in this study to replace the last several layers for brain MRI classification. However, the random input weights and biases can impair the robustness of ELM, so we further optimize these parameters with the chaotic bat algorithm proposed below.
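
A minimal sketch of the three-step ELM training in Eqs. (6)-(9) is given below (an illustration only, not the authors' implementation; the sigmoid activation and the one-hot label encoding are assumptions):

```python
# Minimal ELM sketch following Eqs. (6)-(9): random input weights and biases,
# hidden output matrix H, and output weights by Moore-Penrose pseudo-inverse.
import numpy as np

class ELM:
    def __init__(self, n_hidden=500, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # H[j, i] = g(w_i . x_j + b_i); sigmoid activation assumed here.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, T):
        """X: (n, d) feature matrix, T: (n, classes) one-hot label matrix."""
        d = X.shape[1]
        self.W = self.rng.standard_normal((d, self.n_hidden))  # random input weights
        self.b = self.rng.standard_normal(self.n_hidden)       # random hidden biases
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T                      # Eq. (9): beta = H^+ T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta                     # Eq. (8): O = H beta
```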

3.3 SNN

We also employed the Schmidt neural network (SNN) and the random vector functional link (RVFL) network as classifiers to compare with ELM. SNN and RVFL are both random neural networks, but their structures are different. SNN was proposed by Schmidt and Kraaijveld [29] and contains three layers, as shown in Fig. 3. The weights from the input layer to the hidden layer are randomly assigned, and there are biases in both the hidden layer and the output layer. The output of SNN can be expressed as

Fig. 3 Architecture of SNN

$$\varvec{o}_{j} = \mathop \sum \limits_{i = 1}^{{\hat{N}}} \varvec{\beta}_{i} g\left( {\varvec{w}_{i} \varvec{x}_{j} + b_{i} } \right) + \varvec{b},\quad j = 1, \ldots ,n$$
(10)

where \(\hat{N}\) denotes the number of hidden nodes and \(\varvec{b}\) is the output bias. The training of SNN is similar to that of ELM: the output weights β are obtained by the pseudo-inverse.

3.4 RVFL

RVFL was proposed by Pao and Park [30] and differs from ELM and SNN. RVFL first maps the input features to an enhancement space with random weights and biases. Then, the input features and the enhanced features are concatenated to form the final feature vector. This structure resembles the shortcut connection shown in Fig. 4, similar to the modules in ResNet. Finally, the output weights β are obtained by the pseudo-inverse, as in ELM and SNN.

Fig. 4 Architecture of RVFL
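
The defining feature of RVFL, the direct link from the input features to the output weights, can be sketched as follows (an illustrative NumPy sketch; the sigmoid enhancement function and the single-pass fit are assumptions, not the authors' implementation):

```python
# RVFL sketch: random enhancement features concatenated with the original
# input features (the "direct link"); output weights by pseudo-inverse.
import numpy as np

def rvfl_fit_predict(X_train, T_train, X_test, n_enhance=500, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X_train.shape[1], n_enhance))  # random weights
    b = rng.standard_normal(n_enhance)                      # random biases

    def features(X):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # enhancement nodes
        return np.hstack([X, H])                # concatenate input + enhancement

    beta = np.linalg.pinv(features(X_train)) @ T_train
    return features(X_test) @ beta
```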

3.5 Chaotic bat algorithm

The chaotic bat algorithm (CBA) is a swarm intelligence optimization method evolved from the bat algorithm [31]. Inspired by the echolocation behavior of bats, CBA uses a set of bats carrying candidate solutions to search the solution space according to certain strategies. In every iteration, the position, velocity and frequency of each bat are updated based on the best solution found so far. The bat algorithm outperforms traditional particle swarm optimization (PSO), and we introduce chaotic maps to further improve its searching ability.

Chaotic maps are used to update the positions of the bats in our CBA. Among the various chaotic maps, we choose four for optimization: the sine map, cosine map, Gaussian map and logistic map [32]. Their formulae are presented below.


Sine map:

$$x_{k + 1} = \mu { \sin }\left( {\pi x_{k} } \right)$$
(11)

where k denotes the iteration index and μ is a parameter ranging from 0 to 1.


Cosine map:

$$x_{k + 1} = \mu { \cos }\left( {\pi x_{k} } \right)$$
(12)

where μ is a parameter ranging from 0 to 1.


Gaussian map:

$$x_{k + 1} = { \exp }\left( { - \alpha x_{k}^{2} } \right) + \beta$$
(13)

where α and β are two real-valued parameters.


Logistic map:

$$x_{k + 1} = rx_{k} \left( {1 - x_{k} } \right)$$
(14)

where r denotes a positive control parameter (typically set close to 4 to obtain chaotic behavior).
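
The four maps of Eqs. (11)-(14) can be written compactly as follows (the parameter values below are illustrative assumptions, not the settings used in this paper):

```python
# The four chaotic maps of Eqs. (11)-(14); parameter values are illustrative.
import numpy as np

def sine_map(x, mu=0.99):                   # Eq. (11)
    return mu * np.sin(np.pi * x)

def cosine_map(x, mu=0.99):                 # Eq. (12)
    return mu * np.cos(np.pi * x)

def gaussian_map(x, alpha=6.2, beta=-0.5):  # Eq. (13)
    return np.exp(-alpha * x ** 2) + beta

def logistic_map(x, r=4.0):                 # Eq. (14)
    return r * x * (1.0 - x)

# Example: iterate the logistic map from x0 = 0.3.
x = 0.3
for _ in range(5):
    x = logistic_map(x)
```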

CBA first initializes the parameters of the bats with random values. Then, in each iteration, all bats search the solution space with their velocities and update their solutions using the chaotic maps. The best solution of that iteration is obtained by sorting. The iterations continue until the stopping criterion is met. A brief diagram of CBA is given in Fig. 5.

Fig. 5 CBA optimization

3.6 BN-AlexNet-ELM-CBA

We propose an abnormal brain detection method based on the batch-normalized AlexNet, extreme learning machine and chaotic bat algorithm, abbreviated as BN-AlexNet-ELM-CBA. First, we employ a pre-trained AlexNet to extract image features from the brain MRIs. We add batch normalization layers to the AlexNet model to handle the internal covariate shift problem, and we modify the last three layers because the original output contains 1000 nodes while our brain images have only two categories: abnormal and healthy. In total, six BN layers are added to AlexNet, mainly located after the convolution and pooling layers. The original fully connected layer 'fc8' is also replaced by two new fully connected layers: the output dimension of 'drop7' is 4096 × 1 and the original 'fc8' contains 1000 nodes, whereas our abnormal brain detection is a binary problem, so we use two layers 'fc8' and 'fc9' to gradually shrink the dimensions to 256 × 1 and 2 × 1, respectively. We also construct a transfer-AlexNet (T-AlexNet) for performance comparison, which is built by removing all batch normalization layers from BN-AlexNet. The three deep CNN structures are given in Fig. 6, and a code sketch of the modification follows the figure.

Fig. 6 The structures of AlexNet and BN-AlexNet
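
For concreteness, the sketch below shows an analogous modification in PyTorch (an illustration only: the paper's implementation uses the MATLAB deep learning toolbox, and the exact positions of its six BN layers follow Fig. 6, whereas here one BN layer is inserted after each convolution):

```python
# PyTorch sketch of a BN-AlexNet-style modification (illustrative only).
import torch.nn as nn
from torchvision import models

def build_bn_alexnet(num_classes=2):
    net = models.alexnet(pretrained=True)  # ImageNet weights for transfer learning
                                           # (newer torchvision uses weights=...)

    # Insert a BatchNorm2d layer after every Conv2d in the feature extractor.
    layers = []
    for layer in net.features:
        layers.append(layer)
        if isinstance(layer, nn.Conv2d):
            layers.append(nn.BatchNorm2d(layer.out_channels))
    net.features = nn.Sequential(*layers)

    # Replace the original 1000-way output layer with 'fc8' (4096 -> 256)
    # and 'fc9' (256 -> 2) for the binary abnormal/healthy problem.
    net.classifier[6] = nn.Sequential(
        nn.Linear(4096, 256),
        nn.ReLU(inplace=True),
        nn.Linear(256, num_classes),
    )
    return net
```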

Then, the last several layers of BN-AlexNet are replaced by an ELM classifier. To determine the optimal number of layers to be substituted, we propose to search for it based on classification performance: we test the accuracy of our system with n replaced layers under 5 × hold-out validation and select the best setting. The searching algorithm is given in Table 2, and an illustrative sketch follows it.

Table 2 Searching algorithm for the optimal layers to be replaced
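
The search in Table 2 can be sketched as follows (illustrative Python; the candidate values and the caller-supplied `evaluate` function are assumptions standing in for one hold-out run of the BN-AlexNet-ELM system):

```python
# Sketch of the search for the optimal number of replaced layers (Table 2).
# `evaluate(n, run)` is a caller-supplied (hypothetical) function that replaces
# the last n layers of BN-AlexNet with the ELM classifier and returns the
# testing accuracy of one hold-out run.
def search_replaced_layers(evaluate, candidates=(2, 3, 4, 5), n_runs=5):
    best_n, best_acc = None, -1.0
    for n in candidates:
        # 5 x hold-out validation: average accuracy over n_runs repetitions.
        mean_acc = sum(evaluate(n, run) for run in range(n_runs)) / n_runs
        if mean_acc > best_acc:
            best_n, best_acc = n, mean_acc
    return best_n, best_acc
```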

In the chaotic bat algorithm optimization of the ELM, each bat encodes the input weights w and biases b of the ELM. The fitness function f(·) of CBA is the squared error between the predicted labels and the actual labels:

$$f\left( {\varvec{w},\varvec{b}} \right) = \mathop \sum \limits_{i = 1}^{n} \left( {\varvec{o}_{i} - \varvec{t}_{i} } \right)^{2}$$
(15)

where \(\varvec{o}_{i}\) and \(\varvec{t}_{i}\) stand for the ELM output and the image label, respectively, and n denotes the number of training samples. The solutions carried by the bats are updated with their velocities and the chaotic maps:

$$x_{i}^{t} = x_{i}^{t - 1} + v_{i}^{t} + \lambda \times {\text{chaotic}}\left( {x_{i}^{t - 1} } \right)$$
(16)

where \(x_{i}^{t}\) denotes the solution of the i-th bat in the t-th iteration and λ is a weighting parameter ranging from 0 to 1; in this paper, λ is set to 0.3. All evaluation is carried out with 5 × hold-out validation, i.e., we run the systems five times and calculate the average classification performance for comparison. The pseudocode of our BN-AlexNet-ELM-CBA is presented in Table 3, and a brief diagram is illustrated in Fig. 7. Our method provides a general framework built on off-the-shelf deep learning models, and the system can be applied to other image classification tasks with simple parameter tuning. A condensed code sketch of the CBA-ELM optimization follows Fig. 7.

Table 3 Training of BN-AlexNet-ELM-CBA
Fig. 7 Flowchart of our BN-AlexNet-ELM-CBA
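
A condensed sketch of the CBA optimization of the ELM parameters, following Eqs. (15) and (16), is given below (an illustration only: the loudness and pulse-rate updates of the full bat algorithm are omitted, and the sigmoid activation, logistic map and parameter ranges are assumptions):

```python
# Condensed sketch of CBA-optimized ELM following Eqs. (15)-(16).
import numpy as np

def chaotic(x, r=4.0):
    # Logistic map applied element-wise after rescaling the values to [0, 1].
    x01 = (x - x.min()) / (x.max() - x.min() + 1e-12)
    return r * x01 * (1.0 - x01)

def fitness(params, X, T, n_hidden):
    # Eq. (15): squared error between ELM outputs and labels for given (w, b).
    d = X.shape[1]
    W = params[: d * n_hidden].reshape(d, n_hidden)
    b = params[d * n_hidden:]
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.pinv(H) @ T
    return np.sum((H @ beta - T) ** 2)

def cba_optimize(X, T, n_hidden=500, n_bats=20, n_iter=5, lam=0.3, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1] * n_hidden + n_hidden        # flattened (w, b) per bat
    pos = rng.standard_normal((n_bats, dim))
    vel = np.zeros_like(pos)
    fit = np.array([fitness(p, X, T, n_hidden) for p in pos])
    best, best_fit = pos[fit.argmin()].copy(), fit.min()
    for _ in range(n_iter):
        freq = rng.uniform(0.0, 1.0, (n_bats, 1))
        vel = vel + (pos - best) * freq           # frequency-based velocity update
        pos = pos + vel + lam * chaotic(pos)      # Eq. (16): chaotic position update
        fit = np.array([fitness(p, X, T, n_hidden) for p in pos])
        if fit.min() < best_fit:                  # keep the best solution found so far
            best, best_fit = pos[fit.argmin()].copy(), fit.min()
    return best
```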

4 Experiment

We implemented our BN-AlexNet-ELM-CBA in MATLAB 2018a with the deep learning toolbox. The experiments were run on a laptop with an i7-7700HQ CPU, a GTX 1060 GPU and 16 GB of RAM.

4.1 Dataset

We obtained 359 samples in total and used 331 for training and 28 for testing. The training set contains 163 abnormal samples and 168 normal controls, and the testing set contains 14 images per class. The dataset information is listed in Table 4.

Table 4 Dataset information

4.2 Hyper-parameter settings

We added six batch normalization layers to AlexNet and modified its last three layers to obtain BN-AlexNet. The two fully connected layers 'fc8' and 'fc9' contained 256 and 2 nodes, respectively. BN-AlexNet was trained on our brain images with the stochastic gradient descent with momentum (SGDM) algorithm, using a mini-batch size of 40, a maximum of 3 epochs and a learning rate of 1e-4. T-AlexNet was trained with the same settings as BN-AlexNet.

ELM is a simple structure with only one hyper-parameter, the number of hidden nodes, which we set to 500 considering the input dimension. Finally, the hyper-parameters of CBA were determined: the bat population size was 20 and the maximum number of iterations was 5. The values of all hyper-parameters are provided in Table 5.

Table 5 Hyper-parameter settings

4.3 Evaluation measurements

Six widely used measurements were employed to evaluate the classification performance of our method and to compare it with state-of-the-art approaches: sensitivity, specificity, accuracy, precision, F1 score and the Matthews correlation coefficient (MCC). They can be computed by the following equations:

$${\text{Sensitivity}} = \frac{TP}{TP + FN}$$
(17)
$${\text{Specificity}} = \frac{TN}{TN + FP}$$
(18)
$${\text{Accuracy}} = \frac{TP + TN}{TP + TN + FP + FN}$$
(19)
$${\text{Precision}} = \frac{TP}{TP + FP}$$
(20)
$${\text{F}}1 {\text{score}} = 2 \times \frac{{{\text{Precision }} \times {\text{Sensitivity}}}}{{{\text{Precision }} + {\text{Sensitivity}}}}$$
(21)
$${\text{MCC}} = \frac{{{\text{TP}} \times {\text{TN}} - {\text{FP}} \times {\text{FN}}}}{{\sqrt {\left( {{\text{TP}} + {\text{FP}}} \right) \times \left( {{\text{TP}} + {\text{FN}}} \right) \times \left( {{\text{TN}} + {\text{FP}}} \right) \times \left( {{\text{TN}} + {\text{FN}}} \right) } }}$$
(22)

where TP and FN denote the numbers of abnormal samples correctly and incorrectly classified, respectively, and TN and FP represent the numbers of healthy samples correctly and incorrectly classified, respectively.
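
The six measurements of Eqs. (17)-(22) can be computed directly from the confusion-matrix counts, as the following sketch shows (the example counts are hypothetical, not results from this paper):

```python
# Computing the six measurements of Eqs. (17)-(22) from confusion-matrix counts.
import math

def metrics(tp, tn, fp, fn):
    sensitivity = tp / (tp + fn)                                  # Eq. (17)
    specificity = tn / (tn + fp)                                  # Eq. (18)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                    # Eq. (19)
    precision = tp / (tp + fp)                                    # Eq. (20)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (21)
    mcc = (tp * tn - fp * fn) / math.sqrt(                        # Eq. (22)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sensitivity, specificity, accuracy, precision, f1, mcc

# Hypothetical example with a balanced testing set of 14 abnormal and 14 healthy images.
print(metrics(tp=13, tn=14, fp=0, fn=1))
```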

5 Results and discussion

5.1 Performance of the proposed method

The classification performance of T-AlexNet and BN-AlexNet is presented in Tables 6 and 7, respectively, and the classification results of BN-AlexNet-ELM and BN-AlexNet-ELM-CBA are provided in Tables 8 and 9, respectively. The training time of T-AlexNet was 18 s for a single run. A comparison of the four methods is provided in Table 10 and Fig. 8. BN-AlexNet achieved a sensitivity of 78.57%, a specificity of 97.14% and an overall accuracy of 87.86%, which was better than T-AlexNet in terms of specificity and accuracy; thus, the introduction of batch normalization did improve the classification performance of AlexNet. The performance of BN-AlexNet-ELM was better than that of BN-AlexNet, with an accuracy of 92.86%, and BN-AlexNet-ELM-CBA outperformed BN-AlexNet-ELM with an accuracy of 96.43%. The CBA optimization contributed to the high accuracy of BN-AlexNet-ELM-CBA. Although plain ELM training is extremely fast and finishes within 0.03 s, ELM-CBA training converges within 3 s, which is still acceptable for real-world applications. The number of replaced layers was 3 and the chaotic map was the Gaussian map in our BN-AlexNet-ELM-CBA; detailed analysis is provided in the following sections.

Table 6 Classification performance of T-AlexNet
Table 7 Classification performance of BN-AlexNet
Table 8 Classification performance of BN-AlexNet-ELM
Table 9 Classification performance of BN-AlexNet-ELM-CBA
Table 10 Classification performance comparison of our proposed four methods
Fig. 8 Comparison of the four proposed methods

5.2 Optimal numbers of layers to be replaced

The number of layers to be replaced can make a great difference in our system, because the input dimension of the ELM changes with the number of replaced layers; as a result, the training and testing results of the ELM vary. Therefore, we searched for the optimal number of replaced layers by evaluating the classification performance of our system for each setting. The statistics, averaged over 5 × hold-out validation, are shown in Table 11. Our system achieved over 90% accuracy except with 2 replaced layers, because in that case the feature dimension was only two and most information was lost. BN-AlexNet-ELM-CBA performed best with 3 replaced layers in terms of F1 score and MCC. Although the specificity was highest with 5 replaced layers, the sensitivity was only 90.00%; additionally, 4096 features are far more than 256 features, which inevitably increases the computational complexity. Therefore, the optimal number of layers to be replaced is 3 in this study.

Table 11 Performance of BN-AlexNet-ELM-CBA with different number of replaced layers

5.3 Optimal chaotic map

In this study, we tested the performance of our BN-AlexNet-ELM-CBA with four different chaotic maps (sine, cosine, Gaussian and logistic) based on 5 × hold-out validation. The results are given in Table 12, where 'No map' means the ELM was trained with the bat algorithm but without any chaotic map. We can see that the introduction of chaotic maps generally improves the classification performance, with the exception of the cosine map. The improvement was most obvious in specificity, from 87.14% to the best value of 95.71%, obtained by the Gaussian and cosine maps. The accuracy of BN-AlexNet-ELM-CBA with the Gaussian map was marginally better than with the logistic map, and it also achieved the best F1 score and MCC. Therefore, the Gaussian map was selected as the optimal chaotic map in our method: it provides a better chaotic mechanism for CBA optimization, so the ELM achieves higher generalization ability.

Table 12 Performance of BN-AlexNet-ELM-CBA with different chaotic maps

5.4 Comparison of three classifiers

We compared the performance of ELM, SNN and RVFL using the same image features from BN-AlexNet and the same CBA optimization, i.e., BN-AlexNet-ELM-CBA, BN-AlexNet-SNN-CBA and BN-AlexNet-RVFL-CBA. The statistics, obtained by 5 × hold-out validation, are presented in Table 13. All three methods achieved over 90% accuracy. The sensitivity of BN-AlexNet-SNN-CBA and BN-AlexNet-RVFL-CBA was 98.57%, marginally better than that of BN-AlexNet-ELM-CBA, but the specificity of BN-AlexNet-ELM-CBA was the best of the three. Moreover, BN-AlexNet-ELM-CBA yielded over 90% on all six measurements, so we consider it marginally better than the other two methods.

Table 13 Performance of different classifier structures

5.5 Comparison with state-of-the-art methods

We compared our BN-AlexNet-ELM-CBA with state-of-the-art methods for detecting abnormal brains in MRI, including RBFNN [33], CNN [34], GA [6] and SVM [7]. The detailed information is listed in Table 14. SVM yielded the best sensitivity and CNN achieved the best specificity; however, the gap between sensitivity and specificity of these two methods was relatively large, which resulted in lower accuracy. Our BN-AlexNet-ELM-CBA was marginally worse than SVM in sensitivity and than CNN in specificity, but achieved the best accuracy among all the methods. Meanwhile, BN-AlexNet-ELM-CBA was also robust because of the small differences among the three measurements (Fig. 9).

Table 14 Performance comparison
Fig. 9 Comparison with state-of-the-art methods

6 Conclusion

In this study, we proposed four abnormal brain detection methods for brain MRI: T-AlexNet, BN-AlexNet, BN-AlexNet-ELM and BN-AlexNet-ELM-CBA. Experimental results revealed that BN-AlexNet-ELM-CBA was the best of the four, with a sensitivity of 97.14%, a specificity of 95.71% and an overall accuracy of 96.43%. Our method leverages the feature learning ability of deep neural networks and swarm intelligence for ELM optimization to achieve classification on a small dataset. The introduction of batch normalization and the chaotic bat algorithm improved the generalization ability of our system. Moreover, our method provides a general framework to search for the optimal feature layers in deep CNN models, which is applicable to other image classification tasks.

However, our system can only classify brain images as abnormal or healthy; multi-class classification is more useful in clinical diagnosis and is one of our future research directions. We shall also collect more samples and build a bigger dataset to re-test our method, because deep models generally perform better with larger training sets. In addition, we shall adopt other swarm intelligence algorithms to optimize the ELM in the future.