
1 Introduction

Alzheimer’s disease (AD) is a progressive neurological disorder caused by damage to nerve cells in parts of the brain [1]. It begins with loss of memory and difficulty with language and other cognitive functions, leaving patients unable to perform day-to-day activities. Researchers have found that AD is not only the most common cause of dementia but also eventually leads to death, which has made it a major focus of research [1] (Fig. 1).

Fig. 1 Proposed deep learning flow for the classification of Alzheimer’s disease into AD, MCI and NC

According to the Alzheimer’s Association, AD is the sixth leading cause of death in the USA [2]. A survey [3] estimated that 131.5 million people will be living with dementia worldwide, and people older than 65 are at higher risk of the disease. The brain regions responsible for thinking, memory and reasoning wrinkle and shrink, particularly in the hippocampus; this shrinkage is the main hallmark of AD. The comparison of an AD brain with a healthy brain in Fig. 2 shows how the regions serving memory and language have diminished. Genetic mutation is another cause of AD, estimated to account for about 1% or less of cases [4].

Fig. 2 Brain cross section visualizing the difference between a healthy brain and an Alzheimer’s brain [5]

An early diagnosis of the disease is therefore crucial and requires a thorough clinical assessment based on the patient’s medical history, several neuropsychological tests such as the mini-mental state examination (MMSE), the neuropsychiatric inventory questionnaire and the clinical dementia rating, and other pathological evaluations [6]. In addition to these clinical methods, there are many other techniques to detect AD, such as biomarkers, cerebrospinal fluid (CSF) analysis, brain imaging including magnetic resonance imaging (MRI) and positron emission tomography (PET), and the analysis of proteins in blood.

A traditional approach to AD classification for assisting diagnosis was to generate wavelet features using the discrete wavelet transform (DWT). This alone does not detect the disease; machine learning algorithms are required for further processing [7].

Machine learning approaches are well suited to accurate classification of AD [8]. The most popular among them is the support vector machine (SVM). Rathore et al. [9] proposed a framework that uses a feature-ranking method with an SVM to classify subjects as AD or healthy controls (HC). The SVM is used to build predictive classification models from informative, high-dimensional features extracted from MRI [9]. However, this requires handcrafted features of brain structures, which is laborious and time consuming and calls for expert knowledge.

An alternative family of machine learning methods is deep learning. Deep learning algorithms perform automatic feature extraction without human intervention. With many hidden layers at their disposal, deep learning approaches learn high-level representations from raw data, which is the reason for their popularity in the computer vision domain. Ortiz et al. [10] discussed several deep learning architectures for early diagnosis of AD. Convolutional neural networks (CNNs) are inspired by the human visual cortex and learn features ranging from simple edges to more complex patterns through their dense hierarchical structure. Their building blocks are convolution and pooling layers: the convolutional layer produces feature maps by convolving the input image with kernels, and the pooling layer downsamples the image while preserving the salient features [11]. This has stimulated many neuroscience researchers to apply such models to problems in neuroimaging. Shi et al. [12] described multimodal classification of AD with four classes. Stacked autoencoders (SAEs) are used for feature learning on both MRI and PET; these features are fused and trained using an SVM, which achieved lower accuracy than other available multimodal classification methods. Cui et al. [13] addressed sequential analysis of MRI images along the time axis by measuring longitudinal progression: a multi-layer perceptron (MLP) extracts spatial features, and a recurrent neural network (RNN) is trained on them. However, this algorithm requires rigid segmentation as a preprocessing task; the accuracy achieved is 89.69% for two-way classification into AD and NC. Islam et al. [14] proposed a DCNN model for four classes, in which five DCNN models are trained and their output features are fused to predict the disease. The strength of this approach is that every model contributes features different from the others, making the ensemble generalize to unseen data with an accuracy of 93.18%.

Many works apply CNN methods to the detection of AD. Gunawardena et al. [15] addressed pre-detection of AD for three classes, achieving an accuracy of 84.4%. A combination of CNN and RNN is a newer approach to AD diagnosis, proposed by Liu et al. [16]: 3D PET images are sliced into 2D images on which a CNN is trained, and an RNN then classifies the CNN features, with 91.2% accuracy for one-vs-all classification of three classes. Khvostikov et al. [17] used a fusion of structural MRI and mean diffusivity-diffusion tensor imaging (MD-DTI) on the hippocampal ROI for AD diagnosis. Wang et al. [18] proposed an eight-layer CNN with different types of activation functions and pooling operations for two classes, achieving an accuracy of 97.65%.

In this work, a DCNN-based framework is implemented for the detection of AD, using a three-layer CNN model for three classes: AD, MCI and NC. The work does not rely on any rigid segmentation of grey matter (GM) in the data. Experimental evaluation is performed in terms of accuracy and loss on both the training and the testing dataset. The rest of the paper describes the methodology and implementation in Sect. 2, experiments and results in Sect. 3 and the conclusion in Sect. 4.

2 Proposed Methodology

Figure 1 shows the flow of the proposed methodology, which consists of two steps: preprocessing and network training. A detailed description follows in the sections below.

2.1 Data Preprocessing

Medical images acquired from imaging equipment come in the Digital Imaging and Communications in Medicine (DICOM) format. After acquisition, they need to be converted into a suitable format (JPG, PNG, TIFF, etc.) for further processing. In our work, we converted the MRI samples into JPEG slices using MATLAB. The pixel depth of each sample is reduced from 14-bit to 8-bit by rescaling intensities to the range 0–255. Processed slices for each class are shown in Fig. 3.
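The same conversion can be sketched in Python (the paper used MATLAB); pydicom and Pillow stand in for the MATLAB toolchain here, and the file names are hypothetical:

```python
# Equivalent Python sketch of the MATLAB preprocessing step: read a DICOM
# slice, rescale its (e.g., 14-bit) intensities to 8-bit (0-255), and save
# the result as a JPEG.
import numpy as np
import pydicom
from PIL import Image

def dicom_to_jpeg(dicom_path, jpeg_path):
    ds = pydicom.dcmread(dicom_path)             # load the DICOM file
    pixels = ds.pixel_array.astype(np.float32)   # raw high-bit-depth intensities
    span = pixels.max() - pixels.min()
    if span == 0:                                # guard against a constant image
        span = 1.0
    img8 = ((pixels - pixels.min()) / span * 255.0).astype(np.uint8)
    Image.fromarray(img8).save(jpeg_path, format="JPEG")

dicom_to_jpeg("subject001_slice01.dcm", "subject001_slice01.jpg")  # hypothetical files
```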

Fig. 3 JPEG slices for each diagnosis class after preprocessing (left: AD, centre: MCI, right: NC)

2.2 Network Architecture

Convolutional neural networks (CNNs) are inspired by the human visual system, in which small populations of neurons are sensitive to specific receptive fields; for example, some neurons fire only in the presence of edges at a particular orientation. CNNs mimic this behaviour. The convolution layer automatically extracts feature maps from the input image through element-wise multiplication of a filter with every position of the image. The pooling layer is generally used to counter overfitting, i.e., the situation in which the network memorizes the data instead of generalizing. The rectified linear unit (ReLU) activation decides whether a neuron fires, determining the output passed on through the network. Feature maps are extracted by a cascade of such Conv-ReLU-Pool operations before reaching one or more final fully connected layers. The detailed operation of the proposed architecture is as follows.
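As a toy illustration of these three operations (not the paper's code), the following numpy sketch convolves a small array with an edge filter, applies ReLU and max-pools the result:

```python
# Toy numpy illustration of the Conv-ReLU-Pool sequence: a 3x3 filter slides
# over the input (stride 1, no padding), ReLU zeroes negative responses, and
# 2x2 max pooling downsamples the feature map.
import numpy as np

def conv2d(image, kernel):
    h, w = image.shape
    k = kernel.shape[0]
    out = np.zeros((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise multiply the window with the kernel and sum
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0)

def max_pool(x, size=2):
    h, w = x.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

image = np.random.rand(8, 8)                                   # toy "MRI slice"
edge_kernel = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]])   # vertical-edge detector
feature_map = max_pool(relu(conv2d(image, edge_kernel)))
print(feature_map.shape)  # (3, 3): 8 -> 6 after conv, 6 -> 3 after pooling
```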

Our proposed model, shown in Fig. 4, is a stack of three blocks; each block comprises several layers performing three basic operations, which are:

Fig. 4 Architecture of proposed model

  • Convolution

  • ReLU activation

  • Max pooling

The architecture consists of three sets of convolutional and max pooling layers, followed by a flattening of the convolutional features, then two fully connected layers and finally a softmax classifier. The output has three classes: Alzheimer’s disease (AD), normal control (NC) and mild cognitive impairment (MCI). The input to the architecture is a 256 × 256 grayscale image, which passes through the first convolutional layer of 32 feature maps with 3 × 3 filters, a stride of one and zero padding, with ReLU activation. The image dimensions change from 256 × 256 × 1 to 254 × 254 × 32, according to the dimension formula given below:

$$ n^{[l]} = \left\lfloor \frac{n^{[l-1]} + 2p^{[l]} - f^{[l]}}{s^{[l]}} \right\rfloor + 1 $$
(1)

where n is the size of the input (or previous layer’s) image, p refers to padding, s refers to stride, l refers to the current layer and f refers to the filter size; the floor accounts for strides that do not divide the input evenly. The network then applies a max pooling layer with a 3 × 3 filter and a stride of three, reducing the image dimension to 84 × 84 × 32. Next, a second convolutional layer with 64 feature maps of size 3 × 3 and a stride of one reduces the dimensions to 82 × 82 × 64; max pooling with a 3 × 3 filter then reduces them to 27 × 27 × 64. A third convolutional layer with 128 feature maps of size 3 × 3 and stride one reduces the dimensions to 25 × 25 × 128, and with max pooling they become 8 × 8 × 128. Flattening then yields 8192 features.
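Eq. (1) can be applied layer by layer to check the dimensions quoted above; a pooling stride of three is assumed here, since it is the value that reproduces the reported sizes 84, 27 and 8:

```python
# Applying Eq. (1) layer by layer to reproduce the spatial dimensions above.
def out_size(n, f, p=0, s=1):
    """Output spatial size of a conv/pool layer, per Eq. (1)."""
    return (n + 2 * p - f) // s + 1

n = 256
for kind, f, s in [("conv", 3, 1), ("pool", 3, 3),
                   ("conv", 3, 1), ("pool", 3, 3),
                   ("conv", 3, 1), ("pool", 3, 3)]:
    n = out_size(n, f, s=s)
    print(f"{kind}: {n} x {n}")
# prints: conv 254, pool 84, conv 82, pool 27, conv 25, pool 8
```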

The fifth layer is a fully connected layer with 256 units; each of the 256 units is connected to all 8192 (8 × 8 × 128) values of the flattened fourth layer. The sixth layer is another fully connected layer with 256 units. Finally, a fully connected softmax output layer produces the three class probabilities for AD, MCI and NC. The specification of the proposed model is shown in Table 1.
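A minimal Keras sketch consistent with this description might look as follows; the placement of dropout and the ReLU activations in the dense layers are assumptions, since the text does not pin them down:

```python
# Sketch of the described architecture: three Conv-ReLU-MaxPool blocks,
# flatten, two dense layers, softmax over AD/MCI/NC.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(256, 256, 1)),  # 254x254x32
    layers.MaxPooling2D(pool_size=(3, 3)),                                    # 84x84x32
    layers.Conv2D(64, (3, 3), activation="relu"),                             # 82x82x64
    layers.MaxPooling2D(pool_size=(3, 3)),                                    # 27x27x64
    layers.Conv2D(128, (3, 3), activation="relu"),                            # 25x25x128
    layers.MaxPooling2D(pool_size=(3, 3)),                                    # 8x8x128
    layers.Flatten(),                                                         # 8192 features
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.2),                     # 0.2 drop probability, per Sect. 3.2/3.3
    layers.Dense(256, activation="relu"),
    layers.Dense(3, activation="softmax"),   # AD, MCI, NC
])
model.summary()
```

Note that Keras’ MaxPooling2D defaults its stride to the pool size, so pool_size=(3, 3) gives the stride of three assumed above.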

Table 1 Specification of proposed model

3 Experiments and Results Discussion

3.1 Dataset

The dataset used in this work was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [19]. It consists of 110 AD, 105 MCI and 51 NC subjects, with 44 to 50 image samples per subject; the 110 AD subjects were collected from Horizon Imaging Centre [20]. In total, 9540 images are used for training the network and 4193 for testing. Data augmentation is limited to a rescale operation on the images.
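Assuming the JPEG slices are organised into one folder per class (a hypothetical layout; the paper does not state one), the rescale-only pipeline could be set up in Keras as:

```python
# Data pipeline sketch: per-class folders of JPEG slices, rescaling as the
# only augmentation, batch size from Sect. 3.3.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)  # rescale is the only augmentation used

train_gen = datagen.flow_from_directory(
    "data/train",                # hypothetical layout: data/train/{AD,MCI,NC}/*.jpg
    target_size=(256, 256),
    color_mode="grayscale",
    class_mode="categorical",    # three classes: AD, MCI, NC
    batch_size=10)

test_gen = datagen.flow_from_directory(
    "data/test",
    target_size=(256, 256),
    color_mode="grayscale",
    class_mode="categorical",
    batch_size=10)
```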

3.2 Hyper Parameters

Deep learning approaches advocate solving a problem end to end rather than breaking it into parts. As a consequence, it is hard to interpret the reasoning behind a result, because of the opaque collective behaviour of the neurons in the dense network structure. Hence, to improve the performance of the model, the following parameters are available for tuning:

  • Loss function

  • Optimizers

  • Layers

  • Dropout

  • Epochs

  • Augmentation

The loss function is the guide to the terrain, telling the optimizer when it is moving in the right or wrong direction. Optimizers shape the model into its most accurate possible form by adjusting the weights. Dropout prevents the network from overfitting by deliberately disabling neurons with some probability; a drop probability of 0.2 is normally preferred. Data augmentation artificially increases the size of the training set and acts as a regularizer for the network. By tuning the number of layers and epochs, the accuracy of the model can be increased.
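These knobs map directly onto Keras objects; for instance, the optimizers compared in Sect. 3.3 can be instantiated as below (only the SGD learning rate and decay are stated in the text, the rest use defaults, and argument names vary slightly across Keras versions):

```python
# Candidate optimizers evaluated in Sect. 3.3, instantiated in Keras.
from tensorflow.keras import optimizers

candidate_optimizers = {
    "adam": optimizers.Adam(),
    "sgd": optimizers.SGD(learning_rate=0.0001, decay=0.00001),
    "adagrad": optimizers.Adagrad(),
    "nadam": optimizers.Nadam(),
    "adadelta": optimizers.Adadelta(),
    "rmsprop": optimizers.RMSprop(),
}
```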

3.3 Experimental Setup and Evaluation

The proposed model is implemented with the Keras library on a TensorFlow backend. The experiments are conducted on a Dell Intel Core i7 laptop with 8 GB RAM, and the model is trained on an NVIDIA GeForce 540M GPU. ReLU activation is applied to each neuron of the CNN, and the output is classified as AD, MCI or NC. A total of 9540 images are used for training the network and 4193 for testing. The loss function used is binary cross-entropy, with a batch size of 10. The optimizers evaluated are Adam, SGD (lr = 0.0001, decay = 0.00001), Adagrad, Nadam, Adadelta and RMSprop. Dropout is kept at a probability of 0.2, the dense output activation is softmax, and the network is trained for 10 epochs.
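Under these settings, and reusing the model and generators from the earlier sketches, the training step could be written as:

```python
# Training setup as described above: binary cross-entropy loss (as stated in
# the text; categorical cross-entropy would be the more conventional choice
# for three classes), Adagrad optimizer, batch size 10, 10 epochs.
model.compile(optimizer="adagrad",
              loss="binary_crossentropy",
              metrics=["accuracy"])

history = model.fit(train_gen,
                    epochs=10,
                    validation_data=test_gen)
```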

Table 2 shows the results of the proposed DCNN model. Performance is evaluated in terms of accuracy and loss for both the training and the validation set; the loss gives the best indication of how well the model fits. Of all the optimizers, Adagrad gave the best accuracy with the least loss, because it needs no manual tuning of the learning rate: it makes small updates for frequent parameters and large updates for infrequent ones. The accuracy-versus-epoch and loss-versus-epoch curves for both the training and validation sets are shown in Fig. 5. For the training set, as the accuracy reaches 98.57%, the loss drops towards zero; this measures the progress during training, while the validation set measures the quality of the model. The validation accuracy reaches 87.72%, meaning that the model can predict the diagnosis on new data with 87.72% accuracy.
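Curves like those in Fig. 5 can be reproduced from the history object returned by model.fit in the previous sketch; note that older Keras versions use the key "acc" instead of "accuracy":

```python
# Plot accuracy and loss versus epoch for the training and validation sets,
# mirroring Fig. 5.
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(history.history["accuracy"], label="train")
ax1.plot(history.history["val_accuracy"], label="validation")
ax1.set(xlabel="epoch", ylabel="accuracy")
ax1.legend()
ax2.plot(history.history["loss"], label="train")
ax2.plot(history.history["val_loss"], label="validation")
ax2.set(xlabel="epoch", ylabel="loss")
ax2.legend()
plt.show()
```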

Table 2 Performance of proposed framework
Fig. 5 Accuracy and loss on the training and validation sets over 10 epochs with the Adagrad optimizer and the proposed model

We experimented on our dataset with different optimizers using the same setup discussed at the beginning of Sect. 3.3. Although the highest accuracy is achieved by Adam, Adadelta and Nadam, as can be seen from Table 3, their validation loss increases steadily; i.e., the model starts to fit noise and begins to overfit, which ultimately destroys its ability to predict on new data.

Table 3 Performance of proposed framework for different optimizers

A performance comparison of the proposed model with other approaches, along with their techniques and data modalities, is given in Table 4. Among all the approaches, our proposed model achieves an accuracy as high as 98.57% without any pre-learnt features.

Table 4 Performance comparison of proposed framework with other approaches

A sample illustration of the Conv-ReLU-Maxpool operations on a single JPEG image of size 320 × 240 × 3 is shown in Fig. 6. A single feature extracted by a convolution with a 3 × 3 filter is shown in Fig. 6b. The ReLU activation and the size reduction from the maxpool operation are highlighted in Fig. 6c, d. Figure 6e shows the whole Conv-Maxpool-ReLU-Maxpool pipeline. From the illustration, one can visualize how the feature maps are produced by the CNN model and how the feature sizes shrink at every layer. This experiment was conducted in the Spyder IDE with the Keras library, and the images are visualized with the help of a computer vision library.
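A hedged sketch of such a visualization, using a truncated Keras model to expose the first layer's activations (the layer index and image path are illustrative, not the paper's exact script):

```python
# Visualize intermediate feature maps: build a model that outputs the
# activations of the first Conv-ReLU layer, then plot a few of its 32 maps.
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

img = image.load_img("sample_slice.jpg", target_size=(256, 256),
                     color_mode="grayscale")
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)  # shape (1, 256, 256, 1)

activation_model = Model(inputs=model.input, outputs=model.layers[0].output)
feature_maps = activation_model.predict(x)                   # shape (1, 254, 254, 32)

for i in range(8):                                           # first 8 of 32 maps
    plt.subplot(2, 4, i + 1)
    plt.imshow(feature_maps[0, :, :, i], cmap="gray")
    plt.axis("off")
plt.show()
```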

Fig. 6 Illustration of Conv-ReLU-Maxpool operations on a sample JPEG image

4 Conclusion

In this work, we have presented a framework based on a deep convolutional neural network for Alzheimer’s disease detection. We achieved 98.57% training accuracy on our dataset without using any handcrafted features for training the network, and a validation accuracy of 87.72%. The experimental data were obtained from ADNI, with a total of 13,733 images from 266 subjects. Further studies will focus on additional performance measures such as specificity, sensitivity, recall and F1-score while improving the DCNN model, and further experiments will test disease prediction under different hyperparameter tunings. As a future enhancement, we will use a support vector machine to classify the DCNN features.