Introduction

Alzheimer’s disease (AD) is an incurable neurodegenerative disorder with an unrelenting progression that affects a person’s memory and cognitive abilities. AD pathogenesis is believed to be triggered by the overproduction of amyloid-β (Aβ) [1, 2] and the hyperphosphorylation of tau protein [3, 4]. This results in the accumulation of Aβ plaques and tau neurofibrillary tangles, which disrupt nucleocytoplasmic transport in neurons and lead to cell death. The hippocampus [5] is the first region affected by the disease. Because the hippocampus is associated with memory and learning, memory loss is one of the early symptoms of AD. The exact cause of AD is unknown, although in some cases it is believed to be genetic.

Dementia is a general term for memory-related neurological disorders, and Alzheimer’s disease is its most common type. According to the World Alzheimer’s Report 2015 [6], approximately 50 million people are living with dementia, with AD accounting for 70–80% of cases. It has been estimated that by 2050, 131.5 million people worldwide will be suffering from AD. The global prevalence of AD is alarming: every 3 s a person develops it [7]. AD also ranks 6th among the leading causes of death in the aging population. The total estimated worldwide cost of combating the disease in 2015 was $818 billion [6]. The cost of AD is expected to reach a trillion dollars by 2019 and is estimated to reach 2 trillion dollars by 2030 [7]. The percentage of people with AD increases with age: 3% of people aged 65–74, 17% of people aged 75–84, and 32% of people aged 85 or older have Alzheimer’s disease [8].

AD is a progressive disorder that starts with mild symptoms and worsens over time. Researchers believe that Alzheimer’s-related brain changes may begin 20 years or more before any symptoms of AD appear [8]. The stages of the disease are termed cognitively normal (CN), significant memory concern (SMC), early mild cognitive impairment (EMCI), mild cognitive impairment (MCI), late mild cognitive impairment (LMCI), and Alzheimer’s disease (AD) [9]. CN subjects show normal signs of aging with no signs of depression or dementia. In SMC, the subject has normal cognitive function but shows slight memory concerns: subjects retain older memories but face difficulties in forming and retaining new ones. EMCI, MCI, and LMCI are the stages during which the disease has progressed and starts to affect daily life activities. The patient shows symptoms including loss of motor function, speech difficulties, memory concerns, and loss of the ability to read and write. Levels of MCI are determined by the Wechsler Memory Scale (WMS) neuropsychological test [10]. AD is the advanced and final stage of the disease, leading to death. There is no cure for AD, but the right medication and proper care can help to manage symptoms, and cognitive decline can be slowed down in the early stages of the disease. Therefore, early-stage detection of AD is highly desirable in order to improve patients’ quality of life and to support the development of drug trials.

In recent years, the growth of neurodegenerative disorders such as AD has attracted much interest from researchers worldwide in developing high-performing methods for diagnosis, treatment, preventive therapies, and targeted drug discovery by studying the pathological processes associated with each stage of AD [11]. The rate of progression of AD varies from patient to patient, and individuals may show different symptoms in a given disease stage [12]. This makes the classification of AD stages a challenging task for diagnosis and prognosis. New research developments have made it possible to diagnose AD using advanced diagnostic tools and biomarker tests. Various invasive and non-invasive neuroimaging technologies are used for AD diagnosis, such as structural Magnetic Resonance Imaging (sMRI), functional MRI (fMRI), Positron Emission Tomography (PET), Computerized Tomography (CT), Electroencephalography (EEG), Magnetoencephalography (MEG), and cerebrospinal fluid (CSF) biomarkers. The neuroimaging data acquired from these technologies are used to provide computer-aided diagnosis that assists physicians and clinicians and improves health-care systems for AD.

Recently, resting-state functional magnetic resonance imaging (rs-fMRI) has been increasingly used to study the pathogenesis of AD and its stages. rs-fMRI is non-invasive and has shown great applicability for mapping how AD spreads in the living brain. Various studies have tested the accuracy of AD-related fMRI measurements and found positive predictability of disease-related cognitive decline [13, 14]. Resting-state fMRI captures the changes in blood oxygenation levels of subjects at rest. Brain regions affected by neurodegeneration show different patterns of blood oxygenation levels, and healthy subjects and AD patients also show different patterns of blood oxygenation activity, which may be directly related to disease pathology and can be used to distinguish the stages of AD for diagnostic decision making. Various researchers have targeted the problem of computer-aided AD classification and diagnosis from rs-fMRI data. One of the earlier methods used for AD diagnosis was based on statistical techniques such as the General Linear Model (GLM), which has been applied to the analysis of fMRI [15, 16]. This method detects activated brain regions by correlating a template model with the fMRI time sequences. GLM is a time-consuming, univariate algorithm that uses the voxel as its unit of measurement [15].

Independent Component Analysis (ICA) is another statistical technique used for the analysis of neuroimaging data. Oghabian et al. [17] applied the ICA algorithm to distinguish between healthy, MCI, and AD subjects. They used fMRI data from 15 normal, 11 MCI, and 14 AD subjects and applied a seven-step pre-processing pipeline. The pre-processing techniques applied in this study include MCFLIRT-based head motion correction [18, 19], slice-timing correction, mean intensity normalization, spatial smoothing using FSLBET based brain extraction [20], high-pass filtering, and Gaussian smoothing. After these pre-processing steps, the ICA algorithm was applied to the fMRI activation patterns. They obtained a difference of 0.0097, 0.0051, and 0.0168 between control and MCI, between control and AD, and between AD and MCI subjects, respectively.

Another common method for neuroimaging data analysis is based on Multi-Voxel Pattern Analysis (MVPA) techniques [21, 22]. This method is based on supervised linear regression and determines specific functional activities of various brain regions by using their neural dynamics. Coutanche et al. [22] applied MVPA to determine symptoms in patients and found that MVPA methods can be used to classify various stages of a disease. In MVPA-based approaches, multiple classifiers are used to obtain the best results. Non-linear classifiers such as the Support Vector Machine (SVM) are used to classify fMRI data.

The traditional machine learning techniques require handcrafted feature extraction before classification. However, for automatic analysis of neuroimaging data, manual extraction of features is suboptimal. Approaches based on user-defined features have limitations. Improved performance can be obtained by learning features specific to the problem of interest. Recently, deep learning methods are being used in the domain of neuro-imaging for automated feature extraction and analysis of brain data by using improved processing power and graphical processing units. In deep learning techniques, feature extraction is automatic, thus, models based on this achieve improved performance.

In this respect, H.I. Suk et al. [13] applied deep learning to classify three disease stages: MCI, MCI converter, and AD. The dataset includes scans from 128 MCI, 76 MCI converter, 93 AD, and 101 normal control (NC) subjects. These scans were pre-processed by skull-stripping, spatial normalization, and cerebellum removal. An auto-encoder network was applied to extract features from the images. After feature extraction, SVM-based classification was performed, and accuracies of 95.35%, 85.67%, and 75.92% were achieved for AD vs. NC, MCI vs. NC, and MCI-converter vs. MCI, respectively. Siqi Liu et al. [26] presented a multi-modal method to extract neuro-imaging features for AD diagnosis. The zero-masking method was used to learn low-level features, and a stacked autoencoder network was used to learn high-level features. The extracted features were classified with an SVM classifier, and an accuracy of 86.86% was achieved.

Payan et al. [27] presented an algorithm for the classification of three stages of AD: MCI, AD, and normal control (NC). The algorithms were based on applying a 3D CNN with an autoencoder network and a 2D CNN to classify brain scans. Accuracies of 89.47% and 85.53% were achieved with the 3D CNN and 2D CNN models, respectively. Siqi Liu et al. [38] also achieved a classification accuracy of approximately 85.53% with the same network architecture for 2D CNNs. Sarraf et al. [14] studied the classification of AD patients versus normal control subjects using MRI and fMRI scans. Two CNN-based network architectures, LeNet-5 and GoogleNet, were applied for binary classification. They achieved an average accuracy of 99% with LeNet and 100% with GoogleNet using fMRI data.

Table 1 presents a review of studies on Alzheimer’s disease classification using deep learning techniques. Most of these studies used structural MRI or PET scans and classified only a few disease stages, i.e. AD, CN, and MCI. Only a limited number of studies have used fMRI data for multi-class AD diagnosis and classification. Some of the studies that have used fMRI data for AD classification are listed in Table 2 along with the stages of the target disease.

Table 1 A literature review of Alzheimer’s disease classification using deep learning techniques
Table 2 Alzheimer’s disease classification using fMRI and deep learning techniques

Classifying different stages of AD is a challenging task because the stages have overlapping features. Most of the work in the literature is directed towards binary classification, i.e. the presence or absence of AD in neuro-scans. Little work has been done on classifying multiple stages of the disease. In this research, the objective is to perform a multi-class classification of 6 AD stages: CN, SMC, EMCI, MCI, LMCI, and AD. Another challenge is the limited availability of large datasets with ground truth labels. To overcome this problem, a transfer learning approach has been used in this study, in addition to training the model from scratch, to improve performance. We used resting-state fMRI to perform multi-class AD classification by applying image processing and deep learning methods. We used ResNet-18 as the base architecture and empirically analyzed two approaches. First, we trained ResNet-18 from scratch by randomly initializing the network parameters and reducing the number of input channels to one. Second, we initialized the weights from a pre-trained model and used two strategies for transfer learning: (i) replacing the last dense layer of the original network with a new dense layer, and (ii) re-training all the convolution layers of the network with our dataset. Several experiments were performed by tuning the hyperparameters of the algorithms and classifiers to obtain optimal accuracy.

This paper is organized into the following sections. Section 1 provides an introduction and literature review. Section 2 presents the methods and materials used to conduct this research. The experimental details and results are listed in Section 3. Finally, the conclusion has been presented in the last section.

Materials and Methods

The research methodology consists of multiple steps including data acquisition, pre-processing, deep learning-based feature extraction and classification followed by evaluation. Neuroimaging data are acquired from a well-known database on Alzheimer’s disease. Pre-processing techniques are applied to remove noise and artifacts from data. Preprocessed data is then fed to CNN based neural networks for feature extraction and classification of multiple stages of AD. These computational steps are graphically presented in Fig. 1.

Fig. 1 Computational steps for multi-class AD classification

Neuroimaging Dataset

Neuroimaging data is acquired from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [40], which has been used in various studies [13, 23] for AD classification. ADNI is an extensive multisite study aimed at developing genetic, biochemical, neuroimaging, and clinical biomarkers for AD diagnosis, prognosis, and tracking. ADNI contains neuroimages in various modalities including MRI, fMRI, PET, and DTI. In this research, we used the rs-fMRI brain scans provided by ADNI. The dataset contains fMRI scans from 138 subjects including 25 CN, 25 SMC, 25 EMCI, 25 LMCI, 13 MCI, and 25 AD. The subjects are older than 71, and each of them has been diagnosed and labeled with one of the AD stages based on their scores in cognitive tests, i.e. the mini-mental state examination (MMSE) [41] and the clinical dementia rating (CDR) [42]. The characteristics of the fMRI dataset used for experimental analysis are given in Tables 3A and 3B.

Table 3A Overview of the study groups in the rs-fMRI dataset
Table 3B Characteristics of the rs-fMRI dataset

Preprocessing of Resting-State fMRI Data

Researchers have used various preprocessing steps on this dataset [14, 30]. Data preprocessing is applied to remove noise and artifacts from the data, which improves image quality and leads to better feature extraction. For preprocessing rs-fMRI, a standard pipeline consisting of several steps is used. First, the dataset is converted from DICOM to NIfTI format by using the conversion toolbox from Chris Rorden [43]. The Functional Magnetic Resonance Imaging of the Brain (FMRIB) Software Library (FSL) [44, 45] is then used for preprocessing the data.

Brain extraction is performed on the scans to remove non-brain tissues such as neck tissue and skull. For this purpose, the FSL-BET toolbox [46] is used, which performs brain extraction by estimating an intensity-histogram-based threshold, the center of gravity, and the radius of a sphere approximating the brain’s surface. A tessellated surface is initialized inside the brain and is slowly updated one vertex at a time until a complete surface is obtained. Then, motion correction is applied to correct the effect of subjects’ head motion during data acquisition sessions. We performed motion correction by using the FSL-MCFLIRT toolbox [19, 47]. Slice timing correction is also performed by using the FEAT [48] module of the FSL library. Slice timing correction applies interpolation to shift the voxel time-series either forward or backward in time to make the temporal adjustment. The interpolation method used for this study is Hanning-windowed sinc interpolation, which shifts the voxel time-series by a fraction of the scanner’s TR (Repetition Time) with respect to the middle of the TR.

Intensity normalization is applied to the data to ensure that each volume has the same mean intensity. Spatial smoothing is applied to reduce the noise level while preserving the underlying signal; its purpose is to improve the signal-to-noise ratio. The extent of spatial smoothing is set according to the size of the underlying signal. We performed spatial smoothing by using a 5 mm FWHM Gaussian kernel, a kernel size that corresponds to what has been recommended in the literature for this dataset. Then, temporal high-pass filtering is applied to the rs-fMRI time series to remove low-frequency noise caused by physiological artifacts such as breathing and heartbeat, or by scanner drift. High-pass filtering is performed by using a temporal filter with a cut-off frequency of 0.01 Hz. We also applied spatial normalization by first aligning the images to T1-weighted space using a linear transformation with 7 degrees of freedom (DOF). The images are then registered to the standard space of the MNI152 template, a reference brain template derived from the average of 152 MRI scans. To register images to MNI152 space, a linear transformation with 12 DOF (translation, rotation, scaling, and shear) is applied. In this study, spatial normalization is performed with the FSL-FLIRT [47, 49, 50] toolbox.
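As a rough illustration of the intensity normalization step, the following NumPy sketch rescales every 3D volume of a 4D scan to a common mean. It is not the FSL implementation used in the study; the function name `normalize_volume_means`, the target value of 10000, and the toy array are assumptions made only for this example.

```python
import numpy as np

def normalize_volume_means(scan, target_mean=10000.0):
    """Scale each 3D volume of a 4D scan (x, y, z, t) to the same mean intensity.

    target_mean is an arbitrary illustrative value, not taken from the paper.
    """
    scan = np.asarray(scan, dtype=np.float64)
    # Averaging over the three spatial axes gives one mean per time point.
    volume_means = scan.mean(axis=(0, 1, 2))
    # Avoid division by zero for volumes that are entirely zero.
    safe_means = np.where(volume_means == 0, 1.0, volume_means)
    # Broadcasting applies one scale factor per volume along the time axis.
    return scan * (target_mean / safe_means)

# Toy 4D array standing in for a small rs-fMRI scan (hypothetical data).
rng = np.random.default_rng(0)
toy_scan = rng.uniform(50.0, 150.0, size=(4, 4, 3, 5))
normalized = normalize_volume_means(toy_scan)
per_volume_means = normalized.mean(axis=(0, 1, 2))
print(per_volume_means)
```

After scaling, all entries of `per_volume_means` agree up to floating-point error, which is the property the normalization step is meant to guarantee.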

After applying the preprocessing methods to the fMRI data, preprocessed 4D fMRI scans of size 64 × 64 × 48 × 140 are obtained, in which each scan contains 3D volumes of size 64 × 64 × 48 along the time course (140 s). These 4D scans are then converted to 2D images along the image height and time axes. This results in 6720 images of size 64 × 64 per fMRI scan. The first and last three slices are removed as they contain no functional information. Therefore, information from 44 slices of each scan is used. Hence, 6160 2D images are obtained from each fMRI scan and are saved in portable network graphics (PNG) format.
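The image counts above can be checked with a short NumPy sketch that unrolls a 4D array into 2D slices. The function `scan_to_slices` is hypothetical, and the choice to drop one leading and three trailing slices is only one reading of the text that yields 44 retained slices; the exact slices removed in the study are not specified here.

```python
import numpy as np

def scan_to_slices(scan, drop_first=1, drop_last=3):
    """Unroll a 4D scan (x, y, z, t) into a stack of 2D images of shape (x, y).

    drop_first and drop_last are assumptions chosen so that 44 of 48 slices remain.
    """
    x, y, z, t = scan.shape
    kept = scan[:, :, drop_first:z - drop_last, :]
    # Bring the slice and time axes to the front, then merge them into one axis.
    return kept.transpose(2, 3, 0, 1).reshape(-1, x, y)

# Placeholder array with the scan dimensions reported in the text.
scan = np.zeros((64, 64, 48, 140), dtype=np.uint8)
all_slices = scan_to_slices(scan, drop_first=0, drop_last=0)
kept_slices = scan_to_slices(scan)

print(all_slices.shape)            # (6720, 64, 64)
print(kept_slices.shape)           # (6160, 64, 64)
print(138 * kept_slices.shape[0])  # 850080
```

The last line multiplies the per-scan count by the 138 scans in the dataset, which matches the total number of 2D images reported in the results section.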

The data acquired from ADNI is processed and converted to 2D images by using the aforementioned pre-processing methods. In this way, we have created a dataset that was used for training deep learning networks. The characteristics of the preprocessed dataset are given in Table 4.

Table 4 Characteristics of the preprocessed dataset

Deep Learning Methods for RS-fMRI Data

We performed our experiments on CNN based architectures to train and evaluate our dataset. Prior work on rs-fMRI based Alzheimer’s disease classification is mainly based on LeNet [14], GoogleNet [30] and AlexNet architectures. Due to the outstanding performance of Residual neural network [51] in the computer vision domain, our focus in this study is the ResNet-18 architecture [52]. We used this architecture by training from scratch as well as by adapting the pre-trained network for our task through transfer learning. The details of network architectures are given in this section.

Residual Neural Network for AD Classification

ResNet was developed by Kaiming He et al. [51] in 2016. A residual learning method was proposed to train deeper networks that are otherwise difficult to train in practice. Network layers were reformulated to learn residual functions with reference to the layer inputs. The results indicate that deeper networks based on residual learning are easier to optimize and achieve high accuracy. Experimental evidence [53, 54] revealed that network depth is crucial to achieving better performance. However, deeper networks are difficult to train, and an increased number of layers does not ensure better learning. Moreover, when deep networks start to converge, accuracy saturates at a point and then degrades rapidly. Residual learning addresses this accuracy degradation in deeper networks. In plain networks, several layers are stacked together to directly learn the desired mapping, whereas in residual networks, the layers are stacked to learn a residual mapping. Let H(x) denote the desired mapping to be fitted by a few stacked layers, where x is the input to those layers. Residual learning rests on the hypothesis that if several nonlinear layers can asymptotically approximate a complicated mapping function, then they can equally approximate the residual function, denoted F(x). The underlying mapping is given by:

$$ H(x)=F(x)+x $$
(1)

And the residual function is given by:

$$ F(x)=H(x)-x $$
(2)

The stacked layers explicitly learn the residual function F(x) rather than the original function H(x). This method assumes that the residual mapping is easier to optimize than the original mapping. For example, if an identity mapping is optimal, then it is easier to push the residual to zero than to fit the identity mapping with a few stacked non-linear layers. After approximating the residual function, the original mapping is recovered as H(x) = F(x) + x. The formulation F(x) + x is realized in a feedforward neural network by residual shortcut connections, which perform an identity mapping; their output is added element-wise to the output of the stacked layers. Adding these connections to the network introduces neither extra parameters nor extra complexity. Residual networks can also be trained easily with SGD-based backpropagation.
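The relation H(x) = F(x) + x can be illustrated with a minimal NumPy sketch of a single residual block. For brevity it uses two dense layers in place of the 3 × 3 convolutions of ResNet-18 and omits batch normalization; the function and variable names are assumptions made for this example.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def residual_block(x, w1, w2):
    """Two stacked layers compute F(x); the shortcut adds x back: H(x) = F(x) + x."""
    f = relu(x @ w1) @ w2   # residual function F(x) from two stacked layers
    return relu(f + x)      # element-wise addition with the identity shortcut

x = np.array([[1.0, 2.0, 3.0]])

# With all weights at zero the residual F(x) is zero, so the block
# reduces to the identity mapping for this non-negative input.
zeros = np.zeros((3, 3))
out_identity = residual_block(x, zeros, zeros)

# With small random weights the block outputs x plus a learned correction.
rng = np.random.default_rng(0)
out_random = residual_block(x, 0.1 * rng.standard_normal((3, 3)),
                            0.1 * rng.standard_normal((3, 3)))
print(out_identity)
print(out_random)
```

The zero-weight case shows why an identity mapping is easy to represent: the stacked layers only need to drive F(x) to zero, and the shortcut carries the input through unchanged.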

The architecture of the original ResNet-18 is shown in Fig. 2. There is a total of eighteen layers in the network (17 convolutional layers and a fully-connected layer, plus an additional softmax layer to perform the classification task). The convolutional layers use 3 × 3 filters, and the network is designed so that layers producing output feature maps of the same size have the same number of filters, while the number of filters is doubled whenever the output feature map is halved. Downsampling is performed by convolutional layers with a stride of 2. Lastly, there is an average-pooling layer followed by a fully-connected layer with a softmax layer. Throughout the network, residual shortcut connections are inserted between layers. There are two types of connections. The first type, denoted by solid lines, is used when the input and output have the same dimensions. The second type, denoted by dotted lines, is used when the dimensions increase. This type of connection still performs identity mapping, but with zero padding for the increased dimensions and with a stride of 2.

Fig. 2 Original ResNet-18 Architecture

To study the effects of different design decisions in deep learning, we designed several experiments by training a modified ResNet-18 from scratch as well as by performing transfer learning. Specifically, we used two strategies for network training in our experiments. First, we used a slightly modified version of ResNet-18 to perform training from scratch by randomly initializing the network parameters. We also reduced the number of input channels to one in order to train with greyscale images.

Secondly, we used a pre-trained network for weight initialization and performed transfer learning. Since the pre-trained model was built for a different domain and task, we adapted the network to perform our task. We transferred knowledge from the pre-trained network in two ways. In the first approach, we performed off-the-shelf (OTS) [55, 56] transfer learning by replacing the last dense layer of the original network with a new dense layer that matches the number of classes in our task. In the off-the-shelf approach, all layers of the network except the last layer (the classifier) are used for feature extraction, and the weights of the last layer are adapted to the new task. The second approach is fine-tuning (FT), in which more than one layer of the network is re-trained on samples of the new task. For the fine-tuning approach, we re-trained all the convolution layers of the network with our dataset. For both transfer learning approaches, we used the weights of a ResNet-18 network trained on ImageNet as the starting point [57].

The details of the networks used for experiments are given in Table 5. We used the ResNet-18 architecture in our experiments and the table presents the difference between the original and our networks. There are three networks in the table, 1-Channel ResNet-18, Off the Shelf (OTS) and Fine-Tuned (FT). 1-Channel ResNet-18 represents the network that was trained from scratch by using greyscale images. For transfer learning, we used three additional convolution layers. The words “same” and “fine-tuned” are used to represent the difference between OTS and FT networks. The layer parameters of each network are given in the table. The ResNet with 18 layers has 2.37 million parameters and ResNet with 21 layers has 11.18 M parameters.

Table 5 Adapted Architectures of ResNet-18 used for AD classification

Evaluation Measures

A common approach to evaluating the results of machine learning models is to use precision, recall [58], f1-measure, and the area under the receiver operating characteristic (AROC) curve [54]. These measures originated from Information Retrieval. In this research, we evaluated our multi-class AD classification models by using the aforementioned evaluation measures.

Precision

Precision or Confidence [58] is the proportion of predicted positive cases that are actually real positives. It is also called Positive Predictive Value (PPV). Precision is defined as:

$$ Precision= Confidence=\frac{TP}{TP+ FP} $$
(3)

where TP denotes true positive and FP denotes false positive.

Recall

Recall (also named as sensitivity) [58] is the proportion of actual positive cases that are correctly predicted. This measures the coverage of actual positive cases and reflects correct predicted cases. It is termed as True Positive Rate (TPR) and is given as:

$$ Recall= Sensitivity=\frac{TP}{TP+ FN} $$
(4)

where TP denotes true positive and FN denotes false negative.

F1-Measure

F-measure is a combined measure [58] that captures the trade-off between precision and recall and is defined as:

$$ F- Measure=\frac{1}{\upalpha \frac{1}{P}+\left(1-\upalpha \right)\frac{1}{R}}=\frac{\left(1+{\upbeta}^2\right) PR}{\upbeta^2P+R} $$
(5)

where P denotes precision, R denotes recall, and β² = (1 − α)/α. Because the harmonic mean is a very conservative average, a balanced measure called the F1-measure, which weights precision and recall equally with α = 1/2 and β = 1, is used and is defined as:

$$ F1- Measure=\frac{2\ast PR}{P+R} $$
(6)
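Equations (3) to (6) can be checked with a few lines of Python. The counts used below (90 true positives, 10 false positives, 30 false negatives) are hypothetical and are not results from this study; the function names are likewise chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 computed from raw counts (Eqs. 3, 4 and 6)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def f_beta(precision, recall, beta):
    """General F-measure in its beta form (right-hand side of Eq. 5)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0

# Hypothetical counts for a single class.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(p, r, f1)

# Setting beta = 1 in the general form reproduces the F1-measure.
print(f_beta(p, r, beta=1.0))
```

In a multi-class setting such as the six AD stages, these quantities would typically be computed per class in a one-vs-rest fashion and then averaged; that averaging step is not shown here.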

Area under Receiver Operating Characteristic (AROC) Curve

Receiver Operating Characteristics (ROC) analysis [58] was borrowed from Signal Processing and has become a standard evaluation tool in the medical sciences; it compares the true positive rate (TPR) with the false-positive rate (FPR). In behavioral sciences, the AROC curve represents the combination of sensitivity (TPR) and specificity (TNR). It allows the performance of classifier models to be compared and takes values between 0 and 1. The best classifier is the one whose value is closest to 1 and farthest from the chance line TPR = FPR. In practical scenarios, the lower bound is 0.5, at which a classifier has no discriminative ability, whereas a classifier with a value much higher than 0.5 has much greater discriminative ability. The ROC curve is summarized by calculating the area under the curve (AUC), which is given by:

$$ AUC=\frac{TPR- FPR+1}{2}=\frac{TPR+ TNR}{2}=1-\frac{FPR+ FNR}{2} $$
(7)

where TPR denotes true positive rate, FPR as false positive rate, TNR as true negative rate and FNR as false-negative rate.
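Since FPR = 1 − TNR and FNR = 1 − TPR, the single-operating-point AUC can be written in several equivalent ways. The sketch below evaluates three such forms for one hypothetical confusion matrix; the counts and the function name are assumptions for illustration, not results from this study.

```python
def single_point_auc_forms(tp, fp, tn, fn):
    """Return three algebraically equivalent forms of the single-point AUC."""
    tpr = tp / (tp + fn)   # sensitivity
    fnr = fn / (tp + fn)   # 1 - TPR
    fpr = fp / (fp + tn)   # 1 - TNR
    tnr = tn / (fp + tn)   # specificity
    return (
        (tpr - fpr + 1) / 2,
        (tpr + tnr) / 2,
        1 - (fpr + fnr) / 2,
    )

# Hypothetical counts: 90 TP, 10 FN, 20 FP, 80 TN.
forms = single_point_auc_forms(tp=90, fp=20, tn=80, fn=10)
print(forms)
```

All three values coincide up to floating-point error. A full ROC curve, as plotted for the models in this study, would instead integrate over many decision thresholds.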

Results and Discussions

This research is aimed at using rs-fMRI data to classify 6 stages of AD. We have applied different preprocessing methods before performing further analysis. The preprocessing methods and algorithms used to analyze rs-fMRI data have been discussed in Section 2. This section provides details on the experiments conducted and discusses the results achieved. In the dataset, there are 138 4D scans and 850,080 2D images. For the evaluation, we split the dataset into a training dataset, validation dataset and testing dataset with 70%, 20%, and 10% split ratio, respectively as described in Table 6. The dataset was randomly shuffled before splitting. The validation set was used to determine the trend of learning during the training phase. We estimate the validation loss to determine the best models. The testing set was used to perform inference on the learned model.
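A shuffled 70/20/10 split can be sketched with the standard library as follows. The function name, the fixed seed, and the decision to split at the level of individual 2D image indices are assumptions made for this example; the exact procedure and resulting counts used in the study are those reported in Table 6.

```python
import random

def split_indices(n_items, percents=(70, 20, 10), seed=0):
    """Shuffle item indices and split them into train/validation/test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Integer arithmetic avoids floating-point rounding on the boundaries.
    n_train = n_items * percents[0] // 100
    n_val = n_items * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test

# 850,080 is the number of 2D images reported for the dataset.
train, val, test = split_indices(850_080)
print(len(train), len(val), len(test))
```

Splitting at the image level means that slices from one scan can appear in more than one partition; grouping indices by subject before splitting would be an alternative design if that is undesirable.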

Table 6 Details of dataset split used for training

Experiments and Evaluation

We trained three ResNet-18-based networks (1CR, OTS, and FT) to classify the different stages of AD, using the same experimental setup in all experiments. The inputs to the networks are images resized to 224 × 224 to match the pre-trained network’s input size. The learning rate is initialized to 0.001 and is decreased by 10% every 25,000 iterations. Gamma is set to 0.1, momentum to 0.9, and the weight decay factor to 0.0005. A stochastic gradient descent (SGD) based solver is used with a batch size of 32 images for training. The models are implemented in Caffe and trained on the FloydHub cloud service with a Tesla K80 GPU.
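The text does not name the solver's learning-rate policy. Assuming Caffe's `step` policy, in which the rate is `base_lr * gamma ** floor(iter / stepsize)`, the schedule could be sketched as below. Under this assumption the rate is multiplied by gamma = 0.1 at each step; a reduction of 10% per step would instead correspond to gamma = 0.9.

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, step_size=25_000):
    """Step decay: scale the learning rate by gamma once every step_size iterations.

    Defaults mirror the values quoted in the text; the policy itself is assumed.
    """
    return base_lr * gamma ** (iteration // step_size)

# Learning rate at a few illustrative iteration counts.
schedule = {it: step_lr(it) for it in (0, 24_999, 25_000, 50_000)}
for it, lr in schedule.items():
    print(f"iteration {it:>6}: lr = {lr:.6f}")
```

The rate stays constant within each block of 25,000 iterations and changes only at the block boundaries.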

The trends of training and testing loss and average test accuracy are shown in Figs. 3, 4, 5, 6, 7, 8.

Fig. 3 Graphical trends of training loss vs. testing loss with 1CR Network
Fig. 4 Graphical trend of test accuracy with 1CR Network
Fig. 5 Graphical trends of training loss vs. testing loss with OTS Network
Fig. 6 Graphical trend of test accuracy with OTS Network
Fig. 7 Graphical trends of training loss vs. testing loss with FT Network
Fig. 8 Graphical trend of test accuracy with FT Network

Table 7 summarizes the testing accuracy and validation loss of the three networks for the multi-class AD classification task. The best average testing accuracy is achieved with the OTS network, although the improvement over the FT network is slight. For the “CN” stage, however, the FT network achieved approximately 3.12% higher accuracy than the OTS network. Table 8 summarizes the outcomes of our evaluation, and Fig. 9 presents the ROC curves of the three models. We evaluated three different experimental setups with varying weight initializations and network architectures, using precision, recall, f1-measure, and AROC curve analysis to evaluate all AD stages.

Table 7 Multi-class classification results with networks
Table 8 Evaluation of results for our models
Fig. 9 Receiver operating characteristic curves for multi-class AD classification

The results indicate variability in outcomes across AD stages, especially for the CN and SMC stages. Specifically, for the “CN” stage, we observe a standard deviation across the three models of 0.048 for precision, 0.017 for recall, and 0.016 for f1-measure. For the “SMC” stage, the standard deviation across the three models is 0.05 for precision, 0.014 for recall, and 0.0197 for f1-measure. For each AD stage, the results are better with either the OTS or the FT model. However, the average scores for all measures are higher for the OTS network. In particular, an average improvement of 0.069, 0.0055, 0.0055, and 0.0002 is observed for precision, recall, f1-measure, and AUC, respectively, with the OTS model. Although our models achieved high AUC for all AD stages, the applicability of such technology in clinical assessment largely depends on the data available for model training and evaluation.

Comparative Analysis

We compared the performance of our ResNet-18-based networks with each other. We also compared our results with a previous study [39] that worked on a similar problem but with 5 disease stages: AD, CN, EMCI, LMCI, and SMC. To have a fair comparison (using the same data samples, data split, and number of stages), we performed an additional evaluation by training the AlexNet used in [39] on our dataset. We evaluated the classification results in terms of average accuracy and the classification accuracy of each AD stage. Table 9 presents the comparative analysis and summarizes the classification results of other approaches. Figure 10 graphically presents the comparative analysis for the 6 AD stages.

Table 9 Overview of classification accuracy (%)
Fig. 10 Comparative analysis of classification results

In our comparative analysis, we noticed that our models performed better than those of Y. Kazemi et al. [39] and our trained AlexNet model. Compared with Y. Kazemi et al., the FT model achieved higher classification accuracy for every AD stage, while the OTS and 1CR models achieved higher accuracy in all but the “CN” stage. In particular, the FT model achieved improvements in accuracy of 1.66%, 2.3%, 1.74%, 1.54%, and 3.04% for CN, SMC, EMCI, LMCI, and AD, respectively. The OTS model achieved improvements of 5.45%, 1.74%, 1.49%, and 3.04% for SMC, EMCI, LMCI, and AD, respectively. The 1CR model achieved improvements of 5.29%, 1.55%, 0.83%, and 2.32% for SMC, EMCI, LMCI, and AD, respectively. For a fair comparison, we also compared our results with our trained AlexNet model and found an improvement in accuracy of approximately 4% with our models. Overall, when directly comparing our results to the former best results, we achieved state-of-the-art results with our FT model for all AD stages, and with the OTS and 1CR models for all except the “CN” stage. Our models also achieved a higher average accuracy over all AD stages than the former approach, i.e. AlexNet. While the former study reported an average accuracy of 97.63% with 5 disease stages and a larger dataset than ours, we still achieved better results for 6 AD stages, with average accuracies of 97.92% and 97.88% for the OTS and FT models, respectively.

Conclusion

Alzheimer’s disease (AD) is an incurable neurological disorder affecting a significant portion of the world’s population. The early diagnosis of AD is crucial for improving the quality of patients’ lives and for the development of improved treatments and targeted drugs. The present study was conducted to explore the effectiveness of resting-state functional magnetic resonance imaging and advanced deep learning techniques for multi-class classification and diagnosis of AD and its progressive stages: CN, SMC, EMCI, MCI, LMCI, and AD. The study proposed using deep residual neural networks combined with a transfer learning approach to classify 6 AD stages. We incorporated different weight initialization schemes and network architectures to evaluate our dataset. We presented a systematic evaluation of three networks: 1CR, OTS, and FT. The 1CR network was trained exclusively on single-channel rs-fMRI images, while the other two networks were pre-trained on the ImageNet dataset and then adapted by retraining the last dense layer in the OTS network and all the convolution layers in the FT network. State-of-the-art results are achieved with our models; in comparison to a former study [39], the FT network achieved higher accuracy for all AD stages, with improvements of 1.66%, 2.3%, 1.74%, 1.54%, and 3.04% for CN, SMC, EMCI, LMCI, and AD, respectively. Our OTS network achieved the best average accuracy of 97.92% for 6 disease stages. The use of residual learning, pre-training, and transfer learning helped to achieve better performance. The results of this study indicate that integrating resting-state fMRI-based neuroimaging with deep learning methods can assist diagnostic decision making in the early diagnosis of neurodegenerative disease, especially AD. The diagnosis of AD stages could aid drug discovery by providing a better understanding of pathogenesis for measuring the effects of targeted treatments that can slow down disease progression.
Combining clinical imaging with deep learning techniques can help to uncover patterns of functional changes in the brain related to AD progression and could aid in the detection of risk factors and prognostic indicators.