Introduction

Alzheimer’s disease (AD) is a neurodegenerative disorder that is becoming increasingly prevalent in aging populations [1]. AD is the most common form of dementia; it slowly destroys neurons, thereby impairing the cognitive abilities of affected individuals. At present, a range of treatments and medications is available to reduce the severity of the disease and delay its progression. However, there is no definitive cure, nor even a well-defined understanding of the pathogenesis of this disorder [2].

In 2007, a study was conducted to predict the future prevalence of AD, taking into account the increasing number of cases in the elderly population [3]. The study concluded that by the year 2050, 1 in 85 persons would be affected by the disorder, with 43% of cases needing an advanced level of care. It also estimated that if future medical interventions could delay the disorder by even one year, the number of persons affected in 2050 would decrease drastically. It can be concluded from this study and other similar studies [4,5,6] that AD poses a grave threat to global health. There is therefore a need for advanced methods to diagnose AD in its early stages, so that the right treatment can be administered to delay the progression of the disease.

The greatest risk factor for AD is age. Individuals aged 65 and above are at significant risk of developing AD, and the risk continues to increase with age. Other factors such as Down syndrome, genetics and family history also increase the risk of developing AD. Research conducted over the past two decades has shown that head injuries, smoking, depression and even exposure to aluminium may make a person more vulnerable to AD [7,8,9,10,11].

Diagnosis of AD relies heavily on clinical methods. The most fundamental characteristics of the disease are the presence of beta-amyloid plaques in the neocortical regions and neurofibrillary tangles. However, these appear only in the later stages of the disease, and there are no laboratory tests that can confirm or predict its onset in the early stages. In 1984, a standard criterion for the clinical diagnosis of AD was established [12]. However, research conducted in subsequent years uncovered many previously unknown clinical findings, and in 2009 updated criteria were published that highlighted the complex mechanisms underlying AD pathogenesis [13].

Recent advancements in research have identified a provisional phase between healthy aging and dementia, known as mild cognitive impairment (MCI), which can be further divided into two stages, early MCI (EMCI) and late MCI (LMCI) [14]. Patients with EMCI and LMCI were found to be at risk of progressing to AD dementia, with the risk being greater for patients with LMCI [15]. MCI is characterized by greater cognitive impairment and memory loss than normal aging, and therefore must be identified and treated in its early stages to postpone the onset of AD/dementia.

Because early clinical diagnosis of AD is complicated, computer-based methods used in conjunction with medical professionals can play a significant role in identifying and diagnosing AD. Deep learning and machine learning classifiers have piqued the interest of many researchers in recent years for this identification task, with impactful outcomes. Since most studies have concentrated on structural brain-imaging techniques [16, 17], we chose 2-deoxy-2-[fluorine-18] fluoro-d-glucose (18F-FDG) Positron Emission Tomography (PET) as the imaging modality for this study.

Although neuropsychological tests can detect AD, pathophysiological changes in the brain are predominantly diagnosed through imaging. 18F-FDG-PET is a minimally invasive procedure used to assess cerebral glucose metabolism. One of the most prominent features of AD, as demonstrated by 18F-FDG-PET, is a substantial reduction of glucose metabolic activity in specific regions of the brain; cerebral glucose hypometabolism is closely linked to neurodegeneration. Metabolically active areas of the brain show higher 18F-FDG uptake. AD patients have significant reductions in glucose metabolism in the temporoparietal regions, posterior cingulate cortex and frontal cortex, and these reductions in metabolic activity indicate the early preclinical stages of AD. Hence 18F-FDG-PET is a promising diagnostic modality for identifying diagnostic patterns in the pre-clinical evaluation of AD and for diagnosing neurocognitive disorder due to AD. It is mainly used for early prediction of AD and can distinguish AD patterns earlier than MRI in MCI subjects. 18F-FDG-PET is therefore considered a predominant tool for pre-symptomatic diagnosis of AD, providing better sensitivity and accuracy.

Since the patterns of cerebral hypometabolism are strongly linked with the various categories of cognitive dysfunction and neurodegeneration, 18F-FDG-PET is particularly useful as a marker for diagnosing the stages of dementia [18]. Progress in deep learning research encourages finding and creating better deep learning methods to promote 18F-FDG-PET study and assessment, which may aid early clinical treatment of AD.

The role of artificial intelligence in healthcare has grown markedly in recent years and has attracted considerable attention from researchers and scientists for effective implementation in medical applications. Deep learning models have been successfully applied over the past decade to diagnosis from medical images. AD research is widely explored, and there are several works in the literature on the application of artificial intelligence to AD diagnosis, specifically with regard to 18F-FDG-PET. Several of these studies have used images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, an initiative that aims to study whether the combination of imaging and clinical biomarkers along with neuropsychological evaluation can be used to assess MCI and early AD.

Due to uncertainties surrounding the clinical biomarkers of the disease, substantial research has been devoted to identifying imaging biomarkers that can detect the disease in its early stages. Imaging is now used in the diagnosis of AD to reinforce the clinical diagnosis. Amyloid deposits, which are characteristic of AD, can now also be visualised through amyloid imaging [19]. Recent works in the literature have shown that imaging can identify structural cerebral anomalies that assist in identifying AD in its early stages of cognitive impairment [20].

Initially, structural MRI was employed to detect progressive cerebral atrophy as a characteristic of neural degeneration, which would imply the onset of AD [21]. However, this approach lacked molecular specificity and was unable to identify the two hallmark pathological signs of AD i.e. amyloid plaques and neurofibrillary tangles. Functional MRI (fMRI) studies focussed on the activation of the hippocampus region of the brain. Decreased hippocampal activity during encoding of new information was seen to be consistently associated with AD [22, 23]. One major disadvantage of this technique is that it is highly sensitive to motion of the head. Diffusion tensor imaging of the hippocampal region has also yielded significant results [24]. However, just like fMRI, further longitudinal studies are needed to validate the concept.

Positron Emission Tomography (PET) with 2-deoxy-2-[fluorine-18] fluoro-d-glucose, integrated with CT (FDG-PET), is highly regarded for its powerful imaging capabilities [25]. FDG-PET has been shown to have high specificity as well as sensitivity in the detection of AD. It is also possible to differentiate the PET scans of individuals with AD from those with other forms of dementia such as Lewy body disease (LBD), Parkinson’s disease (PD) and frontotemporal dementia (FTD) [26], even during the MCI stages. Chen et al. developed a hypometabolic convergence index (HCI) based on known AD voxel regions and successfully identified which patients with MCI would progress to AD within the next 18 months [27]. Therefore, FDG-PET is a promising imaging modality for diagnosing AD in its early stages as well as for predicting the progression of the disease.

Choi et al. [28] were the first to use an unsupervised learning method to recognise aberrant features in FDG-PET images of the brain. A neural network based on a variational autoencoder (VAE) was trained on PET scans of normal patients (those without cognitive impairment). The trained model’s output was an abnormality index, which quantified the difference between the input image and a normal FDG-PET image. This model proved successful in recognising aberrant patterns of AD in 18F-FDG-PET images.

Wee et al. [29] incorporated the cortical geometry of T1-weighted MRI images to classify controls (CN), EMCI, LMCI and AD. A spectral graph-CNN was trained on images from the ADNI-2 cohort and achieved an accuracy of 85.8% for the classification of AD versus CN. The model was further tested on two other populations (the ADNI-1 cohort and an Asian cohort), yielding accuracies of 89.4% and 90%, respectively; the graph-CNN model was thus found to be both efficient and reproducible across different populations. Ding et al. [30] applied the Inception-V3 deep learning architecture to the diagnosis of AD and MCI based on FDG-PET images from the ADNI database, achieving an area under the ROC curve (AUC) of 0.98.

Liu et al. [31] implemented a combined CNN and RNN framework to classify AD versus CN and MCI versus CN images from the ADNI database. Their approach decomposed 3D FDG-PET volumes into 2D slices, from which the RNN framework learned inter-slice and intra-slice features for classification. The combined framework achieved an AUC of 95.3% for AD versus CN and 83.9% for MCI versus CN. Singh et al. [32] proposed a method in which probabilistic principal component analysis (PCA) was carried out for feature reduction, followed by binary classification using a multilayer perceptron neural network. The study used AD, CN, EMCI and LMCI images from the ADNI database, and the network achieved high precision, recall and F1-scores in the classification of AD versus CN, AD versus MCI, CN versus MCI, AD versus EMCI and AD versus LMCI.

Ramzan et al. [33] used the ResNet-18 architecture for the multiclass classification and automated diagnosis of AD, SMCI, EMCI, LMCI, MCI and CN. They used fMRI images from the ADNI database and compared the performance of ResNet-18 trained from scratch with randomly initialised parameters against a transfer learning approach. Under the transfer learning approach, they further experimented with two methods: (i) replacing only the last layer with a custom layer adapted to the current task (the off-the-shelf method) and (ii) replacing and training more than one layer for the new task (the fine-tuning (FT) method). The ResNet architecture was found to be highly suitable for the classification of AD and its various stages. All the training methods used in this study yielded excellent results, with the best result attained by the off-the-shelf method, at an AUC of 0.99.

For the detection of AD, non-invasive biomarkers such as PET are commonly used, and 18F-FDG-PET is the most extensively used type of PET imaging. The most commonly chosen feature from the 18F-FDG-PET biomarker is cellular glucose metabolism: 18F-FDG-PET quantifies the reduction in the regional cerebral metabolic rate of glucose and provides a reliable biomarker in AD patients. In clinical practice, identifying the progression from MCI to AD dementia is one of the most challenging tasks for the medical community [34, 35]. To address this challenge, researchers worldwide have access to public databases such as the ADNI neuroimaging datasets (http://adni.loni.usc.edu), which include multi-modal imaging data along with demographic, genetic and cognitive measurements of healthy, MCI and AD subjects. These datasets are compared and analysed to perform automated detection and classification of AD, CN and MCI progression [36, 37] using deep learning techniques such as convolutional neural networks.

Early diagnosis of AD is crucial for the clinical recommendation of prompt treatment. Predicting the progression from MCI to AD, and differentiating MCI from healthy controls in order to provide appropriate treatment, are challenging tasks. Hence there is a need for a multi-class classification system to distinguish AD and its various stages from healthy controls. Recent advancements in neuroimaging techniques pave the way for integrating high-dimensional multimodal neuroimaging data with deep learning techniques for efficient classification of the various stages of AD.

Our study proposes a novel automated method for diagnosing the various stages of AD. We used a transfer learning approach to train a variant of the ResNet architecture known as ResNet-50. The ADNI repository’s 18F-FDG-PET brain images were used as the dataset for this study, consisting of images depicting multiple stages of AD/dementia: CN, EMCI, LMCI and AD. Our aim was to classify the various stages of AD/dementia using the proposed architecture and to comparatively evaluate the performance of our transfer-learned model against other pre-trained CNN models. Further, we visualised the model’s learned representations using feature maps and analysed the results using statistical metrics.

Methodology

Dataset

The imaging data used in this study were retrieved from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) repository. For two decades, the ADNI project has included collaborations with highly reputed scientific organisations including the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), multidisciplinary biomedical non-profit organisations, and commercial pharmaceutical companies. Details about this broad data collection, in different image modalities and formats, are available on the website [38]. ADNI’s primary goal has been to study the evolution of cognitive symptoms in different categories of participants, including normal subjects, early mild cognitive impairment (EMCI) individuals, late mild cognitive impairment (LMCI) individuals, and AD patients. Figure 1 shows the different stages of dementia and AD. For our study, 18F-FDG-PET brain images were collected with acquisition dates spanning January 2006 to December 2013. A total of 1130 subjects across the stages AD, CN, EMCI and LMCI were collected for training our deep learning algorithm. The ground-truth image labels were used as the reference for assessing the classification performance for AD. Details of the data collected for training and testing are given in Table 1, and the overall block diagram of the proposed work is shown in Fig. 2.

Fig. 1
figure 1

Sample subjects depicting different stages of AD and dementia

Table 1 Details of the dataset
Fig. 2
figure 2

Block diagram of the proposed multi-class categorization of the various stages of AD

Image pre-processing

The images were originally acquired in DICOM format from the ADNI database. These images were pre-processed into a simpler format for the deep learning pipeline using the grid method [30], implemented with Python’s SciPy library. This method resamples each image from 2-mm isotropic voxels to a grid of 200 × 200 × 180 mm³ in volume. Finally, the images were saved as 2-D matrices of size 224 × 224, in PNG format.
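The resampling described above can be sketched as follows. This is a minimal illustration using SciPy's `ndimage.zoom`, assuming 2-mm isotropic input voxels and a mid-axial slice; the function names and the slice choice are ours, not the grid method's exact implementation:

```python
import numpy as np
from scipy import ndimage

def resample_to_grid(volume, voxel_mm=2.0, grid_mm=(200.0, 200.0, 180.0)):
    # Resample so the volume spans a 200 x 200 x 180 mm^3 grid:
    # at 2-mm isotropic voxels that is a 100 x 100 x 90 voxel array.
    target = tuple(int(round(g / voxel_mm)) for g in grid_mm)
    zoom = [t / s for t, s in zip(target, volume.shape)]
    return ndimage.zoom(volume, zoom, order=1)

def axial_slice_224(volume, index=None):
    # Extract one axial slice and rescale it to a 224 x 224 2-D matrix,
    # ready to be written out as a PNG.
    if index is None:
        index = volume.shape[2] // 2
    sl = volume[:, :, index].astype(np.float32)
    return ndimage.zoom(sl, [224 / s for s in sl.shape], order=1)
```

Linear interpolation (`order=1`) is used here for speed; higher-order splines are an alternative when smoother resampling is required.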

Deep learning models identify important features from neuroimaging data automatically, and can identify the optimal set of features during training itself. Pre-trained CNN models generate unsupervised features that are then combined with a more refined decision layer; this kind of architecture facilitates the modelling of complex decision boundaries for multi-class classification problems. Owing to this inherent property of deep learning models, efficient prediction of MCI is possible, even though predicting EMCI and LMCI is challenging in Alzheimer’s disease due to the limited differences between the four groups.

Deep learning techniques use pre-processed neuroimaging data such as 18F-FDG-PET to generate features, which are derived from intensity measurements of glucose metabolism. The most relevant features are selected based on component analysis to represent the nearest axial perspective of the image. In the proposed work, automated feature extraction was performed using pre-trained CNN models, fine-tuning the hyperparameters to achieve better classification accuracy. The feature activation map provides salient details on how the pertinent layers extract sharp details from the image that would not be discernible to the naked eye.

Contrast limited adaptive histogram equalization

To enhance the visibility of the grayscale images, the Contrast Limited Adaptive Histogram Equalization (CLAHE) transformation technique is applied. It is a modification of adaptive histogram equalization in which excessive amplification of contrast is limited. The CLAHE algorithm is applied to small image regions (called tiles) rather than to the entire image, performing histogram equalization in each tile with contrast limiting. The artificial boundaries between tiles are removed by combining neighbouring tiles with bilinear interpolation. Some noise remains in the enhanced image after this process; to suppress it, double-level thresholding is performed using a histogram-based thresholding method, and finally Otsu’s thresholding technique is applied to obtain the best enhanced image.

Data augmentation

To predict accurately on test samples, deep learning demands a vast quantity of labelled data. By producing multiple variations of the samples in a dataset, data augmentation expands the size and variety of the labelled training set. Data augmentation approaches are often employed to alleviate class imbalance, minimize overfitting and facilitate convergence, all of which contribute to improved outcomes.

Collecting such a huge number of medical images is an expensive and exhausting process. Our study involved 1130 training samples, which underwent augmentation so that our model could learn from the samples with suitable variations. The augmentation was carried out using Python’s albumentations library [39], which provides a wide variety of augmentation techniques. Figure 3 depicts the eight augmentation techniques involved, while Table 2 lists and explains the augmentation techniques used for model training.

Fig. 3
figure 3

Examples of the different augmentation techniques applied to 18F-FDG-PET images from the dataset

Table 2 Image augmentation types and their descriptions

Transfer learning

Transfer learning works on the idea of training a base network on a larger, wider-ranging dataset, after which the features or “weights” learned from the first dataset are redeployed for training on a second dataset. Pre-trained models trained on the ImageNet dataset, which has around 1.2 million images across 1000 classes, have been repurposed for several classification tasks and thus contribute actively to research. A substantial amount of time is saved when parameters and hyperparameters are fine-tuned to the task of interest, with only the last few layers trained from scratch.

Residual network

ResNet is short for Residual Network, illustrated in Fig. 4. The idea of residual learning was introduced to overcome the vanishing and exploding gradient problems that occur during backpropagation [40]. The building blocks of residual learning comprise the identity branch and the residual mapping function, which together make up the identity shortcut connection. The core idea behind residual learning is this “identity shortcut connection”, which skips one or more layers.

Fig. 4
figure 4

Residual block of ResNet Architecture

Mathematically, this may be explained as follows: rather than fitting the identity mapping

$$\mathrm{H}\left(\mathrm{z}\right)=\mathrm{z}$$
(1)

the additional layers are made to fit another mapping owing to residual learning:

$$\mathrm{F}(\mathrm{z})=\mathrm{H}(\mathrm{z})-\mathrm{z}$$
(2)

which means H(z) is transformed to:

$$\mathrm{H}(\mathrm{z})=\mathrm{F}(\mathrm{z})+\mathrm{z}$$
(3)

This is accomplished by the use of “shortcut connections.”

Figure 4 gives a diagrammatic representation of the identity shortcut connections that make up the building block of residual learning.
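Equations (1)–(3) can be made concrete with a toy NumPy sketch; the two dense layers standing in for the residual branch F(z) are illustrative, not the convolutional blocks ResNet actually uses:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(z, w1, w2):
    # Residual branch F(z): two toy weight layers with a ReLU in between.
    f = relu(z @ w1) @ w2
    # Identity shortcut: the block outputs H(z) = F(z) + z (Eq. 3),
    # followed by the usual activation.
    return relu(f + z)

rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8))
# With an all-zero residual branch, the block reduces to the identity
# mapping of Eq. (1) (up to the final ReLU): H(z) = z.
identity_out = residual_block(z, np.zeros((8, 8)), np.zeros((8, 8)))
```

This illustrates why residual learning eases optimization: if the extra layers are not needed, the network only has to drive F(z) toward zero rather than learn a full identity mapping.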

Training the ResNet model

ResNet enables training deep neural networks without vanishing/exploding gradient problems, while also keeping computational time and classification accuracy in check for the task at hand. The authors [40] offered variants of ResNet with 50, 101 and 152 layers. For our multi-class classification task, ResNet-50 was used, based on the model’s optimal size and its effective generalization and classification with low error rates. The hyperparameters used for fine-tuning the ResNet-50 model are given in Table 3. ResNet-50 follows two architectural criteria, as shown in Table 4. First, layers have the same number of filters when the input and output feature-map sizes are the same. Second, halving the feature-map size doubles the number of filters. Convolutional layers with a stride of 2 perform down-sampling, and batch normalisation occurs after each convolution and before the ReLU activation. A global average pooling layer with a 7 × 7 filter wraps up the network. As ResNet-50 was originally trained on the 1000-class ImageNet dataset, for our dataset only the last (fifth) convolutional block and the final classification layer were trained from scratch, while the other layers were frozen (non-trainable). Figure 5 illustrates the basic architecture of our ResNet-50 model, and a detailed analysis of the output layers, shapes, filters and channel sizes is shown in Table 4.
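The transfer-learning setup described above can be sketched in Keras as follows. We pass `weights=None` to keep the sketch self-contained (the study uses ImageNet weights, i.e. `weights="imagenet"`), and the learning rate is an assumed placeholder rather than the Table 3 value; the `"conv5_"` prefix matches Keras's layer naming for the fifth convolutional block:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load ResNet-50 without its ImageNet classifier head.
base = tf.keras.applications.ResNet50(include_top=False,
                                      weights=None,
                                      input_shape=(224, 224, 3))

# Freeze everything except the fifth convolutional block; only it and
# the new classification layer are trained.
for layer in base.layers:
    layer.trainable = layer.name.startswith("conv5_")

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),  # AD, CN, EMCI, LMCI
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Training then proceeds with `model.fit`, using the callbacks described in the hyperparameter-tuning section.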

Fig. 5
figure 5

ResNet50 network architecture

Table 3 Hyperparameters considered for fine-tuning ResNet 50 Model
Table 4 ResNet-50 architecture for AD classification

Hyperparameter tuning

Considering the computational constraints and model uniformity across the dataset, the model was tuned using the hyperparameters listed in Table 3. Parameter values for the learning rate and exponential decay factor were chosen with the Adam optimizer in mind. Kernel weights were initialized using the default Glorot uniform initializer. The callbacks chosen to prevent overfitting were ReduceLROnPlateau (reduce the learning rate on a plateau) and ModelCheckpoint (save the model with the best weights). To track the training of the multi-class classification problem, categorical cross-entropy was applied:

$${Loss}_{CCE}=-\sum_{c=1}^{K} {y}_{o,c}ln\left({p}_{o,c}\right)$$
(4)

Here, K denotes the number of classes (4 in our case), \(y\) is the binary indicator of whether \(c\) is the correct class for observation \(o\), and \(p\) is the predicted probability that observation \(o\) belongs to class \(c\).
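A small worked example of Eq. (4): because the labels are one-hot, only the term for the true class survives the sum, and the probabilities shown here are hypothetical:

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot labels; y_pred: predicted class probabilities (Eq. 4).
    # eps guards against log(0) for a zero predicted probability.
    return -np.sum(y_true * np.log(y_pred + eps), axis=1)

# One observation over the four classes (AD, CN, EMCI, LMCI):
y_true = np.array([[0.0, 1.0, 0.0, 0.0]])   # true class: CN
y_pred = np.array([[0.1, 0.7, 0.1, 0.1]])   # model puts probability 0.7 on CN
loss = categorical_cross_entropy(y_true, y_pred)  # only the CN term survives
```

The loss here is \(-\ln(0.7)\); a confident correct prediction drives it toward zero, while a confident wrong one makes it large.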

Performance metrics

We employ the most frequently used deep learning performance metrics, namely sensitivity, specificity, precision, recall, F1-score, ROC-AUC, and the confusion matrix, to assess the performance of our models. Each metric lies in the range [0, 1]; the greater the value, the better the model’s performance. In the following formulas, TP stands for true positive, FP for false positive, TN for true negative, and FN for false negative.

Recall/Sensitivity: A statistical metric that measures how many of all actual positives were correctly predicted. It is computed by dividing the number of true positives by the sum of true positives and false negatives (i.e. the true positive rate).

$${\text{Sensitivity}}=\frac{TP}{TP+FN}$$
(5)

Specificity: A statistical metric that measures how many of all actual negatives were correctly predicted. It is computed by dividing the number of true negatives by the sum of true negatives and false positives (i.e. the true negative rate).

$${\text{Specificity}}=\frac{TN}{TN+FP}$$
(6)

Precision: A statistical metric that measures how many of the positive predictions were correct. It is computed by dividing the number of true positives by the sum of true positives and false positives.

$$\text{Precision }=\frac{TP}{TP+FP}$$
(7)

F1-score: The harmonic mean of precision and recall; it provides a more informative estimate of misclassified cases than the accuracy metric.

$$\mathrm{F}1\mathrm{ Score}=2\times \frac{\text{ Precision }\times \text{ Recall }}{\text{ Precision }+\text{ Recall}}$$
(8)

Accuracy: The proportion of true positives and true negatives among all examined cases. The mathematical expression for accuracy is

$$\text{Accuracy }=\frac{TP+TN}{TP+TN+FP+FN}$$
(9)
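Equations (5)–(9) transcribe directly into code. The counts in the usage line are hypothetical, chosen only to exercise each formula:

```python
def classification_metrics(tp, tn, fp, fn):
    # Direct transcriptions of Eqs. (5)-(9).
    sensitivity = tp / (tp + fn)          # Eq. (5): recall / true positive rate
    specificity = tn / (tn + fp)          # Eq. (6): true negative rate
    precision = tp / (tp + fp)            # Eq. (7)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (8)
    accuracy = (tp + tn) / (tp + tn + fp + fn)                    # Eq. (9)
    return sensitivity, specificity, precision, f1, accuracy

# Hypothetical counts for one class of a confusion matrix:
sens, spec, prec, f1, acc = classification_metrics(tp=80, tn=90, fp=10, fn=20)
```

For a multi-class confusion matrix, these per-class counts are obtained by treating one class as positive and all others as negative (one-vs-rest).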

Results

Following the methodology outlined in the previous section, with only 50 epochs a training accuracy of 96.23% and a testing accuracy of 98.44% were achieved, as visualised in Fig. 6. Figure 6a shows the training and validation accuracy curves over 50 epochs of the ResNet-50 model: training and validation accuracy increase gradually with the number of epochs, reaching their maximum at the 50th epoch. Figure 6b depicts the training and validation loss of the ResNet-50 model; as indicated in the graph, the training loss decreases as the number of epochs increases.

Fig. 6
figure 6

a Training and Validation accuracy of ResNet 50 model; b Training and Validation loss of ResNet 50 Model

Table 5 shows the performance metrics on the test set of the ResNet-50 model, comparing the outcomes on the selected statistical metrics with and without augmentation. From Table 5, it can be seen that accuracy and precision on the test set were higher with augmented images than without.

Table 5 Performance metrics of the model with and without involving augmentation techniques

A confusion matrix, sometimes called a contingency table, is a statistical method used to assess paired observations: rows denote the predicted class while columns reflect the actual class, or vice versa. For our multi-class problem, the confusion matrix was used to assess the efficiency of the model in classifying the different stages of the disease. A Receiver Operating Characteristic (ROC) curve is a technique for determining the effectiveness of a test: it is a graph that plots the true positive rate (i.e. sensitivity) against the false positive rate (i.e. 1 − specificity). Figure 7a shows the confusion matrix of the multi-class classification of AD, CN, EMCI and LMCI, and Fig. 7b shows the ROC curve for the testing set used with our model.

Fig. 7
figure 7

a Confusion matrix of Multi-class classification. b ROC curve depicting AUC value for multi-class classification

Visualising the activation layers

The feature maps of the first two layers of our trained model are shown in Fig. 8. Each of Fig. 8a, b contains 64 images organized in an 8 × 8 grid. As the number of parameters (with respect to neurons) increases rapidly in the later layers of the model, only the first two layers are presented. The activation map provides salient details on how the pertinent layers extract sharp details from the image that would not be discernible to the naked eye.

Fig. 8
figure 8

a Feature map for convolution layer 1. b Feature map for max pooling layer

Figure 8a, b visualises the first convolution layer and the max-pooling layer, respectively, using feature maps.

Comparison with other models

We compared our proposed method against other pre-trained deep learning models (Fig. 9) on the same dataset with similar hyperparameter settings, over a range of 35–50 epochs. Test accuracy and test loss, along with training time (Table 6), were examined in order to analyse and deploy the best possible model in practice.

Fig. 9
figure 9

Model comparison with other pretrained deep learning Models

Table 6 Comparison with different pre-trained models

The ResNet-50V2 and MobileNetV2 models were trained in Google Colab for 13 and 14.4 h respectively, whereas XceptionNet and DenseNet-121 took 18.8 and 18.6 h respectively. ResNet-50V2 was trained in less time than the other networks, and its test loss was also lower than that of the other pre-trained models. The maximum training time and maximum loss were observed for XceptionNet. The pre-trained models were run on an Intel Core i5 processor with 8 GB RAM in a Google Colab cloud environment. Computational complexity could be reduced further by optimization, pruning the weights, and quantizing the network.

Discussion

According to current reports, AD may possibly rank third (after heart disease and cancer) as a major cause of mortality among the elderly [45]. According to several studies, recognizing Alzheimer’s disease in its early stages and preventing it from developing further has become essential. There has been tremendous research on Alzheimer’s and its early phases, including MCI, using various imaging modalities such as MRI, functional MRI, and FDG-PET, and it has received a great deal of attention [46,47,48,49]. Liu et al. proposed a combination of convolutional and recurrent neural networks to classify AD vs CN and MCI vs CN subjects from FDG-PET images, achieving overall AUC scores of 95.3% and 83.9%, respectively [48]. Carlos et al. used an ensemble model of SVM and RF and achieved an overall accuracy of 66.78% for the AD, CN and MCI stages [49]. In the case of MCI individuals, the subtypes comprising the late and early phases of MCI development are seldom discussed in the literature, but are relevant to early therapy. Neuroimaging-based metrics have demonstrated great sensitivity in monitoring the stages of AD over time, and have therefore been proposed as prospective biomarkers to assess AD incidence, progression, and responsiveness to treatment. There is a rising emphasis on utilizing 18F-FDG-PET to diagnose and predict mild cognitive impairment in people with AD and those at risk of MCI, which has spurred interest in studying and classifying the EMCI and LMCI cohorts within the MCI group.

As a concrete effort towards designing a therapeutically relevant deep learning algorithm, we sought to build a generic model that incorporated brain images of the different stages of AD and provides an effective representation in a clinical context. The ADNI repository made it simple to study the relationships between deep learning and imaging features, solving the challenge of obtaining an ample amount of neuroimaging data in our chosen modality. To our knowledge, this is the first study of its kind to examine the multiple stages (AD, EMCI, LMCI, CN) of Alzheimer’s disease utilising the 18F-FDG-PET imaging technique and achieve an overall AUC score of 95% by fine-tuning a ResNet50 model. Ding et al. performed a comparable and impressive study in which they used the InceptionV3 architecture to classify three groups (AD, MCI, CN) utilising the same imaging technique and obtained an AUC score of 98% [30]. Ford et al. evaluated suspected AD and frontotemporal lobar degeneration (FTLD) on FDG-PET images using machine learning techniques [50]. The authors used a random forest model for the classification of AD, FTLD, non-AD and non-FTLD and attained an accuracy of 63%; their model achieved AUCs of 0.82 and 0.89 using Z-score features from cerebellar and whole-brain cerebellar regions to distinguish AD from non-AD patients. In the proposed study, the ResNet50 model produced a testing accuracy of 98.4% in the classification of AD, CN, EMCI and LMCI patients, with an AUC of 0.95 on the testing dataset. Hence the ResNet50 model outperforms the random forest classifier in discriminating the various stages of AD.
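Since the comparison above rests on AUC values, the following sketch shows how a one-vs-rest ROC-AUC can be computed for a single class (e.g. the AD class against the other stages) as the Mann-Whitney probability that a positive example is scored above a negative one. The scores below are made-up illustrative values, not results from our model.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical softmax scores for one class against all other stages
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.92, 0.80, 0.55, 0.60, 0.30, 0.20, 0.10]
auc = roc_auc(labels, scores)   # 11/12, about 0.917
```

For the four-class setting, such a score is computed per class and averaged, which is how libraries such as scikit-learn report multi-class AUC under one-vs-rest averaging.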

We attempted to investigate further the subtypes of MCI along with the AD and control groups, enhancing the training set with image augmentation so that our deep learning model could classify the four categories with higher sensitivity and accuracy. Using feature maps, we tried to visualise the filters and the inherent workings of our deep learning model; however, it remains a challenge to interpret the various stages with the naked eye. With additional fine-tuning approaches in deep learning and model visualisation methods, this framework may be useful for investigation in a clinical environment and contribute significantly to the field of radiology. The limitations of the study are as follows: in clinical practice in the US, Medicare will typically only reimburse for FDG-PET when the question is to distinguish AD from frontotemporal dementia, and ResNet50V2 does not appear to be optimised to distinguish between different subtypes of dementia.

Conclusion

This study investigated the potential effectiveness of the 18F-FDG PET imaging modality in classifying multiple stages of AD using a deep residual neural network. Utilising transfer learning and a hyper-parameter tuning approach, a ResNet50 model was implemented for classifying the four AD categories. Our modified ResNet50 model successfully classifies the four categories with an overall sensitivity, specificity and ROC-AUC score of 94%, 95% and 95% respectively on the testing set. The implemented method was compared with various other cutting-edge deep learning models: MobileNetV2, Xception, DenseNet121 and VGG16. The experiments show that our fine-tuned ResNet50 model outperforms these models in accuracy. The capability of the proposed technique to attain a high classification accuracy (98.44% on our test set) using the FDG-PET modality alone makes it practically valuable for potential use in screening processes in radiological settings and clinical trials. By helping to detect and localise irregular patterns at multiple stages of the disease from this imaging modality, we demonstrated that our model could complement experts’ visual interpretation in existing clinical practice.
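For readers reproducing the reported metrics, the sketch below shows how per-class sensitivity and specificity are derived from a 4-class confusion matrix in the one-vs-rest sense used when quoting a single averaged figure. The matrix entries are hypothetical counts for illustration, not our actual test results.

```python
def sensitivity_specificity(cm, cls):
    """One-vs-rest sensitivity and specificity for class `cls`.
    `cm` is a square confusion matrix: rows = true class, cols = predicted."""
    n = len(cm)
    tp = cm[cls][cls]
    fn = sum(cm[cls][j] for j in range(n) if j != cls)   # missed positives
    fp = sum(cm[i][cls] for i in range(n) if i != cls)   # false alarms
    tn = sum(cm[i][j] for i in range(n) for j in range(n)
             if i != cls and j != cls)
    return tp / (tp + fn), tn / (tn + fp)

# rows/cols ordered AD, CN, EMCI, LMCI (hypothetical counts)
cm = [[48, 1, 1, 0],
      [2, 46, 1, 1],
      [1, 1, 47, 1],
      [0, 1, 2, 47]]
sens_ad, spec_ad = sensitivity_specificity(cm, 0)  # 0.96, 0.98
```

Averaging these two quantities over the four classes yields the overall sensitivity and specificity figures of the kind reported above.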