1 Introduction

Alzheimer’s disease (AD) is a chronic, progressive decline in cognitive function caused by damage to brain cells. More than 30 million people worldwide suffer from AD, and this number is expected to triple by 2050 as life expectancy increases [1]. The onset of AD is typically insidious: cognitive decline generally begins years before it reaches the threshold of clinical significance and functional impairment [2]. Mild cognitive impairment (MCI), the prodromal stage of AD, does not necessarily progress to further deterioration if treatment is given in time. The conversion rate from MCI to AD is estimated to be between 10 and 25% per year [3]. Although no disease-modifying agents are currently available to halt the progression of AD, a number of clinical trials are underway in patients with pre-symptomatic disease. Thus, as effective therapies become available, the early identification of patients with MCI will be of tremendous benefit to patients and their families.

The pathology of AD includes cortical and subcortical atrophy together with the deposition of \( \beta \)-amyloid. Molecular medical imaging, such as positron emission tomography (PET), single-photon emission computed tomography (SPECT), and magnetic resonance imaging (MRI), can visualize the hypometabolism or atrophy caused by AD, and has hence led to a revolution in the early diagnosis of AD [4]. In particular, functional PET and SPECT can detect subtle changes in cerebral metabolism before anatomical changes are evident or a symptomatological diagnosis of probable AD can be made with MR imaging [5]. Compared with SPECT, PET provides higher-resolution information for evaluating patients with suspected AD. 18-Fluorodeoxyglucose (FDG) is a widely used tracer in PET imaging for studying glucose metabolism. The comparative study by Silverman [6] demonstrated that FDG-PET is superior to perfusion SPECT in identifying early changes associated with AD and other neurodegenerative dementias.

However, early diagnosis of AD remains a challenging task, since the pathological changes can be subtle in the early course of the disease and there can be some overlap with other neurodegenerative disorders [7]. Most computer-aided approaches extract image features that describe the pathological changes and then construct a statistical model of the disease from a set of training examples using supervised methods [8]. The features considered include global features, which are computed by applying a linear transform to the entire brain volume, and local features, such as statistics, histograms, and gradients calculated from volumes of interest (VOIs). Many pattern classification techniques, including k-means clustering [9], artificial neural networks (ANN) [10] and support vector machines (SVM) [5], have been successfully applied to this task.

Recently, deep learning has proven its ability to analyze medical images. It provides a unified framework for simultaneously learning image representations and classifying features, and thus avoids troublesome hand-crafted feature extraction and feature engineering. Suk et al. [11] used image representations learned by a deep model to differentiate AD subjects from MCI ones, based on the assumption that those deep features inherit latent high-level information. They also fused multiple sparse regression networks as the target-level representation [12]. Ortiz et al. [13] applied the deep belief network to the early diagnosis of Alzheimer’s disease using both MRI and PET scans. Valliani et al. [14] employed a deep residual network (ResNet) pre-trained on the ImageNet natural image dataset to identify dementia types.

In this paper, we propose an ensemble of AlexNets (EnAlexNets) model to differentiate AD from normal controls (NC) and to differentiate between stages of MCI, namely mild MCI (mMCI) vs. severe MCI (sMCI). The novel aspects of our approach include: (1) using the automated anatomical labeling (AAL) cortical parcellation map to obtain 62 brain anatomical volumes; (2) extracting image patches from each cortical volume to train a candidate AlexNet, the champion model of the ILSVRC2012 image classification task; and (3) selecting the effective AlexNet models and ensembling them to generate a more accurate diagnosis. The proposed EnAlexNets model was evaluated against seven methods on the ADNI dataset and achieved state-of-the-art performance.

2 Dataset

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial MRI, PET, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and early AD.

We selected 962 FDG-PET studies from the ADNI database, including 241 AD cases, 306 mild MCI (mMCI) cases, 127 severe MCI (sMCI) cases and 288 normal controls (NCs). Each subject was injected with 185 MBq of 18F-fludeoxyglucose (FDG), and PET scanning commenced approximately 30 min after tracer injection, producing a 3D scan that consists of six 5-min frames. The scans used in this study are the baseline or screening scans in the ADNI database and had been through a pre-processing pipeline that includes co-registration, averaging, voxel normalization, and isotropic Gaussian smoothing [15]. This pre-processing was performed by the ADNI participants; it simplifies any subsequent analysis because the data from different PET scanners are then uniform. The demographic information of the selected dataset is shown in Table 1.

Table 1. Demographic data of patients in our dataset

3 Method

The proposed algorithm consists of four major steps: anatomical volume detection, data augmentation, patch classification, and dementia differentiation. A diagram that summarizes this algorithm is shown in Fig. 1.

Fig. 1. Diagram of the proposed EnAlexNets algorithm.

3.1 Anatomical Volume Detection

The AAL cortical parcellation map [16], which provides 62 brain anatomical volumes including 54 symmetrical and 8 asymmetrical volumes, is well aligned with the template brain PET image supplied with the statistical parametric mapping (SPM, Version 2012) package [17].

To map the anatomical labels from the atlas onto each study while preserving the original brain image information, we spatially normalized the SPM brain PET template to the FDG-PET images, which are self-aligned, using the spatial normalization procedure supplied with the SPM package. Each anatomical volume in the AAL atlas can thus be transferred to our FDG-PET images. For each volume, we located its geometric center on the FDG-PET images and placed a \( 67 \times 67 \) window on that center for image patch extraction.
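The center-and-window step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the atlas has already been transferred to the PET image as an integer label volume indexed (z, y, x), that patches are taken from transverse slices, and that windows near the border are zero-padded. The function names `region_center` and `extract_patch` are hypothetical.

```python
import numpy as np

PATCH_SIZE = 67  # window size stated in the text


def region_center(label_volume, region_id):
    """Return the rounded geometric center (z, y, x) of one labeled region."""
    coords = np.argwhere(label_volume == region_id)
    if coords.size == 0:
        raise ValueError(f"region {region_id} not found in label volume")
    return tuple(int(c) for c in np.rint(coords.mean(axis=0)))


def extract_patch(pet_volume, center, size=PATCH_SIZE):
    """Cut a size x size transverse patch centered at (z, y, x).

    The slice is zero-padded so that windows near the border keep their
    full size (the padding behavior is an assumption, not from the paper).
    """
    z, y, x = center
    half = size // 2
    padded = np.pad(pet_volume[z], half, mode="constant")
    # After padding by `half`, voxel (y, x) sits at (y + half, x + half),
    # so the window [y, y + size) in padded coordinates is centered on it.
    return padded[y:y + size, x:x + size]
```

With an odd window size such as 67, the center voxel of the region lands exactly in the middle of the returned patch.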

3.2 Data Augmentation

Since the number of extracted image patches is limited, we employed three data augmentation techniques to enlarge our image patch dataset. First, we shifted the PET image horizontally and vertically with a step of 5 voxels before patch extraction. Second, we moved the patch extraction window forward and backward to its adjacent transverse slices. Third, we rotated the PET image clockwise and counter-clockwise by 5° and 10°. With these operations, each image patch has nine augmented copies.
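The three kinds of augmentation can be illustrated with a short NumPy sketch. Several details are assumptions rather than facts from the text: the volume is indexed (z, y, x), rotation is performed about the window center with nearest-neighbor sampling, and the default offsets and angles are illustrative only. The text does not spell out which combination of operations yields the reported number of copies, so the number of patches returned here need not match it. All function names are hypothetical.

```python
import numpy as np


def rotate_nearest(image, angle_deg, center):
    """Rotate a 2D image about `center` (y, x) with nearest-neighbor sampling."""
    theta = np.deg2rad(angle_deg)
    cy, cx = center
    ys, xs = np.indices(image.shape)
    # Inverse mapping: for every output pixel, look up its source pixel.
    dy, dx = ys - cy, xs - cx
    src_y = np.rint(cy + dy * np.cos(theta) - dx * np.sin(theta)).astype(int)
    src_x = np.rint(cx + dy * np.sin(theta) + dx * np.cos(theta)).astype(int)
    inside = ((src_y >= 0) & (src_y < image.shape[0])
              & (src_x >= 0) & (src_x < image.shape[1]))
    out = np.zeros_like(image)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out


def crop(image, center, size):
    """Cut a size x size window centered at (y, x); it must fit in the image."""
    half = size // 2
    y0, x0 = center[0] - half, center[1] - half
    if y0 < 0 or x0 < 0 or y0 + size > image.shape[0] or x0 + size > image.shape[1]:
        raise ValueError("window falls outside the image")
    return image[y0:y0 + size, x0:x0 + size]


def augment(volume, center, size=67, shift=5,
            slice_offsets=(-1, 1), angles=(-10, -5, 5, 10)):
    """Return augmented patches around `center` = (z, y, x)."""
    z, y, x = center
    patches = []
    # 1) Shifting the image by `shift` voxels equals moving the window.
    for dy, dx in ((-shift, 0), (shift, 0), (0, -shift), (0, shift)):
        patches.append(crop(volume[z], (y + dy, x + dx), size))
    # 2) Same window on the neighboring transverse slices (skipped at the ends).
    for dz in slice_offsets:
        if 0 <= z + dz < volume.shape[0]:
            patches.append(crop(volume[z + dz], (y, x), size))
    # 3) Rotate the slice about the window center, then crop.
    for angle in angles:
        rotated = rotate_nearest(volume[z], angle, (y, x))
        patches.append(crop(rotated, (y, x), size))
    return patches
```

Positive and negative angles cover both rotation directions, so the sign convention of `rotate_nearest` does not affect the set of patches produced.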

3.3 Patch Classification

Given the limited number of image patches, overly complex deep neural networks may over-fit our training data. Hence, we adopted AlexNet, the champion model of the ILSVRC2012 image classification task, as our classifier; it consists of only 5 convolutional layers and 3 fully-connected layers (see Fig. 2).

Fig. 2. Architecture of the AlexNet.

It has been widely acknowledged that image representations learned from large-scale datasets can be efficiently transferred to generic visual recognition tasks that have limited training data [18, 19]. Therefore, the AlexNet used in this study was pre-trained on the ImageNet training dataset, a large-scale natural image database with 1000 categories. To adapt this model to our problem, we replaced its last three layers with a fully connected layer, a softmax layer and a classification output layer, and adjusted the learning rates so that the new layers are trained quickly while the other layers are trained slowly.

Since the AlexNet takes an input image of size \( 227 \times 227 \times 3 \), we duplicated each image patch three times and resized each copy to \( 227 \times 227 \) by the bilinear interpolation algorithm.
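The input preparation can be sketched in NumPy as below. This is an illustrative version under stated assumptions: the sampling grid maps the corners of the patch onto the corners of the output (other bilinear conventions exist and give slightly different values), and the channel axis is placed last. The function names `resize_bilinear` and `to_alexnet_input` are hypothetical.

```python
import numpy as np


def resize_bilinear(patch, out_h=227, out_w=227):
    """Resize a 2D array with bilinear interpolation (corner-aligned grid)."""
    in_h, in_w = patch.shape
    # Sample positions in input coordinates; corners map onto corners.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = patch[y0][:, x0] * (1 - wx) + patch[y0][:, x1] * wx
    bottom = patch[y1][:, x0] * (1 - wx) + patch[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def to_alexnet_input(patch):
    """Turn a single-channel patch into a 227 x 227 x 3 array.

    Resizing once and stacking three copies gives the same result as
    duplicating first and resizing each copy.
    """
    resized = resize_bilinear(np.asarray(patch, dtype=float))
    return np.stack([resized] * 3, axis=-1)
```

Because the three channels are identical, the network sees the grey-level PET patch as if it were an RGB image with equal red, green and blue components.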

3.4 Dementia Differentiation

Since we obtained 62 cortical volumes on each PET scan, we can use the image patches extracted from each volume to fine-tune a pre-trained AlexNet. As we know, not all cortical volumes play equally critical roles in dementia diagnosis. Therefore, the diagnostic capacity of the fine-tuned AlexNets varies considerably. We chose the 30% of fine-tuned AlexNets that performed best on the validation dataset and adopted a majority voting scheme to combine their decisions for dementia differentiation. Such ensemble learning plays the role of an expert consultation that aims at a more accurate diagnosis.
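The selection and voting step can be illustrated with a few lines of plain Python. This is a sketch under assumptions the text does not settle: 30% of 62 is not an integer, so the count is rounded up here, and ties in the vote go to the label that appears first. The names `select_top_models`, `majority_vote` and `ensemble_predict` are hypothetical.

```python
import math
from collections import Counter


def select_top_models(val_accuracy, fraction=0.30):
    """Return ids of the best `fraction` of models by validation accuracy."""
    # Rounding up is an assumption; the paper does not state the rounding.
    k = max(1, math.ceil(fraction * len(val_accuracy)))
    ranked = sorted(val_accuracy, key=val_accuracy.get, reverse=True)
    return ranked[:k]


def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(model_predictions, selected):
    """Combine the decisions of the selected models for one scan."""
    return majority_vote([model_predictions[m] for m in selected])
```

Here `val_accuracy` maps a model id (for example, the index of its cortical volume) to its validation accuracy, and `model_predictions` maps the same ids to the label each model predicts for one scan.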

3.5 Evaluation

We adopted a 5-fold cross-validation scheme to evaluate the performance of the proposed algorithm. We randomly sampled the instances from each class to form five folds, so that the distribution of data in each fold is as similar as possible to that of the whole dataset. In each run, we used 80% of patches for training, 20% for validation and 20% for testing. The data augmentation procedure was not applied to test patches.
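Forming folds by sampling within each class can be sketched as follows. This shows only the fold construction, not how folds are assigned to training, validation and testing. The round-robin dealing, the fixed seed and the function name `stratified_folds` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign each sample index to a fold, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin across folds.
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return folds
```

For the AD vs. NC task, calling `stratified_folds(["AD"] * 241 + ["NC"] * 288)` places 48 or 49 AD scans and 57 or 58 NC scans in each fold, so every fold keeps roughly the class ratio of the whole set.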

4 Results

Table 2 gives the accuracy of the proposed algorithm and seven other state-of-the-art algorithms when applied to differentiating AD cases from NCs. It shows that our algorithm and the method reported in [13] achieved the highest classification accuracy. However, the method in [13] used both PET and MRI data. Table 2 also shows that our algorithm achieved the highest specificity and the second-best sensitivity among the eight algorithms.

Table 2. Accuracy of eight methods (AD vs. NC)

Figure 3 depicts the accuracy of the proposed algorithm on the mMCI-sMCI classification task. It shows that our algorithm achieved a classification accuracy of 85% on this problem.

Fig. 3. Performance of the proposed algorithm (mMCI vs. sMCI).

5 Discussion

The proposed EnAlexNets algorithm is based on pre-trained and fine-tuned AlexNets and an ensemble of these models. Our experiments indicate that this algorithm robustly extracts more discriminative features and achieves performance that matches or exceeds that of methods using multi-modality images. The EnAlexNets algorithm effectively compensates for the smaller amount of information afforded by a single imaging modality and removes the need for multi-modality images, which simplifies the diagnosis process. These results could be further improved by refining the CNN structure and enlarging the dataset. As for time complexity, the total training time of the multiple AlexNets was 111,600 s (31 h), which is on the same level as other state-of-the-art deep-learning-based methods. Diagnosing one patient case with our method took around 10 s. With better computational resources, this time can be further reduced.

6 Conclusion

In this study, we proposed a multi-AlexNet-based method for the computer-aided diagnosis of AD and MCI. This method not only combines information from different brain regions but also exploits the strong feature representation ability of AlexNet. Better performance was observed on two classification tasks. Apart from the traditional differentiation between AD and NC, we made an initial attempt to differentiate between stages of MCI, mMCI vs. sMCI. Because MCI may progress latently to a further stage, we aim to identify the risk as early as possible, which is of great significance to both clinicians and patients. This study may be of assistance to the computer-aided diagnosis of dementia and to other biomedical fields.