1 Introduction

Alzheimer’s disease (AD) is the most common type of dementia. It is a neurodegenerative disease with a slow onset and progressive deterioration over time, which eventually causes death [1]. Early signs of the disease include significant memory deterioration; difficulty recognizing time, place, and people; disruptive behaviors; personality changes; and delusions and hallucinations, which can affect activities of daily life and the ability for self-care. The disease increases the patient’s need for care and can place a great burden on the caregiver. Therefore, the early diagnosis of AD is an important issue [2, 3]. According to the diagnostic criteria for AD set by the National Institute on Aging-Alzheimer’s Association, the PET biomarkers commonly used in the diagnosis of AD fall into two categories. The first is imaging of beta-amyloid protein deposition in the brain, for example with 11C-Pittsburgh compound B (PiB) [4]. Previous findings indicate that most patients with AD have more beta-amyloid protein deposition in the brain than normal individuals. Because 11C-PiB binds to the beta-amyloid proteins deposited in the brain, it can be used for the diagnosis of AD [5,6,7]. The second is imaging of brain metabolic function, for example with 18F-fluorodeoxyglucose (FDG) [8]. In patients with AD, brain metabolic function is decreased owing to damage to brain nerve cells. Because 18F-FDG reflects glucose metabolism in the brain, the distribution of hypometabolism seen in the 18F-FDG images of patients with AD can be used for disease severity evaluation and diagnosis [7, 9,10,11,12].

AD is a progressively deteriorating disease. Beta-amyloid protein deposition in the brain gradually increases before and during the onset of the disease, over a period lasting up to 19 years, while brain metabolic function gradually decreases [13, 14]. The current consensus is that the transition from the normal stage to the AD stage passes through an intermediate stage of mild cognitive impairment (MCI) [15, 16]. Every year, approximately 15% of those in the MCI stage progress to AD; in comparison, the proportion of those who progress directly from the normal stage to the AD stage is only 1–2% [16, 17]. In 11C-PiB imaging, activity uptake gradually increases across the stages of progression. Conversely, in 18F-FDG imaging, activity uptake gradually decreases across the stages owing to the increasing severity of hypometabolism [18]. Most of the aforementioned studies have focused on a single biomarker. Because the two biomarkers have different and even complementary characteristics, combining their results can provide more detailed information for the early diagnosis of AD [19, 20] and can help us further understand the image pattern of the disease. Additionally, because the two biomarkers (11C-PiB and 18F-FDG) provide complementary information, machine learning based classification algorithms can be used to perform prediction and diagnosis. Many studies currently use several types of neuroimaging modalities to perform dementia classification [21,22,23,24,25,26,27,28,29,30]. Among these, the most commonly used algorithm is the support vector machine (SVM). In comparison with other algorithms (such as neural networks), SVM is able to retain favorable classification accuracy even with small sample sizes [31]. Most of these studies have focused on magnetic resonance imaging (MRI), whereas studies using PET (particularly 11C-PiB) are rare. In addition, those studies mostly report classification results for the entire brain and seldom explore the individual brain regions. In the present study, we collected 11C-PiB and 18F-FDG images from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), performed classification using the SVM algorithm with the quantitative analysis results of the two biomarkers, and finally compared and analyzed the classification results for each brain region. We thus evaluated the feasibility of applying the SVM classification algorithm to 11C-PiB and 18F-FDG images of a limited number of subjects, and identified the region most suitable for AD prediction and diagnosis, thereby providing a reference for AD diagnosis and for improving the accuracy of early AD diagnosis.

2 Materials and Methods

2.1 ADNI Data and Subject Characteristics

All data used in this study came from the ADNI database (https://www.adni.loni.usc.edu), which was established by the National Institute on Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies, and non-profit organizations. The primary objective of ADNI is to test whether serial MRI, PET, other biomarkers, and clinical and neuropsychological evaluations can be combined to measure the progression of MCI and early AD. Confirming the sensitivity and specificity of biomarkers for early AD progression can help researchers and clinical physicians develop new treatment methods and monitor their efficacy, and can even reduce the time and cost of clinical trials. This study collected data of 79 subjects from the ADNI database (40 males, 39 females), including 20 AD subjects, 27 MCI subjects, and 32 normal controls (NCs). Related information on the subjects is summarized in Table 1. T1 MRI, FDG-PET, and PiB-PET imaging were performed on all subjects, with an interval of less than 1 month between the two PET scans and less than 3 months between PET and MRI. The pulse sequence used for T1 MRI was MPRAGE, and the purpose of MRI in this study was anatomical spatial normalization [32] and volume-of-interest (VOI) selection.

Table 1 Subject characteristics

2.2 Image Acquisition and Scanning Protocol

In the ADNI database, the scanning protocol differs according to the PET scanner. In this study, there were two protocols for FDG-PET. The first protocol performed a 60-min dynamic scan immediately after radiotracer administration via IV injection. The second protocol waited for 30 min of uptake after radiotracer administration via IV injection before performing a 30-min dynamic scan. In this study, 11 subjects were scanned with the first protocol, and the remaining 68 subjects were scanned with the second protocol. The FDG injected dose for all subjects was 6.11 ± 1.89 mCi. For PiB-PET, three protocols were used. The first protocol performed a 90-min dynamic scan immediately after IV injection of radiotracer, the second protocol followed the same procedure as the first but shortened the dynamic scan to 70 min, and the third protocol waited for 50 min of uptake after IV injection of radiotracer before performing a 20-min dynamic scan. The first protocol was performed on 45 subjects, the second on seven subjects, and the third on 27 subjects. All subjects received an IV injection with 11C-PiB activity of 13.01 ± 2.83 mCi.

2.3 Image Processing Procedure

We first used the “Coregister: Estimate” module provided by SPM8 software (https://www.fil.ion.ucl.ac.uk/spm/) with default parameters to co-register each T1 MRI image to the T1 MRI template provided by SPM8, so that the AC-PC line of the T1 MRI image is horizontal, as it is in the template in MNI space (Montreal Neurological Institute space). Then, we used the “Normalise: Estimate & Write” module (also provided by SPM8) to perform spatial normalization on the co-registered T1 MRI image so that its spatial axes and voxel size are consistent with the standard MNI space brain template [33, 34]. The processed T1 MRI image is thereby converted from the native space to the standard MNI space, and a set of standardized conversion parameters (*_sn.mat) is generated at the same time for further PET image processing.

For FDG-PET, the summed image was obtained from the dynamic frames acquired from 30 to 60 min after injection of radiotracer. For PiB-PET, the summed image was obtained from the dynamic frames acquired from 50 to 70 min after injection of radiotracer. Then, each subject’s PET images were co-registered to the corresponding co-registered MRI image, so that the two share the same spatial anatomical coordinates (Reference Image: co-registered MRI T1; Source Image: PET). Next, the “Normalise: Write” module was used to apply the previously generated standardized conversion parameters to the corresponding co-registered PET images (Parameter File: conversion parameters; Images to Write: co-registered PET) to complete spatial normalization. At this point, the PET image has been converted from the native space to the standard MNI space. The flow of image processing is shown in Fig. 1.

Fig. 1 Flowchart of the image processing procedure
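
The frame-summation step described above can be sketched as follows. This is a minimal illustration, not the ADNI or SPM8 pipeline: the frame layout (start time, end time, flat voxel list), the 10-min frame length, the toy values, and the function name `summed_image` are assumptions made for the example. Only the 30 to 60 min (FDG) and 50 to 70 min (PiB) windows come from the text.

```python
from typing import List, Sequence, Tuple

# One dynamic frame: (start_min, end_min, voxel values as a flat list).
# This layout is an assumption for illustration only.
Frame = Tuple[float, float, List[float]]


def summed_image(frames: Sequence[Frame], window_start: float,
                 window_end: float) -> List[float]:
    """Sum, voxel by voxel, every frame lying fully inside the time window."""
    selected = [voxels for start, end, voxels in frames
                if start >= window_start and end <= window_end]
    if not selected:
        raise ValueError("no frames fall inside the requested window")
    return [sum(values) for values in zip(*selected)]


# Toy dynamic scan: nine 10-min frames with three voxels each (made-up numbers).
toy_frames = [(10.0 * i, 10.0 * (i + 1), [1.0 + i, 2.0, 0.5 * i])
              for i in range(9)]

fdg_sum = summed_image(toy_frames, 30.0, 60.0)  # FDG window from the text
pib_sum = summed_image(toy_frames, 50.0, 70.0)  # PiB window from the text
```

For a protocol that starts scanning only after the uptake period, every acquired frame already lies inside the window, so the same function simply sums all frames.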

2.4 VOI Generation

For VOI generation, we first used the “Segment” module provided by SPM8 software to segment each spatially normalized T1 MRI image, generating probability maps for three tissue classes, namely gray matter (GM), white matter (WM), and cerebrospinal fluid (CSF), together with one segmentation parameter matrix (*_seg_sn.mat). Then, we used the “Automatic labeling” function provided by IBASPM (http://www.thomaskoenig.ch/Lester/ibaspm.htm) [35], taking as input the three probability maps and the segmentation parameter matrix, in combination with the Automated Anatomical Labeling (AAL) regional map, to divide the spatially normalized MRI images into 119 regions [36, 37], thus generating VOIs dedicated to each subject, as shown in Fig. 2.

Fig. 2 Spatially normalized T1 MRI image for each subject condition with the overlay of its corresponding VOIs segmented by IBASPM. The top, middle, and bottom rows show AD, MCI, and NC, respectively. Each color represents a different region in automated anatomical labeling (AAL)

2.5 Feature Extraction

In this study, the 119 AAL VOIs previously segmented from the spatially normalized T1 MRI images were combined, and regions commonly used in past AD research were selected for feature extraction from FDG-PET and PiB-PET, respectively [38,39,40]. The feature used in this study is the standardized uptake value ratio (SUVR), calculated as the average uptake over the VOIs within the target region divided by the average uptake over the VOIs within the reference region. The target regions are the whole brain cortex, orbitofrontal cortex, parietal cortex, precuneus, temporal cortex, and posterior cingulum, and the reference region is the cerebellar gray matter. Because glucose metabolism in cerebellar gray matter may differ among subjects and affect the accuracy of SUVR, we selected the pons as an additional reference region for SUVR calculation in the FDG images for comparison. The pons has been shown to be a more reliable reference region in AD diagnosis, and can increase clinical diagnostic accuracy and the significance of differences in statistical analysis [41,42,43].
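
The SUVR definition above can be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in the study: the VOI names and voxel values are made up, and the sketch pools all voxels of the listed VOIs before averaging, which is one possible reading of "the average of each VOI within the region".

```python
from statistics import fmean
from typing import Dict, List, Sequence


def region_mean(voi_values: Dict[str, List[float]],
                voi_names: Sequence[str]) -> float:
    """Mean uptake over all voxels of the listed VOIs (pooled average)."""
    voxels = [v for name in voi_names for v in voi_values[name]]
    return fmean(voxels)


def suvr(voi_values: Dict[str, List[float]],
         target_vois: Sequence[str],
         reference_vois: Sequence[str]) -> float:
    """Standardized uptake value ratio: target mean / reference mean."""
    return (region_mean(voi_values, target_vois)
            / region_mean(voi_values, reference_vois))


# Hypothetical VOI names and uptake values for a single FDG image.
toy_uptake = {
    "Temporal_L": [1.2, 1.4],
    "Temporal_R": [1.0, 1.2],
    "Cerebellum_GM": [0.9, 1.1],
    "Pons": [0.7, 0.9],
}
temporal = ["Temporal_L", "Temporal_R"]

suvr_cbg = suvr(toy_uptake, temporal, ["Cerebellum_GM"])  # about 1.2
suvr_pons = suvr(toy_uptake, temporal, ["Pons"])          # about 1.5
```

Changing only the reference list switches between the two FDG variants, FDG SUVR (CbG) and FDG SUVR (Pon), while the target region stays fixed.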

2.6 SVM Analysis

SVM is a supervised learning algorithm that is commonly used for data classification. SVM treats each data point as an N-dimensional vector and divides the data into two categories by finding the most appropriate (N-1)-dimensional hyperplane. The chosen hyperplane is the one with the greatest margin, that is, the greatest distance to the nearest data points of each of the two classes. Since SVM shows favorable classification results when the number of training samples is small [44], it is also commonly used in clinical dementia studies.
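
The hyperplane description above can be made concrete with a linear decision rule. The sketch below assumes two features per subject (for example, an FDG SUVR and a PiB SUVR, so the hyperplane is a line) and uses hand-picked, hypothetical weights; nothing here is fitted to the study data, and the study itself used a radial basis function kernel rather than a linear one.

```python
import math
from typing import Sequence


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0.0 else -1


def margin_width(w: Sequence[float]) -> float:
    """Width of the margin, 2 / ||w||, under the usual SVM scaling in which
    the closest training points satisfy |w.x + b| = 1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical weights for features (FDG SUVR, PiB SUVR).
w, b = (3.0, -4.0), 0.5

label_a = classify(w, b, (1.4, 1.0))  # high FDG, low PiB  -> score 0.7
label_b = classify(w, b, (1.0, 1.6))  # low FDG, high PiB  -> score -2.9
width = margin_width(w)               # 2 / 5
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the training points stay on their correct sides.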

In this study, SVM was used to classify the different disease groups (i.e., AD vs. normal control and MCI vs. normal control) because it is suitable for a limited number of subjects. The algorithm was implemented with the “svmtrain” and “svmclassify” functions in MATLAB R2011b (MathWorks Inc., Natick, MA), and a radial basis function was used as the kernel function for the classification model. The previously calculated PiB and FDG SUVRs of the various VOIs were used as features for SVM classification. Tenfold cross-validation was performed on all subjects: the data to be classified were divided into ten groups, of which nine were used as the SVM training set and the remaining one as the testing set. This procedure was repeated ten times, and one tenfold cross-validation was considered complete when each group had been used as the testing set exactly once [45]. To obtain more accurate and consistent results, the tenfold cross-validation was repeated 50 times, and the mean accuracy, sensitivity, and specificity were calculated for each region under the different disease group classifications.
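
The repeated tenfold cross-validation procedure described above can be sketched as follows. This is a minimal illustration under several assumptions, not the MATLAB code of the study: a nearest-centroid rule stands in for the RBF-kernel SVM so that the example needs only the standard library, the two-feature data are synthetic, predictions are pooled over the ten folds of each repetition before the metrics are computed, and all function names are hypothetical.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]
Predictor = Callable[[Sample], int]
Fitter = Callable[[List[Sample], List[int]], Predictor]


def kfold_indices(n: int, k: int, rng: random.Random) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def metrics(truth: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Accuracy, sensitivity, specificity (1 = patient, 0 = control).
    Assumes both classes are present in `truth`."""
    pairs = list(zip(truth, pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return (tp + tn) / len(pairs), tp / (tp + fn), tn / (tn + fp)


def fit_nearest_centroid(train_x: List[Sample], train_y: List[int]) -> Predictor:
    """Placeholder classifier (NOT an SVM): assign to the closer class centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, t in zip(train_x, train_y) if t == label]
        return [fmean(col) for col in zip(*rows)]

    c_pos, c_neg = centroid(1), centroid(0)

    def predict(x: Sample) -> int:
        d_pos = sum((a - c) ** 2 for a, c in zip(x, c_pos))
        d_neg = sum((a - c) ** 2 for a, c in zip(x, c_neg))
        return 1 if d_pos < d_neg else 0

    return predict


def repeated_cv(xs: List[Sample], ys: List[int], fit: Fitter, k: int = 10,
                repeats: int = 50, seed: int = 0) -> Tuple[float, ...]:
    """Mean accuracy, sensitivity, specificity over `repeats` k-fold runs."""
    rng = random.Random(seed)
    per_repeat = []
    for _ in range(repeats):
        pred = [0] * len(ys)
        for test_idx in kfold_indices(len(ys), k, rng):
            held_out = set(test_idx)
            train = [i for i in range(len(ys)) if i not in held_out]
            model = fit([xs[i] for i in train], [ys[i] for i in train])
            for i in test_idx:  # each subject is tested exactly once per run
                pred[i] = model(xs[i])
        per_repeat.append(metrics(ys, pred))
    return tuple(fmean(m[j] for m in per_repeat) for j in range(3))


# Synthetic, well-separated features (FDG SUVR, PiB SUVR): 20 patients, 32 controls.
data_rng = random.Random(1)
jitter = lambda: data_rng.uniform(-0.05, 0.05)
patients = [(1.0 + jitter(), 1.8 + jitter()) for _ in range(20)]
controls = [(1.4 + jitter(), 1.2 + jitter()) for _ in range(32)]
features = patients + controls
labels = [1] * 20 + [0] * 32

mean_acc, mean_sens, mean_spec = repeated_cv(features, labels, fit_nearest_centroid)
```

Any classifier with the same `fit(train_x, train_y) -> predict` shape, including an RBF-kernel SVM from a machine learning library, could be passed in place of the placeholder.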

3 Results

3.1 Statistical Test

Figure 3 shows the global (whole brain cortex, CTX) and regional average SUVRs of each group. There are two FDG SUVR values for every region, one for each of the two reference regions: cerebellar gray matter (CbG) and pons. In all three groups (AD, MCI, and NC), the global and regional FDG SUVRs calculated using the pons as the reference region were greater than those calculated using CbG as the reference region. Additionally, the NC group had the greatest global and regional FDG SUVRs, followed by the MCI group, and lastly the AD group. In contrast, PiB SUVR showed the opposite trend, with the AD group having the highest values, followed by the MCI group, and lastly the NC group.

Fig. 3 Global (CTX) and regional (OrbFro, Par, PC, Tem, PCG) group mean SUVRs: a FDG SUVR using cerebellar gray matter as reference region, b FDG SUVR using pons as reference region, and c PiB SUVR. Two-sample t tests between NC and the remaining two groups (AD and MCI) are presented for each VOI. SUVR standardized uptake value ratio, CTX whole brain cortex, OrbFro orbitofrontal, Par parietal, PC precuneus, Tem temporal, PCG posterior cingulum, CbG cerebellar gray matter, Pon pons. *p < 0.05 vs. NC. **p < 0.01 vs. NC. ***p < 0.001 vs. NC

A one-tailed two-sample t test with an alpha level of 0.05 was performed to compare the AD and MCI groups with the NC group. In the AD group, FDG SUVR (CbG) showed a significant difference (p < 0.05) in the whole brain cortex and orbitofrontal cortex regions, a stronger difference (p < 0.01) in the parietal cortex region, and highly significant differences (p < 0.001) in the precuneus, temporal cortex, and posterior cingulum regions. For FDG SUVR (Pon), all regions showed highly significant differences (p < 0.001). PiB SUVR showed a highly significant difference (p < 0.001) in the whole brain cortex, orbitofrontal cortex, precuneus, temporal cortex, and posterior cingulum regions, and a less pronounced but still significant difference (p < 0.01) in the parietal cortex region.

In the MCI group, FDG SUVR (CbG) showed a significant difference (p < 0.05) only in the temporal region. For FDG SUVR (Pon), however, there was a significant difference (p < 0.05) in the precuneus region, stronger differences (p < 0.01) in the whole brain cortex, parietal, and posterior cingulum regions, and highly significant differences (p < 0.001) in the orbitofrontal and temporal regions. For PiB SUVR, there were significant differences (p < 0.05) in the orbitofrontal, precuneus, and posterior cingulum regions and stronger differences (p < 0.01) in the whole brain cortex, parietal, and temporal regions, but no region showed a highly significant difference (p < 0.001).
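
The test statistic behind these group comparisons can be sketched as follows. The text does not state whether the pooled-variance (Student) or unequal-variance (Welch) form was used; this sketch assumes the pooled form, uses made-up SUVR values, and stops at the t statistic and degrees of freedom, since converting them to a one-tailed p value requires a t distribution that the Python standard library does not provide.

```python
import math
from statistics import fmean, variance
from typing import Sequence, Tuple


def pooled_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Two-sample t statistic with pooled variance, and its degrees of freedom.

    A negative value means the mean of group_a is below the mean of group_b.
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    std_err = math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return (fmean(group_a) - fmean(group_b)) / std_err, df


# Hypothetical regional FDG SUVRs (not study data).
patient_suvr = [1.0, 1.1, 0.9, 1.0]
control_suvr = [1.3, 1.4, 1.2, 1.3]

t_stat, dof = pooled_t(patient_suvr, control_suvr)
```

For a one-tailed test of "patients lower than controls", the statistic would be compared with the lower-tail critical value of the t distribution at the chosen alpha level and `dof` degrees of freedom.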

3.2 SVM Analysis

Table 2 lists the mean accuracy, sensitivity, and specificity of each region in the different disease group classifications, calculated with SVM on FDG SUVR (CbG) with PiB SUVR and on FDG SUVR (Pon) with PiB SUVR over 50 repetitions of tenfold cross-validation. Table 2a lists the mean accuracies. For FDG SUVR (CbG) and PiB SUVR, the mean accuracy of the AD vs. NC classification with the whole brain cortex, orbitofrontal cortex, parietal cortex, precuneus, temporal cortex, and posterior cingulum regions was 72.18% ± 3.02%, 73.08% ± 3.47%, 68.26% ± 2.14%, 72.38% ± 2.58%, 82.71% ± 1.37%, and 77.10% ± 2.45%, respectively. The mean accuracy of the MCI vs. NC classification with the same regions was 61.28% ± 3.01%, 53.00% ± 2.65%, 59.89% ± 2.36%, 58.02% ± 2.96%, 70.05% ± 2.43%, and 62.23% ± 2.23%, respectively. With FDG SUVR (Pon) and PiB SUVR, the mean accuracy of the AD vs. NC classification for each of the target regions was 80.88% ± 2.93%, 73.55% ± 3.05%, 78.33% ± 2.73%, 80.08% ± 2.21%, 86.06% ± 0.89%, and 77.68% ± 1.44%, respectively, and that of the MCI vs. NC classification was 70.05% ± 2.10%, 66.45% ± 3.57%, 64.78% ± 1.62%, 63.01% ± 1.68%, 80.21% ± 0.88%, and 64.97% ± 1.34%, respectively. The mean sensitivity and specificity of each region in the different disease group classifications are listed in Table 2b and c.

Table 2 Support vector machine analysis

4 Discussion

Currently, the application of PET in AD diagnosis often relies on amyloid protein imaging (such as 11C-PiB). The fastest and most convenient diagnostic method is visual inspection, which involves evaluating disease severity through direct observation of the amount of amyloid protein deposition in the image [38, 46,47,48,49]. However, different physicians may interpret the same image differently, so this method cannot provide consistent results. The experience of the physician can also have a significant impact on the reliability of the diagnosis. Another diagnostic method involves calculating a cutoff value for a quantitative image figure-of-merit (FOM) and splitting the images into two groups: amyloid-positive and amyloid-negative. SUVR is a semi-quantitative FOM that is currently widely used. Several studies have used different methods to generate the cutoff value and split the 11C-PiB images into the two groups mentioned above [50,51,52]. However, because the quantitative FOM is calculated differently in each method, the resulting cutoff values may differ. It is currently difficult to establish a gold-standard cutoff value for image classification and diagnosis. Additionally, even when classification is performed using a cutoff value, special conditions may still affect the diagnostic accuracy, such as a normal subject with extra amyloid protein deposition being classified as amyloid-positive. Therefore, if a second PET biomarker image is acquired to provide additional information, the diagnostic accuracy may increase. The present study used two PET biomarkers commonly used in AD diagnosis, in combination with the machine learning based classification algorithm SVM, to classify the subjects, and evaluated the feasibility of using SVM in PET-based AD diagnosis by calculating the mean accuracy, sensitivity, and specificity.

We found that as the disease progresses, the FDG SUVRs gradually decrease (Fig. 3), whereas the PiB SUVRs show the opposite trend. This is consistent with previous research, which states that as disease severity increases, brain glucose metabolic function decreases, causing increasingly pronounced hypometabolism in the image and, in turn, gradually decreasing FDG SUVRs. Additionally, because PiB reflects beta-amyloid protein deposition in the brain, and the amount of deposition gradually increases during the progression from the normal stage to the MCI stage and then to the AD stage, the PiB SUVRs gradually increase. It is worth noting that when the pons was used as the reference region to calculate the FDG SUVRs, all the values were higher than those obtained with CbG as the reference region. This suggests that FDG uptake differs less among subjects in the pons region than in the CbG region. Because some of the subjects had a certain degree of uptake in the CbG, the FDG SUVRs (CbG) were decreased. Therefore, using the pons as the reference region to calculate FDG SUVR is a better choice. This region satisfies the consistency and low uptake requirements necessary for a reference region. We also found greater statistical differences when the pons was used as the reference region than when CbG was used. In the two-sample t test of the AD and NC groups, although both reference regions showed statistically significant differences, FDG SUVR (Pon) showed more pronounced ones. In the t test of the MCI and NC groups, the superiority of FDG SUVR (Pon) was even more evident. Although FDG SUVR (Pon) showed decreased significance in some regions compared with the previous test (AD vs. NC), the criterion for statistical significance was still met. In contrast, for FDG SUVR (CbG), only the temporal cortex region showed a significant difference. This again indicates that the pons is a better choice of reference region than CbG and can increase the diagnostic accuracy, which is further supported by the mean accuracy of the SVM classification discussed below. The value of PiB as a PET biomarker for AD diagnosis was evident in the PiB SUVR results. Both the AD and MCI groups differed significantly from the NC group in the statistical tests, and these results, in combination with the aforementioned FDG SUVR results, support the feasibility of applying SVM to AD diagnosis.

The mean accuracies in Table 2a show that the FDG SUVRs calculated using the pons as the reference region gave a higher SVM accuracy when combined with PiB SUVR. This trend was observed in both the AD vs. NC and the MCI vs. NC classifications, with different levels of increase in different regions. In the AD vs. NC classification, the increase in the parietal cortex was the most prominent, from 68.26 to 78.33%, an increase of 10.07 percentage points, with the least increase in the posterior cingulum, at only 0.58 percentage points. In the MCI vs. NC classification, the increase in the orbitofrontal cortex was the largest, increasing by 13 percentage points from 53 to 66%, while the posterior cingulum again had the least increase, from 62.23 to 64.97%, an increase of 2.74 percentage points. In addition to the mean accuracy, Table 2b and c show that when the pons was used as the reference region, the mean sensitivity and mean specificity also increased across the different regions and group classifications. In the AD and MCI groups, the significant differences for FDG SUVR (Pon) were more prominent than those for FDG SUVR (CbG) (Fig. 3), indicating larger SUVR differences between normal controls and patients. This resulted in more effective and accurate separation of the data into the two classes, with corresponding increases in the mean accuracy, sensitivity, and specificity. Additionally, in the AD vs. NC classification, the mean accuracy for all regions was above 70%, with some reaching 80%, and the mean accuracy for the temporal cortex reached 86.06%, with mean sensitivity and specificity of 82.40% and 90.36%, respectively. For the MCI vs. NC classification, the mean accuracy was generally lower than in the AD vs. NC classification, ranging from 50 to 70% using CbG and from 60 to 80% using the pons. It is noteworthy that the temporal cortex region always had the highest accuracy. This is because as a normal person progresses to the MCI stage, the degeneration of brain neurological function mostly begins in the temporal cortex, causing the most significant SUVR difference of all the regions during the MCI stage and therefore the highest accuracy. Overall, FDG SUVR (Pon) is a better quantitative FOM than FDG SUVR (CbG) and provides more favorable results when SVM is applied to AD diagnosis. The application of SVM in the clinical diagnosis of AD is feasible to a certain degree, particularly for distinguishing the AD and NC groups. For the classification between the MCI and NC groups, FDG SUVR (Pon) in combination with PiB SUVR in the temporal region also provided fair results.

5 Conclusions

This study used the two PET biomarkers 18F-FDG and 11C-PiB in combination with the SVM classification algorithm to conduct clinical diagnosis of AD with a limited number of subjects. Favorable results were obtained in the quantitative analysis of FDG when using the pons as the reference region, and a higher diagnostic accuracy was also obtained. Additionally, the diagnosis of AD with SVM gave fairly good results in this study, with particularly high accuracy, sensitivity, and specificity in the temporal cortex region. Therefore, using the dual PET biomarkers in combination with SVM is meaningful and feasible for the clinical diagnosis of AD (particularly in the temporal cortex region). Future studies using other machine learning or deep learning classification algorithms could focus on the temporal cortex region to reduce computation during training and increase the efficiency of the algorithm. Besides SUVR, other features or quantitative FOMs can also be included for more stringent classification of disease severity and staging, and even for classification beyond the binary level. Owing to the limited number of subjects in this study, the SVM classification algorithm was chosen. If data from more subjects are obtained in the future, other types of classification algorithms can also be used to perform big data analysis to further increase the diagnostic accuracy. Furthermore, there are other PET radiotracers that reflect amyloid protein deposition in addition to 11C-PiB, such as 18F-AV-45, which is 18F-labeled and consequently allows relatively easier drug preparation. We expect that applying this radiotracer within the framework of this study would also achieve good results.