Introduction

Biomarkers for amyloid beta (Aβ) accumulation in the brain play a central role in the National Institute on Aging and Alzheimer’s Association (NIA-AA) research definition of preclinical Alzheimer’s disease (AD) [1]. Preclinical AD, also termed asymptomatic AD, refers to the presence of AD-related pathophysiological processes, such as amyloid aggregation, in individuals who do not have cognitive deficits [1, 2]. Cognitively intact individuals who are amyloid positive are at increased risk for cognitive decline [3, 4]. Recent methods for defining amyloid positivity include positron emission tomography (PET) amyloid imaging and assay of Aβ42 in cerebrospinal fluid. It is still largely unknown how the choice of a particular amyloid biomarker may affect the discrimination between amyloid-positive and amyloid-negative healthy subjects. This is important since an amyloid-positive status may define potential candidates for experimental (e.g. anti-amyloid) therapies in clinical drug development.

18F-Labelled tracers currently approved by the US Food and Drug Administration (FDA) and European Medicines Agency (EMA) for estimation of amyloid plaques in patients evaluated for cognitive decline are 18F-flutemetamol [5, 6], 18F-florbetaben [7] and 18F-florbetapir [8]. The cortical retention of 18F-flutemetamol has been compared to 11C-Pittsburgh compound B (PIB) [9] in clinical populations such as amnestic mild cognitive impairment (aMCI) patients together with clinically probable AD patients [10], or aMCI and clinically probable AD together with healthy controls [11], but not in cohorts consisting exclusively of cognitively intact older adults. This is crucial since the discriminative value of a tracer may also depend on the population under study. The concordance between two tracers may be better in a mixed sample of patients and controls than in a group consisting exclusively of cognitively normal controls. In cognitively intact older adults ligand retention values may lie closer to threshold than in patients with probable AD and it has been hypothesized that 11C-PIB could potentially outperform 18F-labelled tracers under such conditions [12]. Other 18F-labelled amyloid tracers (18F-florbetaben [7], 18F-florbetapir [8] and 18F-AZD4694 [13]) have also been compared with 11C-PIB within subjects, again mostly in clinical patient populations combined with cognitively intact older adults [14, 15]. For these tracers, no direct comparisons have been performed in cohorts consisting exclusively of cognitively intact older adults. Cognitively intact older adults who are amyloid positive constitute the target population for a number of current clinical drug development programmes. The success of targeted molecular therapies may critically depend on the presence of the drug target. 
Accurate ascertainment of amyloid positivity prior to inclusion may constitute one of the factors that determine the success of trials in preclinical AD (importance of high specificity) as well as the cost of screening for eligible subjects (importance of high sensitivity). Moreover, in cognitively intact older adults, in whom amyloid levels range from slightly to markedly elevated, the precise analysis method is essential. This may be less of an issue in patients with clinically probable AD who are well within the abnormal range. None of the studies comparing 18F-flutemetamol and 11C-PIB has evaluated the effect of reference region, magnetic resonance imaging (MRI) versus PET-based spatial normalization or partial volume correction (PVC) on the concordance between those tracers in cognitively normal older adults.

Therefore, the aim of this study was to directly compare 18F-flutemetamol to 11C-PIB within the same subjects in a prospective community-recruited cohort of cognitively intact older adults. We evaluated concordance between 18F-flutemetamol versus 11C-PIB binary classifications based on semiquantitative assessment and visual reads as well as the correlations between the semiquantitative measures. We also estimated the impact of different image analysis methods on amyloid quantification.

Materials and methods

Participants

Thirty-two cognitively intact older controls (mean age 72 years, SD 5) participated in this study (Table 1). They were recruited through advertisement in local newspapers and through websites for seniors as part of a larger longitudinal study, asking for healthy volunteers between 65 and 80 years of age for participation in a scientific study at the University Hospitals Leuven, Belgium, involving brain imaging. At screening, subjects underwent a detailed interview about medical history, a Mini-Mental State Examination (MMSE), a Clinical Dementia Rating (CDR), general physical and neurological examination, blood sampling and a conventional neuropsychological assessment. Inclusion criteria were age between 65 and 80 years, MMSE ≥ 27, CDR = 0 and normal test scores on neuropsychological assessment. Inclusion was stratified for two genetic factors: brain-derived neurotrophic factor (BDNF) (met allele present or absent) and apolipoprotein E (apoE) (ε4 allele present or absent), as this cohort was part of a larger 18F-flutemetamol study in healthy controls of the interactions between these polymorphisms [16, 17]. Exclusion criteria were neurological or psychiatric history and brain lesions on structural MRI. The protocol (EudraCT: 2009-014475-45) was approved by the Ethics Committee of the University Hospitals Leuven. Written informed consent was obtained from all subjects in accordance with the Declaration of Helsinki.

Table 1 Demographic and neuropsychological characteristics

Amyloid PET

PET scans were acquired on a 16-slice Siemens Biograph PET/CT scanner (Siemens, Erlangen, Germany). Tracers were injected as a bolus in an antecubital vein (18F-flutemetamol mean activity 150 MBq, SD 5 MBq, range 134–162 MBq; 11C-PIB mean activity 363 MBq, SD 33 MBq, range 255–420 MBq). The 18F-flutemetamol scan acquisition started 90 min after tracer injection and lasted for 30 min [6, 10, 16, 17]. The 11C-PIB scan was obtained within 30 days from the 18F-flutemetamol scan (mean 2 days, median 0 days, range −22 to 21 days). Three subjects could not come to the clinic within the 30-day period due to personal or health reasons and they received a 11C-PIB scan within 32, 39 and 118 days from the 18F-flutemetamol scan. Dynamic 11C-PIB scan acquisition extended from 0 to 70 min post tracer injection. Prior to PET acquisition, a low-dose CT scan of the head was performed for attenuation correction. Random and scatter corrections were applied. The 18F-flutemetamol measurement was rebinned into six frames of 5 min, and the 11C-PIB measurement between 40 and 70 min post injection was also rebinned into six frames of 5 min each. Images were reconstructed using ordered subsets expectation maximization (4 iterations × 16 subsets). A structural T1-weighted MRI was acquired on a 3 T Philips Achieva scanner (3-D turbo field echo sequence, 32-channel Philips sensitivity encoding head coil: coronal inversion recovery prepared 3-D gradient-echo images, inversion time 900 ms, echo time/repetition time 4.6/9.6 ms, flip angle 8°, voxel size 0.98 × 0.98 × 1.2 mm3 [17]).

The 18F-flutemetamol and the 11C-PIB scans were preprocessed using Statistical Parametric Mapping 8 (SPM8, http://www.fil.ion.ucl.ac.uk/spm). The individual images of the six frames were realigned and summed for both data sets separately. The individual’s 18F-flutemetamol and 11C-PIB PET summed images were coregistered to the subject’s T1-weighted structural MRI. 18F-Flutemetamol and 11C-PIB PET summed images were spatially normalized to the Montreal Neurological Institute (MNI) space using MRI. This was done in two steps. First, the MR image was spatially normalized to the SPM8 T1 template in MNI space using a unified segmentation approach. This generated the nonlinear transformation parameters as well as grey matter (GM), white matter (WM) and cerebrospinal fluid images. Next, these transformation parameters were applied to the individual’s coregistered 18F-flutemetamol and 11C-PIB PET summed images to spatially normalize them to MNI space.

Semiquantitative analysis of amyloid PET

To measure specific tracer retention, standardized uptake value ratio (SUVR) images were calculated from the spatially normalized summed 18F-flutemetamol images and from the spatially normalized summed 11C-PIB images (voxel size 2 × 2 × 2 mm3) with the cerebellar GM used as reference region. The spatially standardized volumes of interest (VOIs) were identical for 18F-flutemetamol and for 11C-PIB image analysis. The cerebellar GM was defined based on the automated anatomical labelling (AAL) atlas (areas 91–108) and masked inclusively with subject-specific GM maps, with the threshold for masking set at > 0.3 [16, 17]. This reference region was used both for 18F-flutemetamol and for 11C-PIB images. As a secondary analysis, we also used the pons as a reference region: this region was manually drawn on the SPM8 T1 template (13 axial slices of 2 mm) and then for each individual it was corrected to match the subject-specific anatomical boundaries of the pons based on the subject’s spatially normalized MR image.
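As a minimal sketch of the SUVR computation described above, the following Python snippet divides every voxel by the mean uptake in a reference region, where the reference region is the intersection of an atlas-defined mask and a GM probability map thresholded at > 0.3. Flat lists stand in for image volumes; the function name, argument names and toy values are illustrative assumptions and do not reproduce the SPM8-based pipeline used in the study.

```python
from statistics import fmean

def suvr_image(pet, atlas_labels, gm_prob, ref_labels, gm_threshold=0.3):
    """Return voxel-wise SUVR values relative to a GM-masked reference region.

    pet, atlas_labels and gm_prob are equally long flat sequences,
    one entry per voxel of the spatially normalized image.
    """
    ref_values = [
        value
        for value, label, gm in zip(pet, atlas_labels, gm_prob)
        if label in ref_labels and gm > gm_threshold
    ]
    if not ref_values:
        raise ValueError("reference region is empty after GM masking")
    ref_mean = fmean(ref_values)
    return [value / ref_mean for value in pet]

# AAL areas 91-108 (cerebellum), as quoted in the text
CEREBELLUM = set(range(91, 109))

# Toy data (hypothetical): four voxels
pet = [2.0, 4.0, 6.0, 3.0]
labels = [91, 95, 1, 100]
gm = [0.9, 0.8, 0.7, 0.1]  # the last cerebellar voxel fails the GM threshold

suvr = suvr_image(pet, labels, gm, CEREBELLUM)
```

In this toy example only the first two voxels contribute to the reference mean, so the third voxel receives an SUVR of 2.0. The same function could be called with a pons mask for the secondary analysis.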

Our primary PET outcome measure was the mean SUVR in the composite cortical VOI (SUVRcomp) with cerebellar GM as reference region. The composite VOI consisted of five bilateral cortical areas [16, 17]. The spatially standardized VOIs were based on the AAL template. We also calculated mean SUVR in each of these regions separately and additionally in medial temporal cortex (AAL 37–42), occipital cortex (AAL 43–54) and striatum (AAL 71–74). The AAL VOIs were masked inclusively with subject-specific GM maps, with the threshold for masking set at > 0.3 [16, 17]. Mean SUVR was also estimated in subcortical WM (SWM), which was defined based on subject-specific WM maps thresholded at > 0.5.
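The regional measures above can be sketched as a masked average over an SUVR image. The snippet assumes that the mean is taken over all voxels pooled across the regions of a VOI, which the text does not state explicitly. The label sets for medial temporal cortex, occipital cortex and striatum are those quoted above; the composite label set is a hypothetical placeholder because the five composite areas are not enumerated here.

```python
from statistics import fmean

def voi_mean(suvr, atlas_labels, gm_prob, voi_labels, gm_threshold=0.3):
    """Mean SUVR over voxels that carry a VOI label and exceed the GM threshold."""
    values = [
        s
        for s, label, gm in zip(suvr, atlas_labels, gm_prob)
        if label in voi_labels and gm > gm_threshold
    ]
    if not values:
        raise ValueError("VOI is empty after GM masking")
    return fmean(values)

# Label ranges quoted in the text
MEDIAL_TEMPORAL = set(range(37, 43))
OCCIPITAL = set(range(43, 55))
STRIATUM = set(range(71, 75))
# Hypothetical placeholder, NOT the study's composite definition
COMPOSITE = {3, 4, 61, 62}

# Toy data (hypothetical)
suvr_values = [1.2, 1.4, 1.0, 2.0]
voxel_labels = [3, 61, 4, 37]
gm_values = [0.5, 0.6, 0.2, 0.9]  # third voxel is excluded by the GM mask

suvr_comp = voi_mean(suvr_values, voxel_labels, gm_values, COMPOSITE)
suvr_mtl = voi_mean(suvr_values, voxel_labels, gm_values, MEDIAL_TEMPORAL)
```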

The cut-offs for SUVRcomp for binary classification were defined based on independent data sets re-analysed using the MRI-based PET analysis method described above. The cut-offs were defined based on the statistical distance between the AD group and the healthy controls (HC) as described in Vandenberghe et al. [10], that is: \( \mathrm{factor}=\frac{\mathrm{Mean\ SUVR_{comp}\ AD}-\mathrm{Mean\ SUVR_{comp}\ HC}}{\mathrm{SD\ SUVR_{comp}\ AD}+\mathrm{SD\ SUVR_{comp}\ HC}} \) and \( \mathrm{SUVR_{comp}\ cut\text{-}off}=\mathrm{Mean\ SUVR_{comp}\ AD}-\mathrm{factor}\times \mathrm{SD\ SUVR_{comp}\ AD} \). The 18F-flutemetamol cut-off was estimated based on the Vandenberghe et al. data set [10] and was equal to 1.38. The 11C-PIB cut-off was calculated based on 37 clinically probable AD subjects and 23 age-matched healthy controls (data sets from Nelissen et al. [18], Vandenberghe et al. [10] and Ahmad et al. [19]) and was equal to 1.22. Note that the 18F-flutemetamol cut-off of 1.38 used here is lower than the cut-off defined by Vandenberghe et al. [10] or Thurfjell et al. [20] for a purely PET-based approach, probably due to exclusion of more WM signal by the current MRI-based method in the amyloid-negative cases. We also verified our binary case classification using the purely PET-based method with narrow VOIs and SUVR cut-offs with reference to cerebellar GM as used by Thurfjell et al. [20]. For this method, the cut-off with the neuropathological modified Consortium to Establish a Registry for Alzheimer’s Disease score as standard of truth was 1.57 [20].
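The cut-off definition above can be illustrated with a short Python sketch. The function name and the group statistics in the example are hypothetical and are not the values of the data sets used to derive the 1.38 and 1.22 cut-offs.

```python
def suvr_cutoff(mean_ad, sd_ad, mean_hc, sd_hc):
    """Cut-off based on the statistical distance between the AD and HC groups."""
    factor = (mean_ad - mean_hc) / (sd_ad + sd_hc)
    return mean_ad - factor * sd_ad

# Hypothetical group statistics, for illustration only
cutoff = suvr_cutoff(mean_ad=2.0, sd_ad=0.3, mean_hc=1.2, sd_hc=0.1)

# Same threshold expressed from the healthy control side
factor = (2.0 - 1.2) / (0.3 + 0.1)
cutoff_from_hc = 1.2 + factor * 0.1
```

With these toy numbers the factor is 2 and the cut-off is 1.4. By construction the cut-off lies the same number of standard deviations below the AD mean as above the HC mean, which is why the two expressions in the example agree.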

As a further secondary analysis, we performed a semiquantitative analysis based on partial volume corrected data. PVC was based on the MRI using the modified Müller-Gärtner method [16, 17, 21].

Visual reads

18F-Flutemetamol and 11C-PIB scans were visually evaluated by three independent readers blinded to all subject information: two certified nuclear medicine physicians (reader 1 K.V.L., reader 2 K.G.) and a certified psychiatrist (reader 3 M.V.), experienced in reading amyloid scans. All readers had successfully completed the GE Healthcare electronic reader training programme for 18F-flutemetamol images. The visual read was done on summed orthogonal PET images in native space, scaled to the image maximum intensity value and displayed with a modifiable rainbow (National Institutes of Health) colour scale. Each reader received an individually randomized list of 18F-flutemetamol and 11C-PIB images, which were evaluated in separate sessions. Readers were asked to classify scans as positive or negative and to rate their overall confidence in classifying the image on a scale from 1 to 5 (5 being the highest confidence). The final assignment was based on a majority verdict.
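The majority verdict can be sketched as follows. The function name and the example reads are hypothetical; with three readers and a binary outcome a tie cannot occur.

```python
def majority_verdict(reads):
    """Return True (positive) if more than half of the readers rated the scan positive.

    reads: sequence of booleans, one per reader; an odd number of readers is required.
    """
    if len(reads) % 2 == 0:
        raise ValueError("an odd number of readers is required to avoid ties")
    return sum(bool(r) for r in reads) > len(reads) / 2

# Hypothetical reads of one scan by readers 1, 2 and 3
positive_scan = majority_verdict([True, True, False])
negative_scan = majority_verdict([False, True, False])
```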

Statistical analysis

The primary analyses were intended to evaluate in cognitively intact older adults:

  1. The concordance between binary classification based on 18F-flutemetamol versus 11C-PIB according to semiquantitative SUVRcomp assessment.

  2. The concordance of binary visual reads of 18F-flutemetamol versus 11C-PIB.

  3. The correlation between 18F-flutemetamol and 11C-PIB SUVRcomp. Normality of data distribution was tested using the Shapiro-Wilk test. Correlations were evaluated using (a) Spearman rank correlation coefficients ρ if distributions deviated from normality and (b) slopes of linear regression m. The agreement between 18F-flutemetamol and 11C-PIB SUVRcomp was tested by a Bland-Altman analysis [22].
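The correlation and agreement measures named in the primary analyses can be sketched with the Python standard library alone. The analyses in the study were run in Statistica and Matlab, so the code below is only an illustration. It assumes that 18F-flutemetamol is regressed on 11C-PIB and that the Bland-Altman limits of agreement are the bias ± 1.96 SD of the differences; the SUVRcomp values are hypothetical. The Shapiro-Wilk test is not included.

```python
from statistics import fmean, stdev

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def regression_slope(x, y):
    """Least-squares slope m of y regressed on x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)

def bland_altman(a, b):
    """Bias (mean of a - b) and 95 % limits of agreement."""
    diffs = [p - q for p, q in zip(a, b)]
    bias, sd = fmean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical SUVRcomp values for five subjects
pib = [1.05, 1.10, 1.15, 1.30, 1.80]
fmm = [1.20, 1.22, 1.30, 1.45, 2.00]

rho = spearman_rho(pib, fmm)
slope = regression_slope(pib, fmm)
bias, lower, upper = bland_altman(fmm, pib)
```

A positive bias in this sketch corresponds to systematically higher 18F-flutemetamol than 11C-PIB values.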

The secondary analyses evaluated in cognitively intact older adults:

  1. The agreement between readers of the visual classification. This was analysed by means of Fleiss’ kappa (κ).

  2. Readers’ confidence in visual binary classification of the PET scans. This was analysed by three-factor repeated measures analysis of variance (ANOVA), with reader (three levels: reader 1 vs 2 vs 3) and tracer (two levels: 18F-flutemetamol vs 11C-PIB) as within-subject factors and concordance of binary visual reads (two levels: concordant vs discordant) as between-subject factor.

  3. The correlation between 18F-flutemetamol and 11C-PIB SUVR in a set of nine separate regions.

  4. The correlation between 18F-flutemetamol and 11C-PIB SUVR values using the pons as reference region.

  5. The correlation between partial volume corrected 18F-flutemetamol and partial volume corrected 11C-PIB SUVR values.
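Fleiss’ κ for inter-reader agreement, the first of the secondary analyses, can be sketched as follows. This is a generic textbook implementation with hypothetical ratings, not the code used in the study. It assumes that every scan is rated by the same number of readers and that more than one category occurs in the data (otherwise chance agreement equals 1 and κ is undefined).

```python
def fleiss_kappa(ratings, categories=(0, 1)):
    """Fleiss' kappa for a table with one row per scan and one rating per reader."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # Number of readers choosing each category, per scan
    counts = [[row.count(c) for c in categories] for row in ratings]
    # Observed agreement per scan, then averaged over scans
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall category proportions
    p_j = [
        sum(row[k] for row in counts) / (n_subjects * n_raters)
        for k in range(len(categories))
    ]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical reads: 1 = positive, 0 = negative; three readers, four scans
example_reads = [[1, 1, 1], [0, 0, 0], [0, 0, 0], [0, 0, 1]]
kappa = fleiss_kappa(example_reads)
```

In the toy table the readers disagree on one of four scans, which gives κ = 0.625.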

Statistical analyses were performed in Statistica 11 (http://www.statsoft.com/) and Matlab R2013b (http://www.mathworks.com).

Results

Regional and composite SUVR values of 18F-flutemetamol (W = 0.68–0.86, p < 0.002) and 11C-PIB (W = 0.69–0.87, p < 0.006) were not normally distributed. Therefore, in the subsequent analyses we used Spearman’s ρ coefficient.

Binary classification based on semiquantitative cut-offs was concordant between 18F-flutemetamol versus 11C-PIB in 94 % of the cases (Fig. 1a). Based on 18F-flutemetamol SUVRcomp, 5 of 32 subjects (16 %) were assigned to the amyloid-positive category (Figs. 1a and 2). Based on 11C-PIB SUVRcomp, 7 of 32 subjects (22 %) were assigned to the amyloid-positive category (Figs. 1a and 2). Two cases were discordant between 18F-flutemetamol and 11C-PIB (cases 8 and 31, Figs. 1a and 2). These subjects were assigned to the amyloid-negative category based on 18F-flutemetamol and to the amyloid-positive category based on 11C-PIB.

Fig. 1 Concordance between binary semiquantitative (a) and visual (b) classifications of 18F-flutemetamol and 11C-PIB scans. a Dashed lines = SUVR cut-offs. b Red = positive scan; green = negative scan. Values in red and green cells = confidence levels of the readers

Fig. 2 Representative summed PET images of the discordant cases between 18F-flutemetamol and 11C-PIB scans based on semiquantitative and visual classification. For comparison, two positive cases that were concordantly classified by the semiquantitative and visual approaches are also displayed. Brain sections show axial slices at MNI z coordinates −4, 10, 24 and 38. To the right of the brain sections, SUVRcomp values (top) and results of visual reads (VIS R, bottom; + positive scan, − negative scan) are shown. Each image is scaled to its own maximum intensity

Concordance of binary visual reads between tracers was 84 % (Fig. 1b). Based on 18F-flutemetamol scans 3 of 32 subjects (9 %) were assigned to the amyloid-positive category (Figs. 1b and 2). According to 11C-PIB scans, 6 of 32 subjects (19 %) were assigned to the amyloid-positive category (Figs. 1b and 2). Of five discordant cases, one subject was read as positive for 18F-flutemetamol but negative for 11C-PIB (case 24, Figs. 1b, 2 and 3), and four were read as positive for 11C-PIB but negative for 18F-flutemetamol (cases 17, 19, 29 and 31, Figs. 1b and 2). Fleiss’ κ for inter-reader agreement was 0.86 for 18F-flutemetamol and 0.93 for 11C-PIB.
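The concordance percentages reported for the semiquantitative and visual classifications can be reproduced from the counts given in the text. The function name is hypothetical; the cell counts follow from the reported numbers of positive and discordant cases among the 32 subjects.

```python
def percent_concordance(both_pos, both_neg, only_fmm_pos, only_pib_pos):
    """Percentage of subjects classified identically by the two tracers."""
    total = both_pos + both_neg + only_fmm_pos + only_pib_pos
    return 100.0 * (both_pos + both_neg) / total

# Semiquantitative: 5 positive on 18F-flutemetamol, 7 on 11C-PIB,
# 2 discordant cases (both positive on 11C-PIB only)
semiquantitative = percent_concordance(
    both_pos=5, both_neg=25, only_fmm_pos=0, only_pib_pos=2
)

# Visual reads: 3 positive on 18F-flutemetamol, 6 on 11C-PIB,
# 1 positive on 18F-flutemetamol only, 4 positive on 11C-PIB only
visual = percent_concordance(
    both_pos=2, both_neg=25, only_fmm_pos=1, only_pib_pos=4
)
```

These evaluate to 30 of 32 (93.75 %) and 27 of 32 (84.375 %), which round to the reported 94 % and 84 %.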

Fig. 3 Detailed view of case 24. Left upper corners show MNI coordinates. Right upper corners show brain orientation. Each image is scaled to its own maximum intensity

When we analysed readers’ confidence in visual classification of 18F-flutemetamol and 11C-PIB scans, we found a significant main effect of reader (F(2,60) = 12.3, p = 0.00003): readers 1 (r1) and 3 (r3) were more confident than reader 2 (r2) (r1 > r2 p = 0.0001, r3 > r2 p = 0.0006) (Fig. 4a). We also found a significant main effect of concordance of visual classification: readers were more confident when classifying concordant cases compared with discordant cases (F(1,30) = 5.1, p = 0.03) (Fig. 4b). No other effects were found.

Fig. 4 Analysis of readers’ confidence in visual binary classification of 18F-flutemetamol and 11C-PIB scans. Main effect of reader (a). Main effect of concordantly versus discordantly classified cases (b)

18F-Flutemetamol and 11C-PIB SUVR values were highly correlated in the composite cortical VOI, in all neocortical VOIs and in SWM (Table 2; Fig. 5). The correlations in striatum and medial temporal cortex were weaker (Table 2; Fig. 5). The slopes of linear regression were close to 1 in all neocortical regions and SWM (Table 2; Fig. 5). Slopes in striatum and medial temporal cortex were lower (Table 2; Fig. 5). The Bland-Altman analysis [22] showed a good agreement between 18F-flutemetamol and 11C-PIB SUVRcomp, with a systematic bias towards higher 18F-flutemetamol SUVR values (Fig. 6).

Table 2 Region-wise correlations between 18F-flutemetamol and 11C-PIB SUVRs for different analysis methods

Fig. 5 Regional correlations between 18F-flutemetamol and 11C-PIB SUVRs

Fig. 6 Agreement between 18F-flutemetamol and 11C-PIB SUVRs based on Bland-Altman analysis

When we applied the Thurfjell et al. [20] PET-based processing method and autopsy-derived SUVR cut-off with reference to cerebellar GM, 4 of 32 18F-flutemetamol scans (13 %) were classified as amyloid positive. In three cases, the MRI-informed and the purely PET-based processing methods yielded discordant classification: two subjects were classified as amyloid negative based on the PET-based method and as amyloid positive based on the MRI-informed method (cases 19 and 29, Figs. 1b and 2, SUVRcomp based on purely PET-based processing method 1.45 and 1.47, respectively), one subject showed the inverse pattern (case 31, Figs. 1b and 2, SUVRcomp based on purely PET-based processing method 1.61). The correlation between 18F-flutemetamol and 11C-PIB scans analysed by the purely PET-based processing method was high in the composite cortical VOI and in all neocortical VOIs (Table 2).

When the pons was used as reference region, the correlation between 18F-flutemetamol and 11C-PIB SUVRs was weaker in the composite cortical VOI and in neocortical VOIs. This difference was statistically significant in the lateral parietal VOI (p = 0.007) (Table 2). However, the correlation was stronger in striatum, medial temporal cortex and SWM when the pons was used as reference region than when cerebellar GM was used. This difference was statistically significant in SWM (p = 0.004) (Table 2). With the pons as reference region, the slopes of linear regression were close to 0.5 and were significantly less steep than with cerebellar GM as reference region in all VOIs (p < 0.0001) except for SWM, where the slope was 0.9 (Table 2).

PVC did not significantly alter ρ and slopes for the correlation between 18F-flutemetamol and 11C-PIB SUVR values in the composite cortical VOI or any of the neocortical VOIs (Table 2). In striatum, medial temporal cortex and SWM, PVC improved ρ or slope significantly (Table 2).

Discussion

To our knowledge this is the first study comparing 18F-flutemetamol to 11C-PIB in a cohort consisting exclusively of cognitively intact older adults, without patients with cognitive deficits. Our results provide evidence for a close correspondence between the two amyloid tracers even at this preclinical stage.

We detected a few more amyloid-positive cases with the 11C-PIB scan (7 of 32) than with the 18F-flutemetamol scan (5 of 32). This differs from previous comparisons that included only AD and MCI [10] or AD and MCI together with healthy controls [11], where concordance between 18F-flutemetamol and 11C-PIB was 100 %. In the Hatashita et al. study [11], the cut-offs for semiquantitative assessment were not defined independently of the test sample and this may also have contributed to this complete concordance. In our study the cut-offs were based on independent data sets.

The values in the discordant cases in our study were around the cut-off, except for case 24 (see below). Near-threshold values may render the binary division between amyloid-positive and amyloid-negative cases in cognitively normal individuals more difficult. Amyloid accumulation is a progressive process and the amyloid-positive cases are distributed over a range of continuous values rather than bimodally. Hence, among cognitively normal controls a binary classification into positive and negative subjects is somewhat artificial. Individuals with intermediate amyloid levels may either remain at this level or may be heading towards further pathological amyloid aggregation [23]. Subjects around the cut-off may be accumulating amyloid at a higher rate than subjects who are further removed from the cut-off [23] and in this sense may be of special interest for potentially disease-modifying drug trials. To investigate the prevalence and the meaning of these cases with sufficient power, a joint longitudinal approach including different centres would be necessary. In such an approach a standardized quantification of amyloid deposition, such as the Centiloid Scale [24], would facilitate the comparison. Values close to threshold probably explain the higher rate of discordance in visual reads in our study compared to what has been found in AD and MCI [10, 11]. There was, however, one exception: in case 24, 18F-flutemetamol SUVR was far removed from the cut-off and nevertheless the 18F-flutemetamol scan was read as positive by all three readers with relatively high confidence. When evaluating this scan in retrospect, the outcome of the read may have been determined by the fact that tracer retention was similarly low in WM and in neocortex. As a consequence, the pattern of gyral indentation was lost and the cortical surface appeared relatively even. This was not true for the 11C-PIB scan.
The similarity in 18F-flutemetamol retention between neocortex and WM and the even appearance of the surface may have led to the positive read despite the low neocortical SUVR. This underscores the usefulness of semiquantitative measures when evaluating normal control 18F-flutemetamol scans. The overall confidence of readers in visual evaluation of scans was high; however, the confidence of all readers was lower when evaluating discordant cases compared with concordant cases. This indicates that a subset of scans in this population is particularly difficult to read.

As a further difference with previous comparative studies [10], the correlation between 18F-flutemetamol and 11C-PIB SUVR values in SWM (Table 2) was higher than previously observed (in [10] r = 0.36). The definition of the WM VOI may have been more accurate in the current study as it was based on the MRI. A WM VOI that is defined based on PET may be affected by spillover between GM and WM and this may differ between 18F-flutemetamol and 11C-PIB, yielding lower correlations in previous studies [10].

We also evaluated how differences in the analysis method affected the concordance and the correlation with 11C-PIB. First, PVC did not substantially alter correlations between 18F-flutemetamol and 11C-PIB in neocortical VOIs, but improved the correlation in striatum and medial temporal cortex. The latter area is known to be particularly susceptible to partial volume effects. Second, using the pons as a reference region resulted in substantially lower correlations between 18F-flutemetamol and 11C-PIB in neocortical regions (Table 2). Only in striatum, medial temporal cortex and SWM did the pons as a reference region yield better correlations (Table 2). Finally, the correlations of SUVRs were the same when 18F-flutemetamol and 11C-PIB scans were analysed with a purely PET-based method [20] compared to our MRI-based method [16] (Table 2). It is worth noting, however, that the slopes for the correlations between 18F-flutemetamol and 11C-PIB were substantially lower for the purely PET-based method than when MRI was used to independently define the regions to be used for analysis of the two PET modalities.

Practical implications

The FDA and EMA approvals of amyloid imaging are for visual reads and are restricted to patients with cognitive decline. For research use in cognitively intact individuals, our findings suggest that semiquantitative assessment would be preferable to visual reads. In cognitively intact older individuals, cerebellar GM would be preferred over the pons as reference region. PVC would be advantageous for evaluation of medial temporal cortex and subcortical regions. Concordance between 18F-flutemetamol and 11C-PIB was better when regions were defined on MRI than when they were PET-based.

Conclusion

Our study of amyloid markers in asymptomatic older adults provides evidence that semiquantitative measures of 18F-flutemetamol with cerebellar GM as a reference are closely similar to those that would be obtained if 11C-PIB were used, in particular if MRI is used to define the regions of interest. Concordance for visual reads tended to be lower in this population.