Introduction

Positron emission tomography (PET) with 11C-labelled Pittsburgh compound B ([11C]PIB) has been shown to enable the quantitation and visualization of β-amyloid (Aβ) accumulation in the brains of patients with Alzheimer’s disease (AD) [1–4]. However, data analysis in human PET studies with [11C]PIB has often been based on traditional manual region-of-interest (ROI) analysis (see, for example, references [1] and [4]), which is laborious and liable to operator-induced bias. With modern analysis techniques, including voxel-based quantitation and statistical parametric mapping (SPM), the regions of statistically significant differences in tracer uptake can be visualized as a demonstrative brain map. With additional automated ROI analysis of parametric images, tracer uptake can also be quantified regionally, making individual manual ROI analysis unnecessary.

It has recently been demonstrated that automated voxel-based image analysis methods are feasible for [11C]PIB imaging data [2, 5]. In addition, a recent study has demonstrated that the region-level reproducibility of automated ROI analysis with several modelling approaches of [11C]PIB quantitation is generally excellent [6]. Knowledge of the region-level reproducibility is important when evaluating whether conventional quantitation using manual ROI analysis can be replaced by automated ROI analysis conducted in a standardized brain space [2, 5, 6]. On the other hand, the test-retest reproducibility of the quantitation of [11C]PIB uptake at the voxel level, that is, at the level of individual volume elements (voxels) of images, is crucial when evaluating the feasibility of voxel-based statistical image analysis methods such as SPM, where statistical testing is carried out independently for each voxel. Automated voxel-based image analysis methods have been utilized in several studies on [11C]PIB accumulation (see, for example, references [2] and [5]), but the test-retest reproducibility of these analysis methods at the voxel level has not been studied previously. Automated ROI analysis has certain advantages over traditional manual ROI delineation. It avoids investigator-induced bias and also reduces workload, especially in studies with large numbers of subjects.

[11C]PIB PET is currently used in follow-up studies of AD progression and in therapeutic trials aiming to reduce brain amyloid load or accumulation. For such studies with repeated [11C]PIB measurements, information on the reproducibility of the analysis and quantitation method is of great importance. Furthermore, during disease progression, AD patients often become restless and it may be difficult for them to lie still in the PET scanner. Therefore, a short and simple imaging protocol would increase feasibility and subject comfort and compliance, and also lead to better quality of the imaging data. Recent studies indicate that simplified quantitation using region-to-cerebellum ratios of late-scan standardized uptake values (SUVs) is valid for [11C]PIB PET [7, 8], and reports have also shown that the region-to-cerebellum ratio is a useful approach in clinical group comparison studies [1–3, 9].

The aim of the present study was to evaluate the reproducibility and validity of fully automated image analysis techniques, including voxel-based quantitation of [11C]PIB uptake, fitting of imaging data in a standard brain space and automated ROI analysis in a test-retest setting with repeated [11C]PIB PET scans. Traditional manual ROI analysis was used as a comparison method. To evaluate the potential use of short and cost-effective imaging protocols, which would be convenient for AD patients, the accumulation of [11C]PIB was quantitated as region-to-cerebellum ratios during only 30 min of data acquisition (60 to 90 min after tracer injection). In addition, based on the estimated variability in the quantitation of [11C]PIB uptake, power calculations were performed to estimate the number of subjects required to reveal statistically significant differences in [11C]PIB uptake changes in a parallel-group design with repeated measurements.

Materials and methods

Subjects

The study was approved by the Ethics Committee of Southwest Finland Hospital District. Six patients with clinically diagnosed AD and four healthy control subjects were recruited for repeated PET brain scans with [11C]PIB. An interview and a physical examination were performed in order to exclude other conditions such as neurological or psychiatric disorders. Structural MRI of the brain was also performed to find possible brain lesions and to aid in the image analysis. All subjects and their carers, when applicable, gave written informed consent prior to the study. An intravenous line was inserted into each subject’s forearm for the [11C]PIB injection on both PET scanning occasions.

Four of the six AD patients and three of the four control subjects were female. At the time of the first scan, the average age of the AD patients was 71 years (range 59–79 years, SD 7.6), and that of the healthy controls was 64 years (range 51–71 years, SD 9.2). The average Mini-Mental State Examination (MMSE) score of the AD patients was 24.8 (range 23–27, SD 1.8), and that of the healthy controls was 28.3 (range 28–29, SD 0.5). All AD patients were on cholinesterase inhibitor treatment throughout the study (two on galantamine and four on rivastigmine). AD had been diagnosed 1.5–4 years (mean 2.6 years, SD 1.0) prior to the study.

Radiochemistry

[11C]PIB was produced by the reaction of 6-OH-BTA-0 and 11C-methyl triflate as described previously [2]. The mean±SD specific radioactivity of [11C]PIB at the time of administration was 31.9±4.7 MBq/nmol for the AD patients, and 31.3±5.0 MBq/nmol for the healthy controls. The radiochemical purity of the tracer was more than 95% in all PET studies, with an average of 97%.

Scanning protocol

The imaging protocol has been described in detail previously [2]. In brief, all subjects underwent an established 90-min dynamic [11C]PIB PET scan with a GE Advance PET scanner (General Electric Medical Systems, Milwaukee, WI) in the 3-D scanning mode (septa retracted), providing 35 slices of 4.25 mm thickness covering the whole brain. The full-length scan was performed so that no data that could be of interest in the future would be discarded, although the plan was to use only the last 30 min of scanning data for the image analyses. Before injection of the radiotracer, an 8-min transmission scan with 68Ge rod sources was performed for attenuation correction. [11C]PIB was injected into an antecubital vein and flushed with saline. The mean injected activities for the AD patients were 322 MBq (range 295–345 MBq, SD 19.2) for the first scan and 301 MBq (range 265–331 MBq, SD 23.4) for the second scan. For the healthy controls, the mean injected activities were 301 MBq (range 289–311 MBq, SD 10.1) and 275 MBq (range 236–309 MBq, SD 30.8), respectively.

The second PET brain scan was performed within 102 days (mean 34 days, SD 30, for the AD group; mean 58 days, SD 27, for the control group) of the first scan in each subject. Each 28-frame scan sequence consisted of four 30-s frames, nine 1-min frames, three 3-min frames, ten 5-min frames and two 10-min frames, but as planned, only the last 30 min (60–90 min) were used in the image analyses of this study. Although the transmission scan was performed before tracer injection, it was considered to anatomically correspond well with the late-scan data. This assumption was reasonable because head movement was controlled during scanning by monitoring landmarks on the subject’s face. No blood sampling was performed during the scan. Imaging data were reconstructed into a 128×128 matrix using a transaxial Hann filter with a 4.6-mm cut-off and an axial ramp filter with an 8.5-mm cut-off.
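The frame schedule above can be checked with a short sketch. It rebuilds the frame start and end times from the frame counts and durations given in the text and lists the frames that fall inside the 60–90 min analysis window. The function names are illustrative only and do not come from the study's software.

```python
# Frame schedule from the text: (number of frames, frame duration in minutes).
SCHEDULE = [(4, 0.5), (9, 1.0), (3, 3.0), (10, 5.0), (2, 10.0)]


def frame_times(schedule):
    """Return a list of (start, end) times in minutes, one entry per frame."""
    frames = []
    t = 0.0
    for count, duration in schedule:
        for _ in range(count):
            frames.append((t, t + duration))
            t += duration
    return frames


def frames_in_window(frames, start, end):
    """Return 0-based indices of frames lying entirely within [start, end]."""
    return [i for i, (a, b) in enumerate(frames) if a >= start and b <= end]


frames = frame_times(SCHEDULE)
print(len(frames))                           # 28 frames
print(frames[-1][1])                         # 90.0 min total
print(frames_in_window(frames, 60.0, 90.0))  # [24, 25, 26, 27]
```

With this schedule, the 60–90 min window corresponds to the last two 5-min frames and the two 10-min frames.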

Brain MRI scanning was performed with a Philips Intera 1.5-T scanner (Philips, Best, The Netherlands). T1-weighted 3-D scans with a voxel size of 0.50×0.50×1.00 mm were obtained.

Image preprocessing

The preprocessing of the [11C]PIB PET imaging data was performed using the Statistical Parametric Mapping [10] software version 2 (SPM2) and Matlab 6.5 for Windows (Mathworks, Natick, MA). For the spatial normalization of the imaging data, a ligand-specific template for [11C]PIB was generated from 14 [11C]PIB scans from another study sample using MRI-aided procedures [11] as described in more detail previously [2], using SPM99. The template was created using [11C]PIB scans from age-matched healthy control subjects (n=5), patients with mild cognitive impairment (n=4), and patients with AD (n=5) in an effort to eliminate group-related bias in spatial normalization. The spatial normalization would be most accurate when using group-specific templates. This would, however, lead to systematic differences in spatial normalization between groups and would invalidate group comparisons. Furthermore, we were interested in evaluating the use and reproducibility of a simple, fast and cost-effective image analysis procedure.

The template was created with the following steps. First, individual MR images were coregistered to summated (60–90 min) [11C]PIB images and then the MR images were normalized using a T1-weighted MRI template delivered with SPM to obtain normalization parameters. Using the acquired parameters, [11C]PIB summated images were normalized and written onto the template bounding box. To ensure the symmetry of the [11C]PIB template, the mean image of these normalized images was averaged with its mirror copy (left–right flipped image). Finally, this symmetrical [11C]PIB template was smoothed using an 8-mm gaussian kernel, as 8-mm smoothing is used for the estimation of the normalization parameters in SPM.

Quantitation of [11C]PIB uptake

The uptake of [11C]PIB was quantitated at the voxel level using the region-to-cerebellum ratio, which is identical to the SUV ratio (SUVR; [7]). This simplified quantitation, which has been shown to be valid for [11C]PIB [7], enables the use of a short 30-min image acquisition. The parametric images were calculated using a fully automated procedure conducted in a common stereotactic space, with the following phases. First, summated (60 to 90 min) [11C]PIB images were normalized using a [11C]PIB template and written using bilinear interpolation. Second, the radioactivity concentration over 60 to 90 min in the cerebellar cortex was calculated from these normalized images using automated ROI analysis as described below. Finally, parametric images representing [11C]PIB region-to-cerebellum ratios in each voxel were calculated by dividing the normalized summated images by the cerebellar radioactivity concentration values.
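The final division step can be illustrated with a minimal sketch. It assumes the spatially normalized summated image and the cerebellar cortex ROI are available as flat lists of equal length (one entry per voxel). The function names and the toy values are hypothetical, not study data or the study's software.

```python
def region_mean(image, mask):
    """Mean voxel value inside a binary ROI mask (flat lists of equal length)."""
    values = [v for v, m in zip(image, mask) if m]
    if not values:
        raise ValueError("ROI mask is empty")
    return sum(values) / len(values)


def ratio_image(summed_image, cerebellum_mask):
    """Divide every voxel by the mean cerebellar cortex concentration."""
    reference = region_mean(summed_image, cerebellum_mask)
    return [v / reference for v in summed_image]


# Toy example (made-up values): voxels 0 and 3 belong to the cerebellar ROI.
image = [10.0, 20.0, 30.0, 10.0]
cerebellum = [1, 0, 0, 1]
print(ratio_image(image, cerebellum))  # [1.0, 2.0, 3.0, 1.0]
```

Because the reference value is a single scalar per scan, the resulting parametric image has the same dimensions as the summated image, and values near 1 indicate uptake similar to that of the cerebellar cortex.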

Automated ROI analysis

Standardized ROIs were defined on the MRI template image representing brain anatomy in accordance with the MNI space (Montreal Neurological Institute database) utilized in SPM. To ensure a common stereotactic space, the mean image of 12 spatially normalized MR images was used instead of a single MR image. Separate gray and white matter templates were not considered necessary for the simplified methodology used in this study. The ROIs were delineated using Imadeus software (version 1.50; Forima, Turku, Finland) on the anterior and posterior cingulate cortices, lateral prefrontal and lateral temporal cortices, mesial temporal lobe (including amygdala, hippocampus and parahippocampal gyrus), occipital and inferior parietal cortices, caudate nucleus and putamen in the MR image. The ROIs are shown in Fig. 1. The left and right sides of each brain region were averaged. For reference purposes a ROI was drawn on the cerebellar cortex (to avoid contamination by white matter). The average regional region-to-cerebellum ratios were calculated using these ROIs on spatially normalized parametric images. In order to avoid quantitative bias (underestimation) in the estimation of ratio values, automated ROI analysis was performed without any smoothing of the images. Finally, the regional values were subjected to evaluation of reproducibility as described below (Fig. 2).
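A minimal sketch of the regional read-out follows. It assumes that each region has separate left and right binary masks in the standard space and that "averaged" means the mean of the two side means. The text does not say whether sides were averaged this way or pooled voxel-wise, so this is an assumption, and all names and values are hypothetical.

```python
def roi_mean(image, mask):
    """Mean voxel value inside a binary ROI mask (flat lists of equal length)."""
    values = [v for v, m in zip(image, mask) if m]
    return sum(values) / len(values)


def bilateral_values(ratio_image, rois):
    """Map region name -> mean of left and right ROI means.

    `rois` maps a region name to a (left_mask, right_mask) pair.
    Averaging the two side means is an assumption of this sketch.
    """
    return {
        name: 0.5 * (roi_mean(ratio_image, left) + roi_mean(ratio_image, right))
        for name, (left, right) in rois.items()
    }


# Toy ratio image with four voxels (made-up values).
ratios = [1.2, 1.4, 2.0, 2.2]
rois = {"putamen": ([1, 1, 0, 0], [0, 0, 1, 1])}
print(bilateral_values(ratios, rois))
```

Because the ROIs are fixed in the standard space, the same masks can be applied unchanged to every subject and scan, which is what removes the operator from the procedure.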

Fig. 1 ROIs used in this study: a cerebellar cortex; b lateral temporal cortex and mesial temporal cortex; c lateral frontal cortex, caudate nucleus, putamen, lateral temporal cortex and occipital cortex; d posterior cingulate and parietal cortex; e white matter

Fig. 2 Number of subjects needed per study group in relation to the between-group difference in the within-group relative change of [11C]PIB uptake in the lateral frontal cortex. The curves are for 90% power and for 80% power

Manual ROI analysis

Traditional manual ROI analysis was performed for purposes of comparison. [11C]PIB PET image pairs were realigned (within-subject) with SPM2, and the individual MR images were coregistered with the individual mean [11C]PIB images (mean of realigned images). ROIs were drawn on the same anatomical regions as listed in the section Automated ROI analysis. Imadeus software was utilized for manual delineation of ROIs on the coregistered MR images and for calculating the radioactivity concentrations from realigned [11C]PIB PET images. The regional uptake of [11C]PIB was quantitated as region-to-cerebellum ratios during the 60–90 min interval by dividing regional radioactivity concentration values by the cerebellar values.

Evaluation of reproducibility

To assess the reproducibility of [11C]PIB uptake measurements, absolute variability (VAR) values were calculated as follows:

$$ VAR = \frac{\left| \text{scan2} - \text{scan1} \right|}{0.5\left( \text{scan1} + \text{scan2} \right)} \times 100\% $$

where scan1 and scan2 refer to the parameter estimates of interest (region-to-cerebellum ratios) in the test and retest scans, respectively. In addition, test-retest reproducibility was evaluated in terms of intraclass correlation coefficients (ICC):

$$ ICC = \frac{BSMS - WSMS}{{BSMS + \left( {n - 1} \right)WSMS}} $$

where BSMS is the between-subject mean square, WSMS is the within-subject mean square, and n is the number of repeated observations (in this case, n=2). The ICC can have values between −1 and 1; values close to 1 indicate that most variance is due to between-subject rather than within-subject variation, whereas values below zero imply greater within-subject than between-subject variation.
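The two reproducibility measures can be sketched as follows. VAR follows the formula above directly. For the ICC, the mean squares are computed here from a one-way analysis of variance with subjects as the grouping factor. The text does not state which ANOVA model was used, so that choice is an assumption of this sketch. Function names and input values are illustrative and are not study data.

```python
from statistics import mean


def var_percent(scan1, scan2):
    """Absolute test-retest variability (VAR, in %) for one subject."""
    return abs(scan2 - scan1) / (0.5 * (scan1 + scan2)) * 100.0


def icc(scan1_values, scan2_values):
    """ICC = (BSMS - WSMS) / (BSMS + (n - 1) * WSMS), with n = 2 scans.

    Mean squares come from a one-way ANOVA over subjects (assumed model).
    Undefined (division by zero) if all values are identical.
    """
    pairs = list(zip(scan1_values, scan2_values))
    k = len(pairs)  # number of subjects
    n = 2           # repeated observations per subject
    grand_mean = mean(v for pair in pairs for v in pair)
    subject_means = [mean(pair) for pair in pairs]
    # Between-subject mean square: spread of subject means around the grand mean.
    bsms = n * sum((m - grand_mean) ** 2 for m in subject_means) / (k - 1)
    # Within-subject mean square: spread of each scan around its subject mean.
    wsms = sum(
        (v - m) ** 2 for pair, m in zip(pairs, subject_means) for v in pair
    ) / (k * (n - 1))
    return (bsms - wsms) / (bsms + (n - 1) * wsms)


print(round(var_percent(1.5, 1.6), 2))         # 6.45
print(icc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))   # 1.0 (no within-subject variation)
print(icc([1.0, 2.0], [2.0, 1.0]))             # -1.0 (no between-subject variation)
```

The last two calls show the two extremes of the ICC range: identical test and retest values give 1, whereas subjects with identical means but differing scans give −1.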

For the evaluation of reproducibility at the voxel level, VAR and ICC values were calculated using the above formulas for each voxel in the MNI space separately from the region-to-cerebellum ratio images that were prepared for SPM analysis, that is, spatially normalized and smoothed using a 12-mm gaussian kernel. The calculations produced 3-D maps representing the spatial distribution of ICC and VAR values at the voxel level in the MNI brain space. The average values of ICC and VAR at the voxel level in each brain region were calculated using the ROIs included in the automated ROI analysis. In addition, to evaluate the reproducibility of the automated ROI analysis, region-level ICC and VAR values were calculated directly from the nonsmoothed spatially normalized parametric images. Smoothing was omitted to avoid underestimation in quantification (see Automated ROI analysis). To achieve more detailed results concerning the reproducibility in the two groups of subjects, reproducibility at the voxel level and region-level was evaluated separately for the AD patients and the controls.

Student’s t-test for paired samples was used to test for differences in the parameters of interest between scan 1 and scan 2. P<0.05 was considered statistically significant. The analyses were performed using SPSS for Windows (release 13.0.1, SPSS, Chicago, IL).

To estimate the sample sizes required to reveal statistically significant differences in [11C]PIB uptake in a parallel-group design with two measurements (for instance, a comparison of the effect of active treatment vs placebo on [11C]PIB uptake in AD patients), power calculations were performed using nQuery Advisor v7.0 (Statistical Solutions, Cork, Republic of Ireland). Based on the values of the present study, it was assumed that the average baseline level of the [11C]PIB uptake ratio would be 1.505 units for the lateral frontal cortex and that the standard deviation would be 0.08 units. The sample sizes were evaluated for differences between two parallel groups in the change from the baseline value to the post-intervention value (e.g. a change of 15%, equal to 0.226 units, in one group compared with a change of 0% in another group). The sample size calculations were made for 80% and 90% power, and they were evaluated using a t-distribution with a two-sided significance level of 0.05.
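As a rough illustration of this kind of calculation, the sketch below uses the textbook normal approximation for a two-sided comparison of two group means. It is not the nQuery Advisor procedure. The study used a t-distribution, which gives larger group sizes when samples are small, so this sketch is not expected to reproduce the reported numbers. The function name is hypothetical, and `sd` must be the standard deviation of whatever quantity is being compared between groups.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, power=0.90, alpha=0.05):
    """Approximate subjects per group for a two-sided, two-sample comparison.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / delta)^2.
    `delta` is the between-group difference to detect, `sd` the common
    standard deviation of the compared quantity.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_power = z.inv_cdf(power)
    return ceil(2.0 * ((z_alpha + z_power) * sd / delta) ** 2)


# Generic check: a difference of half a standard deviation at 80% power.
print(n_per_group(delta=0.5, sd=1.0, power=0.80))  # 63
```

The required group size scales with the square of `sd / delta`, which is why a low test-retest variability translates directly into small samples for a given anticipated change.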

Results

The regional estimates of [11C]PIB uptake and the reproducibility estimates of the three analysis methods employed are given in Tables 1 and 2. Regional estimates of [11C]PIB accumulation calculated using automated ROI analysis were generally in good agreement with values obtained using manual ROI analysis.

Table 1 Regional estimates of [11C]PIB uptake (mean±SD) and VAR and ICC values in the AD patients (n=6)
Table 2 Regional estimates of [11C]PIB uptake (mean±SD) and VAR and ICC values in the control group (n=4)

In general, the reproducibility of [11C]PIB uptake measurements was excellent in both groups, independent of the analysis method. In the AD group, the reproducibility of automated ROI analysis (average VAR 4.3%, range 3.3–5.1%) was slightly better than that of manual ROI analysis (average VAR 6.6%, range 5.5–9.9%) and reproducibility at the voxel level was intermediate (average VAR 5.3%, range 4.9–6.2%). In the control group, the differences between methods were even smaller. The reproducibility of automated ROI analysis (average VAR 3.5%, range 0.9–5.5%) was practically the same as that of manual ROI analysis (average VAR 3.2%, range 1.9–4.8%) and reproducibility at the voxel level was slightly inferior (average VAR 5.3%, range 4.2–6.4%). There were no statistically significant differences between the VAR estimates obtained by the different analysis methods. The ICC of the automated ROI and voxel-based analyses varied between 0.69 and 0.97 in the AD group and 0.52 and 0.99 in the control group. The ICC values of the three different analysis methods in individual brain regions are given in Tables 1 and 2.

Power calculations indicated that 90% power can be achieved using a sample size of five subjects per group when a 15% change from baseline (increase or decrease) in [11C]PIB accumulation in the frontal cortex is anticipated in one group compared to no change in another group. A graphical illustration of the relationship between sample size and difference in the within-group relative change of [11C]PIB accumulation is presented in Fig. 2.

Discussion

Voxel-based image analysis techniques are frequently used because they enable automated and objective image analysis and provide illustrative visualization of the results (see, for example, references [2] and [5]). Unlike conventional manual ROI analysis, voxel-based methods such as SPM can cover the whole brain space, making them very useful in explorative studies without any a priori hypotheses concerning the location of the differences or treatment effects. As manual ROI analysis is laborious and liable to operator-induced errors and bias, there is a need for automated and more objective quantitative methods. Recent studies have indicated that automated ROI analysis of spatially normalized parametric images is feasible for [11C]PIB [2, 5, 12] as well as for several other PET tracers (see, for example, references [13–15]).

Since voxel-based methods are based on image analysis conducted in a common stereotactic brain space, they require spatial normalization that fits individual images into the common space. Although the images are smoothed before statistical analysis, the accuracy of spatial normalization is crucial as statistical testing is carried out at the level of individual voxels. It has been demonstrated that spatial normalization with a ligand-specific template can provide more accurate spatial normalization than MRI-aided spatial normalization [11], probably because inaccuracies in a between-modality (PET-MRI) coregistration can be avoided. However, it is open to discussion whether corrupting effects of gray matter atrophy or the white matter binding of [11C]PIB would possibly be better excluded by using MRI-aided normalization in studies in patients with AD. In any case, the purpose of this study was to evaluate the reproducibility of [11C]PIB uptake applying the most simplified methods of analysis, and therefore in this study spatial normalization was performed using a ligand-specific [11C]PIB template. The main aim of this work was to assess whether voxel-level simplified quantitation of 30-min [11C]PIB PET scanning data and fully automated image analysis can provide reproducible and valid [11C]PIB uptake estimates.

The regional binding estimates derived from voxel-based quantitation and automated ROI analysis were highly consistent with the region-to-cerebellum ratios achieved by manual ROI analysis. In addition, the region-level reproducibility of automated ROI analysis appeared to be at least as good as the reproducibility of manual ROI analysis. The spatial normalization of image pairs (scans 1 and 2) was performed separately using a ligand-specific [11C]PIB template, that is, without PET-PET realignment which is required in the manual ROI analysis of repeated PET measurements. The good reproducibility of automated ROI analysis indicates that individually performed spatial normalization is feasible and the accuracy is comparable with realignment between images acquired with an interval of several weeks. Altogether, the good reproducibility and validity of the automated ROI estimation argue for the use of automated ROI analysis instead of conventional manual ROI analysis. The use of automated analysis avoids investigator-induced bias and also reduces workload.

To evaluate reproducibility at the voxel level, VAR and ICC maps were calculated from parametric [11C]PIB images representing the spatial distribution of region-to-cerebellum ratio values. The small VAR values indicated that reproducibility at the voxel level is generally equal to that obtained by manual ROI analysis. The ICC values were fairly low in the control group (Table 2). However, this does not mean that within-subject variability would be greater in the controls, but rather that when there is practically no specific tracer uptake, between-subject variation is also very low, which causes the ICC value to decline. It has to be emphasized, therefore, that the VAR values should be considered the primary measure of reproducibility, especially in the control group. The good to excellent reproducibility inferred from the low VAR values indicates that voxel-based analysis with SPM is sensitive even in fairly small samples. Indeed, it has already been shown that voxel-based analysis is feasible for demonstrating increased [11C]PIB uptake in AD patients as compared to controls [2, 5, 12].

In our study, using manual ROI delineation, the VAR values varied from 1.9% to 9.9% between brain areas (mean 6.6%, range 5.5–9.9%, in AD patients; mean 3.2%, range 1.9–4.8%, in controls). Two previous studies have evaluated the reproducibility of manual ROI analysis [7, 16]. In a comprehensive study that compared various simplified [11C]PIB uptake quantitation methods it was found that the test-retest variability of the region-to-cerebellum ratio was among the smallest of the examined analysis methods. The ratios were determined using two time intervals, 40–60 min and 40–90 min after tracer injection, and the average variability values of manual ROI analysis were 5.3% and 5.0%, respectively [7], comparable with the variability estimates of the present study. In another study with a 2-year follow-up of AD patients [16], four AD patients were rescanned with [11C]PIB PET to evaluate the reproducibility of the region-to-cerebellum ratio obtained at 40 to 60 min after injection. The results were calculated with an equation identical to the one used in the present study. Mean VAR values in the cortical areas examined were between 3.2% and 7.3%, and slightly greater in the striatum (12.7%).

In a recent study that compared different parametric reference tissue models and their reproducibility in the quantitation of [11C]PIB uptake, the test-retest variability of the region-to-cerebellum ratios (5.2% in controls and 7.9% in AD patients) was greater than that obtained with other parametric methods [6]. In the present study, however, we were able to evaluate and report the good reproducibility of region-to-cerebellum ratios not only at the regional level but also at the voxel level, and the reproducibilities were of the same magnitude as was seen in the previous study at the regional level for simplified or multilinear reference tissue models [6]. Undoubtedly some parametric methods not based on the region-to-cerebellum ratio may be more accurate in the quantitation of [11C]PIB uptake. However, the previously evaluated reference-input Logan method [6] and the promising recently evaluated masked-volume-wise PCA method without any modelling assumptions [17], for example, require data to be collected from the time of injection to the end of the scan, whereas a scan planned for region-to-cerebellum ratio analysis can be conducted (a) without blood sampling and (b) with an even shorter scanning protocol. These differences in the required amount of input data underline the practical value of the methods evaluated in this study.

Disease-modifying therapies in AD are under clinical investigation. Verifying disease-modifying effects using clinical measures (neuropsychological testing, global impression of change, assessment of behavioural symptoms) requires long follow-up and large numbers of patients. Therefore, surrogate markers of treatment efficacy would be desirable. The good reproducibility presented here suggests that [11C]PIB PET could be used reliably in study designs with repeated measurements, such as follow-up studies or therapeutic trials, as one method to monitor therapeutic efficacy. Our post hoc power calculations provide preliminary information on the sample sizes required in a parallel-group design. For example, for the comparison of two groups with a treatment response resulting in a 15% change in [11C]PIB uptake in the frontal cortex in one group and no change in another group, samples of only five subjects per group would seem to be sufficient (90% power). This small number of required subjects highlights the relevance of the adequate reproducibility of automated image analysis of [11C]PIB PET data.

The present findings show that good reproducibility can be obtained using only 30 min of [11C]PIB PET data. We emphasize that only simplified quantitation, which was used in the present study, is possible when using only 30 min of PET imaging data. A short imaging protocol would also have important practical implications since it enhances cost-effectiveness of PET scanner and tracer use, as the tracer may be injected and scanning would take place from 60 to 90 min after tracer injection, enabling the division of the same tracer batch among several patients. In addition, a short scan time will improve patient comfort and increase compliance, possibly contributing to better data quality from subjects with more advanced AD. This is important in follow-up studies, for example, where patients become more demented as time goes by. Advanced dementia is accompanied by increasing restlessness and agitation which may make it difficult for the patient to lie still in the scanner for a long period of time. Systematically poorer scanning data at the second imaging time point could lead to bias in follow-up studies.

The choice of image acquisition timing and other methodological issues should still be weighed separately for each study and clinical practice. Ideally, of course, at each PET imaging site using [11C]PIB a test-retest study should be done to evaluate the reproducibility being achieved at that site with the instrumentation at hand, and investigators should also estimate the required sample sizes separately for each study design. The timing of the transmission scan should also be optimized in future when scanning 60 to 90 min after tracer injection. We note that we performed a dynamic 90-min scan preceded by a preinjection transmission scan and used the last 30 min of data for analysis. Ideally one would perform the transmission scan immediately before or after the scanning session. A comparison of pre- and postinjection transmission scans for brain FDG studies has shown that the results obtained using a preinjection transmission scan agree well with those obtained using a postinjection transmission scan despite the time gap between the preinjection transmission scan and the emission scan [18]. However, such a comparison has not been performed for [11C]PIB. In addition, disregarding the early frames may result in loss of valuable information in some cases, since early frames contain information about blood flow [19].

In conclusion, the results of the present study indicate that the reproducibility of [11C]PIB region-to-cerebellum ratios using simplified fully automated analysis is good to excellent, even when applying only 30 min of imaging data. The reproducibility and validity of the estimates are maintained when quantitation is performed at the voxel level and the analysis is conducted in a standard brain space. Our results suggest that with [11C]PIB PET data, automated voxel-based image analysis is a quick and cost-effective alternative to conventional ROI analysis.