Introduction

Electrocardiography (ECG)-gated myocardial perfusion single-photon emission tomography (SPET) provides valuable information about left ventricular (LV) function. Among various analytic programs for ECG-gated SPET, Cedars-Sinai Quantitative Gated SPECT (QGS) [1] is now widely distributed.

A recent study has indicated that determination of LV volumes by QGS has incremental prognostic value over perfusion information for identifying patients at risk for cardiac death [2]. Thus, it has become increasingly important to understand the accuracy and limitations of QGS in assessing LV volumes and ejection fraction (EF) in individual patients. However, validation studies of QGS in assessing LV function have used only reference methods with limited accuracy [3, 4, 5, 6, 7]. Although other studies have used more reliable reference methods, such as contrast left ventriculography (LVG) [8, 9, 10, 11] or magnetic resonance imaging (MRI) [12, 13, 14, 15, 16, 17], all were based on data from relatively few patients, and uncertainty as to the accuracy of QGS has remained. We therefore performed a meta-analysis of all known data comparing QGS with LVG and with MRI. While such an approach has inherent limitations in terms of non-uniformity of experimental conditions across individual studies, it nevertheless offers statistically enhanced assessment of the accuracy of QGS.

Materials and methods

Search method, eligibility and exclusion criteria

We searched for eligible studies in the MEDLINE database of PubMed (last search updated August 2002). The search was accomplished by combining the terms "QGS", "gated", "SPECT", "SPET", and "LVG", "contrast", "ventriculography" or "MRI", "magnetic resonance imaging".

We collected human studies of LV end-diastolic volume (EDV), end-systolic volume (ESV) and EF determined by QGS (using technetium-99m tetrofosmin or sestamibi) and by LVG or MRI in the same individuals. Data were eligible regardless of subject characteristics and technical parameters used for QGS, LVG and MRI.

Meeting abstracts were excluded. We did not include phantom-only, computer simulation-only or animal studies, or human studies in which echocardiography or radionuclide ventriculography was used as the gold standard for QGS. Data for QGS using thallium-201 or iodine-123-labelled tracers were not included. In addition, we excluded studies confined to specific patient subgroups that could affect the accuracy of QGS, such as a study of individuals with small hearts or immediately after the onset of acute myocardial infarction. We carefully examined the retrieved studies to exclude duplication or overlap. Duplicated or overlapping data were factored in only once in the meta-analysis.

Data extraction and synthesis

We extracted from each pertinent article the characteristics of the study population as well as the technical characteristics of each imaging procedure.

We recorded the EDV, ESV and EF with each imaging modality for each study participant. None of the eligible studies directly reported numerical data for each individual. Thus, we extracted the relevant data from images of the scatter plots shown in the articles, using a scanner and an image analysis program (NIH Image, version 1.62). Finally, we added unpublished data from 25 individuals in our validation study of QGS compared with LVG. Since each pertinent study had evaluated relatively few subjects, the main analysis examined pooled data from individuals across studies. To determine the reproducibility of the data extraction, two independent observers performed all procedures in duplicate. The coefficient of variation (CV) for the repeated measurements was calculated by dividing the SD by the mean of the two values. The root mean square value of these CVs, representing the overall inter-observer variability, was 0.9%.
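The reproducibility calculation described above can be illustrated with a minimal sketch. It assumes that the SD of each pair of repeated readings is the sample SD (n-1 denominator), which the text does not specify; the function names and the example readings are hypothetical and are not data from this study.

```python
import math
import statistics


def cv_pair(x1, x2):
    """CV of two repeated readings: SD of the pair divided by its mean.

    Assumption: sample SD (n-1 denominator); the text does not say which SD was used.
    """
    return statistics.stdev([x1, x2]) / statistics.mean([x1, x2])


def rms_cv(pairs):
    """Root mean square of the per-pair CVs, used as the overall variability."""
    cvs = [cv_pair(a, b) for a, b in pairs]
    return math.sqrt(sum(c * c for c in cvs) / len(cvs))


# Hypothetical duplicate readings (ml) of the same scatter-plot points by two observers
readings = [(100.0, 102.0), (60.0, 59.0), (150.0, 150.0)]
overall = rms_cv(readings)
print(f"overall inter-observer variability: {100 * overall:.2f}%")
```

The root mean square is used rather than the arithmetic mean, so pairs with larger disagreement contribute more to the overall value.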

Statistical analysis

Data were statistically analysed using a software program (StatView for Macintosh, version 5.0). Data are expressed as means±SD except when otherwise indicated. The Kolmogorov-Smirnov test was used to evaluate the normality of the data distribution. LV volumes and their average values measured by two imaging modalities were log-transformed to normalise their distribution. Student's t test, least squares linear regression and Bland-Altman analyses [18] were used to compare variables determined by each imaging modality. The statistical power of the linear regression at α=0.05 was determined using a statistics program [19]. Deviations of the QGS measurements from those obtained by LVG and by MRI were compared using the non-paired t test. A value of P<0.05 was considered significant.
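As an illustration of the Bland-Altman comparison named above, the following sketch computes the mean deviation and the mean±2SD interval of the paired differences (reference minus QGS). It is not the StatView procedure used in this study; the function name and the example volumes are hypothetical.

```python
import statistics


def bland_altman(reference, test):
    """Return (bias, lower, upper): mean of paired differences and mean -/+ 2SD."""
    diffs = [r - t for r, t in zip(reference, test)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 2 * sd, bias + 2 * sd


# Hypothetical paired EDV values (ml), not data from the included studies
ref_edv = [140.0, 95.0, 210.0, 120.0]
qgs_edv = [110.0, 80.0, 170.0, 100.0]

bias, lower, upper = bland_altman(ref_edv, qgs_edv)
print(f"bias = {bias:.1f} ml, interval = {lower:.1f} to {upper:.1f} ml")
```

A positive bias here means that QGS gives smaller values than the reference method. In a full Bland-Altman plot, each difference would be plotted against the average of the two measurements for that subject.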

Results

Eligibility of studies and extraction of data

Among the searched articles, 11 comparisons of QGS with LVG [8, 9, 10, 11, 20, 21, 22, 23, 24, 25, 26] and six of QGS with MRI [12, 13, 14, 15, 16, 17] were potentially eligible. Of these, two were excluded: one [25] because it did not evaluate LV volumes, and the other [26] because it assessed subjects only during the super-acute phase after the onset of acute myocardial infarction. Of the remaining publications, one [21] did not present a scatter graph of EF although LV volume graphs were shown. We therefore communicated with the study investigators and obtained EF scatter plots. One study group had published two reports [10, 22], in which 9 of 20 subjects in one report were found among the 22 subjects in the other. We therefore selected the 22 subjects in the earlier report and the 11 additional non-overlapping subjects in the later report. Another group similarly had published two reports [13, 14], from which we selected 20 subjects from the larger study [13] and three additional non-overlapping subjects among 16 in the smaller study [14]. Since the number of subjects in the scatter plots differed from that stated in the text of two studies [12, 17], we included only data from the scatter plots. In all, we analysed information from 301 individuals (276 from nine publications [8, 9, 10, 11, 20, 21, 22, 23, 24] and 25 from our unpublished study) to compare QGS with LVG, and 112 (from six publications [12, 13, 14, 15, 16, 17]) to compare QGS with MRI.

In two studies [13, 14] from the same group, LV volume measurements by QGS had been corrected by multiplying by a phantom study correction factor. These measurements were reverted to uncorrected values for this analysis.

Characteristics of the included studies and imaging techniques

The disease categories of individuals in six of the ten studies that compared QGS with LVG were mixed (Table 1). All of the participants in comparisons of QGS with MRI had coronary heart disease. In each study, 57%–91% of the subjects were males, and the age of the participants ranged mostly from early 50s to late 60s. The proportion of individuals with prior myocardial infarction varied from 8% to 100%. The time interval between QGS and the gold standard was <1 month in 13 of the 16 studies. Of all the study participants, 3.4% had small hearts, defined as <50 ml in EDV measured by any of the imaging modalities.

Table 1. Characteristics of included studies

Tracer was injected into 91.3% of the study subjects at rest before QGS (Table 2). Frame numbers per cardiac cycle varied from eight to 16 (LVG subgroup, 11.2±3.5 frames; MRI subgroup, 11.0±3.9 frames; P=NS). In all studies the acquisition matrix was 64×64 pixels. Critical frequencies of the Butterworth filter ranged from 0.18 to 0.45 cycles/pixel or from 0.34 to 0.52 cycles/cm. Volumes were calculated from LVG using the area-length method or Simpson's method applied to single- or biplane projection images. Simpson's formula was applied to contiguous multi-slice images acquired with a gradient-echo pulse sequence for MRI measurements in all six studies.

Table 2. Characteristics of imaging techniques

Comparison of QGS with LVG

The overall correlation was good between QGS and LVG measurements for EDV (r=0.81, SEE=27 ml, P<0.0001), ESV (r=0.83, SEE=18 ml, P<0.0001) and EF (r=0.79, SEE=8.3%, P<0.0001) from the 301 subjects across all ten studies (Fig. 1). However, compared with LVG measurements, the QGS measurements systematically underestimated EDV (LVG, 141±58 ml; QGS, 110±56 ml; P<0.0001), ESV (LVG, 66±43 ml; QGS, 58±42 ml; P<0.0001) and EF (LVG, 55.5%±14.3%; QGS, 51.0%±13.6%; P<0.0001). Bland-Altman plots showed that the deviations (mean±2SD) of QGS and LVG measurements of EDV, ESV and EF were 32±58 ml, 8±37 ml and 4.4%±18.0%, respectively (Fig. 2). The magnitude of the underestimation of EDV by QGS compared with LVG weakly but significantly correlated with heart size (r=0.12, P=0.03, statistical power=0.59) (Fig. 2).

Fig. 1. Correlation between LVG and QGS measurements of EDV, ESV and EF. EDV and ESV are log-transformed.

Fig. 2. Bland-Altman plots of EDV, ESV and EF measured by LVG and QGS. Averages of LVG and QGS measurements of EDV and ESV are log-transformed. The magnitude of differences between LVG and QGS for EDV is weakly but significantly correlated with heart size.

We compared EF between LVG and QGS separately in subgroups defined by the number of ECG gating frames. In the subgroup gated at 16 frames (n=96), the EF did not significantly differ between LVG and QGS (LVG, 51.8%±14.0%; QGS, 53.1%±14.3%, P=NS). The subgroup gated at eight frames (n=144) showed a significant difference of 7.6%±17.4% (mean±2SD, P<0.0001).

Comparison of QGS with MRI

The correlation between QGS and MRI measurements for EDV (r=0.87, SEE=34 ml, P<0.0001), ESV (r=0.89, SEE=27 ml, P<0.0001) and EF (r=0.88, SEE=7.2%, P<0.0001) from 112 subjects across six studies was good (Fig. 3). Compared with MRI measurements, the QGS measurements systematically underestimated EDV (MRI, 157±81 ml; QGS, 144±76 ml; P=0.0001) and EF (MRI, 47.8%±15.7%; QGS, 43.5%±14.9%; P<0.0001), but not ESV (MRI, 93±72 ml; QGS, 92±68 ml, P=NS). Bland-Altman plots showed that the deviations (mean±2SD) of QGS and MRI measurements of EDV, ESV and EF were 13±73 ml, 1±58 ml and 4.3%±15.4%, respectively (Fig. 4). The magnitude of the underestimation of EDV and ESV by QGS compared with MRI did not correlate with heart size.

Fig. 3. Correlation between MRI and QGS measurements of EDV, ESV and EF. EDV and ESV are log-transformed.

Fig. 4. Bland-Altman plots of EDV, ESV and EF measured by MRI and QGS. Averages of MRI and QGS measurements of EDV and ESV are log-transformed.

In a subgroup gated at 16 frames (n=42), the EF did not significantly differ between MRI and QGS measurements (MRI, 36.6%±14.8%; QGS, 35.6%±15.5%, P=NS). A subgroup gated at eight frames (n=70) showed a significant difference in EF of 6.3%±14.6% (mean±2SD, P<0.0001).

Comparison of the discrepancies between LVG and QGS and between MRI and QGS

The QGS measurements of LV volumes differed significantly between the LVG and MRI study subgroups. Nevertheless, we compared the degree of QGS underestimation against LVG with that against MRI, since the magnitude of the underestimation against MRI measurements did not correlate with heart size. We found a greater difference in the former than in the latter with respect to EDV measurements [LVG minus QGS, 32±2 ml (mean±standard error); MRI minus QGS, 13±4 ml; P<0.0001] and ESV measurements (LVG minus QGS, 8±1 ml; MRI minus QGS, 1±3 ml; P=0.003), but not EF (LVG minus QGS, 4.4%±0.5%; MRI minus QGS, 4.3%±0.7%; P=NS).
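The form of this comparison can be sketched from the summary values alone. The example below computes an unequal-variance (Welch-type) t statistic from two mean deviations and their standard errors. This is an assumption made for illustration: the non-paired t test in this study was run on individual data, and the rounded values used here are not expected to reproduce the reported P values. The function name is hypothetical.

```python
import math


def t_from_means_and_se(mean_a, se_a, mean_b, se_b):
    """Welch-type t statistic for two independent means, given their standard errors."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Rounded EDV deviations reported above:
# LVG minus QGS = 32±2 ml, MRI minus QGS = 13±4 ml (mean±standard error)
t_edv = t_from_means_and_se(32.0, 2.0, 13.0, 4.0)
print(f"t (EDV) = {t_edv:.2f}")
```

Because the two deviations come from different groups of subjects, their variances add, which is why the denominator combines the two squared standard errors.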

Discussion

The major finding of this meta-analysis was that QGS systematically underestimated LV volumes, the underestimation being more pronounced in comparison with LVG than with MRI. EF was underestimated by an average of 6–8% only in the subgroup of patients in whom ECG gating was set at eight intervals (i.e. it was not underestimated in those with ECG gating set at 16 intervals). The mechanism underlying the difference between the 8- and 16-interval measurements of QGS EF is related to the smoothing of the LV time-volume curve [1]. The value of the underestimation associated with eight-interval gating was somewhat larger than that previously reported (3.7%) [1].

The potential mechanisms underlying the underestimation of LV volumes may be multifactorial. One potential factor might be related to the accuracy of LVG. Previous studies of MRI have indicated that EDV measurements by LVG are significantly larger than those by MRI [27, 28]. The reported differences between LVG and MRI in measuring EDV (LVG minus MRI) were 31±33 ml [27] or 21±23 ml [28] using short axis MRI, or 18±30 ml [27] using long axis MRI, which are similar to the mean difference of 19 ml between LVG and MRI as shown in the present study. An experimental study of the isolated porcine heart indicated that LV volumetry by MRI accurately measured true ventricular volumes (r²=0.99, SEE=1.2 ml, mean difference=2.9%±5.0%) [29], whereas contrast LVG overestimated true volumes by an average of 15.9%±10.6% [29]. Other investigators also showed that LVG overestimated the volume of a human heart cast [30]. This evidence suggests that underestimated LV volumes that have been ascribed to QGS may, in fact, be largely due to overestimation of true ventricular volumes by LVG.

There are many technical factors in QGS that influence LV volumes, including the order and critical frequency of the Butterworth filter [1], selection of a reconstruction filter [31], myocardial count density [20], extracardiac background activity [32], matrix size for acquisition [20], zoom factor [33], extent and severity of perfusion defects [7], heart size [33] and radiopharmaceuticals [34]. The present study was a meta-analysis and each factor could not be standardised; however, each effect seems to have been averaged out across individuals from the various studies. Nevertheless, one postulated factor appears to consistently affect volume measurements by QGS [35]. Since QGS defines endocardial borders as 65% of SD inward from mid-myocardial peaks along radial myocardial count profiles [1], myocardial blurring due to degraded system resolution may induce inward misregistration of endocardial borders. With respect to the QGS algorithm, the same percentage value of the SD as determined from a phantom validation study has been applied to human studies [1]. Since the distance at which human hearts are imaged is farther than the distance at which phantoms are imaged, the system resolution of SPET in clinical studies is lower than that in phantom experiments. Indeed, one study has demonstrated that a phantom volume correctly measured by QGS at the nearest position of the detector (132 mm) was underestimated by 7% at the distance required for clinical cardiac studies (220–250 mm) [35]. This is quite similar to the magnitude of underestimation of EDV (8%) in comparison with MRI in the present study.
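The 8% figure quoted above is consistent with the mean EDV difference against MRI expressed as a fraction of the mean MRI EDV. The short sketch below makes that arithmetic explicit, assuming this is how the percentage was derived; the function name is hypothetical.

```python
def relative_underestimation(reference_mean, mean_difference):
    """Mean difference (reference minus QGS) as a percentage of the reference mean."""
    return 100.0 * mean_difference / reference_mean


# Values from the Results: mean MRI EDV 157 ml, MRI minus QGS 13 ml
pct_edv_mri = relative_underestimation(157.0, 13.0)
print(f"EDV underestimation vs MRI: {pct_edv_mri:.1f}%")  # prints 8.3%
```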

This study showed relatively good correlation coefficients between QGS and both reference methods in measuring LV volumes and EF. However, correlation coefficients are largely dependent on the range of parameters under investigation, and do not provide estimations of the accuracy of prediction by the methods to be evaluated [18]. More meaningful are the confidence intervals of the difference plots of two methods [18]. The results of the Bland-Altman analysis indicated considerable intervals of the differences, in the order of 100 ml for EDV and 30% for EF. Mechanisms for the variability around the systematic deviations may also be multifactorial and related to QGS as well as reference methods. One study of LVG measurements indicated that variability between two separate examinations for the same patient (inter-study variability) is fairly large (112 ml for EDV and 36% for EF, expressed as the confidence intervals between ±2SD) [36]. The inter-study variability of MRI for EDV and EF has been reported as 63 ml and 20% [37], and that of QGS (in sequential 99mTc-sestamibi imaging) as 28 ml and 11%, respectively [34, 38].

Some limitations must be considered when interpreting our results. Firstly, it remains an open question whether the values obtained with the reference methods (in particular with LVG) are more accurate than those obtained with QGS or vice versa. This limitation is due to the nature of the study, in which experimental conditions were not uniform and the accuracy of each measurement was dependent on the original investigations. To clarify the true accuracy of QGS, prospective studies may be helpful in which QGS and MRI are performed almost simultaneously, thereby precluding any marked changes in the physiological state of the studied subjects. Secondly, our results are based only on data obtained with 99mTc-labelled tracers and may not be applicable to other tracers. Indeed, our preliminary study of QGS with 201Tl indicated that it underestimates LV volumes more extensively than QGS with 99mTc, which is consistent with a previous report [34].

We conclude that QGS systematically underestimates LV volumes and EF (in patients gated with eight intervals) compared with LVG and MRI. Since it also shows considerable variations around the systematic deviations, there remains uncertainty over whether an individual value determined with QGS approximates the true LV volumes and EF.