Introduction

In developed countries, cardiovascular events are currently the leading cause of death, ahead of malignant neoplasms [1]. In the US, coronary artery disease (CAD) was responsible for more than half (52%) of all deaths in 2004 [2]. Identifying high-risk patients is essential for the prevention of secondary complications of CAD [3]. Coronary artery calcium scoring (CAC scoring) [4] using multidetector computed tomography (MDCT) can identify patients who have a high probability of a relevant stenosis [5]. For this purpose, MDCT has replaced electron beam tomography (EBT): the results of the two methods show good agreement [6], MDCT is less prone to artifacts from a high heart rate [7], small amounts of calcium can be detected with greater sensitivity [8], and it is available at most hospitals [3]. There are, however, some disadvantages. The scores are sensitive to changes in image reconstruction parameters [9] and to changes in the definition of the scoring parameters [10], and the radiation exposure is greater than with EBT [11]. Since CT coronary angiography (CTCA) is superior to CAC scoring for the detection of stenoses, but a score of >400 can indicate a stenosis that is not detected by CTCA, both methods are currently usually used together [12]. A previous attempt to perform CAC scoring and CT angiography from the same CT data acquisition [13] proved to be unreliable [14].

The objectives of this study are thus to determine whether a reliable calcium volume score can be assessed from the CTCA dataset, and if so, how well it correlates with the Agatston and V130 scores.

Patients and methods

Patients

The study initially included 113 consecutive patients who had been referred for CAC scoring and CTCA between June and December of 2007. The study was conducted in accordance with the guidelines of the Declaration of Helsinki and had no influence on treatment. All patients signed a written informed consent form before the examinations. The cohort consisted of 28 women (mean age 61.9 ± 11.9 years; age range 41.8–81.8 years) and 85 men (mean age 59.9 ± 10.9 years; age range 33.4–85.1 years), with no significant difference in age between the sexes (p = 0.21). Criteria for exclusion were contraindication to the administration of contrast medium and status post coronary bypass or stent (n = 11). Patients whose heart rate exceeded 75 bpm were administered up to 10 mg metoprolol intravenously before the examination. If the heart rate remained > 75 bpm, the examination was not carried out.

Imaging

All examinations were conducted on the same 64-slice MDCT system (Lightspeed VCT XT, General Electric, Milwaukee, USA). The acquisition of the native sequence for CAC scoring was prospectively triggered by ECG in the 75% phase. In the axial mode, detector coverage of 20 mm with a slice thickness of 0.625 mm was used to minimize the cone beam artifact. To achieve a standard slice thickness of 2.5 mm, four of the 0.625 mm slices were combined, so that a full rotation yielded 8 slices with a slice thickness of 2.5 mm. The display field of view (DFOV) was 25 × 25 cm, the matrix was 512 × 512 pixels, with a pixel size of 0.625 × 0.625 mm. The tube current was 200 mA and the tube voltage was 120 kV. The CTCA was retrospectively triggered in spiral mode. For detector coverage of 40 mm, collimation was set to 64 × 0.625 mm. Pitch was between 0.16 and 0.20 depending on heart rate. The DFOV was 18 × 18 cm and the matrix was 512 × 512 pixels with a pixel size of 0.625 × 0.625 mm. For image reconstruction, a standard convolution kernel was selected. The tube voltage was set at 120 kV. The tube current was variable due to the use of ECG-controlled tube current modulation. In each case, 80 ml iodixanol 320 (Visipaque, GE Healthcare, Cork, Ireland) was administered at a flow rate of 6 ml/s. For the CAC scoring from the CTCA, the datasets at 75% of the RR interval were used.

Calcium scoring and image analysis

The CAC score was determined according to the method described by Agatston et al. [4], and the volumetric calcium score (V130) according to the method described by Callister et al. [15], each with a threshold of 130 Hounsfield units (HU). The data were analyzed using a workstation from General Electric (AW 4.4, GE, Milwaukee, USA). The “Smartscore” program was used for CAC scoring. The values of the marginal and diagonal branches were included in the analysis and added to the values for the left main, left anterior descending, circumflex, and right coronary artery (LMA, LAD, LCX, RCA). Coronary artery calcifications were defined as more than two adjacent pixels with attenuation values of more than 130 HU, and were marked manually. In addition to the Agatston score, the volume score was computed by automatic multiplication of the CAC area in axial slices by slice thickness. All measurements were carried out independently of each other by two different observers (O1: BH; O2: CS), and repeated at a different time. Vessel analyses of the coronary arteries and their branches were conducted on all patients using the CardIQ Xpress PRO program.
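As a rough illustration of how the two conventional scores described above are formed from a single axial slice, the following Python sketch applies the 130 HU threshold, groups pixels into lesions, and sums area times density weight (Agatston score) and area times slice thickness (volume score). It is not the Smartscore implementation: the 4-connectivity rule, the use of >= at the threshold, the standard Agatston weights (1 to 4 for a lesion peak of 130 to 199, 200 to 299, 300 to 399, and 400 HU or more), and all function names are assumptions made for this example.

```python
# Sketch only: Agatston and volume score for one axial slice (pure Python).
# Assumptions: 4-connected lesions, ">=" at the threshold, standard Agatston
# density weights, hypothetical helper names.

def density_weight(peak_hu):
    """Agatston weighting factor from the peak attenuation of a lesion."""
    if peak_hu >= 400:
        return 4
    if peak_hu >= 300:
        return 3
    if peak_hu >= 200:
        return 2
    return 1  # 130 to 199 HU


def lesions(slice_hu, threshold=130):
    """Yield lists of (row, col) pixels forming 4-connected regions >= threshold."""
    rows, cols = len(slice_hu), len(slice_hu[0])
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen or slice_hu[r][c] < threshold:
                continue
            stack, region = [(r, c)], []
            seen.add((r, c))
            while stack:  # flood fill from the seed pixel
                y, x = stack.pop()
                region.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and slice_hu[ny][nx] >= threshold):
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            yield region


def score_slice(slice_hu, pixel_area_mm2, slice_thickness_mm, min_pixels=3):
    """Return (agatston, volume_mm3) for one slice."""
    agatston = 0.0
    volume = 0.0
    for region in lesions(slice_hu):
        if len(region) < min_pixels:  # lesion needs more than two adjacent pixels
            continue
        area = len(region) * pixel_area_mm2
        peak = max(slice_hu[y][x] for y, x in region)
        agatston += area * density_weight(peak)
        volume += area * slice_thickness_mm
    return agatston, volume
```

Summing the per-slice results over all slices and vessels would give the patient totals; the single isolated pixel in a slice is ignored by the lesion-size rule.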

Calcium scoring by means of CTCA

To determine the CAC from the CTCA (Fig. 1a), first a threshold of 160 HU was introduced arbitrarily for the dataset of the 75% phase in the volume rendering (VR) program of the workstation. Using the “keep object” function, the arteries were kept, while the segments of vertebrae, ribs, and sternum included in the images were deleted. The remaining calcium-dense structures in the periphery were removed in a 3D VR view of the heart using the “cut” tool with infinite depth. Then those calcifications that were assigned to other structures such as the aortic valve, as well as catheter (n = 1) or pacemaker electrodes (n = 1) in the axial 0.625 mm slices were removed using the cut tool. After checking whether there were still any dense structures aside from the coronary arteries, an arbitrarily selected threshold of 600 HU was introduced and the remaining calcium volume automatically calculated (Fig. 1b).
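The final step of this procedure, applying the 600 HU threshold to the already segmented dataset and converting the voxel count to a volume, can be sketched in Python as follows. The sketch assumes that all non-coronary structures have already been removed (for example, set to a low HU value) by the manual steps described above; the function name, the nested-list data layout, and the use of >= at the threshold are assumptions for illustration, not the workstation's implementation.

```python
# Sketch only: V600 volume from a pre-segmented CTCA volume (pure Python).
# volume_hu is a nested list [slice][row][col] of HU values in which
# non-coronary structures are assumed to have been deleted beforehand.

def v600_volume_ml(volume_hu, voxel_size_mm, threshold_hu=600):
    """Return the volume in ml of all voxels at or above the threshold."""
    dx, dy, dz = voxel_size_mm
    voxel_mm3 = dx * dy * dz
    n_voxels = sum(
        1
        for axial_slice in volume_hu
        for row in axial_slice
        for hu in row
        if hu >= threshold_hu
    )
    return n_voxels * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3
```

Because each voxel is either counted or not, no connectivity rule or density weighting enters the calculation.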

Fig. 1

a and b: 3D Volume Rendering Reconstruction of the CTCA of the calcified coronary arteries of a 66-year-old man (a) and segmentation of the coronary artery calcifications using a threshold of 600 HU (V600 score). LMA = left main coronary artery; LAD = left anterior descending coronary artery; LCX = left circumflex artery; RCA = right coronary artery

Dose

The effective dose of the CAC and CTCA exams was determined from the dose length product using a weighting factor [16, 17].
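The calculation is a single multiplication, sketched below in Python. The study does not state the value of the weighting factor here, so it is left as a required argument; the value 0.014 mSv/(mGy·cm) used in the usage note is only an illustrative chest coefficient and the DLP value is hypothetical.

```python
# Sketch only: effective dose from the dose length product (DLP).
# k is the region-specific conversion (weighting) factor in mSv/(mGy*cm);
# its value is NOT taken from the study and must be supplied by the caller.

def effective_dose_msv(dlp_mgy_cm, k_msv_per_mgy_cm):
    """Effective dose (mSv) = DLP (mGy*cm) x conversion factor."""
    if dlp_mgy_cm < 0 or k_msv_per_mgy_cm <= 0:
        raise ValueError("DLP must be >= 0 and k must be > 0")
    return dlp_mgy_cm * k_msv_per_mgy_cm
```

For example, `effective_dose_msv(1000, 0.014)` corresponds to a hypothetical DLP of 1000 mGy·cm and an assumed chest coefficient.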

Statistical analysis

Descriptive analyses such as calculating the mean value and standard deviations were made using the Excel program (Microsoft, Seattle, Washington, USA).

Distribution analyses were made using the D’Agostino-Pearson test [18]. The remaining statistical analysis was conducted in several steps.

To estimate the magnitude of the differences between the first and second reading of each observer, and between the first and second observer, the mean and standard deviation of these differences (arbitrarily first minus second reading or observer, respectively) were first determined for all three scores in a simple, intuitive approach and then expressed as a percentage of the mean of each score.

Scattergrams were then made in which the two readings of the two observers were plotted against the averages of those readings to estimate visually the extent of the intra- and interobserver variability. As was already known for the V130 score and the Agatston score, these plots showed that the points for the V600 score also lay on a nearly perfect line of identity passing through the origin with a slope of 1, which indicates good intra- and interobserver reliability. For a “more objective” quantification of these relations, an intraclass correlation coefficient (ICC) was determined for each one as proposed by Shrout and Fleiss for this purpose [19, 20]. Since the test for significance is irrelevant to the question of agreement [21], it is not reported here. To ensure that the readings of the same observer and of the other observer were consistent, intra- and interobserver repeatability was estimated using a method similar to the “limits of agreement” method of Bland and Altman, known as the “repeatability coefficient” [22]. For this, in the case of intraobserver repeatability, the difference between reading 1 and reading 2 was plotted on the y-axis against the average of reading 1 and reading 2. The repeatability limit is defined as “the value less than or equal to which the absolute difference between two test results obtained under repeatability conditions may be expected to be with a probability of 95%” [23, 24]. Repeatability itself depends only on the distribution of random reading errors, which is why the magnitude of the reading error can be determined from the standard deviation of these readings [23]. Thus, the standard deviation of the differences between the pairs of repeated measurements was calculated, and the repeatability coefficient was taken to be the difference that was exceeded by only 5% of the pairs of readings [22, 25].
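The two agreement measures described above can be sketched in Python as follows. The text does not say which form of the Shrout and Fleiss coefficient was used, so the two-way random, single-measure form ICC(2,1) is an assumption here, as is taking the repeatability coefficient to be 1.96 times the standard deviation of the paired differences. Function names and the example readings are hypothetical.

```python
# Sketch only: repeatability coefficient and ICC(2,1), standard library only.
from statistics import mean, stdev


def repeatability_coefficient(reading1, reading2):
    """1.96 x SD of the paired differences (assumed definition)."""
    diffs = [a - b for a, b in zip(reading1, reading2)]
    return 1.96 * stdev(diffs)


def icc_2_1(table):
    """ICC(2,1) of Shrout and Fleiss for a subjects x raters table."""
    n, k = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(k)]
    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)              # between-subjects mean square
    jms = ss_cols / (k - 1)              # between-raters mean square
    ems = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
```

In this form a constant offset between two readings lowers the ICC even when the readings are perfectly correlated, which is the behaviour wanted when agreement rather than correlation is the question.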

To estimate the magnitude of the scores obtained, the mean values and standard deviations were determined from the readings of both observers. Since the means and standard deviations of the readings of Observer 2 were somewhat larger than those of Observer 1, the values of the two observers were compared by means of a Wilcoxon matched pairs test. The lower and upper 95% confidence limits were compared with each other in a two-tailed test, with significance assumed for p < 0.05.

Then, to estimate the relations between the individual scores, scattergrams were made in which the V130 or Agatston scores of the two observers were plotted against the corresponding V600 score, as recommended by Bland and Altman [23, 26]. After thus demonstrating that the V130 and V600 scores, and the Agatston and V600 scores, agree closely, the “regression approach for nonuniform differences” recommended by Bland and Altman was used [26], because the magnitude of the differences between the pairs of readings, but not their variation (SD), tended to increase with the magnitude of the absolute values. For this, the differences (δ) of the pairs of readings of the various scores (V130 and V600 as well as Agatston score and V600) were regressed on their average (α), in the form

$$\delta = {\text{x}} - {\text{y}}\;\alpha $$
(1)

[26]. In order to determine the variation of the values with respect to the best agreement line, the scatter of the residuals (ρ) of the regression model (1) was modelled as a function of the size of the measurement, given by α. For ρ a normal distribution was assumed, with a mean of 0 and a variance of \(\sigma^{2}\) [26]. Since the mean of the half-normally distributed absolute values is \(\sigma \sqrt{2/\pi}\), the standard deviation of ρ could be obtained by multiplying the fitted values by \(\sqrt{\pi/2}\) [26]. The 95% interval of agreement was then determined from the combination of both steps [27]. It can be expressed as

$$\delta \pm 1.96\sqrt{\pi/2}\;\rho = \delta \pm 2.46\,\rho $$
(2)

[25]. In a further step, the readings from the V130 method or the Agatston method were regressed on the readings of the new V600 method to obtain a model with which a reading obtained using the new V600 method could be used to predict the reading that would have been obtained with the old method [22]. For this, a 95% prediction interval was determined, and the acceptability of the new method was decided on the basis of its width, as described by Bland and Altman [22].
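The regression approach for nonuniform differences outlined above can be sketched in Python in two stages: an ordinary least-squares regression of the differences on the averages, followed by a regression of the absolute residuals on the averages, whose fitted values are scaled by 1.96·√(π/2) ≈ 2.46. This is a minimal sketch, not the GraphPad or SPSS procedure used in the study; the function names are hypothetical, and the coefficients are written as intercept plus slope times α, which may differ in sign convention from model (1).

```python
# Sketch only: Bland-Altman limits of agreement with nonuniform differences.
import math

# Multiplier applied to the fitted absolute residuals: 1.96 * sqrt(pi / 2)
K_95 = 1.96 * math.sqrt(math.pi / 2)


def ols(x, y):
    """Least-squares (intercept, slope) of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def regression_limits_of_agreement(old, new):
    """Return ((b0, b1), (c0, c1), limits) for paired readings old/new."""
    avg = [(a + b) / 2 for a, b in zip(old, new)]
    diff = [a - b for a, b in zip(old, new)]
    b0, b1 = ols(avg, diff)                      # stage 1: differences on average
    abs_res = [abs(d - (b0 + b1 * a)) for a, d in zip(avg, diff)]
    c0, c1 = ols(avg, abs_res)                   # stage 2: |residuals| on average

    def limits(a):
        """Lower and upper 95% limits of agreement at average a."""
        centre = b0 + b1 * a
        half_width = K_95 * (c0 + c1 * a)
        return centre - half_width, centre + half_width

    return (b0, b1), (c0, c1), limits
```

Calling `limits(a)` over the observed range of averages would trace the two boundary lines of the interval of agreement around the regression line of the differences.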

In a final step, receiver operating characteristic (ROC) analyses were conducted. Sensitivities and specificities were determined for a series of arbitrarily selected thresholds of the V600 CT angiography score with respect to the Agatston score values “0” and “400”, as well as for thresholds resulting from the ROC analyses of the V600 scores that correspond to the five categories of the Agatston score suggested by Rumberger et al. [5].
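How a single point of such an analysis is obtained can be sketched in Python as follows: each candidate V600 threshold is treated as a test, and the Agatston category (here, a score of 400 or more) as the reference. The function names and the choice of >= at both cutoffs are assumptions for this example, and the values in the usage note are invented.

```python
# Sketch only: sensitivity and specificity of a V600 cutoff against an
# Agatston reference category (pure Python, hypothetical names).

def sensitivity_specificity(v600, agatston, v600_cutoff, agatston_cutoff=400):
    """Return (sensitivity, specificity) for one V600 cutoff."""
    tp = fp = tn = fn = 0
    for v, a in zip(v600, agatston):
        reference_positive = a >= agatston_cutoff
        test_positive = v >= v600_cutoff
        if reference_positive and test_positive:
            tp += 1
        elif reference_positive:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


def roc_points(v600, agatston, cutoffs, agatston_cutoff=400):
    """Return a list of (cutoff, sensitivity, specificity) tuples."""
    return [
        (c, *sensitivity_specificity(v600, agatston, c, agatston_cutoff))
        for c in cutoffs
    ]
```

With invented data such as `v600 = [0.0, 0.01, 0.05, 0.2, 0.3]` (ml) and `agatston = [0, 50, 450, 800, 390]`, a cutoff of 0.04 ml classifies both patients with an Agatston score of at least 400 as positive and misclassifies one of the three remaining patients.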

Simple group comparisons were made using t-tests where appropriate and indicated. All values are given as mean value ± standard deviation; p < 0.05 was considered statistically significant. The programs GraphPad Prism version 5.00 (GraphPad Software, San Diego, California, USA) and SPSS 15 (SPSS Inc., Chicago, Illinois, USA) were used.

Results

Dose

The effective dose of the total examination, consisting of CAC, CTCA, and Scout, was 20.7 mSv (range: 7.9–56.1 mSv). The effective dose of CAC was 1.5 mSv (range: 0.8–4.4 mSv), whereas that of CTCA was 19.1 mSv (range: 7.1–17.3 mSv).

Contrast enhancement in the left ventricle

The mean attenuation in the left ventricle at the time of the CT angiography was 375.6 ± 104.2 HU, with a range of 178–698 HU. In four patients, the threshold of 600 HU was partially exceeded in the left ventricle and the residual ventricular voxels had to be deleted manually.

Agatston, V130, and V600 score

The values for the Agatston score were 566.8 ± 817.5 (O1) and 582.8 ± 838.3 (O2), with ranges from 0 to 3628 and 0 to 3870. The values for the V130 score were 452.7 ± 631.2 mm³ (O1) and 460.4 ± 641.2 mm³ (O2), with ranges from 0 to 2781 mm³ and 0 to 2931 mm³. The values for the V600 score were 0.0748 ± 0.1172 ml (O1) and 0.0758 ± 0.1219 ml (O2), with ranges from 0 to 0.6060 ml and 0 to 0.6620 ml. There were no significant differences between the readings of each observer or between the two observers (p > 0.05, Wilcoxon matched pairs test).

Intra- and interobserver variability

The ICCs for the two readings of the Agatston, V130, and V600 scores were 0.9998, 0.9998, and 0.9985 for Observer 1, and 0.9997, 0.9997, and 0.9996 for Observer 2. For the comparison between the two observers these values were 0.9976, 0.9980, and 0.9982, respectively. The Spearman’s ρ values for the correlation between the two observers for the three scores were 0.9981, 0.9979, and 0.9784. Table 1 shows the 95% confidence intervals of the differences between the first and second reading for each observer, and those for the differences between the first and second observer. The order of magnitude of these differences is the same for all three scores. The scattergram comparison of the readings from series 1 and 2 showed, for both observers and all three scores, that the points lay on a nearly perfect line through the origin with a slope of 1, indicating nearly perfect intraobserver repeatability. The comparison of the average readings from the two series of Observer 1 and Observer 2 likewise showed all points on a nearly perfect line with a slope of 1 through the origin, indicating nearly perfect interobserver reliability. For Observer 1, the repeatability coefficients for the two series were 72.8 mm³ for the V130 score, 100.3 for the Agatston score, and 36.2 mm³ for the V600 score; for Observer 2 they were 92.9 mm³ for the V130 score, 117.6 for the Agatston score, and 18.5 mm³ for the V600 score. The repeatability coefficients between the mean values of the two series of Observer 1 and those of Observer 2 were 224.7 mm³ for the V130 score, 313.3 for the Agatston score, and 39.5 mm³ for the V600 score. Both intraobserver and interobserver repeatability were thus extremely good.

Table 1 95% confidence intervals of the differences between the first and second reading for each observer, and those for the differences between the first and second observer

Correlations between the scores

The scattergrams of the V130 and V600 scores, as well as of the Agatston and V600 scores, showed a close relation between the old and the new scores for both observers. These are linear relationships with a Spearman’s ρ of 0.9279 (O1) and 0.9283 (O2; V130/V600), and 0.9208 (O1) and 0.9283 (O2; Agatston score/V600).

Bland-Altman analyses

Figure 2a–d shows the regression lines for the differences of the various scores on their average, including the slope and y-axis intercept as well as the 95% interval of agreement and its width. For both observers, the V600 score tended to underestimate the coronary calcium as compared to the classic scores. Figure 3a–d shows the results of the regressions of the readings from the V130 method and the Agatston score method on the readings of the V600 method, again each with the slope and y-axis intercept. The dashed lines indicate the width of the 95% prediction interval in which 95% of the readings would have fallen if they had been made using the old V130 or Agatston method instead of the new V600 method. The prediction interval tends to widen towards the ends of the range, which is why the lines are convex towards the regression lines. Since only very few readings lie outside of this narrow interval, it is acceptable to replace the V130 and the Agatston score method with the V600 method.

Fig. 2

a–d: Regression lines for the differences of the various scores on their average, including the slope and y-axis intercept as well as the 95% interval of agreement. Observer 1, Agatston score minus V600 score (a); Observer 1, V130 score minus V600 score (b); Observer 2, Agatston score minus V600 score (c); Observer 2, V130 score minus V600 score (d). For both observers, the V600 score tended to underestimate the coronary calcium as compared to the classic scores

Fig. 3

a–d: Regressions of the readings from the V130 method and the Agatston score method on the readings of the V600 method. The dashed lines indicate the width of the 95% prediction interval in which 95% of the readings would have fallen if they had been made using the old V130 or Agatston method instead of the new V600 method. Observer 1, Agatston score versus V600 score (a); Observer 1, V130 score versus V600 score (b); Observer 2, Agatston score versus V600 score (c); Observer 2, V130 score versus V600 score (d). Since only very few readings lie outside of the narrow prediction intervals, it is acceptable to replace the V130 and the Agatston score method with the V600 method

Categorization of the V600 score

Table 2 shows sensitivities and specificities for a series of arbitrarily selected thresholds from the V600 CT angiography scores for the Agatston score values “0” and “400”.

Table 2 Sensitivities and specificities for a series of arbitrarily selected thresholds from the V600 CT angiography scores for the Agatston score values “0” and “≥400”

Discussion

This study shows that a calcium volume score can be obtained from CTCAs that correlates strongly with the Agatston and V130 scores. The intra- and interobserver variability is extremely small. The Bland-Altman analyses show an underestimation of the coronary calcium load by the new score. However, the 95% prediction intervals of the regressions of the readings from the V130 method or the Agatston method on the readings of the new V600 method indicate that the Agatston and V130 scores could be replaced by the V600 score. There are, nevertheless, disadvantages and potential limitations to be considered.

The V600 score cannot be determined semi-automatically to the same extent as the Agatston score [10]. When the ventricle opacifies to a density of more than +600 Hounsfield units, it has to be painstakingly deleted manually. If opacification in the coronary arteries exceeds this value, it could make measuring the V600 score impossible, which did not occur in this study.

The lower limit of +600 HU was selected because arterial opacification is lower than this in most cases, so that the entire contrast-enhanced heart can be eliminated by the simple threshold step. By comparison, Agatston et al. selected a threshold of 130 HU to minimize noise effects [4]. The Agatston score is determined by the technical specification of a minimum density of 130 HU and the arbitrary weighting of densities above that. It correlates well with the actual coronary atherosclerotic plaque area, but underestimates its extent by about 80% [28]; it correlates only moderately well with the calcium score determined by histology, and not, for example, with the lipid content or the extent of inflammation [29]. The V130 score appears to be inferior to the Agatston score due to slightly lower reproducibility [30]. However, the relevance of this difference is doubtful in view of the far greater difference between the actual extent of arteriosclerosis and the CAC score [29].

A previous approach to combining CAC scoring with CTCA, which proved to be unreliable [14] after initially encouraging results [13], pursued the same objective as this study, but used Agatston and calcium mass scores rather than a volume score. Contrast medium uptake was cited as the reason for the increase of the score in the CTCA compared with conventional CAC scoring [14]. In our view, the cause was simply that the selected threshold of +350 HU was too low and did not eliminate contrast medium adjacent to calcium plaques in every case. We chose the approach of using a volume score with a lower threshold of +600 HU because of its simplicity. Neither calibration steps nor calcium density weightings are required. The volume of the hard plaque burden is more clearly underestimated than in the conventional scores, but the main condition necessary for risk stratification (a linear relationship with the conventional score and its accurate predictability) is maintained throughout the entire value spectrum. Thus, the V600 method is anticipated to be easily applicable in other vascular territories using standard imaging protocols as well.

The V600 method has the potential to replace the classic scores in those cases in which CAC scoring and CTCA were previously carried out in two separate CT acquisitions. This would reduce the total dose applied by only 7%, but the saving should still not be underestimated, because one of the main disadvantages of cardiac CT, even with modern 64-slice multidetector CT equipment, is the high patient exposure to radiation. It is higher than for catheter coronary angiography [11] and is probably associated with a relevant risk of inducing tumors, especially in young patients, even with protocols using lower tube currents than those used in our series. Newer devices with greater detector coverage can image the entire heart in a single RR cycle, thus halving the effective dose of a CTCA [31]. However, this advantage can be realized only if a further CT data acquisition is not required for the CAC scoring.
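As a plausibility check of the 7% figure, assuming that the saving corresponds to omitting the separate native CAC acquisition and using the mean effective doses reported in the Results (1.5 mSv for CAC, 20.7 mSv for the total examination):

$$\frac{1.5\ \text{mSv}}{20.7\ \text{mSv}} \approx 0.072 \approx 7\% $$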

The V600 method is hardly susceptible to errors resulting from the CT data acquisition starting position thanks to the use of 0.625 mm collimated slices [32]. It is not susceptible to variations stemming from different settings for two-dimensional connectivity or lesion size threshold [10], because such settings play no role in the simple approach presented here: based on its density, a pixel is either selected for the V600 score or not. Interslice interpolation, which theoretically plays a significant role in conventional scoring methods because it verifiably reduces intra- and interobserver variability [10], is negligible for V600 scores, since the CTCA protocols used today all include collimation of 0.625 mm and such low pitches that a maximum overlap of slices is acquired. It was thus anticipated that the intra- and interobserver variability would be as low as for the Agatston score, namely close to zero. A further favorable argument is that the variability of CAC scoring methods is much lower for detecting hard calcifications than for detecting soft calcifications [33]. It is precisely these hard portions of calcified plaque that are selected by the V600 score.

In conclusion, it must be pointed out that a few issues remain open. Since the threshold of 600 HU was chosen more or less arbitrarily, just like the thresholds of the classic scores [4], further research is necessary to clarify whether changing the threshold could further improve the already excellent correlation with the Agatston score. Furthermore, more research is required to determine, both for the V600 score and for the other CAC scores, to what extent changing the calcium scoring parameters could lead to better correlations with the actual extent of plaque in the coronary arteries, in order to get closer to the objective of optimizing risk stratification and lowering the exposure to radiation. Whether risk stratification will be just as good using the V600 method as with the Agatston method also remains to be seen.