Introduction

In 1979 the Leaman score was developed to quantitate the severity of coronary artery disease (CAD) by assigning different weighting factors to coronary segments based on the theoretical average coronary blood flow to the left ventricle [1]. In contemporary practice, coronary computed tomography angiography (CCTA) is increasingly used to assess plaque characteristics, vessel remodelling and the progression and regression of atheroma. Substantial data demonstrate a strong association between the overall amount of coronary plaque seen on CCTA and coronary heart disease events, and such information may offer stronger prognostic value than merely the presence or absence of clinical risk factors and anatomical stenoses [2,3,4]. CCTA can distinguish the different components of plaque, which makes it very appealing for longitudinal trials of lipid-modifying medications, in which these components may change differentially over time, and in patients with different comorbidities such as diabetes. Notably, interventions may have differential effects on these diverse components [5, 6]. The emerging importance of this method has motivated applications beyond the coronary tree. The VICTORION PLAQUE study (NCT05360446) is an ongoing CCTA study assessing the effect of inclisiran in addition to maximally tolerated statin therapy on atherosclerotic plaque progression, plaque composition and atheroma volume in participants with a diagnosis of non-obstructive CAD without previous cardiovascular events.

The CCTA-adapted Leaman score was developed as a tool to noninvasively quantify total coronary atherosclerotic burden. A CT Leaman score (CT-LeSc) > 5 is an independent long-term predictor of hard cardiac events and can be used as a prognostic tool [7, 8]. However, its calculation relies on the subjective assessment of lesions using CCTA and is therefore subject to intra- and inter-observer variability, which may ultimately affect its reproducibility. A poorly reproducible score will limit its clinical application and will make guideline recommendations based on specific scores of limited value. Therefore, this study aimed to assess, for the first time, the inter- and intra-observer reproducibility of the CT-LeSc.

Methods

Study population

The study population comprised 50 CCTAs, which were randomly selected from the SYNTAX III REVOLUTION and ABSORB trials [9, 10]. Cardiac MSCT angiography was performed using 64-slice or higher computed tomography technology. Standard acquisition techniques were used, which included the use of nitrates prior to image acquisition and beta-blockers in patients with a fast heart rate (> 65 bpm), tube settings depending on patient body mass index (80–140 kV), and axial scan protocols for patients with lower heart rates to reduce radiation doses, all at the discretion of the individual sites. Images were reconstructed using thin slices (0.5–0.67 mm) and medium smooth reconstruction filters in different phases. The CT-LeSc was calculated using semi-automated plaque analysis software (QAngioCT Research Edition Version 2.0.5; Medis Medical Imaging Systems, Leiden, the Netherlands). The default window width and level set for the assessment of plaque (see below) were 750 and 250 HU, respectively. If needed, the image display setting could be adjusted on an individual basis, in general to a window width between 600 and 900 HU and a level between 40 and 250 HU.

Leaman score calculation

The CT-LeSc was calculated as previously described [11]. In brief, the score for each coronary segment was the product of three values based on (1) plaque location [1]; (2) plaque type (noncalcified, calcified, or mixed plaques), with a weighting factor of 1.5 for predominantly noncalcified or mixed plaques and a factor of 1 for predominantly calcified plaques; and (3) degree of stenosis, with a weighting factor of 1 for the presence of obstructive CAD (visual stenosis ≥ 50%) and 0.615 for non-obstructive CAD. The patient’s total CT-LeSc was the sum of the CT-LeSc for each coronary segment > 1.5 mm in diameter. Three case examples are shown in Fig. 1.

Fig. 1

Three case examples of patients with coronary artery disease stratified by different coronary atherosclerotic burden scores. A A patient with right dominance and two non-obstructive lesions. The left main lesion extends to the proximal LAD. The left main and the proximal LAD segments scored separately; B a patient with right dominance and four focal lesions (three non-obstructive and one obstructive). Two focal lesions in the proximal RCA are separated by a distance > 3 times the reference lumen diameter and scored separately; C a patient with left dominance and five lesions (three non-obstructive and two obstructive). CT-LeSc CT-Leaman score; CAD coronary artery disease; LM left main; LAD left anterior descending; LCX left circumflex; RCA right coronary artery; NCP non-calcified plaque; CP calcified plaque; MP mixed plaque

When a coronary segment had more than one plaque, the plaques were only scored separately when they were more than three times the vessel reference diameter apart (Fig. 1C).
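The scoring rule described above can be sketched in Python as follows. This is a minimal illustration, not the procedure used in the study, where lesions were assessed by analysts in QAngioCT. The names `ScoredPlaque`, `plaque_score` and `ct_lesc` are hypothetical, and the location weights in the usage example are placeholders supplied by the caller, because the segment weighting table from [1] is not reproduced in this article. Only the plaque-type factors (1.5 and 1) and the stenosis factors (1 and 0.615) are taken from the text.

```python
from dataclasses import dataclass

# Weighting factors quoted in the Methods text.
PLAQUE_TYPE_FACTOR = {"noncalcified": 1.5, "mixed": 1.5, "calcified": 1.0}
OBSTRUCTIVE_FACTOR = 1.0        # visual stenosis >= 50%
NON_OBSTRUCTIVE_FACTOR = 0.615  # visual stenosis < 50%


@dataclass
class ScoredPlaque:
    """One separately scored plaque (hypothetical container for this sketch)."""
    segment: str
    location_weight: float   # segment weighting factor from [1]; caller-supplied
    plaque_type: str         # "noncalcified", "mixed" or "calcified"
    stenosis_percent: float  # visually estimated stenosis


def plaque_score(plaque: ScoredPlaque) -> float:
    """Product of the location, plaque-type and stenosis factors."""
    if plaque.stenosis_percent >= 50:
        stenosis_factor = OBSTRUCTIVE_FACTOR
    else:
        stenosis_factor = NON_OBSTRUCTIVE_FACTOR
    type_factor = PLAQUE_TYPE_FACTOR[plaque.plaque_type]
    return plaque.location_weight * type_factor * stenosis_factor


def ct_lesc(plaques) -> float:
    """Patient-level score: sum over all separately scored plaques."""
    return sum(plaque_score(p) for p in plaques)


# Usage with placeholder location weights (assumed values, for illustration only).
example = [
    ScoredPlaque("left main", 5.0, "noncalcified", 30),
    ScoredPlaque("proximal LAD", 3.5, "calcified", 60),
]
print(round(ct_lesc(example), 4))
```

In this sketch a non-obstructive, non-calcified plaque contributes 1.5 × 0.615 times its location weight, so plaque type and stenosis severity can partly offset each other within one segment.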

Inter-observer reproducibility

To assess inter-observer reproducibility, the CT-LeSc was calculated in 50 CCTAs by two separate teams, each made up of two cardiologists who reported results by consensus; that is, both observers in a team evaluated each CCTA and agreed on the CT-LeSc. This was done to improve the accuracy of interpretation and to address uncertainty in classifying plaque type as non-calcified, calcified, or mixed.

Intra-observer reproducibility

To assess intra-observer reproducibility, the same CCTAs were analysed 8 weeks later by the same team of two cardiologists, with the analysts blinded to the results of the first analysis.

In total, the scores were calculated three times: teams 1 and 2 initially performed the inter-observer analysis and, eight weeks later, team 1 performed the intra-observer analysis.

Statistical analysis

Continuous variables are expressed as mean ± standard deviation (SD). Intra- and inter-observer variability were determined for eight variables using weighted kappa statistics, which reflect the agreement between two or more observations and use weights to quantify the relative difference between categories. The strength of agreement was characterised by kappa values of < 0, 0–0.20, 0.21–0.40, 0.41–0.60, 0.61–0.80, and 0.81–1.00 as none, slight, fair, moderate, substantial, and almost perfect, respectively. The eight variables were the total number of (1) lesions; (2) calcified lesions; (3) non-calcified lesions; (4) mixed lesions; (5) obstructive lesions; and (6) non-obstructive lesions; and the total CT-LeSc in increments of (7) ten and (8) five. The reproducibility of the CT-LeSc was evaluated by calculating the intra- and inter-observer variability, which was defined as the difference between the corresponding measurements expressed as a percent of their mean. Bland–Altman plots for the CT-LeSc were used to describe the agreement between teams and between rounds of analysis (Figs. 2 and 3). A 2-tailed P value of < 0.05 was used to indicate statistical significance. All statistical analyses were performed with R Version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria).
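As an illustration of the weighted kappa statistic, the following Python sketch computes it for two sets of ordinal ratings and maps the result to the agreement bands listed above. It is not the study's analysis, which was performed in R. The function names are hypothetical, the scores are synthetic, and linear disagreement weights are an assumption, since the weighting scheme used in the study is not stated (`power=2` gives quadratic weights).

```python
def weighted_kappa(ratings_a, ratings_b, n_categories, power=1):
    """Cohen's weighted kappa for ordinal ratings coded 0..n_categories-1.

    power=1 uses linear disagreement weights, power=2 quadratic weights.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("at least two categories are required")
    n, k = len(ratings_a), n_categories
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            weight = (abs(i - j) / (k - 1)) ** power
            observed_disagreement += weight * observed[i][j] / n
            expected_disagreement += weight * row_totals[i] * col_totals[j] / (n * n)
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return 1.0 - observed_disagreement / expected_disagreement


def strength_of_agreement(kappa):
    """Agreement bands as listed in the statistical analysis section."""
    if kappa < 0:
        return "none"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


def score_category(score, increment=5):
    """Bin a CT-LeSc into categories of the given increment (0-<5, 5-<10, ...)."""
    return int(score // increment)


# Synthetic scores from two hypothetical readers, binned in increments of five.
reader_1 = [score_category(s) for s in [4.6, 8.1, 12.3, 17.0, 22.5, 9.9]]
reader_2 = [score_category(s) for s in [4.9, 8.1, 13.0, 14.8, 22.5, 10.2]]
kappa = weighted_kappa(reader_1, reader_2, n_categories=5)
print(round(kappa, 2), strength_of_agreement(kappa))
```

With weights of this kind, two readings that fall in adjacent categories are penalised less than readings several categories apart, which is why a weighted statistic suits ordinal variables such as lesion counts and binned scores.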

Fig. 2

Bland–Altman analysis for the CT Leaman score between two teams

Fig. 3

Bland–Altman analysis for the CT Leaman score between two rounds
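The two continuous agreement measures described in the statistical analysis, percent variability and the Bland–Altman bias with 95% limits of agreement, can be illustrated with a short Python sketch. This is not the study's R code. The function names are hypothetical, the scores are synthetic, and the limits are assumed to follow the conventional definition of mean difference ± 1.96 × SD of the paired differences.

```python
from statistics import mean, stdev


def percent_variability(first, second):
    """Absolute difference between two measurements as a percent of their mean."""
    return abs(first - second) / ((first + second) / 2) * 100


def bland_altman(first, second):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the paired differences (first - second)
    limits = bias +/- 1.96 * sample SD of the differences
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need at least two paired measurements")
    differences = [a - b for a, b in zip(first, second)]
    bias = mean(differences)
    sd = stdev(differences)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Synthetic CT-LeSc values from two hypothetical readings of four scans.
reading_1 = [10, 12, 15, 20]
reading_2 = [11, 11, 16, 18]
bias, lower, upper = bland_altman(reading_1, reading_2)
print(f"bias={bias:.2f}, limits of agreement={lower:.2f} to {upper:.2f}")
print(f"variability of first pair={percent_variability(10, 11):.1f}%")
```

Under this definition the bias equals the difference between the two mean scores and lies midway between the two limits of agreement, while the width of the limits reflects the scatter of the paired differences.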

Results

Corelab inter-observer reproducibility assessment

Table 1 and supplementary Tables 1–7 show the results recorded by the two teams for the eight variables, whilst Table 3 summarises the corresponding kappa values, which describe the degree of agreement between both measurements beyond the level of chance. The kappa values ranged from 0.62 to 0.93, with the lowest seen for the total number of lesions (weighted kappa 0.62) and the highest for the Leaman score in increments of five (weighted kappa 0.93).

Table 1 Leaman score according to increments of five recorded by two teams

Corelab intra-observer reproducibility assessment

Table 2 and supplementary Tables 8–14 show the results recorded for the eight variables during the two rounds of the study conducted eight weeks apart, whilst Table 3 summarises the corresponding kappa values. The kappa values ranged from 0.65 to 0.93, with the lowest seen for the total number of lesions and calcified lesions (weighted kappa 0.65 for both) and the highest for the Leaman score in increments of five (weighted kappa 0.93).

Table 2 Leaman score according to increments of five recorded during both rounds of study
Table 3 Weighted kappa values for inter- and intra-observer reproducibility

CT-Leaman score

The Bland–Altman plot for the CT-LeSc calculated for each patient by the two teams demonstrates that the majority of values lie within the limits of agreement; however, two data points lie outside them (Fig. 2).

The mean ± SD CT-LeSc for the two teams was 15.36 ± 5.57 and 15.24 ± 5.16, respectively. The Bland–Altman analysis showed a bias of 1.17 (a measure of precision) with 95% limits of agreement of −2.862 to 3.107.

The Bland–Altman plot for the CT-LeSc for each patient during both rounds of the study shows that almost all values lie within the limits of agreement (Fig. 3). The mean ± SD CT-LeSc during rounds one and two was 16.61 ± 5.28 and 16.82 ± 5.55, respectively. The Bland–Altman analysis showed a bias of 1.28 (a measure of precision) with 95% limits of agreement of −3.404 to 2.989.

Discussion

The main finding from this study is that the CT-LeSc had substantial to near-perfect inter- and intra-observer reproducibility when assessed by the consensus of two cardiologists (Table 4).

Table 4 Prior studies assessing the reproducibility of the components of Leaman score

The rationale for evaluating the CT-LeSc reproducibility

The development of the CT-LeSc was driven by the need for a global and comprehensive scoring system to standardise prognostic assessment with CCTA, as has been achieved with invasive coronary angiography [1, 7, 11]. Whilst its primary role was to non-invasively quantify total coronary atherosclerotic burden [11], a potential role in prognostic assessment emerged because it is fundamentally based on the Leaman score, which, whilst originally developed to quantitate the severity of CAD [1], evolved to provide prognostic information. This is a consequence of its derivation from a combination of weighting factors related to average coronary blood flow to the left ventricle and the degree of coronary stenosis [1].

In the first study assessing the prognostic value of the CT-LeSc, Gonçalves et al. reported that just over a fifth of patients with non-obstructive CAD had CT-LeSc scores in the highest tertile (> 8.3), suggesting it can convey prognostic information [11]. Following this, Mushtaq et al. demonstrated its long-term prognostic value by reporting similar event-free outcomes in patients with a high CT-LeSc irrespective of the presence of obstructive or non-obstructive CAD, and confirmed it was an independent long-term predictor of hard cardiac events (cardiac death and non-fatal acute coronary syndrome) with a hazard ratio of 5.39 (95% confidence interval: 3.49 to 8.33) [7]. They also showed that the CT-LeSc improved risk classification by 38% compared with the segment stenosis score and by 19% compared with the segment involvement score (SIS) [7]. In the multi-center CONFIRM study, CT-LeSc was the only multivariate predictor of myocardial infarction and all-cause mortality [8]. Furthermore, it was an independent predictor of major acute cardiac events and improved risk stratification in patients with non-obstructive CAD, with myocardial infarction event-free survival rates in patients with a CT-LeSc ≤ 5 and > 5 of 98% and 95%, respectively (p = 0.01) [8].

In a low-risk trans-catheter aortic valve replacement (TAVR) population with CAD assessed by CCTA, the mean CT-LeSc was 6.27 ± 0.27, with 60.4% of patients having a CT-LeSc > 5 and most plaques located in proximal and mid-coronary segments [12]. Therefore, low-risk surgical patients with severe aortic stenosis who undergo TAVR may still have a significant CAD burden, reinforcing the view that future TAVR devices should guarantee unimpeded access to the coronaries for angiography and interventions.

Leaman score reproducibility

Our study demonstrates that the overall reproducibility of the CT-LeSc was substantial to near perfect, with a kappa value of 0.85 for inter- and intra-observer reproducibility. Notably, despite its proven role in risk stratification, previous studies have only assessed the reproducibility of some of its individual components, and not of the total score [7, 8]. The total number of lesions and the total number of calcified lesions had the poorest inter- and intra-observer reproducibility, although this was still categorised as substantial, as the lowest weighted kappa was 0.62. These relatively low kappa values were probably due to the subjective nature of tandem lesion assessment in a single segment, as tandem lesions were only considered discrete when separated by a distance more than three times the normal reference vessel diameter. Furthermore, in three cases in which the diameter of the vessel was between 1.5 and 2.0 mm, a few plaques were overlooked. The slightly lower kappa value for the total number of calcified lesions was presumably influenced by the total number of lesions and by the subjective nature of classifying plaques as calcified or mixed, which depended on whether > 50% of the plaque area was calcified.

Prior studies have shown good inter- and intra-observer agreement for the detection of coronary stenoses on CCTA [13,14,15]. In a sub-study of the Scottish Computed Tomography of the Heart (SCOT-HEART) trial (Table 4), excellent intra-observer agreement and good inter-observer agreement were seen when the CCTA result was classified on a per-patient basis as obstructive, non-obstructive, or no CAD present [3, 16]. A small CCTA study involving 20 patients showed moderate inter- and intra-observer agreement for plaque classification (Table 4) [17]. In a reproducibility study assessing the quantitative composition of coronary artery plaque using CCTA, inter-observer correlations were excellent for calcified plaque and non-calcified plaque volume (r = 0.93–0.97, P < 0.001) (Table 4) [18].

The coronary artery disease reporting and data system (CAD-RADS) was created to standardize reporting for patients undergoing CCTA and to guide possible next steps in patient management. The inter-observer reproducibility of CCTA-based CAD-RADS assessment categories and modifiers has been reported as excellent [19]. A recently updated CAD-RADS classification follows an established framework of stenosis, plaque burden, and modifiers, which also includes an assessment of lesion-specific ischemia. Notably, although plaque burden was absent from CAD-RADS 1.0 and was only added in CAD-RADS 2.0, the consensus document still does not recommend any specific method to quantify plaque burden, simply suggesting that operators select the technique most appropriate to their institution [20]. The various methods for reporting total coronary plaque burden include coronary artery calcium (CAC) testing, SIS, visual estimation, and quantitative assessment. Studies have demonstrated that quantitative measurements of coronary arterial stenoses and plaque composition are highly reproducible when high-quality CCTA datasets are analyzed using dedicated software for plaque composition analysis [21, 22]. While there are numerous important emerging techniques for performing a quantitative and reproducible assessment of total plaque burden and plaque type beyond visual assessment alone, these techniques are not widely available and are not routinely performed as part of clinical interpretation [23]. In addition, most techniques are time- and labour-intensive, which inhibits their incorporation into routine clinical interpretation.

Early risk stratification of coronary atherosclerosis using CCTA may help inform prevention strategies. Post-mortem and intra-vascular ultrasound (IVUS) studies have shown that most acute coronary events coincide with a large and often ruptured atherosclerotic plaque in the coronary arteries [24].

The SCOT-HEART [25] and PROMISE (Prospective Multicenter Imaging Study for Evaluation of Chest Pain) [26] studies suggest that CCTA-detected coronary atherosclerosis, irrespective of the degree of stenosis, is an important driver of clinical events in symptomatic populations. The CONFIRM registry reiterated that CCTA has a leading role in the mid-term prediction of all-cause mortality among > 23,000 individuals without known CAD [27,28,29], with non-obstructive and obstructive CAD diagnosed by CCTA associated with higher rates of MI and mortality. The SCAPIS (Swedish CArdioPulmonary bioImage Study) study included 25,182 randomly invited individuals (50.6% women) aged 50 to 64 years without known coronary heart disease to study the prevalence of subclinical coronary artery atherosclerosis in the general population, and found CCTA-detected atherosclerosis in 42.1% of participants, with a significant stenosis (≥ 50%) seen in 5.2% [30].

In a recent large-scale study among patients undergoing IVUS assessment of the left main coronary artery (LMCA) in whom revascularization was not intended, IVUS-defined LMCA plaque burden was independently associated with all-cause and cardiac mortality. Every 10% increase in plaque burden was associated with a 12% increase in the 12-year hazard for mortality in a population where the LMCA angiographic diameter stenosis was only 12.4% ± 10.8% and the IVUS minimal lumen area was 13.1 ± 5.0 mm² (well above the threshold for hemodynamic significance in most patients).

The Leiden score, a new tool that is very similar to the CT-LeSc on which it is based, has been internally and externally validated using the 5-year follow-up dataset of the CONFIRM registry. The new score showed a strong and independent predictive value for events at long-term follow-up. Furthermore, it provided better discrimination and reclassification of events compared with the CAD-RADS score, which is based on stenosis severity only [31]. Since the score was very similar to the CT-LeSc, questions were raised about whether it would add any further discrimination compared with the pre-existing CT-LeSc [32].

After detecting subclinical atherosclerosis by CCTA, it is important to appreciate the recommended threshold to institute aggressive pharmacotherapy. In the 2019 European Society of Cardiology/European Atherosclerosis Society Guidelines for the Management of Dyslipidaemias, people with atherosclerotic cardiovascular disease documented either clinically or unequivocally by imaging were deemed to be at very high risk, as these findings are known to be predictive of clinical events. The LDL-C goal for these patients is < 55 mg/dL (< 1.4 mmol/L). It is important to note that these guidelines do not mention patients with non-obstructive CAD and a CT-LeSc > 5, despite these patients having a risk of atherosclerotic events similar to that of patients with obstructive CAD.

Clinical implications of the CT-LeSc

The CT-LeSc is highly reproducible and can be used as a clinical tool for risk stratification in patients with CAD. Since the score is semi-quantitative, it could be used in pharmacological prevention trials to assess eligibility, with a CT-LeSc > 5 used as the threshold at which there is enough plaque to intervene, and/or as an imaging endpoint. The VICTORION PLAQUE (V-PLAQUE) study (NCT05360446) is an ongoing primary prevention CCTA study in which the CT-LeSc is being used as an entry criterion along with FFRCT.

Limitations

Although the kappa value is currently the accepted standard measure of inter- and intra-observer reproducibility, it is not without limitations. By definition, it represents agreement beyond the level of chance; however, the actual level of chance agreement is variable and affected by the prevalence of the disease being studied. Overall, the kappa values obtained in this study should only be considered as a guide, and may not reflect the reproducibility of the CT-LeSc in a different patient population with a different prevalence of CAD. Additional limitations include the experience of the investigators in this study, who have each individually assessed the CT-LeSc in over a hundred CCTAs. A repeat assessment of the same CCTAs by investigators less familiar with the definitions and less experienced in calculating the score may well provide different results.

Conclusions

The CT-LeSc is a robust method to quantify total coronary atherosclerotic plaque burden and can be used for risk stratification. As demonstrated here, it has substantial to near-perfect intra- and inter-observer reproducibility when assessed by the consensus of a team of two cardiologists.