Introduction

Recently, several randomized thrombectomy trials have used the Alberta Stroke Program Early CT Score (ASPECTS) [1] for patient selection [2]. Some national stroke guidelines have incorporated ASPECTS in their recommendations for selecting patients for endovascular therapy (EVT) [3]. When an imaging-based approach is used clinically to select patients for EVT, it should ideally demonstrate good inter-observer agreement. Unfortunately, the determination of early ischemic changes (EIC) and their translation into ASPECTS show considerable inter-rater variability [4, 5], which is influenced by rater experience. The studies reporting this variability tended to include a broad range of stroke severities and may not necessarily apply to the population eligible for EVT [4]. Computer-aided diagnosis is a promising tool to aid the evaluation of EIC identified on CT. The aim of this study was to evaluate the agreement of two CE-marked (Conformité Européenne) automated software solutions and of the consensus of two expert neuroradiologists (expert consensus, EC) with a reference standard defined by final ASPECTS on follow-up imaging in patients who had EVT and achieved prompt and complete reperfusion.

Methods

The study was approved by the local institutional review board and was prepared in accordance with the Guidelines for Reporting Reliability and Agreement Studies and the Standards for Reporting of Diagnostic Accuracy [6, 7].

Patient population

We retrospectively reviewed cases of patients with acute ischemic stroke who were treated with EVT between July 2013 and June 2017. Cases that met the following criteria were included: (1) occlusion of the internal carotid artery (ICA) and/or middle cerebral artery (MCA); (2) complete recanalization corresponding to a thrombolysis in cerebral infarction (TICI) score of 3 at the end of EVT; (3) time interval from CT to TICI 3 reperfusion < 100 min; and (4) availability of a baseline non-contrast CT (NCCT) of the head and a follow-up NCCT or MRI. Cases with significant motion artifacts were excluded from analysis. All patients had CT angiography (CTA) and CT perfusion (CTP) performed to confirm large vessel occlusion and perfusion deficit. Intravenous thrombolysis was initiated in eligible patients. Clinical and imaging data were retrospectively analyzed.

ASPECTS readings by raters

All NCCT scans were retrospectively assessed in a consensus reading by two expert neuroradiologists (O.J. and F.W.), without access to other imaging studies or to clinical information and without time constraints. Modification of the window and level of the image contrast was allowed as needed. Contrary to the original ASPECTS scoring system, which utilized only 2 brain sections [1], readers graded EIC in each of the 10 ASPECTS regions according to current methodology, which utilizes all images. EIC was defined as tissue hypoattenuation or loss of gray–white matter differentiation, as these changes have been associated with edema and irreversible injury. The 10 regions were divided into deep structures, including caudate (C), lentiform nucleus (L), internal capsule (IC), and insula (I), and superficial cortical areas, including M1–M6.

Automated ASPECTS

e-ASPECTS (Brainomix) is decision support software that analyzes thin NCCT slices (1.0 mm). According to information provided by the company, the software first resamples and standardizes the input DICOM data. Subsequently, early or non-acute signs of ischemia are identified voxel-wise using a machine learning classifier that has been trained on a large dataset (> 10,000 images) containing a wide range of clinical CTs from stroke patients and negative controls. A patient-specific segmentation of the ASPECTS volumes of interest (VOIs) is then generated, and the ASPECTS output score is calculated by classifying each VOI according to the results of the voxel-wise ischemia analysis. Because manufacturers regularly release new software versions that claim improved performance, analysis was performed with both e-ASPECTS v6.1 and v7.1.

RAPID ASPECTS (iSchemaView) can analyze different slice thicknesses; slice thicknesses of 2–3 mm are recommended as the preferred dataset. Therefore, the NCCT of each patient was processed fully automatically with this software using thick slices (2.5 mm). To allow a direct comparison with e-ASPECTS, a further analysis using 1.0-mm slices was performed. According to information provided by iSchemaView, RAPID ASPECTS defines the 10 ASPECTS VOIs in both hemispheres, measures their mean Hounsfield units (HU), and calculates percentage HU differences between corresponding VOIs. Based on these percentage HU differences, the VOIs are classified as ischemic or non-ischemic and the total ASPECTS is calculated. To avoid misinterpretation due to cerebrospinal fluid or old infarctions, dark voxels are excluded from the mean HU measurements.
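
The HU comparison described above can be illustrated with a minimal Python sketch. It is not the vendor's implementation: both cut-off values, the function names, and the assumption that the affected hemisphere is already known are hypothetical, since the source reports none of these details.

```python
from statistics import mean

# The ten ASPECTS regions named in this paper.
REGIONS = ("C", "L", "IC", "I", "M1", "M2", "M3", "M4", "M5", "M6")

# Hypothetical cut-offs: the source does not report the values used by the vendor.
DARK_HU_CUTOFF = 10.0      # assumed: voxels darker than this are ignored
PERCENT_DIFF_CUTOFF = 5.0  # assumed: relative HU drop that marks a VOI as ischemic


def mean_hu(voxels, dark_cutoff=DARK_HU_CUTOFF):
    """Mean HU of one VOI after dropping dark voxels (CSF, old infarcts)."""
    kept = [v for v in voxels if v >= dark_cutoff]
    if not kept:
        raise ValueError("no voxels left after dark-voxel exclusion")
    return mean(kept)


def percent_hu_difference(affected_voxels, mirror_voxels):
    """Percentage by which the affected VOI is darker than its mirror VOI."""
    affected = mean_hu(affected_voxels)
    mirror = mean_hu(mirror_voxels)
    return 100.0 * (mirror - affected) / mirror


def score_hemisphere(affected, mirror, cutoff=PERCENT_DIFF_CUTOFF):
    """Return (ASPECTS, ischemic regions) from per-region voxel lists."""
    ischemic = [
        region
        for region in REGIONS
        if percent_hu_difference(affected[region], mirror[region]) >= cutoff
    ]
    # ASPECTS starts at 10 and loses one point per affected region.
    return 10 - len(ischemic), ischemic
```

In this sketch a single very dark voxel inside an otherwise normal VOI is dropped before averaging, so it cannot by itself push the region over the cut-off.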

Both software tools are CE-marked.

Imaging protocol, CTP post processing, final ASPECTS

Imaging protocol

All CT scans were performed using a 64-slice CT scanner equipped with a 40-mm-wide detector (Philips Healthcare). NCCT scans of the head were performed in helical mode (0.65 mm thickness, 120 kV, 250 mA), and axial non-contrast CT images were reconstructed in 1.0-mm and 2.5-mm slice thickness applying the brain standard kernel with a fourth-generation iterative reconstruction algorithm (iDose level 2/filter UB). After cerebral NCCT, CTP was conducted with the toggling table technique, allowing an extended coverage of the brain of 80 mm. The imaging parameters for CTP were 80 kVp, 150 mAs, 32 × 1.25 mm detector collimation, and a scan duration of 60 s. A scan delay of 3 s was applied after injecting 60 mL (flow rate 5 mL/s) of iodinated contrast agent (350 mg I/mL Imeron 350, Bracco Imaging).

CTP image post processing and analysis

Perfusion analysis was performed with the RAPID™ software (iSchemaView).

Final ASPECTS

Final ASPECTS was the ASPECTS rated by consensus of two expert readers based on follow-up CT or MRI scans, which were obtained between days 1 and 8. If both CT and MRI were performed, MRI data were used to assess the final ASPECTS.

Statistical analysis

Attribute agreement analysis (numerically equivalent to accuracy) for each single ASPECTS region was used to assess the agreement of the software packages and EC with final ASPECTS as reference [8]. The agreement for total ASPECTS among multiple appraisers was measured using the weighted Kappa statistic.
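
As an illustration of the weighted Kappa statistic for total ASPECTS (integer scores 0 to 10), the following Python sketch computes Cohen's weighted kappa for two score lists. The source does not state whether linear or quadratic weights were used, so the weighting scheme is exposed as a parameter and the linear default is an assumption; the function name is hypothetical.

```python
def weighted_kappa(rater_a, rater_b, n_categories=11, weights="linear"):
    """Cohen's weighted kappa for integer scores 0 .. n_categories - 1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty score lists")
    if weights not in ("linear", "quadratic"):
        raise ValueError("weights must be 'linear' or 'quadratic'")
    power = 1 if weights == "linear" else 2
    n = len(rater_a)
    k = n_categories

    # observed[i][j]: share of cases scored i by rater A and j by rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def penalty(i, j):
        # Disagreement weight: 0 on the diagonal, 1 for the largest gap.
        return (abs(i - j) / (k - 1)) ** power

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed_penalty = sum(penalty(i, j) * observed[i][j] for i, j in cells)
    expected_penalty = sum(penalty(i, j) * row[i] * col[j] for i, j in cells)
    if expected_penalty == 0:
        raise ValueError("kappa is undefined when both raters give one identical score")
    return 1.0 - observed_penalty / expected_penalty
```

Because the penalty grows with the distance between two scores, a rating of 8 against a reference of 9 lowers kappa less than a rating of 4 against the same reference, which is the reason a weighted rather than an unweighted kappa suits an ordinal scale such as ASPECTS.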

Additional assessments were made for each ASPECTS region to define the sensitivity, specificity, and accuracy of each software package and EC. The overall sensitivity, specificity, and accuracy with 95% confidence intervals in the cortical (M1–M6) and deep areas (IC, I, L, C) were evaluated for the software packages and EC. For the software packages, sensitivity, specificity, and accuracy were generated from the default software outputs provided by the manufacturer.
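
The region-based metrics above can be sketched in Python as follows. The source does not say how the 95% confidence intervals were computed, so the Wilson score interval used here is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def region_metrics(predicted, reference):
    """Sensitivity, specificity, accuracy as (estimate, (low, high)) tuples.

    predicted / reference: equally long sequences of booleans, one entry per
    region reading (True = region scored as ischemic).
    """
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p and r)
    tn = sum(1 for p, r in pairs if not p and not r)
    fp = sum(1 for p, r in pairs if p and not r)
    fn = sum(1 for p, r in pairs if not p and r)

    def estimate(hits, total):
        interval = wilson_interval(hits, total)  # raises if total is 0
        return hits / total, interval

    return {
        "sensitivity": estimate(tp, tp + fn),
        "specificity": estimate(tn, tn + fp),
        "accuracy": estimate(tp + tn, len(pairs)),
    }
```

Pooling all cortical (M1–M6) or all deep (IC, I, L, C) readings into one pair of lists before calling `region_metrics` would give overall values of the kind described for those two groups.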

For e-ASPECTS, the manufacturer reports that the default operating point was set by selecting the most specific operating point that was non-inferior in both sensitivity and specificity to an expert human scorer. For RAPID ASPECTS, the default operating point in this analysis was the slider bar in the central position.

Score-based (total) receiver operating characteristic (ROC) curve analysis was performed using e-ASPECTS which allows generation of outputs using different operating points for sensitivity and specificity. RAPID does not output ROC curves; however, it allows adjusting its confidence level with a slider in the web interface. For this ROC analysis, the ground truth is positive for each ASPECTS point below 10, and negative for each ASPECTS point above 0, irrespective of which region is affected. As an example: if the physician scored 7, and the ground truth was 8, then this would result in 2 true positives, 1 false positive, and 7 true negatives. If the physician scored 8, and the ground truth was 7, this would result in 2 true positives, 1 false negative, and 7 true negatives. Weighted Kappa values were also calculated between each ASPECT score and the final ASPECTS. Continuous data with normal distribution were reported as mean ± standard deviation; ordinal or non-normal data were reported as median and interquartile range (IQR). Categorical data were reported as proportions.
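
The score-based counting rule in the worked example above can be written as a short Python sketch. The formulas are our reading of that example (each of the 10 points is a positive or a negative, regardless of region), and the function names are hypothetical.

```python
def score_based_counts(rated, truth):
    """Confusion counts for one patient from total ASPECTS only.

    A score of s is read as (10 - s) positive points and s negative points,
    irrespective of which regions are affected.
    """
    if not (0 <= rated <= 10 and 0 <= truth <= 10):
        raise ValueError("ASPECTS must lie between 0 and 10")
    rated_pos = 10 - rated
    truth_pos = 10 - truth
    tp = min(rated_pos, truth_pos)
    fp = max(0, rated_pos - truth_pos)
    fn = max(0, truth_pos - rated_pos)
    tn = 10 - tp - fp - fn
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def roc_point(rated_scores, truth_scores):
    """Pool counts over patients; return (false positive rate, sensitivity)."""
    totals = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for rated, truth in zip(rated_scores, truth_scores):
        for key, value in score_based_counts(rated, truth).items():
            totals[key] += value
    sensitivity = totals["tp"] / (totals["tp"] + totals["fn"])
    false_positive_rate = totals["fp"] / (totals["fp"] + totals["tn"])
    return false_positive_rate, sensitivity


# The two cases from the text: rated 7 vs. truth 8, and rated 8 vs. truth 7.
print(score_based_counts(7, 8))  # {'tp': 2, 'fp': 1, 'fn': 0, 'tn': 7}
print(score_based_counts(8, 7))  # {'tp': 2, 'fp': 0, 'fn': 1, 'tn': 7}
```

Each operating point of a software package yields one such pooled point; repeating the calculation over several operating points traces a score-based ROC curve.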

Statistical tests used to determine the significance of differences in variables are listed in the data tables and within the text where relevant. Statistical significance was set at p < 0.05. Statistical analysis was performed using XLSTAT Version 2018.2.

Results

Demographic characteristics, clinical outcome, procedural characteristics and descriptive imaging findings, and ASPECTS assessment of the study patients

Retrospective analysis of our database identified 52 eligible patients. Patient demographics, clinical and imaging outcomes, procedural characteristics, and the ASPECTS assessments are summarized in Table 1.

Table 1 Baseline demographic, clinical, and procedural characteristics and imaging findings of patients with ICA and/or MCA occlusion who achieved TICI 3 recanalization

Sixty-two percent of infarcted regions at follow-up were deep structures (Fig. 2) with the most common being the lentiform nucleus (19% of all infarcted regions, occurring in 60% of patients). Of the cortical regions involved, the most common region was the M2 region (10% of infarcted regions in 31% of patients).

Agreement of total and region-based ASPECTS

There was moderate agreement of EC, RAPID ASPECTS, and e-ASPECTS with the ground truth (Table 2). Attribute agreement analysis (accuracy) with final ASPECTS as reference showed no significant differences in percentage agreement for EC, RAPID ASPECTS, and e-ASPECTS v7: 77% (95% CI 73–80), 74% (95% CI 70–78), and 72% (95% CI 68–76), respectively (Table 2). The ASPECTS distribution within the cohort was skewed toward higher scores (smaller infarcts), with infarcts located predominantly in the deep regions, and without significant differences in the median ASPECTS between automated and EC ratings (Figs. 1 and 2).

Table 2 Agreement assessment with final ASPECTS as reference on follow-up imaging

Fig. 1 Distribution of total ASPECTS results

Fig. 2 Frequencies of early ischemic changes (EIC) on follow-up imaging and on initial imaging by expert consensus (EC) and software packages, divided into deep structures (C, IC, L, I) and cortical regions (M1–M6)

Superficial cortical areas were marked as positive more often by e-ASPECTS (v6: 69/312) than by EC (26/312) or RAPID ASPECTS (31/312), whereas deep structures were identified as positive less often by e-ASPECTS (v6: 20/208) than by EC (52/208) or RAPID ASPECTS (60/208). The final ASPECTS detected 101 ischemic lesions in deep structures and 63 ischemic lesions in cortical areas, p < 0.0001 (chi-square statistic). Figure 2 shows total frequencies of EIC in deep and superficial cortical regions.

Sensitivity, specificity, and accuracy are shown for each region in Table 3. The overall sensitivity, specificity, and accuracy with 95% confidence intervals in the cortical (M1–M6) and deep areas (IC, I, L, C) are given in Table 4. Within the cortical regions, EC and e-ASPECTS tended to be more sensitive, but less specific, than RAPID ASPECTS. Within deep regions, RAPID ASPECTS was more sensitive, but less specific.

Table 3 Specificity, sensitivity, and accuracy for each ASPECTS region
Table 4 Sensitivity, specificity, and accuracy across all cortical and all deep regions

ROC curve analysis

Both EC and RAPID ASPECTS lie close to or overlap the e-ASPECTS ROC curves, with no significant difference between the software packages and EC (Fig. 3). Areas under the ROC curves with confidence intervals and weighted Kappa values are presented in Table 2. There was no significant difference between any of the ROC curves. Using a bootstrapping technique to determine statistical significance, the p value between e-ASPECTS version 7 and RAPID ASPECTS 2.5 mm was 0.813.
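
The source does not describe the bootstrapping procedure. One common form, shown here purely as an illustrative Python sketch, resamples patients with replacement and compares a summary statistic of two methods on each resample. The function name, the number of resamples, the fixed seed, and the two-sided percentile-style p value are all assumptions.

```python
import random


def paired_bootstrap_p(stat_a, stat_b, cases, n_boot=2000, seed=0):
    """Two-sided bootstrap p value for the difference stat_a - stat_b.

    cases: list of per-patient records.
    stat_a, stat_b: functions mapping a list of cases to a number
    (for example, an area under the ROC curve for each method).
    """
    if not cases:
        raise ValueError("cases must not be empty")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(cases)
    diffs = []
    for _ in range(n_boot):
        # Resample whole patients so both methods see the same cases.
        sample = [cases[rng.randrange(n)] for _ in range(n)]
        diffs.append(stat_a(sample) - stat_b(sample))
    at_or_below_zero = sum(1 for d in diffs if d <= 0) / n_boot
    at_or_above_zero = sum(1 for d in diffs if d >= 0) / n_boot
    return min(1.0, 2.0 * min(at_or_below_zero, at_or_above_zero))
```

Resampling patients rather than individual readings keeps the pairing between the two methods intact, which matters because both methods were applied to the same scans. The statistic functions must tolerate resamples in which some patients appear several times and others not at all.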

Fig. 3 Score-based (total) receiver operating characteristic (ROC) curve using multiple operating points (black: e-ASPECTS v6; orange: e-ASPECTS v7; dark blue: RAPID ASPECTS 1.0 mm; light blue: RAPID ASPECTS 2.5 mm)

Discussion

This is the first study to evaluate the total and region-based agreement of two different automated ASPECTS tools using final ASPECTS as the ground truth, based on follow-up imaging in patients who were promptly and successfully treated with EVT. Prior studies that evaluated e-ASPECTS performance demonstrated that, on average, this software was equivalent to expert neuroradiologists [9, 10]. Our findings support previous studies that demonstrated moderate agreement for total ASPECTS between automated and human scorers [4].

A contributing factor to the substantial agreement between all methods for quantifying ASPECTS is the distribution of scores within this cohort, which clusters around 8. This restricted distribution (i.e., smaller variance) leaves less opportunity for discrepancies. Despite this, within the regional analysis of deep and cortical structures, we showed different sensitivities and specificities between humans and automated software packages. We attribute these differences to the different operating points used by the software packages and EC.

It is notable that the ROC curve results are lower in this study than those reported in previous studies of e-ASPECTS in different cohorts [10]. A likely explanation is the particular prevalence of damage in the ASPECTS regions within this study’s cohort. The most common finding in this cohort was infarction of the deep brain structures, in particular the lentiform nucleus, which was infarcted in 60% of patients at follow-up. RAPID ASPECTS and EC had higher sensitivity, but poorer specificity, for detection of EIC in these deep structures. In contrast, e-ASPECTS showed lower sensitivity, but higher specificity. Areas of notable disagreement included the internal capsule and the insula. The internal capsule is known to be inconsistently scored among expert raters and has been the subject of low agreement in other studies [11, 12]. Further, the frequency with which the internal capsule was scored varied across methodologies, from e-ASPECTS and EC, which scored involvement in 0 and 4% of cases, respectively, to RAPID ASPECTS, which scored involvement in 21% of cases.

The inverse pattern of sensitivity was observed in the cortical regions, with e-ASPECTS scoring with greater sensitivity than EC and RAPID ASPECTS. Variability in cortical scoring may be due to the challenges of consistently defining anatomical borders for the superficial cortical regions (M1–M6). Despite this, identification of cortical infarction is important, as these regions have been shown to have the greatest clinical eloquence, contribute a greater proportion of the ASPECTS, and are therefore most likely to influence decision-making [13, 14]. It is possible that some discrepancy arose because hypodensities could be allocated to different cortical areas by different software programs or expert readers, resulting in discrepant region-based analysis [15]. This may underlie the increasing popularity of volume measurements (such as an MRI- or CTP-defined ischemic core, or the e-ASPECTS CT infarct volume feature), although this approach may result in a loss of the sensitivity to functional eloquence that region-based scores can provide.

Differences in slice thickness result in different signal-to-noise ratios, which affect the results, as was seen with the RAPID ASPECTS algorithm. In future studies, the influence of scan parameters should be explicitly investigated, including the reconstruction algorithm as well as the slice thickness used for analysis. Previous work has demonstrated that machine algorithms tend to be less affected by reconstruction parameters than human raters [16]. The lower attribute agreement for the older version of e-ASPECTS demonstrates the improvements that are being made in machine learning, and the importance of using a consistent software version for analysis.

ASPECTS and e-ASPECTS have been shown to be predictors of patient outcome following EVT [2, 17]. For automated software solutions to be clinically useful, not only is accuracy critical but also the presentation and traceability of the results. Automated ASPECTS tools are designed as decision support rather than standalone diagnostics. Their utility relies not only on their absolute score, but also on the ability of the clinician user to see the results and adjust the default score according to their clinical interpretation. Both RAPID ASPECTS and e-ASPECTS visually present results in all slices, and both software solutions analyze the whole MCA territory of the brain. e-ASPECTS generates a heatmap that marks the portion of each region that shows EIC, allowing the clinician to visually inspect and review the results on a voxel basis. RAPID ASPECTS marks the whole region, even if only part of it is affected. e-ASPECTS provides volumes of acute and non-acute ischemic areas, whereas RAPID ASPECTS provides Hounsfield unit values for each region.

The study has some limitations. Only patients who were EVT candidates and had prompt and complete reperfusion were included, which limited the number of eligible patients and biased the cohort toward patients with favorable ASPECTS ratings (patients with unfavorable scores are not treated with EVT). Therefore, our results may not apply to patients with lower ASPECTS, which is critical when assessing the utility of the packages for patient selection for EVT. Furthermore, due to both the hyperacute nature of the imaging cohort and the time delay of up to 100 min between baseline imaging and reperfusion, it is likely that some additional evolution of the infarct occurred, which may account for the relative insensitivity of all methods tested.

Conclusions

Good agreement with the ground truth was obtained for total ASPECTS for both software packages and EC. The packages differed with respect to regional contributions, without any significant difference in performance and without any implications for clinical decision-making in this cohort. Fully automated ASPECTS scoring is not designed to be used clinically as a standalone tool, and both products in this study are intended for use as decision support tools. Multidisciplinary neuroradiologic and neurologic expertise will always be required, and other examination results, such as CTA, CTP, and clinical parameters, must also be considered, but automated tools may facilitate decision-making. Automated ASPECTS tools have the potential to improve standardization and inter-rater agreement in both research and clinical practice, especially when images are being read by less-experienced readers.