Introduction

Hemorrhagic transformation (HT) represents the most severe and dreaded complication of acute ischemic stroke (AIS) and is an independent predictor of unfavorable outcome. Revascularization procedures increase the rate of HT: symptomatic HT was reported in 4.4% of patients in the pooled analysis of 5 endovascular thrombectomy (EVT) trials [1] and in up to 6–7% of treated patients in both extended time window EVT trials [2, 3]. However, other factors might also increase the risk of HT, such as sustained postprocedural hypertension and higher blood pressure variability within 24 h after EVT [4].

Follow-up brain imaging is usually performed within 24 h, or earlier in cases of neurological worsening, to detect HT, guide the antithrombotic treatment regimen (including peri-procedural treatment), and evaluate final infarct volume [5].

In recent years, the possibility of detecting peri-interventional intracranial hemorrhagic complications directly in the angiographic suite, using cone-beam computed tomography (CB-CT), has gained increasing attention [6]. However, most studies using CB-CT immediately after EVT have applied only a qualitative visual assessment [7, 8] to detect HT, resulting in variable predictive performance. The diagnostic challenge remains to differentiate abnormal post-EVT contrast staining (mimicking subarachnoid, intraventricular, or intracerebral hemorrhage) from actual intracranial hematomas [6,7,8,9,10]. Thus, the early detection of hemorrhagic outcomes using CB-CT immediately after EVT could improve the clinical care management of AIS patients. To date, only one study has used an objective approach based on maximum-Hounsfield Unit (HU) measurements on CB-CT hyperdense lesions to predict 24-h HT [10]. This study showed promising results with an area under the curve (AUC), sensitivity, and specificity of 0.805, 96%, and 84%, respectively [10].

The application of machine learning (ML) has been shown to improve the diagnostic accuracy of imaging in many fields, including AIS [11,12,13]. We hypothesize that the automated extraction of quantitative data from predefined regions of CB-CT immediately after EVT could help predict the 24-h HT risk in AIS patients and may complement visual neuroimaging assessment in the acute setting of AIS.

Materials and methods

Patient population

Between 15 January and 30 April 2022, all consecutive patients admitted at our Comprehensive Tertiary Stroke Center (“Tor Vergata” University Hospital of Rome, Italy), with a diagnosis of anterior circulation AIS due to large vessel occlusion (LVO) confirmed at brain non-contrast-computed tomography (NCCT) and CT-angiography undergoing EVT, immediate post-procedural CB-CT and 24-h post-treatment NCCT, were prospectively enrolled in this study and retrospectively reviewed.

We collected relevant demographic data including age, race/ethnicity, and sex, as well as clinical data, i.e., site of occlusion, stroke severity, pre-treatment modified Rankin Scale, time delays, cardiovascular risk factors, and stroke etiology, to define the population. Neuroradiological features were also collected, including baseline Alberta Stroke Program Early CT Score (ASPECTS) on NCCT and Clot Burden Score (CBS) on CT-angiography. Final reperfusion status after EVT was graded according to the modified TICI (mTICI) score, in which mTICI 2b, 2c, or 3 was accepted as successful recanalization [14]. The pre-written study protocol retraced our institutional clinical care routine and was conducted following the ethical standards laid down in the 1964 Declaration of Helsinki and its later amendments. The local ethics committee approved prospective data collection on patients undergoing EVT (Registro Sperimentazioni, R.S. 25/18). Informed consent was obtained from all individual participants included in the study. Data are available from the corresponding author upon reasonable request.

Image acquisition

Brain NCCT and CT-angiography were performed using a GE LightSpeed Plus 256 multislice CT scanner with conscious sedation when clinically indicated. CB-CT was acquired using a monoplane neuro-angiography X-ray system (AlluraClarity FD20; Philips Healthcare, Best, the Netherlands). The sensor area of the detector measures 30 × 40 cm and consists of 2586 × 1904 pixels. The acquisition protocol used was the commercially available “Xper-CT HD” protocol using the manufacturer’s default settings. The acquisition protocol consists of a rotational trajectory over 200° while acquiring 620 projection images at 30 frames per second for 20.8 s. The X-ray tube voltage was set to 120 kV, the focal spot to 0.7 mm, and a copper filter of 0.4 mm was used. The associated CT Dose Index measures 45 mGy and the effective dose measures 1.6 mSv [15, 16]. Pre-processing steps of the projection images include offset correction, gain correction, scatter correction, and water beam-hardening correction.

Imaging evaluation

All images including immediate post-EVT CB-CT and 24-h NCCT were independently reviewed and graded by three raters (one radiology resident [F.P.], one senior neurointerventionalist [V.D.R.], and one senior neurologist [I.M.] with more than 5 years of experience) blinded to clinical data.

Images were analyzed on the same workstation with pre-set window levels. From both CB-CT and 24-h NCCT, four slices were selected at the following levels: (i) basal ganglia, (ii) insula, (iii) bodies of the lateral ventricles, and (iv) semioval centers height.

CB-CTs were read at a slice thickness of 3 mm, to match that of the 24-h NCCT. Each set of images was randomly and blindly analyzed in both a qualitative and quantitative fashion.

Image quality was assessed using a scoring system with a scale of 0 to 3, modeled on previously published techniques [17]: 0 — non-diagnostic or severely compromised, 1 — moderately compromised, 2 — mildly compromised, and 3 — excellent. The image quality of CB-CT was further compared with 24-h NCCT and graded as worse or equal by the readers in consensus.

On CB-CT, four different patterns of pathology were identified: (i) hypodense pattern suggesting ischemic area; (ii) highly hyper-dense pattern in the subarachnoid space suggesting subarachnoid hemorrhage (SAH); (iii) smoothly hyper-dense pattern suggesting “parenchymal contrast staining” (PCS) due to the blood–brain barrier defect; (iv) highly hyper-dense pattern in the brain parenchyma suggesting intraparenchymal HT. All the three readers recognized one or more pathological patterns, when present, for each of the four slices abovementioned, in both CB-CT and 24-h NCCT. The density in HU of the pathological area was measured using a region of interest (ROI) < 5 mm2. In case more than one pattern was recognized in the same slice, ROIs were placed in the parenchymal areas where the density appeared more affected (in terms of hyperdensity). The same measurements were performed also in the analogous anatomical structures of the unaffected hemisphere. The differential HU-ROIs value was assessed between affected and unaffected brain hemispheres for each selected slice.

At 24-h NCCT, the HT was defined as any hyperdense lesion in the context of the infarcted area and then classified according to the radiological European Cooperative Acute Stroke Study (ECASS) I definition [18] in hemorrhagic infarction (HI1 or 2) and parenchymal hematoma (PH1 or 2). Pure SAH (or mixed to contrast extravasation) was defined as subarachnoid hyperdense lesions caused by vessel rupture and was similarly assessed on 24-h NCCT [19]. Whenever it was not possible to disentangle the pathological pattern at 24-h NCCT, a further follow-up neuroimaging performed within 7 days after symptoms’ onset was used to define the pattern. As a surrogate measure of lesion volume, we computed the number of anatomic levels affected for each patient, with a 4-point ordinal scale according to the possibility of having from 1 to 4 slices affected.

Figure 1 reports an example of the sampling modality of the ROIs in CB-CT and 24-h NCCT slices.

Fig. 1

Top row: regions of interest (ROIs) placed on cone-beam computed tomography (CB-CT) slices over the most hyperdense area at four predefined anatomical levels: (A) basal ganglia, (B) insula, (C) bodies of the lateral ventricles, and (D) semioval centers. Bottom row: the corresponding ROIs on 24-h non-contrast-enhanced computed tomography (NCCT) at the same four levels (A′–D′). CB-CT image quality was scored 2 (mildly compromised), and the comparison with the image quality of 24-h NCCT was graded as equal

Statistical analysis

We used the Statistical Package for the Social Sciences 22 for Windows to determine median values, interquartile ranges, and percentages. We compared groups for categorical variables with the χ2 test with Yates’s correction or Fisher’s exact test, and for continuous variables with the Mann–Whitney U test. P values lower than 0.05 were considered significant. Interrater reliability was assessed using κ statistics.

In each ROI, we considered the average [Mean (HU)], standard deviation [STD (HU)], minimum [Min (HU)], and maximum [Max (HU)] HU values, for the ROIs of both the pathological and the unaffected hemisphere. We calculated the signal-to-noise ratio (SNR) and the contrast-to-noise ratio (CNR) as follows:

$$\mathrm{SNR}=\frac{{\mathrm{Mean}\left(\mathrm{HU}\right)}_{\mathrm{path}.}}{{\mathrm{STD}\left(\mathrm{HU}\right)}_{\mathrm{path}.}}$$
$$\mathrm{CNR}=\frac{{\mathrm{Mean}\left(\mathrm{HU}\right)}_{\mathrm{path}.}-{\mathrm{Mean}\left(\mathrm{HU}\right)}_{\mathrm{unaff}.}}{\sqrt{{\mathrm{STD}\left(\mathrm{HU}\right)}_{\mathrm{path}.}^{2}+{\mathrm{STD}\left(\mathrm{HU}\right)}_{\mathrm{unaff}.}^{2}}}$$
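As an illustration of the two ratios, the following minimal Python sketch computes them for one dyad of ROIs. The HU samples are hypothetical values chosen for the example, the helper names (`roi_stats`, `snr`, `cnr`) are ours, and the use of the population standard deviation is an assumption, since the text does not state which estimator was used.

```python
from math import sqrt
from statistics import mean, pstdev


def roi_stats(hu_values):
    """Summarize the HU samples inside one ROI (mean, SD, min, max)."""
    return {
        "mean": mean(hu_values),
        "std": pstdev(hu_values),  # assumption: population SD; sample SD is equally plausible
        "min": min(hu_values),
        "max": max(hu_values),
    }


def snr(path):
    """Signal-to-noise ratio of the pathological ROI."""
    return path["mean"] / path["std"]


def cnr(path, unaff):
    """Contrast-to-noise ratio between pathological and unaffected ROIs."""
    return (path["mean"] - unaff["mean"]) / sqrt(path["std"] ** 2 + unaff["std"] ** 2)


# Hypothetical HU samples, not taken from the study data.
pathological = roi_stats([62, 66, 70, 74, 78])
unaffected = roi_stats([30, 32, 34, 36, 38])

print(f"SNR = {snr(pathological):.2f}, CNR = {cnr(pathological, unaffected):.2f}")
```

The differential HU-ROI value described above is simply `pathological["mean"] - unaffected["mean"]`, i.e., the numerator of the CNR.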

Machine learning analysis

We tested whether mean (HU), SNR, CNR, max (HU), min (HU), and the number of anatomic levels affected could jointly be used to discriminate, at 24-h follow-up, between two outcomes: hemorrhagic complications (HC: HT and SAH) and non-hemorrhagic findings (non-HC: ischemia, PCS, or “unremarkable” brain parenchyma). Because the sample size (172 dyads of ROIs) was too small to allow for a train-test split of the dataset, we adopted leave-one-out validation, a special case of cross-validation that provides nearly unbiased estimates of accuracy on unseen data. With leave-one-out cross-validation, one observation is excluded from ML training and used for validation, and this is repeated for every sample in the dataset. We tested several algorithms for ML classification: support vector machines, quadratic classifiers, decision trees, and artificial neural networks (with either one or two fully connected layers). All algorithms were implemented in Python with the scikit-learn package [20]. Leave-one-out validation was also needed to exclude classifiers that overperformed in the training set but underperformed in the validation set, which indicates overfitting and a lack of generalization. The ML training was performed by a decision tree method (one decision tree per leave-one-out iteration). We selected the following figures of merit for model performance: (i) accuracy, (ii) specificity (also called selectivity), (iii) precision (also called positive predictive value (PPV)), and (iv) sensitivity (also called recall or true positive rate (TPR)) (Fig. 2).
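The leave-one-out comparison of candidate classifiers can be sketched with scikit-learn as follows. This is an illustrative sketch only, not the study code: the data are synthetic stand-ins generated with NumPy, the hyperparameters are scikit-learn defaults, and neural networks are omitted for brevity. Comparing the score on the training data with the leave-one-out score exposes classifiers that overfit.

```python
import numpy as np
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.metrics import fbeta_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, NOT the study data: 40 ROI dyads, 6 features each.
rng = np.random.default_rng(0)
y = np.array([0] * 28 + [1] * 12)  # 1 = HC, 0 = non-HC
X = rng.normal(size=(40, 6)) + 1.5 * y[:, None]

candidates = {
    "svm": SVC(),
    "qda": QuadraticDiscriminantAnalysis(),
    "tree": DecisionTreeClassifier(random_state=0),
}

train_f2, loo_f2 = {}, {}
for name, model in candidates.items():
    # Each sample is predicted by a model fitted on the remaining 39 samples.
    y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())
    loo_f2[name] = fbeta_score(y, y_loo, beta=2)
    # Score on the training data itself, to expose overfitting.
    train_f2[name] = fbeta_score(y, model.fit(X, y).predict(X), beta=2)

for name in candidates:
    print(f"{name}: train F2 = {train_f2[name]:.2f}, leave-one-out F2 = {loo_f2[name]:.2f}")
```

An unconstrained decision tree fits its training data perfectly, so any gap between its two scores is a direct reading of how much it overfits.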

Fig. 2

Adapted from Ma et al. [24]

Figures of merit for model performances: (i) accuracy (ACC); (ii) specificity, also called selectivity, or true negative rate (TNR); (iii) precision, also called positive predictive value (PPV), and (iv) sensitivity, also called recall or true positive rate (TPR).

All four figures of merit are functions of the working threshold of the classifier, also called the “operating point.” The higher the threshold is set, the higher the specificity and the lower the sensitivity. The working threshold is chosen based on clinical needs, i.e., whether sensitivity and specificity are equally important, or whether there is a criterion that suggests trading sensitivity for specificity or vice versa. For example, the optimal working threshold can be chosen by maximizing the harmonic mean of the PPV and TPR, with additional weights if precision and recall have different clinical importance. For this purpose, the Fβ-score is introduced:

$${\mathrm{F}}_{\beta }=\left(1+{\beta }^{2}\right)\frac{\mathrm{PPV}\times \mathrm{TPR}}{\left({\beta }^{2}\right)\mathrm{PPV}+\mathrm{TPR}}$$

where β is commonly chosen among three values: \(\beta =1\) (unweighted harmonic mean of PPV and TPR), \(\beta =0.5\) (which weighs recall lower than precision), and \(\beta =2\) (which weighs recall higher than precision).

Given our need to determine the risk of 24-h hemorrhage, the F2-score \(\left(\beta =2\right)\) was chosen as the figure of merit that the machine learning algorithm was set to maximize.
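To make the role of β concrete, the sketch below selects the working threshold that maximizes the Fβ-score on a small set of classifier scores. The scores and labels are hypothetical, and the function names (`f_beta`, `best_threshold`) are ours; the exhaustive threshold scan is one simple way to do this, not necessarily the procedure used in the study.

```python
def f_beta(ppv, tpr, beta=2.0):
    """F-beta score from precision (PPV) and recall (TPR)."""
    b2 = beta ** 2
    denom = b2 * ppv + tpr
    return (1 + b2) * ppv * tpr / denom if denom > 0 else 0.0


def best_threshold(scores, labels, beta=2.0):
    """Scan every observed score as a threshold; return (threshold, F-beta)."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        fn = sum(s < t and y == 1 for s, y in zip(scores, labels))
        ppv = tp / (tp + fp) if tp + fp else 0.0
        tpr = tp / (tp + fn) if tp + fn else 0.0
        f = f_beta(ppv, tpr, beta)
        if f > best[1]:
            best = (t, f)
    return best


# Hypothetical classifier scores and true labels (1 = hemorrhagic).
scores = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
labels = [0, 0, 1, 0, 1, 1]

print(best_threshold(scores, labels, beta=2.0))  # recall-weighted
print(best_threshold(scores, labels, beta=0.5))  # precision-weighted
```

On this toy input, β = 2 selects the lower threshold 0.4 (all positives detected, one false positive), whereas β = 0.5 selects 0.8 (no false positives, one positive missed), which is the trade-off described above.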

Results

Population

Out of 50 patients, we enrolled 43 patients with a confirmed diagnosis of AIS due to anterior LVO undergoing immediate post-EVT CB-CT and 24-h NCCT. Seven patients were excluded since they did not undergo CB-CT for logistical issues related to the use of a secondary angio suite not equipped with CB-CT. Clinical, demographic, and neuroradiological characteristics and those related to treatment are reported in Table 1. Out of 43 patients, 23 presented HC (6 SAH and 17 HT). Out of 17 patients who developed HT, 5 patients presented a PH2 subtype, 9 patients a PH1 subtype, 2 patients HI2, and 1 patient HI1. For a comparison between CB-CT and 24-h NCCT scans, see Fig. 3.

Table 1 Bivariate comparison between patients with and without hemorrhagic complications (HC: subarachnoid hemorrhage and hemorrhagic transformation) according to baseline characteristics of the study population
Fig. 3

(A) The machine learning (ML) evaluation of the cone-beam computed tomography (CB-CT) slice at the level of the bodies of the lateral ventricles performed immediately after successful endovascular thrombectomy (EVT) of a left MCA-M1 occlusion, and (B) 24-h non-contrast-enhanced computed tomography (NCCT) of the same patient, both detecting a PH2 hemorrhagic transformation (HT); (C) the ML evaluation of the CB-CT slice at the level of the basal ganglia performed immediately after EVT detected the PH1-HT with good accuracy but the surrounding area of ischemic core hypodensity less sharply, and (D) 24-h NCCT of the same patient confirmed the PH1-HT with better delineation of the final ischemic core. (E) ML evaluation of the CB-CT slice at the level of the insula performed immediately after EVT demonstrates, with low accuracy, differences in hypodensity between the right affected insula and the left unaffected area, while the 24-h NCCT (F) of the same patient better delineates the final ischemic core at the level of the right insula secondary to unsuccessful EVT of a right MCA-M2 occlusion

Regarding clinical and demographic characteristics, patients who developed HC at 24 h differed from non-HC patients in age (median 64.5 years [IQR 56–75.5] versus 81 [IQR 73–82], respectively) and arterial hypertension (~ 61% vs 90%, p = 0.039) (Table 1). Regarding neuroradiological features, patients with HC, compared to those without, presented a lower ASPECTS (median 9 [IQR 8–10] vs 10 [IQR 9–10], respectively) and CBS (median 6 [IQR 3–6] vs 7 [IQR 6–8], respectively) at baseline imaging and a higher number of anatomic levels affected on immediate post-procedural CB-CT (median 3 slices [IQR 1–3] vs 0 slices [IQR 0–2], respectively). There were no differences regarding treatment modalities, delays, mTICI scores, or functional outcome, except for a higher 24-h NIHSS score in HC compared to non-HC patients (median 18 [IQR 13–21] vs 8 [IQR 6–18.5], respectively) (Table 1).

Out of 172 ROIs extracted from affected hemispheres, 92 were classified as unremarkable, 5 as PCS, 29 as ischemia, 7 as SAH, and 39 as HT. The summary statistics for the 172 dyads of ROIs extracted from the 43 patients are reported in Table 2. The degree of agreement in image interpretation between the three raters was excellent (κ = 0.9). We needed to analyze further follow-up imaging for 7 patients (16.3%) to disentangle the neuroradiological pattern.

Table 2 The median value within each pattern at the evaluation of the cone-beam computed tomography slice: mean HU [Mean (HU)], the minimum HU inside the ROI [Min (HU)], the maximum HU inside the ROI [Max (HU)], SNR, and CNR on the affected side. Differences in median values (Δ) of mean (HU) within each pattern between the two sides (affected and unaffected)

Image quality

Mean image quality was significantly lower for CB-CT (1.88 [IQR 1–3]) than for 24-h NCCT (2.51 [IQR 2–3]; p < 0.001).

Predictive value of CB-CT metrics for hemorrhagic complications

The median values of mean (HU), min (HU), max (HU), and CNR were all higher in the HC group (SAH or HT) compared to the non-HC group (unremarkable parenchyma, PCS, or ischemia). Median SNR did not differ significantly between the HC and non-HC groups.

For summary purposes, Fig. 4 shows the distribution of all samples concerning mean (HU) and CNR.

Fig. 4

Scatter plot of contrast-to-noise ratio (CNR) vs mean (HU) in five groups

Machine learning analysis

When the F2-score was chosen as the figure of merit (thus prioritizing sensitivity over specificity) and when all metrics were available, the ML method that performed best on validation was a Bernoulli Naive Bayes Classifier with MinMaxScaler and an additive (Laplace/Lidstone) smoothing parameter alpha = 1 (one classifier per leave-one-out iteration). This classifier was selected as the ML method of choice.
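A classifier of this kind can be assembled in scikit-learn as a `MinMaxScaler` followed by `BernoulliNB(alpha=1.0)`. The sketch below is a hedged illustration on synthetic data, not the study pipeline. Note that `BernoulliNB` binarizes each feature at its `binarize` threshold (default 0.0); the threshold used in the study is not reported, so the value 0.5 below is purely an assumption made so that the scaled features carry information after binarization.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in features, NOT the study data: 45 ROI dyads, 6 features.
rng = np.random.default_rng(1)
y = np.array([0] * 30 + [1] * 15)  # 1 = HC, 0 = non-HC
X = rng.normal(loc=40.0, scale=5.0, size=(45, 6)) + 20.0 * y[:, None]

# Scale each feature to [0, 1], then binarize at 0.5 (assumed threshold)
# inside the Bernoulli Naive Bayes model with Laplace smoothing alpha = 1.
model = make_pipeline(MinMaxScaler(), BernoulliNB(alpha=1.0, binarize=0.5))

# One pipeline is fitted per leave-one-out iteration; the scaler is refitted
# on each training fold, so the held-out sample never influences the scaling.
proba = cross_val_predict(
    model, X, y, cv=LeaveOneOut(), method="predict_proba"
)[:, 1]

auc = roc_auc_score(y, proba)
print(f"Leave-one-out AUC on synthetic data: {auc:.3f}")
```

Wrapping the scaler and the classifier in a single pipeline is what keeps the validation sample out of the preprocessing step in every iteration.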

In Fig. 5, results from the ML predictor are shown. Accuracy for classifying HC from non-HC cases was 82.0% with an AUC = 0.877, precision = 59.7%, sensitivity (recall) = 100%, and specificity = 75.4%.
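As a consistency check, the reported percentages can be reproduced from a ROI-level confusion matrix. The counts below are not stated in the text: they are back-calculated, as an assumption, from the ROI totals given above (7 SAH + 39 HT = 46 HC ROIs; 92 + 5 + 29 = 126 non-HC ROIs) together with the reported 100% sensitivity and 75.4% specificity. The function name is ours.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, precision and sensitivity from raw counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
    }


# Assumed counts, inferred from the reported figures rather than read
# from the published confusion matrix.
metrics = classification_metrics(tp=46, fp=31, tn=95, fn=0)

for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under these assumed counts, the four figures of merit round to the values reported above, which suggests that performance was evaluated at the level of ROI dyads rather than patients.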

Fig. 5

Left: area under the curve (AUC) for the classification of hemorrhagic complications vs. non-hemorrhagic complications obtained by a Bernoulli Naive Bayes Classifier trained over the whole dataset. Right: confusion matrix of the final classification

Discussion

The present study reports our preliminary experience with the application of ML to CB-CT performed immediately after EVT in patients with AIS, to predict the risk of 24-h HC. We proposed a new methodological approach for automatic early image interpretation based on a Bernoulli Naive Bayes Classifier with MinMaxScaler, which achieved a 100% recall rate. The trained model showed a specificity of 75% and a precision of ~ 60% in HC prediction on unseen data.

Our methodology has some innovative strengths. First, measuring the average of the highest HU values within the hyperdense lesion on a single CB-CT slice, compared with the contralateral unremarkable hemisphere, resulted in a less time-consuming HC evaluation with an accuracy close to that of the whole-lesion average HU evaluation previously proposed by Chen et al. [10]. Moreover, the differential HU-ROI measurements between the affected hemisphere (ROIs placed in the most hyperdense parenchymal area) and the unaffected hemisphere, performed at four different anatomic levels, provide a panel of data related to the HU distribution in ROIs (Mean, Min, Max, SNR, and CNR). This panel was used to train the ML algorithm and could support its further training.

Second, the ML analysis allows unbiased estimation of the performance metrics (accuracy, AUC, precision, recall) of the model trained on most of the data (i.e., 80%) in predicting an outcome on unseen data (i.e., 20%). In this way, the selection of classifiers that overperformed in training but underperformed in the validation set was avoided, testing the real validity and reproducibility of the selected model.

Third, by choosing the F2-score as the training goal, we explicitly searched for an ML model biased toward higher recall, reflecting the need for a more conservative clinical approach that tolerates type I errors (false positives) in order to minimize type II errors (false negatives). In other words, we accepted the possibility of overestimating the rate of HC rather than underestimating it. Our results reflect this approach: in leave-one-out cross-validation, we obtained the highest possible sensitivity of 100%, corresponding to an ML algorithm that would report all HC, at the expense of a sub-optimal specificity (75%), which results in falsely reporting as a “prospective hemorrhagic case” approximately one-fourth of patients with benign outcomes. We considered this performance acceptable from a clinical perspective. Indeed, the objective interpretation of CB-CT with the use of ML in the angio suite immediately after or during EVT could translate not only into safer and more tailored medical management in the acute setting (e.g., in the case of antiplatelet therapy administration during emergent stenting for AIS due to tandem occlusions) but also into a stricter neuroimaging follow-up (i.e., before the “standard” 24 h) in selected patients.

Despite these encouraging results, our study showed a precision of only about 60% for HC prediction. Although our model was trained on operator-defined ROIs, this performance differs markedly from the sensitivity, specificity, and accuracy rates (often over 90%) of visual CB-CT assessment reported in the literature so far [7, 8, 10]. We may speculate that the low ML precision puts the reported performance of subjective visual assessment of post-treatment CB-CT into perspective, highlighting the need for a more standardized quantitative assessment to overcome inter-operator variability in interpretation.

Moreover, our ML method still has room for improvement. In fact, despite our good experience with the routine evaluation of post-thrombectomy CB-CT images, our ML approach relies mainly on manually defined input, which hampers the translation of complex disease patterns into a finite number of feature descriptors.

A more sophisticated ML framework such as deep learning, based on deep convolutional neural networks (the approach most used for pattern recognition tasks in images), automatically extracts relevant features from the training samples without requiring manually designed features as input. This artificial intelligence approach, when adequately trained, is expected to be more accurate and more robust to the wide range of variations in features between different hemorrhagic classes of hyper-densities on CB-CT.

Importantly, since the deep learning process is automated, as long as the training set is much larger and more diverse, it can easily analyze thousands of cases that even human experts may not be able to see and memorize in their lifetime [21]. If properly developed, validated, and implemented, this more sophisticated approach can be expected to be a more efficient data analytics artificial intelligence tool that might complement the experience of clinicians and improve patient care in terms of accuracy and workflow.

Our study presents three main limitations. First, the sample size was small due to the exploratory nature of the hypothesis. Nevertheless, the chosen methodological procedure allowed us to extract data from 344 regions to perform an ML analysis.

Second, after the post-procedural CB-CT, HC was assessed only with 24-h NCCT, with no systematic additional imaging follow-up unless deemed necessary. The main reason is that our center is a Comprehensive Stroke Center, so follow-up imaging was frequently performed in the Primary Stroke Centers to which patients were subsequently transferred. However, we specifically designed the study to assess the predictive value of CB-CT in relation to 24-h NCCT patterns, which is the most relevant question in clinical practice. Moreover, the persistence of hyper-densities at NCCT performed after 19 to 24 h has been demonstrated to strongly predict HT (specificity 100%, sensitivity 62.5%) and to be the only reliable method to differentiate PCS from HT [22]. Finally, in our population, symptomatic intracranial hemorrhage occurred in ~ 11% of cases, which is higher than the rates classically reported in clinical trials and prospective stroke registries [1,2,3]. We cannot exclude a selection bias due to the exclusion of a small share of patients because of the already mentioned unavailability of CB-CT in the secondary angio suite. In fact, the percentage of symptomatic intracranial hemorrhage reported elsewhere in our population was 3.7%, which is close to that of the known clinical trials and stroke registries [23]. Nonetheless, this bias is unlikely to have affected the study quality or the generalizability of the results, since our research question concerns the discriminative capability of CB-CT in predicting HC.

Conclusions

To the best of our knowledge, this is the first study applying ML to CB-CT performed immediately after EVT. The proposed approach seems to put into perspective the few results available so far in the literature regarding the visual qualitative interpretation of CB-CT images after EVT. Although our preliminary ML results are suboptimal, this tool could be further refined, and this novel standardized quantitative CB-CT imaging analysis is warranted to overcome the inter-operator variability of interpretation, representing a further step towards more precise and personalized medicine for our patients.