Introduction

The introduction of intravenous alteplase and mechanical thrombectomy has significantly improved the outcome of patients with acute ischemic stroke, but the serious adverse effects of these therapies mandate careful patient selection [1]. Hemorrhagic transformation is considered one of the most serious complications [2]. Reperfusion therapy may lead to worse outcomes than the natural course of the disease if it causes hemorrhagic transformation [3,4,5].

Complex interactions between multiple factors influence the risk of hemorrhagic transformation. Clinical factors such as a history of atrial fibrillation, diabetes, congestive heart failure, advanced age, or severe neurologic deficits are associated with hemorrhagic transformation [6,7,8]. Large infarction size and increased mean transit time on perfusion imaging have also been associated with hemorrhagic transformation [9,10,11,12]. However, identifying novel imaging features that improve the prediction of hemorrhagic transformation would be helpful.

Recent radiologic approaches to predicting hemorrhagic transformation (HT) assess either the degree of ischemia or blood–brain barrier permeability [13]. Imaging markers related to the degree of ischemia include cerebral blood volume and Tmax from perfusion imaging and Hounsfield units on non-contrast computed tomography (NCCT). Given the importance of blood–brain barrier disruption in HT, related imaging markers have been investigated extensively; in particular, model-based computation of Ktrans from dynamic contrast-enhanced images has been widely explored. However, according to a recent review, the accuracy of these approaches is not yet satisfactory [13]. Radiomics approaches applied to CT can examine Hounsfield units to assess the level of ischemia, and the textural properties of infarcted brain tissue on NCCT may carry information about tissue frailty. A larger area of low Hounsfield units is associated with an increased risk of hemorrhagic transformation [9, 12]. However, additional features associated with hemorrhagic transformation may exist on NCCT. Radiomics has been used to quantify imaging features that may be related to underlying tissue characteristics [14]. Because radiomics extracts and quantifies features that are not easily perceptible to the human eye, it may provide a novel approach for predicting hemorrhagic transformation after reperfusion therapy. Analyzing tissue texture may also indirectly reflect blood–brain barrier function.

This study hypothesized that a machine learning model developed using radiomics features derived from NCCT may predict hemorrhagic transformation after reperfusion therapy in acute ischemic stroke.

Materials and methods

This study was approved by the Institutional Review Board of the Yonsei University College of Medicine (approval number: 4–2022-0928). The Institutional Review Board waived the requirement for informed consent because of the retrospective nature of the study.

Patients

This retrospective study was a post hoc analysis of data from a prospective registry, the Specialized Multicenter Attributed Registry of Stroke (SMART). Briefly, this registry enrolled consecutive patients with acute ischemic stroke within 7 days of symptom onset. Patients admitted between January 1, 2012, and January 31, 2022, who underwent intravenous thrombolysis or endovascular thrombectomy were included in this study. All enrolled patients underwent NCCT before reperfusion therapy. Given the greater sensitivity of MRI for detecting hemorrhage, only patients who underwent follow-up MRI after reperfusion therapy were included [15]. Our analysis did not distinguish between anterior and posterior circulation infarctions and also included lacunar infarctions resulting from perforator occlusion. Baseline characteristics of the patients are detailed in Table 1.

Table 1 Baseline characteristics of the included patients

Patients were excluded if image processing was unsuccessful, MRI scans were inadequate, no follow-up imaging was available, hemorrhage was present on initial imaging, or the infarction was too small for evaluation. After these exclusion criteria were applied to patients with hemorrhagic transformation, propensity score matching was used to select patients without hemorrhagic transformation from the remaining patients. The dataset was randomly split into training and test sets at a 7:3 ratio, stratified by hemorrhagic transformation status (Fig. 1). All patients were managed according to a standardized protocol that included an extensive evaluation of the underlying stroke etiology. Cardiac evaluations included 12-lead electrocardiography, transthoracic and/or transesophageal echocardiography, continuous electrocardiography monitoring in the stroke unit, and 24-h Holter monitoring.
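The stratified 7:3 split described above can be sketched with scikit-learn's `train_test_split`. This is a minimal illustration, not the authors' code: the label vector simply mirrors the final cohort (218 patients with and 144 without hemorrhagic transformation), the feature matrix is a placeholder, and the random seed is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Labels mirroring the final cohort: 1 = hemorrhagic transformation, 0 = none.
y = np.array([1] * 218 + [0] * 144)
# Placeholder feature matrix; in practice each row would hold one patient's radiomic features.
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the HT / non-HT proportion (about 60% HT) nearly equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42  # seed chosen arbitrarily
)

print(len(y_train), len(y_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```

Stratification matters here because, without it, a single random split of a few hundred patients can shift the class balance of the test set enough to change threshold-based metrics.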

Fig. 1

Flowchart of the patients included in the current study. PS, propensity score

Initial computed tomography imaging acquisition

NCCT was performed using SOMATOM Definition Flash (Siemens) with the following parameters: 120 kVp, 135 mAs, and 5-mm slice thickness. Iterative reconstruction was applied with a low-pass filter J30s kernel.

Imaging analysis of hemorrhagic transformation

All patients underwent computed tomography angiography as part of their initial assessment. Approximately 24 h after intravenous thrombolysis or endovascular thrombectomy, follow-up magnetic resonance imaging (MRI) was performed, including a gradient-recalled echo sequence or susceptibility-weighted imaging and MR angiography. Additional imaging was performed at any time in cases of clinical worsening. Hemorrhagic transformation was defined as a substantial susceptibility lesion within the infarcted area on gradient-recalled echo or susceptibility-weighted imaging, with or without hyperdensity on NCCT. The presence of hemorrhagic transformation was confirmed by a neuroradiologist with 11 years of experience.

Pre-processing and feature extraction

Digital Imaging and Communications in Medicine (DICOM) files were converted into Neuroimaging Informatics Technology Initiative (NIfTI) format. Using 3D Slicer [16], a sphere with a diameter of 1 cm was manually placed as the volume of interest (VOI) on the initial NCCT image, at the location of lowest attenuation within the presumed infarction. A neuroradiologist with 9 years of experience placed the VOI, and a neuroradiologist with 11 years of experience reconfirmed its position; disagreements were resolved by consensus. Normalization was performed on the entire NCCT scan, and a window of 20–50 Hounsfield units was applied. The NCCT was resampled to an isotropic voxel size of 1 × 1 × 1 mm using Advanced Normalization Tools [17]. A total of 94 radiomic features were extracted from the VOIs on NCCT using PyRadiomics [18]. The extracted features comprise first-order statistics describing the distribution of voxel values; texture features derived from the Gray Level Co-occurrence Matrix, Gray Level Run Length Matrix, Gray Level Size Zone Matrix, Gray Level Dependence Matrix, and Neighboring Gray Tone Difference Matrix, which capture spatial relationships and patterns; and a range of filters and transformations, including wavelet and Laplacian of Gaussian, for multi-resolution analysis and feature emphasis (Supplementary Table 1).
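To make the VOI step concrete, the NumPy sketch below builds a 1-cm spherical mask on a 1-mm isotropic grid (a radius of 5 voxels), clips intensities to the 20–50 HU window, and computes a few first-order features such as the minimum and 10th percentile. It is illustrative only: the volume is synthetic, the VOI center is a hypothetical stand-in for the manually chosen lowest-attenuation point, the whole-scan normalization step is omitted, and the helper names are hypothetical. In the actual pipeline, PyRadiomics (for example, `RadiomicsFeatureExtractor.execute` called with an image and a mask) computes these and the texture features.

```python
import numpy as np

def apply_window(ct_hu, low=20.0, high=50.0):
    """Clip Hounsfield units to the [low, high] window."""
    return np.clip(ct_hu, low, high)

def spherical_voi(shape, center, diameter_mm=10.0, spacing_mm=1.0):
    """Boolean mask of a sphere on an isotropic grid (1-cm diameter by default)."""
    radius_vox = diameter_mm / 2.0 / spacing_mm
    grid = np.indices(shape)
    dist2 = sum((axis - c) ** 2 for axis, c in zip(grid, center))
    return dist2 <= radius_vox ** 2

def first_order_features(values):
    """A few PyRadiomics-style first-order statistics of the voxels inside the VOI."""
    return {
        "Minimum": float(np.min(values)),
        "10Percentile": float(np.percentile(values, 10)),
        "Median": float(np.median(values)),
        "Mean": float(np.mean(values)),
    }

# Synthetic 1-mm isotropic volume: parenchyma near 35 HU with a hypodense core near 26 HU.
rng = np.random.default_rng(0)
volume = rng.normal(35.0, 2.0, size=(40, 40, 40))
volume[13:28, 13:28, 13:28] = rng.normal(26.0, 2.0, size=(15, 15, 15))

center = (20, 20, 20)  # hypothetical stand-in for the manually chosen lowest-attenuation point
mask = spherical_voi(volume.shape, center)
features = first_order_features(apply_window(volume)[mask])
print(int(mask.sum()), features)
```

Because the sphere has a fixed size and no shape features are computed, every patient's VOI contains the same number of voxels, so differences in the features come only from the intensities inside it.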

Model development

The basic unit of our models was a pipeline consisting of three components: a standardizer using Z-score normalization, feature selection using the least absolute shrinkage and selection operator (LASSO), and a tree-based classifier [19, 20]. The classifier was chosen from two tree-based models, the light gradient boosting machine (LightGBM) and ExtraTrees, to obtain a high-performing model [21, 22]. LightGBM is known for its effective handling of imbalanced data, and ExtraTrees for its computational efficiency. Each classifier was trained separately on two feature sets: all radiomic features (i.e., first-order and texture) and texture features only. The better-performing classifier was then selected for each feature set. During model training, fivefold cross-validation was performed for hyperparameter optimization, using the area under the receiver operating characteristic curve (AUROC) as the evaluation metric.
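The three-component pipeline and the fivefold, AUROC-driven hyperparameter search can be sketched with scikit-learn. This is a minimal sketch under stated assumptions: synthetic data from `make_classification` stands in for the 94 radiomic features, LASSO selection is implemented as `SelectFromModel` wrapped around `Lasso` regression on the 0/1 label (the paper does not state whether a Lasso regression or an L1-penalized logistic regression was used), and the hyperparameter grid is hypothetical. Only ExtraTrees is shown; `lightgbm.LGBMClassifier` could be placed in the `clf` step if LightGBM is installed. The code targets a recent scikit-learn release rather than version 0.21.2.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics table: 94 features, a handful informative.
X, y = make_classification(n_samples=250, n_features=94, n_informative=8,
                           n_redundant=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

pipe = Pipeline([
    # Z-score normalization; this matters for the LASSO step, not for the trees.
    ("scale", StandardScaler()),
    # Keep features with a non-zero Lasso coefficient (default threshold 1e-5 for L1 models).
    ("select", SelectFromModel(Lasso(alpha=0.01, max_iter=10000))),
    # Tree-based classifier; LightGBM could be substituted here.
    ("clf", ExtraTreesClassifier(random_state=0)),
])

# Hypothetical grid; the paper does not report the searched ranges.
param_grid = {
    "select__estimator__alpha": [0.005, 0.01, 0.02],
    "clf__n_estimators": [100, 200],
}

search = GridSearchCV(
    pipe,
    param_grid,
    scoring="roc_auc",  # AUROC as the model-selection metric
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_train, y_train)  # scaling and selection are refit inside every fold

test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
n_selected = int(search.best_estimator_.named_steps["select"].get_support().sum())
print(search.best_params_, round(search.best_score_, 3), round(test_auc, 3), n_selected)
```

Wrapping the scaler and selector inside the pipeline means they are fit only on the training folds during cross-validation, so the validation folds do not leak into feature selection.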

Model performance and feature importance

After training, the models were evaluated on the held-out test dataset. Predictive performance was assessed using the AUROC with a 95% confidence interval (CI), and the best-performing model for each feature set (all features and textural features) was selected. Diagnostic performance (sensitivity, specificity, and accuracy) was calculated at a probability cutoff of 0.5. To examine which radiomic features are important for predicting hemorrhagic transformation, we calculated the mean absolute Shapley value of each selected input feature using the Shapley additive explanations (SHAP) algorithm [23]. For comparison with the radiomics models, a clinical model was developed using logistic regression with clinical variables previously reported to be associated with hemorrhagic transformation. The included variables were sex; age; history of hypertension, diabetes, dyslipidemia, atrial fibrillation, and chronic heart failure; medication history of antiplatelets, oral anticoagulants, and statins; presence of left ventricular, left atrial, or left atrial appendage thrombus; location of infarction (represented as binary variables for cortex, subcortex, insula, corona radiata, basal ganglia, internal capsule, thalamus, and cerebellum); intravenous thrombolysis; endovascular thrombectomy; initial NIH Stroke Scale score; hemoglobin level; hematocrit level; white blood cell count; platelet count; low-density lipoprotein level; and initial glucose level. The clinical model was built with the training dataset, and its AUROC was compared with that of the radiomics models on the test dataset using DeLong's test.
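DeLong's test compares two correlated AUROCs computed on the same test patients. The sketch below implements the standard DeLong structural-component variance estimator with NumPy and SciPy; it is an illustration under assumptions, not the authors' code (the paper does not state which implementation was used), and the function names and synthetic scores are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def structural_components(scores, y):
    """DeLong placement values for one model: V10 per positive, V01 per negative."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    # Mann-Whitney kernel: 1 if the positive scores higher, 0.5 for ties, else 0.
    psi = (diff > 0).astype(float) + 0.5 * (diff == 0)
    return psi.mean(axis=1), psi.mean(axis=0)

def delong_paired_test(scores_a, scores_b, y):
    """Two-sided DeLong test for the difference between two paired AUROCs."""
    y = np.asarray(y)
    comps = [structural_components(np.asarray(s, dtype=float), y)
             for s in (scores_a, scores_b)]
    v10 = np.vstack([c[0] for c in comps])  # shape (2, n_positive)
    v01 = np.vstack([c[1] for c in comps])  # shape (2, n_negative)
    aucs = v10.mean(axis=1)  # AUROC equals the mean placement value of the positives
    # 2 x 2 covariance matrix of the two AUROC estimates.
    cov = np.cov(v10) / v10.shape[1] + np.cov(v01) / v01.shape[1]
    var_diff = cov[0, 0] + cov[1, 1] - 2.0 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / np.sqrt(var_diff)
    p_value = 2.0 * norm.sf(abs(z))
    return aucs, z, p_value

# Synthetic example: a strong model versus an uninformative one on the same patients.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
strong = y + rng.normal(0.0, 0.5, size=200)
weak = rng.normal(0.0, 1.0, size=200)
aucs, z, p = delong_paired_test(strong, weak, y)
print(aucs.round(3), round(float(z), 2), p)
```

The covariance term is what distinguishes this from comparing two independent AUROCs: because both models score the same patients, their errors are correlated, and ignoring that correlation would misstate the variance of the difference.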

Statistical analysis

Descriptive statistics were calculated for the baseline characteristics of the included patients. The chi-square test was used for categorical variables and the Mann–Whitney U test for continuous variables. To balance potential confounders, patients with and without hemorrhagic transformation were matched 1:1 on propensity scores estimated with a logistic regression model. The logistic regression model incorporated all variables included in the clinical model, which had been selected for their association with hemorrhagic transformation. Nearest-neighbor matching was used, with the average treatment effect in the treated as the target estimand. Models were developed in Python 3 with the open-source scikit-learn library, version 0.21.2 [24]. Statistical analyses were performed using R version 4.0.0 (R Foundation for Statistical Computing), with the MatchIt package for propensity score matching [25, 26]. The schematic flow of the study is presented in Fig. 2.
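The study used MatchIt in R for matching. For readers working in Python, the following is a rough analogue: propensity scores from a logistic regression, followed by greedy 1:1 nearest-neighbor matching without replacement and without a caliper. The ordering (treated patients with the highest scores matched first), the function name, and the synthetic covariates are assumptions for illustration; MatchIt's defaults and options may differ from this sketch.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def match_nearest_1to1(X, treated):
    """Greedy 1:1 nearest-neighbor propensity score matching without replacement.

    Returns (treated_index, control_index) pairs and the estimated scores.
    """
    treated = np.asarray(treated)
    model = LogisticRegression(max_iter=1000).fit(X, treated)
    ps = model.predict_proba(X)[:, 1]  # estimated probability of being "treated" (HT)
    controls = list(np.flatnonzero(treated == 0))
    pairs = []
    # Match treated units in descending order of propensity score (an assumed ordering).
    for t in sorted(np.flatnonzero(treated == 1), key=lambda i: -ps[i]):
        if not controls:
            break  # no unmatched controls left
        distances = np.abs(ps[controls] - ps[t])
        best = int(np.argmin(distances))
        pairs.append((int(t), int(controls.pop(best))))  # pop: each control used once
    return pairs, ps

rng = np.random.default_rng(1)
X = rng.normal(size=(600, 4))  # hypothetical covariates
treated = (rng.random(600) < 0.25).astype(int)
pairs, ps = match_nearest_1to1(X, treated)
print(len(pairs), int(treated.sum()))
```

Matching without replacement keeps the matched groups the same size, which is why later exclusions applied to only one group (as in this study) leave the final groups unbalanced.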

Fig. 2

Diagram showing image pre-processing and feature extraction. ANTs, advanced normalization tools

Results

Patients

A total of 1268 patients underwent intravenous thrombolysis or endovascular thrombectomy during the study period, and hemorrhagic transformation occurred in 271 (21.4%). Of these patients with hemorrhagic transformation, 218 were included after 53 (19.5%) were excluded because of image processing failure (42 patients), low-quality imaging (five patients), an infarction too small to draw the VOI (three patients), no follow-up MRI (two patients), and hemorrhage on initial imaging (one patient). Using propensity score matching, 218 patients without hemorrhagic transformation were selected. During imaging analysis of these matched patients, 74 were further excluded because of image processing failure (27 patients), no follow-up MRI (10 patients), and an infarction too small to draw the VOI (37 patients). As a result, 218 patients with hemorrhagic transformation and 144 patients without hemorrhagic transformation were included in the analysis. Representative images of patients with and without hemorrhagic transformation are shown in Fig. 3. In 6 of 362 cases, the two neuroradiologists disagreed on the position of the VOI, and it was repositioned by consensus.

Fig. 3

Representative images of the patients included in this study showing (A) non-contrast brain computed tomography, (B) follow-up magnetic resonance imaging of a patient who suffered hemorrhagic transformation, (C) non-contrast brain computed tomography, and (D) follow-up magnetic resonance imaging of a patient who did not suffer hemorrhagic transformation

The median age of the included patients was 77 years (interquartile range, 69–83 years), and 51.1% (185 patients) were male. Of the 218 patients in the hemorrhagic transformation group, 38.5% (84 patients) received intravenous thrombolysis and 88.5% (193 patients) received endovascular thrombectomy. Similar proportions of patients in the non-hemorrhagic transformation group received intravenous thrombolysis (33.3%, 48 patients, p = 0.37) and endovascular thrombectomy (94.4%, 136 patients, p = 0.08) (Table 1). The prevalence of risk factors was similar between patients with and without hemorrhagic transformation, as were pre-stroke medications. Stroke severity, measured with the NIH Stroke Scale, was also similar between the two groups (median 14 in the hemorrhagic transformation group and 15 in the non-hemorrhagic transformation group). The initial glucose level was significantly higher in patients with hemorrhagic transformation (137.0 mg/dL vs. 129.0 mg/dL, p = 0.01), whereas other laboratory values were similar between the two groups. Infarct location was similar between the two groups, and onset-to-admission time did not differ (hemorrhagic transformation [175.5 min] vs. non-hemorrhagic transformation [181.0 min], p = 0.839).

Performances of the models and feature importance

With all features, LightGBM was the best-performing classifier. The all-feature model showed a mean AUROC of 0.992 in fivefold cross-validation and 0.986 (95% CI, 0.971–1.000) on the test dataset (Table 2 and Fig. 4). At a cutoff of 0.5, the model demonstrated an accuracy of 92.7%, a sensitivity of 93.9%, a specificity of 90.9%, a positive predictive value of 93.9%, a negative predictive value of 90.9%, and an F1-score of 93.9%. With textural features only, the ExtraTrees model performed best. The textural model showed a mean AUROC of 0.944 in fivefold cross-validation and 0.845 (95% CI, 0.774–0.916) on the test dataset. At a cutoff of 0.5, the model demonstrated an accuracy of 73.4%, a sensitivity of 81.8%, a specificity of 60.5%, a positive predictive value of 76.1%, a negative predictive value of 68.4%, and an F1-score of 78.8%.
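The threshold-based metrics reported above all follow from the 2 × 2 confusion matrix at a cutoff of 0.5. The helper below is a hypothetical illustration (counting a predicted probability equal to the cutoff as positive is an assumption); the labels and probabilities are made up and do not reproduce the study data.

```python
import numpy as np

def threshold_metrics(y_true, prob, cutoff=0.5):
    """Confusion-matrix metrics at a fixed probability cutoff (prob >= cutoff is positive)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(prob) >= cutoff
    tp = int(np.sum(y_pred & y_true))
    tn = int(np.sum(~y_pred & ~y_true))
    fp = int(np.sum(y_pred & ~y_true))
    fn = int(np.sum(~y_pred & y_true))
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),  # harmonic mean of PPV and sensitivity
    }

# Hypothetical labels (1 = hemorrhagic transformation) and predicted probabilities.
y_example = [1, 1, 1, 1, 0, 0, 0, 0]
prob_example = [0.9, 0.8, 0.6, 0.4, 0.3, 0.55, 0.7, 0.1]
m = threshold_metrics(y_example, prob_example)
print(m)
```

Because sensitivity = TP/(TP + FN) and positive predictive value = TP/(TP + FP), the two coincide exactly when FN = FP, and the F1-score, their harmonic mean, then takes the same value. This is consistent with the identical sensitivity, positive predictive value, and F1-score reported for the all-feature model.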

Table 2 Performances of the selected models
Fig. 4

Receiver operating characteristic curves of the best-performing model using (A) all features and (B) textural features

The ExtraTrees model using only texture features achieved a moderately high AUROC, but its overall performance was lower than that of the LightGBM model. In particular, its specificity was only 60.5%, despite a sensitivity of 81.8%. The features selected for each model are listed in Supplementary Table 2.

For the best-performing all-feature model, SHAP analysis showed that the minimum value within the VOI was the most important feature (Fig. 5). The second most important feature was Large Dependence Low Gray Level Emphasis, which quantifies the prevalence of large clusters of low-intensity voxels. The importance of the leading first-order feature was approximately six times that of the second most important feature. The five most important features of the textural feature model were Sum Average, Joint Average, Low Gray Level Emphasis, Large Dependence Low Gray Level Emphasis, and Short Run Low Gray Level Emphasis. Sum Average measures the average of the summed gray levels of neighboring voxel pairs and thus reflects the overall intensity of the region. Similarly, Joint Average indicates the average gray level of the region. Low Gray Level Emphasis measures the distribution of low gray-level values, with higher values indicating a greater concentration of low-intensity voxels. Large Dependence Low Gray Level Emphasis emphasizes large groups of connected low-intensity voxels, reflecting their spatial clustering. Short Run Low Gray Level Emphasis emphasizes short runs of low-intensity voxels, reflecting fine-scale texture.
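The low-gray-level features above share a simple weighting idea, and the two GLCM averages are closely related. The NumPy sketch below, on a tiny 2D image with gray levels 1 to 3, computes a symmetric co-occurrence matrix for a single horizontal offset and derives Joint Average and Sum Average, then builds horizontal gray-level runs and computes Low Gray Level Run Emphasis and Short Run Low Gray Level Emphasis. It is a simplified, single-direction illustration following the PyRadiomics feature definitions, not PyRadiomics itself (which aggregates over multiple 3D directions), and the helper names are hypothetical. The GLDM features named in the text (Low Gray Level Emphasis and Large Dependence Low Gray Level Emphasis) apply the same 1/i² weighting to a dependence matrix, the latter additionally weighting by the squared dependence size.

```python
import numpy as np

def symmetric_glcm(img, n_levels):
    """Normalized GLCM for the horizontal neighbor offset, counted in both directions."""
    counts = np.zeros((n_levels, n_levels))
    left, right = img[:, :-1].ravel(), img[:, 1:].ravel()
    np.add.at(counts, (left - 1, right - 1), 1)  # gray levels are 1-based
    counts = counts + counts.T  # symmetric: (i, j) and (j, i) both counted
    return counts / counts.sum()

def joint_and_sum_average(p):
    levels = np.arange(1, p.shape[0] + 1)
    i, j = np.meshgrid(levels, levels, indexing="ij")
    joint_average = float(np.sum(i * p))      # mean gray level i of a voxel pair
    sum_average = float(np.sum((i + j) * p))  # mean of i + j, i.e. sum of k * p_{x+y}(k)
    return joint_average, sum_average

def horizontal_runs(img):
    """List of (gray level, run length) for runs of equal values along each row."""
    runs = []
    for row in img:
        start = 0
        for k in range(1, len(row) + 1):
            if k == len(row) or row[k] != row[start]:
                runs.append((int(row[start]), k - start))
                start = k
    return runs

def low_gray_run_features(runs):
    gray = np.array([g for g, _ in runs], dtype=float)
    length = np.array([n for _, n in runs], dtype=float)
    lglre = float(np.mean(1.0 / gray**2))                 # Low Gray Level Run Emphasis
    srlgle = float(np.mean(1.0 / (gray**2 * length**2)))  # Short Run Low Gray Level Emphasis
    return lglre, srlgle

img = np.array([[1, 1, 2],
                [2, 2, 2],
                [1, 3, 3]])
ja, sa = joint_and_sum_average(symmetric_glcm(img, n_levels=3))
lglre, srlgle = low_gray_run_features(horizontal_runs(img))
print(ja, sa, lglre, srlgle)
```

For a symmetric co-occurrence matrix, Sum Average is exactly twice Joint Average, because the mean of i + j is the sum of the two marginal means and both marginals coincide. Under PyRadiomics' default symmetric GLCM setting, the two top-ranked textural features would therefore be expected to carry essentially the same mean-intensity information.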

Fig. 5

Feature importance using the Shapley additive explanations algorithm on the test dataset for the (A, B) all-feature LightGBM model and the (C, D) textural feature ExtraTrees model

The logistic regression model using all clinical variables showed an AUROC of 0.714 (95% CI, 0.650–0.778) on the training dataset and 0.544 (95% CI, 0.431–0.658) on the test dataset. The radiomics models developed in this study performed significantly better than the clinical model on the test dataset (p < 0.001). The coefficients of the logistic regression model are presented in Supplementary Table 3.

Discussion

This study showed that machine learning models using radiomic features from NCCT of the brain can accurately predict hemorrhagic transformation after reperfusion therapy. The best-performing model using all radiomic features was the LightGBM model. When only textural features were used, the ExtraTrees model performed best. SHAP analysis showed that first-order features were the most important features in the all-feature model and that Sum Average was the most important feature in the textural feature model.

SHAP analysis revealed that first-order features were the most predictive of hemorrhagic transformation. The all-feature model, in which first-order features dominated, achieved an AUROC of 0.986 on the test dataset, suggesting a close relationship between first-order features and hemorrhagic transformation. The minimum and 10th percentile values were particularly associated with hemorrhagic transformation, suggesting that low Hounsfield unit values predict hemorrhagic transformation. This is in line with previous studies showing that hypodensity on NCCT is associated with hemorrhagic transformation [9, 11, 12]. Features such as the minimum and 10th percentile on NCCT reflect the extent of ischemic damage; these values decrease as ischemic damage becomes more severe, which may explain their importance in predicting HT. Textural features were also predictive of hemorrhagic transformation. Given the high predictive value of the first-order features, an additional model excluding first-order features was developed. This textural feature model also showed promising results, with an AUROC of 0.845 on the test dataset. This may imply that textural features beyond the Hounsfield unit itself are associated with the histological characteristics of hemorrhagic transformation. Textural features are inherently less intuitive and more difficult to interpret than first-order features derived from histogram analyses. However, three of the five textural features selected in the textural model were Low Gray Level Emphasis and its derivatives, which quantify the spatial distribution of low gray-level values. This indicates that textural features reflecting the distribution of low Hounsfield units in the infarcted area are also valid components of this feature set.
These features could potentially provide information on the correlations between voxels of low-attenuation values that are not represented by first-order features; thus, they may contain information regarding the microstructural changes in the brain observed on NCCT. We posit that this microstructural change may be associated with blood–brain barrier permeability. Further investigation is warranted to explore this potential relationship.

If implemented as an automated tool in clinical practice, these highly accurate machine learning models could guide physicians in making high-risk reperfusion decisions in urgent situations. For example, patients identified by the models as being at high risk of hemorrhage might receive more cautious reperfusion therapy. These predictions may also alter post-thrombolytic management during admission. Patients identified as high risk could undergo intensified neurological assessment and more frequent imaging, and their medication regimens, particularly anticoagulants and antiplatelet agents, might be adjusted to minimize bleeding risk. Because there is no definitive guideline on the dosage and timing of antithrombotic therapy after reperfusion, patients at risk of hemorrhage may receive a lower dose of anticoagulation or have antithrombotic therapy delayed [27]. Early prediction could also facilitate swift intervention if hemorrhage occurs, from aggressive blood pressure management to potential surgical options. Additionally, transparent discussions with families about elevated risk could set clear expectations, and, from a resource standpoint, high-risk patients could be prioritized for intensive care units. Nevertheless, numerous challenges, including generalizability and accuracy, must be addressed before practical application, and effective coordination with physicians in interpreting the predictions will be crucial to overcoming these hurdles.

This study used NCCT, the most widely and easily performed imaging modality for acute ischemic stroke [27,28,29]. Radiomic feature extraction can be performed rapidly without considerable computational power. In addition, the machine learning models used in this study are lightweight and can run on any consumer-grade central processing unit without a graphics processing unit. Previous radiomics studies on predicting hemorrhagic transformation have incorporated clinical variables and/or relied on manual delineation of the infarction boundary by an expert [28, 29]. For the models developed in this study, all steps except VOI placement can be performed automatically, and VOI placement could also be automated using a machine learning model trained to detect the low-density infarction core [30,31,32]. Manual delineation along the edge of the infarction is time-consuming and varies between readers. Therefore, to ensure that the shape of the VOI did not affect the prediction, we used a uniform 1-cm spherical VOI and excluded shape features from model development. We thus expect our method to be easier to apply clinically than manual region drawing.

This study had several limitations. Despite propensity score matching, the final patient groups were not matched, because patients with image processing errors, which could not be identified before patient selection, were excluded after matching. NCCT equipment, protocols, parameters, and kernels may affect image texture, and we did not control for these factors. Nonetheless, our model showed high performance despite this variability, which may better reflect real-world conditions. SHAP analysis was performed to identify the features most strongly associated with hemorrhagic transformation; however, the clinical implications of the radiomic features that SHAP identified as important remain difficult to interpret. To promote stable performance across different scanners and settings, a fixed-size sphere was used and preprocessing such as resampling was performed. The textural features found to be important in this study have been reported to be stable across different imaging conditions [33, 34]. Several steps were taken to prevent overfitting. Features were Z-score normalized to a mean of 0 and a standard deviation of 1. Although tree-based models such as LightGBM and ExtraTrees are largely insensitive to feature scale, standardization matters for the least absolute shrinkage and selection operator step, whose penalty depends on feature scale, because it prevents features with larger numeric ranges from dominating feature selection. The least absolute shrinkage and selection operator was applied for feature selection; it is particularly useful for datasets with relatively few observations because it improves generalization by retaining only the most relevant features. This reduces the dimensionality of the data and eliminates irrelevant or redundant predictors that could lead to overfitting. Finally, we incorporated fivefold cross-validation during model training.
This technique partitions the training data into five subsets, using four for training and one for validation in each iteration. It provides a robust estimate of model performance and ensures that every training observation is used for validation exactly once, which reduces the risk of selecting a model that fits noise specific to a single subset of the data. However, the clinical applicability of the models is limited by the lack of external validation. The slice thickness for NCCT can vary, and the windowing range might differ depending on settings, both of which can influence the reproducibility of our findings. Additionally, manual positioning of the VOI by neuroradiologists could introduce reproducibility concerns. To address these concerns, we used a fixed-size VOI placed at the lowest attenuation, applied normalization to the entire NCCT scan, used a specific windowing range, and resampled images to an isotropic voxel size. However, a future multicenter validation study including centers in different regions worldwide is needed to maximize generalizability and reduce potential bias. This study used fivefold cross-validation, a 7:3 split into training and test sets, and two machine learning algorithms; a better-performing model might be developed with a different approach. For example, a recent radiomics investigation using brain MRI assessed diverse models and observed superior performance with XGBoost [35]. In addition, cutoffs other than 0.5 may be used in clinical settings. Finally, this was a retrospective single-center study, and the generalizability of the findings should be further validated.

In conclusion, the radiomics model created using initial NCCT was highly accurate in predicting the occurrence of hemorrhagic transformation after reperfusion therapy in patients with acute ischemic stroke. The two radiomics models indicated that low Hounsfield unit values are a strong predictor of hemorrhagic transformation and that textural features alone can predict hemorrhagic transformation.