Introduction

Acute ischemic stroke (AIS), which has high mortality and disability rates, poses a serious threat to human life [1]. Rapid intravenous administration of recombinant tissue-type plasminogen activator remains the main method for the early management of AIS in selected individuals [2].

Patients with stroke undergoing intravenous thrombolysis (IVT) are at risk of developing hemorrhagic transformation (HT) [3]. Between 10 and 48% of patients who receive thrombolytic therapy develop HT [4]. HT following thrombolysis worsens the prognosis, regardless of whether it is symptomatic, and may also affect subsequent treatment. Moreover, as the treatment time window continues to be extended, it is increasingly important to detect HT earlier and more precisely. If HT is predicted early, stroke neurologists can take proactive measures to prevent clinical deterioration and make optimal treatment decisions.

At present, various methods can be used to predict HT, for example, clinical indicators [5,6,7], imaging data [8,9,10,11,12,13,14], or a combination of clinical features and imaging markers [15]. However, some previous studies involved few predictors, and the performance of their prediction models was relatively limited (area under the receiver operating characteristic curve (AUC) < 0.75). Moreover, some studies relied on magnetic resonance imaging (MRI), which is time-consuming. In addition, computed tomography perfusion examinations involve a much higher radiation dose and are expensive, so they cannot be performed in smaller medical centers. By contrast, noncontrast computed tomography (NCCT) is performed more frequently, takes less time, and, most importantly, is recommended by the guidelines for AIS [2]; NCCT images were therefore chosen as the source images for this study. NCCT images, particularly thick-slice images with a slice thickness of 5 mm and slice spacing of 5 mm, are also much more practical for reexamination.

Radiomics analysis, a new method for assisting precision medicine, automatically extracts quantitative features from medical images and is anticipated to overcome the shortcomings of visual image evaluation [16, 17]. Previous studies have shown that NCCT-based radiomics features perform well in various disciplines. Regarding stroke, several applications have been reported, such as predicting hemorrhage expansion after spontaneous intracerebral hemorrhage [18, 19], early identification of ischemic stroke [20], and estimating infarction onset time [21]. However, NCCT-based radiomics features have not been used to predict HT risk following thrombolysis, which is the primary focus of the present study.

Numerous machine learning (ML)-related studies have emerged in recent years, and ML performs well in classification and prediction [22]. Stroke-related ML research has also reported strong performance [23], and some studies have found that predicting HT by ML is feasible [24, 25]. In this study, we developed and validated ML algorithms that combine clinical and radiomics-based features to automatically predict HT in patients with AIS receiving IVT.

Materials and methods

Patient selection

This multicenter retrospective analysis was approved by the institutional review boards of our hospital, and the requirement for informed patient consent was waived.

Clinical data and NCCT images were collected from seven hospitals from June 2012 to December 2021. A total of 822 consecutive patients with AIS were considered for inclusion. Patients were included if they (1) underwent IVT in accordance with the management guidelines for AIS, (2) completed an NCCT examination before IVT therapy, and (3) underwent a follow-up MRI or NCCT within 36 h after receiving IVT. Patients with head trauma injuries, primary cerebral hemorrhage or brain tumors, hemorrhagic infarction upon admission, insufficient data, or severe artifacts on NCCT images were excluded.

Finally, a total of 517 patients (282 patients without HT and 235 patients with HT) were enrolled. The dataset from six hospitals (the First Affiliated Hospital of Chongqing Medical University, Chongqing General Hospital, Haikou Affiliated Hospital of Central South University Xiangya School of Medicine, the Second People’s Hospital of Hunan Province/Brain Hospital of Hunan Province, the First Affiliated Hospital of Hainan Medical University, Changsha Central Hospital (the Affiliated Changsha Central Hospital, Hengyang Medical School, University of South China)) was randomly divided into a training cohort (n = 355) and an internal validation cohort (n = 90). Data from the seventh hospital (People's Hospital of Yubei District of Chongqing City), which included 33 patients with HT and 39 patients without HT, were kept as an independent external validation cohort. The flowchart of patient selection is depicted in Fig. 1.

Fig. 1 Flowchart of patient selection (IVT, intravenous thrombolysis; HT, hemorrhagic transformation)

Obtaining clinical data

Clinical data, comprising demographic data and laboratory tests, were retrieved from the electronic medical record system. The following items were recorded separately: laboratory tests and measurements on admission (including blood pressure, blood glucose levels, and blood lipid levels), initial National Institutes of Health Stroke Scale (NIHSS) score, onset-to-CT time, medical history (including smoking (smoking index), drinking (drinking index), previous stroke, diabetes mellitus, and atrial fibrillation), and Trial of ORG 10172 in Acute Stroke Treatment (TOAST) classification of stroke etiology.

Imaging acquisition

Additional file 1: Table S1 presents the CT scanner models and scanning parameters used in the seven institutions.

Reference standard

HT was determined based on the criteria of the European Co-operative Acute Stroke Study-II trial [26]. On CT images, HT appears as high-density lesions and includes hemorrhagic infarction (HI) and parenchymal hemorrhage (PH). In this study, two neuroradiologists (X.H. and L.B.Y., directors with 10 years of experience in neuroradiology) independently evaluated HT on follow-up NCCT or MRI within 36 h following IVT therapy for all the training and testing datasets, without knowledge of the patient outcome. Any discrepancy was resolved by consensus. HT was differentiated from contrast agent extravasation by comparison with prior CT or MRI images, and the conclusion was supported by an examination performed 2–7 days following treatment.

Data preprocessing

Missing values in the clinical data were filled in using the K-nearest neighbor (KNN) method, after which the data were standardized by Z-score normalization. NCCT images were normalized in two steps: (a) every NCCT image was resampled to a uniform voxel size of 1.0 × 1.0 × 1.0 mm³; (b) the image intensity of every NCCT image was discretized with a fixed number of gray-level bins (256 bins). These two steps were intended to minimize potential effects of differences in scanners and scanning parameters. In addition, NCCT images were displayed in a fixed head window (window level = 50 Hounsfield units (HU); window width = 110 HU) to reduce variability during manual lesion delineation.
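The preprocessing steps above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, KNN imputation and voxel resampling are omitted, and the min–max form of fixed-bin-count discretization is an assumption, since the text does not give the exact formula.

```python
import math

def zscore(values):
    """Standardize numbers to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]

def apply_window(hu, level=50.0, width=110.0):
    """Clip a Hounsfield-unit value to [level - width/2, level + width/2]."""
    low, high = level - width / 2.0, level + width / 2.0
    return min(max(hu, low), high)

def discretize_fixed_bin_count(values, n_bins=256):
    """Map intensities to integer gray levels 1..n_bins (assumed min-max binning)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    levels = []
    for v in values:
        level = int(math.floor(n_bins * (v - lo) / (hi - lo))) + 1
        levels.append(min(level, n_bins))  # the maximum intensity falls in the last bin
    return levels

# With level 50 HU and width 110 HU the window spans -5 HU to 105 HU.
print(apply_window(-1000.0), apply_window(40.0), apply_window(300.0))
```

In a train/validation setting, the mean and standard deviation would normally be estimated on the training cohort and reused for the validation cohorts; the source does not state how this was handled.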

Radiomics analysis

The region of interest (ROI) of cerebral infarction was manually defined on the axial slices of NCCT images using the 3D-Slicer software, slice by slice, around its perimeter. If the lesion’s border was not clearly visible on the NCCT image, diffusion-weighted images taken within 6 h were used to draw the border.

To ensure the reproducibility of radiomics features, intra- and interobserver intraclass correlation coefficients (ICCs) were computed using the ROIs of 20 randomly selected patients. The intraobserver ICC was determined by comparing the features from ROIs drawn twice, one month apart, by radiologist 1. The interobserver ICC was calculated by comparing the features from the ROIs of radiologists 1 and 2. Features with good reliability (both ICCs ≥ 0.95) were included in the subsequent analysis (Additional file 1: Figure S1).
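The reliability filter can be sketched as follows. The text does not state which ICC form was used, so this example assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement); the function names and feature names are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a table with one row per patient and one column per rater or session."""
    n = len(ratings)      # number of patients
    k = len(ratings[0])   # number of raters or sessions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

def reliable_features(intra_icc, inter_icc, threshold=0.95):
    """Keep features whose intra- and interobserver ICCs both reach the threshold."""
    return [
        name for name in intra_icc
        if intra_icc[name] >= threshold and inter_icc.get(name, 0.0) >= threshold
    ]

# One feature measured on three patients by two readers (illustrative values only)
print(icc_2_1([[10.0, 10.5], [20.0, 19.5], [30.0, 30.5]]))
```

Each radiomics feature would get one intraobserver table (radiologist 1, two sessions) and one interobserver table (radiologists 1 and 2), and only features passing both checks would be carried forward.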

By applying the ROI masks, radiomics features were extracted using the 3D-Slicer package (version 4.13.0) (https://www.slicer.org/). Eight categories of radiomics features were obtained: first order; shape; shape 2D; gray-level co-occurrence matrix (GLCM); gray-level run length matrix (GLRLM); gray-level size zone matrix (GLSZM); neighboring gray-tone difference matrix (NGTDM); and gray-level dependence matrix (GLDM).

Finding the best method to select features

First, for clinical data, a t-test was used to identify characteristics that differed significantly between the HT and non-HT groups in the training cohort. We then compared five common dimensionality reduction methods (Least Absolute Shrinkage and Selection Operator (LASSO), Select from Model, Recursive Feature Elimination Cross Validation (RFECV), Recursive Feature Elimination (RFE), and Logistic Regression (LR)) by ten-fold cross validation in the training cohort, and the best-performing method was used to select the most important clinical features (the blue part of Fig. 2).

Fig. 2 Flowchart of the selection of the most important features (the numbers in parentheses are the numbers of features; ICC, Intraclass Correlation Coefficient; LASSO, Least Absolute Shrinkage and Selection Operator; RFECV, Recursive Feature Elimination Cross Validation; RFE, Recursive Feature Elimination; LR, Logistic Regression; Linear SVC, Linear Support Vector Classification; SGD, Stochastic Gradient Descent; SVM, Support Vector Machine; RF, Random Forest; XGB, eXtreme Gradient Boosting)

Second, for radiomics features, features that fell below the ICC threshold were eliminated from the 1037 radiomics features extracted from the delineated ROIs, leaving 778 features. The best of five popular methods (LASSO, Linear Support Vector Classification, RFECV, RFE, Tree-based Model) was chosen by ten-fold cross validation in the training cohort and used to select the most significant radiomics features (the red part of Fig. 2).
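To make the idea behind recursive feature elimination concrete, the sketch below repeatedly fits a model, ranks features by the magnitude of their coefficients, and drops the weakest one. The estimator used inside RFE is not stated in the text, so a small gradient-descent logistic regression is used here purely as an assumed stand-in; the function names, learning rate, and epoch count are hypothetical, and features are assumed to be on comparable scales.

```python
import math

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, target in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the smallest |weight|, repeat."""
    remaining = list(range(len(X[0])))  # original column indices still in play
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda i: abs(w[i]))
        del remaining[weakest]
    return remaining

# Toy data: column 0 tracks the label, column 1 is unrelated, column 2 is constant.
X = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [-1.0, -1.0, 0.0]]
y = [1, 1, 0, 0]
print(rfe(X, y, n_keep=1))
```

Because the model is refitted after every removal, the ranking can change as correlated features drop out, which is what distinguishes RFE from a single-pass ranking.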

Finding the best ML algorithm to build models

Before modeling, five ML algorithms (eXtreme Gradient Boosting (XGB), Support Vector Machine, LR, Stochastic Gradient Descent, and random forest) were compared by ten-fold cross validation in the training cohort (the yellow part of Table 4) to identify the best algorithm for developing the HT prediction model.
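The comparison by k-fold cross validation can be sketched generically. The example below assumes stratified folds (the text does not say whether the folds were stratified) and takes hypothetical `fit` and `predict` callables, so that any of the five algorithms could be plugged in; it reports mean held-out accuracy only.

```python
import random

def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin into the folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean held-out accuracy; assumes every fold receives at least one sample."""
    scores = []
    for train, test in stratified_kfold(y, k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)

# Baseline "model" that always predicts the majority class of its training fold
majority_fit = lambda X_train, y_train: max(set(y_train), key=y_train.count)
majority_predict = lambda model, x: model
labels = [0] * 20 + [1] * 10
print(cross_val_accuracy(majority_fit, majority_predict, labels, labels, k=5))
```

Running the same folds (same `seed`) for every candidate algorithm keeps the comparison paired, so differences in mean score are not driven by different splits.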

Building models

Independent clinical predictors of HT (p < 0.05) were obtained using the best dimensionality reduction method described above (Fig. 2) and were then used to develop a clinical model with the best ML algorithm. The model was applied to the internal and external validation cohorts to test its performance.

Following the same process, a radiomics model was constructed in the training cohort using the radiomics features that were ultimately selected, and it was likewise verified in the internal and external validation cohorts.

Finally, the clinical–radiomics model, which combined distinct clinical risk variables and important radiomics features, was developed in the training cohort and then validated independently in the internal and external validation cohorts.

Model evaluation

To assess each model’s performance, the receiver operating characteristic (ROC) curve was plotted, and the AUC was calculated.
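As a minimal illustration of how an ROC curve and its AUC are obtained from predicted scores, the sketch below uses the pairwise (Mann–Whitney) definition of the AUC and sweeps the decision threshold over the distinct scores. The function names and the toy data are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))     # 0.75: three of the four positive-negative pairs are ordered correctly
print(roc_points(labels, scores))
```

The pairwise loop is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; rank-based formulas give the same value more efficiently.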

A calibration curve was plotted to assess the calibration of the clinical–radiomics model, that is, the agreement between its predicted probabilities and the observed outcomes. To assess the combined model’s clinical utility, decision curve analysis (DCA) was used. The workflow of the radiomics analysis of the ROI and model building is described in Fig. 3.
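Decision curve analysis rests on the net benefit at a chosen threshold probability, which weighs true positives against false positives by the odds of the threshold. The sketch below shows the standard net-benefit formula and the "treat all" reference strategy; the function names and example data are hypothetical, and the threshold is assumed to lie strictly between 0 and 1.

```python
def net_benefit(labels, probs, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(labels)
    tp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)

def net_benefit_treat_all(labels, threshold):
    """Net benefit of labelling every patient as high risk."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

labels = [1, 1, 0, 0]
probs = [0.9, 0.6, 0.7, 0.2]
for pt in (0.25, 0.5, 0.75):
    print(pt, net_benefit(labels, probs, pt), net_benefit_treat_all(labels, pt))
```

A decision curve plots these values over a range of thresholds; a model is considered useful where its curve lies above both the "treat all" curve and the "treat none" line at zero.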

Fig. 3 Workflow of the clinical–radiomics model of predicting HT after IVT (IVT, intravenous thrombolysis; HT, hemorrhagic transformation; DCA, decision curve analysis)

Statistical analysis

All statistical data were analyzed using R software (version 4.1.3) (https://www.r-project.org/). Normally distributed data were presented as means ± standard deviation, and qualitative data were shown as numbers and percentages. A chi-squared test, a two-sample t-test, or the Mann–Whitney U test was used to evaluate the clinical characteristics. The DeLong test was used to compare the AUCs of the models. A two-sided p value < 0.05 was deemed significant for all statistical analyses.
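For readers unfamiliar with the DeLong test, the following Python sketch illustrates the paired comparison of two AUCs computed on the same patients. It is an illustration under stated assumptions rather than the R routine used in the study: the function names are hypothetical, at least two positive and two negative cases are required, and the two-sided p value uses the normal approximation.

```python
import math
import statistics

def _placements(scores, labels):
    """Return the AUC and the per-positive / per-negative structural components."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]

    def psi(p, q):
        return 1.0 if p > q else 0.5 if p == q else 0.0

    v10 = [sum(psi(p, q) for q in neg) / len(neg) for p in pos]
    v01 = [sum(psi(p, q) for p in pos) / len(pos) for q in neg]
    return sum(v10) / len(v10), v10, v01

def delong_test(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, z, two-sided p) for two models scored on the same cases."""
    auc_a, v10_a, v01_a = _placements(scores_a, labels)
    auc_b, v10_b, v01_b = _placements(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    # Variance of the AUC difference from the paired component differences
    d10 = [a - b for a, b in zip(v10_a, v10_b)]
    d01 = [a - b for a, b in zip(v01_a, v01_b)]
    var = statistics.variance(d10) / m + statistics.variance(d01) / n
    if var == 0.0:
        raise ValueError("variance of the AUC difference is zero; test undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p

labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
model_b = [0.6, 0.7, 0.2, 0.5, 0.4, 0.3]
print(delong_test(model_a, model_b, labels))
```

Because both models are evaluated on the same patients, the test works with the differences of the per-patient components, which accounts for the correlation between the two AUC estimates.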

Results

Patients’ clinical characteristics

This study comprised 517 patients (333 male [64.4%]; mean age ± standard deviation, 67.02 ± 12.67 years) from seven hospitals. The clinical data of all patients are shown in Table 1.

Table 1 Characteristics of all the patients

Outcomes of screening features

According to Tables 2 and 3, RFE was the best method for selecting features from both the clinical and radiomics data. Among the five ML algorithms, XGB had the highest accuracy and AUC (Table 4).

Table 2 Performance of five methods to select clinical features in the training cohort
Table 3 Performance of five methods to select radiomics features in the training cohort
Table 4 Performance of five machine learning algorithms to predict HT in the training cohort

The results of the univariate analysis for clinical risk factors associated with HT in the training cohort are presented in Table 5. After RFE, five independent risk factors were selected: baseline NIHSS (0.379), fibrin degradation products (FDPs) (0.183), monocytes (0.147), D-dimer (0.164), and N-terminal pro-brain natriuretic peptide (NT-proBNP) (0.128); the number in parentheses reflects the importance of each feature.

Table 5 Univariate analysis for HT in the training cohort

RFE was applied to the 778 radiomics features to reduce dimensionality and avoid model overfitting, and 12 radiomics features were selected for building the radiomics model (Fig. 4).

Fig. 4 Importance of the 12 radiomics features

Performances of the clinical and radiomics models

The AUCs were calculated to assess the performance of the three models.

The clinical model demonstrated an AUC of 0.996 (95% CI 0.991–0.999) in the training cohort, 0.898 (95% CI 0.873–0.921) in the internal validation cohort, and 0.911 (95% CI 0.891–0.928) in the external validation cohort for differentiating patients with HT (Fig. 5, Table 6).

Fig. 5 ROCs of the clinical model, radiomics model, and clinical–radiomics model by XGB (XGB, extreme gradient boosting)

Table 6 Performance of the three models

The radiomics model displayed an AUC of 0.999 (95% CI 0.999–1.000) in the training cohort, 0.922 (95% CI 0.896–0.941) in the internal validation cohort, and 0.883 (95% CI 0.851–0.902) in the external validation cohort (Fig. 5, Table 6).

Performances of the clinical–radiomics model

In the training cohort, the AUC of the clinical–radiomics model was 0.995 (95% CI 0.991–0.999); in the internal validation cohort, it was 0.950 (95% CI 0.925–0.967), and in the external validation cohort, it was 0.942 (95% CI 0.927–0.958) (Fig. 5, Table 6). The DeLong test demonstrated no significant difference between the clinical model and the clinical–radiomics model in the training cohort, internal validation cohort, or external validation cohort (p = 0.954, 0.179, and 0.364, respectively).

In the training cohort (p = 0.458) and internal validation cohort (p = 0.341), the calibration curve showed excellent agreement between the HT probability predicted by the clinical–radiomics model and the observed outcome. However, agreement was slightly worse in the external validation cohort (p = 0.032) (Fig. 6).

Fig. 6 Calibration curves for the clinical–radiomics model in the training and validation cohorts

The decision curve analysis indicated that all three models were clinically useful in predicting HT (Fig. 7).

Fig. 7 DCA for the clinical, radiomics, and clinical–radiomics model in the training and validation cohorts (DCA, Decision Curve Analysis)

Discussion

In this study, we developed an ML approach that incorporates clinical data with radiomics-based features from NCCT to predict the risk of HT after thrombolysis. It compensates for the limitations of visual recognition of NCCT image signs, reduces misdiagnosis and missed diagnosis by inexperienced first-line doctors, and increases their diagnostic confidence. Moreover, because the clinical data are obtained by routine examinations after admission, the model can be applied rapidly and is practical. In addition, we used data from multiple centers for modeling and independent external validation, which supports the generalizability of the model.

In this study, five clinical variables on admission, namely NIHSS, FDPs, D-dimer, monocytes, and NT-proBNP, were found to be significant risk factors of HT for patients with AIS undergoing IVT. The NIHSS is frequently used to measure the severity of a stroke, as it effectively assesses consciousness, motor function, sensation, responsiveness, and higher neurological function [27]. Previous studies have shown that NIHSS is an independent risk factor of postthrombolysis HT [28, 29], which is consistent with the finding of our study. This indicates the need to avoid thrombolytic therapy in patients with severe stroke. Previous studies demonstrated that FDPs on admission were related to PH [6, 30], possibly because FDPs impede platelet aggregation by competing with fibrinogen for binding to the platelet membrane [31, 32]. D-dimers may cause monocytes to produce and release proinflammatory cytokines, such as interleukin-6 (IL-6) and IL-1β [33], which augments the HT risk [34]. These findings suggest that coagulation indices are very important for predicting HT and deserve close attention from doctors. Finally, NT-proBNP is an inactive fragment derived from the cleavage of proBNP [35]. In line with earlier studies [36,37,38], our findings showed that the NT-proBNP level was independently associated with HT in patients with stroke who had received IVT. One possible explanation is that HT exacerbates the brain damage caused by ischemic stroke [39]. An elevated NT-proBNP level is sensitive but not specific, so other disorders, such as heart failure, need to be ruled out. Thus, if a patient with AIS has an elevated NT-proBNP level, the doctor may recognize that the patient has a higher risk of HT.

The proposed clinical model could be used to estimate the HT risk and showed excellent predictive performance. In this study, the AUC of the clinical model in the external validation was slightly greater than that in the internal validation, suggesting good generalizability. A previous study used only clinical baseline data to predict HT [24], an approach best suited for patients without access to imaging data. Since HT arises primarily from the infarct itself and analyzing clinical signs alone is insufficient, we extracted the imaging characteristics of the ischemic area.

The features extracted from NCCT images were then used to develop the radiomics model, which also showed excellent predictive performance. In this study, 12 optimal quantitative radiomics features were selected. Among them, wavelet-HHH-glrlm-RunEntropy was the most important feature and reflects the apparent heterogeneity of the infarct. This result might imply that the HT risk increases with the degree of heterogeneity of the infarct area. Nevertheless, further studies are required to elucidate the relationship between the pathological changes in HT and NCCT-based radiomics features. In addition, the shape features described the size of the cerebral infarction. The findings were similar to those of earlier studies [40] and indicated that the larger the infarct zone, the greater the HT risk. Although some signs on NCCT, such as the hyperdense middle cerebral artery sign or the Alberta Stroke Program Early CT Score, can predict the risk of HT [41], front-line doctors may fail to recognize these signs because they lack experience, a gap that our radiomics-based models could help fill.

In this study, although the DeLong test showed no statistically significant difference, the AUC and accuracy of the clinical–radiomics model were consistently higher than those of the clinical model in the internal and external validation cohorts, suggesting that radiomics has the potential to improve the prediction of future HT. In addition, the DCA results revealed that the three models provided a net benefit in predicting HT. Therefore, these models could serve as trustworthy, repeatable tools to guide therapeutic decisions. Moreover, they are less time-consuming: the model can identify the HT risk of a new patient within a minute. Thus, this model may be used in clinical practice once it has been confirmed in a larger cohort.

This study has some limitations. First, because this was a retrospective study with a limited sample size, some selection bias may have occurred, although participants from seven hospitals were included. Moreover, because the clinical data of some patients were incomplete, the K-nearest neighbor method was used to fill in the missing values, which is a limitation of the study. Nevertheless, the model based on NCCT and clinical data performed well in this initial attempt, which gives us confidence to broaden the scope of our subsequent research. Second, although radiomics has been studied in tumors, inflammation, and cerebrovascular disease, the interpretation of radiomics features remains limited or incomplete. Subsequent research is expected to gradually shed light on the intrinsic link between radiomics features and clinical outcomes. Furthermore, most CT radiomics features are intrinsically highly affected by CT acquisition and reconstruction settings [42], so we preprocessed all the NCCT images to reduce the impact of these factors. Third, the reproducibility and feasibility of this study may be limited because delineating the ROI with the help of post-IVT MRI is not feasible in settings where only pre-IVT NCCT is available. Fourth, this study only predicts whether HT will occur and does not distinguish HT subtypes (HI and PH). Finally, some risk factors related to HT, such as matrix metalloproteinases and homocysteine, were not included because some hospitals lacked the necessary laboratory indices. Many multicenter or big data research projects also struggle with this challenging issue. The completeness and consistency of the data need to be addressed, because institutions differ in their examination methods and equipment for the same disease, and patient compliance varies. Therefore, more institutions need to join the effort to predict HT, which calls for influential experts to organize such collaboration. In the future, integrating more clinical characteristics into the model and expanding the sample size could increase its clinical practicability.

Conclusion

By using radiomics features extracted from NCCT images together with clinical features, this study established a model that showed strong performance in the individualized risk assessment of HT for patients who received IVT. This model can help first-line doctors identify patients who are at a significantly higher risk of HT and support them when they make clinical decisions.