Introduction

Intracerebral hemorrhage (ICH) is an important public health problem, accounting for about 10–15% of all strokes and leading to high mortality and disability rates [1, 2]. The 1-month fatality of ICH is approximately 40% and increases with age [3]. The risk of death is usually estimated from the clinical experience of attending physicians. However, overestimation can lead to unnecessary withdrawal or limitation of care, the so-called self-fulfilling prophecy, while underestimation can result in prolonged, unnecessary, and expensive treatment [4]. Thus, predictive tools for the prognosis of ICH patients are of great importance for achieving favorable outcomes.

Several prognostic tools have been proposed to predict mortality and functional outcome in ICH, but they have many methodological and reporting deficiencies [5]. Recently, owing to the widespread use of machine learning (ML) algorithms and increasing computing power, there has been growing interest in ML-enhanced diagnosis and prognosis in the medical field. Wang et al. developed an ML-based outcome prediction model by combining initial clinical presentation, laboratory data, and imaging findings [6]. Guo et al. also collected initial information on admission to predict the functional outcome of ICH patients in China [7]. However, some clinical predictors known to affect ICH outcomes have been left out of prognostic models for ICH [8]. For instance, ICH can induce an inflammatory response [9], and inflammation is associated with poor clinical outcomes in patients with ICH [10, 11]. Neurological deterioration affects approximately one-third of patients with spontaneous ICH and increases the risk of death and dependency [12, 13]. These factors could benefit the prognostic prediction of ICH, suggesting that more accurate models could be established by including more variables. Moreover, no widely recognized tool is available for predicting the outcome of Chinese ICH patients [14]. To our knowledge, no ML-based prognostic model for ICH has been constructed that considers inflammation and changes in patients’ condition during treatment. This study aimed to develop and validate an ML-based model to predict functional outcome in Chinese patients with ICH based on patients’ basic characteristics, laboratory tests, imaging findings, and condition changes during treatment.

Methods

Study population

This retrospective cohort study enrolled patients with ICH from two hospitals between November 2017 and November 2020. The follow-up period ended in February 2021. The institutional review boards of the two hospitals approved the study (ethics approval numbers: [2019]085 and 2021C023). Oral informed consent was obtained from the patients or their guardians. The study was performed in accordance with the Declaration of Helsinki. Inclusion criteria were as follows: (1) the diagnosis of ICH complied with the American Heart Association/American Stroke Association ICH guidelines published in 2010; (2) the patient was aged ≥ 18 years; (3) the medical records were complete (including baseline data, laboratory tests, imaging data, treatment records, and prognostic data). Exclusion criteria were as follows: (1) cerebral hemorrhage caused by traumatic brain injury, brain tumor, or cerebrovascular malformation; (2) use of anticoagulant or antiplatelet drugs before onset; (3) death within 7 days of hospitalization; (4) congenital or acquired coagulation factor deficiencies or platelet abnormalities.

Data collection

The outcome was a poor functional outcome, defined as a Modified Rankin Scale (mRS) score [15] of 3–6 at 3 months and collected via follow-up phone interviews. An mRS score of 0–2 was defined as a good functional outcome. We also collected demographic data (gender, age, height, weight, body mass index (BMI), medical history (hypertension, diabetes, hyperlipidemia, atrial fibrillation, acute coronary events, other diseases), smoking history, alcohol use history); baseline (on admission) vital signs (systolic blood pressure (SBP), diastolic blood pressure (DBP), body temperature, Glasgow Coma Scale (GCS) score); baseline disease characteristics (time from onset to emergency room, hematoma volume, hemorrhage site, intraventricular hemorrhage, intraventricular hemorrhage extension, subarachnoid extension, blend sign, spot sign, leukodystrophy, cerebral lacuna, brain atrophy); baseline laboratory test data (white blood cell count, platelet count, lymphocyte count, neutrophil count, serum potassium, serum sodium, serum calcium, hemoglobin, blood glucose, serum creatinine, activated partial thromboplastin time (APTT), international normalized ratio (INR), fibrin, troponin, total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, triglyceride); an inflammatory marker (systemic immune-inflammatory index (SII)); 24-h admission indicators (SBP at 24 h, DBP at 24 h, hematoma volume at 24 h, GCS score at 24 h); treatment-related indicators (SBP change in 24 h, DBP change in 24 h, hematoma volume change in 24 h, GCS score change in 24 h, neurological deterioration within 7 days, and treatment (medical treatment, surgical treatment)); and mortality. SII was calculated as platelet count × neutrophil count / lymphocyte count. For BP management, the target systolic blood pressure in our hospitals was <140 mmHg [16].
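The two derived quantities defined above, SII and the dichotomized mRS outcome, can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the example cell counts are hypothetical, and the counts are assumed to share one unit (for example 10^9/L).

```python
def systemic_immune_inflammation_index(platelets, neutrophils, lymphocytes):
    """SII = platelet count x neutrophil count / lymphocyte count."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return platelets * neutrophils / lymphocytes


def is_poor_outcome(mrs_score):
    """Dichotomize the 3-month mRS: 3-6 is poor, 0-2 is good."""
    if not 0 <= mrs_score <= 6:
        raise ValueError("mRS score must be between 0 and 6")
    return mrs_score >= 3


# Hypothetical patient, all counts in 10^9/L.
sii = systemic_immune_inflammation_index(platelets=200, neutrophils=6.0, lymphocytes=1.5)
print(sii, is_poor_outcome(4))
```

Guarding against a non-positive lymphocyte count avoids a division by zero when a laboratory value is missing or mis-entered.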

Construction and evaluation of the random forest model

The study population was divided into a training set and a testing set at a ratio of 7:3 using completely random sampling. The training set was used to build the model to predict the functional outcome of ICH patients at 3 months, and the testing set was used for internal validation. All variables were entered into least absolute shrinkage and selection operator (LASSO) regression for variable selection. The selected variables were then incorporated into the random forest algorithm to construct the prediction model. The predictive performance of the random forest model was evaluated via the area under the curve (AUC), accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and calibration curve. The 95% confidence intervals (CIs) were estimated.
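The workflow described above (random 7:3 split, LASSO variable selection, random forest, evaluation on the testing set) can be sketched as follows. This is an illustration under stated assumptions rather than the study's implementation: the text does not name a modeling library, so scikit-learn is assumed; the data are synthetic; a linear LASSO (`LassoCV`) on the binary outcome stands in for whichever LASSO variant was actually used; and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LassoCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the cohort: 412 rows, 20 standardized candidate predictors.
n_patients, n_features = 412, 20
X = rng.standard_normal((n_patients, n_features))
# Hypothetical outcome driven by the first two columns plus noise (1 = poor outcome).
y = (X[:, 0] + X[:, 1] + 0.5 * rng.standard_normal(n_patients) > 0.5).astype(int)

# 7:3 split by simple random sampling (no stratification).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# LASSO for variable selection: keep predictors with non-zero coefficients.
lasso = LassoCV(cv=5, random_state=0).fit(X_train, y_train)
selected = np.flatnonzero(lasso.coef_)

# Random forest fitted on the selected predictors only.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train[:, selected], y_train)

# Discrimination on the held-out testing set.
test_prob = forest.predict_proba(X_test[:, selected])[:, 1]
test_auc = roc_auc_score(y_test, test_prob)

# Rank the selected predictors by impurity-based importance, highest first.
ranking = selected[np.argsort(forest.feature_importances_)[::-1]]
print(X_train.shape[0], X_test.shape[0], round(test_auc, 3), ranking)
```

With 412 rows and `test_size=0.3`, scikit-learn rounds the testing set up to 124 rows and leaves 288 for training. Variable selection is fitted on the training set only, so the testing set plays no part in choosing the predictors.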

Statistical analysis

Continuous data with a normal distribution were shown as the mean ± standard deviation (mean ± SD), and the t test was used for intergroup comparisons. Continuous data with a skewed distribution were described by the median and quartiles (M (Q1, Q3)), and groups were compared using the Wilcoxon rank sum test. Categorical data were presented as the number of cases and proportion (n (%)), and between-group comparisons used the chi-square test or Fisher’s exact test. Missing values of baseline and 24-h data were imputed using multiple imputation. LASSO regression was applied for multivariate analysis to select variables. The importance ranking of these variables was provided by the random forest model. The receiver operating characteristic (ROC) curve was plotted, and the AUC was calculated. All statistical tests were two-tailed, and P < 0.05 was deemed statistically significant. SAS 9.4 (SAS Institute Inc., Cary, NC) was used for statistical analysis, and Python 3.8 (Python Software Foundation, Wilmington, DE) was used for random forest modeling.

Results

Characteristics of the study population

A total of 414 patients with ICH were retrospectively reviewed according to the inclusion and exclusion criteria. Two patients with abnormal body temperature (3.6 and 70°C, respectively) were excluded. In the end, 412 ICH patients were included in this study (Figure 1), with 268 (65.05%) males and 144 (34.95%) females. The mean age was 59.42 ± 12.59 years. Most of the patients (66.50%) had hypertension. The median GCS score on admission was 11. The median baseline hematoma volume was 20.00 mL. The majority of patients (63.59%) received medical treatment. The features of ICH patients are presented in Table 1. According to their mRS scores, these patients were divided into the good functional outcome group (n = 273) and the poor functional outcome group (n = 139). The two groups differed significantly in other medical history (that is, history other than hypertension, diabetes, hyperlipidemia, atrial fibrillation, and acute coronary events), SBP, DBP, body temperature, baseline GCS score, time from onset to emergency room, baseline hematoma volume, intraventricular hemorrhage, blend sign, spot sign, brain atrophy, serum sodium, serum calcium, blood glucose, APTT, troponin, total cholesterol, low-density lipoprotein cholesterol, triglyceride, SII, SBP at 24 h, DBP at 24 h, GCS score at 24 h, DBP change in 24 h, hematoma volume change in 24 h, neurological deterioration, treatment, and mortality (all P < 0.05). The overall mortality of patients was 12.62%; the mortality of the good and poor functional outcome groups was 1.10% and 35.25%, respectively. As shown in Table 2, the training set contained 288 patients, and the testing set included 124 patients. Generally, no significant difference was found in patient characteristics between the two sets.

Fig. 1 Flow chart of study population selection. ICH, intracerebral hemorrhage

Table 1 Characteristics of study population
Table 2 Difference between the training and testing sets

Ranking of feature importance

After LASSO regression, 12 features were selected (Figure 2). In descending order of importance in the random forest model (Figure 3), these features were neurological deterioration, GCS score at 24 h, baseline GCS score, time from onset to emergency room, blood glucose, DBP change in 24 h, hematoma volume change in 24 h, SII, SBP change in 24 h, serum creatinine, serum sodium, and age.

Fig. 2 LASSO feature selection for model construction. LASSO, least absolute shrinkage and selection operator

Fig. 3 Importance ranking of features for model construction. GCS, Glasgow Coma Scale; DBP, diastolic blood pressure; HV, hematoma volume; SII, systemic immune-inflammatory index; SBP, systolic blood pressure

Predictive performance of the random forest model

Table 3 shows the predictive ability of the developed random forest model for functional outcome at 3 months. In the training set, at a cut-off value of 0.429, the accuracy, AUC, sensitivity, specificity, PPV, and NPV of the model were 0.899 (95%CI 0.865–0.934), 0.960 (95%CI 0.939–0.982), 0.920 (95%CI 0.867–0.973), 0.888 (95%CI 0.843–0.933), 0.814 (95%CI 0.742–0.886), and 0.954 (95%CI 0.923–0.985), respectively (Table 3, Figure 4). In the testing set used for internal validation, at the same cut-off value of 0.429, the accuracy, AUC, sensitivity, specificity, PPV, and NPV of the model were 0.895 (95%CI 0.841–0.949), 0.964 (95%CI 0.931–0.998), 0.872 (95%CI 0.767–0.977), 0.906 (95%CI 0.844–0.968), 0.810 (95%CI 0.691–0.928), and 0.939 (95%CI 0.887–0.991), respectively (Table 3, Figure 5). These results suggested that the random forest model had good predictive performance. The calibration curves in the training and testing sets showed good calibration of the model (Figures 6 and 7).
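The threshold-based measures above all follow from a 2 × 2 confusion matrix, and an interval for a proportion can be approximated with a normal (Wald) interval. The sketch below illustrates both. The text does not state how the 95% CIs were computed, so the Wald interval is an assumption, and the count of 259 correct classifications out of 288 is a hypothetical value chosen only because it rounds to the reported training accuracy of 0.899.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion, clipped to [0, 1]."""
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical count: 259 of 288 training patients classified correctly.
low, high = wald_ci(259, 288)
print(f"accuracy CI: {low:.3f}-{high:.3f}")

# Hypothetical confusion matrix, for illustration only.
print(classification_metrics(tp=8, fp=2, tn=18, fn=2))
```

Under these assumptions the interval comes out at approximately 0.865 to 0.934, which is consistent with the reported training-set accuracy interval. The Wald interval is simple but can be inaccurate for small samples or proportions near 0 or 1.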

Table 3 Predictive performance of the random forest model

Fig. 4 ROC curve of the random forest model for 3-month functional outcome in the training set. ROC, receiver operating characteristic; AUC, area under the curve

Fig. 5 ROC curve of the random forest model for 3-month functional outcome in the testing set. ROC, receiver operating characteristic; AUC, area under the curve

Fig. 6 Calibration curve of the random forest model for 3-month functional outcome in the training set. RF, random forest

Fig. 7 Calibration curve of the random forest model for 3-month functional outcome in the testing set. RF, random forest

Discussion

Although many prediction tools have been proposed, ML-based prognostic models that take inflammation and changes in patients’ condition during treatment into account simultaneously are lacking in predicting poor functional outcome at 3 months for ICH patients. This study developed a random forest model by considering patients’ basic features, laboratory test indicators, imaging findings, and condition changes in treatment, with the accuracy, AUC, sensitivity, specificity, PPV, and NPV of 0.895, 0.964, 0.872, 0.906, 0.810, and 0.939 in the validation cohort, respectively, indicating a good predictive capacity of this model in the prediction of 3-month poor functional outcome.

Guo et al. [7] constructed 6 ML-based models to predict poor functional outcome at 3 months in ICH patients considering the level of monocytes, and proposed that the logistic regression (LR) and logistic regression cross-validation (LRCV) models exhibited better predictive performance for functional outcome (AUC = 0.890 and 0.887, respectively) than the ICH score. In this study, we also included the monocyte level to develop a random forest model, with additional consideration of other laboratory variables, such as blood glucose and serum sodium. Additionally, in the study of Gupta et al. [17], Intracerebral Hemorrhage Outcomes Project 3 (ICHOP3) was developed to predict the 3-month functional outcome using GCS, National Institutes of Health Stroke Scale, Acute Physiology and Chronic Health Evaluation II, premorbid mRS, and hematoma volume, and had AUCs of 0.89 and 0.745 in the derivation and validation cohorts, respectively. A prior study by Hall et al. [18] incorporated hematoma volume at hospital admission, hematoma expansion, intraventricular hemorrhage, overall ICH score, and GCS into prediction models of unfavorable functional outcome at 3 months, with AUCs of 0.75 for the decision tree and 0.82 for the random forest. The age and neurologic deficit (PLAN) score based on baseline information, as reported by Du et al. [19], had an AUC of 0.84 when predicting functional outcome (mRS 5–6) at discharge. The ML-based model developed herein displayed good discrimination and calibration, with an AUC of 0.964, considering basic characteristics, laboratory results, imaging findings, and condition changes during treatment of ICH patients. Qiu et al. [20] demonstrated that high serum S100A12 levels at admission, which are highly associated with the extent of inflammatory response, severity, and early neurologic deterioration in ICH patients, significantly discriminated a poor functional outcome, with an AUC of 0.794.
Copeptin and location-specific differences in hematoma volume were also reported to have prognostic value for the 90-day functional outcome, and great discrimination of functional dependence was found by incorporating one of these two into a prediction model [21, 22]. Future work can consider incorporating serum S100A12 levels, plasma copeptin levels, and interactions between hematoma volume and location to optimize prediction models.

According to the feature importance analysis of this random forest model, neurological deterioration contributed most to the prediction of 3-month poor functional outcome for ICH patients. Neurological deterioration affects about one-third of patients with ICH and elevates the risk of death and dependence [13, 23, 24]. Law et al. [12] found that neurological deterioration was independently associated with an mRS > 3 and unfavorable functional outcome at day 90. Patients without neurological deterioration showed minimal change in hematoma volume at 24 h (1.5 mL), whereas patients with neurological deterioration had a significantly greater absolute increase in hematoma volume (16 mL) [12], which could worsen the functional outcome of patients with ICH. High blood glucose levels were shown to correlate with poor functional outcome in ICH [25]. Hematoma with high blood glucose was found to cause nerve damage and decreased autophagy in ICH models [26]. Hyperglycemia can increase superoxide production in ICH induced by tissue plasminogen activator [27]. Hyperglycemia can also increase hematoma expansion in ICH models via plasma kallikrein [28]. These effects may contribute to the unfavorable functional outcome of ICH patients. Glycemic control can improve cerebral hemodynamics and neurochemistry in the perihematomal area in primary basal ganglia hemorrhage [29]. Stroke, including intracerebral hemorrhage, results in neuronal cell death and the release of factors such as damage-associated molecular patterns (DAMPs) that elicit localized inflammation in the injured brain region. Such focal brain inflammation aggravates secondary brain injury by exacerbating blood-brain barrier damage, microvascular failure, brain edema, and oxidative stress, and by directly inducing neuronal cell death.
In addition to inflammation localized to the injured brain region, a growing body of research suggests that inflammatory responses after a stroke occur and persist throughout the entire brain. Global brain inflammation might continuously shape the evolving pathology after a stroke and affect the patients’ long-term neurological outcome [30]. The inflammation and coagulation response after ICH would accelerate the formation of brain edema around hematoma, resulting in a more severe and durable injury [2]. Recent evidence indicates that neuroinflammation plays a critical role in injury expansion and brain damage [31]. As an inflammatory index, SII was highly associated with 90-day functional outcome in patients with ICH and could be used to predict outcomes [32]. Early SII index was reported as an independent predictor of poor outcome at time of hospital discharge for patients with supratentorial spontaneous ICH [10]. With additional attention to these factors, the random forest model performed well in predicting the functional outcome of ICH patients at 3 months.

A random forest model with good predictive capacity was established in the current study, incorporating inflammation and changes in patients’ condition during treatment, which may help to identify ICH patients with poor 3-month functional outcome and offer early interventions in clinical practice. There were several limitations. First, external validation is lacking, which may restrict the generalizability of our findings. Second, the mRS score was collected via follow-up phone interviews. Patients’ functional outcome may have been misjudged because interviewees might not have been able to give accurate answers by phone, although prior studies have shown that assessment of the mRS by structured telephone interview is reliable and comparable with face-to-face interviews [33]. In addition, long-term functional outcome was not assessed in our study. Third, some variables, such as neuroimaging and genetic variables and patients’ medication after discharge, were not taken into account. Besides, due to the retrospective nature of this study, we could not obtain complete information about specific therapeutic procedures, the department where patients were admitted, the length of hospital stay, and the rehabilitation administered. More studies are needed to predict functional outcome with these variables taken into account.

Conclusion

A random forest model was developed and internally validated, showing good predictive performance for poor functional outcome at 3 months among Chinese ICH patients. It might provide additional assistance for clinicians in predicting 3-month functional outcome and offering appropriate medical care. Further studies are required to evaluate the applicability of this model in other populations.