Introduction

Spontaneous intracerebral hemorrhage (ICH) affects nearly 100,000 Americans each year, and acute management improves the prognosis [1]. Although severity of ICH is easily summarized [2] and standing order sets are common, presentation of ICH is not monolithic. Different etiologies of spontaneous ICH, such as hypertension or cerebral amyloid angiopathy [3], result in different clinical phenotypes [4]. Each clinical phenotype might be expected to have a different clinical course, complications, and potentially different patient outcomes. Thus, optimal management likely differs between phenotypes [5]. For example, patients with lobar hematomas and altered consciousness are at increased risk for seizures and thus would be more likely to benefit from antiseizure medication and electroencephalography monitoring compared to patients with deep hematomas and intact consciousness [6].

Although prior work has sought to identify differences in risk factors, features, and outcomes based on ICH location [7, 8] and characterize the locations and etiologies of spontaneous ICH in specific populations [9], to our knowledge, no study has sought to agnostically identify clinical phenotypes using unsupervised machine learning. As a form of artificial intelligence, machine learning algorithms recognize previously unknown patterns in data structure [4]. Unsupervised machine learning algorithms are agnostic to expert-defined labels and prespecified assumptions, which allows for the discovery of novel patterns [10]. These algorithms have been broadly applied to phenotyping cardiovascular diseases and aortic stenosis [11,12,13,14,15]; distinguishing patterns of end-of-life care delivery in the intensive care unit (ICU) [16]; and grouping stroke symptoms, biomarkers, and complex patient outcomes [17,18,19,20]. Unsupervised methods reduce human bias introduced from classifications based on clinical expertise and thus allow us to validate conventional clinical wisdom and discover phenotypes that may not have previously been characterized. Better characterization of clinical phenotypes could allow for the development of more precise management and, potentially, improved outcomes.

We tested the hypothesis that unsupervised machine learning could identify clinical phenotypes in patients with acute ICH. We also explored if these clinical phenotypes were clinically meaningful based on associations with complications (e.g., seizures) and functional outcomes.

Methods

Patients

We conducted a retrospective analysis of deidentified prospectively collected patient data obtained from three sources: (1) the Northwestern University Brain Attack Registry (NUBAR), a prospectively collected registry of electronic health records with detailed information on patient outcomes after stroke; (2) a cohort of patients from the Johns Hopkins Hospital and Johns Hopkins Bayview Medical Center, part of Johns Hopkins Medicine [21]; and (3) Antihypertensive Treatment of Acute Cerebral Hemorrhage-II (ATACH-II) [22], a clinical trial data set of patients with ICH that we used for validation of the phenotypes [23]. For consistency, we included only patients from ATACH-II enrolled in the United States because of variability in treatment response internationally (e.g., regional differences in treatments and outcomes for patients from Asia) [24, 25]. Across the three data sources, we analyzed only patients with complete records [17, 20].

Variable selection was limited to data that could be harmonized across all three data sources. Harmonized patient data collected at admission included age, sex, race, ICH volume, ICH location, Glasgow Coma Scale (GCS) score, international normalized ratio (INR), systolic blood pressure (SBP), intraventricular hemorrhage (IVH), infratentorial location, history of diabetes, history of hypertension, and hematoma expansion. ICH location was dichotomized as lobar versus nonlobar (thalamus, basal ganglia, brainstem, caudate, cerebellar, lentiform nucleus, or other location of hemorrhage). GCS score was categorized as < 5, 5–12, and 13–15. INR was dichotomized as 1.4 and lower (“normal”) and 1.5 and greater (“abnormal”). All patients had a diagnostic computed tomography (CT) scan and a standard-of-care follow-up CT scan conducted around the 24-h mark [26, 27]. Hematoma expansion was calculated as the hematoma volume on the second or subsequent CT scan minus the hematoma volume on the initial CT scan. Across all data sets, we defined hematoma expansion as growth of 6 mL or greater [28] and recoded it as a binary true/false variable for harmonization across the three data sources.
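The recoding rules described above can be expressed compactly in code. The following is a minimal Python sketch for illustration only; it is not the study's code (the analysis was performed in R), and the function names, category labels, and the assumption that INR is reported to one decimal place are ours.

```python
EXPANSION_THRESHOLD_ML = 6.0  # growth of 6 mL or greater counts as expansion


def gcs_category(gcs: int) -> str:
    """Bin a Glasgow Coma Scale total (3-15) into the three categories used here."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS total must be between 3 and 15")
    if gcs < 5:
        return "<5"
    if gcs <= 12:
        return "5-12"
    return "13-15"


def inr_category(inr: float) -> str:
    """Dichotomize INR: 1.4 and lower is 'normal', 1.5 and greater is 'abnormal'."""
    # Assumption: INR is reported to one decimal place, so we round first.
    return "abnormal" if round(inr, 1) >= 1.5 else "normal"


def hematoma_expansion(initial_ml: float, followup_ml: float) -> bool:
    """True if follow-up volume minus initial volume is at least 6 mL."""
    return (followup_ml - initial_ml) >= EXPANSION_THRESHOLD_ML


# Hypothetical patient values, not study data.
example = {
    "gcs": gcs_category(14),
    "inr": inr_category(1.1),
    "expansion": hematoma_expansion(initial_ml=10.0, followup_ml=16.0),
}
```

Recoding every source to the same small set of categories before clustering is what allows the three data sets to be pooled and compared.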

Hematoma volumes at Northwestern were measured with validated, semiautomated, voxel-based techniques from CT scans. We previously established the high interrater reliability of this hematoma volume measurement technique in patients with ICH and reported excellent correlation between two separate evaluators (Spearman ρ = 0.99, P < 0.001) [29]. These validated methods account for voxel-by-voxel measurements (three-dimensional representations of volume that have the density of acute hemorrhage). Hematoma volumes for ATACH-II were adjudicated by a central reader. Hematoma volumes for the Hopkins data set were calculated using the ABC/2 method. When IVH was next to ICH, an expert adjudicated where the IVH began and the intracranial hematoma ended.
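For readers unfamiliar with the ABC/2 method, it approximates the hematoma as an ellipsoid: A is the greatest hemorrhage diameter on the CT slice with the largest area of hemorrhage, B is the diameter perpendicular to A on the same slice, and C is the vertical extent (number of slices with hemorrhage multiplied by slice thickness). The sketch below shows the general formula in Python; it is not the code used for the Hopkins data set, the function and parameter names are hypothetical, and it omits the slice-weighting rules some implementations apply when computing C.

```python
def abc_over_2(a_cm: float, b_cm: float, c_cm: float) -> float:
    """Estimate hematoma volume in mL from three diameters in cm.

    An ellipsoid has volume (pi / 6) * A * B * C; ABC/2 approximates
    pi / 6 (about 0.52) as 1/2. Because 1 cm^3 equals 1 mL, the result
    is in mL when the inputs are in cm.
    """
    if min(a_cm, b_cm, c_cm) < 0:
        raise ValueError("diameters must be non-negative")
    return a_cm * b_cm * c_cm / 2.0


# Hypothetical measurements, not study data: 4 cm x 3 cm x 2 cm.
volume_ml = abc_over_2(4.0, 3.0, 2.0)
```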

Outcomes

The patient outcomes assessed included seizure, hospital length of stay (LOS) in days, ICU LOS in days, discharge disposition, and the modified Rankin Scale (mRS; a global functional outcomes scale) score at 3-months’ follow-up. Seizures were defined by a characteristic clinical presentation observed by a clinician during hospitalization and reviewed by a study neurologist, or by electroencephalography monitoring per protocol [30]. Disposition at discharge was harmonized across the three data sources by recategorizing as died, inpatient (e.g., rehabilitation, acute care, nursing facility), outpatient (e.g., home), and other. The mRS score was dichotomized to mRS scores 0–3 (“good outcome,” independence or better) and mRS scores 4–6 (“poor outcome,” dependence or death) [31,32,33]. Follow-up mRS scores at 3 months were available for patients in the ATACH-II and NUBAR data sets.

Machine Learning

We used unsupervised k-prototype cluster analysis to group patients into clinical phenotypes because this algorithm performs well with mixed categorical and continuous data [34]. The algorithm generated mutually exclusive groups from the 13 independent admission variables using a combination of means for continuous variables and modes for categorical variables. This unsupervised cluster analysis was performed independent of the patient complications or outcomes data. Both elbow method heuristics and average silhouette method calculations were used to determine the optimal number of clusters [17]. Each cluster represents a clinical phenotype composed by maximizing similarities within and differences between clusters according to select admission data [16]. We generated a two-dimensional visualization of the clinical phenotypes using the uniform manifold approximation and projection (UMAP) [35]. The UMAP algorithm employs a nonlinear approach for dimension reduction.
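To illustrate how k-prototype clustering combines means and modes, the following is a minimal pure-Python sketch of the general algorithm. It is not the study's implementation (which used the R package clustMixType): the function names, the weight `gamma` that balances categorical against numeric dissimilarity, the choice of initial prototypes, and the toy records are all assumptions for illustration. Numeric inputs are assumed to be standardized beforehand, and the elbow and silhouette steps for choosing k are not shown.

```python
from collections import Counter
from statistics import mean


def mixed_distance(num_x, cat_x, num_p, cat_p, gamma):
    """Squared Euclidean distance on numeric features plus gamma times
    the number of categorical mismatches."""
    numeric = sum((a - b) ** 2 for a, b in zip(num_x, num_p))
    mismatches = sum(a != b for a, b in zip(cat_x, cat_p))
    return numeric + gamma * mismatches


def k_prototypes(numeric, categorical, init_idx, gamma=1.0, max_iter=100):
    """Cluster records described by parallel lists of numeric and categorical tuples.

    init_idx lists the record indices used as the initial prototypes.
    Returns (labels, prototypes).
    """
    protos = [(tuple(numeric[i]), tuple(categorical[i])) for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins its nearest prototype.
        new_labels = [
            min(range(len(protos)),
                key=lambda c: mixed_distance(nx, cx, protos[c][0], protos[c][1], gamma))
            for nx, cx in zip(numeric, categorical)
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: means for numeric features, modes for categorical ones.
        for c in range(len(protos)):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                continue  # keep the previous prototype for an empty cluster
            num_proto = tuple(mean(numeric[i][j] for i in members)
                              for j in range(len(numeric[0])))
            cat_proto = tuple(Counter(categorical[i][j] for i in members).most_common(1)[0][0]
                              for j in range(len(categorical[0])))
            protos[c] = (num_proto, cat_proto)
    return labels, protos


# Hypothetical standardized toy records, not study data.
numeric = [(-1.0, -0.9), (-1.1, -1.0), (1.0, 1.1), (0.9, 1.0)]
categorical = [("deep", "no_ivh"), ("deep", "no_ivh"), ("lobar", "ivh"), ("lobar", "ivh")]
labels, protos = k_prototypes(numeric, categorical, init_idx=[0, 2])
```

Because the categorical term is a simple mismatch count, `gamma` controls how much a differing category (e.g., lobar versus nonlobar location) weighs relative to a one-unit difference in a standardized continuous variable.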

Validation

We trained the k-prototype model using the aggregate data from the two US medical centers (the derivation cohort) and validated it in the independent cohort of ATACH-II data [11, 23]. We used the same 13 variables identified in the derivation cohort to assign phenotypes in the external validation cohort [11]. Testing the model in this external cohort allowed us to assess the generalizability of the k-prototype clustering algorithm [15].
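One plausible way to assign phenotypes in an external cohort is to give each new patient the label of the nearest prototype learned from the derivation cohort, using a mixed dissimilarity (squared Euclidean distance on numeric variables plus a weighted count of categorical mismatches). The Python sketch below illustrates that idea under stated assumptions: the paper does not spell out the assignment mechanics, and the prototypes, the weight `gamma`, and the patient values are hypothetical. It also assumes numeric values in the validation cohort are scaled with the derivation cohort's parameters.

```python
def assign_phenotype(num_x, cat_x, prototypes, gamma=1.0):
    """Return the index of the nearest prototype for one patient record."""
    def distance(proto):
        num_p, cat_p = proto
        numeric = sum((a - b) ** 2 for a, b in zip(num_x, num_p))
        mismatches = sum(a != b for a, b in zip(cat_x, cat_p))
        return numeric + gamma * mismatches

    return min(range(len(prototypes)), key=lambda c: distance(prototypes[c]))


# Hypothetical prototypes (numeric part, categorical part), not fitted values.
prototypes = [
    ((-1.0, -0.5), ("deep", "no_ivh")),
    ((0.0, 0.2), ("deep", "no_ivh")),
    ((1.2, 0.8), ("lobar", "ivh")),
]

# Hypothetical validation-cohort patient.
new_patient = ((1.0, 1.0), ("lobar", "ivh"))
phenotype_index = assign_phenotype(*new_patient, prototypes)
```

Keeping the prototypes fixed during assignment is what makes the second cohort a test of the derived phenotypes rather than a fresh clustering.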

Statistical Analysis

Continuous data were compared between phenotypes using analysis of variance for normally distributed data or the Kruskal–Wallis H-test for nonnormally distributed data. Categorical data were tested for an association with phenotype using χ2 statistics. A P value < 0.05 was considered statistically significant. Analyses were performed in R v4.2.2 with the “clustMixType” package, using RStudio (RStudio: Integrated Development for R. RStudio, PBC, Boston, MA, 2020. www.rstudio.com) [36, 37].
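As a worked illustration of the χ2 test of association between a binary outcome and three phenotypes, the Python sketch below computes the Pearson statistic from a contingency table. It is not the study's code, the counts are invented for illustration and are not study data, and the closed-form P value shown is valid only for 2 degrees of freedom (the case of a 2 × 3 table).

```python
import math


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail chi-square probability; this closed form holds only for 2 df."""
    return math.exp(-stat / 2.0)


# Hypothetical counts: rows are outcome present/absent, columns are phenotypes 1-3.
table = [
    [10, 20, 30],
    [90, 80, 70],
]
stat, dof = chi_square_statistic(table)
p_value = p_value_two_dof(stat)
significant = p_value < 0.05
```

For tables with other dimensions, a statistics library's χ2 distribution function would be needed in place of the closed form.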

Results

The 13 patient admission data variables clustered into three clinical phenotypes of ICH (Fig. 1). Elbow method heuristics and average silhouette method calculations suggested the optimal number of clusters (k) was three (Supplementary Figs. 1 and 2). Demographics of the three clinical phenotypes are shown in Table 1. Illustrative head CT scans for the three clinical phenotypes are shown in Supplementary Fig. 3. Demographics of incomplete patient records not included in the analysis are documented in Supplementary Table 1.

Fig. 1 UMAP model outputs visualized in two dimensions. Three clinical phenotypes of ICH emerged

Table 1 Comparison of patient demographics across three clinical phenotypes

Although there was no human labeling of the clusters of patients, the three distinct phenotypes were clinically meaningful. Of the 531 patients from the two US medical centers, 141 (26.6%) were assigned to phenotype 1, 204 (38.4%) were assigned to phenotype 2, and 186 (35.0%) were assigned to phenotype 3 (Table 1). Clinical phenotype 1 included patients with small hematomas, high blood pressure, and GCS scores greater than 12. Clinical phenotype 2 included individuals with hematoma expansion and elevated INR. Clinical phenotype 3 included individuals with larger median hematoma volumes (24.0 [interquartile range 8.2–59.5] mL), who were more commonly Black or African American, and who had IVH (Fig. 2). This phenotype had the greatest proportion (18.8%) of patients with GCS scores less than 5 (Table 1).

Fig. 2 Descriptive boxplots of the three ICH clinical phenotypes, based on patient demographics collected at admission

Patient complications and outcomes varied with the three clinical phenotypes (Table 1, Fig. 3). Seizures were more common in patients with phenotype 2 (P = 0.024). Patients in phenotype 3 had the longest durations for ICU and hospital LOS, more frequently died during hospitalization (38.2%), and had a greater proportion of poor mRS outcomes at 3-months’ follow-up (83.7%) (Table 1). The three clinical phenotypes were significantly associated with LOS (P = 0.001), discharge disposition (P < 0.001), and poor outcome on the mRS at 3-months’ follow-up (P < 0.001) (Fig. 3). Cause of death (e.g., death by neurological criteria, cardiac arrest, withdrawal of life support) was only available for the Northwestern cohort and was not associated with clinical phenotype (P = 0.6).

Fig. 3 Descriptive boxplots of the three ICH clinical phenotypes, with corresponding patient complications and outcomes

We separately validated the three clinical phenotypes in the independent ATACH-II data (Supplementary Fig. 4). Of the 385 patients, 184 (47.8%) were assigned to phenotype 1, 130 (33.8%) were assigned to phenotype 2, and 71 (18.4%) were assigned to phenotype 3. As in the derivation cohort, validation cohort phenotypes differentiated between patients with small hematomas, elevated blood pressure, and GCS scores > 12 (phenotype 1); coagulopathic patients with hematoma expansion (phenotype 2); and patients with large hematomas and IVH (phenotype 3) (Supplementary Table 2). There were associations between the clinically validated phenotypes and LOS (P < 0.001), discharge disposition (P = 0.001), and death or disability (mRS scores 4–6) at 3-months’ follow-up (P < 0.001). In the validation cohort, seizure was not significantly associated with clinical phenotype (P = 0.5), potentially a function of the low seizure incidence in the independent cohort of ATACH-II data (1.0% versus 8.1%, P < 0.001) (Supplementary Table 3).

Discussion

In this multicenter study analyzing data from 916 patients with ICH, we found that unsupervised machine learning clustered patient presentations into three clinically distinct phenotypes. In turn, these data-derived clinical phenotypes had different risk factors for ICH (e.g., chronic hypertension, anticoagulation), different rates of seizures, and different likelihoods of poor functional outcome at follow-up. These results suggest that unsupervised machine learning could be a useful technique to identify clinical phenotypes, anticipate complications (e.g., need for ventricular drainage), and potentially inform protocols for treatment (e.g., prophylactic seizure medication).

Of the three clinical phenotypes, the one characterized by Black or African American race, large hematomas, and IVH was associated with the worst outcomes (phenotype 3). Although the clinical implications of larger hematomas are in line with previous work documenting the strong association between increased ICH volume and poor outcomes [38,39,40], we documented a new pattern of patient race alongside the clinical presentation and functional outcomes for patients in phenotype 3. This finding contributes to the expansive body of work on social determinants of health and ICH incidence, treatment, and outcomes [41,42,43,44]. Phenotype 1 and phenotype 2 were distinctly separated by SBP, an underlying factor that drives care management for patients with ICH [45]. The use of a phenotype may be complementary to standard measures of severity (e.g., ICH score) to anticipate potential complications and treatments.

The clinical phenotypes we present in this study may have implications for the targeted care of patients with ICH [15]. Accounting for clinical phenotype may allow for more precisely targeted risk assessment and personalized treatment for subgroups of patients with ICH, a key step toward the goal of precision medicine [4, 10]. The data-derived clinical phenotypes may contribute to targeted therapy (e.g., prophylactic seizure medication, antithrombotic reversal, antihypertensive treatment) for each of the identified phenotypes [46].

These phenotypes could focus attention on targeted seizure prevention, antihypertensive medication, and hemostatic medication. Seizures worsen outcomes in patients with ICH [30, 47,48,49,50]. The use of prophylactic seizure medication is common (~ 40% of patients) [51], yet indiscriminate administration to patients is associated with worse mRS scores and reduced health-related quality of life, particularly cognitive function [52, 53]. Targeting the use of antiseizure medications to patients at high risk for seizures is more likely to achieve the intended benefit (preventing seizures) while minimizing the risk of side effects in patients at low risk for seizures. Prothrombotic agents (e.g., activated factor VII) [54] are likely to be beneficial in a subset of patients [54,55,56,57,58] but have potential adverse effects (e.g., venous thromboembolism) [59]. Improved patient selection is needed to precisely identify patients most likely to benefit from prothrombotic medications. Some patients require more antihypertensive medications even though these carry an increased risk of adverse effects, particularly acute kidney injury [60]. The clinical phenotypes we present here could support judicious administration of prophylactic seizure medication, prothrombotic agents, and antihypertensive treatment.

Strengths of our approach include the large sample size from multiple centers, which could improve generalization. In addition, we reproduced three clinical phenotypes of ICH in an independent cohort. This external validation supports our results and lays the groundwork for the generalizability of our findings [10, 11].

Our agnostic subtyping approach used clinical variables to describe clinical phenotypes of ICH without prespecified assumptions or human-derived heuristics. Clinical phenotypes of ICH may be most useful in concert with human understanding and as a complement to standard measures of severity (e.g., ICH score). However, there are several limitations to this analysis. Although we were able to analyze data from multiple sources, we limited the scope of our analysis to patient data from the United States. International data (particularly from Asia) were available from the ATACH-II trial, but we used domestic patients for consistency with the two other US cohorts. Patients from Asia also have regional differences in treatments and outcomes after ICH [25]. These differences could yield different clinical phenotypes depending on individual patient populations or country of origin. Additionally, the ATACH-II trial had a relatively low incidence of seizures, potentially diminishing our power to detect a difference in the external validation cohort. Future research may attempt to replicate our findings in cohorts representing different demographic groups and patients from other institutions. Another limitation was the lack of baseline mRS data for comparison with the reported mRS scores at 3-months’ follow-up. However, this seems unlikely to meaningfully change our analysis.

Although our use of routine clinical data enhances the practical utility of understanding the natural segmentation of patients with ICH, our analysis was limited by the inclusion of only basic radiographic descriptors and the exclusion of more diverse imaging, biomarker, and clinical features data. Future research may attempt to replicate these phenotypes and incorporate other clinical features that were not available in the three data sources we harmonized (e.g., presence of abnormal vascular lesions, renal function, pretreatment with antiplatelet medications, low-density lipoprotein levels, toxicology screen results, microbleeds, presence of dementia). These more diverse data sets may further enable the detection of novel patterns and additional clinical phenotypes of ICH with unsupervised machine learning.

We used a machine learning method to identify clinical phenotypes from the data because existing etiologic classifications of intracranial hemorrhage (e.g., Structural lesion, Medication, Amyloid angiopathy, Systemic/other disease, Hypertension, Undetermined (SMASH-U)), which apply criteria such as structural lesions (e.g., vascular malformations), leave the etiology unclear in > 20% of patients [61]. Although the clinical phenotypes we describe can be mapped in part onto SMASH-U, they have the advantage of accounting for multiple variables present on admission. The variables we used to define phenotypes (e.g., hematoma location, blood pressure) are routinely collected. Even so, not all clinical phenotypes are represented. For example, cerebellar hematoma with brainstem compression is well described, but patients with cerebellar hematomas were excluded from ATACH-II, so the algorithm could not identify it. The clinical phenotypes we describe may be useful for guiding targeted assessments and treatments, even if they do not include every conceivable phenotype. It is possible that other clinical phenotypes of ICH might be discerned from additional data sets, a topic for future research.

Conclusions

In summary, we identified clinical phenotypes of ICH from admission data using unsupervised machine learning. These phenotypes were clinically meaningful and were associated with complications and with functional outcomes at follow-up. Identifying clinical phenotypes at admission could lead to more targeted patient care by anticipating risk factors, complications, and outcomes.