Introduction

Intracerebral hemorrhage (ICH) is the second most common stroke subtype and remains a significant cause of morbidity and mortality [1,2,3]. Rapid diagnosis and attentive management of patients with ICH are crucial, since early deterioration is common in the first few hours after onset [4]. Computed tomography (CT) is the most common imaging modality for detecting acute ICH. The initial hematoma volume, combined with other clinical and radiological factors, is considered a strong predictor of clinical outcome and mortality after ICH [5,6,7,8,9]. However, hematoma volume measurement requires manual interpretation, which carries inherent subjectivity [10, 11]. 3D hematoma volumetry may be more accurate and reproducible than the classic ABC/2 method for assessing hematoma volume [12,13,14,15]. Furthermore, the 3D hematoma segmentation that volumetry requires would allow automated densitometric and geometric analysis without the need for operator input. Such a comprehensive hematoma analysis may complement traditional head CT interpretation, since CT density patterns and geometric characteristics of the hematoma may contribute to the prediction of ICH outcome [16,17,18,19,20].
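The ABC/2 bedside estimate approximates the hematoma as an ellipsoid. To make the contrast with 3D volumetry concrete, the minimal Python sketch below compares a voxel-count volume with the ABC/2 estimate on a synthetic ellipsoid. It assumes A, B, and C are the three orthogonal diameters in cm, which simplifies the bedside method (where C is usually derived from slice count and thickness); the ellipsoid, grid step, and function names are illustrative only.

```python
import math

def voxel_volume_ml(n_voxels, spacing_mm):
    """3D volumetry: voxel count times voxel volume (1 ml = 1000 mm^3)."""
    dx, dy, dz = spacing_mm
    return n_voxels * dx * dy * dz / 1000.0

def abc_over_2_ml(a_cm, b_cm, c_cm):
    """ABC/2 estimate from three orthogonal diameters in cm (1 cm^3 = 1 ml)."""
    return a_cm * b_cm * c_cm / 2.0

def ellipsoid_voxels(a_mm, b_mm, c_mm, step_mm=1.0):
    """Count voxel centers of a regular grid lying inside an ellipsoid with semi-axes a, b, c (mm)."""
    def axis_range(r):
        n = math.ceil(r / step_mm)
        return range(-n, n + 1)
    count = 0
    for i in axis_range(a_mm):
        for j in axis_range(b_mm):
            for k in axis_range(c_mm):
                x, y, z = i * step_mm, j * step_mm, k * step_mm
                if (x / a_mm) ** 2 + (y / b_mm) ** 2 + (z / c_mm) ** 2 <= 1.0:
                    count += 1
    return count

# Synthetic hematoma: semi-axes 25, 20, 15 mm, i.e. diameters 5, 4, 3 cm
true_ml = 4 / 3 * math.pi * 25 * 20 * 15 / 1000.0  # exact ellipsoid volume
voxel_ml = voxel_volume_ml(ellipsoid_voxels(25, 20, 15), (1.0, 1.0, 1.0))
abc_ml = abc_over_2_ml(5, 4, 3)
```

Even for this ideal ellipsoid, ABC/2 uses 1/2 in place of π/6 and underestimates by a factor of 3/π (about 4.5%); for irregular hematomas, where the ellipsoid assumption does not hold, the two estimates can diverge further, whereas voxel counting makes no shape assumption.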

The primary objective of this study was to evaluate the performance of models predicting ICH outcome using a prototype hematoma segmentation software that provides 3D volumetry as well as quantification of hematoma density and shape. We hypothesized that user-independent quantitative hematoma characteristics derived from automated 3D segmentation of ICH (i.e., volume, shape irregularity, and density profiles), combined with other imaging or clinical variables at admission, would allow accurate prediction of both mortality and poor outcome at discharge. Our secondary objective was to introduce quantitative methods for analyzing hematoma CT density and shape. These methods range from a simple metric of hematoma shape irregularity, the sphericity index (SI), to a data-driven approach, functional principal component (FPC) analysis, that captures the variability of density profiles among hematomas. Both objectives support our long-term goal of designing an automated hematoma assessment tool for bedside use.

Materials and Methods

The Institutional Review Boards at Hennepin Healthcare System and Fairview Health Services in Minneapolis, Minnesota, approved the study prior to initiation. Consecutive spontaneous ICH admissions to the two healthcare systems between January 2012 and December 2017 were included. The data collection protocol has been described previously [21, 22]. ICH subjects were identified using the International Classification of Diseases (ICD) coding system, with primary or secondary diagnosis codes of 431, 432, and 432.9 in the 9th Revision (ICD-9) and I61.0–I61.9 and I62.9 in ICD-10 (when applicable). Cases were cross-matched with a prospectively collected stroke database to account for missing cases and coding misclassification. We excluded hemorrhage secondary to other etiologies, such as hemorrhagic transformation, venous infarction, trauma, tumor (primary or cerebral metastasis), or ruptured vascular malformation, as well as subarachnoid, subdural, or epidural hemorrhage.
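The code-based screening step amounts to a set lookup over each admission's primary and secondary diagnosis codes. The sketch below is illustrative only: the code lists are copied from the text, while the string normalization (upper-casing and inserting a missing dot, e.g. "I615" to "I61.5") and the function names are assumptions about how codes are stored. Cross-matching with the stroke database and exclusion of secondary causes still require chart review and are not shown.

```python
# Code lists as given in the text; normalization rules below are assumptions.
ICD9_ICH = {"431", "432", "432.9"}
ICD10_ICH = {f"I61.{d}" for d in range(10)} | {"I62.9"}

def normalize_code(code: str) -> str:
    """Upper-case, strip whitespace, and insert a dot after the 3rd character if missing."""
    c = code.strip().upper()
    if "." not in c and len(c) > 3:
        c = c[:3] + "." + c[3:]
    return c

def is_candidate_ich(primary: str, secondary: list) -> bool:
    """True if any primary or secondary diagnosis code is in the ICH code lists."""
    codes = {normalize_code(c) for c in [primary, *secondary]}
    return bool(codes & (ICD9_ICH | ICD10_ICH))
```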

Demographic data, medical history, time of symptom onset, vital signs, laboratory tests, initial CT scan, in-hospital treatment, and length of stay were recorded on a standardized data collection form. Baseline neurological and functional deficits were documented using the National Institutes of Health Stroke Scale (NIHSS), Glasgow Coma Scale (GCS), ICH score [23], and modified Rankin Scale (mRS), when available. Outcomes at discharge were defined as death and a composite unfavorable outcome (severe disability or death, mRS 4–6).
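For reference, the ICH score [23] and the two discharge endpoints can be computed as below. The point assignments follow the commonly cited original ICH score (GCS 3–4: 2 points, GCS 5–12: 1 point; volume ≥ 30 ml, IVH, infratentorial origin, and age ≥ 80 years: 1 point each) and should be checked against [23]; the function names are hypothetical.

```python
def ich_score(gcs: int, volume_ml: float, ivh: bool, infratentorial: bool, age: int) -> int:
    """ICH score (0-6) as commonly described; verify point rules against the original publication."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS must be between 3 and 15")
    score = 2 if gcs <= 4 else (1 if gcs <= 12 else 0)
    score += 1 if volume_ml >= 30 else 0
    score += 1 if ivh else 0
    score += 1 if infratentorial else 0
    score += 1 if age >= 80 else 0
    return score

def discharge_outcomes(mrs: int) -> dict:
    """Binary endpoints as defined in the text: death = mRS 6; composite unfavorable = mRS 4-6."""
    if not 0 <= mrs <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return {"death": mrs == 6, "unfavorable": mrs >= 4}
```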

Post-processing of Hematoma in Initial CT Scan

Initial CT scans were post-processed using a research version of Vitrea software (v7, Vital Images, Minnetonka, MN, USA). After skull removal, semiautomatic hematoma segmentation was applied. For each case, the user defined the bounding box within which segmentation was applied, using a one-click "extend" action on a single axial or coronal view. The algorithm then segmented the hematoma based on a 3D histogram analysis of CT densities and a series of morphologic operations. Hematoma volume and hematoma density (mean and SD) were then computed automatically, and 3D histograms of hematoma density were exported for further analysis (Fig. 1a).
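The Vitrea segmentation algorithm itself is proprietary and is not reproduced here. Assuming a binary hematoma mask is already available, the sketch below shows how the exported quantities (volume, mean and SD density, density histogram) follow from the mask and the CT voxel spacing. The 1-HU bins over 0–100 HU and the function name are assumptions, and the simple threshold in the check is only a stand-in for a real segmentation.

```python
import numpy as np

def hematoma_stats(ct_hu, mask, spacing_mm, hu_range=(0.0, 100.0), bin_width=1.0):
    """Volume (ml), mean/SD density (HU), and a normalized HU histogram inside a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    values = np.asarray(ct_hu, dtype=float)[mask]
    if values.size == 0:
        raise ValueError("empty mask")
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0  # mm^3 per voxel -> ml
    edges = np.arange(hu_range[0], hu_range[1] + bin_width, bin_width)
    counts, _ = np.histogram(values, bins=edges)  # voxels outside hu_range are not binned
    total = counts.sum()
    return {
        "volume_ml": values.size * voxel_ml,
        "mean_hu": float(values.mean()),
        "sd_hu": float(values.std(ddof=1)) if values.size > 1 else 0.0,
        "hist": counts / total if total else counts.astype(float),
        "bin_edges": edges,
    }
```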

Fig. 1

a Workflow for hematoma segmentation and quantitative analysis. b Four segmented hematomas shown as 3D models with their corresponding SI values

To assess the hematoma shape irregularity (Fig. 1b), we used the following sphericity definition as a shape irregularity index:

$$ {\text{Sphericity}}\;{\text{Index}}\; \left( {\text{SI}} \right) = \pi^{1/3} (6V)^{2/3} / A $$

where V is the volume and A the surface area of the hematoma. The SI quantifies the irregularity of the hematoma shape relative to a sphere, which has the maximum value SI = 1.0 [24, 25]. In practice, SI values for brain hematomas range on a continuous scale from about 0.80–0.90 for mostly regular ellipsoids to 0.2–0.4 for highly irregular shapes.
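The SI formula translates directly into code. The minimal sketch below checks it on analytic shapes: a sphere gives exactly 1, a cube gives (π/6)^(1/3) ≈ 0.806, and an elongated 1 × 1 × 10 box scores lower still. How V and A were obtained from the segmentation is not specified in the text, so here they are simply inputs.

```python
import math

def sphericity_index(volume: float, area: float) -> float:
    """SI = pi^(1/3) * (6V)^(2/3) / A; 1.0 for a sphere, smaller for more irregular shapes."""
    if volume <= 0 or area <= 0:
        raise ValueError("volume and area must be positive")
    return math.pi ** (1 / 3) * (6 * volume) ** (2 / 3) / area

r = 2.0
si_sphere = sphericity_index(4 / 3 * math.pi * r ** 3, 4 * math.pi * r ** 2)
si_cube = sphericity_index(1.0, 6.0)    # unit cube
si_rod = sphericity_index(10.0, 42.0)   # 1 x 1 x 10 box
```

On real masks, A is usually measured on a surface mesh (for example with scikit-image's `measure.marching_cubes` followed by `measure.mesh_surface_area`); counting exposed voxel faces instead tends to overestimate the area because of the staircase effect, which would bias SI downward.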

To assess intra- and inter-rater variability, two trained readers each segmented the same 50 hematomas twice using the hematoma probe prototype. For ground-truth measurements, all hematoma volumes were also measured by semiautomated planimetry using the Medical Image Processing, Analysis, and Visualization (MIPAV, National Institutes of Health) software [12, 22].

Midline shift (MLS) was coded both as a binary variable (yes/no) and, alternatively, as a 4-class ordinal variable (0–2 mm [no MLS], 2–5 mm, 5–10 mm, and ≥ 10 mm). MLS was measured as the perpendicular distance between the septum pellucidum and a midline connecting the free edges of the anterior and posterior falx [26].
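MLS measurement reduces to a point-to-line distance in the axial plane. The sketch below computes it from three landmark coordinates in mm (landmark placement is manual and not shown) and applies the binary and 4-class coding. Because the text gives shared interval bounds, assigning exactly 2 and 5 mm to the lower class, exactly 10 mm to the top class, and treating "yes" as > 2 mm are assumptions; the function names are hypothetical.

```python
import math

def perpendicular_distance_mm(septum, falx_anterior, falx_posterior):
    """Distance from the septum pellucidum point to the line through the two falx points (2D, mm)."""
    (px, py), (ax, ay), (bx, by) = septum, falx_anterior, falx_posterior
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("falx landmarks must be distinct")
    # |cross product of AB and AP| / |AB| is the perpendicular distance
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def code_mls(shift_mm):
    """Binary (yes = > 2 mm) and ordinal (0: 0-2, 1: 2-5, 2: 5-10, 3: >= 10 mm) MLS coding."""
    if shift_mm <= 2:
        ordinal = 0
    elif shift_mm <= 5:
        ordinal = 1
    elif shift_mm < 10:
        ordinal = 2
    else:
        ordinal = 3
    return {"binary": ordinal > 0, "ordinal": ordinal}
```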

From the hematoma CT density curves, we extracted the mean and standard deviation of CT density (HU) and three continuous variables (FPC1, FPC2, and FPC3) describing the variation among these curves, using the functional data analysis described below.

Statistical Methods

Statistical analysis was performed in the R statistical programming environment (version 3.2.0, R Core Team, Vienna, Austria). Characteristics by outcome (in-hospital mortality and severe disability, mRS 4–6) were summarized as median and interquartile range (IQR) and compared using Mann–Whitney tests for continuous variables and Fisher exact tests for binary variables. Agreement between automated and reference volume measurements, as well as intra- and inter-rater agreement, was assessed using the intraclass correlation coefficient (ICC).
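The text does not state which ICC form was used. As an assumption, the sketch below implements ICC(2,1) of Shrout and Fleiss (two-way random effects, absolute agreement, single measurement) from two-way ANOVA mean squares; rows are hematomas and columns are the methods, readers, or reading sessions being compared. The confidence intervals reported in the Results are omitted.

```python
import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    ratings: array-like of shape (n_subjects, k_raters_or_methods).
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between-subject
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between-rater/method
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Absolute agreement penalizes a systematic offset between methods: for volumes [[1, 2], [2, 3], [3, 4]] (a constant +1 ml bias), a consistency-type ICC would equal 1, whereas ICC(2,1) is 2/3.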

The quantitative (continuous) and qualitative (binary) predictors of patient outcome were jointly analyzed using a mixed-type data approach [27], in which generalized association values (0–100%) were computed between quantitative and qualitative predictors and summarized in a heatmap (see Figure 1s in Supplemental Materials). Confounder plots were generated to visualize the association of each predictor with patient outcome, taking hematoma volume as the reference major predictor (see Figures 2s and 3s in Supplemental Materials).

Univariate analysis of the predictors was performed for both unfavorable outcome at discharge (mRS 4–6) and in-hospital mortality, using the area under the receiver operating characteristic curve (ROC AUC) for continuous predictors and odds ratios (OR) for binary predictors. Optimal cutoff points were determined with the Youden index. AUC confidence intervals were computed with DeLong's method, and differences between AUCs were tested with two-sided DeLong tests for correlated AUCs.

In the multivariable analysis, linearity assumptions and interactions among the main predictors in the logistic regression models were assessed using nonlinear regression plots, ANOVA, and Wald tests [28] (see Figures 5s–6s and Tables 1s and 2s in Supplemental Materials). Multivariable logistic regression models and one random forest model were evaluated for their ability to predict in-hospital mortality or unfavorable outcome, using custom programs written in R [29]. Classifiers were trained on a random 80% sample of subjects with tenfold cross-validation, and predictive performance was tested on the remaining 20%, held out as an independent test set. Performance was assessed as ROC AUC, sensitivity, specificity, and accuracy.

Initial variable selection was based on the relative univariate AUCs of the variables, the results of interaction tests, and the degree of correlation between model variables in the mixed-data analysis (see Supplemental Materials). Model selection was based on the AUC criterion. Calibration was assessed to evaluate the quality of the predicted probabilities of the different models, using a bootstrap method detailed in Supplemental Materials (Figures 7s–16s).
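The analyses were run in R, where DeLong intervals and tests and Youden cutoffs are available in packages such as pROC; the text does not name the implementation used. To show what these quantities are, the dependency-free Python sketch below computes the empirical AUC, its DeLong 95% confidence interval, the two-sided DeLong test for two correlated AUCs, and the Youden-optimal cutoff. Function names and the "score ≥ cutoff is positive" convention are assumptions, and each outcome class needs at least two cases.

```python
import math

def _split(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    return pos, neg

def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for a tie
    return 1.0 if x > y else (0.5 if x == y else 0.0)

def _components(pos, neg):
    # DeLong structural components: V10 per positive case, V01 per negative case
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01

def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def auc_delong_ci(scores, labels, z=1.959964):
    """Empirical AUC with a DeLong normal-approximation 95% CI, clipped to [0, 1]."""
    pos, neg = _split(scores, labels)
    v10, v01 = _components(pos, neg)
    auc = sum(v10) / len(v10)
    se = math.sqrt(_cov(v10, v10) / len(pos) + _cov(v01, v01) / len(neg))
    return auc, (max(0.0, auc - z * se), min(1.0, auc + z * se))

def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for two correlated AUCs computed on the same subjects."""
    a10, a01 = _components(*_split(scores_a, labels))
    b10, b01 = _components(*_split(scores_b, labels))
    m, n = len(a10), len(a01)
    diff = sum(a10) / m - sum(b10) / m
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    z = diff / math.sqrt(var)
    return diff, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

def youden_cutoff(scores, labels):
    """Cutoff t maximizing J = sensitivity + specificity - 1, calling score >= t positive."""
    pos, neg = _split(scores, labels)
    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        sens = sum(s >= t for s in pos) / len(pos)
        spec = sum(s < t for s in neg) / len(neg)
        if sens + spec - 1 > best_j:
            best_t, best_j = t, sens + spec - 1
    return best_t, best_j
```

In `delong_test`, the covariance terms between the two predictors' components are what make the test appropriate for correlated AUCs measured on the same patients; dropping them would treat the AUCs as independent.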

Functional Data Analysis of Hematoma CT Density Curves

A functional data analysis was performed on the hematoma density curves to visualize their main modes of variation in our cohort and to extract data-driven density features for outcome classification [30, 31]. The original CT histograms were converted to smooth curves defined between 0 and 100 HU [32]. Three FPCs, explaining 87.7% of the variation between density curves, were selected following Petersen and Müller's FPC method for frequency distributions [33]. The resulting FPC scores (FPC1, FPC2, and FPC3) are mutually uncorrelated variables that summarize the variation in the density curves and were added to our set of patient outcome predictors (Fig. 2); further details of the statistical analysis are given in Supplemental Materials.
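The authors followed Petersen and Müller's approach for densities [33], which transforms each density before the principal component analysis. The sketch below is deliberately simpler and is not that method: it smooths each hematoma's voxel HU values into a density on 0–100 HU with a Gaussian kernel (the 3-HU bandwidth and 0.5-HU grid are assumptions) and then runs ordinary FPCA on the discretized curves via the singular value decomposition. It only illustrates where FPC scores and explained-variation percentages come from.

```python
import numpy as np

GRID = np.linspace(0.0, 100.0, 201)  # 0.5-HU grid over the 0-100 HU range used in the text

def density_curve(hu_values, bandwidth=3.0, grid=GRID):
    """Gaussian kernel density of voxel HU values, renormalized to integrate to 1 on the grid."""
    x = np.asarray(hu_values, dtype=float)[:, None]
    k = np.exp(-0.5 * ((grid[None, :] - x) / bandwidth) ** 2).sum(axis=0)
    dx = grid[1] - grid[0]
    area = (k.sum() - 0.5 * (k[0] + k[-1])) * dx  # trapezoidal rule on a uniform grid
    return k / area

def fpca(curves, n_components=3):
    """Ordinary FPCA of curves on a common uniform grid via SVD of the centered data matrix."""
    X = np.asarray(curves, dtype=float)
    mean_curve = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean_curve, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)               # fraction of total variation per component
    scores = U[:, :n_components] * s[:n_components]   # FPC scores, one row per hematoma
    return mean_curve, Vt[:n_components], scores, explained[:n_components]
```

Because the grid is uniform, the quadrature weight only rescales the scores and leaves the explained-variation fractions unchanged. One common way to draw mode-of-variation panels is to plot the mean curve plus a score percentile times the corresponding component.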

Fig. 2

Functional principal component (FPC) scores for hematoma density profiles. a FPC1: first mode of variation in the hematoma density curves, starting from highly heterogeneous, higher-density curves (10th percentile, light gray curve), moving through heterogeneous profiles (25th percentile, mean, 75th percentile), and ending with more homogeneous, lower-density hematomas (90th percentile, black curve). b FPC2: second, independent mode of variation, starting from a highly heterogeneous pattern of mixed low (predominant) and mid densities (10th percentile, light gray) and ending with a moderately high-density profile (90th percentile, black). c FPC3: third, independent mode of variation, in which hematoma density transitions from homogeneous curves centered around 55 HU (10th percentile, light gray) to bimodal curves dominated by high densities (around 65–70 HU) with a lower-density component (40–45 HU) (90th percentile, black).

Results

We included 430 subjects in the present analysis. The usability of the current automated segmentation was assessed on 30 randomly selected cases using the number of user clicks required to obtain the final segmentation (median: 1, IQR: 1–3). The median segmentation time, including computation and the user's visual assessment, was 13.7 s (IQR: 16.9 s). In a random sample of 50 cases, the ICC between automated and MIPAV volumes was 0.91 (95% CI 0.81–0.96). On the same sample, the intra-rater and inter-rater agreements for automated segmented hematoma volume were 0.98 (95% CI 0.97–0.99) and 0.96 (95% CI 0.94–0.98), respectively.

Baseline characteristics are shown in Table 1. Table 2 shows the ROC AUC, sensitivity, specificity, and cutoff values for the variables used in the predictive models. The imaging findings shown in Table 3 were significantly associated with death and poor functional outcome (mRS 4–6). Table 4 summarizes the prediction models for mortality and composite unfavorable outcome (mRS 4–6) at discharge. Our best logistic regression model for mortality included age, admission GCS, hematoma volume, MLS, the hematoma volume × MLS interaction, and hemorrhage in the pons. Our best logistic regression model for composite unfavorable outcome at discharge included age, admission glucose, admission GCS, hematoma volume, SI, MLS, the hematoma volume × SI and hematoma volume × GCS interactions, and hemorrhage in the pons. Beyond classification accuracy, calibration assessment confirmed that all models predicting death, and the two most accurate models predicting composite unfavorable outcome, provided acceptable estimates of outcome probabilities, with mean absolute errors between 0.007 and 0.022 (see Figures 7s–16s in Supplemental Materials).
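The best models include product terms such as hematoma volume × MLS. As an illustration of what such a term means in a logistic model, the sketch below fits a logistic regression by Newton-Raphson with an explicit interaction column. The variables, the simulated data in the check, and the function names are hypothetical; the authors used custom R programs whose details are not given here.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson. X must include an intercept column."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                          # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def design(volume_ml, mls, age):
    """Hypothetical design matrix: intercept, volume, binary MLS, age, and the volume x MLS product."""
    v = np.asarray(volume_ml, dtype=float)
    m = np.asarray(mls, dtype=float)
    a = np.asarray(age, dtype=float)
    return np.column_stack([np.ones_like(v), v, m, a, v * m])
```

With the interaction, the change in log-odds per ml of hematoma volume is β_volume without MLS and β_volume + β_volume×MLS with MLS, so the effect of volume is allowed to differ between the two groups.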

Table 1 Baseline continuous characteristics of ICH per in-hospital mortality and disability outcome
Table 2 In-hospital death and composite unfavorable outcome cutoff points
Table 3 Imaging predictors of in-hospital outcomes
Table 4 Prediction models for in-hospital mortality and severe disability (mRS 4–6) at discharge

Discussion

Among admission imaging variables, ICH volume has long been known as a strong predictor of mortality [34]. Our univariate analysis confirmed the value of ICH volume for predicting in-hospital death (AUC: 0.82 [0.83–0.89]) and composite unfavorable outcome (AUC: 0.83 [0.79–0.86]). The optimal hematoma volume cutoff was > 41.6 ml (Table 2). Other studies have reported cutoffs of > 32 ml (supratentorial) and > 21 ml (infratentorial) for predicting 30-day mortality [35], 40 ml for predicting poor outcome in patients with lobar hemorrhage [36], and > 60 ml for the highest mortality rate [37]. These differences in cutoff values may reflect different outcome scales, follow-up times, or study cohorts [35,36,37].

Our multivariable analysis revealed a significant interaction between hematoma volume and MLS (p = 0.038) for mortality prediction, adjusted for SI and hemorrhage in the cerebellum and the pons (see Figure 5s and Table 1s in Supplemental Materials). The strongly nonlinear change in OR for the highest ordinal MLS class (> 10 mm; see Table 3), together with the MLS × volume interaction in our model, may explain why the 4-class MLS offered no significant benefit over binary MLS.

We also observed a significant interaction between hematoma volume and GCS (p = 0.004) for severe disability prediction, adjusted for hemorrhage in the pons and age (see Figure 6s and Table 2s in Supplemental Materials). Regarding SI, irregularly shaped hematomas were associated with larger hematoma volume, intraventricular hemorrhage (IVH), death, and severe disability at discharge. The SI cutoff values for both mortality and severe disability were < 0.54. SI remained a significant predictor of both death (p < 0.001) and composite unfavorable outcome (p = 0.002) after adjustment for hematoma volume and MLS. Previous studies that defined hematoma shape irregularity by visual inspection have shown a relationship between irregular shape and poor clinical outcome at 30 days and 3 months after ICH [19, 38]. The underlying mechanism might be the higher risk of hematoma expansion (HE) in irregularly shaped hematomas, since HE predicts poor clinical outcome in patients with ICH [19].

Our univariate analysis showed that heterogeneous hematomas with higher-density profiles were only moderately associated with higher mortality (FPC1, AUC: 0.64) and severe disability (FPC1, AUC: 0.62). FPC1 was also moderately associated with hematoma volume (see Figures 1s–4s in Supplemental Materials), which partly explains the increased mortality associated with higher-density, heterogeneous FPC1 profiles. However, FPC1 remained a significant predictor of mortality after adjustment for hematoma volume (Wald test, p = 0.0272), although it was only marginally significant (p = 0.052) in a mortality model adjusted for both hematoma volume and location. A hematoma with heterogeneous density might reflect active hemorrhage, an extended hemorrhagic time course, and multifocality, which may predict HE and worse outcomes [39]. FPC2 was not significantly associated with patient outcome. Notably, Nowinski et al. [40] presented CT density profiles of IVH-ICH acquired at different times during the acute and subacute phases (1–7 days) that are strikingly similar to the FPC2 curves in Fig. 2, suggesting a possible (yet unconfirmed) common interpretation in terms of hematoma maturation. For FPC3, heterogeneous bimodal low- and high-density curves were also associated with mortality compared with unimodal medium-density curves. These results are consistent with previous studies reporting hematoma density heterogeneity as indicative of poor clinical outcome and mortality [16, 20, 39]. In an exploratory sub-study of the 316 cases in our dataset with a follow-up CT scan, FPC3 was the strongest significant CT imaging predictor (p = 0.0138) of HE, defined as an increase of > 33% or > 6 ml (see Supplemental Materials).
FPC3 remained significantly associated with HE when combined with the other significant baseline clinical predictors, INR (p = 0.0015), creatinine (p = 0.0042), and GCS score (p = 0.0372), with a modest overall accuracy for the logistic regression model (AUC: 0.70 [0.65–0.75]). Moreover, FPC3 values were significantly higher in the HE group (n = 52) than in the no-HE group (n = 378) (Mann–Whitney test, p = 0.0009; see Figures 16s–17s in Supplemental Materials). These findings are consistent with previous studies showing the role of heterogeneous hematoma density in predicting HE [16,17,18, 20].
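The HE definition used in the exploratory sub-study can be written as a small predicate. This is a sketch: the thresholds come from the text, but whether 33% was applied as 0.33 or exactly one third, and whether the comparisons are strict, are assumptions.

```python
def hematoma_expansion(baseline_ml: float, followup_ml: float) -> bool:
    """HE: relative volume increase > 33% or absolute increase > 6 ml between baseline and follow-up CT."""
    if baseline_ml <= 0:
        raise ValueError("baseline volume must be positive")
    growth = followup_ml - baseline_ml
    return growth > 6.0 or growth / baseline_ml > 0.33
```

The two criteria complement each other: the relative threshold catches growth in small hematomas (3 to 4.5 ml is +50%), while the absolute threshold catches growth in large ones (30 to 37 ml is only +23%).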

Nonetheless, further investigation of hematoma density profiles is needed to fully understand how they affect outcome. Our study also showed that IVH was significantly associated with mortality and severe disability, in line with previous studies [41]. However, IVH did not improve outcome prediction (data not shown) and was therefore excluded from the final models.

The best models showed high predictive accuracy for in-hospital mortality, both with imaging and clinical variables (Model E1: AUC: 0.95 [0.90–0.99]) and with imaging-only variables (Model C1: AUC: 0.91 [0.86–0.97]). Similarly, our best models for composite unfavorable outcome showed high predictive accuracy with imaging and clinical variables (Model E2: AUC: 0.92 [0.86–0.98]) and with imaging-only variables (Model C2: AUC: 0.88 [0.80–0.95]). These results agree with previous predictive models summarized in the meta-analysis by Gregório et al. [42], which reported pooled C statistics (AUC) of 0.880 for mortality prediction and 0.872 for functional outcome prediction. Logistic regression models performed as well as more complex nonlinear models such as random forests (see Model C1 in Table 4), also in agreement with Gregório et al. [42].

Because logistic regression models are also fast to compute and easy to interpret, they are the best candidates for future automated classifiers. We separated the predictive models into an imaging-only group (A1-2, B1-2, and C1-2) and a composite imaging-clinical group (D1-2 and E1-2) to anticipate practical challenges with the second group: automatic retrieval of clinical variables in emergency settings is more complex than access to imaging data alone, since CT images typically carry only basic demographic information (i.e., age and gender). Despite better average performance, the best composite models were not significantly more accurate than the best imaging-only models. Moreover, redundancy among predictors may be beneficial in clinical applications when clinical or imaging data are missing or borderline. Nonetheless, given the mildly restrictive inclusion and exclusion criteria used in this study, the performance of both types of models supports the feasibility of future automated predictive systems.

Limitations of the study include methodological issues inherent to its retrospective design, which cannot be ruled out. Additionally, only hospital discharge outcomes were considered, and associations with long-term functional outcomes remain to be validated. Apart from an exploratory review of HE predictors including FPC3, this study did not assess HE, because a substantial number of patients (n = 114) lacked a follow-up CT scan for reasons such as death, care directives initiated before the follow-up scan, or hematoma evacuation within the first 24 h of admission. Although statistical techniques exist to adjust for such missing follow-up data, the task is not straightforward and may lead to unreliable results. Nonetheless, the objective of the present study was to predict outcome solely from admission data. We did not compare 3D hematoma volume measurement with the ABC/2 method, since planimetry and 3D techniques have already been compared with ABC/2, highlighting its shortcomings [12]. Finally, validation of our hematoma segmentation was based on volume comparison only; a more accurate voxel-based measure, such as the Dice similarity coefficient or Jaccard index, will be needed to validate the performance of the final version of the software.
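Why volume agreement is weaker than voxel-wise agreement can be illustrated directly. The minimal sketch below computes the Dice similarity coefficient, 2|A∩B|/(|A|+|B|), and the Jaccard index, |A∩B|/|A∪B|, for two masks represented as sets of voxel indices (a representation chosen here for brevity). Two masks with identical volumes but no overlap score 0 on both, even though their volumes agree perfectly.

```python
def dice_jaccard(mask_a, mask_b):
    """Dice and Jaccard overlap for two segmentations given as iterables of (i, j, k) voxel indices."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0, 1.0  # convention: two empty masks agree perfectly
    inter = len(a & b)
    dice = 2 * inter / (len(a) + len(b))
    jaccard = inter / len(a | b)
    return dice, jaccard

# Same volume (2 voxels each), no overlap: volumes agree, overlap metrics do not
print(dice_jaccard({(0, 0, 0), (0, 0, 1)}, {(5, 5, 5), (5, 5, 6)}))  # (0.0, 0.0)
```

The two metrics are monotonically related (J = D / (2 − D)), so they rank segmentations identically; reporting either one adds the spatial information that a volume comparison lacks.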

Regarding the proposed quantitative features, SI mostly represents large-scale shape regularity and therefore cannot capture smaller-scale heterogeneity (e.g., in lobar hematomas). Future studies may consider recent advances applied to tumor shapes, such as statistical shape analysis [43], 3D spherical harmonics [44], or topological data analysis [45]. Lastly, statistical tests applied to the predictive models (imaging-only or composite) could not establish that models using the new quantitative features are more accurate than their counterparts without these features. Therefore, further validation with large independent datasets will be required.

Conclusion

Automated 3D hematoma segmentation on CT images can provide a comprehensive set of user-independent volumetric, shape-related, and densitometric quantitative predictors of ICH patient outcome. Predictive models combining these imaging predictors with other admission clinical variables predicted mortality and disability with high accuracy, opening the way to integrated decision support tools for clinicians at the time of admission. Future work should determine the best options for such tools, including the degree of integration of clinical information from the medical history and patient monitoring systems, touch screen-based user-initiated segmentation, visualization of outcome probabilities conditioned on treatment options, and database queries for similar past cases.