Introduction

Worldwide, ischemic stroke causes considerable mortality and morbidity. Up to 50% of stroke survivors are severely disabled and require chronic care, with associated health care expenditures [1, 2]. Predicting functional outcome for an individual stroke patient remains difficult, despite the existence of several prognostic scoring systems such as ASTRAL, CT-DRAGON, iSCORE, PLAN and THRIVE [3,4,5]. A higher number of input variables and the inclusion of comorbidities and radiologic variables significantly improve discriminative performance [6]. However, these scoring systems are underused in clinical practice because their input variables are complex and often unavailable on admission (e.g. neuroimaging results) [3, 5].

In clinical trials and benchmarking initiatives, prognostic scoring systems are used for case-mix adjustment, because case-mix confounds the relationship between outcome and treatment effect (in clinical trials) or quality of care (in benchmarking). In the MR CLEAN randomized clinical trial, for example, the primary endpoint was functional outcome measured by the modified Rankin Scale (mRS) at 90 days (± 14 days) after stroke, and adjustments were made for 6 variables: age, sex, pre-stroke mRS, time from onset to randomization, stroke severity (National Institutes of Health Stroke Scale, NIHSS) and collateral status [7]. The proposed Belgian pay-for-performance indicator for acute ischemic stroke is hospital mortality. Case-mix adjustment is to be performed with hierarchical logistic regression on age, sex, Charlson Comorbidity Index (CCI), place of hospitalization and year of registration. Stroke severity is not to be taken into account, and data are to be retrieved from administrative databases [8].

Recently, a score derived from CT-DRAGON was established [4]. In a monocentric study, the combination of NIHSS, age and pre-stroke mRS, the 'reduced features set', was found to predict both favorable and poor outcome [4]. Because these variables are immediately available and rarely missing, the reduced features set may be useful for predicting functional outcome and, eventually, for benchmarking.

In this study, we combined routine clinical data from three major stroke centers in Belgium to investigate whether the reduced features set of age, pre-stroke mRS and NIHSS could reliably predict long-term functional outcome after stroke, and whether treatment modality influenced long-term outcome.

Methods

Patient population

This retrospective study included patients who presented with acute ischemic stroke between 1 March 2019 and 31 December 2019 at three Belgian comprehensive stroke centers: AZ Groeninge (Kortrijk, Belgium), University Hospitals Leuven (Leuven, Belgium) and Ziekenhuis Oost-Limburg (ZOL) (Genk, Belgium). Patients were treated according to current stroke guidelines [1] with thrombolysis, thrombectomy, a combination of both, or neither ('conservative treatment').

Patient characteristics such as age, sex, pre-stroke mRS, NIHSS on admission, time from stroke onset to emergency department (ER) admission (onset-to-door time), time from ER admission to imaging (door-to-CT time), time from ER admission to thrombolysis (door-to-needle time, DNT) and time from ER admission to initiation of thrombectomy (door-to-groin time, DGT) were collected according to local standard practice. No specific training was given for data scoring or collection. Missing admission data were not added retrospectively, since we aimed to analyze real-world data.

The study was approved by the ethics committees of AZ Groeninge Kortrijk (AZGS2020022), UZ Leuven (B371201941435) and Ziekenhuis Oost-Limburg Genk (19/0059U).

CT-DRAGON reduced features set and outcome measure

In a monocentric study, bootstrap forest analyses showed that the reduced features set, consisting of age, NIHSS on admission and pre-stroke mRS, accurately distinguished 90-day mRS 0–2 from mRS 5–6 [4]. In that study, outcome was dichotomized into favorable (mRS 0–2) and miserable (mRS 5–6) outcome.

In the present study, the 90-day mRS was categorized as acceptable outcome (mRS 0–2) vs. poor outcome (mRS 3–6) [9]. To determine the 90-day mRS, patients were either re-evaluated by the neurologist or contacted by telephone by a stroke nurse working in the neurological ward.
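The two outcome definitions used in this study (acceptable vs. poor functional outcome here, and survivor vs. non-survivor in the mortality analysis) can be written down as a minimal Python sketch. It assumes the 90-day mRS is stored as an integer from 0 to 6; the function names are hypothetical and not part of any software used in the study.

```python
def _check_mrs(mrs_90: int) -> None:
    """Reject values outside the modified Rankin Scale (0 = no symptoms, 6 = death)."""
    if not 0 <= mrs_90 <= 6:
        raise ValueError(f"mRS must be between 0 and 6, got {mrs_90}")


def acceptable_outcome(mrs_90: int) -> bool:
    """Acceptable (mRS 0-2) versus poor (mRS 3-6) functional outcome."""
    _check_mrs(mrs_90)
    return mrs_90 <= 2


def survivor(mrs_90: int) -> bool:
    """Survivor (mRS 0-5) versus non-survivor (mRS 6)."""
    _check_mrs(mrs_90)
    return mrs_90 <= 5


# mRS 3 counts as a poor outcome, but the patient survived.
print(acceptable_outcome(3), survivor(3))  # False True
```

Note that the CT-DRAGON derivation study compared mRS 0–2 with 5–6 and left mRS 3–4 out, whereas the split used here assigns every patient to one of the two groups.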

Statistics

Continuous variables were expressed as median and interquartile range (IQR) and categorical variables were presented as proportions.

Multivariable logistic regression analysis of 90-day mRS, dichotomized as acceptable (mRS 0–2) versus poor outcome (mRS 3–6) and as survivor (mRS 0–5) versus non-survivor (mRS 6), was performed with age, pre-stroke mRS (ordinal variable across its entire range) and NIHSS on admission as predictors, to validate the reduced features set [4]. To reflect routinely and readily available data as closely as possible, neither univariable analyses nor multiple imputation were performed. In a second analysis, treatment modality (active vs. conservative treatment) was added to the model to investigate whether it influenced the discrimination of long-term outcome; this also allowed the association between treatment and long-term functional outcome to be assessed. Parameter estimates were reported with their standard errors. Discrimination was quantified with the area under the receiver operating characteristic (ROC) curve (AUROC), and the misclassification rate was also reported. Calibration, the agreement between predicted and observed outcomes, was assessed as part of the external validation of the model with the Hosmer–Lemeshow lack-of-fit test [6].
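The model described above is a binary logistic regression fitted by maximum likelihood. The analyses were run in JMP; the following is only an illustrative, dependency-free Python sketch of such a fit using Newton–Raphson iterations, not a reproduction of JMP's internals. It assumes complete cases with numeric predictors and enters pre-stroke mRS as a single numeric term, which is one possible reading of "ordinal variable across its entire range". Function names are hypothetical, and the sketch does not handle complete separation.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the information matrix X'WX."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(X, y):
        x = [1.0] + list(row)  # prepend the intercept column
        mu = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * x[j]
            for k in range(p):
                info[j][k] += w * x[j] * x[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Return (coefficients, standard errors); index 0 is the intercept."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(max_iter):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)  # Newton step: (X'WX)^-1 times the score
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(X, y, beta)
    p = len(beta)
    # Standard errors: square roots of the diagonal of the inverse information.
    se = [math.sqrt(solve(info, [1.0 if i == j else 0.0 for i in range(p)])[j])
          for j in range(p)]
    return beta, se


def predict(beta, X):
    """Predicted event probability for each row of X."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row))) for row in X]
```

In the setting of this study, each row of `X` would hold [age, pre-stroke mRS, NIHSS] for one complete case and `y` the 0/1 outcome indicator; appending a 0/1 column for active treatment would give the second model.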

Statistical significance was set at p < 0.05. Data were analyzed using JMP, version 15.0.0 (SAS Institute Inc, Cary, NC, USA).
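The Hosmer–Lemeshow lack-of-fit test named in the statistics above compares observed and expected event counts across groups of patients ordered by predicted risk. A minimal standard-library Python sketch follows, assuming the conventional ten groups of roughly equal size and a chi-square reference distribution with (groups - 2) degrees of freedom. The grouping rule used by JMP may differ, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability P(X > x); exact closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this sketch only supports positive, even degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # accumulates (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, y, groups=10):
    """Hosmer-Lemeshow statistic and p-value with df = groups - 2.

    Patients are sorted by predicted risk and split into `groups` groups of
    roughly equal size. Assumes every group has 0 < expected events < group size.
    """
    n = len(probs)
    if n < groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(n), key=lambda i: probs[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        observed = sum(y[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        # Same as summing (O - E)^2 / E over events and non-events in the group.
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / n_g))
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A small p-value suggests that predicted and observed outcomes disagree (poor calibration). The result depends on how patients are grouped, which is one reason the test is usually reported alongside, not instead of, discrimination measures.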

Results

Patient cohort

In the three hospitals, 1250 patients were admitted for stroke between March and December 2019 (Fig. 1). Duplicate records due to readmission (n = 6) and minors (age < 18 years; n = 1) were excluded, leaving 1243 stroke patients (Table 1). Eighteen percent (n = 225) were treated with thrombolysis, 7% (n = 88) with thrombectomy and 8% (n = 100) with both thrombolysis and thrombectomy. Median age was 76 years (IQR 65–85) and 52% (n = 649) of patients were male. Median NIHSS on admission was 4 (IQR 1–11). Median onset-to-door time was 3.0 hours (IQR 1.2–10.2), median door-to-CT time 28 minutes (IQR 17–59), median DNT 30 minutes (IQR 20–45) and median DGT 68 minutes (IQR 40–101). Median mRS at 90 days was 2 (IQR 0–4) and 90-day mortality was 17% (n = 135).

Fig. 1

Flowchart of patients. In total, 1250 stroke patients were admitted between March and December 2019 to one of three major stroke centers: AZ Groeninge (Kortrijk, Belgium), University Hospitals Leuven (Leuven, Belgium) or Ziekenhuis Oost-Limburg (ZOL) (Genk, Belgium). After exclusion of one minor (age < 18 years) and six duplicates, 1243 patients were included in this study. After exclusion of patients with missing data (n = 477), complete data for multivariable analysis were available for 766 patients.

Table 1 Patient characteristics

Missing data

Age, sex and treatment modality were available for all patients. Only 4% of pre-stroke mRS and NIHSS values were missing. The 90-day mRS was missing in 485/1243 patients (39%). The following proportions of operational time intervals were missing: 19% of onset-to-door times, 51% of door-to-CT times, 30% of DNTs and 12% of DGTs.
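Because no imputation was performed, the multivariable models could only use patients with every model variable recorded (766 of 1243 patients, per Fig. 1). This listwise deletion, together with the per-variable missing proportions reported above, can be sketched in Python as follows. The sketch assumes each patient is a dict with missing values stored as `None`; the field names and the four example records are hypothetical and are not study data.

```python
def missing_proportions(records, fields):
    """Share of records in which each field is missing (None)."""
    n = len(records)
    return {f: sum(r.get(f) is None for r in records) / n for f in fields}


def complete_cases(records, fields):
    """Listwise deletion: keep only records with every field present."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]


MODEL_FIELDS = ("age", "pre_mrs", "nihss", "mrs_90")  # hypothetical field names

patients = [  # illustrative records, not study data
    {"age": 81, "pre_mrs": 1, "nihss": 7, "mrs_90": 3},
    {"age": 66, "pre_mrs": 0, "nihss": None, "mrs_90": 1},
    {"age": 74, "pre_mrs": 2, "nihss": 12, "mrs_90": None},
    {"age": 59, "pre_mrs": 0, "nihss": 2, "mrs_90": 0},
]
print(missing_proportions(patients, MODEL_FIELDS))
print(len(complete_cases(patients, MODEL_FIELDS)))  # 2
```

Complete-case analysis is simple and matches the "routine data" aim, but it yields unbiased estimates only when the missingness is unrelated to the outcome, which the Discussion notes is rarely the case in medical reporting.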

Validation of the reduced features set

Multivariable logistic regression analysis was performed for 90-day mRS 0–2 vs. 3–6 (Tables 2, 3). Age, NIHSS and pre-stroke mRS were independently associated with acceptable functional outcome at 90 days (all p ≤ 0.0001; AUROC 0.88). The misclassification rate was 18%, sensitivity 85% and specificity 78% (R2 = 0.38).
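The performance measures reported in this paragraph can all be computed from each patient's predicted probability. The Python sketch below is illustrative only. It assumes a classification cutoff of 0.5, which the text does not state, and computes the AUROC as the concordance probability (the Mann–Whitney U statistic divided by the number of event/non-event pairs). Function names are hypothetical.

```python
def classification_metrics(probs, y, cutoff=0.5):
    """Misclassification rate, sensitivity and specificity at a probability cutoff.

    y holds 1 for the modeled event and 0 otherwise.
    """
    tp = sum(p >= cutoff and t == 1 for p, t in zip(probs, y))
    tn = sum(p < cutoff and t == 0 for p, t in zip(probs, y))
    fp = sum(p >= cutoff and t == 0 for p, t in zip(probs, y))
    fn = sum(p < cutoff and t == 1 for p, t in zip(probs, y))
    return {
        "misclassification": (fp + fn) / len(y),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auroc(probs, y):
    """Share of event/non-event pairs ranked correctly; ties count one half."""
    pos = [p for p, t in zip(probs, y) if t == 1]
    neg = [p for p, t in zip(probs, y) if t == 0]
    concordant = sum(1.0 if a > b else 0.5 if a == b else 0.0
                     for a in pos for b in neg)
    return concordant / (len(pos) * len(neg))


# Toy example with six made-up predictions.
toy_probs = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
toy_y = [1, 1, 1, 0, 0, 0]
print(round(auroc(toy_probs, toy_y), 3))  # 0.889
```

The AUROC does not depend on any cutoff, whereas sensitivity, specificity and the misclassification rate all change when the cutoff moves, so the cutoff should be reported with them.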

Table 2 Multivariable logistic regression analysis of outcome (90-day mRS, 0–2 versus 3–6)
Table 3 Multivariable logistic regression analysis of outcome (90-day mRS, 0–2 versus 3–6) with treatment modality

Next, treatment modality was added to the model. Age, NIHSS on admission, pre-stroke mRS and treatment modality were independently associated with acceptable long-term functional outcome (p < 0.0001, p < 0.0001, p < 0.0001 and p = 0.0001, respectively). The AUROC was 0.89, the misclassification rate 17%, sensitivity 84% and specificity 82% (R2 = 0.40) (Table 3).

Association of reduced features set with mortality

In addition, multivariable logistic regression with age, NIHSS and pre-stroke mRS was performed for survival (mRS 0–5) vs. non-survival (mRS 6) (Tables 4, 5). Age, NIHSS on admission and pre-stroke mRS were associated with survival after stroke (all p < 0.0001; AUROC 0.86). The misclassification rate was 12%, and sensitivity and specificity were 74% and 83%, respectively (R2 = 0.30). After treatment modality was added to the model, age, NIHSS on admission, pre-stroke mRS and treatment modality were independently associated with survival (p < 0.0001, p < 0.0001, p < 0.0001 and p = 0.008, respectively; AUROC 0.86). The misclassification rate was 12%, and sensitivity and specificity were 74% and 86%, respectively (R2 = 0.31).

Table 4 Multivariable logistic regression analysis of outcome (90-day mRS 0–5 versus 6)
Table 5 Multivariable logistic regression analysis of outcome (90-day mRS 0–5 versus 6) with treatment modality

Discussion

This small multicentric observational study confirmed that the reduced features set of age, NIHSS on admission and pre-stroke mRS is associated with acceptable vs. poor long-term functional outcome after stroke. Moreover, these three easily obtainable indicators remained independently associated with functional outcome when treatment modality was added to the model.

Treatment modality was also associated with long-term outcome. This is not surprising, since stroke severity influences the choice of treatment [10]. Moreover, conservative treatment is typically chosen when the stroke is mild, or when it is too severe or presents outside the treatment window [11,12,13,14].

The original CT-DRAGON score and its reduced features set were previously validated in a single-center cohort of patients with anterior circulation, posterior circulation or lacunar strokes who received a broad range of treatments, including conservative treatment ranging from antiplatelet therapy to palliative care [4].

In the Netherlands, the DASA report, based on a national stroke registry of more than 30,000 stroke patients, is published yearly [15]. This audit has existed since 2014, which makes the Netherlands one of the pioneers in benchmarking acute stroke care. The tool allows detection of variation in health care and helps to improve the quality of stroke care. Median age and the proportion of male patients were comparable between the DASA report and our cohort (74 vs. 76 years; 53% vs. 52% male). In the Netherlands, median DNT was an impressive 25 minutes, shorter than the 30 minutes in the three Belgian hospitals. Similarly, DGT in the DASA report was shorter than in this study (64 vs. 68 minutes). Seventy percent of Dutch stroke patients had a 90-day mRS between 0 and 2, compared with 61% of this Belgian cohort.

The DASA report highlighted the two most important sources of bias in benchmarking initiatives. First, missing data: the outcome measure, 90-day mRS, was missing in approximately half of the patients, and NIHSS on admission in 45%. In this study, 90-day mRS was unavailable in 36% of stroke patients and NIHSS on admission in only 4%. In medical reporting, data are rarely missing at random, so missingness is an important source of bias [16]. Second, benchmarking of outcomes, and even of operational processes, requires adjustment for the case-mix of the population [10].

In a four-center study in the Netherlands, age and NIHSS were also the most important determinants of mortality, mRS and even quality of life at 3 months after stroke [17]. An AHA/ASA statement on risk adjustment of ischemic stroke outcomes for comparing hospital performance described the characteristics of prediction models for long-term functional outcome [18]. Age, pre-stroke mRS and NIHSS were the variables most consistently included in those models and significantly associated with functional outcome. In our study, treatment modality was associated with long-term outcome, probably because of an interaction between NIHSS and treatment modality.

Testing of prediction models in patients undergoing endovascular treatment in the MR CLEAN registry showed that CT-DRAGON had the best performance in discriminating functional outcome mRS 0–3 from mRS 4–6 [6]. In that study, an AUROC > 0.80 was considered excellent. By this criterion, our model with only the reduced features set had excellent discriminative performance for functional outcome (AUROC 0.88). Nevertheless, the reduced features set cannot be used to predict outcome in an individual patient, as the misclassification rate was approximately 17–18%, whether or not treatment modality was included.

However, the combination of age, NIHSS on admission and pre-stroke mRS might be a suitable set of confounders to adjust for when benchmarking functional outcome between Belgian stroke centers.

In contrast, the 2012 KCE report 181As (Belgian Health Care Knowledge Centre) on stroke focused on hospital mortality after stroke and neglected the importance of case-mix adjustment [19]. The case-mix adjustment included in the proposed quality indicator of hospital mortality after stroke mainly encompassed sex, age and CCI. However, the studies cited above clearly show that NIHSS should be included in outcome prediction models, whether for mortality, functional outcome or quality of life [17, 18].

In the multivariable logistic models of Lingsma and colleagues, which used multiple imputation, the explained variance (R2) for mortality was approximately 0.4 when age, NIHSS and CCI were included [17]. This is somewhat higher than the R2 of 0.30 to 0.31 for mortality in our small study.

In their models predicting functional outcome, NIHSS had the strongest incremental impact on R2. NIHSS together with age, heart failure and previous stroke reached an R2 of 0.35, which is comparable to the R2 of 0.38 of our model with age, NIHSS and pre-stroke mRS.

The importance of case-mix adjustment is further highlighted by the MR CLEAN investigators, who observed in a study of 17 centers that variation between centers was mainly determined by demographic indicators rather than by structural and process indicators [20].

This study has several limitations. First, the number of patients was limited and confined to three centers. Future studies should include more centers, not only those performing endovascular treatment, and should be conducted prospectively over a longer period.

Second, we did not use multiple imputation to account for missing data. This might have improved the accuracy and reliability of our models, but it would also have increased their complexity and reduced their utility in clinical practice.

The novelty of this multicentric study is that three easily accessible and immediately available parameters were validated as predictors of long-term outcome after stroke. This may help benchmarking of acute stroke care in the future, and could encourage other centers to share stroke data and to develop a nationwide Belgian audit tool that detects variability in patient care and ultimately enhances acute stroke care in Belgium.

In conclusion, this Belgian multicentric observational study validates the reduced features set. Important patient characteristics (age, NIHSS on admission and pre-stroke mRS) remained independently associated with long-term functional outcome. It is therefore crucial to include these patient characteristics in future benchmarks.