Introduction

Subarachnoid hemorrhage (SAH) constitutes 5% of all strokes and is particularly devastating as it affects individuals with an average age of 55 [1]. Advancements in treatment modalities and neurocritical care management have led to a decreased SAH mortality rate [2]. However, morbidity remains a critical concern as more than half of SAH survivors face functional limitations, behavioral issues, and cognitive deficits that impair their daily activities and prevent their return to a productive environment [3, 4]. Therefore, comparing these outcomes in SAH studies is crucial to evaluate the efficacy of therapies accurately. However, challenges may arise when quantifying functional and quality of life impact due to the potential subjective interpretation of symptoms and the current lack of consensus regarding which scales to use.

Vasospasm and delayed cerebral ischemia (DCI) are the most reported pathophysiological outcomes in SAH research. However, varying definitions and low inter-rater reliability limit their use in multicenter trials. The prognostic value of the World Federation of Neurosurgical Societies (WFNS) scale in SAH remains unclear and, consequently, has not been extensively reported. Functional outcomes are frequently assessed using the modified Rankin scale (mRS), the Glasgow outcome scale (GOS), and the Glasgow outcome scale-extended (GOSE). Furthermore, assessments are performed at varying time points and apply inconsistent cutoffs to determine impairment. Cognitive and behavioral outcomes following SAH have been underexplored in prior studies.

Previous efforts have been made to standardize and comprehensively analyze outcomes in SAH trials. Similar challenges in quantifying outcomes are encountered in other stroke types, such as intracerebral hemorrhages [5]. Common data elements (CDEs) provide a standardized collection of terms for the scientific community, aiming to improve consistency in reported measures [6]. This collection gathers information on the variable definition, its purpose (e.g., assessments and examinations, disease/injury-related events, outcomes, and endpoints), and its relevance in SAH (core, recommended, supplemental, or exploratory). A core element is defined as a fundamental component that should be included in all studies concerning SAH. A recommended element is highly advised for use, having been both utilized and validated within the field. A supplemental element is typically collected, though its significance varies based on the study’s design. Lastly, an exploratory element requires additional validation but can provide insights to bridge information gaps until further validation is conducted [6]. When selecting the most suitable outcome measurement tools, a three-step process is essential: selecting the best domains to assess, determining the appropriate time point, and choosing an accurate and reliable measurement instrument [7]. Data banks, such as a multicenter registry with predetermined variables to report, could facilitate the homogeneity of information and worldwide accessibility [8].

This review provides a comprehensive summary of the most common scales and clinical outcomes used in the assessment of SAH and their usage recommendations. It is driven by the idea that a more systematic and comprehensive application of scoring metrics across various clinical domains would improve prognostic assessment and enable targeted therapeutics for SAH patients.

Pathophysiological Outcomes

Pathophysiological outcomes, such as neurological sequelae or imaging changes, are the primary outcome measure in 36% of clinical trials in SAH [9]. Symptomatic or clinical vasospasm and DCI are the most used outcomes [9, 10]. However, there is considerable inconsistency in outcome definitions, in the imaging studies used for determining vasospasm or DCI, and in the neurological scales used for diagnosis (Table 1), making it difficult to compare studies.

Table 1 Criteria for defining radiological vasospasm, brain ischemia, and neurological deterioration in SAH clinical trials

In defining vasospasm and DCI, the criteria have traditionally relied mainly on the characterization of clinical deterioration, without any imaging to confirm the reduction in vessel caliber and ischemia [11, 12]. Some authors have recommended restricting the use of the word “vasospasm” to the description of a vascular radiological test, such as digital subtraction angiography (DSA), magnetic resonance angiography (MRA), or computed tomography angiography (CTA), and not applying it to the clinical manifestations of DCI [13]. “Angiographic vasospasm” is a radiological phenomenon, defined as a greater than two-thirds reduction in intracranial vessel diameter on neuroimaging [14].

The gold standard for vasospasm adjudication is DSA, while CTA and MRA also exhibit high diagnostic performance, despite having the disadvantage of providing only a single snapshot of the neurovascular circulation [15]. Some authors have advocated for the increased use of transcranial Doppler (TCD) in the diagnosis and assessment of vasospasm, due to its advantages, such as bedside accessibility and non-invasiveness, making it feasible for daily monitoring [16]. A Lindegaard ratio (mean middle cerebral artery flow velocity divided by mean internal carotid artery flow velocity) higher than 3 on TCD has been shown to be indicative of cerebral vasospasm, but it exhibits lower sensitivity and specificity in predicting DCI compared to other imaging modalities [13, 17, 18, 19]. Consequently, TCD is primarily recommended as an adjunctive outcome measure to investigate proof of concept in conjunction with other imaging modalities [13].
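The Lindegaard ratio described above reduces to a single division, which the following minimal Python sketch illustrates. The function and parameter names are hypothetical, the velocities are assumed to be mean flow velocities in the same unit (e.g., cm/s), and the threshold of 3 is the one quoted in the text; this is an illustration of the arithmetic, not a validated clinical tool.

```python
def lindegaard_ratio(mca_mean_velocity: float, ica_mean_velocity: float) -> float:
    """Mean middle cerebral artery velocity divided by mean internal carotid artery velocity."""
    if mca_mean_velocity < 0 or ica_mean_velocity <= 0:
        raise ValueError("velocities must be positive (ICA velocity must be non-zero)")
    return mca_mean_velocity / ica_mean_velocity


def suggests_vasospasm(ratio: float, threshold: float = 3.0) -> bool:
    """Flag a ratio strictly higher than the threshold quoted in the text (3)."""
    return ratio > threshold


# Hypothetical example values: MCA 150 cm/s, ICA 40 cm/s.
ratio = lindegaard_ratio(150.0, 40.0)
print(ratio, suggests_vasospasm(ratio))
```

Keeping the threshold as a parameter makes it explicit which cutoff a given study applied, which is one of the reporting details the section asks authors to specify.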

The criteria for DCI adjudication should include the identification of any new cerebral infarction in computed tomography (CT) or magnetic resonance imaging (MRI) alongside clinical deterioration [2, 13]. This recommendation is based on the observation of ischemia in many patients who did not display radiological vasospasm, possibly due to narrowing of small arterioles or delayed imaging after symptom resolution. The term “cerebral infarction” should be reserved for isolated new imaging changes suggestive of ischemia, with the recommended timeframe of 6 weeks post-SAH for their accurate detection [13]. DCI has also been defined by radiological evidence of hypo-perfusion in the context of clinical deterioration [20]. Hypo-perfusion may manifest as a decrease in mean cerebral blood flow and an increase in mean transit time, while preserving cerebral blood volume, in CT perfusion (CTP) [20]. Some authors argue for the role of CTP in diagnosing microvascular vasospasm in cases lacking evidence of narrowing in middle to large caliber vessels where perfusion abnormalities can be demonstrated in CTP [21]. To assess its clinical utility, some studies have utilized both CTA and CTP to confirm the presence of vasospasm and quantify perfusion deficits [22].

The definition of clinical deterioration encompasses new neurologic focal deficits or changes in the level of consciousness. The Glasgow coma scale (GCS) is widely regarded as the premier tool for evaluating alterations in consciousness following SAH [23]. A shift of greater than 2 points in the GCS score is suggestive of a significant neurological alteration [13]. Although the National Institutes of Health Stroke Scale (NIHSS) is the preferred method for assessing new focal neurological deficits in stroke patients, further research is recommended to determine its accuracy in determining neurological deterioration post-SAH [13]. Additionally, a comprehensive work-up including brain imaging is recommended to rule out other potential causes of neurological deterioration. If the term “symptomatic or clinical vasospasm” is to be used, it should be based on the radiological criteria for vasospasm in conjunction with neurological deterioration, rather than solely on the clinical presentation.

Previously proposed uniform definitions for both vasospasm and DCI should be implemented in SAH research to enhance the interpretation and generalizability of novel therapies [13]. Signs of cerebral ischemia are sometimes reversible but may progress to cerebral infarction. These clinical changes often coincide with angiographic evidence of vessel narrowing, but they may also occur independently of each other. Therefore, it is crucial to document them separately until the pathophysiology of DCI is fully understood. Some authors find it more reasonable to solely evaluate for DCI, as certain drugs may not affect vasospasm but rather possess neuroprotective properties [24]. To enhance the comparability of research findings, authors should clearly specify the imaging modality, diagnostic criteria, and scales employed to evaluate changes in neurological status. Vasospasm should preferably be angiographically determined with DSA, with MRA and CTA being reasonable alternatives. TCD may be used for daily monitoring once the angiographic evidence of vasospasm is confirmed. DCI adjudication should rely on signs of cerebral infarction using CT or MRI and neurological deterioration, defined as a > 2-point drop in the GCS.
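As a rough illustration of how the DCI criteria summarized above could be recorded consistently across sites, the sketch below encodes them as a small Python function. All names are hypothetical. It assumes that clinical deterioration means either a new focal deficit or a GCS drop of more than 2 points, as stated in this section, and it does not model the work-up needed to exclude other causes of deterioration.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    baseline_gcs: int                # GCS before the suspected event (3 to 15)
    current_gcs: int                 # GCS at the time of assessment (3 to 15)
    new_focal_deficit: bool          # new focal neurological deficit
    new_infarction_on_imaging: bool  # new cerebral infarction on CT or MRI


def clinical_deterioration(a: Assessment, gcs_drop_threshold: int = 2) -> bool:
    """New focal deficit, or a GCS drop strictly greater than the threshold."""
    for gcs in (a.baseline_gcs, a.current_gcs):
        if not 3 <= gcs <= 15:
            raise ValueError("GCS must be between 3 and 15")
    gcs_drop = a.baseline_gcs - a.current_gcs
    return a.new_focal_deficit or gcs_drop > gcs_drop_threshold


def meets_dci_criteria(a: Assessment) -> bool:
    """Both components are required: imaging infarction and clinical deterioration."""
    return a.new_infarction_on_imaging and clinical_deterioration(a)


print(meets_dci_criteria(Assessment(15, 12, False, True)))
```

Recording the two components as separate fields also follows the recommendation to document imaging and clinical findings separately, since they may occur independently.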

  • Clear and consistent definitions for vasospasm and DCI should be used in clinical studies. Both clinical and radiographic criteria for determining vasospasm and DCI should be collected to enable comparisons among studies.

Grading Scales for SAH

Several SAH grading scales are used in the acute assessment of SAH severity in clinical practice, with the Hunt and Hess, the WFNS, Fisher, and modified Fisher scales being among the most commonly used. However, they are infrequently reported in SAH research studies [9]. Their primary limitation lies in inter-rater variability and their lack of adoption in many institutions.

The Hunt and Hess scale, proposed in 1968 as a modification of an older system by Botterell in 1956, was designed to evaluate surgical risk and to determine the optimal timing for surgery post-SAH [51, 52]. It used key clinical signs considered at that time, including the intensity of the meningeal inflammatory reaction, severity of neurological deficits, and level of arousal. The scale consists of five grades, each incorporating the three criteria. While widely known and easy to administer, the scale has low inter-rater agreement due to the arbitrary margins between categories and the possibility of patients falling into different grades for each criterion [53].

In 1988, an expert committee introduced the WFNS scale, which comprises five grades based on the GCS and focal neurological deficits [54]. This scale aimed to shift focus away from the severity of presentation and prioritize the incorporation of the GCS along with neurological deficits such as hemiparesis and aphasia, which are the best predictors of mortality and disability, respectively [54, 55]. Kapapa et al. demonstrated that the WFNS score correlates more closely with quality-of-life scores after SAH compared to the Hunt and Hess scale [56]. However, Aggarwal et al. observed a better performance of Hunt and Hess in predicting GOS at 3 months [57]. One of the limitations of WFNS is that it does not specify a method for determining GCS cutoffs [58]. Additionally, conflicting data exist regarding the prognostic power of the WFNS scale, which may have limited its widespread adoption [59]. Another critique has been that the distribution of grades is largely skewed toward grade 1 [60]. The effect of any intervention in WFNS grade 1 patients could potentially be diluted because 75% of them already have a good outcome (mRS < 2) [60]. However, WFNS is classified as a “core” element for “disease/injury event” measures within the CDE repertoire for SAH, while Hunt and Hess is “supplemental” [6]. Including the WFNS score in SAH research studies could help determine its potential role as a predictor of functional outcomes and clarify the characteristics of the enrolled population in clinical trials.

The Fisher score, established in 1980, is the most used method for assessing the severity of SAH based on the amount of blood detected on CT scans [61]. It classifies SAH into four degrees: (1) no blood detected, (2) diffuse deposition of blood < 1-mm thick, (3) localized blood and/or layers > 1 mm, and (4) intraventricular and/or intraparenchymal extension. However, it has been criticized for not accounting for patients with thick cisternal blood and concomitant intraventricular or intraparenchymal blood [55]. Consequently, the modified Fisher score was proposed in 2001 to overcome this limitation [62]. The modified Fisher scale criteria are as follows: (0) no blood detected, (1) focal or diffuse thin SAH (< 1 mm) without intraventricular extension, (2) focal or diffuse thin SAH (< 1 mm) with intraventricular extension, (3) focal or diffuse thick SAH (> 1 mm) without intraventricular extension, and (4) focal or diffuse thick SAH (> 1 mm) with intraventricular extension. This modified scale seems to perform better in predicting vasospasm compared to the original version [63]. The enhanced predictive value of the modified scale likely stems from how it classifies patients with intraventricular extension based on the amount of SAH. Both cisternal and ventricular blood seem to be independent predictors of DCI, and their risks are additive [62]. By distinguishing between grades 2 and 4 for patients with intraventricular extension with thin and thick SAH, respectively, the modified Fisher scale correlates more effectively with the risk of vasospasm, which tends to peak at grades 3 and 4 (Fig. 1) [63]. Both Fisher and modified Fisher scales are classified as a “supplemental” CDE; however, the use of the modified Fisher score is recommended to further clarify its relationship with SAH outcomes [6].
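The two sets of criteria listed above can be compared side by side with a short Python sketch. The names below are hypothetical, and the sketch simplifies in two labeled ways: it treats the Fisher scale's "localized blood" as thick SAH, and it leaves undefined any case the summary above does not cover (a thickness of exactly 1 mm, or intraventricular blood without SAH on the modified scale).

```python
from enum import Enum


class SahThickness(Enum):
    NONE = "none"    # no subarachnoid blood detected on CT
    THIN = "thin"    # < 1 mm
    THICK = "thick"  # > 1 mm (also used here for localized blood; an assumption)


def fisher_grade(sah: SahThickness, ventricular_or_parenchymal_blood: bool) -> int:
    """Original Fisher grade (1 to 4) as summarized in the text."""
    if ventricular_or_parenchymal_blood:
        return 4
    if sah is SahThickness.NONE:
        return 1
    return 3 if sah is SahThickness.THICK else 2


def modified_fisher_grade(sah: SahThickness, intraventricular_blood: bool) -> int:
    """Modified Fisher grade (0 to 4) as summarized in the text."""
    if sah is SahThickness.NONE:
        if intraventricular_blood:
            # Not covered by the criteria as summarized above.
            raise ValueError("intraventricular blood without SAH is not defined here")
        return 0
    if sah is SahThickness.THIN:
        return 2 if intraventricular_blood else 1
    return 4 if intraventricular_blood else 3


# Thin SAH with intraventricular extension: Fisher 4, modified Fisher 2.
print(fisher_grade(SahThickness.THIN, True), modified_fisher_grade(SahThickness.THIN, True))
```

The thin-SAH example shows the point made in the text: the original scale assigns its highest grade to any intraventricular extension, while the modified scale still separates thin from thick SAH.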

  • The use of WFNS and modified Fisher score in SAH research is recommended to further determine their prognostic role.

Fig. 1 Comparison of Fisher and modified Fisher scales in classifying patients with SAH and intraventricular extension. A, B This patient would have a Fisher score of 4 and a modified Fisher score of 2: because the SAH is thin (< 1 mm), it corresponds to a modified Fisher score of 2 despite the presence of intraventricular blood. C, D This patient has Fisher and modified Fisher scores of 4: the SAH is thick (> 1 mm), which corresponds to a modified Fisher score of 4. These examples highlight the more granular classification of SAH achieved by the modified Fisher score, which accounts for varying degrees of SAH thickness and correlates better with the risk of vasospasm and DCI. mFisher, modified Fisher score

Functional Outcomes

Functional outcomes serve as primary endpoints in 20% of SAH studies [9]. The main scales utilized include GOS, GOSE, and mRS [22, 64, 65]. However, among the CDE for outcomes, only mRS is “recommended” [66]. Death, GOS, and GOSE were classified as “supplemental,” while the remaining scales were classified as “exploratory.” Their limitations and scarce validation in SAH survivors may explain why they are not considered as core elements. Common weaknesses include interobserver variability, lack of consensus on timing of the assessment, and overlapping aspects among the scores.

Modified Rankin Scale (mRS)

The mRS is a 7-level disability scale ranging from 0 (no symptoms) to 6 (death) and is widely used in SAH studies, being reported in almost half of the reviewed studies [9]. Its popularity has been increasing over the years, and it is currently classified as a “recommended” CDE for outcome measurement [66]. However, its application varies across studies, with different time points, predominantly 3, 9, or 12 months [67, 68]. Moreover, one-third of the studies reported dichotomized results, considering scores from 0 to 2 as favorable, resulting in the loss of considerable information [68, 69].

Despite being the most frequently measured outcome in stroke trials, the mRS has been criticized for its poor reproducibility, non-proportional differences between mRS categories (notably between categories 2 and 3 and between 4 and 5), and an overemphasis on mobility over cognitive and social functioning [70, 71, 72]. Furthermore, dichotomization of the mRS results in the loss of prognostic information and oversimplification in statistical analyses [71].

The main cause of the poor interobserver reliability of the mRS lies in the broad and subjective descriptions of its categories [70]. This issue is exacerbated in large multicenter clinical trials with numerous raters [72]. Even apparently objective criteria such as “no symptoms” and “walking” are susceptible to misclassification, as raters seldom thoroughly probe for possible stroke sequelae or may not account for walking aids [71]. The overall reliability of the mRS improved significantly when structured interviews were conducted (κ = 0.46 vs. κ = 0.62) [73]. Structured interviews entail specific questions designed to grade each category and recommend that raters undergo learning and practice sessions before evaluating patients [74]. The use of specific checklists from the instrumental activities of daily living (IADL) scales has also been recommended to enhance reliability in mRS assessments [75].
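For readers less familiar with the agreement statistic quoted above, the following Python sketch computes an unweighted Cohen's kappa for two raters. It assumes that the reported values are kappa coefficients; the source does not state whether they are weighted or unweighted, and the ratings in the example are hypothetical, chosen only to show the calculation.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of patients")
    n = len(rater_a)
    # Proportion of patients on whom the two raters gave the same score.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal score frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical mRS scores assigned by two raters to the same six patients.
scores_rater_1 = [0, 1, 2, 2, 3, 4]
scores_rater_2 = [0, 1, 2, 3, 3, 4]
print(round(cohen_kappa(scores_rater_1, scores_rater_2), 3))
```

Because the mRS is ordinal, a weighted kappa that penalizes large disagreements more than adjacent ones is often preferred; the unweighted form is shown here only because it is the simplest to follow.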

The structured interview approach targets specific concerns for each mRS category [76]. The first category (no significant disability despite symptoms) aims to recognize any symptoms that may have arisen after the stroke, inquiring about each specific National Institutes of Health Stroke Scale (NIHSS) component [73]. In the second category (slight disability), the focus is on the patient’s participation in social roles, including work and leisure activities, and any changes in personality or mood compared to baseline. The third category (moderate disability) emphasizes IADL like house chores, rather than solely focusing on walking. The fourth category (moderate to severe disability) asks about basic activities of daily living, such as hygiene. Finally, the fifth category (severe disability) inquires about the need for constant care, moving beyond a mere assessment of bedridden status.

Glasgow Outcome Scale (GOS) and Glasgow Outcome Scale-Extended (GOSE)

The GOS has been the most frequently utilized scale for assessing functional outcomes, reported in more than half of SAH studies [9]. However, its popularity has declined over the years in favor of the mRS [9]. Its application also varies between studies, being used at different time points (primarily at 3 months), and results being dichotomized using various cutoffs [65, 68].

The GOS was initially designed in 1975 for traumatic brain injury patients to capture how injury affected functioning in major areas of life [77]. It classified severe disability as the need for assistance with “some” daily activities; moderate disability if individuals could not reintegrate into previous activities at work or in social life due to physical or mental deficits; and good recovery if minor physical or mental deficits did not prevent the resumption of occupational and social activities [77]. Its most significant limitation is interobserver variability; for instance, general practitioners are more likely to report good recovery than psychologists [78].

Later, in 1981, an extended version of the GOS, known as the GOSE, was created, expanding the five-point scale into eight points by dividing each of the three upper categories (severe disability, moderate disability, and good recovery) into “lower” and “upper” levels [79]. However, this modification further increased the interobserver variability. Attempts were made to develop a structured interview for the GOSE, delineating performance in “at home” and “out of home” activities to incorporate social and behavioral aspects [80, 81]. Despite these efforts, widespread adoption of the GOSE in SAH research remains limited, being reported in only 5% of SAH studies [9].

Score dichotomization is an essential factor when recommending the incorporation of GOS and GOSE into outcome measurements for SAH studies. Classifying scores into favorable and unfavorable eliminates the benefits of the GOSE over the GOS [82]. Forty percent of SAH studies dichotomized their outcomes, with most of them aggressively including the “moderately disabled” (not able to return to work/social activities) category as favorable [9]. However, the same issue also arises with the mRS. The most frequently used patterns of dichotomization for both scales include similar patients in both groups: mRS 0–2 and GOSE 5–8 as good functional outcomes, and mRS 3–6 and GOSE 1–4 as poor functional outcomes [9]. Studies should refrain from using dichotomization in their reporting, or at a minimum, use less aggressive cutoffs by including GOSE categories 5 and 6 and mRS 2 in the unfavorable group. Removing dichotomization would prevent the loss of valuable information. Subtle post-SAH symptoms, such as headaches, mental “fogginess,” or fatigue, would be classified as mRS 1 and GOSE 7, which are “good neurological outcomes,” but can be highly impactful in daily activities.
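The effect of the cutoff choice can be made concrete with a small Python sketch. The function names are hypothetical; the default cutoffs reproduce the common pattern described above (mRS 0–2 and GOSE 5–8 as favorable), and the stricter values correspond to the less aggressive alternative in which mRS 2 and GOSE 5 and 6 count as unfavorable.

```python
def is_favorable_mrs(score: int, max_favorable: int = 2) -> bool:
    """mRS ranges from 0 (no symptoms) to 6 (death); lower is better."""
    if not 0 <= score <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return score <= max_favorable


def is_favorable_gose(score: int, min_favorable: int = 5) -> bool:
    """GOSE ranges from 1 (death) to 8 (upper good recovery); higher is better."""
    if not 1 <= score <= 8:
        raise ValueError("GOSE must be between 1 and 8")
    return score >= min_favorable


# A survivor unable to return to work (mRS 2, GOSE 5) under both cutoff choices.
print(is_favorable_mrs(2), is_favorable_gose(5))                                    # common cutoffs
print(is_favorable_mrs(2, max_favorable=1), is_favorable_gose(5, min_favorable=7))  # stricter cutoffs
```

The same patient moves from favorable to unfavorable depending only on the cutoff, which is why reporting the full ordinal score alongside any dichotomized result preserves information that the binary label discards.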

When specifically comparing mRS and GOSE, the latter is more precise in capturing the extent of social, leisure, and behavioral disturbances. GOSE categories 5 and 6, which differentiate inability from reduced capacity at work, leisure activities, and relationships, are both included in the mRS 2 score (Table 2). This delineation is highly significant in SAH patients, where physically limiting symptoms are not as common as in other types of strokes [1]. However, this could also increase the interobserver variability. The completion of validated scales for IADL, such as the Barthel Index or the Karnofsky Performance Status Scale, and for quality of life, such as the Stroke Specific Quality of Life Scale, would increase the accuracy of evaluators [83]. However, these scales are rarely reported in SAH studies [9, 84, 85]. Structured interviews are valuable alternatives as they may incorporate questions from all these questionnaires and let the interviewer guide the conversation toward the most relevant topic for each patient [80]. Incorporating family or caregivers in addition to patients in these assessments could offer valuable insights into patients’ well-being and post-SAH changes. As mRS is designated as a “recommended” CDE, its utilization through its structured interviews by trained personnel should be encouraged [6]. However, distinguishing GOSE scores 5 from 6 in SAH survivors with mRS 2 could provide valuable information regarding social and behavioral outcomes.

Table 2 Comparison between the mRS and GOSE scales

Finally, concerning the timing of the application of these scales, the GOS and all-cause mortality are most frequently reported at 3 months [9, 65]. Therefore, these scales should be assessed at least at 3 months for better standardization of results. Broad and nonspecific time frames, such as at time of discharge or follow-up, should be avoided.

  • Structured interviews are recommended when determining functional outcomes, and dichotomization of the results should be avoided.

  • GOSE may be more precise than mRS in capturing the extent of social and behavioral outcomes.

  • Functional outcomes should be assessed at least at 3 months from the initial SAH for consistency between studies.

Neuropsychological Outcomes

The American Heart Association guidelines recommend having a multidisciplinary team approach after SAH to identify early behavioral and cognitive deficits with validated screening tools [86]. However, neuropsychological health is not commonly assessed at the time of follow-up. Neuropsychological deficits are among the main reasons for mid- to long-term disability in SAH survivors [3]. Their incidence is high among patients with “good neurological outcomes,” being reported in up to 60% of patients with reduced disability according to the GOS [87]. Despite the lack of consensus on how and when to evaluate neuropsychological function in SAH survivors in clinical practice, the Montreal Cognitive Assessment (MoCA) is one of the “recommended” CDEs for SAH research studies [88].

MoCA has demonstrated superior performance compared to the Mini-Mental State Examination (MMSE) by assessing executive function and abstraction domains. The MMSE does not assess these domains, which are commonly affected after SAH. MoCA is often considered a valuable screening tool for selecting patients who may benefit from a comprehensive neuropsychological evaluation. However, the traditional MoCA cutoff of 26 appears to be less specific in SAH than in the general population for detecting cognitive impairment. Lower thresholds are recommended, though an optimal value requires further validation. The suggested cutoff of 22 for stroke patients is a more reasonable reference until further evidence for SAH patients is obtained [89]. A specific study evaluating MoCA performance in SAH found that the cutoff of 22 adequately detects cognitive impairment at 12 months (accuracy 85%, sensitivity 100%, and specificity 75%); however, for the subacute period (2–4 weeks post-SAH), the cutoff of 18 had a better diagnostic performance (accuracy 92%, sensitivity 75%, and specificity 95%) [90]. Further studies are required to validate these findings and establish standard cutoffs for SAH patients. Scores may need to be adjusted by years of education, age, and time from SAH to MoCA administration. In the meantime, performing a MoCA at 9–12 months using 22 as a cutoff seems like a reasonable approach to screen for cognitive impairment post-SAH.
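The time-dependent cutoffs discussed above could be applied in a study database along the lines of the following Python sketch. The names are hypothetical, and two assumptions are made that the text does not settle: a score strictly below the cutoff is treated as a positive screen, and no adjustment for education or age is applied. Only the two phases with reported cutoffs (subacute, 18; 12 months, 22) are encoded.

```python
# Cutoffs reported in the text; the phase labels are hypothetical names.
MOCA_CUTOFFS = {
    "subacute": 18,       # 2 to 4 weeks post-SAH
    "twelve_months": 22,  # 12 months post-SAH
}


def screens_positive(moca_score: int, phase: str) -> bool:
    """Return True when the score falls below the phase-specific cutoff.

    Assumption: "below the cutoff" means a positive screen; the source does not
    state whether a score equal to the cutoff counts as impaired.
    """
    if not 0 <= moca_score <= 30:
        raise ValueError("MoCA total score must be between 0 and 30")
    if phase not in MOCA_CUTOFFS:
        raise ValueError(f"no cutoff reported for phase {phase!r}")
    return moca_score < MOCA_CUTOFFS[phase]


# The same score of 21 is read differently depending on when it was obtained.
print(screens_positive(21, "subacute"), screens_positive(21, "twelve_months"))
```

Raising an error for unlisted phases, rather than interpolating a cutoff, reflects the fact that thresholds for other time points have not been validated in SAH.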

Domain-specific neuropsychological tests have been employed in less than ten percent of SAH studies [9]. The most frequently employed tests were the Wechsler Adult Intelligence Scale, the Wechsler Memory Scale and the Trail Making Test. The incorporation of specialized neuropsychological evaluations into SAH research, either as a second step after using MoCA for screening, or independently, could offer valuable information [91]. Neuropsychological testing will offer a more comprehensive characterization of specific cognitive deficits in SAH survivors, facilitating the design of targeted rehabilitation strategies.

Finally, many SAH studies have not studied behavioral and quality of life outcomes [9]. Despite good neurological and cognitive outcomes, many SAH survivors face challenges in resuming their previous work and social activities [92]. Psychiatric disorders, such as depression and anxiety, impact up to 50% of SAH survivors [93]. Their prevalence remains high during the first 2 to 5 years after SAH [94]. However, previous SAH studies have infrequently reported behavioral and quality of life outcomes [94, 95]. The early usage of validated psychological and quality-of-life batteries, such as the Beck Depression and Anxiety Inventories and Stroke Specific Quality of Life Scale, is recommended by the American Heart Association guidelines on the management of SAH [86]. These questionnaires offer the advantage of being relatively fast to complete and can be done at home without the need for an examiner.

  • MoCA should be used to screen for cognitive deficits in SAH survivors.

  • Behavioral studies and neuropsychological testing are encouraged in the comprehensive analysis of SAH survivors.

Procedural Outcomes

Procedural outcomes serve as valuable indicators of the economic impact of the new therapies on the health system. However, only one-third of the studies provided information on the length of hospitalization, intensive care unit stay, or procedure duration [9]. Studies could maintain records of these measurements to identify potential drawbacks associated with new protocols, techniques, or devices.

Conclusion

Standardizing outcome measures in SAH research is essential for effectively monitoring the impact of new therapies. However, very important metrics related to cognitive deficits, behavioral and neuropsychological health, and alterations in quality of life are not routinely studied. Based on this review, we recommend the following:

  • Clear and consistent definitions for vasospasm and DCI should be used. Clinical and radiographic criteria should be used in defining vasospasm and DCI.

  • The use of WFNS and modified Fisher score in SAH research is recommended to further determine their prognostic role.

  • It is recommended to use structured interviews at specific time periods when applying functional outcome scales. Three months after the SAH is a reasonable time point for analysis. Non-standardized time points such as time of discharge or first follow-up should be avoided.

  • Functional outcomes should not be reported dichotomized, or additional non-dichotomized information should also be available for further study.

  • GOSE may be more precise than mRS in capturing the extent of social and behavioral outcomes.

  • When possible, MoCA and behavioral questionnaires should be included in the study of SAH outcomes.