Introduction

Aneurysmal subarachnoid hemorrhage (aSAH) occurs at a rate of 10 events per 100 000 person-years [1, 2]. Although it represents just 4–5% of total stroke incidence, it affects younger patients and therefore creates a significant health burden, accounting for 27% of stroke-related life years lost before the age of 65 [3]. aSAH has a case fatality rate of up to 50%, and almost half of the survivors are left with cognitive impairment and functional limitations [4, 5].

Declining mortality rates reflect advances in the management of aSAH; however, there remains uncertainty regarding optimal management and limited high-level evidence to support clinical decisions [5, 6]. A lack of consistency in outcome measures reported in aSAH research hinders progress in the field [7, 8].

Several initiatives have been developed to improve consistency in the reporting of outcome measures. One of these is the common data elements (CDE) project. Developed by research communities, a CDE is a precisely defined question paired with a well-specified set of responses [9]. A SAH CDE collection has recently been published by the National Institute of Neurological Disorders and Stroke (NINDS). At this stage, neither the NINDS CDE project nor any other group has identified core outcomes for SAH.

A core outcome set (COS) is an agreed standardized set of outcome measures used in clinical research [10]. COS reduce the risk of selective reporting bias, facilitate the synthesis of findings from multiple trials and improve clinical decision making [11, 12]. COS development is a consensus-based iterative process that is expected to take years to decades [13]. The earliest stages involve scoping the relevant medical literature and clinical expertise. Later stages involve stakeholder involvement, consensus building and regular review [14].

Here, we systematically reviewed all outcome measures employed in randomized controlled trials (RCTs) of aSAH over the past 20 years. The outcome measures were classified according to the framework developed by the Outcome Measures in Rheumatology (OMERACT) consensus initiative, which is advocated as a model for the development of COS [13, 15]. The OMERACT framework employs four main categories: death, life impact, pathophysiological manifestations and resource use. This systematic review aims to assist future researchers in their choice of outcome instruments and represents the early stages of development of a COS in aSAH.

Materials and Methods

A detailed protocol of the study design and methods was developed, and the study was prospectively registered with the Core Outcome Measures in Effectiveness Trials (COMET) initiative (http://www.comet-initiative.org/studies/details/747). We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [16].

Inclusion criteria were English language research articles reporting RCTs that included a minimum of ten patients, enrolled patients with SAH exclusively and reported at least one outcome. There was no restriction regarding the interventions and comparators used. We excluded review articles, letters and editorials.

The search was performed in May 2015 and used the following electronic databases: Ovid Medline, Excerpta Medica dataBASE (EMBASE), Cumulative Index to Nursing and Allied Health Literature (CINAHL) and the Cochrane Central Register of Controlled Trials (CENTRAL). Included studies were published from January 1996 onward, which corresponds with the publication of the first CONSORT document [17].

We used the following search terms for each of the databases: subarachnoid hemorrhage, intracranial aneurysm, ruptured aneurysm, hemorrhagic stroke, delayed cerebral ischemia, intracranial vasospasm, randomized controlled trial and humans. The full search strategy is available in the supplemental digital content.

Data extraction was performed using the Evidence for Policy and Practice Information (EPPI)-Reviewer 4, a Web-based program developed and maintained by the Social Science Research Unit at the Institute of Education, University of London [18]. The data extraction form (supplementary digital content) was developed a priori and refined following testing on ten randomly selected papers. Two reviewers independently extracted the data from each paper, and discrepancies were resolved by consensus.

The included studies were assessed for primary and secondary outcomes, and these were categorized into previously described domains using the OMERACT filter [15]. Outcomes were classified as primary when explicitly identified by the study authors, and all other outcomes were classified as secondary. We used a Wilcoxon rank-sum test, performed in SPSS Statistics Version 21, to determine whether outcome measures differed between trials according to the number of participants. We defined a clinically meaningful outcome as one that measures ‘directly how a patient feels, functions or survives’ [19].
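For readers unfamiliar with the rank-sum test mentioned above, the following is a minimal Python sketch of the two-sided test using the normal approximation. It is an illustration only and not the SPSS procedure used in this review; the function names and the trial sizes in the example are hypothetical, and the sketch assumes the comparison is between the sample sizes of two groups of trials.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(group_a, group_b):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (w, z, p), where w is the rank sum of group_a. The variance
    includes the usual tie correction; no continuity correction is applied.
    """
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    w = sum(ranks[:n_a])
    expected = n_a * (n + 1) / 2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - expected) / math.sqrt(variance)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w, z, p


# Hypothetical trial sizes (illustrative only, not data from this review).
sizes_meaningful = [120, 250, 400, 900, 2143]   # clinically meaningful primary outcome
sizes_other = [20, 32, 45, 60, 110]             # other primary outcome
w, z, p = rank_sum_test(sizes_meaningful, sizes_other)
```

With small groups such as these, an exact test would normally be preferred over the normal approximation; the approximation is used here only to keep the sketch short.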

Results

The search identified 1093 studies, and after removing duplicates and excluding letters, editorials and review articles, 716 studies remained. Of these, 129 trials representing 24 238 unique patients met the inclusion criteria. Detailed reasons for exclusion at each step are documented in Fig. 1 [16].

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram

Trial Characteristics

The most common trial characteristics were a population of 51–500 patients and a pharmaceutical intervention with a placebo-controlled comparator. Trials were most likely to be government or university funded and single center. The non-outcome characteristics of trials are presented in Fig. 2.

Fig. 2 Trial characteristics

Primary Outcome

Across all domains, there were 51 unique primary outcomes and overall 285 endpoints were reported at least once as either a primary or secondary outcome. Primary outcomes were explicitly nominated in 89/129 (69.0%) of included studies. Clinically meaningful primary outcomes were reported in 58/129 (45.0%) of studies. Larger trials were significantly more likely to use clinically meaningful outcomes when compared to trials with fewer participants. Pathophysiological outcomes such as neurological sequelae, biomarkers and imaging findings were the most commonly used primary outcome measures (46/129, 35.7%). A primary outcome of symptomatic or clinical vasospasm, delayed ischemic neurological deficits or delayed cerebral ischemia was used in 20/129 (15.5%) of studies. Imaging modalities including transcranial Doppler (TCD), magnetic resonance imaging (MRI), computed tomography (CT) scans and digital subtraction angiography (DSA) were used in 13/129 (10.1%) of studies. Functional outcome measures (FOM) were reported as a primary outcome in 27/129 (20.9%) of studies with the Glasgow Outcome Scale (GOS) [20] used in 13/129 (10.1%) of these and the modified Rankin Scale (mRS) [21, 22] reported in 10/129 (7.8%). All-cause mortality was used in 4/129 (3.1%) of studies as a primary outcome measure. A composite primary outcome measure was used in 8/129 studies (6.2%).

Mortality

Mortality at a specific time point (landmark) was reported in 74/129 (57.4%) of studies (Fig. 3) with 3 months the most commonly chosen time point (Fig. 4). Death in the intensive care unit (ICU) or at hospital discharge was used in 11/129 (8.5%) of studies. Six trials (4.7%) used time-to-event analysis to report survival. Eighty-one trials reported all-cause mortality (81/129, 62.8%), and 11/129 trials reported disease-specific mortality (8.5%). Forty-five trials (45/129, 34.9%) did not report mortality.

Fig. 3 Frequency of mortality, health-care resource use and assessment of life impact outcomes and pathophysiological outcomes

Fig. 4 Frequency of different time points reported by included trials for mortality, the Glasgow Outcome Scale and the modified Rankin Scale

Health-care Resource Use

Measures of health-care resource use were reported in 36/129 studies (27.9%). Most commonly, this involved the length of ICU stay (17/129, 13.2%) and length of hospital stay (18/129, 14.0%). Procedural aspects included the number of procedures (13/129, 10.1%), duration of a specific intervention (10/129, 7.8%) and a calculation of procedural costs (6/129, 4.7%). Other reported outcomes in this domain included measures of cost-effectiveness (3/129, 2.3%) and duration of rehabilitation (1/129, 0.8%).

Assessment of Life Impact

Sixty-five trials reported the GOS (65/129, 50.4%) at a range of different time points (Fig. 4). Of the trials that reported the GOS, 28/65 (43.1%) used a dichotomized measure. Most commonly (23/65, 35.4%), trials grouped ‘1-death’, ‘2-persistent vegetative state’ and ‘3-severe disability’ together as an unfavorable outcome and ‘4-moderate disability’ and ‘5-good recovery’ together as a favorable outcome. Two trials (3.1%) employed a less conservative approach, including severe disability in the favorable category, while two trials (3.1%) chose a more conservative approach, limiting favorable outcome to good recovery only. One study (1.5%) used death as an individual category and then split ‘2–3’ and ‘4–5’ into two categories. GOS was also reported by one study (1.5%) as a median and an interquartile range.

The mRS was reported in 51/129 (39.5%) of studies at a range of different time points (Fig. 4). Of the studies reporting the mRS, 29/51 (56.9%) dichotomized the ordinal scale, with 18/51 (35.3%) classifying favorable as scores of ‘0-no symptoms’, ‘1-no significant disability’ and ‘2-slight disability’, and unfavorable as ‘3-moderate disability’, ‘4-moderately severe disability’, ‘5-severe disability’ and ‘6-dead’. Four trials (7.8%) chose a less conservative approach, with favorable encompassing scores of 0–3 and unfavorable scores of 4–6, and one study (2.0%) reported a more conservative approach, limiting favorable outcomes to no symptoms and no significant disability. Four studies (7.8%) reported the mRS as three new categories, and two studies (3.9%) only reported scores of 0 (excellent) and 4–5 (poor).
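To illustrate why the choice of cut point matters, the following Python sketch applies the three dichotomization schemes described above for the GOS and the mRS to the same set of scores. The scheme labels, the function name and the patient scores are hypothetical and chosen for illustration; only the cut points themselves come from the trials reviewed.

```python
# Favorable-outcome score sets for the dichotomization schemes described above.
GOS_SCHEMES = {
    "most_common": {4, 5},           # moderate disability, good recovery
    "less_conservative": {3, 4, 5},  # severe disability also counted as favorable
    "more_conservative": {5},        # good recovery only
}
MRS_SCHEMES = {
    "most_common": {0, 1, 2},
    "less_conservative": {0, 1, 2, 3},
    "more_conservative": {0, 1},
}
# Valid score ranges: GOS runs from 1 to 5, mRS from 0 to 6.
VALID_SCORES = {"GOS": set(range(1, 6)), "mRS": set(range(0, 7))}


def favorable_fraction(scores, favorable, scale):
    """Fraction of patients whose score falls in the favorable set."""
    if not scores:
        raise ValueError("scores must not be empty")
    invalid = [s for s in scores if s not in VALID_SCORES[scale]]
    if invalid:
        raise ValueError(f"invalid {scale} scores: {invalid}")
    return sum(1 for s in scores if s in favorable) / len(scores)


# Hypothetical mRS scores for ten patients (illustrative only, not trial data).
mrs_scores = [0, 1, 1, 2, 3, 3, 4, 5, 6, 6]
mrs_rates = {
    name: favorable_fraction(mrs_scores, favorable, "mRS")
    for name, favorable in MRS_SCHEMES.items()
}
```

For this hypothetical cohort, the proportion with a favorable outcome is 0.4 under the most common scheme, 0.6 under the less conservative scheme and 0.3 under the more conservative scheme, which shows how results from trials using different cut points cannot be pooled directly.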

There was significant temporal variation with respect to the two most commonly reported FOMs. In trials published between 1996 and 2006, 37/57 (64.9%) reported the GOS while 12/57 (21.1%) reported the mRS. In trials published between 2007 and 2015, 28/72 (38.9%) reported the GOS while 39/72 (54.2%) reported the mRS.

Other FOMs included the National Institutes of Health Stroke Scale [23] (11/129, 8.5%), the Barthel index [24] (8/129, 6.2%), the extended GOS [25] (6/129, 4.7%) and the Karnofsky Performance Status Scale [26] (2/129, 2.3%). Five studies (3.9%) reported Glasgow Coma Scale scores [27] as an outcome measure. FOMs that were used once (0.8%) included the World Federation of Neurological Surgeons SAH grading scale [28], the Functional Status Examination [29] and the Academic Medical Centre Linear Disability Score [30].

Patient-reported quality of life (QoL) measures were reported in 11/129 studies (8.5%). The Short Form (36) Health Survey [31] was used in six trials (4.7%), and the EQ-5D [32] was used in three (2.3%). QoL measures that were used once (0.8%) included the Sickness Impact Profile [33], a Visual Analog Scale [34], the Health-Related Quality of Life 15D [35], the Patient Health Questionnaire (PHQ-9), an assessment of mood and impact on function [36], and the Satisfaction with Life Scale [37].

Neuropsychological testing was employed by 10/129 studies (7.8%). Two studies did not describe the methods employed. The most commonly reported measures (6/129, 4.7%) were the Wechsler Adult Intelligence Scale [38], the Wechsler Memory Scale [39] and the Trail Making Test [40]. Additional neuropsychological measures and their frequencies are presented in Fig. 3.

Pathophysiological Outcomes

Pathophysiological outcomes were reported in 121/129 studies (93.8%). Neurological sequelae were the most common pathophysiological category (104/129, 80.6%). Non-neurological sequelae (33/129, 25.6%) were categorized into cardiac sequelae (16/129, 12.4%), pulmonary sequelae (15/129, 11.6%), infection, fever and sepsis (17/129, 13.2%) and fluid status (6/129, 4.7%).

There was wide variation in the definitions of ‘clinical vasospasm’ (15/104, 11.6%), ‘symptomatic vasospasm’ (24/104, 23.1%) and ‘vasospasm’ (28/104, 21.7%). The radiological methods used to define the variations of vasospasm included TCD using a mean velocity threshold or a Lindegaard ratio [41] and conventional or CT angiography. Clinical findings usually involved a decrease in GCS or a new focal neurological deficit not explained by other causes; however, there was a high degree of variability in the definitions employed. In studies reporting clinical vasospasm, it was not defined in 4/15 studies (26.7%), was defined as a combination of clinical and radiological findings in 1/15 studies (6.7%) and was defined on clinical features alone in 10/15 studies (66.7%). Symptomatic vasospasm was not defined in 10/24 studies (41.7%) reporting this outcome measure, was based on cerebral blood flow measurements in 1/24 studies (4.2%), was based on clinical findings in 6/24 studies (25.0%) and was based on a combination of clinical and radiological findings in 7/24 studies (29.2%). Vasospasm was not defined in 5/28 studies (17.9%), was based on radiological findings in 18/28 studies (64.3%) and was based on a combination of clinical and radiological findings in 5/28 studies (17.9%).

Overall, 51/129 (39.5%) studies described an outcome measure consistent with some form of clinical deterioration associated with delayed cerebral ischemia. Terms used by these studies included delayed neurological deficits (3/51, 5.9%), delayed ischemic neurological deficits (28/51, 54.9%), delayed ischemic deficits (9/51, 17.6%) and delayed cerebral ischemia (15/51, 29.4%). Of the definitions used, 19/51 (37.3%) required a combination of clinical and radiological findings, 25/51 (49.0%) were clinical, 1/51 (2.0%) was purely radiological and 6/51 studies (11.8%) did not provide a definition.

Seventy-one studies used an imaging modality as an outcome measure (71/129, 55.0%). Within studies reporting imaging modalities, TCD was the most frequently used (40/71, 56.3%). Some studies reported TCD as a Lindegaard ratio of greater than 3, 4 and/or 6, some used a mean velocity and others reported velocity thresholds of 120, 160 and/or 200 cm/s. Findings on plain CT were reported as an outcome measure in 34/71 studies (47.9%), conventional or CT angiography was used in 17/71 studies (23.9%), MRI in 11/71 studies (15.5%), single-photon emission computed tomography in 5/71 studies (7.0%) and CT perfusion in 3/71 studies (4.2%).
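The effect of these differing TCD thresholds can be shown with a short Python sketch that classifies a single reading against every threshold listed above. The Lindegaard ratio is computed here as the mean flow velocity in the middle cerebral artery (MCA) divided by that in the ipsilateral extracranial internal carotid artery (ICA). The function names, the strict "greater than" comparison for velocity and the example velocities are assumptions made for illustration and are not taken from any included trial.

```python
# Thresholds reported by the included trials (see text above).
LINDEGAARD_THRESHOLDS = (3, 4, 6)
VELOCITY_THRESHOLDS_CM_S = (120, 160, 200)


def lindegaard_ratio(mca_mean_velocity, ica_mean_velocity):
    """Ratio of MCA to extracranial ICA mean flow velocity."""
    if ica_mean_velocity <= 0:
        raise ValueError("ICA mean velocity must be positive")
    return mca_mean_velocity / ica_mean_velocity


def classify_reading(mca_mean_velocity, ica_mean_velocity):
    """Report whether one TCD reading exceeds each threshold.

    Assumption: a reading is positive when it is strictly greater than
    the threshold; individual trials may have used other conventions.
    """
    ratio = lindegaard_ratio(mca_mean_velocity, ica_mean_velocity)
    return {
        "velocity": {t: mca_mean_velocity > t for t in VELOCITY_THRESHOLDS_CM_S},
        "ratio": {t: ratio > t for t in LINDEGAARD_THRESHOLDS},
    }


# Hypothetical reading: MCA 150 cm/s, ICA 40 cm/s (ratio 3.75).
result = classify_reading(150, 40)
```

This hypothetical reading would count as vasospasm in a trial using a 120 cm/s or ratio-of-3 threshold but not in a trial using any of the higher thresholds, so the same patient could be an event in one trial and a non-event in another.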

Chemical biomarkers were measured in 43/129 studies (33.3%) and included serum electrolytes in 20/129 studies (15.5%), markers associated with brain injury (e.g., neuron-specific enolase and S100B) in 6/129 studies (4.7%) and markers of coagulation and hemoglobin in 6/129 studies (4.7%). Other chemical biomarkers reported are presented in Fig. 3. Physical biomarkers were measured in 30/129 studies (23.3%) and included mean arterial pressure (9/129, 7.0%), intracranial pressure (8/129, 6.2%), central venous pressure (6/129, 4.7%), cerebral perfusion pressure (6/129, 4.7%) and oxygen tension (4/129, 3.1%). Additional physical biomarkers are reported in Fig. 3.

Discussion

We have identified a wide range of outcome measures and demonstrated significant heterogeneity in the choice of outcome, the measurement instrument used and the timing of assessment. The strengths of our study are that we applied a robust methodology, prospectively registered our study, employed a well-designed protocol and used the previously described OMERACT classification system. We conducted a broad search and extracted all the data in duplicate, reducing the likelihood of error.

There were several weaknesses in our study. We limited our search to English language papers and did not include studies published before 1996, so additional outcome measures may have been missed. We limited our study to RCTs; including observational studies may have increased the number of outcome measures identified. We did not consult experts in the field to identify further trials, which might have supplemented our comprehensive search strategy.

A previous systematic review of stroke trials (excluding aSAH) also demonstrated a lack of consistency in outcome measures [42]. Standardization of outcome measures has been promoted through the stroke common data elements [43]. The Stroke Standard Set (SSS), which represents a COS in acute stroke trials, has recently been published [44]. The SSS was developed using a Delphi consensus process involving a panel of experts in stroke research and focuses on patient-centered outcome measures. Because the course of treatment and outcomes differ in aSAH, this group of patients was excluded from the SSS.

Consensus approaches have also been developed within aSAH research. In 2010, a panel of experts worked toward a consensus definition of delayed cerebral ischemia (DCI) [7]. Vergouwen and colleagues used a consensus approach to propose the following definition:

The occurrence of focal neurological impairment (such as hemiparesis, aphasia, apraxia, hemianopia, or neglect), or a decrease of at least 2 points on the Glasgow Coma Scale (either on the total score or on one of its individual components [eye, motor on either side, verbal]). This should last for at least 1 h, is not apparent immediately after aneurysm occlusion, and cannot be attributed to other causes by means of clinical assessment, CT or MRI scanning of the brain, and appropriate laboratory studies.

The panel also recommended that the terms ‘vasospasm’ or ‘arterial narrowing’ be restricted to descriptions of radiological investigations and not be combined with clinical manifestations of DCI. This paper has been cited over 350 times in the literature, and the definitions have been adopted into recent clinical trials [45, 46, 47].
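The consensus definition of DCI quoted above can be read as a decision rule, which the following Python sketch encodes. It is a simplified illustration under stated assumptions, not a validated clinical tool: the class and field names are hypothetical, and the exclusion of other causes is reduced to a single flag that would in practice depend on clinical assessment, imaging and laboratory studies.

```python
from dataclasses import dataclass


@dataclass
class NeuroEvent:
    """Hypothetical record of one episode of neurological change."""
    focal_impairment: bool                 # e.g., hemiparesis, aphasia, neglect
    gcs_drop_points: int                   # largest drop, total score or any one component
    duration_hours: float                  # how long the change persisted
    apparent_immediately_after_occlusion: bool
    attributable_to_other_cause: bool      # after clinical, imaging and laboratory work-up


def meets_dci_definition(event):
    """Apply the consensus criteria quoted above to a single event."""
    clinical_change = event.focal_impairment or event.gcs_drop_points >= 2
    return (
        clinical_change
        and event.duration_hours >= 1
        and not event.apparent_immediately_after_occlusion
        and not event.attributable_to_other_cause
    )


# Hypothetical example: a 2-point GCS drop lasting 3 h with no other cause found.
example = NeuroEvent(
    focal_impairment=False,
    gcs_drop_points=2,
    duration_hours=3.0,
    apparent_immediately_after_occlusion=False,
    attributable_to_other_cause=False,
)
is_dci = meets_dci_definition(example)
```

Writing the definition this way makes each criterion explicit, which is the property that allows a consensus definition to be applied consistently across trials.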

Our work augments current efforts to harmonize outcome measures in subarachnoid hemorrhage research. The development of a specific core outcome set will engage relevant stakeholders, including patients, families, researchers, clinicians, allied health workers and policy-makers, and identify which outcome measures should be prioritized during research. This specific core outcome set should aim to achieve consensus on how and when to measure these outcomes.

Conclusion

Our comprehensive systematic review has demonstrated substantial heterogeneity in the outcome measures employed in aSAH RCTs, making assimilation of the totality of evidence to guide patient management difficult. The development of a COS in aSAH is both necessary and attainable, and our systematic review provides a foundation for ongoing efforts in this area. A consensus approach to identify which outcomes should be used in aSAH trials, including how and when these outcomes should be measured, is a critical step to ensure patients receive the best possible evidence-based management.