Abstract
Background
The modified Fisher scale (mFS) is a critical clinical and research tool for risk stratification of cerebral vasospasm. As such, the mFS is included as a common data element by the National Institute of Neurological Disorders and Stroke SAH Working Group. There are few studies assessing the interrater reliability of the mFS.
Methods
We distributed a survey to a convenience sample with snowball sampling of practicing neurointensivists and through the research survey portion of the Neurocritical Care Society Web site. The survey consisted of 15 scrollable CT scans of patients with SAH for mFS grading, two questions regarding the definitions of the scale criteria and demographics of the responding physician. Kendall’s coefficient of concordance was used to determine the interrater reliability of mFS grading.
Results
Forty-six participants (97.8% neurocritical care fellowship trained, 78% UCNS-certified in neurocritical care, median 5 years (IQR 3–6.3) in practice, treating median of 80 patients (IQR 50–100) with SAH annually from 32 institutions) completed the survey. By mFS criteria, 30% correctly identified that there is no clear measurement of thin versus thick blood, and 42% correctly identified that blood in any ventricle is scored as “intraventricular blood.” The overall interrater reliability by Kendall’s coefficient of concordance for the mFS was moderate (W = 0.586, p < 0.0005).
Conclusions
Agreement among raters in grading the mFS is only moderate. Online training tools could be developed to improve mFS reliability and standardize research in SAH.
Introduction
Arterial vasospasm plays a critical role in the development of delayed cerebral ischemia (DCI) after aneurysmal subarachnoid hemorrhage (SAH). It occurs in 20–50% of patients following aneurysmal rupture and contributes substantially to poor outcome through a combination of secondary injury mechanisms [1,2,3,4,5,6]. Advancing the understanding of these mechanisms is facilitated by reproducible measurement of cumulative hemorrhage burden. In 1980, C. Miller Fisher first utilized initial CT imaging to risk-stratify patients for vasospasm [7]. The Fisher scale specifically defined thin SAH as less than 1 mm and thick SAH as greater than 1 mm. It was not meant to be an ordinal scale, as group 3 has a higher risk of vasospasm than group 4. It has since been utilized, however, as an ordinal scale and reported incorrectly. It also does not account for the additive risk of ventricular blood. As a result, classification of patients by the Fisher scale is unclear and errors are often noted in its application [8, 9].
Claassen et al. [10] revisited the Fisher scale in order to develop a simpler admission CT rating scale with superior predictive value for DCI. Claassen’s scale accounts for the separate and additive risk of thick SAH (completely filling one or more cistern or fissure) and bilateral intraventricular hemorrhage. This score is often referred to as the “Modified Fisher scale” (mFS) although the manuscript made no mention of this name [11]. Some studies report this as the “Claassen scale” or “Columbia scale.” Then, Frontera et al. [12, 13] specifically coined the term “mFS,” which they found predictive of symptomatic vasospasm (not DCI). Their analysis utilized retrospective data collected on 1378 scans from a clinical trial, and no images or measurements were available. Therefore, no explicit measurement criteria were used in Frontera’s mFS to classify blood as thick or thin, and any IVH (not bilateral) was graded as present or absent (Fig. 1). Although Frontera et al. used different criteria from Claassen et al., many reference their work interchangeably when referring to the mFS (Table 1, Supplemental Table 1) [14, 15].
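The practical consequence of the two IVH definitions can be illustrated with a short sketch. The 0 to 4 grade numbering used here follows the commonly cited layout of both scales and is an assumption, since the text above defers the full criteria to Fig. 1; the names `SAH`, `IVH`, `frontera_mfs`, and `claassen_scale` are hypothetical. Thick versus thin is taken as the rater's judgment rather than a measurement, and isolated IVH without SAH is deliberately left undefined.

```python
from enum import Enum


class SAH(Enum):
    NONE = 0
    THIN = 1
    THICK = 2   # rater's judgment; no measurement is implied


class IVH(Enum):
    NONE = 0
    UNILATERAL = 1
    BILATERAL = 2


def _grade(sah, ivh_counts):
    """Shared 0-4 layout (assumed): thin SAH -> 1 or 2, thick SAH -> 3 or 4."""
    if sah is SAH.NONE:
        if ivh_counts:
            # Isolated IVH is not covered by this sketch.
            raise ValueError("IVH without SAH is left undefined here")
        return 0
    base = 1 if sah is SAH.THIN else 3
    return base + (1 if ivh_counts else 0)


def frontera_mfs(sah, ivh):
    """mFS as described for Frontera et al.: any IVH counts as present."""
    return _grade(sah, ivh is not IVH.NONE)


def claassen_scale(sah, ivh):
    """Scale as described for Claassen et al.: only bilateral IVH counts."""
    return _grade(sah, ivh is IVH.BILATERAL)


# The same scan can receive different grades under the two definitions.
scan = (SAH.THIN, IVH.UNILATERAL)
print(frontera_mfs(*scan), claassen_scale(*scan))
```

Under these assumptions, a scan with thin SAH and unilateral IVH is graded differently by the two functions, which is the kind of discrepancy that arises when the two papers are cited interchangeably.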
While prior studies have assessed the interrater reliability (IRR) of the mFS between a limited number of investigators at a single institution and found moderate-to-good reliability [16, 17], the IRR across institutions and the working definitions of the mFS criteria in use are unknown. In the present study, we hypothesized that attending physicians who routinely take care of patients with SAH do not agree on the definitions of mFS criteria, and therefore, the mFS has limited IRR. We also performed a systematic literature review to assess for inconsistencies in the application of the mFS. As the National Institute of Neurological Disorders and Stroke (NINDS) Common Data Elements initiative has recently included the mFS as a highly recommended supplemental imaging grade [18], it is important that any misunderstanding of mFS grading be elucidated now and that the definitions of the scale criteria are properly understood to increase its validity in clinical trials and large population studies. A search of clinicaltrials.gov (accessed October 14, 2020) revealed 18 active or recruiting studies involving aneurysmal SAH and vasospasm, including trials of prevention (cilostazol and nimodipine combined, clazosentan, milrinone, CSF alteration, stellate ganglion block) and prediction (acetazolamide challenge with perfusion, 18F-FDG PET/CT) as well as 26 completed studies. Despite the low prevalence of aneurysmal SAH, vasospasm (VSP) is a very active area of study, and enrollment into these studies should rely on a scale with good IRR for predicting VSP.
Methods
Study Design
There were two parts to this study, a cross-sectional survey and systematic review of existing literature. The survey was administered online to physicians from multiple institutions through the research survey portion of the Neurocritical Care Society website as well as a convenience sample of personal email contacts through snowball sampling. This study was reviewed and approved by LVHN’s institutional review board (IRB) and qualified as Human Subjects Research in Exempt Category (2)(i).
Instrument (Supplemental Figure 1)
Fifteen admission CT scans of patients with SAH from the authors’ institutions were randomly selected, anonymized, and made into videos for ease of scrolling and then uploaded into a Google Form survey (Fig. 2, Supplemental Figure 1). Participants were asked to grade the mFS for each CT scan. Only self-identified attending physicians who assign mFS were asked to participate. Additional data collected include details of how the participant defines “thick” versus “thin” clot, how “IVH” is scored, as well as demographic data with questions about medical training, experience with grading mFS, and training in mFS administration. Surveys were anonymous.
Statistical Analysis
Descriptive statistics were used to summarize the training characteristics of the participants. Frequencies and percentages are presented for categorical variables, while the median and interquartile range (IQR) are presented for continuous and ordinal variables. Kendall’s coefficient of concordance was applied to determine interrater reliability (IRR). Kendall’s coefficient of concordance is appropriate when there are 3 or more raters rating the same subjects (the same raters are used to assess each subject), and the rating is on an ordinal or continuous scale; 0 indicates no concordance, and 1 indicates perfect concordance [19]. Subset analysis was performed to determine IRR of several subgroups based on definitions of mFS criteria and level of training.
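For readers unfamiliar with the statistic, the following is a minimal sketch of how Kendall’s coefficient of concordance (W) can be computed from a raters-by-subjects table of ordinal grades. It is not the software used in this study (reference [19] points to an SPSS procedure). The function names and the toy grades are hypothetical, and the standard tie correction is included on the assumption that a five-level scale graded over 15 scans will produce many tied ranks.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # positions are 0-based
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kendalls_w(ratings):
    """Tie-corrected Kendall's W; ratings[j][i] is rater j's grade for subject i."""
    m = len(ratings)                     # number of raters
    n = len(ratings[0]) if m else 0      # number of subjects (scans)
    if m < 2 or n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("need >= 2 raters grading the same >= 2 subjects")
    rank_rows = [average_ranks(r) for r in ratings]
    rank_sums = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie correction: sum of (t^3 - t) over each rater's groups of tied grades.
    ties = sum(t ** 3 - t for r in ratings for t in Counter(r).values())
    denom = m ** 2 * (n ** 3 - n) - m * ties
    if denom == 0:
        raise ValueError("every rater gave a constant grade; W is undefined")
    return 12 * s / denom


# Hypothetical grades (0-4) from three raters on five scans.
toy = [
    [1, 3, 4, 2, 3],
    [1, 4, 4, 2, 3],
    [2, 3, 4, 1, 3],
]
print(round(kendalls_w(toy), 3))
```

Significance is usually assessed with the chi-square approximation m(n − 1)W on n − 1 degrees of freedom, where m is the number of raters and n the number of subjects; that step is omitted here to keep the sketch free of external libraries.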
Literature Search
In a secondary analysis, we performed a systematic review of original research that cites the mFS paper by Frontera et al., as well as Claassen et al.’s paper on “The Fisher Scale Revisited.” We searched PubMed and Scopus (accessed February 8, 2020) for original research citing those two papers. We excluded case reports, review articles, studies not readily available in English, and studies that used the mFS for reporting complications of procedures. We assessed each paper for its inclusion of definitions of the mFS criteria, whether or not the definitions (if present) were correct, and how the scale was used for the study (as part of the demographics reported, as a variable in a predictive/correlative model, as a matched variable, an adjusted variable, or as a comparator) (for definitions, see Supplemental Table 1).
Results
We received 47 responses to the survey; one response from a non-physician was excluded, leaving 46 responses. We could not determine a response rate because we used snowball sampling. There were 32 medical centers represented, reported as treating a median of 80 (IQR 50–100) patients with SAH per year. Nearly all participants had completed a fellowship in neurocritical care, but only approximately one quarter reported “formal training” in grading the mFS (Table 2).
In reporting definitions of mFS criteria according to Frontera et al.’s criteria, only 24% of participants correctly identified that there is no clear measurement of thick or thin SAH, but just over half (52%) correctly identified that any blood in any ventricle is scored as intraventricular blood (Table 3). Half of the participants recognized that there is a distinction between Claassen’s scale and the mFS, while 33% refuted the distinction, and 17% reported not knowing whether there was a distinction or not. Most participants (72%) would take an online training module to standardize scoring of the mFS.
In grading the 15 CT scans for mFS without being provided criteria, the overall IRR by Kendall’s coefficient of concordance was W = 0.586 (p < 0.0005), a statistically significant, moderate level of agreement. Participants who correctly identified both the thin-versus-thick and the ventricular blood definitions showed a higher, statistically significant level of agreement (W = 0.727, p < 0.0005), while those who reported formal training performed similarly to the entire cohort (W = 0.588, p < 0.0005).
In a secondary analysis, we found 241 papers referencing Frontera et al.; 108 fit the inclusion criteria for evaluation. There were 421 papers referencing Claassen et al., and 91 fit the inclusion criteria. With overlap, there were a total of 164 original research papers utilizing the mFS, with only 17 explicitly listing Frontera et al.’s criteria when utilizing the mFS. There were nine papers explicitly listing Claassen et al.’s criteria as the criteria for the mFS. While the majority did not state any criteria used to grade the mFS, several papers were unclear in their criteria: they used Fisher and modified Fisher interchangeably, listed criteria only partially, or gave incorrect references for named scales (Supplemental Table 1). The majority of studies use the mFS as a variable for prediction or to show correlation (100 studies), and another 44 studies report the mFS in demographics. In addition, mFS was used as an adjusted variable (5 studies), comparator variable (10 studies), inclusion criterion (2 studies), and as a matched variable (2 studies) (Supplemental Table 1).
Discussion
Among attending neurointensivists from over 30 institutions with high volumes of patients with SAH, we found only moderate IRR of the mFS. Most participants reported being responsible for grading the mFS, and nearly one in five of the participants had published data utilizing the mFS.
Prior studies have found higher IRR compared to our data. Claassen et al. did not measure IRR for their grading scale but did find that the IRR for their SAH and IVH measurements indicated good (κw = 0.6–0.8) and excellent (κw = 0.8–1.0) agreement [10]. A retrospective study of 271 patients’ CT scans graded by four raters found the mFS to have a moderate-to-good agreement (κw = 0.64) [16]. A later single-center study of 150 patients found similar results with two raters (κw = 0.61) [17]. Most recently, four raters from a single institution grading 165 patient scans showed moderate agreement for the mFS (κ = 0.42) [20]. Our study differs from prior work due to the large number of raters and institutions represented. We think our study reflects the heterogeneity of raters that would be contributing mFS grades to a large, multicenter clinical trial.
The burden of subarachnoid and intraventricular hemorrhage predicts symptomatic vasospasm and delayed cerebral ischemia [2, 21]. Hemolysis leads to inflammation, oxygen-free radical reactions, and endothelial injury that drives vasoconstriction [22]. The mFS holds tremendous research value as a grading system that facilitates risk stratification for symptomatic vasospasm based on blood burden. However, in order to be valid, a grading scale must have clear criteria and demonstrate good IRR. The mFS has clear utility in research and should not be replaced, but should be standardized. According to our data, the mFS lacks good IRR which is likely due to uncertainty regarding the scale criteria. We hypothesize that much of the confusion stems from the slight differences in the foundational papers by Claassen et al. and Frontera et al., in which the same author group used slightly different criteria to assess related but not identical outcomes (delayed cerebral ischemia vs. symptomatic vasospasm) [10, 12]. Our literature search found that many authors have attributed the mFS to Claassen et al. and sometimes integrate the original Fisher criteria for thin versus thick blood into the mFS, adding to the confusion. Of the 37 studies that had some definition of the mFS in their manuscript, only 17 listed the correct criteria and 20 listed incorrect or incomplete criteria. Even the NINDS Common Data Elements Project Investigators incorrectly attributed the mFS to Claassen et al. in one publication [23], while defining thin blood and thick blood according to the original Fisher criteria in another [24]. The CDE states that it is an attempt to “harmonize and standardize data collected for clinical studies in neuroscience,” and if incorrect and inconsistent definitions are used, this will only further the confusion on the correct definition of mFS. The incorrect scale criteria offered by commonly referenced Web sites such as mdcalc.com and UpToDate® compound the problem [11, 25].
In our study, the IRR of the mFS was only moderate. Of note, half of our participants were not aware that Claassen’s scale was distinct from the mFS, about half could correctly identify the criteria for IVH, and fewer than a quarter recognized the criteria for thin versus thick blood. Nonetheless, there is reason to believe that proper training could improve the IRR. The study with the highest IRR provided each rater with a detailed description of the scale [16], suggesting that simply providing the criteria can bolster the IRR of the scale. Similarly, in our study those participants who could properly define the mFS score components showed good agreement.
With the inclusion of the mFS on the CDE, it is time to standardize the definition and training for the mFS. Many other grading scales (clinical and radiographic) require standardized training prior to inclusion in clinical trials and continued re-education and certification to assure the validity of the data. For example, inclusion of the National Institutes of Health Stroke Scale (NIHSS) into any trial requires certification by the examiners. An online-/video-based training program for the NIHSS has improved the reliability of the scale and is now standard practice prior to inclusion into any study [26]. Similarly, interventional stroke trials require online training and certification for the Alberta Stroke Program Early CT score (ASPECTS) after early studies showed insufficient interrater reliability [27, 28].
The consequences of the deficiencies in the IRR of the mFS are unknown. We found 164 original research articles referencing the mFS. A substantial portion found the scale to be an important predictor variable or an adjusted variable in a predictive model. The reproducibility of those results may depend on the consistency of mFS grading across institutions. As trials attempt to enrich their cohorts for the outcomes of symptomatic vasospasm and DCI [29, 30], more reliable scoring would be important for clinical trial entry. Specifically, studies that have focused on subtypes of SAH using components of the mFS without naming the mFS (such as a focus on thick clot) [30] showed benefit of therapy only for this subtype, underlining the importance of accurate IRR in evaluating scans for therapeutic effect. Further, the validity of trial results based on mFS with poor IRR should be evaluated for type II error.
Our study has several limitations. The survey participants were mostly neurologists with neurocritical care certification. Others, including neurosurgeons, neuroradiologists, or non-physicians, may grade the mFS at some institutions. Although our participants self-identified as being primarily responsible for grading the mFS in their patients, we cannot be sure that they represent mFS graders at large. In addition, our participants were relatively inexperienced, with a median of 5 years of neurocritical care practice. It is uncertain whether this is problematic; others have reported no influence of experience on the IRR for the mFS [16]. We anticipated a higher response, though our recruitment methods did not allow for calculating a true response rate. In order to improve the response rate, we limited the number of CT scans reviewed and were not able to show all possible visual permutations. Additionally, our sample was obtained through non-probability sampling methods. We hypothesize that those who completed the survey were more interested in mFS grading than those who did not and may therefore be more familiar with the accurate definitions. If so, any selection bias would tend to raise rather than lower the IRR, but we cannot be sure. The survey included scrollable videos of scans but offered no windowing or measurement tools, although the mFS does not require measurement. However, we did not receive any feedback from participants that technical factors impaired their grading.
Conclusion
IRR among raters in grading the mFS is inadequate and may be related to discrepancies regarding the definitions of the score criteria. The NINDS SAH Common Data Elements may require further clarification in order to standardize research in SAH. More importantly, mFS may become a core tracking metric required for Comprehensive Stroke Centers and endorsed by the Joint Commission (like the Hunt and Hess Scale). Many other common data points such as NIHSS and ASPECTS for ischemic stroke involve formal standardized training and certification with continuing education, especially if the data are to be used in research. The mFS would benefit from a similar formalized training program with certification, and 72% of participants agreed that they would take formal online training. Alternatively, an automated imaging pipeline capable of more accurately and rapidly measuring cisternal and ventricular hemorrhage may prove superior in facilitating large cohort studies evaluating underlying mechanisms of injury [24].
References
Hop JW, Rinkel GJ, Algra A, van Gijn J. Initial loss of consciousness and risk of delayed cerebral ischemia after aneurysmal subarachnoid hemorrhage. Stroke. 1999;30(11):2268–71.
Hijdra A, van Gijn J, Nagelkerke NJ, Vermeulen M, van Crevel H. Prediction of delayed cerebral ischemia, rebleeding, and outcome after aneurysmal subarachnoid hemorrhage. Stroke. 1988;19(10):1250–6.
Chou SH, Smith EE, Badjatia N, et al. A randomized, double-blind, placebo-controlled pilot study of simvastatin in aneurysmal subarachnoid hemorrhage. Stroke. 2008;39(10):2891–3.
Dorsch N. A clinical review of cerebral vasospasm and delayed ischaemia following aneurysm rupture. Acta Neurochir Suppl. 2011;110(Pt 1):5–6.
Dorhout Mees SM, Kerr RS, Rinkel GJ, Algra A, Molyneux AJ. Occurrence and impact of delayed cerebral ischemia after coiling and after clipping in the International Subarachnoid Aneurysm Trial (ISAT). J Neurol. 2012;259(4):679–83.
Rosengart AJ, Schultheiss KE, Tolentino J, Macdonald RL. Prognostic factors for outcome in patients with aneurysmal subarachnoid hemorrhage. Stroke. 2007;38(8):2315–21.
Fisher CM, Kistler JP, Davis JM. Relation of cerebral vasospasm to subarachnoid hemorrhage visualized by computerized tomographic scanning. Neurosurgery. 1980;6(1):1–9.
Rosen DS, Macdonald RL. Subarachnoid hemorrhage grading scales: a systematic review. Neurocrit Care. 2005;2(2):110–8.
Komiyama M. Misunderstanding of Fisher’s grouping system for computed tomography evaluation of aneurysmal subarachnoid haemorrhage. Interv Neuroradiol. 2019;25(6):653–4.
Claassen J, Bernardini GL, Kreiter K, et al. Effect of cisternal and ventricular blood on risk of delayed cerebral ischemia after subarachnoid hemorrhage: the Fisher scale revisited. Stroke. 2001;32(9):2012–20.
Singer R, Ogilvy C, Rordorf G. Subarachnoid hemorrhage grading scales. UpToDate Inc. UpToDate Web site. https://www.uptodate.com. Published 2020. Updated November 19, 2019. Accessed 5 Feb 2020.
Frontera JA, Claassen J, Schmidt JM, et al. Prediction of symptomatic vasospasm after subarachnoid hemorrhage: the modified fisher scale. Neurosurgery. 2006;59(1):21–7 (discussion 21-27).
Rosen DS, Macdonald RL. Subarachnoid hemorrhage grading scales. UpToDate Inc. UpToDate Web site. https://www.uptodate.com. Published 2020. Updated November 19, 2019. Accessed 5 Feb 2020.
de Oliveira Manoel AL, Jaja BN, Germans MR, et al. The VASOGRADE: a simple grading scale for prediction of delayed cerebral ischemia after subarachnoid hemorrhage. Stroke. 2015;46(7):1826–31.
van der Steen WE, Leemans EL, van den Berg R, et al. Radiological scales predicting delayed cerebral ischemia in subarachnoid hemorrhage: systematic review and meta-analysis. Neuroradiology. 2019;61(3):247–56.
Kramer AH, Hehir M, Nathan B, et al. A comparison of 3 radiographic scales for the prediction of delayed ischemia and prognosis following subarachnoid hemorrhage. J Neurosurg. 2008;109(2):199–207.
Jiménez-Roldán L, Alén JF, Gómez PA, et al. Volumetric analysis of subarachnoid hemorrhage: assessment of the reliability of two computerized methods and their comparison with other radiographic scales. J Neurosurg. 2013;118(1):84–93.
Suarez JI, Sheikh MK, Macdonald RL, et al. Common data elements for unruptured intracranial aneurysms and subarachnoid hemorrhage clinical research: a national institute for neurological disorders and stroke and national library of medicine project. Neurocrit Care. 2019;30(Suppl 1):4–19.
Lund Research Ltd. https://statistics.laerd.com/premium/spss/kccir/kendalls-coefficient-of-concordance-in-spss-4.php. Accessed 4 May 2020.
Woo PYM, Tse TPK, Chan RSK, et al. Computed tomography interobserver agreement in the assessment of aneurysmal subarachnoid hemorrhage and predictors for clinical outcome. J Neurointerv Surg. 2017;9(11):1118–24.
Dupont SA, Wijdicks EF, Manno EM, Lanzino G, Rabinstein AA. Prediction of angiographic vasospasm after aneurysmal subarachnoid hemorrhage: value of the Hijdra sum scoring system. Neurocrit Care. 2009;11(2):172–6.
Macdonald RL. Delayed neurological deterioration after subarachnoid haemorrhage. Nat Rev Neurol. 2014;10(1):44–58.
Damani R, Mayer S, Dhar R, et al. Common data element for unruptured intracranial aneurysm and subarachnoid hemorrhage: recommendations from assessments and clinical examination workgroup/subcommittee. Neurocrit Care. 2019;30(Suppl 1):28–35.
Hackenberg KAM, Etminan N, Wintermark M, et al. Common data elements for radiological imaging of patients with subarachnoid hemorrhage: proposal of a multidisciplinary research group. Neurocrit Care. 2019;30(Suppl 1):60–78.
Kummer B. Modified fisher grading scale for subarachnoid hemorrhage (SAH). http://www.mdcalc.com. Accessed 5 Feb 2020.
Lyden P, Brott T, Tilley B, et al. Improved reliability of the NIH Stroke Scale using video training. NINDS TPA Stroke Study Group. Stroke. 1994;25(11):2220–6.
Mak HK, Yau KK, Khong PL, et al. Hypodensity of > 1/3 middle cerebral artery territory versus Alberta Stroke Programme Early CT Score (ASPECTS): comparison of two methods of quantitative evaluation of early CT changes in hyperacute ischemic stroke in the community setting. Stroke. 2003;34(5):1194–6.
Gupta AC, Schaefer PW, Chaudhry ZA, et al. Interobserver reliability of baseline noncontrast CT Alberta Stroke Program Early CT Score for intra-arterial stroke treatment selection. AJNR Am J Neuroradiol. 2012;33(6):1046–9.
Mura J, Rojas-Zalazar D, Ruíz A, Vintimilla LC, Marengo JJ. Improved outcome in high-grade aneurysmal subarachnoid hemorrhage by enhancement of endogenous clearance of cisternal blood clots: a prospective study that demonstrates the role of lamina terminalis fenestration combined with modern microsurgical cisternal blood evacuation. Minim Invasive Neurosurg. 2007;50(6):355–62.
Mayer SA, Aldrich EF, Bruder N, et al. Thick and diffuse subarachnoid blood as a treatment effect modifier of Clazosentan after subarachnoid hemorrhage. Stroke. 2019;50(10):2738–44.
Funding
None.
Contributions
The manuscript complies with all instructions to authors. Authorship requirements have been met, and the final manuscript was approved by all authors. This has not been published elsewhere and is not under consideration by another journal.
Ethics declarations
Conflict of interest
Dr. Claassen reports grants from NINDS, grants from NINDS, other from iCE Neurosystem, outside the submitted work. There are no other conflicts of interest.
Ethical approval
This study adheres to ethical guidelines. This study was reviewed and approved by LVHN’s institutional review board (IRB) and qualified as Human Subjects Research in Exempt Category (2) (i).
Cite this article
Melinosky, C., Kincaid, H., Claassen, J. et al. The Modified Fisher Scale Lacks Interrater Reliability. Neurocrit Care 35, 72–78 (2021). https://doi.org/10.1007/s12028-020-01142-8
DOI: https://doi.org/10.1007/s12028-020-01142-8