Introduction

The Glasgow Coma Scale (GCS) is the most widely used scoring system for evaluating patients with impaired consciousness [1]. However, a few important limitations of the GCS have been recognized. The verbal component cannot be properly assessed in intubated patients; in this case, some physicians assign the lowest possible score, while others extrapolate the verbal score from other neurological findings [2]. Moreover, the GCS does not detect subtle clinical changes in comatose patients [3]. For these reasons, many scales have been developed over the past few decades to complement or replace the GCS. Most of them were complex or unreliable and thus were not widely accepted [4–8].

The Full Outline of UnResponsiveness (FOUR) score has recently been proposed as a new coma scale [2]. It explores four components: eye response, motor response, brainstem reflexes, and respiration pattern (including mechanical ventilation); unlike the GCS, the FOUR score does not evaluate the verbal response. Overall, the scale provides more neurological detail than the GCS, since it also includes items assessing respiration and brainstem reflexes. Each of the four items is scored from 0 to 4, for a maximum total score of 16. When all four components are graded 0, brain death should be considered [2, 9]. Moreover, the FOUR score can detect a locked-in syndrome as well as a vegetative state, in which the patient spontaneously opens the eyes but is unable to track the examiner's finger [2]. Nevertheless, the FOUR score does not distinguish patients in a vegetative state from those in a minimally conscious state [10, 11]. The scale has been found to be valid, reliable, and a good prognostic predictor in critically ill patients [2, 12–15], and it has already been translated into French [16] and Spanish [17, 18], whereas an Italian version is not yet available. The aim of this study was to develop and validate an Italian version of the FOUR score.
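The scoring arithmetic described above can be sketched as follows. This is an illustrative Python sketch, not part of the validated instrument: the class name `FourScore` and its field names are hypothetical, and the code only encodes what the text states (four components graded 0–4, a total of 0–16, and all-zero grades as a prompt to consider brain death).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourScore:
    """Hypothetical container for one FOUR score assessment (each item graded 0-4)."""

    eye: int
    motor: int
    brainstem: int
    respiration: int

    def __post_init__(self) -> None:
        # Reject grades outside the 0-4 range used by every FOUR item.
        for name in ("eye", "motor", "brainstem", "respiration"):
            value = getattr(self, name)
            if not 0 <= value <= 4:
                raise ValueError(f"{name} must be between 0 and 4, got {value}")

    @property
    def total(self) -> int:
        # Total ranges from 0 (all items graded 0) to 16 (all items graded 4).
        return self.eye + self.motor + self.brainstem + self.respiration

    @property
    def consider_brain_death(self) -> bool:
        # All four components graded 0: a clinical flag only, not a diagnosis.
        return self.total == 0
```

Keeping the four components separate, rather than storing only the total, preserves the item-level information that the item-wise analyses in this study rely on.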

Methods

Development of the Italian version of the scale

Two investigators (E.M. and S.S.) independently translated the original FOUR score, its instructions, and the text of its drawings into Italian. A consensus meeting was then held to agree on a fully comprehensible and accurate Italian translation consistent with the original English text. The draft was back-translated into English and compared with the original to produce the final Italian translation (Fig. 1). The Italian version of the FOUR score was then validated.

Fig. 1 Italian version of the FOUR score, with the item instructions and drawings

Participants

Patients admitted to the Neurology, Neurosurgery, and Intensive Care Units of the L'Aquila Hospital and to the Emergency Department Stroke Unit and Intensive Care Unit of the Policlinico Umberto I in Rome were consecutively enrolled from August to October 2010 and evaluated within 7 days of symptom onset. Specifically, 13 (14.9%) patients were evaluated on hospital arrival (day 0), 22 (25.3%) on day 1, and 52 (59.8%) between day 2 and day 7. Inclusion criteria were age ≥18 years and a diagnosis of acute brain injury. Exclusion criteria were treatment with neuromuscular blocking agents or sedatives within 30 min before or during the evaluation. The study was performed in accordance with the Declaration of Helsinki. Consent was obtained from the patient or from a legal surrogate.

Procedure

Each patient was independently assessed with the Italian version of the FOUR score and the GCS by a randomly chosen pair of raters. Two raters were neurologists (N1 and N2) and two were neurology residents (R1 and R2) with at least 4 years of clinical practice (N1: S.S.; N2: A.C.; R1: Alf.C.; R2: S.R.). Raters received written instructions to ensure adequate understanding of how to administer and score the scales, and a trial session on two patients was also performed. The possible pairs of raters were N1/R1, N1/R2, N2/R1, N2/R2, N1/N2, and R1/R2; the two raters assessed each patient within 1 h of each other, in a randomly determined order.
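The pairing design can be illustrated with a short sketch. The six pairs listed above are exactly the C(4, 2) = 6 unordered pairs of four raters. The study does not report how pairs and examination order were randomized; the function `assign_raters` and its use of Python's `random` module are assumptions made only for illustration.

```python
import itertools
import random

RATERS = ("N1", "N2", "R1", "R2")  # two neurologists, two neurology residents
# All unordered pairs of the four raters: C(4, 2) = 6.
PAIRS = list(itertools.combinations(RATERS, 2))


def assign_raters(patient_ids, seed=None):
    """Hypothetical randomization: one pair per patient, in random order.

    The study does not describe its randomization mechanism; this is a sketch.
    """
    rng = random.Random(seed)  # seeded generator so the schedule is reproducible
    schedule = {}
    for pid in patient_ids:
        pair = list(rng.choice(PAIRS))
        rng.shuffle(pair)  # randomize which rater examines first
        schedule[pid] = tuple(pair)
    return schedule
```

Seeding the generator makes the allocation reproducible and auditable, which is useful when the schedule must be documented in advance.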

For each patient, age, gender, medical history, diagnosis on admission, day of evaluation, intubation status, neuroimaging data, and level of consciousness (awake, drowsy, stuporous, or comatose, according to established criteria [19]) were recorded. Vital signs (heart rate, respiratory rate, blood pressure, and oxygen saturation) were monitored during the assessment.

Outcome at discharge was assessed with the modified Rankin Scale (mRS) by one rater randomly selected from each pair. The mRS is a 7-point scale (0–6) of overall function and mortality, in which patients who die receive a score of 6, the worst possible score. In our study, an mRS score of 0–2 was regarded as a good recovery and a score of 3–6 as a poor outcome.
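The outcome definitions above reduce to two simple rules. A minimal sketch, with hypothetical helper names `classify_outcome` and `died`:

```python
def classify_outcome(mrs: int) -> str:
    """Hypothetical helper: dichotomize an mRS score as in this study (0-2 good, 3-6 poor)."""
    if not 0 <= mrs <= 6:
        raise ValueError(f"mRS must be between 0 and 6, got {mrs}")
    return "good" if mrs <= 2 else "poor"


def died(mrs: int) -> bool:
    # mRS 6 denotes death, the mortality endpoint of the ROC analyses.
    return mrs == 6
```

Note that death (mRS 6) is counted both as a poor outcome and as the separate mortality endpoint, so the two endpoints are nested rather than mutually exclusive.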

Statistical analysis

Descriptive statistics are presented as mean ± SD or median. Inter-rater agreement for the total scores and the single-item scores was evaluated with the weighted Cohen's kappa (κw). A κw of 0.40 or less was considered poor, 0.41–0.60 fair to moderate, 0.61–0.80 good, and 0.81–1.00 excellent [20]. Internal consistency of the scales was evaluated with Cronbach's α, and intercorrelations of the item scores with Spearman's correlation coefficient (ρ). Receiver operating characteristic (ROC) curve analyses, adjusted for age, gender, level of consciousness, and clinical diagnosis, were performed to determine the ability of the FOUR score and the GCS to predict mortality or poor outcome at discharge. Sensitivity, specificity, and the likelihood ratio of both scales were also computed. Internal consistency, construct validity, intraclass correlation coefficients (ICC), and ROC curves were computed with SPSS 17.0 using the scores of the same rater from each pair, and κw was computed with MedCalc 11.4. Statistical significance was set at P < 0.05.
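A minimal sketch of the weighted kappa and of the interpretation bands above. The text does not say whether linear or quadratic weights were used in MedCalc; this sketch assumes linear weights, and `weighted_kappa` and `interpret_kappa` are hypothetical helpers, not the MedCalc implementation.

```python
from collections import Counter


def weighted_kappa(r1, r2, categories):
    """Linearly weighted Cohen's kappa for two raters on an ordinal scale.

    Assumption: linear disagreement weights |i - j| / (k - 1); quadratic
    weights are another common choice and give different values.
    """
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be paired and non-empty")
    k = len(categories)
    if k < 2:
        raise ValueError("need at least two ordered categories")
    index = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    def weight(i, j):
        # 0 for perfect agreement, 1 for maximal disagreement.
        return abs(i - j) / (k - 1)

    # Observed disagreement: mean weight over the paired ratings.
    observed = sum(weight(index[a], index[b]) for a, b in zip(r1, r2)) / n
    # Chance-expected disagreement from each rater's marginal counts.
    m1 = Counter(index[a] for a in r1)
    m2 = Counter(index[b] for b in r2)
    expected = sum(
        weight(i, j) * m1[i] * m2[j] for i in range(k) for j in range(k)
    ) / (n * n)
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1.0 - observed / expected


def interpret_kappa(kw):
    """Map a kappa value to the bands used in this section."""
    if kw <= 0.40:
        return "poor"
    if kw <= 0.60:
        return "fair to moderate"
    if kw <= 0.80:
        return "good"
    return "excellent"  # any value above 0.80
```

For the FOUR score, the categories would be the totals 0 to 16 (or 0 to 4 for a single item). Weighting matters because a one-point disagreement on an ordinal scale should count less than a ten-point one.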

Results

Eighty-seven patients (62% men; mean age ± SD, 70.2 ± 13.9 years) were assessed. The median time from the acute event to evaluation was 2.78 days. Forty-three (49.4%) patients were alert, 12 (13.8%) drowsy, 6 (6.9%) stuporous, and 26 (29.9%) comatose; 15 (17.3%) patients were intubated and mechanically ventilated. Fifty-six (64.3%) patients had an ischemic stroke, 11 (12.6%) a traumatic head injury, 7 (8.0%) an intracerebral hemorrhage, 6 (6.9%) a subarachnoid hemorrhage, 3 (3.5%) acute encephalitis, 1 (1.2%) meningitis, 1 a right fronto-temporo-parietal meningioma, 1 cerebral cysticercosis, and 1 a metabolic coma.

In the 87 enrolled patients, 174 ratings (two per patient) were obtained with each scale, the FOUR score and the GCS. The distributions of total and single-item scores are shown in Fig. 2.

Fig. 2 Distribution of the FOUR scores (left panel) and GCS scores (right panel) in our cohort

Inter-rater agreement across all raters was excellent for both the total FOUR score (κw 0.953, 95% CI 0.928–0.978; ICC 0.991, 95% CI 0.986–0.994) and the total GCS score (κw 0.943, 95% CI 0.917–0.972; ICC 0.988, 95% CI 0.981–0.992) (Table 1). Inter-rater agreement for each pair of raters ranged from good to excellent, independent of the level of expertise; for the individual items of both scales, agreement was good for the motor response and excellent for all other items. Inter-rater agreement for the FOUR score was excellent in alert (κw 1.000, 95% CI 1.000–1.000) and stuporous patients (κw 0.920, 95% CI 0.810–1.000) and good in drowsy (κw 0.749, 95% CI 0.581–0.917) and comatose patients (κw 0.782, 95% CI 0.645–0.920). Inter-rater agreement for the GCS was excellent in alert patients (κw 0.944, 95% CI 0.834–1.000) and good in drowsy (κw 0.791, 95% CI 0.587–0.996), stuporous (κw 0.781, 95% CI 0.540–1.000), and comatose patients (κw 0.711, 95% CI 0.520–0.901). Cronbach's α indicated high internal consistency for both the FOUR score (0.995) and the GCS (0.994). Spearman's ρ between the items was 0.953 (P < 0.01) for both the FOUR score and the GCS.
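Cronbach's α, used above for internal consistency, is α = k/(k − 1) · (1 − Σ var(item i) / var(total)), where k is the number of items and the total is the per-patient sum of item scores. A minimal pure-Python sketch, using a hypothetical helper `cronbach_alpha` (not the SPSS routine):

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` is a list of k item columns, one score per subject.

    Sample variances are used throughout; the ratio is unaffected as long as
    items and total use the same variance definition.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(col) != n for col in items):
        raise ValueError("each item needs the same number (>= 2) of subject scores")
    # Per-subject total score: sum across items.
    totals = [sum(scores) for scores in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / total_var)
```

For the FOUR score, `items` would hold the eye, motor, brainstem, and respiration columns (k = 4). Values close to 1 mean the items rise and fall together across patients.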

Table 1 Inter-rater agreement (κw) for the FOUR score and the Glasgow Coma Scale

Thirty-nine patients (44.8%) had a favorable outcome at discharge and 48 (55.2%) a poor outcome, including 22 (25.3%) who died. For mortality (Fig. 3), the area under the ROC curve (AUC) was comparable for the FOUR score (AUC = 0.935; 95% CI 0.884–0.985) and the GCS (AUC = 0.953; 95% CI 0.913–0.994). The optimal cut-off for predicting mortality at discharge was 10 for the FOUR score (sensitivity 91%; specificity 86%) and 9 for the GCS (sensitivity 100%; specificity 81%). Similarly, for poor outcome (Fig. 3), the AUC values were 0.909 for the FOUR score and 0.958 for the GCS. Within the FOUR score, the brainstem reflex and respiration items had lower AUC values than the eye and motor response items (Table 2). These findings remained statistically significant after adjustment for age, gender, level of consciousness, and clinical diagnosis.
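The AUC values and optimal cut-offs above come from SPSS; the sketch below only illustrates the underlying quantities, without the covariate adjustment. It assumes that lower scores indicate worse status (true for both the FOUR score and the GCS), so the event is predicted when the score is at or below a cut-off. The text does not state how the optimal cut-off was chosen; Youden's index (sensitivity + specificity − 1) is used here as an assumption, and both function names are hypothetical.

```python
def auc_lower_is_worse(scores_event, scores_no_event):
    """AUC as P(score without event > score with event), ties counted as 1/2.

    This pairwise form equals the Mann-Whitney U statistic divided by n1 * n0.
    """
    if not scores_event or not scores_no_event:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for e in scores_event:
        for s in scores_no_event:
            if s > e:
                wins += 1.0
            elif s == e:
                wins += 0.5
    return wins / (len(scores_event) * len(scores_no_event))


def youden_cutoff(scores_event, scores_no_event):
    """Cut-off c (predict the event when score <= c) maximizing Youden's J.

    Returns (cutoff, J, sensitivity, specificity); the lowest c wins ties.
    """
    best = None
    for c in sorted(set(scores_event) | set(scores_no_event)):
        sens = sum(x <= c for x in scores_event) / len(scores_event)
        spec = sum(x > c for x in scores_no_event) / len(scores_no_event)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j, sens, spec)
    return best
```

The pairwise loop is quadratic in sample size, which is negligible for a cohort of 87 patients and keeps the link to the "probability of correct ranking" interpretation of the AUC explicit.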

Fig. 3 Area under the curve (AUC) values for the FOUR score (left) and the GCS (right) for poor outcome (mRS 3–6) and mortality (mRS 6)

Table 2 Area under the curve (AUC) values for the items of the FOUR score according to the modified Rankin Scale

Accuracy was further assessed with the positive likelihood ratio (+LR). The +LR was higher for the FOUR score (6.6 for mortality and 16.3 for poor outcome) than for the GCS, which remains the reference standard in clinical practice (5.4 for mortality and 12.2 for poor outcome).
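The positive likelihood ratio is +LR = sensitivity / (1 − specificity): how many times more likely a positive result is in patients with the outcome than in those without it. A minimal helper, with the hypothetical name `positive_likelihood_ratio`:

```python
def positive_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """+LR = sensitivity / (1 - specificity); both given as proportions in [0, 1]."""
    if not (0 <= sensitivity <= 1 and 0 <= specificity <= 1):
        raise ValueError("sensitivity and specificity must be proportions")
    if specificity == 1:
        # No false positives: the ratio is unbounded.
        return float("inf")
    return sensitivity / (1 - specificity)
```

Plugging in the rounded FOUR-score figures for mortality (sensitivity 91%, specificity 86%) gives about 6.5, close to the reported 6.6; the small gap is presumably due to rounding of the percentages.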

In patients with the most severe brain injury (GCS 3–5), the FOUR score provided greater neurological detail than the GCS: in the 8 (9.2%) patients with a GCS of 3, the FOUR score ranged from 0 to 6; in the 7 (8%) patients with a GCS of 4, from 2 to 6; and in the 5 (5.8%) patients with a GCS of 5, from 3 to 8 (Table 3).
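The cross-tabulation summarized above amounts to grouping patients by GCS level and reporting the range of FOUR scores within each level. A sketch with a hypothetical helper `four_range_by_gcs`; any input pairs are illustrative, not study data:

```python
from collections import defaultdict


def four_range_by_gcs(pairs, gcs_levels=(3, 4, 5)):
    """Hypothetical helper: summarize FOUR totals within each selected GCS level.

    `pairs` is an iterable of (gcs_total, four_total), one per patient.
    Returns {gcs: (n_patients, min_four, max_four)} for the requested levels.
    """
    groups = defaultdict(list)
    for gcs, four in pairs:
        if gcs in gcs_levels:
            groups[gcs].append(four)
    return {g: (len(v), min(v), max(v)) for g, v in sorted(groups.items())}
```

A wide FOUR range within a single GCS level is what the text means by greater neurological detail: patients the GCS cannot tell apart receive different FOUR scores.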

Table 3 Cross-tabulation of the FOUR score and the GCS in patients with the most severe brain injuries

Discussion

Our study shows that the Italian version of the FOUR score is a valid predictor of outcome, provides greater neurological detail than the GCS, and can be reliably used in patients with acute brain injury. Inter-rater agreement for the total FOUR score is excellent (κw = 0.953), comparable to that of the GCS (κw = 0.943), and similar to that reported by the developers of the scale (κw = 0.82 for both scales) [2], by the authors of the French version (κw = 0.86 for the FOUR score and 0.85 for the GCS) [16], and by those of the Spanish version (κw = 0.93 for the FOUR score and 0.96 for the GCS) [17]. Inter-rater agreement ranges from good to excellent in all categories of patients and is higher than that of the GCS in stuporous and comatose patients.

Our evaluations were performed by neurologists and neurology residents with clinical experience and show that the reliability of the scale does not depend on rater expertise. The study did not include nurses or fellows as raters because in Italy only evaluations performed by physicians hold legal value. The validation of the French version involved highly, moderately, and less experienced raters, and performance was comparable only between the highly and moderately experienced raters [11]. The discrepancy between that study and ours may reflect the greater experience of our residents, who were selected from among those with at least 4 years of clinical practice.

Scores of the single items can be used reliably, as suggested by all available data, including ours [12, 20]. Inter-rater agreement ranges from good to excellent for all items of the FOUR score, with lower κw values for the motor response in both the FOUR score and the GCS, as previously reported [13]. For both scales, inter-rater agreement is higher in alert patients, in line with Wolf et al. [15] and in contrast with Idrovo et al. [17].

Our ROC analyses show that the AUC values of the FOUR score and the GCS are comparable and that both scoring systems are excellent predictors of in-hospital mortality. With the FOUR score, prediction of poor outcome (mRS ≥ 3) was less accurate than prediction of mortality (AUC 0.909 vs 0.935), whereas the GCS performed similarly for both endpoints (AUC 0.958 vs 0.953). In contrast with the developers of the scale [2] and in agreement with Eken et al. [12], we found that brainstem reflexes and respiration do not provide the expected prognostic benefit, since the eye and motor response items of the FOUR score show greater variability. The positive likelihood ratio is higher for the FOUR score than for the GCS, suggesting that the former is better able to identify the outcome.

Moreover, patients with the lowest GCS scores (3–5) had FOUR scores ranging from 0 to 8, which emphasizes that the FOUR score is more useful for tracking the clinical status of patients with acute brain injury.

In conclusion, the Italian version of the FOUR score can be used to reliably assess patients with impaired consciousness. The scale is easy to teach and administer, allows accurate tracking of the neurological status of patients with severe brain injury, and is useful for predicting poor outcome. For these reasons, in our opinion, the FOUR score deserves wider recognition and use in clinical practice.