Introduction

Many studies over the past 20 years have identified clinical, electrophysiological and radiological tests that may be used to predict outcome. In a systematic review of anoxic coma, Zandbergen et al. [1] demonstrated the power of somatosensory evoked potentials (SEPs) to predict unfavourable outcome, although the comparison with other tests was only indirect. In addition, a recent systematic review covering all forms of brain injury [2] showed that SEPs have high predictive power and a low false-positive rate for unfavourable outcome (specificity 98.7%, 97.7–99.3%), but it did not compare them with other tests. A proper, direct comparison of tests is required. The aim of the current study was to perform a systematic review of the published data on tests predicting outcome after acute, severe brain injury and to compare directly the predictive power of the common tests of motor responses, pupillary responses, the Glasgow Coma Scale (GCS), electroencephalography (EEG) and computed tomography (CT) with that of SEPs.

Methods

Studies were identified in two ways. First, Medline (1976–December 2002) and Embase (1980–March 2002) were searched using combinations of the terms somatosensory evoked potentials/response, electroencephalography, X-ray computed tomography, Glasgow Coma Scale, brain injury, head injury, coma, hypoxaemia, anoxia, heart arrest, outcome, health status, predictive value, sensitivity, specificity and related keywords. Second, we manually reviewed the reference lists of retrieved articles and contacted the authors. To be included, a study had to enrol patients described as “severely” brain injured or “comatose”, or in whom the majority had a GCS less than 8. It also had to allow outcomes to be determined in association with graded results for at least one of the GCS, EEG, CT or a clinical feature (pupillary or motor response), and for all patients with bilaterally absent SEPs, so that appropriate 2×2 tables could be generated. Case reports, studies of neonates, abstracts, duplicate publications and studies dealing only with brain-dead subjects were excluded. Non-English-language studies were included if translation was possible.

Data analysis consisted of determining the relationship between dichotomised test results and outcome data using standard 2×2 tables. Outcomes were classified as favourable or unfavourable: the Glasgow Outcome Scale categories of normal (good recovery) and moderate disability were considered favourable, while severe disability, vegetative state and death were considered unfavourable. For SEPs, outcomes were determined for patients with normal, abnormal and bilaterally absent SEPs. The cut-off points (thresholds) used for the other predictors are specified in the results tables. Separate 2×2 tables were generated for the prediction of favourable and unfavourable outcome. For each test in each study, the numbers of true positives (test positive and outcome positive), false positives (test positive and outcome negative), true negatives (test negative and outcome negative) and false negatives (test negative and outcome positive) were determined, giving a standard 2×2 table for SEPs and for at least one other predictor. Data were excluded if an entire row of the 2×2 table consisted of zeros. Several methodological characteristics were also recorded for each study: whether the test was conducted and classified in a blinded manner, whether patient enrolment was prospective and consecutive, the study entry criteria, patient characteristics and SEP methodology.
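The tabulation step can be sketched as follows. This is a minimal illustration, not the authors' procedure: the names `TwoByTwo`, `tabulate` and `is_unfavourable` are hypothetical, the patients are invented, and we assume that the "rows" checked by the zero-row exclusion rule are the test-positive and test-negative rows. For unfavourable outcome prediction, "test positive" would mean bilaterally absent SEPs; for favourable outcome prediction it would mean normal SEPs and "outcome positive" a favourable outcome.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one test in one study, for one outcome definition."""
    tp: int  # test positive, outcome positive
    fp: int  # test positive, outcome negative
    fn: int  # test negative, outcome positive
    tn: int  # test negative, outcome negative

    def has_empty_row(self) -> bool:
        # ASSUMPTION: "rows" are the test-positive and test-negative results.
        return self.tp + self.fp == 0 or self.fn + self.tn == 0


def is_unfavourable(gos: int) -> bool:
    """Glasgow Outcome Scale: 1 death, 2 vegetative state, 3 severe disability,
    4 moderate disability, 5 good recovery; 1-3 count as unfavourable."""
    if not 1 <= gos <= 5:
        raise ValueError(f"GOS must be 1-5, got {gos}")
    return gos <= 3


def tabulate(test_positive: Sequence[bool], outcome_positive: Sequence[bool]) -> TwoByTwo:
    """Cross-tabulate dichotomised test results against dichotomised outcomes."""
    if len(test_positive) != len(outcome_positive):
        raise ValueError("need exactly one outcome per test result")
    tp = fp = fn = tn = 0
    for test, outcome in zip(test_positive, outcome_positive):
        if test and outcome:
            tp += 1
        elif test:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical patients: test positive = bilaterally absent SEPs.
sep_absent = [True, True, False, False, False]
unfavourable = [is_unfavourable(g) for g in (1, 2, 3, 4, 5)]
table = tabulate(sep_absent, unfavourable)  # TwoByTwo(tp=2, fp=0, fn=1, tn=2)
```

A table for which `has_empty_row()` is true would be dropped under the exclusion rule described above.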

Sensitivity, specificity, positive predictive value, negative predictive value and odds ratio (OR) were calculated for each test for the prediction of favourable and unfavourable outcome. Exact 95% confidence intervals for the proportions were calculated from the binomial distribution. If one or more cells of a 2×2 table contained zero, 0.5 was added to every cell (Yates’ correction). All analyses were performed using Stata (version 6.0, Stata, College Station, Tex., USA).
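The calculations in the preceding paragraph can be sketched in plain Python. This is an illustrative reimplementation, not the Stata code used in the study. We assume the 0.5 adjustment is applied only to the OR (where a zero cell makes it undefined) and that the exact (Clopper-Pearson) intervals use the raw integer counts; adding 0.5 to every cell is also widely known as the Haldane correction. The function names `exact_ci` and `summarise` are hypothetical.

```python
import math
from typing import Callable, Optional, Tuple

Interval = Tuple[float, float]


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(g: Callable[[float], float], target: float, increasing: bool) -> float:
    """Bisection on [0, 1] for g(p) = target, where g is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if (g(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies at or below mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05) -> Interval:
    """Clopper-Pearson ("exact" binomial) confidence interval for x / n."""
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - _binom_cdf(x - 1, n, p), alpha / 2, increasing=True)
    upper = 1.0 if x == n else _solve(
        lambda p: _binom_cdf(x, n, p), alpha / 2, increasing=False)
    return lower, upper


def summarise(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Proportions with exact CIs from raw counts; OR with 0.5 added if any cell is 0."""
    def prop(x: int, n: int) -> Optional[Tuple[float, Interval]]:
        return None if n == 0 else (x / n, exact_ci(x, n))

    a, b, c, d = (float(v) for v in (tp, fp, fn, tn))
    if 0 in (tp, fp, fn, tn):
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return {
        "sensitivity": prop(tp, tp + fn),
        "specificity": prop(tn, tn + fp),
        "ppv": prop(tp, tp + fp),
        "npv": prop(tn, tn + fn),
        "odds_ratio": (a * d) / (b * c),
    }
```

For a hypothetical table with no false positives, `summarise(20, 0, 15, 25)` gives a specificity of 25/25 with an exact lower bound of about 0.86, and a finite OR of 20.5 × 25.5 / (0.5 × 15.5) only because of the 0.5 adjustment.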

Test results were compared to identify which test performed better in terms of sensitivity, specificity, positive and negative predictive value and odds ratio. Formal meta-analysis was conducted using summary receiver operating characteristic (sROC) curves [3, 4]. The results were also summarised for each study by dividing the OR for SEPs by the OR for the other test to obtain the ratio of odds ratios (ROR). An ROR greater than 1 indicates superior performance of SEPs, while an ROR less than 1 indicates inferior performance of SEPs compared with the other test. ORs and RORs were tested for homogeneity, and individual study RORs were pooled for each comparison using a random-effects model with weights inversely proportional to the variance [5].
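The ROR pooling described above can be sketched as follows, under stated assumptions. We assume the Woolf variance for each log OR, treat the SEP and comparison ORs from the same study as independent when adding their variances (which ignores the correlation induced by the paired design), and use the DerSimonian-Laird random-effects estimator with Cochran's Q as the homogeneity statistic; the estimator actually used in the study is given only by reference [5]. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[float, float, float, float]  # (tp, fp, fn, tn)


def log_or_and_var(t: Table) -> Tuple[float, float]:
    """Log odds ratio and its Woolf variance, adding 0.5 to all cells if any is 0."""
    tp, fp, fn, tn = (x + 0.5 for x in t) if 0 in t else t
    return math.log(tp * tn / (fp * fn)), 1 / tp + 1 / fp + 1 / fn + 1 / tn


def pooled_ror(pairs: Sequence[Tuple[Table, Table]], z: float = 1.96) -> dict:
    """Random-effects (DerSimonian-Laird) pooled ROR; each pair is (SEP table, other table)."""
    ys, vs = [], []
    for sep, other in pairs:
        y1, v1 = log_or_and_var(sep)
        y2, v2 = log_or_and_var(other)
        ys.append(y1 - y2)  # log ROR for this study
        vs.append(v1 + v2)  # ASSUMPTION: the two ORs are treated as independent
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if df > 0 and c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return {
        "ror": math.exp(y_re),
        "ci": (math.exp(y_re - z * se), math.exp(y_re + z * se)),
        "p": math.erfc(abs(y_re / se) / math.sqrt(2)),  # two-sided z-test
        "q": q,
        "df": df,
        "tau2": tau2,
    }
```

Q would be referred to a chi-squared distribution on `df` degrees of freedom to test homogeneity. A study pair with very wide individual variances receives little weight, which is one reason a pooled ROR above 1 may still fail to reach significance.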

Results

We initially identified 151 publications, 126 of which were excluded for the following reasons: a 2×2 table could not be generated (40%), the study was not a prospective or retrospective study (31%), SEPs were not compared with another test (17%), incorrect patient group (5%), long-latency SEPs (3%), a row of data consisting only of zeros (2%), the test was not used to predict outcome (1%) and duplicate study (1%). Twenty-one English-language studies [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26] and four non-English-language studies [27, 28, 29, 30] that compared SEPs with other predictors were identified (Table 1). Ten studies described their patients as “comatose” or “severely” injured, six consisted only of patients with a GCS less than 8, and nine also included some patients with a GCS of 8 or higher. Only four studies focused specifically on children [8, 24, 25, 29], while nine included both children and adults [6, 12, 13, 15, 17, 20, 22, 28, 30].

Table 1 Study characteristics (SEPt timing of SEP test, OTHt timing of other test, m median, av average, parentheses range, ? unknown, pa pre-admission, a on admission, sp pre-sedation/paralysis, Sim other tests performed simultaneously with SEPs, ND not stated or known to be recorded separately, Prob probably, Imm immediately prior, Par “parallel” within the time period specified, Spct Spectrum, Cons consecutive, n number of studies, T TBI, H hypoxic-ischaemic encephalopathy, O other, B SEPs blinded, M motor response, P pupils, pr prospective, A adult subjects only, p paediatric patients, a adult patients, O/C when outcome assessed, d days, m months, y years, SEP SEP parameters outside recommended guidelines [31, 32], F bandpass filter, I stimulus intensity, R stimulation rate)

SEP methods typically involved repeated median nerve stimulation above the motor threshold at a rate of 3–6 Hz, with recordings made from the neck and the contralateral parietal cortex (C3/C4, or 2 cm posterior to these sites) in accordance with published recommendations [31, 32]. Some studies used filter settings outside the recommended ranges [6, 10, 11, 12, 13, 14, 15, 16, 17, 22, 26], a stimulus intensity below motor threshold [16, 19] or a high stimulation rate [23]. Fourteen studies compared SEPs with only one other predictor, 10 compared SEPs with two other predictors, and one study compared SEPs with four predictors. Some studies gave details of the timing of SEPs and the other tests. The relative timing of SEPs and the other tests varied between studies, but the tests were generally performed within the first few days after injury, and the initial test results were commonly used. In only three studies [6, 18, 19] do the other tests appear to have been performed simultaneously with SEPs. The comparisons between SEPs and the GCS, pupillary responses, motor responses, EEG and CT are presented in Tables 2, 3, 4, 5 and 6, respectively.

Table 2 SEPs (upper rows) vs. GCS (lower rows) (Aetiol Aetiology, T trauma, H hypoxic-ischaemic injury, O other, TP number of true positives, FN number of false negatives, FP number of false positives, TN number of true negatives, ND data unavailable, Normal normal SEPs are cut-off values for positive SEP test for favourable outcome prediction, Bil. absent bilaterally absent SEPs are cut-off value for positive SEP test for unfavourable outcome prediction, cut-off GCS threshold for positive GCS test result)
Table 3 SEPs (upper rows) vs. pupillary responses (lower rows) (Aetiol Aetiology, T trauma, H hypoxic-ischaemic injury, O other, TP number of true positives, FN number of false negatives, FP number of false positives, TN number of true negatives, ND data unavailable, Normal normal SEPs are cut-off values for positive SEP test for favourable outcome prediction, Bil. absent bilaterally absent SEPs are cut-off value for positive SEP test for unfavourable outcome prediction, cut-off threshold criteria, ≥ equal to or better than)
Table 4 SEPs (upper rows) vs. motor responses (lower rows) (Aetiol Aetiology, T trauma, H hypoxic-ischaemic injury, O other, TP number of true positives, FN number of false negatives, FP number of false positives, TN number of true negatives, ND data unavailable, Normal normal SEPs are cut-off values for positive SEP test for favourable outcome prediction, Bil. absent bilaterally absent SEPs are cut-off value for positive SEP test for unfavourable outcome prediction, cut-off threshold criteria)
Table 5 SEPs (upper rows) vs. EEGs (lower rows); the EEG patterns listed are the major or dominant types present unless otherwise noted (lower row) (Aetiol Aetiology, T trauma, H hypoxic-ischaemic injury, O other, TP number of true positives, FN number of false negatives, FP number of false positives, TN number of true negatives, ND data unavailable, Normal normal SEPs are cut-off values for positive SEP test for favourable outcome prediction, Bil. absent bilaterally absent SEPs are cut-off value for positive SEP test for unfavourable outcome prediction, cut-off EEG threshold)
Table 6 SEPs (upper rows) vs. CT (lower rows) (Aetiol Aetiology, T trauma, H hypoxic-ischaemic injury, O other, TP number of true positives, FN number of false negatives, FP number of false positives, TN number of true negatives, ND data unavailable, Normal normal SEPs are cut-off values for positive SEP test for favourable outcome prediction, Bil. absent bilaterally absent SEPs are cut-off value for positive SEP test for unfavourable outcome prediction, cut-off threshold criteria)

An unfavourable outcome occurred in 61% of patients (range 14–90%), and 43% (range 17–80%) died. For the prediction of unfavourable outcome, SEPs were associated with far fewer false positives than the other tests: only eight false positives were reported with SEPs across all comparisons. Seven of these (four in the study by Beca et al. [8] and one each in those by Pohlmann-Eden [20], Rumpl et al. [22] and Taylor and Farrell [24]) failed to meet the criteria required for the proper determination of a bilaterally absent response [2]. The studies by Beca et al. [8] and Taylor and Farrell [24] included only children. By comparison, there were 77 false positives with the GCS, 17 with CT, 12 with pupillary responses, 8 with motor responses and 3 with the EEG. SEPs also tended to yield more true negatives than the other tests. On the other hand, SEPs tended to yield more false negatives and fewer true positives than the other tests. For favourable outcome prediction, the only clear patterns were that SEPs tended to yield more true positives and fewer false negatives than the other tests, and more false positives than the GCS.

Table 7 shows how often SEPs had higher, equal or lower sensitivity, specificity, positive predictive value and negative predictive value than each of the other tests. For unfavourable outcome prediction, SEPs tended to display lower sensitivity, higher specificity, higher positive predictive value and lower negative predictive value. Motor responses, pupillary responses, the GCS and CT tended to have higher sensitivity than SEPs, whereas the specificity and positive predictive value of SEPs were consistently equal to or greater than those of all the other tests. For favourable outcome prediction, SEPs tended to display higher sensitivity, specificity, positive predictive value and negative predictive value. The exceptions were pupillary responses, which had higher sensitivity, and the GCS, which tended to have better specificity.

Table 7 Comparison of predictive powers of SEPs and the other tests. Each entry gives the number of times SEPs were superior to / equal to / inferior to the other test for the stated measure of test performance (PV predictive value)
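The superior/equal/inferior counts in Table 7 amount to a simple within-study tally. The sketch below is illustrative only: `tally` and `as_table7_cell` are hypothetical names, and rounding to three decimal places before comparing is our assumption (the text does not say how ties were judged).

```python
from typing import Iterable, Tuple


def tally(pairs: Iterable[Tuple[float, float]], ndigits: int = 3) -> Tuple[int, int, int]:
    """Count studies where SEPs were superior / equal / inferior on one measure.

    Each pair is (SEP value, other-test value) from the same study. Values are
    rounded to `ndigits` first (an ASSUMED tolerance) so that, for example,
    two specificities of 100% compare as equal.
    """
    superior = equal = inferior = 0
    for sep, other in pairs:
        sep_r, other_r = round(sep, ndigits), round(other, ndigits)
        if sep_r > other_r:
            superior += 1
        elif sep_r == other_r:
            equal += 1
        else:
            inferior += 1
    return superior, equal, inferior


def as_table7_cell(counts: Tuple[int, int, int]) -> str:
    """Format counts as superior/equal/inferior, the layout used in Table 7."""
    return "/".join(str(c) for c in counts)
```

For three hypothetical studies with SEP versus comparison specificities of (1.0, 0.9), (1.0, 1.0) and (0.5, 0.7), the cell would read "1/1/1".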

Figure 1 summarises the sensitivity and specificity data as sROC curves. These show that SEPs performed better than the other tests in all comparisons but one, for both favourable and unfavourable outcome prediction. The exception is the comparison with motor responses for favourable outcome prediction, where the two sROC curves intersect and overlap considerably. Comparing paediatric studies, and studies consisting only of patients with traumatic brain injury (TBI) or hypoxic-ischaemic encephalopathy (HIE), with the other studies did not identify any differences. Individual study points vary considerably around the sROC curves; even the studies consisting only of children, or only of patients with TBI or HIE, are widely distributed amongst the other studies. There was, however, a suggestion that SEPs had better sensitivity for unfavourable outcome prediction amongst HIE patients.
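To show how an sROC curve of this kind can be obtained from a set of 2×2 tables, the sketch below uses the Moses-Littenberg approach, a common form of the sROC technique. We assume, without confirmation from the text, that this is the variant used in references [3, 4], that the regression is unweighted, and that 0.5 is added to every cell only when a table contains a zero. Each study contributes D, the log diagnostic odds ratio, and S, a proxy for its test threshold; regressing D on S and back-transforming gives the curve. Function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Table = Tuple[int, int, int, int]  # (tp, fp, fn, tn)


def _logit(p: float) -> float:
    return math.log(p / (1 - p))


def _expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def fit_sroc(tables: Sequence[Table]) -> Tuple[float, float]:
    """Moses-Littenberg fit: unweighted least squares of D on S across studies."""
    ds, ss = [], []
    for t in tables:
        tp, fp, fn, tn = (x + 0.5 for x in t) if 0 in t else t
        logit_tpr = _logit(tp / (tp + fn))
        logit_fpr = _logit(fp / (fp + tn))
        ds.append(logit_tpr - logit_fpr)  # D = log diagnostic odds ratio
        ss.append(logit_tpr + logit_fpr)  # S = proxy for the test threshold
    if len(ds) < 2:
        raise ValueError("an sROC curve needs at least two studies")
    s_bar, d_bar = sum(ss) / len(ss), sum(ds) / len(ds)
    sxx = sum((s - s_bar) ** 2 for s in ss)
    b = 0.0 if sxx == 0 else sum((s - s_bar) * (d - d_bar) for s, d in zip(ss, ds)) / sxx
    return d_bar - b * s_bar, b  # intercept a, slope b


def sroc_tpr(fpr: float, a: float, b: float) -> float:
    """Back-transform D = a + b*S into a true positive rate at a given FPR."""
    if b == 1:
        raise ValueError("slope b = 1 gives no finite curve")
    return _expit((a + (1 + b) * _logit(fpr)) / (1 - b))
```

Evaluating `sroc_tpr` over a grid of FPR values traces one curve per test; a curve lying above another across the range indicates better discrimination, while crossing curves, as with motor responses for favourable outcome, mean neither test dominates.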

Fig. 1 Summary receiver operating characteristic curves comparing SEPs and other predictors: left, prediction of favourable outcome; right, prediction of unfavourable outcome. Filled circles and heavy line, SEPs; empty circles and light line, the respective comparison predictor: Glasgow Coma Scale (a), pupillary response (b), motor response (c), electroencephalography (d), computed tomography (e). TPR true positive rate (= sensitivity); FPR false positive rate (= 1 − specificity). The data displayed are after Yates’ correction

All pooled RORs were greater than 1 (Table 8). Because the confidence intervals of the individual studies were wide, statistical significance (p≤0.05) was reached only for the GCS comparison for unfavourable outcome and the CT comparison for favourable outcome.

Table 8 Pooled odds ratio comparisons (n number of studies, ROR ratio of odds ratios, CI confidence interval)

Discussion

Although SEPs have been available for over 20 years, it is alarming how few studies have directly compared their performance with that of the more commonly used tests. Only studies that compared tests directly, using a paired design in which both tests were applied to the same patients, were included in this systematic review. Like the design of a randomised controlled trial, this controls for variation, and it is the most efficient and powerful way to perform such a comparison [3, 4, 33, 34, 35, 36]. The spread of individual study results in the sROC curves shows how misleading it can be simply to compare results for individual tests obtained from different studies.

The results presented here suggest that SEPs are superior to pupillary responses, motor responses, the GCS, EEG and CT for predicting outcome after acute, severe brain injury. We compared the tests in several ways to reach this conclusion. The formal meta-analysis used ORs because the OR is a good single measure of a test’s discriminatory power, provides a convenient way to contrast two tests, and is directly related to the test’s sROC curve. Although significant in only two instances, all pooled RORs were greater than 1, and most individual studies reported RORs greater than 1. The ROR confidence intervals also tended to extend further above 1 than below it. Sensitivity, specificity and positive and negative predictive values were also compared because these values are clinically important (Table 7). The sROC curves summarise the relationship between sensitivity and specificity, show its variation between studies and present the data visually. When two sROC curves intersect, as for SEPs and motor responses in favourable outcome prediction, which test is better depends on the circumstances in which it is used. An advantage of SEPs over motor responses is that motor function often cannot be assessed because of sedation or paralysis, whereas SEPs are unaffected by these.
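The link between the OR and the sROC curve can be made explicit. The OR computed from a 2×2 table is the diagnostic odds ratio (DOR), and holding it constant traces a symmetric curve in sROC space; the ROR is then the difference between two such curves on the log scale. The relations below are standard identities, stated here as a worked illustration rather than as the authors' own derivation; Q* denotes the point on a symmetric curve where sensitivity equals specificity.

```latex
\begin{align*}
\mathrm{DOR} &= \frac{\mathrm{TP}\cdot\mathrm{TN}}{\mathrm{FP}\cdot\mathrm{FN}}
  = \frac{\mathrm{TPR}/(1-\mathrm{TPR})}{\mathrm{FPR}/(1-\mathrm{FPR})},
  \qquad \ln \mathrm{ROR} = \ln \mathrm{DOR}_{\mathrm{SEP}} - \ln \mathrm{DOR}_{\mathrm{other}} \\
\operatorname{logit}(\mathrm{TPR}) - \operatorname{logit}(\mathrm{FPR}) &= \ln \mathrm{DOR}
  \;\Longrightarrow\;
  \mathrm{TPR} = \frac{\mathrm{DOR}\cdot\mathrm{FPR}}{1-\mathrm{FPR}+\mathrm{DOR}\cdot\mathrm{FPR}} \\
Q^{*} &= \frac{\sqrt{\mathrm{DOR}}}{1+\sqrt{\mathrm{DOR}}}
  \qquad (\text{the point where } \mathrm{TPR} = 1-\mathrm{FPR})
\end{align*}
```

A larger DOR therefore moves the whole symmetric curve towards the top-left corner, which is why an ROR above 1 corresponds to SEPs having the higher sROC curve when both curves are roughly symmetric.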

It is possible that other tests not compared here are superior to SEPs. This systematic review focused on the most commonly used and available tests. Individual studies have compared SEPs with many other tests, ranging from other evoked potential modalities to biochemical markers. It is well documented that SEPs are the most powerful of the standard evoked potential modalities [6, 9, 20, 24, 37, 38, 39, 40, 41, 42, 43], although work by Madl et al. [18, 44] and Sherman et al. [45] suggests that long-latency SEP components are superior to standard SEPs because of their higher sensitivity. However, long-latency components are reportedly more prone than standard SEPs to technical artefacts and to the effects of sedation [45, 46, 47], which may make them unsuitable in the ICU. Recording SEPs over the frontal cortex, rather than at the standard parietal sites, has also shown promise as a more powerful predictor of outcome [13, 48, 49]. Two new biochemical markers, serum neuron-specific enolase [50, 51] and brain lactate measured by proton magnetic resonance spectroscopy [46], also appear promising as predictors of outcome. These tests require further evaluation, including direct comparisons with suitable alternatives such as SEPs, motor responses and pupillary responses.

Although this review shows that SEPs are excellent predictors of outcome compared with a number of standard tests, they are not perfect, and some of the other tests perform better in certain respects. While SEPs are the best overall predictor of outcome, motor responses, pupillary responses, the GCS and CT had better sensitivity for unfavourable outcome prediction, and pupillary responses had better sensitivity and the GCS better specificity for favourable outcome prediction. Although the specificity of SEPs for predicting unfavourable outcome approaches 100%, their sensitivity, and their specificity for favourable outcome prediction, are lower. These could be improved by repeat testing and by excluding patients who die of non-neurological causes [2], but combining SEPs with other predictive tests is also likely to improve the accuracy of outcome prediction, although some compromises are likely. Various tests have been reported to improve the accuracy of favourable and unfavourable outcome prediction when used in conjunction with SEPs: the additional measurement of frontal SEPs or motor SEPs [12, 13, 48, 49], EEG [21] or EEG reactivity [14] has improved the accuracy of outcome predictions. Schreiner and Pohlmann-Eden [51] and Sherman et al. [45] have also reported that the biochemical markers serum neuron-specific enolase and creatine kinase BB, respectively, improve the accuracy of unfavourable outcome prediction when combined with SEP results. Clinical features such as those investigated here, including the GCS, pupillary responses and motor responses, also improve the predictive power of SEPs [7, 10, 23, 39, 52, 53, 54]. Kane et al. [55] reported that combining mismatch negativity and the GCS with SEPs gave the best prediction of outcome, while Ying et al. [26] combined a number of tests (clinical features and motor evoked potentials) with SEPs to obtain the best prediction of mortality. Additional research is required to identify the most powerful combination of predictive tests.
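Why combining tests involves compromises can be illustrated with the textbook rules for two tests. The sketch below assumes the two tests are conditionally independent given the outcome, which is unlikely to hold exactly here (the GCS, for example, contains the motor response), and the accuracy values in the example are invented, not taken from the included studies. The function names are hypothetical.

```python
from typing import Tuple

Accuracy = Tuple[float, float]  # (sensitivity, specificity)


def either_positive(t1: Accuracy, t2: Accuracy) -> Accuracy:
    """Combined test is positive if EITHER test is positive.

    ASSUMES conditional independence of the two tests given the outcome.
    """
    (se1, sp1), (se2, sp2) = t1, t2
    return 1 - (1 - se1) * (1 - se2), sp1 * sp2


def both_positive(t1: Accuracy, t2: Accuracy) -> Accuracy:
    """Combined test is positive only if BOTH tests are positive.

    ASSUMES conditional independence of the two tests given the outcome.
    """
    (se1, sp1), (se2, sp2) = t1, t2
    return se1 * se2, 1 - (1 - sp1) * (1 - sp2)


# Hypothetical values for unfavourable outcome prediction.
sep = (0.50, 0.98)
clinical = (0.70, 0.90)
print(either_positive(sep, clinical))  # sensitivity rises, specificity falls
print(both_positive(sep, clinical))    # specificity rises, sensitivity falls
```

With these invented figures, the "either positive" rule raises sensitivity to 0.85 but lowers specificity to 0.882, eroding the near-perfect specificity that makes SEPs valuable for unfavourable outcome prediction; the "both positive" rule does the reverse.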

The sROC curves show the variation in results that can exist between studies. One potential cause of this variation is the relative timing of the individual tests. Only three studies [6, 18, 19] appear to have deliberately performed the tests simultaneously; most studies conducted the tests independently, over wide time periods. This can cause variation in the relative predictive power of tests within a study as well as between studies. Clinical heterogeneity can also cause variation in results. Many factors, such as treatment strategies, SEP methodology and patient characteristics, can influence between-study results but are largely controlled for when within-study results are examined. One limitation of all the included studies, common to outcome-prediction studies in general, is the lack of detail on the approach to withdrawal of treatment. This is difficult to avoid, however: ethical concerns prevent ongoing maximal treatment of patients thought to have a poor prognosis [56], and only two included studies [7, 10] reported providing maximal treatment for a predetermined minimum period (3 days). While many of the studies blinded clinicians to SEP results, the other parameters were not blinded and were available for use in decisions to withdraw or continue treatment. This may have biased the results in favour of the unblinded tests, because a poor result on such a test could contribute to withdrawal of treatment and hence to the predicted unfavourable outcome; that SEPs performed better despite this potential bias adds weight to the findings on their relative performance.

Comparing studies that included only paediatric patients with the other studies, and studies of patients with TBI with those of patients with HIE, did not show any differences, although these comparisons were limited by small numbers and/or restricted variation between studies. The exception was the GCS comparison, in which SEPs had better sensitivity for unfavourable outcome prediction amongst patients with HIE. This may reflect differences between the two types of injury in their pathology or its evolution over time. There is also evidence [2] that false positives for unfavourable outcome prediction are more likely in children and after TBI.

The results presented here suggest that SEPs are the best single overall predictor of outcome, but that standard clinical tests (the GCS and pupillary and motor responses) may improve overall predictive ability when combined with SEPs. SEPs must, of course, be performed using correct technical procedures and interpreted with appropriate expertise [2]. Although not as simple to perform as assessing clinical signs, SEPs can be recorded in the presence of sedation and paralysis, are easily performed in the ICU and are easier to record and interpret than EEGs. Although no single predictor can predict outcome with certainty and combinations of tests are required [1, 44, 52, 57, 58, 59, 60, 61], we suggest that there is sufficient evidence for clinicians to incorporate SEPs into the prediction of outcome after brain injury.