Introduction

Sedation is integral to the management of critically ill patients requiring mechanical ventilation. It is currently provided by either infusions or intermittent boluses of drugs with hypnotic and analgesic properties [1]. Over-sedation is associated with adverse outcomes during critical illness, and clinical strategies designed to avoid this can decrease ventilation times and intensive care unit (ICU) length of stay [2, 3]. These strategies may also improve clinical outcomes, decrease some complications, such as ventilator-associated pneumonia, and decrease illness cost [4].

At present, sedation is usually managed by clinical assessments, often in conjunction with a protocol for adjusting drug doses. This approach has been shown to decrease ventilation times and the costs of sedative drugs and is considered best practice in recent guidelines [5]. Several clinical scales have been proposed and validated for assessing sedation level reliably; these are integral parts of most sedation protocols [6–8]. One problem with these scales is that they lack discriminative power in the over-sedated range. Clinical assessments therefore cannot distinguish patients who are heavily sedated but would regain consciousness rapidly after sedation withdrawal from those whose recovery of consciousness would be delayed because of drug accumulation or coma/encephalopathy. This is clinically relevant because the patients in whom deeper levels of sedation are frequently maintained, in order to facilitate more invasive or uncomfortable treatments, are often sicker. Under these circumstances sedative drug pharmacokinetics and pharmacodynamics are less predictable.

A monitoring system that could detect over-sedation in critically ill patients would be useful, particularly if it could discriminate different degrees of deep sedation and encourage staff to consider decreasing drug doses [9]. Several systems based on electroencephalogram (EEG) analysis via forehead electrodes have been developed as depth-of-anesthesia monitors. The Bispectral Index (BIS, Aspect Medical Systems, Newton, Mass.) is widely used for anesthesia. Although it is promoted as an intensive care monitor, the published literature reaches contrasting conclusions regarding its value in this setting [10]. Spectral Entropy measurement (Entropy™ Module, GE Healthcare, Helsinki, Finland) was developed for anesthesia monitoring and has not been evaluated in intensive care [11]. The module can be used to record raw EEG signals as well as Entropy values, allowing off-line examination of data.

The primary aim of this study was to assess whether Entropy, a measure of hypnosis in anesthetized patients, could be used to measure sedation status in general intensive care patients during routine clinical management by comparing values with a clinical sedation scale.

Patients and methods

We studied 30 patients admitted to a single teaching hospital ICU over a treatment period of up to 72 h. Inclusion criteria were a requirement for mechanical ventilation, sedation using continuous infusions of either midazolam or propofol and concomitant analgesic drugs if clinically indicated, and informed consent from relatives. Exclusion criteria were: (a) a patient in whom brain injury, namely hypoxic brain injury, traumatic brain injury, or intracranial hemorrhage, was considered present at the time of enrollment to the study; (b) drug overdose as admission diagnosis; (c) a patient requiring neuromuscular paralysis at the time of screening for the study; or (d) status epilepticus. We also excluded patients known to be clinically deaf or who had chronic neuromuscular disorders or brain disease that might interfere with normal clinical sedation assessment. The ethics committee and local institution gave permission to carry out the study.

Study design and protocol

We performed a non-interventional prospective cohort study. A standard disposable GE Datex-Ohmeda Entropy sensor (GE Healthcare, Helsinki, Finland) was applied to the forehead symmetrically relative to the midline. Each sensor comprised a strip that included one electrode each for the left and right hemispheres, and one central ground electrode. Sensors were changed every 24 h. The monitor performed an automatic impedance test every 10 min to ensure electrical contact fidelity. Periods of poor electrode contact (impedance > 5 kOhm) were rejected from the analyses.

The sensors were connected to a GE Datex-Ohmeda S/5 Compact anesthesia monitor, provided with the Entropy Module (GE Healthcare, Helsinki, Finland). The Entropy Module calculates two different Entropy parameters. State Entropy (SE) is derived over the frequency range 0.8–32 Hz; it includes most of the EEG power and in an anesthetized patient primarily reflects activity of cortical neurons, whereas contribution from facial muscle activity is small. Response Entropy (RE) is derived over the frequency range 0.8–47 Hz; it includes in addition to the cortical component a significant contribution from muscle activity that dominates the high-frequency part of the spectrum. An Entropy number ranging from 0 to 100 is displayed for each parameter, with 0 indicating complete suppression of cortical activity and 100 the normal awake state. For anesthesia a value in the 40–60 range is associated with a low probability of consciousness. The data recording included raw EEG/EMG waveforms with a 400-Hz sampling frequency plus the SE and RE parameters. All data were recorded by a dedicated laptop computer equipped with S/5 Collect software (GE Healthcare, Helsinki, Finland). A detailed description of the entropy algorithm has been published previously by Viertiö-Oja and colleagues [11].
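The band definitions above can be illustrated with a minimal sketch of a normalized spectral entropy. This is not the Entropy Module's algorithm, whose full description is in reference [11]; it only shows the underlying idea of a Shannon entropy computed over the normalized power spectrum within a frequency band. The function names, the 1-s epoch length, the synthetic 10-Hz test signal, and the 0 to 1 output scale (rather than the displayed 0–100 number) are all assumptions made for illustration.

```python
import cmath
import math


def band_powers(signal, fs, f_lo, f_hi):
    """Return the DFT power of each frequency bin lying within [f_lo, f_hi] Hz."""
    n = len(signal)
    powers = []
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if f_lo <= freq <= f_hi:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                        for i, x in enumerate(signal))
            powers.append(abs(coeff) ** 2)
    return powers


def spectral_entropy(signal, fs, f_lo, f_hi):
    """Shannon entropy of the normalized band spectrum, scaled to 0..1.

    Illustrative only: the commercial algorithm [11] involves further steps
    that are not reproduced here.
    """
    powers = band_powers(signal, fs, f_lo, f_hi)
    total = sum(powers)
    if total == 0 or len(powers) < 2:
        return 0.0
    probs = [p / total for p in powers]
    shannon = -sum(p * math.log(p) for p in probs if p > 0)
    return shannon / math.log(len(powers))


FS = 400  # sampling frequency in Hz, as stated for the raw recordings

# Hypothetical 1-s epoch: a pure 10-Hz oscillation (highly regular signal).
epoch = [math.sin(2 * math.pi * 10 * i / FS) for i in range(FS)]

se_like = spectral_entropy(epoch, FS, 0.8, 32.0)  # State Entropy band
re_like = spectral_entropy(epoch, FS, 0.8, 47.0)  # Response Entropy band
```

A regular signal concentrates its power in few bins and gives a value near 0, whereas a signal with power spread evenly across the band gives a value near 1. Widening the band to 47 Hz admits more of the muscle-dominated high-frequency spectrum, which is why the two parameters can diverge when facial muscle activity is present.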

Once data recording had started, it continued until one of the following end points was reached: (a) patient regained consciousness and mechanical ventilation was discontinued; (b) 72 h had elapsed; (c) the patient or a relative requested discontinuation of the monitoring and/or withdrawal from the study; or (d) the patient had died.

Observations and management of patients during the protocol

Patients received routine clinical management throughout the study period determined by caring clinicians. Most patients were sedated with propofol as first-choice sedative. Some patients received midazolam. Sedatives were administered by continuous infusion with additional boluses as considered clinically appropriate. Analgesia was provided with alfentanil or morphine infusions. The choice of sedative and analgesic drugs, and the doses prescribed, were determined by medical and nursing staff and were not controlled for the purpose of the study. A sedation score assessment was carried out up to every 30 min using a modified Ramsay scoring system (Table 1). The published scoring system was modified to standardize the stimuli, specifically by including a tetanic stimulus at deep sedation levels. Each Ramsay score was compared with the median Entropy numbers for the 1-min period preceding each assessment. This approach was used to avoid the potential confounding effect of stimulation during the Ramsay assessment. All Ramsay assessments included in the analysis were carried out by a single member of the research team (P.R.), who was blinded to the Entropy numbers. In addition, a laptop-based notation file was used to record all events during periods of observation. We aimed to achieve a mean of 15–20 single observer assessments per patient, which would generate a total of 450–600 paired data for analysis.
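The pairing of each Ramsay assessment with the preceding minute of Entropy data can be sketched as follows. This is a minimal illustration, not the study's analysis code: the function name, the (time, value) sample format, the 5-s sample spacing, and the use of `None` to mark samples rejected for poor signal quality are all hypothetical.

```python
from statistics import median


def median_before(samples, assessment_time_s, window_s=60.0):
    """Median of the values recorded in the window_s seconds before an assessment.

    samples: iterable of (time_s, value) pairs; value may be None when the
    sample was rejected (hypothetical convention). The assessment instant
    itself is excluded so that stimulation during scoring cannot contribute.
    Returns None when no valid sample falls inside the window.
    """
    start = assessment_time_s - window_s
    window = [value for time_s, value in samples
              if start <= time_s < assessment_time_s and value is not None]
    return median(window) if window else None


# Hypothetical trace: stable values of 50, then a jump once stimulation starts at t = 60 s.
trace = [(t, 50) for t in range(0, 60, 5)] + [(60, 95), (65, 97)]
paired_value = median_before(trace, assessment_time_s=60)
```

Using the median rather than the mean keeps brief excursions within the window from dominating the paired value.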

Table 1 The modified Ramsay Sedation Assessment Scale used in the study

Analysis

Assessment of criterion validity

To assess the criterion validity of entropy we examined the correlation between SE, RE, and clinical sedation score as reference standard. We calculated the distribution of SE and RE values corresponding to each modified Ramsay level and presented the results graphically by box-and-whisker plots. The performance of the Entropy parameters in indicating the depth of sedation as given by the Ramsay score was tested with the prediction probability, PK, which is a variant of Kim's measure of association [12]. All prediction probabilities and their standard errors were estimated with the jackknife method as described by Smith et al. [12]. With this approach a PK value of 0.5 indicates no predictive ability compared with the reference (in this case clinical sedation level), and a PK value of 1 indicates perfect prediction. The calculations were performed with Excel software (Microsoft, Redmond, Wash.) using a custom spreadsheet macro, PKMACRO, developed by Smith and colleagues [12]. We calculated PK values for the ability of SE and RE to distinguish each Ramsay score category from the other categories.
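For readers unfamiliar with PK, the following is a minimal sketch of how a prediction probability and a jackknife standard error can be computed. It is not the PKMACRO spreadsheet used in the study. It assumes the usual pairwise definition, in which only pairs with different reference values are counted, ties in the indicator score one half, and the standard error comes from the standard leave-one-out jackknife formula. Because Entropy falls as the Ramsay score rises, the example negates the Entropy values so that a good indicator yields PK above 0.5; that sign convention, the function names, and the data are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt


def prediction_probability(indicator, depth):
    """PK = (C + 0.5 * T) / (C + D + T) over pairs with different depth values.

    C: concordant pairs, D: discordant pairs, T: pairs tied in the indicator only.
    """
    concordant = discordant = tied_indicator = 0
    for (x1, y1), (x2, y2) in combinations(zip(indicator, depth), 2):
        if y1 == y2:
            continue  # pairs tied in the reference standard are not counted
        if x1 == x2:
            tied_indicator += 1
        elif (x1 - x2) * (y1 - y2) > 0:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant + tied_indicator
    if total == 0:
        raise ValueError("need at least two distinct depth values")
    return (concordant + 0.5 * tied_indicator) / total


def jackknife_pk(indicator, depth):
    """Return (PK, jackknife standard error) using leave-one-out resampling."""
    indicator, depth = list(indicator), list(depth)
    n = len(indicator)
    pk_all = prediction_probability(indicator, depth)
    leave_one_out = [
        prediction_probability(indicator[:i] + indicator[i + 1:],
                               depth[:i] + depth[i + 1:])
        for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    std_err = sqrt((n - 1) / n * sum((p - mean_loo) ** 2 for p in leave_one_out))
    return pk_all, std_err


# Hypothetical paired observations (not study data).
ramsay = [2, 2, 3, 3, 4, 5, 5, 6]
state_entropy = [88, 91, 85, 60, 70, 45, 89, 30]

# Negate Entropy so that "deeper sedation, lower Entropy" counts as concordant.
pk, pk_se = jackknife_pk([-v for v in state_entropy], ramsay)
```

The pairwise loop is quadratic in the number of observations and the jackknife repeats it once per observation, which is acceptable for a few hundred paired assessments but would need a faster formulation for much larger data sets.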

Assessment of construct validity

We used two approaches to assess construct validity. Firstly, as we were primarily interested in the ability of Entropy to distinguish “lighter” sedation ranges from “deeper” sedation states we calculated the PK value for SE and RE for discriminating patients in Ramsay range 1–3 from those in Ramsay range 4–6. Secondly, we plotted continuous entropy data and compared them with intermittent sedation scores and our annotation files. In this analysis we also examined the relationship between entropy values and facial EMG (fEMG) power, which was recalculated from the data files. The fEMG power was calculated as the sum of the frequency components between 55 and 145 Hz, excluding the 100-Hz mains frequency multiple.
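The fEMG power calculation described above can be sketched as a band-limited sum over the discrete Fourier spectrum of one epoch of raw signal. This is an illustration under stated assumptions rather than the study's processing code: the 1-s epoch, the ±1 Hz width of the exclusion zone around 100 Hz, the one-sided power scaling, and the function name are all hypothetical choices.

```python
import cmath
import math


def femg_band_power(signal, fs, f_lo=55.0, f_hi=145.0,
                    excluded_hz=(100.0,), exclusion_halfwidth_hz=1.0):
    """Sum the one-sided spectral power of `signal` between f_lo and f_hi Hz.

    Bins within exclusion_halfwidth_hz of any frequency in excluded_hz are
    skipped (here the 100-Hz mains multiple). The halfwidth is an assumed value.
    Power is scaled so that a unit-amplitude sinusoid contributes 0.5.
    """
    n = len(signal)
    total = 0.0
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        if not f_lo <= freq <= f_hi:
            continue
        if any(abs(freq - f) <= exclusion_halfwidth_hz for f in excluded_hz):
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        total += 2.0 * abs(coeff) ** 2 / n ** 2
    return total


FS = 400  # sampling frequency in Hz, as stated for the raw recordings


def sine(freq_hz, n=FS):
    """Hypothetical unit-amplitude test tone lasting n samples."""
    return [math.sin(2 * math.pi * freq_hz * i / FS) for i in range(n)]


muscle_like = femg_band_power(sine(80), FS)   # inside the fEMG band
mains_like = femg_band_power(sine(100), FS)   # excluded mains multiple
eeg_like = femg_band_power(sine(10), FS)      # below the fEMG band
```

With a 400-Hz sampling frequency the usable spectrum extends to 200 Hz, so the whole 55–145 Hz band is available. Activity at 10 Hz or at the excluded 100-Hz component contributes essentially nothing to the sum, while activity at 80 Hz is counted in full.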

Comparisons between different Ramsay levels were carried out using a Kruskal–Wallis test for non-parametric data. If significant, a Mann–Whitney U test was used to test inter-group differences. The significance level was set at 5%.

Results

Characteristics of the 30 patients are shown in Table 2. A total of 1200 h of EEG/EMG monitoring were carried out. The median duration of monitoring per patient was 40 h (first quartile 26 h, third quartile 66 h; range 2–76 h). The single assessor made a total of 475 sedation assessments during the monitoring period; of these, 59 were rejected from the analysis due to poor Entropy data quality at the time of the assessment, leaving 416 assessments that were used in the analyses. The median (first, third quartile; range) number of trained observer sedation assessments per patient was 14.5 (8, 21.5; 3–32). The numbers of observations made at Ramsay scores 1–6 were 5, 90, 168, 16, 119, and 18, respectively. This broadly categorized patients into “deeper” sedation state (Ramsay 4–6, n = 153 assessments) and “lighter” sedation state (Ramsay 1–3, n = 263 assessments).

Table 2 Characteristics of the patients studied

Assessment of criterion validity

The values for SE and RE in relation to Ramsay score are shown in Fig. 1. Although median values did decrease as Ramsay scores progressed from 1 to 6 (p < 0.05 between levels 2–3 and 4–5), there was a wide range in values for each category, particularly for the Ramsay 3–6 range. These ranged from values suggesting deep anesthesia (< 40) to values suggesting very light sedation or normal consciousness (> 80) even for the Ramsay 5–6 patients, who were non-responsive. The mean (SEM) PK values of RE and SE for discriminating each sedation level from all other levels were 0.713 (0.019) and 0.710 (0.019), respectively. Although these values indicated some predictive power (value > 0.5), criterion validity was inadequate for Entropy to serve as a clinically useful test for distinguishing each clinical sedation level from the others.

Fig. 1

Box-and-whisker plots show the State (top panel) and Response (bottom panel) Entropy (SE, RE) values that were observed at different Ramsay sedation scores using data pooled from all patients observed during the study. Boxes indicate interquartile range; horizontal line within box indicates median value. The number of observations on which the plots are based is shown in Table 2. * p < 0.05 for adjacent groups (Mann–Whitney U test)

Construct validity

The mean (SEM) PK values of RE and SE for discriminating patients in “lighter” (Ramsay 1–3) from “deeper” (Ramsay 4–6) sedation states were 0.750 (0.025) and 0.748 (0.025), respectively. This suggested inadequate construct validity to distinguish lighter from deeper sedation.

We observed a clear pattern on visual inspection of the data. There were frequent “on–off” effects in entropy number where values changed rapidly from low to high levels, and vice versa. This was particularly noticeable during deeper sedation states (Fig. 2). During lighter sedation states entropy numbers tended to be consistently very high (Fig. 3). When we examined the fEMG power corresponding to the on–off effect, we found that fEMG activations corresponded to switches to high entropy numbers. The fEMG power varied within each Ramsay score category; high levels of fEMG power were observed during some periods even in over-sedated patients (Ramsay 5–6). The distribution of fEMG power calculated for each Ramsay sedation assessment is shown in Fig. 4. There was a correlation between fEMG power and Ramsay score. The PK value for fEMG for distinguishing each sedation score was similar to that of the entropy numbers [0.711 (0.019)]. The PK for distinguishing Ramsay 1–3 from 4–6 was higher than for the entropy numbers [0.764 (0.024)].

Fig. 2

An individual patient plot illustrates the relation between the Entropy values and facial EMG power (fEMG power) when the clinical sedation score indicated that the patient had sluggish responses to stimuli. The Ramsay score was 5 throughout the recording (bottom panel). Despite this, clear periods of fEMG activation occurred, manifesting as acute increases in fEMG power (middle panel). Some of these responses were spontaneous and some occurred in response to clinical stimuli (evident from the detailed annotation files). During any fEMG response both State and Response Entropy numbers increased to very high “awake” levels despite no change in Ramsay score (top panel). This created an “on–off” effect in Entropy that correlated with fEMG activations

Fig. 3

An individual patient plot illustrates the relation between the Entropy values and facial EMG power (fEMG power) when a patient was emerging from sedation as drugs were reduced. The Ramsay score increased indicating transition from sluggish response to stimuli (level 5) to a cooperative, awake, and tranquil patient (level 2) (bottom panel). The fEMG power was high throughout the period of observation and increased as sedation level progressed from 5 through to 2 (middle panel). Both State and Response Entropy numbers (SE, RE, top panel) had a frequent “on–off” effect during deeper sedation, and subsequently a persistent very high value throughout emergence. The lack of discrimination for Entropy numbers between sedation levels 5 to 2 is clear

Fig. 4

Box-and-whisker plot shows the facial EMG power (fEMG power) observed at each Ramsay sedation assessment level. * p < 0.05 for adjacent groups (Mann–Whitney U test)

Discussion

We have shown that State and Response Entropy values do not discriminate well between different clinical sedation levels in non-paralyzed intensive care patients. Although median Entropy values decreased in association with increasing sedation level, the variation in observed values was high. These observations suggest that entropy has inadequate validity as a measure of sedation state under routine clinical conditions. Facial EMG activation is a plausible explanation for the poor validity; it was observed across all clinical sedation ranges and often created an “on–off” effect in Entropy numbers, particularly in more sedated patients.

We studied patients during routine clinical care at various stages of their illness and with a range of conditions. Our findings are likely to be relevant to most non-neurological ICU patients. We obtained a large number of data points based on observations by a single trained observer. Inter- and intra-rater variability were therefore minimized as a potential source of bias. We observed relatively few under-sedated patients (Ramsay score 1) but do not consider this important because monitors are not required in this situation. Most observations were made in the Ramsay 2, 3, and 5 levels, which include sedation levels considered optimal in most guidelines (Ramsay 2–3) and the heavy/over-sedated ranges (Ramsay 5–6). We believe a useful intensive care sedation monitor must reliably detect over-sedation and distinguish it from optimal sedation. Although no single state can be considered an optimal sedation level for all ICU patients, the impression that Entropy numbers do not adequately distinguish lighter from deeper sedation states was supported by the PK values. If the monitor was intended to alert staff to excessive sedation by presenting a low Entropy number, it has limited value in non-paralyzed patients because high numbers were often generated in patients assessed as Ramsay 5 or 6. Paradoxically, this could lead to increases, rather than decreases, in sedation.

Entropy parameters were developed for use during anesthesia, when the main goal is avoidance of light anesthesia. During anesthesia State Entropy correlates well with surgical anesthetic level, and an increase in Response Entropy, which intentionally includes the higher fEMG frequencies, acts as a warning if the patient frowns in response to surgical stimulation [13]. The system was not developed to monitor patients in the ICU environment who receive frequent intense stimuli, such as tracheal suctioning, and in whom the goal is a responsive patient. High Entropy values observed in Ramsay levels 5 and 6 were associated with fEMG activations and higher fEMG power. Our data suggest that, unlike during anesthesia, during ICU sedation much lower fEMG frequencies, including those below 32 Hz, are problematic confounders to the Entropy algorithm. This conjecture is supported by our previous observations that in the ICU, fEMG may dominate the forehead EEG signal down to frequencies as low as 22 Hz [14]. We showed that bursts of “high” fEMG were frequently observed during and in between formal sedation assessments, and were consistently associated with high Entropy numbers.

This is the first study to evaluate the Entropy Module in sedated critically ill patients. Several studies have evaluated the Bispectral Index (BIS) as a sedation monitor in the ICU [10]. The BIS algorithm is different from Entropy, but it is potentially subject to similar confounders. Case reports and a volunteer study have shown that neuromuscular blockade without altering sedative drugs, or in non-sedated volunteers, decreases BIS numbers in association with abolition of fEMG [15, 16]. This indicates that fEMG is an important component of the BIS number as we have observed with Entropy. A recent controlled study found that BIS numbers decreased after administering a neuromuscular blocker to ICU patients with Ramsay score 4–5 but did not change when the Ramsay score was 6 [17]. These observations are consistent with an important confounding effect from fEMG at optimum sedation levels. Recent versions of BIS (BIS-XP) aim to adjust for fEMG activity and present fEMG power. In an observational study similar in design to ours Ely and colleagues found a trend for BIS-XP values to decrease with greater depth of clinical sedation, but observed a wide range in BIS-XP values, including those usually associated with awake individuals, even during deep sedation [18]. The authors also observed significant fEMG power during deep sedation and a strong correlation between fEMG power and BIS number, confirming the findings of others [19]. These data suggest that fEMG is likely to be an important confounder to all currently available consciousness monitors that analyze EEG from frontal electrodes.

Conclusion

In conclusion, we have shown that Entropy measured from frontal EEG has low validity to distinguish clinical sedation state in critically ill patients managed under routine clinical conditions due to strong interference from facial EMG activity.