Introduction

The Epworth sleepiness scale (ESS) is a popular, self-administered questionnaire developed for assessing subjective average sleep propensity; it is inexpensive and easy to administer [1–5]. Despite the widespread use of the ESS, there is limited documentation on the reliability and validity of the questionnaire. Several studies have reported conflicting findings about the association between ESS score and sleep latency as measured by the multiple sleep latency test (MSLT) [6]. Some studies show a moderate association between ESS score and mean sleep latency of the MSLT [7], while the majority show no significant association [8–10] or a negative association [4, 5] between ESS score and mean sleep latency.

The ESS has been translated into many languages; however, the number of different translated versions of the ESS in use today is unclear. Only a few articles document the test–retest reliability and validity of translated versions of the ESS [2, 11, 12].

The objective of this study was to assess the internal consistency and the test–retest reliability and validity of a Norwegian version of the ESS using two populations: (1) subjects who were previously evaluated for sleep apnea syndrome and (2) subjects who presented with complaints of excessive daytime sleepiness (EDS) and were referred to and successfully completed an ambulant MSLT.

Materials and methods

Study design and subjects

This study is based on data collection from two samples: (1) a cross-sectional postal survey of subjects who were previously evaluated for obstructive sleep apnea (OSA) and (2) consecutive patients with complaints of EDS, who were referred by a general practitioner, a specialist in neurology, or otolaryngology, and successfully completed an ambulant MSLT. This study was approved by The Regional Committee for Medical and Health Research Ethics.

Patients evaluated for obstructive sleep apnea

A questionnaire was mailed to all 242 patients evaluated for continuous positive airway pressure (CPAP) for treatment of OSA, without any known comorbid conditions, who were referred by a general practitioner, a specialist in neurology, or otolaryngology, and admitted to the pulmonary unit of the Akershus University Hospital from 1993 to 1999 [13]. Patients with a discharge diagnosis of the International Classification of Diseases, ninth revision (ICD-9) code 780.5 (1993–1998) or ICD-10 code G47.3 (1999) were eligible for the study.

The questionnaire included the ESS, items about demographics, smoking, and use of CPAP equipment. Pilot testing of the questionnaire in nine patients with OSA resulted in minor adjustments. A reminder was sent after 2 weeks and another after 4 weeks. We reviewed the medical records of the patients, abstracting relevant information and data from a sleep study before the hospital admission [13]. The apnea–hypopnea index (the total number of episodes of apnea and hypopnea divided by the number of hours of sleep) was calculated for each patient, as a measure of disease severity.
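The apnea–hypopnea index defined above is a simple ratio. The following is a minimal sketch of that arithmetic in Python; the function name and the example counts are hypothetical and are not taken from the study data.

```python
def apnea_hypopnea_index(apneas: int, hypopneas: int, hours_of_sleep: float) -> float:
    """Total number of apnea and hypopnea episodes per hour of sleep."""
    if apneas < 0 or hypopneas < 0:
        raise ValueError("episode counts cannot be negative")
    if hours_of_sleep <= 0:
        raise ValueError("hours of sleep must be positive")
    return (apneas + hypopneas) / hours_of_sleep


# Hypothetical recording: 90 apneas and 78 hypopneas over 7 hours of sleep.
example_ahi = apnea_hypopnea_index(apneas=90, hypopneas=78, hours_of_sleep=7.0)
```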

Among the respondents in the survey, 160 consented to be contacted again. From these 160 respondents, we randomly selected 90 patients who were asked by telephone to participate in a test–retest assessment of the questionnaire, including the ESS, 2 weeks apart. Sixty-nine patients agreed to participate, of whom 65 responded to the first questionnaire and 51 to the second. The mean time between assessments was 18 days.

Patients referred to sleep laboratory for multiple sleep latency test

All subjects referred to Akershus University Hospital between 2003 and 2005 for ambulant MSLT were eligible for the study. They were referred for EDS by a general practitioner, a specialist in neurology, or otolaryngology. Those eligible for the study were asked to fill out a questionnaire, including the ESS. Forty-six subjects agreed to participate and were given the questionnaire while at the department being fitted and instructed in use of the MSLT equipment. They filled out the questionnaire at home and returned it the next day along with the MSLT equipment. Of the 46 participants, 37 completed the MSLT correctly and were subsequently included in the study.

Epworth sleepiness scale

The ESS is a widely used, English-language, self-administered questionnaire developed for assessing subjective average sleep propensity [1, 2]. The questionnaire assesses the level of general sleepiness in eight real-life situations: subjects rate their chance of dozing in recent times in each situation on a scale of 0 to 3. The responses are added together to produce an ESS score ranging from 0 to 24 [1]. It has been reported that, at a cut-off score of >10, the ESS has a sensitivity of 93.5% and a specificity of 100% in distinguishing EDS from normal daytime sleepiness [14].
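The scoring rule described above can be sketched in a few lines of Python. This is an illustration only: the function names and the example responses are hypothetical, and the cut-off of >10 is the one cited in the text.

```python
from typing import Sequence

ESS_ITEM_COUNT = 8   # eight everyday situations
ESS_CUTOFF = 10      # totals above 10 are treated as pathological in the text


def ess_total(items: Sequence[int]) -> int:
    """Sum the eight item ratings (each 0 to 3) into a total of 0 to 24."""
    if len(items) != ESS_ITEM_COUNT:
        raise ValueError("the ESS has exactly eight items")
    if any(score not in (0, 1, 2, 3) for score in items):
        raise ValueError("each item must be rated 0, 1, 2, or 3")
    return sum(items)


def is_pathological(total: int, cutoff: int = ESS_CUTOFF) -> bool:
    """Apply the strict '> cutoff' rule."""
    return total > cutoff


# Hypothetical respondent.
responses = [2, 1, 0, 3, 2, 1, 1, 2]
total = ess_total(responses)
flagged = is_pathological(total)
```

A questionnaire with a missing item raises an error here rather than being scored; how incomplete questionnaires were handled in the study is not stated in the text.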

In this study, a Norwegian version of the ESS was used for assessing daytime sleepiness. The English version of the ESS was translated into Norwegian using a standardized translation procedure. It was first translated into Norwegian by two native Norwegian-speaking physicians who were fluent in English and then translated back into English, for comparison, by a translator whose mother tongue was English. The copyright holder, Murray Johns, gave written consent for the use of this Norwegian version of the ESS.

Overnight polysomnography

Patients who were evaluated for sleep apnea underwent an ambulant sleep apnea recording using the Embla system (Embla; Medcare-Flaga; Reykjavik, Iceland). In these studies, signals from airflow (using thermistors), arterial oxygen saturation, heart rate, abdominal and thoracic respiratory movements, body position, and snoring were recorded.

These polysomnography (PSG) recordings were scored the next day by doctors specializing in clinical neurophysiology using criteria from the 1999 American Academy of Sleep Medicine Task Force [15].

Multiple sleep latency test

The MSLT is considered to be the gold standard for the diagnosis of EDS [2] and was used to validate this Norwegian version of the ESS.

The participants referred to MSLT also underwent a PSG recording using the ambulant Embla system, including surface electrode recordings of the electroencephalogram (C3, C4, O1, and O2), electro-oculograms (EOG1 and EOG2), electrocardiogram, as well as m. submentalis and bilateral m. tibialis anterior electromyograms. Before the MSLT and PSG, all patients filled out sleep logs, for a duration of 2 weeks, where information about use of medications was also described. Patients were also required to fill out a sleep log the day and night of the recordings. In conjunction with the MSLT, patients wrote down if they were presently taking any medication, when and how fast they fell asleep, and whether or not they dreamt during the nap periods. In conjunction with the PSG, patients described when they had their last meal of the day, if they drank alcohol or took any medications the night of the recording, when they went to bed, when they turned off the lights, how many hours they felt they slept, if they were up and out of bed at any time during the night, and if they had any problems with the Embla equipment.

The MSLT was completed during the daytime: the patients were given five nap opportunities, each lasting 20 min, 2 h apart throughout the day. Patients themselves marked the start and stop time of each nap period by pressing the event button on the Embla equipment. They were required to stay awake between these napping sessions to correctly complete the test. Patients who did not comply, by either sleeping in between the set napping sessions or by not indicating their start and stop napping times with the event button, were not included in this study, as the MSLT was considered to be incorrectly done. The PSG was then carried out the following night. This practice is a variation of the accepted convention [16, 17], but makes these studies more convenient and economical for both the patient and the laboratory. The recordings were analyzed and scored the next day by doctors specializing in the field of clinical neurophysiology, using the Rechtschaffen and Kales standardized scoring system for sleep stages [18]. Three doctors were responsible for the scoring of these tests, but 95% of the recordings included in this study were scored by only one of these doctors. Because nearly all recordings were scored by the same doctor, variation between scorers is likely to have had little influence on the results.
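The two MSLT variables used later in the analysis (the number of naps in which the patient fell asleep and the mean latency to sleep) can be derived from the five nap latencies. The sketch below is hypothetical. In particular, it assumes that a nap without sleep is counted as a latency of the full 20 min, which is a common MSLT convention but is not stated in the text.

```python
from typing import Optional, Sequence, Tuple

NAP_COUNT = 5            # five nap opportunities
NAP_DURATION_MIN = 20.0  # each nap opportunity lasts 20 min


def mslt_summary(latencies: Sequence[Optional[float]]) -> Tuple[int, float]:
    """Return (naps with sleep, mean sleep latency in minutes).

    None marks a nap without sleep; it is counted as 20 min (assumption).
    """
    if len(latencies) != NAP_COUNT:
        raise ValueError("expected one latency per nap opportunity")
    naps_with_sleep = sum(1 for value in latencies if value is not None)
    filled = [NAP_DURATION_MIN if value is None else value for value in latencies]
    if any(value < 0 or value > NAP_DURATION_MIN for value in filled):
        raise ValueError("latencies must lie between 0 and 20 min")
    return naps_with_sleep, sum(filled) / NAP_COUNT


# Hypothetical patient who fell asleep in three of the five naps.
naps, mean_latency = mslt_summary([4.5, None, 7.0, 12.5, None])
```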

Statistical analysis

Descriptive statistics are presented using means (SD) or number (%). Internal consistency reliability of the ESS total score was assessed using Cronbach’s coefficient alpha in the total sample of patients evaluated for OSA. Test–retest reliability of the ESS items was assessed using weighted kappa with quadratic disagreement weights. Kappa statistics indicate the proportion of agreement between observers (here, between the two administrations) after correction for the agreement expected by chance, with 1 indicating full agreement, 0 agreement equal to chance alone, and −1 complete disagreement [19]. Test–retest reliability for the aggregated ESS total score was assessed using an intraclass correlation coefficient.
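For readers who want to see what these two reliability statistics compute, the following is a minimal pure-Python sketch using the textbook formulas. It is not the SPSS or Stata procedure used in the study; the function names and the input layout (one list of item ratings per respondent, ratings coded 0 to 3) are assumptions made for illustration.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(rows: Sequence[Sequence[int]]) -> float:
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def quadratic_weighted_kappa(first: Sequence[int], second: Sequence[int],
                             categories: int = 4) -> float:
    """Weighted kappa with quadratic disagreement weights (i - j) ** 2."""
    if len(first) != len(second):
        raise ValueError("both administrations need the same respondents")
    n = len(first)
    observed: List[List[int]] = [[0] * categories for _ in range(categories)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row_totals = [sum(observed[i]) for i in range(categories)]
    col_totals = [sum(observed[i][j] for i in range(categories))
                  for j in range(categories)]
    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(categories):
        for j in range(categories):
            weight = (i - j) ** 2
            observed_disagreement += weight * observed[i][j]
            expected_disagreement += weight * row_totals[i] * col_totals[j] / n
    return 1.0 - observed_disagreement / expected_disagreement
```

Both functions divide by a variance-like quantity, so they raise a `ZeroDivisionError` when every respondent gives the same rating. The intraclass correlation coefficient is not sketched because the text does not say which ICC model was used.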

We assessed the construct validity of ESS by Spearman’s rank correlation of ESS item and total scores with variables from the MSLT: number of times the patient fell asleep and mean latency to sleep (in minutes). When discussing the strength of agreement correlations, we used the following nomenclature from agreement statistics [20]: <0.20 poor, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 good, 0.81–1.00 very good. A moderate to good correlation would support the construct validity of ESS. We also assessed this separately in a subset of patients with pathological ESS score, i.e., ESS > 10.
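The correlation analysis and the verbal labels can be illustrated with a short Python sketch. It computes Spearman's rho as the Pearson correlation of average ranks, which handles tied ratings. The function names are hypothetical, and two choices are assumptions rather than details from the text: the label is applied to the absolute value of rho (a negative correlation between ESS score and sleep latency would be expected), and a value of exactly 0.20 is labelled poor.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mx) ** 2 for a in rx))
    spread_y = sqrt(sum((b - my) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)


def strength_label(rho: float) -> str:
    """Map |rho| onto the nomenclature quoted in the text."""
    magnitude = abs(rho)
    if magnitude <= 0.20:
        return "poor"
    if magnitude <= 0.40:
        return "fair"
    if magnitude <= 0.60:
        return "moderate"
    if magnitude <= 0.80:
        return "good"
    return "very good"
```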

We further assessed criterion-related validity in the total sample of patients that were evaluated for OSA by investigating whether the ESS items and total score could discriminate between groups of patients selected according to snoring (yes vs. no), apnea–hypopnea index above/below the sample median (>24 per hour vs. ≤24 per hour) and whether the patients were in possession of a CPAP machine (yes vs. no). Similarly, in the MSLT group, we assessed criterion-related validity using self-reported information about snoring (yes vs. no), sleeping during the day (yes vs. no), and sleepiness while driving (yes vs. no) as criteria.

Groups were compared using the t test for independent samples. We chose a 5% significance level. SPSS version 11.5 (SPSS, Chicago, IL, USA) and Stata version 8.2 (Stata Corp., College Station, TX, USA) were used for the analyses.
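As an illustration of the group comparison, the sketch below computes the test statistic and degrees of freedom for an independent-samples t test. It assumes the pooled (equal-variance) form; the text does not say whether the pooled or the unequal-variance version was used. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence, Tuple


def pooled_t_statistic(group_a: Sequence[float],
                       group_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom), assuming equal variances."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df
```

Turning the statistic into a p-value to compare against the 5% level requires the t distribution, for example via `scipy.stats.ttest_ind` if SciPy is available.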

Results

Descriptive statistics comparing the two groups of patients are presented in Table 1. Among patients complaining of EDS who underwent MSLT (n = 37), the mean age was 43 years (age range, 32–54 years), and 32% were women. For patients who were previously evaluated for OSA (n = 178), the mean age was 56 years (age range, 45–67 years), and 24% were women.

Table 1 Descriptive statistics, mean (SD)

Reliability of the Norwegian version of the Epworth sleepiness scale

Internal consistency reliability, as assessed with Cronbach’s alpha, was 0.84 (n = 154). Test–retest reliability of the Norwegian version of ESS, as assessed with weighted kappa in a subset of patients evaluated for OSA, ranged from 0.61 to 0.80 for the eight different ESS items (n = 50) (Table 2). For the aggregate ESS total score, the test–retest reliability was 0.81, as assessed with an intraclass correlation coefficient.

Table 2 Test–retest reliability of Epworth sleepiness scale items (n = 50)

Validity of the Norwegian version of the Epworth sleepiness scale

In the total sample of participants referred for EDS (n = 37), there was only a fair to moderate correlation between the number of times a patient fell asleep and the ESS items of “sitting quietly after a lunch without alcohol” and “lying down to rest in the afternoon when circumstances permit,” and poor correlation for the remaining individual ESS items and the ESS total score (Table 3).

Table 3 Spearman’s rank correlation of the Epworth sleepiness scale (ESS) item and total score with the multiple sleep latency test (MSLT)

In a subset of patients (n = 25) with ESS scores > 10, we found a fair correlation between the number of times a patient fell asleep and the ESS items of “sitting quietly after a lunch without alcohol,” “in a car, while stopped for a few minutes in traffic,” “sitting and reading,” and the ESS total score. There was a moderate correlation between the number of times a patient fell asleep and the ESS item of “watching TV.” The remaining ESS items showed poor correlation to the number of times a patient fell asleep. In this subset, we also found a fair correlation between the mean latency to sleep and the ESS items of “lying down to rest in the afternoon when circumstances permit,” “sitting and talking to someone,” “sitting quietly after a lunch without alcohol,” and total ESS score. The remaining ESS items showed poor correlation with the mean latency to sleep (Table 3).

When comparing ESS item and total scores according to presence or absence of snoring, sleeping during the day, and sleepiness while driving in patients complaining of EDS who underwent MSLT, as well as ESS item and total scores according to snoring status, AHI ≤ 24 per hour vs. AHI > 24 per hour, and possession of a CPAP machine in patients evaluated for OSA, we found few significant differences (Tables 4 and 5).

Table 4 ESS item and total scores according to presence or absence of snoring, sleeping during the day, and sleepiness while driving in the MSLT group, mean (SD)
Table 5 ESS item and total scores according to snoring status, AHI ≤ 24 per hour vs. AHI > 24 per hour, and possession of a CPAP machine in the OSA group, mean (SD)

In the MSLT group (n = 36), there were statistically significant differences in six of the eight item scores and in the ESS total score between those who reported sleeping during the day and those who did not. In contrast, only one ESS item differed when groups were divided according to the presence of snoring or sleepiness while driving (Table 4).

In the larger sample evaluated for OSA (n = 178), item scores and the ESS total score differed between those who reported snoring and those who did not. However, there were no differences in scores between groups divided according to the median AHI or according to possession of a CPAP machine (Table 5).

Discussion

In this study, we investigated the reliability and validity of a Norwegian version of the ESS. The internal consistency reliability was acceptable, with a Cronbach’s alpha above the recommended threshold of 0.80 [21]. Assessment of the test–retest reliability among patients evaluated for OSA showed good agreement and therefore supports the reliability of the Norwegian ESS. We found only a fair to moderate association of the different ESS items and total score with the number of times a patient fell asleep and the mean sleep latency found on the MSLT, mainly in a subset of patients who had a total ESS score > 10. These findings may call into question the validity of the ESS when MSLT variables are used as a gold standard for daytime sleepiness.

Our finding of a good test–retest reliability of the Norwegian ESS is comparable to that of other translations of the ESS [3, 12, 22, 23]. In contrast, a recently published study, where the ESS was administered to 142 patients with sleep apnea syndrome twice with an average time interval of 71 days, reported highly variable ESS scores over time [24]. This larger variability may be related to the longer time span between the assessments than in the present study.

The ESS represents a subjectively reported situational sleep propensity [25], and there are conflicting results about the association of the ESS with objective measures of sleep propensity. Some previous studies have shown that higher ESS scores are associated with lower mean sleep latencies of the MSLT [2, 3]. In the present study, we found only fair to moderate associations between the MSLT and different ESS items and total ESS score in patients complaining of EDS, in line with previous research [5, 8–10] that has questioned whether the ESS can replace objective measures of sleep propensity like the MSLT [8–10]. Some of the prior studies go on to suggest that the ESS and MSLT may evaluate different aspects of sleepiness and that the two methods can be complementary. This seems reasonable, as the MSLT is a time-consuming, burdensome, and relatively expensive test to perform, whereas the ESS is easy and inexpensive to administer. The MSLT is a commonly used method to objectively measure sleep propensity and is a valuable aid in the diagnosis of illnesses such as narcolepsy. The ESS cannot replace the MSLT in these situations. The findings in the present study suggest that the ESS alone is not sufficient to quantify and reliably diagnose EDS, and we believe the MSLT is still needed to objectively quantify sleep propensity.

Some limitations of the study should be noted. The MSLT was performed during the day before an overnight PSG. This practice varies from the accepted convention of performing the PSG the night before the MSLT [16, 17], but because our laboratory performs these studies in an ambulant setting, recording the PSG the night of the MSLT makes it more cost-effective, efficient, and convenient for both the patient and the laboratory. This method of performing MSLT in an ambulatory setting is largely based on almost 30 years of tradition at our laboratory. Guidelines defining the use of the MSLT have changed over the years; the newest guidelines, published in 2005, state that “the MSLT must be performed immediately following PSG recorded during the individual’s major sleep period” [17]. Previous guidelines, however, were less categorical, stating for example that “the MSLT is generally performed on a day following a clinical PSG recording” [26]. We have continued our practice of performing PSGs after MSLTs mainly for logistical reasons. The obvious and biggest disadvantage of our current practice is the lack of objective documentation of sleep quality the night before the MSLT. We are unaware of any studies that have previously validated any form of ambulatory MSLT against the “gold standard” in-patient MSLT, and we are therefore in the process of initiating such a validation study.

The Norwegian version of the ESS used in this study was translated at Akershus University Hospital in 2001, when we were unaware that other researchers were working on another Norwegian translation of the same scale. That other version has been used in a telephone survey [27]; however, we are not aware of any assessment of its psychometric properties. The two versions differ only slightly from one another, most significantly in the translation of the English verb dozing, for which the two versions have selected two distinct Norwegian verbs. Another difference is that in the version used in this study, the ESS items are, as in the original version, translated in the present participle (e.g., sitting and reading), while the other version uses the infinitive form of the verbs (e.g., to sit and read). Interestingly, in one study that aimed to develop a Greek version of the ESS, the researchers translated an English ESS in which probability of sleepiness and sleeping was used instead of the verb dozing [23].

The present study was conducted in two different populations, one consisting of patients evaluated for OSA and the other of patients with a sleep disturbance, who were referred to a sleep study and an MSLT. This study documents the psychometric properties of the Norwegian version of the ESS, which we therefore think can be used in practice. However, the Norwegian version has the same limitations as other language versions of the ESS. The measurement properties of the ESS clearly vary according to the population in which it is used, which has been emphasized in previous reports [7, 28]. That noted, the ESS is still an important questionnaire for the assessment of subjective sleep propensity, especially in patients who are evaluated for or have been diagnosed with OSA. It has been translated into several different languages and is used in many countries around the world [2, 11, 12, 22, 23]. The ESS has the advantage of evaluating average sleep propensity in recent times [29], while the MSLT evaluates the condition of the patient in a very specific situation and does not necessarily reflect daily life [22]. We conclude that the Norwegian version of the ESS had acceptable internal consistency and test–retest reliability. The association of the ESS items and total score with the MSLT was only fair to moderate, in line with previous reports from other countries.