Introduction

Daytime sleepiness is considered a very important health problem, especially in those who suffer from obstructive sleep apnea (OSA) [1] and narcolepsy [2]. Obstructive sleep apnea affects 9% to 24% of the middle-aged adult population [3]. Undiagnosed OSA has many adverse consequences for patients, such as cardiovascular complications [4], occupational hazards [5], and traffic accidents [6]. Narcolepsy is characterized by the classic tetrad of excessive daytime sleepiness with irresistible sleep attacks, cataplexy, hypnagogic hallucinations, and sleep paralysis [7].

The gold standard test for OSA diagnosis is polysomnography (PSG) [8], and for narcolepsy, the multiple sleep latency test (MSLT) has been suggested [9]. However, daytime sleepiness can also be assessed using subjective measures. Of these, the Epworth Sleepiness Scale (ESS) is among the best-known instruments. This questionnaire was introduced by Johns in 1991 [10] and has since been used extensively in research studies and clinical management worldwide. There is evidence of a significant correlation between the ESS and the apnea–hypopnea index (AHI), sleep onset latency on PSG, and daytime sleep onset latency on MSLT [10]. The ESS has been translated into various languages, and its reliability and reproducibility have been reported in different cultures [11–22].

Despite the need for such an instrument in Iran, no instrument exists to assess sleepiness in Persian-speaking populations. This study aimed to translate and validate the ESS in Iran in order to facilitate its use in both research and clinical settings.

Materials and methods

Epworth Sleepiness Scale

The ESS is an eight-item questionnaire that is used as a simple and inexpensive measure for the subjective evaluation of daytime sleepiness [10]. The questionnaire asks respondents to rate their sleepiness in eight daily situations from 0 to 3, giving a total score of 0 (no daytime sleepiness) to 24 (the most excessive daytime sleepiness). The cutoff point for excessive daytime sleepiness is considered to be equal to or greater than 10 [23].
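The scoring rule described above can be illustrated with a minimal sketch. The function names and the constant `EDS_CUTOFF` are hypothetical and introduced only for illustration; the item range (0 to 3), the number of items (eight), and the cutoff (10) are taken from the text.

```python
from typing import Sequence

# Cutoff for excessive daytime sleepiness as stated in the text [23].
EDS_CUTOFF = 10


def ess_total(item_scores: Sequence[int]) -> int:
    """Sum the eight ESS item ratings (each 0-3) into a total score (0-24)."""
    if len(item_scores) != 8:
        raise ValueError("The ESS has exactly eight items")
    for score in item_scores:
        if score not in (0, 1, 2, 3):
            raise ValueError("Each ESS item must be rated 0, 1, 2, or 3")
    return sum(item_scores)


def has_excessive_daytime_sleepiness(total: int) -> bool:
    """Apply the cutoff: a total of 10 or more counts as excessive sleepiness."""
    return total >= EDS_CUTOFF
```

For example, a respondent who rates every situation as 3 obtains the maximum total of 24, whereas a respondent with a total of 9 falls just below the cutoff.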

Translation

The standard forward–backward method was used to translate the English version of the ESS into Persian (the official language of Iran) by two physicians. A sleep medicine fellow compared these translations, and a single provisional version was produced. In this Persian version, considering that drinking alcohol is illegal in Iran, the word “alcohol” was deleted in item 7. Next, two professional translators who were not familiar with the questionnaire back-translated the provisional Persian version into English. These translations were handed to a bilingual physician, and from the two, one English version was derived that was not meaningfully different from the original version. Eight physicians with expertise in sleep medicine assessed content validity, and some changes were made. Then, to evaluate face validity, 15 patients completed the questionnaire, obscure words and sentences were changed, and the final version of the questionnaire was produced (“Appendix”). Written consent for translation and use of the Iranian version of the ESS was obtained from the copyright holder.

Patients

This study was performed in three well-known referral sleep disorder clinics in Tehran, Iran, from September 2008 to May 2011. There were two groups of patients: the first group comprised patients with suspected OSA who were assessed by PSG (PSG group, n = 466), whereas the second group comprised patients with suspected narcolepsy and excessive daytime sleepiness who were assessed by MSLT (MSLT group, n = 41). All patients in the three clinics were seen by the first author. Patients filled in the Iranian version of the ESS (ESS-IR) at their first clinic visit.

Patients who did not want to participate in the study or who had cardiac or respiratory failure were excluded. Written consent was obtained from all patients.

Additional measures

Polysomnography

Polysomnography is the gold standard test for OSA diagnosis. Electrooculogram, electrocardiogram, electroencephalogram, and electromyogram (submentalis and bilateral tibialis anterior muscles) were monitored by surface electrodes. Snoring, respiratory airflow, arterial blood oxygen saturation, respiratory effort, and body position were also recorded by specific sensors and infrared video monitoring [24]. All tests were analyzed by the first author according to the criteria recommended by the American Academy of Sleep Medicine [25]. An apnea was defined as a decrease in airflow of more than 90% from baseline for at least 10 s, and a hypopnea was defined as a decrease in airflow of more than 50% from baseline for at least 10 s with a ≥3% reduction in oxygen saturation or with arousal. The AHI was calculated by dividing the total number of apneas and hypopneas by the hours of sleep.
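The event definitions and the AHI formula above can be sketched as follows. This is only an illustration of the stated thresholds, not the scoring software used in the study; the function and parameter names are hypothetical.

```python
from typing import Optional


def classify_event(airflow_drop_pct: float, duration_s: float,
                   desaturation_pct: float, arousal: bool) -> Optional[str]:
    """Label a respiratory event using the thresholds given in the text."""
    if duration_s < 10:
        return None  # events shorter than 10 s are not scored
    if airflow_drop_pct > 90:
        return "apnea"
    if airflow_drop_pct > 50 and (desaturation_pct >= 3 or arousal):
        return "hypopnea"
    return None


def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int,
                         sleep_hours: float) -> float:
    """AHI = (number of apneas + number of hypopneas) / hours of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours
```

As a worked example, 30 apneas and 60 hypopneas over 6 h of sleep give an AHI of (30 + 60) / 6 = 15 events per hour.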

Multiple sleep latency test

The MSLT is the gold standard test for evaluating daytime sleepiness. This test usually starts around 2 h after waking up from nighttime sleep. To identify patients who might have nighttime sleep problems, patients were assessed by PSG on the night before the MSLT [26]. The MSLT consists of at least four nap opportunities performed at 2-h intervals. During the nap opportunities, the patient was monitored with the PSG device. Each nap opportunity was terminated after 20 min if sleep did not occur. If the patient fell asleep, the nap was terminated 15 min after the first epoch of sleep in order to assess for the occurrence of rapid eye movement (REM) sleep. If the patient entered REM in any of the four naps, a fifth nap opportunity was performed. In the 2-h intervals between naps, individuals were out of bed and were not permitted to sleep. To ensure that the MSLT was performed carefully and that its results reflected the individual’s true level of sleepiness, we asked patients about their use of any medications. The use of central nervous system stimulants was not allowed for at least 15 days before the MSLT, and if needed, our clinical team planned the use of medications. Individuals were not permitted to consume alcohol or caffeine or to smoke on the day the MSLT was performed. The first author analyzed all tests.

Reliability

Reliability of the ESS-IR was assessed by two methods:

  1. Internal consistency using the Cronbach’s alpha coefficient

  2. Test–retest analysis: 123 patients filled in the ESS-IR twice: once at first clinic visit and once more before PSG. Intervals between the first clinic visit and performing PSG varied from 2 to 4 weeks.

Validity

Validity of the ESS-IR was assessed by three methods:

  1. Construct validity: Factor analysis was performed separately in the PSG and MSLT groups.

  2. Discriminant validity: According to the AHI, patients were divided into four groups: (a) patients without OSA (AHI < 5, n = 119), (b) patients with mild OSA (5 ≤ AHI < 15, n = 92), (c) patients with moderate OSA (15 ≤ AHI < 30, n = 81), and (d) patients with severe OSA (AHI ≥ 30, n = 174). It was expected that the ESS-IR would discriminate between these groups.

  3. Criterion validity: Relationships between the MSLT results and the ESS-IR item and total scores were calculated.
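The AHI-based grouping used for discriminant validity can be written as a small lookup. This is a minimal sketch with a hypothetical function name; the boundaries (5, 15, and 30 events per hour) are those listed above.

```python
def osa_severity(ahi: float) -> str:
    """Map an AHI value to the four severity groups used in the study."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "no OSA"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

Each lower boundary is inclusive, so an AHI of exactly 15 falls in the moderate group and an AHI of exactly 30 in the severe group.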

Responsiveness to change

Responsiveness of the ESS-IR was assessed by comparing the ESS-IR total score before and after CPAP treatment. Using telephone interviews, we checked the ESS scores of 16 patients who reported significant improvement in their symptoms 6 to 9 months after treatment.

Statistical analysis

Several statistical tests were applied to establish the psychometric properties of the ESS-IR. Cronbach’s alpha coefficient was calculated to evaluate internal consistency, and alpha values equal to or greater than 0.7 were considered satisfactory [27]. Test–retest reliability was assessed using the intraclass correlation coefficient (ICC). Construct validity was assessed by factor analysis using principal component analysis. One-way analysis of variance (ANOVA) and the post hoc Scheffe test were performed to examine how well the ESS-IR could discriminate between patients who differed in PSG scores. Criterion validity was assessed by Spearman’s correlation coefficient between the ESS item and total scores and the MSLT findings. As suggested, the strength of Spearman’s correlation coefficient was categorized as poor (0–0.20), fair (0.21–0.40), moderate (0.41–0.60), good (0.61–0.80), and very good (0.81–1) [28]. Responsiveness of the ESS-IR was assessed by Wilcoxon’s test for paired nonparametric data. A P value <0.05 was considered statistically significant. PASW Statistics 18 was used for statistical analysis.
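Two of these quantities are simple enough to sketch directly. The code below is an illustration, not the PASW procedure used in the study. It assumes the standard formula for Cronbach’s alpha, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), computed with sample variances, and it applies the correlation categories to the absolute value of the coefficient, treating each upper bound as inclusive. Both choices, and all names, are assumptions made for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores."""
    if len(responses) < 2:
        raise ValueError("At least two respondents are required")
    k = len(responses[0])
    if k < 2:
        raise ValueError("At least two items are required")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("Total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def correlation_strength(rho: float) -> str:
    """Categorize a correlation coefficient using the bands cited in [28]."""
    r = abs(rho)  # assumption: strength is judged on the absolute value
    if r > 1:
        raise ValueError("A correlation coefficient must lie in [-1, 1]")
    if r <= 0.20:
        return "poor"
    if r <= 0.40:
        return "fair"
    if r <= 0.60:
        return "moderate"
    if r <= 0.80:
        return "good"
    return "very good"
```

When every item ranks respondents identically, the item variances account for exactly 1/k of the total-score variance and alpha reaches 1; less consistent items push alpha toward 0.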

Results

Demographic and anthropometric characteristics of patients assessed by PSG and MSLT are shown in Table 1. Neck circumference, chest circumference, abdomen circumference, systolic blood pressure, and diastolic blood pressure were only available for some patients.

Table 1 Demographic and anthropometric characteristics of studied patients

Reliability of the Iranian version of Epworth Sleepiness Scale

Cronbach’s alpha coefficient of the ESS-IR was 0.82 in patients assessed by PSG and 0.88 in patients assessed by MSLT (Table 2). The ICC was 0.81 (95% confidence interval 0.74–0.86, P value <0.001).

Table 2 The Cronbach’s alpha values if an item of the ESS-IR was deleted and factor loadings from factor analysis of the ESS-IR in PSG and MSLT groups

Validity of the Iranian version of the Epworth Sleepiness Scale

Factor analysis was performed separately in the two groups. In the PSG group, two factors were derived with eigenvalues of 3.63 and 1.02 for the first and second factors, respectively, which jointly accounted for 58.19% of the variance observed. In the MSLT group, two factors were also derived, with eigenvalues of 4.53 and 1.18 for the first and second factors, respectively, which jointly accounted for 61.50% of the variance observed. Factor loadings of each item on the two derived factors are shown in Table 2.

One-way ANOVA showed the differences in ESS item and total scores between groups 1 to 4 as defined by PSG (Table 3). As shown, the ESS total score differed significantly between groups 1 and 3, 1 and 4, and 2 and 4, but not between groups 1 and 2, 2 and 3, or 3 and 4.

Table 3 The mean ± standard deviation of the ESS-IR items and total score in PSG groups 1–4 and results of discriminant analysis with one-way analysis of variance and post hoc Scheffe test

Spearman’s correlation coefficients of the ESS item and total scores with the MSLT in the group of 41 patients are shown in Table 4. As shown, there was a fair correlation between the ESS total score and the MSLT results. There was a fair correlation between the number of times the patient fell asleep and items 4, 5, 6, and 8. There was a fair correlation between mean latency to sleep and items 3, 5, 6, and 8. There was a fair correlation between total sleep time and items 2, 3, 5, and 8. There was a poor correlation between the number of REM occurrences in the patient’s sleep and all items. The mean REM latency had a fair correlation with items 3 and 7 and a moderate correlation with items 4 and 6.

Table 4 Spearman’s correlation coefficient of the ESS-IR items and total score with the MSLT

Responsiveness of the Iranian version of the Epworth Sleepiness Scale

In the 16 patients who were treated with CPAP, the mean ESS score before treatment was 10.62 ± 6.56, whereas 6–9 months after treatment the mean ESS score had decreased significantly to 3.75 ± 2.51 (P < 0.001).

Discussion

In this study, we translated the ESS into Persian and assessed its reliability and validity. Translation of the questionnaire was based on the standard forward–backward procedure, which is the accepted and recommended method for translating questionnaires from their original versions into other languages.

Internal consistency of the ESS-IR in the PSG and MSLT groups was 0.82 and 0.88, respectively; both values were higher than the minimum recommended value for internal consistency and were similar to those of other studies [10–22, 29, 30]. As shown in Table 2, deleting any single item did not increase Cronbach’s alpha coefficient significantly. The test–retest intraclass correlation coefficient was 0.81, which is acceptable and similar to other studies [12–14, 16] and represents high consistency of the ESS over the 2- to 4-week interval between the first clinic visit and PSG.

Factor analysis showed that the ESS-IR has two factors. Johns and some other investigators showed that the ESS had only one factor [10–12, 30]. Rosales et al., in an evaluation of the Spanish version of the ESS in Peru, showed that the ESS had two factors (eigenvalues 3.32 and 1.03) [17]. Zhang et al. studied the modified Chinese version of the ESS (mESS) and showed that the mESS had two factors in a group of healthy individuals (eigenvalues 1.66 and 1.40) and one factor in a group of patients (eigenvalue 4.11) [31]. Nguyen et al. showed that the ESS had three factors (eigenvalues 6.2, 1.7, and 1.4) [32]. In our study, the first factor showed statistically significant loadings on all items except item 4 in the MSLT group, and the eigenvalue of the second factor in both groups was close to unity. Considering that the ESS total score is obtained by summing the scores given to each item, we can conclude that the ESS-IR has only one factor, which could be named tendency to sleep.

Patients who had different degrees of OSA had higher ESS scores than patients without OSA (AHI < 5), but this difference was not significant between patients without OSA and those with mild OSA. However, the difference was significant between patients without OSA and those with moderate and severe forms of OSA. The ESS difference between patients with mild and severe OSA was also significant, whereas the differences between patients with moderate and severe OSA and between patients with moderate and mild OSA were not significant. Johns, in an evaluation of the ESS difference between patients with primary snoring (n = 32) and OSA (n = 55), showed that there were significant differences between patients with primary snoring and patients with mild, moderate, and severe OSA. The ESS score difference between patients with severe and moderate OSA was also significant, but the difference between patients with mild and moderate OSA was not [10]. Izci et al., in their study comparing 128 patients with OSA who were assessed by PSG, showed that the ESS score difference between patients with moderate and severe OSA was not significant. The ESS differences between patients with AHI < 15 and 15 ≤ AHI < 30 and between patients with AHI < 15 and AHI ≥ 30 were also not significant [12]. According to our study and the two other studies mentioned, we can conclude that the ESS has good power to discriminate between patients with and without OSA, although it cannot completely differentiate the severity of OSA.

In this study, we showed that there was only a poor to moderate correlation between the ESS item and total scores and the MSLT results. Some studies showed a moderate correlation between the ESS total score and the MSLT [1, 33], whereas others, such as our study, reported no significant or negative correlation [16, 34–37]. These findings suggest that the ESS, which evaluates sleepiness subjectively, is not equivalent to the MSLT, which is an objective measure of daytime sleepiness, and that it cannot replace the MSLT in clinical management, especially for the diagnosis of narcolepsy.

In order to assess the utility of the ESS in OSA patients, we analyzed the ESS scores of patients after CPAP treatment. Patients who reported significant improvement in their symptoms, especially in daytime sleepiness, had significantly lower ESS scores than before treatment. This finding showed that even if the ESS does not have excellent utility in the diagnosis of OSA, it can be used as a valid tool in following OSA patients [17, 19].

This study has several limitations. One limitation was the lack of a control group of healthy individuals. In this study, we treated patients with AHI < 5 as the control group for patients with OSA, whereas most patients with suspected OSA who were assessed by PSG and had AHI < 5 had primary snoring. However, Johns [10] and Banhiran et al. [19] showed that the ESS difference between patients with primary snoring and healthy individuals was not significant. Another limitation of our study is the absence of patients with other disorders such as insomnia, sleep terror, nightmare, advanced sleep phase syndrome, delayed sleep phase syndrome, shift working, etc. Had these patients been included in the study, we could have assessed not only the validity of the ESS in patients with OSA but also its abilities and features in other groups of patients.

In conclusion, the ESS-IR has a high degree of internal consistency and test–retest reliability. The ESS is able to differentiate individuals without OSA from those with moderate and severe forms of OSA. The correlation between the ESS total score and the MSLT results was not significant, and the ESS cannot replace the MSLT. The ESS-IR, as a subjective, inexpensive, simple, and culturally adapted measure, is recommended for evaluating daytime sleepiness in clinical and research settings in Iran.