Introduction

Excessive daytime sleepiness (EDS) is one of the most common clinical features of obstructive sleep disordered breathing (OSDB), a spectrum of disorders ranging from primary snoring (PS) and upper airway resistance syndrome (UARS) to obstructive sleep apnea syndrome (OSAS). Although the multiple sleep latency test (MSLT) is generally regarded as the standard objective method for assessing the degree of sleepiness [1], it is expensive, time consuming, and impractical for large-scale application. The Epworth sleepiness scale (ESS), originally developed by Johns, has therefore become the most popular method for evaluating the sleep propensity of OSDB patients [2]. The ESS is a self-administered questionnaire that assesses sleepiness in eight common situations; for each situation, subjects rate their chance of dozing in recent times on a scale of 0 to 3. The total score can thus range from 0 to 24, with a lower score indicating less sleepiness. The ESS has been translated into several languages and shown to be reliable and valid, for example in Spanish [3], German [4], Italian [5], Norwegian [6], Greek [7], and Turkish [8]. In Asia, it has also been translated into Chinese [9] and Japanese [10].
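The scoring rule described above (eight items, each rated 0 to 3, summed to a total of 0 to 24) can be illustrated with a minimal sketch. The function name `ess_total` and the input layout are assumptions made for illustration only; they are not part of the ESS or of this study's procedures.

```python
from typing import Sequence

ESS_ITEM_COUNT = 8   # eight situations in the ESS
ESS_ITEM_MIN = 0     # lowest rating per situation
ESS_ITEM_MAX = 3     # highest rating per situation


def ess_total(item_scores: Sequence[int]) -> int:
    """Sum the eight ESS item ratings into a total score (0 to 24).

    Hypothetical helper: rejects incomplete or out-of-range answers
    instead of guessing how to handle them.
    """
    if len(item_scores) != ESS_ITEM_COUNT:
        raise ValueError("the ESS has exactly eight items")
    for score in item_scores:
        if not isinstance(score, int) or not ESS_ITEM_MIN <= score <= ESS_ITEM_MAX:
            raise ValueError("each item must be an integer from 0 to 3")
    return sum(item_scores)


if __name__ == "__main__":
    # Illustrative ratings only, not study data.
    print(ess_total([1, 2, 0, 3, 3, 0, 1, 0]))
```

Rejecting incomplete questionnaires is one possible design choice; how missing items were handled in this study is not stated.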

To maintain the usefulness of the ESS and to allow comparison of research results from different centers, it is important to have a standardized version, particularly when the scale is translated into another language. In Thailand, although the ESS has been used in sleep research and clinical practice for years, no study has validated the Thai version of the ESS or its application in OSDB patients, whose activities and culture might differ from those of the West. The objectives of this study were therefore to translate the ESS into Thai using a standard method, to test its reliability and validity, and to assess the relationship between its score and the severity of OSDB.

Materials and methods

This study was supported by the Routine to Research Management Fund, Faculty of Medicine Siriraj Hospital, Mahidol University, and was conducted between November 2008 and May 2010 after approval by the Siriraj Institutional Review Board. The translation process was also kindly permitted and advised by the original developer (Murray W. Johns) [2].

Translation of the original ESS into Thai language

The 1997 Australian English version of the ESS (copyright of M.W. Johns 1990–1997) was translated following a standardized process. First, the English version was translated into Thai by four translators fluent in English, including one professional translator from a university. One of these translated versions was blindly selected with total agreement by the research committee, whose members are medical specialists, and was translated back into English by another professional translator for comparison. This process was repeated until the selected final English version was as close as possible in vocabulary and meaning to the original. During discussion, some committee members expressed concern about possible cultural differences and the ambiguity of some questions, particularly the last one, “in a car while stopped for a few minutes”; nevertheless, the consensus was to retain the content without significant adjustment and to observe until proven otherwise. This final version was then tested in a small group of subjects and minimally adjusted before being applied to the larger study groups.

Control subjects

A total of 71 control subjects were selected. In the first control group, 32 daytime hospital employees were recruited using sleep questionnaires developed by our research committee, whose members are sleep medicine specialists. The inclusion criteria were healthy people aged >18 years who had a body mass index (BMI) <30 kg/m², no history of snoring or witnessed apnea, no complaints of daytime sleepiness, and no history of insomnia or difficulty sleeping. All selected subjects were required to have a regular nighttime sleep pattern, going to sleep before midnight and waking up no later than 8 am, with an average total sleep duration of 7–9 h per night. All were required to have a history of waking up after sleep onset no more than two times per night and no problem getting back to sleep. Subjects who were shift workers or worked later than 8 pm were excluded. People who had underlying chronic illnesses or used substances or medications affecting the sleep–wake cycle, such as hypnotics or stimulants, were also excluded. In the second control group, 39 patients with PS or snoring without EDS, with an apnea–hypopnea index (AHI) of less than five confirmed by full-night polysomnography (PSG), were included for additional comparison. All control subjects were asked to complete the ESS at least once, and 44 of them were asked to complete it again 4 weeks later to check the test–retest reliability of the questionnaire.

Subjects with OSA

A total of 157 patients (115 males and 42 females) aged >18 years who complained of snoring or excessive daytime sleepiness and visited the outpatient clinic of Siriraj Hospital were included. At the first visit, all patients completed the Thai version of the ESS and questionnaires covering general demographic data (age, sex, height, weight, BMI), OSDB symptoms, sleep history, and other medical history. Those who had other comorbidities such as insomnia, restless legs syndrome, chronic alcoholism, psychiatric illness, unstable cardiovascular diseases, or cancer were excluded. All participants underwent standard overnight level-I PSG, which recorded an electroencephalogram, electro-oculogram, electromyogram, and electrocardiogram; airflow by nasal pressure transducer and thermistor; thoracic and abdominal movements; oxygen saturation; and snoring sound by microphone. All polysomnographic data were scored manually by experienced sleep technologists and reviewed by a board-certified sleep medicine specialist who was unaware of the patients’ information. Sleep stages and respiratory events were defined according to the recommended criteria in the 2007 American Academy of Sleep Medicine manual for the scoring of sleep and associated events [11]. In particular, the AHI was calculated and used to classify disease severity into three groups: AHI of 5 to <15 (mild OSA), AHI of 15 to <30 (moderate OSA), and AHI of ≥30 (severe OSA).
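The AHI cut-offs used for severity grouping can be written out as a short sketch. The function name `classify_ahi` and the returned labels are illustrative assumptions; the label for AHI below five simply mirrors the threshold applied to the PS control group in this study.

```python
def classify_ahi(ahi: float) -> str:
    """Map an apnea-hypopnea index (events per hour) to a severity group.

    Cut-offs follow the text: 5 to <15 mild, 15 to <30 moderate, >=30 severe.
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "AHI < 5"      # below the OSA threshold used for PS controls
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


if __name__ == "__main__":
    # Illustrative values only, not study data.
    for value in (3.2, 5.0, 14.9, 15.0, 42.0):
        print(value, classify_ahi(value))
```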

To check the discriminant validity of the ESS, we recruited 126 OSA patients for comparison with the control groups. To assess test–retest reliability, we asked 27 OSA patients to complete the questionnaire again 4 weeks later, before any treatment was started. To check the responsiveness of the questionnaire, we asked 15 patients who regularly used a continuous positive airway pressure (CPAP) machine for longer than 5 h per night and 16 patients who underwent uvulopalatopharyngoplasty (UPPP) with radiofrequency (RF) therapy of the tongue base or inferior turbinate to complete the ESS again 3–6 months after treatment.

Statistical methods

To calculate the sample sizes for discriminant validity, we used an alpha error of 5% and a power (1 − β) of 90%, with a mean difference of 3.0 and a standard deviation (S.D.) of 4.0. The initial estimated sample sizes were therefore 30 for the control group and 120 for the disease group. The ESS scores were described by mean ± S.D. and 95% confidence intervals (CI). For reliability, we used Cronbach’s alpha coefficient as the index of internal consistency, with values of 0.7 or higher considered acceptable. To assess test–retest reliability, we used the intra-class correlation coefficient (ICC). For comparison among groups (discriminant validity), the ESS scores were tested by one-way ANOVA followed by the post hoc Tamhane test. The relationships between the ESS and continuous variables such as AHI, sleep latency, and lowest O2 saturation were tested by Spearman’s correlation coefficients. Statistical analysis was performed using SPSS (version 13.0). Significance was accepted at p < 0.05 in two-tailed tests.
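As an illustration of the internal consistency index, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). It is not the SPSS procedure used in this study; the function name `cronbach_alpha`, the row-per-respondent layout, and the example ratings are all assumptions made for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a table with one row per respondent
    and one column per questionnaire item (sample variances)."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")

    # Variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Variance of the respondents' total scores.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


if __name__ == "__main__":
    # Synthetic ratings for three respondents on two items, not study data.
    print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))
```

Recomputing the coefficient with one column removed at a time gives the "alpha if item deleted" values commonly reported alongside the overall coefficient.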

Results

A total of 228 subjects were included in this study. Of the 71 subjects in the control groups, 32 were normal healthy subjects (11 males and 21 females, with a mean age of 33 years and a mean BMI of 21.6) and 39 were patients with PS (23 males and 16 females, with a mean age of 45 years and a mean BMI of 24.7). In the OSA group, there were 126 patients (89 males and 37 females, with a mean age of 51 years and a mean BMI of 27.1) (Table 1). The mean ESS scores in normal subjects, PS, and OSA patients were 6.2 ± 3.3, 6.0 ± 2.9, and 9.9 ± 5.3, respectively. There was no statistically significant difference in the mean ESS scores between male and female subjects in any group (p > 0.05). In the multiple linear regression analysis, we also found no significant association between the ESS scores and gender or age (p = 0.29 and 0.22, respectively). When each item was analyzed, the highest score was 1.7 ± 0.9 for the fifth item (the chance of dozing off when lying down to rest in the afternoon when circumstances permit), and the second highest was 1.6 ± 1 for the fourth item (the chance of dozing off as a passenger in a car for an hour without a break). The lowest score was 0.3 ± 0.6 for the sixth item (sitting and talking to someone), and the second lowest was 0.4 ± 0.7 for the eighth item (in a car while stopped for a few minutes in traffic).

Table 1 The demographic data of subjects and ESS scores

Reliability

Cronbach’s alpha coefficient for the Thai version of the ESS in this study was 0.87, which indicated excellent internal consistency. Deleting individual items, particularly the sixth item (sitting and talking to someone) and the eighth item (in a car while stopped for a few minutes in traffic), produced no substantial change in the value (Cronbach’s alpha, 0.84–0.86). Test–retest reliability (reproducibility) was assessed in 71 subjects, giving an ICC of 0.79 (95% CI, 0.69–0.86). In the 44 control subjects and 27 OSA subjects, the ICCs were 0.67 (95% CI, 0.47–0.81) and 0.82 (95% CI, 0.63–0.91), respectively.

Discriminant validity

In this study, there was no statistically significant difference in the mean ESS total scores between normal healthy and PS subjects; however, we found a statistically significant difference between the control groups and OSA patients (p < 0.001; 95% CI of the mean difference, −5.0 to −2.6). Nevertheless, there was no statistically significant difference in the mean ESS scores among the OSA severity groups classified by AHI, apart from a trend toward higher scores in moderate to severe OSA than in mild OSA (Fig. 1). When analyzing the PSG parameters, we found only a weak relationship between AHI and the ESS scores (Spearman correlation coefficient = 0.38). Other parameters such as the arousal index, mean or minimal O2 saturation, time of O2 above 90%, and apnea index (AI) also had significant but very weak correlations with the ESS scores; however, there was almost no correlation between the ESS scores and total sleep time, sleep efficiency, sleep latency, rapid eye movement (REM) latency, or sleep stage proportions (percent). These results imply that the ESS score alone is not a good predictor of OSDB severity (Table 2).
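For readers unfamiliar with the rank correlation reported here, the following is a minimal sketch of Spearman's coefficient computed as the Pearson correlation of average ranks, the usual definition when ties are present. It is not the SPSS routine used in this study; the function names and the example values are assumptions for illustration only.

```python
from statistics import mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based positions
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = mean(x), mean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cross / (ss_x * ss_y) ** 0.5


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation between two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of equal length (at least 2)")
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Synthetic AHI and ESS pairs, not study data.
    ahi_values = [4.0, 12.5, 18.0, 33.0, 60.0]
    ess_scores = [6, 5, 11, 9, 14]
    print(round(spearman_rho(ahi_values, ess_scores), 2))
```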

Fig. 1

Comparisons of the ESS scores between different groups. Data are shown as box and whisker plots; the line within the box marks the mean, and the boundaries of the boxes delineate the 25th and 75th percentiles. The plot demonstrates a statistically significant difference between the mean ESS scores of the control groups (6.1 ± 3.0) and OSA patients (9.9 ± 5.3) (p < 0.001; 95% CI of the mean difference, −5.0 to −2.6); however, there was no statistically significant difference between normal healthy (6.2 ± 3.3) and PS subjects (6.0 ± 2.9), and none among the OSA severity groups apart from a trend toward higher scores in moderate to severe OSA than in mild OSA. *The mean difference is significant at the <0.001 level (two-tailed). NS not significant

Table 2 The correlations between ESS scores and polysomnographic parameters

Responsiveness

In the 15 patients (11 males and four females) with OSA who were treated with CPAP, the mean ESS scores before and 3 months after treatment were 13.9 ± 4.0 and 3.4 ± 1.7, respectively (p < 0.001). In the 16 patients (15 males and one female) with OSA who underwent upper airway surgery, including UPPP and RF of the tongue base or inferior turbinate, and who reported significant improvement in symptoms, the mean ESS scores before and 3 months after the operation were 14.7 ± 4.0 and 5.0 ± 2.3, respectively (p < 0.001). These statistically significant improvements show that the Thai version of the ESS is responsive to changes after treatment.

Discussion

OSDB is a spectrum of disorders characterized by repetitive events of upper airway narrowing causing fragmented sleep and/or oxygen desaturation. Untreated OSDB patients frequently have EDS, which may increase the risk of motor vehicle accidents [12], social problems [13], and cardiovascular consequences [14]. Its clinical severity can range from simple or primary snoring (PS) and UARS to the more severe form, OSAS.

In research and clinical practice, the ESS has proven its usefulness and seems to be the most popular tool for evaluating sleep propensity among OSDB patients [2–5, 7–10, 15–19]. Nevertheless, its application in Thais may be limited, possibly because of differences in language and culture. In this study, we guarded against potential problems such as content inequivalence by following a standard process of professional forward and backward translation, examination of translation quality by content experts, and minor adjustment after a pilot test.

The results of this study showed that our Thai version of the ESS had excellent internal consistency (Cronbach’s alpha coefficient = 0.87). No unusually strong influence on the coefficient was found when any one of the eight questions was eliminated (Cronbach’s alpha ranging from 0.84 to 0.86). Furthermore, the test–retest reliability of this version was also highly acceptable, as demonstrated by the ICC of 0.79, which was greater than the value of 0.5 recommended for good reproducibility [20].

The mean ESS scores obtained in our control groups were quite similar to those reported by Johns [2] (control, 5.9 ± 2.2 and PS, 6.5 ± 3.0), for the Greek version [7] (control, 5.6 ± 3.2), and for the German version [4] (control, 5.7 ± 3.0), but higher than the control of the Turkish version [8] (3.6 ± 3) and lower than that of the Chinese version of Chung [21] (7.5 ± 3.0). Nevertheless, the ESS scores in our OSA patients were lower than those of all the former articles [2, 4, 6–9]. We hypothesize that this probably reflects a difference in lifestyle, because many of the Thai patients did not drive and more often answered the last question of the ESS (in a car while stopped for a few minutes in traffic) as zero. For the normal control group in this study, we selected more female subjects of younger age and used rigid inclusion criteria in order to reduce the risk of including subjects with OSA, because polysomnography was not available to confirm the diagnosis. We believe that this is acceptable because the data from a previous study by Johns [2] showed no difference in the ESS scores between the sexes, and the multiple linear regression analysis in our study confirmed that the ESS scores did not differ significantly by gender or age (p = 0.29 and 0.22, respectively).

The relationship between daytime sleepiness as measured by the ESS and the severity of OSDB is somewhat conflicting in the literature and has been reported as being only weakly associated [2, 9, 15, 21–23]. Although EDS is more prevalent in patients with OSA than in normal controls or PS, the use of the ESS to screen for OSA in the general population is limited by its low sensitivity and specificity. In one study, 65% of patients with severe OSA had ESS scores of 11 or less [15]. In this study as well, despite the significant difference in the ESS scores between OSA patients and the control groups, we could not find such a distinction within the OSA group, apart from a non-significant trend toward higher scores in the moderate and severe groups than in mild OSA. Furthermore, when we analyzed the ESS scores against common PSG parameters, there were only weak associations or almost no correlation between them. Our results were therefore comparable to those of previous reports [4, 7–9, 18, 21, 23]. This weak ability of the ESS to differentiate OSDB severity may be due to several confounding factors, such as the complexity of sleep mechanisms, effects of PSG including the first-night effect and night-to-night variability, sleep deprivation, the use of caffeine or medication, psychological or medical illnesses, and co-existence with other sleep problems. Some patients with OSDB may underreport their sleepiness because they lose their frame of reference for abnormal sleepiness [24], probably as a result of having had this problem for a long time. In addition, they may deny it because of social pressures such as concern over losing their job. On the other hand, some patients may indeed be asymptomatic despite having severe disease.

Regarding responsiveness to treatment, or sensitivity to change after intervention, our data showed that the mean ESS scores decreased from a baseline of 13.9 ± 4.0 to 3.4 ± 1.7 after regular CPAP usage for 3–6 months in 15 patients with OSA (p < 0.001). The mean ESS scores also decreased from the preoperative value of 14.7 ± 4.0 to 5.0 ± 2.3 at 3–6 months postoperatively in 16 patients with OSA (p < 0.01). These results were consistent with the patients’ self-reported dramatic improvement in symptoms after therapy; thus, the ESS may be useful in monitoring responses to treatment for OSAS or possibly other sleep disorders [25, 26].

This study has some possible limitations. First, our normal subjects were mainly full-time healthy hospital employees, and PSG was not done in this group. We consider this acceptable and no different from previously published literature [2, 4, 5, 7–9, 21]. Second, the control groups may not be perfectly matched in age and gender with the disease group; however, no significant association between the ESS scores and gender or age was found in our study. Third, we did not compare the ESS with the MSLT, which may be a gold standard objective test. Nevertheless, we believe that questions assessing sleepiness in routine activities reflect reality better, and are more practical to apply, than testing sleepiness in dark laboratory conditions. The relationship between the ESS and MSLT is also conflicting [22, 23, 27, 28]. Fourth, some situations in the original ESS questions may not apply well to the majority of Thai people; for example, for the last question about sleepiness in a car while stopped for a few minutes in traffic, many responders were unclear about whether they were meant to be drivers or passengers. Furthermore, most Thai people do not drive themselves but rather take public transportation or ride motorbikes. Consequently, some of them did not know how to answer this question correctly and often scored it as 0.

Although the ESS is well known and has been proven to be a very useful tool for sleep research, in clinical practice it should be applied in combination with a more comprehensive sleep history and clinical examination, possibly because of the complexity of sleep–wake cycles. For future research, especially in Asian people, the need to modify the ESS or apply it in different situations remains to be proven.

Conclusion

Our Thai version of the ESS showed excellent internal consistency and good test–retest reliability. It may be able to discriminate between people without complaints of EDS, such as normal people or those with PS, and patients with OSA; however, the ESS score has only a weak relationship with AHI. Although very useful, it should not be used as a single tool to predict OSDB severity. We recommend using it in combination with a more comprehensive clinical evaluation.