Introduction

Sleep apnea is a disorder in which breathing is repeatedly interrupted during sleep. In an obstructive event, the airway collapses when the pharyngeal and tongue muscles relax during sleep. In a recent study, Young’s group found that one in six people over 50 years of age had at least mild apnea, and that one fourth of those cases were severe [1]. Their data also suggest that at least 75% of severe sleep apnea cases go undetected. The researchers estimated that two thirds of the participants diagnosed with apnea at the outset of the study chose not to use the standard treatment (continuous positive airway pressure) on at least four nights per week during the study. People with severe apnea who went untreated were four times as likely to die as those without the disorder [1]. Identifying at-risk individuals as early as possible allows earlier intervention and might reduce morbidity.

One questionnaire that captures snoring/obstructive sleep apnea (OSA) symptoms is the Berlin questionnaire (Appendix). The Berlin questionnaire was an outcome of the Conference on Sleep in Primary Care, held in April 1996 in Berlin, Germany. This 11-item questionnaire includes questions about risk factors for sleep apnea, including snoring behavior, wake time sleepiness or fatigue, and obesity or hypertension. It is a validated instrument used to identify individuals at risk for OSA in primary and some non-primary care settings [2–5]. Among primary care patients, its reported sensitivity ranges from 54% to 86% and its specificity from 43% to 87% [5, 6]. It has been used worldwide, for example in the USA [2, 3, 7], Jordan [8], and Nigeria [9]. More recently, it has been used to screen for OSA in surgical patients before anesthesia [5] and in patients undergoing endoscopy [10], and to predict outcomes after catheter ablation of atrial fibrillation [11]. It has also been used to estimate the prevalence of OSA in populations such as orchestra members [12]. The reliability of the Berlin questionnaire has been reported by Chung et al. [5]: test–retest agreement was 96.3% (n = 54), and Cohen’s κ was 0.9168 (confidence interval, 0.804–1.000). In summary, the Berlin questionnaire is a widely used self-report instrument focused on a set of known symptoms and clinical features associated with sleep apnea.
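Cohen’s κ corrects raw test–retest agreement for the agreement expected by chance from the marginal rates of each administration. The minimal Python sketch below computes both quantities from a square table of counts. The 2 × 2 table is hypothetical (rows: first administration; columns: retest; order: high risk, low risk). It was chosen only because it gives 96.3% agreement and κ ≈ 0.917 with n = 54, and it is not the data reported by Chung et al. [5].

```python
def cohen_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square table of counts.

    table[i][j] is the number of subjects rated category i at the first
    administration and category j at the retest.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of (row marginal x column marginal).
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_exp = sum(r * c for r, c in zip(row_marg, col_marg))
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical test-retest counts (not from [5]):
# [[high/high, high/low], [low/high, low/low]]
hypothetical = [[17, 2], [0, 35]]
agreement, kappa = cohen_kappa(hypothetical)
print(f"agreement = {agreement:.1%}, kappa = {kappa:.4f}")
```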

A second questionnaire, validated in a large study of 608 subjects, is the apnea risk evaluation system (ARES) screening questionnaire (Advanced Brain Monitoring, Inc., Carlsbad, CA, USA) [13]. It combines features of three established screens: the Berlin questionnaire [3], Flemons’ index [14], and the Epworth sleepiness scale [15]. Individuals are classified as having “no significant risk,” “low risk,” or “high risk” of OSA. In the validation study, the ARES algorithm for assigning OSA risk had a sensitivity of 94% and a specificity of 79% [13]. These results improve on those reported for the Berlin questionnaire, which had a sensitivity of 86% and a specificity of 77% using the same clinical cutoff for predicting a respiratory disturbance index (RDI) > 5 [3]. The ARES screener has been cross-validated in a population of dental patients [16] and for perioperative screening [17].

The studies above suggest that questionnaire-based screening for sleep apnea is promising. The objective of this prospective study was to compare the sensitivity and specificity of the ARES and Berlin questionnaires in a group of 84 subjects.

Materials and methods

Study design

The study presented in this paper is part of a larger prospective case–control study conducted at the University of Southern California, whose goal was to predict high risk of OSA from imaging and family history. In that study, the Berlin and ARES questionnaires, as well as the outcome (an ambulatory somnographic study), were assessed in all subjects at recruitment.

Subjects

All subjects who completed the study were included. Of the 85 subjects recruited, one did not complete the sleep study and was excluded. In total, 53 patients with moderate to severe OSA (RDI ≥ 15 events per hour; eight women and 45 men; mean age, 58.4 ± 10.34 years) and 31 controls (RDI < 15; 11 women and 20 men; mean age, 49.0 ± 12.63 years) were enrolled between June 2006 and June 2009 through mail, flyers, and word of mouth among Dr. Clark’s private patients and the faculty and staff of the USC School of Dentistry. Dr. Clark’s patients whose diagnoses included sleep apnea were invited by mail to participate. Faculty and staff at the school were contacted by email and flyers; flyers were posted in the school and published in its internal magazine. During the study, subjects were also referred to the principal investigators by colleagues, by other dentists attending courses at the school, by faculty or staff working at the school, and by previously enrolled subjects. The control group consisted of participants with an RDI < 15 who were recruited by the same means (flyers, mail, and word of mouth). A subject with an RDI between 5 and 15 can be considered to have mild sleep apnea.

Age, gender, body mass index (BMI), race/ethnicity, and self-reported blood pressure were recorded. Each participant completed the Berlin questionnaire [3] and the ARES questionnaire (ABM, Carlsbad, CA, USA) [13]. Answers to both questionnaires were based solely on the participant’s self-assessment.

After signing informed consent, all subjects underwent a two-night baseline ambulatory sleep study (ARES Unicorder, ABM, San Diego, CA, USA) to assess the outcome (OSA, defined as an RDI ≥ 15 events per hour). Table 1 presents descriptive statistics for the OSA and control groups. The study was approved by the Institutional Review Board of the University of Southern California (HSC-051050).

Table 1 Descriptive statistics for OSA and control groups

Devices and software

In this study, two-night ambulatory somnographic studies for the assessment of OSA were performed for all subjects with the ARES Unicorder (Advanced Brain Monitoring). The ARES Unicorder is a wireless recorder, self-applied to the forehead with a single strap, that measures oxygen saturation, pulse rate, airflow, respiratory effort, snoring levels, head movement, and head position. The diagnostic accuracy of the ARES system has been evaluated against in-laboratory polysomnography in two studies. In the first [18], using an RDI cutoff of 15 per hour, in-laboratory ARES recordings had a sensitivity of 95% and a specificity of 94% for diagnosing OSA, with a positive likelihood ratio (LR+) of 17.04 and a negative likelihood ratio (LR−) of 0.06; for in-home ARES recordings, sensitivity was 85% and specificity 91% (LR+ = 9.34, LR− = 0.17) [18]. In the second study, the concurrent in-laboratory comparison yielded a sensitivity of 97.4%, a specificity of 85.6%, a positive predictive value of 93.6%, and a negative predictive value of 93.9%; the corresponding in-home values were 91.5%, 85.7%, 91.5%, and 85.7% [19].
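Likelihood ratios follow directly from sensitivity and specificity: LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The short Python sketch below applies these definitions to the in-laboratory figures of the second study [19]; the resulting ratios are our own arithmetic, not values reported in [19]. Recomputing the first study’s ratios from the rounded percentages will not reproduce 17.04 and 0.06 exactly, presumably because [18] worked from unrounded values.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a binary test; inputs are proportions."""
    if not (0 <= sensitivity <= 1 and 0 < specificity < 1):
        raise ValueError("need 0 <= sensitivity <= 1 and 0 < specificity < 1")
    lr_pos = sensitivity / (1 - specificity)  # factor by which a positive result raises the odds of OSA
    lr_neg = (1 - sensitivity) / specificity  # factor by which a negative result lowers them
    return lr_pos, lr_neg


# In-laboratory figures from [19]: sensitivity 97.4%, specificity 85.6%.
lr_pos, lr_neg = likelihood_ratios(0.974, 0.856)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
```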

Methods

All subjects were interviewed by a single blinded operator and answered the Berlin and ARES questionnaires.

The Berlin questionnaire has 11 questions (Appendix). One introductory question and four follow-up questions concern snoring, witnessed apneas, and the frequency of such events. Three questions address daytime sleepiness, with a sub-question about drowsy driving. One question asks about a history of high blood pressure. Patients also provide their age, weight, height, and sex; BMI was calculated from the self-reported weight and height. The scoring used to classify each subject as “high risk” or “low risk” is summarized below.

Scoring

A subject is classified as “high risk” if two or three of the following categories are positive:

  • Category 1 (questions 1–5): a positive score is defined as frequent symptoms (i.e., “more than three to four times per week” or “almost every day”) in the questions about snoring (q3) and witnessed apneas (q5).

  • Category 2 (questions 6–9): a positive score is defined as frequent symptoms in two or more of the questions about awakening sleepy (q6), wake time sleepiness (q7), and drowsy driving (q8–9).

  • Category 3 (questions 10–11): a positive score is defined as a self-report of high blood pressure and/or a BMI > 30 kg/m² calculated from the self-reported height and weight.

The scoring system is described in [3]. Each subject thus receives a “high risk” (score ≥ 2) or “low risk” (score < 2) classification for OSA.
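The three-category rule above can be expressed as a small scoring function. The Python sketch below is a hypothetical encoding, not an official implementation. The response labels and field names are our assumptions, as is the decision to collapse the two drowsy-driving items (q8–9) into a single frequency. We also read the category 1 rule as requiring frequent symptoms on both q3 and q5. The original rules in [3] should be consulted before relying on anything like this.

```python
from dataclasses import dataclass

# Hypothetical labels for the two answers that count as "frequent".
FREQUENT = {"3-4 times/week", "almost every day"}


@dataclass
class BerlinAnswers:
    snoring_freq: str          # q3
    witnessed_apnea_freq: str  # q5
    wake_sleepy_freq: str      # q6, awakening sleepy
    daytime_sleepy_freq: str   # q7, wake time sleepiness
    drowsy_driving_freq: str   # q8-9 collapsed into one frequency (assumption)
    high_blood_pressure: bool  # q10, self-reported
    weight_kg: float           # self-reported
    height_m: float            # self-reported


def berlin_score(a: BerlinAnswers) -> int:
    """Number of positive categories (0-3)."""
    cat1 = a.snoring_freq in FREQUENT and a.witnessed_apnea_freq in FREQUENT
    sleepiness = (a.wake_sleepy_freq, a.daytime_sleepy_freq, a.drowsy_driving_freq)
    cat2 = sum(f in FREQUENT for f in sleepiness) >= 2
    bmi = a.weight_kg / a.height_m ** 2
    cat3 = a.high_blood_pressure or bmi > 30
    return int(cat1) + int(cat2) + int(cat3)


def berlin_risk(a: BerlinAnswers) -> str:
    return "high risk" if berlin_score(a) >= 2 else "low risk"


# Hypothetical respondent: frequent snoring and witnessed apneas, BMI = 98 / 1.75**2 = 32.
example = BerlinAnswers("almost every day", "3-4 times/week", "never",
                        "1-2 times/week", "never", False, 98.0, 1.75)
print(berlin_score(example), berlin_risk(example))
```

Here categories 1 and 3 are positive, so the respondent scores 2 and is classified as high risk even though the sleepiness items are negative.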

The second questionnaire (ARES) is one page long. It records age, gender, height, weight, and neck size; any diagnosis of diseases associated with risk for OSA (i.e., high blood pressure, heart disease, diabetes, or stroke) or a prior diagnosis of OSA; the Epworth sleepiness scale [15]; and five-point frequency ratings for snoring, waking up choking, and having been told that he/she stopped breathing during sleep [13].

Statistical analysis

The distribution of returned questionnaires, individual patient variables, and results of ambulatory sleep monitoring were summarized with descriptive statistics (frequencies, means, standard deviations, and ranges). Differences in gender between the OSA and control groups were evaluated with the chi-square test. Differences between groups in mean age, BMI, neck circumference, and RDI were assessed with the independent-samples t test. Because the overall apnea index did not pass a formal normality test (Kolmogorov–Smirnov), the Wilcoxon rank sum test was used to compare the two groups on that variable. The chi-square test was used to assess the association between each questionnaire (ARES or Berlin) and OSA status (RDI ≥ 15). The relationship between the paired dichotomous classifications of the two questionnaires was assessed with McNemar’s test. Odds ratios with 95% confidence intervals are reported. Statistical analyses were performed using the SAS System for Windows (version 9.0 or later; SAS Institute, Cary, NC). A p value < 0.05 was considered statistically significant.
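McNemar’s test uses only the discordant pairs, that is, subjects classified as high risk by one questionnaire but not by the other. The Python sketch below computes the statistic and its p value from the chi-square distribution with one degree of freedom, using only the standard library. The text does not state whether a continuity correction was applied, so the function takes it as a parameter. The discordant counts in the example are hypothetical and are not taken from Table 6.

```python
import math


def mcnemar(b, c, continuity=True):
    """McNemar's chi-square test for paired binary classifications.

    b: subjects high risk on ARES but low risk on Berlin
    c: subjects low risk on ARES but high risk on Berlin
    Returns (statistic, two-sided p value).
    """
    if b + c == 0:
        raise ValueError("McNemar's test needs at least one discordant pair")
    diff = abs(b - c)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff ** 2 / (b + c)
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical discordant counts, not the study's data.
stat, p = mcnemar(b=20, c=5)
print(f"chi2 = {stat:.2f}, p = {p:.4f}")
```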

A priori sample size calculations

Assuming that 32.3% of subjects would be classified as high risk by the Berlin questionnaire [3], and based on prior data showing an unadjusted odds ratio of 5.37 for an RDI > 30 given high-risk status, we calculated an expected proportion of high-risk classifications among cases of p2 = 0.72 and a required sample size of 24 cases and 24 controls for 80% power at a two-sided 5% significance level. The final sample of 53 cases and 31 controls provided 95.6% power. When the study was designed, no preliminary ARES data were available for an a priori sample size calculation.
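The text does not state which sample size formula was used. As a hedged check, the Python sketch below uses the standard normal-approximation formula for comparing two independent proportions (no continuity correction, pooled variance under the null), with p1 = 0.323 and p2 derived from the odds ratio of 5.37 and then rounded to 0.72. Under these assumptions it gives 24 subjects per group and, for 31 controls and 53 cases, a power of about 95.6%, in line with the figures above. Only the Python standard library (`statistics.NormalDist`) is used.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def p2_from_odds_ratio(p1, odds_ratio):
    """Proportion in cases implied by the control proportion p1 and an odds ratio."""
    odds_cases = p1 / (1 - p1) * odds_ratio
    return odds_cases / (1 + odds_cases)


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Equal group size for a two-sided test of two proportions (no continuity correction)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def power_two_groups(p_ctrl, p_case, n_ctrl, n_case, alpha=0.05):
    """Approximate power for unequal groups (ignores the negligible opposite tail)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    p_bar = (n_ctrl * p_ctrl + n_case * p_case) / (n_ctrl + n_case)
    se_null = math.sqrt(p_bar * (1 - p_bar) * (1 / n_ctrl + 1 / n_case))
    se_alt = math.sqrt(p_ctrl * (1 - p_ctrl) / n_ctrl + p_case * (1 - p_case) / n_case)
    return Z.cdf((abs(p_case - p_ctrl) - z_a * se_null) / se_alt)


p1 = 0.323
p2_exact = p2_from_odds_ratio(p1, 5.37)  # about 0.719
p2 = round(p2_exact, 2)                  # 0.72, as quoted in the text
print(p2, n_per_group(p1, p2), round(power_two_groups(p1, p2, n_ctrl=31, n_case=53), 3))
```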

Results

Descriptive statistics

Table 1 presents the subjects’ gender, mean age, BMI, and neck circumference, as well as a summary of somnographic variables (respiratory disturbance index and overall apnea index). As expected, the two groups differed significantly in mean age, BMI, and neck circumference, with cases older, heavier, and with larger necks than controls. Controls were on average 10 years younger than cases. Cases were mostly male (44/53), and their mean BMI was 2.7 kg/m² higher than that of controls. Controls’ mean neck circumference was 1.5 in. smaller than that of cases. At baseline, the two groups differed significantly in apnea severity as measured by the RDI (p < 0.0001) and the apnea index (p < 0.0001). This is expected, because subjects with an RDI ≥ 15 per hour were classified as OSA cases and all others as controls. There were no statistically significant differences in race (Table 2) or ethnicity (Table 3) between the two groups.

Table 2 Comparison of race distribution between OSA and control groups
Table 3 Comparison of ethnicity distribution between OSA and control groups

As reported in Table 4, a “high risk” ARES result was significantly associated with OSA status (p = 0.0002), as was a “high risk” Berlin result (p = 0.04). Table 5 presents the sensitivity, specificity, and positive and negative predictive values of the two questionnaires against the ambulatory somnographic data, computed at two predetermined OSA cut points (RDI ≥ 15 and RDI ≥ 10). The ARES questionnaire had higher sensitivity, lower specificity, a similar positive predictive value (PPV), and a higher negative predictive value (NPV) than the Berlin questionnaire. The classifications from the two questionnaires were significantly associated with each other (p = 0.0001, Table 6).
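The association tests and odds ratios referred to above can be sketched for any 2 × 2 table of questionnaire result by OSA status. The Python example below computes the odds ratio with a 95% confidence interval by the Woolf (logit) method and the Pearson chi-square test without continuity correction. Both method choices are our assumptions, since the text does not say which options were used in SAS, and the counts are hypothetical rather than those of Table 4.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Woolf (logit) confidence interval.

    a: high-risk cases     b: low-risk cases
    c: high-risk controls  d: low-risk controls
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell makes the Woolf interval undefined")
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log), math.exp(log_or + z * se_log)


def pearson_chi2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p value."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


# Hypothetical counts, not the study's data.
or_, low, high = odds_ratio_ci(a=30, b=10, c=10, d=20)
chi2, p = pearson_chi2(a=30, b=10, c=10, d=20)
print(f"OR = {or_:.1f} (95% CI {low:.2f} to {high:.2f}); chi2 = {chi2:.2f}, p = {p:.4f}")
```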

Table 4 Screening for OSA with the Berlin questionnaire and the ARES questionnaire
Table 5 Sensitivity, specificity, and positive and negative predictive values for two questionnaires against ambulatory somnographic data
Table 6 Association between ARES and Berlin questionnaires

Discussion

Demographics

The goal of this study was to compare the sensitivity and specificity of two questionnaires for identifying adults at risk for obstructive sleep apnea. Both groups of subjects were recruited in the same way, by mail, flyers, and word of mouth, at a private dental office and at the University of Southern California School of Dentistry. Every qualifying subject was recruited regardless of race, gender, age, or BMI. Final OSA status was assigned using a standard medical criterion (RDI ≥ 15 per hour) from an ambulatory somnographic assessment scored, independently of the investigators, by a blinded certified sleep physician, so investigator bias is not expected. As in most prior studies [20, 21], cases and controls were not matched by age, gender, or BMI. One of the few large case–control studies matched patients by gender and race, but not by age or BMI [22]. As expected, cases were mostly male, consistent with an OSA prevalence in males at least double that in females [23]. Cases were 10 years older than controls, which is not ideal but is consistent with prior studies by Mayer et al. [21] (cases 5 years older than controls) and Okubo et al. [20] (12 years older). Cases had a mean BMI 2.7 kg/m² higher than controls, similar to Mayer et al. (3 kg/m²) and closer than in the study by Schwab, in which the BMI difference was 10 kg/m² [22]. Our sample size was relatively large compared with prior studies, except those of Mayer and Schwab, which used MRI as the imaging modality. There were no significant differences in race or ethnicity between the two groups, and a large majority of our subjects were white and non-Hispanic.

Screening for OSA

Because of the case–control design, we cannot compute the prevalence or incidence of OSA; however, odds ratios are a good estimate of relative risk if the cases in the study represent cases in the general population and the controls represent controls in the general population. In this study, a subject with a “high risk” ARES result (which incorporates demographic data, medical history, and the Epworth sleepiness scale) had 7.9 times the odds of having OSA compared with a subject with a “low risk” or “no significant risk” result. For a screening tool, sensitivity is typically the most important accuracy criterion. However, specificity also matters when persuading employers to screen for OSA, because false positive results carry costs. In a prior publication by the questionnaire’s developer, the ARES algorithm for assigning OSA risk had a sensitivity of 94% and a specificity of 79% (PPV = 91%, NPV = 86%) [13], compared with a reported sensitivity of 86% and specificity of 77% for the Berlin questionnaire in predicting an RDI > 5 [3]. In our sample, using a clinical criterion of RDI > 10, the ARES questionnaire had a sensitivity of 87.7%, a specificity of 57.9%, a positive predictive value of 87.7%, and a negative predictive value of 57.9%, compared with 67.7%, 68.4%, 88%, and 38.2%, respectively, for the Berlin questionnaire. Although the prior studies used different populations and in-laboratory sleep studies rather than ambulatory somnography, and our sample is smaller, our results confirm that the ARES questionnaire has higher sensitivity, lower specificity, similar PPV, and higher NPV than the Berlin questionnaire.
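Sensitivity, specificity, PPV, and NPV all come from the same 2 × 2 table of questionnaire result against sleep-study outcome. The Python sketch below computes them. The counts are our back-calculation, not values copied from Table 5: assuming all 84 subjects, with 65 above and 19 at or below the RDI 10 cut point, they are integer counts that reproduce the RDI > 10 percentages quoted above.

```python
def screening_accuracy(tp, fn, fp, tn):
    """Accuracy of a screening questionnaire against a reference standard.

    tp: high risk and OSA      fn: low risk but OSA
    fp: high risk but no OSA   tn: low risk and no OSA
    """
    return {
        "sensitivity": tp / (tp + fn),  # proportion of OSA subjects flagged
        "specificity": tn / (tn + fp),  # proportion of non-OSA subjects cleared
        "ppv": tp / (tp + fp),          # proportion of flagged subjects with OSA
        "npv": tn / (tn + fn),          # proportion of cleared subjects without OSA
    }


# Back-calculated counts (an assumption, see lead-in), RDI > 10 cut point, n = 84.
ares = screening_accuracy(tp=57, fn=8, fp=8, tn=11)
berlin = screening_accuracy(tp=44, fn=21, fp=6, tn=13)
for name, metrics in (("ARES", ares), ("Berlin", berlin)):
    print(name, {k: round(100 * v, 1) for k, v in metrics.items()})
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of subjects who have the condition. In a case–control sample that proportion is fixed by recruitment (about 65/84 here), so the predictive values would differ in a screening population with a lower prevalence of OSA.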

Limitations

Although there were no significant differences in race or ethnicity between the two groups and all races and ethnicities were welcome, most enrolled subjects were white and non-Hispanic. The results may therefore not generalize to other races or ethnicities, or to patients younger than 32 or older than 80 years. For design and financial reasons, recruitment of both cases and controls was limited to Dr. Clark’s patients and to faculty, staff, students, and their family members at the USC School of Dentistry, rather than the general population. Most subjects were recruited by direct mail or flyers, but toward the end of the study, subjects referred friends or family members to participate. All subjects who had a cone-beam CT and had at least two family members were included in the study regardless of age, gender, and BMI. Because the results are based on self-report and this is a case–control study, there is potential for recall bias in the exposure. Cases may be more likely than controls to recall and report loud snoring, apneas, or high blood pressure because they may think more about their sleep quality and sleep patterns, which could inflate the observed associations. Because a single blinded interviewer (RE) conducted all interviews, we do not expect observer bias for the Berlin or ARES questionnaire. The board-certified sleep physician who prepared the sleep reports was blinded to outcome status, so we do not expect observer bias in those measurements.

This study may also suffer from self-selection bias. Persons with a specific combination of exposure (family history or craniofacial anomalies) and outcome (sleep apnea) may be more likely to volunteer for a case–control study. This would bias the results if participants differed from non-participants in the exposures analyzed in the study.

Conclusions

In this group of patients, the ARES performed better than the Berlin questionnaire in screening for OSA, except in specificity, that is, its ability to correctly identify individuals who do not have the disease. This may be because the ARES screener was tailored to detect an RDI ≥ 5; however, too few of our subjects had an RDI below 5 to test that explanation. In conclusion, the ARES questionnaire is a better choice than the Berlin questionnaire in this dental setting, although the Berlin questionnaire is publicly available while the ARES screener is proprietary.