Introduction

Moderate or severe obstructive sleep apnoea (OSA) (apnoea–hypopnoea index (AHI) ≥15) is estimated to affect 7 % of middle-aged adults [1]. Laboratory-based polysomnography (PSG), the gold standard for diagnosing OSA, is commonly used to determine case and control status with respect to OSA. However, PSG is logistically difficult and expensive to perform in large samples [2]. For this reason, portable monitoring [3] and questionnaires [4, 5] are also used in research to characterise OSA cases and controls. For the findings of a case–control study to be interpreted properly, the controls need not be completely free of the disease; the control sample must simply be representative of the source population of the cases [6]. Nevertheless, control samples containing substantial undiagnosed OSA would bias effect estimates toward the null. Undiagnosed OSA has been estimated in previous decades to affect 5 % of adults in the general population [7–9]. Given this high prevalence, a simple method that refines the control group to one with a lower OSA prevalence, without undue reduction in sample size, could benefit future case–control studies.
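A short calculation shows why undiagnosed disease among controls biases an association toward the null. The Python sketch below is illustrative only: it assumes, hypothetically, that undiagnosed OSA cases hidden among the controls have the same exposure prevalence as the diagnosed cases. The function name and all input values are invented for the example and do not come from this study.

```python
def observed_odds_ratio(p_exp_cases: float, p_exp_true_controls: float,
                        contamination: float) -> float:
    """Odds ratio observed when a fraction `contamination` of the control
    group has undiagnosed disease.

    Assumption (hypothetical): undiagnosed cases among the controls have the
    same exposure prevalence as the diagnosed cases.
    """
    # Exposure prevalence in the contaminated control group is a weighted mix
    p_exp_controls = ((1 - contamination) * p_exp_true_controls
                      + contamination * p_exp_cases)
    odds_cases = p_exp_cases / (1 - p_exp_cases)
    odds_controls = p_exp_controls / (1 - p_exp_controls)
    return odds_cases / odds_controls


if __name__ == "__main__":
    # Hypothetical exposure: 40 % of cases and 20 % of true controls exposed
    for c in (0.0, 0.05, 0.15):
        print(f"contamination={c:.2f}  OR={observed_odds_ratio(0.4, 0.2, c):.2f}")
```

With these invented inputs, the odds ratio falls from about 2.67 with disease-free controls to about 2.51 at 5 % contamination and about 2.23 at 15 %. In other words, it moves toward 1, the null value.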

Questionnaires offer simplicity and have been widely used to screen for OSA [10]. The Berlin questionnaire (BQ) has been used extensively in hospital and primary care settings to identify probable cases of OSA [11–13]. It has also been used in the general population to identify probable controls with a low risk of OSA [4, 5]. It captures self-reported symptoms in three categories of OSA risk: (1) history of snoring or breathing pauses; (2) daytime tiredness, fatigue and driving risk; and (3) hypertension or obesity. A high risk of OSA is attributed to anyone with persistent and frequent symptoms in two of these three categories [3]. Validation studies of the BQ in primary care [3], hospital [13, 14], sleep clinic [12] and general population samples [15, 16] have found that its performance varies by population, with sensitivity ranging from 37 to 86 % and specificity from 43 to 95 % [15]. Notably, performance in general population samples was sub-optimal [15, 16].

Missing data may be one reason why the BQ performs poorly in the general population. In one validation study of the BQ, conducted in a Norwegian general population sample, non-response to an item was classified by default as “low risk” within the corresponding category [15]. Under this approach, cases were identified only by confirmed presence of symptoms, whereas probable controls were identified either by confirmed absence of symptoms or by non-response. When non-response is high, using the scale in this manner introduces a bias. This bias has little effect on the post-test probability of OSA among those classified as “high risk”, but a greater effect on the post-test probability of being free of OSA among those classified as “low risk”. Therefore, if the BQ were used in a general population sample to identify OSA controls rather than OSA cases, individual questions may prove more discriminatory than the overall high- and low-risk classifications. These considerations are important for the optimal design and interpretation of case–control epidemiological and genetic studies of OSA. This is particularly so given the general lack of “control” samples that have been objectively screened for OSA using polysomnography.
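A toy calculation shows why default coding of non-response matters mainly for the “low risk” group. The Python sketch below uses invented counts. In this hypothetical example, 400 subjects actively deny symptoms, and 24 of them have an AHI ≥15. A further 200 subjects leave the items blank, and 30 of them have an AHI ≥15; this assumes non-responders share a 15 % pre-test probability. Non-responders can never enter the “high risk” group, so only the low-risk estimate is affected. The function name is hypothetical.

```python
def prob_no_osa_in_low_risk(confirmed_n: int, confirmed_osa: int,
                            nonresponse_n: int, nonresponse_osa: int,
                            pool_nonresponse: bool) -> float:
    """Post-test probability of *not* having OSA in the 'low risk' group.

    confirmed_*   : subjects who actively denied the symptoms
    nonresponse_* : subjects who left the items blank
    pool_nonresponse=True mimics default coding of blanks as 'low risk';
    False drops non-responders from the analysis instead.
    """
    n = confirmed_n + (nonresponse_n if pool_nonresponse else 0)
    osa = confirmed_osa + (nonresponse_osa if pool_nonresponse else 0)
    return 1 - osa / n


if __name__ == "__main__":
    # Hypothetical counts, not study data
    print(round(prob_no_osa_in_low_risk(400, 24, 200, 30, True), 2))   # 0.91
    print(round(prob_no_osa_in_low_risk(400, 24, 200, 30, False), 2))  # 0.94
```

Pooling the blanks into “low risk” lowers the apparent probability of being free of OSA from 0.94 to 0.91 in this example. In a control-selection setting, this is the direction that matters.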

Accordingly, this study had three aims. The first was to estimate the prevalence of undiagnosed moderate–severe OSA in an Australian general population, using overnight nasal airflow data from a single-channel portable monitoring device and an AHI cut-off of ≥15. The second was to evaluate the capacity of the BQ to identify those with and without moderate–severe OSA in the general population. The third was to investigate whether component questions of the BQ, rather than the complete questionnaire, had greater discriminatory capacity to identify such subjects.

Methods

Sample population

The sample comprised 793 participants from the 2005/2007 Busselton Health Study, a cross-sectional study of 2,935 residents drawn randomly from the electoral roll in the shire of Busselton, Western Australia [17]. The 793 subjects were recruited consecutively during the physical evaluation component of the larger study and underwent overnight single-channel (nasal airflow) monitoring (ApneaLink™, ResMed Limited, NSW, Australia). There were two exclusion criteria: less than 2 h of data captured by the device, and prior diagnosis of, or current treatment for, OSA. The second criterion was central to the first aim of the study, which concerned undiagnosed OSA.

Sleep parameters

Overnight nasal airflow data from the single-channel portable monitoring device were scored automatically using the default ApneaLink™ criteria (firmware v3.49, scoring software v6.00, ResMed, NSW, Australia). Specifically, an apnoea was defined as a decrease in airflow to 0–20 % of baseline for at least 10 s. A hypopnoea was defined as a decrease in airflow of 50–80 % from baseline for at least 10 s. The minimal amplitude threshold for snoring was 6 %, and snoring events were required to last 0.3–3.5 s.
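The amplitude and duration rules above can be expressed as a simple classifier. This Python sketch only illustrates the stated thresholds; it is not the ApneaLink™ scoring algorithm. It assumes that each candidate airflow event has already been summarised as residual airflow (a fraction of baseline) and a duration in seconds. For snoring, it assumes a snore-signal amplitude expressed in the same percentage units as the 6 % threshold. The function and constant names are hypothetical. A reduction of exactly 80 % falls on the boundary of both event definitions, and it is assigned to apnoea here by assumption.

```python
from typing import Optional

# Thresholds taken from the default criteria described in the text
APNOEA_MAX_RESIDUAL = 0.20     # airflow falls to 0-20 % of baseline
HYPOPNOEA_MAX_RESIDUAL = 0.50  # 50-80 % reduction => 20-50 % of baseline remains
MIN_EVENT_S = 10.0             # both event types must last >= 10 s
SNORE_MIN_AMPLITUDE_PCT = 6.0
SNORE_DURATION_S = (0.3, 3.5)


def classify_event(residual_fraction: float, duration_s: float) -> Optional[str]:
    """Classify one airflow reduction; residual_fraction is flow / baseline."""
    if duration_s < MIN_EVENT_S:
        return None
    if residual_fraction <= APNOEA_MAX_RESIDUAL:  # boundary (80 %) -> apnoea
        return "apnoea"
    if residual_fraction <= HYPOPNOEA_MAX_RESIDUAL:
        return "hypopnoea"
    return None


def is_snore(amplitude_pct: float, duration_s: float) -> bool:
    """Apply the snoring amplitude and duration window."""
    lo, hi = SNORE_DURATION_S
    return amplitude_pct >= SNORE_MIN_AMPLITUDE_PCT and lo <= duration_s <= hi


if __name__ == "__main__":
    print(classify_event(0.10, 12.0))  # apnoea
    print(classify_event(0.35, 15.0))  # hypopnoea
    print(classify_event(0.10, 8.0))   # None (too short)
```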

Subjects were instructed to switch the device on immediately before “lights out” and to switch it off on rising in the morning. The AHI was calculated as the mean number of apnoeas and hypopnoeas per hour of the monitoring period.
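The AHI calculation and the exclusion criteria can be sketched as follows in Python. The function names are hypothetical. The device uses recording time as a proxy for sleep time, so the denominator here is the monitoring duration in hours, not total sleep time.

```python
MIN_RECORDING_H = 2.0       # recordings shorter than this were excluded
MODERATE_SEVERE_AHI = 15.0  # cut-off for moderate-severe OSA


def is_eligible(recording_hours: float, prior_osa_diagnosis: bool,
                current_osa_treatment: bool) -> bool:
    """Apply the study's exclusion criteria to one participant."""
    return (recording_hours >= MIN_RECORDING_H
            and not prior_osa_diagnosis
            and not current_osa_treatment)


def apnoea_hypopnoea_index(n_apnoeas: int, n_hypopnoeas: int,
                           recording_hours: float) -> float:
    """Mean number of apnoeas plus hypopnoeas per hour of monitoring."""
    if recording_hours <= 0:
        raise ValueError("recording_hours must be positive")
    return (n_apnoeas + n_hypopnoeas) / recording_hours


if __name__ == "__main__":
    # Hypothetical recording: 50 apnoeas and 70 hypopnoeas over 6 h
    ahi = apnoea_hypopnoea_index(50, 70, 6.0)
    print(ahi, ahi >= MODERATE_SEVERE_AHI)  # 20.0 True
```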

Berlin questionnaire criteria

The BQ was coded in two different ways (methods A and B), as follows:

  • Method A used questionnaire responses as in the original publication and validation in a primary care sample [3], following the coding protocol published as an appendix by Hiestand et al. [4]. Briefly, component questions were grouped into three categories in order to allocate an overall high or low risk of OSA. Category 1 contained five questions on snoring history. High risk was attributed to individuals reporting at least two of the following: frequent snoring (three to four times per week or more), very loud snoring, snoring that frequently bothered others, or frequent breathing pauses during sleep (three to four times per week or more). Category 2 contained questions on daytime symptoms. High risk was attributed to individuals reporting at least two of the following three: frequent daytime sleepiness, unrefreshed sleep and drowsy driving. Category 3 captured common co-morbidities of OSA, and high risk was attributed to those with a BMI >30 kg/m² or with hypertension. In this study, hypertension was defined as self-reported current treatment for hypertension, a systolic blood pressure >135 mmHg or a diastolic blood pressure >85 mmHg. Blood pressure was measured by research staff on the morning after overnight monitoring. An overall high risk of OSA was assigned to individuals considered high risk in two of the three categories. All other responses, including missing data, were coded as “low risk” by default.

  • Method B examined the screening properties of each constituent question of the BQ individually, rather than the aggregated risk within categories. Missing responses were removed from the analyses. One further deviation from the traditional use of the BQ concerned the questions on snoring frequency and breathing pauses. Traditionally, low risk is assigned to snoring or breathing pauses occurring one to two times per week or less. We considered whether stricter thresholds for low risk (never, or one to two times per month or less) could better capture control status. After individual questions had been evaluated, combinations of two questions (regardless of category) were considered. Some combinations of three questions were also investigated.
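The method A coding rules can be summarised in code. This minimal Python sketch assumes that each questionnaire item has already been dichotomised: True means the symptom was reported at the stated frequency, False means it was denied, and an absent key or None means the answer is missing. The dictionary key names are hypothetical. Missing items are never counted as symptoms, which reproduces method A's default “low risk” coding. An absent BMI or blood pressure likewise cannot trigger category 3.

```python
from typing import Optional


def _reported(answer: Optional[bool]) -> bool:
    # Method A: a missing answer (None) counts the same as "no symptom"
    return answer is True


def berlin_method_a(r: dict) -> str:
    """Return "high" or "low" OSA risk from a dict of (hypothetical) item keys."""
    # Category 1: at least two of four snoring/breathing-pause items
    cat1 = sum(_reported(r.get(k)) for k in (
        "frequent_snoring", "very_loud_snoring",
        "snoring_bothers_others", "frequent_pauses")) >= 2
    # Category 2: at least two of three daytime-symptom items
    cat2 = sum(_reported(r.get(k)) for k in (
        "frequent_sleepiness", "unrefreshed_sleep", "drowsy_driving")) >= 2
    # Category 3: BMI > 30 kg/m2 or hypertension
    # (treated, systolic > 135 mmHg or diastolic > 85 mmHg)
    bmi, sbp, dbp = r.get("bmi"), r.get("sbp"), r.get("dbp")
    hypertensive = (_reported(r.get("bp_treatment"))
                    or (sbp is not None and sbp > 135)
                    or (dbp is not None and dbp > 85))
    cat3 = (bmi is not None and bmi > 30) or hypertensive
    # Overall high risk requires two of the three categories
    return "high" if cat1 + cat2 + cat3 >= 2 else "low"


if __name__ == "__main__":
    print(berlin_method_a({"frequent_snoring": True,
                           "very_loud_snoring": True, "bmi": 32}))  # high
    print(berlin_method_a({"frequent_snoring": True,
                           "very_loud_snoring": None, "bmi": 32}))  # low
```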

Statistical analysis

To address the first aim of the study, age-adjusted gender-specific prevalences were calculated by direct standardisation. Age-specific prevalences in the reference population were weighted by the 2007 Australian population estimates [18] and summed. Means, standard deviations, medians and interquartile ranges (IQR) were used to describe the populations, which were compared using the χ² test, Student's t test and the Mann–Whitney U test. The odds ratio, sensitivity, specificity, positive and negative predictive values (PPV and NPV), positive and negative likelihood ratios and post-test probabilities of OSA were calculated. Data were analysed using PASW Statistics GradPack, version 17.0, and Microsoft Office Excel 2007.
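In direct age standardisation, each age-specific prevalence is weighted by that age group's share of a standard population, and the weighted values are summed. The Python sketch below illustrates the arithmetic for one gender. The age bands, prevalences and population counts are hypothetical placeholders; they are not the study's data or the 2007 Australian estimates.

```python
def direct_standardised_prevalence(age_specific_prev: dict[str, float],
                                   standard_pop: dict[str, float]) -> float:
    """Sum of age-specific prevalences weighted by standard-population shares."""
    if age_specific_prev.keys() != standard_pop.keys():
        raise ValueError("age bands must match")
    total = sum(standard_pop.values())
    return sum(age_specific_prev[band] * standard_pop[band] / total
               for band in age_specific_prev)


if __name__ == "__main__":
    prev = {"20-39": 0.00, "40-59": 0.10, "60+": 0.20}  # hypothetical
    pop = {"20-39": 4000, "40-59": 3500, "60+": 2500}    # hypothetical
    print(round(direct_standardised_prevalence(prev, pop), 3))  # 0.085
```

In the study, the calculation would be repeated separately for men and women to give gender-specific estimates.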

Results

Sample characteristics

The mean age of the sample population who underwent overnight monitoring (n = 793) was 54 (±16) years. Males and smokers each comprised 48 % of the sample. European ancestry was reported by 97 % of the sample. Median alcohol intake was 6.6 standard drinks per week (IQR, 3.6–11.1), and subjects undertook a median of 5 h of physical activity per week (IQR, 2.1–9.6). In these respects, the sample was statistically similar to the larger general population health study from which it was drawn. The subset with overnight ApneaLink™ monitoring data had a slightly greater BMI (27.5 ± 4.7 vs 26.9 ± 4.7 kg/m²); the difference was statistically significant but not clinically meaningful. Waist circumference did not differ between the subset and the general population sample (92.1 ± 13.1 vs 91.4 ± 13.5 cm). Median AHI was 4 events per hour of monitoring (IQR, 2–10). The proportion with moderate or severe OSA (AHI ≥15) was 15 %.

Age-standardised prevalence of undiagnosed OSA

The age-standardised prevalence of moderate–severe OSA was estimated at 9.1 % (12.4 % in men and 5.7 % in women). No moderate–severe OSA was detected in women under 40 years or in men under 35 years.

Validation of screening properties of the Berlin questionnaire

The diagnostic accuracy of the BQ (method A) for an AHI ≥15 is reported in Table 1. Sensitivity was 53.8 % and specificity 70.4 %. Among subjects classified as high risk, the PPV was 24 %. Among subjects classified as low risk, the NPV was 89.8 %. The corresponding positive likelihood ratio was 1.82, and the negative likelihood ratio was 0.66 (Table 1).
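Bayes' theorem in odds form links the likelihood ratios to the predictive values: post-test odds = pre-test odds × likelihood ratio. The Python sketch below takes the reported sensitivity and specificity, together with the 15 % prevalence of AHI ≥15 in the monitored sample, and approximately reproduces the values in Table 1. The inputs are rounded, so small discrepancies from the published PPV and NPV are expected. The function names are illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios of a dichotomous test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


if __name__ == "__main__":
    lr_pos, lr_neg = likelihood_ratios(0.538, 0.704)
    prevalence = 0.15  # proportion with AHI >= 15 in the monitored sample
    print(f"LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}")  # LR+ 1.82, LR- 0.66
    print(f"P(AHI>=15 | high risk) {post_test_probability(prevalence, lr_pos):.3f}")
    print(f"P(AHI<15 | low risk) {1 - post_test_probability(prevalence, lr_neg):.3f}")
```

The last two lines print about 0.243 and 0.896. These values are close to the reported PPV of 24 % and NPV of 89.8 %.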

Table 1 Diagnostic accuracy of the BQ

Utility of component questions of the Berlin questionnaire

Figure 1 shows the pre-test and post-test probabilities for AHI ≥15 and <15. The pre-test probability of AHI ≥15 was 15 % (95 % CI 12.5, 17.5 %). The post-test probability of AHI ≥15 in those deemed high risk by the BQ was 24 % (95 % CI 20, 28 %). One constituent question, reporting apnoeas on more than two nights per month, performed significantly better than the complete BQ (post-test probability of AHI ≥15, 38 %; 95 % CI 29, 48 %). The pre-test probability of AHI <15 was 85 % (95 % CI 83, 87 %). One constituent question, absence of hypertension, gave a significantly higher post-test probability of AHI <15 (94 %; 95 % CI 94, 96 %) than low risk by the complete BQ (90 %; 95 % CI 88, 92 %).
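The paper does not state how the confidence intervals were computed. As a check, the Python sketch below computes a normal-approximation (Wald) interval for a proportion. With n = 793 assumed, it reproduces the reported pre-test intervals. The post-test intervals would require subgroup sizes, which are not given here. The Wald method is therefore an assumption, not a description of the authors' procedure.

```python
import math


def wald_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


if __name__ == "__main__":
    for p in (0.15, 0.85):  # pre-test probabilities of AHI >= 15 and AHI < 15
        lo, hi = wald_ci(p, 793)
        print(f"{p:.2f}: {lo:.3f}-{hi:.3f}")
```

This prints intervals of about 0.125–0.175 and 0.825–0.875. These match the reported 12.5–17.5 % and, after rounding, 83–87 %.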

Fig. 1

Estimated probabilities of moderate–severe OSA (AHI ≥15) and of not having moderate–severe OSA (AHI <15), before and after administration of the complete BQ and of its constituent questions

Several questions were not individually useful as screening items for either cases or controls: those on snoring volume, unrefreshed sleep, daytime fatigue and drowsy driving, and BMI. Reporting the absence or low frequency of apnoeas did not increase the post-test probability of AHI <15 beyond that of BQ low risk. The post-test probability of AHI <15 was similar to the pre-test probability after reporting refreshed sleep, no fatigue or never driving while drowsy, and after a BMI in the non-obese range.

The utility of combinations of questions in screening for controls

Only one combination of questions (being normotensive or snoring one to two nights per month or less) had a post-test probability comparable to those of the individual questions. The post-test probability of an AHI <15 for this combination was 92 % (95 % CI 90, 93 %; Fig. 1). Other combinations of questions had poor discriminatory capacity for AHI <15 and are therefore not reported.
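The combined criterion is an “or” rule: a subject is retained as a probable control if either answer is favourable. The Python sketch below uses invented toy records to show how the post-test probability of AHI <15 and the proportion retained could be computed under method B. The treatment of missing data for combinations is an assumption: subjects with a missing answer to either question are dropped. The class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class Subject(NamedTuple):
    normotensive: Optional[bool]        # None = missing answer
    infrequent_snoring: Optional[bool]  # snores <= 1-2 nights/month; None = missing
    ahi: float


def evaluate_or_rule(subjects: list[Subject]) -> tuple[float, float]:
    """Return (post-test P(AHI < 15) among retained, fraction retained)."""
    # Method B (assumed for combinations): drop subjects missing either answer
    complete = [s for s in subjects
                if s.normotensive is not None and s.infrequent_snoring is not None]
    # "Or" rule: retain if normotensive OR snoring infrequently
    retained = [s for s in complete if s.normotensive or s.infrequent_snoring]
    if not retained:
        raise ValueError("no subjects pass the screen")
    p_no_osa = sum(s.ahi < 15 for s in retained) / len(retained)
    return p_no_osa, len(retained) / len(complete)


if __name__ == "__main__":
    toy = [Subject(True, True, 3.0), Subject(True, False, 8.0),
           Subject(False, True, 12.0), Subject(False, False, 22.0),
           Subject(True, None, 30.0), Subject(False, True, 18.0)]
    print(evaluate_or_rule(toy))  # (0.75, 0.8)
```

An “or” rule retains everyone who passes either single criterion. It therefore always keeps at least as many subjects as either question alone, which is why a combined criterion of this kind tends to yield the larger sample.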

Table 2 outlines how screening for the absence of moderate–severe OSA using snoring frequency, blood pressure measurement and the combination of these two measures changed the group characteristics in the total cross-sectional general population sample (n = 2,935). All three screening criteria produced samples with a greater proportion of female subjects and reduced mean age, BMI and waist circumference in comparison with the source population. The greatest sample size was obtained from the combined question criteria.

Table 2 Characteristics of the reference population and subgroups obtained using three screening questions: snoring frequency, blood pressure and snoring frequency or blood pressure

Discussion

This study found that the prevalence of undiagnosed moderate–severe OSA in a general population sample from Western Australia was 9 %. This is the third study to estimate the screening properties of the BQ in a general population sample and the first to use an Australian sample. Our findings were in accordance with those from a Norwegian general population sample [15]. A sensitivity of 54 % implies that roughly half of persons with undiagnosed OSA in the general population can be identified by questionnaire. The specificity of the BQ (70 %) was lower than that in the Norwegian population (80 %). In contrast, a small general population sample from Korea (n = 101) showed a higher sensitivity (89 %) but a lower specificity (63 %) [16]. In that study, the questionnaire was administered by direct interview rather than self-completed on paper.

We investigated the research utility of individual BQ questions for screening “control” subjects with an AHI <15. Questions on snoring frequency and hypertension gave a significantly greater post-test probability of AHI <15 than the complete BQ. The combined criterion was snoring one to two nights per month or less, or absence of hypertension. Its post-test probability of AHI <15 was comparable to that of the individual questions. It was also comparable to, but not significantly greater than, that of low risk by the complete BQ. Using a cut-off of AHI <15, this combination reduced the general population sample size by 36 %. It thus represented the most efficient way to use the BQ to screen a general population for the probable absence of OSA.

Defining controls by the absence of moderate or severe OSA (i.e. AHI <15) means that individuals with milder disease would still be included as controls. This disadvantage might be offset by the fact that sleep-disordered breathing with AHI ≥15 is more symptomatic and more treatable than milder forms [19]. Some studies may also wish simply to remove most of the “worst offenders” among those with undiagnosed OSA. Questions on snoring frequency and hypertension therefore provide a useful method for reducing the proportion of undiagnosed moderate–severe OSA in a general population sample; their post-test probabilities of AHI <15 were 91 and 94 %, respectively. However, this comes at the expense of sample size, and researchers must weigh sample size against misclassification.

Nevertheless, validated methods for identifying both cases and controls contribute important information to the design of research studies. The BQ high- and low-risk categories have previously been used in the general population as epidemiological proxies for sleep-disordered breathing and for controls, respectively [4, 5]. The BQ could be used in candidate gene studies of OSA. Such studies require large control samples that are representative of the general population in order to detect the small effect sizes of individual genetic loci. Screening out undiagnosed OSA would then optimise the power of these studies to detect associations. Yet candidate gene studies of OSA exemplify the range of control selection strategies used in peer-reviewed research, often without validation. Some have used the gold standard, PSG, to ascertain case or control status [20–24]. Others have used personal and familial history to identify “healthy sleepers” [25–28] or a history of hypertension [21, 29], and some have used no selection method at all [30–33].

The BQ was designed for use in a primary care population, not for epidemiological use in a general population. Its symptom-based approach was intended to identify the presence of OSA, not its absence, and it is therefore unlikely to have a high negative predictive value. Other methods, such as portable monitoring and actigraphy, are better modalities for screening general populations for sleep-disordered breathing.

Limitations

Objective sleep measurements were collected by home-based monitoring with a single-channel (nasal airflow) sleep apnoea screening device. This device uses recording duration as a proxy for sleep duration and has a less sensitive definition of respiratory events than PSG. Portable monitoring devices are known to underestimate AHI relative to PSG-derived AHI defined using the Chicago criteria [19, 34]. However, they are likely to be closer to the definition recommended by the American Academy of Sleep Medicine [35]. Agreement between PSG-derived and ApneaLink-derived AHI is high and is maximal at a threshold of 15 events per hour (sensitivity 90.9 %, specificity 94.6 %, PPV 90.9 %, NPV 94.6 %, AUC 0.98) [36].

Conclusion

Our findings (1) indicate that the prevalence of undiagnosed moderate to severe OSA is high in the Western Australian general population, (2) confirm that the BQ has sub-optimal screening properties for OSA in a general population, and (3) demonstrate that questions on snoring frequency and hypertension give a higher post-test probability of AHI <15 than the complete BQ. Combined, these two questions represent a simple and economical screening tool for identifying OSA controls in a general population with a high prevalence of undiagnosed moderate to severe OSA, albeit with a reduction in sample size. Future epidemiological and genetic case–control studies of OSA using population-based controls should account for undiagnosed OSA in their methodology.