Background

Obstructive sleep apnea (OSA) is a condition in which intermittent obstruction of the airway during sleep causes sleep fragmentation and repeated desaturations [1]. Untreated severe sleep apnea causes a 46% excess mortality over 8 years, an increase attributed chiefly to cardiovascular mortality [2]. OSA has been associated with impaired right ventricular (RV) function despite normal systemic pressures [3]. The prevalence of OSA was estimated from large population studies such as the Sleep Heart Health Study at up to 18% [4], although overt prevalence is reported at 4% of men and 2% of women [1]. In certain populations, a much higher prevalence has been reported; for example, in patients with acute myocardial infarction, some studies have reported a prevalence of up to 65% [5].

Part of this difference in prevalence reflects underdiagnosis: OSA is notoriously difficult to diagnose clinically. Much of the difficulty lies in accessibility: the gold standard for diagnosis of OSA remains in-laboratory, monitored polysomnography (PSG) [6], yet this is a cumbersome and expensive test. In an era of cost containment, it is simply not practical to send everyone for a full PSG, and even if cost were not an issue, long waiting times at the sleep laboratory are common. Ideally, patients should be screened before being referred for a PSG, but it has proven unexpectedly difficult to accurately predict the presence or absence of OSA from clinical parameters [7]. One alternative is a partial or home sleep study. Various partial sleep study devices are on the market, but under current American Academy of Sleep Medicine guidelines, they are meant to be used only in patients with high pretest probability and in conjunction with a comprehensive sleep evaluation; negative studies using these devices should still be followed up with a full in-laboratory PSG [8].
Another approach is to use scoring systems to improve the clinical prediction of OSA.

Multiple scoring systems based on clinical characteristics [9–13] have been designed and tested and mostly have been shown to have reasonably good sensitivity but poor specificity; that is, they may have reasonable utility for exclusion of OSA, which makes them suitable for use in a primary care setting [14]. However, many of these rules involve complicated mathematical contortions and are not designed to be remembered off the cuff, making them unattractive for general use by physicians outside the sleep medicine field and limiting their widespread usage.

The STOP-BANG questionnaire is a simple 8-point tool for screening patients who might have OSA. It has previously been validated for screening of preoperative surgical patients, showing a sensitivity of 92.9% in the validation cohort [15]. The good sensitivity, along with a snappy acronym and ease of use, makes this an attractive candidate for a more generally usable tool for general screening of patients for OSA.

We sought to test its validity in a higher-prevalence population, namely patients referred to our Sleep Disorders Unit. We also wished to assess the validity of the prediction rule, as well as of the cutoffs it uses to score body mass index (BMI) and neck circumference, in an Asian population, especially in light of previous data showing that Asians may have sleep apnea even at lower BMI [16].

Materials and methods

This study was performed at the Sleep Disorders Unit of the Singapore General Hospital, the largest tertiary care hospital in Singapore. All patients undergoing diagnostic overnight PSG between November 2008 and April 2009 were prospectively included in this study. The study was approved by the Institutional Review Board of the Singapore General Hospital.

Demographics

Patient characteristics such as age and gender were recorded on admission to the sleep disorders clinic on the night before the sleep study was performed. The patient’s height and weight were also measured and recorded at this time so that the BMI (weight in kilograms divided by height in meters squared) could be calculated. The patient’s neck circumference was also measured at the level of the cricoid cartilage in centimeters and recorded by the technician on duty.

STOP-BANG questionnaire

Patients were invited to complete a simple self-administered questionnaire on the evening prior to their sleep study. This consisted of four yes/no questions, with one point awarded for each yes answer. Simple demographic data were also collected, and one point each was added for the presence of four clinical characteristics, which were dichotomized according to pre-specified cutoffs. The originally validated STOP-BANG questionnaire used the following cutoffs for scoring: BMI > 35, age > 50 years, neck circumference > 40 cm, gender = male. A score of 3 or more out of a total possible score of 8 is considered high risk for OSA (see Table 1).
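The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used in the study: the class and function names are hypothetical, and the four STOP items are named after the usual expansion of the acronym (snoring, tiredness, observed apnea, high blood pressure), since the exact question wording is not reproduced in this section. The BMI cutoff is left as a parameter so that alternative cutoffs can be tried.

```python
from dataclasses import dataclass


@dataclass
class StopBangInput:
    # STOP items: self-reported yes/no answers (True means "yes").
    # Item names are assumed from the usual expansion of the acronym.
    snoring: bool
    tired: bool
    observed_apnea: bool
    high_blood_pressure: bool
    # BANG items: measured characteristics, dichotomized when scoring.
    weight_kg: float
    height_m: float
    age_years: float
    neck_cm: float
    is_male: bool


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def stop_bang_score(p: StopBangInput,
                    bmi_cutoff: float = 35.0,
                    age_cutoff: float = 50.0,
                    neck_cutoff: float = 40.0) -> int:
    """One point per 'yes' answer plus one point per characteristic above its cutoff."""
    stop_points = sum([p.snoring, p.tired, p.observed_apnea, p.high_blood_pressure])
    bang_points = sum([
        bmi(p.weight_kg, p.height_m) > bmi_cutoff,
        p.age_years > age_cutoff,
        p.neck_cm > neck_cutoff,
        p.is_male,
    ])
    return stop_points + bang_points


def is_high_risk(score: int, threshold: int = 3) -> bool:
    """A score of 3 or more out of 8 is considered high risk for OSA."""
    return score >= threshold


# Hypothetical patient, for illustration only (not study data).
example = StopBangInput(
    snoring=True, tired=False, observed_apnea=False, high_blood_pressure=True,
    weight_kg=80.0, height_m=1.70, age_years=55, neck_cm=38.0, is_male=True,
)
print(stop_bang_score(example), is_high_risk(stop_bang_score(example)))
```

Keeping the cutoffs as keyword arguments means the same function can be re-run with, for example, `bmi_cutoff=30.0` without touching the scoring logic.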

Table 1 Questionnaire characteristics

Polysomnography

The nocturnal polysomnograph consisted of continuous polysomnographic (Compumedics E Series, Australia) recordings of a standard electroencephalographic montage consisting of four electroencephalograms (C3–A2, C4–A1, O1–A2, O2–A1), right and left electro-oculograms, submental and bilateral tibial electromyograms, and an electrocardiogram using surface electrodes. Respiration was monitored with oronasal thermocouples and with nasal pressure transducers. Thoracoabdominal movements were monitored using piezoelectric strain gauges (Piezo, Compumedics). Continuous pulse oximetry was also monitored. Sleep stage scoring was performed in 30-s epochs by certified registered polysomnographic technologists according to Rechtschaffen and Kales’ criteria [17]. The scoring technologists were blinded to the patients’ STOP-BANG scores. Apnea was defined as cessation of airflow for more than 10 s. Hypopneas were scored as a decrease in airflow of at least 30% with either a 4% oxygen desaturation or a 3% oxygen desaturation and an arousal. The Apnea–Hypopnea Index (AHI) was defined as the total number of apneas and hypopneas per hour of sleep time.
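As a worked illustration of the AHI definition, the following minimal sketch computes the index from event counts and total sleep time. The function names are hypothetical, and the severity bands (mild from 5, moderate from 15, severe from 30 events per hour) are an assumption taken from the thresholds used elsewhere in this paper rather than a restatement of the laboratory's scoring protocol.

```python
def apnea_hypopnea_index(n_apneas: int, n_hypopneas: int,
                         total_sleep_time_hours: float) -> float:
    """Total number of apneas and hypopneas per hour of sleep time."""
    if total_sleep_time_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (n_apneas + n_hypopneas) / total_sleep_time_hours


def osa_severity(ahi: float) -> str:
    """Assumed severity bands: mild >= 5, moderate >= 15, severe >= 30."""
    if ahi >= 30:
        return "severe"
    if ahi >= 15:
        return "moderate"
    if ahi >= 5:
        return "mild"
    return "none"


# Hypothetical night: 60 apneas and 45 hypopneas over 7 hours of sleep.
ahi = apnea_hypopnea_index(60, 45, 7.0)
print(ahi, osa_severity(ahi))
```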

Statistics

Association between presence/absence of moderate and severe OSA and BMI and neck circumference was determined using the χ² test. Sensitivities, specificities, positive predictive values, and negative predictive values were calculated from 2 × 2 cross tabulation of OSA classification from PSG and OSA risk classification by STOP-BANG, and their significance was assessed using Pearson’s χ² test. All statistical analysis was performed using SPSS statistical software version 17 (SPSS Inc, Chicago, IL, USA).
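For readers who want to reproduce this kind of calculation, the sketch below derives the four predictive parameters and the uncorrected Pearson χ² statistic from a 2 × 2 table. It is not the SPSS procedure used in the study; the function names are hypothetical and the counts in the example are invented for illustration, not taken from the study data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Predictive parameters from a 2 x 2 table.

    tp/fn: patients with OSA on PSG classified high/low risk by the questionnaire.
    fp/tn: patients without OSA on PSG classified high/low risk.
    Assumes every row and column total is non-zero.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def pearson_chi2_2x2(tp: int, fp: int, fn: int, tn: int) -> float:
    """Pearson chi-square statistic for a 2 x 2 table (1 degree of freedom,
    no continuity correction): n(ad - bc)^2 / product of the four margins."""
    n = tp + fp + fn + tn
    numerator = n * (tp * tn - fp * fn) ** 2
    denominator = (tp + fp) * (fn + tn) * (tp + fn) * (fp + tn)
    return numerator / denominator


# Invented counts for illustration only.
metrics = diagnostic_metrics(tp=90, fp=60, fn=10, tn=40)
chi2 = pearson_chi2_2x2(tp=90, fp=60, fn=10, tn=40)
print(metrics, chi2)
```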

Results

Three hundred and forty-eight patients completed diagnostic overnight (both full night and split night) PSG between November 2008 and April 2009. The racial makeup of the patients comprised 76.4% Chinese, 6.1% Malay, 14.1% Indian, and 3.4% others (Caucasians and Eurasians). The proportion of Chinese and non-Chinese patients approximated that of the national (Singapore’s) population [18]. A total of 319 (91.2%) patients completed the self-administered questionnaire. The proportion of patients scoring for each point of the questionnaire is detailed in Table 1. There was no significant difference in age, gender, race, BMI, neck circumference, or actual AHI among patients who did or did not complete the questionnaire (Table 2). The vast majority of patients (333/348, or 95%) had a sleep study for evaluation of OSA; of the remaining patients, 11 were studies done for a research protocol, 3 were done as part of evaluation for narcolepsy, and 1 was done for evaluation of possible parasomnia.

Table 2 Demographics of patients who did/did not answer the questionnaire

The clinical characteristics of patients used in the validation cohort are described in Table 3. Among our patients, approximately half had at least moderate sleep apnea and one third had severe sleep apnea. Use of the STOP-BANG screening questionnaire would have identified a large proportion (73.9%) of patients as high risk for OSA, and this would have picked up 95.4% (all but 5) of the patients with severe sleep apnea.

Table 3 Patient characteristics

Cutoff for BMI and neck circumference

There was a clear and statistically significant separation of BMI and neck circumference among patients with and without OSA. The mean BMI among patients with AHI < 15 and AHI ≥ 15 was 25.6 and 30.25, respectively; median neck circumference was 37.9 cm versus 41.5 cm for patients with AHI < 15 and ≥15, respectively (p < 0.001).

Using a cutoff of BMI > 26 for a positive score increased the number of patients who would have been identified as high risk from 240 to 257, increasing the sensitivity of the STOP-BANG score for detecting moderate OSA from 91.3% to 94.4%, while decreasing specificity from 40.4% to 32.7%. An extension of McNemar χ² statistics showed that both the sensitivities and specificities (overall performance) of STOP-BANG at BMI cutoffs of 35 and 26 are significantly different (p = 0.0002) [19]. Using a cutoff of BMI > 30 as compared to the original BMI cutoff of 35, the number of patients identified as high risk increased from 240 to 244 but with minimal change in sensitivity or specificity, as seen in Table 5 (overall performance: p value not significant). Different cutoffs for neck circumference did not significantly change the sensitivity or predictive accuracy of the STOP-BANG questionnaire.
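The mechanics of such a cutoff comparison can be illustrated with a short sketch that rescores the same patients at several BMI cutoffs and recomputes sensitivity and specificity against an AHI threshold of 15. The record layout and function name are hypothetical, the four patients are invented, and the sketch does not implement the extended McNemar test cited above; it only shows how lowering the cutoff moves patients into the high-risk group.

```python
def evaluate_bmi_cutoff(patients, bmi_cutoff, ahi_threshold=15.0):
    """Rescore each patient at the given BMI cutoff and tabulate against PSG.

    Each patient is a tuple:
    (stop_points, bmi, age_years, neck_cm, is_male, ahi)
    where stop_points is the number of 'yes' answers (0 to 4).
    """
    tp = fp = fn = tn = 0
    for stop_points, bmi, age_years, neck_cm, is_male, ahi in patients:
        score = (stop_points + (bmi > bmi_cutoff) + (age_years > 50)
                 + (neck_cm > 40) + bool(is_male))
        high_risk = score >= 3
        has_osa = ahi >= ahi_threshold
        if high_risk and has_osa:
            tp += 1
        elif high_risk:
            fp += 1
        elif has_osa:
            fn += 1
        else:
            tn += 1
    return {
        "high_risk": tp + fp,
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Invented patients for illustration only (not study data).
patients = [
    (2, 28.0, 45, 38.0, False, 22.0),
    (1, 27.0, 40, 37.0, True, 3.0),
    (3, 36.0, 60, 42.0, True, 40.0),
    (0, 24.0, 30, 35.0, False, 1.0),
]
for cutoff in (35, 30, 26):
    print(cutoff, evaluate_bmi_cutoff(patients, cutoff))
```

In this toy data set, lowering the cutoff to 26 trades specificity for sensitivity, which is the same direction of change reported for the study cohort.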

Discussion

Our results confirm the clinical utility of the STOP-BANG screening tool even in a high-prevalence population of patients undergoing overnight PSG at a sleep disorders unit.

For widespread use, a screening tool for OSA has to have high sensitivity and negative predictive value, so that practitioners can quickly make a reasonable decision either that a particular patient is unlikely to have sleep apnea and does not need referral to a sleep physician, or that he or she is likely to have it and should be referred. In other words, the tool should pick up most of the patients with OSA severe enough to warrant treatment, so that they can be referred on (high sensitivity), and when it identifies a patient as low risk, the practitioner should be reasonably sure that he or she is unlikely to have significant OSA (high negative predictive value). The screening tool should also be easy to remember and to score, so that one does not need a computer or calculator to risk stratify a patient. We believe that the STOP-BANG scoring system fulfills these criteria. Among our patients, a high proportion (>90%) were able to complete the questionnaire without any help.

Other questionnaires such as the Berlin questionnaire, the Wisconsin questionnaire, and the Sleep Apnea scale of the Sleep Disorders Questionnaire have been available for some time but have not managed to gain traction for general use. A recent systematic review by Abrishami et al. [20] examined the use of these questionnaires and found a pooled sensitivity of 77% and specificity of 53% in patients without a history of sleep disorders. The STOP-BANG questionnaire compares favorably to other commonly used screening questionnaires in terms of results, with the added advantage of remarkable ease of use.

American Academy of Sleep Medicine practice guidelines indicate CPAP treatment for OSA of at least moderate (AHI > 15) severity [21]; treatment of mild OSA with an AHI from 5 to 15 has proven more controversial, with some studies indicating minimal or at best modest benefit in terms of improving blood pressure control, along with poor compliance [22]. Hence, the indication for treating patients with mild OSA hinges mainly on whether they are symptomatic, especially with excessive daytime sleepiness. As seen from Table 4, the STOP-BANG screening tool has lower sensitivity for picking up mild OSA (sensitivity, 86%; NPV, 52%). However, we believe that, given the inconclusive evidence of treatment benefit and even of increased cardiovascular risk with mild OSA, these patients should be investigated only if they have symptoms significant enough to disrupt their functioning, or if they show complications that are associated with OSA (e.g., hypertension at a young age or difficult-to-control hypertension). We note also that this tool has very high sensitivity in picking up patients with severe OSA. It is precisely in these patients that there is the greatest imperative to identify OSA: recent data from the Sleep Heart Health Study indicate that it is patients with severe untreated OSA who have increased mortality [2].

Table 4 Predictive parameters for STOP-BANG

The STOP-BANG questionnaire has previously been validated for screening of preoperative surgical patients. We believe that this simple and easy-to-remember formula is an ideal tool for even more general use, especially in populations that have recently been identified as having relatively high prevalence such as patients with the metabolic syndrome, or where a diagnosis of OSA is likely to have a more significant impact, such as patients who are already known to have ischemic heart disease or heart failure. It has been reported that patients with cardiac failure may not display the classic symptoms of daytime sleepiness, and hence, the cardiologist may not necessarily be alerted to the need to investigate further [23]. By educating practitioners in other fields to routinely screen for patients with OSA, we would be able to pick up substantially more of these patients.

The other issue we considered here was whether the cutoffs for BMI and neck circumference used by the original validation study would be applicable in other populations, especially in Asians who are known to have more severe OSA at lower BMI. Previous studies have suggested that a BMI of 26 be used to indicate obesity in Asians [24].

Our initial hypothesis was that lowering the BMI cutoff would improve the sensitivity of STOP-BANG, and there was indeed a statistical improvement in the sensitivity of the test using the BMI cutoff of 26 from 91.3% to 94.4%, although this also led to a significant drop in the specificity of the test from 40.4% to 32.7%. Using a cutoff of 30 for scoring the BMI, however, achieved very similar results in terms of sensitivity/specificity of the test as compared to the original cutoff of BMI > 35, as can be seen from Table 5.

Table 5 Predictive characteristics of STOP-BANG using different BMI cutoffs

It is debatable whether the change in test parameters would make changing the BMI cutoff to 26 a clinically relevant improvement to the STOP-BANG tool, as the increase in sensitivity sacrifices the specificity of the tool. The results, however, serve as a useful reminder that especially in Asians, OSA and other morbidity may be associated with a lower BMI [25]. This has been attributed to differences in soft tissue and craniofacial morphology (longer soft palate, inferior placement of base of tongue, and increased craniocervical extension) [26]. In our cohort, the proportion of patients who were identified by a change in cutoff of BMI from 35 to 30 doubled from 14% to 28% and increased further to 55% when BMI > 26 was used. We would suggest that using the World Health Organization definition of obesity [27] and taking a cutoff of BMI > 30 would be a practical compromise, which also makes the numbers much easier to remember (BMI, 30; neck circumference, 40 cm; age, 50 years; i.e., 30.40.50) without affecting the performance of the tool. Using different cutoffs for neck circumference did not improve the performance of the STOP-BANG tool.

The strength of our study is that we were able to test the STOP-BANG questionnaire in a reasonably sized population. One weakness is that it was undertaken in a high-prevalence setting (50% of our patients had AHI ≥ 15), where sensitivity may be expected to be high. The corollary is relatively low specificity (applying the STOP-BANG tool, only 78/319 patients, or 29%, are categorized as low risk); hence, a large number of patients will still have to undergo more formal testing. However, looking at Chung’s original validation study [15], among the cohort of surgical patients who actually underwent PSG testing, the prevalence of at least moderate OSA with an AHI ≥ 15 was remarkably similar, 39.5%. In this cohort, a very large sample (2467) of surgical patients completed the questionnaire and were invited to do an overnight PSG, of whom only 416 (17%) actually came for the PSG. The sensitivity of STOP-BANG in this cohort in detecting patients with an AHI of ≥15 and ≥30 was 74.3% and 79.5%, respectively. This suggests that the performance of this tool is likely to be reasonable even in a much more general pool of patients, making it suitable for screening purposes.

In our patient population, the vast majority of patients had PSG to exclude sleep disordered breathing; only 15 patients had other indications such as exclusion of narcolepsy or parasomnias listed. However, it was felt that their inclusion in the study would bring some balance to an otherwise extremely high-prevalence population; that is, these patients would be expected to somewhat lower the prevalence of OSA in the test population and hence decrease the sensitivity and positive predictive value of the test. Despite their inclusion, however, the results still showed excellent sensitivity. One would expect that with use in a lower-prevalence population, the negative predictive value of the instrument would improve, although this of course will still need to be tested in a community study.
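The dependence of predictive values on prevalence follows directly from Bayes' rule and can be sketched as below. The sensitivity and specificity are the values reported above for moderate OSA at the original BMI cutoff (91.3% and 40.4%); the function name and the lower prevalence of 10% are hypothetical choices for illustration, and the sketch assumes that sensitivity and specificity stay the same in the new population, which is exactly what a community study would need to verify.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.

    Works with proportions of the whole population rather than counts.
    """
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Reported performance for moderate OSA at the original cutoff.
SENSITIVITY, SPECIFICITY = 0.913, 0.404

# 0.50 approximates the sleep-unit setting; 0.10 is a hypothetical
# lower-prevalence population chosen only for illustration.
for prevalence in (0.50, 0.10):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions the NPV rises and the PPV falls as prevalence drops, consistent with the expectation stated above.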

The performance of this tool among other cohorts of patients, especially among patients with cardiac disease and heart failure, remains to be elucidated.

In summary, we feel that the STOP-BANG questionnaire is easy to use and holds great promise as a simple screening tool for OSA. Use of a cutoff of 30 for scoring BMI in the STOP-BANG tool may simplify its use without compromising its accuracy.