Introduction

Obstructive sleep apnea (OSA) is highly prevalent in the patient population evaluated for bariatric surgery. Reported prevalence of OSA in this population ranges from 60% to 83% [1–3]. Haines et al. showed an overall OSA prevalence of 83% in 349 patients screened for bariatric surgery who were thought to be at high risk on the basis of an Epworth Sleepiness Scale (ESS) score ≥ 6 or a clinician examination [1].

The primary goal of our study was to use demographic and questionnaire data (without polysomnography (PSG)) to identify a subgroup of patients who were evaluated for bariatric surgery and did not have OSA. We hypothesized that, by utilizing preoperative OSA symptom questionnaires in combination with menopausal status and body mass index (BMI) data, we would be able to predict which bariatric surgery candidates did not have sleep apnea without the use of PSG. In addition, we hypothesized that preoperative OSA symptom questionnaires in combination with menopausal status and BMI data would allow us to predict which subjects had severe OSA (apnea–hypopnea index (AHI) > 30). We evaluated the utility of the multivariate apnea prediction (MAP) score as a tool to predict the relative risk of sleep apnea in a population of patients considered for bariatric surgery. In addition, we developed our own prediction models for sleep apnea (and different severities of sleep apnea) using age, BMI, gender, and OSA respiratory symptom score (obtained from the MAP questionnaire). The results of this study were presented in part, in poster format, at the American Thoracic Society meeting in Toronto, Canada, May 2008.

Materials and Methods

Subject Population

Three hundred forty-two consecutive patients were referred for overnight polysomnogram at the Penn Sleep Center, University of Pennsylvania, PA, USA. PSG was performed irrespective of clinical symptoms or findings suggestive of OSA.

Polysomnography

All patients underwent standard overnight PSG, which included electroencephalography (EEG), electrocardiography, pulse oximetry, chin and tibial electromyography, and nasal and oral airflow monitoring. The PSG variables were recorded by Sandman Elite® version 7.2 software. Polysomnograms were scored by technicians and interpreted by sleep physicians according to established criteria [4]. The apnea–hypopnea index was defined as the number of apneas and hypopneas occurring per hour of sleep. An obstructive apnea was defined as the absence of airflow for a period of 10 s or longer despite ongoing respiratory effort. A hypopnea was defined as a 30–99% decrease in airflow or respiratory effort associated with a reduction of O2 saturation by at least 3% or an EEG arousal. Sleep data parameters, including total sleep time, rapid eye movement (REM) time, sleep-onset latency, arousal index, sleep efficiency, REM latency, AHI, and REM AHI, were determined. The severity of OSA was classified as follows: mild apnea 5 ≤ AHI ≤ 15, moderate apnea 15 < AHI ≤ 30, and severe apnea AHI > 30.
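The AHI definition and severity bands above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sketch assumes that an AHI of exactly 5 counts as OSA, in line with the AHI ≥ 5 threshold used for the prediction models.

```python
def apnea_hypopnea_index(apneas: int, hypopneas: int, sleep_hours: float) -> float:
    """Apneas plus hypopneas per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (apneas + hypopneas) / sleep_hours


def osa_severity(ahi: float) -> str:
    """Map an AHI value to the severity bands used in this study.

    Assumption: AHI == 5 is treated as mild OSA (AHI >= 5 threshold).
    """
    if ahi < 5:
        return "none"
    if ahi <= 15:
        return "mild"
    if ahi <= 30:
        return "moderate"
    return "severe"


# Hypothetical night: 20 apneas and 40 hypopneas over 6 hours of sleep.
print(osa_severity(apnea_hypopnea_index(20, 40, 6.0)))
```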

Questionnaire Data

Subjects were asked to complete questionnaires, including the ESS and MAP, at the time of initial bariatric surgery evaluation or prior to PSG and at the time of clinical sleep consultation (see “Appendix 1”) [5–7]. All ESS scores were collected prior to initiation of continuous positive airway pressure (CPAP). BMI and information regarding menopausal status were also collected. The ESS score was obtained from a self-reported sleepiness questionnaire that included eight questions regarding sleepiness in various situations (over the previous 4-week period) [5, 6]. The ESS score ranged between 0 and 24, with scores higher than 10 reflecting sleepiness. The MAP score ranges between 0 and 1 and was obtained from a self-reported questionnaire that incorporated answers regarding apnea symptoms (snoring, snorting, loud gasping, and cessation of breathing) as well as BMI, age, and gender [7]. The OSA respiratory symptom score was calculated from questions 1, 5, and 8 on the MAP questionnaire, which pertained to respiratory symptoms suggestive of sleep apnea (see “Appendix 2”) [7].

Menopausal Status

Questions were asked regarding menopausal status in the questionnaire administered prior to PSG. In the event of unavailable questionnaire data regarding menopausal status, we classified patients in a manner used by Dancey et al. [8], based on the median age of perimenopausal transition and median age of menopause published by McKinlay (perimenopausal median age of onset 47.5 years and median age of menopause 51.3 years) [9].

Subjects’ menopausal status was classified in the following manner: (1) subjects with regular menstrual cycles, no missed menstrual cycle in the last year, no absence of menstrual cycle for 60 days in the last year, and no history of two missed cycles in the last 365 days were classified as premenopausal; (2) subjects with absence of menstrual cycle for ≥365 days were classified as postmenopausal; (3) subjects with irregular menstrual cycles and a last cycle within the last 365 days were classified as having “irregular cycles”; (4) when no data were available or the menstrual cycle questions were not answered, subjects <45 years old were classified as premenopausal, those ≥45 and <55 years old were considered to have insufficient data to classify menopausal status, and those ≥55 years old were classified as postmenopausal; and (5) patients <45 years old on active contraception were classified as premenopausal.
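The classification rules above could be expressed roughly as follows. This is a simplified sketch, not the procedure actually used: the function name, its parameters, and the order in which the rules are checked are all assumptions, rule (1) is reduced to a single "regular cycles" flag, and a subject aged exactly 55 with no questionnaire data is treated as postmenopausal.

```python
from typing import Optional


def classify_menopausal_status(
    age_years: float,
    days_since_last_cycle: Optional[int] = None,  # None = question not answered
    cycles_regular: Optional[bool] = None,        # None = question not answered
    on_contraception: bool = False,
) -> str:
    # Rule 5: women under 45 on active contraception.
    if on_contraception and age_years < 45:
        return "premenopausal"
    # Rule 2: no menstrual cycle for at least 365 days.
    if days_since_last_cycle is not None and days_since_last_cycle >= 365:
        return "postmenopausal"
    # Rules 1 and 3: cycle data available, last cycle within 365 days.
    if days_since_last_cycle is not None and cycles_regular is not None:
        return "premenopausal" if cycles_regular else "irregular cycles"
    # Rule 4: no usable questionnaire data, fall back to age bands.
    if age_years < 45:
        return "premenopausal"
    if age_years < 55:
        return "insufficient data"
    return "postmenopausal"  # assumption: age 55 itself falls in this band


print(classify_menopausal_status(50))             # no questionnaire data
print(classify_menopausal_status(48, 30, False))  # irregular, recent cycle
```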

Statistical Analysis

Statistical analyses were performed using SAS version 9.1. Descriptive comparisons of subject parameter means were performed. For normally distributed data, t tests were used. After checking for equality of variances, a pooled p value was used for data with equal variances; otherwise, the Satterthwaite-corrected p value was used. Means were reported ± standard deviation. For those measures that were not normally distributed, the Wilcoxon rank sum test was used. An analysis of variance test was performed for group comparisons with more than two levels using the general linear model procedure in SAS. A p value < 0.05 was considered statistically significant for group differences.
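For readers unfamiliar with the Satterthwaite correction, the following sketch shows how the unequal-variance t statistic and its approximate degrees of freedom are computed. It is not the SAS procedure used in the study; the function name and the sample values are hypothetical, and the p value itself (which requires the t distribution) is not computed here.

```python
import math
from statistics import mean, variance


def welch_satterthwaite(a, b):
    """Return (t statistic, Satterthwaite degrees of freedom) for two samples."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical samples with clearly unequal variances.
t_stat, df = welch_satterthwaite([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(df, 3))
```

When the two groups have equal size and equal variance, the Satterthwaite degrees of freedom reduce to the pooled value n1 + n2 − 2.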

Predictive models for three levels of apnea severity (AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour) were created using a logistic regression to test the association between sleep apnea and the selected factors (age, BMI, gender, MAP, ESS, menopausal status, and race). Quadratic terms and factors that were not significant were dropped from the model. Sensitivity, specificity, positive predictive values, negative predictive values (NPV), and likelihood ratios were computed for each of the three models.

Receiver operating characteristic curves were used to assess predictive utility of models to determine different levels of OSA severity (AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour). The discriminant index or c-statistic, equivalent to the area under the ROC curve, was computed from the model of predicted relative likelihoods. This provided a measure of the discriminatory capacity of the model.
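The equivalence between the c-statistic and the area under the ROC curve can be illustrated with a short pairwise computation: the c-statistic is the proportion of (subject with outcome, subject without outcome) pairs in which the subject with the outcome has the higher predicted score, with ties counted as one half. The function name and the example data below are hypothetical.

```python
def c_statistic(scores, outcomes):
    """Concordance probability; equals the area under the ROC curve."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one subject with and one without the outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted scores and true outcomes (1 = outcome present).
print(c_statistic([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```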

Negative likelihood ratios represent the probability of an individual with AHI ≥ 5 having a negative test relative to the probability of an individual with AHI < 5 having a negative test. The formula used to calculate negative likelihood ratios is false-negative rate/true-negative rate (or {(1 − sensitivity)/specificity}). Negative likelihood ratios provide information about the usefulness of a negative test result in ruling out disease.
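As a worked illustration of these quantities, the sketch below derives sensitivity, specificity, predictive values, and likelihood ratios from a 2 × 2 table. The function name and the counts are hypothetical and are not taken from the study data.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summary measures for a 2 x 2 table of test result versus true status."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Positive LR is unbounded when specificity is exactly 1.
        "lr_pos": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        # Negative LR = false-negative rate / true-negative rate.
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Hypothetical counts: 100 subjects with AHI >= 5, 100 with AHI < 5.
m = diagnostic_metrics(tp=90, fp=30, fn=10, tn=70)
print(round(m["npv"], 3), round(m["lr_neg"], 3))
```

A smaller negative likelihood ratio means that a negative result lowers the odds of disease more strongly.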

The study protocol was approved by the institutional review board of the University of Pennsylvania.

Results

General Demographics

Three hundred forty-two consecutive subjects underwent PSG at the Penn Sleep Center, Philadelphia, PA, USA from November 1, 2005 to January 31, 2007 as part of an evaluation prior to bariatric surgery. Our sample included 82% women and 18% men. The mean BMI (kg/m2) was 49.5 ± 10, and mean age (years) was 43.8 ± 10.9. The distribution of race was 50% white, 48% African–American, and 2% other (Asian and Hawaiian–Pacific Islander). Population characteristics are displayed by gender and menopausal status in Table 1.

Table 1 Population characteristics by gender and menopausal status

Sleep Apnea Data

The overall sample prevalence of OSA was 77.2%: 30.7% of the sample had mild OSA, 19.3% had moderate OSA, and 27.2% had severe OSA. The prevalence of OSA was 93.6% among men and 73.5% among women.

Men had more severe sleep apnea than all women (including all subgroups based on menopausal status). Compared to the subgroup of postmenopausal women, men still had more severe sleep apnea. The mean AHI for men (48.6) with OSA was significantly greater (p < 0.0001) than the AHI for women (26.3 events per hour) with OSA. The mean AHI was lower for premenopausal women compared to postmenopausal women (25.6 events per hour compared to 29.1 events per hour, respectively)—however, the difference was not statistically significant (p = 0.06). The prevalence of OSA disease severity, grouped by gender and menopausal status, is shown in Table 1.

Table 2 depicts mean BMI stratified by OSA disease severity in addition to gender and menopausal status subgroups. In both men and women, BMI mean was greater in the groups with severe OSA compared to mild OSA; however, in men, this difference did not reach statistical significance. Among men, the mean BMI for subjects with mild OSA was 45.9 kg/m2 while the mean BMI for subjects with severe OSA was 51.5 kg/m2 (p = 0.08). The mean BMI was significantly greater (p = 0.0004) in women with severe OSA (54.6 kg/m2) than in women with mild OSA (48.8 kg/m2).

Table 2 BMI and AHI distribution across OSA category and gender/menopausal status

We used the ESS to examine sleepiness in this population. Mean ESS scores are shown in Table 1 with corresponding gender and menopausal status. The mean ESS score for the sample was 8.7. The mean ESS for subjects with OSA (9.0) was significantly greater (p = 0.04) than for those subjects without OSA (7.7). However, after controlling for gender and menopausal status, this difference was not quite statistically significant (p = 0.06). There was no significant difference in ESS mean among gender and menopausal status subgroups for the overall sample (p = 0.76). Similarly, when only considering patients with OSA, there was no significant difference in ESS mean among gender and menopausal status subgroups (p = 0.44).

ESS scores are also displayed in Table 3, stratified by presence and severity of OSA. Table 3 depicts the prevalence of sleep apnea as the ESS score decreases (as subjects report less sleepiness); 28.7% of the sample had ESS scores <6. Among patients with an ESS < 6, the prevalence of OSA was 72.4%, and the prevalence of moderate or severe OSA was 35.7%. An ESS < 6 is an arbitrary cutoff point that has been used by other authors to exclude performing PSG to screen bariatric candidates for OSA [1].

Table 3 Epworth score by AHI category

We used the MAP score to predict the relative risk of sleep apnea in this population. MAP scores are displayed in Table 4, stratified by presence and severity of OSA. Table 4 shows the prevalence of sleep apnea as the MAP score decreases. The utility of the MAP score as a tool to predict sleep apnea has been described in a general population of patients that attended a sleep clinic [7]. Table 4 demonstrates the MAP score’s potential utility as a sleep apnea prediction tool in our bariatric population. Based on the properties of the MAP predictive model, the patients least likely to have sleep apnea have the lowest MAP scores. However, two of the five patients with the lowest MAP scores in our sample (in the range of 0.2–0.3) had OSA. Hence, even if the most stringent cutoff point were chosen, the MAP model would fail to correctly predict which subjects did not have OSA (i.e., AHI < 5 events per hour). In our population, no subjects had MAP scores of <0.2; hence, it was not possible to ascertain the predictive abilities of the MAP model in a hypothetical bariatric subject with a MAP score of <0.2.

Table 4 MAP score by AHI category

We used regression models to predict OSA. Separate logistic regression models were developed for the following three outcomes: AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour. Age, BMI, gender, and sleep apnea symptom score (MAP) showed significant associations with the three levels of OSA severity and were used in the final models. ESS, menopause, and race did not add to these models. When predicting AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour, the area under the curve (AUC) values were 0.8, 0.72, and 0.8, respectively. The AUC represents the discriminative ability of the prediction model, and an AUC ≥ 0.8 is traditionally regarded as representing excellent discriminative ability. The discriminative ability of the model reflects the model’s ability to correctly predict the outcome in question (namely, three different levels of OSA severity: AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour).

Table 5 shows negative predictive values and negative likelihood ratios when predicting AHI ≥ 5. Logistic regression scores were stratified by decile. Individual logistic regression scores were calculated by inserting a set of variables (BMI, gender, age, and sleep apnea symptom score) into the logistic regression equation. As the logistic regression equation always yields a score between 0 and 1, individual scores were stratified into ten groups between 0 and 1 (deciles). Due to the nature of the model that we developed, the lowest scores, i.e., those in the lowest score deciles, represent the subjects that were least likely to have AHI ≥ 5. We would expect the negative predictive value to be greatest with the most stringent cutoff values, which occur in the lowest decile (in this model 0.2–0.299—as no individual scores were <0.2 for this model).
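The scoring and decile stratification described above can be sketched as follows. The coefficients are entirely HYPOTHETICAL placeholders chosen for illustration, since the fitted equation is not given in the text; the function names and the coding of gender and symptom score are likewise assumptions.

```python
import math

# HYPOTHETICAL coefficients, for illustration only (not the fitted model).
COEF = {"intercept": -6.0, "age": 0.05, "bmi": 0.06, "male": 1.5, "symptom": 0.8}


def logistic_score(age: float, bmi: float, male: bool, symptom_score: float) -> float:
    """Predicted probability from a logistic regression equation (always in 0..1)."""
    z = (
        COEF["intercept"]
        + COEF["age"] * age
        + COEF["bmi"] * bmi
        + COEF["male"] * (1.0 if male else 0.0)
        + COEF["symptom"] * symptom_score
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_decile(score: float) -> int:
    """Index 0..9 of the tenth-wide band containing the score (2 means 0.2-0.299)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    return min(int(score * 10), 9)  # a score of exactly 1.0 goes in the top band


s = logistic_score(age=44, bmi=49.5, male=False, symptom_score=1.0)
print(score_decile(s))
```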

Table 5 Logistic regression model for outcome AHI ≥ 5 events per hour

First, only eight of a possible 258 subjects (3%) had logistic regression scores between 0.2 and 0.299, showing that this stringent cutoff is not applicable to the majority of the population and thus is not clinically useful. For example, if this model (with this stringent cutoff) were applied to a hypothetical bariatric population, only 3% of subjects would be predicted not to have OSA (hence very limited clinical utility). Second, even when employing this cutoff (the most stringent cutoff possible), the corresponding negative predictive value was only 75%. In Table 5, the scores in the lowest deciles represent individuals least likely to have AHI ≥ 5; therefore, the negative likelihood ratios, calculated as (1 − sensitivity)/specificity, are lowest (most favorable) for the lowest deciles. In other words, the false-negative rate is low, and the true-negative rate is much higher. Unlike the negative predictive values, the negative likelihood ratios are not affected by the prevalence of the outcome (AHI ≥ 5).
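The dependence of the negative predictive value on prevalence follows from Bayes' rule and can be sketched as below. The sensitivity and specificity values are hypothetical and chosen only for illustration; the prevalence of 0.772 is the overall OSA prevalence reported for this sample, and the function name is an assumption.

```python
def npv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value for a given prevalence (Bayes' rule)."""
    true_neg = specificity * (1.0 - prevalence)   # disease-free and test-negative
    false_neg = (1.0 - sensitivity) * prevalence  # diseased but test-negative
    return true_neg / (true_neg + false_neg)


# Same hypothetical test applied at two prevalences.
for prev in (0.2, 0.772):
    print(prev, round(npv_from_prevalence(0.99, 0.10, prev), 3))
```

With the test characteristics held fixed, so that the negative likelihood ratio is unchanged, the negative predictive value falls as prevalence rises.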

Tables 6 and 7 show negative predictive values and negative likelihood ratios when predicting AHI > 15 and AHI > 30, respectively. Logistic regression scores were stratified by decile. As in Table 5, the highest negative predictive values and the most favorable (lowest) negative likelihood ratios were expected with the lowest scores generated by the prediction models. The highest negative predictive value for predicting AHI > 15 was 82.35% (applicable to logistic regression scores that fall in the 0.1–0.2 score decile). However, only 17 of 258 possible subject scores fell in this score decile range. The highest negative predictive value for predicting AHI > 30 was 96.8% (applicable to logistic regression scores that fall in the 0–0.1 score decile). Of 258 possible subject scores, 62 fell in this score decile range (approximately one quarter of the study population). These data indicate that it is possible to predict (with relatively high certainty) which bariatric subjects will not have AHI > 30 (severe sleep apnea). However, there are no data to suggest that being able to predict this preoperatively in the bariatric population has meaningful clinical utility.

Table 6 Logistic regression model for outcome AHI > 15 events per hour
Table 7 Logistic regression model for outcome AHI > 30 events per hour

Table 8 shows positive predictive values and positive likelihood ratios when predicting AHI > 30. The percentage of the population with prediction scores within each decile is shown. The highest positive predictive value for predicting AHI > 30 was 83.3% (applicable to logistic regression scores that fall in the 0.7–0.8 score decile). Of 258 possible subject scores, 12 fell in this score decile range (4.7% of the study population). In a clinical context, this means our prediction model can be used in a hypothetical bariatric population to predict the presence of severe OSA (AHI > 30). The model can be useful as a risk assessment tool for severe apnea (AHI > 30), with relative risk increasing with increasing scores.

Table 8 Logistic regression model for outcome AHI > 30 events per hour (positive predictive values)

Predictive models were developed with and without respiratory symptom scores for the three possible outcomes: AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour. The models without the OSA respiratory symptom score were generated using all 342 subjects’ data while the models that utilized the OSA respiratory symptom scores were generated using data from 258 subjects (only 258 of 342 subjects had completed MAP questionnaires). The predictive models that did not utilize the OSA respiratory symptom score generated models with less discriminative abilities than those models that utilized the OSA symptom score. When predicting AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour (utilizing OSA respiratory scores, gender, age, and BMI), the AUC values were 0.8, 0.72, and 0.8, respectively (n = 258). In comparison, when utilizing only gender, age, and BMI in the development of the predictive models for AHI ≥ 5 events per hour, AHI > 15 events per hour and AHI > 30 events per hour (n = 342), the AUC values were 0.75, 0.7, and 0.76, respectively. It is evident that BMI, gender, and age are the major variables that account for the discriminative power of the predictive models. While OSA respiratory symptom score does contribute to the discriminative power of the model, the contribution is minor.

Discussion

Our study describes the characteristics of patients with sleep apnea that were evaluated for bariatric surgery. We examined various clinical and historical data (including OSA symptom questionnaires and menopausal status) in an effort to correlate these characteristics with OSA severity. This study population represented a large number of patients considered for bariatric surgery that underwent full overnight PSG to screen for OSA, regardless of suspected risk for OSA. Prior studies have demonstrated that the transition to menopause is associated with a higher risk of OSA [10–12]. Our study was the first to incorporate menopausal status in addition to OSA symptom scores and BMI in an effort to predict the prevalence and severity of OSA before bariatric surgery.

Characteristics of Sleep Apnea in a Bariatric Population

Our study found a very high prevalence (77.2%) for OSA in all subjects evaluated for bariatric surgery regardless of preoperative risk for OSA. Similarly, other authors have previously shown a high prevalence of OSA prior to bariatric surgery; however, our series is the largest series to date of patients that underwent full PSG regardless of preoperative risk of OSA [1, 2, 13].

In both men and women, ESS and BMI increase with increasing OSA severity. Interestingly, the mean AHI was higher for premenopausal compared to postmenopausal women. This may be explained by the fact that the premenopausal group had a higher BMI than the postmenopausal group (51.9 ± 11.3 vs. 49 ± 8.9). However, the difference was not statistically significant (p = 0.10). Furthermore, the higher number of African–American females in the premenopausal group (African–Americans are known to be at higher risk for OSA) may have led to this unexpected result [14].

Limitations

Our study limitations are related to data loss. MAP questionnaire data (and apnea symptom questionnaire data) were available for only 75% of patients (258 of 342). However, comparison of the group of patients with available MAP questionnaires and those without available questionnaires revealed no significant difference between the groups in terms of BMI, age, gender, race, ESS, and OSA severity. The menopausal questionnaire data were not available for 77 of 279 female subjects, and for 27 female subjects menopausal status could not be determined either from questionnaire data or by using the age-specific cutoffs (as described in the methodology).

Do All Patients Who are Considered for Bariatric Surgery Require a Polysomnogram?

Debate exists as to whether all patients that are considered for bariatric surgery should undergo preoperative PSG evaluation and screening for OSA, as opposed to only those patients with a clinical history or examination suggestive of sleep apnea. In the study reported by Haines et al., of 597 consecutive patients considered for bariatric surgery, 150 subjects with an ESS < 6 did not undergo PSG evaluation for OSA. In addition, 98 subjects with ESS ≥ 6 were not considered to be at high risk for OSA and did not undergo PSG evaluation, based on a pulmonologist’s examination (“symptoms were mild and not attributed to airway obstruction”). The authors state in their discussion that the ESS cutoff of <6 is arbitrary [1]. Furthermore, no data exist to support the premise that a clinical history and examination, even by a sleep specialist, ensure adequate sensitivity to screen for OSA in a bariatric population.

The reason that OSA cannot be missed in the bariatric population is that unidentified and untreated OSA may lead to significant perioperative and postoperative complications. Hwang et al. used overnight home pulse oximetry prior to surgery to assess 172 patients whose clinical examination and symptoms were consistent with OSA [15]. Fifty-seven percent of patients in this study had more than five episodes per hour of oxygen desaturation (defined as O2 desaturations > 4%). This group of patients had a higher frequency of postoperative complications (88% of the total complications) than the group of patients with fewer than five episodes per hour of oxygen desaturation >4% (12% of total complications). The complications were primarily respiratory and cardiovascular [15].

We assessed the MAP score as a possible tool to predict the presence of sleep apnea in the bariatric population and in addition developed our own prediction models for three levels of OSA severity (AHI ≥ 5 events per hour, AHI > 15 events per hour, and AHI > 30 events per hour). The MAP score serves as an indicator of relative risk for sleep apnea. The utility of this score, as a relative risk indicator for sleep apnea, varies in different populations. Prevalence of sleep apnea, comorbidities, and baseline characteristics of the population affect the manner in which the MAP score functions as a relative risk indicator for sleep apnea. The choice of different cutoff points for the MAP score, applied to the same population group, will result in different positive and negative predictive values for sleep apnea.

In a population of patients evaluated in sleep centers, a MAP cutoff point of 0.5 generated a positive predictive value for sleep apnea of 0.75 and a negative predictive value for sleep apnea of 0.74 [7]. Utilizing a different cutoff in the same population would result in different positive and negative predictive values for sleep apnea. In a clinical setting, the cost of a false-negative prediction is weighed against the cost of a false-positive prediction, when determining the cutoff point that would be clinically most appropriate.

In the setting of a bariatric population undergoing evaluation for sleep apnea, a false-negative result (i.e., missing a subject with sleep apnea preoperatively) may lead to deleterious consequences in the perioperative and postoperative period. On the other hand, a false-positive result would lead to a subject unnecessarily undergoing PSG to investigate for suspected sleep apnea. Therefore, the most appropriate prediction strategy for the bariatric population should employ cutoff points that maximize the negative predictive value for sleep apnea (i.e., yield the fewest possible false-negative results). Table 4 illustrates the effect of employing increasingly stringent cutoff points using the MAP model in the bariatric population. It is evident that even if the most stringent cutoff point was used (MAP score < 0.3), it is not always possible to correctly predict the absence of OSA in our bariatric population.

The logistic regression model that we developed for presence of OSA (AHI ≥ 5 events per hour) has excellent discriminative ability (evidenced by an AUC value of 0.8). However, the negative predictive values for OSA that can be generated by utilizing even the most stringent cutoff possible are not high enough to be clinically useful (highest possible NPV = 75%). Since OSA is highly prevalent in the population of patients that is considered for bariatric surgery, the pretest probability of having sleep apnea is high. As a result, the use of a prediction model utilizing age, BMI, gender, and sleep apnea symptom score (as well as ESS, menopausal status, and race) to further risk stratify this group for presence of sleep apnea did not yield adequate negative predictive values.

It is possible to use the prediction model that we developed as a tool to assess risk of severe OSA (AHI > 30) in the bariatric population. Risk is dependent on the individual logistic regression score that is generated for an individual subject based on BMI, gender, age, and apnea score. Positive predictive values and corresponding positive likelihood ratios for severe OSA (AHI > 30) are displayed in Table 8.

Summary

Our data demonstrate that the use of clinical indicators including ESS, MAP, menopausal status, gender, BMI, and age cannot reliably predict the presence of OSA in the population that is considered for bariatric surgery. Establishing a diagnosis of OSA prior to bariatric surgery has important implications for perioperative airway management as well as potential postoperative pulmonary complications. As routine perioperative and postoperative management of bariatric surgery subjects includes anesthesia and analgesia, it is likely that the severity of OSA worsens during the perioperative and postoperative course. Therefore, it is important to identify even mild OSA (5 ≤ AHI ≤ 15) prior to bariatric surgery. The use of a prediction strategy for OSA that is less than 100% sensitive in this high-prevalence population could have deleterious implications for perioperative and postoperative morbidity. Portable sleep studies may have a future role in the diagnosis of OSA in the bariatric population, but at present data are lacking to support their use in lieu of full PSG in this setting.

Conclusion and Guidelines

As the prevalence of OSA in all patients screened for bariatric surgery in our population was greater than 77%, we advocate routine PSG testing for all patients that are undergoing weight loss surgery. PSG testing is recommended for all bariatric surgery candidates approximately 8–12 weeks prior to the planned surgery. Those candidates that are diagnosed with OSA should start CPAP therapy at least 6 weeks prior to surgery (this will give them enough time to acclimate to CPAP). CPAP therapy for patients with OSA should be used in the postoperative period to prevent airway complications at settings determined during the preoperative PSG. OSA will worsen in the immediate postoperative period secondary to anesthetics/narcotics, being supine, and airway inflammation from the endotracheal tube. Therefore, patients with OSA having bariatric surgery should be extubated to CPAP and the extubation should not be performed until the patient is awake. Furthermore, CPAP therapy should be continued after bariatric surgery for 1 year. With weight loss from the bariatric surgery, the patient’s OSA should improve (however, the patient should remain on CPAP until this improvement is documented with a follow-up sleep study). Therefore, repeat PSG should be performed 1 year after the bariatric surgery in order to reevaluate the severity of OSA and determine the ongoing need for CPAP therapy.