Introduction

Obstructive sleep apnea (OSA) is the most common type of sleep-disordered breathing (SDB), affecting 9−25% of the general adult population [1]. It is characterized by recurrent obstruction of the pharyngeal airway during sleep, nocturnal hypoxemia, and excessive daytime sleepiness. Several population-based studies have reported a strong and independent association of OSA with hypertension, stroke, myocardial infarction, diabetes, long-term cognitive impairment, and increased all-cause mortality [2,3,4,5]. Despite the substantial burden of this disease, up to 80% of individuals with moderate-to-severe OSA remain undiagnosed [6].

Although overnight polysomnography (PSG) is the gold standard for diagnosing the presence and severity of OSA, its high cost, limited accessibility, and time-consuming nature can delay the diagnosis and treatment of OSA. Therefore, a brief and precise screening tool for identifying patients who are at high risk of OSA and triaging them for prompt diagnosis and treatment is clinically relevant, particularly in areas with limited healthcare resources. A number of screening tests have been developed to identify high-risk patients. However, many of these tests are complicated and include several subjective items, which makes them inconvenient to use and may increase variability in clinical practice.

The NoSAS score is a new screening tool based on five items (neck circumference, obesity, snoring, age, and sex). It was developed in a population-based cohort of 2121 subjects in Switzerland (HypnoLaus) and subsequently validated in a Brazilian cohort of 1042 subjects (EPISONO) [7]. Defining SDB as an apnea−hypopnea index (AHI) ≥ 20 events/h, the authors found that the NoSAS score had high negative predictive values (NPVs) of 90% and 98% in these cohorts, respectively. Moreover, it outperformed the STOP-Bang and Berlin questionnaires, as evidenced by higher area under the curve (AUC) values [7]. In a multiethnic Asian population-based cohort study, Tan et al. [8] found that the NoSAS score performed similarly to the STOP-Bang and Berlin questionnaires; all three scores had high NPVs in ruling out severe SDB. In a hospital-based retrospective study in China, Hong et al. [9] reported that the NoSAS score performed significantly better than the STOP questionnaire, the STOP-Bang questionnaire, and the Epworth Sleepiness Scale (ESS), and on par with the Berlin questionnaire for SDB screening. The variation in the predictive parameters among studies may be due to differences in sample size and in the demographic and anthropometric characteristics and associated comorbidities of the recruited subjects. Given the important age- and sex-related differences in the prevalence, anthropometry, and clinical presentation of OSA, we hypothesized that the performance of the NoSAS score for predicting OSA of different levels of severity differs by age or sex in a clinical population. The objective of the present study was to investigate and compare the predictive parameters of the NoSAS score among subjects of different age or sex according to the severity of OSA and to validate its value as a screening tool in these populations. Additionally, we aimed to determine age- or sex-specific cutoff values to predict OSA in these populations.

Methods

Participants

This was a cross-sectional retrospective study of consecutive subjects suspected of having SDB and referred for overnight PSG at Peking University First Hospital between January 2006 and September 2017. Inclusion criteria were (a) age ≥ 18 years, (b) total sleep time ≥ 4 h during PSG, and (c) complete clinical information for calculating the NoSAS score. We excluded patients previously diagnosed with OSA, those undergoing noninvasive ventilation pressure titration during PSG, those with additional diagnoses obtained during PSG (such as central sleep apnea syndromes and sleep-related hypoventilation disorders), and all cases with technically inadequate PSG. The study protocol was approved by the Ethics Committee of Peking University First Hospital, which waived the requirement for patient consent.

Data extraction

The following data, which were collected by a technician prior to PSG, were extracted from medical records: neck circumference (NC), body mass index (BMI), snoring, age, and sex. The participants’ NoSAS scores were retrospectively calculated from the available data. The NoSAS score allocates 4 points for an NC > 40 cm, 3 points for a BMI of 25 to < 30 kg/m², 5 points for a BMI of 30 kg/m² or more, 2 points for snoring, 4 points for being older than 55 years, and 2 points for being male. Subjects with 8 points or higher are considered at high risk of OSA [7].
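The scoring rule described above can be written as a short function. This is a minimal sketch, not the code used in the study: the function and parameter names are hypothetical, and a BMI of exactly 30 kg/m² is assumed to earn 5 points, following the "30 or more" wording.

```python
def nosas_score(neck_cm, bmi, snoring, age_years, male):
    """Return the NoSAS score (0 to 17) from the five items.

    All parameter names are illustrative assumptions, not taken from the study.
    """
    score = 0
    if neck_cm > 40:       # neck circumference > 40 cm: 4 points
        score += 4
    if bmi >= 30:          # BMI of 30 kg/m2 or more: 5 points
        score += 5
    elif bmi >= 25:        # BMI of 25 to < 30 kg/m2: 3 points
        score += 3
    if snoring:            # snoring: 2 points
        score += 2
    if age_years > 55:     # older than 55 years: 4 points
        score += 4
    if male:               # male sex: 2 points
        score += 2
    return score


def is_high_risk(score, cutoff=8):
    """Flag high risk of OSA; 8 points is the conventional cutoff [7]."""
    return score >= cutoff
```

For instance, a 60-year-old man who snores and has an NC of 41 cm and a BMI of 31 kg/m² receives 4 + 5 + 2 + 4 + 2 = 17 points, the maximum. Keeping the cutoff as a parameter makes it straightforward to evaluate alternative thresholds.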

Sleep study

All sleep studies were conducted either on a Grael diagnostic sleep system (Compumedics, Australia) or on an EMBLA N7000 digital system (Embla Systems Inc., Broomfield, CO, USA). The following were performed: electroencephalography, electrooculography, electromyography, snore detection, airflow monitoring, respiratory effort assessment, pulse oximetry, electrocardiography, and body position monitoring. PSG records from before 2012 were rescored manually by a certified technician, subsequently reviewed by a sleep physician, and interpreted in accordance with the same American Academy of Sleep Medicine criteria [10] as the rest of the records. These criteria define apnea as a ≥ 90% reduction in airflow from baseline for ≥ 10 s and hypopnea as a ≥ 30% reduction in airflow from baseline for ≥ 10 s, accompanied by desaturation ≥ 3% or an arousal. Both the technician and the physician were blinded to the NoSAS results. The apnea index (AI), hypopnea index (HI), and AHI were calculated as the number of apneas, hypopneas, and apneas plus hypopneas per hour of sleep, respectively. The diagnosis of OSA was based on an AHI ≥ 5 events/h with more than 50% of all respiratory events being obstructive. The severity of OSA was classified as follows: mild (5 ≤ AHI < 15 events/h), moderate (15 ≤ AHI < 30 events/h), and severe (AHI ≥ 30 events/h). We defined clinically significant OSA as an AHI ≥ 20 events/h, in line with the NoSAS score study [7]. Indices of nocturnal hypoxemia were the following: mean SpO2, percentage of recording time with SpO2 below 90%, minimal SpO2 value recorded during sleep, and oxygen desaturation index (ODI), which was defined as the number of scored desaturations ≥ 3% per hour of sleep.
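The index calculation and severity bands described above can be illustrated as follows. This is a minimal sketch under stated assumptions: the function names, argument names, and the "none" label for an AHI below 5 events/h are hypothetical, and the obstructive-event criterion is reduced to a single fraction supplied by the caller.

```python
def events_per_hour(n_events, total_sleep_time_h):
    """Index (AI, HI, or AHI) as the number of events per hour of sleep."""
    if total_sleep_time_h <= 0:
        raise ValueError("total sleep time must be positive")
    return n_events / total_sleep_time_h


def has_osa(ahi, obstructive_fraction):
    """OSA: AHI >= 5 events/h with more than 50% of events obstructive."""
    return ahi >= 5 and obstructive_fraction > 0.5


def osa_severity(ahi):
    """Map an AHI (events/h) to the severity bands used in the text."""
    if ahi < 5:
        return "none"      # assumed label for AHI below the diagnostic threshold
    if ahi < 15:
        return "mild"      # 5 <= AHI < 15
    if ahi < 30:
        return "moderate"  # 15 <= AHI < 30
    return "severe"        # AHI >= 30
```

As an example, 210 apneas plus hypopneas over 6 h of sleep give an AHI of 35 events/h, which falls in the severe band.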

Statistical analysis

Statistical analysis was performed with Stata, version 14.0 (College Station, TX, USA). Data are presented as mean ± standard deviation or median with interquartile range (IQR) as appropriate for continuous variables, and as number (percentage) for categorical variables. Groups were compared with the chi-square test or Fisher exact test (for dichotomous variables), Student’s t test (for normally distributed continuous variables), or the Mann−Whitney U test (for non-normally distributed continuous variables). Discrimination at six AHI thresholds (5, 10, 15, 20, 25, and 30 events/h) was estimated from the AUC of receiver operating characteristic (ROC) curves, which ranges from 0.5 (no discrimination) to 1.0 (perfect discrimination), and was compared between subpopulations (men vs. women, elderly vs. non-elderly). An AUC > 0.7 was considered clinically significant. Using 2 × 2 contingency tables, the following parameters were calculated: sensitivity, specificity, likelihood ratio positive (LRP), likelihood ratio negative (LRN), accuracy (rate of correct classification), and diagnostic odds ratio (DOR). All statistical tests were two-sided, and values of p < 0.05 were considered statistically significant.
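The parameters derived from a 2 × 2 contingency table follow standard definitions, sketched below. This is an illustration rather than the Stata procedure used in the study; the function name and the dictionary keys are hypothetical, and all four cells are assumed to be non-zero so that no ratio divides by zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Screening metrics from a 2 x 2 table (score result vs. PSG reference).

    tp/fn: subjects with the condition who screen positive/negative.
    fp/tn: subjects without the condition who screen positive/negative.
    Assumes every cell is non-zero.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    false_positive_rate = fp / (fp + tn)   # 1 - specificity
    false_negative_rate = fn / (tp + fn)   # 1 - sensitivity
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lrp": sensitivity / false_positive_rate,
        "lrn": false_negative_rate / specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "dor": (tp * tn) / (fp * fn),      # equals lrp / lrn
    }
```

With hypothetical counts of 80 true positives, 30 false positives, 20 false negatives, and 70 true negatives, the sensitivity is 0.80, the specificity 0.70, and the accuracy 0.75. Note that accuracy depends on prevalence, whereas the likelihood ratios and the DOR do not.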

Results

Participant characteristics

Of the 1627 subjects who underwent overnight PSG, 1119 (68.8%) were included in the study. The exclusions consisted of 103 with a total sleep time < 4 h, 5 younger than 18 years, 186 with a previous diagnosis of OSA and/or noninvasive ventilation pressure titration during PSG, 166 with incomplete information and/or technical error during data collection, and 48 with additional diagnoses obtained during PSG (Fig. 1). The excluded subjects did not differ from the study population in age, sex, NC, snoring, or BMI. For the study population, the median age, BMI, NC, and AHI were 58.0 years (IQR 49.0, 71.0), 26.7 kg/m² (IQR 24.2, 29.1), 40.0 cm (IQR 38.0, 43.0), and 35.6 events/h (IQR 21.4, 55.9), respectively. There were 893 (79.8%) men and 226 (20.2%) women. Compared with men, women were older and had a lower proportion of snorers; lower median BMI, NC, AHI, and AI; and less severe desaturation, but a higher median HI (all with p < 0.05). There were 403 (36.0%) elderly participants (aged ≥ 65 years) and 716 (64.0%) non-elderly participants (aged < 65 years). Compared with the non-elderly, the elderly had a lower proportion of men and snorers; lower median BMI, NC, and AI; and less severe desaturation, but a higher median HI (all with p < 0.05) (Table 1).

Fig. 1 Study protocol and flow diagram. OSA, obstructive sleep apnea; PSG, polysomnography; TST, total sleep time

Table 1 Demographic, anthropometric, and sleep data in different populations

Prevalence of OSA

Mild OSA was present in 13.8%, 14.2%, 12.9%, 11.8%, and 21.7% of all, non-elderly, elderly, male, and female participants, respectively. Moderate OSA was present in 24.0%, 22.9%, 26.1%, 24.4%, and 22.6% of all, non-elderly, elderly, male, and female participants, respectively. Severe OSA was present in 59.8%, 59.8%, 59.8%, 61.8%, and 51.8% of all, non-elderly, elderly, male, and female participants, respectively (Table 1). The prevalence of OSA did not differ significantly between the non-elderly and the elderly. The prevalence of AHI ≥ 15, ≥ 20, and ≥ 30 events/h was significantly higher in men than in women (86.2% vs. 74.3%, 80.2% vs. 65.0%, and 61.8% vs. 51.8%, respectively, all with p < 0.01).

Diagnostic value

Table 2 shows the demographic data and characteristics of the different populations classified by NoSAS score into high- and low-risk groups. Participants classified as high risk were more often men and snorers, were older, and had higher BMI and NC; consistent with this, their PSG data showed higher AHI, higher ODI, and more severe desaturation. However, in the elderly and in men, the lowest oxygen saturation did not differ significantly between the high- and low-risk groups, and in the elderly, age did not differ significantly between the two groups.

Table 2 Demographic, anthropometric, and sleep data (comparison between low-risk group and high-risk group in different populations)

Overall, a NoSAS score of 8 points or higher predicted an AHI of ≥ 20 events/h with a sensitivity, specificity, accuracy, and AUC of 79.4%, 35.9%, 69.4%, and 0.63, respectively. The corresponding values were 72.9%, 46.4%, 66.8%, and 0.65 in the non-elderly; 90.7%, 16.7%, 74.2%, and 0.59 in the elderly; 84.9%, 18.1%, 71.7%, and 0.56 in men; and 52.4%, 76.0%, 60.6%, and 0.71 in women (Table 3, Fig. 2). We also calculated the performance of the NoSAS score using AHI cutoffs of 5, 10, 15, 25, and 30 events/h. The AUCs of the NoSAS score at all AHI cutoff values were significantly lower in men than in women (all with p < 0.01), while the AUCs at AHI cutoff values of 5, 15, and 30 events/h were significantly lower in the elderly than in the non-elderly (p < 0.01, 0.05, and 0.05, respectively). In women, the score had a higher LRP and a higher specificity but a lower sensitivity than in men, with better overall discrimination (Table 3, Fig. 3).
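The AUC values reported above can be read as the probability that a randomly chosen subject with the condition has a higher score than a randomly chosen subject without it, with ties counted as one half. The sketch below computes this quantity by direct pairwise comparison; it is an illustration only, not the Stata routine used in the study, and the function name and the toy scores in the usage note are hypothetical.

```python
def auc_from_scores(scores_pos, scores_neg):
    """Nonparametric AUC from two lists of screening scores.

    scores_pos: scores of subjects with the condition (e.g. AHI >= 20 events/h).
    scores_neg: scores of subjects without the condition.
    Each positive-negative pair contributes 1 if the positive subject
    scores higher, 0.5 for a tie, and 0 otherwise.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

For example, `auc_from_scores([9, 11, 8], [5, 8, 7])` counts 8.5 favorable comparisons out of 9 pairs. Because the NoSAS score takes only a few integer values, ties are frequent, which is why the half-credit rule matters here.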

Table 3 Performance of NoSAS score of 8 points in different populations at different AHI cutoff values

Fig. 2 Receiver operating characteristic curves of the NoSAS score at AHI cutoff of ≥ 20 events/h in different populations. At an AHI cutoff of ≥ 20 events/h, the diagnostic performance of the NoSAS score was better in women than in men. AHI, apnea–hypopnea index

Fig. 3 Performance of the NoSAS score in different populations. The AUC of the NoSAS score was lower in the elderly and men than in their counterparts. *Compared with non-elderly, p < 0.05. **Compared with non-elderly, p < 0.01. ^^Compared with men, p < 0.01. AHI, apnea–hypopnea index; AUC, area under the curve

In the non-elderly, a NoSAS cutoff point of 7 provided sensitivity, specificity, and accuracy for the prediction of an AHI of ≥ 20 events/h of 86.7%, 37.4%, and 75.3%, respectively. When an age cutoff of 50 years was used in the non-elderly, a NoSAS cutoff point of 8 provided sensitivity, specificity, and accuracy for the prediction of an AHI of ≥ 20 events/h of 79.8%, 35.5%, and 69.6%, respectively, with an AUC comparable to that obtained with the conventional age cutoff of 55 years (0.65 (95% CI, 0.60−0.70) vs. 0.65 (95% CI, 0.62−0.69), p > 0.05) (Table 4).

Table 4 Performance of modified NoSAS score of 8 points with age cutoff of 50 years in non-elderly

In women, a NoSAS cutoff point of 6 provided sensitivity, specificity, and accuracy for the prediction of an AHI of ≥ 20 events/h of 85.0%, 39.2%, and 69.0%, respectively. When an NC cutoff of 35 cm was used in women, a NoSAS cutoff point of 8 provided sensitivity, specificity, and accuracy for the prediction of an AHI of ≥ 20 events/h of 78.9%, 51.9%, and 69.5%, respectively, with an AUC comparable to that obtained with the conventional NC cutoff of 40 cm (0.75 (95% CI, 0.69−0.81) vs. 0.71 (95% CI, 0.65−0.77), p > 0.05) (Table 5).

Table 5 Performance of modified NoSAS score of 8 points with neck circumference cutoff of 35 cm in women

Discussion

The NoSAS score is a simple, efficient, and easy-to-implement score that limits the number of subjective variables to only snoring. In this study, we retrospectively analyzed the value of the NoSAS score in subjects with suspected OSA who underwent PSG monitoring in our sleep center. We observed clear age- and sex-specific differences in the performance of NoSAS, with significantly better discrimination but lower sensitivity for predicting OSA in the non-elderly and women. Age- and sex-specific cutoff values improved the sensitivity in these groups.

Analysis of a questionnaire’s performance in specific populations can provide clinicians with a set of predictive parameters for different levels of OSA severity, which can guide diagnostic and therapeutic decisions. Using a similar AHI cutoff of ≥ 20 events/h, we found that the sensitivity of the NoSAS score (79.4%) was similar to those reported in the HypnoLaus (79.0%) [7] and EPISONO (85.0%) [7] cohorts and another hospital-based cohort in China (74.8%) [9], and higher than that reported in a multiethnic Asian population-based cohort (69.4%) [8]. However, the specificity (35.9%) was lower than in the other four cohorts (69.0%, 77.0%, 54.1%, and 78.2%, respectively). The AUC of the NoSAS score (0.63) in our cohort was also lower than in the other four cohorts (0.74, 0.81, 0.71, and 0.74, respectively). Apart from differences in sampling methodology, type of sleep studies employed, and scoring criteria used, we suppose that the reduced AUC and specificity of the NoSAS score in our cohort could be partly accounted for by differences in the composition of race, age, and sex. Compared with Caucasian and Latino populations, the pathogenesis of OSA in Chinese populations could be related to factors other than obesity, such as craniofacial restriction, neuromuscular control of the upper airway, or arousal threshold [11, 12]. In our cohort, the proportion of men (79.8%) was much higher than in most of the other four cohorts (48.0%, 45.0%, 77.9%, and 50.4%, respectively). The participants (59 ± 15 years) were also much older than those in most of the other four cohorts (59 ± 11, 42 ± 14, 48 ± 14, and 48 ± 14 years, respectively), which means that the proportion of elderly participants was much higher. As our results showed, the AUC and specificity of the NoSAS score in the elderly and men were not as good as those in the non-elderly and women. With higher proportions of elderly participants and men in our cohort, the overall performance of the NoSAS score in screening for OSA was barely satisfactory. This hypothesis is supported by another large hospital-based cohort study consisting mainly of men (69.2%), with relatively old participants (mean age of 52.2 years) and a high prevalence of overall and severe OSA (97.1% and 64.4%, respectively) [13]. In that study, a NoSAS score of 8 points or higher provided a sensitivity and specificity for predicting severe OSA of 90.1% and 29.4%, respectively, with an AUC of 0.66, values that were very close to ours and likewise lower than those of the other four cohorts mentioned above.

In this study, we found a higher AUC and a higher specificity but a lower sensitivity for the NoSAS score in women than in men. A sex-specific difference in the performance of the NoSAS score was also found by other researchers in bariatric, morbidly obese, and general populations [14,15,16]. A recently published population-based cohort study by Bauters et al. [16] using polygraphic data showed that diagnostic performance indices of NoSAS calculated on the overall group (men + women) overestimated the performance in both sexes separately. The sensitivity of NoSAS for an AHI > 15 events/h was acceptable in men (87.1%) but low in women (55.3%). The reverse was true for the specificity (39.9% in men, 87.4% in women) [16]. Our study, similar to previous studies [16,17,18], demonstrated that OSA was much more prevalent and severe in men than in women and that men had larger NC and BMI than women, whereas women with OSA are generally older than their male counterparts [17,18,19]. All these differences may contribute to the disparity in the sensitivity of the NoSAS score between men and women. In addition, men generally present with typical symptoms, such as snoring and observed apnea, whereas women are more likely to present with atypical symptoms, such as depression, fatigue, and insomnia [20, 21]. Although the NoSAS score limits the number of subjective variables to only snoring, unlike other questionnaires such as the STOP-Bang questionnaire, the influence of sex on OSA-associated symptoms such as snoring, as shown in our study and others, can still result in a difference in predictive performance between men and women. Thus, a better sex-specific screening algorithm or a sex-specific adjustment of the cutoff values of the score could help to improve the predictive power in patients of different sex.

An ideal OSA screening score should have a high sensitivity to avoid false-negative results, but also be specific enough to avoid referral of low-risk patients for costly and time-consuming sleep recordings. The NoSAS score had higher diagnostic accuracy in women; however, its sensitivity was quite low, yielding the highest proportion of missed diagnoses, which is undesirable for a screening score. With a decreased cutoff value of 6 points, sensitivity was substantially improved, reaching 85.0% for the prediction of an AHI of ≥ 20 events/h with an acceptable specificity of 39.2%. Given the disparity in NC between the sexes, a sex-specific adjustment of the NC cutoff of the score is also a reasonable and practical strategy, because different cutoff values for women and men can be attributed to differences in fat distribution and anatomical features between the sexes [17]. The median NC in women in our cohort was 36.3 cm. As our data showed, using an NC cutoff of 35 cm in women could also help to improve the sensitivity (78.9%) for the prediction of an AHI of ≥ 20 events/h, with an AUC comparable to that of the conventional score. These sex-specific NoSAS score and NC cutoffs for OSA screening in women have also been validated in the aforementioned population-based study [16]. For men, although the sensitivity of the NoSAS score was good enough, there is still considerable room to improve the diagnostic accuracy.

Studies show that the prevalence of sleep apnea tends to increase with age until approximately 65 and 55 years in women and men, respectively, after which it appears to decline [22, 23]. We think that this is the reason why, in our study, age did not differ significantly between the high- and low-risk NoSAS groups in the elderly. The NoSAS score had a higher sensitivity and a lower AUC in the elderly than in the non-elderly. A meta-regression analysis found that age and BMI were positively correlated with the pooled sensitivity [24]. Although the elderly had lower BMI and NC, the differences were only mild, and the vast majority of the elderly and non-elderly still fell into the same categories according to the associated cutoff values. In addition, with a median age of 52 years, only a minority of the non-elderly were older than 55 years and were allocated 4 points, whereas 4 points were assigned to all of the elderly. The difference in age may far outweigh the differences in BMI and NC and underlie the observed disparities in sensitivity for the NoSAS score. With a decreased cutoff value of 7 points, sensitivity was substantially improved, reaching 86.7% for the prediction of an AHI of ≥ 20 events/h in the non-elderly with an acceptable specificity of 37.4%. Using a lower age cutoff of 50 years, as in other screening tools such as STOP-Bang, also improved the sensitivity (79.8%) for the prediction of an AHI of ≥ 20 events/h in the non-elderly, with an AUC comparable to that of the conventional score. As the prevalence of obesity based on BMI progressively increases until 60 years of age and decreases thereafter [25], the high prevalence of OSA in the elderly could be explained by factors other than obesity: decreases in ventilatory control and muscular endurance, increased upper-airway collapsibility, and sleep fragmentation potentially contribute to the development of OSA in the elderly [26]. A screening algorithm incorporating variables that reflect these possible mechanisms could help to improve the predictive power in the elderly.

One of the strengths of the analysis is that it relies on a large hospital-based cohort consisting of high proportions of the elderly and women, with PSG recordings obtained using up-to-date techniques and manually analyzed according to the current guideline, which probably increases its reliability and relevance for real-world clinical settings. Although our study was retrospective, all information was collected before PSG monitoring in order to avoid bias. Furthermore, this is the first study specifically designed to investigate age- and sex-specific differences in the performance of NoSAS in a Chinese clinical population. There are also limitations that need to be taken into account. First, the study involved patients referred to a sleep laboratory, which could introduce selection bias and limit its external validity. The prevalence of OSA was 97.6%, and the proportion of patients with severe OSA was about 60.4%, so the sensitivity of the NoSAS score in our study may be overestimated. However, the comparative results remain reliable and clinically relevant. Second, some subjects were excluded from this study because of incomplete data and/or technically inadequate PSG. However, the excluded subjects did not differ from the study population in age, sex, NC, snoring, or BMI.

Conclusions

In subjects referred to a sleep laboratory, age- and sex-specific differences in the performance of the NoSAS score are present, and its ability to replace diagnostic sleep studies currently appears to be limited, particularly in the elderly and men. If the score is applied, age- or sex-specific cutoff values should be preferred, but there is a need for better age- or sex-specific screening algorithms.