Introduction

Obstructive sleep apnea (OSA) is characterized by repeated upper airway obstruction episodes during sleep, leading to frequent arousals and fragmented sleep and to excessive daytime sleepiness (EDS). This represents a serious safety risk in professions where motor vehicles need to be operated for long periods of time, such as among commercial drivers (CDs). OSA is associated with an increased motor vehicle accident (MVA) risk [1], and the OSA prevalence is around threefold higher among CDs than in the general population [2]. CDs with OSA have up to three times the MVA risk [3], and continuous positive airway pressure (CPAP) treatment has proven highly successful in reducing the MVA risk of CDs with OSA [4]. Hence, OSA screening and treatment among CDs is required by the regulatory authorities [4, 5] for public safety and cost-effectiveness reasons.

Overnight polysomnography (PSG) in a sleep center is the gold standard technique for diagnosing the presence and severity of OSA [6]. However, due to its high costs and poor availability, the systematic evaluation of the entire CD population with standard PSG is unfeasible [7], and the optimal way to manage this issue has yet to be addressed.

Several questionnaires have been developed for OSA screening in the general and high-risk populations [8]. These clinical tools consider EDS as well as OSA symptoms, anthropometric measures, and the presence of medical conditions associated with OSA. The efficacy of these questionnaires among CDs is debated. Poor symptom awareness and symptom denial [9, 10], as well as the higher OSA prevalence among CDs, may result in poor performance and in an unacceptable number of CDs with undiagnosed and untreated OSA [11].

To research this topic, we investigated the accuracy of eight standard OSA questionnaires in a cohort of CDs. We hypothesized that these questionnaires would perform poorly in this special population.

Methods

This was an unsponsored, investigator-initiated, prospective cohort study conducted by a certified sleep medicine center (Multidisciplinary Center for Sleep Medicine, IRCCS Sacro Cuore Don Calabria, Verona, IT) between July 2015 and April 2016. Participation in the study was offered to all CDs employed by 10 discrete commercial goods and people transportation companies based in the two neighboring provinces of Mantua and Verona in Northern Italy. The ethical committee of Verona and Rovigo provinces approved the study protocol (842CESC), and informed consent was obtained from all participants.

Subjects and setting

Consecutive subjects were recruited at their facility office during their yearly scheduled occupational health visit. Only actively working individuals aged 18–65 years with a regular driver’s license were included. Those who did not give formal consent, refused to undertake the home sleep apnea test (HSAT), lacked comprehension of the Italian language used in the questionnaires, or had a previous OSA diagnosis were excluded. Female drivers were excluded since only four were identified. All subjects were interviewed by qualified sleep technicians at the coordinating center. Demographic data, anthropometric parameters (body mass index (BMI), Mallampati score [12], and neck circumference), as well as the presence of OSA risk factors (essential hypertension, alcohol use, diabetes, and smoking) were collected. The Berlin [13], STOP [14], STOP-Bang [15], sleep apnea clinical score [16], apnea risk evaluation system [17], OSAS-Tavolo Tecnico Intersocietario [18], and European Obstructive Sleep Apnea Screening (EUROSAS) [19, 20] questionnaires were sequentially administered by the sleep technicians.

Home sleep apnea test (HSAT)

All subjects, irrespective of their questionnaire findings, underwent an HSAT investigation (type III PSG, SCOPER classification S4C4O1P2E2R2 [21]) using a commercial device (Alice NightOne, Philips SpA – Respironics, IT). The sleep technicians delivered the equipment to the participating subjects immediately after the clinical interview, providing full instructions for its use. The HSAT recording, together with a sleep diary, was obtained at the subjects’ place of sleep (home or work vehicle). Nasal airflow, chest respiratory efforts, sleep position, pulse oximetry, heart rate, and snoring events were monitored. After completing the HSAT, the participants returned the device to their facility’s office. Their participation in the study did not interrupt the CDs’ work activity.

The HSAT data were analyzed offline using ProFusion PSG 3 Lite (Compumedics Europe, Germany) by a sleep center medical staff member who was blinded to the questionnaire results. Apnea was defined as an airflow reduction of ≥ 90% from baseline lasting ≥ 10 s, while hypopnea was defined as an airflow reduction of 30–90% for ≥ 10 s followed by an oxygen desaturation of ≥ 3% [7]. A diagnosis of OSA was made if a respiratory events index (REI) value of ≥ 5 events/h was calculated for the total sleep period reported in the diary. The severity of OSA was classified, depending on the REI value, as normal (REI < 5), mild (REI ≥ 5 and < 15), moderate (REI ≥ 15 and ≤ 30), or severe (REI > 30) [22]. CDs with an HSAT recording lasting ≤ 5 h were excluded from the study because of the low reliability of the calculated REI [23].
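The scoring rules above can be illustrated with a minimal Python sketch. It is not the software used in the study (the recordings were scored in ProFusion PSG 3 Lite); the function names are hypothetical, and the sketch assumes that the REI is simply the number of apneas plus hypopneas divided by the diary-reported sleep period in hours.

```python
def respiratory_events_index(n_apneas: int, n_hypopneas: int,
                             sleep_period_h: float) -> float:
    """Events per hour over the diary-reported sleep period (assumed formula)."""
    if sleep_period_h <= 0:
        raise ValueError("sleep period must be positive")
    return (n_apneas + n_hypopneas) / sleep_period_h


def classify_osa_severity(rei: float) -> str:
    """Map an REI value to the severity classes stated in the text."""
    if rei < 5:
        return "normal"
    if rei < 15:
        return "mild"
    if rei <= 30:
        return "moderate"
    return "severe"


def is_valid_recording(recording_h: float) -> bool:
    """Recordings lasting 5 h or less were excluded from the analysis."""
    return recording_h > 5


if __name__ == "__main__":
    # Made-up example: 30 apneas and 60 hypopneas over a 6 h sleep period.
    rei = respiratory_events_index(30, 60, 6.0)
    print(rei, classify_osa_severity(rei), is_valid_recording(6.0))
```

The boundary handling follows the thresholds as written: an REI of exactly 15 or exactly 30 is classed as moderate.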

As per protocol, individual HSAT reports were not disclosed to anyone other than the investigated subject. Each facility’s occupational health office was only told the total number of OSA-positive and OSA-negative CDs.

Statistical analysis

The demographic and clinical data were summarized using descriptive statistics, measures of variability, and precision plots. All parameters were reported with 95% confidence intervals. When necessary, the statistical models and estimations were adjusted for covariates.

Differences between CDs with and without a valid HSAT were assessed using the Chi-squared test, Fisher exact test, Mann–Whitney U test, or t-test, as appropriate. Contingency tables, for the assessment of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), as well as the area under the receiver operating characteristic curve (AUC) values were calculated for both the REI value thresholds of ≥ 5 and ≥ 15. Two independent univariate regression models (one for each REI threshold) were used to select the variables included in the multivariate logistic regression model for identifying independent predictors of OSA.
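For readers less familiar with these accuracy measures, the following Python sketch shows how they follow from a 2 × 2 contingency table of questionnaire result versus HSAT diagnosis. It is an illustration only: the analyses in this study were run in STATA, the function name is hypothetical, and the counts in the usage example are invented.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures from a 2 x 2 table.

    tp: questionnaire positive, HSAT positive
    fp: questionnaire positive, HSAT negative
    fn: questionnaire negative, HSAT positive
    tn: questionnaire negative, HSAT negative
    """
    def ratio(num: int, den: int) -> float:
        # Return NaN instead of raising when a margin of the table is empty.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # detected among those with OSA
        "specificity": ratio(tn, tn + fp),  # cleared among those without OSA
        "ppv": ratio(tp, tp + fp),          # OSA among questionnaire positives
        "npv": ratio(tn, tn + fn),          # no OSA among questionnaire negatives
    }


if __name__ == "__main__":
    # Invented counts, not study data.
    for name, value in diagnostic_metrics(tp=30, fp=10, fn=70, tn=90).items():
        print(f"{name}: {value:.3f}")
```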

All analyses were performed using STATA 15 (StataCorp, 2017; Stata Statistical Software, Release 15; College Station, TX: StataCorp LLC.). A P value < 0.05 was considered statistically significant.

Results

All employed CDs in 7 out of the 10 transportation companies participated in our study, while, in the remaining 3, the participation rate was 15–29% (Online Resource 1). Three hundred and fifteen subjects were enrolled, but 72 (23%) CDs had to be excluded from the analysis due to their insufficient HSAT recording length or a technical failure. There were no differences in the demographic data between the included and excluded CDs except for smoking and hypertension rates (Table 1).

Table 1 Demographic and polysomnographic information of the study population

The median age, mean BMI (± standard deviation), and median neck circumference of the included CDs were 50 (interquartile range (IQR) 25–70) years, 27 ± 5 kg/m², and 38 (IQR 32–53) cm, respectively. In total, 45 (18.5%) CDs had hypertension, 66 (32.1%) were smokers, and 105 (44.2%) denied any alcohol intake. We found an REI value of ≥ 5 events/h in 172 (71%) subjects and ≥ 15 events/h in 68 (28%) subjects.

In the univariate analysis (Online Resources 2 and 3), OSA and severe OSA were associated with age, BMI, neck circumference, hypertension (only for moderate-to-severe OSA), and Class IV Mallampati score. In the multivariate analysis, only age and neck circumference were associated with any degree of OSA, while Class IV Mallampati score was associated with moderate-to-severe OSA.

All questionnaires demonstrated a high specificity (89–100%) in identifying the OSA risk; however, this came with a rather limited sensitivity (1–36%) (Fig. 1; Online Resource 4). Due to the high OSA prevalence in CDs, the questionnaires’ NPV was unsatisfactory, ranging from 74 to 80% for predicting moderate-to-severe OSA. The STOP-BANG (using the ≥ 3 score cutoff) was the exception, showing instead a high sensitivity (88–99%) and a poor specificity (29–44%). Accordingly, compared with all of the other questionnaires, the STOP-BANG exhibited the highest NPV (87–99%) and the lowest PPV (30–44%). The questionnaires’ AUC values ranged from 0.51 to 0.71 for the prediction of OSA and from 0.52 to 0.66 for the prediction of moderate-to-severe OSA. Combining the Mallampati score with the STOP-BANG questionnaire did not increase the overall accuracy in determining the OSA risk.
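The dependence of the NPV on prevalence can be made explicit with Bayes' rule. The Python sketch below is illustrative only: the function name is hypothetical, and the inputs pair a sensitivity of 36% with a specificity of 89% (values taken from the ranges reported above, which need not belong to the same questionnaire) at the 28% prevalence of moderate-to-severe OSA observed in this cohort.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    tp = sensitivity * prevalence                # diseased, test positive
    fn = (1 - sensitivity) * prevalence          # diseased, test negative
    tn = specificity * (1 - prevalence)          # healthy, test negative
    fp = (1 - specificity) * (1 - prevalence)    # healthy, test positive
    ppv = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    npv = tn / (tn + fn) if (tn + fn) > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Same assumed test characteristics at a low and at the observed prevalence.
    for prevalence in (0.05, 0.28):
        ppv, npv = predictive_values(0.36, 0.89, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

With sensitivity and specificity held fixed, the NPV falls as prevalence rises, because a larger share of the negative results come from subjects who actually have the disease.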

Fig. 1 A Receiver operating characteristic (ROC) curves of the questionnaires for a respiratory events index (REI) > 5 events/h. B ROC curves of the questionnaires for an REI > 15 events/h

Discussion

Given the association with the MVA risk, OSA identification and treatment among professional drivers are a public health issue. Due to the limited availability of PSG testing and sleep medicine resources, CD screening in occupational health practice is often limited to the administration of standard OSA questionnaires.

In our cohort of CDs, where OSA was found at the expected prevalence in this population [24], we demonstrated the poor performance of these questionnaire tools. The main finding of this study is that, although the questionnaires had a high specificity, a large number of CDs with a negative questionnaire had OSA. The STOP-BANG exhibited the opposite limitation, providing a high number of false-positive subjects. The AUC values confirmed the questionnaires’ overall unsuitability as reliable OSA screening tools in this population.

Several investigators have previously researched the accuracy of EDS and OSA questionnaires among CDs [11, 25]. With the exception of the following three studies, previous work tested with PSG only those subjects (and sometimes only a proportion of them) who were ranked at high risk by the questionnaires themselves.

Ueyama et al. [26] retrospectively investigated 1309 CDs employed by a single transportation company using the Epworth sleepiness scale (ESS) and a type IV PSG. They found OSA in 60% of the subjects and moderate-to-severe OSA in 24%, but only 9% had an ESS score ≥ 11 due to the poor awareness of subjective sleepiness symptoms. Firat et al. [27] examined 85 highway bus drivers from two transportation companies with the Berlin, OSA50, STOP, and STOP-BANG questionnaires and a standard PSG. Although the authors did not provide information on their screening method, they found the STOP-BANG to be the best-performing questionnaire (PPV 67%, NPV 76%) in a CD sample characterized by an increased rate (54%) of moderate-to-severe OSA. Popević and colleagues [15] assessed the accuracy of the STOP-BANG questionnaire in a sample of 100 CDs already preselected from an unspecified larger sleepiness study. They reported better performance of the STOP-BANG questionnaire than we observed in our study, with AUC values of 0.80 for OSA and 0.92 for moderate-to-severe OSA. The different screening strategies and population characteristics (since the CDs in the Popević cohort were more hypertensive and had an increased BMI and neck circumference) might account for the discrepancy between the two studies.

Apart from the larger sample of CDs included, several methodological differences from previous studies characterize this study. A higher number of OSA and EDS questionnaires were tested, and all CDs underwent the HSAT study irrespective of their questionnaire results. Screening CDs during their occupational health visit provided a probabilistic sampling of the participating subjects from the entire cohort. Greater acceptance of study participation was ensured by obtaining the HSAT at home or in the work vehicle without disrupting the CDs’ busy schedule and by maintaining the anonymity of the subjects’ HSAT results. Sleep technicians played a major role by being available at the transportation facilities, explaining the benefit of OSA diagnosis to the drivers, and ensuring the quality of the data collection and analysis as much as possible.

The use of the HSAT instead of overnight PSG made this field study feasible. Because it lacks EEG signals, the HSAT is based on total recording time rather than actual sleep time, and this may lead to REI underestimation. However, except in the presence of select comorbid conditions such as moderate-to-severe pulmonary disease or extreme obesity, the HSAT is a reliable alternative to standard PSG [28, 29] and is routinely used in sleep centers, generally without the need for confirmatory PSG. Outcome studies conducted in non-occupational settings have shown a similar efficacy between overnight PSG and the HSAT [30]. Regulatory authorities are thus increasingly recognizing the role of the HSAT for the diagnosis of OSA among CDs, as long as the chain of custody of the investigated subject is provided [31].

Limitations

We cannot exclude the possibility that using standard PSG would have yielded different results on the questionnaires’ accuracy. The involvement of sleep technicians experienced in the use of both the HSAT and standard PSG helped minimize the differences between the two techniques. The study protocol did not include the subjects’ chain of custody. Some CDs could have therefore cheated by avoiding or misreporting sleep or by using a substitute during the HSAT recording. Moreover, subjective questionnaires are open to bias by CDs who may be motivated to underestimate their OSA and EDS symptoms due to the legal consequences. We sought to minimize this issue as much as possible by guaranteeing the anonymity of the HSAT report and excluding from the study CDs with a previous OSA diagnosis.

Conclusions

OSA and EDS questionnaires do not seem to provide an acceptable level of accuracy for the screening of OSA among CDs. Objective OSA measures, as well as fitness-to-drive tests, are undoubtedly needed in this context. The HSAT could improve the investigation of OSA in this population.