Introduction

Obstructive Sleep Apnea (OSA) is a common respiratory disorder that can have serious impacts; it leads to daytime sleepiness, poor job performance, increased risk of accidents, and cardiovascular problems. By definition, sleep apnea (or hypopnea) is the cessation (or more than 50% reduction) of airflow to the lungs during sleep that lasts for at least 10 s, and is usually associated with a drop of more than 4% in the blood’s oxygen saturation (SaO2) level.20 There are three forms of sleep apnea: obstructive, central, and mixed; they account for approximately 84.6, 0.4, and 15% of the reported cases, respectively. Central apnea occurs because of neurological impairment, and mixed apnea is a combination of the other two.20 OSA is common in smokers,18 in people with high blood pressure, and in those with an airway narrowed by tonsils or adenoids.11 Sleep apnea occurs two to three times more often among men and the elderly.11 It is also associated with cardiovascular problems, daytime fatigue, irritability, and lack of concentration.10,17,19,24

Full-night polysomnography (PSG) is currently the accepted gold standard diagnostic assessment for sleep apnea. The standard PSG consists of recording various biological signals including EEG, ECG, EMG of the chin and legs, nasal airflow, electro-oculogram (EOG), and abdominal and thoracic movements.3 PSG is an expensive test for the health care system as it needs full-night supervision, is not portable, and is also inconvenient for patients. Therefore, many researchers have attempted to develop alternative, non-invasive, and portable OSA monitoring tools.

There are many different technologies that record a reduced number of signals and claim to detect apnea/hypopnea events during sleep. Most of these technologies record at least four signals: airflow, SaO2, respiratory effort, and snore sound recorded by an ambient microphone.1,32,36 In these technologies, airflow is usually measured by a nasal cannula connected to a pressure transducer; a reduction or cessation of its output, associated with a drop of more than 4% in SaO2, is detected as the main diagnostic sign of OSA. Our team has recently developed a method using only tracheal respiratory sound and SaO2 signals that detects apnea/hypopnea events with an accuracy comparable to that of PSG.35

While there have been many studies and developed technologies21 to detect OSA during sleep, few studies have attempted to assess OSA during wakefulness.2,7,25,29,31,37 In one of the studies,29 it was claimed that OSA can be predicted from a short-time, daytime recording of the nasal airway pressure. However, that study29 suffers from several limitations including the number of studied subjects. They applied their proposed method to two different groups: Group 1 including 15 non-OSA and 3 OSA, and Group 2 including 14 OSA and 2 non-OSA individuals. Although the reported accuracy of classification is very high, the numbers of class members (OSA vs. non-OSA) in each of their classifications are completely unbalanced. Therefore, the classification results cannot be reliable.

In one study,31 an OSA detection technique using ultrafast magnetic resonance imaging of the pharyngeal airway was proposed. Data were recorded during both wakefulness and sleep. The results from sleep data were impressive, but the accuracy during wakefulness was only 46%. In another study,37 patients with mixed apnea were reported to be distinguishable from patients with OSA and control subjects based on the analysis of breathing sounds during wakefulness. The breathing patterns of 36 individuals with mixed apnea, 20 with OSA, six with central apnea, and 10 non-OSA individuals were compared using the coefficient of variation (CV) and sample entropy of the inspiratory cycles, expiratory cycles, and complete breath sounds. The results showed that breathing irregularity (defined by higher CV and sample entropy values) during wakefulness was greater in patients with mixed apnea than in patients with OSA and control subjects; however, no classification results were provided. In addition, the differences in the irregularity features between the groups were not statistically significant.

Recently, a group of researchers investigated the correlation between speech disorders and OSA.7 The acoustic features of 10 non-OSA and 18 OSA individuals during speech were compared, and substantial differences were found, though no classification was done.7 By combining a Gaussian mixture model (GMM)-based classifier with feature-selection methods, a technique for detecting OSA from the oronasal airway pressure signal was proposed.2 The authors achieved good results (>85% accuracy between OSA and non-OSA) for 41 subjects, but the feature space dimension appears to have been too high for their database size, which risks over-fitting. This implies that the results may not be as promising as reported if the population size changes. Furthermore, no physiological reasons for the calculated features were provided.

On the other hand, previous studies have shown that patients with OSA have a defective ability to dilate the pharynx during inspiration.30 Moreover, the tracheal breath sound intensity of seven patients with OSA has been shown to increase significantly in supine position with respect to that of eight individuals in control group.25 Given that people with some degrees of upper airway congestion are more prone to develop OSA, we hypothesize that there must be some noticeable differences between the nose and mouth breathing intensities, complexity, and smoothness of the breath sounds of people with OSA in the supine and upright sitting positions as compared to those of non-OSA people.

To investigate this hypothesis, as a feasibility study, we screened people for OSA using a simple technique during wakefulness, and the results were presented in our previous publication.22 In our feasibility study, the number of participants was 16, and only the training accuracies of two types of classifiers were evaluated. In this study, we increased the number of participants to 52, evaluated the test accuracy using the Leave-One-Out routine,5 and also ran statistical tests on the significance of the respiratory sound features with respect to OSA. In addition, we investigated whether people suspected of having moderate and severe OSA could be distinguished from those with mild OSA and non-OSA individuals by this simple technique during wakefulness. In all tests, the severity of sleep apnea was evaluated by PSG.

Materials and Methods

Data

Fifty-two participants (37 males) suspected of having OSA, who were referred for full-night PSG assessment at the Sleep Disorder Lab at Misericordia Health Center, Winnipeg MB, gave written consent to be enrolled in this study. The study was approved by the Health Research Ethics Board of the University of Manitoba. All of the study participants, including the non-OSA individuals, went through full-night PSG assessment. Based on their apnea/hypopnea index (AHI) scores (determined by the PSG), we grouped them into non-OSA (AHI < 5), mild (5 ≤ AHI < 15), moderate (15 ≤ AHI < 30), and severe OSA (30 ≤ AHI). The average age, body mass index (BMI), and AHI values of the participants are summarized in Table 1.
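The grouping rule above can be written as a small function. This is only an illustrative sketch; the function name `ahi_group` and the label strings are our own choices for the example, while the thresholds are those stated in the text.

```python
def ahi_group(ahi: float) -> str:
    """Map an apnea/hypopnea index (events per hour) to a severity group."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "non-OSA"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"  # 30 <= AHI
```

Note that the boundaries are half-open, so an AHI of exactly 15 falls in the moderate group and an AHI of exactly 30 in the severe group.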

Table 1 Average age, body mass index (BMI), AHI values of the participants

Tracheal breath sound signals were collected by a Sony microphone (ECM-77B) embedded in a chamber (diameter of 6 mm) placed over the suprasternal notch of trachea using double-sided adhesive tapes. The sound signals were amplified, band-pass filtered (0.05–5000 Hz), and digitized at 10,240 Hz sampling rate. The recordings were done in two different body positions: upright sitting and supine. In each body position, breath sounds were recorded during two breathing maneuvers for at least five full breaths in each trial. The two breathing maneuvers were breathing through the nose and then through the mouth with a nose clip in place at medium flow rate (the subject’s comfortable normal medium flow rate). Therefore, for every participant, we recorded a total of four breathing signals (2 breathing maneuvers at 2 different body positions).

Preprocessing

Inspiration is an active process, while expiration is a passive process. Therefore, we analyzed the inspiration and expiration phases separately. Using a fourth-order Butterworth band-pass filter (150–800 Hz), we filtered the recorded signals to reduce the effects of heart sound and background noise. Since the recordings were done in a hospital environment, there were occasional background vocal noises; those were excluded from the data manually by listening to all recorded data. We calculated the onset of each phase using the method detailed in Huq and Moussavi.15 The onset detection method is automatic and highly accurate; however, we verified the detected onsets manually for all subjects in this study to ensure their accuracy. Since we did not record respiratory flow, to ensure the accuracy of the phase labels, we always started each recording at the inspiration phase and checked all the signals manually, by listening as well as by visual inspection in the time–frequency domain, for occasional swallows that may interfere with the alternation of breath phases.

Feature Extraction

For each respiratory phase in each breath, we calculated the power spectral density (PSD, using the Welch method), fractal dimension (FD), and kurtosis in every 50-ms window with 50% overlap with the adjacent windows, and averaged over the segments within the breath phase,22 denoted as \( P^{{b_{i} }} ,\,FD^{{b_{i} }} \) and \( Kurt^{{b_{i} }} \), respectively, where \( b_{i} \) represents the breath number (1–5). Next, we calculated the average curves of the \( P^{{b_{i} }} ,\,FD^{{b_{i} }} \) and \( Kurt^{{b_{i} }} \) over five breath cycles for inspiration and expiration separately. They are denoted as \( Ave^{\text{Pow}} ,\,Ave^{\text{FD}} \) and \( Ave^{\text{Kurt}} . \) Then, the variance and median values of these average curves were calculated, denoted as \( Var_{\text{pow}}^{\text{ave}} ,\,Med_{\text{pow}}^{\text{ave}} ,\,Var_{\text{FD}}^{\text{ave}} ,\,Med_{\text{FD}}^{\text{ave}} ,\,Var_{\text{Kurt}}^{\text{ave}} ,\,{\text{and}}\,Med_{\text{Kurt}}^{\text{ave}} . \) These six features were identified as the characteristic features in our feasibility study. Thus, in this study, we used these features for further analysis. Figure 1 shows the algorithm of the proposed signal-processing method up to this stage schematically.

Figure 1 The flowchart of signal-processing stages (Stage 1)
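As a rough illustration of the windowing described above, the following sketch computes kurtosis in 50-ms windows with 50% overlap at the stated 10,240 Hz sampling rate, averages the per-breath curves, and takes the variance and median of the average curve. Several details are assumptions made for the example and are not specified in the text: the curves are truncated to the shortest breath before point-wise averaging, the population variance is used, and the function names are hypothetical. The PSD and fractal dimension features would be passed through `windowed` in the same way.

```python
import statistics

FS = 10_240               # sampling rate in Hz (from the text)
WIN = FS * 50 // 1000     # 50-ms window -> 512 samples
HOP = WIN // 2            # 50% overlap  -> 256 samples


def kurtosis(x):
    """Pearson kurtosis (fourth standardized moment; about 3 for Gaussian data).

    Assumes the window is not constant, otherwise the variance is zero.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2)


def windowed(signal, func, win=WIN, hop=HOP):
    """Apply func to every full window of the signal; returns a curve."""
    return [func(signal[i:i + win]) for i in range(0, len(signal) - win + 1, hop)]


def average_curve(curves):
    """Point-wise mean across breaths, truncated to the shortest curve (assumption)."""
    n = min(len(c) for c in curves)
    return [sum(c[k] for c in curves) / len(curves) for k in range(n)]


def var_and_median(curve):
    """Variance (population form, an assumption) and median of an average curve."""
    return statistics.pvariance(curve), statistics.median(curve)
```

For one phase of one recording, the kurtosis features would then be obtained as `var_and_median(average_curve([windowed(b, kurtosis) for b in breaths]))`, where `breaths` is a list of the five segmented breath-phase signals.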

Having recorded two breathing maneuvers in two different positions, we had four recorded sound signals for each participant. Separating the inspiratory and expiratory phases resulted in eight signals per participant. In addition, we investigated the difference between nose and mouth breathing (the absolute value of the difference between the features calculated for each breathing maneuver) in each position, as well as the difference between the two positions for each of the nose and mouth breathing signals, resulting in four extra signals for each of the inspiration and expiration phases. Therefore, extracting the six features under each of the resulting 16 conditions gave 96 features in total per participant. However, since we excluded some parts of the data that contained vocal noise from the environment, not every subject had all 96 features.
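The count of 96 can be checked by enumerating the conditions. The label strings below are hypothetical names chosen for this sketch; only the structure (8 recorded conditions, 8 difference conditions, 6 features each) comes from the text.

```python
from itertools import product

positions = ["upright", "supine"]
maneuvers = ["nose", "mouth"]
phases = ["inspiration", "expiration"]
features = ["Var_pow", "Med_pow", "Var_FD", "Med_FD", "Var_Kurt", "Med_Kurt"]

# Eight directly recorded conditions: position x maneuver x phase.
recorded = [f"{pos}/{man}/{ph}" for pos, man, ph in product(positions, maneuvers, phases)]

# Eight difference conditions: for each phase, |nose - mouth| in each position
# and |upright - supine| for each maneuver.
differences = []
for ph in phases:
    differences += [f"|nose-mouth|/{pos}/{ph}" for pos in positions]
    differences += [f"|upright-supine|/{man}/{ph}" for man in maneuvers]

conditions = recorded + differences
feature_names = [f"{feat}@{cond}" for cond, feat in product(conditions, features)]

print(len(recorded), len(differences), len(feature_names))  # 8 8 96
```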

Feature Selection

As the first stage of the feature selection, we divided the study participants into two groups: participants with AHI > 30 (severe OSA) and with AHI < 5 (non-OSA). Then, we ran One-Way Analysis of Variance (ANOVA)23 test on each of the 96 features separately. Twenty-one features were found significantly different between the non-OSA and severe OSA groups (p < 0.05); these 21 features formed the first selected set. Next, we divided the study participants into two groups with AHI > 15 and AHI < 15. We ran the One-Way ANOVA test again on each of the 96 features separately. This time, 17 features were found to be significantly different between the mentioned groups (p < 0.05); these 17 features formed the second feature set. Then, out of the first and second sets of features, we selected the common features (12) that were present in all subjects’ signals for further analysis. At this stage, the features selected from the mouth breathing were excluded because not all of the subjects had clean mouth-breathing signals. Therefore, instead of excluding those subjects from the classification, we excluded the mouth-breathing signals.
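For reference, the F statistic behind a one-way ANOVA on a single feature can be sketched as follows. This is a minimal standard-library illustration, not the authors' code; turning F into a p-value requires the F-distribution CDF from a statistics package, which is omitted here. The set names in the closing comment are hypothetical.

```python
def one_way_f(groups):
    """F statistic of a one-way ANOVA for a list of groups of feature values."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variability inside each group.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# The features common to both significant sets are then a set intersection,
# e.g. common = set_severe_vs_non & set_above_vs_below_15 (hypothetical names).
```

With two groups, as in both comparisons above, this F test is equivalent to a two-sample t test with pooled variance.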

Twelve features are still too many for classification in a small population, as this can cause over-fitting. Therefore, we aimed to find the best two or three features that would result in the highest classification accuracy. To find the best subspace for classification, a search algorithm is needed. We used the maximum relevancy minimum redundancy (mRMR) method,26 and selected the best 3D and 2D subspaces for classifying participants with AHI < 15 from those with AHI > 15. In brief, the mRMR method jointly maximizes the dependency between the features and the target class (called Max-Dependency) based on their maximum relevancy, and then uses a probabilistic criterion to minimize the redundancy of the selected features; the details of the algorithm can be found in Peng et al.26
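The greedy idea behind mRMR can be sketched as below. This is a simplified illustration rather than the implementation of Peng et al.: it assumes the common "difference" criterion (relevance minus mean redundancy), estimates mutual information from equal-width bins, and uses an arbitrary bin count of three. All function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def discretize(values, bins=3):
    """Equal-width binning; the bin count is an illustrative assumption."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for constant features
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mrmr(features, target, k):
    """Greedily pick k feature names: high relevance to target, low mutual redundancy."""
    disc = {name: discretize(vals) for name, vals in features.items()}
    relevance = {name: mutual_information(d, target) for name, d in disc.items()}
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(disc[name], disc[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while len(selected) < k:
        candidates = [name for name in disc if name not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

In this form, a feature that duplicates an already selected one is penalized by its full redundancy, so a less relevant but complementary feature can be preferred for the second slot.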

After selecting the features, we ran One-Way multivariate analysis of variance (MANOVA)23 to verify that the combination of selected features was also significant between the study participants. Figure 2 shows the feature selection algorithm flowchart.

Figure 2 The flowchart of signal processing stages (Stage 2)

Classification

Once the features were selected and reduced, we used linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA) classifiers5 to classify the participants into different classes. For 2-group classification, LDA assumes that the class-conditional probability density functions are both normally distributed, with their mean values calculated from the training set and a pooled estimate of the covariance matrix. Under this assumption, the Bayes’ optimal solution is to predict points as being from the second class if the log of the likelihood ratio is below some threshold.5 The QDA approach is the same as the LDA except that the covariance matrix is estimated separately for each group.5
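The difference between the two classifiers can be made concrete for two features. The sketch below is a minimal illustration under stated assumptions: Gaussian class densities, class priors taken from the training-set frequencies, and a closed-form 2x2 inverse; the function names are hypothetical. The only difference between LDA and QDA here is whether one pooled covariance matrix or one matrix per class is used.

```python
import math


def mean_vec(rows):
    n = len(rows)
    return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]


def scatter(rows, mu):
    """2x2 sum of outer products of deviations from the class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        dev = (r[0] - mu[0], r[1] - mu[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += dev[i] * dev[j]
    return s


def log_gauss(x, mu, cov):
    """Log density of a 2D Gaussian, dropping the constant shared by both classes."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form dev^T cov^-1 dev, using the closed-form 2x2 inverse.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (math.log(det) + quad)


def fit(class0, class1, quadratic=False):
    """Return predict(x) -> 0 or 1 using LDA (default) or QDA (quadratic=True)."""
    groups = [class0, class1]
    mus = [mean_vec(g) for g in groups]
    scats = [scatter(g, m) for g, m in zip(groups, mus)]
    n_total = len(class0) + len(class1)
    if quadratic:
        # QDA: one covariance matrix per class.
        covs = [[[v / (len(g) - 1) for v in row] for row in s]
                for g, s in zip(groups, scats)]
    else:
        # LDA: a single pooled covariance matrix shared by both classes.
        pooled = [[(scats[0][i][j] + scats[1][i][j]) / (n_total - 2)
                   for j in range(2)] for i in range(2)]
        covs = [pooled, pooled]
    priors = [len(g) / n_total for g in groups]

    def predict(x):
        scores = [math.log(p) + log_gauss(x, m, c)
                  for p, m, c in zip(priors, mus, covs)]
        return 1 if scores[1] > scores[0] else 0

    return predict
```

Because the pooled covariance makes the quadratic terms cancel, the LDA boundary is a straight line in the feature plane, whereas the QDA boundary is in general a conic section.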

Because of the small number of subjects in each group, instead of dividing the dataset into training and test sets, we used the Leave-One-Out method, in which one subject’s data are left out as the test subject while all the others are used for training; this procedure is repeated until every participant’s data have been used as the test subject once. The sensitivity, specificity, and classification accuracy were determined as follows:

$$ {\text{Sensitivity}} = \frac{\text{TP}}{{\text{TP}} + {\text{FN}}} \times 100 \tag{1} $$
$$ {\text{Specificity}} = \frac{\text{TN}}{{\text{TN}} + {\text{FP}}} \times 100 \tag{2} $$
$$ {\text{Classification Accuracy}} = \frac{{{\text{Number}}\,{\text{of}}\,{\text{correctly}}\,{\text{classified}}\,{\text{subjects}}}}{{{\text{Total}}\,{\text{number}}\,{\text{of}}\,{\text{subjects}}}} \times 100 \tag{3} $$

Note that TP, FN, TN, and FP, in our study, represent the numbers of correctly classified subjects with higher AHI (AHI > 30 or AHI > 15 in the 2nd classification), misclassified subjects with higher AHI, correctly classified participants with lower AHI (AHI < 5 or AHI < 15 in the 2nd classification), and misclassified participants with lower AHI, respectively.
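The Leave-One-Out routine and Eqs. (1)–(3) can be sketched as follows. To keep the example self-contained, a nearest-class-mean rule stands in for the classifier; it is not the LDA or QDA used in this study, and all function names are hypothetical. Label 1 denotes the higher-AHI group.

```python
def nearest_mean_fit(X, y):
    """Stand-in classifier (not LDA/QDA): assign a point to the closer class mean."""
    means = {}
    for label in (0, 1):
        pts = [x for x, t in zip(X, y) if t == label]
        means[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(x):
        dist = {k: sum((a - b) ** 2 for a, b in zip(x, m)) for k, m in means.items()}
        return min(dist, key=dist.get)

    return predict


def leave_one_out(X, y, fit):
    """Predict each subject with a model trained on all the other subjects."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(fit(train_X, train_y)(X[i]))
    return preds


def metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy in percent (both classes must be present)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    sensitivity = 100 * tp / (tp + fn)
    specificity = 100 * tn / (tn + fp)
    accuracy = 100 * (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy
```

A typical call would be `metrics(y, leave_one_out(X, y, nearest_mean_fit))`, with `X` a list of feature tuples and `y` the matching list of 0/1 labels.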

Results

The three best features selected by the mRMR method were

  • \( MPUNI{\colon}\,Med_{\text{pow}}^{\text{ave}} \) of the Upright sitting position, Nose breathing, and Inspiratory phase;

  • \( MKSNI{\colon}\,Med_{\text{Kurt}}^{\text{ave}} \) of the Supine position, Nose breathing, and Inspiratory phase; and

  • \( VKUNI{\colon}\,Var_{\text{Kurt}}^{\text{ave}} \) of the Upright sitting position, Nose breathing, and Inspiratory phase.

When we limited the number of best features to two, MPUNI and VKUNI were selected again. Since the classification results were very similar for the 2D and 3D classifiers (the 2D one was slightly better), we present the results of the 2D classification. In general, the smaller the feature set, the more robust the classification result is to a change in population. In other words, we may achieve a higher accuracy using more features for this particular population, but then the chance of achieving the same accuracy with the same feature set in a larger population is lower. Therefore, in pilot studies with a small population, it is recommended to select smaller feature sets and simple classifiers.

To verify that the combination of MPUNI and VKUNI was also statistically significant between the groups, we ran a MANOVA test on the two selected features. The test showed p-values less than 0.001 between the groups with AHI > 15 and AHI < 15 and also between the groups with AHI < 5 and AHI > 30. Therefore, these features were selected to classify participants with different degrees of OSA severity. First, we investigated classifying the participants with AHI < 5 from those with AHI > 30. Then, we investigated classifying the participants with AHI > 15 from those with AHI < 15, which is more challenging as there is no gap in AHI between the two groups.

Figure 3a shows the 2D scatter plot of the selected features calculated for the participants with AHI < 5 and those with AHI > 30. Figures 4a and 4b show mean and standard deviation of the MPUNI and VKUNI features calculated for the mentioned groups, respectively. Figure 3b shows the 2D scaled scatter plot of the selected features of the participants with AHI < 15 and those with AHI > 15 created with MPUNI and VKUNI. Figures 5a and 5b show mean and standard deviation of the calculated MPUNI and VKUNI, respectively. It can be seen that individuals with higher AHI range (AHI > 15) show higher values of MPUNI and lower values of VKUNI feature compared to those individuals with AHI < 15.

Figure 3 (a) The 2D scaled scatter plots (MPUNI and VKUNI) for participants with AHI < 5 and AHI > 30, (b) the 2D scatter plots (MPUNI and VKUNI) for participants with AHI < 15 and AHI > 15

Figure 4 (a) The mean and standard deviation of the MPUNI calculated for the participants with AHI < 5 and AHI > 30, (b) the mean and standard deviation of the VKUNI calculated for the mentioned groups

Figure 5 (a) The mean and standard deviation of the MPUNI calculated for the participants with AHI < 15 and AHI > 15, (b) the mean and standard deviation of the VKUNI calculated for the mentioned groups

Table 2 shows the specificity, sensitivity, and classification accuracy achieved using the LDA classifier for both approaches; we calculated these values using the Leave-One-Out method. Table 3 shows the results of the same test using the QDA classifier. The overall accuracies for classifying non-OSA versus severe OSA participants and participants above versus below AHI = 15 were 91.7 and 83.3%, respectively, with the QDA classifier, and 86.4 and 83.3%, respectively, with the LDA classifier. Overall, the QDA classifier outperformed the LDA classifier, except for the sensitivity in classifying participants with AHI < 15 from those with AHI > 15.

Table 2 Sensitivity, specificity, and classification accuracy for the LDA classifier using MPUNI and VKUNI as the classification features
Table 3 Sensitivity, specificity, and classification accuracy for the QDA classifier using MPUNI and VKUNI as the classification features

For further investigation, we extracted the AHI of the misclassified participants in both classification approaches using either LDA or QDA. When classifying the severe OSA individuals from non-OSA people, three subjects with AHI of 3.6, 3.9, and 30.7 were misclassified with either LDA or QDA. On the other hand, when using AHI = 15 as the threshold to divide people into two groups, the AHI values of the misclassified subjects were 3.6, 3.9, 10.3, 13.7, 14, and 30.7 using LDA and 3.6, 13.7, 14, 30.7, 35.6, and 115.6 using QDA.

Figure 6 shows the 2D scatter plot of the two selected features of all the participants in different AHI groups. TN, FP, TP, and FN represent:

  • The participants with AHI < 15 classified correctly (TN);

  • The misclassified participants with AHI < 15 (FP);

  • The participants with AHI > 15 classified correctly (TP);

  • The misclassified participants with AHI > 15 (FN).

Figure 6 The 2D scaled scatter plots (MPUNI and VKUNI) for participants with AHI < 15 and AHI > 15. The figure shows four different groups: the participants with AHI < 15 classified correctly (TN); the misclassified participants with AHI < 15 (FP); the participants with AHI > 15 classified correctly (TP); and the misclassified participants with AHI > 15 (FN). The accuracies are based on QDA classification

Discussion

As mentioned in Introduction Section, people with OSA disorder usually have smaller and more collapsible pharynx than healthy individuals.8,14,16,27,28 It is known that muscle tone decreases naturally during sleep. In a non-OSA individual, the decrease in dilator muscle activity does not obstruct the airway. However, in people with OSA, a decrease in dilator muscle activity leads their airway to collapse and obstruct the airflow as the dilator muscle activity is not enough to maintain the pharynx sufficiently open during sleep.6,9

In addition, it has been shown that people with OSA compensate for their more collapsible airway by an increased activity in the dilator muscle during wakefulness.38 The increased dilator muscle activity represents the increased negative pressure of the pharynx. On the other hand, the intensity of tracheal breath sounds represents the pharyngeal pressure almost linearly during normal breathing (no flow limitation condition)34; this has been the premise of the acoustic flow estimation method.36 Given the above-mentioned facts and knowing that the intensity of tracheal breath sounds represents the pharyngeal pressure, we hypothesized that there must be some notable differences between the breathing sounds of people with different degrees of OSA severity during wakefulness.

The objectives of this study were to investigate the above hypothesis, and to offer a simple and non-invasive method to identify people with different degrees of OSA severity during wakefulness. We focused on classifying participants with AHI > 30 from those with AHI < 5 and also people with AHI > 15 from those with AHI < 15. The main objective was to find a robust feature set for classification of different AHI ranges. In the first stage, we selected the features that were statistically significant among different AHI groups. Since the number of significantly different features was still high for the given small population, a search algorithm was needed to find the best subspace of features. There are several algorithms that map the available feature space to a lower-dimensional space, e.g., principal component analysis (PCA) and independent component analysis (ICA), or that reduce the dimension of the feature set by selecting a subset of the original features, e.g., sequential forward search and mRMR.5,26 Algorithms such as PCA and ICA guarantee orthogonality and independence of the selected features, but the features are no longer the original features; therefore, the physical interpretation of the features would be almost impossible. In feature selection for diagnostic purposes, as in this study, the physical interpretation of the selected features is important, as it can also be of some diagnostic value. Therefore, we chose a supervised feature selection algorithm that searches the original feature space and finds the best features in terms of maximum relevancy and minimum redundancy.26

The mRMR algorithm has been shown26 to outperform other similar feature selection algorithms. Thus, we used this method for selecting the best feature subspace. The optimal selected features using the mRMR method were found to be MPUNI, MKSNI, and VKUNI.

The results are congruent with the physiological facts about the airway structure associated with OSA mentioned above. The MPUNI is a feature derived from the average power of the breath sounds during the inspiratory phase of nose breathing, when the pharyngeal pressure is more negative. The higher MPUNI values in the people with OSA represent more negative pressure and more turbulence of the air in their airway (Figs. 4a, 5a); this result is congruent with the above-mentioned physiological change in airway structure caused by OSA.

The other selected feature was VKUNI, which is a measure of the variability of the kurtosis during inspiratory nose breathing. Kurtosis, in general, is a measure of the peakedness of the distribution of a random variable with respect to the normal distribution. As we calculated the kurtosis in 50-ms windows, we have a vector of kurtosis values for each signal. Therefore, larger variability of the kurtosis may represent larger variation in the flatness of the signal distribution across windows. In other words, larger variation in the distribution of the signal is related to larger complexity of the signal. Studies on biological signals have often reported an association between pathological signals and loss of complexity.12,13,33 The lower variability of the kurtosis in our OSA data may also be due to the lower complexity of pathological signals (Figs. 4b, 5b).

Participants with a higher AHI range showed higher MPUNI and lower VKUNI values; therefore, we expected to see two clusters in the 2D scatter plots, as can be seen in Figs. 3a and 3b. The clusters obtained from non-OSA participants (AHI < 5) and those with severe OSA (AHI > 30) are more distinguishable because of the gap between the AHI ranges of the participants. Thus, the classification results were expected to be better for these groups than when classifying the two groups below and above the threshold of AHI = 15 (Tables 2, 3). Except for two misclassified subjects, the AHI values of the misclassified cases were very close to the boundaries of the classification (Fig. 6). Note that in determining our classification accuracies, we only used the AHI values and not the overall diagnosis of the participants. Obviously, one cannot say that the severity of OSA in a person with an AHI of 14 is indeed different from that in a person with an AHI of 16. The AHI values can be slightly different depending on who scores the PSG.

It should be noted that in the hospital in Winnipeg where we recorded our data, the PSG is assessed manually. Manual assessment of PSG is considered to be more accurate than automatic scoring, though slight differences in the AHI values are expected depending on who scores the PSG.4 Nevertheless, while the AHI is one of the most important factors in the diagnosis of OSA, it is not the only factor determining the severity of OSA. Therefore, we expected to have some misclassified cases, especially for those with AHI close to the boundaries. Moreover, because of the small number of study subjects, we did not group the subjects in terms of their anthropometric parameters (e.g., gender, BMI, height) or their smoking history, which can have an impact on the tracheal respiratory sounds. These are important parameters that should be considered in future studies.

Conclusion

In this study, a novel method using breath sound analysis has been proposed for OSA severity prediction during wakefulness. We showed that two features, representing the average power and the variation in the kurtosis of the sound signals, are characteristic features that can be used for screening OSA severity. We tested our proposed method on 52 subjects; the preliminary results show good separability between the groups with different degrees of OSA severity. The results of this study are encouraging and pave the way for a simple, non-invasive, and inexpensive tool for screening people suspected of OSA, as well as for predicting the degree of severity of the OSA, during wakefulness.