Introduction

Obstructive sleep apnea (OSA) is a highly prevalent sleep disorder characterized by recurrent episodes of upper airway (UA) obstruction during sleep, leading to partial or complete cessation of breathing. OSA is widely recognized to contribute to cardiovascular comorbidities, type 2 diabetes, and hypersomnolence [1]. It is a multifactorial disorder that originates from UA anatomical characteristics and muscle responsiveness, arousability, and breathing instability. The interaction among these factors contributes to the occurrence/aggravation of OSA [2].

The apnea-hypopnea index (AHI), i.e., the number of apneas or hypopneas per hour of sleep, is measured by polysomnography (PSG) and is used to evaluate the severity of sleep apnea. However, other PSG parameters have not been successfully synthesized [3, 4]. Indeed, the characteristics of respiratory events assessed using PSG may reflect the underlying pathophysiology of OSA. If true, it would be necessary to identify useful parameters other than AHI to classify OSA subgroups.

Cluster analysis is a multivariate method of classifying a sample of subjects into different groups with common characteristics, based on a set of measured variables. Several recent studies have used cluster analysis to distinguish OSA phenotypes [5,6,7,8,9,10]. Some studies have reported OSA phenotypes with a combination of variables, including symptoms and comorbidities, whereas only a few studies have used cluster analysis with PSG variables [5, 9, 10].

In the present study, we used the characteristics of respiratory events from PSG and performed a cluster analysis. We hypothesized that the characteristics of respiratory events reflect the underlying OSA pathophysiology.

Methods

Subjects

Men with moderate or severe OSA who underwent diagnostic PSG between August 2013 and November 2015 at the Yoyogi Sleep Disorder Center (Tokyo, Japan) were eligible to participate in this retrospective study. Patients with cardiovascular, cerebrovascular, and/or psychiatric diseases, periodic limb movements > 15/h, and those on medications affecting sleep, breathing, or muscle control at diagnostic PSG were excluded from the analysis. Study participant flow (N = 210; age, 47.0 ± 0.8 years; body mass index (BMI), 27.4 ± 0.4 kg/m²) is shown in Fig. 1. The study was approved by the institutional review board of Tokyo Medical University (Tokyo, Japan). All participants provided written informed consent for study participation and to have their anonymized clinical data analyzed.

Fig. 1

Study participant flow. PSG, polysomnography; AHI, apnea-hypopnea index; PLMI, periodic limb movement index; PSD, psychosomatic disease; CVD, cardiovascular disease; SA, sleep apnea

Polysomnography

Nocturnal in-laboratory PSG was performed using a PSG system (Alice 5, Respironics, Murrysville, PA), including monitoring of electroencephalogram (C3-A2, C4-A1), bilateral electrooculogram, submental electromyogram (EMG), electrocardiogram, and bilateral anterior tibialis EMG. Airflow was recorded using a nasal pressure cannula and an oronasal thermistor. Thoracic and abdominal movements were recorded using inductive bands, and oxyhemoglobin saturation was monitored with a finger probe using a pulse oximeter. Body position was monitored using an automatic position sensor. Sleep stages, respiratory events, and associated arousals were scored according to the American Academy of Sleep Medicine Scoring Manual published in 2007 [11]. Hypopnea was defined as a ≥ 50% decrease in nasal pressure in association with an arousal or ≥ 3% desaturation (according to the alternative criteria).

Data collection and parameter computing

Event data sheets were used to assess respiratory events in the supine and non-supine positions separately during each sleep stage (N1–N3 and rapid eye movement), for the combined total time for each position and sleep stage of at least 5 min (10 epochs).

The variables measured included number and type of respiratory events (apneas or hypopneas), event duration, degree of oxygen desaturation, and occurrence and timing of arousals during whole sleep and/or each sleep stage. Mean event duration and mean minimum SpO2 were recorded as the average of all relevant events in each subject. Apneas were divided into obstructive, mixed, and central apneas (CAs). The fraction of apnea (Fapnea) was obtained from the ratio of apneas to the total number of respiratory events. The arousal ratio (Rar) was obtained from the ratio of arousals during respiratory events to the total number of respiratory events. We selected variables based on clinical PSG characteristics reported by Edwards et al. [12], except for the arousal index, for which we used Rar. The preponderance of respiratory events in particular sleeping positions and sleep stages is well known [1]. UA collapsibility is strongly influenced by body position rather than sleep stages [13]. Due to the limited number of subjects and in order to represent significant and meaningful respiratory event patterns in our patients, we performed an ascending hierarchical clustering analysis using Fapnea, mean event duration, mean minimum SpO2 associated with the events, Rar, and AHI in the supine position, according to Ward’s method.
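The per-subject variables described above can be illustrated with a minimal Python sketch. The event record layout, field names, and example values are hypothetical and are not taken from the study's event data sheets; Fapnea and Rar are expressed here as fractions between 0 and 1, which is an assumption about scaling.

```python
from dataclasses import dataclass


@dataclass
class RespiratoryEvent:
    # Hypothetical record layout, for illustration only.
    kind: str            # "apnea" or "hypopnea"
    duration_s: float    # event duration in seconds
    min_spo2: float      # lowest SpO2 (%) associated with the event
    with_arousal: bool   # True if an arousal accompanied the event


def summarize_events(events, sleep_hours):
    """Return the five summary variables for one subject and one position."""
    n = len(events)
    if n == 0 or sleep_hours <= 0:
        raise ValueError("need at least one event and positive sleep time")
    n_apnea = sum(e.kind == "apnea" for e in events)
    n_arousal = sum(e.with_arousal for e in events)
    return {
        "AHI": n / sleep_hours,                   # events per hour of sleep
        "Fapnea": n_apnea / n,                    # apneas / all respiratory events
        "Rar": n_arousal / n,                     # events with arousal / all events
        "mean_duration_s": sum(e.duration_s for e in events) / n,
        "mean_min_spo2": sum(e.min_spo2 for e in events) / n,
    }


# Made-up supine events recorded over 0.5 h of sleep.
events = [
    RespiratoryEvent("apnea", 30.0, 88.0, True),
    RespiratoryEvent("hypopnea", 20.0, 92.0, False),
    RespiratoryEvent("apnea", 40.0, 84.0, True),
    RespiratoryEvent("hypopnea", 10.0, 94.0, True),
]
summary = summarize_events(events, sleep_hours=0.5)
print(summary)
```

In this toy example, two of four events are apneas (Fapnea = 0.5) and three of four are accompanied by an arousal (Rar = 0.75).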

Absolute correlations between selected variables were lower than 0.90 (−0.63 to 0.50). Ward’s method is a linkage method in which individual data points or clusters are combined. In brief, at each step the method merges the two clusters or points whose union produces the smallest increase in the summed squared Euclidean distances within clusters. The result of this hierarchical clustering is shown as a dendrogram, whose vertical axis typically displays the proximity between the merging clusters. A scree plot was drawn to show the total within-cluster sum of squares as a function of the number of clusters. The elbow method was used to determine the appropriate number of clusters as objectively as possible. The location of a bend (a meaningful change in the slope) in the plot was considered to be an indicator of the appropriate number of clusters.
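The merging rule and the quantity plotted in a scree plot can be sketched in plain Python as follows. This is an illustrative implementation of Ward's criterion on made-up two-dimensional points, not the JMP procedure used in the study; the function names are hypothetical. Variables measured on different scales would usually be standardized before clustering; whether this was done here is not stated.

```python
def ward_merge_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in total within-cluster sum of squares if A and B are merged."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * sq_dist


def ward_clustering(points):
    """Agglomerate all points; return (partitions, wcss_by_k).

    partitions[k] lists the member indices of each of the k clusters, and
    wcss_by_k[k] is the total within-cluster sum of squares with k clusters.
    """
    # Each cluster is stored as (member indices, size, centroid).
    clusters = [([i], 1, tuple(p)) for i, p in enumerate(points)]
    wcss = 0.0
    wcss_by_k = {len(clusters): 0.0}
    partitions = {len(clusters): [c[0][:] for c in clusters]}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_merge_cost(clusters[i][1], clusters[i][2],
                                       clusters[j][1], clusters[j][2])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        members_i, n_i, c_i = clusters[i]
        members_j, n_j, c_j = clusters[j]
        n = n_i + n_j
        centroid = tuple((n_i * a + n_j * b) / n for a, b in zip(c_i, c_j))
        merged = (members_i + members_j, n, centroid)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        wcss += cost  # each merge adds exactly its Ward cost to the total
        wcss_by_k[len(clusters)] = wcss
        partitions[len(clusters)] = [sorted(c[0]) for c in clusters]
    return partitions, wcss_by_k


# Made-up data with three visually obvious groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
partitions, wcss_by_k = ward_clustering(points)
for k in sorted(wcss_by_k):
    print(k, round(wcss_by_k[k], 2))
```

With these points the within-cluster sum of squares stays small down to three clusters and rises sharply at two, which is the kind of bend the elbow method looks for.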

Although there are no established guidelines regarding the sample size necessary for cluster analysis, a prior report suggested the use of no fewer than 2^k cases (k = number of variables), preferably 5 × 2^k [14]. Thus, our minimum sample size was calculated to be 160 (5 × 2^5).
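As a quick check of this arithmetic, the rule of thumb can be written as a short function. The function name and the multiplier argument are illustrative only.

```python
def minimum_sample_size(n_variables, multiplier=5):
    """Rule-of-thumb sample size: multiplier * 2**k for k clustering variables."""
    return multiplier * 2 ** n_variables


# Five clustering variables: Fapnea, event duration, minimum SpO2, Rar, and AHI.
print(minimum_sample_size(5, multiplier=1))  # lower bound, 2^5
print(minimum_sample_size(5))                # preferred size, 5 x 2^5
```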

Statistical analyses

Data are reported as mean ± standard error (SE) or median (interquartile range (IQR)). After identifying clusters, the differences in patient demographics and PSG variables among clusters were examined using non-parametric analysis (Kruskal-Wallis test). If a difference was significant, a post hoc analysis (Steel-Dwass test) was performed. To evaluate the reliability of the classification derived by hierarchical cluster analysis, the K-means method was used for validation. The kappa statistic was then calculated to measure the agreement between the two classifications [15]. To evaluate the effect of sleep stage or body position, PSG variables were compared using the Wilcoxon signed rank test between NREM and REM sleep or between supine and non-supine positions. Statistical significance was accepted at p < 0.005. The prevalence of comorbidity was assessed using the chi-squared test. In this case, p < 0.05 was considered significant. All analyses were performed using JMP version 11.0.0 (SAS Institute Inc., Cary, NC, USA).
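The agreement measure can be illustrated with a minimal sketch, assuming the unweighted Cohen's kappa and assuming that cluster labels from the two methods have already been matched (label 1 in one classification corresponds to label 1 in the other). The label vectors below are made up and do not reproduce the study data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two labelings of the same subjects."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed proportion of subjects assigned to the same cluster.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from the marginal cluster sizes.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


# Hypothetical cluster assignments for ten subjects.
hierarchical = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
k_means = [1, 1, 1, 2, 2, 2, 2, 3, 3, 1]
kappa = cohen_kappa(hierarchical, k_means)
print(round(kappa, 3))
```

Here eight of ten subjects receive the same label (observed agreement 0.80) against a chance agreement of 0.34, giving a kappa of about 0.70.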

Results

Patients were divided into three groups using hierarchical cluster analysis of the supine position data. The radar chart and characteristics of the three clusters are shown in Fig. 2 and Table 1.

Fig. 2

Clusters (CL) from polysomnographic parameters in the supine position. Medians of five variables are shown for the three clusters. AHI, apnea-hypopnea index; Fapnea, fraction of apneas per total respiratory events; Rar, ratio of arousals during respiratory events; min SpO2, minimum oxygen saturation

Table 1 Patient demographics and characteristics in the supine position

Cluster 1 (CL1, 20% of the entire subject pool) exhibited the highest Fapnea, largest AHI, lowest minimum SpO2, and largest Rar and BMI among the three clusters. Cluster 2 (CL2, 31% of the entire subject pool) had the longest duration of events among all clusters, and higher Fapnea and larger Rar than cluster 3. The BMI in CL2 was smaller than that in CL1. Among the three groups, cluster 3 (CL3, 49% of the entire subject pool) constituted the largest proportion; this cluster had lower Fapnea and AHI, shorter duration of events, and smaller Rar than CL1 and CL2. CL3 exhibited a minimum SpO2 similar to that of CL2 without hypoxemia and had a significantly higher percentage of CAs than CL1 and CL2.

K-means clustering (n = 3) for validation is presented in Table S1 and showed similar groups to those identified by hierarchical clustering. The distribution of numbers between hierarchical and K-means clustering is shown in Table S2. The kappa statistic was 0.7 (95% CI, 0.61–0.78).
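For readers unfamiliar with the validation step, the K-means procedure can be sketched as follows. This is a plain Lloyd-style iteration on made-up two-dimensional points with hand-picked starting centroids; the initialization and stopping rule used by JMP may differ, and the function name is hypothetical.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd-style K-means; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Made-up data; n = 3 clusters, seeded with one point from each apparent group.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (10.0, 0.0), (10.0, 1.0)]
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
print(labels)
print(centroids)
```

The resulting labels can then be cross-tabulated against the hierarchical cluster assignments to quantify agreement between the two classifications.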

Comparing PSG parameters in the supine position between NREM and REM sleep, Fapnea was significantly larger, the duration was longer, and the minimum SpO2 was lower during REM than NREM sleep in all clusters, as shown in Fig. 3 and Table S3. AHI increased during REM sleep in CL2 and CL3 but decreased in CL1 due to the considerable prolongation of event duration. To examine the effect of body position on PSG variables, we evaluated 135 patients; the remaining 75 patients were excluded from this analysis due to insufficient non-supine sleep time (median (IQR) 7.5 (47.5) min). The differences in PSG variables between the supine and non-supine positions in each cluster are shown in Fig. 4 and Table S4. In each cluster, the non-supine AHI was significantly lower than the supine AHI. In CL1 and CL2, Fapnea was smaller in the non-supine position than in the supine position. However, in CL3, the non-supine Fapnea was not significantly different from the supine Fapnea. In each cluster, the non-supine Rar was significantly lower, and SpO2 was significantly higher than the corresponding supine values. Lastly, the duration of the events did not differ significantly between the supine and non-supine positions in any cluster.

Fig. 3

Changes in variables during NREM and REM sleep in the supine position in each cluster. Medians of five variables are shown for the three clusters. #P < 0.01; *P < 0.001; and P < 0.0001. CL, cluster; AHI, apnea-hypopnea index; Fapnea, fraction of apneas per total respiratory events; Rar, ratio of arousals during respiratory events; min SpO2, minimum oxygen saturation

Fig. 4

Changes in variables in the supine and non-supine positions in each cluster. Medians of five variables are shown for the three clusters. #P < 0.01; *P < 0.001; and P < 0.0001. CL, cluster; AHI, apnea-hypopnea index; Fapnea, fraction of apneas per total respiratory events; Rar, ratio of arousals during respiratory events; min SpO2, minimum oxygen saturation

Discussion

We analyzed the diagnostic PSG variables from OSA patients in the supine position using hierarchical cluster analysis. Our results showed (1) three distinct clusters: high fraction of apnea and severe desaturation (CL1), high fraction and long duration of apnea (CL2), and low fraction of apnea (CL3); (2) significant differences in the BMI and apnea type among the three clusters; (3) worsening of the apnea fraction, duration, and desaturation during REM sleep in all clusters; (4) a body position effect on the severity of OSA and arousal ratios in all clusters, and on the fraction of apnea in CL1 and CL2.

In the theoretical pathophysiological model, predisposition to OSA reportedly depends on the combination of UA anatomy, UA muscle response, ventilatory stability, and arousability [2]. UA collapsibility is likely influenced by anatomical and neuromuscular factors [16, 17] and has been evaluated previously by measuring pharyngeal critical pressure [17, 18]. Even though AHIs are similar among patients, some patients have predominantly obstructive apneas, whereas others have predominantly obstructive hypopneas. Moreover, apnea requires higher pressure than hypopnea to maintain UA patency. Therefore, a different apnea/hypopnea ratio might require different pressures to overcome these obstructive events. Joosten et al. reported that moving from the supine to the lateral position significantly improves passive UA collapsibility and its ability to dilate and stiffen [19]. We found that Fapnea was smaller in the non-supine than in the supine position in many patients, similar to AHI [20], suggesting that Fapnea may indicate UA collapsibility.

Arousals are stimuli that terminate the respiratory events in OSA, which accompany hypoxemia and/or hypercapnia, thus increasing the ventilatory drive [21]. When the drive reaches the arousal threshold, a respiratory effort-dependent arousal occurs. Hence, Rar may indicate the likelihood of termination of an event by an arousal. In contrast, events without arousal open the UA in a neuromuscular compensatory fashion before arousal [22].

Clustering is a useful technique to examine data by grouping, but clusters are dependent on the selected variables. Only three studies have performed cluster analysis on PSG measures other than AHI [5, 9, 10]. Joosten et al. identified six clusters in patients with mild or moderate OSA [5], categorized into clinical OSA groups, such as REM-predominant, supine-isolated, supine-predominant, and REM/supine overlap. Furthermore, Lacedonia et al. identified three clusters in patients with OSA using AHI, time with SpO2 less than 90%, and oxygen desaturation index [9]. These two studies did not include arousals and degree of desaturation. Recently, Zinchuk et al. selected 17 features from four domains (breathing disturbance, sleep architecture disturbance, autonomic dysregulation, and hypoxia) by variable reduction analysis and identified seven clusters from routine PSG data [10]. However, their method was overly complex for clinical use. The authors also suggested that measuring arousal timing, apnea and hypopnea duration, pre-event oxygen saturation, ventilatory loop gain, and pharyngeal muscle responsiveness may help refine clustering. Herein, we used variables including arousal timing and respiratory event duration, and obtained three clusters with interpretable clinical and PSG features. Validation is an important consideration; however, as our analytical sample was small, we elected not to use “hold-out” validation and instead used the K-means method, a non-hierarchical clustering technique that has been used to evaluate relevance in market research [23]. The kappa statistic measures the agreement between classifications; it was 0.7 between hierarchical and K-means clustering, indicating substantial agreement. In the future, a different dataset is needed to validate our clustering.

Patients in CL1 were obese and exhibited characteristics consistent with “typical OSA.” As the chemical response to both hypoxemia and hypercapnia could promote an increase in ventilatory drive sooner and to a higher degree, the high Rar in CL1 suggests that ventilatory drive at the end of the events may reach the arousal threshold level. Peppard et al. showed that BMI is an important predictor of oxygen desaturation severity during respiratory events in OSA independent of age, sex, sleeping position, smoking history, baseline SpO2, and event duration [24]. Therefore, high BMI was an important factor to distinguish this cluster, in terms of high Fapnea and severe hypoxemia.

CL2 was characterized by long duration of events and high Fapnea. Although there was no difference in the minimum SpO2 between CL2 and CL3, CL2 had approximately 10-s longer duration and higher Rar. This indicates that UA-muscle responses were not enough to compensate for the airway opening, even for longer duration, and that the respiratory events were supposed to end with the occurrence of arousals. Taken together, longer duration and higher Rar might indicate a high arousal threshold in CL2. In contrast, relatively large Fapnea and non-obesity indicated that the UA was more collapsible in the supine position. Lastly, Fapnea in the non-supine position in CL2 was similar to that in CL3 and much lower than that in the supine position, suggesting dominant anteroposterior UA collapsibility in CL2.

Patients in CL3 had lower Fapnea and Rar, suggesting that the UA likely opens before arousal, allowing respiration to resume. Why did the patients in this cluster have OSA? It was suggested that patients with a mildly collapsible UA exhibit high chemosensitivity [25]. In CL3, breathing instability might have contributed to OSA development. As we did not include variables to assess breathing instability, we speculate that high chemosensitivity in CL3 was due to the higher percentage of CAs. The repeated CA occurrence during sleep might reflect breathing instability, which is dependent on the gain of the respiratory control system [25]. Indeed, it would have been useful to obtain data for breathing control from PSG. Terrill et al. provided a method to quantify breathing instability from PSG [26]. However, we were unable to incorporate this method in our clinical PSG protocol. Another factor potentially causing breathing instability is a sensitive arousal threshold. In CL3, non-supine Rar decreased, although there were almost no changes in the Fapnea or duration. This may explain the reduction in the non-supine AHI observed in this cluster.

As we estimated the pathophysiological interaction on sleep-disordered breathing (SDB) using this cluster analysis of respiratory events in PSG, a choice of treatment other than continuous positive airway pressure (CPAP) might be applied. Thus, body weight reduction and/or CPAP (standard treatment) would be useful for patients in CL1, oropharyngeal muscle training or hypoglossal nerve stimulation may improve UA muscle function in CL2 patients, and oxygen inhalation or acetazolamide might reduce high loop gain in CL3 patients.

This study has several limitations. First, clustering was done only in the supine position. Although we could estimate the effects of sleep stages and body positions within each cluster, these might not reflect the pathophysiology of all sleep stages or that of the non-supine position. It would be ideal to categorize patients according to combinations of four REM/NREM and supine/non-supine conditions. However, not every clinical PSG provides sufficient parameters for all four combinations. Further studies will be needed to consider the exact effect of sleep stages or body positions on clustering. Second, we did not include women due to sex differences in the UA, fat distribution, and breathing stability [16]; moreover, only a few samples of retrospective data were available for women. Therefore, we cannot draw conclusions regarding the impact of sex on clustering. However, our study showed that the apnea fraction was larger, duration was longer, and hypoxemia was worse in all clusters during REM sleep, suggesting greater UA collapsibility during this stage. Female patients generally have REM-predominant sleep apnea and exhibit more hypopneas [27]. Therefore, most female patients might belong to CL3. Female SDB patients should also be clustered in this way. Third, the effect of body position was evaluated only in subjects who slept in both the supine and non-supine positions. Because only approximately two-thirds of patients in each cluster were included, we can only speculate about the effect of body position on our PSG variables. Fourth, we did not consider ethnic differences among patients. Japanese patients with sleep apnea are less obese than Caucasian patients [28, 29]. Future studies should evaluate whether similar results are true for non-Japanese subjects. Fifth, our study was conducted at a single institution, and thus, patients may not represent the general OSA population; multicenter studies are required to confirm our findings. Lastly, as we used alternative criteria included in the 2007 AASM scoring manual for hypopnea, we might have underestimated hypopneas. With the 2012 revision of the scoring criteria, AHI increased by about 20%, mainly due to hypopneas [30]. If the 2012 criteria had been used, we might have found worse AHI and decreased Fapnea in our subjects. However, we do not consider that this affects the essential quality of clustering in our study.

In conclusion, hierarchical cluster analysis using PSG variables in the supine position revealed three distinct subgroups of OSA patients based on respiratory event characteristics. Combined with demographic data, the three subgroups included obese patients with a high fraction of apnea and severe desaturation, non-obese patients with a high fraction of apnea and long event duration, and patients with a low fraction of apnea and a high proportion of central apneas. These results suggest that PSG variables other than AHI may be useful for personalizing treatment in patients with moderate to severe OSA. Future studies are needed to verify the accuracy and efficacy of this clustering analysis.