Introduction

Upper airway surgery in patients with obstructive sleep apnea (OSA) has shown positive results in selected patients [1, 2]. Tonsillectomy is established as the treatment of choice for patients with OSA and tonsil hypertrophy [3, 4]. Results following tonsillectomy have proven to be favourable even in the long term. Although the beneficial effects of many other surgical techniques have been demonstrated in OSA patients [2, 5, 6], their long-term effect has been questioned [7]. This concern arises mainly because the operated areas consist of soft tissues, whose condition could worsen over the years [8]. However, there is little evidence on this issue in operated patients [9].

According to the only available meta-analysis reviewing the published literature on this matter, the success rate of uvulopalatopharyngoplasty decreases from 67.3%, when measured shortly after surgery, to 44.35% when evaluated in the long term [10].

In some of the published research, measured parameters are limited to the apnea–hypopnea index (AHI), the oxygen desaturation index (ODI), or exclusively subjective parameters. As far as the evaluation of surgical results is concerned, multiple definitions of ‘long term’ are in use. Some publications define ‘long term’ as 6 months after surgery [11], others as 12 months after the surgical procedure [12], and in the recently published meta-analysis mentioned above, only evaluations carried out after 34 months were considered ‘long term’ [10].

The main objective of this study was to investigate whether the beneficial effects obtained through upper airway surgery remain stable over time, measured in both objective and subjective terms. As a secondary aim, the stability of the results according to the type of surgery performed was also studied.

Methods

This is a retrospective study of data collected since 2002 from all adult patients with OSA who underwent upper airway surgery in the Otolaryngology Service at a University Hospital. Consecutive patients who were operated on during the period 2002–2018, were followed until 2020, and completed the first post-surgical control were included. Patients who underwent exclusively nasal surgery or tonsillectomy, or who were referred for bariatric or skeletal surgery, were excluded. Five more patients were excluded because they did not undergo the first postoperative sleep study.

A type 1 or type 3 sleep study was performed in all patients and interpreted according to the American Academy of Sleep Medicine (AASM) criteria in force at the time. The definition of hypopnea in force since 2001 was a 30% decrease in thoraco-abdominal movements or airflow lasting 10 s or more and accompanied by at least a 4% drop in saturation [13]. In 2007, the AASM adopted recommended and alternative definitions. In this work, the recommended criteria were used: a 30% decrease lasting at least 10 s and accompanied by a 4% drop in saturation [14]. Later, in 2012, the required drop in oxygen saturation for hypopnea changed to 3%, which we adopted in our practice (recommended criteria of the 2012 AASM scoring manual) [15].
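The effect of the desaturation threshold on event counts can be illustrated with a minimal sketch. It assumes each candidate event is summarized by three numbers (percentage decrease in flow or effort, duration, and associated desaturation); the `Event` class, the function names, and the sample events are hypothetical and are not taken from the study data or from any scoring software.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical summary of one candidate respiratory event."""
    flow_drop_pct: float  # % decrease in airflow or thoraco-abdominal movement
    duration_s: float     # event duration in seconds
    desat_pct: float      # associated drop in oxygen saturation (percentage points)


def is_hypopnea(event: Event, desat_threshold_pct: float) -> bool:
    """Apply the three conditions described in the text:
    >= 30% decrease, >= 10 s duration, desaturation >= threshold."""
    return (
        event.flow_drop_pct >= 30.0
        and event.duration_s >= 10.0
        and event.desat_pct >= desat_threshold_pct
    )


def hypopnea_count(events: list[Event], desat_threshold_pct: float) -> int:
    """Count events scored as hypopneas under a given desaturation threshold."""
    return sum(is_hypopnea(e, desat_threshold_pct) for e in events)


# Invented events, for illustration only.
sample_events = [
    Event(flow_drop_pct=35, duration_s=12, desat_pct=4.5),  # meets both criteria
    Event(flow_drop_pct=40, duration_s=15, desat_pct=3.2),  # meets the 3% criterion only
    Event(flow_drop_pct=50, duration_s=8, desat_pct=5.0),   # too short under either
    Event(flow_drop_pct=25, duration_s=20, desat_pct=4.0),  # decrease too small
]

count_4pct = hypopnea_count(sample_events, desat_threshold_pct=4.0)
count_3pct = hypopnea_count(sample_events, desat_threshold_pct=3.0)
```

Under these assumptions, the same recording never yields fewer hypopneas with the 3% threshold than with the 4% threshold, because every event that satisfies the stricter condition also satisfies the more lenient one.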

All patients filled in a questionnaire on sleepiness and quality of life. All of them underwent an examination of the upper airway while awake, paying special attention to the tonsil size and the palate’s characterization using the Friedman classification [16], in addition to a complete fibroscopy [17]. To conclude the examination, a propofol-induced sleep endoscopy was performed using a TCI (target-controlled infusion) pump [18], except in cases of tonsillar grade 3 and 4 according to Friedman [16].

Within the upper airway surgeries, different types of pharyngoplasty were included: uvulopalatopharyngoplasty (UPPP) [19] (with or without the palatopharyngeus muscle section, in addition to the suture of the pillars), lateral pharyngoplasty [20], expansion pharyngoplasty [21], and modified reposition pharyngoplasty with barbed suture [22]. Other techniques included were: lingual coblation tonsillectomy, midline posterior glossectomy [23], radiofrequency of the tongue base, partial epiglottectomy, and hyoid suspension.

All these techniques could be accompanied by tonsillectomy or nasal surgery. The surgical indication was based on the findings of the awake examination and drug-induced sleep endoscopy, the subjective clinical parameters, and the sleep study. Different surgical techniques could be combined to create a surgical treatment tailored to the patient. Multilevel surgeries performed in either a single stage or two stages were included.

During data interpretation, two groups of pharyngoplasties were created. The first was the group of classic pharyngoplasties, comprising the traditional UPPP [19] and its variants described before 2003, which have a resective character. The second was the group of new pharyngoplasties, which have a more functional character: lateral pharyngoplasty [20], expansion pharyngoplasty [21], and modified reposition pharyngoplasty with barbed suture [22].

Post-surgical control consisted of a new sleep test and repeat completion of the sleepiness and quality-of-life questionnaires mentioned above at 8 months. For the analysis of the long-term results, these assessments were repeated after 34 and after 48 months. Some patients underwent additional controls even later. The same type of sleep study was always performed for each patient.

A total of 153 patients who completed the first post-surgical control 8 months after surgery were included in the final analysis. The objective OSA variables considered were the apnea–hypopnea index, the oxygen desaturation index, the amount of time spent by the patient with an oxygen saturation lower than 90% (T90), and the minimum O2 saturation during sleep (MIN O2 SAT).

For the study of changes in subjective terms of OSA, the Epworth sleepiness scale score [24] and the visual analogue scale of well-being [25] were employed. On this visual scale, the patient must choose a point on a 10-cm line that corresponds to their health-related quality of life concerning OSA (0, cannot be worse; 10, cannot be better). The competent authority’s ethics committee approved this study, and informed consent was obtained before surgery from all patients included in this work.

Statistical analysis was performed with Stata 14 (StataCorp, 4905 Lakeway Drive, College Station, Texas 77845, USA). Continuous variables are described as means with standard deviation. Qualitative variables are described as proportions. After verifying normality, the t test was used to compare continuous variables, and the Chi-square or Fisher’s test was used for categorical variables. For continuous variables that did not follow a normal distribution, the Mann–Whitney U test was performed. In all cases, the level of significance was set at p < 0.05. To study the evolution of the different variables over time, regression equations were fitted and represented as scatter diagrams.
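The regression step can be sketched as an ordinary least-squares line of a variable against days since surgery. This is only an illustration in plain Python, not the Stata procedure used in the study; the function name `fit_line` and the sample values are hypothetical, and the text does not state which regression model was fitted beyond "regression equations".

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented follow-up measurements (days after surgery, AHI in events/h).
# These are NOT study data; each point stands for one completed sleep test.
days_after_surgery = [200.0, 1000.0, 1500.0, 3000.0]
ahi_values = [14.0, 15.0, 14.5, 15.5]

slope_per_day, intercept = fit_line(days_after_surgery, ahi_values)
```

Because every completed test enters the fit as its own point, late controls with few patients can still contribute to the estimated slope.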

Results

The first post-surgical control was carried out on average 236 (± 122) days after surgery. The second control, completed by 54 patients, was carried out at 1028 (± 987) days after surgery, around 34 months, as proposed by the most conservative authors [10]. The third control, completed by 20 patients, was carried out on average at 1428 (± 705) days, around 48 months after surgery. Some patients underwent additional controls, and the latest follow-up was completed 3159 days (105 months) after surgery.

Average OSA indicators both in objective (AHI, ODI, T90, MIN O2 SAT) and subjective terms (Epworth sleepiness scale and well-being scale) significantly improved after surgery and remained stable over time.

Table 1 summarizes the main variables considered before surgery and after successive post-surgical controls. As a subgroup analysis, the pre- and postoperative data of each population that completed the long-term follow-up are shown. The body mass index (BMI) did not change between the preoperative and first postoperative control and remained unaltered in subsequent postoperative controls. Notably, the mean AHI of the sample was 34.84/h (± 24.52) before surgery, decreased to 14.54/h (± 16.24) after surgery, and did not vary by more than 1 point in the successive controls (p = 0.01). All sleep variables showed significant improvement after surgery and remained stable in the following controls.

Table 1 Main variables evaluated before surgery and after successive post-surgical controls. Subgroup analysis for each population that met the long-term follow-up

Figure 1 shows the evolution of each variable throughout the follow-up period. Each point represents a completed post-surgical test and the moment it was performed. The slope coefficient of the regression equation obtained for each variable was below 0.001 in absolute value in all cases.

Fig. 1 Evolution of main parameters during follow-up (days). AHI apnea–hypopnea index, ODI oxygen desaturation index, T90 time spent with oxygen saturation < 90%, MIN O2 SAT minimum oxygen saturation, EPWORTH Epworth sleepiness scale score, QoL visual analogue scale for quality of life

Table 2 describes the evolution of the mean AHI according to the surgeries performed and shows the distribution of those surgeries. There were no statistically significant differences in the long-term stability of results when comparing one surgical technique to another. Although the differences were not significant, the results of some techniques, modified reposition pharyngoplasty among them, appeared particularly stable.

Table 2 Evolution of the AHI based on the surgeries performed

Table 3 describes the evolution of the mean AHI according to groups of surgeries. The group of new pharyngoplasties (expansion pharyngoplasty, lateral pharyngoplasty, and modified reposition pharyngoplasty with barbed suture) presented greater long-term stability than the group of those considered classic pharyngoplasties (UPPP and miscellaneous; p = 0.04). Likewise, the long-term results of single-level surgeries appeared more stable than those of multilevel surgeries, although the difference was not statistically significant (p = 0.07). There were no differences between multilevel surgery performed in one stage and in two stages (p = 0.54).

Table 3 Evolution of AHI based on grouped surgeries

Table 4 summarizes the presurgical variables of those patients who completed the successive controls. There were no statistically significant differences in the distribution of gender, age, and BMI between the global population and the patients who underwent long-term follow-up. The preoperative AHI, ODI, MIN O2 SAT, T90, Epworth, and well-being scale did not show statistically significant differences between the different populations.

Table 4 Presurgical values of patients who completed each control

The success rate according to Sher’s criterion [7] was 53% for the total sample. It was 54% for the group that completed the second control and 50% for the group that completed the third control.
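As an illustration of how such a success rate is computed, the sketch below assumes the commonly cited form of Sher’s criterion: a reduction in AHI of at least 50% together with a postoperative AHI below 20/h. The text itself does not spell out the definition, so these thresholds are an assumption here; the function names and the patient values are hypothetical.

```python
def sher_success(pre_ahi: float, post_ahi: float) -> bool:
    """Assumed Sher criterion: AHI reduced by >= 50% AND postoperative AHI < 20/h."""
    if pre_ahi <= 0:
        raise ValueError("preoperative AHI must be positive")
    relative_reduction = (pre_ahi - post_ahi) / pre_ahi
    return relative_reduction >= 0.5 and post_ahi < 20.0


def success_rate(pairs: list[tuple[float, float]]) -> float:
    """Proportion of (pre_ahi, post_ahi) pairs that meet the criterion."""
    if not pairs:
        raise ValueError("no patients supplied")
    return sum(sher_success(pre, post) for pre, post in pairs) / len(pairs)


# Invented (preoperative, postoperative) AHI pairs in events/h; not study data.
example_patients = [
    (40.0, 15.0),  # 62.5% reduction, post < 20: success
    (60.0, 25.0),  # > 50% reduction, but post >= 20: failure
    (18.0, 12.0),  # post < 20, but only 33% reduction: failure
    (30.0, 10.0),  # 67% reduction, post < 20: success
]

example_rate = success_rate(example_patients)
```

Both conditions must hold at once, which is why a patient with a large relative reduction can still count as a failure when the residual AHI remains at or above 20/h.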

Discussion

Our data indicate that the improvement obtained after upper-airway surgery is not limited to the immediate post-surgical period but is maintained over time, as demonstrated by our results obtained up to 48 months following surgery.

The success rate according to Sher’s criterion [7] did not vary between the first and subsequent controls either.

These results agree with those obtained by the Karolinska group, who evaluated AHI after modified uvulopalatopharyngoplasty at 6 and 24 months after surgery. They observed that the preoperative mean AHI went from 52.9/h to 23.6/h after surgery, and remained stable in the test performed after 24 months with an AHI of 24.1/h [26].

The rest of the objective parameters in our sample remained equally stable in the second and third controls. The scatter diagrams (Fig. 1) show that the slope was very low for all the variables. These data indicate that even though there is a worsening trend over time, the rate at which this occurs is very slow, since even at the 3000-day (100-month) follow-up, the slope remains practically flat.
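To give a sense of scale for the reported slopes, the following sketch computes the largest change a linear trend could accumulate over the follow-up. It assumes that the slope bound of 0.001 is expressed per day, which is inferred from the time axis in days and is not stated explicitly; the function name is hypothetical.

```python
def max_linear_drift(slope_bound_per_day: float, follow_up_days: float) -> float:
    """Upper bound on the change predicted by a linear trend whose
    absolute slope is below `slope_bound_per_day`, over `follow_up_days`."""
    return abs(slope_bound_per_day) * follow_up_days


# Slope bound reported in the Results, assumed here to be per day,
# applied to the 3000-day follow-up mentioned above.
drift_bound = max_linear_drift(slope_bound_per_day=0.001, follow_up_days=3000)
```

Under that assumption, the fitted trend for any variable changes by less than 3 units of that variable (for example, fewer than 3 events/h for AHI) across 3000 days.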

Furthermore, another paper from the Karolinska group evaluated post-surgical results 15 years after UPPP. Their analysis evaluated the ODI, whose definition did not change throughout that period. The presurgical ODI was 26.5/h, and at completion of the study 15 years later, with 52% of the original participants, the post-surgical ODI was down to 8.5/h [27]. In our case, the ODI improved after surgery, and in subsequent follow-ups, it continued to show statistically significant and stable improvement. These data support the hypothesis that the beneficial effects of surgery persist over time.

The BMI of the evaluated patients did not change after surgery or throughout the follow-up, so neither failure nor benefit could be attributed to this factor.

Subjective variables such as the Epworth sleepiness scale, snoring, and quality-of-life scales have been more widely studied in the literature as they do not require sleep studies [28]. In our sample, the measurements carried out via subjective parameters also exhibited favourable and stable results. The Epworth scale score showed a downward trend over time, and the quality of life an upward one. This phenomenon was also seen in the long-term evaluation of Epworth and excessive daytime sleepiness (EDS) in other studies [26, 27]. This subjective aspect can even improve over time, possibly because OSA improvement can positively affect other aspects of quality of life. In fact, when consulted 24 months after surgery, 83% of patients stated they were satisfied with the surgery, and 92% said they would recommend it [28].

The number of respiratory events and their consequences tend to increase with age [29]. This is due to increased fat and flaccidity in soft tissues, loss of muscle tone, changes in neuromuscular and central responses, and physiological changes in sleep architecture [30]. This worsening is intrinsic to senescence and will be present in both operated and non-operated patients. Other hypotheses, however, point to deterioration linked to the passage of time specifically in operated patients. These are based on the loss of tension following the surgeries performed and on pathological changes in the pharyngeal scar [8]. Nevertheless, in our sample, the results observed remained unaltered throughout the entire follow-up (Table 1, Fig. 1). In our sample, new pharyngoplasty techniques, which include repositioning of the muscles with anchorage to rigid structures, showed superiority in terms of stability.

No individual technique or combination of surgeries demonstrated an advantage in terms of long-term stability. This question could not be properly assessed in this study because of the array of techniques employed and their many possible combinations. Such diversity meant that the number of participants in each category was low, which may have reduced the statistical power to detect significant differences. Indeed, although no technique showed significant superiority, the data for modified reposition pharyngoplasty suggest very long-lasting results, an inference made possible because it was one of the largest groups.

Other types of surgery have also demonstrated stable long-term outcomes. Upper airway stimulation by means of an implanted hypoglossal nerve stimulator achieved a success rate of 72.4% at 12 months and 75% at the 60-month follow-up [31].

Single-level surgeries showed greater stability than multilevel surgeries, although no statistically significant differences were found (p = 0.07). This may be because, in cases in which multilevel surgery is indicated, the collapse pattern is more complex and, therefore, more difficult to resolve. Moreover, the more elements are involved, the more likely it is that one of them will worsen.

Within this series, 10 lingual tonsil coblations were performed. Traditionally, this technique has raised doubts due to potential regrowth after partial excision. It was not possible to shed light on this issue, since unfortunately the third (48-month) control included no participants who had undergone this type of surgery. Nevertheless, in the 5 cases that completed the second postoperative control (performed on average 34 months after surgery), the AHI remained similar: 12.96/h at the second control compared with 10.34/h at the first control (presurgical 25.69/h). A study focused on robotic tongue base surgery showed postoperative growth of the lingual tonsil in only 8.8% of patients and found no correlation with time passed since surgery [32].

The evaluation of the demographic parameters of those patients who completed the second and third controls showed that there were no statistically significant differences in the distribution of gender, age, and BMI compared to the global population studied. The baseline OSA indicators (AHI, ODI, MIN O2 SAT, T90, Epworth, and well-being scale) did not exhibit statistically significant differences between the different populations either. The success rate according to Sher’s criterion [7] obtained after surgery, around 50–55%, did not differ between the patients of the first, second, and third controls. This suggests homogeneity among the populations comprising the different controls and constitutes in itself a valuable result.

The Sher success [7] obtained in our sample was not very high. This is because all operated patients were included, and not just those considered good candidates. Patients with severe OSA and moderate OSA with other risk factors who did not tolerate or use continuous positive airway pressure (CPAP) were assessed and were offered the surgery that best suited their case to try to reduce their disease severity [5].

The main limitation of this study was the loss of patients throughout the follow-up. All operated patients were invited to continue with periodic controls to assess their health. However, many of them, due to either the persistence or cure of their disease, stopped attending or declined the voluntary follow-up. Patients who completed the different postoperative controls can be considered a representative sample of the whole, since their baseline variables did not differ from those of the total sample.

Data from the fourth and subsequent controls were not included, because the number of patients in each group was too low (maximum of 10). However, these data were included in the temporal regression model, in which each measurement is entered individually, and they thus provide longer-term follow-up data.

Another limitation of this work was the immense variety of surgeries included and the scant sample size available for each category. For this reason, evaluations of grouped techniques were carried out.

The changes in the hypopnea definition during the study period add a limitation to this study. However, the changes in the hypopnea scoring criteria resulted in a greater prevalence and severity of sleep-disordered breathing (SDB) [33]. Therefore, when this change affected patients in this work, it made the evaluation more demanding. Patients diagnosed under the stricter criterion (4% oxygen desaturation) had their results evaluated under the more lenient criterion (3% oxygen desaturation), which made a good outcome more difficult to achieve. The same could also occur between short- and long-term results: when the long-term evaluation was scored under the more lenient criterion, the same sleep study yielded a higher severity. Although this limitation makes it difficult to draw conclusions, the direction in which the definition changed reinforces the finding of stable results in this work. We could not find any consensus correction factor with which to reassess our sample [33]. To overcome this issue and provide follow-up data whose definition did not change over time, we also evaluated the results by oxygen desaturation index and by T90.

Conclusion

The results obtained after upper airway surgery remain stable in the long term, both in objective and subjective terms. In our sample, modern pharyngoplasty techniques have shown superiority over the classical ones in terms of long-term stability.