Introduction

A detailed examination of the upper airway (UA) is paramount in the study of patients with obstructive sleep apnea hypopnea syndrome (OSAHS). In addition to providing an overview of the anatomy of the UA, it establishes a topographic diagnosis of obstruction by identifying potentially collapsible areas, which is fundamental when alternative treatments to continuous positive airway pressure (CPAP) therapy are necessary. Several authors have shown that incorrect or insufficient use of selection criteria when planning surgical treatment may be responsible for surgical failures in patients with OSAHS [1–3]. Topographic diagnosis is therefore essential to improve the selection of surgical candidates and to predict surgical success; it also helps to evaluate other therapeutic alternatives to CPAP, such as mandibular advancement devices (MAD) [4, 5]. Nevertheless, awake UA examination does not reflect what occurs during sleep, when the muscle tone of the UA is decreased.

Drug-induced sedation endoscopy (DISE) is a technique in which sleep is pharmacologically induced so that the UA can be explored with a flexible endoscope, identifying areas of vibration and collapse in patients with chronic snoring and OSAHS [6]. The technique was originally described in 1991 by Croft and Pringle [7] and has been used in the evaluation of these patients since then. It allows the pattern and degree of UA collapse to be assessed in a state that simulates natural sleep; it is therefore considered a guiding tool in treatment decisions, particularly for surgical treatment [5, 8]. DISE has been validated in multiple studies [9–12], and it is considered a simple, safe, and cost-effective technique [13, 14]. Some studies have demonstrated its usefulness in selecting candidates for surgery [3, 15–18], while others address the differences between DISE and awake UA examination and their influence on treatment planning [17, 19, 20].

Some publications point out the subjectivity of the technique and argue that experience may play an important role in the reliability of the results [21–23]. Nevertheless, the influence of experience on treatment planning has not yet been assessed. This study was therefore conducted to determine the influence of experience on the selection of alternative treatments to CPAP. The secondary aim was to assess the agreement between an experienced observer and an observer in training on the pattern and degree of collapse at the different levels of the UA.

Material and methods

This is a cross-sectional study performed in a university hospital. Thirty-one preoperative DISE videos were randomly selected from our archive. All the videos belonged to OSAHS patients seeking alternative therapies to CPAP.

The videos were independently and blindly evaluated by an expert observer and an observer in training (an otorhinolaryngology resident). The observers were unaware of each patient's identity, physical examination findings, and prescribed treatment. The DISE findings and treatment indications of the two observers were then compared. The study was approved by the local ethics committee.

Drug-induced sedation endoscopy

DISE had been performed in our hospital’s outpatient surgery unit. The sedation agent was propofol, administered with a target-controlled infusion (TCI) pump. No anticholinergic agent or topical nasal anesthesia was used during the procedure. Each video corresponded to a single patient and had a duration of around 10 to 15 min.

A modified VOTE classification [24] was selected to assess DISE findings. This scale evaluates the primary structures that can contribute to UA collapse (soft palate, oropharynx, tongue base, and epiglottis). It proposes three degrees of severity (no collapse; partial obstruction or vibration; complete obstruction or collapse) and classifies the configuration of collapse as anteroposterior, lateral, or concentric. The modification consisted of allowing lateral and concentric collapse at the tongue base level, since the original classification considers only anteroposterior collapse at this level (Table 1).

Table 1 Modified VOTE classification

The presence or absence of collapse at each level was compared, as well as the degree and configuration of collapse, according to this classification.

Treatment planning

After evaluating the videos, each observer proposed an alternative treatment to CPAP based on the findings. To facilitate the comparison, three treatment options were established: option 1, soft palate surgery only; option 2, treatment of the tongue base/hypopharynx only, including surgery or indication of MAD; and option 3, multilevel treatment, that is, treatment at both the palate and the hypopharynx/tongue base.

Statistical analysis

Stata/IC 14.1 (StataCorp) was used for the data analysis. The descriptive statistics for the clinical characteristics of the patients were expressed as mean and standard deviation.

As measures of interobserver agreement, the percentage of agreement between observers, the kappa coefficient (k), and the prevalence-adjusted bias-adjusted kappa (PABAK) were calculated, with 95 % confidence intervals. The level of agreement was defined on the scale proposed by Landis and Koch [25] for the k coefficient: ≤0 = poor, 0.01–0.20 = slight, 0.21–0.40 = fair, 0.41–0.60 = moderate, 0.61–0.80 = good–substantial, and 0.81–1 = almost perfect. The level of significance was 0.05.
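The agreement measures named above can be illustrated with a minimal Python sketch. It is not the Stata procedure used in the study: the function names `agreement_stats` and `landis_koch` and the example ratings are hypothetical, and the formulas are the standard definitions of Cohen's kappa, k = (Po − Pe)/(1 − Pe), and of PABAK, (c·Po − 1)/(c − 1) for c categories. Confidence intervals are omitted.

```python
from collections import Counter


def agreement_stats(rater_a, rater_b, n_categories=2):
    """Observed agreement, chance agreement, Cohen's kappa, and PABAK
    for two raters scoring the same subjects on a nominal scale."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: proportion of subjects given the same category.
    po = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    pe = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    kappa = (po - pe) / (1 - pe) if pe < 1 else float("nan")
    # PABAK replaces the chance term by 1 / n_categories.
    pabak = (n_categories * po - 1) / (n_categories - 1)
    return {"po": po, "pe": pe, "kappa": kappa, "pabak": pabak}


def landis_koch(k):
    """Label a coefficient using the cut-offs quoted in the text."""
    k = round(k, 4)  # guard against floating-point noise at the cut-offs
    if k <= 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "good-substantial")]:
        if k <= upper:
            return label
    return "almost perfect"


# Hypothetical ratings (1 = collapse present, 0 = absent); NOT study data.
observer_a = [1] * 12 + [0] * 8
observer_b = [1] * 10 + [0] * 2 + [1] * 2 + [0] * 6
stats = agreement_stats(observer_a, observer_b)
print({name: round(value, 4) for name, value in stats.items()})
print(landis_koch(stats["kappa"]))
```

In this hypothetical example the two categories are roughly balanced, so kappa and PABAK are close to each other; they diverge when one category dominates.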

Results

A total of 31 videos were analyzed; 87 % of the sample were male, and the mean age was 42 years (range 23–61), with a mean body mass index (BMI) of 26.58 ± 2.9 kg/m2. The mean apnea–hypopnea index (AHI) was 30.31 ± 18.59 per hour. Thirteen patients had severe, 8 moderate, and 10 mild OSAHS.

The presence or absence of UA collapse was evaluated (Table 2). According to both observers, most patients had collapse at the level of the soft palate, with 80 % agreement and moderate strength for the prevalence-adjusted bias-adjusted kappa.

Table 2 Interobserver agreement in the presence of UA collapse

The highest percentage of agreement on the presence of UA collapse was at the oropharynx level, followed by the soft palate, tongue base, and finally the epiglottis. All levels showed a high percentage of agreement and moderate to substantial strength for the kappa coefficient, except for collapse at the velum. However, the overall agreement, taking the four levels into account at the same time, decreased to 38.70 %.
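Why overall agreement can fall well below the per-level figures is easy to see in a small Python sketch: a patient counts towards overall agreement only if the observers coincide at all four levels. The data and the helper names `per_level_agreement` and `overall_agreement` are hypothetical and serve only as an illustration; they are not the study data.

```python
LEVELS = ("velum", "oropharynx", "tongue_base", "epiglottis")


def per_level_agreement(obs_a, obs_b):
    """Proportion of patients with the same finding, computed level by level."""
    n = len(obs_a)
    return {
        level: sum(a[i] == b[i] for a, b in zip(obs_a, obs_b)) / n
        for i, level in enumerate(LEVELS)
    }


def overall_agreement(obs_a, obs_b):
    """Proportion of patients with identical findings at all four levels."""
    return sum(a == b for a, b in zip(obs_a, obs_b)) / len(obs_a)


# Hypothetical findings for five patients (1 = collapse, 0 = no collapse),
# ordered as in LEVELS. NOT study data.
expert = [(1, 1, 1, 0), (1, 0, 1, 1), (1, 1, 0, 0), (0, 1, 1, 1), (1, 1, 1, 1)]
trainee = [(1, 1, 1, 0), (1, 0, 0, 1), (1, 1, 0, 1), (1, 1, 1, 1), (1, 1, 1, 1)]

by_level = per_level_agreement(expert, trainee)
overall = overall_agreement(expert, trainee)
print(by_level)
print(overall)
```

Here every level has at least 80 % agreement, yet only two of the five hypothetical patients match at all four levels, because the disagreements fall on different patients.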

Table 3 shows the percentage of agreement and kappa values for degree and configuration of collapse in each level of the UA using the modified VOTE classification. Overall strength is moderate to good, except for the degree at the soft palate and the tongue base where kappa values decrease to a low level of agreement.

Table 3 Interobserver agreement using a modified VOTE classification

Regarding the degree of obstruction, the observer in training overestimated the degree of collapse at almost all levels compared with the assessment of the experienced observer (Fig. 1). The tongue base was the exception: the observer in training classified 45.16 % of patients as non-obstructed at this level, whereas the expert observer considered that only 22.58 % had no obstruction there.

Fig. 1 Agreement according to degree of obstruction (modified VOTE classification)

When assessing treatment planning after DISE, the percentage of agreement was 67.74 %, with moderate interobserver agreement (k = 0.5133, 95 % CI 0.2646–0.7620; p < 0.001) (Fig. 2). In 9 of the 31 patients, the difference lay in multilevel treatment vs. treatment of a single level; the resident usually planned less multilevel treatment, having detected fewer tongue base obstructions. In one patient, the difference was between palate surgery planned by the resident and MAD planned by the senior ENT specialist.

Fig. 2 Indication of alternative treatment to CPAP

Discussion

This study has demonstrated that the agreement between an observer in training and an experienced observer is high at all levels of the UA. It was good for both degree of collapse and its configuration, according to a modified VOTE classification. Interobserver agreement measured by kappa index is moderate to substantial in almost all levels of the UA, except at the level of the tongue base where the strength is weak. Similarly, when considering alternative therapeutic options to CPAP based on DISE findings, interobserver agreement on treatment planning is moderate.

The most frequent site of collapse visualized during DISE was the soft palate; these results are comparable to those of other studies [9, 26, 27]. Although the percentage of agreement on the presence of collapse at the level of the velum was high, the kappa value was in the range of a low level of agreement, both for the presence and for the degree of collapse (k = 0.1667). This may be attributed to the influence of the high prevalence of collapse at this level on the kappa coefficient. It has been demonstrated that the kappa coefficient is highly dependent on the prevalence of the disease [28]. Feinstein and Cicchetti [29] described the “paradox” of high observed agreement associated with low kappa values: for a fixed observed agreement, the magnitude of kappa depends on the prevalence of the phenomenon studied. When prevalence is high, the number of true positives is high; the probability that both observers classify subjects as positive is therefore higher, and the agreement attributable to chance is greater. Because of this paradox, the prevalence-adjusted bias-adjusted kappa was calculated, yielding a moderate strength of agreement (PABAK = 0.6). To our knowledge, this is the first time that this prevalence- and bias-adjusted kappa has been reported on this subject; therefore, it cannot be compared with other publications.
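The paradox can be made concrete with a short worked example. The counts below are hypothetical (30 subjects, not the 31 study videos) and were chosen only because they give the same observed agreement, kappa, and PABAK as those reported for the velum: both observers score collapse in 23 subjects, only the first observer in 5, only the second in 1, and neither in 1. The symbols P_o (observed agreement) and P_e (agreement expected by chance) follow the usual definition of Cohen's kappa.

```latex
\begin{align*}
P_o &= \frac{23 + 1}{30} = 0.80 \\
P_e &= \frac{28}{30}\cdot\frac{24}{30} + \frac{2}{30}\cdot\frac{6}{30}
     = \frac{672 + 12}{900} = 0.76 \\
\kappa &= \frac{P_o - P_e}{1 - P_e} = \frac{0.80 - 0.76}{1 - 0.76} \approx 0.167 \\
\mathrm{PABAK} &= 2P_o - 1 = 0.60
\end{align*}
```

Because both observers score collapse in most subjects, chance agreement is already 0.76, which leaves little room for kappa to exceed it; PABAK fixes the chance term at 0.5 for two categories and is therefore unaffected by the skewed marginals.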

The validity and reliability of DISE have been studied previously. Kezirian et al. [22] showed that its reliability was greater when assessing the presence of global collapse than when assessing its degree, especially at the level of the hypopharynx. Rodriguez-Bruno et al. [21] found greater intraobserver and interobserver agreement at the tonsils-oropharynx level, followed by the epiglottis. These results are similar to those found in our study.

Gillespie et al. [17] evaluated, in a blinded randomized study, the test-retest and interobserver agreement for three otolaryngologists experienced in DISE. They found moderate concordance by kappa index when comparing the results between observer pairs (k = 0.65; 95 % CI, 0.62–0.69), (k = 0.65; 95 % CI, 0.61–0.69), (k = 0.62; 95 % CI, 0.10–0.38) using their own classification (DISE Index score). However, interobserver agreement decreased when the results were compared according to the VOTE score: (k = 0.28; 95 % CI, 0.16–0.40), (k = 0.24; 95 % CI, 0.10–0.38), (k = 0.29; 95 % CI, 0.16–0.41).

Vroegop et al. [23] assessed the intraobserver and interobserver agreement in a group of experienced observers and a group of inexperienced observers on DISE, showing that the concordance was higher in the group of experienced observers with good levels of agreement when assessing the collapse at the oropharynx, tongue base, and epiglottis.

The interobserver agreement was moderate to good for both the degree and the configuration of collapse at almost all levels of the UA except the tongue base. The hypopharynx and tongue base region is one of the most important levels to explore during DISE when selecting candidates for alternative treatments to CPAP, not only because severe retrolingual collapse has been implicated as a predisposing factor for failure in patients undergoing palate surgery [1] but also because of the discrepancy between awake examination and DISE in the assessment of collapse at this level [27, 30–32]. Thus, recent studies consider DISE paramount when considering treatment with MAD or tongue base surgery [33]. Although interobserver agreement between an observer in training and an experienced observer was significant, we consider it important to develop a learning curve for this technique in order to obtain more reliable results at the tongue base level.

The literature has shown that DISE influences therapeutic decisions: therapeutic indications can change after this technique in 63.9 % of cases when only surgical indications are considered [33] and in up to 78.4 % of cases when both surgical treatment and MAD are taken into account [17]. Regarding treatment planning based on DISE findings in CPAP failure patients, a high percentage of agreement and a moderate strength of interobserver concordance were found, regardless of the observers' experience. Therefore, we consider this technique a useful tool for patient selection and treatment indication. The observers differed on the treatment level (soft palate vs. hypopharynx) in only one patient; the experienced observer indicated MAD and the inexperienced one, palate surgery. The inexperienced observer probably did not notice that the velum collapse improved when the mandibular advancement maneuver was performed. The fact that 80 to 90 % of the patients had some degree of velum collapse may also explain the inexperienced observer's propensity towards palate surgery. Importantly, although 29 to 48 % of patients had complete collapse at the level of the epiglottis, both observers indicated epiglottis surgery associated with multilevel treatment in only one patient; this could be explained by the fact that they considered the epiglottis collapse secondary to tongue base collapse. Differences in treatment planning were found mainly between single-level and multilevel treatment. In our view, experience may play an important role when considering multilevel treatment, as the resident underestimated tongue base collapse. Agreement would probably be higher after a learning curve.

The VOTE classification can oversimplify DISE findings; nevertheless, it is easy to apply, which is probably why it is frequently used [34]. Based on our experience, we modified it in order to represent tongue base collapse better. The perfect classification has not yet been published [35]. The ideal classification should be able to describe as many UA patterns of collapse as possible, as this could be important for selecting the best treatment. Our classification also has limitations, as the presence of lingual tonsil hypertrophy is not reflected. Nowadays, there are different surgical approaches to tongue base collapse, such as TORS, coblation lingual tonsillectomy, SMILE, interstitial radiofrequency, tongue base suspension, hyoid suspension, and hypoglossal nerve stimulation. It is unknown whether the success rates of these techniques differ according to the morphology or the degree of collapse. In our study, to make comparison easier, tongue base obstruction treatment was grouped into a single option (either surgery or MAD).

One of the limitations of the study is its retrospective design; however, the observers were blinded to the baseline characteristics of the participants, so confounding variables such as AHI or BMI should not be expected to influence the results. In addition, because the videos had no sound and the assessment was retrospective and blinded, it was not possible to assess accurately the performance and effectiveness of dynamic maneuvers such as the Esmarch maneuver, which is important for guiding treatment with MAD.

For future research, a greater number of observers as well as a larger sample of patients would be ideal for more accurate results.

Conclusion

DISE shows moderate interobserver agreement on findings and therapeutic indication between an experienced observer and an observer in training; however, it is important to develop learning curves for this technique in order to obtain more reliable results.