1 Background

1.1 Importance of airway management

The ability to definitively manage the airway is a skill shared by anesthesia, pulmonary, critical care, and emergency providers. A multitude of predictors have been suggested to aid in detection of the potentially catastrophic “can’t ventilate, can’t intubate” scenario. This devastating endpoint can occur in 1 in 1000 elective and 1 in 250 rapid sequence cases [1]. These predictors have variable sensitivities and specificities for actual detection of a difficult airway, and failure to predict and plan can lead to catastrophe. Appropriate airway management has obvious implications for morbidity and mortality, the most severe complications being aspiration, unrecognized esophageal intubation, neurologic impairment, and death. The incidence of these and other complications is inconsistently reported, depending on the operational definitions used [2]. The purpose of this review is to evaluate the utility of ultrasound for detection of the difficult airway in the preoperative setting.

1.2 Difficult airway definition

Difficult airway has no widely accepted standard definition; rather, it is a constellation of various aspects of airway management [3]. It can be divided into difficult mask or supraglottic airway (SGA) ventilation, difficult SGA placement, difficult or failed endotracheal intubation, and difficult laryngoscopy. The latter is further complicated because it can refer to direct laryngoscopy, indirect (i.e. video) laryngoscopy, or flexible fiberoptic bronchoscopy. The definition of difficult intubation also lacks consensus [4, 5] but is commonly derived from laryngoscopy endpoints such as the Cormack–Lehane Grade (CLG) [6] or the Intubation Difficulty Score (IDS) [7]. The IDS is partly derived from the CLG. The CLG has become a standard for defining difficult laryngoscopy for research purposes in anesthesia [8–11]. To further obscure matters, authors vary in the number of trained airway personnel required to truly denote a “difficult” airway [4]. For the purposes of this review, difficult ventilation is excluded and only difficult direct laryngoscopy and intubation are considered. Direct laryngoscopy is performed with the end goal of tracheal intubation. The classic “three axis alignment” theory for intubation described by Bannister and Macbeth [12] is still prominent in clinical practice as a means of maximizing direct laryngoscopy efficacy. After the laryngeal, pharyngeal, and oral axes are aligned with the provider’s line of sight, the tongue and epiglottis are displaced anteriorly to allow direct visualization of the rima glottidis. Intubation is then attempted, with the corollary that easy laryngoscopy translates into easy endotracheal intubation. While this corollary is not perfect (e.g. in cases of subglottic stenosis or difficulty passing the endotracheal tube), it underlies the most common outcome variable of difficult airway research.

1.3 Currently available clinical predictors

While there are standard preoperative “predictor tests” for difficult intubation, there is much debate over their sensitivity (20–62 %), specificity (82–97 %), and reproducibility [11]. In general, each bedside indicator has poor predictive power when used alone [1, 10, 13, 14], but in combination they become a more useful clinical tool [10]. A discussion of all the purported predictors is beyond the scope of this review; the cited references are large-scale studies, systematic reviews, and meta-analyses. Mallampati classification combined with thyromental distance potentially holds the highest predictive value (likelihood ratio 9.9; 95 % CI 3.1–31.9) for difficult intubation [11].

The modified Mallampati score is more accurate than the original scale for predicting difficult intubation, with an area under the summary receiver operating characteristic (sROC) curve of 0.83 ± 0.03 versus 0.58 ± 0.12, respectively [4]. An older meta-analysis did not distinguish between the modified and original Mallampati scores, or between difficult laryngoscopy and difficult intubation [11]. A third analysis pooled 177,088 patients, with some overlap with the population of the first analysis [14]. Importantly, all three analyses concluded that the Mallampati score has poor to moderate discriminative ability when used alone. Lastly, there are multiple methodologies for performing the screening test, with variations in patient positioning and use of phonation [15].

Another meta-analysis [16] determined that a thyromental distance of less than 6.5 cm measured by ruler has poor sensitivity (48 %, 95 % CI 43–53 %) for difficult intubation. When assessed by the more common method of less than three fingerbreadths, the test has even lower sensitivity (16 %, 95 % CI 14–19 %). This discrepancy reflects variation in provider fingerbreadth size. The same provider may also use different portions of the fingers (i.e. the proximal interphalangeal joint vs. the distal interphalangeal joint vs. the phalanx), which reduces the reproducibility of this bedside screening test.

Patients with a body mass index (BMI) of 24–34 have increased odds of difficult intubation (odds ratio 1.11, 95 % CI 1.04–1.18, p < 0.0001). The odds ratio increases in patients with BMI ≥ 35 (1.34, 95 % CI 1.19–1.51, p < 0.0001), but BMI still has poor sensitivity (7.5 %, 95 % CI 7.3–7.7 %) and positive predictive value (6.4 %, 95 % CI 6.3–6.6 %) for difficult intubation [5]. However, increased neck circumference (≥43 cm) can be used in lieu of BMI as a more reliable clinical indicator of difficult intubation [10, 17].
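Why a statistically significant odds ratio can coexist with a very low positive predictive value is easier to see with Bayes' rule: PPV depends on sensitivity, specificity, and the prevalence of difficult intubation in the population screened. The Python sketch below is illustrative only; the 90 % sensitivity and specificity and the prevalence values are hypothetical and are not taken from [5].

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability that a positive screen is a true difficult intubation (Bayes' rule).

    All arguments are proportions between 0 and 1.
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Hypothetical screening test that is 90 % sensitive and 90 % specific.
for prevalence in (0.01, 0.05, 0.20):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}")
# prevalence 1%: PPV 8.3%
# prevalence 5%: PPV 32.1%
# prevalence 20%: PPV 69.2%
```

Even a fairly accurate test yields mostly false positives when the condition is uncommon, which is one reason a single bedside predictor performs poorly in isolation.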

Of all the physical exam correlates, the one most commonly used for difficult laryngoscopy (and, by extension, intubation) is the CLG [6]. It is not without limitations, the most obvious being that it cannot be obtained without either an anesthetized airway or radiographic means. It can also be limited by operator technique, such as improper alignment of the visual axes. CLG variants exist and have enhanced sensitivity and specificity [8].

1.4 Structures visible on ultrasound

Adding ultrasound to the airway provider’s armamentarium allows rapid visualization of structures that are typically not apparent until direct laryngoscopy is performed. With appropriate depth and probe frequency selection, ultrasound can visualize any structure that lies superficial to the oral, pharyngeal, or tracheal air columns [18]. This includes the mouth, tongue, oropharynx, hypopharynx, hyoid bone, epiglottis, larynx, vocal cords, cricothyroid membrane, cricoid cartilage, trachea, esophagus, stomach, lungs, and pleura [19]. Ultrasound is as effective as CT in quantifying nearly all airway structure dimensions [20]. Operators can evaluate pathologies (such as masses) and determine whether there is airway invasion [21]. Additionally, sonography can accurately guide endotracheal tube size selection, confirm placement, and assess airway edema prior to extubation [22]. This preoperative knowledge can aid the provider with intubation through anticipation of appropriate equipment, especially in cases of expected prolonged intubation. Airway ultrasound can now be performed using a smartphone, which could enhance its integration into the operating room as a point-of-care tool [23]. At the time of this writing, several recent reviews of ultrasound in other airway management realms are available [18, 19, 21, 24–27].

2 Systematic review

2.1 Methods

PubMed, Ovid, CINAHL Plus Full Text, and Google Scholar searches were conducted on May 1st, 2016, following the PRISMA [28] methodology as shown in Fig. 1. Keywords and Boolean phrases searched were: [“difficult airway” OR “difficult intubation” OR “difficult laryngoscopy” OR “difficult ventilation”] AND [ultrasonography OR sonography OR ultrasound]. No limits or date ranges were applied, yielding 86 articles after duplicate removal. Two reviewers manually screened the record titles and abstracts. Abstracts without publications, case reports, letters, textbooks, unrelated topics, and articles available only in a foreign language were excluded. The remaining sixteen articles were eligible for review, three of which were integrative or narrative reviews of general airway ultrasound use. Three studies were added after reviewing the eligible documents’ references. The STARD [29] checklist for diagnostic tools was used to critically appraise the twelve primary research studies that exist in the literature. Bias, including blinding, incomplete data reporting, and subject attrition, was assessed using this checklist. The appropriateness of the statistical tests was determined algorithmically using graphical flow charts [30, 31]. Those that failed to meet relevance (n = 2) or did not analyze intubation difficulty (n = 1) were rejected, leaving ten studies in this systematic synthesis.

Fig. 1 PRISMA methodology for article selection

3 Results

Table 1 provides a side-by-side comparison of individual studies. All studies used an observational prospective design except that of Wojtczak [32], who employed a retrospective chart analysis of surgical patients; Wojtczak has an ongoing prospective study. One study was a pilot, and no subsequent study has been published [33]. Table 2 summarizes the standard predictors assessed in each study.

Table 1 Synthesis table
Table 2 Standard predictors assessed

All studies comprised adult samples in a preoperative surgical setting. Populations included elective surgical patients [33–39], pregnant adults [40], obese patients [32, 38, 40], and females undergoing thyroidectomy [39]. Common exclusion criteria were inability to perform standard screening tests (i.e. cervical spine immobility or inability to open the mouth for direct laryngoscopy), inability to consent, and obesity (as appropriate for sample selection). Pregnancy also served as a disqualifier for some [36, 38, 41], yet was a specific inclusion criterion for another [40]. Aydogmus [40] also excluded patients with hypertension and diabetes mellitus. All studies except Meco [39] excluded known airway pathologies, which are inherent to the female population undergoing thyroidectomy in that study. Edentulous patients were excluded in Gupta’s study [34]. Wojtczak [32] makes no mention of exclusion criteria. Countries of publication include the United States [32–34], Turkey [39, 40], Israel [38], Canada [35], Portugal [41], and China [37]. Only three studies [33, 36, 39] described how the sample size was derived. All studies used convenience sampling.

The ultrasound measurements were obtained using variable positioning strategies: neck extension [32, 34, 40] versus neutral neck alignment [32, 33, 35–37, 39, 41], and supine [32–34, 36, 37, 39–41] versus sitting upright [35]. In obese patients, Wojtczak [32] found hyomental distance to be significant for difficult intubation when measured with the neck extended, but not in a neutral position. Ezri [38] did not disclose positioning and could not be contacted for clarification. Sonographers consisted of emergency medicine residents [33], anesthesiologists [34, 36, 40], radiologists [38, 39], and the principal investigators [32, 35].

All studies used a direct laryngoscopy technique, with some investigators electing adjunct use of a bougie [36, 38] in difficult airways. Pinto [41] graded all subjects with external laryngeal pressure, which is known to improve the laryngoscopic view [38]. The outcome variable defining difficult airway in all studies was CLG III or IV. Meco [39] additionally used an IDS greater than 5. Pinto [41] also compared the performance of sonography to the Naguib model [42]. This somewhat cumbersome calculation predicts difficult intubation using the formula 4.9504 + (thyrosternal distance × 1.1003) + (Mallampati score × −2.6076) + (thyromental distance × 0.9684) + (neck circumference × −0.3966). Using these definitions, there were a total of 114 difficult airways among 681 subjects across all studies. Wojtczak [32] did not describe the qualifications of the direct laryngoscopist, while Wu [37] specified that the anesthesia providers were required to have more than two years of experience. Pinto [41], Ezri [38], Aydogmus [40], Meco [39], and Komatsu [36] identified an anesthesiologist as the individual performing direct laryngoscopy. Hui [35] stated that either the attending anesthesiologist or a senior resident performed direct laryngoscopy. Adhikari [33] and Gupta [34] stated that anesthesia providers performed direct laryngoscopy.
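The linear predictor quoted above is simple to compute once its inputs are known. The Python sketch below transcribes the coefficients exactly as given in the text; the units (centimeters for the three distances, Mallampati class as an integer 1–4) and the example patient are assumptions. Because the cut-off used to classify a score as predicting difficult intubation is not stated here, the function returns only the raw score.

```python
def naguib_score(thyrosternal_cm: float, mallampati: int,
                 thyromental_cm: float, neck_circumference_cm: float) -> float:
    """Raw linear score using the coefficients quoted in the text.

    Assumption: distances in centimeters, Mallampati class as an integer 1-4.
    No classification threshold is applied.
    """
    if mallampati not in (1, 2, 3, 4):
        raise ValueError("Mallampati class must be 1, 2, 3, or 4")
    return (4.9504
            + 1.1003 * thyrosternal_cm
            - 2.6076 * mallampati
            + 0.9684 * thyromental_cm
            - 0.3966 * neck_circumference_cm)


# Hypothetical patient: thyrosternal 14 cm, Mallampati 2, thyromental 7 cm, neck 38 cm.
print(round(naguib_score(14, 2, 7, 38), 4))  # 6.8474
```

The Mallampati and neck circumference terms carry negative coefficients, so higher values of either lower the score, while longer thyrosternal and thyromental distances raise it.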

There is considerable variability in the sonographic locations assessed and in scanning protocols (see Table 3). Locations that correlated with difficult intubation are the hyomental distance with the neck extended [32], the hyoid bone [33, 35, 37], and the thyrohyoid membrane [33, 34, 37, 41]. No utility was demonstrated for quantifying genioglossus [32] or geniohyoid [32, 33] size, subglottic air column diameter [40], thyroid gland size [39], or measurements at the level of the thyroid isthmus [33, 38].

Table 3 Sonographic technique and measurements

Hyomental distance with neck extension demonstrated predictive significance in a small sample of 12 obese adults with 6 difficult laryngoscopies [32]. The difficult laryngoscopy group had a hyomental distance of 52.6 ± 5.8 mm compared to 65.5 ± 4.1 mm in the easy intubation group (p < 0.01). This measurement requires a low-frequency curvilinear probe in all but the smallest patients, whereas all other locations showing significance can be scanned with a high-frequency linear probe [21]. No other studies evaluated this measure.
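Because only summary statistics are reported, the size of this group difference can be checked with a t statistic computed from the means, standard deviations, and group sizes. The Python sketch below assumes that the ± values are standard deviations and that each group contained 6 subjects. It uses Welch's unequal-variance form, which is not necessarily the test the original authors ran; with equal group sizes the t statistic matches the pooled Student form, but the degrees of freedom differ.

```python
import math


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary data."""
    var1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    var2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (mean1 - mean2) / math.sqrt(var1 + var2)
    df = (var1 + var2) ** 2 / (var1 ** 2 / (n1 - 1) + var2 ** 2 / (n2 - 1))
    return t, df


# Hyomental distance with neck extension (mm): difficult vs. easy laryngoscopy.
t, df = welch_t(52.6, 5.8, 6, 65.5, 4.1, 6)
print(f"t = {t:.2f}, df = {df:.1f}")  # t = -4.45, df = 9.0
```

A two-sided p value can then be read from the t distribution (for example `2 * scipy.stats.t.sf(abs(t), df)`), which here falls well below 0.01. The calculation still inherits the t test's assumption of approximately normally distributed data, which was not reported for this study.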

At the hyoid bone, Adhikari [33] found that measurements of 16.9 mm (95 % CI 11.9–21.9) in the CLG III/IV group differed significantly from the 13.7 mm (95 % CI 12.7–14.6) in the CLG I/II group. This aligns with Wu’s [37] findings of 15.9 ± 2.7 mm in the difficult laryngoscopy group versus 9.8 ± 2.6 mm in the easy laryngoscopy group (p < 0.0001). A third study [35] evaluated only whether the hyoid bone could be seen via a sublingual sonographic approach. Inability to identify the hyoid bone was significantly associated (p < 0.0001) with CLG III/IV on intubation, with a sensitivity of 72.7 % and a specificity of 97 %.
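Sensitivity and specificity can be converted into likelihood ratios, the metric quoted for the combined Mallampati and thyromental predictor in the Background, which makes the hyoid bone finding easier to compare. The Python sketch below uses only the point estimates reported by Hui [35] (no confidence intervals), and the 10 % pre-test probability in the example is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity (proportions)."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Update a probability via odds: post-test odds = pre-test odds x likelihood ratio."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Hyoid bone not visualized sublingually: sensitivity 72.7 %, specificity 97 %.
lr_pos, lr_neg = likelihood_ratios(0.727, 0.97)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 24.2, LR- = 0.28
# Hypothetical 10 % pre-test probability of CLG III/IV.
print(f"post-test probability: {post_test_probability(0.10, lr_pos):.0%}")  # 73%
```

Point estimates from a single small study can be unstable, so a likelihood ratio derived this way is a rough guide rather than a validated figure.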

Adhikari [33] found anterior soft tissue thickness at the thyrohyoid membrane to be a significant predictor: 34.7 mm (95 % CI 28.8–40.7) in CLG III/IV versus 23.7 mm (95 % CI 22.9–24.4) in CLG I/II. Wu [37] also found that this level correlated with difficult laryngoscopy, with 23.9 ± 3.4 mm versus 14.9 ± 3.9 mm (p < 0.0001) in the easy group. Similarly, Pinto [41] evaluated only this location, found it significant, and derived a cut-off of ≥27.5 mm for difficult laryngoscopy. Gupta [34] used a different, oblique transverse view through the membrane and derived ratios through regression analysis, with the intent of obtaining a pre-intubation CLG by ultrasonography. The investigators found a negative correlation with the distance from the epiglottis to the vocal cords (−0.966, 95 % CI −1.431 to −0.501, p = 0.0001), a positive correlation with the size of the hyoepiglottic ligament (0.595, 95 % CI 0.261–0.929, p = 0.0008), and derived a ratio of the two measures. The ratio showed the strongest positive correlation, yielding an ED50 and ED95 of 2.41 and 4.86, respectively, for detection of CLG III (there were no CLG IV views in this population).
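ED50 and ED95 values describe a dose-response curve: the ratio at which the predicted probability of a CLG III view reaches 50 % and 95 %. If one assumes the curve is a two-parameter logistic function (an assumption; the model form used by Gupta [34] is not stated here, and a probit curve would give somewhat different intermediate values), the two reported points are enough to recover its slope and intercept, as in this Python sketch.

```python
import math

# Reported anchor points for the ultrasound ratio (Gupta [34]).
ED50 = 2.41
ED95 = 4.86

# Assumption: logit(p) = INTERCEPT + SLOPE * ratio.
# logit(0.50) = 0 and logit(0.95) = ln(0.95 / 0.05) = ln(19).
SLOPE = math.log(19) / (ED95 - ED50)
INTERCEPT = -SLOPE * ED50


def probability_clg3(ratio: float) -> float:
    """Predicted probability of a CLG III view under the assumed logistic curve."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * ratio)))


for ratio in (1.0, 2.41, 3.5, 4.86):
    print(f"ratio {ratio:.2f}: p = {probability_clg3(ratio):.2f}")
```

The reconstructed curve only interpolates between the two published points; it says nothing about calibration in other populations.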

Aydogmus [40] uniquely examined tracheal air column diameter at three levels, starting at the vocal cords and moving caudally. The researchers found no significant difference (p = 0.160) in subglottic air column diameter between non-obese (16.78 ± 2.13 mm) and obese (17.69 ± 1.91 mm) pregnant women. While this does not match the anterior neck thickness model evaluated by the others, it adds evidence that the air column does not narrow with increasing BMI.

There are conflicting findings at the vocal cords: three authors found significance [36–38] when measuring the distance from the anterior commissure to the skin. This finding was not supported by Adhikari [33], who measured from the thyroid cartilage to the skin at the level of the vocal cords. Ezri [38] found that difficult laryngoscopies had a neck thickness of 28 ± 2.7 mm compared to 17.5 ± 1.8 mm (p < 0.001). Wu’s [37] findings support this marker, with CLG III/IV measuring 13.0 ± 3.1 mm compared to 9.2 ± 2.0 mm for easy grades (p < 0.0001). Komatsu [36] studied an obese population similar to Ezri’s [38] but in a different ethnic cohort (Americans instead of Israelis). Despite the published title, Komatsu [36] did find a significant difference at this level between groups (difficult 20.4 ± 3 mm compared to easy 22.3 ± 3.8 mm, p = 0.049), but after multivariable regression analysis deemed it not clinically important as an independent predictor (p = 0.134). The direction of this difference conflicts with the other studies: in Ezri [38] and Wu [37], thicker pretracheal tissue accompanied difficult laryngoscopy, whereas in Komatsu [36] the difficult group had thinner tissue. This may reflect a type I error in the latter study, given the borderline p value. Outliers may also have influenced the findings, as a 10 mm difference was found between the three patients with grade IV views. The superficial measure may reflect an anatomical variant of an extremely anterior airway in an obese patient, which, given the small sample size, could skew the results. This concern is reinforced by the absence of any discussion of data normality despite the use of parametric testing methods. Additionally, Komatsu [36] did not allow the backward, upward, rightward pressure (BURP) technique, which improves visualization [38]. Differences in sonography protocol are unlikely, as Dr. Ezri was on site for Komatsu’s study.

Two authors [33, 38] had conflicting findings at the level of the suprasternal notch. Ezri [38] found a significant difference (p < 0.013) between 33 ± 4.3 mm in the difficult group and 27.4 ± 6.6 mm in the easy laryngoscopy group, while Adhikari [33] did not.

Appropriate statistical tests were used by most studies [32–35, 37, 38, 40, 41]. Meco [39] used Pearson correlation, a parametric test, for correlation analysis on data that were mixed ordinal and nominal. A more appropriate choice would have been the nonparametric Spearman correlation (for ordinal data) or contingency coefficients (for nominal data). Use of the parametric test introduces the possibility that an effect of sonographic thyroid volume on difficult intubation was missed [30]. For comparison analyses, Meco [39] did use appropriate tests. Wojtczak [32] used an unpaired t test but does not mention whether the data were normally distributed. If the data were not normally distributed, this may have influenced the determination of effect significance.
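The practical difference between the two coefficients is that Spearman’s rho is Pearson’s r computed on ranks, so it captures any monotonic relationship and suits ordinal outcomes such as the CLG. The standard-library Python sketch below illustrates this with hypothetical data (not from Meco [39]); in practice `scipy.stats.spearmanr` and `scipy.stats.pearsonr` provide tested implementations.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: thyroid volume (mL) and CLG grade (ordinal 1-4).
volume_ml = [8, 11, 14, 19, 60]
clg_grade = [1, 2, 2, 3, 4]
print(f"Pearson r = {pearson(volume_ml, clg_grade):.2f}")      # 0.88
print(f"Spearman rho = {spearman(volume_ml, clg_grade):.2f}")  # 0.97
```

Pearson’s r is pulled down by the spacing of the raw values (the 60 mL outlier), whereas the rank-based coefficient depends only on their ordering.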

4 Discussion

Predictive value for difficult laryngoscopy has been demonstrated at the hyoid bone, at the thyrohyoid membrane, and for hyomental distance in the sniffing position. The results at locations inferior to the thyrohyoid membrane, however, are mixed. Adhikari [33] suggests that an anterior neck soft tissue thickness of 28 mm at the thyrohyoid membrane can serve as a cut-off to detect difficult laryngoscopy. It seems clinically prudent to perform the measurements in the position the patient will be in during laryngoscopy: supine with the neck extended. While three authors [33, 37, 41] found significance at the thyrohyoid membrane, their measurements are seemingly contradictory: Adhikari’s [33] easy laryngoscopy group had a measure of 23.7 mm, Wu’s [37] difficult laryngoscopy group measured 23.9 mm, and Pinto [41] derived a cut-off of 27.5 mm. This may be explained in part by the demographics of the groups. Adhikari’s [33] study is an American study with a predominantly female sample (32 females and 19 males) and a mean age of 53. However, eighty-three percent (5 of 6) of the difficult laryngoscopies were in males. Pinto’s Portuguese population was similar in having a disproportionately high number of difficult intubations in the male group (13 vs. 4), in a sample of 39 men and 34 women. The varied populations studied may account for some of the conflicting findings. Wu’s [37] study is in a Chinese Han population, predominantly female (120 of 203), with the 28 difficult laryngoscopies divided equally between the sexes. This may be attributed to anthropometric differences in fat distribution between Chinese and white populations: while white subjects have a higher BMI, Chinese subjects have greater fat distribution to the trunk [43]. The contrast between races is most evident among females. These sex and ethnic differences may account for the nearly 10 mm disparity between the groups. There are no studies comparing thoracic subcutaneous fat or neck circumference specific to these populations.
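The disagreement over cut-offs (23.7, 23.9, 27.5, and 28 mm) matters because any single threshold trades sensitivity against specificity. The Python sketch below shows that trade-off on a small hypothetical set of thyrohyoid membrane measurements; the numbers are invented for illustration and are not drawn from any reviewed study.

```python
def sensitivity_specificity(thickness_mm, is_difficult, cutoff_mm):
    """Classify thickness >= cutoff as predicted difficult; return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for value, difficult in zip(thickness_mm, is_difficult):
        predicted_difficult = value >= cutoff_mm
        if difficult and predicted_difficult:
            tp += 1
        elif difficult:
            fn += 1
        elif predicted_difficult:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical anterior soft tissue thickness (mm) and laryngoscopy outcome.
thickness = [21, 23, 24, 25, 26, 29, 27, 30, 33, 36]
difficult = [False] * 6 + [True] * 4

for cutoff in (24.0, 28.0, 32.0):
    sens, spec = sensitivity_specificity(thickness, difficult, cutoff)
    print(f"cut-off {cutoff:.0f} mm: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy data, lowering the threshold from 32 to 24 mm raises sensitivity from 0.50 to 1.00 while specificity falls from 1.00 to 0.33, so a cut-off derived in one population need not transfer to another with a different tissue-thickness distribution.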

Komatsu’s [36] results conflicted with those of Ezri [38] despite similar positioning in both studies. The conflicting results at the suprasternal notch have little clinical significance for direct laryngoscopy because this location is inferior to the vocal cords. Use of this anatomic location does not align with the surrogate use of the Cormack–Lehane Grade for detection of a difficult intubation. This is not to say that the location itself is without utility for assessment. For example, if the surgical case necessitates insertion of a bronchial blocker or a double-lumen endotracheal tube, this location may be of clinical use.

Wojtczak’s [32] study uniquely examined hyomental distance. The neck-neutral position did not yield significant results, but there was significance with the neck extended, which represents the intubating position. The measurement difference can be attributed to the stylohyoid ligament, which tethers the hyoid bone to the styloid process of the temporal bone [44]. When the neck is extended, the mandibular mentum moves away from the hyoid while the hyoid remains relatively stationary.

Tracheal air column diameter in normal airways did not predict difficult intubation [40]. However, ultrasound has utility in pathologic states such as tracheal stenosis for prediction of the difficult airway [20, 22].

Acquisition time across a large range of anatomic locations was slightly over nine minutes [33]. However, when only the levels that showed significance were obtained, this time was reduced to less than 2 min. Without caliper measurements at the point of care, time was reduced further to 31.7 ± 12.4 s [34]. In a busy preoperative setting, this may be a feasible amount of time to allot for a more detailed airway assessment when a difficult airway is suspected.

Past studies [9, 17] demonstrated that neck circumference correlates with a difficult airway more closely than obesity does [5]. Ezri’s [38] findings agree with this notion in a morbidly obese population (50 ± 3.8 cm vs. 43.5 ± 2.2 cm, p < 0.001) that had no difference in BMI (p = 0.47). In a post hoc analysis, Adhikari [33] found that BMI and sonographic measures correlated closely. By extension, this may support the view that anterior neck tissue thickness is a better indicator than circumference, since fat distribution can differ among individuals with similar circumference [36]. A suggested explanation is that increased anterior neck tissue decreases the mobility of airway structures [38].

4.1 Limitations

There is considerable variability across the literature reviewed regarding the sonographic measurements that predict a difficult airway, which encompasses laryngoscopy, use of adjunctive measures, and intubation. This review seeks to determine whether sonography can be used as a predictor of a difficult airway. All studies selected CLG III and IV as the outcome variable that most closely correlates with difficult laryngoscopy. The likelihood of difficult intubation is high with a grade III view, approaching 90 % [45]. Across the literature, the CLG is a standard outcome measure for determining difficult laryngoscopy.

Four [36, 38, 40, 41] of the ten studies employed external laryngeal pressure (Sellick’s maneuver) while intubating, which may provide an improved grade view compared to no external pressure [46, 47]. Paralysis is a standard component of induction as it facilitates successful intubation [48] and the use of neuromuscular blockade was confirmed in all ten studies.

The studies reviewed have marked differences in populations, the only commonality being an adult population. Another limitation is the use of convenience samples, which limits generalizability across patient populations. Despite the differences in sample size, ASA status, BMI, and demographics, the authors of this review elected to include these studies because of the overall limited number of publications on ultrasound assessment of the difficult airway. Other limitations reflected in these studies were the absence of specific ultrasound scanning protocols and variation in ultrasound training among investigators. Training in airway ultrasound is described by all authors except Gupta [34], who reports the loss of 23 subjects due to initial deficiency in performing sonography. Midline anterior neck sonography may be challenging because maintaining probe contact with the skin is difficult [34], especially over the prominent thyroid cartilage in men [26]. Approaching this location with the high-frequency probe tilted cephalad seems to relieve this issue [41]. Moreover, 6 h of training appears to be sufficient to add this skill for previously skilled sonographers [33]. Komatsu [36] also states that the Data and Safety Monitoring Committee stopped the study, reducing the sample from the 200 subjects required for adequate power to 64. No explanation has been given.

Pinto [41], Hui [35], Ezri [38], Komatsu [36], Adhikari [33], Gupta [34], and Meco [39] confirmed blinding between ultrasound operators and those performing the intubation, while Wojtczak [32] was unable to blind due to study design. Blinding is unknown for two articles [37, 40]; their authors were contacted for clarification without response. There is potential for operator bias if the provider performing the laryngoscopy is aware of the sonographic findings, but this is unlikely given the lack of validated cut-off measurements.

5 Conclusions

5.1 Recommendations for future study design

While the articles reviewed provide interesting insight into sonographic predictors of difficult intubation, more studies are needed to standardize the findings. Future research should address the limitations described in this review. A larger sample size with blinding would increase the ability of the research to detect an effect and maximize the accuracy of the results. Additionally, using a consistent set of physical assessment predictors, and a consistent methodology for quantifying them, will limit variability in the findings. Coupled with this, a formalized ultrasound scanning protocol specifying the measurements to be obtained, as well as the appropriate technique for obtaining them, will help improve predictive value and reproducibility. To avoid technical difficulty with image acquisition, a saline bag could be used to create an acoustic window, allowing easier probe manipulation and enhancement of the deeper structures. A study could use a within-subjects design comparing standard indicators and ultrasound against the Cormack–Lehane scale, using grades III and IV as the outcome criteria for difficult laryngoscopy.
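For planning the larger studies recommended above, one commonly used approach for diagnostic accuracy studies is Buderer’s formula, which sizes a study so that sensitivity (or specificity) is estimated within a chosen confidence-interval half-width. The Python sketch below is illustrative: it takes the pooled proportion of difficult airways across the reviewed studies (114 of 681, about 16.7 %) as the assumed prevalence, while the expected sensitivity of 80 %, specificity of 90 %, and precision targets are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95 % confidence


def n_for_sensitivity(expected_sens: float, half_width: float, prevalence: float) -> int:
    """Total subjects needed so that enough difficult airways are enrolled to estimate sensitivity."""
    n_difficult = Z_95 ** 2 * expected_sens * (1 - expected_sens) / half_width ** 2
    return math.ceil(n_difficult / prevalence)


def n_for_specificity(expected_spec: float, half_width: float, prevalence: float) -> int:
    """Total subjects needed so that enough easy airways are enrolled to estimate specificity."""
    n_easy = Z_95 ** 2 * expected_spec * (1 - expected_spec) / half_width ** 2
    return math.ceil(n_easy / (1 - prevalence))


prevalence = 114 / 681  # pooled difficult airways across the reviewed studies
n_sens = n_for_sensitivity(0.80, 0.10, prevalence)
n_spec = n_for_specificity(0.90, 0.05, prevalence)
print(n_sens, n_spec, max(n_sens, n_spec))  # 368 167 368
```

The larger of the two figures would govern enrollment. The formula relies on a normal approximation, which becomes less accurate when expected sensitivity or specificity is close to 100 %.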

5.2 Summary

The purpose of this review was to evaluate existing literature on the predictive value of airway ultrasound for difficult intubation, defined as Cormack–Lehane grade III or IV under direct laryngoscopy. Current clinical predictors such as the modified Mallampati score, thyromental distance, and obesity are inadequate tools for detecting a difficult airway when used alone. The consequences of failing to predict and prepare for a difficult intubation can range from transient hemodynamic changes to hypoxic cardiac arrest. Ultrasound shows promise for enhancing provider knowledge preoperatively, thereby improving patient safety, by allowing for visualization of many airway structures. This additional information may prompt the provider to change their airway plan to include adjuncts that may not have been readily available if a difficult intubation had not been predicted. The intent of this review is to synthesize the current literature and raise the evidence level evaluating this modality in the difficult airway assessment. Despite differences in sample characteristics and measurement techniques across studies, airway ultrasound holds early predictive value to detect the difficult airway. Standardized studies are needed to correlate this diagnostic modality to difficult intubation or reject its use. Significance for difficult intubation has been established by visualization of the hyoid bone, measurement of the hyomental distance with neck extension, and the measurement of anterior soft tissue thickness at the thyrohyoid membrane. Further investigation is warranted using a larger sample with standardized positioning and a well-defined scanning protocol.