Overview

Visualization of the larynx is necessary to evaluate structure and function, identify pathology, and plan treatment. There are different methods of evaluating the larynx, and each has benefits and limitations. Flexible endoscopy under halogen light can be performed in the clinic on nearly all children and provides an excellent view of general structure and mobility at the cricoarytenoid joints. Rigid or flexible stroboscopy provides more in-depth evaluation of the vibratory properties of the vocal folds, closure pattern, and any vocal fold lesions. High-speed laryngeal visualization has the advantage of being able to capture vibratory properties of aperiodic or chaotic vibration.

Flexible Laryngoscopy

Flexible laryngoscopy is a key technique in the evaluation of the pediatric larynx. While other techniques exist for evaluating the physiology of the larynx, flexible laryngoscopy is a useful tool for evaluation of anatomical features. There are many strengths unique to flexible laryngoscopy: it is cost-effective, portable, fast, and adaptable to a child of any age. It is often the first instrumented step in the evaluation of the pediatric larynx and can guide further work-up and treatment.

Alternatives for evaluating the pediatric larynx exist, indirect mirror exam being one example. Mirror exam shares many of the benefits of flexible laryngoscopy: it is cheap, fast, and portable. However, it cannot be recorded and requires a cooperative patient who can be coached through the exam, restricting its use to teenagers. Another option is direct laryngoscopy. Though it provides an excellent exam and can be performed on children of any age, it requires a general anesthetic in the operating room and does not provide the dynamic information obtained with flexible laryngoscopy in the awake patient.

Procedure Details

Flexible endoscopes come in a broad range. Some differences are subtle, such as using an eyepiece versus a separate video tower or a pediatric versus an adult sized endoscope. A more critical distinction, perhaps, is a distal chip endoscope contrasted with a fiber-optic endoscope. One trade-off here is the potential addition of a working channel. The working channel yields a bigger scope, which can be a significant challenge in the pediatric population. Although distal chip endoscopes provide better quality images, they have been found to have similar diagnostic accuracy compared with fiber-optic laryngoscopes [1]. Some studies even suggest that fiber-optic scopes are more accurate [2]. However, with improvements in technology have come smaller diameter distal chip endoscopes, allowing for improved image quality and comfort for smaller children. Fiber-optic and distal chip endoscopes are pictured in Figs. 14.1 and 14.2.

Fig. 14.1 Flexible fiber-optic pediatric endoscope

Fig. 14.2 Flexible distal chip pediatric endoscope

Another consideration in preparation for laryngoscopy is the use of an intranasal anesthetic and/or decongestant. Using a combination spray can be beneficial to examiner and patient: it decreases pain, decreases duration of the exam, and provides a superior view [3]. After using the spray, it is best to wait several minutes prior to the exam to allow maximal benefit. Anesthetics should be used with caution, however, as they can have unwanted consequences depending on the indication for the endoscopic exam. For example, topical anesthetics are known to increase signs of laryngomalacia [4] and may influence swallow function, although findings on this have been mixed in adults and not extensively studied in children [5,6,7].

There are several other non-anesthetic considerations that may facilitate a flexible endoscopic exam. These vary by patient age. For a neonate, infant, or toddler, swaddling can help. For a preschool or school-aged child, distracting them during the exam or coaching them through it (if they are amenable to that) may be helpful. Finally, an adolescent should be able to participate more actively in breathing and relaxing techniques. Positioning the patient such that they are sitting up straight, leaning forward, and slightly extending their neck (assuming the sniffing position) is also important.

The steps to performing flexible laryngoscopy are as follows:

  1. Administer topical anesthetic and position patient as detailed above.

  2. Insert the endoscope along the nasal floor, maintaining a straight endoscope to allow for precise manipulation.

  3. Once the posterior nasopharynx is encountered, instruct the patient to breathe through their nose (if they are able to follow instructions) to allow passage into the oropharynx.

  4. In the oropharynx, have the patient protrude their tongue to allow for better assessment of the tongue base and valleculae.

  5. Advance to the hypopharynx. Instruct the patient to insufflate their cheeks to provide a better examination of the pyriform sinuses.

  6. Assess the true and false vocal folds. Have the patient produce a sustained /i/ to evaluate mobility. Spontaneous crying will also suffice for this purpose. Instruct the patient to sniff in to elicit posterior cricoarytenoid muscle contraction and consequent vocal fold abduction.

  7. Advance the endoscope to the level of the vocal folds to examine the subglottis.

  8. Withdraw the endoscope slowly, evaluating the adenoid pad, torus tubarius, and nasal cavity.

Interpretation

More important than the technical ability required to perform flexible laryngoscopy is the interpretation of the exam. Recording the exam is ideal to allow revisiting and comparing across serial exams. The nasal cavity, nasopharynx, oropharynx, hypopharynx, and larynx can all contribute via different mechanisms to alter voice and swallow function.

In the nasal cavity, it is important to assess for mucosal edema as congestion can alter resonance (Fig. 14.3). As such, congestion should be noted, keeping in mind that this may be altered by the use of topical decongestant [8].

Fig. 14.3 Normal nasopharynx

Moving posteriorly to the palate, palatal mobility and velopharyngeal competence should be evaluated. Velopharyngeal insufficiency can occur in the setting of various craniofacial syndromes or, rarely, after adenotonsillectomy [9, 10]. The adenoid pad should be examined to determine the amount of obstruction. Adenoid hypertrophy can also affect resonance in addition to its negative consequences for eustachian tube function [11].

In the oropharynx and hypopharynx, surface characteristics of the mucosa should be noted (e.g., cobblestoning and erythema) (Fig. 14.4). Posterior pharyngeal wall cobblestoning or lingual tonsillar hypertrophy can be signs of gastroesophageal reflux disease (GERD) [12]. Lingual tonsillar hypertrophy can also contribute to obstructive sleep apnea and is especially common in children with Down syndrome [13, 14]. In the hypopharynx, post-cricoid edema can be a highly sensitive finding for GERD. Other less sensitive findings include hypopharyngeal cobblestoning and generalized erythema/edema [12]. The pyriform sinuses should be examined for pooling of secretions, penetration of secretions into the supraglottis, and other anatomic abnormalities such as a third branchial cleft sinus tract with an opening at the pyriform sinus.

Fig. 14.4 Normal larynx and hypopharynx

The supraglottis, glottis, and subglottis should be evaluated from both a functional and anatomic/structural perspective. Laryngomalacia, as an example for evaluation of the supraglottic airway, is a pathology with both functional (mucosa overlying the arytenoid cartilages prolapsing into the airway) and structural (foreshortened aryepiglottic folds and an omega-shaped epiglottis) components [15]. From a functional perspective, at the level of the glottis, there can be a range of pathologies including incomplete glottic closure, paradoxical vocal fold motion, or vocal fold paralysis. From a structural perspective, benign vocal fold lesions or laryngeal webs/atresia may be present. The subglottis is similar: pathologies such as subglottic hemangiomas or stenosis can contribute to symptoms on the structural side, and tracheomalacia can be a factor on the functional side.

Videostroboscopy

While endoscopy under halogen light can evaluate laryngeal structure, mobility, and tissues, and identify the presence or absence of lesions or masses, it cannot evaluate the vibratory characteristics and pliability of the vocal folds or the closure pattern. The vocal folds vibrate during phonation much faster than the human eye can resolve. Videostroboscopy overcomes this limitation by taking advantage of an optical illusion created by stroboscopic light, allowing the evaluator to assess vibratory features.

Videostroboscopy to evaluate the larynx was well described by Bless, Hirano, and Feder in 1987 [16] and is part of the recommended protocols for instrumental evaluation of the voice set out by the American Speech-Language-Hearing Association (ASHA) expert panel [17]. Videostroboscopy is performed using either a rigid or flexible endoscope (fiber optic or distal chip) attached to a stroboscopic light source and a video recording system [16, 18]. Recommended specifications for equipment are detailed in the recommendations of the ASHA task force [17].

Stroboscopy takes advantage of two phenomena of visual perception: a perception of a flicker-free, uniformly illuminated background (occurring at greater than 50 Hz) and the perception of apparent motion when two objects are displayed in rapid succession [18, 19]. Stroboscopy works by producing a flickering light source at a slightly slower rate than the frequency of vocal fold vibration, so that what is seen is actually a sampling of images across multiple vocal fold vibratory cycles, rather than a single cycle. Due to the mentioned visual perceptual phenomena, the observer’s eye perceives this as a continuous motion, allowing them to assess vibratory characteristics of the vocal folds. A minimum of three glottic cycles are needed to make valid perceptual judgements, with each cycle consisting of opening, closing, and closed phases [20]. Rating is not reliable with an aperiodic signal, as the light cannot sync appropriately to provide images that appear to be in immediate succession.
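The timing relationship described above can be illustrated numerically. The following is a minimal sketch, not a model of any commercial stroboscope: the function names and the example values (a 250 Hz fundamental frequency and a flash rate 2 Hz slower) are assumptions chosen for illustration, and the vibration is assumed to be perfectly periodic. It shows that when the flash rate is slightly below the vibration frequency, each flash illuminates the vocal folds slightly later in the glottal cycle, so one apparent slow-motion cycle is assembled from many true cycles.

```python
def flash_phases(f0_hz, flash_hz, n_flashes):
    """Phase (0 to 1) of the glottal cycle illuminated by each successive flash.

    Assumes perfectly periodic vibration at f0_hz (an idealization).
    """
    return [(f0_hz * n / flash_hz) % 1.0 for n in range(n_flashes)]


def apparent_frequency_hz(f0_hz, flash_hz):
    """Frequency of the perceived slow motion when the flash rate is slightly below f0."""
    return f0_hz - flash_hz


def true_cycles_per_apparent_cycle(f0_hz, flash_hz):
    """Number of real glottal cycles sampled to build one apparent cycle."""
    return f0_hz / (f0_hz - flash_hz)


if __name__ == "__main__":
    f0, flash = 250.0, 248.0  # hypothetical values, not taken from the text
    print(apparent_frequency_hz(f0, flash))
    print(true_cycles_per_apparent_cycle(f0, flash))
    print([round(p, 4) for p in flash_phases(f0, flash, 4)])
```

With these assumed values the slow-motion image cycles at 2 Hz and each apparent cycle draws on 125 real cycles. This also suggests why an aperiodic signal cannot be rated reliably: without a stable fundamental frequency, the sampled phases no longer advance in an orderly sequence.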

Instrumentation and Procedures

Stroboscopy can be performed with either a flexible or rigid endoscope. When performing rigid endoscopy, the child should be positioned upright, leaning forward from the waist, with the chin up and tongue out. Very young children often have difficulty participating in rigid endoscopy, as it requires them to sit with their mouth open and tongue out and to sustain phonation in this position. While we have sometimes had success in performing rigid stroboscopy in children as young as 3 years old, it is more usual for children aged 5 or 6 to be able to participate. Flexible visualization requires less assistance from the child but can be more unpleasant because, as stated above, the passage through the nose can be slightly uncomfortable. As with halogen endoscopy, topical anesthetic and decongestant can be applied and often make the procedure more comfortable. For young children, sitting on a parent’s lap can also be comforting and allows the parent to assist with positioning. A laryngeal microphone is positioned on the child’s neck so that the stroboscopic light can sync with their fundamental frequency. Flexible endoscopes can be either fiber optic or distal chip, and imaging advances in recent years have allowed for much smaller diameters of distal chip endoscopes. Improved image quality and a smaller diameter combine to improve both patient participation and the ability to interpret stroboscopy.

Parameters and tasks for recommended evaluation are detailed in the recommendations of the ASHA task force on instrumental voice evaluation [17]. Poburka and colleagues created and validated a rating system for both stroboscopic and high-speed video imaging of the larynx, which is included in Fig. 14.5 [21]. The following parameters should be assessed when performing a stroboscopic evaluation in order to fully assess laryngeal function [16, 17, 21, 22].

Fig. 14.5 (a, b) The Voice-Vibratory Assessment with Laryngeal Imaging (VALI) form: Stroboscopy. (c–e) The Voice-Vibratory Assessment with Laryngeal Imaging (VALI) form: High-speed Videoendoscopy

Parameters which can be assessed with halogen light only:

  • Arytenoid mobility – degree of abduction and adduction, symmetry, and speed of movement

  • Tissue appearance

  • Supraglottic compression – degree of lateral or anteroposterior compression above the level of the vocal folds

  • Free edge contour (rated during abduction, each vocal fold rated separately)

Parameters evaluated using stroboscopy:

  • Glottal closure (rated during modal pitch) – the degree and configuration of glottic closure during closed phase

  • Amplitude (rated during modal pitch, with each fold rated separately) – the magnitude of lateral movement of the vocal folds during vibration

  • Mucosal wave (rated during modal pitch with each vocal fold rated separately) – the magnitude of movement of the mucosa during vibration

  • Vertical level – the degree to which the vocal folds meet on the same plane (is one higher or lower than the other?)

  • Adynamic segments – are there portions of the membranous vocal fold that do not vibrate?

  • Phase closure – whether open or closed phase dominates or if it is equal

  • Phase symmetry – the degree to which the vocal folds mirror each other during vibration

  • Regularity/periodicity – the regularity of vibrations

Evaluation of these parameters is recommended during the following tasks: [17]

  1. Rest breathing – three consecutive cycles

  2. Laryngeal diadochokinesis (ʔiʔiʔiʔiʔiʔiʔiʔi)

  3. /i/ – sniff or /i/ quick inhale

  4. Sustained /i/ at modal pitch, at least three stroboscopic cycles

  5. Sustained /i/ at low and high pitch, at least three stroboscopic cycles of each

  6. Sustained /i/ at varying loudness levels, at least three stroboscopic cycles of each

  7. Any additional tasks individualized to the patient’s voice complaints

Acquisition of these tasks relies heavily on the patient’s willingness to participate, which can be more of a challenge with children than adults. Every attempt should be made to help the child feel comfortable and gain their participation. In pediatric clinics and hospitals, child life specialists can be extremely helpful in making children feel comfortable and relieving some of the potential fear and stress involved.

Interpretation and Evaluation

When an adequate sample can be obtained, stroboscopy has a high level of clinical utility in evaluating the vibratory function of the vocal folds and in differentially diagnosing lesions [23,24,25]. Successful stroboscopy has been reported in the literature in children as young as 3 years old [23]. Detailed evaluation may be more challenging in children than adults due to multiple factors, including relative difficulty sustaining a pitch for the required number of cycles, difficulty cooperating, and a smaller larynx. Zacharias and colleagues found that clinicians were able to identify vibratory features in 92% of stroboscopic exams in children but only confidently rate those features in 42% of exams [24]. The researchers found that raters were better able to rate the features when the exam was performed with a rigid endoscope than with a flexible scope and that older children were better able to tolerate the rigid exam than younger children [24]. As stated above, making a child more comfortable with the procedure is important not only for the child’s comfort but also for our ability to make adequate observations. As a visual perceptual measure, ratings of videostroboscopy are by nature subjective and subject to the limitations of any perceptual measure. Relatively few studies using stroboscopy as an outcome measure have reported on interrater reliability, and of those that have, many report low reliability [26, 27]. Ratings are dependent on the skill and experience of the rater, as well as their rigor in applying those skills. Efforts have been made over the years to standardize evaluation procedures and ratings in order to be more consistent across raters and clinics, and there are multiple rating forms available for use in evaluating stroboscopic images [16, 21, 26, 28, 29]. The Voice-Vibratory Assessment with Laryngeal Imaging (VALI) form (Fig. 14.5) provides a rating system for both stroboscopy and high-speed digital imaging of the larynx [21]. Consistent use of the same methodology across raters, as well as regular practice and training, should improve reliability and clinical accuracy of ratings.

High-Speed Videoendoscopy

Videostroboscopy, the current gold standard in laryngeal imaging, is designed to evaluate periodic vibrations of any nature [16, 22]. In order to obtain reliable and valid visual perceptual judgments of vocal fold vibratory motion from videostroboscopy, a steady-state phonation of at least 2–3 s [20], from which three consecutive glottal cycles [30] can be viewed, is required. In the pediatric population, it is often difficult to obtain steady-state phonation of a minimum of 2–3 s with either rigid or flexible videostroboscopy due to examination factors such as ease of the exam and cooperation. Other factors such as moderate or severe overall auditory perceptual impairment of voice quality typically also result in short phonations of less than 2 s, resulting in tracking errors on videostroboscopy [31]. The presence of tracking errors renders the exam clinically invalid for documenting vibratory features such as amplitude, mucosal wave, periodicity, and glottal closure [30]. High-speed videoendoscopic systems are able to capture cycle-to-cycle vocal fold vibratory motion for phonations shorter than 2 s owing to their high temporal resolution of up to 8000 frames per second. In contrast, videostroboscopy provides only an averaged vibratory motion at 30 frames per second. The sampling rate of high-speed videoendoscopic systems is also fast enough to capture transient events such as oscillatory onset, oscillatory offset, and voice breaks.
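The difference in temporal resolution can be made concrete with simple arithmetic. The sketch below is illustrative only: the function names and the 250 Hz fundamental frequency are assumptions, while the 8000 and 30 frames-per-second figures come from the paragraph above.

```python
def frames_per_glottal_cycle(frame_rate_fps, f0_hz):
    """Average number of video frames recorded during one glottal cycle."""
    return frame_rate_fps / f0_hz


def glottal_cycles_in_phonation(duration_s, f0_hz):
    """Number of glottal cycles contained in a phonation of the given length."""
    return duration_s * f0_hz


if __name__ == "__main__":
    f0 = 250.0  # hypothetical fundamental frequency, not taken from the text
    print(frames_per_glottal_cycle(8000, f0))    # high-speed videoendoscopy
    print(frames_per_glottal_cycle(30, f0))      # standard video rate
    print(glottal_cycles_in_phonation(1.5, f0))  # a phonation shorter than 2 s
```

Under these assumptions, high-speed capture records 32 frames within every glottal cycle, whereas a 30 frames-per-second camera records far less than one frame per cycle and must therefore rely on stroboscopic sampling across many cycles. Even a 1.5 s phonation contains hundreds of cycles that high-speed imaging can resolve individually.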

Instrumentation and Procedures

Since they were first reported in 1940 [32], high-speed videoendoscopy systems have undergone substantial modifications, making the once impractical research tool clinically feasible.

High-speed videoendoscopic systems are similar in appearance to videostroboscopy systems but differ substantially in their basic principle and playback capabilities. As with videostroboscopy, acoustic and various other signals (e.g., electroglottography and electromyography) can be captured simultaneously with high-speed videoendoscopic recordings. However, unlike videostroboscopy, high-speed videoendoscopic recordings do not provide simultaneous playback of the video and audio. Slow video playback rates ranging from 10 to 30 frames per second are required to view and evaluate the high-speed videos captured at high temporal resolutions of up to 8000 frames per second. Because of current technological limitations, playback of audio simultaneously with the slow playback of the high-speed videos is not possible. The spatial resolution of high-speed videoendoscopy is generally lower (512 × 256 pixels) than that of videostroboscopic systems, which can range from 720 × 468 pixels for standard digital videostroboscopic systems to 1920 × 1080 pixels for high-definition videostroboscopic systems. High-definition videostroboscopy is therefore not equivalent to high-speed videoendoscopy: the former has high spatial resolution but lower temporal resolution. Because high-speed videoendoscopic systems capture cycle-to-cycle variations of vibratory motion owing to their increased temporal resolution, high-speed videoendoscopy was reported to take less time (2.31 ± 1.92 min) compared to videostroboscopy (2.95 ± 2.41 min) for evaluation of vocal fold vibratory features in adolescents [25]. Common commercially available high-speed videoendoscopy systems are able to record phonations for up to 10 s, so multiple recordings are required to capture the range of tasks needed to evaluate vocal fold structure and function. High-speed videoendoscopic systems also require a strong light source of 300 watts; hence care must be taken to turn the light source down between recordings to prevent heat-related side effects from overheating of the tip of the endoscope. Because high-speed videoendoscopic systems differ from videostroboscopy in their basic principles, considerable training is required for their use.
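The relationship between capture rate, playback rate, and review time can be sketched with a short calculation. This is an illustration under stated assumptions rather than a description of any particular system: the function names and the 2 s recording length are hypothetical, and the capture and playback rates are the upper and lower figures quoted above.

```python
def total_frames(recording_s, capture_fps):
    """Number of frames stored for a recording of the given length."""
    return recording_s * capture_fps


def playback_seconds(recording_s, capture_fps, playback_fps):
    """Time needed to view every captured frame at the slow playback rate."""
    return total_frames(recording_s, capture_fps) / playback_fps


def slowdown_factor(capture_fps, playback_fps):
    """How many times slower than real time the video is shown."""
    return capture_fps / playback_fps


if __name__ == "__main__":
    recording_s = 2.0  # hypothetical recording length
    print(total_frames(recording_s, 8000))
    print(playback_seconds(recording_s, 8000, 10))
    print(slowdown_factor(8000, 10))
```

Under these assumptions a 2 s recording captured at 8000 frames per second holds 16,000 frames and takes 1600 s to view in full at 10 frames per second, an 800-fold slowdown. Audio stretched by the same factor would no longer be interpretable as voice, which is consistent with the absence of synchronized audio playback, and in practice a reviewer would likely examine only a short segment of each recording.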

Core tasks and measures similar to those for videostroboscopy can be used for clinical examination with high-speed videoendoscopy. The tasks and procedures for videostroboscopy recommended by the American Speech-Language-Hearing Association (ASHA) task force [30] are an ideal place to start, as these tasks can also be used for high-speed videoendoscopy. The basic recommended protocol of rest breathing, laryngeal diadochokinetic tasks /iʔ iʔ iʔ iʔ/, and maximum vocal fold adduction and abduction (/i:/-sniff, /i:/-sniff) can be used for evaluation of vocal fold edges, vocal fold mobility, and the maximum range of vocal fold mobility at the level of the arytenoids [30]. The tasks of sustained phonation of /i:/, sustained /i:/ at varied pitch and loudness levels, and variations in pitch and loudness on sustained /i:/ that elucidate the patient’s problem can be used to evaluate the vocal fold function features of supraglottic compression, regularity, amplitude, mucosal wave, glottal closure, left/right phase symmetry, vertical level, and glottal closure duration [30]. Often high-speed videoendoscopy is used in conjunction with videostroboscopy clinically rather than in isolation, especially in instances where videostroboscopy results in tracking errors due to short phonation time. Since high-speed videoendoscopy is often used in combination with videostroboscopy, the clinician may choose to limit high-speed videoendoscopy to the evaluation of vibratory function only, thereby reducing the overall time required for the clinical exam.

Evaluation

The vibratory motion obtained from high-speed videoendoscopy can be evaluated both quantitatively and qualitatively. Currently, quantitative tools for evaluating vibratory motion have not attained widespread utility, as the custom-developed software systems are not readily available and are often too laborious for routine clinical use. Qualitative visual perceptual evaluation of vocal fold structure and function is routinely used in clinic. The Voice-Vibratory Assessment with Laryngeal Imaging (VALI) form for visual perceptual evaluation of vocal fold structure and function can be used for both videostroboscopy and high-speed videoendoscopy (Fig. 14.5), as the VALI rating form was developed a priori for reliable visual perceptual ratings of vocal fold structure and vibratory characteristics for videostroboscopy and high-speed videoendoscopy [21]. The VALI visual perceptual rating form has improved graphics and definitions of each parameter to help the clinician rate the laryngeal imaging features of interest more reliably [21].

The value of high-speed videoendoscopy to the understanding of vocal fold vibrations and voice production is immeasurable. Most of our current knowledge of vocal fold vibrations of normal and disordered voice in adults is derived from classic studies of high-speed films in the early 1960s [33, 34]. The first high-speed study on pediatric vocal fold vibrations was reported in 2011 [35]. A series of studies since 2011 quantifying vibratory motion using high-speed videoendoscopy in children has consistently revealed that the vibratory motion in children is complex and not easily predicted from vibratory motion of adults [36,37,38,39] (Table 14.1). Typically developing children demonstrate a posterior glottal gap more frequently than adult males and females. The posterior glottal gap in children is large, extending to the membranous portion of the vocal folds and resulting in a diamond-shaped gap [40]. The presence of this diamond-shaped posterior gap (Fig. 14.2), though not a statistically significant finding due to small sample size (boys = 28; girls = 28), could be considered part of normal development rather than an abnormality on videostroboscopic examination. Typically developing children also had greater cycle-to-cycle variability in both amplitude and time periodicity and in left/right phase symmetry during sustained steady-state phonation compared to adult men, suggesting greater aperiodicity of vocal fold vibrations in children [36]. These aperiodicities/instabilities in vibratory motion should not be mistaken for an abnormality; rather, they are part of the normal development of vibratory motion in children. Quantitative measurement of vibratory amplitude revealed that children had large vibratory amplitude relative to the length of the vocal fold, suggesting that the adult normative reference of vibratory amplitude of 50% mediolateral excursion of the vocal fold may not hold true for pediatric vocal fold vibratory amplitude. In the absence of normative findings of vibratory motion in the pediatric population on videostroboscopy, normative findings from high-speed videoendoscopy can serve as a basis for clinical evaluation of vibratory characteristics from videostroboscopy.

Table 14.1 Summary of differences in vibratory characteristics in typically developing children, adult females, and adult males without dysphonia

High-speed videoendoscopy is the most powerful tool to date to evaluate vocal fold vibratory motion. With future studies, high-speed videoendoscopy will be able to provide further insights into vibratory motion across pitch and loudness variations and will thereby be able to provide detailed functional assessment of various voice disorders leading to timely and improved diagnosis of various vocal conditions in the pediatric population.

Emerging and Evolving Practices

Clinically, laryngeal imaging modalities of videostroboscopy, high-speed videoendoscopy, and videokymography have been primarily limited to providing qualitative or quantitative information about vocal fold vibrations in two dimensions (2D), which are not calibrated in terms of size and distance between the vocal folds and the tip of the endoscope. Vocal fold vibrations are three-dimensional involving not only the lateral and longitudinal dimensions which can be viewed from the superior surface but also the vertical dimension, which is often difficult to visualize from examination of the superior surface. Precise clinical measurements of the vertical dimension have significant potential to improve clinical diagnosis and management of dysphonia. Emerging studies using the latest generation of laser devices coupled with high-speed videoendoscopy have the capability to project a calibrated laser grid of 18 × 18 laser dots [41] and allow in vivo recording of the vertical dimension in absolute values [42]. The applications of these new laser devices for clinical examination of pediatric vocal fold vibrations have the capability for generating new insights into the clinically relevant diagnostic process and thereby improve evidence-based assessment and management of pediatric voice disorders in the near future.