Introduction

The way in which we visualize the larynx has greatly shaped our understanding of this complex organ. It has been shown that history and physical examination alone are insufficient in making a diagnosis in a dysphonic patient and that laryngeal visualization is paramount to diagnostic accuracy [1•]. Subtle changes to laryngeal anatomy can create functional deficits. From the advent of laryngeal visualization with mirrors and sunlight, laryngoscopy has evolved to allow for precise evaluation of laryngeal biomechanics and disease surveillance. With every advance, we uncover new ways of looking at laryngeal physiology as well as the pathologic processes that can disrupt it. Due to the dynamic function of the larynx, it is ever important that a patient be examined awake to fully evaluate the phonatory and respiratory function.

Office-Based Examination

Visualization of the larynx is an important component of a comprehensive head and neck examination. There are several techniques and instruments used in the office setting to visualize the larynx. Before we discuss them, it is important to acknowledge that the invasive nature of the exam itself can change the dynamics of a patient’s phonation and breathing. It is therefore important to make this experience as comfortable as possible.

Anesthesia

There are two ways in which the larynx can be visualized: trans-nasally and trans-orally. Either approach needs appropriate anesthesia to provide a comfortable examination. There are several methods of achieving this, but most commonly it is done topically. Flexible, trans-nasal laryngoscopy begins by spraying each nasal cavity with a mixture of 2 % lidocaine and 0.25 % phenylephrine to provide both anesthesia and nasal decongestion. Trans-oral laryngoscopy typically omits the decongestant and uses a plain 2 % lidocaine or benzocaine spray. When more extensive anesthesia of the larynx is needed for procedures, such as vocal fold injection or laser surgery, additional topical 4 % lidocaine is applied directly to the vocal folds via the operating channel of a flexible laryngoscope or trans-orally via a curved cannula. Lidocaine can also be delivered via nebulizer or atomization to topicalize the oropharyngeal and supraglottic mucosa. It is imperative to strike a balance between patient comfort and the amount of anesthetic used, so as not to cause lidocaine or benzocaine toxicity. It is also important to time these steps in relation to the procedure itself, finding a window that allows adequate time for the anesthesia to take effect and enough time to complete the examination or procedure.

Positioning

When imaging the larynx, it is important to position the patient prior to the start of the procedure. This positioning can change depending on the patient’s body habitus, anatomy, and the type of visualization technique used. Typically, the patient should be seated. The examination chair should then be adjusted to a height that is comfortable for the examiner. We then employ the Kirstein position, in which the patient leans forward slightly at the waist with a flexed neck and the head extended at the atlanto-occipital joint. This position can be achieved by placing a pillow between the patient’s shoulder blades.

Visualization of the Larynx

Our ability to visualize the larynx has changed dramatically in the office setting. Originally, this exam was performed with direct rigid laryngoscopy on an awake patient. A shift then took place with emphasis on indirect visualization. This means that the examiner is looking at an image transmitted via a medium such as a mirror, glass rod, optic fiber, or digital photograph. For decades, a mirror exam was the gold standard for office-based laryngoscopy, and it continues to be used today. Unfortunately, it can be technically difficult for the otolaryngologist and uncomfortable for the patient, especially with a sensitive gag reflex. Furthermore, a mirror exam does not provide a sustained laryngeal view and cannot benefit from magnification. A mirror exam also cannot be recorded, archived, or reviewed with the patient. This is a significant disadvantage for surveillance of potentially recurrent epithelial pathology.

The mirror exam was then supplanted by flexible laryngoscopy. This technique is now ubiquitous in the otolaryngologist’s office. Flexible laryngoscopy is a form of fiberoptic visualization where an optic cable can be bent and manipulated via a hand piece as the tip transmits the desired image to an eyepiece or camera. Flexible laryngoscopy does not affect laryngeal dynamics to the same extent as a mirror exam, and it can be recorded and played back for analysis or patient education [2•]. It also improves our ability to evaluate neurologic disorders since it is not a peri-oral exam and allows for more natural phonation.

Rigid, trans-oral laryngoscopy uses a glass rod to transmit the image of the larynx to a monitor via a camera attached to the laryngoscope (Fig. 1a). This creates an image with high-resolution optical quality and magnification that may be better for evaluating mass lesions and mucosal abnormalities [2•]. The downsides of this method are technical difficulty, patient discomfort, and the possibility that it may not allow for natural phonation.

Fig. 1 a Rigid high definition laryngoscopy; b white light laryngoscopy revealing laryngeal papillomatosis; c narrow band imaging used in the same patient with laryngeal papillomatosis, now displaying more extensive laryngeal involvement and accentuated microcirculation

Both flexible and rigid laryngoscopy have already seen many advances. For example, distal chip cameras have been used in flexible laryngoscopy instead of traditional fiberoptics. The image is digital and sent directly to a video monitor. This technology, known as “chip in the tip”, produces images with improved clarity and detail. Likewise, rigid laryngoscopy has been connected to a camera at the eyepiece to transform high-resolution optics into a digital picture that is also displayed on a monitor. Once a digital representation of the image could be made and presented on a monitor, it was a natural progression to record these images in sequence, analyze the video, and archive it. Archiving exams enhances our surveillance of benign and malignant pathology. Additionally, we are able to review our findings with the patient, which is important in helping them understand their disease process as well as the goals and outcomes of surgery.

Further advances in digital imaging technologies were made and applied to laryngoscopy. Currently, we have high definition cameras with exceptional resolution. We are now able to transmit images and view the larynx in unprecedented clarity and magnification. The mucosal lining, vascularity, and even pathologies are clearer and more pronounced.

Videostroboscopy

Stroboscopy is an office-based imaging tool that allows for evaluation of vocal fold vibration, mucosal pliability, and glottic competence. Vocal fold vibration occurs at a rate faster than the human eye can appreciate; stroboscopy, or intermittent light, creates the illusion of a slowed image that allows for review of the vibratory cycle. Stroboscopy uses the sound of a patient’s voice at a particular frequency. This frequency is recorded via a microphone placed against the skin at the thyroid cartilage. The recorded frequency is then used to synchronize light flashes from the stroboscope back onto the larynx at a slightly slower rate, capturing images of the larynx at different points across the glottic cycle. When viewed in succession, these images create a slow motion sequence of vocal fold vibration. Stroboscopy requires periodic vibration. Aperiodicity will not allow the strobe to synchronize with mucosal oscillation.
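The timing principle described above can be illustrated with a minimal sketch. It assumes a perfectly periodic voice and a strobe that flashes at a fixed offset below the fundamental frequency; the function names and the example frequencies are hypothetical and are not taken from any commercial stroboscopy system.

```python
def flash_rate_hz(f0_hz, apparent_hz):
    """Flash rate that lags the voice fundamental f0 by `apparent_hz`.

    Assumption: the slow-motion rate seen by the examiner equals the
    difference between the vibration frequency and the flash frequency.
    """
    if not 0 < apparent_hz < f0_hz:
        raise ValueError("apparent_hz must be positive and below f0_hz")
    return f0_hz - apparent_hz


def sampled_phases(f0_hz, flash_hz, n_flashes):
    """Fraction of the glottic cycle (0 to 1) illuminated by each flash.

    Flash k fires at time k / flash_hz; its position within the glottic
    cycle is the fractional part of f0 * time.
    """
    return [(k * f0_hz / flash_hz) % 1.0 for k in range(n_flashes)]
```

For a voice at 200 Hz and a desired slow-motion rate of 1.5 Hz, `flash_rate_hz(200.0, 1.5)` gives a flash rate just below the fundamental, and `sampled_phases` shows each flash landing slightly later in the cycle than the one before. This is why the assembled images appear to move slowly, and why the illusion fails when vibration is aperiodic.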

Videostroboscopy is the most practical and useful technique for clinical examination, and it enhances the laryngologist’s ability to evaluate the biomechanics and visco-elastic properties of the phonatory mucosa [3]. One can assess vibratory properties such as amplitude, phase symmetry, glottic closure, and pliability [4]. All of these measures give insight into the exact cause of a patient’s symptoms and have been shown to change management pathways. Laryngeal pathology is closely associated with stroboscopic findings, and this technology has been particularly useful in identifying cysts, polyps, and other lesions. It has also been useful in identifying laryngeal irritation, inflammation, and edema. Stroboscopy is no more effective at diagnosing pathology that can already be seen on fiberoptic examination, including large tumors or lesions, vocal fold motion impairment, and laryngeal movement disorders [5]. There are also several vocal fold behaviors that cannot be imaged with stroboscopy due to aperiodicity. These include voice breaks, diplophonia, and vocal fold function during onset and offset [5, 6•].

Traditionally, stroboscopy was performed with a rigid strobolaryngoscope as opposed to a flexible laryngoscope because the long fiberoptic bundles could not deliver a light intensity strong enough for evaluation. Rigid telescopic trans-oral endoscopy could produce the optimal imaging and magnification necessary; however, this technique requires tongue protrusion, which may distort natural phonatory posture. It can also elicit a gag reflex and requires increased patient cooperation and examiner skill. The advancement in flexible laryngoscopy to distal chip endoscopes remedied these drawbacks, and flexible laryngoscopy is now considered effective in visualizing the larynx with great resolution and adequate light, allowing for stroboscopic evaluation. Additionally, these endoscopes are more conducive to capturing a natural phonatory posture than the rigid exam. Both rigid and flexible examinations are amenable to video capture and recording for later review.

Advances in Office-Based Videoimaging

Narrow Band Imaging

Narrow band imaging (NBI) is a diagnostic tool that utilizes narrow band spectrum filters to enhance visualization of the microvascular patterns of the mucosa and submucosa. This accentuates the appearance of epithelial lesions and provides improved detail of microcirculation (Fig. 1b, c). The light source differs from that of standard white light (or visible light) endoscopy [7]. White light spans wavelengths from 400 to 700 nm of the electromagnetic spectrum. When we perceive a color, an object is absorbing part of the white light spectrum while certain wavelengths are scattered and reflected back. It is the light reflected back at a certain wavelength that gives the object its color. It is important to note that absorption of certain wavelengths can occur at different depths. Shorter wavelengths diffuse more superficially, while longer wavelengths diffuse more deeply [2•].

NBI takes advantage of these physical properties by filtering light to specific wavelengths: 400–415 nm, in the blue spectrum, and 540 nm, in the green spectrum. These narrow bands of light are preferentially absorbed by hemoglobin in the subepithelial vasculature [2•, 8, 9, 10•, 11, 12•]. Bands of light in the blue spectrum penetrate the superficial layer of mucosa and are absorbed by capillary vessels, while bands in the green spectrum penetrate more deeply and are absorbed by veins [9, 12•].

This is interesting because we are not examining a lesion or neoplasm itself, but rather the angiogenic patterns, or the proliferation of blood vessels in the superficial epithelium of a lesion [7, 9]. Ni et al. have described these vascular patterns, I–V, under NBI as earthworm like vessels formed by intraepithelial capillary loops (IPCL) [13]. These patterns were later observed to have relevance due to the association observed in non-neoplastic lesions consistent with patterns I–IV and neoplastic lesions consistent with pattern V [10•].

This type of technology is particularly useful and reliable in the glottis, which has a thin, nonkeratinizing layer of stratified squamous epithelium. It can also play an important role in the diagnosis and treatment of superficial dysplasia or carcinomas in the oropharynx and hypopharynx, although it works less well with keratinized lesions, as keratosis reflects all visible light [2•, 11].

The advantage of such technology is that we are better able to see lesions not otherwise picked up on white light endoscopy. This can lead to early diagnosis and treatment as well as better surveillance for recurrence, even if the patient has undergone radiation [10•, 12•]. In a study by Piazza et al., 347 patients were evaluated with NBI in the preoperative, intraoperative, or post-operative setting, and there was a 21 % gain in true-positive adjunctive information compared to white light endoscopy [8, 14]. When this technology is coupled with high definition television (HDTV), which offers 1080 lines of resolution, or 4.26 times that of standard definition, NBI has been shown to achieve its highest diagnostic accuracy, especially in the outpatient setting under local anesthesia [8]. This allows for improvement in peripheral margin control of lesions as well as a nuanced patient treatment plan. Additionally, NBI maintains a high sensitivity and specificity even after radiotherapy or chemotherapy, 100 and 98 %, respectively [8]. NBI has also been shown to be a great diagnostic tool for recurrent respiratory papilloma. It has increased the sensitivity of diagnosis from 80 % with white light to 97 % with white light plus NBI [9].
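For readers less familiar with the accuracy measures quoted above, the following is a minimal sketch of how sensitivity and specificity are computed from biopsy-confirmed counts. The function names are illustrative, and any counts passed to them are the reader's own; none are taken from the cited studies.

```python
def sensitivity(true_pos, false_neg):
    """Share of diseased sites that the imaging exam flagged as suspicious."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of disease-free sites that the imaging exam correctly cleared."""
    return true_neg / (true_neg + false_pos)
```

A gain in true positives, as reported for NBI over white light alone, raises sensitivity, while false positives from inflammation or post-radiation change lower specificity.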

A learning curve may exist with this technology, resulting in additional biopsies and an inflated false-positive rate at first. These false positives, however, are mostly due to acute inflammation or chronic post-radiation changes [8].

High Speed Cinematography

High speed cinematography, or high speed video (HSV), allows the user to capture upwards of 8000 images per second, a rate faster than normal physiologic vocal fold vibration. This is done using a rigid peri-oral endoscope with a Xenon light source for adequate visualization. The endoscope is connected to cameras that allow for digitized video recording and playback [2•]. With this technology, we are able to capture at least 2000 images per second, meaning 10–20 frames of each glottic cycle depending on the fundamental frequency [15•]. Due to the sheer size of the data collected, only 2 s of video are actually recorded. Using these images, we are able to create a video that expands a few glottic cycles into a 2 min video. From this, we are able to clearly analyze vocal fold vibration by watching the wave itself.
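The arithmetic behind these figures can be sketched as follows. The function names are illustrative, and the 30 frames per second playback rate mentioned afterward is an assumption of ours rather than a value from the cited work.

```python
def frames_per_cycle(capture_fps, f0_hz):
    """Number of recorded frames that fall within one glottic cycle."""
    return capture_fps / f0_hz


def total_frames(capture_fps, duration_s):
    """Number of frames stored for a recording of the given length."""
    return int(round(capture_fps * duration_s))


def playback_seconds(n_frames, playback_fps):
    """Viewing time when the stored frames are replayed at playback_fps."""
    return n_frames / playback_fps
```

At 2000 images per second, a fundamental frequency of 100 Hz yields 20 frames per cycle and 200 Hz yields 10, which matches the 10–20 frame range quoted above. A 2 s recording at that rate holds 4000 frames; replayed at an assumed 30 frames per second, this takes a little over 2 min to view.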

This is an improvement compared to stroboscopy, which relies on creating a sequential mosaic of images along the glottic cycle, giving the illusion that it is visualized in real time. With stroboscopy, certain key portions of the exam are missed, including the beginning and end of the glottic cycle. This is particularly exaggerated with aperiodic vibration. A study by Mendelsohn et al. compared the utility of this technology with that of stroboscopy and found that it did not improve diagnostic accuracy in dysphonic patients [15•]. Another study, by Kendall, found no significant difference between HSV and stroboscopy in evaluating glottal configuration, mucosal wave propagation, and amplitude of vibration. HSV is, however, advantageous in the diagnosis of aperiodic dysphonias, specifically those with neuromuscular etiologies [3]. This technology can also be useful in discerning types of vocal tremor and in differentiating spasmodic dysphonia from muscle tension dysphonia, again because aperiodicity cannot be captured by stroboscopy. Additionally, this technology shows potential in the diagnosis of presbyphonia or difficult diagnostic cases like vocal fold scarring.

To improve objective measures, this technology has shown great promise when combined with complex algorithms that evaluate both glottic area waveform (GAW) and vibratory patterns [6•, 16•, 17, 18]. GAW plots glottic area against the time it takes for the glottis to open and close, which offers a measure related to vocal fold pliability. This can be useful when evaluating preoperative and postoperative results for benign vocal fold lesions.
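As a rough illustration of what a GAW is, the sketch below builds one from a sequence of already segmented frames. It assumes each frame has been reduced to a binary mask in which 1 marks a pixel inside the open glottis; the segmentation itself is the difficult step in practice and is not shown. The function names and the `open_fraction` summary are hypothetical and are not drawn from the cited algorithms.

```python
def glottal_area_waveform(masks, capture_fps):
    """Return (time in seconds, open-glottis pixel count) for each frame.

    `masks` is a list of frames; each frame is a list of rows of 0/1 values,
    where 1 marks a pixel assumed to lie inside the open glottis.
    """
    waveform = []
    for index, mask in enumerate(masks):
        area = sum(sum(1 for pixel in row if pixel) for row in mask)
        waveform.append((index / capture_fps, area))
    return waveform


def open_fraction(waveform):
    """Share of frames in which any glottic opening was detected."""
    if not waveform:
        return 0.0
    open_frames = sum(1 for _, area in waveform if area > 0)
    return open_frames / len(waveform)
```

Plotting the second element of each pair against the first gives the area-versus-time curve; comparing curves recorded before and after surgery is one way such data could be used.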

The major drawback to this technology pertains to its utility. Essentially, it draws a large amount of data from a very short span of time, usually a few seconds. It is therefore important to record the exact segment you would like to analyze. It can be likened to an in-depth look at a snapshot in time. For these reasons, it is also difficult to evaluate the vocal folds across different pitches and intensities [2•]. It is anticipated that these shortcomings will diminish with advances in, and greater availability of, powerful low-cost cameras and image processors.

Kymography

Kymographic imaging is a modern way of evaluating the complex motion of the vocal folds. During laryngoscopy, kymography evaluates the vocal folds as they oscillate during phonation at a particular line horizontal to the glottis. At this line, the vocal folds are observed as they oscillate toward and away from midline. By isolating this one segment over time, a kymograph is created, which is a single image [2•, 19•, 20].
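The construction of a kymograph from recorded frames can be sketched in a few lines. This is a simplified illustration that assumes grayscale frames stored as lists of pixel rows and a glottic gap that appears darker than the surrounding tissue; the function names and the threshold parameter are hypothetical.

```python
def kymogram(frames, line_row):
    """Stack the same pixel row from every frame into one image.

    Row i of the result is the chosen scan line at frame i, so the vertical
    axis of the kymogram is time and the horizontal axis is position.
    """
    return [list(frame[line_row]) for frame in frames]


def glottal_width_trace(kymo, dark_threshold):
    """Count pixels darker than the threshold in each kymogram row.

    Assumption: pixels below `dark_threshold` belong to the open glottis.
    """
    return [sum(1 for value in row if value < dark_threshold) for row in kymo]
```

The resulting width trace rises and falls with each glottic cycle at the chosen line, which is the kind of signal that later analysis of frequency, periodicity, and symmetry would start from.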

Whereas traditional laryngoscopic images are great at diagnosing structural or anatomic pathology leading to dysphonia, kymography helps elucidate vibratory abnormalities. When compared with more traditional HSV feedback, kymographic depictions have generally been more reliable in diagnosing the vibratory properties of the vocal folds. Currently, kymography can also be coupled with HSV imaging or even stroboscopy for more precise evaluation. In fact, HSV imaging has been shown to significantly enhance kymographs, with lower frame rates (<2000 frames per second) typically insufficient and rates greater than 4000 frames per second recommended [20].

Once the data are collected, there are many software applications and algorithms that can be applied, creating objective data regarding vibratory frequency, periodicity, peak power, and symmetry for each vocal fold.

Kymography is ideally suited to visually judge pathologies causing dysphonia related to left–right asymmetries. It is also helpful in the diagnosis of vocal fold vibratory irregularities, laryngeal lesions, and unilateral vocal fold paralysis. It has also been useful in providing insight into the changes of vocal fold behavior during treatment and follow-up, for example, status post-thyroidectomy or after vocal fold medialization for a unilateral vocal fold paralysis. In one study, kymography revealed that 31–62 % of patients with early glottic cancer displayed an abnormal glottic asymmetry after phonosurgery despite closure duration and periodicity falling within normal limits [21].

While the clinical information that kymography can collect is interesting and in many ways avant-garde, its data remain largely investigational. Even if it helps differentiate a normal from a pathologic process, it is not clear whether this technology impacts clinical decision making. It appears that this new technology has potential for future use; however, more studies are needed as it finds its place in clinical practice.

Conclusion

Evaluation of the larynx is an essential component of a comprehensive head and neck examination. The way in which we examine the larynx has changed dramatically over the years, and with every advance we heighten our understanding of the pathophysiology of this complex organ. The nature of the exam is a dynamic one, and therefore most advances we have seen lend themselves to awake evaluation in an office-based setting. While flexible or rigid laryngoscopy with stroboscopy remains the current gold standard for laryngeal visualization, technological advances are certain. The authors of this paper are encouraged by advances already made, including NBI, high speed cinematography, and kymography. We look forward to these advances establishing themselves in clinical practice as well as future advances that will positively impact patient care.