
8.1 Introduction

Clinical diagnostic tools and imaging technologies can be used to quantify both the physical and the functional status of the middle ear in humans. Because different pathologies often cause a single structural change and because a single pathology often causes multiple structural changes, few of these diagnostic measures can identify a specific pathology. However, these clinical technologies can provide very precise, repeatable, and quantifiable measures of the structure and function of the entire middle ear and its components in a living human.

Figure 8.1 is an illustration of the human middle ear and its surrounding structures that can be used when discussing current clinical middle ear measurements. For purposes of this chapter, the middle ear consists of the structures encompassing the lateral surface of the tympanic membrane, the middle ear cavity space, the ossicular chain up to the medial surface of the stapes footplate, and all attachments to the ossicles including the tendons of the middle ear muscles. In a living human, measurement apparatus can be placed at various positions in the external ear canal, requiring that the characteristics of the remaining ear canal become a component of the measurement. The influence of the external ear canal is reduced the closer the apparatus is positioned to the tympanic membrane but at the expense of a more invasive, uncomfortable, and risky measurement. Generally, it is not practical to place measurement apparatus within the middle ear, except during surgery, or to place measurement apparatus on the medial side of the middle ear. However, it is possible to use the sensory functions of the cochlea as a measurement sensor to obtain responses on the medial side of the middle ear. Medical imaging can add additional structural information and in some cases functional information.

Fig. 8.1 Schematic representation of the right peripheral human ear illustrating the outer ear that consists of the acoustic structures lateral to the middle ear including the pinna (1) and the external ear canal (2); the middle ear itself, which consists of mechanical structures including the conical shaped tympanic membrane (3) and the ossicles (4); and the inner ear, which consists of the cochlea with its sensory structures (5) and the connected neural structures (6)

Clinical measurement methods began many years ago and continue to evolve. The oldest methods are the most standardized and the recent methods the least standardized, though both are valid, reliable, and useful. Virtually all of the measures are controlled by governmental regulations or institutional policies, and current audiology textbooks cover the standard clinical measures (Roeser et al. 2007; Katz et al. 2009). The purpose of this chapter is to outline current and emerging clinical middle ear diagnostic measures and imaging technologies, to provide a reasonable understanding of the underlying measurement concepts, and to illustrate the quantitative information that can be obtained for the human middle ear. Quantifiable measures are emphasized, with the full understanding that medical practice also includes many subjective measures that are not elucidated here.

Prior to performing any middle ear measures, one should examine the ear canal with an otoscope or binocular microscope to ensure that there are no cerumen blockages, foreign bodies, drainage, or tympanic membrane perforations and that the external ear canal does not collapse easily. All of these conditions can affect middle ear measurements and should be corrected or noted. In the case of drainage or other obvious pathologic conditions, the measures may need to be deferred pending medical assessment and intervention. A cerumen blockage of less than 50 % of the ear canal volume may or may not affect some of the middle ear measures but should be noted in either case to allow proper interpretation of the results.

Four fundamental measurement approaches provide quantifiable information about the middle ear in humans. A particular approach is selected based on a variety of factors including the specific type of information desired and practical considerations including convenience, availability of measurement devices, minimization of patient time, and minimization of invasiveness. In many cases the results of the different approaches are interpreted in aggregate.

8.1.1 Behavioral Measures

Behavioral measures are the primary, oldest, most common, and least invasive of quantifiable measures of middle ear function. Quantifiable behavioral measures are based on the application of a controlled signal that activates the sense of hearing that originates in the cochlea and a voluntary response from a cooperative subject such as raising the hand or pressing a button. Calibrated acoustic signals are presented by air conduction from one of a variety of transducers positioned at or near the entrance to the ear canal. Calibrated vibratory signals are presented from one of a variety of vibratory transducers applied to the skull, typically on the surface of the skin and the subdermal soft tissue that overlies the mastoid process resulting in a signal reaching the cochlea by bone conduction. The behavioral response, a voluntary indication of whether or not the person heard the stimulus, is elicited based on activation of the same sensory structures in the cochlea whether the stimulus is delivered by air conduction or by bone conduction. Thus the measurement point for both cases is just beyond the middle ear. The response to an air conduction signal includes the effects of the middle ear because the stimulus must pass through the middle ear to reach the cochlea. The response to a bone conduction signal minimizes, though does not eliminate, the middle ear effects because the signal bypasses the middle ear and travels to the cochlea directly by bone conduction. Comparison of air-conduction and bone-conduction behavioral responses provides quantitative information on middle ear function.

Both the air- and bone-conduction transducers are calibrated on a decibel scale in hearing level (dB HL) with well-known standards such that the air- and bone-conduction signals produce equal responses at an equal dB HL in individuals with normal middle ear function. In the presence of abnormal middle ear function, the level of the air conduction stimulus that reaches the cochlea will be attenuated compared to the level of the bone conduction stimulus that reaches the same cochlea. The amount of stimulus elevation in decibels needed to overcome this attenuation quantifies the energy loss through the middle ear in cases of abnormal middle ear function. This diagnostic information can be obtained at multiple frequencies and will accurately quantify middle ear function regardless of the absolute hearing sensitivity of the cochlea as long as the hearing sensitivity of the cochlea itself does not change between the two measures. For all cooperative subjects willing and able to provide behavioral responses, it is very reasonable to assume that cochlear function is stable whether the cochlea is normal or abnormal. The only exception is the rare occurrence of cochlear pathology associated with fluctuating hearing, and even in these cases, the change in cochlear function occurs much more slowly than the time it takes to complete the two measures.

8.1.2 Physical Measures

Physical measures comprise the second category of quantifiable information of the middle ear. These were developed later than the behavioral measures of middle ear function but are equally common and only slightly more invasive. Physical measures are based on the application of a controlled acoustic signal and controlled static air pressure in the external ear canal while simultaneously measuring the resultant sound pressure level that develops in the ear canal. The sound pressure level that develops in the ear canal in turn is dependent on the acoustic characteristics of the external ear canal itself and the acoustic characteristics at the lateral surface of the tympanic membrane, which in turn are related to the physical characteristics of the tympanic membrane and the middle ear structures.

Several conditions can alter the measured acoustic characteristics at the lateral surface of the tympanic membrane. These include the applied static air pressure in the ear canal, both positive and negative relative to ambient air pressure, many abnormal middle ear conditions related to abnormal development, and many middle ear pathologies. Changes in the physical characteristics of the tympanic membrane and the middle ear structures in turn result in measurable changes in the ear canal sound pressure level. The changes in the resultant measured ear canal sound pressure can be converted to a variety of acoustic units that employ the impedance analogy (impedance, reactance, and resistance), the admittance analogy (admittance, susceptance, and conductance) and acoustic reflectance, all under the generic umbrella term of acoustic immittance measures. Both magnitude and phase of the measured sound pressure level can be determined. The raw measured value represents the effects of the combined acoustic characteristics of the external ear canal and the lateral surface of the tympanic membrane, which can be further separated into values that represent the acoustic characteristics of each separately. Because the measured acoustic immittance values vary as a function of the applied ear canal air pressure, additional analysis can provide a very good estimate of the middle ear air pressure, even though no pressure transducer is placed in the middle ear. These measures are considered physical rather than physiologic because they are based strictly on the physical characteristics of the external ear canal and middle ear structures and can even be obtained in a cadaver.

8.1.3 Physiological Measures

Physiological measures of the middle ear include a variety of involuntary responses that result from calibrated and controlled stimuli that induce specific involuntary physiological activity. The involuntary physiological responses range from reflexive contractions of the middle ear muscles detected from acoustic measures in the external ear canal to acoustic signals generated by the cochlear structures, also detected in the external ear canal, to neural activity from the peripheral auditory neural system (cochlea and cranial nerve VIII) and central auditory neural system (brainstem and higher neural systems) detected from electrical signals measured with surface electrodes on the head. Though changes in these physiological responses in relation to changes in the controlled stimuli can be analyzed to infer functional status of the neural mechanisms themselves, that is not the focus of this chapter. However, assuming that the neural functional status does not change between measures, or as a result of other interventions, these involuntary physiological responses can also be analyzed to obtain quantifiable measures of the functional status of the middle ear structures. In essence, the voluntary behavioral measures mentioned earlier can be replaced by these involuntary physiological measures.

8.1.4 Imaging

Imaging technology in general, and for the middle ear in particular, is currently undergoing many new developments and is evolving. Entirely different imaging technologies for the middle ear either are becoming available or are being adapted specifically for clinical use. The details of some of these technologies and how the findings relate to research are reviewed by Funnell, Maftoon, and Decraemer (Chap. 7). Consequently, clinical middle ear imaging is the least standardized of the measures and the most likely to change in the near future.

The full range of clinical medical imaging techniques can be used for the middle ear depending on the purpose. The middle ear and some of its structures can be observed directly with an endoscope in a limited number of cases. Because the middle ear contains bone, soft tissue, and air-filled space, conventional medical imaging can provide quantifiable images of the structures of the middle ear as well. A variety of measures are based on ionizing radiation (X-ray) that optimizes visualization of bone. Others are based on magnetic resonance imaging (MRI) that optimizes visualization of soft tissues. Optical methods are able to visualize both bone and soft tissues. All imaging methods are more invasive than behavioral measures or physical measures but for different reasons. The ionizing radiation used in X-ray imaging raises concerns about direct damage to tissues, a serious enough risk that this approach is not used for research on normal subjects. The magnetic signals used in MRI do not necessarily carry a risk of direct damage to tissues, but MRI can still be considered invasive because the individual must often be positioned for long periods of time in the very small space of an imaging scanner, which in turn increases the risk of claustrophobia or related psychological reactions. Optical methods may require apparatus positioned deep in the ear canal. In some cases, imaging conducted for unrelated medical problems can also be reformatted to provide structural images of the middle ear. If additional procedures are employed, some functional information can be obtained as well.

8.2 Middle Ear Behavioral Measures

8.2.1 Auditory Threshold

The most common behavioral measure of middle ear function is based on auditory sensitivity as quantified by the threshold of hearing. To be precise, the threshold of hearing is defined as the level of an acoustic signal that is heard 50 % of the time in a series of controlled presentations. The threshold level is directly related to the status of the entire peripheral auditory system, including the sensitivity of the cochlea. A limiting factor is the ambient acoustic noise level, which makes it necessary to conduct the measures in sound attenuating booths. The measurement process involves systematic increases in the level of a calibrated signal to the point at which it is audible 100 % of the time, followed by systematic decreases in the level of the signal to the point at which it is audible 0 % of the time. The threshold level lies in the range between complete audibility and complete inaudibility and is usually taken as the level that is audible 50 % of the time. The range between audibility and inaudibility is only a few decibels, so stimulus step sizes of 5 dB are adequate for most measures. More importantly, changes in threshold associated with relevant changes in the middle ear usually exceed this range substantially. It is not uncommon for a middle ear condition to change auditory threshold by 70 dB.

Threshold measures are influenced by a variety of factors that can cause behavioral results to be quite variable. These include factors beyond the peripheral auditory system such as psychological status (e.g., motivation to provide a correct response, fatigue, cognitive ability, developmental age), physiological status (stability of cochlear function related to certain disease processes), effects of certain pharmaceuticals, and exposure to excess acoustic stimulation. The measures also are influenced by the measurement protocol (e.g., instructions on how to respond, the presentation sequence of the stimuli). Though these factors do affect the variability of the threshold measures, when controlled and specified, the threshold values are fairly accurate and reliable for most subjects measured under a specific protocol.

The most common protocol for obtaining a threshold measure for both clinical and research purposes is called a modified Hughson–Westlake protocol (Roeser et al. 2007). This protocol includes presentation of a series of calibrated signals, both well above and well below threshold, a 5-dB step size, and enough presentations at threshold levels, usually three to five, to determine the 50 % audibility point. It produces acceptable test–retest variability. Most middle ear pathologies and abnormalities result in threshold changes that exceed this variability by a substantial amount such that any criticisms of the subjective nature of the measures are easily mitigated. If finer resolution is required, smaller step sizes and automated algorithms can also be implemented at the expense of greatly increasing measurement time.
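The logic of such a threshold search can be sketched in code. The sketch below assumes a commonly taught clinical rule set (decrease the level by 10 dB after each response, increase it by 5 dB after each non-response, and accept the lowest level that yields two responses on ascending presentations). These specific rules, the function name `hughson_westlake`, the level limits, and the simulated listener are illustrative assumptions and are not details taken from the cited protocol.

```python
from typing import Callable, Dict, Optional


def hughson_westlake(hears: Callable[[int], bool],
                     start_db: int = 40,
                     step_up: int = 5,
                     step_down: int = 10,
                     min_db: int = -10,
                     max_db: int = 110,
                     needed: int = 2,
                     max_trials: int = 200) -> Optional[int]:
    """Return an estimated threshold in dB HL, or None if no threshold is found.

    `hears(level)` is a hypothetical stand-in for the listener's voluntary
    response (True if the tone at `level` dB HL was reported as heard).
    """
    level = start_db
    ascending = False                 # the first presentation is not an ascending one
    hits: Dict[int, int] = {}         # responses on ascending presentations, per level

    for _ in range(max_trials):
        if hears(level):
            if ascending:
                hits[level] = hits.get(level, 0) + 1
                if hits[level] >= needed:
                    return level      # lowest level with enough ascending responses
            if level <= min_db:
                return min_db         # heard at the lowest available level
            level = max(min_db, level - step_down)
            ascending = False
        else:
            if level >= max_db:
                return None           # no response at the maximum available level
            level = min(max_db, level + step_up)
            ascending = True
    return None


if __name__ == "__main__":
    # Idealized listener with a "true" threshold of 23 dB HL (assumed value).
    print(hughson_westlake(lambda level: level >= 23))
```

With a 5-dB step size the estimate is quantized to the step grid, so the idealized listener above is assigned the first 5-dB level at or above the assumed true threshold.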

8.2.2 Measurement Parameters

The frequency range of hearing for humans is commonly stated as from 20 Hz to 20 kHz. This wide range places significant demands on the stimulus transducer, which plays a substantial role in accurately quantifying the thresholds and must be considered when interpreting auditory threshold measures. For air conduction stimuli, the output of a loudspeaker positioned and calibrated in space without the subject present (sound field) is an effective way to specify a sound and yields a result that accurately includes the effect of the external ear canal in its natural state. Sound field presentation is the optimal choice for obtaining measures that include the complete external ear canal, including the pinna effects that become prominent at higher frequencies, and it reduces the substantial stimulus variability largely related to the inconsistent acoustic leaks between the transducer and the ear for earphone measures. However, the output of a loudspeaker calibrated in an undisturbed sound field can be substantially altered by the presence of the subject and by the surrounding acoustic environment that almost always includes reflective surfaces inside the small booths intended to attenuate ambient acoustic noise. The use of sound field signals also makes it more difficult to determine which of the two ears is responding.

An earphone more precisely establishes which ear is being stimulated and can have signal calibration improvements compared to the sound field case. Older style earphones that fit on top of the ear (supraaural) do not present consistent stimulus levels because of the unspecified and uncontrolled acoustic coupling between the irregularly shaped pinna and the earphone cushion. Smaller insert earphones can be inserted and coupled tightly to the more regularly shaped ear canal with a replaceable soft tip that creates and maintains a consistent acoustic seal between the transducer and the ear canal and improves the reliability of the calibrated stimulus and therefore the reliability of the threshold measures. However, the remaining length of the ear canal is not clearly determined and easily can be altered based simply on how deeply the insert earphone is positioned in the ear canal, resulting in generally small but unknown changes in the level of the applied acoustic signal. Circumaural earphones fit completely around the pinna and are more likely to establish a tight acoustic seal that allows for much more consistent stimulus delivery. These earphones have the added benefit of much better high frequency output than other transducers and allow accurate threshold measures to be obtained up to the maximum frequency for humans (20 kHz) (Ballachanda 1997).

The bone conduction transducers also have limitations related to maximum frequency output and coupling factors. All current conventional bone conduction transducers employ electrodynamic technology that uses a rather large moving rod positioned in a coil to impart the vibratory signal to the skull. Because this rod has significant mass, the output of the bone conduction transducer decreases as frequency increases, resulting in a maximum practical measurement frequency of only about 6 kHz (Popelka et al. 2010). The effective output of a bone conduction transducer is also limited because it must be coupled to the skull via the skin surface rather than directly to the bone of the skull. The highly variable soft tissue between the transducer and the skull attenuates the level of the signal to the skull and introduces stimulus variability across subjects. The surface area of the portion of the transducer that is held against the skull and the coupling force of the transducer to the head can be controlled to reduce the stimulus variability associated with these two factors. The portion of the transducer held against the head can be defined in shape (1.75 cm² area with a specified radius of the contact area and a specified radius of the edge), and the coupling force can be specified as well (Toll et al. 2011).

In spite of these variability issues, quite accurate air-conduction and bone-conduction thresholds can be determined in most patients and research subjects. At a minimum, all of these measurement factors must be delineated under a specified protocol to produce the least variable threshold measures and to provide sufficient information for the measures to be repeatable.

8.2.3 Inherent Threshold Measurement Limitations

The presence of two hearing organs in a patient or subject (right and left cochleae) raises a concern about the participation of the nontest ear during the measurement of the test ear. The reduction in the level of the test signal as it crosses from the test ear to the nontest ear is called interaural attenuation and is determined by a variety of factors associated with the subject and with the transducer. A measurement signal from a bone conduction oscillator placed at one skull location, typically on the mastoid process of the ear of interest, can reach both cochleae at the same level because both cochleae are rigidly coupled to the same skull, resulting in an interaural attenuation of 0 dB. Conversely, an air conduction signal from an insert earphone placed deep in the ear canal of the ear of interest can reach the cochlea of the opposite ear but only at a greatly reduced level, resulting in an interaural attenuation as high as 100 dB. Interaural attenuation values therefore range from as little as 0 dB to as high as 100 dB and are frequency and transducer specific. If the interaural attenuation is low enough to implicate participation of the nontest ear, the nontest ear can be prevented from contributing to the measured threshold by adding an acoustic signal to the nontest ear that masks its function and thereby removes its participation in the measurement of the test ear. The application of masking to block the nontest ear must be performed carefully because the masking signal itself can affect the hearing sensitivity in the test ear either directly, based on the same interaural attenuation values outlined earlier, or indirectly, because the neural pathways of each ear have connections between them within the central nervous system. In most cases, effective masking can be applied to the nontest ear such that the threshold measures represent only the function of the ear being measured.
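The decision of whether the nontest ear might participate can be illustrated with a short calculation. The sketch below assumes a common clinical rule of thumb that is not stated in this chapter: the nontest ear may respond when the presentation level minus the interaural attenuation reaches or exceeds the bone-conduction threshold of the nontest cochlea. The function name `nontest_ear_may_respond` and all numeric values are hypothetical.

```python
def nontest_ear_may_respond(presentation_db_hl: float,
                            interaural_attenuation_db: float,
                            nontest_bc_threshold_db_hl: float) -> bool:
    """Return True if the signal crossing the head could be heard by the nontest cochlea.

    Assumed rule: crossed level = presentation level - interaural attenuation;
    participation is possible when the crossed level is at or above the
    bone-conduction threshold of the nontest ear.
    """
    crossed_level = presentation_db_hl - interaural_attenuation_db
    return crossed_level >= nontest_bc_threshold_db_hl


if __name__ == "__main__":
    # Bone conduction: interaural attenuation of 0 dB, as described in the text.
    print(nontest_ear_may_respond(60, 0, 10))     # crossed level 60 dB HL
    # Deeply seated insert earphone: attenuation as high as 100 dB.
    print(nontest_ear_may_respond(60, 100, 10))   # crossed level -40 dB HL
```

Because interaural attenuation is frequency and transducer specific, a check of this kind would need to be repeated at each test frequency with the value appropriate to the transducer in use.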

The normal bony cochlea is an enclosed fluid-filled system with rigid walls except for two openings, the oval window that contains the footplate of the stapes and the adjacent round window that is covered by a flexible membrane. These two openings allow the applied signal to generate fluid movements within the cochlea that in turn deflect the cilia on the sensory cells within the cochlea and initiate the sense of hearing. A breach of the rigid walls can result in a third opening that can alter the normal fluid movements in a way that may differ between air- and bone-conduction stimulation. A condition called superior vestibular semicircular canal dehiscence is an abnormal opening in the superior canal of the vestibular system that can act as a third window that in turn can alter the normal fluid dynamics (Rosowski et al. 2004). The specific effects of a dehiscence are currently under investigation. In general, the presence of a superior canal dehiscence results in an improvement of bone conduction thresholds and a decrement in air-conduction thresholds, especially for the frequencies below 1 kHz (Merchant et al. 2007; Merchant and Rosowski 2008). This situation results in a clear difference between air- and bone-conduction thresholds that would suggest a change in middle ear function when this clearly is not the case. However, this condition can also provide a platform for making quantifiable measures that relate to new understanding of the basic mechanisms of air- and bone-conduction transmission.

8.2.4 Audiograms

Figure 8.2 is a representation of auditory thresholds in a typical format called an audiogram. The standard frequencies along the X-axis range in octaves from 250 Hz through 8 kHz. Standard intraoctave frequencies are also indicated. The octave representation of the stimulus directly maps onto the length of the cochlear partition, referred to as a tonotopic organization, with high frequencies near the stapes and low frequencies near the apex. Stimulus level is specified on a reverse decibel scale (dB HL) with the reference for 0 dB HL indicated. Because of the normal differential sensitivity of the ear with respect to frequency, the absolute level in decibels re an acoustic reference for air conduction (sound pressure level [SPL]) and in decibels re a force reference for bone conduction (Newtons) of the signal at each frequency in the audiogram format will differ. However, if the dB HL reference level is specified then the difference between the reference level and HL scales is known and measures on one scale can be converted to the other scale. In terms of auditory function, the dB HL scale is calibrated to be equivalent between air-conduction and bone-conduction behavioral responses in normal ears.

Fig. 8.2 Hearing thresholds in an audiogram format. (Left) Right ear thresholds by air conduction (O) and bone conduction (<). (Right) Left ear thresholds by air conduction with masking to the opposite ear

$$ \mathrm{Threshold}_{\mathrm{AC}}\ (\mathrm{dB\ HL}) = \mathrm{Threshold}_{\mathrm{BC}}\ (\mathrm{dB\ HL}) $$
(8.1)

and

$$ 0\ \mathrm{dB\ HL} = \mathrm{Threshold}_{\mathrm{Normal\ Ear}} $$
(8.2)

These relations allow direct comparison between air-conduction and bone-conduction responses. A symbol key on the audiogram identifies the type of conduction (air or bone), the test ear, and whether or not the nontest ear was masked. Also indicated are the transducers that were used and the reliability of the behavioral threshold measurements.

The functional status of the middle ear is quantified as the change in transmission through the middle ear described as

$$ \mathrm{Threshold}_{\mathrm{AC}}\ (\mathrm{dB\ HL}) - \mathrm{Threshold}_{\mathrm{BC}}\ (\mathrm{dB\ HL}) = \mathrm{dB\ loss\ through\ the\ middle\ ear} $$
(8.3)

Positive numbers represent the attenuation of the signal in decibels through the middle ear for that particular measurement. Occasionally negative decibel values are encountered that normally would suggest an active mechanism. However, in the case of the passive middle ear these negative values are the result of normal variability in the air and bone conduction thresholds and should be ignored.
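As a worked illustration of Eq. 8.3, the sketch below computes the difference between air- and bone-conduction thresholds at each frequency where both were measured. The function name `air_bone_gap` and the threshold values are hypothetical and are not data from this chapter.

```python
from typing import Dict


def air_bone_gap(ac_db_hl: Dict[int, float],
                 bc_db_hl: Dict[int, float]) -> Dict[int, float]:
    """Return Threshold_AC - Threshold_BC (dB) per frequency (Hz), as in Eq. 8.3.

    Only frequencies present in both measurements are compared. Small negative
    values are returned unchanged so that the reader can see them; the text
    attributes them to normal threshold variability.
    """
    shared = sorted(set(ac_db_hl) & set(bc_db_hl))
    return {freq: ac_db_hl[freq] - bc_db_hl[freq] for freq in shared}


if __name__ == "__main__":
    # Hypothetical thresholds in dB HL, keyed by frequency in Hz.
    ac = {250: 40, 500: 45, 1000: 35, 2000: 30, 4000: 35, 8000: 30}
    bc = {250: 5, 500: 10, 1000: 10, 2000: 15, 4000: 10}
    for freq, gap in air_bone_gap(ac, bc).items():
        print(f"{freq} Hz: {gap} dB loss through the middle ear")
```

The 8 kHz air-conduction value is dropped in this example because no bone-conduction threshold was supplied at that frequency.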

The results in Fig. 8.2a are from a normal hearing right ear with normal middle ear function. The air conduction thresholds (O) are equivalent to the bone conduction thresholds (<) (in dB HL) such that

$$ \mathrm{Threshold}_{\mathrm{AC}}\ (\mathrm{dB\ HL}) - \mathrm{Threshold}_{\mathrm{BC}}\ (\mathrm{dB\ HL}) = 0\ \mathrm{dB} $$
(8.4)

This threshold difference, commonly referred to as the “air–bone gap,” provides direct evidence that the middle ear is functioning normally because there is no loss of the signal through the middle ear compared to the normal case. Note that the interpretation applies only to the middle ear function and not necessarily the actual condition of the middle ear. It is entirely possible that the results would be identical in an ear with a small tympanic membrane perforation, a case in which the middle ear clearly is abnormal but with a particular abnormality that may not affect middle ear sound transmission.

The results in Fig. 8.2b are from the opposite left ear. The symbols indicate that the thresholds were obtained with masking to the opposite nontest ear to prevent it from participating in the measurements, so that the threshold results can be interpreted as reflecting only the left middle ear function. The air-conduction thresholds have much larger values than the bone-conduction thresholds, providing direct evidence that the left middle ear is not functioning normally, an interpretation that applies only to middle ear sound transmission. In this case the middle ear abnormality was a perforated tympanic membrane that did affect middle ear sound transmission quantifiably. The relation between a specified middle ear abnormality (location and area of the perforation) and the resultant effect on middle ear function (a transmission loss calculated at each frequency as the decibel difference between the air- and bone-conduction thresholds) can be quantified quite precisely.

Though often considered to be subjective because they employ voluntary responses, auditory thresholds can be very reliable and very accurate for quantifying middle ear attenuation as a function of measurement frequency. At a minimum, several measurement variables must be specified for proper interpretation including stimulus units (typically dB HL), a known calibration standard for the transducer that was used and a specified measurement protocol that defines the stimulus sequence and the definition of threshold. The pattern of attenuation through the middle ear across frequency can be useful for understanding middle ear function in relation to observable middle ear conditions. The change in the air–bone gap as the result of an intervention such as surgery can also be used to quantify the efficacy of the intervention.

8.3 Middle Ear and Physical Characteristics

8.3.1 Acoustic Immittance

The generic term “acoustic immittance” was coined to include all physical quantities of acoustic impedance, acoustic admittance, and their subcomponents. Acoustic immittance also refers collectively to measures of these physical quantities obtained with a device called an immittance meter that has a probe that delivers a measurement signal to the external ear canal and a microphone that measures residual, unabsorbed sound from which middle ear function is inferred indirectly. A typical clinical immittance measurement system produces a variety of measures including acoustic reflex responses, tympanometry, and multifrequency immittance or wideband measures. The clinical acoustic immittance battery provides important diagnostic information about middle ear function and status. Single-frequency tympanometry and acoustic reflex measures have comprised the standard acoustic immittance battery for the past 40 years, while newer procedures using wideband stimuli have been developed recently, including wideband tympanometry.

8.3.2 Tympanometry

Tympanometry is a measure of acoustic immittance in the ear canal as a function of a range of applied air pressure in the ear canal that varies from positive to negative, relative to atmospheric pressure. The most common measurement is acoustic admittance ($Y_a$), or the amount of acoustic energy absorbed by the ear canal, the tympanic membrane, and the middle ear, expressed in acoustic millimhos (mmho) with ear canal air pressure expressed in decapascals (daPa). The reciprocal of acoustic admittance, acoustic impedance, is also used but much less so. The subcomponents of acoustic admittance and acoustic impedance can also be measured. Figure 8.3 shows an acoustic admittance tympanogram from a normal ear. Note that an acoustic impedance tympanogram would be inverted.

Fig. 8.3 An acoustic admittance tympanogram at 226 Hz from a normal ear showing tympanometric peak pressure, the static value, and the ear canal value

To record a tympanogram, a probe assembly attached to an acoustic immittance meter is inserted into the external ear canal. The probe assembly has a pliable tip that seals the probe into the external ear canal to prevent an acoustic leak and to allow air pressure in the ear canal to be varied. The probe assembly has three components: (1) a tube that is attached to an air pressure pump in the immittance meter to vary the air pressure in the ear canal, (2) a miniature loudspeaker that is attached to a signal generator to produce a measurement probe tone, and (3) a miniature microphone to measure the level of the residual or reflected probe tone in the ear canal. A sinusoidal probe tone is delivered from the probe to the ear canal while ear canal air pressure is varied from positive to negative (or in the opposite direction) at a specified rate of change. The meter measures the electrical current needed to maintain a constant sound pressure level (SPL) in the ear canal; this current is directly proportional to the acoustic admittance magnitude at the probe tip. As the acoustic admittance of the middle ear decreases because of the increased tension in the tympanic membrane caused by an increase (or decrease) in the applied air pressure, the SPL in the sealed ear canal tends to rise, and less electrical current to the probe loudspeaker is required to maintain a constant SPL. Conversely, as the admittance increases, more current is required. The change in electrical current is therefore directly proportional to the change in admittance magnitude at the tip of the probe.

8.3.3 Acoustic Impedance and Acoustic Admittance

Acoustic impedance (Z a in acoustic ohms) of the middle ear system is defined as the total opposition of the system to the flow of the acoustic energy. Acoustic admittance (Y a in acoustic mhos) is the amount of acoustic energy that flows into the middle ear system. The terms acoustic impedance (Z a) and acoustic admittance (Y a) are reciprocal to each other and are described mathematically as:

$$ {Z_{\mathrm{ a}}}=\frac{1}{{{Y_{\mathrm{ a}}}}} $$
(8.5)
$$ {Y_{\mathrm{ a}}}=\frac{1}{{{Z_{\mathrm{ a}}}}} $$
(8.6)

There are three physical properties that contribute to total acoustic admittance or total acoustic impedance: stiffness, mass, and friction. Stiffness elements in the middle ear system exert effects at low frequencies and are related primarily to the enclosed volume of air in both the external ear canal and the middle ear space. Mass elements in the middle ear system exert high frequency effects and are minimal in normal middle ears. Total acoustic susceptance (or total acoustic reactance in impedance terms) is the vectorial sum of the stiffness and mass elements. If the total acoustic susceptance (B a) is positive (between 0° and 90° phase), the system is stiffness controlled; if negative (between 0° and −90° phase), the system is mass controlled. The third variable, friction, is independent of frequency and determines the dissipation of acoustic energy, called acoustic conductance denoted by G a (or acoustic resistance, R a, in the impedance system).
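The relationship among conductance, susceptance, admittance magnitude, and phase can be made concrete with a short Python sketch that treats admittance as the complex number Y = G + jB. The numerical values and the function name are hypothetical and chosen only to give round results.

```python
import cmath
import math

def describe_admittance(g_mmho, b_mmho):
    """Combine conductance G and susceptance B (both in mmho) into Y = G + jB.

    Returns the magnitude (mmho), the phase angle (degrees), and whether the
    sign of B indicates a stiffness-controlled or mass-controlled system.
    """
    y = complex(g_mmho, b_mmho)
    magnitude = abs(y)
    phase_deg = math.degrees(cmath.phase(y))
    if b_mmho > 0:
        regime = "stiffness controlled"
    elif b_mmho < 0:
        regime = "mass controlled"
    else:
        regime = "neither (susceptance is zero)"
    return magnitude, phase_deg, regime

# Hypothetical example: G = 0.3 mmho, B = 0.4 mmho
magnitude, phase_deg, regime = describe_admittance(0.3, 0.4)
print(magnitude, phase_deg, regime)

# Impedance is the complex reciprocal of admittance (here in units of 1/mmho)
z = 1 / complex(0.3, 0.4)
print(abs(z))
```

Working with the complex value rather than the magnitude alone keeps the phase information that distinguishes stiffness-controlled from mass-controlled behavior.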

To calculate the acoustic admittance at the lateral surface of the tympanic membrane, the acoustic admittance of the ear canal (Y ec ) is measured and subtracted from the overall measurement. In practice, the acoustic admittance of the ear canal alone is measured directly and subtracted from the total measured acoustic admittance resulting in a measurement of the acoustic admittance of the middle ear system (Y me ), as in the following equation:

$$ {Y_{\mathrm{ ME}}}=Y-{Y_{\mathrm{ EC}}} $$
(8.7)

A tympanogram includes the admittance of the external ear canal (Fig. 8.3). When the admittance of the external ear canal is subtracted, the result is called an ear canal compensated tympanogram, represented by shifting the tympanogram down to where the admittance value at high negative and high positive air pressures is 0 mmho. The resulting measurement refers to the acoustic characteristics at the lateral surface of the tympanic membrane (Y tm ). The asymmetry in the measured ear canal value at high positive and high negative air pressures (Fig. 8.3) is due to slight differences in the volume of the remaining ear canal caused by slight displacements of the tympanic membrane at these pressures.
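Ear canal compensation at a low probe frequency can be sketched as a simple subtraction. The Python example below takes the admittance at a tail pressure as the ear canal estimate and subtracts it from every point. The tympanogram values, the choice of +200 daPa as the tail, and the function name are hypothetical, and the scalar subtraction assumes a low-frequency probe tone such as 226 Hz.

```python
def compensate_tympanogram(tympanogram, tail_pressure_dapa=200):
    """Subtract the ear canal estimate from every point of a tympanogram.

    tympanogram maps ear canal air pressure (daPa) to admittance (mmho).
    The admittance at tail_pressure_dapa is taken as the ear canal value.
    """
    y_ec = tympanogram[tail_pressure_dapa]
    return {p: round(y - y_ec, 3) for p, y in tympanogram.items()}

# Hypothetical uncompensated 226-Hz values, for illustration only
measured = {-300: 1.00, -200: 1.02, -100: 1.30, 0: 1.90, 100: 1.35, 200: 1.10}
compensated = compensate_tympanogram(measured)

print(compensated[200])   # zero by construction at the chosen tail
print(compensated[0])     # peak value after compensation
print(compensated[-300])  # nonzero because the two tails differ slightly
```

In this sketch the negative tail does not reach zero after compensation, which mirrors the tail asymmetry noted in the text; which tail is used for the ear canal estimate is a protocol choice.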

8.3.4 Estimate of Middle Ear Air Pressure

As the applied air pressure varies between high negative and high positive values, Y tm will reach its highest value when the air pressures on both sides of the tympanic membrane are equal, resulting in a single peak in the tympanogram. The ear canal air pressure at which the peak of the tympanogram occurs is the tympanometric peak pressure, which is an indirect measure of the air pressure in the middle ear space. Though the tympanometric peak pressure can overestimate the actual middle ear pressure by as much as 100% (Renvall and Holmquist 1976), it can detect the presence of negative or positive middle ear pressure due to Eustachian tube dysfunction. As the Eustachian tube regains normal function and middle ear effusion resolves, the tympanogram progresses from having no peak (flat tympanogram), to a peak at a high negative pressure (negative tympanometric peak pressure), finally returning to a peak at atmospheric pressure (tympanometric peak pressure near 0 daPa) when Eustachian tube function has returned to normal. Multiple factors produce negative middle ear air pressure, including frequent forceful inhaling through the nostrils (“sniffing”), ciliary action in the Eustachian tube, absorption of middle ear gases through increased and excessive diffusion, and poor pneumatization of the mastoid. Because multiple mechanisms contribute to middle ear pressure, and because the tympanometric peak pressure is an inexact estimate of actual middle ear pressure, negative tympanometric peak pressure has not been shown to provide reliable sensitivity or specificity for diagnosing otitis media. It can, however, provide very useful information for quantifying Eustachian tube function and the physical status of the middle ear space.

8.3.5 Ear Canal Volume

A primary purpose of tympanometry is to accurately measure the acoustic admittance at the lateral surface of the tympanic membrane as an indicator of middle ear characteristics. Because the probe tip of the admittance measurement system is lateral to the surface of the tympanic membrane, the acoustic admittance measured at the probe tip jointly reflects the acoustic admittance of the external auditory canal and the acoustic admittance of the middle ear. The accuracy of determining the acoustic admittance at the lateral surface of the tympanic membrane relies upon obtaining an accurate measure of the acoustic admittance of the ear canal between the probe tip and the tympanic membrane. The volume of this portion of the ear canal, referred to as V ea , is affected by numerous factors such as the depth of insertion of the probe tip, the dimensions of the ear canal, and the volume occupied by substances in the external ear canal, specifically cerumen. The volume of the ear canal has also been referred to as V ec , or, when expressed as the volume of air having the same acoustic admittance, as the equivalent volume V eq .

The most common method to determine V ea is to use the measured acoustic admittance at a high positive air pressure (200 daPa, 1 daPa = 10 Pa) that drives the acoustic admittance of the middle ear toward zero. At this air pressure point the acoustic admittance measured at the probe tip represents the acoustic admittance of the air in the ear canal, assuming that the ear canal walls are rigid. Under reference conditions using a probe tone of 226 Hz, a 1 cubic centimeter (cc) or 1 milliliter (ml) volume of trapped air has an acoustic admittance of 1 mmho. This measure is called equivalent ear canal volume because the measured acoustic admittance is equivalent to the acoustic admittance of a hard-walled cavity of equivalent volume.
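The correspondence between 1 cc and 1 mmho at 226 Hz can be checked with the standard low-frequency expression for the admittance magnitude of a small hard-walled cavity, |Y| = 2πfV/(ρc²). The Python sketch below is a minimal check under assumed values for air density and sound speed (roughly room temperature at sea level); the constants, function names, and the example tail value are assumptions and are not taken from the chapter.

```python
import math

RHO_AIR = 1.204     # kg/m^3, assumed air density (about 20 degrees C)
C_SOUND = 343.2     # m/s, assumed speed of sound
SI_PER_MMHO = 1e-8  # 1 acoustic mmho (CGS) expressed in m^3/(Pa*s)

def cavity_admittance_mmho(volume_cc, freq_hz=226.0):
    """Admittance magnitude (mmho) of a small hard-walled air cavity."""
    volume_m3 = volume_cc * 1e-6
    y_si = 2 * math.pi * freq_hz * volume_m3 / (RHO_AIR * C_SOUND ** 2)
    return y_si / SI_PER_MMHO

def equivalent_volume_cc(y_tail_mmho, freq_hz=226.0):
    """Volume of air (cc) whose admittance equals the measured tail value."""
    return y_tail_mmho / cavity_admittance_mmho(1.0, freq_hz)

print(cavity_admittance_mmho(1.0))   # close to 1 mmho for 1 cc at 226 Hz
print(equivalent_volume_cc(1.1))     # hypothetical tail value of 1.1 mmho
print(cavity_admittance_mmho(1.0, 1000.0))  # the 1:1 relation does not hold at 1,000 Hz
```

The near one-to-one relation is specific to 226 Hz; at other probe frequencies the same cavity has a proportionally different admittance, so the conversion factor must be recomputed.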

The normal range for V ea is highly dependent on several factors including depth of probe insertion and patient age and gender. Women have smaller ear canal volumes than men at all ages, and ear canal volumes steadily increase with age until the ninth decade, when they start to decrease due to collapsing canals (Wiley et al. 1996). The primary value of the V ea measurement is to ensure that the probe tip is not blocked, and that the tympanic membrane is intact. If the tympanic membrane is not intact there will be no change in the acoustic admittance as a function of air pressure in the canal, resulting in a flat tympanogram and the measure will reflect the volume of both the external ear canal and the middle ear space. In the case of a surgically inserted tympanostomy tube, this information can indicate whether or not the tube is blocked. To interpret a flat tympanogram, it is necessary to know the V ea and compare it to age-appropriate normative values.

8.3.6 Static Acoustic Admittance

The acoustic admittance measure after subtracting the acoustic admittance of the ear canal is called peak compensated static acoustic admittance, or simply static admittance, and represents the acoustic admittance at the lateral surface of the tympanic membrane (Y tm ). Static acoustic admittance can be determined accurately from the acoustic admittance tympanogram only if the phase of the probe tone is relatively constant when the ear canal is pressurized. At low frequencies (e.g., 226 Hz), phase shifts are negligible. At higher frequencies, phase shifts are more significant, and acoustic conductance and acoustic susceptance must be compensated separately:

$$ {G_{\mathrm{ TM}}}={G_{\mathrm{ peak}}}-{G_{\mathrm{ tail}}} $$
(8.8)
$$ {B_{\mathrm{ TM}}}={B_{\mathrm{ peak}}}-{B_{\mathrm{ tail}}} $$
(8.9)

Peak compensated static acoustic admittance can then be calculated from the compensated acoustic conductance and acoustic susceptance measures:

$$ {Y_{\mathrm{ TM}}}=\sqrt{{{G_{\mathrm{ TM}}}^2+{B_{\mathrm{ TM}}}^2}} $$
(8.10)
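Equations 8.8 through 8.10 can be sketched in a few lines of Python. The peak and tail values below are hypothetical and were chosen to give round numbers; the function name is likewise only illustrative. The last lines show why simply subtracting admittance magnitudes can give a different answer when the phase differs between peak and tail.

```python
import math

def static_admittance(g_peak, g_tail, b_peak, b_tail):
    """Compensate G and B separately (Eqs. 8.8, 8.9), then combine (Eq. 8.10)."""
    g_tm = g_peak - g_tail
    b_tm = b_peak - b_tail
    y_tm = math.hypot(g_tm, b_tm)
    return g_tm, b_tm, y_tm

# Hypothetical high-frequency probe values in mmho, for illustration only
g_tm, b_tm, y_tm = static_admittance(g_peak=3.5, g_tail=0.5, b_peak=6.0, b_tail=2.0)
print(g_tm, b_tm, y_tm)

# Subtracting magnitudes directly ignores phase and gives a different value
naive = math.hypot(3.5, 6.0) - math.hypot(0.5, 2.0)
print(naive)
```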

Although acoustic admittance (Y a) tympanometry can provide useful information about middle ear status in adults and children, modern immittance equipment can simultaneously measure B a and G a while varying air pressure in the external ear canal, a technique known as multicomponent tympanometry (B a/G a tympanometry), to obtain additional information. The B a/G a tympanograms can be recorded at various probe tone frequencies, but typically include 226, 678, and 1,000 Hz.

8.3.7 Tympanometric Gradient

A number of studies have demonstrated that the sharpness of the tympanometric peak is associated with middle ear pathology (Nozza et al. 1992, 1994). Two closely related measures that quantify the sharpness of the tympanometric peak are the tympanometric gradient and tympanometric width. Tympanometric gradient is a measure of the slope of the tympanogram on either side of the tympanometric peak. The most common method for calculating gradient is to calculate the difference between the acoustic admittance at the peak and the average of the acoustic admittance values at +50 and −50 daPa relative to the tympanometric peak pressure. The gradient is an index that ranges from 0 (flat tympanogram) to higher values depending on the value at the tympanometric peak pressure. The higher the gradient, the sharper and narrower the tympanogram peak. The presence of middle ear effusion decreases the gradient (and increases the width) of the tympanogram peak. A less common method is to calculate the width of a tympanogram (in daPa) measured at one half the compensated static admittance point. Both measures provide an index of the sharpness of the tympanogram in the vicinity of the peak and quantify the relative sharpness (steepness) or roundness of the peak.

8.3.8 Probe Tone

Vanhuyse et al. (1975) examined tympanometric patterns in adults at various probe tone frequencies and developed a model that predicts the shape of B a and G a tympanograms at 678 Hz in normal ears and in ears with various pathologies. The Vanhuyse et al. model categorizes the tympanograms based on the number of peaks or extrema on the B a and G a tympanograms and predicts four tympanometric patterns at 678 Hz. The transition between different Vanhuyse patterns can be shifted to higher or lower probe tone frequencies depending on the nature of the middle ear pathology.

Evidence has accumulated that tympanometry using higher probe-tone frequencies (up to and including 1,000 Hz) is more sensitive to changes in middle ear status in infants less than 4 months old compared to 226-Hz tympanometry. Some studies have reported normative data for a variety of young ages, and some have investigated test performance of specific 1,000-Hz admittance criteria in predicting otoacoustic emission screening results (Hunter and Margolis 2011).

The resonant frequency of the middle ear is the frequency at which the total acoustic susceptance is zero and is directly proportional to the square root of stiffness and inversely proportional to the square root of mass. The resonant frequency of the middle ear can be determined using multifrequency tympanometry. The resonant frequency of the middle ear system may be shifted higher or lower compared to healthy ears by various pathologies. Otosclerosis, for example, increases the stiffness of the middle ear system and shifts the resonant frequency of the middle ear system to higher probe tone frequencies.
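Two ideas in this paragraph lend themselves to a short Python sketch: locating the frequency at which compensated susceptance crosses zero in multifrequency data, and the square-root dependence of resonance on stiffness and mass. The susceptance values, probe frequencies, and function names are hypothetical, and the second function uses the textbook simple-oscillator formula as a stand-in for the far more complex middle ear.

```python
import math

def resonant_frequency_hz(freqs_hz, b_tm_mmho):
    """Frequency at which compensated susceptance crosses zero, going from
    positive (stiffness controlled) to negative (mass controlled).
    Uses linear interpolation; returns None if there is no such crossing."""
    points = list(zip(freqs_hz, b_tm_mmho))
    for (f0, b0), (f1, b1) in zip(points, points[1:]):
        if b0 > 0 >= b1:
            return f0 + (f1 - f0) * b0 / (b0 - b1)
    return None

def oscillator_resonance_hz(stiffness, mass):
    """Resonance of an idealized simple oscillator: sqrt(k/m) / (2*pi)."""
    return math.sqrt(stiffness / mass) / (2 * math.pi)

# Hypothetical multifrequency susceptance values, for illustration only
freqs = [226, 500, 800, 1000, 1250]
b_tm = [0.6, 0.5, 0.2, -0.1, -0.4]
f_res = resonant_frequency_hz(freqs, b_tm)
print(f_res)

# Doubling stiffness raises the idealized resonance by a factor of sqrt(2)
ratio = oscillator_resonance_hz(2.0, 1.0) / oscillator_resonance_hz(1.0, 1.0)
print(ratio)
```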

8.3.9 Wideband Energy Reflectance

Wideband energy reflectance (WBER) is a relatively new middle ear analysis technique in which complex sounds ranging from 0.2 to 10 kHz or higher are presented into the ear canal and the amount of energy reflected back from the middle ear is calculated. Energy reflectance (ER) has been used in research on human middle ear function for two decades (Keefe et al. 1992; Voss and Allen 1994); however, its application in clinical assessment of the middle ear is still developing. Clinical systems are currently being commercialized. One system, which is FDA approved, is based on the calibration method developed by Voss and Allen (1994; Mimosa Acoustics Corp.). A second system, currently an investigational research system, is based on the work by Keefe et al. (1992; Interacoustics). Wideband energy reflectance has an advantage over tympanometry in that the location of the probe in the ear canal is less critical, especially at higher frequencies. Further, energy reflectance compared to standard 226-Hz tympanometry may provide a more sensitive measure in evaluating middle ear disorders and conductive hearing loss. Another advantage of reflectance measurements is that the measurement frequency can be up to 10 kHz, with less contamination by standing waves in the ear canal.

Reflectance, R(f), refers to the ratio of the retrograde (backward) pressure wave to the incident (forward) pressure wave, while |R(f)|² is the power reflectance (ER). A value of 0 occurs when all of the sound energy is absorbed by the middle ear and the cochlea, while a value of 1.0 occurs when all of the energy is reflected back from the middle ear. The complement of ER (1 − ER) is known as power absorption (PA) and, when expressed in decibels, is known as transmittance.

Reflectance is mathematically defined as the ratio of 1 minus the product of the admittance (Y) and characteristic impedance (Z0) and 1 plus the product of the admittance and characteristic impedance at different frequencies and static pressures. Normative data on measures of wideband energy reflectance (Rosowski et al. 2012) suggest that the most energy is reflected at the low frequencies, while there are regions of lower reflectance in the mid frequencies, and moderate reflectance at high frequencies. The input admittance (Y m) is related to reflectance through the following equation:

$$ \frac{{{Y_{\mathrm{ m}}}}}{{{Y_0}}}=\frac{{1-{R_{\mathrm{ m}}}}}{{1+{R_{\mathrm{ m}}}}} $$
(8.11)

where Y 0 = A/ρc (Voss and Allen 1994; Keefe and Simmons 2003), A is the cross-sectional area of the ear canal, ρ is the density of air in the ear canal, and c is the speed of sound. The values ρ and c are constants, while A is estimated based on the size of the probe tip that is selected when the measurement is made. Thus, measurements of wideband energy reflectance rely on several assumptions, including the assumption that the impedance at the lateral surface of the tympanic membrane is similar to that at the microphone, and that the cross-sectional area in each subject who uses a specific probe-tip size is approximately the same. At each frequency, the source impedance (Z s) and source pressure (P s) of the probe are calculated from the measurements obtained during a calibration procedure. In this procedure, an ear tip is placed separately into four cavities, each with a diameter of 0.74 cm but with different lengths, and two measurements of the pressure response are made within each cavity. For each measurement, the pressure response is plotted in relation to the noise floor for each frequency. Normative data (Merchant et al. 2010; Rosowski et al. 2012) and data from well-characterized disease processes (Voss et al. 2012) are becoming available. Reflectance measures in combination with audiometry may improve the ability to differentiate ossicular fixation from ossicular discontinuity in patients with conductive hearing loss who have an intact tympanic membrane and an aerated middle ear (Nakajima et al. 2012).
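Equation 8.11 can be rearranged to give reflectance from a measured admittance, R m = (1 − Y m/Y 0)/(1 + Y m/Y 0), and the Python sketch below applies that rearrangement to complex admittances. The air density, sound speed, canal diameter, and function names are assumptions chosen for illustration, as are the test loads; power reflectance is taken as the squared magnitude of the pressure reflectance and absorbance as its complement.

```python
import math

RHO_AIR = 1.204  # kg/m^3, assumed air density
C_SOUND = 343.2  # m/s, assumed speed of sound

def characteristic_admittance(canal_diameter_m):
    """Y0 = A / (rho * c) for a canal of the given diameter, in m^3/(Pa*s)."""
    area = math.pi * (canal_diameter_m / 2) ** 2
    return area / (RHO_AIR * C_SOUND)

def reflectance(y_m, y_0):
    """Pressure reflectance from Eq. 8.11 solved for R."""
    ratio = y_m / y_0
    return (1 - ratio) / (1 + ratio)

def power_reflectance(y_m, y_0):
    """Squared magnitude of the pressure reflectance."""
    return abs(reflectance(y_m, y_0)) ** 2

y0 = characteristic_admittance(0.0074)  # hypothetical 0.74-cm canal diameter

matched = power_reflectance(y0, y0)         # load equals Y0: nothing reflected
rigid = power_reflectance(0.0, y0)          # zero admittance: everything reflected
reactive = power_reflectance(1j * y0, y0)   # purely reactive load: everything reflected
mixed = power_reflectance(complex(0.5, 0.5) * y0, y0)
absorbance = 1 - mixed

print(matched, rigid, reactive, mixed, absorbance)
```

Because only the ratio Y m/Y 0 enters the calculation, an error in the assumed canal area scales that ratio directly, which is one reason the area assumption noted in the text matters.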

8.3.10 Wideband Tympanometry

Wideband tympanometry is being implemented in a research system developed by Douglas Keefe at Boys Town National Research Hospital (distributed by Interacoustics) and is capable of measuring energy reflectance or related parameters at ambient pressure as well as at multiple air pressures. Wideband tympanometry can be used to replace standard tympanometry as well as measure middle ear muscle reflexes, described later. The calibration procedure and the system are similar to the one described and used by Keefe et al. (1992). Figure 8.4 is an illustration of the three-dimensional plot of energy absorbance across a range of frequencies and pressures that is produced, effectively plotting multiple tympanograms. The system also can extract a plot of phase which estimates resonant frequency, and a series of single-frequency tympanograms in Y a, B a, and G a units (Fig. 8.4).

Fig. 8.4
figure 00084

A three-dimensional wideband tympanogram of absorbance as a function of ear canal air pressure across frequency (a), sound pressure level (dB) and phase angle (b), and single frequency admittance tympanograms (c–e). (Interacoustics Wideband Research System)

8.3.11 Laser-Doppler Vibrometry

Laser-Doppler vibrometry is based on the concept of measuring the displacement of the tympanic membrane via small wavelength changes in a reflected laser signal. This approach is currently a research tool, under consideration for commercialization, that can be useful for differentiating various ossicular disorders in an ear with an intact tympanic membrane and an aerated middle ear. The laser-Doppler vibrometry measures can help differentiate ossicular fixation from ossicular discontinuity in the presence of an air–bone gap (Rosowski et al. 2008). Carefully measured normative values already have been established (Rosowski et al. 2012).

8.4 Middle Ear and Middle Ear Muscles

8.4.1 Middle Ear Muscle Reflexes

The middle ear contains two muscles, the tensor tympani and the stapedius. A contraction of either of the two middle ear muscles can alter middle ear function. Each muscle functions quite differently.

The tensor tympani is located in the bony canal above the osseous portion of the Eustachian tube and originates from the cartilaginous and osseous portions of the Eustachian tube. The muscle terminates in a thin tendon that enters the middle ear space, makes a right angle turn around the cochleariform process, and attaches to the manubrium of the malleus. Neural innervation of the tensor tympani is from the tensor tympani nerve, a motor fiber branch of the mandibular division of the trigeminal nerve (cranial nerve V), and the muscle does not receive input from the sensory fibers of the trigeminal ganglion. The muscle contracts reflexively and pulls the malleus medially, in the direction of the normal vibratory motion of the tympanic membrane, resulting in an increase in the tension of the tympanic membrane. The increased tension dampens the high-level ossicular vibrations associated with chewing and possibly other high-level internally generated sounds. The tensor tympani muscle also contracts reflexively in response to very high-level external sounds that produce a generalized startle response. Though diagnostic measures can determine if the tensor tympani contracts abnormally, information about the function of this muscle provides only limited information about middle ear function or structure in humans other than that the muscle is present and functions normally, suggesting normal middle ear function (Jones et al. 2008).

The stapedius muscle is smaller than the tensor tympani muscle and is the smallest skeletal muscle in the human body. It originates from a small opening in a cone-shaped prominence on the posterior wall of the middle ear space and terminates in a thin tendon that is attached to the neck of the stapes. Neural innervation of the stapedius muscle is from the first branch of the facial nerve (cranial nerve VII) after it exits the facial nerve canal. The muscle contracts reflexively from auditory input via an ipsilateral or a contralateral pathway. The ipsilateral pathway originates in the cochlea then proceeds to the auditory nerve (cranial nerve VIII) and then to the ipsilateral cochlear nucleus in the brain stem. The pathway then proceeds to the nucleus of the ipsilateral facial nerve (cranial nerve VII) that runs through the internal auditory canal to the ipsilateral stapedius muscle. The contralateral pathway also originates in the cochlea, then to the ipsilateral auditory nerve (cranial nerve VIII), then to the ipsilateral cochlear nucleus in the brain stem. At this point the pathway crosses the brain stem through the trapezoid body to the contralateral superior olivary nucleus, then to the contralateral cochlear nucleus in the brain stem and then to the nucleus of the contralateral facial nerve (cranial nerve VII) that runs through the internal auditory canal to the contralateral stapedius muscle. The reflex is actually consensual with an ipsilateral input resulting in bilateral contractions of the stapedius muscles.

When contracted, the muscle pulls the head of the stapes posteriorly, orthogonal to the direction of the normal vibratory motion of the tympanic membrane. This muscle contraction also tenses the tympanic membrane and likely controls the amplitude of external sound waves transmitted through the middle ear to the cochlea. Because this muscle has a very high ratio of nerve fibers to muscle fibers, it is likely that it provides a high degree of controlled tension as opposed to an on-off type of reflex response, suggesting that it may influence functions other than sensitivity, such as improving the ability to hear in noise (Pang and Guinan 1997; Arnold et al. 2007). The acoustic stapedius muscle reflex, often called simply the acoustic reflex, is a largely involuntary reflex activated by external sound.

Several important characteristics of the acoustic stapedius muscle reflex response have been determined from studies of electrical potentials measured directly from the muscle. The reflex is inactive for acoustic signals less than about 80 dB HL. The reflex threshold level is defined as the lowest level stimulus that still produces an observable contraction. For higher level stimuli that activate the reflex there is a very short latency between the onset of the stimulus and the beginning of the muscle contraction, approximately 10 ms, reflecting the fact that there are only a few neural synapses in the reflex arc. As the stimulus level is increased the magnitude of the muscle contraction increases until the maximum stimulus input. The magnitude of the contraction is directly proportional to the magnitude of the acoustic stimulus, generally with 1 dB resolution demonstrating clearly that the reflex produces graded responses. These graded responses are likely possible because of the high neural fiber to muscle fiber ratio mentioned earlier.

Figure 8.5 illustrates the relative acoustic admittance (Y a) in the external ear canal of a subject with normal hearing as a function of time (seconds). Also indicated is a series of 1,000-Hz pure tones delivered ipsilaterally at increasing levels to activate the acoustic stapedius muscle reflex. By time locking the activation signal to the measurement of the ongoing acoustic admittance, the muscle contractions associated with the activation signal can easily be differentiated from acoustic admittance changes not associated with the muscle contraction. For the large response at 90 dB HL, the slight delay between the onset of the stimulus and the rise of the response can be identified. Note that the magnitude of the acoustic admittance change is proportional to the level of the activating signal. Note also that changes in acoustic admittance that initially may appear to be related to the muscle contraction in fact may not be related. The apparent response between 70 dB and 75 dB HL is not due to a muscle contraction because the acoustic admittance change occurs a substantial amount of time after the stimulus. Discounting this aberrant response, a threshold can be identified as occurring between 85 and 90 dB HL.

Fig. 8.5
figure 00085

Change in acoustic admittance (Y a) in the external ear canal as a function of time (seconds). The vertical dashed lines indicate the onset of a 1 s duration pure tone at 1,000 Hz at the indicated level

The stapedius muscle also contracts reflexively to tactile stimuli applied to the skin in the area of the ipsilateral external ear canal. Though a tactile stimulus is much more difficult to control systematically, this mode of stimulation affords a mechanism for contracting the muscle without stimulating the auditory system.

8.4.2 Measurement Parameters

The continuous measurement of the acoustic characteristics in the external ear canal along with an indication of the stimulus levels and durations (Fig. 8.5) allows the middle ear muscle contractions to be detected and quantified effectively, indirectly and noninvasively. As the muscle contracts, the increase in tension of the tympanic membrane results in a proportional decrease in the measured acoustic admittance.

The acoustic admittance at the lateral surface of the tympanic membrane can be measured with a probe sealed to the external ear that contains a miniature loudspeaker for generating a measurement tone. The level of the tone is first calibrated with a known acoustic impedance load typically provided by a hard walled cavity of known dimensions. A 1 cc hard-walled cavity has an acoustic admittance of 1 millimho at 226 Hz. Next, the level of the measurement tone is measured in the canal using the calibrated microphone in the probe. The measured level is determined by the volume of the remaining ear canal between the probe and the tympanic membrane in combination with the acoustic characteristics of the lateral surface of the tympanic membrane. To repeat, the miniature microphone in the probe is used both for calibrating and measuring the level of the measurement tone. The level of the tone in the ear canal is continuously monitored and will change level in direct proportion to the magnitude of the muscle contraction. The same probe system can be used for presenting the activating signals for measurement of the ipsilateral stapedius muscle reflex. A variety of signal processing techniques make sure that the stimulus signal is not directly detected by the admittance measurement system, ensuring that the measurement represents only stapedius muscle activity.

A common measurement protocol uses a low-frequency probe tone (226 Hz). Its wavelength is much longer than the external ear canal, ensuring that the level of the measurement tone is the same throughout the ear canal, and its frequency is below the frequency range of the signals that are used to activate the ipsilateral reflex. The level of the probe tone is also kept below the threshold of the reflex to prevent the probe tone itself from directly activating the muscle reflex.

A system that records the ongoing acoustic admittance must have temporal characteristics faster than the reflex response to allow accurate monitoring of the muscle activity. An indicator of when the activation signal is presented, either a mark on the recording or a second channel, time locks the activation signal presentation to any changes in the measured signal and allows analysis of stapedius muscle reflex activity, including its temporal characteristics.
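Time locking can be sketched in Python as a comparison between a pre-stimulus baseline and the admittance during a window that begins at each stimulus onset. Everything specific in the sketch is hypothetical: the synthetic trace, the 100-Hz sampling rate, the window and baseline durations, the 0.02-mmho criterion, and the function name. It illustrates the logic only and is not a clinical detection rule.

```python
def time_locked_responses(trace_mmho, fs_hz, onsets_s,
                          window_s=1.0, baseline_s=0.5, criterion_mmho=0.02):
    """For each stimulus onset, report the largest admittance decrease in the
    window after onset relative to the mean of the pre-stimulus baseline."""
    results = []
    for onset in onsets_s:
        i0 = int(round(onset * fs_hz))
        n_base = int(round(baseline_s * fs_hz))
        n_win = int(round(window_s * fs_hz))
        baseline = trace_mmho[i0 - n_base:i0]
        window = trace_mmho[i0:i0 + n_win]
        base_mean = sum(baseline) / len(baseline)
        decrease = base_mean - min(window)
        results.append((onset, decrease, decrease >= criterion_mmho))
    return results

# Synthetic 10-s admittance trace sampled at 100 Hz (illustration only)
fs = 100
trace = [1.0] * (10 * fs)
for i in range(400, 450):   # drift at 4.0 s, not tied to any stimulus
    trace[i] = 0.95
for i in range(601, 700):   # decrease starting 10 ms after the 6.0-s onset
    trace[i] = 0.95

responses = time_locked_responses(trace, fs, onsets_s=[2.0, 6.0])
for onset, decrease, detected in responses:
    print(onset, round(decrease, 3), detected)
```

The admittance change at 4.0 s in the synthetic trace falls outside every stimulus window and is therefore never attributed to a muscle contraction, which is the purpose of time locking.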

The temporal characteristics of the reflex are characterized by a short latency (about 10 ms), a fairly rapid rise time, and an on time that is directly related to the on time of the stimulus (Fig. 8.5). The off time, or the time of the response after cessation of the stimulus, represents the natural relaxation of the muscle.

Measurement of the acoustic stapedius muscle reflex behavior can provide useful information on the status of the middle ear. Careful consideration of the results is necessary for correct interpretation because the middle ear status can affect both the level of the activation stimulus reaching the cochlea and the ability to detect the muscle response. Two specific examples can be illustrative.

In the first example, both arches of the stapes are not contiguous because of a disease process. This abnormality will not prevent detection of the muscle contraction from acoustic admittance measures because the muscle tendon is attached lateral to the abnormality. However, this abnormality also causes a large conductive loss through the middle ear such that an ipsilateral stimulus will be greatly attenuated before reaching the ipsilateral cochlea. In this case the ipsilateral acoustic stapedius reflex will be absent. However, if the contralateral ear is normal and used to activate the consensual reflex, the middle ear muscle contraction in the abnormal ear will be detected easily and at normal levels. The combination of the ipsilateral and contralateral stapedius muscle responses provides quantifiable diagnostic information on the functional status of the stapes.

In the second example, a middle ear is filled with an exudate from a disease process that does not result in much of a loss through the middle ear as measured with the auditory thresholds using the air–bone gap. However, the presence of the exudate may decrease the acoustic admittance at the lateral surface of the tympanic membrane preventing detection of the muscle contraction even though the muscle contraction is present. Measurement of behavioral hearing thresholds by both air and bone conduction will provide estimates of the stimulus level reaching the cochlea. Once this is known, ipsilateral and contralateral measures of acoustic stapedius muscle contractions can characterize the status of the middle ear including the status of components within the middle ear, especially when considering the attachment of the stapedius tendon.

Acoustic stapedius muscle reflex thresholds generally are around 85 dB HL for 500-, 1,000-, 2,000-, and 4,000-Hz tonal signals and up to 20 dB lower for wideband noise signals. More precise normative values take into account the age of the subject, especially for young children and newborns.

8.5 Middle Ear and Otoacoustic Emissions

The normal cochlea not only detects sounds but also generates sounds. These cochlea-generated sounds are an epiphenomenon associated with active processes within the cochlea. A sound originating from within the cochlea can propagate back through the middle ear and be detected in the external ear canal as an otoacoustic emission. The presence of cochlear active processes was first demonstrated experimentally as otoacoustic emissions in 1978 (Kemp 1978). The otoacoustic emissions arise from a number of different mechanisms within the cochlea but generally are associated with outer hair cell motility. Several lines of evidence suggest that, in mammals, the active processes associated with outer hair cells increase cochlear sensitivity and frequency selectivity.

Otoacoustic emissions will diminish or disappear after damage to the cochlea. Comparison of the measured otoacoustic emission levels with otoacoustic emission levels in normal ears often is used as an indication of the functional status of the outer hair cells. Therefore, otoacoustic emissions have been studied almost exclusively in relation to understanding cochlear mechanisms (Shaffer et al. 2003) or measuring the effects of various cochlear pathologies on cochlear function. However, because the source of the emissions is just medial to the middle ear, and their detection occurs just lateral to the middle ear, measures of otoacoustic emissions also can be used to understand middle ear properties, which is the focus of this chapter.

The levels of the otoacoustic emissions are very low in normal ears, decrease with age (Abdala and Dhar 2012), and become lower still in abnormal ears. Their accurate detection involves consideration of several factors that are independent of the particular type of otoacoustic emission. First, because the responses are very low level acoustic signals, control of the acoustic environment is essential. Usually this is accomplished by making the measures in a sound attenuating booth. Second, the somewhat transient air pressure in the middle ear, normally controlled by the Eustachian tube, can produce small changes in the admittance of the middle ear that in turn can affect both the stimulus level reaching the cochlea and the level of the emission propagating from the cochlea back to the external ear canal. These factors result in variability of the emission levels being measured but can be minimized by ensuring that the otoacoustic emissions measures are made only after the middle ear space is equalized by the Eustachian tube, or that an external air pressure is applied that maintains the air pressure differential across the tympanic membrane at 0 Pa. Detection of the otoacoustic emission can be optimized by this reduction in the variability of the measures.

Otoacoustic emissions have been categorized by various types. One type is associated with no stimulus and the remaining types are defined by the characteristics of an acoustic stimulus that evokes the emission.

Spontaneous otoacoustic emissions are otoacoustic emissions that occur spontaneously with no external stimulus. However, spontaneous otoacoustic emissions are unpredictable regarding their spectral composition, level and occurrence, even in normal ears. Therefore, they do not play much of a role in characterizing middle ear structure or function.

Evoked otoacoustic emissions are otoacoustic emissions that are evoked with an external sound stimulus. Evoked otoacoustic emissions can be measured using three different approaches. In general, the level of the evoked emission is much lower than the level of the evoking signal.

8.5.1 Transient Evoked Otoacoustic Emissions

Transient otoacoustic emissions are evoked with a transient stimulus, that is, a very short duration stimulus. The short duration stimulus can be either a brief duration impulse signal (click) or a brief duration tone burst signal. A transient click stimulus will contain spectral energy over the frequency range of the stimulating transducer, generally up to around 4 kHz. A transient tone burst stimulus will contain spectral energy consistent with the frequency of the tone burst. The measurement microphone will pick up the high level stimulus itself and the lower level emission. Because the transient stimulus is much shorter than the time it takes to generate the emission plus the additional time for the emission to travel from the cochlea back to the measurement microphone in the external ear canal, the stimulus portion of the measured signal easily can be separated from the emission portion of the measured signal by starting the measurement just after the stimulus is completed. The rapid onset of a transient stimulus allows time-locked measurement sequences and signal averaging to reduce the unwanted, uncorrelated noise in the measured signal. Thus, controlled stimuli can be used to evoke consistent emissions generated by the cochlea and detected in the external ear canal, affording a measurement of both the level of the stimulus through the middle ear and the level of the emission also through the middle ear but from the opposite direction. Consideration of this measurement paradigm allows direct measures of the middle ear function. The fundamental measurement paradigm for transient evoked otoacoustic emissions is analogous to the measurement paradigm for the ipsilateral acoustic stapedius muscle reflex. 
A controlled stimulus in the external ear canal travels through the middle ear and generates a constant and repeatable signal as an otoacoustic emission in the cochlea that then travels back through the middle ear where it can be detected in the ear canal providing information about losses through the middle ear from both directions (Puria 2003). Because the signal has a wide spectrum, the transient evoked otoacoustic emission also will have a wide spectrum. Spectral analyses of the response can provide some frequency-specific information though the frequency resolution will not be very high.
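
The benefit of time-locked averaging can be illustrated with a minimal Python sketch. The synthetic "emission" waveform, the noise level, and the sweep counts are assumptions chosen only for illustration and are not clinical values. Because the emission repeats identically on every sweep while the noise is uncorrelated, averaging N sweeps leaves the emission unchanged and lowers the residual noise by roughly 10·log10(N) dB.

```python
import math
import random

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def averaged_sweep(emission, noise_sd, n_sweeps, rng):
    """Average n_sweeps time-locked recordings of emission + Gaussian noise."""
    total = [0.0] * len(emission)
    for _ in range(n_sweeps):
        for i, e in enumerate(emission):
            total[i] += e + rng.gauss(0.0, noise_sd)
    return [t / n_sweeps for t in total]

rng = random.Random(0)                      # fixed seed for repeatability
n_samples = 512
# Hypothetical emission: a low-level decaying tone burst (arbitrary units).
emission = [0.1 * math.exp(-i / 200.0) * math.sin(2 * math.pi * i / 16.0)
            for i in range(n_samples)]
noise_sd = 1.0                              # noise far larger than the emission

for n in (1, 16, 256):
    avg = averaged_sweep(emission, noise_sd, n, rng)
    residual = [a - e for a, e in zip(avg, emission)]
    gain_db = 20 * math.log10(noise_sd / rms(residual))
    print(f"{n:4d} sweeps: residual noise reduced by about {gain_db:5.1f} dB")
```

In this sketch the residual is computed against the known synthetic emission, which is possible only in a simulation; in a real measurement the noise would have to be estimated from the recordings themselves.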

8.5.2 Stimulus Frequency Otoacoustic Emissions

Stimulus frequency otoacoustic emissions are evoked with a pure-tone stimulus and are detected by the vectorial difference between the stimulus waveform and the recorded waveform that consists of the sum of the stimulus and stimulus frequency otoacoustic emissions. This category of emission is much more difficult to measure because the emission and the stimulus are at the same frequency. Consequently, much less is known about this otoacoustic emission than the other otoacoustic emissions.
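
The vectorial difference can be sketched with complex phasors, in which each tone is represented by an amplitude and a phase at the stimulus frequency. The amplitudes and phases below are hypothetical, and the sketch simply assumes that a phasor for the stimulus alone is available; how that estimate is obtained in practice is not addressed here.

```python
import cmath
import math

def phasor(amplitude, phase_deg):
    """Complex phasor for a tone with the given amplitude and phase."""
    return cmath.rect(amplitude, math.radians(phase_deg))

def emission_from_difference(recorded, stimulus_alone):
    """Vector (complex) difference between recorded and stimulus-only phasors."""
    return recorded - stimulus_alone

# Hypothetical values in arbitrary linear pressure units.
stimulus = phasor(1.0, 0.0)          # stimulus tone, assumed known without the emission
emission = phasor(0.01, 120.0)       # emission 40 dB below the stimulus
recorded = stimulus + emission       # what the ear canal microphone measures

estimate = emission_from_difference(recorded, stimulus)
level_re_stimulus_db = 20 * math.log10(abs(estimate) / abs(stimulus))
print(f"emission level re stimulus: {level_re_stimulus_db:.1f} dB")
print(f"emission phase: {math.degrees(cmath.phase(estimate)):.1f} degrees")
```

Because the emission is so much smaller than the stimulus, any error in the stimulus-only phasor appears almost entirely as an error in the emission estimate, which is one way to see why this emission type is difficult to measure.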

8.5.3 Distortion Product Otoacoustic Emissions

Distortion product otoacoustic emissions are evoked using a pair of primary tones with frequencies f1 and f2, with f2/f1 = 1.2, and levels L1 and L2 (typically with L1 = L2 = 65 dB SPL, or L1 = 65 dB SPL and L2 = L1 − 10 dB). The evoked otoacoustic emissions from these stimuli occur at frequencies mathematically related to the primary frequencies, with the highest level distortion product emission at fdp = 2f1 − f2 (the “cubic” distortion product). Because fdp is always different from the frequency of either of the two tones that evoke the otoacoustic emission, the otoacoustic emission can be measured in the presence of the evoking tones, allowing for constant and simultaneous measurement of the emission. The measurement process can also use signal averaging in either the frequency domain or the time domain to reduce the level of the uncorrelated noise, allowing for accurate measures of very low distortion product otoacoustic emissions.
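
The relation between the primaries and the cubic distortion product can be worked through in a short Python sketch. It takes f2 as the higher primary so that f2/f1 = 1.2, as in the caption of Fig. 8.6; the function names and the particular f2 values are illustrative assumptions.

```python
def dpoae_frequencies(f2_hz, ratio=1.2):
    """Return (f1, f2, fdp) in Hz for primaries with f2/f1 = ratio."""
    f1_hz = f2_hz / ratio
    fdp_hz = 2 * f1_hz - f2_hz
    return f1_hz, f2_hz, fdp_hz

def primary_levels(l1_db_spl=65.0, offset_db=10.0):
    """Return (L1, L2) in dB SPL with L2 = L1 - offset_db."""
    return l1_db_spl, l1_db_spl - offset_db

for f2 in (1000.0, 2000.0, 4000.0):
    f1, _, fdp = dpoae_frequencies(f2)
    print(f"f2 = {f2:6.0f} Hz, f1 = {f1:7.1f} Hz, 2f1 - f2 = {fdp:7.1f} Hz")
```

With a ratio of 1.2 the distortion product always falls at 0.8 times f1, below both primaries, which is why it can be separated spectrally from the evoking tones.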

Figure 8.6 illustrates the level of a distortion product otoacoustic emission (2f1 − f2) as a function of the level of the two tones (f1 and f2 with L2 = L1 − 10 dB) used to evoke the emission, in a normal ear obtained with two different signal averaging algorithms that have different noise floors. An increase in the number of individual spectra during averaging will reduce the variability of the noise floor but maintain the same average level of the noise floor relative to the level of the distortion product. By contrast, an increase in the number of individual waveforms during averaging will maintain the variability of the noise floor but reduce the average noise floor level relative to the level of the distortion product and allow the detection of the emission at lower levels. The choice of signal averaging algorithm can greatly influence the results (Popelka et al. 1993; Nelson and Zhou 1996), making it necessary to specify the signal averaging parameters for proper interpretation of the emission level.
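
The difference between the two averaging approaches can be illustrated with a small simulation. This is a sketch under stated assumptions and not the algorithm used for Fig. 8.6: the tone amplitude, noise level, record length, bin numbers, and number of sweeps are all hypothetical, and the simulated distortion product is assumed to have the same phase on every sweep.

```python
import cmath
import math
import random

def dft_bin(x, k):
    """Magnitude of DFT bin k for the real sequence x, scaled to peak amplitude."""
    n = len(x)
    acc = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))
    return 2 * abs(acc) / n

def make_sweep(rng, n=128, signal_bin=8, amp=0.05, noise_sd=1.0):
    """One recording: a fixed-phase tone (the distortion product) plus noise."""
    return [amp * math.cos(2 * math.pi * signal_bin * i / n) + rng.gauss(0, noise_sd)
            for i in range(n)]

rng = random.Random(0)
sweeps = [make_sweep(rng) for _ in range(64)]
noise_bins = (5, 6, 10, 11)   # bins near the signal bin that contain only noise

# Time-domain averaging: average the waveforms, then take one spectrum.
mean_wave = [sum(col) / len(sweeps) for col in zip(*sweeps)]
floor_time = sum(dft_bin(mean_wave, k) for k in noise_bins) / len(noise_bins)

# Spectrum averaging: take a magnitude spectrum per sweep, then average.
floor_spec = sum(dft_bin(s, k) for s in sweeps for k in noise_bins) / (
    len(sweeps) * len(noise_bins))

print(f"noise floor, waveform averaging: {20 * math.log10(floor_time):6.1f} dB")
print(f"noise floor, spectrum averaging: {20 * math.log10(floor_spec):6.1f} dB")
```

Averaging the waveforms lets the uncorrelated noise cancel before the spectrum is computed, so the mean noise floor falls as more sweeps are added. Averaging magnitude spectra discards phase, so the mean floor stays where it was for a single sweep and only its sweep-to-sweep scatter is reduced.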

Fig. 8.6
figure 00086

Level of distortion product otoacoustic emissions evoked with a stimulus composed of a pair of primary tones with frequencies f1 and f2 with f2/f1 = 1.2, and levels L1 and L2 with overall level equal to L1 and L2 = L1 − 10 dB, illustrating the difference in noise floors between averaging individual waveforms (time domain) and individual spectra (spectrum domain) (Adapted from Popelka et al. 1993)

When attempting to define changes in the level of the emission in relation to middle ear conditions, a threshold of the emission can be used. The threshold of the emission can be defined as the lowest level of the evoking signal that produces an observable emission specified as a predetermined level above the noise floor. However, for repeatability purposes, the exact parameters of the measurement process including the particular signal averaging method and the number of measures per average must be specified.
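
The threshold definition can be sketched as a simple search over an input-output function. The 3 dB criterion and the data points below are hypothetical; the criterion actually used would be one of the measurement parameters that must be specified.

```python
def emission_threshold(io_points, criterion_db=3.0):
    """Lowest stimulus level whose emission exceeds the noise floor by criterion_db.

    io_points: iterable of (stimulus_db_spl, emission_db_spl, noise_floor_db_spl).
    Returns the stimulus level in dB SPL, or None if no point meets the criterion.
    """
    for stim, emission, noise in sorted(io_points):
        if emission - noise >= criterion_db:
            return stim
    return None

# Hypothetical input-output data: (stimulus, emission, noise floor), all in dB SPL.
io_function = [
    (30, -22, -20),
    (35, -19, -20),
    (40, -15, -20),
    (45, -10, -20),
    (50, -4, -20),
]
print(emission_threshold(io_function, criterion_db=3.0))   # 40
print(emission_threshold(io_function, criterion_db=6.0))   # 45
```

The same data give different thresholds under the two criteria, and a lower noise floor from a different averaging method would shift the threshold as well.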

An alternative and more common analysis is to report changes in the level of the emission counting only those responses that are above a predetermined noise floor level. A related issue for proper determination of the emission level in this manner is that the input–output functions are often not monotonic (Popelka et al. 1993). The nonmonotonic nature of these functions has been studied extensively though they relate more to cochlear function than middle ear function. Once all of these parameters have been specified, changes in the level of an otoacoustic emission reflect the function of the middle ear in the direction of a signal (the emission) on its way from the cochlea through the middle ear into the external ear canal after accounting for the change in the stimulus level reaching the cochlea that evokes the emission. The air–bone gap in dB can be subtracted from the stimulus level (Fig. 8.6) to account for changes in stimulus level reaching the cochlea and the same air–bone gap in decibels can be subtracted from the emission level (Fig. 8.6) to arrive at an expected emission level in cases with a known air–bone gap.
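
The double correction for the air–bone gap can be written out as a short calculation. The normative input-output values below are hypothetical and are not read from Fig. 8.6; the sketch also assumes, as the text does, that the same gap in decibels applies in both directions through the middle ear.

```python
def expected_emission_level(io_function, stimulus_db_spl, air_bone_gap_db):
    """Expected emission level (dB SPL) in an ear with a known air-bone gap.

    io_function: dict mapping stimulus level (dB SPL) to the emission level
    (dB SPL) measured in normal ears. The gap is applied twice: once to the
    stimulus on its way in and once to the emission on its way out.
    """
    effective_stimulus = stimulus_db_spl - air_bone_gap_db
    if effective_stimulus not in io_function:
        raise KeyError(f"no normative value at {effective_stimulus} dB SPL")
    return io_function[effective_stimulus] - air_bone_gap_db

# Hypothetical normative input-output function (not read from Fig. 8.6).
normal_io = {45: -10.0, 50: -4.0, 55: 1.0, 60: 5.0, 65: 8.0}

print(expected_emission_level(normal_io, stimulus_db_spl=65, air_bone_gap_db=10))
# 55 dB SPL reaches the cochlea -> 1.0 dB SPL emission -> -9.0 dB SPL in the canal
```

In this example a 10 dB gap lowers the expected emission by 17 dB relative to the normal value at the same nominal stimulus level, because the emission is reduced both by the weaker effective stimulus and by the loss on the return path.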

8.6 Middle Ear Imaging

8.6.1 Otoscopy

The basic monocular otoscopy described earlier is necessary to provide information about the external ear canal to ensure that all of the subsequent middle ear measures are in fact measuring the middle ear. Traditional monocular otoscopy can be enhanced with the addition of a pneumatic mechanism for simultaneously inducing air pressure changes manually, usually with a soft bulb that can be squeezed by hand. Pneumatic otoscopy allows for visualization of the movement of the tympanic membrane under dynamic conditions and provides a subjective estimate of its stiffness. Though very useful for medical diagnoses, membrane mobility is observed only for the very low frequency and very large oscillations of the pneumatic system rather than the much smaller and much higher frequency oscillations of auditory signals going through the middle ear system. Otoscopy also can be performed with a binocular microscope that offers three-dimensional viewing. A video camera can be added to otoscopy to provide recorded images that can then be used for more objective measures. With otoscopy, the entire lateral surface of the tympanic membrane may not be visualized because of the tortuous nature of the external ear canal. In cases of a normally transparent tympanic membrane, several middle ear structures can be visualized as well as exudate in the middle ear space. Various rating scales have been devised that can be very useful for reliability across observers and for categorizing middle ear conditions (Casey et al. 2011). Though mandatory for medical diagnosis, otoscopy provides only limited information concerning the actual structure and function of the middle ear anatomy.

8.6.2 Middle Ear Endoscopy

Middle ear endoscopy can be achieved with a thin endoscope inserted into the middle ear. This can be performed surgically with the tympanic membrane raised or through an existing tympanic membrane perforation for a direct view of the middle ear space and its contents. Middle ear endoscopy also can be achieved with a surgical incision in an intact tympanic membrane but is a much more invasive procedure. Bremond et al. (1990) introduced the concept of middle ear endoscopy through the Eustachian tube orifice in the nasopharynx. This method is used less often than transtympanic endoscopy because it can cause tissue irritation and bleeding. The procedure also has had limited use because of the restricted field of view and the presence of disease related exudates in the middle ear that limit observation. In select cases, the method provides useful information including a direct visualization of the ossicular chain. The method may improve in the future as the diameter of the endoscopes decreases and the endoscopes become more flexible with improved optics. Very thin endoscopes currently under development for viewing the cochlea already have narrower diameters and improved optics (Monfared et al. 2006).

8.6.3 X-Ray Imaging

Though a conventional plain-film X-ray image is not useful for middle ear imaging because it is difficult to resolve overlapping structures in a single image, more advanced X-ray imaging can provide useful information. Computed tomography (CT) X-ray imaging uses a narrow collimated X-ray beam from an X-ray tube that rotates around the patient. The tissues of the body differentially attenuate the photons, resulting in a gray-scale image that can be processed further digitally. The resulting images usually are calibrated with water as a reference and the signal processing can be optimized for viewing specific structures such as bone and soft tissue. Scanning sequentially through the structure of interest produces a series of images that can be viewed separately as very thin “slices” or in a fashion similar to varying the focal plane with a traditional microscope. The resulting series of thin planar images allows visualization of a single structure in the third dimension that improves identifiability of that structure compared to using only a single image. The table on which the patient is positioned can automatically move in concert with the X-ray beam, resulting in the ability to image the tissue between the slices that in effect produces very high resolution imaging.
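
The calibration against water and the optimization for bone or soft tissue can be sketched as two small functions. The first applies the standard Hounsfield scale, on which water is 0 HU and air is −1000 HU; the second maps a chosen range of values onto the display gray scale. The attenuation coefficient and the window settings below are rough illustrative assumptions, not settings from any particular scanner.

```python
def hounsfield_units(mu, mu_water, mu_air=0.0):
    """Convert a linear attenuation coefficient to Hounsfield units (HU)."""
    return 1000.0 * (mu - mu_water) / (mu_water - mu_air)

def window(hu, center, width):
    """Map an HU value to a 0-1 display gray level for a given window."""
    low = center - width / 2.0
    return min(1.0, max(0.0, (hu - low) / width))

MU_WATER = 0.19   # assumed attenuation of water in 1/cm, illustrative only

print(hounsfield_units(MU_WATER, MU_WATER))   # water -> 0.0
print(hounsfield_units(0.0, MU_WATER))        # air   -> -1000.0

# Hypothetical "bone" and "soft tissue" display windows (center, width in HU).
for name, center, width in (("bone", 300, 2000), ("soft tissue", 40, 400)):
    print(name, [round(window(hu, center, width), 2) for hu in (-1000, 0, 40, 1000)])
```

A wide window keeps dense bone from saturating, whereas a narrow window spreads the gray scale across the small attenuation differences among soft tissues.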

Because CT imaging is ideal for visualizing bone, the erosion of the surrounding bone caused by abnormal soft tissue such as a tumor or cholesteatoma can be observed readily. The extent of bone involvement can be used to quantify the size of the abnormal tissue even though the abnormal soft tissue was not imaged directly. Intravenous contrast agents also can be added that are differentially absorbed by abnormal tissues such as a tumor to allow clearer observation of the abnormal soft tissue itself provided that the soft tissue is sufficiently vascular to deliver the contrast agent. Software tools are provided in the image analysis that can be used for linear or volumetric measures so that the extent of the soft tissue abnormalities can be quantified.

8.6.4 MicroCT Imaging

MicroCT imaging is a form of X-ray tomography that produces cross-sectional planar images simultaneously in multiple orthogonal planes. These planar images are used to recreate a three-dimensional virtual image on a display. The term micro refers to the size of the voxels that are in the micrometer range. The technology has been used for research of the middle ear (Decraemer et al. 2007; Sim and Puria 2008) and is now becoming available for routine clinical use.

A clinical microCT scanner is generally much smaller than a conventional CT scanner. Typically, the patient sits upright with his or her head fixed using a chin rest and the X-ray source and the detector rotate around the patient’s head. Note that a typical fan-beam system uses an X-ray source that produces a flat pie-shaped beam collected by a line detector with multiple images obtained by moving the X-ray source axially. By comparison, a cone-beam system uses an X-ray source that produces a cone-shaped beam that is collected by a charge-coupled device focused on a scintillator material that converts X-ray radiation to visible light, in essence an area detector, to create images that can be used for a three-dimensional reconstruction of the tissue. The cone-beam approach limits the amount of exposure to the ionizing radiation and allows image optimization of a specified small volume of tissue. The terms microCT and cone-beam imaging tend to be used interchangeably and, because the scanner is small, the device often is referred to as an in-office scanner. The largest application is in dentistry but such systems are increasingly being used in otolaryngology for imaging the nasal sinuses and the temporal bone.

Each orthogonal image has a consistent number of image pixels in a regular pattern separated by a specified distance, every 0.125 mm, for example. The resulting scans generally are isotropic so the image planes can be positioned in any direction rather than only in the traditional axial, coronal, and sagittal planes. Post-scan image processing can also stack the individual planar images before rendering a volumetric image. Volume rendering allows a two-dimensional projection of a three-dimensional discretely sampled data set. Each volume element, or voxel, is represented by a single value obtained by sampling the immediate area around the voxel.
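
As a concrete illustration, the sketch below computes how many samples a regular 0.125 mm grid needs to span a given extent and forms one simple kind of two-dimensional projection of a sampled volume, a maximum intensity projection. The choice of maximum intensity projection is an assumption made for simplicity (clinical viewers offer several rendering modes), and the tiny volume is hypothetical.

```python
def max_intensity_projection(volume):
    """Project a volume indexed [z][y][x] along z, keeping the brightest voxel."""
    depth, rows, cols = len(volume), len(volume[0]), len(volume[0][0])
    return [[max(volume[z][y][x] for z in range(depth)) for x in range(cols)]
            for y in range(rows)]

def grid_size(extent_mm, spacing_mm):
    """Number of samples needed to cover extent_mm at a regular spacing."""
    return round(extent_mm / spacing_mm)

# Tiny hypothetical volume: 2 slices of 2 x 3 voxels (arbitrary intensities).
volume = [
    [[0, 5, 1], [2, 0, 0]],
    [[3, 1, 4], [0, 7, 0]],
]
print(max_intensity_projection(volume))   # [[3, 5, 4], [2, 7, 0]]
print(grid_size(40.0, 0.125))             # 320
```

Because the voxels are isotropic, the same projection could be taken along any axis of the grid without changing the spatial scale of the result.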

Structures with similar densities are difficult to visualize separately by adjusting the volume rendering parameters alone. A manual or automated segmentation procedure can remove the unwanted structures from the rendered image. Contrast agents can be used to optimize visualization of bone or in some cases even soft tissue. False coloring can be used to highlight specific structures. Stereo images can be created with colored filters, usually blue and green, to provide the illusion of depth.

Figure 8.7 shows the results of a temporal bone scan of a normal temporal bone centered on the middle ear. The ossicles can be seen clearly at high resolution. The supplied viewer application allows the three orthogonal planar images to be rotated individually for viewing of the rendered image from any direction. Cursor lines can be positioned to select specific regions of interest.

Fig. 8.7
figure 00087

A three-dimensional reconstruction of a normal right middle ear obtained with a cone-beam, in-office, clinical microCT scanner (J. Morita Mfg Corp., model 3D Accuitomo) and automatic segmentation (large panel). The imaged volume was a 40 × 30 mm cylinder with a uniform 0.125 mm voxel size. The view is from the inside out. The umbo (U) at the end of the manubrium (M) of the malleus, the anterior mallear ligament (L), and the incus (I) are clearly visualized. The three smaller adjacent images show the planes used for the reconstruction and the cursor lines that can be moved for other views. The scan time was approximately 18 s with a radiation exposure about one-seventh of that for a conventional medical CT, similar to the exposure for a conventional dental panoramic image

8.6.5 Magnetic Resonance Imaging

Magnetic resonance imaging (MRI) uses a large magnetic field and radio waves to create images. A patient is placed in a uniform magnetic field that aligns the spinning protons of hydrogen atoms, located mostly in the tissues that contain fat and water, in the direction of the magnetic field. Radio frequency pulses are then applied that change the direction of the spinning protons with respect to the direction of the magnetic field that in turn give off energy that can be detected. The quality of the resulting image is dependent on the strength of this radio frequency energy change that is dependent on the strength of the magnetic field and the proximity to the structure of interest of the receiving coil that detects this energy change. Magnetic field strength is specified in Tesla units, with scanners of 0.5–3.0 T common and scanners up to 7 T becoming available. Receiving coils can be placed around the head or on the side of the head for creating images of the temporal bone and its structures.
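
The link between field strength and the radio frequency used can be made concrete with the Larmor relation, under which the proton resonance frequency is proportional to the magnetic field. The sketch below uses the approximate proton constant of 42.577 MHz/T; the function name is illustrative.

```python
GAMMA_BAR_MHZ_PER_T = 42.577   # proton gyromagnetic ratio / (2*pi), approximate

def larmor_frequency_mhz(field_tesla):
    """Proton resonance (Larmor) frequency in MHz for a field in tesla."""
    return GAMMA_BAR_MHZ_PER_T * field_tesla

for b0 in (0.5, 1.5, 3.0, 7.0):
    print(f"{b0:3.1f} T -> {larmor_frequency_mhz(b0):6.1f} MHz")
```

Across the field strengths mentioned above, the resonance frequency spans roughly 21 to 298 MHz, which is why the excitation and the detected signal lie in the radio frequency range.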

Because no ionizing radiation is used and no negative effects of magnetism and/or radio frequency exposure have been observed, MRI imaging is considered noninvasive with only a few considerations. The MRI scanners produce considerable acoustic energy during the pulse sequence that may be a risk to hearing. Psychological reactions in the case of phobias associated with being positioned in the narrow confines of the imaging apparatus also occur. Implanted metallic objects such as heart pacemakers and cochlear implants are also a concern both in terms of patient safety and in terms of interference with generation of the images.

The detected energy changes are used to generate the image. Small changes in the radio waves and magnetic fields can affect the contrast of the image and the contrast settings can be adjusted to highlight different types of tissue. The imaging plane can be changed in thickness and to any location or direction without moving the patient. As with CT imaging, planar “slices” can be viewed individually or sequentially and used for constructing three-dimensional images.

The images from MRI imaging are very useful for identifying soft tissues, fluid-filled spaces, and air-filled spaces. A variety of pulse sequences can be used to enhance or suppress visualization of various structures including soft tissue, air, blood flow, and fluid-filled structures. Contrast agents, often gadolinium, are also used to enhance visualization of abnormal tissues provided that the tissue is sufficiently vascular to absorb the contrast agent. Contrast agents can be nephrotoxic, so kidney function should be assessed before contrast agent administration. Middle ear soft tissue abnormalities such as cholesteatoma, polyps, granulation tissue, squamous cell tumors, glomus tumors, basal cell tumors, and cholesterol granulomas can be observed by utilizing various MRI sequences (T1, T2, diffusion restricted, post-gadolinium T1) that take advantage of differential imaging characteristics of the various pathologies. However, because MRI images do not identify bony structures very well, they are not useful for visualizing the ossicles or the bone margins that define the middle ear space.

8.6.6 Real-Time Magnetic Resonance Imaging

Real-time MRI imaging can be implemented by a pulse sequence that generates many planar images per second that are played back as a movie. This allows imaging of moving structures in real time and is used for a variety of purposes. In otolaryngology, real-time MRI has been used for measuring the movement of structures in the airway including the soft palate, the tongue, and the airway diameter (Barrera et al. 2009, 2010), all important for quantifying functional airway abnormalities associated with sleep apnea or with speech production. With real-time MRI imaging, a tradeoff exists between image resolution and temporal resolution. Real-time imaging at 6 frames/s produces adequate images for larger structures that move relatively slowly such as the soft palate. An increase in the frame rate up to 30 frames/s will allow observation of structures that move faster such as the tongue during speech, but with a penalty of reduced image resolution. Because the relevant moving structures in the middle ear oscillate in the auditory range (20 Hz–20 kHz) and at extremely small amplitudes, it is unlikely that real-time MRI will be able to assess the functional status of the tympanic membrane or the ossicles. However, it is potentially possible for Eustachian tube function to be measured in its natural state using real-time MRI because the Eustachian tube movements are relatively slow and the tissue displacements are relatively large. The plane of the MRI image can be oriented directly on the oblique angle of the Eustachian tube and the imaging tied to voluntary swallowing. Attempts at this type of middle ear imaging were not successful with a 0.5 T scanner but may prove useful with higher Tesla scanners.
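
The temporal limitation can be checked with the sampling (Nyquist) criterion, under which a periodic motion must be sampled at more than twice its frequency to be followed without aliasing. The sketch below is a simple illustration of that arithmetic and ignores the separate problem of the very small vibration amplitudes; the function names are illustrative.

```python
def max_resolvable_frequency_hz(frames_per_second):
    """Nyquist limit: motion must be below this frequency to avoid aliasing."""
    return frames_per_second / 2.0

def frames_needed_per_second(motion_hz):
    """Frame rate that must be exceeded to sample motion at motion_hz."""
    return 2.0 * motion_hz

for fps in (6, 30):
    print(f"{fps} frames/s resolves motion below {max_resolvable_frequency_hz(fps)} Hz")
print(f"20 Hz vibration needs more than {frames_needed_per_second(20)} frames/s")
print(f"20 kHz vibration needs more than {frames_needed_per_second(20000)} frames/s")
```

Even at 30 frames/s the limit of 15 Hz lies below the lowest auditory frequency, whereas slow movements such as those accompanying a swallow fall well within it.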

8.6.7 Optical Coherence Tomography

Optical coherence tomography (OCT) produces three-dimensional structural images of tissue with submicrometer resolution, and sound-induced tissue displacement information, from reflections of wide bandwidth light. Clinical optical coherence tomography systems are available in other disciplines such as ophthalmology and cardiology but none are yet available in otology. Adaptations of commercial systems or custom built systems have been used for human auditory research of the middle ear (Djalilian et al. 2008). A patent for an OCT device specifically for the ear was recently issued (USPTO, patent 8115934, February 14, 2012), suggesting that clinical systems will be available for the middle ear in the future.

Optical coherence tomography is a low coherence interferometric method that uses broad-band long wavelength light (near-infrared) that penetrates a few millimeters into soft tissue, and less so in bone, and has critical advantages for clinical measures of the middle ear. It has far greater image resolution than ultrasound or magnetic resonance imaging and greater tissue penetration than confocal microscopy. In contrast to X-ray and MRI imaging, it uses a back-scattered reflected light signal, an echo method similar to ultrasound, but no medium is required so the imaging transducers are not in direct contact with the tissue. Only low signal levels are needed so tissue damage is unlikely. Optical coherence tomography quickly produces sub-surface morphology images at high resolution with no tissue preparation or ionizing radiation. The technology is relatively noninvasive, is able to generate cross-sectional images with micron-scale resolution and has the potential of imaging middle ear structures including the tympanic membrane and its layers, the ossicles, and other middle ear structures such as the middle ear muscle tendons (Pitris et al. 2001) and pathological material in the ear such as cholesteatoma (Djalilian et al. 2010) and possibly biofilms from infectious processes (Xi et al. 2006).

The broad bandwidth light source, which may be a very bright light emitting diode, a femtosecond pulsed laser, or even white light, is divided into a direct (sample) arm aimed at the tissue surface and a reference arm aimed at a mirror. A photodetector or a charge coupled device camera measures the combined reflected light from the sample arm and the reference arm. If both arms have the same optical length, an interference pattern results based on the characteristics of the tissue. The mirror in the reference arm can be moved to scan the tissue in depth, resulting in a reflectivity profile; this is called the time domain method. The sample areas that reflect most of the light create greater interference patterns while the scattered light from other sample areas falls outside the short coherence length and reduces the interference pattern. This reflectivity profile contains information about the spatial dimensions and locations of structures within the tissue. Because the level of reflected light decreases with tissue depth, the image is limited to only a few millimeters below the tissue surface. Wide bandwidth light signals (low coherence interferometry) have interference pattern distances in the micrometer range in contrast to narrow bandwidth signals (high coherence interferometry) with interference pattern distances in the meter range.
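
The dependence of the interference distance on bandwidth can be illustrated with the usual expression for the coherence length of a source with a Gaussian spectrum, (2 ln 2/π)·λ0²/Δλ, where λ0 is the center wavelength and Δλ is the bandwidth. The center wavelength and bandwidths below are hypothetical values chosen for illustration, and the Gaussian spectrum is an assumption.

```python
import math

def axial_resolution_um(center_nm, bandwidth_nm, refractive_index=1.0):
    """Axial resolution (coherence length) in micrometers, Gaussian spectrum."""
    dz_nm = (2 * math.log(2) / math.pi) * center_nm ** 2 / bandwidth_nm
    return dz_nm / refractive_index / 1000.0

broad = axial_resolution_um(1300.0, 100.0)     # hypothetical broadband source
narrow = axial_resolution_um(1300.0, 0.001)    # hypothetical narrow-line laser
print(f"broadband:   {broad:.1f} micrometers")
print(f"narrow-line: {narrow / 1e6:.2f} meters")
```

With these assumed values the broadband source localizes reflections to within a few micrometers, whereas the narrow-line source would interfere over a path difference approaching a meter and so could not localize structures in depth.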

A point light on the tissue surface is used to produce an image in the axial dimension (Z-axis). The light source can be moved to scan the tissue along a line to produce a two-dimensional cross-sectional image (XZ axes) or over an area to produce a three-dimensional volumetric image (XYZ axes scan).

Broad bandwidth interference patterns also can be acquired with simultaneous detection of spectrally separated signals either by time locking the light frequency with a spectral scanning source or by using a dispersive detector such as a grating and a linear detector array, a process called spatially encoded frequency domain optical coherence tomography. This approach immediately generates a depth scan without movement of the reference arm mirror and greatly improves imaging speed. The simultaneous detection of multiple wavelengths determines the scanning range and the full spectral bandwidth determines the axial resolution. A full-depth scan can be acquired with a single exposure but only at the expense of a reduced dynamic range. Optical coherence tomography also has the potential of measuring sound-induced tissue displacements at levels and across frequency ranges well within the normal auditory function of humans. Tissue vibrations from a specific structure can be measured by phase locking the signal from the direct arm. In experiments on the much smaller tissues of the inner ear, this approach has been shown to be appropriate for acoustic levels from 20 to 100 dB SPL and frequencies up to 25 kHz (Chen et al. 2011). The addition of an acoustic transducer should allow accurate functional measurements of middle ear structures (Applegate et al. 2011).

A practical clinical instrument for the middle ear will involve several considerations. The surrounding bone will require all transducers to be placed in the external ear canal to image the middle ear, in particular the tympanic membrane and the structures a few millimeters deeper. The particular methods will be selected based on several considerations. Faster scans reduce movement artifacts but only at the expense of image clarity, though other processing such as image stabilization can be considered. The miniaturization of the components also will be a factor because the diameter of the external ear canal limits the space, the tympanic membrane may be only partially visible, and the mirror scanning mechanism must be included along with the OCT transducers and the acoustic transducer if functional measures are desired. The scanning parameters also will be a consideration to ensure that the scanning process does not go beyond the edge of the tympanic membrane, suggesting that a simultaneous otoscopic view may be necessary for positioning the transducers. Figure 8.8 is a cross-sectional image of the middle ear of an adult cadaver obtained with optical coherence tomography (A. Nguyen-Huynh, personal communication, 2012). The 3 × 3 mm area is orthogonal to the long process of the incus and the handle of the malleus and has 16 μm resolution. The tympanic membrane, the manubrium of the malleus, and the long process of the incus can be visualized.

Fig. 8.8
figure 00088

A cross-sectional image of the middle ear of an adult cadaver obtained with optical coherence tomography. The area is orthogonal to the long process of the incus and the handle of the malleus. The tympanic membrane, the manubrium of the malleus, and the long process of the incus are identified (3 × 3 mm; resolution, 16 μm) (Courtesy of A. Huynh 2012)

8.6.8 Post-imaging Processing

The raw images (typically in the DICOM format) from CT, microCT, and MRI can be processed after they have been collected with a variety of software packages (OsiriX, for example) to obtain important additional information. In addition to three-dimensional reconstruction, co-registration of CT and MRI images can be implemented to visualize both bony and soft tissues simultaneously. The structure of interest can be manually segmented from the surrounding tissues in each planar image (Osborn et al. 2011) or computer algorithms can be implemented to segment a particular structure automatically.

An example of post-processing analysis specific to the middle ear can serve as an illustration. Figure 8.9a illustrates a series of planar CT images obtained in a newborn. Each planar image covers a thickness of 1 mm of tissue. The CT imaging was performed for medical reasons unrelated to the auditory system, in this case for diagnosing a brain problem, so image resolution, location, and other imaging parameters could not be varied for other purposes. However, because the planar images coincidentally covered the peripheral ear, they can be used for post-imaging processing to assess the middle and the external ear. Figure 8.9b shows a three-dimensional reconstruction (Dextroscope, Inc.) of the middle ear and the external ear canal of a newborn using the computerized tomographic images in Fig. 8.9a. The three-dimensional image was reconstructed using an automated software segmentation algorithm rather than manual segmentation of a single structure such as the middle ear space (Osborn et al. 2011). The regions bounded by bone (middle ear space, the medial portion of the external ear canal) or by cartilage (the lateral portion of the external ear canal and the pinna) are very visible. The regions of soft tissue such as the tympanic membrane are much less visible. Under higher magnification the umbo of the manubrium and the annular ligament can be identified. By making some assumptions about the location of the tympanic membrane using the umbo and annular ligament as landmarks, and using the measurement tools provided by the software, this three-dimensional image allows a fairly accurate measure of the external ear canal volume. Note that this measure of the external ear canal volume was obtained with the structure in its completely natural state.
Before this approach, measures of the external ear canal volumes were obtained by injecting substances into the ear canal, a procedure that can be much more invasive and that can actually distort the ear canal dimensions compared to this imaging technique.
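
The volume measurement described above reduces, in its simplest form, to counting the voxels assigned to the structure and multiplying by the volume of one voxel. The Python sketch below illustrates only that arithmetic. The function name segmented_volume_mm3, the in-plane pixel spacing of 0.5 mm, and the toy mask are hypothetical and are not values from the newborn images in Fig. 8.9; only the 1 mm slice spacing comes from the text. The sketch also assumes that the slices are contiguous, so that slice spacing equals slice thickness.

```python
def segmented_volume_mm3(mask, pixel_spacing_mm, slice_spacing_mm):
    """Estimate the volume of a segmented structure by voxel counting.

    mask             -- nested lists indexed [slice][row][column]; truthy means
                        the voxel belongs to the structure
    pixel_spacing_mm -- (row spacing, column spacing) within each slice, in mm
    slice_spacing_mm -- distance between adjacent slices, in mm
    """
    row_mm, col_mm = pixel_spacing_mm
    voxel_volume_mm3 = row_mm * col_mm * slice_spacing_mm
    voxel_count = sum(1 for plane in mask for row in plane for flag in row if flag)
    return voxel_count * voxel_volume_mm3


# Hypothetical toy mask: two slices of 2 x 3 voxels with five voxels marked.
toy_mask = [
    [[1, 1, 0],
     [0, 1, 0]],
    [[0, 1, 0],
     [0, 1, 0]],
]

# 0.5 mm in-plane spacing is an assumption; 1 mm slice spacing follows the text.
volume = segmented_volume_mm3(toy_mask, pixel_spacing_mm=(0.5, 0.5), slice_spacing_mm=1.0)
print(volume, "cubic millimetres")
```

Because each voxel in this sketch spans 1 mm along the slice axis, the estimate is limited by how well the boundary of the structure (here, the assumed location of the tympanic membrane) can be resolved across slices.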

Fig. 8.9
figure 00089

(a) A CT image of the head of a newborn showing the individual planar “slices” and the spacing (1 mm) between the “slices.” (b) A three-dimensional reconstruction (Dextroscope, Inc.) of the left middle ear, external ear canal, and pinna using the CT images in (a) and automatic segmentation

8.7 Summary

8.7.1 Overview

This chapter reviewed the conventional clinical diagnostic measures of the human middle ear, along with those that are current or soon to be current, and how they relate to specific middle ear conditions. An emphasis was placed on measures that are the least invasive and yet quantifiable. The measures span a very wide range of methods including behavioral voluntary responses, physical attributes of the middle ear structures, involuntary physiological responses, and imaging. Each middle ear measure provides unique or complementary information, and in aggregate the measures can provide considerable information concerning the functional status of the middle ear, the structural status of several middle ear components including the air space, and the effect of contracting either of the two middle ear muscles. A recurring theme was to define a stimulus on the lateral side of the middle ear as close to the tympanic membrane as possible and to use various attributes of the cochlea as a sensor immediately adjacent to the medial portion of the middle ear. This approach generally limits the measures to the middle ear and its components alone and attempts to minimize the influence of structures and systems both lateral and medial to the middle ear. A secondary recurring theme was to quantify all of the measures as much as possible, even those prone to variability associated with voluntary responses, noise, and other factors.

8.7.2 Future Directions

The limitations of the current middle ear measures are being addressed with innovative technology. New calibration procedures will allow more accurate specification of high-frequency acoustic stimuli in the external ear canal (Richmond et al. 2011). New bone conduction transducers will allow careful characterization of the middle ear over the entire frequency range (Popelka et al. 2010) and allow testing of new hypotheses of high-frequency middle ear function (Puria and Steele 2010). New imaging systems along with new and improved image signal processing will produce increasingly clear static images of the middle ear. These structural images, evaluated in concert with improved functional measures such as those provided by laser Doppler vibrometry, optical coherence tomography, and real-time MRI, will provide a more complete, minimally invasive structural and functional analysis of the middle ear in humans.