INTRODUCTION

In Walt Disney’s first full-length animated film (1937), the fair Snow White, seeking shelter from her wicked stepmother, sweeps out the dusty and deserted cottage of the seven dwarfs as they dig dutifully in the mines. Assisted by the wagging furry tails of her forest friends, a motley gang of squirrels and rabbits, Snow White whistles while she works, and “cheerfully together they tidy up the place.” As they sweep the room to the merry tune, the crew of oscillating woodland hair bundles evokes the cellular movements widely supposed to underlie the spontaneous generation of sound by the inner ear.

Figure 1 shows the power spectrum of the sound recorded in quiet from a healthy human ear using a low-noise microphone placed in the ear canal. The seven red spikes reveal that the ear was whistling spontaneously at seven distinct frequencies, all simultaneously. The spontaneous otoacoustic emissions (SOAEs) in this ear occur at center frequencies of 653, 895, 1171, 1363, 1449, 1612, and 2408 Hz, roughly corresponding to the musical notes E5, A5, D6, F6, F#6, G#6, and D7. Individually, each SOAE sounds like a warbling tone; their collective caterwauling can be heard in the linked recording (Shera 2014).

Although spontaneous otoacoustic emissions provide perhaps the most direct and compelling evidence for the existence of active processes in the cochlea, their origins remain otherwise controversial, their proposed explanations disparate and untidy (Zwicker 1986; Talmadge et al. 1991; Sisto and Moleti 1999; Shera 2003; Vilfan and Duke 2008; Duifhuis 2011; Wit and van Dijk 2012; Wit and Bell 2017). So how, indeed, might these sounds arise?

Fig. 1

Sound power spectrum showing spontaneous otoacoustic emissions recorded in a human ear canal. The ear is emitting seven almost pure tones (red spikes). In addition to the seven principals, the heads of a few dwarf SOAEs (\(*\)) peek out above the background noise floor (black). Data from subject WL-R (Shera 2003)

TWO CONCEPTIONS OF SOAE GENERATION WITHIN THE COCHLEA

Viewed from the ear canal, SOAEs have all the hallmarks of active, self-sustained, limit-cycle oscillators (e.g., Bialek and Wit 1984; Talmadge et al. 1991; Murphy et al. 1995a, 1995b, 1996; van Dijk and Wit 1990a, b; Shera 2003; Bergevin et al. 2015). Within the cochlea, however, the biophysical mechanisms responsible for their generation are less clear. Conceptually, the various models so far proposed embody two partially overlapping frameworks distinguished by the locus and identity of the autonomously oscillating elements presumed responsible for the spontaneous emission of sound. In the first class, the oscillating element is taken to be an individual hair cell (or a small group of hair cells); in the second, the oscillation emerges collectively and encompasses, in effect, the entire cochlea and its basal boundary with the middle ear.

Local-Oscillator Framework

The standard and most straightforward account of spontaneous emission dates back to the prescient work of Thomas Gold (1948), who predicted the existence of human SOAEs some 30 years before their discovery (Kemp 1979a; Wilson 1980). To account for the ultra-sharp frequency tuning suggested by his (misinterpreted) psychophysical experiments (Gold and Pumphrey 1948; Hiesey and Schubert 1971; Green et al. 1975), Gold proposed his now-famous “regeneration hypothesis,” in which electromechanical feedback somehow counteracts the viscous damping in the cochlea. Gold noted that if the necessary “self-regulating mechanism” were ever to fail, so that “the feedback ever exceeded the losses, then a resonant element [hair cell in the organ of Corti] would become self-oscillatory, and [the] oscillations would build up.” The element’s autonomous oscillation would then be conveyed back to the stapes, pass through the middle ear, and appear in the ear canal as sound, where “we should hear a clear note.” In Gold’s model and its modern descendants, the characteristics of each spontaneous emission—such as its frequency and bandwidth—are determined locally within the organ of Corti; that is, by the oscillating hair cell and its immediate environment. Thus, the model suggests that SOAEs provide a direct, noninvasive window into the hair-cell’s active process and its internal dynamics.

Local-oscillator models have a lot going for them, in addition to their distinguished pedigree. For example, hair bundles in reptiles and amphibians are known to oscillate spontaneously (Crawford and Fettiplace 1985; Denk and Webb 1992; Martin and Hudspeth 1999; Martin et al. 2001; Bozovic 2019). And the underlying conception accords with an intuition that says that the behavior of any complex system is properly understood as a consequence of the behavior of its parts.

Fig. 2

Autonomous hair-cell oscillator. In this framework, SOAEs measurable in the external ear canal occur when an active hair cell (red) goes unstable and begins to oscillate spontaneously. Note that the middle ear is crudely caricatured—no offense intended—using balls and sticks. Key to anatomical abbreviations: TM=tympanic membrane; OC=ossicular chain; OW=stapes and oval window; HCs=hair cells

Global-Oscillator Framework

The local-oscillator framework is so intuitively compelling that one may wonder whether there is really any viable alternative. While wondering, it may help to ponder another, seemingly unrelated issue: “How do humans fly?” Although “Not very well!” may be the wisecracking retort, we humans do fly, just not by furiously flapping our arms or legs. Rather, we fly by being part of a social network, a technological society that has created airplanes, airports, and pilots. Parts acquire new properties by virtue of their embedding in the whole (Levins and Lewontin 1985), and collectively we do things that individually we cannot. To get spontaneous emissions off the ground, what is the alternative to an individual hair cell spontaneously flapping its bundle?

The answer, originally proposed by Kemp (1979a, b), might be called a “global oscillator” (see Fig. 3). What does that mean? In the mammalian ear, a pure-tone stimulus creates a traveling wave within the cochlea that evokes an emission at the same frequency, known as a stimulus-frequency OAE (Kemp and Chum 1980; Zwicker and Schloth 1984; Shera and Zweig 1993a). Stimulus re-emission occurs when a forward-traveling wave encounters mechanical irregularities along its path that disturb the otherwise smooth forward flow of energy, a process equivalent to scattering the incoming wave (Shera and Zweig 1993b; Zweig and Shera 1995). On its way back to the ear canal, the emitted wave is partially reflected at the stapes, creating another forward wave that combines with the first. When the round-trip gain is high enough, and the round-trip phase shift allows the waves to combine in phase, the process can create an ongoing excitation (a self-sustaining evoked emission) that persists even after the initial stimulus is removed. In other words, the process of multiple internal reflection within the cochlea creates an SOAE (Kemp 1979a, b; Zweig 1991; Talmadge and Tubis 1993; Shera 2003; Ku et al. 2009). Although illustrated here using a pure-tone stimulus purposefully applied, the initiating stimulus can be anything that launches a traveling wave, including internal noise. Interestingly, the global, standing-wave framework implies that the generation of SOAEs is analogous to the coherent emission of light by an optical laser (Shera 2003, 2007).
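The build-up by multiple internal reflection can be illustrated with a deliberately simplified, linear sketch. The function names and the complex "loop gain" (the product of a hypothetical cochlear reflectance and a hypothetical stapes reflection coefficient) are assumptions made for illustration; they are not taken from any of the cited models, and the sketch ignores the nonlinearity that limits amplitudes in a real ear.

```python
import cmath
import math


def build_up(loop_gain, n_trips):
    """Sum the contributions of n_trips successive round trips (linear model).

    loop_gain is a hypothetical complex number: its magnitude is the
    round-trip gain and its angle is the round-trip phase shift.
    """
    total = 0j
    term = 1 + 0j          # the initial forward wave
    for _ in range(n_trips):
        total += term
        term *= loop_gain  # one more trip around the loop
    return total


def can_self_sustain(loop_gain, phase_tol_cycles=0.01):
    """True when round-trip gain is at least 1 and the waves return in phase."""
    phase_cycles = cmath.phase(loop_gain) / (2 * math.pi)
    in_phase = abs(phase_cycles - round(phase_cycles)) <= phase_tol_cycles
    return abs(loop_gain) >= 1.0 and in_phase


# Below threshold the reflections add up to a finite response;
# at or above threshold, in phase, the linear sum grows without bound.
print(abs(build_up(0.5 + 0j, 50)))
print(can_self_sustain(1.05 + 0j))
print(can_self_sustain(cmath.rect(1.05, math.pi)))
```

In this linear caricature the sum simply diverges once the loop gain reaches one with the right phase; in the framework described here, the amplitude is instead stabilized when the gain of the nonlinear medium falls to match the losses.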

Although the local-oscillator framework presumes that individual hair cells can go unstable and oscillate spontaneously—an assumption as yet uncorroborated in the mammalian cochlea—the formation of standing-wave resonances follows ineluctably from the physics of cochlear wave propagation and reflection. These same physical principles underlie other well-known auditory phenomena, including the microstructure of the threshold hearing curve (Long and Tubis 1988) and the waxing and waning often observed in basilar-membrane responses to acoustic clicks (Shera and Cooper 2013; Shera 2015). In contrast to the local-oscillator scenario, SOAE properties in this framework are not determined locally, but globally, by round-trip traveling-wave gain and phase shifts, including reflection at the cochlear boundary with the middle ear. Although explained here using the familiar language of the mammalian traveling wave, the concept of collective, global oscillation applies more broadly. For example, mechanisms closely analogous to coherent reflection can operate in ears, such as those of birds and lizards, whose tuned responses manifest mechanical phase shifts and delays but which otherwise appear to lack obvious candidates for the waves to be reflected (Bergevin and Shera 2010).

Fig. 3

Global-oscillator models. The region between the stapes and the peak of the traveling wave acts as a “resonant cavity” enclosing a nonlinear gain medium powered by hair cells. Partial reflection of forward- and backward-traveling waves (red wavy lines) occurs at each end of the cavity. Standing waves occur at frequencies where the round-trip phase change is an integral number of cycles. Standing-wave amplitudes are stabilized when the round-trip gain matches the losses due to internal damping and acoustic emission into the ear canal

MIRROR, MIRROR

The two modeling frameworks thus emerge from and embody two rather different perspectives. Whereas the essential elements of Gold’s local-oscillator scenario are localized to the hair cell, those of Kemp’s global, standing-wave framework are better understood as emergent features of the whole than as properties of the parts. And whereas Gold supposed that SOAEs result from something relevant gone awry, such as the breakdown of a local feedback-control mechanism, the global-oscillator framework has the ear whistling while it works, emphasizing a common origin with evoked, reflection-source OAEs—both are the natural consequence of distributed wave amplification in the presence of intrinsic, nonpathological impedance perturbations.

Although both frameworks surely represent “accurate descriptions of our pathetic thinking” (Black 1988; Gunawardena 2014), how do we determine which provides the fairer description of actual spontaneous emission? To gain a better understanding of the issues, we briefly examine two telling SOAE features and how they might be accounted for.

Characteristic Minimum Frequency Spacings

We begin with the frequency spacing between adjacent SOAEs. Mining a large database of human SOAEs to construct a histogram of the frequency intervals between pairs of adjacent SOAEs yields the distribution of normalized spacings shown in Fig. 4 (black bars). The strong peak in the distribution implies the existence of a characteristic minimum spacing, a spacing clearly evident in Fig. 1 for the group of SOAEs centered near 1.5 kHz. The characteristic spacing varies systematically with SOAE frequency but in this range is roughly one semitone (see also Schloth 1983; Dallmayr 1985; Zwicker 1988; Russell 1992; Talmadge et al. 1993; Braun 1997).
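As a rough check on the spacing visible in Fig. 1, the intervals between adjacent SOAEs can be computed directly from the seven center frequencies quoted in the Introduction. The sketch below uses the spacing and geometric-mean definitions given in the caption of Fig. 4; the function name and the conversion to semitones are illustrative choices, not part of the original analysis.

```python
import math

# SOAE center frequencies (Hz) quoted in the Introduction for the ear of Fig. 1
soae_hz = [653, 895, 1171, 1363, 1449, 1612, 2408]


def adjacent_spacings(freqs):
    """Return (geometric-mean frequency, spacing in Hz, spacing in semitones)
    for each pair of neighboring emissions."""
    rows = []
    for fa, fb in zip(freqs, freqs[1:]):
        delta = abs(fb - fa)                  # spacing |fa - fb|
        f_geo = math.sqrt(fa * fb)            # geometric mean of the pair
        semitones = 12 * math.log2(fb / fa)   # musical interval
        rows.append((f_geo, delta, semitones))
    return rows


spacings = adjacent_spacings(soae_hz)
for f_geo, delta, semitones in spacings:
    print(f"{f_geo:7.0f} Hz  spacing {delta:4d} Hz  ({semitones:.2f} semitones)")
```

For this ear, the 1363 and 1449 Hz emissions lie close to one semitone apart, and the 1449 and 1612 Hz emissions just under two.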

Fig. 4

Distribution of human SOAE spacings. The histogram (black bars) shows the distribution of values \(v_\mathrm{SOAE}(f) = \Delta(f)/\bar{\Delta}(f)\), pooled across frequency. For adjacent SOAEs at frequencies \(f_\mathrm{a}\) and \(f_\mathrm{b}\), the spacing \(\Delta(f)\) is defined as \(\Delta(f) = |f_\mathrm{a} - f_\mathrm{b}|\), with \(f = \sqrt{f_\mathrm{a} f_\mathrm{b}}\) taken as the geometric mean. The normalizing function \(\bar{\Delta}(f)\) denotes the mode (or robust loess trend line) computed from the scatterplot of spacings. The distribution was computed using a database containing 556 SOAE pairs measured in 47 subjects. The red curve shows the distribution of SOAE spacings predicted from measurements of human SFOAE delay. The curve represents the empirical distribution \(v_\mathrm{SOAE}(f) = \bar{\tau}(f)/\tau(f)\), where \(\tau(f)\) is measured SFOAE phase-gradient delay (1441 data points in 9 subjects) and \(\bar{\tau}(f)\) is the loess trend line computed from the scatterplot of delay values. Adapted from Fig. 4 of Shera (2003)

The global-oscillator framework provides a natural explanation for the characteristic minimum spacing. The emergence of self-sustaining, standing-wave oscillations requires that the total round-trip phase shift be an integral multiple of \(2\pi\), effectively “quantizing” SOAE frequencies. The condition holds at regular, quasi-periodic frequency intervals determined predominantly by the delay of stimulus-frequency OAEs (SFOAEs). (As explained below, the phase shift due to wave reflection from the stapes at the cochlear boundary with the middle ear also contributes.) Indeed, the frequency interval \(\Delta f_\mathrm {SFOAE}\) over which SFOAE phase rotates by one cycle is approximately \(1/\tau\), where \(\tau\) is SFOAE phase-gradient delay. Thus, the longer the delay, the smaller the minimum SOAE spacing. The model implies that measurements of SFOAE delay can therefore be used to predict the distribution of SOAE spacings (see also Bergevin et al. 2012, 2015). The result, shown by the red line in Fig. 4, matches the location and width of the main lobe of the SOAE histogram almost exactly. Details of the calculation, including an explanation for why the prediction is not expected to hold in the long tail of the distribution (as, indeed, it does not), can be found elsewhere (Shera 2003).
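The relation between delay and spacing can be made concrete with a short numerical sketch. It applies only the approximation stated above (one cycle of SFOAE phase rotation per frequency interval of about \(1/\tau\)); the function names are hypothetical, and the 1.5 kHz, one-semitone example is an illustrative input rather than a measured delay.

```python
def spacing_from_delay(tau_seconds):
    """Approximate frequency interval (Hz) over which SFOAE phase rotates
    by one cycle, using the relation: spacing is about 1 / tau."""
    return 1.0 / tau_seconds


def delay_for_spacing(f_hz, semitones):
    """Delay (s) whose 1/tau spacing equals an interval of the given size
    (in semitones) above the frequency f_hz."""
    delta_f = f_hz * (2.0 ** (semitones / 12.0) - 1.0)
    return 1.0 / delta_f


# Illustrative case: a spacing of one semitone near 1.5 kHz
f_example = 1500.0
tau = delay_for_spacing(f_example, 1.0)
n_periods = tau * f_example   # delay expressed in periods of the emission
print(f"delay of about {tau * 1e3:.1f} ms, or {n_periods:.1f} periods")
```

Under this approximation, a one-semitone spacing corresponds to a delay of roughly 17 stimulus periods, independent of the frequency chosen; doubling the delay would halve the spacing.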

Fig. 5

Elastic coupling in an array of autonomous hair-cell oscillators. Placing springs between the elements allows the active oscillators to influence and potentially entrain one another, despite differences in their intrinsic natural frequencies of oscillation. Clusters of hair cells that oscillate at the same frequency then emerge (braces), behaving, in effect, as a single oscillator. The cluster size depends on the strength of the springs

By contrast, the most basic, unadorned local-oscillator model offers no explanation for the characteristic distribution of SOAE spacings. The reason is simple: The model, developed to describe single SOAEs, imposes no constraints on which hair cells go unstable, so all spacings are possible. Interestingly, however, the framework can be rescued by expanding what is meant by “local.” For example, if one assumes that the cochlea contains not just a handful of spontaneously oscillating cells but an extended array of them, all predisposed to oscillate at their own natural frequencies but coupled to one another by elastic elements (springs), as shown in Fig. 5, then the resulting model predicts that the hair cells will separate into synchronized clusters in which all cells in a cluster oscillate at the same frequency (Osipov and Sushchik 1998; Vilfan and Duke 2008; Gelfand et al. 2010; Wit and van Dijk 2012; Wit et al. 2000), behaving in many respects as a single oscillator (Wit and Bell 2017). The frequency spacing between the clusters is controlled by the strength of the coupling springs. The stronger the springs, the bigger the cluster and the greater the spacing. To match the SOAE spacings seen in human ears, the elastic coupling in the model would need to be strong enough to produce clusters spanning roughly 75 outer hair cells (i.e., 25 longitudinal groups of 3). Thus, by including sufficient coupling between the elements, and appropriately tuning the parameters, models derived from the local-oscillator framework can be rendered consistent with the data. Of course, the addition of coupling renders these models effectively nonlocal.
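The tendency of coupled oscillators to lock into groups sharing a common frequency can be sketched with a toy simulation. Note that this is a stand-in, not one of the cited hair-cell models: it uses a chain of simple phase oscillators with nearest-neighbor sinusoidal coupling (a Kuramoto-type system), and every parameter value (20 oscillators, a 1 % frequency step, the coupling strengths, the clustering tolerance) is an assumption chosen only for illustration.

```python
import math


def mean_frequencies(natural, coupling, dt=0.05, t_total=400.0):
    """Integrate a chain of phase oscillators (forward Euler) and return each
    oscillator's mean frequency over the second half of the run."""
    n = len(natural)
    theta = [0.0] * n
    steps = int(round(t_total / dt))
    half = steps // 2
    theta_half = theta[:]
    for step in range(steps):
        if step == half:
            theta_half = theta[:]          # snapshot after the transient
        new_theta = []
        for i in range(n):
            pull = 0.0                     # net pull from the two neighbors
            if i > 0:
                pull += math.sin(theta[i - 1] - theta[i])
            if i < n - 1:
                pull += math.sin(theta[i + 1] - theta[i])
            new_theta.append(theta[i] + dt * (natural[i] + coupling * pull))
        theta = new_theta
    elapsed = (steps - half) * dt
    return [(a - b) / elapsed for a, b in zip(theta, theta_half)]


def count_clusters(freqs, tol=0.002):
    """Count runs of neighboring oscillators whose mean frequencies agree
    to within tol."""
    clusters = 1
    for fa, fb in zip(freqs, freqs[1:]):
        if abs(fb - fa) > tol:
            clusters += 1
    return clusters


# Natural frequencies rise smoothly along the chain (arbitrary units)
natural = [1.0 + 0.01 * i for i in range(20)]
uncoupled = count_clusters(mean_frequencies(natural, coupling=0.0))
strong = count_clusters(mean_frequencies(natural, coupling=2.0))
print(uncoupled, strong)
```

Without coupling, every oscillator runs at its own frequency; with sufficiently strong coupling, the whole chain locks to a single frequency. Intermediate coupling strengths typically break such a chain into several frequency plateaus, which is the behavior the clustering models exploit.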

Frequency Shifts Induced by Changes in Middle-Ear Stiffness

We turn now to the shifts in SOAE frequency caused by changes in the effective stiffness of the middle ear. Stiffness changes due to tensing the eardrum or stretching the annular ligament can be induced using static pressure in the ear canal or by changing intracranial pressure with posture. Figure 6 shows SOAE frequency shifts induced by tilting the subject, as reported by de Kleine et al. (2000). The frequency shifts are largest (a few percent) at low frequencies and are generally in the upwards direction. Although similar results are seen in human ears using static pressure, the direction of the shift is more variable in lizards (Kemp 1981; Wilson and Sutton 1981; Zurek 1981; Schloth and Zwicker 1983; Hauser et al. 1993; van Dijk et al. 2011; van Dijk and Manley 2013).

Again, the global-oscillator model provides a natural explanation, at least for the human data. Because the round-trip, traveling-wave phase shift (i.e., the total phase lag incurred by traversing the wavy loop in Fig. 3) depends on phase shifts due to wave reflection from the cochlear boundary with the middle ear, the frequencies that satisfy the quantization condition are sensitive to middle-ear mechanics. When used to predict frequency shifts due to increases in middle-ear stiffness, the model reproduces both the magnitude and the sign of the trends apparent in the data (red curves in Fig. 6). Details of the calculation can be found elsewhere (Shera 2003).
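The logic of the compensation can be sketched to first order. The symbols below are introduced here for illustration: \(\Theta\) is the total round-trip phase, \(\theta_\mathrm{SFOAE}\) and \(\theta_\mathrm{stapes}\) are its cochlear and middle-ear parts, and \(N = f\tau\) is the SFOAE delay expressed in periods. The sketch assumes that the stapes reflection phase varies slowly with frequency compared with the SFOAE phase, and it tracks magnitudes only, since the sign depends on the phase conventions adopted.

```latex
\[
\Theta(f) = \theta_{\mathrm{SFOAE}}(f) + \theta_{\mathrm{stapes}}(f) = 2\pi n ,
\qquad
\frac{d\theta_{\mathrm{SFOAE}}}{df} \approx -2\pi\tau .
\]
% A small change in the stapes reflection phase must be offset by a
% change in SOAE frequency so that the total remains a multiple of 2*pi:
\[
-2\pi\tau\,\delta f + \delta\theta_{\mathrm{stapes}} \approx 0
\quad\Longrightarrow\quad
\delta f \approx \frac{\delta\theta_{\mathrm{stapes}}}{2\pi\tau},
\qquad
\frac{\delta f}{f} \approx \frac{\delta\theta_{\mathrm{stapes}}}{2\pi N},
\quad N = f\tau .
\]
```

Read this way, a given change in reflection phase produces a smaller fractional frequency shift when the delay, measured in periods, is longer.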

On a related note, the model predicts that the middle ear can make an important contribution to SOAE bandwidths (Shera 2003). For example, temporal jitter in middle-ear stiffness can arise from variations in middle-ear cavity pressure due to breathing or swallowing, from spontaneous middle-ear muscle contractions or those related to eye movements (Gruters et al. 2018), and from changes in intracochlear pressure due to blood flow. According to the model, all these sources of mechanical jitter increase SOAE bandwidths by producing small corresponding variations in SOAE frequency. Furthermore, the model predicts that these increases are generally smaller (i.e., that SOAE frequencies are more stable against perturbations) when SFOAE delays are longer. Consistent with this prediction, humans generally have both the longest SFOAE delays and the narrowest SOAE bandwidths so far reported (Taschenberger and Manley 1997; van Dijk et al. 1994; Bergevin et al. 2015; Abdala et al. 2017).

In contrast to this success, models in the local-oscillator framework must be extended to account for the data. The reason is simple: By itself, the individual hair cell (or cluster of hair cells) knows nothing about the middle ear. But, once again, the local-oscillator framework can, in principle, be rescued by expanding what is meant by local. What is needed, of course, is to communicate changes in middle-ear stiffness back to the oscillating cells and then somehow arrange for the cells to respond by altering their intrinsic frequency of oscillation. At a minimum, one therefore needs to modify the framework so that information, presumably carried by traveling pressure waves, flows back and forth along the cochlear spiral. Not coincidentally, this configuration looks a lot like the global oscillator illustrated in Fig. 3.

Fig. 6

SOAE frequency shifts due to changes in middle-ear stiffness. The gray dots show measured SOAE frequency shifts (in percent) induced by postural changes (de Kleine et al. 2000). The lines show general predictions of the standing-wave model obtained using three values of the fractional stiffness increase. Model predictions were obtained by using Puria’s (2003) measurements and model of the stapes reflection coefficient to estimate changes in reflection phase caused by variations in middle-ear stiffness. To maintain consistency with the quantization condition (round-trip phase shift equal to an integral multiple of \(2\pi\)), the model requires that SOAE frequencies change in order to compensate for the phase change induced by reflection from the stapes. Because of the simplicity of the three-parameter middle-ear model, the predicted curves are smoother (and more consistently in the upward direction) than the frequency shifts observed. Nevertheless, the model captures the major trends apparent in the data. Adapted from Fig. 5 of Shera (2003)

Interestingly, even the global-oscillator framework must be expanded to account for SOAEs in some species of lizard. Roongthumskul et al. (2019) have shown that SOAEs recorded from the right and left ears of the tokay gecko often occur at matching frequencies. In addition, they found that static pressure applied to one ear modifies SOAE frequencies and levels in the other. The reason for this curious behavior is that gecko ears are pressure-gradient receivers in which the two ears are acoustically coupled through the mouth to increase their sensitivity to directional cues. In the tokay gecko, the effective SOAE oscillator is binaural and the global feedback loop spans the entire head.

TIDYING UP THE PLACE

Assessing the two frameworks by “hold[ing] as ’twere the mirror up to Nature” (Shakespeare 1604) has produced an unexpected result. To remain competitive, the naked local oscillator must appear dressed up in global trappings. Consistency with the data requires coupling the local oscillator to more and more of its surroundings, systematically inflating the concept of “local.” Perhaps this outcome should have been anticipated. After all, a moment’s reflection reveals that, as a model for SOAEs, a purely local oscillator must always be a non-starter. An oscillating hair cell cannot help but couple to and affect its environment. Indeed, such coupling is implicitly present in the schematic of Fig. 2, although it is generally glossed over by most computational realizations. Without this coupling, how could the resulting sounds ever appear in the ear canal? The issue, then, is not the existence of coupling, which is a given, but its role and strength. Is the coupling merely a conduit for energy to escape from the inner ear? Or is the coupling strong enough that it plays a determinative role in shaping SOAE generation and behavior? The evidence reviewed here supports the latter view. Coupling is strong enough that no satisfactory account of SOAEs appears viable without it.

Does expansion of the local-oscillator framework to include global couplings with the environment erase all meaningful distinctions between the models? Is the local oscillator, suitably decked out and disguised, now equivalent to the global? It is not. The essential difference between the frameworks emerges not when the coupling is strong—as it appears to be in the real cochlea—but at the opposite extreme, when the coupling is weak. In the weak-coupling limit, local hair-cell oscillators keep on locally oscillating, their movements self-sustaining and autonomous, albeit unheard by listeners in the ear canal. By contrast, were coupling with the middle ear to decrease dramatically by making the cochlear boundary with the middle ear fully transparent to reverse traveling waves, then emergent oscillations that require the globally coupled system (e.g., standing-wave resonances in the “cavity” formed by wave reflection at the stapes) would disappear.

Differences between the modeling frameworks are thus inextricably linked to ongoing controversies surrounding the biophysics of the cochlear amplifier. If cochlear gain arises through the action of critical oscillators poised close to (or, indeed, beyond) the edge of instability (van Hengel et al. 1996; Choe et al. 1998; Camalet et al. 2000; Duke and Jülicher 2003; Reichenbach and Hudspeth 2014), then the dynamical bifurcation responsible for the emergence of self-sustaining oscillation can arise locally (i.e., within the hair cell), and dressed-up versions of the local-oscillator model may then apply (Fruth et al. 2014). Even in these cases, however, the physics of wave propagation and reflection inevitably results in the emergence of global standing-wave resonances (Epp et al. 2015). Although cochlear models based on coupled arrays of self-tuned critical or limit-cycle oscillators, such as those studied by the Dutch engineer Balthasar van der Pol (1927), have not been extensively tested against mammalian data, their operation near or within regions of dynamical instability tends to produce unrealistically sharp mechanical frequency responses at low stimulus levels (Magnasco 2003; Duke and Jülicher 2003; Kern and Stoop 2003). On the other hand, if cochlear amplification involves mechanisms that evolved to ensure stability of the effective admittance of the organ of Corti, then the necessary bifurcation responsible for SOAEs must arise collectively. For example, a recent model demonstrates that distributed coherent amplification of the traveling pressure wave can produce realistic mechanical responses without resorting to poles in the admittance perched precariously close to the real frequency axis (Altoé and Shera 2020). Many other models of the mammalian cochlea (too numerous to cite and not always in such compelling agreement with experiment) also exhibit stability of the effective admittance at stimulus levels near the threshold of hearing.
Finally, and for obvious reasons, the theoretical possibility that cochlear gain arises exclusively via mechanisms that preclude the generation of SOAEs (van der Heijden 2014) is not considered here.

The biophysical implementation of the cochlear amplifier need not, of course, be universal. Different groups of animals may have evolved different strategies for boosting the cochlear response to quiet sounds. Animals without basilar membranes may employ one mechanism, those that exploit outer-hair-cell electromotility and the traveling wave may use another. A corollary is that not all SOAEs are necessarily alike. Whereas SOAEs in amphibians and reptiles, for example, may prove to be essentially “local” in character, SOAEs in mammals may arise “globally.” Supporting this possible dichotomy is the observation that SOAEs in some lizard species, unlike those in mammals, often emerge from a prominent pedestal of suppressible background noise consisting of so-called “baseline emissions” (Manley et al. 1996). Sometimes, both local and global mechanisms may operate within a single species, or perhaps even a single ear. For example, although SOAEs in the mouse may normally (one supposes) arise via standing-wave resonances, damage or mutations that modify the coupling within the organ of Corti, or otherwise decrease the local stability of the cochlear amplifier, may spawn a family of step-SOAEs bearing a different pedigree (Ó Maoileidigh and Hudspeth 2013; Cheatham et al. 2016; Bowling et al. 2019; Cheatham 2021a, b).

The local- and global-oscillator frameworks for understanding SOAEs thus correspond to different conceptions of the mechanisms presumed to enhance the sensitivity and dynamic range of hearing. Remarkably, the two frameworks also quietly espouse radically opposing views of causality. In the local-oscillator framework, the cochlea emits sound because hair cells go unstable and begin to oscillate spontaneously. In the global framework, by contrast, the chain of causality is entirely reversed: hair cells in the organ of Corti oscillate spontaneously because the cochlea emits sound.