8.1 Introduction

The auditory nerve (AN) is the transmission channel linking the cochlea and brainstem. In response to sound, “analog” hair cell receptor potentials are transformed to a “digital” representation in trains of all-or-none action potentials, or “spikes,” in a population of ~30,000 afferent AN fibers (ANFs) in each human ear. In addition, the cochlea receives efferent inputs from brainstem neurons. This chapter presents auditory afferent and efferent neurophysiology in normal and impaired hearing, emphasizing the relationship between physiology and perception.

8.1.1 Anatomy of Afferent and Efferent Cochlear Innervation

All acoustic information transduced by the cochlea is transmitted to the central nervous system by ANFs, which are the central projections of spiral ganglion neurons (SGNs). There are two varieties of SGN: type I and type II. The peripheral fibers of SGNs are known as type I and type II afferents (Fig. 8.1). Type I afferents make synaptic contact with inner hair cells (IHCs). Each type I afferent fiber receives input from a single IHC, although each IHC contacts 15 to 30 fibers (Liberman 1982). Type I ANFs form synapses onto neurons in the cochlear nucleus, the most peripheral structure in the auditory brainstem. Type I ANFs are the main source of auditory input to the brain, comprising 90–95% of the total SGNs in cats. Their physiological responses to sound are well characterized in both normal-hearing and hearing-impaired animals. The anatomy and function of type II SGNs are less well characterized (Robertson et al. 1999; Benson and Brown 2004). These neurons are unipolar, with thin slow-conducting axons. Each type II SGN peripherally innervates a group of outer hair cells (OHCs) over a relatively broad cochlear region (Brown 1987). Their central projection targets are a diverse set of neurons in the cochlear nucleus (Benson and Brown 2004). Recent work suggests that type II SGNs may form the afferent limb of a cochlear gain-control circuit involving the olivocochlear efferent system (Froud et al. 2015; Fig. 8.1).

Fig. 8.1

The cochlea has afferent and efferent connections to the brainstem. Top: Schematized outline of a transverse brainstem section showing the lateral (LOC; green lines) and medial (MOC; thin blue and thin red lines) olivocochlear neurons and the olivocochlear bundle (OCB; gold). Blue arrows, inputs to the “crossed” pathways; red arrows, inputs to the “uncrossed” pathways. Bottom: Afferent and efferent innervation to the organ of Corti. Modified from Guinan (2013), with permission

The cochlea receives efferent innervation from neurons located in the superior olivary complex (SOC; Fig. 8.1). There are two types of olivocochlear efferent neuron: medial olivocochlear (MOC) and lateral olivocochlear (LOC) cells. Both MOC and LOC neurons receive excitatory input from the cochlear nucleus. MOC and LOC fibers travel in the vestibular nerve until close to the cochlea where they join auditory afferent fibers at the vestibulocochlear anastomosis. MOC fibers form cholinergic synapses bilaterally onto OHCs. These synapses are thought to function as a negative-feedback control system. MOC activation hyperpolarizes OHCs (Fuchs 2002), decreases their electromotility, and thereby reduces the gain of cochlear signal transduction. The “crossed” MOC pathway innervates the ipsilateral cochlea, and the “uncrossed” MOC pathway innervates the contralateral cochlea. The input to the MOC comes from the contralateral cochlear nucleus via the decussating fibers of the trapezoid body, so the “crossed” pathway is effectively “double crossed.” In small mammals, the “crossed” pathway is two to three times stronger than the “uncrossed” pathway, both in terms of number of efferent fibers and the strength of the physiologically characterized reflex (Liberman et al. 1996). This may not be true in humans, where ipsilateral and contralateral MOC reflexes appear similar in strength.

LOC neurons have thin, unmyelinated axons that are difficult to record from or to stimulate separately from MOC axons. Therefore, relatively little is known about the physiology of LOC efferents. They project primarily to the ipsilateral cochlea and synapse onto type I afferent fibers immediately beneath IHCs rather than onto the hair cells. LOC neurons are cytochemically heterogeneous (Darrow et al. 2006b), receive excitatory and inhibitory synaptic inputs with slow kinetics (Sterenborg et al. 2010), and elicit either slow (τ ~ 10 min) excitation or slow suppression of type I ANF output (Groff and Liberman 2003). For a detailed review of the olivocochlear efferent system, see Guinan (2006).

8.1.2 Noise-Induced Damage to Cochlear Anatomy

Noise overexposure is the main cause of preventable hearing impairment worldwide (Le Prell and Henderson 2012). Noise exposure causes mechanical and metabolic stress that can lead to temporary or permanent reductions in sensitivity through damage to or the death of OHCs and IHCs and supporting structures of the cochlea (Hu 2012). In cases of intense noise exposure, focal lesions (cochlear regions with >50% hair cell loss or “dead regions”) can occur. The cochlear extent of hair cell lesions depends on the spectral content, intensity, and duration of the noise exposure (Harding and Bohne 2009). High-frequency exposures create lesions limited to the cochlear base, whereas low-frequency exposures produce lesions spanning a broad region due to traveling-wave properties. Significant variability in cochlear damage occurs across individuals exposed to the same noise but not generally across ears within an individual (Bohne et al. 1986). Some listeners’ ears are “tender” and others “tough” (Cody and Robertson 1983). OHCs are generally more susceptible to noise exposure than IHCs in terms of cell survival (Hu 2012). This observation has led to the common audiological belief that mild-to-moderate hearing losses are primarily OHC based, with IHC damage only contributing to hearing losses greater than 60 dB (e.g., Moore 1995; Edwards 2004).

Single-neuron labeling studies have demonstrated that a combined measure of damage to IHC and OHC stereocilia is an accurate predictor of threshold shift in ANFs (Liberman and Dodds 1984). In contrast to their differing susceptibility to cell death, IHCs and OHCs appear to be similarly susceptible to stereocilial damage, with IHCs generally showing a greater fraction of stereocilial damage over broader cochlear regions than OHCs (Liberman and Dodds 1984; Hu 2012; Fig. 8.2). Thus, it is likely, as discussed in Sects. 8.4.2 and 8.7.1, that IHC dysfunction (in addition to OHC dysfunction) contributes to neural-coding deficits in many cases of mild-to-moderate sensorineural hearing loss (SNHL).

Fig. 8.2

The cochlear distribution and degree of inner hair cell (IHC) stereocilial damage from noise exposure are generally larger than they are for outer hair cells (OHCs). The fraction of IHC and OHC stereocilial damage is plotted versus the cochlear characteristic frequency (CF) associated with cochlear place (in octave difference relative to exposure-noise center frequency). Data were reanalyzed from Liberman and Dodds (1984) by averaging across animals exposed to narrowband noise with center frequencies from 1.5 to 5.5 kHz

Cochlear synaptopathy can occur immediately after exposure to moderate noise levels that produce only a temporary threshold shift (Kujawa and Liberman 2015). Such temporary threshold shifts had long been assumed to imply no permanent cochlear damage. However, loss of 30–50% of the IHC synapses in cochlear regions basal to the noise exposure has been demonstrated in mice and guinea pigs (Kujawa and Liberman 2009; Furman et al. 2013). Cochlear synaptopathy is thought to result from glutamate excitotoxicity (Pujol and Puel 1999) and precedes delayed (months to years) loss of SGNs. Synaptopathy is hypothesized to underlie a form of hearing impairment for suprathreshold sounds despite normal thresholds. This phenomenon is termed “hidden hearing loss” because it is not detectable by standard audiometric threshold measures (Schaette and McAlpine 2011; Kujawa and Liberman 2015).

8.1.3 Perceptual Effects of Noise Damage

8.1.3.1 Threshold Elevation

Perhaps the most obvious effect of noise damage is loss of audibility, so that soft sounds are no longer perceived. Loss of audibility corresponds to threshold elevation for single ANFs in that soft sounds no longer alter the firing pattern of an ANF compared with silence (in terms of neither firing rate nor temporal synchrony). This is typically due to mechanically damaged and tangled hair cell stereocilia or, in more severe cases, hair cell death (see Sect. 8.1.2).

8.1.3.2 Loss of Frequency Selectivity

The ability to perceptually resolve frequency components of complex sounds is a fundamental component of audition (Fletcher 1940). This relies on ANF frequency selectivity, reflecting mechanical basilar membrane responses (Sellick et al. 1982). The cochlea is commonly thought of as a frequency analyzer that decomposes incoming broadband acoustic energy into narrow frequency bands. Damage to OHCs produces broadened frequency tuning in ANFs (see Sect. 8.3.3). Therefore, even when sounds are made audible for hearing-impaired listeners, loss of frequency selectivity interferes with their ability to analyze complex sounds such as speech and music (Middlebrooks et al. 2016).

8.1.3.3 Impaired Speech-in-Noise Intelligibility

Speech intelligibility in environments with high levels of background noise and reverberation is dramatically reduced for hearing-impaired people (Duquesnoy 1983; Festen and Plomp 1990). Hypothesized deficits in temporal coding of sounds have been proposed as a mechanism underlying this degraded suprathreshold processing (Moore 2014).

8.1.3.4 Abnormal Growth of Loudness and Hyperacusis

Altered perception of loudness is a common effect of cochlear hearing loss. Despite elevated thresholds, loudness is often near normal at high sound levels (~90-dB sound pressure level [SPL]). This “catching up” of loudness results in a reduced dynamic range over which listeners with SNHL transition from just audible to painfully loud conditions. This “loudness recruitment” is typically described as an abnormally steep growth of loudness with increasing sound level. The severity of loudness recruitment generally scales with the degree of hearing loss; however, individual differences are common (Smeds and Leijon 2011). Hyperacusis, an intolerance (or hypersensitivity) to sound levels that would not be bothersome to normal-hearing listeners (Baguley 2003), is a related phenomenon in listeners with SNHL. Questions remain about the underlying neural basis of loudness recruitment and hyperacusis (see Sect. 8.4.2). At least part of the intersubject variability likely arises from individual differences in cochlear pathophysiology (e.g., OHC versus IHC dysfunction; Moore and Glasberg 2004).

8.2 Afferent Fiber Spontaneous Activity

8.2.1 Spontaneous Rate Categories

In the absence of acoustic stimulation, ANFs fire spontaneous spikes at rates between 0 and >100 spikes per second. The population spontaneous rate (SR) distribution is not unimodal (Liberman 1978). Three SR groups are usually described: low (SR < 1 spike/s), medium (1 ≤ SR < 18 spikes/s), and high (SR ≥ 18 spikes/s). The SR correlates with many morphological and physiological characteristics of ANFs. High-SR fibers synapse on the pillar side of IHCs and are relatively thick (~1 µm diameter), whereas low-/medium-SR fibers synapse on the modiolar side of the IHC and are thinner (~0.5 µm diameter; Liberman 1980, 1982). The threshold at the characteristic frequency (CF) and the dynamic range are inversely related to SR within the same CF region (Liberman 1978), with important implications for coding a wide range of sound levels (see Sect. 8.4.1). Temporal synchrony to fine structure and to envelope components of sounds is typically higher in low-SR fibers than in high-SR fibers (Johnson 1974; Joris and Yin 1992).
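
As a minimal sketch of the three-group convention, the cutoffs quoted above (1 and 18 spikes/s) can be written as a small classifier. The function name and the example rates are hypothetical and serve only to show how the boundary values are assigned.

```python
def classify_sr(sr_spikes_per_s: float) -> str:
    """Assign a fiber to an SR group using the cutoffs quoted in the text."""
    if sr_spikes_per_s < 0:
        raise ValueError("spontaneous rate cannot be negative")
    if sr_spikes_per_s < 1.0:
        return "low"
    if sr_spikes_per_s < 18.0:
        return "medium"
    return "high"


# Hypothetical spontaneous rates (spikes/s), invented for illustration only.
example_rates = [0.2, 0.9, 1.0, 17.9, 18.0, 75.0]
groups = [classify_sr(r) for r in example_rates]
```

Note that a fiber with SR of exactly 1 spike/s falls in the medium group and one with exactly 18 spikes/s falls in the high group, following the inequalities as stated.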

8.2.2 Selective Loss of Low-Spontaneous Rate Fibers

Cochlear synaptopathy (see Sect. 8.1.2) involves primarily excitotoxic damage to, and subsequent loss of, low-SR ANFs (Furman et al. 2013). This bias in susceptibility may be related to the relatively low number of mitochondria in low-SR SGNs compared with their high-SR counterparts (Liberman 1980). Mitochondria are important in buffering Ca2+ overload, which is a key trigger for excitotoxic neural damage.

8.3 Frequency Tuning

8.3.1 Physiological Measures of Auditory Nerve Fiber Frequency Tuning

Frequency selectivity is a fundamental response property of auditory neurons in that a narrow range of acoustic frequency components drive the response of a neuron. In ANFs, this frequency tuning largely reflects the mechanics of cochlear signal transduction (Narayan et al. 1998). The frequency to which an ANF is most sensitive (at which detection threshold is lowest) is called the CF. The CF derives from the particular characteristic place along the basilar membrane to which the ANF connects (Liberman 1982). The CF is typically determined from a frequency-threshold tuning curve, which is constructed by adaptively varying the sound level to find the level for each tone frequency that causes a criterion increase in spike rate (Liberman 1978). In addition, it is typical to quantify the “sharpness” of tuning using the tuning-curve quality factor at 10 dB above the CF threshold (Q10dB = CF/10-dB bandwidth).
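
The Q10dB definition can be made concrete with a short sketch that takes a sampled tuning curve, picks the CF as the frequency of lowest threshold, and finds the bandwidth 10 dB above that threshold. The function name is hypothetical, and interpolating the crossing points linearly on a log-frequency axis is an assumption made here for illustration, not a procedure described in the cited studies.

```python
import math


def q10_from_tuning_curve(freqs_hz, thresholds_db):
    """Estimate (CF, Q10dB) from a sampled frequency-threshold tuning curve.

    freqs_hz must be in ascending order. The two frequencies at which the
    curve reaches CF threshold + 10 dB are found by linear interpolation
    on a log-frequency axis (an illustrative assumption).
    """
    if len(freqs_hz) != len(thresholds_db) or len(freqs_hz) < 3:
        raise ValueError("need matching lists with at least three points")

    i_cf = min(range(len(freqs_hz)), key=lambda i: thresholds_db[i])
    cf = freqs_hz[i_cf]
    criterion = thresholds_db[i_cf] + 10.0

    def crossing(step):
        """Walk away from the CF (step = -1 or +1) until the criterion is met."""
        i = i_cf
        while 0 <= i + step < len(freqs_hz):
            j = i + step
            if thresholds_db[j] >= criterion:
                # Point i is below the criterion, point j is at or above it.
                frac = (criterion - thresholds_db[i]) / (
                    thresholds_db[j] - thresholds_db[i]
                )
                log_f = math.log(freqs_hz[i]) + frac * (
                    math.log(freqs_hz[j]) - math.log(freqs_hz[i])
                )
                return math.exp(log_f)
            i = j
        raise ValueError("curve never reaches CF threshold + 10 dB on one side")

    bandwidth_hz = crossing(+1) - crossing(-1)
    return cf, cf / bandwidth_hz


# Hypothetical V-shaped curve: 20-dB tip at 2 kHz, rising 40 dB per octave.
freqs = [2000.0 * 2.0 ** (k / 4.0) for k in range(-4, 5)]
thresholds = [20.0 + 10.0 * abs(k) for k in range(-4, 5)]
cf_hz, q10 = q10_from_tuning_curve(freqs, thresholds)
```

For this made-up curve the 10-dB bandwidth spans half an octave centered on the CF, so Q10dB comes out near 2.9. Real tuning curves are asymmetric, with a steep high-frequency edge and a shallow low-frequency tail.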

ANF tuning curves are often interpreted to represent the gain functions (albeit inverted) of cochlear band-pass filters. However, this is only partially true because there are nonlinear effects that complicate cochlear tuning as a function of sound level (Eustaquio-Martín and Lopez-Poveda 2011). Arguably, more accurate measures of tuning can be obtained by estimating the system impulse response for a fixed-level broadband stimulus (de Boer and de Jongh 1978; Recio-Spinoso et al. 2005). Filters obtained using this approach can be more explicitly thought of as true gain functions between the input to the cochlea and the output of SGNs.

8.3.2 Relating Physiological and Behavioral Estimates of Frequency Tuning

It is widely believed that behavioral frequency selectivity is a direct reflection of cochlear filtering. Animal experiments have shown a close correspondence between ANF tuning-curve bandwidths and behavioral frequency-resolution estimates in the same species (Evans 2001). Recent behavioral estimates of human cochlear tuning have emphasized that bandwidths are substantially smaller when derived using a forward-masking rather than a simultaneous-masking paradigm (Oxenham and Shera 2003). The discrepancy between tuning estimates from forward-masking and simultaneous-masking experiments can be understood by remembering that cochlear signal transduction involves an active nonlinear process. Due to saturation of the nonlinearity, one spectral component can reduce the response to another component (“two-tone suppression”; see Sect. 8.6.2). This reduction in cochlear-amplifier gain for broadband sounds increases the apparent bandwidth of filter estimates in simultaneous-masking paradigms. Putting this same argument in terms of single-unit physiology suggests tuning curves derived from responses to single tones may reflect sharp tuning that is rarely achievable under everyday listening conditions where broadband signals such as speech are typically encountered.

Noninvasive techniques for estimating frequency tuning based on stimulus-frequency otoacoustic emissions (SFOAEs) have been applied to many species, including humans (Shera et al. 2002; Bergevin, Verhulst, and van Dijk, Chap. 10). SFOAEs are sounds emitted from the cochlea in response to incoming sounds as a by-product of the mechanical amplification responsible for sharp basilar membrane tuning. The delay of SFOAEs is related to tuning bandwidth (i.e., sharper filters have longer impulse responses). Some evidence from this approach suggests that human cochlear filters are significantly sharper (by a factor of ~2–3) than those of other mammals (Shera et al. 2010). The precise interpretation of SFOAE data related to the question of whether human cochlear filter bandwidths differ substantially from those in other animals has been, and continues to be, controversial. Different experimental approaches and interpretations of data can yield very different results (Ruggero and Temchin 2005; Manley and van Dijk 2016). However, a close correspondence between SFOAE-derived tuning estimates and invasive physiological measures of tuning has been established in several animal models (Shera et al. 2010; Joris et al. 2011). Interestingly, ANF threshold tuning curves in old-world monkeys are a factor of two narrower than threshold tuning curves from cat ANFs at the same CF (Joris et al. 2011).
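
The statement that sharper filters have longer impulse responses can be illustrated with a textbook second-order resonator. This is only a stand-in for a cochlear filter (an assumption made here for illustration, not a model taken from the SFOAE studies cited above). For center frequency f_0 and quality factor Q > 1/2, where Q is defined on the 3-dB bandwidth rather than the Q10dB used for tuning curves, the impulse response is a decaying sinusoid:

```latex
h(t) \propto e^{-t/\tau}\,\sin\!\left(2\pi f_0 t\,\sqrt{1 - \tfrac{1}{4Q^{2}}}\right),
\qquad
\tau = \frac{Q}{\pi f_0},
\qquad
f_0\,\tau = \frac{Q}{\pi}
```

Doubling Q therefore doubles the number of cycles over which the resonator rings, which is the qualitative sense in which a longer delay points to sharper tuning.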

8.3.3 Selective Effects of Inner and Outer Hair Cell Damage on Tuning

Anatomical evidence that both IHC and OHC damage occur with acoustic trauma (see Sect. 8.1.2) suggests that functional effects of hair cell damage must be considered at the output of the cochlea (the AN) or higher, because basilar-membrane responses do not reflect the contribution of IHC dysfunction. The most significant difference between IHC and OHC dysfunction is in terms of frequency selectivity, which can be understood by considering the differing roles of IHCs and OHCs. IHCs transduce basilar membrane motion into neural responses in individual ANFs (Corey, Ó Maoiléidigh, and Ashmore, Chap. 4), whereas OHCs are responsible for the high sensitivity and sharp tuning characteristic of normal hearing (Santos-Sacchi, Navaratnam, Raphael, and Oliver, Chap. 5; Gummer, Dong, Ghaffari, and Freemann, Chap. 6). High sensitivity allows very soft sounds to be heard, whereas sharp tuning allows sounds that differ in spectral features to be discriminated.

Liberman and Dodds’ (1984) study established a strong correlation between abnormalities in ANF tuning-curve shapes after cochlear damage and hair cell status at the characteristic place of ANFs (Fig. 8.3). Hair cell stereocilial status is much more strongly correlated with threshold and tuning than is hair cell survival. An important implication of this observation is that hair cells are not simply either normally functioning or completely dysfunctional, but rather they can range in functionality after cochlear insult. Normal ANF tuning curves have well-defined “tip” and “tail” regions for CFs greater than ~2–3 kHz in cats, which requires normal stereocilia on both IHCs and OHCs (Fig. 8.3a). Selective OHC loss and/or significant OHC stereocilial damage is associated with elevated tips and hypersensitive tails (“W-shaped” tuning curves; Fig. 8.3b). Complete OHC loss (Fig. 8.3c) results in the complete absence of a tip, with the resulting broad “bowl-shaped” tuning curves thought to represent passive cochlear tuning after loss of the cochlear active process. Damage to IHC stereocilia (Fig. 8.3d), specifically disarray or loss of the tallest row of stereocilia, results in an elevated threshold at all frequencies, with very little effect on frequency selectivity. Passive (“component 2”; see Sect. 8.4.2) response properties, including severely elevated thresholds (90- to 100-dB SPL), greatly broadened tuning, and steepened rate-level functions (not shown) are observed when there is a complete absence of the tallest row of IHC stereocilia without damage to the shorter stereocilia (Fig. 8.3e). Note that all hair cell damage scenarios shown in Fig. 8.3b–d represent a 40-dB shift in ANF threshold, despite quite different frequency-selectivity effects, and thus represent a possible peripheral basis for individual differences in speech recognition despite similar audiometric thresholds.

Fig. 8.3

Schematic representation of the structure-function correlation between IHC (red) and OHC (gold) damage (left) and frequency selectivity (right) characterized by pure-tone auditory nerve fiber (ANF) tuning curves (black and gray, normal; blue, impaired). a Normal IHC and OHC function. b Selective OHC loss and/or significant OHC stereocilial damage. c Complete loss of OHCs. d Disarray or loss of tallest row IHC stereocilia. e Complete absence of tallest row IHC stereocilia. SPL, sound pressure level. Modified from Liberman and Dodds (1984), with permission

8.3.4 Advances in Computational Modeling of Cochlear Impairment

Knowledge of the mechanistic and physiological aspects of cochlear function has increased dramatically in the decades since the first book on the cochlea in the Springer Handbook of Auditory Research (Dallos et al. 1996). Much of this knowledge is captured in computational auditory-nerve models, which simulate neural spike trains for single ANFs in response to arbitrary sounds. Several general translational applications (relating physiological properties to perception, modeling the effects of IHC vs. OHC dysfunction) have focused this work on phenomenological auditory-nerve models (reviewed by Heinz 2010, 2016). This style of model is not based on cochlear biophysics but rather represents the salient signal-processing modules for transduction and focuses on simulating response properties at the output of the cochlea.

Significant advances have been made in modeling the effects of IHC and OHC dysfunction on the neural coding of complex sounds (Bruce et al. 2003; Zilany et al. 2014). A key insight toward modeling OHC dysfunction was the recognition that a number of nonlinear cochlear-response properties (compression, level-dependent tuning, and suppression) were all related to a single OHC-based cochlear gain-control mechanism (Patuzzi 1996). Thus, auditory-nerve models that represent each of these nonlinear response properties as depending on the functionality of a single OHC-based module are able to successfully model the systematic effects of OHC dysfunction with a single parameter, including broadened tuning and loss of compression (Carney 1993). Similarly, a single parameter has been used to model the range of IHC dysfunction, including effects on the shapes and slopes of ANF rate-level functions (see Sect. 8.4.2) and speech coding (see Sect. 8.7.1). Recent advances include adding synaptic power-law dynamics, which improved modulation coding and dynamic-range adaptation (Zilany and Carney 2010), and modeling the temporal dynamics and level dependence for efferent control of cochlear tuning (Smalt et al. 2014). It is certainly difficult to understand the combined (and often confounding) effects of IHC and OHC dysfunction experimentally. Phenomenologically based computational models provide great potential for disentangling the relative contributions of IHC and OHC dysfunction and may help in the development of diagnostic tests and rehabilitative strategies for addressing some of the individual differences that currently challenge audiology.

8.4 Coding of Sound Level

8.4.1 Afferent Rate-Level Functions and the “Dynamic-Range Problem”

Absolute and relative sound levels are fundamental acoustic attributes underlying many aspects of auditory processing. Listeners with normal hearing use level cues to perceive speech robustly across a wide range of signal and background-noise levels. In contrast, listeners with cochlear damage struggle with a limited dynamic range and experience difficulty understanding speech in background noise. Despite the fundamental importance of level coding, the discrepancy (“dynamic-range problem”) between the wide perceptual dynamic range (up to ~120 dB) and the limited dynamic range of individual neurons (~30–50 dB) remains unresolved (Viemeister 1988; reviewed by Heinz 2012).

Level coding in single neurons is typically characterized by a rate-level function (Fig. 8.4c), where the discharge rate averaged over the duration of a CF tone is plotted against sound level. For ANFs, rate varies monotonically with sound level, increasing from spontaneous rate to maximum (saturated) rate but only over a limited range of sound levels (less than ~50 dB). Both threshold sound level and dynamic range depend systematically on SR (see Sect. 8.2), with high-SR fibers having lower thresholds and smaller dynamic ranges than low-SR fibers (Sachs and Abbas 1974). The wider dynamic ranges of low-SR fibers can be understood from the interaction of their higher thresholds and the sound level of basilar-membrane compression onset (Fig. 8.4b, c). Because low-SR ANFs saturate at SPLs above compression onset, the shallower basilar-membrane responses create shallower ANF rate-level functions, extending their dynamic range (Sachs and Abbas 1974; Yates 1990).
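
The interaction between fiber threshold and compression onset can be sketched with a toy calculation. It assumes a basilar-membrane input-output function that grows at 1 dB/dB below a compression onset and at a shallower slope above it, and it assumes that every fiber needs the same amount of basilar-membrane growth to go from threshold to saturation. All function names and numerical values (40-dB onset, 0.25-dB/dB compressive slope, 20-dB drive range, 5- and 30-dB thresholds) are hypothetical and are not fitted to data.

```python
def bm_output_db(level_db_spl, onset_db=40.0, slope=0.25):
    """Toy basilar-membrane input-output function (dB in, dB out).

    Growth is linear (1 dB/dB) below the compression onset and compressive
    (slope dB/dB) above it. Both parameter values are illustrative.
    """
    if level_db_spl <= onset_db:
        return level_db_spl
    return onset_db + slope * (level_db_spl - onset_db)


def level_for_bm_output(bm_db, onset_db=40.0, slope=0.25):
    """Inverse of bm_output_db: sound level needed for a given BM output."""
    if bm_db <= onset_db:
        return bm_db
    return onset_db + (bm_db - onset_db) / slope


def dynamic_range_db(threshold_db_spl, drive_range_db=20.0):
    """Sound-level range between fiber threshold and saturation.

    Assumes the fiber saturates once BM output has grown by drive_range_db
    above its value at threshold (a hypothetical, fiber-independent value).
    """
    bm_at_threshold = bm_output_db(threshold_db_spl)
    saturation_level = level_for_bm_output(bm_at_threshold + drive_range_db)
    return saturation_level - threshold_db_spl


# Low-threshold fiber: saturates entirely below the compression onset.
high_sr_range = dynamic_range_db(threshold_db_spl=5.0)
# Higher-threshold fiber: most of its growth falls in the compressive region.
low_sr_range = dynamic_range_db(threshold_db_spl=30.0)
```

With these made-up numbers the low-threshold fiber has a 20-dB dynamic range and the higher-threshold fiber a 50-dB range, even though both need the same basilar-membrane growth to saturate.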

Fig. 8.4

Schematic of the relationship between perceptual and physiological representations of sound level for normal (blue) and impaired (red) hearing. a Typical loudness functions, with reduced dynamic range after cochlear damage. b and c Effects of OHC loss on input-output functions for basilar-membrane (BM) and ANF responses, respectively. c, solid lines: low-threshold, high-spontaneous rate (HSR) ANFs; dashed lines: higher threshold, low-SR (LSR) fibers

Several quantitative studies have attempted to solve the dynamic-range problem, each highlighting response properties that extend the neural dynamic range to account for more (but still not all) of the perceptual dynamic range (Siebert 1968; Colburn et al. 2003). Predictions of optimal just-noticeable differences (JNDs) in intensity, which factor in Poisson-like variability of ANF discharge counts, demonstrate that individual ANF rate saturation can be overcome by considering pooled ANF responses that include spread of excitation to CFs away from the tone frequency (Siebert 1968; Delgutte 1996). Although spread of excitation extends the predicted perceptual dynamic range for tones in quiet, robust intensity discrimination in notched-noise maskers suggests that a wide dynamic range must be accounted for by a restricted CF range (Viemeister 1983). A wider dynamic range within a limited set of CFs can be obtained by pooling across all SR groups because the low-SR ANFs have higher thresholds and wider dynamic ranges (Fig. 8.4c). However, predicted-intensity JNDs still increase significantly above 40- to 50-dB SPL due to Poisson variability (increased variance with increased rate) and thus do not account for human intensity JNDs remaining robust up to ~100-dB SPL (Delgutte 1987; Winslow and Sachs 1988). Alternate mechanisms have been proposed to help solve the dynamic-range problem, such as level-dependent temporal synchrony and phase responses (Heinz et al. 2001; Colburn et al. 2003), dynamic-range adaptation (see Sect. 8.4.3), and efferent feedback (see Sect. 8.8).
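
The Poisson argument can be sketched numerically. For a spike count with mean N(L) and Poisson variance N(L), the squared sensitivity to a 1-dB level change is approximately (dN/dL)^2 / N, and for independent fibers these squared sensitivities add. The sketch below uses that standard signal-detection approximation with logistic rate-level functions. The logistic shape, the 200-ms counting window, and every fiber parameter are assumptions chosen for illustration and do not reproduce the cited predictions.

```python
import math


def rate(level_db, sr, rmax, midpoint_db, slope_db):
    """Logistic rate-level function in spikes/s (the shape is an assumption)."""
    return sr + (rmax - sr) / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))


def squared_sensitivity_per_db(level_db, fiber, duration_s=0.2, step_db=0.01):
    """(dN/dL)^2 / N for a Poisson spike count N = rate * duration."""
    n_lo = duration_s * rate(level_db - step_db, **fiber)
    n_hi = duration_s * rate(level_db + step_db, **fiber)
    n_mid = duration_s * rate(level_db, **fiber)
    dn_dl = (n_hi - n_lo) / (2.0 * step_db)  # central-difference slope
    return dn_dl ** 2 / n_mid


def jnd_db(level_db, fibers):
    """Level JND (d' = 1) for independent fibers: information adds across fibers."""
    total = sum(squared_sensitivity_per_db(level_db, f) for f in fibers)
    return 1.0 / math.sqrt(total)


# Hypothetical fibers: low-threshold/high-SR and high-threshold/low-SR.
high_sr = dict(sr=60.0, rmax=250.0, midpoint_db=15.0, slope_db=4.0)
low_sr = dict(sr=0.5, rmax=200.0, midpoint_db=50.0, slope_db=8.0)

jnd_high_only_20 = jnd_db(20.0, [high_sr])
jnd_high_only_80 = jnd_db(80.0, [high_sr])
jnd_pooled_20 = jnd_db(20.0, [high_sr, low_sr])
jnd_pooled_80 = jnd_db(80.0, [high_sr, low_sr])
```

In this toy case the saturated high-SR fiber alone carries almost no level information at 80 dB SPL, and adding the low-SR fiber restores some of it. The pooled JND is still larger at 80 dB than at 20 dB, in line with the qualitative conclusion above that pooling across SR groups helps but does not by itself remove the degradation at high levels.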

The benefit of low-SR ANFs for level coding is particularly interesting with respect to cochlear synaptopathy and hidden hearing loss (see Sects. 8.1.2 and 8.2). Low-SR fibers are more resilient to the saturating effects of background noise than high-SR fibers, suggesting that they may be important for listening in noise (Costalupes et al. 1984; Young and Barta 1986). Because low-SR fibers are more susceptible to moderate noise exposures, cochlear synaptopathy is an important factor to consider in explaining the difficulties listeners face in understanding speech in noisy environments, even with normal audiograms (Kujawa and Liberman 2015).

8.4.2 Loudness Correlates in Normal and Impaired Hearing

Loudness is a subjective measure of the perceived magnitude of a sound. The neural correlates of normal and impaired loudness perception remain unresolved (Heinz 2012). A long-standing hypothesis is that loudness relates to the total number of spikes in the ANF population (Fletcher and Munson 1933). Two important factors contribute to total ANF-response growth with sound level: response growth for ANFs with CFs at the tone frequency and recruitment of additional CFs above and below the tone frequency. Analytical AN models incorporating these factors demonstrate a power-law relationship between whole-nerve rate and sound level similar to perceptual loudness growth (Goldstein 1974). However, this spread-of-excitation model predicts a reduction in loudness growth in high-pass noise that is much more severe than observed perceptually. Experimental approaches using a gross action potential measured from the AN trunk to evaluate the summed AN spike-count hypothesis show that total activity grows less steeply than loudness, with a frequency dependency that is inconsistent with the frequency-independent perceptual loudness growth above 1 kHz (Relkin and Doucet 1997). Furthermore, although experimental approaches show a general ability of the total spike count hypothesis to account for loudness summation (i.e., the effect of bandwidth on loudness), there are significant inconsistencies with perceptual data (Pickles 1983).

Loudness recruitment is a common effect of cochlear hearing loss (Fig. 8.4a; see Sect. 8.1.3.4). At first glance, recruitment appears to be well accounted for in terms of altered basilar membrane response growth after OHC dysfunction (Fig. 8.4b). In line with this observation, it has been hypothesized that steeper ANF response growth could be a correlate of loudness recruitment. However, experimental support for this idea is largely limited to specific near-CF and midlevel conditions (Harrison 1981). A theoretical evaluation of the predicted effects of OHC dysfunction on ANF rate-level function slopes suggests that only a subset of ANFs become steeper with impairment (Sachs and Abbas 1974; Heinz and Young 2004). Because the dynamic range of low-SR fibers depends on basilar-membrane compression (Fig. 8.4c; see Sect. 8.4.1), their response growth is predicted to become steeper. In contrast, the dynamic range of high-SR ANFs is entirely within the linear portion of basilar-membrane response growth for both normal and impaired systems, so their response growth is predicted to be largely unchanged. When studied empirically, tone rate-level functions after acoustic trauma were actually shallower than normal (Heinz and Young 2004; Fig. 8.5a). This unexpected outcome can be understood in terms of the confounding effects of IHC stereocilial damage, which has been hypothesized and modeled to make the IHC transduction function shallower than normal, thus counteracting the steepening effects of OHC dysfunction (Zilany and Bruce 2006; Fig. 8.5b; see Sect. 8.3.4). Very steep responses were observed in some fibers, but they were limited to cases in which fiber thresholds were severely elevated (>80 dB) and thus were inconsistent with perceptual findings that loudness recruitment occurs for all degrees of SNHL. These very steep, high-threshold rate-level functions likely represent “component 2” responses associated with an entirely passive cochlea (Liberman and Kiang 1984). These high-level irregularities have been largely ignored in the interpretation of studies of normal hearing but may be extremely relevant for hearing-impaired listeners because hearing aids operate at these high SPLs. Furthermore, alternative hypotheses that considered pooling spike counts across ANFs did not account for loudness recruitment (Heinz et al. 2005). However, steeper response growth has been observed in certain ventral cochlear nucleus cell types, perhaps consistent with a contribution from central compensatory mechanisms (Cai et al. 2009).

Fig. 8.5

In contrast to the predicted effects of OHC dysfunction, ANF rate-level slopes are on average shallower than normal after acoustic trauma. a Distributions of cat ANF rate-level slopes for normal hearing and mild and moderate-severe hearing losses. b Schematic of the opposing effects of OHC and IHC dysfunction on ANF level coding. Left: BM responses for normal hearing (nh; blue) and OHC dysfunction (hiOHC; red); middle: IHC transduction for normal hearing and IHC dysfunction (hiIHC; green); right: ANF rate-level functions for HSR (solid) and LSR fibers (dashed) for normal hearing and OHC dysfunction (red) and combined OHC and IHC dysfunction (green). a redrawn from Heinz and Young (2004); b modified from Heinz et al. (2005), with permission

8.4.3 Dynamic-Range Adaptation

Although natural sounds (e.g., speech) occur across a wide range of levels, in the short term (i.e., a single listening situation), they tend to have a fairly narrow (~30-dB) dynamic range that is comparable to that of individual neurons (see Sect. 8.4.1). Recent evidence shows that neurons throughout the auditory system can rapidly shift their dynamic range toward the most frequently occurring sound level (Dean et al. 2005; Wen et al. 2012; Fig. 8.6a), suggesting that this mechanism helps to solve the dynamic-range problem. Originally described in inferior colliculus neurons, this adaptive level coding also occurs (to a smaller degree) in ANF responses (Wen et al. 2012), perhaps related to synaptic power-law dynamics (Zilany and Carney 2010). Quantitative pooled-response analyses of the more prominent inferior colliculus effects show that level-coding information still degrades at high SPLs, suggesting that this effect may not be sufficient to resolve the dynamic-range problem entirely (Fig. 8.6b). However, because this mechanism does appear to contribute to normal auditory processing, it will be important to determine whether the selective loss of low-SR ANFs in cochlear synaptopathy reduces the robustness of level coding in cases of hidden hearing loss (see Sects. 8.1.2 and 8.2).

Fig. 8.6

The dynamic range over which individual-neuron firing rates provide information to discriminate small increments in sound level can adapt toward the most frequently occurring sound levels. a Rate-level functions (thick blue lines) for an individual inferior colliculus neuron for mean sound levels of 39 (solid blue) and 63 (dashed blue) dB SPL. Red lines, level-discrimination information computed from the response of this neuron. b Pooled population information versus sound level for four different mean sound level conditions. Horizontal bars, high-probability ranges. Data redrawn from Dean et al. (2005) and reprinted from Heinz (2012), with permission

8.5 Temporal Coding

8.5.1 Phase Locking to Temporal Fine Structure and Envelope

Any acoustic signal can be mathematically decomposed into the product of a rapid “temporal fine structure” (TFS) oscillation and a slower “envelope” (ENV) oscillation. The earliest descriptions of ANF tone responses noted that the timing of spikes from low-CF fibers is a function of the stimulus phase (Galambos and Davis 1943; Rose et al. 1967). Spikes are “phase locked” to the pattern of oscillation induced on the basilar membrane (Fig. 8.7b–d). In this way, TFS is represented in the temporal pattern of spiking. However, ANF spike phase locking is not typically on a cycle-by-cycle “entrained” basis. Phase-locking strength is commonly quantified by the “synchronization index” (a.k.a. “vector strength”). This metric is the magnitude of the vector sum over all spike times, normalized by the number of spikes, where each spike is considered a unit vector with phase corresponding to the instantaneous stimulus phase (Goldberg and Brown 1969). Equivalently, the synchronization index is the ratio of the magnitude of the Fourier component at the stimulus frequency to the zero-frequency (d.c.) component of a histogram of spike phases (a “period histogram”; Johnson 1980). The temporal information present in phase-locked spike trains is thought to be critical for many aspects of normal audition, such as sound localization and musical pitch perception. Moreover, disorders of auditory perception may reflect abnormal phase locking after damage to the nervous system (Moore 2014).
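
The synchronization index described above can be illustrated with a minimal sketch. The function name and the two spike trains are hypothetical and chosen only for illustration; they are not data from any of the cited studies. Each spike is treated as a unit vector at the stimulus phase at which it occurred, and the magnitude of the vector sum is divided by the number of spikes so that the result lies between 0 (no phase preference) and 1 (all spikes at one phase).

```python
import cmath
import math


def synchronization_index(spike_times_s, stimulus_freq_hz):
    """Vector strength: magnitude of the mean unit vector over spike phases."""
    if not spike_times_s:
        raise ValueError("need at least one spike")
    # Each spike contributes a unit vector whose angle is the stimulus phase
    # at the moment the spike occurred.
    vectors = [cmath.exp(2j * math.pi * stimulus_freq_hz * t) for t in spike_times_s]
    return abs(sum(vectors)) / len(vectors)


# Hypothetical spike trains for a 500-Hz tone (period 2 ms); illustrative only.
# One spike per cycle, always at the same phase:
locked = [0.002 * k + 0.0005 for k in range(100)]
# One spike per cycle, cycling evenly through four phases:
spread = [0.002 * k + 0.0005 * (k % 4) for k in range(100)]

print(round(synchronization_index(locked, 500.0), 3))
print(round(synchronization_index(spread, 500.0), 3))
```

In the first train every spike falls at the same stimulus phase, so the index is 1. In the second, spikes are distributed evenly over four phases a quarter cycle apart, the unit vectors cancel, and the index is essentially 0. Note that the first train fires on every cycle only for simplicity; the index depends on where within the cycle spikes occur, not on whether every cycle is represented.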

Fig. 8.7

ANF phase locking to single tones. a Intracellular single-IHC recordings in response to 50-ms tones at 0.3–5 kHz. Modified from Palmer and Russell (1986), with permission. Blue arrows, alternating current (a.c.) component for the 2-kHz response; red arrows, direct current (d.c.) component for the 2-kHz response. b Spike-raster plot representing spike times recorded from a single low-SR chinchilla ANF (CF = 812 Hz) in response to a CF tone as a function of SPL (Sayles and Joris, unpublished data). c Period histogram of spike times expressed modulo stimulus phase. Spikes are well synchronized to the stimulus, with a synchronization index (SI) of 0.81. d CF tone synchronization index versus frequency. Replotted data are for guinea pig (population trend schematized; black; Palmer and Russell 1986), chinchilla (red; Sayles and Joris, unpublished data), cat (blue; Johnson 1980), and barn owl (population trend schematized; green; Köppl 1997)

ANF synchronization decreases with increasing stimulus frequency, reaching the measurement noise floor at some upper frequency limit (Johnson 1980; Palmer and Russell 1986). This upper frequency limit varies across species (Fig. 8.7d), whereas the degree of low-pass roll-off appears species invariant (Weiss and Rose 1988a). For cats, the upper frequency limit is ~5 kHz (Johnson 1980). Other common laboratory mammals have slightly lower limits: guinea pig, ~3.5 kHz (Palmer and Russell 1986) and chinchilla, ~4.5 kHz (Temchin and Ruggero 2009). The barn owl is exceptional, having the highest reported phase-locking limit at ~9 kHz (Köppl 1997). The origin of the low-pass behavior of phase locking is commonly thought to relate to the capacitive filtering properties of the IHC membrane and other synaptic properties (Weiss and Rose 1988b). There is a close correspondence between the roll-off in the modulation of IHC receptor potentials and the roll-off in ANF synchronization index (Palmer and Russell 1986; Fig. 8.7a).

ANFs also synchronize spiking to ENV components of multicomponent sounds. The most common method of characterizing temporal coding of ENV is with sinusoidal amplitude-modulated (SAM) tones (Joris et al. 2004). Strength of ANF spike synchrony to ENV is typically a low-pass function of modulation frequency, with a 3-dB cutoff frequency that increases with CF. Because the filter bandwidth increases with the CF, the increase in temporal-modulation frequency cutoff has been interpreted to imply that filter bandwidth has a significant effect on ENV coding. However, the modulation cutoff frequency saturates for CFs greater than ~10 kHz in cats (Joris and Yin 1992). This observation implies the existence of a separate mechanism that limits ENV coding at high frequencies, perhaps akin to the roll-off in TFS phase locking.
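
For readers unfamiliar with the stimulus, a SAM tone can be sketched as the product of a slow sinusoidal envelope and a fast sinusoidal carrier. The function and parameter names below are hypothetical, and the particular carrier, modulation frequency, and sample rate are arbitrary illustrative choices rather than values from the cited studies.

```python
import math


def sam_tone(carrier_hz, modulation_hz, depth, duration_s, sample_rate_hz):
    """Return samples of [1 + depth*sin(2*pi*fm*t)] * sin(2*pi*fc*t)."""
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        # Slow envelope (ENV): oscillates between 1 - depth and 1 + depth.
        envelope = 1.0 + depth * math.sin(2 * math.pi * modulation_hz * t)
        # Fast carrier (TFS): unit-amplitude sinusoid at the carrier frequency.
        fine_structure = math.sin(2 * math.pi * carrier_hz * t)
        samples.append(envelope * fine_structure)
    return samples


# Illustrative parameters only: 4-kHz carrier, 100-Hz modulation, full depth.
stimulus = sam_tone(carrier_hz=4000.0, modulation_hz=100.0, depth=1.0,
                    duration_s=0.02, sample_rate_hz=48000.0)
print(len(stimulus), max(abs(s) for s in stimulus))
```

Synchrony to the envelope can then be quantified with the same synchronization index used for tones, evaluated at the modulation frequency rather than at the carrier frequency.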

8.5.2 Effects of Acoustic Trauma on Temporal Coding in Auditory Nerve Fibers

Psychophysical studies have suggested that deficits in TFS processing underlie some suprathreshold perceptual abnormalities for hearing-impaired listeners (Moore 2014). Studies in hearing-impaired chinchillas demonstrate that under quiet listening conditions, the phase-locking synchronization index is similar to that found in normal-hearing animals. However, in the presence of background noise, the ability of ANFs of hearing-impaired animals to synchronize to a CF tone is severely diminished (Henry and Heinz 2012; Fig. 8.8a). This finding can be understood conceptually in terms of broadened frequency tuning in impaired animals (see Sect. 8.3.3). Broader filters pass more total masking-noise power, effectively decreasing the signal-to-noise ratio within the passband of the receptive field of the neuron. Thus, although there is no evidence for fundamentally impaired temporal precision in the ANF spike-generation mechanism, a deficit in temporal coding emerges in noisy listening situations due to impaired basilar-membrane filtering. This physiological finding is consistent with the perceptual observation that noisy environments are particularly troublesome for hearing-impaired listeners (see Sect. 8.1.3.3).
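
The bandwidth argument can be made concrete with a deliberately idealized calculation. The sketch below assumes a rectangular filter, spectrally flat noise, and a tone whose level at the filter output is unchanged by the impairment; none of these assumptions comes from the cited study, and the function name and bandwidths are hypothetical. Under these assumptions the noise power passed by the filter is proportional to its bandwidth, so the within-filter signal-to-noise ratio falls by 10·log10 of the bandwidth ratio.

```python
import math


def snr_change_db(normal_bandwidth_hz, impaired_bandwidth_hz):
    """Change in within-filter SNR (dB) when only the filter bandwidth changes.

    Assumes flat-spectrum noise and an idealized rectangular filter, so the
    passed noise power scales linearly with bandwidth.
    """
    return -10.0 * math.log10(impaired_bandwidth_hz / normal_bandwidth_hz)


# Hypothetical bandwidths: a filter that doubles in width after impairment.
print(round(snr_change_db(500.0, 1000.0), 2))
```

With these illustrative numbers, doubling the bandwidth lowers the within-filter signal-to-noise ratio by about 3 dB. Real impaired tuning curves are not rectangular and impairment also changes tip sensitivity, so this is only a first-order intuition.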

Fig. 8.8

Effects of acoustic trauma on temporal coding of temporal fine structure (TFS) and envelope (ENV). a Synchronization index for ANF CF tone responses from normal-hearing (black) and hearing-impaired (red) chinchillas. Each panel shows responses at a different broadband (20-kHz) noise-to-tone overall-level ratio (in dB). Modified from Henry and Heinz (2012), with permission. b Ratio of ENV to TFS coding in ANF responses to sinusoidal amplitude-modulated (SAM) tones. Modified from Kale and Heinz (2010), with permission

ANFs from hearing-impaired animals show a remarkable loss of tonotopicity in TFS coding for complex sounds (Henry et al. 2016). In ANFs from hearing-impaired animals, TFS responses can be shifted down in frequency far away from the CF. For example, responses from fibers with CFs around 4 kHz become dominated by aberrant TFS information between 0.5 and 1 kHz. This loss of tonotopicity for complex sounds can be related conceptually to the interaction between W-shaped tuning curves (Fig. 8.3b) and the roll-off in phase locking (Heinz and Henry 2013; Fig. 8.7d). The perceptual consequences of this tonotopically altered brainstem input pattern for complex-sound perception are likely to be substantial.

Although perceptual studies suggest that humans with cochlear damage experience difficulty exploiting TFS-based cues, it is thought that ENV processing is less affected. Some studies have actually suggested improved amplitude-modulation coding for hearing-impaired listeners compared with normal-hearing controls (e.g., Moore et al. 1996). This finding has been interpreted to reflect a loss of basilar-membrane compression resulting from decreased OHC gain. Neurophysiological studies have examined ENV coding in ANF responses from chinchillas with noise-induced hearing loss (Kale and Heinz 2010, 2012; Henry et al. 2014). ENV coding increases in strength following noise-induced cochlear damage, especially in ANFs with very steep rate-level functions associated with severe IHC damage (Fig. 8.3e; see Sect. 8.4.2). Despite substantially broadened tuning curves, the upper frequency limit of modulation synchrony is unchanged (Kale and Heinz 2012). The major change in temporal-coding strength after noise-induced cochlear damage is a shift in the balance of TFS and ENV coding in mid-CF ANFs. Although hearing impairment does not change the fundamental ability of ANFs to code TFS, the CF range over which temporal coding of complex sounds transitions from predominantly TFS to predominantly ENV shifts substantially downward (Kale and Heinz 2010; Fig. 8.8b).

8.5.3 Effects of Acoustic Trauma on Across-Fiber Spatiotemporal Coding

Significant changes occur in the relative timing of phase-locked spikes across ANFs with slightly different CFs after acoustic trauma. These spatiotemporal patterns have been hypothesized to be perceptually relevant for speech (Shamma 1985), pitch (Loeb et al. 1983; Larsen et al. 2008), sound-level coding (Carney 1994; Heinz et al. 2001), tone-in-noise detection (Carney et al. 2002), and binaural processing of interaural time differences (Shamma et al. 1989; Joris et al. 2006). Furthermore, altered spatiotemporal patterns have been suggested to contribute to loudness recruitment (Carney 1994) and degraded processing of frequency modulation (Moore and Skrodzka 2002) after OHC dysfunction.

The primary origin of across-CF phase differences in ANFs responding to the same spectral feature is the cochlear traveling wave (Ruggero 1994), whereby basal locations respond to sound earlier than do apical locations. Phase differences across cochlear locations with CFs near a spectral feature also depend in part on the degree of resonance (cochlear gain and sharpness of tuning), with broader tuning creating more coincident responses across nearby CFs (Carney 1994). Thus, it is not surprising that cochlear damage appears to change delays in direct basilar-membrane measurements (Ruggero 1994) and in derived estimates based on human evoked responses (Strelcyk et al. 2009). Sparse sampling and imprecise CF estimates make accurate quantification of across-CF delays extremely difficult in ANF recordings. To overcome this limitation, the effect of acoustic trauma on across-CF delays has been estimated from single ANF responses to a set of frequency-shifted complex sounds (Heinz et al. 2010). Across-CF delays were up to one-quarter cycle smaller in noise-exposed ANFs, representing a more coincident spatiotemporal pattern after cochlear damage. Although this phase shift appears small, a quarter-cycle shift is highly significant for spatiotemporal coding theories because it represents the difference between correlated and uncorrelated activity across CFs.
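
The significance of a quarter-cycle shift can be checked numerically. For two sinusoids of the same frequency, the normalized correlation equals the cosine of their phase difference, so it falls from 1 at zero shift to 0 at a quarter cycle. The sketch below uses idealized sinusoids in place of phase-locked responses; the function name is hypothetical and the calculation is illustrative rather than a reproduction of any cited analysis.

```python
import math


def correlation_at_phase_shift(cycles_shift, samples_per_cycle=1000):
    """Normalized correlation of two equal-frequency sinusoids over one cycle."""
    phi = 2 * math.pi * cycles_shift
    a = [math.sin(2 * math.pi * i / samples_per_cycle)
         for i in range(samples_per_cycle)]
    b = [math.sin(2 * math.pi * i / samples_per_cycle - phi)
         for i in range(samples_per_cycle)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm


# Correlation at zero shift, an eighth of a cycle, and a quarter of a cycle.
for shift in (0.0, 0.125, 0.25):
    print(shift, round(correlation_at_phase_shift(shift), 3))
```

A quarter-cycle difference therefore separates fully correlated from uncorrelated activity, which is why a shift of this size matters for any mechanism that compares the timing of responses across neighboring CFs.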

8.6 Adaptation, Suppression and Masking

8.6.1 Auditory Nerve Fiber Adaptation

Sensory systems adapt to stimulation. At the single-neuron level, this typically means that spike probability decreases from some initial peak at the stimulus onset to some lower value during sustained stimulation. After the stimulus offset, the “excitability,” or spontaneous spiking probability, is initially depressed and then recovers with a certain characteristic time course. The precise pattern of neural adaptation is thought to have a variety of functional roles in processing time-varying physical stimuli, such as sound. Temporal-adaptation patterns of mammalian ANFs are well characterized, typically showing a peak of onset-related activity followed by a rapid decline to a quasi-steady discharge rate (Westerman and Smith 1984). The underlying mechanism is thought to involve primarily the depletion of a pool of readily releasable synaptic vesicles at the IHC-ANF synapse (Moser and Beutner 2000; Goutman and Glowatzki 2007). ANF adaptation kinetics are thought to be advantageous for the coding of complex sounds containing transients, such as consonants in speech (Delgutte and Kiang 1984b), and to contribute to the coding of amplitude-modulated sounds (Smith and Brachman 1980). After sound offset, spontaneous activity recovers with a quasi-exponential time course over ~200 ms (Harris and Dallos 1979). This form of recovery from adaptation is considered a neurophysiological correlate of the time course of perceptual forward masking, although central contributions appear necessary to account fully for the perceptual effect (Relkin and Turner 1988; Ingham et al. 2016).
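
The qualitative time course described above can be sketched with single exponentials. This is a simplification adopted here for illustration only: the text describes the recovery as quasi-exponential, and measured ANF adaptation is not claimed to follow a single time constant. All function names, rates, and time constants below are hypothetical.

```python
import math


def adapted_rate(t_s, onset_rate, steady_rate, tau_s):
    """Firing rate during the stimulus: decays from an onset peak to a plateau."""
    return steady_rate + (onset_rate - steady_rate) * math.exp(-t_s / tau_s)


def recovering_spontaneous_rate(t_after_offset_s, spontaneous_rate, tau_recovery_s):
    """Spontaneous rate after stimulus offset: depressed at first, then recovers."""
    return spontaneous_rate * (1.0 - math.exp(-t_after_offset_s / tau_recovery_s))


# Hypothetical values (spikes/s and seconds), chosen only to show the shape.
for t in (0.0, 0.01, 0.05):
    print("during stimulus", t, round(adapted_rate(t, 800.0, 200.0, 0.01), 1))
for t in (0.0, 0.05, 0.2):
    print("after offset", t, round(recovering_spontaneous_rate(t, 50.0, 0.05), 1))
```

With the assumed 50-ms recovery time constant, spontaneous activity in this sketch is almost fully restored by 200 ms after offset, in keeping with the overall recovery duration described in the text.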

After acoustic trauma, ANF-response patterns are more dominated by the initial onset peak of activity, with a more rapid onset-adaptation time constant and a slower offset recovery (Scheidt et al. 2010). These altered kinetics are hypothesized to relate to changes in IHC intracellular calcium dynamics after noise damage. For example, noise overexposure is associated with elevated hair cell intracellular Ca²⁺ levels (Fridberger et al. 1998). This may alter the balance between the onset and sustained responses by differential action on rapidly inactivating Ca²⁺ channels and voltage-gated Ca²⁺ channels (Heil and Irvine 1997).

8.6.2 Two-Tone Suppression

The compressive nonlinearity of healthy cochlear signal transduction manifests as suppressive interactions between frequency components. In ANFs, these suppressive nonlinearities can be revealed using a two-tone stimulation paradigm (Sachs and Kiang 1968). The spike rate in response to one tone can be reduced by the simultaneous presence of a second tone. Hence, the term “two-tone rate suppression” (2TS) has been used (Delgutte 1990b). A similar phenomenon is observed in terms of spike synchrony. Synchrony to one tone can be reduced by the presence of a second, which “captures” the spike synchronization of the ANF (Javel 1981). The underlying basis is thought to be, at least partially, saturation of OHC receptor currents (Geisler et al. 1990). This mechanical phenomenon has almost-instantaneous action, hence the requirement for simultaneous tones (Arthur et al. 1971; van der Heijden and Joris 2005).

Suppressive frequency regions exist above the CF (“high-side suppression”) and below the CF (“low-side suppression”; Fig. 8.9). Suppression threshold as a function of frequency for single ANFs has been studied using adaptive-tracking approaches similar to those used to map excitatory-threshold tuning curves (Schmiedt 1982; Delgutte 1990b). Low-side suppression typically has higher thresholds than high-side suppression (Fig. 8.9a). However, low-side suppression grows at a faster rate (up to 3 dB/dB) compared with high-side suppression (<1 dB/dB; Delgutte 1990b; Fig. 8.9b, c).

Fig. 8.9

Two-tone rate-suppression threshold and suppression growth in normal hearing. a Excitatory (blue) and suppressive (black) threshold tuning curves for a single cat ANF. Colored symbols and dashed lines indicate frequencies and sound-level range for the growth functions in b. b Suppression growth functions: two above and two below CF. c ANF population trend showing the growth rate of suppression versus normalized (re CF) suppressor frequency. Redrawn from Delgutte (1990b, Figs. 5 and 13), with permission

Instantaneous and frequency-dependent suppressive nonlinearities are important for the neural coding of complex sounds. When listening to broadband sounds, such as speech and music, there are suppressive interactions between multiple-frequency components (van der Heijden and Joris 2005; Sayles et al. 2016), which shape their neural representation (Sachs and Young 1980; Young 2008).

Acoustic trauma-induced OHC damage is associated with cochlear-response linearization. Aging in a noisy environment is associated with a reduction or loss of 2TS, with high-side suppression being the most vulnerable (Schmiedt et al. 1990). The low-side suppression threshold can remain relatively unchanged by hearing loss, even in cochlear regions with up to 60% OHC loss (Schmiedt and Schulte 1992).

8.6.3 Relationship of Adaptation and Suppression to Perceptual Masking

Adaptation and suppression in ANF responses are thought to be important for the neural coding of complex sounds and for the detection and discrimination of signals in background noise (Sachs and Young 1980; Miller et al. 1997). In the simple case of a two-tone stimulus, one at the CF (the signal) and one off the CF (the masker), both excitatory and suppressive masking can occur. If the masker itself excites the ANF, the threshold for signal detection will increase. Alternatively, if the masker acts to suppress the signal response but does not itself drive an increase in spike rate, the threshold for signal detection will also increase. The two mechanisms are not mutually exclusive, in that a masker may have both excitatory and suppressive influences simultaneously. Because suppressive masking effects only occur when the signal and masker overlap in time (simultaneous masking), but excitatory masking can also occur when the masker precedes the signal (forward masking), it is possible to tease apart physiologically the relative contributions of these two mechanisms to masking. Exploiting this approach, Delgutte (1990a) found suppressive masking to dominate for maskers much lower in frequency than the signal where suppressive growth rates are higher (Fig. 8.9b). This finding suggests that the perceptual “upward spread of masking” phenomenon may be derived largely from suppressive rather than excitatory masking, although this remains a topic of debate.

8.7 Coding of Complex Sounds

8.7.1 Speech Coding in the Auditory Nerve

Many fundamental phenomena in audition, such as frequency tuning, adaptation, suppression, and phase locking, can be understood by studying ANF responses to one- or two-tone stimuli. However, in everyday life, auditory systems typically encounter complex acoustic signals having a broadband spectrum, such as vocalizations and music. Knowledge of ANF physiological response properties to simple stimuli is an important foundation for building a detailed conceptual understanding of neural coding of the rich spectrotemporal content of such complex sounds.

The perceptually important frequency content of speech is largely determined by the resonant frequencies of the speaker’s vocal tract. These result in “formants,” which are peaks in the spectral energy distribution that underlie identification of different vowel sounds (Fant 1970). Neurophysiological studies of speech coding have therefore focused on representations of formant frequencies (Young and Sachs 1979; Delgutte and Kiang 1984a). Typically, spikes from several hundred ANFs with different CFs are recorded in response to the same speech sound. These data are then analyzed from a population perspective, asking what response properties convey information useful to identify salient acoustic features. Broadly speaking, two common schemes are usually considered: temporal codes based on interspike interval statistics (Young and Sachs 1979; Palmer et al. 1986) and rate-place codes based on the firing-rate distribution across the CF axis (Sachs and Young 1979; Recio et al. 2002). ANF response nonlinearities have large effects on both rate-place and temporal representations (Sachs and Young 1980).

At low sound levels, rate-place profiles of cat ANF responses provide a clear representation of synthetic human vowel sounds. Peaks in firing rate occur at CFs corresponding to the formants. However, as the sound level is increased toward those typical of conversational speech (above ~60-dB SPL), the formant-related pattern of peaks and valleys in the firing rate versus CF profile disappears (Sachs and Young 1979). This is due to a combination of firing-rate saturation and suppressive nonlinearity (Sachs and Young 1980). With increasing SPL, high-SR fibers tuned near formant peaks saturate their firing rate while high-SR fibers tuned away from formant peaks “catch up.” Additionally, for fibers tuned near formant peaks, there is increasing rate suppression of on-CF components by off-CF components. Sound-level dependence is the main criticism of rate-place vowel coding. However, low-SR ANFs with higher thresholds and wider dynamic ranges (see Sect. 8.4.1) are still able to represent formant structure at relatively high SPLs (Sachs and Young 1979). In addition, when vowel spectra are scaled appropriately for species-specific cochlear length, rate-place profiles are less dependent on sound level than when “human-scaled” speech is presented to other species (Recio et al. 2002).

Fibers with CFs near formant peaks (particularly the first and second formants, F1 and F2, respectively) show strong phase locking to spectral components nearest to the formant. There is said to be “synchrony capture” of the response of the fiber by those components. Across the tonotopic array, there are narrow CF regions where fibers are dominated by synchrony to F1, to F2, or to higher formants (Young and Sachs 1979; Delgutte and Kiang 1984a; Fig. 8.10a). Between these regions, fibers generally synchronize to modulation at the fundamental frequency (F0) for voiced vowel sounds. As the sound level increases, synchrony suppression of F1 by components near F2 in a normal-hearing ear allows fibers tuned near F2 to maintain a temporal representation of the second formant frequency. Moreover, and in contrast to rate-place profiles, tonotopic temporal representations of vowel sounds are relatively stable with increasing sound level in normal-hearing animals. This is because synchrony to formants tends to capture the response of a fiber and suppress a temporal response to stimulus components away from the formants. In hearing-impaired animals, synchrony to F1 tends to spread across the CF array, degrading the tonotopic representation characteristic of the normal system (Fig. 8.10b).

Fig. 8.10

ANF temporal coding of the vowel /ε/ in normal (a) and impaired (b) hearing at equal sensation levels. Spikes are phase locked to stimulus frequency components. Data are Fourier transform magnitudes (insets) from poststimulus time histograms, expressed as synchronized firing rate in spikes per second and averaged across fibers of similar CF. Gray stripe, response frequencies within 0.5-octave of the CF; colored arrows, formant frequencies along both axes. Histograms below the main plots represent the number of fibers in each bin. Modified from Young (2008), with permission

In hearing-impaired animals, ANFs with CFs near F2 are unable to maintain a clear temporal representation of F2 (Miller et al. 1997; Fig. 8.11, left column). Instead, their responses become dominated by phase locking to components near F1 and intermodulation distortion components such as F2–F1. These alterations (relative to normal hearing) are expected to reduce speech intelligibility for hearing-impaired listeners. Miller et al. (1997) interpreted these data in terms of broadened frequency tuning and weakened suppressive nonlinearities, reflecting OHC dysfunction. However, IHC dysfunction likely also contributes to this altered temporal representation after acoustic trauma, where mixed IHC and OHC pathology is typical (see Sect. 8.1.2). For example, Bruce et al. (2003) found that both IHC and OHC dysfunction contribute to the degraded tonotopic representation of formants in a computational model of ANF responses. Temporal responses similar to those recorded by Miller et al. (1997) were reproduced by only impairing the IHC component of the model (Fig. 8.11, right column).

Fig. 8.11

IHC dysfunction contributes to the effects of acoustic trauma on synchrony-based vowel coding. Left column: Fourier transforms of PST histograms from a normal (top, black line) and an impaired cat ANF (bottom, red line) in response to the vowel /ε/; blue dashed lines, formant frequencies (F1, F2, F3). Both fibers had a CF near F2. Reprinted from Miller et al. (1997), with permission. Right column: AN model responses to the same vowel. Top, response of the normal-hearing model; bottom, response of a hearing-impaired model ANF with only IHC damage. Modified from Bruce et al. (2003), with permission

Other aspects of ANF physiology, such as adaptation, likely also contribute to the coding of ongoing speech. The description here is limited to ANF representations of steady-state voiced vowel sounds. This view is simplistic because natural speech cannot simply be viewed as a temporal series of discrete isolated epochs. Moreover, the addition of background noise or a single concurrent talker adds significant complexity to the neural representation (Delgutte and Kiang 1984c; Palmer 1990). For a detailed review of speech coding, see Young (2008).

8.7.2 Auditory Scene Analysis

Ears are typically faced with a jumbled mixture of acoustic energy from more than one sound source. The auditory system must parse this mixture into several auditory “streams” to form distinct “auditory objects,” thereby enabling a listener to follow a single voice during a conversation in a noisy situation (Bregman 1990). An F0 difference between sound sources is a particularly salient acoustic feature for this task (Brokx and Nooteboom 1982). For example, the high-pitched voice of a young child in the presence of the low-pitched voice of an adult male can be followed with relative ease. Palmer (1990) studied the representation of “double vowels” in the spiking responses of single guinea pig ANFs. Fibers with CFs near a formant peak of one of the two vowels tended to phase lock to spectral components from that vowel sound, giving a temporal cue to that formant frequency. Fibers with CFs away from a formant of either vowel tended to phase lock to the envelope modulation corresponding to the F0 of either one or both vowels in the mixture. The relative strength of phase locking to the F0 of each vowel in the mixture depended on the balance of energy from each vowel at the basilar membrane location corresponding to the CF of the fiber. This temporal information in ANF spike patterns can be exploited for perceptual segregation of the competing voices (de Cheveigné 1993; Keilson et al. 1997).

8.8 Role of Olivocochlear Efferents

8.8.1 Sensory-Context Modulation of Cochlear Function

MOC efferent activation is thought to improve signal detection in noisy backgrounds (Dewson 1968; Kawase et al. 1993). MOC activation decreases ANF responses to tones in a quiet background (Guinan and Gifford 1988); therefore, a role for MOC efferents in improving signal detection may appear counterintuitive. However, in the presence of background noise, MOC activation can actually increase ANF responses to transient sounds by decreasing neural adaptation to the background noise (Winslow and Sachs 1988; Kawase et al. 1993; Fig. 8.12a). Such improvements in the detectability of transient sounds may be particularly important for the coding of modulated signals, such as speech. A related “sensory-context” functional role for the MOC efferent system is the modulation of cochlear gain by other sensory modalities. Attention to visual stimuli reduces the gain of cochlear signal transduction, suggesting that selective attention can modulate the gain of auditory afferents via olivocochlear efferents (Delano et al. 2007).

Fig. 8.12

Hypothesized functions of the MOC efferent system: signal unmasking and protection from acoustic trauma. a Schematic rate-level functions for a HSR ANF in response to CF tones in quiet, in background noise, and with and without MOC efferent activation. Based on a similar figure in Guinan (2006), with permission. b Permanent threshold shifts measured from compound action potential recordings in guinea pigs exposed to an octave-band noise. Data from Maison and Liberman (2000) grouped according to estimated MOC reflex strength and averaged across three noise-exposure bands

8.8.2 Protection from Acoustic Trauma

Efferents are thought to protect the cochlea from the damaging effects of intense-sound exposure (Rajan 2000). Recent work shows MOC innervation of OHCs is tonotopically aligned with ANFs in a manner consistent with that predicted to be optimal for cochlear protection (Brown 2014, 2016). Functionally, MOC reflex strength is inversely correlated with the degree of permanent threshold elevation after intense noise overexposure (Maison and Liberman 2000; Fig. 8.12b). Work on cochlear synaptopathy (Kujawa and Liberman 2015) has led to a greater understanding of the protective role of MOC efferents under everyday listening conditions. After exposure to moderate-level noise, mice with lesioned efferent neurons had a greater loss of IHC-ANF synapses compared with control mice (Maison et al. 2013). It therefore seems likely that protection from noise damage could be a major driving force for the evolution of such a negative-feedback control system operating at sound levels typical of conversational speech. The LOC system may also play a role in protection from acoustic trauma. Mice with selectively lesioned LOC neurons are more susceptible to ipsilateral acoustic trauma compared with control mice (Darrow et al. 2007).

8.8.3 Balancing Interaural Sensitivity

LOC neurons are located in the lateral superior olive, a brainstem nucleus that receives binaural inputs and is involved in the processing of interaural-level difference (ILD) cues for sound-source localization. One potential function of the LOC efferent system is to maintain the two cochleae in a state of balanced excitability, thereby contributing to effective ILD coding (Irving et al. 2011). Without precisely balanced excitability, any interaural difference in cochlear output may or may not reflect a true ILD. Darrow et al. (2006a) found a shift in the balance of excitability between the two cochleae after selectively lesioning LOC neurons. LOC lesions removed the tight interaural correlation in auditory brainstem response (ABR) wave-I amplitudes (reflecting the summed activity of all ANFs) seen in normal animals. These data therefore suggest that a functional LOC efferent system is required to maintain proper interaural balance in excitability, perhaps to enable high sensitivity to small ILDs.

8.9 Summary and Future Directions

This chapter has focused on the neural coding of sounds in the discharge patterns of ANFs in normal and impaired hearing. Work addressing the physiological effects of hearing loss on ANF responses has highlighted a number of perceptually and translationally relevant effects (e.g., broadened tuning and loss of tonotopicity) as well as the absence of some hypothesized physiological correlates of psychophysical phenomena (e.g., the lack of consistently steeper rate-level functions and the lack of degraded TFS phase-locking strength). This absence of hearing-loss effects on ANF physiology has led to important insights, such as the opposing effects of OHC and IHC dysfunction that complicate the interpretation of physiological correlates of loudness recruitment. It is critically important to study the effects of cochlear damage on the neural coding of complex sounds because the full impact of changes in tonotopicity and frequency-dependent nonlinearities may only be apparent for broadband sounds. More detailed work is needed to understand the effects of cochlear damage on suppressive nonlinearities, which are critical for robust speech coding in normal hearing. More generally, future work exploring the effects of OHC versus IHC dysfunction on auditory neural coding will be important for improving understanding of the basis for individual variability in speech perception across patients with similar audiograms. Such work may ultimately lead to diagnostic tests sensitive to differences in the balance of OHC and IHC dysfunction, thereby refining the currently singular category of SNHL and allowing individually tailored approaches to audiological therapy that are physiologically based.

Important insight into the salient features of neural coding has been garnered from both experimental and modeling studies quantifying correlations between peripheral physiology and perception. However, an enormous amount of neural computation is performed by the brain, downstream of the AN. The consequences of altered peripheral processing for neural coding in the brainstem and beyond, at a single-neuron level, have received much less attention than the peripheral effects presented in this chapter. Hypothesis-driven exploration of the effects of hearing impairment on the central auditory nervous system, based on knowledge of cochlear damage and ANF physiology, is a fruitful avenue for future work in this field. In particular, the effects of hearing loss on binaural processing will be important for understanding and ameliorating the difficulties listeners with cochlear hearing loss have in spatial source segregation in real-world listening situations (Middlebrooks et al. 2016), for which hearing aids currently provide limited benefit (Popelka et al. 2016).

Exciting recent work has begun to link cochlear synaptopathy after moderate sound exposure to suprathreshold processing deficits, so-called “hidden hearing loss” (Bharadwaj et al. 2015). One limitation in this work is that the evoked physiological measures (e.g., ABRs, envelope following responses, and, more recently, middle ear reflexes) that have been correlated with perceptual deficits do not provide direct confirmation of the degree of cochlear synaptopathy in humans. Future work to provide quantitative links between single-ANF responses (e.g., of differing SRs) and evoked responses is critical to the development of fast and reliable assays of cochlear synaptopathy that can be applied in the audiology clinic. From the opposite angle, animal behavioral studies are also needed to establish the perceptual consequences of cochlear synaptopathy. Finally, the loss of synaptic connections between IHCs and ANFs characteristic of this form of hearing impairment does have one major saving grace. The sensory cells themselves and their neural connections to the brain survive for months to years after the synaptic insult. This offers a window of opportunity for future therapeutic intervention, perhaps exploiting neurotrophins to restore functional IHC-ANF connections (e.g., Suzuki et al. 2016) or stem cells to replace damaged or missing auditory neurons (Nayagam and Edge 2016).