Introduction

Listening to speech in a noisy environment is a challenging task often encountered by the human auditory system. Fortunately, several physiological mechanisms address this important challenge, including certain forms of adaptation. In order to accommodate listening in noisy environments, the ear’s response adapts based on the average sound intensity (Dean et al. 2005). This “dynamic range” (DR) adaptation may involve many physiological mechanisms along the auditory pathway, including potential contributions from the inner-hair-cell (IHC) ribbon synapse (Wen et al. 2009), inferior colliculus synaptic depression (Dean et al. 2005), and the medial olivocochlear reflex (MOCR) (Kawase et al. 1993). As discussed by Wen et al. (2009), DR adaptation is different from classic firing rate (CFR) adaptation, which is characterized by a decrease in firing rate over the course of acoustic stimulation.

In simultaneous masking, some form of adaptation may partially explain why signal threshold improves as the signal is delayed from masker onset. This improvement, called “overshoot” (Zwicker 1965), can be as large as 15–20 dB and is often defined as the difference in threshold between a signal presented at a short delay (~2 ms) and one presented at a long delay (~200 ms) relative to masker onset. Theories have emerged regarding the mechanisms of overshoot, several of which are based on adaptation; however, the form of adaptation (CFR vs. DR) is still a matter of debate.

Classic firing rate adaptation has been proposed as a mechanism of overshoot (e.g., Bacon and Healy 2000). This hypothesis is based on the prediction that a constant incremental response in the presence of adaptation should improve threshold by 3–5 dB (Smith and Zwislocki 1975). Furthermore, CFR adaptation has been hypothesized to account for the level dependence of overshoot based on differences in onset responses and thresholds of high and low spontaneous-rate (SR) fibers (McFadden and Champlin 1990).

Several investigators have proposed DR adaptation via the MOCR as a mechanism of overshoot (e.g., Schmidt and Zwicker 1991). This hypothesis is based on MOCR physiology that shows that cochlear gain is high at masker onset and decreases to a plateau after about 100 ms (Backus and Guinan 2006). Strickland (2001) used a basilar membrane compression model to show that gain reduction can account for the magnitude and level dependence of overshoot. Moreover, this hypothesis is consistent with the observation that overshoot is reduced following cochlear hearing loss (Bacon and Takahashi 1992; Strickland and Krishnan 2005).

A common theme in theories of overshoot relates to a limitation in detecting the signal when presented near masker onset. Supposedly, this limitation is overcome by an adaptive mechanism which results in improved thresholds when the signal is delayed from the masker’s onset. The present study attempts to identify factors that may limit detection near masker onset and determine whether CFR adaptation and/or the MOCR can overcome such factors to account for overshoot. This objective was achieved using signal detection theory (SDT) and a computational model of the auditory nerve (AN), which (1) was recently improved to accurately account for CFR adaptation, (2) is able to simulate the MOCR and cochlear hearing loss, and (3) includes low, medium, and high SR responses (Zilany et al. 2009).

Methods

Combining computational modeling and SDT has been successful previously in evaluating the physiological bases of other psychoacoustic phenomena such as interaural time discrimination (Colburn 1973), intensity and frequency discrimination (Siebert 1970; Heinz et al. 2001a, b; Colburn et al. 2003), and frequency selectivity (Heinz et al. 2002). Until recently, computational models were not suited to test theories of overshoot because of difficulty in modeling CFR adaptation. Zilany et al. (2009) extended a well-established computational model of the cat AN (Carney 1993) by including power law dynamics to significantly improve the model adaptation properties associated with the IHC-AN synapse. Given its use of power law dynamics, this model will be referred to as the “power law” model.

The power law model captures many physiological properties of the auditory periphery. Bruce and Zilany (2007) summarized many of these properties when describing an earlier version of the model, including cochlear compression and suppression (Heinz et al. 2001c; Zhang et al. 2001), middle-ear filtering (Bruce et al. 2003), level-dependent shifts in best frequency (Tan and Carney 2003), and inner/outer hair cell impairment (Bruce et al. 2003; Zilany and Bruce 2006, 2007). In addition to these properties, the power law model now accounts for properties of CFR adaptation that were not accounted for by previous versions of the model. Among these properties is the ability to predict a constant incremental response in the presence of adaptation (Smith and Zwislocki 1975). Furthermore, the power law model accounts for the recent observation that DR adaptation exists in AN responses from anesthetized cats (Zilany and Carney 2010). This particular form of DR adaptation is unlikely to be due to the MOCR because of the suppressive influence of anesthesia on efferent function.

Stimuli

The noise masker had a bandwidth from 20 to 50,000 Hz and was 400 ms long with 5 ms rise/fall ramps. Its spectrum level ranged from −30 to 60 dB in 10-dB steps. The 11-kHz signal occurred either 2 ms (short-delay condition) or 200 ms (long-delay condition) after the masker’s onset. Its duration was 10 ms with 5 ms rise/fall ramps. The signal level ranged between 0 and 90 dB SPL in 10-dB steps. An 11-kHz signal was selected to account for cochlear differences between cats and humans. According to Greenwood (1990), 11 kHz in the cat corresponds roughly to 4 kHz in the human, which is a common test frequency for overshoot (e.g., Carlyon and Sloan 1987; Bacon 1990; Strickland 2001).
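The level arithmetic implied by these stimulus definitions can be sketched as follows. This is a minimal illustration, not the authors' stimulus code: the function names are hypothetical, the raised-cosine ramp shape is an assumption (the text gives only the ramp duration), and levels are treated as rms values re 20 µPa.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL, in Pascals


def spectrum_to_overall_db(spectrum_level_db, low_hz=20.0, high_hz=50_000.0):
    """Overall level of a flat-spectrum noise from its spectrum level (dB/Hz)."""
    return spectrum_level_db + 10.0 * math.log10(high_hz - low_hz)


def db_spl_to_rms_pa(level_db):
    """Convert a level in dB SPL to rms pressure in Pascals."""
    return P_REF * 10.0 ** (level_db / 20.0)


def ramp_gain(t, duration, ramp):
    """Onset/offset envelope at time t (seconds).

    Assumption: raised-cosine ramps; the text specifies only their duration.
    """
    if t < 0.0 or t > duration:
        return 0.0
    if t < ramp:
        return 0.5 * (1.0 - math.cos(math.pi * t / ramp))
    if t > duration - ramp:
        return 0.5 * (1.0 - math.cos(math.pi * (duration - t) / ramp))
    return 1.0


# Masker at 30 dB spectrum level over the 20-50,000 Hz band
masker_overall_db = spectrum_to_overall_db(30.0)
masker_rms_pa = db_spl_to_rms_pa(masker_overall_db)
```

Under these assumptions, the 10-ms signal with 5-ms ramps has no steady-state portion: its envelope rises to full amplitude at 5 ms and immediately begins to fall.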

OHC control in the power law model

A schematic of the power law model is provided in Figure 1, where the OHC gain control module is highlighted in gray. The power law model’s C OHC parameter specifies the gain of the OHCs. This parameter takes on values from 0 (complete loss of OHC gain) to 1 (maximum OHC gain). After the middle ear module, the model splits into three paths. The C1 and C2 filter paths interact to account for the effects associated with high-input levels such as peak splitting and the C1/C2 transition (Kiang 1990). The “control path” filter accounts for nonlinearities associated with the OHCs such as compression and suppression. This control filter affects the time constant of the C1 filter, thus adjusting the filter’s gain and bandwidth. The C OHC parameter is essentially a scaling constant which determines the influence of the control path filter. Details regarding the implementation of the C OHC parameter, and the C1, C2, and control path filters can be found in Bruce et al. (2003) and Zilany and Bruce (2006). In previous applications of the model, hearing impairment has been simulated by adjusting outer and inner-hair cell health (Bruce et al. 2003; Zilany and Bruce 2006, 2007; Heinz and Swaminathan 2009). A similar method is employed in these experiments; however, in addition to simulating hearing impairment, MOC feedback was simulated by adjusting C OHC. This approach is similar to other modeling studies (Ferry and Meddis 2007; Ghitza et al. 2007; Messing et al. 2009; Brown et al. 2010) that model the basic effect of the MOCR by reducing the amount of cochlear nonlinearity.

FIG. 1

Block diagram of the power law model for auditory nerve responses. The outer hair cell (OHC) gain can be manipulated by adjusting the model’s C OHC parameter (darkened box). In the current set of experiments, all parameters except C OHC were the same as Zilany et al. (2009). Figure modified slightly from Figure 2 in Zilany et al. (2009) and used with permission from the Acoustical Society of America.

Model settings

Detection thresholds in overshoot conditions were simulated using the model’s synapse output (labeled “r(t)” in Fig. 1) from 50 characteristic frequencies (CFs) spaced logarithmically from 6 to 20 kHz. Responses were obtained for each of the power law model’s three different SR classes: high SR, medium SR, and low SR. The specific SR value for a synapse of a given class is determined by the model’s fractional Gaussian noise. This noise exists in the power law model to account for the distribution of spontaneous rates observed in physiological data (see Jackson and Carney (2005) and Zilany et al. (2009) for details). The mean values of the three SR classes are 100, 5, and 0.1 spikes/s for high SR, medium SR, and low SR synapses, respectively.
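As an illustration of the CF spacing, the sketch below generates 50 logarithmically spaced frequencies between 6 and 20 kHz. It assumes that both endpoints are included, which the text does not state explicitly, and the function name is hypothetical.

```python
def log_spaced_cfs(low_hz=6000.0, high_hz=20000.0, n=50):
    """Return n frequencies with a constant ratio between neighbors.

    Assumption: both endpoints (low_hz and high_hz) are included.
    """
    ratio = (high_hz / low_hz) ** (1.0 / (n - 1))
    return [low_hz * ratio ** k for k in range(n)]


cfs = log_spaced_cfs()
```

With this spacing, adjacent CFs differ by a constant factor (about 2.5% here) rather than by a constant number of hertz.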

In order to evaluate theories of overshoot, four “virtual listeners” were simulated. These virtual listeners differed from one another in their OHC gain. Outer hair cell gain was adjusted to create listeners with normal hearing (NH) or hearing impairment (HI) and to produce simulations with (MOCR+) or without (MOCR−) the MOCR. The “NHMOCR−” virtual listener had a C OHC value of 1 throughout the entire simulation regardless of the condition. For the “HIMOCR−” virtual listener, the C OHC parameter was consistent with a 40-dB flat hearing loss of OHC origin. Realistically, even mild hearing loss may involve some IHC damage (Plack et al. 2004). Thus, setting the model’s hearing loss with only OHC damage is a simplification.

The “NHMOCR+” and “HIMOCR+” virtual listeners had the same C OHC values as their MOCR− counterparts in the short-delay condition. In the long-delay condition, the C OHC parameter for these listeners was set based on the intensity of the masker in order to simulate the MOCR. In other words, the C OHC settings of these virtual listeners are based on the assumptions that (1) the MOCR is too sluggish to influence detection in the short-delay condition and (2) in the long-delay condition sufficient time has passed such that the strength of the MOCR is maximal. Under these assumptions, it is unnecessary to model the entire time course of the MOCR, even though gain would naturally decrease dynamically during the course of the masker. Therefore, for both the signal + masker and masker-alone simulations, MOCR strength was set to full-off (short-delay condition) or full-on (long-delay condition) for a given masker level.

The C OHC /masker-level relationship was defined based on data from Backus and Guinan (2006), who reported that MOCR strength (measured in % reduction in SFOAE magnitude) and elicitor level are linearly related. A given C OHC value produces a corresponding reduction in OHC gain. Figure 2 displays the relationship between OHC gain and masker level for all virtual listeners. For the MOCR+ simulations, the linear relationship between masker level and reduction in OHC gain occurs over a restricted range. The start values for this range were set near the masker level required to shift absolute threshold (i.e., −30 and 20 dB spectrum level for the NHMOCR+ and HIMOCR+ listener, respectively). The upper values for this range were based on the assumption that MOCR strength saturates at higher masker levels. For all C OHC values less than 1, the power law model’s “fitaudiogram” function verified that the desired amount of OHC gain was achieved and that little or no IHC loss was present. In all simulations, the reduction in OHC gain was constant across the range of CFs simulated (6–20 kHz).
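The shape of this relationship (no gain reduction below a start level, a linear growth region, then saturation) can be sketched as below. Only the start levels (−30 and 20 dB spectrum level) come from the text; the saturation level and maximum reduction used in the example are hypothetical placeholders, the reduction is assumed to be linear in dB, and the mapping from gain reduction to a C OHC value (which relied on the model's own functions) is not reproduced.

```python
def ohc_gain_reduction_db(masker_level_db, start_db, saturation_db, max_reduction_db):
    """Piecewise-linear OHC gain reduction (dB) as a function of masker level.

    Zero at or below start_db, max_reduction_db at or above saturation_db,
    and linear in between.
    """
    if masker_level_db <= start_db:
        return 0.0
    if masker_level_db >= saturation_db:
        return max_reduction_db
    frac = (masker_level_db - start_db) / (saturation_db - start_db)
    return frac * max_reduction_db


def gain_reduction_for_condition(delay, mocr_enabled, masker_level_db,
                                 start_db, saturation_db, max_reduction_db):
    """MOCR is full-off at the short delay and full-on at the long delay."""
    if not mocr_enabled or delay == "short":
        return 0.0
    return ohc_gain_reduction_db(masker_level_db, start_db,
                                 saturation_db, max_reduction_db)


# Hypothetical NH listener: start level from the text, other values invented
nh_long = gain_reduction_for_condition("long", True, 0.0,
                                       start_db=-30.0,
                                       saturation_db=30.0,    # hypothetical
                                       max_reduction_db=20.0)  # hypothetical
```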

FIG. 2

Outer hair cell (OHC) gain settings as a function of masker spectrum level for each virtual listener. Feedback from the medial olivocochlear reflex (MOCR) was simulated in normal (NH) and hearing-impaired (HI) virtual listeners by reducing outer hair cell gain in the long-delay condition. A NHMOCR− and NHMOCR+ listeners. B HIMOCR− and HIMOCR+ listeners.

Procedure

Data were collected from the power law model using MATLAB® software (2007a, The MathWorks, Natick, MA). As with previous editions of the AN model, the model source code was provided in association with the published manuscript (Zilany et al. 2009). For the current set of experiments, the AN model source code was used as published. After compiling the model, the output of the synapse module was obtained by calling a series of MATLAB® functions. The inputs to these functions include: the stimulus time waveform, model CF, sampling period, duration of the output window, outer hair cell health (C OHC), inner-hair cell health (C IHC), and the SR type (high, medium, or low). Parameters beyond these inputs were not manipulated in the present experiments. Such unaltered parameters included (among others) the adaptation properties of the power law synapse, thresholds for high, medium and low SR synapses, rate/level function characteristics, and the model’s internal noise (see Jackson and Carney (2005) for a description of noise sources in the model). Leaving these parameters unaltered has the advantage of maintaining the model’s ability to account for physiological data. Specific details of how these unaltered parameters were set and their ability to account for physiological data can be found in publications associated with the current and previous versions of the AN model (e.g., Bruce et al. 2003; Jackson and Carney 2005; Zilany and Bruce 2006, 2007; Zilany et al. 2009).

In order to predict behavioral thresholds for a given virtual listener, simulations were separated into two general categories: one called “signal + masker” and the other called “masker-only.” In the signal + masker simulation, model responses were obtained for all possible pairs of masker and signal. This resulted in 200 combinations (10 signal levels × 10 masker levels × 2 signal delays). For a given simulation, a customized computer program generated the masker and signal independently and scaled them to the appropriate sound level (in Pascals). After generating the stimuli, the program combined the signal and masker at the specified signal delay. The program then presented the stimuli to the model, retrieved the synapse output for all SR classes and downsampled this output. The downsampling operation was included to minimize the amount of memory needed to store the data. In the masker-only simulation, model responses were obtained for the masker at all masker intensities. In this simulation, the signal was not combined with the masker, nor was it presented to the model.
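The condition grid for the signal + masker simulation can be enumerated in a few lines. The sketch below only illustrates the bookkeeping (variable names are hypothetical); it does not generate stimuli or call the AN model.

```python
from itertools import product

signal_levels_db = list(range(0, 100, 10))    # 0 to 90 dB SPL in 10-dB steps
masker_levels_db = list(range(-30, 70, 10))   # -30 to 60 dB spectrum level
signal_delays_ms = [2, 200]                   # short- and long-delay conditions

# Every (signal level, masker level, delay) triple for the signal + masker runs
conditions = list(product(signal_levels_db, masker_levels_db, signal_delays_ms))

# The masker-only simulation needs one run per masker level
masker_only_conditions = list(masker_levels_db)
```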

In the procedure described above, the data were obtained from the power law model by presenting each stimulus independently rather than successively. This approach accounts for a prominent component of DR adaptation in AN responses, which occurs rapidly over a time course of several hundred milliseconds (shorter than the 400-ms stimuli used here) and which is captured by the power law model (Zilany and Carney 2010). This approach excludes long-term memory effects that may also contribute to DR adaptation in AN responses (Wen et al. 2009), but which are unlikely to have a large effect on overshoot since they would not differentially affect short-delay and long-delay thresholds. In order to reliably calculate threshold, it was necessary to repeat each simulation (i.e., each signal-masker pair) 240 times. This number of repetitions allowed us to account for the randomness inherent in the stimulus (external noise) and the synapse (internal noise). The randomness in the model’s synapse module arises from two sources; namely, the fractional Gaussian noise that affects the synapse waveforms used in these simulations, and the Poisson variability in AN spike trains, which is accounted for in the SDT analyses described in the next section.

Calculating detectability

Detectability was measured based on a statistical metric analogous to d-prime squared. This metric, referred to as “Q,” is computed from the time-varying discharge rate waveforms for the population of AN fibers (i.e., the synapse output waveforms provided by the AN model). The external (stimulus) variability associated with the noise masker is accounted for in Q by including synapse outputs in response to a number of stimulus repetitions. The internal (physiological) variability associated with AN spike-train responses is primarily accounted for by using Poisson statistics for AN spike trains in the derivation of the equation for Q, which is described in detail by Heinz (2000) and Heinz et al. (2002). In the context of this experiment, Q quantifies the sensitivity of a suboptimal detection process based on discharge rate. This is done by computing a weighted difference in the average synapse output between the masker-only and signal + masker simulations. This difference is summed across synapses, squared, and then divided by the sum of two sources of variability, one primarily related to spike-train variability and the other primarily related to stimulus variability. The equation for Q is

$$ Q = \frac{\left\{ \sum_i \left[ \ln \frac{x_i(SN)}{x_i(N)} \right] \left[ x_i(SN) - x_i(N) \right] \right\}^2}{\frac{1}{T} \sum_i \left[ \ln \frac{x_i(SN)}{x_i(N)} \right]^2 x_i(N) + \mathrm{Var}_n \left\{ \sum_i \left[ \ln \frac{x_i(SN)}{x_i(N)} \right] r_i(n|N) \right\}} \tag{1} $$

where i indexes the fibers and each sum runs over all CFs and the physiological distribution of SR groups (described below). The terms x_i(SN) and x_i(N) are the synapse outputs averaged across time and noise repetitions for the signal + masker and masker-only intervals, respectively. The log ratio ln[x_i(SN)/x_i(N)] weights each fiber, giving preference to fibers that respond strongly to the signal; T is the duration of the analysis window; and r_i(n|N) is the discharge rate of fiber i for the nth realization of the random noise masker. The denominator in the Q equation separates variability into internal (spike-train variability, 1st term) and external (stimulus variability, 2nd term) sources. The first term in the denominator accounts for the Poisson variability inherent in AN spike-train responses, which can limit detection. This term is necessary because simulations involved collecting data from the model’s synapse output (which does not include Poisson variability) rather than the model’s spike generator (which does include Poisson variability). The second term in the denominator primarily represents variance inherent in the noise stimulus. Separating the denominator into two terms has the advantage of revealing to what extent each source of variability influences detection threshold.
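A direct transcription of Eq. 1 is sketched below to make the roles of the two denominator terms concrete. The function and argument names are hypothetical, rates are assumed to be in spikes/s with T in seconds, and the use of the population variance across noise realizations is an assumption (the text does not say which variance estimator was used).

```python
import math
from statistics import pvariance


def q_metric(x_sn, x_n, r_n, window_s):
    """Evaluate Eq. 1 for one analysis window.

    x_sn, x_n : per-fiber mean rates (spikes/s) for signal + masker and masker-only
    r_n       : list of noise realizations, each a list of per-fiber masker-only rates
    window_s  : analysis window duration T in seconds
    """
    # Log-ratio weight for each fiber
    w = [math.log(a / b) for a, b in zip(x_sn, x_n)]

    # Numerator: squared weighted rate difference summed across fibers
    numerator = sum(wi * (a - b) for wi, a, b in zip(w, x_sn, x_n)) ** 2

    # 1st denominator term: Poisson (spike-train) variability
    internal = sum(wi ** 2 * b for wi, b in zip(w, x_n)) / window_s

    # 2nd denominator term: variance of the weighted sum across noise realizations
    # (population variance is an assumption)
    weighted = [sum(wi * ri for wi, ri in zip(w, rep)) for rep in r_n]
    external = pvariance(weighted)

    return numerator / (internal + external)


# One fiber, no stimulus variability: Q reduces to (rate difference)^2 * T / x_N
q_example = q_metric([200.0], [100.0], [[100.0], [100.0]], window_s=0.01)
```

In the single-fiber example the weights cancel, so Q depends only on the rate increment, the masker-only rate, and the window duration.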

Estimating thresholds

As a general rule, threshold for a given signal level was defined as the highest masker level producing a target Q value. Several steps were involved in calculating threshold. Firstly, signal + masker and masker-only simulations were paired according to masker level. Secondly, a sliding window was applied to each of these pairs. This rectangular window computed the average discharge rate across a narrow time frame defined by the window width as schematized in Figure 3. Thirdly, Q for each window output was computed, resulting in an estimate of detectability for numerous time slices across the duration of the masker. From these computations, a “detectability surface” was obtained, which is a three-dimensional plot of detectability versus time and masker level. Figure 4 displays the detectability surface for a signal level of 80 dB SPL. The detectability axis (z-axis) in Figure 4 has an upper limit equal to the target Q value (1 in this case). Scaling the z-axis in this way divides detectability into two regions. The filled region represents Q values less than the target (undetectable). Similarly, the unfilled (white) region represents Q values greater than the target (detectable). The boundary between the two regions is a visual representation of threshold for a given time slice. Finally, a customized computer script estimated the most sensitive time slice in the detectability surface and interpolated masker threshold for the target Q value. For example, the threshold in Figure 4 is approximately 30 dB spectrum level.
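The final interpolation step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' script: Q is interpolated linearly against masker level in dB, the "most sensitive" time slice is taken to be the one yielding the highest masker threshold, and all names are hypothetical.

```python
def slice_threshold(levels_db, q_values, target_q):
    """Highest masker level (dB) at which Q still reaches target_q in one time slice.

    Returns None if Q never reaches the target in this slice.
    """
    above = [j for j, q in enumerate(q_values) if q >= target_q]
    if not above:
        return None
    j = above[-1]
    if j == len(levels_db) - 1:
        # Q never falls below the target within the tested range
        return levels_db[j]
    q_hi, q_lo = q_values[j], q_values[j + 1]
    frac = (q_hi - target_q) / (q_hi - q_lo)
    return levels_db[j] + frac * (levels_db[j + 1] - levels_db[j])


def masker_threshold(levels_db, q_surface, target_q=1.0):
    """Threshold from a detectability surface given as q_surface[slice][level]."""
    per_slice = [slice_threshold(levels_db, q, target_q) for q in q_surface]
    valid = [t for t in per_slice if t is not None]
    return max(valid) if valid else None


# Two time slices, three masker levels; only the second slice reaches the target
example = masker_threshold([20.0, 30.0, 40.0],
                           [[0.5, 0.2, 0.1],
                            [3.0, 2.0, 0.0]])
```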

FIG. 3

Schematic representation of the averaging window for the masker-only (A) and signal + masker (B) simulations. A rectangular moving window was applied to each synapse output, creating several time slices from 1 to m. Detectability was calculated by comparing corresponding time slices between masker-only and signal + masker simulations. In this example, detectability is high in time slice “k,” which corresponds to the long-delay condition of overshoot.

FIG. 4

Detectability of an 80-dB SPL signal as a function of time and masker spectrum level. The signal is presented in the long-delay (200 ms) condition. Regions where Q is above and below detection threshold are represented by unfilled (white) and filled areas, respectively. The boundary between these regions corresponds to threshold for a given time slice. For all predictions, threshold (~30 dB in this case) was defined as the highest masker level corresponding to the most sensitive time slice (~200 ms in this case).

Estimating threshold in this way assumes human observers employ an interval comparison strategy to detect the signal. In other words, they compare signal + masker and masker-only intervals in the same time window. An alternative detection strategy that may be more realistic involves temporal profiling, where the observer identifies the signal + masker interval by comparing adjacent time windows within each individual interval. Although interval comparison and temporal profiling are qualitatively different detection strategies, they have been shown to produce similar thresholds when the masker level is constant across intervals (Heinz and Formby 1999; Richards 2002). Pilot data (not shown) from the present experiment confirm this conclusion. Other strategies beyond the two discussed may also be used by human observers; however, the effects of such strategies on threshold were not evaluated in these experiments.

Analysis variables

In modeling psychophysical performance based on auditory physiology, predicted thresholds are often better than human performance. This discrepancy is often overcome by imposing limitations on the model (for a discussion see Delgutte 1996). In terms of the present experiment, several studies on overshoot (Champlin and McFadden 1989; McFadden and Champlin 1990; Bacon and Takahashi 1992) and a related effect in intensity discrimination called the “mid-level hump” (e.g., Carlyon and Moore 1984; Zeng and Turner 1992; Oxenham and Moore 1995) suggest that performance may be limited by high SR fibers dominating the response. Similarly, model predictions may be limited by the number of fibers used to determine psychophysical performance (e.g., Viemeister 1988). These and other “analysis variables” were used to determine what limitations to impose on the model in order to account for psychophysical thresholds in overshoot. To be parsimonious, detection was assumed to be governed by the same combination of analysis variables in the short- and long-delay conditions.

Detection thresholds in overshoot conditions were estimated for all possible combinations of five analysis variables. These variables were (1) the SR pooling, (2) range of CFs, (3) the number of synapses per CF, (4) the d-prime value defining threshold, and (5) the analysis time window width. Table 1 displays the analysis variables and their corresponding values.

TABLE 1 Analysis variables and their values

For pooled SR simulations (e.g., high/medium/low SR condition), the high, medium, and low SR types made up 61%, 23%, and 16%, respectively, of the total number of synapses at each CF (Liberman 1978). In simulations where one or more SR types were absent (e.g., high/low SR), this method of pooling assumes that the absent SR types were ignored in the detection process. The d-prime and percent correct values (Macmillan and Creelman 2005) displayed in Table 1 are based on a three-alternative forced-choice procedure common to overshoot experiments (von Klitzing and Kohlrausch 1994; Strickland 2001; Savel and Bacon 2003). These d-prime values were squared to calculate the corresponding values for the Q metric.
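The link between d-prime, percent correct, and the target Q can be illustrated with the standard equal-variance Gaussian result for an m-alternative forced-choice task, Pc = ∫ φ(z − d′) Φ(z)^(m−1) dz. The sketch below evaluates this integral numerically; it is not taken from the paper, the function names are hypothetical, and the integration limits and step are arbitrary choices.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def proportion_correct_mafc(d_prime, m=3, n_steps=16000, half_width=8.0):
    """Proportion correct in an m-AFC task for a given d-prime (trapezoid rule)."""
    lo = -half_width
    hi = half_width + d_prime
    h = (hi - lo) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = lo + k * h
        f = norm_pdf(z - d_prime) * norm_cdf(z) ** (m - 1)
        total += 0.5 * f if k in (0, n_steps) else f
    return total * h


def target_q(d_prime):
    """The Q metric is analogous to d-prime squared."""
    return d_prime ** 2


chance = proportion_correct_mafc(0.0)   # approaches 1/3 for three alternatives
```

A d-prime of zero gives chance performance (one in three), and proportion correct rises monotonically as d-prime increases.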

After thresholds were obtained for each combination of analysis variables, the best-fitting combination was determined for a given virtual listener using a least squares method. In this context, “best fitting” is defined in relation to behavioral overshoot data from studies involving normal-hearing and hearing-impaired listeners. The fitting method involved one parameter which shifted all model thresholds by a constant dB value. In the figures that follow, only the best-fitting model thresholds are plotted for each virtual listener and then compared with behavioral data.
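For a fit whose only free parameter is a constant dB shift, the least-squares solution is the mean difference between behavioral and model thresholds. The sketch below computes that shift and the residual rms error for one combination of analysis variables; the names are hypothetical and the selection loop over combinations is omitted.

```python
import math


def best_shift_and_rms(model_db, behavior_db):
    """Least-squares constant shift (dB) and the rms error remaining after it."""
    n = len(model_db)
    shift = sum(b - m for m, b in zip(model_db, behavior_db)) / n
    residuals = [m + shift - b for m, b in zip(model_db, behavior_db)]
    rms = math.sqrt(sum(r * r for r in residuals) / n)
    return shift, rms


# Model thresholds that differ from behavior by a pure offset fit perfectly
shift, rms = best_shift_and_rms([10.0, 20.0, 30.0], [15.0, 25.0, 35.0])
```

Under this scheme, the best-fitting combination of analysis variables would be the one with the smallest rms error after shifting.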

Results

Normal hearing simulations (NHMOCR− and NHMOCR+)

Strickland (2004) measured overshoot as a function of signal level in normal-hearing listeners. Consistent with Bacon (1990), she found that overshoot was largest when the masker spectrum level was between 10 and 30 dB. Figure 5 compares simulated data from the NHMOCR− virtual listener (Fig. 5B) with the mean data from Strickland (2004) (Fig. 5A). The short-delay and long-delay thresholds are plotted as open and closed symbols in Figure 5A, B. Overshoot (Fig. 5C) for this figure and later figures is calculated by subtracting the short-delay thresholds from the long-delay thresholds for a given signal level. The behavioral data exhibit overshoot that increases over low to medium signal levels and then decreases slightly for the highest signal level. In contrast, overshoot is generally absent for the NHMOCR− virtual listener. The analysis variable values for these model data consisted of a 15-ms analysis window, high and low SR fibers, five fibers/channel, and the model CF at the signal frequency. All other combinations of analysis variables produced less overshoot in this virtual listener.

FIG. 5

Behavioral (A) and model (B) thresholds for the NHMOCR− virtual listener in the short-delay (Δt = 2 ms) and long-delay (Δt = 200 ms) overshoot conditions. Behavioral thresholds are from Strickland (2004). C Overshoot from the data presented in (A) and (B). Values plotted in (C) were calculated by subtracting the short-delay thresholds from the long-delay thresholds. The analysis variable values for these model data consisted of a 15-ms analysis window, high and low SR fibers, five fibers/channel and the model CF at the signal frequency.

Interestingly, in the short-delay simulations (open symbols in Fig. 5) the majority of model predictions were unable to produce the shallow slope observed in the psychophysical data. The minority of simulations that did produce a shallow slope had only high SR synapses. In other words, model predictions in this condition were much more sensitive than psychophysical performance unless the model was limited to “listen” with only high SR fibers.

Figure 6 is similar to Figure 5, except that the model predictions are from the NHMOCR+ virtual listener, in which OHC gain was reduced in the long-delay condition for both the signal + masker and masker-only simulations. The model predicts overshoot that is nearly equal in magnitude to the overshoot observed in the behavioral data. Similarly, the model accounts for the build-up and decay of overshoot as a function of signal level. The analysis variable values for these model data consisted of a 15-ms analysis window, only high SR fibers, 100 fibers/channel and all model CFs.

FIG. 6

Behavioral (A) and model (B) thresholds for the NHMOCR+ virtual listener in the short- and long-delay overshoot conditions. For this virtual listener, outer hair cell gain was reduced in the long-delay condition according to Figure 2. Behavioral thresholds are from Strickland (2004). C Overshoot from the data presented in (A) and (B). Values plotted in (C) were calculated by subtracting the short-delay thresholds from the long-delay thresholds. The analysis variable values for these model data consisted of a 15-ms analysis window, only high SR fibers, 100 fibers/channel and all model CFs.

The amount of overshoot produced by other combinations of analysis variables is summarized in Figure 7. In this figure, purple tiles indicate simulations which produced an average overshoot larger than 2.5 dB. Similarly, gray tiles indicate simulations which produced less than −2.5 dB of overshoot (i.e., undershoot). Unfilled (white) tiles indicate simulations between −2.5 and 2.5 dB. In this figure, average overshoot was computed over signal levels of 60 dB SPL and above. For some simulations involving only low SR fibers, the fiber threshold in quiet was too high to produce a masked detection threshold for every signal level in this range, resulting in insufficient data to compute average overshoot. Such simulations are represented by tiles with an “X” through the center. Starred tiles represent simulations where the rms error between the model and behavioral thresholds was less than 4 dB. The circled tile corresponds to the best-fitting combination of analysis variables (data shown in Fig. 6).
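The tile classification described above amounts to a small amount of arithmetic, sketched here with hypothetical names and invented example thresholds. Overshoot is taken as the long-delay threshold minus the short-delay threshold, and a missing threshold is represented by None.

```python
def average_overshoot(signal_levels_db, short_thr_db, long_thr_db, min_level_db=60):
    """Mean overshoot (long minus short) over signal levels >= min_level_db.

    Returns None if any threshold in that range is missing.
    """
    diffs = []
    for level, short, long_ in zip(signal_levels_db, short_thr_db, long_thr_db):
        if level < min_level_db:
            continue
        if short is None or long_ is None:
            return None
        diffs.append(long_ - short)
    return sum(diffs) / len(diffs) if diffs else None


def classify_tile(avg_overshoot_db, criterion_db=2.5):
    """Map an average overshoot to the tile categories used in Figure 7."""
    if avg_overshoot_db is None:
        return "X"            # insufficient data
    if avg_overshoot_db > criterion_db:
        return "overshoot"    # purple tile
    if avg_overshoot_db < -criterion_db:
        return "undershoot"   # gray tile
    return "none"             # unfilled tile


# Invented thresholds (dB spectrum level) for illustration only
levels = [50, 60, 70, 80]
avg = average_overshoot(levels, [10.0, 10.0, 12.0, 14.0], [12.0, 20.0, 24.0, 28.0])
tile = classify_tile(avg)
```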

FIG. 7

Average overshoot in the NHMOCR+ virtual listener for all combinations of analysis variables. Fiber scaling and CF range are plotted versus analysis window width and fiber pooling. Combinations producing average overshoot greater than 2.5 dB are represented by purple tiles. Purple tiles marked with asterisks are those conditions which produced a good fit to the behavioral data (rms error <4 dB). Similarly, combinations producing less than −2.5 dB of overshoot (i.e., undershoot) are represented by gray tiles. Unfilled (white) tiles are conditions which did not produce overshoot (i.e., average overshoot between −2.5 and 2.5 dB). Conditions where average overshoot could not be calculated due to absent thresholds are represented by large crosses. The circled tile corresponds to the best-fitting combination of analysis variables.

Overshoot in normal hearing listeners ranges between 10 and 20 dB on average for mid-to-high masker levels (Bacon 1990). This range serves as a reference point in interpreting Figure 7. Specifically, this range suggests that purple tiles represent analysis variable combinations producing any overshoot above 2.5 dB, while starred tiles represent combinations producing overshoot of similar magnitude to normal hearing listeners. A four-way analysis of variance (ANOVA) with 1st order interactions was computed to determine which analysis variables were significant in producing overshoot. The ANOVA factors included were analysis window width, CF range, fiber scaling and SR pooling. The results revealed that all main effects and 1st order interactions were significant. Eighty-three percent of the total sum of squares was accounted for by the SR pooling analysis variable. A post hoc test revealed that the high SR pooling had a population marginal mean at least 12.5 dB greater than all other SR pooling strategies. In other words, simulations with only high SR fibers produced at least 12.5 dB more overshoot than simulations with other SR pooling strategies.

Psychometric functions were derived for the NHMOCR− and NHMOCR+ data plotted in Figures 5 and 6. Oxenham and Moore (1995) showed that psychometric functions are relatively shallower in the gated-masker condition (i.e., short-delay condition) when overshoot is present. Figure 8 plots d-prime as a function of masker spectrum level for three signal levels (rows) and two virtual listeners (columns). From top to bottom, the signal levels for the rows are 40, 60, and 80 dB SPL, respectively. The gray and black lines in each panel represent the short and long-delay predictions, respectively. The right column represents the NHMOCR+ virtual listener. Consistent with Oxenham and Moore (1995), this simulation produces relatively shallower slopes for the short-delay condition. No such relationship was observed for the NHMOCR− virtual listener (left column). Although Oxenham and Moore (1995) did not speculate regarding the mechanism responsible for shaping psychometric functions in overshoot, the present findings suggest the MOCR may play a role.

FIG. 8

Psychometric functions for NHMOCR− (left column) and NHMOCR+ (right column) virtual listeners at three signal levels. In each panel, short- and long-delay conditions are represented by gray and black lines, respectively. Signal levels were 40 (top row), 60 (middle row), and 80 (bottom row) dB SPL. d-prime values plotted on the ordinate were calculated by taking the square root of the Q metric (see Eq. 1 and text for details). Analysis parameters for NHMOCR− and NHMOCR+ simulations were described in the text and in captions for Figures 5 and 6, respectively.

Hearing-impaired simulations (HIMOCR− and HIMOCR+)

Several studies have measured overshoot in listeners with presumed cochlear hearing loss (Carlyon and Sloan 1987; Bacon and Takahashi 1992; Strickland and Krishnan 2005). In general, overshoot is reduced in the hearing-impaired population; however, substantial inter-subject variability exists. Strickland and Krishnan (2005) showed that much of this variability can be explained by signal threshold in quiet. The HIMOCR− simulation involved a 40 dB flat hearing loss across the CFs simulated; this degree and configuration of hearing loss was selected for simplicity in working with the model. The target overshoot for this simulation should therefore be consistent with the performance of a human listener with a 40 dB flat hearing loss. Subject 5 (S5) from Strickland and Krishnan (2005) fits this criterion. Figure 9 compares overshoot for the HIMOCR− virtual listener with that of S5. The model predictions do not account for the small amount of overshoot observed in the behavioral data. Similar to the NHMOCR− listener simulation (Fig. 5), this simulation predicts essentially no overshoot. The analysis variable values for these model data consisted of a 15-ms analysis window, high and low SR fibers, ten fibers/channel and all model CFs.

FIG. 9

As in Figure 5 except for the HIMOCR− virtual listener. The analysis variable values for these model data consisted of a 15-ms analysis window, high and low SR fibers, ten fibers/channel and all model CFs. Behavioral data (A) are from Strickland and Krishnan (2005).

Figure 10 is similar to Figure 9, except the model predictions are for the HIMOCR+ virtual listener. The model predicts a small overshoot that grows with signal level up to 80 dB SPL, roughly consistent with the behavioral data. The analysis variable values for these model data consisted of a 15-ms analysis window, high and low SR fibers, five fibers/channel and only high-frequency model CFs. The amount of overshoot produced by other combinations of analysis variables is summarized in Figure 11. The interpretation of the various tiles in this figure is as described for Figure 7. Similar to the ANOVA for the NHMOCR+ simulations, most of the variance (71%) in average overshoot was due to the SR pooling analysis variable.

FIG. 10

As in Figure 6 except for the HIMOCR+ virtual listener. The analysis variable values for these model data consisted of a 15-ms analysis window, high and low SR fibers, five fibers/channel and only high-frequency model CFs.

FIG. 11

As in Figure 7 except for the HIMOCR+ virtual listener.

Detectability analysis for simulations without the MOCR

Data from the NHMOCR− and HIMOCR− virtual listeners suggest that CFR adaptation and DR adaptation in the AN responses of anesthetized animals may not be mechanisms of overshoot. This conclusion is based on the fact that the power law model accounts for CFR adaptation and yet overshoot was never observed in predictions without the MOCR simulation (MOCR−) despite the large range of analysis variables considered (same range as Figs. 7 and 11). For example, consistent with Smith and Zwislocki (1975) and Zilany et al. (2009), the simulations showed that firing to the signal increased roughly the same amount in the short- and long-delay conditions. Figure 12B illustrates this finding, where the mean difference from Eq. 1 (Δmean) is plotted versus signal level. The masker level in this figure was 20 dB spectrum level, which is in the region of maximal overshoot for normal-hearing listeners (Bacon 1990; Strickland 2004). This mean difference represents the increment in firing rate due to the presence of the signal and is the numerator term for calculating d-prime (where d-prime is computed as the square root of Q in Eq. 1). Lines labeled “Δmean” are nearly equal, suggesting that the incremental firing rate near signal threshold is similar across short- and long-delay conditions. In other words, these data are evidence that the model captures the constant incremental firing in the presence of adaptation as described by Smith and Zwislocki (1975). Given equal incremental firing between short- and long-delay conditions, overshoot can only emerge if the standard deviation (i.e., denominator term for computing d-prime in Eq. 1) is appreciably lower in the long-delay condition. Although the standard deviation (lines marked “s.d.”) is slightly reduced in the long-delay condition, this reduction is not large enough to account for overshoot in normal-hearing or hearing-impaired subjects. Figure 12A displays the d-prime values over a range of signal levels near threshold. The horizontal line represents the value needed to achieve 71% correct performance. The signal thresholds corresponding to this level of performance produce an overshoot of less than 1 dB (Fig. 5C). This value is less than expected based on the estimates of 3–5 dB reported by Smith and Zwislocki (1975), whose estimate considered differences in mean firing rate between masker and signal + masker intervals. The somewhat smaller estimate of overshoot from the present modeling data is likely due to the use of signal detection theory techniques, which consider both the mean difference and the variability in firing rate when calculating threshold.
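
The reasoning in this paragraph can be sketched numerically. The Python example below assumes only what is stated in the text: d-prime is the mean difference divided by the standard deviation, threshold is the signal level at which d-prime reaches the criterion of 1.28 (Fig. 12 caption), and overshoot is the short-delay threshold minus the long-delay threshold. The firing-rate values, the linear interpolation, and the function names are hypothetical and are not the model outputs or the procedure used in the simulations.

```python
import numpy as np

D_PRIME_CRITERION = 1.28  # criterion value quoted in the Figure 12 caption


def d_prime(delta_mean, sd):
    """d-prime as the mean difference divided by the standard deviation."""
    return np.asarray(delta_mean, dtype=float) / np.asarray(sd, dtype=float)


def threshold_at_criterion(levels_db, dprime, criterion=D_PRIME_CRITERION):
    """Signal level (dB) at which d-prime reaches the criterion.

    Uses linear interpolation and assumes d-prime increases with level.
    """
    return float(np.interp(criterion, dprime, levels_db))


# Hypothetical values near threshold (arbitrary rate units, not model output).
signal_levels_db = np.array([30.0, 35.0, 40.0, 45.0, 50.0])
delta_mean = np.array([2.0, 4.0, 8.0, 16.0, 32.0])  # same increment in both conditions
sd_short = np.full(5, 10.0)  # short-delay standard deviation
sd_long = np.full(5, 9.0)    # slightly smaller standard deviation at long delay

threshold_short = threshold_at_criterion(signal_levels_db, d_prime(delta_mean, sd_short))
threshold_long = threshold_at_criterion(signal_levels_db, d_prime(delta_mean, sd_long))
overshoot_db = threshold_short - threshold_long
print(f"short: {threshold_short:.1f} dB, long: {threshold_long:.1f} dB, "
      f"overshoot: {overshoot_db:.1f} dB")
```

With equal increments and only a 10% smaller standard deviation in the long-delay condition, the two thresholds in this toy example differ by less than 1 dB, which illustrates why a small reduction in variability alone cannot produce the 10 to 20 dB of overshoot seen behaviorally.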

FIG. 12

An analysis of factors contributing to d-prime explains why classic firing rate adaptation did not produce overshoot in the NHMOCR− virtual listener. The difference in mean (d-prime numerator, labeled “Δmean”) is roughly equal across long- and short-delay conditions (B). This term represents the increment in firing rate due to the presence of the signal. The standard deviation (d-prime denominator, labeled “s.d.”) is relatively smaller in the long-delay condition (B), but not by enough to produce a large difference in detectability (A). In all panels, dashed and solid lines represent the short- and long-delay conditions, respectively. The horizontal line in (A) marks a d-prime of 1.28 or 71% correct, a common percentage defining threshold in an overshoot experiment. As in Figure 8, d-prime was calculated by taking the square root of the Q metric (see text for the definition of Q).

Variability in the best-fitting analysis variables

In many of the preceding figures, behavioral data were compared with model predictions obtained from the best-fitting combination of analysis variables. Since overshoot was only observed in the NHMOCR+ and HIMOCR+ virtual listeners, the discussion of analysis variables will be limited to these simulations. Although the best-fitting combination of analysis variables was different between these simulations, the set of combinations producing overshoot was quite similar. In fact, nearly all combinations producing overshoot in the HIMOCR+ simulation are a subset of those combinations producing overshoot in the NHMOCR+ simulation. This is most easily observed by comparing purple tiles in Figures 7 and 11. Moreover, roughly 50% of the combinations resulting in an rms error less than 4 dB (i.e., the starred tiles in Figs. 7 and 11) are common among NHMOCR+ and HIMOCR+ simulations. This suggests that there is nothing special about the best-fitting combination of analysis variables displayed in Figures 6 and 10. In other words, a fairly large subset of combinations would have produced results similar to those of the best-fitting combinations.
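
The tile comparison described above amounts to simple set operations, sketched below in Python. The 2.5 dB overshoot criterion and the 4 dB rms error criterion come from the text; the combinations, their values, and the choice to express the shared fraction relative to the union of starred tiles are hypothetical assumptions made for illustration.

```python
def tiles_meeting(values, predicate):
    """Return the set of analysis-variable combinations whose value passes the test."""
    return {combo for combo, value in values.items() if predicate(value)}


# Hypothetical combinations keyed by (SR pooling, fibers/channel); values are invented.
nh_overshoot_db = {("high", 5): 14.0, ("high", 10): 12.0, ("high+low", 5): 3.0, ("all", 10): 0.5}
hi_overshoot_db = {("high", 5): 6.0, ("high", 10): 4.0, ("high+low", 5): 1.0, ("all", 10): 0.2}
nh_rms_error_db = {("high", 5): 2.0, ("high", 10): 3.5, ("high+low", 5): 9.0, ("all", 10): 13.0}
hi_rms_error_db = {("high", 5): 3.0, ("high", 10): 5.0, ("high+low", 5): 4.5, ("all", 10): 6.0}

# "Purple" tiles: any overshoot above 2.5 dB.
nh_purple = tiles_meeting(nh_overshoot_db, lambda v: v > 2.5)
hi_purple = tiles_meeting(hi_overshoot_db, lambda v: v > 2.5)
hi_is_subset = hi_purple <= nh_purple

# "Starred" tiles: rms error below 4 dB relative to the behavioral data.
nh_starred = tiles_meeting(nh_rms_error_db, lambda v: v < 4.0)
hi_starred = tiles_meeting(hi_rms_error_db, lambda v: v < 4.0)
# Assumption: the shared fraction is taken relative to the union of starred tiles.
shared_fraction = len(nh_starred & hi_starred) / len(nh_starred | hi_starred)

print(f"HI purple tiles are a subset of NH purple tiles: {hi_is_subset}")
print(f"Fraction of starred tiles shared: {shared_fraction:.2f}")
```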

Discussion

Classic firing rate adaptation and MOCR feedback have been suggested as mechanisms of overshoot; both hypotheses were tested here using computational modeling and SDT. Model predictions were based on the parsimonious assumption that detection was governed by the same set of analysis variables in the long- and short-delay conditions. These analysis variables were used to limit the performance of the model to better match human performance. The ensuing discussion summarizes and interprets the results of the normal-hearing and hearing-impaired simulations for each overshoot hypothesis.

The classic firing rate adaptation hypothesis

Data from simulations without the MOCR (NHMOCR− and HIMOCR−) suggest that CFR adaptation may not be a mechanism of overshoot. The power law model accounts for CFR adaptation and yet overshoot was never observed in these predictions, despite the large range of analysis variables considered (see Figs. 7 and 11). In other words, regardless of how model performance was limited, CFR adaptation did not produce overshoot. This result was unexpected considering that CFR adaptation is commonly invoked (e.g., Bacon and Healy 2000) to explain at least 3–5 dB of the overshoot effect (Smith and Zwislocki 1975). As shown in Figures 5 and 12, CFR adaptation produced less than 1 dB of overshoot in the normal-hearing simulations without the MOCR.

As previously mentioned, the power law model also accounts for AN firing rate DR adaptation in anesthetized cats. Thus, this form of adaptation is also present in the results, at least the rapid component of DR adaptation that occurs over a time course shorter than the 400-ms duration of the masker (Zilany and Carney 2010). Based on the results of the present simulations, we can therefore also conclude that DR adaptation in the AN appears insufficient to account for overshoot without the additional DR adaptation provided by the MOCR.

The medial olivocochlear reflex hypothesis

A subset of simulations with the MOCR (NHMOCR+ and HIMOCR+) was capable of producing overshoot that was similar in magnitude and level dependence to psychophysical data (Figs. 6 and 10). This finding strengthens the MOCR hypothesis and is consistent with the modeling findings from Strickland and colleagues (Strickland 2001, 2004; Strickland and Krishnan 2005; Strickland 2008), which were based on reducing basilar membrane gain (as suggested by von Klitzing and Kohlrausch 1994).

An advantage of the present experiments is the quantitative evaluation of processes in the auditory nerve, which were previously hypothesized to be related to overshoot (Champlin and McFadden 1989; McFadden and Champlin 1990; Bacon and Takahashi 1992), but were not rigorously tested. For example, the shallow slope of the short-delay function (open squares in Fig. 5A) in normal-hearing listeners is often assumed to be a result of basilar membrane compression. The present results suggest this may not be the only factor, because this shallow slope was only achieved when the model’s decision process was limited to high SR fibers. In the long-delay condition, the effect of this limitation is overcome by adjusting the gain of the OHCs via the MOCR and thus releasing the high SR fibers from CFR adaptation to the noise masker. This finding is similar to those of other modeling studies that have suggested that high SR fiber profiles are more robust at high levels when efferent feedback is simulated (Messing et al. 2009; Brown et al. 2010).

The fact that only a subset of conditions produced overshoot in simulations with the MOCR suggests that some limitations must be imposed on this hypothesis. For example, overshoot only emerged when detection was dominated by high SR fibers. Specifically, this limitation was critical in predicting the short-delay condition where model performance was appreciably better than human performance when medium and low SR groups were included. Detection in noise via high SR fibers appears inconsistent with the general suggestion from physiological data that detection in noise is primarily based on low SR fibers (Young and Barta 1986); however, there are some important methodological issues to consider. The effects of MOC efferents are likely to have been greatly suppressed in most physiological data from AN fibers due to the effects of anesthesia. Without the MOC efferents, the high SR fibers remain saturated, and thus their potential contribution to detection may be underestimated. Furthermore, the data from Young and Barta (1986) considered detection of a long duration tone (200 ms) occurring 15 s after the onset of a steady-state noise. Detection in this condition is most consistent with the long-delay overshoot condition in the present study, where the tone occurs well after masker onset. However, as discussed above, it was in the short-delay condition that the model needed to be limited to rely primarily on high SR fibers. Thus, detection in the short-delay condition of overshoot may be quite different from detection in the conditions for which low SR fibers have been suggested to be most important (i.e., detection of tones in steady-state noise). For example, at masker onset all SR fibers have a wider dynamic range and thus there is less need to rely on low SR fibers for detection (Smith and Zwislocki 1975).

A potential quantitative limitation in our simulations was that dynamic range adaptation, via the MOCR, was modeled by reducing OHC gain by up to 40 dB (Fig. 2A), similar to previous modeling studies (Messing et al. 2009). This value is somewhat higher than MOC strength observed on the basilar membrane. For example, MOC suppression on the basilar membrane ranges from 10 to 30 dB depending on the elicitor and the stimulus parameters (e.g., Murugasu and Russell 1996; Dolan et al. 1997; Russell and Murugasu 1997). Other mechanisms which exhibit DR adaptation may account for the difference between our model settings and physiological data related to MOCR strength. Robinson and McAlpine (2009) summarized how DR adaptation occurs at several locations along the auditory pathway. As discussed by Dean et al. (2005), this “diversity” in adaptive mechanisms suggests they work in concert to “improve the accuracy of the neural code for sound level.” In other words, our modeling results suggest that overshoot may be due to DR adaptation at several locations along the auditory pathway. Moreover, of these mechanisms, the MOCR appears to be a strong player in overshoot given that our simulations involved explicitly reducing the gain of the OHCs.

Overshoot and hearing impairment

Diminished overshoot in listeners with temporary (Champlin and McFadden 1989; McFadden and Champlin 1990) or permanent cochlear hearing loss (Bacon and Takahashi 1992) has been hypothesized to be due to a reduction in the onset responses of AN fibers. This hypothesis assumes that CFR adaptation is altered following cochlear hearing loss in such a way that onset responses are reduced. Such an alteration is inconsistent with findings of an enhanced onset to steady-state firing ratio in AN fibers following sensorineural hearing loss (SNHL; Crumling and Saunders 2007; Scheidt et al. 2010). These physiological findings suggest that if CFR adaptation were responsible for overshoot, then overshoot should be enhanced following SNHL, which contrasts with the observation that overshoot is often reduced in hearing-impaired listeners.

The decrease in overshoot with cochlear hearing impairment has also been used as evidence that overshoot is related to cochlear gain (von Klitzing and Kohlrausch 1994; Strickland 2001). Consistent with behavioral data (Fig. 10), overshoot was reduced in the hearing-impaired simulations with the MOCR. Although a reduction in overshoot was observed in the model predictions, the reduction was larger than for the human listener in Figure 10. In the hearing-impaired simulations, the hearing loss was modeled as resulting solely from a decrease in gain from the OHCs, whereas in the behavioral data the relative damage to IHCs and OHCs is not known, but is likely to have been mixed (Bruce et al. 2003; Plack et al. 2004). Since the MOCR is directly related to OHC gain, the present implementation of cochlear loss as entirely OHC damage is likely to have overestimated the reduction in overshoot that would have occurred with a mixed OHC/IHC loss. Nonetheless, these hearing-impaired model predictions provide further quantitative support for the MOCR hypothesis. Moreover, the best-fitting model simulations (i.e., starred tiles in Figs. 7 and 11) were common among normal-hearing and hearing-impaired listeners, suggesting that the MOCR may be a common mechanism in producing overshoot across listener populations.

This general finding suggests that the present modeling approach that combines the MOCR and cochlear hearing loss could be used in future studies to explore the effects of cochlear hearing loss on more complex listening situations. Previous MOCR modeling studies have suggested an important role for the MOCR in understanding speech in noise (Ferry and Meddis 2007; Ghitza et al. 2007; Messing et al. 2009; Brown et al. 2010). The present findings extend these studies to suggest that the benefits provided by the MOCR for listening in noise may be diminished in listeners with cochlear hearing loss. The present modeling approach suggests that this could occur even with no direct degradation to the efferent system itself, but simply because there is less OHC gain for the efferent system to reduce following cochlear hearing loss. These effects may have important implications for listening in noise with cochlear hearing loss, which represents the condition for which people have the most difficulty even with modern hearing aids.