INTRODUCTION

The audibility of a target sound in a complex acoustic environment can vary depending on the context surrounding the target. One example of such context dependence is an effect known as “auditory enhancement” or “the enhancement effect”. The enhancement effect occurs when a narrowband stimulus (target) embedded in a broader stimulus (masker) becomes more audible or more salient when the target–masker mixture is preceded by a copy of the masker (precursor) with attenuated or no energy at or near the target frequency (Viemeister 1980; Summerfield et al. 1984, 1987; Thibodeau 1991, 1996; Byrne et al. 2011, 2013; Carcagno et al. 2012). The enhancement effect may play an important role in perception considering evidence which suggests that the effect may facilitate recognition of speech in noise in listeners with normal hearing (Summerfield et al. 1987; Thibodeau 1996). The apparent reduction or absence of the enhancement effect in listeners with hearing loss (Thibodeau 1991) may also contribute to their difficulty in understanding speech in noise.

Enhancement has been demonstrated in a number of perceptual tasks through improved detection (Viemeister 1980) and improved pitch identification (Carcagno et al. 2012) of an enhanced target, increased forward masking by an enhanced target (Viemeister and Bacon 1982; Thibodeau 1991; Carcagno et al. 2014), and improved synthetic vowel recognition by enhanced formant frequencies (Summerfield et al. 1984, 1987; Summerfield and Assmann 1987; Thibodeau 1996). Yet the underlying mechanisms remain unclear. A decreased threshold for detecting a target and increased forward masking by the target observed following a precursor suggest that enhancement results from an increased physiological response to a target in the enhancement (precursor-present) condition compared with the no-precursor condition rather than from a decreased physiological response to a masker, although both changes in responses may be induced by a precursor and contribute to the overall increase in the target’s salience. An increased physiological response to the target has also been suggested by the observation that an enhanced target requires an increased level of its contralateral copy to produce a centered binaural image (Byrne et al. 2011).

An increase in the physiological response to a target due to a precursor cannot be explained solely by frequency-selective adaptation of masker components by preceding stimulation, such as that observed in the auditory nerve (Smith 1977; Westerman and Smith 1984). The auditory-nerve adaptation would result in a decreased response to the masker and a decreased or intact response to the target, depending on the frequency spacing between the precursor and target components. Thus, adaptation at this level could explain an increase of the target response relative to the surrounding maskers, but not the absolute increase, relative to the unenhanced target (e.g., Palmer et al. 1995). An explanation based on adaptation of suppression or inhibition has been invoked to account for the absolute increase (Viemeister and Bacon 1982; Thibodeau 1996; Byrne et al. 2011; Carcagno et al. 2012). The most peripheral mechanism that could result in the enhancement effect involves a frequency-selective reduction of cochlear gain via the medial olivocochlear reflex (MOCR; Lilaonitkul and Guinan 2012). This adaptation would require prior stimulation long enough to allow for a sufficient buildup of the efferent effect (Backus and Guinan 2006). Medial olivocochlear (MOC) efferent activation has been shown to produce a relative increase in auditory-nerve responses for tones in noise (Dolan and Nuttall 1988; Winslow and Sachs 1988; Kawase et al. 1993). Theoretically, an MOCR-based mechanism could contribute to the enhancement effect in more than one way. Due to its activation by the precursor, the MOCR could decrease BM responses to the components in a masker and thereby produce a release from BM suppression of the response to a target by the masker components. A change in suppression of the target due to a change in cochlear gain at the suppressor frequency would require longitudinal coupling in the cochlea. 
Experimental and theoretical studies have identified a few sources of longitudinal coupling in the cochlear structures: the BM, the cochlear fluids, and the tectorial membrane (Naidu and Mountain 2001, 2007; Meaud and Grosh 2010; Eze and Olson 2011). The tectorial membrane has been shown to affect cochlear gain and tuning because of its direct connection with the outer hair cell (OHC) hair bundles (Mammano and Nobili 1993; Legan et al. 2000; Ghaffari et al. 2007; Russell et al. 2007; Meaud and Grosh 2010, 2014). Using a biophysical model of the cochlea, Meaud and Grosh (2010) showed that longitudinal coupling due to the viscoelastic load of the tectorial membrane on OHC hair bundles has a significant effect on somatic OHC motility because it affects the bend and deflection of the hair bundles, and thus it affects the OHC potassium current. Their model correctly predicts significantly sharper tuning in the absence of longitudinal coupling via the tectorial membrane, consistent with experimentally observed sharpening of tuning in mutant mice with genetically decreased viscoelastic properties of the tectorial membrane (Russell et al. 2007). Because viscoelastic longitudinal coupling in the tectorial membrane affects tuning and nonlinear BM responses, a change in cochlear gain at the suppressor frequency may affect the amount of suppression at the target frequency, as suggested by Jau and Geisler (1983). The idea of adaptation of BM suppression due to MOC efferent activation has been previously used to explain changes in psychophysical tuning after a precursor (Strickland 2004) in a phenomenon known as “overshoot” or “the temporal effect” (Champlin and McFadden 1989a, b; von Klitzing and Kohlrausch 1994; Strickland 2001; Strickland and Krishnan 2005; Jennings and Strickland 2012) that bears many similarities to the enhancement effect.
Another possibility is that enhancement results from adaptation of lateral inhibition occurring at a level central to the auditory nerve, but that adaptation is, at least in part, facilitated by the peripheral auditory processing. The MOCR could contribute to this adaptation by producing a greater reduction in cochlear response to the masker components than to the signal component by a precursor with a spectral notch around the signal frequency. In this case, the cochlear response to the signal would not increase compared to that in the no-precursor condition, but the frequency-selective decrease of the response to masker components by the precursor could set up a stage for the adaptation of central lateral inhibition. Nelson and Young (2010) showed enhanced neural responses in the central nucleus of the inferior colliculus (IC) in awake marmoset monkeys in response to stimuli inducing auditory enhancement, and they were able to predict these responses using a phenomenological model incorporating excitatory and inhibitory inputs to the IC. Although their finding does not preclude the possibility of peripheral contribution to the effects observed in the IC, peripherally generated enhancement was ruled out based on the lack of enhancement in responses of auditory-nerve fibers recorded by Palmer et al. (1995). However, Palmer et al. performed their measurements in anesthetized guinea pigs, and the effects of MOC efferent activation can be significantly reduced by anesthesia (Boyev et al. 2002; Chambers et al. 2012). Behavioral evidence in humans is similarly inconclusive. On one hand, Thibodeau (1991) reported an absence of enhancement in listeners with cochlear hearing loss, suggesting a peripheral component. On the other hand, Wang et al. (2012) reported enhancement effects in cochlear-implant users, for whom the cochlea is completely bypassed and the auditory nerve is stimulated directly, suggesting that the MOC efferent effects cannot account for the entire enhancement effect. In addition, Carcagno et al. (2014) reported no correlates of enhancement in the 80-Hz auditory steady-state response (ASSR), which is believed to reflect brainstem processing. Thus, the question about the site of the adaptation of suppression/inhibition and the exact mechanism underlying the enhancement effect remains open.

In this study, the possible effects of MOC efferent activation on BM responses were examined via stimulus frequency otoacoustic emissions (SFOAEs) using stimuli that have been shown to produce significant enhancement effects in previous studies (e.g., Byrne et al. 2011). SFOAE measurements provide a noninvasive method to probe mechanical cochlear responses and have been extensively used to study the effects of MOC efferent activation in humans (Guinan et al. 2003; Backus and Guinan 2006; Lilaonitkul and Guinan 2009, 2012; Wojtczak et al. 2015). We reasoned that MOC efferent activation by a precursor could result in frequency-selective reduction in gain of BM responses to masker components. Due to the longitudinal coupling in the cochlear structures, this gain reduction could result in a reduction of BM suppression of the response to a target by the masker components. The reduction of suppression would be expected to result in an increase in SFOAE magnitude at the target frequency in the presence of the precursor compared with the SFOAE magnitude measured without the precursor. On the other hand, if the SFOAE magnitude at the target frequency decreased or remained the same, while the SFOAE magnitudes for the flanking masker components decreased relatively more, then this effective relative enhancement could contribute to adaptation of inhibition at higher stages of auditory processing. Because not all listeners with normal hearing have shown a significant enhancement effect in previous studies (Thibodeau 1996), the stimuli used in the SFOAE measurements were also used to measure enhancement in the same subjects using a psychophysical detection task.

EXPERIMENT 1: EFFECTS OF MOC EFFERENT ACTIVATION ON SFOAES AND ENHANCEMENT FOR A WIDEBAND MASKER AND PRECURSOR

Rationale

Figure 1 schematically illustrates the mechanism whereby an increase in the response to the signal component would occur in the presence of a precursor (1B), relative to the baseline condition with no precursor (1A). The stimuli, a pure-tone signal (red line) and a multitone masker and precursor (blue lines), are schematically illustrated at the top of each panel. At the bottom of each panel, changes in cochlear gain are schematically illustrated for the masker/precursor (blue trace) and the signal (red trace) components.

FIG. 1

Schematic illustration of the hypothesized mechanism of MOC-efferent induced enhancement. The top panels show the stimuli in the no-enhancement condition (A) and in the enhancement condition (B). The bottom panels show hypothetical changes in cochlear gain due to efferent activation for the signal (red line) and the non-signal (blue lines) components.

In Figure 1A, both the signal and masker components start at the same time. According to the time course of the MOC efferent effect on the SFOAE measured in humans (Backus and Guinan 2006), the cochlear gain would remain constant for about 25–30 ms, until the MOCR begins to build up. After this latency, marked by the vertical dashed line, the cochlear gain would gradually decrease over the course of about 230 ms and then reach an asymptotic level that would decrease very gradually over the course of seconds. The schematic illustration assumes that the decay of cochlear gain will be similar for the signal and masker components, and thus no component will perceptually stand out in the stimulus during its time course. In addition to the decrease in cochlear gain, the target and masker components mutually suppress each other’s responses on the BM. The suppression effect is instantaneous (Sachs and Kiang 1968) and not shown explicitly in this figure. Figure 1B schematically illustrates the effect of MOC efferent activation on cochlear gain at the signal and masker/precursor component frequencies in the presence of a precursor. The figure shows a decrease in cochlear gain at the frequencies of the precursor components due to MOC efferent activation. When the temporal gap between the precursor and the masker is shorter than the latency of the MOCR decay, BM gain of responses to components of the masker will remain reduced due to the sustained efferent effect (elicited during the precursor). Because there are no frequency components in the precursor at or near the signal frequency, the cochlear gain of the response to the signal is assumed to be less affected by the precursor and is reduced mainly via the instantaneous suppression of the BM response to the signal by the masker components. 
The frequency-selective efferent effects would therefore result in a stronger response to the signal component relative to the MOCR-suppressed responses to the masker components (solid red trace). In addition, because the responses to masker components are reduced on the BM by the precursor, these components may produce less suppression of the BM response to the signal if viscoelastic longitudinal coupling in the tectorial membrane affects the OHC hair bundles such that it reduces saturation of the OHC current at the characteristic frequency corresponding to the signal. Reduced suppression of this nature would in turn result in an absolute increase in the response to the signal (dashed red trace) compared to that for the stimulus shown in Figure 1A.

The magnitude of an SFOAE is thought to depend on the cochlear gain at the cochlear place with a characteristic frequency corresponding to the frequency evoking the emission (Guinan 2006). If the hypothesized MOCR-based mechanism illustrated in Figure 1 is correct, SFOAE measurements should show a decrease in the SFOAE magnitudes at the masker component frequencies and either a smaller decrease or no change in SFOAE magnitude at the signal frequency relative to the SFOAE magnitudes measured for the stimulus in Figure 1A. Additionally, if a decrease in the response to the masker components results in a hypothesized reduction of suppression at the signal frequency, an increase in the SFOAE at the signal frequency may be observed, as shown by the dashed red line for the stimulus illustrated in Figure 1B.

Listeners

SFOAEs were measured in 12 listeners (six females, six males). Listeners’ ages ranged from 20 to 31 years (median 21.5 years). The listeners had normal hearing thresholds (<15 dB HL) at audiometric frequencies between 250 and 8000 Hz, as measured using an ANSI-certified audiometer (Madsen Conera). The listeners were screened for the presence of spontaneous otoacoustic emissions (SOAEs) to make sure that they did not have SOAEs with frequencies within 100 Hz around the frequencies for which SFOAEs were measured. Such screening is routinely performed before SFOAE measurements to avoid interactions between the spontaneous and evoked emissions which would complicate the interpretation of observed results (e.g., Guinan et al. 2003). During the experiment, the listeners were seated in a double-walled sound-attenuating booth. For the SFOAE measurements, the listeners were seated in a comfortable reclining chair and were instructed to remain relaxed and as still as possible during the recordings. For the psychophysical task, the listeners were seated in a comfortable chair and were instructed to listen attentively and provide responses to the stimuli. The listeners received about 15 min of practice in the psychophysical task using an equal number of runs with and without the precursor. Prior to data collection, the listeners provided written informed consent, and the protocol for this study was approved by the Institutional Review Board of the University of Minnesota.

Stimuli and Procedure

SFOAE Measurements

Ear-canal sound pressure was recorded for multitone inharmonic stimuli previously shown to produce a significant enhancement effect (Byrne et al. 2011). A recording block consisted of eight trials, each containing three types of stimuli, as shown in the schematic illustration in Figure 2. The first stimulus, denoted by M, consisted of 47 tones gated on synchronously for 250 ms including 10-ms raised-cosine onset/offset ramps. The stimulus was obtained by generating 51 pure tones equidistantly spaced on a log-frequency scale between 250 and 8000 Hz and then removing the two most proximal components below and above the 2-kHz component (four components total). The second stimulus in each trial, denoted by P_M, consisted of the same 250-ms components as the first stimulus preceded by a precursor consisting of 500-ms components at the same frequencies but with the 2-kHz component removed. The precursor was also gated with 10-ms ramps, and there was no silent gap between the precursor and the following complex (0-ms gap between the 0-V amplitude points on the envelopes). The individual components in stimuli M and P_M each had a level of 50 dB SPL and started at a 0° (sine) phase. The third stimulus, denoted by SUP, consisted of two tones. The higher-frequency tone of the pair, termed the “measured tone”, was the tone for which the SFOAE was measured and compared across the three types of stimuli. In the example in Figure 2, the measured tone has a frequency of 2 kHz, i.e., the frequency expected to be perceptually enhanced in the P_M stimulus based on the study by Byrne et al. (2011). In separate conditions, the measured tone was either the 2-kHz (target) tone or one of the four most proximal masker components with frequencies of 1.52, 1.62, 2.46, and 2.64 kHz. SFOAEs were measured for the masker components to determine the effect of the precursor on SFOAE magnitudes at masker frequencies surrounding the (2-kHz) target.
The measured tone was presented at 50 dB SPL and was gated on for 1 s, including 10-ms raised-cosine onset/offset ramps. The second tone, termed the “suppressor tone”, had a frequency 110 Hz below that of the target, started 500 ms after the onset of the measured tone and was gated off simultaneously with the measured tone. The suppressor tone was presented at a level of 70 dB SPL and was used to suppress the SFOAE evoked by the measured tone. The stimuli within each trial and between consecutive trials were separated by 2-s silent gaps to allow the auditory system to recover from the effects of MOCR activation by the preceding stimulus. The polarity of all components except that at the measured-tone frequency was alternated between consecutive trials for each of the three stimuli. The recorded trials were analyzed online for artifact rejection (pairs of consecutive trials with opposite polarities were rejected upon detection of artifacts). For each condition, a total of 50 artifact-free trials contributed to the final waveform that was averaged across all trials. This averaging removed the physical stimuli for all the components except the measured tone while preserving their effects on the SFOAE evoked by the measured tone.
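The component frequencies described above can be reproduced with a few lines of code. The following Python sketch is illustrative only (the study generated its stimuli in Matlab); the function name `masker_frequencies` and its parameters are hypothetical. It builds the 51 log-spaced frequencies, removes the two components on each side of the 2-kHz component, and lists the four remaining components nearest the target.

```python
import math

def masker_frequencies(f_lo=250.0, f_hi=8000.0, n_tones=51,
                       f_target=2000.0, n_removed_per_side=2):
    """Return the component frequencies (Hz) of stimulus M and the target index.

    Hypothetical helper: tones are equally spaced on a log-frequency axis and
    the components adjacent to the target are removed on each side.
    """
    ratio = (f_hi / f_lo) ** (1.0 / (n_tones - 1))
    freqs = [f_lo * ratio ** k for k in range(n_tones)]
    # Index of the component closest to the target on a log scale.
    i_t = min(range(n_tones), key=lambda k: abs(math.log(freqs[k] / f_target)))
    removed = set(range(i_t - n_removed_per_side, i_t))
    removed |= set(range(i_t + 1, i_t + 1 + n_removed_per_side))
    kept = [f for k, f in enumerate(freqs) if k not in removed]
    return kept, kept.index(freqs[i_t])

m_freqs, t_idx = masker_frequencies()
# The precursor uses the same components without the target.
precursor_freqs = [f for k, f in enumerate(m_freqs) if k != t_idx]
# Two nearest remaining masker components on each side of the target, in kHz.
neighbours_khz = [round(f / 1000.0, 2)
                  for f in m_freqs[t_idx - 2:t_idx] + m_freqs[t_idx + 1:t_idx + 3]]
print(len(m_freqs), len(precursor_freqs), neighbours_khz)
```

With these parameters the spacing is 0.1 octave, and the four neighbours come out at the 1.52, 1.62, 2.46, and 2.64 kHz values given in the text.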

FIG. 2

Schematic illustration of the stimuli used in SFOAE measurements. The polarity of all components except the target (dark red line) was alternated between consecutive trials.

All the stimuli were generated digitally on a PC using Matlab and a LynxStudio LynxTwo-B sound card controlled by SoundMexPro and presented at a sampling frequency of 48 kHz. The stimuli were presented to the listeners using two transducers of an Etymotic Research ER10C assembly, with the target presented via a separate transducer from all the remaining components in each stimulus. The ear with stronger emissions or the right ear (in case of no obvious difference) was used for SFOAE measurements. The ear canal sound pressure was recorded via the ER10C microphone and digitized using the LynxTwo-B sound card before being stored for offline analysis.

SFOAE Analysis

To estimate the effects of the simultaneous 250-ms components and the precursor on the magnitude of the SFOAE evoked by the measured tone at each of its five frequencies, the SFOAE magnitude evoked by the measured tone in quiet was first extracted from the SUP stimulus using the suppression technique (Brass and Kemp 1993; Shera and Guinan 1999, 2003). The averaged SUP waveform was analyzed using the heterodyne method to obtain a complex-valued sound pressure waveform at the target frequency (Guinan et al. 2003; Backus and Guinan 2006; Wojtczak et al. 2015). An example of the result of this analysis is shown in Figure 3 for a 2-kHz measured tone from a single listener. The magnitude and phase of the heterodyned ear-canal sound pressure at 2 kHz are shown in the top and bottom panels of Figure 3A, respectively. To extract the magnitude and phase of the SFOAE, it was assumed that the sound pressure at 2 kHz during the 500-ms suppressor represents the source sound pressure alone because the emission evoked by the 50-dB SPL 2-kHz measured tone is entirely suppressed during the presence of the 70-dB SPL suppressor tone. The real and imaginary parts of the complex-valued ear-canal sound pressure were averaged within a 150-ms window centered within the suppressor (the green rectangle in the top panel of Fig. 3A), and the results were subtracted from the real and imaginary parts of every point of the averaged and heterodyned SUP stimulus. The vector subtraction of the ear-canal sound pressure measured during the suppressor tone from that for the measured tone alone allowed us to estimate the magnitude and phase of the SFOAE waveform evoked by the measured tone, as shown in the top and bottom panels of Figure 3B, respectively. Note that the subtraction yields the noise floor during the portion of the SUP stimulus with the suppressor tone.

FIG. 3

Analysis of recorded waveforms for the extraction of the SFOAE. The magnitude and phase of the ear-canal sound pressure are shown in the top and bottom panels of A, respectively. Changes in the magnitude and phase of the ear-canal sound pressure due to suppressing the SFOAE are shown in the top and bottom panels of B. Data from one listener.

The final estimate of the magnitude of the SFOAE at each target frequency was obtained by averaging the real and imaginary parts calculated over a 150-ms window centered during the signal-alone interval (the black rectangle in the top panel of Figure 3B).
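The suppression-based extraction can be illustrated on a synthetic recording. The Python sketch below is a simplified illustration, not the analysis code used in the study: the heterodyne filter is replaced by block averaging over whole cycles, the signal amplitudes are arbitrary, and the suppressor tone itself is omitted because it lies at a different frequency and is cancelled by polarity alternation. The names `heterodyne`, `vector_mean`, and `EMISSION` are hypothetical.

```python
import cmath
import math

FS = 48000    # sampling rate in Hz (as in the text)
F0 = 2000.0   # measured-tone frequency in Hz
BLOCK = 240   # 5 ms = 10 full cycles at 2 kHz (illustrative choice)

def heterodyne(x, f0, fs, block):
    """Complex envelope of x at f0, one value per block of samples."""
    out = []
    for start in range(0, len(x) - block + 1, block):
        acc = 0j
        for n in range(start, start + block):
            acc += x[n] * cmath.exp(-2j * math.pi * f0 * n / fs)
        out.append(2.0 * acc / block)  # factor 2 restores the tone amplitude
    return out

def vector_mean(z):
    """Average real and imaginary parts (i.e., a complex mean)."""
    return sum(z) / len(z)

# Synthetic 1-s SUP recording: the source tone is always present; the emission
# (assumed amplitude 0.01, phase 1 rad) exists only before the suppressor
# starts at 500 ms, where it is assumed to be fully suppressed.
EMISSION = 0.01 * cmath.exp(1j * 1.0)
x = []
for n in range(FS):
    t = n / FS
    p = math.cos(2 * math.pi * F0 * t)
    if t < 0.5:
        p += abs(EMISSION) * math.cos(2 * math.pi * F0 * t + cmath.phase(EMISSION))
    x.append(p)

env = heterodyne(x, F0, FS, BLOCK)       # 200 envelope points, 5 ms apart
source = vector_mean(env[135:165])       # 150 ms centered in the suppressor part
sfoae_wave = [e - source for e in env]   # vector subtraction at every point
sfoae = vector_mean(sfoae_wave[35:65])   # 150 ms centered in the probe-alone part
print(abs(sfoae), cmath.phase(sfoae))
```

Because the subtraction is done on complex values, both the magnitude and the phase of the assumed emission are recovered, and the residual during the suppressor portion falls to the (here numerical) noise floor.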

To estimate the SFOAE magnitudes during the M and P_M stimuli, waveforms recorded for these stimuli were averaged across 50 artifact-free trials and were subjected to heterodyning. This analysis yielded a complex-valued sound pressure waveform at the measured-tone frequency that was assumed to be the sum of the source sound pressure at that frequency and the emission evoked by the measured tone in the context of the M and P_M stimuli. The SFOAE waveforms during the M stimulus (SFOAEM) and the P_M stimulus (SFOAEP_M) were obtained by a vector subtraction of the estimated source sound pressure (the vector average during the 150-ms window shown by the green rectangle in Fig. 3A) from each point of the heterodyned M waveform and each point of the portion of the P_M waveform containing the target. Because the stimuli were gated off and on between the precursor (P) and the following stimulus (M), the heterodyning procedure described above resulted in artifactual irregularities in the resultant waveforms over the first 30 ms or so of the measured tones. Consequently, it was not possible to use the parts of the measured tones for which the effects of the precursor on the SFOAEs were expected to be greatest. However, after the initial irregular segment, the extracted SFOAE magnitude and phase waveforms did not exhibit systematic changes over the duration of the measured tone. Based on the results in the study by Backus and Guinan (2006), the effect of the precursor was expected to extend over at least the first 250 ms of the target component (25–30 ms MOCR latency plus about 230-ms subsequent buildup of the efferent effect). Since after the initial transient no systematic changes in SFOAEs were observed, final estimates of the SFOAEM and SFOAEP_M magnitudes were calculated from vector averages of the resultant vector-difference waveforms over 150-ms windows centered within the respective waveforms derived from the measured tones.

Psychophysical Enhancement Measurements

Detection of a 250-ms 2-kHz tone (signal) was measured using a three-interval forced-choice (3IFC) procedure combined with an adaptive two-down one-up tracking rule estimating the 70.7 % correct point on the psychometric function (Levitt 1971). The detection of the signal was performed for three stimulus conditions, tested in separate blocks and presented in a different random order for each subject. In one condition, in which the signal was detected in quiet, two observation intervals contained silence and one interval, selected randomly on each trial, contained the signal. In the second condition, two observation intervals contained only the masker, consisting of the non-signal components of the stimulus M (see Fig. 2) in the SFOAE measurements, and one interval (again selected at random on each trial) contained both the masker and the 2-kHz signal. In the third condition, the signal interval contained the masker-plus-signal preceded by the precursor (equivalent to the stimulus P_M in Fig. 2) and the other two intervals contained the same stimulus with the signal component removed. In all three conditions, listeners had to detect the interval with the signal and provide their response via a mouse click or a key press. Visual feedback indicating the correct response was provided after each trial. The duration, the ramps, the precursor-masker gap, and the component levels were the same as for the stimuli used in the SFOAE measurements, except for the level of the 2-kHz signal, which was varied adaptively. The signal level was initially set to a value that was clearly audible. This level was decreased by 8 dB after two consecutive correct responses and increased by the same amount after each incorrect response until the second reversal in the direction of changes in signal level was obtained. After that, the step size in the adaptive tracking was reduced to 4 dB for the next two reversals and to 2 dB for the remaining eight reversals. 
A run terminated after a total of 12 reversals and the threshold was calculated by averaging the signal level at the final eight reversal points. Three single-run threshold estimates were averaged to obtain the final threshold for each condition and listener.
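The tracking rule can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the simulated listener is deterministic (correct whenever the level is at or above a fixed value), the step size is updated at the reversal that triggers the change, and the names `run_staircase` and `step_size` are hypothetical. The study itself used the AFC package under Matlab.

```python
def step_size(n_reversals):
    """Step in dB: 8 until the 2nd reversal, 4 for the next two, then 2."""
    if n_reversals < 2:
        return 8.0
    if n_reversals < 4:
        return 4.0
    return 2.0

def run_staircase(respond, start_level, n_reversals=12, n_avg=8):
    """Two-down one-up track; returns (threshold, list of reversal levels).

    respond(level) must return True for a correct trial.
    """
    level = start_level
    direction = 0   # -1 while descending, +1 while ascending, 0 before any move
    n_correct = 0
    reversals = []
    while len(reversals) < n_reversals:
        if respond(level):
            n_correct += 1
            if n_correct < 2:
                continue        # need two consecutive correct responses
            n_correct = 0
            new_dir = -1
        else:
            n_correct = 0
            new_dir = +1
        if direction != 0 and new_dir != direction:
            reversals.append(level)   # level at which the direction changed
        direction = new_dir
        level += new_dir * step_size(len(reversals))
    return sum(reversals[-n_avg:]) / n_avg, reversals

# Deterministic stand-in for a listener (an assumption for illustration only).
threshold, reversal_levels = run_staircase(lambda lvl: lvl >= 20.0, 50.0)
print(threshold, reversal_levels)
```

With a real listener the responses are probabilistic, and the track converges on the 70.7 % correct point rather than on a hard boundary as in this deterministic example.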

The stimuli were generated digitally on a PC using Matlab with a sampling rate of 48 kHz and were played out via a 24-bit LynxStudio Lynx22 sound card. The stimuli were presented monaurally (to the left ear) via a Sennheiser HD 580 headset. Stimulus presentation, feedback, and data collection were controlled by the AFC program under Matlab (Ewert 2013).

Results

Otoacoustic Emissions at 2 kHz

SFOAEs, measured using the suppressor technique, were considered significant when their magnitude exceeded the noise floor by more than two standard deviations of the mean noise-floor magnitude and was significant according to the one-tailed Welch’s t-test. Three listeners (one female, two males) did not have significant SFOAEs at 2 kHz, the frequency for which psychophysical enhancement has been observed using comparable stimuli (Byrne et al. 2011), and thus their data were excluded from further analyses of the otoacoustic emissions.

The SFOAE evoked by a 2-kHz probe in quiet (probe-alone portion of the SUP stimulus; SFOAEQ) was compared with the SFOAE magnitudes in the context of M (SFOAEM) and P_M (SFOAEP_M) stimuli. No changes in the extracted SFOAE magnitude and phase waveforms were observed over the duration of the probe-alone portion of the SUP stimuli (top and bottom panels of Fig. 3B), indicating that the 50-dB SPL probe did not elicit measurable MOCR effects. The lack of efferent effects elicited by low-to-moderate level pure tones is consistent with a number of studies, which have shown that pure tones are ineffective elicitors of the MOCR at levels below 60–70 dB SPL (e.g., Guinan et al. 2003; Lilaonitkul and Guinan 2009; Walsh et al. 2010). Lilaonitkul and Guinan (2009) reported small effects elicited by a 45-dB SPL ipsilateral pure tone with a frequency about half an octave above a 1-kHz probe, but these effects were expressed in terms of a change in SFOAE (i.e., ΔSFOAE) and thus could result from a change in SFOAE magnitude, phase, or both. In their later study, Lilaonitkul and Guinan (2012) showed that for probe frequencies of 1 kHz and higher, ipsilaterally presented off-frequency tonal elicitors with a level of 60 dB SPL produce a change in SFOAE phase (and thus a significant ΔSFOAE) without affecting the SFOAE magnitude. In addition, the effects observed for a 1-kHz probe by Lilaonitkul and Guinan (2009) have been shown to decrease with increasing frequency (Lilaonitkul and Guinan 2012). In this study, all the comparisons across conditions are performed for SFOAE magnitudes (as opposed to ΔSFOAE magnitudes which may be produced by changes in SFOAE phase only) since those would reflect elicitor-induced cochlear gain reduction. 
Because the suppressor tone was close in frequency to the target and presented at a level 20 dB above that of the target, it is reasonable to assume that it produced a complete suppression of the SFOAE evoked by the target (Brass and Kemp 1993; Shera and Guinan 1999). Because suppression on the BM is nearly instantaneous, the much slower effect produced by the activation of MOC efferents by the suppressor tone was unlikely to produce any additional suppression that would affect the baseline estimates of the SFOAEs.

Our first analysis involved a group-level comparison of the SFOAEs at 2 kHz in the three conditions tested: SFOAEQ, SFOAEM, and SFOAEP_M. Figure 4 shows levels, in dB SPL, of the SFOAE evoked by the 2-kHz tone for the nine listeners with significant emissions in these three conditions. As can be seen from the mean data shown in the far-right bars of Figure 4, there were no systematic differences in SFOAE amplitude across the three conditions. A repeated-measures analysis of variance (ANOVA) with SFOAE magnitude (expressed in dB SPL) as the dependent variable found no significant effect of condition [F(2,16) = 1.22, p = 0.32]. Our second analysis involved examining SFOAEs at the level of individual subjects. A bootstrapping analysis was used to determine if the presence of the precursor induced a significant change in the SFOAE magnitude compared to the no-precursor condition (SFOAEM versus SFOAEP_M) and if the SFOAE magnitude estimated from the SUP stimulus (SFOAEQ) was significantly larger than SFOAEM and SFOAEP_M. The analysis used 25 pairs of consecutive recorded waveforms for stimuli SUP, M, and P_M with alternate polarities as the input. The 25 pairs were resampled with replacement and the mean magnitudes of the noise floor, the SFOAEQ, SFOAEM, and SFOAEP_M were calculated for each new sample. The procedure was repeated 1000 times for each of the three stimuli to create distributions for the three SFOAE magnitudes for each subject. Confidence intervals (95 %) were constructed around the mean of each distribution. SFOAEM and SFOAEP_M were considered significantly different from (lower than) SFOAEQ when their mean bootstrapped magnitudes fell outside of the 95 % confidence interval for the SFOAEQ. This was true only for listener S6. The effect of the precursor was considered significant if the mean bootstrapped magnitudes of SFOAEP_M fell outside of the 95 % confidence interval of the mean estimate of the SFOAEM.
None of the listeners showed a significant effect of the precursor on the SFOAE magnitude according to this analysis. Based on these analyses at the group and individual levels, neither the precursor nor the presence of the masker had a significant effect on the SFOAE produced by the 2-kHz probe tone. This outcome may be because (1) any MOC efferent effect of the precursor is sufficiently frequency selective to have no effect on the 2-kHz probe and (2) the masker components are sufficiently low in level and remote in frequency from the 2-kHz probe to produce no measurable suppression of the 2-kHz SFOAE.
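The resampling logic of the individual-level analysis can be sketched as follows. This Python example is a simplified illustration, not the authors' code: each of the 25 trial pairs is represented by a single complex SFOAE estimate in arbitrary units, the data are synthetic, and a percentile interval is used as one plausible way to construct the 95 % confidence interval. The names `bootstrap_magnitude_ci` and `fake_pairs` are hypothetical.

```python
import random

def bootstrap_magnitude_ci(pair_estimates, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap the magnitude of the vector-averaged estimate.

    Returns (mean bootstrapped magnitude, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(pair_estimates)
    mags = []
    for _ in range(n_boot):
        # Resample the pairs with replacement, then vector-average.
        sample = [pair_estimates[rng.randrange(n)] for _ in range(n)]
        mags.append(abs(sum(sample) / n))
    mags.sort()
    lo = mags[round((alpha / 2) * n_boot)]
    hi = mags[round((1 - alpha / 2) * n_boot) - 1]
    return sum(mags) / n_boot, lo, hi

def fake_pairs(rng, magnitude, n=25, noise=0.1):
    """Synthetic per-pair complex estimates (illustration only)."""
    return [complex(magnitude + noise * rng.gauss(0, 1), noise * rng.gauss(0, 1))
            for _ in range(n)]

data_rng = random.Random(1)
m_pairs = fake_pairs(data_rng, 1.0)     # stands in for the M condition
pm_pairs = fake_pairs(data_rng, 1.0)    # stands in for the P_M condition

mean_m, lo_m, hi_m = bootstrap_magnitude_ci(m_pairs)
mean_pm, _, _ = bootstrap_magnitude_ci(pm_pairs, seed=2)
# Criterion from the text: the precursor effect counts as significant when the
# mean bootstrapped P_M magnitude falls outside the interval for M.
precursor_effect = not (lo_m <= mean_pm <= hi_m)
print(mean_m, (lo_m, hi_m), mean_pm, precursor_effect)
```

Resampling whole trial pairs, rather than single trials, keeps the polarity alternation intact within each resampled average.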

FIG. 4

Estimated magnitudes of the SFOAE during the SUP (open bars), the M (filled bars), and the P_M (hatched bars) stimuli. The rightmost bars show the average from the nine subjects, and the error bars denote one standard error of the mean.

Otoacoustic Emissions at Surrounding Masker Frequencies

Efferent activation could contribute to the enhancement effect by reducing the cochlear gain applied to the masker components in the precursor relative to the gain applied to the signal component. To test this hypothesis, SFOAEM and SFOAEP_M magnitudes were estimated for the two masker components directly below and the two masker components directly above 2 kHz. It was assumed that a reduction in the response to the neighboring components would have the greatest effect on the salience of the 2-kHz component. Figure 5 shows dB differences between the magnitudes of SFOAEP_M and SFOAEM, for the 2-kHz component (red symbol), two most proximal components below 2 kHz (open blue symbols), and two most proximal components above 2 kHz (filled blue symbols).

FIG. 5

Differences between the levels of SFOAEP_M and SFOAEM for the enhanced component (red symbol), two nearest components in the masker below the signal (open blue symbols), and two nearest components above the signal (filled blue symbols). Data above the horizontal line corresponding to 0 dB indicate cases where the precursor produced an increase in the SFOAE magnitude. The mean level differences are shown by the rightmost symbols, and the error bars show one standard error of the mean.

The results would be consistent with the hypothesis linking MOCR effects to enhancement if the precursor produced a greater reduction of the SFOAEP_M relative to the SFOAEM for the components surrounding 2 kHz than at the signal frequency itself. In other words, relative enhancement would be observed if the blue symbols in Figure 5 consistently fell below the red symbols. The red symbols fall close to the 0-dB line, reflecting the lack of a significant effect of the precursor on the SFOAE at 2 kHz, as was shown in Figure 4. As shown by the blue symbols, the SFOAEP_M was not systematically reduced relative to the SFOAEM for the components that were present in the precursor. A group-level analysis using a repeated-measures ANOVA confirmed no significant effect of measured frequency on the dB difference between SFOAEP_M and SFOAEM [F(4,32) = 0.38, p = 0.83]. The bootstrapping analysis described above for the SFOAEs at 2 kHz was performed for the SFOAEQ, SFOAEM, and SFOAEP_M at the frequencies below and above 2 kHz to determine if significant precursor effects on the SFOAE were observed for individual listeners. For listeners S2, S3, and S4, the mean SFOAEP_M magnitude for the 2.46-kHz component was significantly greater than the magnitudes of both the SFOAEM and the SFOAEQ, according to the criterion based on the 95 % confidence interval. This is the opposite of what the efferent-based explanation of enhancement predicts. The increase in the magnitudes of SFOAEs evoked by masker components after the precursor observed for listeners S2, S3, and S4 is also inconsistent with known effects of MOC efferent activation on SFOAEs. The possible reasons for this result are addressed in the discussion below.

Behavioral Data

Psychophysical data for all 12 listeners are shown in Figure 6. Listeners S10–S12 were not included in the SFOAE analysis because they did not have significant SFOAEs at 2 kHz. The filled and open bars show thresholds for detecting the 2-kHz component in the stimulus presented without (M) and with (P_M) the precursor, respectively. The rightmost set of bars shows the mean across the listeners. For each listener, the threshold for detecting the 2-kHz signal was lower when the precursor was present, consistent with the enhancement effects observed in different psychophysical tasks using the same stimuli (Byrne et al. 2011). A paired-samples t-test confirmed a significant effect of the precursor [t(11) = 8.67, p < 0.001]. The effect was also significant when only data from the nine listeners who had significant SFOAEs were used [t(8) = 7.57, p < 0.001]. The average improvement in the detection of the signal was 5.7 dB, which is smaller than that reported in earlier studies that used harmonic complexes (Viemeister 1980; Viemeister and Bacon 1982) but comparable to that reported by Byrne et al. (2011) for the same spectral composition of the stimuli as in this study.
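The paired-samples comparison reported above reduces to a one-sample t statistic on the per-listener threshold differences. The sketch below illustrates that computation only; the function name is hypothetical and no thresholds from the study are reproduced. Converting t to a p value requires the t distribution, which is omitted here to keep the example within the standard library.

```python
import math
import statistics


def paired_t(no_precursor_db, precursor_db):
    """Paired-samples t statistic for detection thresholds (dB).

    Returns (t, degrees of freedom, mean threshold difference). A positive
    mean difference means thresholds were lower with the precursor.
    """
    diffs = [a - b for a, b in zip(no_precursor_db, precursor_db)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / standard_error, n - 1, mean_diff
```

With 12 listeners the function returns 11 degrees of freedom, matching the t(11) reported in the text.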

FIG. 6

Thresholds for detecting a 2-kHz signal in the no-precursor (filled bars) and the precursor-present (open bars) conditions. The error bars denote one standard error of the mean.

Discussion

The psychophysical enhancement effect measured in our listeners was very robust and was in good agreement with the results from earlier psychophysical studies. However, this robust behavioral result was not reflected in the magnitudes of SFOAEs measured in the same listeners for the same stimuli.

Before ruling out the contribution of the MOCR to the auditory enhancement effect, an important aspect of the SFOAE data from this experiment should be addressed. It was expected that the magnitudes of the SFOAEP_M would be reduced compared to the SFOAEM, at least for the components of the M stimulus that were present in the precursor, due to efferent activation and a resulting decrease in cochlear gain at these frequencies. However, as shown in Figure 5 (positive values of the dB difference between the SFOAEP_M and SFOAEM), for some listeners the precursor led to an apparent increase of the SFOAE magnitude. In addition, in six out of nine listeners, the estimated SFOAEM and SFOAEP_M were larger in magnitude than the baseline SFOAEQ for at least one of the five target frequencies tested. Since the tonal suppressor was presumed to completely eliminate the SFOAE and since the precursor and masker were expected to decrease (or at least, not increase) cochlear gain, the magnitude of the SFOAEQ, estimated from the SUP stimulus, should be the largest possible across the three types of stimuli used (shown in Fig. 2). The fact that this was not always the case suggests that the effects of the precursor on the ear-canal sound pressure at the target frequency (and at other component frequencies of the M stimulus) may have resulted from changes in the middle-ear acoustic impedance due to the activation of the middle-ear-muscle reflex (MEMR) by the precursor and the multi-component masker itself. When the assumption of constant middle-ear impedance is violated, changes in SFOAE cannot be measured reliably. To address this possibility, the measurements were performed with modified (narrowband) stimuli presented at lower levels that were unlikely to elicit the MEMR (Guinan et al. 2003).

EXPERIMENT 2: THE ROLE OF MOC EFFERENTS IN ENHANCEMENT FOR NARROWBAND STIMULI

Rationale

Methods for estimating efferent effects from changes in the ear-canal sound pressure require that the middle-ear impedance is not affected by the presence of the MOCR elicitor (Guinan et al. 2003). Based on wideband measures of MEMR threshold (Feeney and Keefe 2001; Feeney et al. 2004), it was assumed in experiment 1 that the multitone complex presented at an overall level of 67 dB SPL was not intense enough to elicit the MEMR. However, Guinan et al. (2003) measured effects of the MOCR elicited by broadband noise on the SFOAE evoked by a 1-kHz tone and found that, in some listeners, changes in the ear-canal sound pressure due to the noise were dominated by the MEMR for a noise level as low as 65 dB SPL. Although clinically and perceptually relevant changes in the ear-canal sound pressure due to the MEMR activation are confined to frequencies below 2 kHz (Møller 1965), wideband measurements of middle-ear reflectance show changes in transmission of frequencies well above 2 kHz (Schairer et al. 2007). These changes could interfere with the SFOAE-based measures of the effects of efferent activation in the higher frequency region (≥2 kHz) and could be a reason why, in selected cases in experiment 1, the magnitudes of the SFOAEM and SFOAEP_M were actually larger (rather than smaller) than the magnitude of the SFOAEQ estimated using the suppression technique. For a given stimulus level, wideband stimuli are known to be substantially more effective in eliciting the MEMR than narrowband stimuli (Feeney and Keefe 2001; Feeney et al. 2004). In this experiment, the number of components around the signal frequency (2 kHz) and the level per component were reduced to lower the overall level of the precursor to a value that fell below any reported threshold for the MEMR activation.

In experiment 1, significant SFOAEs could not be obtained at 2 kHz for a few listeners. One reason for the lack of measurable SFOAEs could be that, for these listeners, 2 kHz corresponded to a minimum in the SFOAE rippled spectral pattern resulting from the interference of the forward traveling wave and partial reflections of the backward propagating SFOAE from the stapes (Shera and Cooper 2013). It has been shown that measurements of efferent activation using distortion product otoacoustic emissions (DPOAEs) are most reliable for frequencies around the peak of the DPOAE fine structure (Abdala et al. 2009). Because SFOAE interference patterns show more pronounced maxima and minima as the evoking stimulus level decreases (Shera and Cooper 2013) and because the level of each component was reduced in this experiment to avoid the MEMR activation, the exact frequencies of the signal component and the two components surrounding the signal on each side were adjusted individually for each listener to fall around the peaks in the SFOAE spectral patterns.

Listeners

Eight listeners (five female, three male) with normal hearing participated in this experiment. Their ages ranged from 18 to 50 years (median 21 years). Three of these listeners had participated in experiment 1. The listeners had audiometric thresholds below 15 dB HL at octave frequencies between 250 and 8000 Hz. Newly recruited listeners underwent the screening for SOAEs described in experiment 1 and received brief practice (about 15 min) in the psychoacoustic task, with an equal number of runs in conditions with and without a precursor.

Stimuli and Procedure

SFOAE Measurements

The experiment began by obtaining an SFOAE spectral interference pattern for each participant. SFOAEs were measured for frequencies in the range from 1450 to 2700 Hz in steps of 50 Hz using the suppression technique (Brass and Kemp 1993; Guinan et al. 2003). For these measurements, a block consisted of eight trials, each containing a 1-s test tone and a 0.5-s suppressor presented during the final 0.5 s of the test tone. The tones were gated with 10-ms raised-cosine onset/offset ramps. The trials were separated by a 1-s silent gap, and the polarity of the suppressor tone was alternated between consecutive presentations. The test tones were presented at 45 dB SPL, and the suppressor tones were presented at 65 dB SPL. For each test tone, the suppressor had a frequency 110 Hz lower than the test tone. SFOAE magnitudes were estimated based on 50 artifact-free trials. The analysis performed to extract the SFOAE magnitudes was identical to that described in experiment 1 for extracting the magnitude of the baseline SFOAEQ from the SUP stimulus.

Five frequencies that fell around the peaks of the SFOAE magnitude spectral pattern and were closest to the nominal frequencies of the five components analyzed in experiment 1 (1.52, 1.62, 2.0, 2.46, and 2.64 kHz) were selected individually for each listener as components constituting the entire M stimulus (see Fig. 1) for testing the hypothesis about the contribution of MOC efferent effects to the enhancement effect. The mean deviation of the individual component frequency from the nominal frequency was 37.4 Hz (1.8 % of the nominal frequency), and the deviation never exceeded 180 Hz (6.8 % of the nominal frequency). Adjacent components were separated in frequency by at least 100 Hz. The P_M stimulus consisted of the same components with the component closest to 2 kHz (signal) removed. SFOAEs were measured for each component of the M stimulus. Because each component was presented at 45 dB SPL, the precursor had an overall level of 51 dB SPL, which was low enough to avoid MEMR activation even by a broadband noise known to be the most effective MEMR elicitor (Guinan et al. 2003). With only four components, the narrowband precursor was unlikely to elicit the MEMR. The blocks were run by stepping from the lowest to the highest of the five frequencies. For each frequency, the SFOAEQ, SFOAEM, and SFOAEP_M were estimated from 50 artifact-free trials. The method of stimulus presentation and acquisition and the equipment were the same as for the SFOAE recordings in experiment 1.
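The per-listener frequency selection described above can be illustrated with a short sketch. It is one possible reading and not the procedure used in the study: it assumes that a "peak" is a measured frequency whose SFOAE magnitude exceeds both neighbours on the 50-Hz grid, and that components are assigned greedily from the lowest nominal frequency upward while enforcing the 100-Hz minimum separation. The function names are hypothetical.

```python
def local_peaks(mags_db):
    """Indices of interior points whose magnitude exceeds both neighbours."""
    return [i for i in range(1, len(mags_db) - 1)
            if mags_db[i] > mags_db[i - 1] and mags_db[i] > mags_db[i + 1]]


def pick_component_freqs(freqs_hz, mags_db, nominal_hz, min_sep_hz=100.0):
    """For each nominal frequency, pick the nearest spectral peak that lies
    at least `min_sep_hz` away from every component already chosen."""
    peak_freqs = [freqs_hz[i] for i in local_peaks(mags_db)]
    chosen = []
    for target in nominal_hz:
        candidates = [f for f in peak_freqs
                      if all(abs(f - c) >= min_sep_hz for c in chosen)]
        if not candidates:
            raise ValueError(f"no usable SFOAE peak near {target} Hz")
        chosen.append(min(candidates, key=lambda f: abs(f - target)))
    return chosen
```

A listener with no usable peak near a nominal frequency raises an error in this sketch, which loosely corresponds to a listener without significant SFOAEs.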

Psychophysical Enhancement Measurements

The psychophysical task was performed using the four-component masker and precursor to make sure that a significant enhancement effect was observed with the reduced number of components and the reduced level. As in the SFOAE measurements, each component was presented at 45 dB SPL, but the components were presented at their nominal frequencies of 1.52, 1.62, 2.0, 2.46, and 2.64 kHz because some listeners completed the psychophysical task before the SFOAE spectral pattern was measured. Applying small frequency adjustments similar to those used in the SFOAE measurements would be unlikely to have a significant effect on the amount of enhancement. All the remaining experimental parameters and the equipment were the same as in experiment 1.

Results

Otoacoustic Emissions

One listener did not have significant SFOAEs for any of the frequencies around the nominal values and thus data from only seven listeners were analyzed. As in experiment 1, SFOAEM and SFOAEP_M magnitudes could not be reliably determined and compared over the initial 30–50 ms of the evoking stimuli due to the signal processing performed to extract the emissions. Since the MOCR effect has a latency of about 25–30 ms and a buildup time of about 230 ms (Backus and Guinan 2006), changes in SFOAE magnitude due to efferent activation were expected to occur over the entire duration of the 250-ms SFOAE waveforms. The fastest MOCR-induced decrease in SFOAE magnitude was expected to occur during the first half of the 250-ms waveforms, consistent with the short time constant of about 70 ms estimated by Backus and Guinan. To determine if significant differences in SFOAE magnitudes due to the precursor occurred between the first and the second half of the 150-ms window centered within the SFOAEP_M waveforms, a bootstrap analysis was performed using individual data. For the signal and each masker component, 25 recorded SFOAEP_M waveform pairs were sampled 10,000 times with replacement, separately for the first and the second half of the 150-ms central segments, and the difference between SFOAEP_M magnitudes derived from each half was calculated. The 5th and 95th percentiles of the difference distribution were used to test whether a mean difference of 0 dB was credible. Table 1 shows two-sided p values reflecting the significance of the difference for each listener and each frequency component. In no case was the difference between the SFOAEP_M during the first and the second half of the 150-ms analysis window significant. Since no systematic changes in SFOAE magnitudes were observed over the course of the evoking tones, magnitudes of the SFOAEQ, SFOAEM, and SFOAEP_M were calculated from vector averages within the center 150-ms waveform segments. 
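The half-window comparison described above can be sketched as a bootstrap of the difference between the two halves. This is an illustration under stated assumptions and not the code used in the study: it takes one magnitude value (in dB) per waveform pair and per half, resamples each half on its own as the text describes, and converts the difference distribution to a two-sided p value by doubling the smaller tail proportion, a common convention that the text does not spell out. All names are hypothetical.

```python
import random
import statistics


def bootstrap_half_difference(first_half_db, second_half_db,
                              n_boot=10_000, seed=0):
    """Bootstrap distribution of (second half - first half) mean magnitude."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        first = rng.choices(first_half_db, k=len(first_half_db))
        second = rng.choices(second_half_db, k=len(second_half_db))
        diffs.append(statistics.fmean(second) - statistics.fmean(first))
    return diffs


def percentile(samples, q):
    """Percentile by nearest index, with q in percent (0 to 100)."""
    ordered = sorted(samples)
    return ordered[round(q / 100.0 * (len(ordered) - 1))]


def two_sided_p(diffs):
    """Two-sided bootstrap p value for a true difference of 0 dB."""
    n = len(diffs)
    at_or_below = sum(d <= 0.0 for d in diffs) / n
    at_or_above = sum(d >= 0.0 for d in diffs) / n
    return min(1.0, 2.0 * min(at_or_below, at_or_above))
```

A 0-dB difference lying between `percentile(diffs, 5)` and `percentile(diffs, 95)` corresponds to the "credible" outcome described in the text.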
Figure 7 shows these mean SFOAE magnitudes at the signal frequency for the seven listeners. The bars show the baseline SFOAEQ (open bar), the SFOAEM (filled bar), and the SFOAEP_M (hatched bar). A repeated-measures ANOVA with the SFOAE level as the dependent variable showed no significant effect of condition [F(2,12) = 2.41, p = 0.13]. The bootstrap analysis on the individual data, as described in experiment 1, also showed substantial overlap of the distributions for the SFOAEQ, SFOAEM, and SFOAEP_M magnitudes for each listener. Based on these distributions, no significant effects of the stimulus condition on the magnitude of the SFOAE at the signal frequency were observed, thus suggesting no significant enhancement of the response to the signal on the BM for the P_M stimulus compared with that for the M stimulus.

TABLE 1 Two-tailed p-values for comparisons of bootstrapped SFOAEP_M distributions calculated using the first and second halves of the 150-ms segment centered within SFOAE waveform for the signal and flanking masker components. None are lower than 0.06, suggesting no significant effects at either the group or individual levels

FIG. 7

As in Fig. 4 except for a narrow multi-tone complex masker and precursor.

A bootstrap analysis of the SFOAE magnitudes for the remaining components of the M stimulus showed no significant decrease of SFOAEP_M magnitudes relative to the SFOAEM magnitudes for any listener, suggesting no significant decrease of the BM response to the components in the M stimulus that were preceded by their counterparts in the precursor. Figure 8 shows the dB differences between the magnitudes of the SFOAEP_M and SFOAEM. The dB differences fall close to the line corresponding to zero for the signal component (red symbol) and the four remaining components in the M stimulus (open and filled blue symbols). At a group level, a repeated-measures ANOVA showed no significant effect of stimulus frequency on the dB difference between the SFOAEP_M and SFOAEM magnitudes [F(4,24) = 1.10, p = 0.38].

FIG. 8

As in Fig. 5 except for a narrow multi-tone masker and precursor. The standard errors are not visible because they did not exceed the size of the symbols representing the mean dB differences.

To provide a clearer interpretation of our failure to reject the null hypothesis, a Bayesian parameter estimation was performed, and a Bayesian credible interval was built for the difference between the SFOAEM and SFOAEP_M magnitudes using the procedure described by Kruschke (2013). The difference between the two magnitudes was described by a t-distribution, and the posterior distributions for the mean, standard deviation, and normality of that distribution were estimated for the group data pooled across target frequency. The estimated distribution plotted in Figure 9A shows that the 95 % credible interval (denoted by the horizontal red line) extends from −0.36 to 0.65 dB and thus encompasses a 0-dB difference between the SFOAEM and SFOAEP_M. Figure 9B shows a comparison between a random sample of the predicted distributions (blue lines) and the histogram of the actual data (red bars). The plot demonstrates a good fit of the model to the data. The Bayesian parameter estimation analysis was repeated for individual frequency components to determine whether pooling data across frequency obscured significant differences. In no case was the difference between the SFOAEM and SFOAEP_M credible.

FIG. 9

The Bayesian parameter estimation for the difference distribution between SFOAEM and SFOAEP_M: A the mean predicted difference of 0.1 dB is denoted by the vertical dotted line and the Bayesian credible interval (<1 dB) is shown by the horizontal red line. B A comparison between a random sample of model predictions (Kruschke 2013) and the histogram of the data pooled across target frequencies.

Behavioral Results

Figure 10 shows thresholds for detecting the 2-kHz signal in the no-precursor condition (filled bars) and in the presence of the precursor (open bars). Listener S8 did not show significant SFOAEs but his/her psychophysical data are included in this figure. Thresholds for detecting the signal were lowered by the presence of the precursor in all listeners. The average decrease in threshold was 5.1 dB. A paired-sample t-test showed that the improvement in signal detectability, i.e., the enhancement effect, was statistically significant [t(7) = 4.29, p = 0.004]. An independent-samples t-test comparing the sizes of the enhancement effect for the wider stimulus in experiment 1 and the narrower stimulus in experiment 2, as determined by the difference between detection of the signal in the absence and presence of the precursor, showed that decreasing the number of components and the level per component did not result in a significant reduction in enhancement [t(17) = 0.51, p = 0.62].

FIG. 10

As in Fig. 6 except for a narrow multi-tone masker and precursor.

Discussion

Changing the configuration of the stimuli in experiment 2 appears to have eliminated confounding effects of the MEMR on the SFOAE measurements, while preserving the strength of the psychophysical enhancement. Despite the robust psychophysical enhancement effect, none of the listeners showed precursor-induced changes in SFOAEs consistent with an increase in cochlear gain at the signal frequency or a decrease in gain at masker frequencies. The lack of significant efferent effects on the SFOAEs evoked by the masker components may appear inconsistent with the significant changes in SFOAEs due to on-frequency elicitors obtained in previous studies that used the same signal processing algorithm to extract the SFOAE from the source sound pressure (Guinan et al. 2003; Lilaonitkul and Guinan 2012; Wojtczak et al. 2015). However, these studies used elicitors that were 20 dB more intense than the tone evoking the SFOAE, whereas the elicitor in experiment 2, with only four frequency components, had an overall level that was only 6 dB higher than each individual component in the M stimulus. In addition, the stimulus used in experiment 2 was sparse in frequency, having only five tonal components, with a gap of about 0.6 octave between the two masker components flanking the center component (about 0.3 octave on either side at the nominal frequencies). Efferent effects have been shown to be much weaker for pure tones than for noise bands with the same overall level, with tonal elicitors producing no significant efferent effects for levels below 60–70 dB SPL (Guinan et al. 2003; Walsh et al. 2010; Lilaonitkul and Guinan 2012). Consistent with these reports, none of the listeners in this study exhibited changes in SFOAE magnitude in the SUP condition during the first 500 ms for any of the measured components, indicating that no component elicited measurable efferent effects when presented in isolation.
Due to the small number and the sparseness of stimulus components, the precursor and the masker-plus-signal stimuli in experiment 2 may have been closer to a pure tone than to a noise band with comparable overall bandwidth in terms of their effectiveness as elicitors of MOCR effects. Given these stimulus characteristics, the results of experiment 2 are not inconsistent with previous SFOAE-based measures of efferent effects but they are not consistent with the hypothesis that activation of MOC efferents produced adaptation of suppression or that MOC efferents decreased the cochlear gain at the masker frequencies more than at the signal frequency, thereby contributing to the enhancement of the response to the signal, relative to the response to the masker.
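The overall level of the four-component elicitor relative to a single 45-dB SPL component follows from power summation, and the spacing of the components can be expressed in octaves. The sketch below assumes that the components add incoherently (in power), which is the usual assumption for tones at different frequencies; the function names are hypothetical.

```python
import math


def overall_level_db(component_levels_db):
    """Overall level (dB) of components whose powers add incoherently."""
    total_power = sum(10.0 ** (level / 10.0) for level in component_levels_db)
    return 10.0 * math.log10(total_power)


def octave_distance(f_low_hz, f_high_hz):
    """Separation between two frequencies in octaves."""
    return math.log2(f_high_hz / f_low_hz)
```

Four components at 45 dB SPL give an overall level close to 51 dB SPL, that is, about 6 dB above each component. At the nominal frequencies, `octave_distance(1620, 2000)` and `octave_distance(2000, 2460)` are each about 0.3 octave.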

GENERAL DISCUSSION

The aim of the experiments performed in this study was to provide a direct (physiological) test of the hypothesis that MOC efferent activation contributes to the enhancement effect observed psychophysically (Viemeister 1980; Viemeister and Bacon 1982; Thibodeau 1991, 1996; Byrne et al. 2011, 2013). The contribution could occur in the form of relative enhancement, whereby components of the precursor suppress, via frequency-selective MOC efferent feedback (Lilaonitkul and Guinan 2012), the response to the components of the masker more effectively than the response to the signal, due to the spectral notch in the precursor around the signal frequency. Such relative enhancement of the response to the signal would account for the increase in the signal salience in the presence of the precursor but would not on its own account for its increased detectability (Viemeister 1980) and increased effectiveness of the signal as a forward masker (Viemeister and Bacon 1982; Thibodeau 1991). However, MOCR-induced relative enhancement at the peripheral level could set the stage for adaptation of lateral inhibition at central sites of neural auditory processing (Nelson and Young 2010). Another way in which MOC efferents could contribute to the enhancement effect is via adaptation of suppression (Viemeister and Bacon 1982). Such absolute enhancement would occur if the components in the precursor suppressed (via MOC efferent feedback) the BM response to the components in the masker, and reduced the effectiveness of the masker as a suppressor of the BM response to the signal via longitudinal coupling in the BM and the organ of Corti (Naidu and Mountain 2001; Meaud and Grosh 2010; Eze and Olson 2011). SFOAEs were used to test the MOC-efferent-based hypothesis because they provide a noninvasive physiological window into cochlear mechanics (Shera and Guinan 1999).

In experiment 1, stimuli were designed to closely resemble those used in psychophysical experiments that showed significant enhancement (Byrne et al. 2011). For these stimuli, the SFOAEs measured with and without the precursor did not show changes in magnitude consistent with either relative or absolute enhancement of the response to the signal component in the presence of the precursor. However, there was a possibility that changes in SFOAE magnitudes for these stimuli were confounded by the MEMR activation, the effect of which could obscure the effects of efferent activation consistent with the working hypothesis.

In experiment 2, the stimuli were modified to preclude activation of the MEMR. The stimuli had fewer components and a lower level per component. The psychophysical task in experiment 2 showed a robust enhancement effect, as determined by significantly improved detection of the signal in the presence of the precursor compared with the detection of the signal in the no-precursor condition. For the narrower precursor and masker, SFOAEs measured following the precursor were not significantly different from those measured in the absence of the precursor. This was true for the signal component as well as for every component in the masker. Because confounding effects on the measured SFOAE magnitudes were unlikely for the stimuli used in experiment 2, the results are not consistent with the hypothesis that MOC efferent effects contribute to the enhancement effect and suggest that the effect emerges at stages of auditory processing central to the cochlea.

It is important to note that the results of this study are not inconsistent with physiological responses of auditory-nerve fibers in guinea pig (Palmer et al. 1995) measured for stimuli producing the enhancement effect. Palmer et al. found a reduction in firing rates for tones surrounding the enhanced stimulus that could account for relative enhancement of the signal component, but this reduction was likely due to adaptation in the synaptic connection between the inner hair cells and the spiral ganglion, and thus would not be revealed in changes in the magnitude of the SFOAE. The data are also consistent with expectations based on the study of Wang et al. (2012), who found significant enhancement in cochlear-implant users, for whom the cochlea (and hence the MOC efferent system) is bypassed. However, the results are not as expected based on the data of Thibodeau (1991), who reported reduced or absent enhancement in listeners with cochlear hearing loss, presumably based on their loss of cochlear gain.

Although the data shown here are not consistent with the hypothesis that MOC efferents contribute to the auditory enhancement effect, there remains a possibility that the method used in this study is not sensitive enough to show changes in cochlear gain that result in enhancement. The psychophysical enhancement was on average 5.1 dB for the listeners in this study. Because the size of the enhancement effect is expressed in terms of the stimulus (input) level, the difference between the output levels on the BM for the signal presented with and without the precursor is likely much smaller due to the compressive growth of the BM response. Assuming that the SFOAE magnitude is growing at a rate similar to the growth of the BM response in the region of the characteristic frequency equal to the signal frequency, a change in cochlear gain required to produce the 5.1-dB enhancement could be quite small. For example, assuming that the slope of the BM input/output function is 0.2, which is the typical estimate of cochlear compression from physiological and psychophysical studies (Yates 1990; Ruggero 1992; Oxenham and Plack 1997; Nelson et al. 2001; Lopez-Poveda et al. 2003), the 5.1-dB difference in signal level would correspond to about a 1-dB change in the BM response to the signal due to the precursor. This change in cochlear gain, and thus the SFOAE magnitude, could be swamped by the variability in the SFOAE measurements. However, because of the relatively low level of each tone (45 dB SPL), the signal and masker components were likely subjected to less than average compression (e.g., Lopez-Poveda et al. 2003). In addition, based on the deduced 1-dB change in SFOAE required to produce the psychophysical enhancement in detection, a Bayesian region of practical equivalence for demonstrating no effect of the precursor can be established around differences in SFOAE magnitudes between −1 and 1 dB (Kruschke 2013). 
The fact that 100 % of the posterior distribution fell within this region confirms the credibility of the null hypothesis (see Fig. 9A). The majority of the 95 % highest density interval fell within a region between −0.5 and 0.5 dB. The result of this analysis shows that the change in SFOAE due to the precursor did not reflect the required change in BM response necessary to account for the average psychophysical enhancement effect, assuming 0.2 compression. In some listeners, changes in threshold for detecting the signal were as large as 10–20 dB (e.g., S3 and S6 in Fig. 10). This effect size should be sufficient to produce significant changes in SFOAE magnitude, if the effect originated from efferent-induced changes of the BM response. Despite that, no listener in this study exhibited significant differences in SFOAE magnitudes between the M and P_M stimuli.
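The reasoning above combines two steps: converting the psychophysical enhancement (an input-level change) into the expected change in BM output through the compression slope, and checking how much of a posterior distribution falls inside the resulting region of practical equivalence. The sketch below illustrates both steps. It assumes that posterior samples of the SFOAEM versus SFOAEP_M difference are already available as a list (the sampling itself, described by Kruschke 2013, is not reproduced), and the function names are hypothetical.

```python
import math


def expected_output_change_db(input_change_db, compression_slope):
    """Change in BM output (dB) implied by an input-level change under a
    linear-in-dB input/output function with the given slope."""
    return compression_slope * input_change_db


def fraction_in_rope(samples, half_width_db):
    """Proportion of posterior samples within +/- half_width_db of 0 dB."""
    inside = sum(-half_width_db <= s <= half_width_db for s in samples)
    return inside / len(samples)


def highest_density_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    width = max(1, math.ceil(mass * n))
    start = min(range(n - width + 1),
                key=lambda i: ordered[i + width - 1] - ordered[i])
    return ordered[start], ordered[start + width - 1]
```

With a slope of 0.2, `expected_output_change_db(5.1, 0.2)` gives about 1 dB, which motivates the −1 to 1 dB region used in the text.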

The temporal spacing between the stimuli during the SFOAE recordings was chosen to allow for a complete recovery from efferent effects elicited by the stimuli in each trial, based on the MOCR decay time estimated for humans by Backus and Guinan (2006). In that study, the buildup and decay times for the MOCR were measured at only one frequency, 1 kHz. However, much longer decay times for the changes in cochlear responses that may have been due to MOC efferent activation have been reported at higher frequencies (Goodman and Keefe 2006; Wojtczak et al. 2015). If the decay time of the MOCR increases with frequency, the 2-s silent gaps between the stimuli may not have been sufficient for the system to completely recover from efferent activation, which would result in smaller differences between SFOAEs measured in the context of the three stimuli in a trial (see Fig. 2). However, in the psychophysical task, the silent interstimulus intervals were only 500 ms long and robust enhancement effects were still observed. Thus, it is unlikely that the inconsistency between the lack of changes in SFOAE magnitudes due to the precursor and the efferent-based hypothesis could have resulted from an incomplete recovery from the MOCR.

In summary, the results of this study do not provide support for the hypothesis that the enhancement effect has a component originating in cochlear mechanics, suggesting that the mechanisms underlying the effect originate at higher stages in the auditory pathways. This finding is consistent with a recent study that found no enhancement in the ASSR in humans (Carcagno et al. 2014). Carcagno et al. suggested that the effect may originate at cortical sites, although they did not rule out the possibility that their method failed to capture responses from brainstem neurons that showed enhanced responses but did not phase-lock to the modulation rate used to keep track of (“tag”) the signal component. A number of centrally generated mechanisms of enhancement have been suggested in other studies (Viemeister and Bacon 1982; Byrne et al. 2011; Erviti et al. 2011; Carcagno et al. 2012, 2013; Demany et al. 2013). Adaptation of neural inhibition in the midbrain has physiological support from recordings in the inferior colliculus of awake marmoset monkeys (Nelson and Young 2010). Our results are consistent with the hypothesis that auditory enhancement first emerges at levels of the auditory system higher than the cochlea.