INTRODUCTION

Envelope fluctuations, the relatively slow modulations of sound amplitude, carry important information for speech understanding in quiet and in noise (e.g., Rosen 1992; Drullman 1995; Zeng et al. 2005; Lorenzi et al. 2006). In a complex acoustic background, information contained in the speech envelope is often degraded by masking in the amplitude-modulation (AM) domain (Bacon and Grantham 1989; Dau et al. 1997a, 1997b; Wojtczak 2011): AM components in the envelope of the background stimulus mask AM components in the envelope of the speech signal. In fact, speech intelligibility can be successfully predicted based on the speech-to-noise envelope-power ratio (Jorgensen and Dau 2011; Stone et al. 2012; Jorgensen et al. 2013; Jorgensen et al. 2015).

Hearing-impaired individuals often exhibit reduced speech intelligibility, even if sounds are amplified to control for audibility, especially in a noisy background (e.g., Plomp 1978; Lorenzi et al. 2006). One reason for the suprathreshold deficits may be that one of the peripheral efferent-based mechanisms, the medial olivocochlear reflex (MOCR), does not provide the same benefits in ears with outer hair cell (OHC) damage as it does in ears with normal OHC function. The reflex is thought to improve neural representation of speech in noise (Giraud et al. 1997; Kumar and Vanaja 2004; Mishra and Lutman 2014), and it is reduced or absent in the auditory system with cochlear damage (Guinan 2006). Because the MOCR reduces cochlear responses via direct synaptic connection with OHCs, it has been studied using otoacoustic emissions, since their magnitude is related to the cochlear amplifier gain (Shera and Guinan 1999; Shera and Guinan 2003). Many studies of efferent activation have used stimulus frequency otoacoustic emissions (SFOAEs), as they provide the most frequency-specific window into cochlear responses (Guinan et al. 2003; Backus and Guinan 2007; Lilaonitkul and Guinan 2009a; Wojtczak et al. 2015).

Because the MOCR is relatively slow (Backus and Guinan 2006), when activated by noise, it is thought to enhance neural representations of transient sounds in noise due to the combined effects of cochlear gain reduction and rate adaptation in the auditory nerve (Guinan 2006; Marrufo-Perez et al. 2018b). This concept is schematically illustrated in Fig. 1, based on single-neuron responses measured with and without electrical stimulation of medial olivocochlear (MOC) efferents by Winslow and Sachs (1987).

Fig. 1

Schematic illustration of auditory-nerve rate-level functions for a tone in three conditions: (i) in quiet (black curve), (ii) near the onset of a simultaneous masking noise (red curve), and (iii) in the masking noise with the MOCR activated (blue curve). Hypothetical responses to an AM tone in the three conditions are illustrated by two cycles of a half-wave-rectified AM tone. The modulation depth of the fiber’s response to an AM tone in noise is decreased compared to that for a tone in quiet due to the shallow slope of the rate-level function (red vs. black symbols). The modulation depth is restored when the MOCR is activated (blue symbol)

A number of psychophysical studies have implicated the efferent system in various perceptual phenomena that appear consistent with the MOCR function, including the temporal effect or overshoot (von Klitzing and Kohlrausch 1994; Strickland 2004; Strickland 2008; Strickland and Krishnan 2005; McFadden et al. 2010), forward masking (Krull and Strickland 2008; Jennings et al. 2009; Roverud and Strickland 2010, 2014; Wojtczak et al. 2015), a reduction of a mid-level bump in intensity discrimination (Roverud and Strickland 2015a, 2015b), a reduction of a mid-level bump in AM detection (Almishaal et al. 2017; Jennings et al. 2018), and a temporal effect in AM detection (Marrufo-Perez et al. 2018b). However, an independent validation of the hypothesis that MOC efferents are involved in these perceptual effects has not been obtained, with a few exceptions in which SFOAEs or compound action potentials were measured for comparable stimuli and conditions (Keefe et al. 2009; Walsh et al. 2010; Wojtczak et al. 2015; Verschooten et al. 2017). Evidence from the physiological measures for the role of MOCR effects in these perceptual phenomena has not been consistent across these studies. Some of them reported only partial support for the efferent-based hypothesis (Verschooten et al. 2017), and others found no support at all (Keefe et al. 2009; Wojtczak et al. 2015).

Given the effects of the MOCR on neural responses demonstrated in animal studies (Dolan and Nuttall 1988; Winslow and Sachs 1988; Kawase and Liberman 1993), an improved AM detection in noise due to a noise precursor, such as that recently reported by Marrufo-Perez et al. (2018b), could at least in part be due to efferent activation, even if not all aspects of the data appear consistent with efferent effects. Even a partial role of the MOCR would be important, because any benefit from it would likely be reduced or absent in hearing-impaired listeners (Jennings et al. 2018).

In this study, we used standard psychophysical measures to investigate the hypothesized contribution of the MOCR to the detection of brief bursts of AM of a tone embedded in noise. For a subset of stimuli, we also measured MOCR effects on SFOAEs using the elicitors and the SFOAE-evoking tones with the frequencies of the carrier tones for which large changes in AM detection thresholds were observed. The results were not consistent with the hypothesized changes in cochlear responses due to MOC efferent activation by noise precursors. Instead, the results suggest that the large AM unmasking effects arise from post-cochlear auditory processing.

EXPERIMENT 1: EFFECT OF A PRECURSOR ON AM DETECTION IN NOISE

Rationale

In a recent study that investigated changes in AM detection in simultaneous masking noise as a function of the temporal position in the noise masker (Marrufo-Perez et al. 2018b), modulated tonal carriers started at the onset of the masker (early onset) or 300 ms after the masker onset (late onset). Because the onset of AM coincided with the onset of the noise masker in the early-onset but not the late-onset condition, the observed improvement in AM detection for the late onset may have simply resulted from transient masking in the early-onset condition (Bacon and Moore 1987), given that the AM contained only two modulation cycles (Sheft and Yost 1990). In our experiment, instead of delaying the AM tone from the onset, a short burst of gated noise was used as a simultaneous masker. A longer stimulus, gated off before the short noise masker, was used as a precursor to activate the MOCR.

To ensure that the AM detection task was challenging in the absence of MOC efferent activation and that there was substantial room for improvement in performance due to efferent feedback, the levels of short simultaneous noise maskers were set to values for which AM detection thresholds were high. Subsequently, AM detection thresholds in these noise maskers were measured for three types of precursor, schematically illustrated in Fig. 2. Based on published reports, the three precursor types should result in different MOCR effects and, consequently, different effects on AM detection thresholds for a tonal carrier presented in noise. Two of the precursors, a band of noise and a complex tone, had equal overall bandwidths but differed in spectral density. The third precursor was a pure tone known to be a relatively weak MOCR elicitor (Guinan et al. 2003; Lilaonitkul and Guinan 2009a, 2009b). Our working hypothesis was that the noise precursor would be most effective at providing a release from AM masking and that the pure-tone precursor would have little or no effect on AM masking by the simultaneous noise.

Fig. 2

Schematic illustration of the spectrograms of the stimuli (not to scale) in the signal interval for three types of precursors. The modulated tonal carrier is depicted by the dashed line

Methods

Listeners

A total of 18 young normal-hearing listeners (six males, 12 females) aged from 18 to 28 years (mean = 20.7, SD = 2.5) participated in the study. Due to limited availability, not all listeners completed all experimental conditions as they required participation in multiple 2-h sessions. The listeners are consistently labeled across the manuscript for the ease of within-subject comparisons across different experimental conditions. Normal-hearing status (thresholds ≤ 15 dB HL) was determined by measuring hearing sensitivity at octave frequencies from 250 Hz to 8 kHz using a calibrated audiometer (Madsen Conera, GN Otometrics). For this and the subsequent experiment in this study, all listeners provided written informed consent and the protocol for the study was approved by the Institutional Review Board of the University of Minnesota.

Stimuli and Procedure

In all conditions, the listeners’ task was to detect 50-Hz AM in a tonal carrier presented with a simultaneous noise masker. The 50-Hz modulation rate was chosen as a compromise. On the one hand, it was important to ensure that the duration of the target encompassed a few cycles of modulation while being shorter than the time of recovery from MOC efferent effects (Backus and Guinan 2006). On the other hand, the AM rate had to be low enough to allow room for improvement in AM detection. Sensitivity to AM in noise declines above 50 Hz due to modulation masking, as shown by the low-pass characteristic of the temporal modulation transfer function for detecting AM of a wide band of noise (Viemeister 1979; Dau et al. 1997a). The target consisted of two cycles of 50-Hz AM starting in sine phase.

The experiment began by searching for the level of the noise masker for which AM detection thresholds without a precursor were poor but below ceiling (100 % AM) to allow room for substantial improvement in AM detection when a precursor was present. The desired noise level was first estimated using an adaptive two-alternative forced-choice procedure (2AFC) with a three-up one-down tracking rule estimating the 79.4 % correct point on the psychometric function (Levitt 1971). In each trial, one observation interval, selected at random, contained the AM tone with a fixed modulation depth of − 1.9 dB (80 % modulation depth) and the other contained the unmodulated tone. The modulated tone was scaled to have the same rms amplitude as the unmodulated tone to eliminate potential loudness cues. The observation intervals were marked by flashing colored boxes on a computer screen, and correct-answer feedback was provided after each trial. In both intervals, the tones were embedded in a two-octave-wide Gaussian noise band, geometrically centered on the carrier frequency. A new sample of noise was generated for each presentation. The duration of the noise was 100 ms, including 5-ms raised-cosine onset and offset ramps. The tonal carrier had a duration of 40 ms, including 5-ms raised-cosine ramps, and was delayed from the onset of the noise by 5 ms. While listeners performed AM detection, the noise level was varied adaptively using 8-dB steps for the first two reversals, 4-dB steps for the subsequent two reversals, and 2-dB steps for the remaining eight reversals. A run terminated after 12 reversals were obtained, and the noise level at threshold was defined as the mean of the levels at the final eight reversals. Noise levels from three runs were averaged to obtain the final estimate of the level for which the noise just masked AM at a depth of − 1.9 dB. 
Initially, it was assumed that a comparable threshold modulation depth would be observed when the noise is fixed at the level estimated using the above procedure and the modulation depth is varied adaptively to measure AM detection threshold. However, we found that this was not the case, and for almost all listeners, thresholds were lower (better) when measured with the fixed-level noise. In all such cases, the noise level needed to be iteratively increased by a few dB at a time, until AM detection thresholds from three consecutive runs were within the targeted range of − 6 to − 2 dB. This range was selected because pilot data showed that for thresholds within this range, the effects of the noise precursor on AM detection were consistently observed. For the measurements with the fixed-level noise masker, the stimuli had the same time course and spectra as in the initial stage, but the noise level was fixed and the modulation depth in the signal interval was varied adaptively using a three-down one-up tracking rule. The tracking procedure used steps of 4 dB [20log(m)] for the first two reversals, 2 dB for the subsequent two reversals, and 1 dB for the final eight reversals. When the tracking procedure called for a modulation depth greater than 0 dB (100 % modulation depth) before the fourth reversal was obtained, the modulation depth was reset to 0 dB and the tracking procedure continued. When this happened after the fourth reversal, the tracking procedure was aborted. The level of the noise was then lowered by 1 dB and the threshold measurement started anew. AM detection thresholds were calculated by averaging the modulation depths at the final eight reversals and the final threshold estimate was obtained as the mean of thresholds from three runs. When the standard deviation of the mean exceeded 2.5 dB, three additional thresholds were measured and the last three thresholds were averaged to obtain the final threshold estimate for detecting AM in the noise masker without a precursor.
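The fixed-noise tracking rule described above can be sketched in a few lines. This is only an illustrative reconstruction, not the AFC implementation used in the study: the starting depth (`start_db`), the trial cap (`max_trials`), and the `respond` callback that stands in for a listener are all assumptions introduced here.

```python
def step_size_db(n_reversals):
    """Step size in dB [20log(m)]: 4 dB, then 2 dB, then 1 dB."""
    if n_reversals < 2:
        return 4.0
    if n_reversals < 4:
        return 2.0
    return 1.0


def run_track(respond, start_db=0.0, n_reversals_total=12, n_average=8,
              max_trials=1000):
    """Three-down one-up track on modulation depth (dB).

    respond(depth_db) returns True for a correct response (hypothetical
    stand-in for a listener). Returns the mean depth at the final
    n_average reversals, or None if the run is aborted.
    """
    depth = start_db          # assumed starting depth, not given in the text
    correct_in_row = 0
    last_direction = 0        # -1 = depth decreased, +1 = depth increased
    reversals = []
    for _ in range(max_trials):   # safety cap, an assumption of this sketch
        if respond(depth):
            correct_in_row += 1
            if correct_in_row < 3:
                continue          # need three correct in a row to step down
            direction = -1
        else:
            direction = +1        # one incorrect response steps up
        correct_in_row = 0
        if last_direction != 0 and direction != last_direction:
            reversals.append(depth)
            if len(reversals) == n_reversals_total:
                return sum(reversals[-n_average:]) / n_average
        last_direction = direction
        depth += direction * step_size_db(len(reversals))
        if depth > 0.0:           # track calls for more than 100 % modulation
            if len(reversals) >= 4:
                return None       # abort; noise level would be lowered by 1 dB
            depth = 0.0           # before the fourth reversal: reset to 0 dB
    return None
```

With an idealized deterministic listener who is correct whenever the depth is at or above − 10 dB, `run_track(lambda d: d >= -10.0)` settles into alternating reversals at − 11 and − 10 dB. A real listener would of course respond probabilistically.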

Next, AM detection thresholds were measured using the same 2AFC procedure and adaptive tracking but for the tonal carrier in noise preceded by a 400-ms precursor gated on and off with 5-ms raised-cosine ramps. The noise masker immediately followed a precursor (0-ms delay between the precursor and the noise masker) in all three precursor conditions illustrated in Fig. 2. The complex-tone precursor consisted of seven tones that were equidistantly spaced on a logarithmic frequency scale over a two-octave range. The component frequencies were 0.5, 0.63, 0.79, 1, 1.26, 1.59, and 2 kHz in the precursor centered on 1 kHz and 3, 3.78, 4.76, 6, 7.56, 9.52, and 12 kHz in the precursor centered on 6 kHz. The pure-tone precursors had frequencies equal to those of the respective carriers. The overall levels of the precursors were set to be equal to the level of the simultaneous noise masker in all precursor conditions. The experiment was performed using three levels of the tonal carrier, 40, 60, and 80 dB SPL, and two carrier frequencies, 1 and 6 kHz. The two frequencies and three levels were used to investigate whether the level and frequency dependence of precursor effects agree with known characteristics of efferent effects on cochlear responses. The no-precursor condition was completed before the condition with the precursor for each precursor type but the order of carrier levels and carrier frequencies tested was randomized for each no-precursor and precursor condition. For listeners who completed the task for more than one type of precursor, the order of precursor type was randomly selected.
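The component frequencies listed above follow directly from spacing seven tones equally on a logarithmic scale over two octaves, that is, one-third of an octave apart. The short sketch below reproduces them. The per-component level calculation assumes equal-amplitude components, which is not stated in the text, and the function names are hypothetical.

```python
import math


def complex_tone_frequencies(center_khz, n_components=7, span_octaves=2.0):
    """Frequencies (kHz) equally spaced on a log scale, centered geometrically."""
    low = center_khz * 2.0 ** (-span_octaves / 2.0)
    step = span_octaves / (n_components - 1)   # 1/3 octave for seven tones
    return [low * 2.0 ** (k * step) for k in range(n_components)]


def component_level_db(overall_level_db, n_components=7):
    """Level of each component, ASSUMING equal-amplitude components."""
    return overall_level_db - 10.0 * math.log10(n_components)


for center in (1.0, 6.0):
    print(center, [round(f, 2) for f in complex_tone_frequencies(center)])
```

Under the equal-amplitude assumption, each of the seven components would sit about 8.5 dB below the overall precursor level.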

During the experiment, listeners were seated in a sound-attenuating booth. The stimuli were generated digitally on a PC using Matlab (MathWorks, Natick, MA) with a sampling rate of 48 kHz and were played via a 24-bit Lynx22 (Lynx Studio Technology, Costa Mesa, CA) sound card. The stimuli were presented diotically via a Sennheiser HD 650 headset. The diotic presentation was used to maximize efferent effects (Guinan et al. 2003). Stimulus presentation, feedback, and data collection were controlled by the AFC program under Matlab (Ewert 2013).
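For readers who wish to reproduce the target stimulus, the following is a minimal sketch of a 40-ms tonal carrier with two cycles of 50-Hz sine-phase AM, 5-ms raised-cosine ramps, and rms equalization against the unmodulated tone, at the 48-kHz sampling rate given above. The study used Matlab; this plain-Python version, its function names, and the choice to equalize rms after gating are assumptions made for illustration only.

```python
import math


def raised_cosine_gate(n_total, n_ramp):
    """Gating window with raised-cosine onset and offset ramps."""
    gate = [1.0] * n_total
    for i in range(n_ramp):
        g = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        gate[i] = g
        gate[n_total - 1 - i] = g
    return gate


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def am_tone(fc_hz, depth_db=None, fs_hz=48000, dur_s=0.040, fm_hz=50.0,
            ramp_s=0.005):
    """Gated tone; depth_db is 20log(m), or None for the unmodulated tone."""
    m = 0.0 if depth_db is None else 10.0 ** (depth_db / 20.0)
    n = round(dur_s * fs_hz)
    gate = raised_cosine_gate(n, round(ramp_s * fs_hz))
    samples = []
    for i in range(n):
        t = i / fs_hz
        envelope = 1.0 + m * math.sin(2.0 * math.pi * fm_hz * t)  # sine phase
        samples.append(gate[i] * envelope * math.sin(2.0 * math.pi * fc_hz * t))
    return samples


def match_rms(x, reference):
    """Scale x to the rms of reference (removes the level cue from AM)."""
    scale = rms(reference) / rms(x)
    return [scale * v for v in x]


standard = am_tone(1000.0)                          # unmodulated interval
target = match_rms(am_tone(1000.0, -1.9), standard)  # m of about 0.80
```

Scaling to a calibrated level in dB SPL and adding the noise masker are omitted here because they depend on the playback chain.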

Results and Discussion

The levels of the simultaneous noise masker that resulted in AM detection thresholds that were in the targeted range from − 6 to − 2 dB varied across individuals. Table 1 provides the ranges of masker levels, the mean noise levels, and the mean signal-to-noise ratios (SNRs) across all listeners, for each carrier frequency and level. As stated above, the precursor level was equal to the level of the simultaneous noise masker in each condition for each listener. As shown in the fourth and seventh columns in Table 1, the SNRs needed for the targeted AM detection thresholds were larger for the 60-dB SPL carriers than for the lower or higher carrier levels. This result is likely due to the strongest compression of cochlear responses in the mid-level range and is consistent with previously shown mid-level bumps in intensity discrimination (Roverud and Strickland 2015a, 2015b) and AM detection (Almishaal et al. 2017; Jennings et al. 2018). More compressive responses for input levels around 60 dB SPL likely result in smaller post-cochlear SNRs and reduced effective modulation depths compared to those for levels of 40 and 80 dB SPL.
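The link between compression and effective modulation depth can be illustrated with a deliberately simplified calculation. The sketch below assumes an instantaneous power-law input-output function applied to the envelope, which is not a model used in this study, and the exponent value in the usage note is purely hypothetical.

```python
import math


def effective_depth_db(depth_db, exponent):
    """Modulation depth (dB) of a sinusoidal envelope after power-law compression.

    Assumes output = input ** exponent applied to the envelope 1 + m*sin(.);
    exponent = 1 is linear, smaller values are more compressive.
    """
    m = 10.0 ** (depth_db / 20.0)
    peak = (1.0 + m) ** exponent
    trough = (1.0 - m) ** exponent
    return 20.0 * math.log10((peak - trough) / (peak + trough))


for exponent in (1.0, 0.5, 0.2):   # hypothetical compression exponents
    print(exponent, round(effective_depth_db(-4.0, exponent), 1))
```

In this toy calculation, the same physical depth yields a progressively smaller effective depth as the exponent decreases, which is the qualitative pattern invoked above for mid-level carriers.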

Table 1 The range of noise levels (LN) producing AM detection thresholds in the targeted range of modulation depths, the mean noise levels, and the mean SNRs, across all listeners

The effect of the precursor on AM detection was calculated by subtracting AM detection threshold measured without a precursor from that observed with a precursor. Negative values resulting from this subtraction indicate that a precursor improved AM detection compared with the no-precursor condition, and positive values indicate that AM detection deteriorated (additional masking) when a precursor was used.
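As a concrete illustration of this sign convention, the sketch below computes the shift for each listener and summarizes it as a mean and standard error across listeners. The threshold values are made-up placeholders chosen only to show the arithmetic; they are not data from this study, and the function names are hypothetical.

```python
import statistics


def precursor_shift_db(with_precursor_db, without_precursor_db):
    """Negative = precursor improved AM detection; positive = added masking."""
    return with_precursor_db - without_precursor_db


def mean_and_sem(values):
    """Mean and standard error of the mean across listeners."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / len(values) ** 0.5
    return mean, sem


# Placeholder thresholds in dB [20log(m)], NOT measured values.
without = [-4.0, -3.0, -5.0]
with_precursor = [-12.0, -9.0, -15.0]
shifts = [precursor_shift_db(w, wo) for w, wo in zip(with_precursor, without)]
print(shifts, mean_and_sem(shifts))
```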

The left and right panels of Fig. 3 show individual and mean changes in AM detection thresholds due to the two-octave noise precursor for the 1- and 6-kHz carriers, respectively. Different colors in the bar plot represent shifts in AM detection for different carrier levels. The rightmost sets of bars in each panel show the mean shifts.

Fig. 3

Individual shifts in AM detection threshold due to a noise precursor for a 1-kHz carrier (left panel) and a 6-kHz carrier (right panel). The rightmost set of bars in each panel shows the mean shifts. The error bars depict one standard error of the mean

For all ten listeners who completed the task with the noise precursor, the precursor improved AM detection thresholds compared with the corresponding no-precursor condition. For carrier levels of 40, 60, and 80 dB SPL, the respective average shifts in AM detection threshold were − 8.4, − 6, and − 8.1 dB, for the 1-kHz carrier, and − 7.4, − 6.5, and − 6.9 dB, for the 6-kHz carrier. A repeated-measures analysis of variance (ANOVA) on the AM detection thresholds with factors of condition (precursor/no-precursor), carrier level, and carrier frequency showed a significant main effect of precursor (F(1, 9) = 120.3, p < 0.001), but no significant effect of carrier level (F(2, 18) = 0.58, p = 0.57), no significant effect of carrier frequency (F(1, 9) = 0.5, p = 0.51), and no significant interactions between the factors (p > 0.16 in all cases). Correlations between the precursor level and the size of the precursor-related shift in AM detection threshold were not significant for any of the three carrier levels at either of the two carrier frequencies (0.1 ≤ r ≤ 0.43, 0.21 ≤ p ≤ 0.86, with the Bonferroni-corrected significance level α = 0.008). It should be noted, however, that only ten data points contributed to each correlation.

Changes in AM detection thresholds due to the complex-tone precursor are shown in the left and right panels of Fig. 4, for the 1- and 6-kHz carriers, respectively. Although the prevailing effect of the precursor was to improve AM detection, the effect was not as robust as for the noise precursor across all carrier levels and the two carrier frequencies. For carrier levels of 40, 60, and 80 dB SPL, the respective average shifts in AM detection threshold were − 3.6, − 3.2, and − 9.4 dB, for the 1-kHz carrier, and − 1.1, − 2.1, and − 3.6 dB, for the 6-kHz carrier. A repeated-measures ANOVA on the AM detection thresholds showed a significant effect of condition (F(1, 9) = 36, p < 0.001) and a significant effect of level (F(1.3, 11.7) = 10.3, p = 0.005, with Greenhouse–Geisser correction applied for a violation of sphericity). Although some listeners did not exhibit improvement in AM detection in the presence of the complex-tone precursor for the 6-kHz carrier, the effect of carrier frequency did not reach significance (F(1, 9) = 3.9, p = 0.08). However, there was a significant interaction between condition and level (F(2, 18) = 16.7, p < 0.001), a significant interaction between condition and carrier frequency (F(1, 9) = 13.6, p = 0.005), and a significant three-way interaction between condition, carrier level, and carrier frequency (F(2, 18) = 4.4, p = 0.03). Because by design, thresholds for detecting AM without a precursor did not differ significantly across carrier frequencies and levels, the interactions reflect the dependence of precursor effects on carrier frequency and level. As shown in Fig. 4, the effect of the precursor was stronger at 1 kHz than at 6 kHz and was the strongest for the 80-dB carrier, especially at 1 kHz. The weaker effect of the precursor at 6 kHz was likely due to reduced hearing sensitivity in the region of the three precursor frequencies that were above the carrier frequency (7.56, 9.52, and 12 kHz). Because these precursor components had a lower sensation level than those above the 1-kHz carrier, they likely contributed less to the AM unmasking effect, making the complex-tone precursor less effective than the noise precursor at 6 kHz. Note that the noise precursor extended over the same range of frequencies as the complex tone, but due to its high spectral density, the decreased hearing sensitivity at very high frequencies might have had a smaller impact on the precursor effect at 6 kHz.

Fig. 4

Same as Fig. 3 but for a complex-tone precursor

The effect of the pure-tone precursor is shown in the left and right panels of Fig. 5, for the 1- and 6-kHz tones, respectively. The pattern of results differs from that for the noise and complex-tone precursors in that AM detection was not generally improved by the presence of the precursor. In fact, the average shift in threshold was sometimes positive, indicating poorer AM detection with than without the precursor. For the 1-kHz carrier, the average shift in AM detection was 0.9, 1.1, and − 1.5 dB, and for the 6-kHz carrier, it was 0.6, − 0.1, and − 0.9 dB, for the carrier level of 40, 60, and 80 dB SPL, respectively. A repeated-measures ANOVA on the AM detection thresholds showed that the effect of condition was not statistically significant (F(1, 9) = 0.02, p = 0.91), indicating that overall the precursor had no effect on AM detection. There was no significant effect of carrier level (F(2, 18) = 2.9, p = 0.08) and no significant effect of carrier frequency (F(1, 9) = 0.1, p = 0.72). There was a significant interaction between condition and carrier level (F(2, 18) = 11.7, p = 0.001), resulting from the fact that most listeners exhibited a decrease in AM detection threshold in the presence of the precursor for the 80-dB SPL carrier but not at the two lower levels. No other interactions between the main factors were significant (p > 0.2 in all cases).

Fig. 5

Same as Figs. 3 and 4 but for a pure-tone precursor

Overall, there was a robust and sizeable improvement in AM detection due to a two-octave noise precursor, a weaker and level-dependent improvement for a complex-tone precursor spanning the same (two-octave) frequency band, and no improvement in AM detection for a pure-tone precursor. There was a trend for the pure-tone precursor to improve AM detection for the highest carrier level tested (80 dB SPL), particularly at 1 kHz.

A few aspects of the data from experiment 1 are consistent with the putative contribution of MOC efferent activity to the improvement of AM detection for a tonal carrier presented in noise. One is that the noise precursor produced the greatest decrease in AM detection threshold and the pure-tone precursor was the least effective at improving AM detection. This is consistent with the findings based on efferent effects measured using SFOAEs, which are generally smaller for tonal than for noise MOCR elicitors (Guinan et al. 2003; Lilaonitkul and Guinan 2009a), especially for elicitors at the frequency of the SFOAE-evoking tone (Lilaonitkul and Guinan 2009b, 2012). The complex tone was less effective than the noise at improving AM detection for the two lower carrier levels, 40 and 60 dB SPL, but on average, it still produced some decrease in AM detection threshold. The weaker effect could be explained in terms of spectral sparsity of the seven-tone complex over a two-octave range.

Other aspects of the results cannot be easily related to the known effects of efferent activation. Since MOCR activation results in reduced cochlear gain, the effects of the reflex should be strongest for low to medium stimulus levels, at which the stimuli receive substantial gain in the cochlea (Guinan 2006). In contrast, at the highest carrier level (80 dB SPL), cochlear gain should be negligible, so efferent-driven changes in gain should be small or absent. The data are not consistent with this prediction. For the noise precursor, the amount of shift in AM detection threshold did not depend on carrier level (Fig. 3), and for the complex-tone precursor, the largest improvements in AM detection were seen for the highest carrier level (Fig. 4). However, efferent effects are known to grow with increasing level of the elicitor. Although a reduction in MOCR effect is expected with increasing carrier level, this reduction could, in principle, be offset by an increase in MOCR effects due to the higher precursor (putative MOCR elicitor) level used with more intense carriers. Still, it is unlikely that cochlear gain for an 80-dB SPL tone in noise would be sufficient to yield MOC efferent effects equal to or greater than those observed for a 40-dB SPL tone, for which cochlear gain should be near its maximum (Ruggero et al. 1997). It is less clear how efferent effects on cochlear responses depend on frequency. Recent physiological evidence shows that cochlear gain is greater in the basal than in the apical region of the cochlea (Recio-Spinoso and Oghalai 2017). In contrast, efferent effects measured in humans using SFOAEs are greater for frequencies around 0.5 to 1 kHz than around 4 kHz (Lilaonitkul and Guinan 2012). However, smaller MOCR effects at high frequencies in humans may be due to lower amplitudes of SFOAEs evoked by high-frequency tones and, thus, may reflect a limitation of the measurement technique rather than physiology.

The MOCR is only one of the two feedback-based reflexes operating in the auditory periphery. The other reflex, called the acoustic reflex or the middle-ear-muscle reflex (MEMR), is elicited by relatively high-level stimuli (Mukerji et al. 2010). This reflex causes the stapedius muscle to contract resulting in changes to middle-ear impedance and, consequently, changes of the stimulus level that reaches the cochlea. Although the higher-level stimuli used in experiment 1 may have activated the MEMR, these levels are in the range of MEMR thresholds and, thus, would produce changes in stimulus level on the order of a small fraction of a decibel (Schairer et al. 2007). Such changes in level reaching the cochlea would have negligible effect on performance in the AM detection task.

In summary, although MOC efferent effects could contribute to the observed improvement in AM detection thresholds by the noise and complex-tone precursors, mechanisms central to the cochlea (discussed in more detail in the “GENERAL DISCUSSION” below) may underlie the precursor effects. Previous studies of AM detection which showed smaller but significant improvements in AM detection in the presence of putative MOCR elicitors also suggested that post-cochlear mechanisms likely play at least some role (Almishaal et al. 2017; Jennings et al. 2018; Marrufo-Perez et al. 2018b).

EXPERIMENT 2: INVESTIGATING EFFERENT CONTRIBUTIONS TO THE AMPLITUDE-MODULATION UNMASKING EFFECT

Rationale

The most direct way to estimate efferent effects in humans is to measure otoacoustic emissions with and without an MOCR elicitor (Guinan 2006). Cochlear gain reduction due to MOCR activation results in a decrease in the magnitude of an SFOAE. If the precursors that improved AM detection in experiment 1 did so by activating the MOCR, then the same stimuli should reduce the magnitudes of SFOAEs evoked by the tones that were used as the carriers. Because the strongest AM unmasking effects were observed for the noise precursor, two-octave noise bands were used to measure efferent effects on SFOAEs in this experiment. To limit potentially confounding effects of MEMR activation, the measurements were performed using only the lowest (40 dB SPL) probe level (Guinan et al. 2003). At mid and higher levels, the two-octave noise band could produce changes in middle-ear impedance that would result in a change in the ear-canal sound pressure. These changes could be incorrectly interpreted as an MOCR effect. It is important to note that none of the elicitors was intense enough to produce more than a small fraction of a dB change in the stimulus level reaching the cochlea at either carrier frequency used in this study (e.g., Schairer et al. 2007; Feeney et al. 2017). Although this change would be too small to be perceptually relevant, it would be sufficient to contaminate SFOAE-based measurements of the MOCR effects (Guinan et al. 2003).

Previous studies using SFOAEs have shown that relatively narrow (1/2-octave) bands of noise produce significant efferent effects on SFOAEs, especially for frequencies around 1 kHz (Lilaonitkul and Guinan 2009a, 2012). However, in most studies, efferent effects have been shown mainly for elicitor levels that were higher than the levels of the SFOAE-evoking tones, usually by 20 dB. For elicitor levels approaching the level of the probe, efferent effects were small or not observed (Backus and Guinan 2006; Lilaonitkul and Guinan 2009b). In the AM unmasking experiment, the precursors had levels equal to the levels of simultaneous noise maskers that were needed to produce a targeted range of AM detection thresholds in the absence of the precursors. These precursor levels were selected individually for each listener and carrier level. For all listeners and carrier levels, the levels of the precursors were well below the corresponding carrier levels (see Table 1). In this experiment, efferent effects on SFOAEs were measured as a function of the level of the noise elicitor. The aim of this experiment was to determine if the precursors that yielded significant improvements in AM detection thresholds for 40-dB SPL tonal carriers affect the magnitudes of cochlear responses and, thus, the SFOAEs evoked by the low-level tones used for the AM carriers in experiment 1.

Methods

Listeners

Ten listeners (five females, five males) with ages from 18 to 25 years (mean = 20.6, SD = 2.0) were recruited for this experiment. Seven of them had participated in experiment 1 and three were new. The newly recruited listeners had normal-hearing thresholds (≤ 15 dB HL) at audiometric frequencies from 250 Hz to 8 kHz.

Stimuli and Procedure

Efferent effects were measured for two-octave noise bands centered at frequencies from the regions around 1 and 6 kHz that corresponded to the peaks of the SFOAE fine structure. These frequencies were used for the SFOAE probes. Probes at the peak of the SFOAE fine structure were chosen because strong emissions with SNRs ≥ 20 dB are required to observe robust efferent effects (Goodman et al. 2013). None of the selected probe frequencies was within 100 Hz of a spontaneous otoacoustic emission, as confirmed by the measurement of spontaneous emissions for each listener prior to the measurement of efferent effects. The experiment was, therefore, performed in three steps: (1) measurement of spontaneous otoacoustic emissions, (2) measurement of SFOAE fine structure around 1 and 6 kHz, and (3) measurement of efferent effects as a function of elicitor level.

Spontaneous emissions were recorded from the ear canal of the left ear in four 30-s segments during which no sound was played to the ear. The recorded segments were averaged and analyzed using a procedure described by Penner et al. (1993). The SFOAE fine structure was measured using a suppression technique (Zweig and Shera 1995; Shera and Guinan 1999). For each nominal frequency, fn (1 or 6 kHz), SFOAE magnitudes were estimated for probes with frequencies from 0.9fn to 1.1fn in steps of 0.02fn. During a run, a probe was presented continuously for 8 s at a level of 40 dB SPL. A suppressor tone with a frequency 50 Hz higher than the probe was presented for 0.5 s with a repetition rate of 1/s. The suppressor had a level of 60 dB SPL and its polarity was alternated on consecutive presentations. Eight runs were recorded for each probe frequency resulting in a total of 64 SFOAE recordings (8 runs × 8 1-s segments with the suppressor on-and-off cycle per run).
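The probe grid used for the fine-structure measurement can be sketched as follows. This is only an illustration of the frequency spacing stated above, not the authors' software; the function name `probe_grid` and the rounding to avoid floating-point drift are assumptions.

```python
def probe_grid(fn_hz, lo=0.9, hi=1.1, step=0.02, suppressor_offset_hz=50.0):
    """Return (probe, suppressor) frequency pairs in Hz around nominal frequency fn_hz."""
    n_steps = round((hi - lo) / step)  # 10 steps, i.e., 11 probe frequencies
    pairs = []
    for k in range(n_steps + 1):
        probe = round(fn_hz * (lo + k * step), 6)
        # The suppressor tone sits 50 Hz above each probe, as stated in the text.
        pairs.append((probe, probe + suppressor_offset_hz))
    return pairs


grid_1k = probe_grid(1000.0)
grid_6k = probe_grid(6000.0)

for probe, suppressor in grid_1k:
    print(f"probe {probe:7.1f} Hz, suppressor {suppressor:7.1f} Hz")
```

With these settings, each nominal frequency yields 11 probe frequencies, spaced 20 Hz apart around 1 kHz and 120 Hz apart around 6 kHz.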

For each nominal frequency, the tone that evoked an SFOAE with the largest magnitude and the SNR ≥ 20 dB was used to measure efferent effects. Figure 6 shows a schematic illustration of the stimulus configuration in a single run. Each run consisted of a ‘suppressor run’ and an ‘elicitor run.’ A suppressor run was similar to a run in the measurement of SFOAE fine structure with the exception of the duration of the stimuli. A tonal probe continued for 16 s during which the suppressor 1-s on and 1-s off cycle was repeated eight times. The elicitor run consisted of eight presentations of 1 s of a probe alone followed by 1 s of the probe presented with a noise band (MOCR elicitor) followed by 4 s of the probe alone. The 4-s duration of the probe after the elicitor was used because Wojtczak et al. (2015) observed elicitor effects on high-frequency SFOAEs that had significantly slower decay times compared with that estimated by Backus and Guinan (2006) for a 1-kHz probe. The probe, which continued throughout the suppressor and elicitor runs, the suppressor, and the noise elicitor were gated on and off with 5-ms raised-cosine ramps. The polarities of the suppressor and elicitor were alternated on each presentation to cancel the physical waveforms during the analysis of the recorded ear-canal sound pressure. A complete cancelation of the physical suppressor and elicitor waveforms was confirmed by analyzing test recordings performed in a simulated ear-canal cavity (Walsh et al. 2015). Seven runs were completed for a single elicitor level. The measurements were performed for four elicitor levels, 30, 40, 50, and 60 dB forward pressure level (FPL). The details of calibration in terms of dB FPL are described below.
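The logic of alternating the elicitor polarity can be illustrated with a short simulation. The sketch below assumes a perfectly linear, noise-free recording and uses a 100-ms excerpt with arbitrary amplitudes; the helper names are hypothetical, and only the 48-kHz sampling rate and the 5-ms raised-cosine ramps are taken from the text.

```python
import math
import random

FS = 48000  # sampling rate in Hz, as stated in the text


def raised_cosine_gate(n_samples, ramp_s=0.005, fs=FS):
    """Envelope with raised-cosine onset and offset ramps of duration ramp_s."""
    n_ramp = int(round(ramp_s * fs))
    env = [1.0] * n_samples
    for i in range(n_ramp):
        w = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        env[i] = w
        env[n_samples - 1 - i] = w
    return env


def tone(freq_hz, n_samples, amp=1.0, fs=FS):
    return [amp * math.sin(2.0 * math.pi * freq_hz * i / fs) for i in range(n_samples)]


n = int(0.1 * FS)  # 100-ms excerpt keeps the example fast
rng = random.Random(0)
gate = raised_cosine_gate(n)
frozen_noise = [g * rng.gauss(0.0, 1.0) for g in gate]  # same sample on every presentation
probe = tone(1000.0, n, amp=0.1)  # arbitrary amplitude

# Two presentations with opposite elicitor polarity; the probe polarity is unchanged.
rec_pos = [p + e for p, e in zip(probe, frozen_noise)]
rec_neg = [p - e for p, e in zip(probe, frozen_noise)]
pair_avg = [0.5 * (a + b) for a, b in zip(rec_pos, rec_neg)]

# After averaging the pair, only the probe component remains.
residual = max(abs(a - p) for a, p in zip(pair_avg, probe))
print(f"largest residual after pair averaging: {residual:.2e}")
```

In a real ear, the averaged pair also contains the SFOAE and any elicitor-induced change in it, because these follow the probe polarity rather than the elicitor polarity.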

Fig. 6

Schematic illustration of the spectrogram of stimuli used to estimate MOCR effects elicited by noise on the SFOAE evoked by a tonal probe. A run depicted in this figure was repeated seven times

For each probe frequency, the measurements of efferent effects were performed within a single session. The suppressor runs were interleaved with the elicitor runs and were used to monitor for any systematic changes in SFOAE magnitude that would suggest drift or other artifacts during the measurement. Recordings of each run were manually screened for artifacts. When artifacts were detected, pairs of consecutive segments with opposite elicitor polarities were discarded. Only clean recordings were kept for subsequent analyses.

During the recording session, listeners were seated comfortably in a semireclining chair in a double-wall sound-attenuating booth. They were asked to relax and sit as still as possible during sound presentation. After each run, listeners pressed the spacebar on a computer keyboard placed on their lap to initiate the next run with one finger movement. The stimuli were generated on a PC and were presented and recorded with a sampling rate of 48 kHz using an RME Fireface UC sound card and an ER10X assembly (Etymotic Research, Inc., Elk Grove, IL). The SFOAE-evoking probe and the suppressor tone were presented to the left ear via separate channels of the ER10X probe assembly to avoid electrical interaction between the stimuli. The microphone in the same ear was used to record the ear-canal sound pressure over time. The elicitor was presented bilaterally. In the ear ipsilateral to the probe, the elicitor and the probe were presented via separate channels. A frozen noise sample was used for the elicitor throughout the measurement of efferent effects.

All the stimuli were calibrated to achieve the desired FPL (Scheperle et al. 2008; Scheperle et al. 2011) using EMAV (EMAV Version 3.32; Neely and Liu 1994). This calibration reduces across-listener variability by minimizing the effect of probe placement and the influence of standing waves on the ear-canal sound pressure at high frequencies. Thévenin source characteristics (source sound pressure and source impedance) for FPL calculations were obtained using the Stepper software developed at Boys Town National Research Hospital (http://audres.org/cel/thev/). The noise bands were scaled to obtain the desired level relative to 40 dB FPL for the corresponding probe frequency. Since the purpose of the FPL calibration is to achieve a desired level in dB SPL at the eardrum, in the following sections of the manuscript, the dB SPL unit will be used when referring to the levels of the stimuli.

Waveform Analysis

Waveforms from clean recordings were analyzed to extract the magnitude and phase of the SFOAE for each SFOAE-evoking tone from suppressor runs and the magnitude and phase of the elicitor-induced changes in SFOAE (ΔSFOAE) from elicitor runs. All the recorded waveforms were first high-pass filtered at 250 Hz using a fourth-order Butterworth filter and Matlab ‘filtfilt’ function to eliminate low-frequency noise from the waveforms without introducing phase shifts due to the filtering. Waveforms from the suppressor runs in the measurements of efferent effects (see Fig. 6) were cut into 2-s segments and averaged across segments within a run and across runs. The average 2-s waveform was analyzed using the heterodyne technique described in previous studies (Guinan et al. 2003; Backus and Guinan 2006; Wojtczak et al. 2015). This analysis results in a complex-valued ear-canal sound pressure at the probe frequency, from which the magnitude and phase of the sound pressure are obtained. The 2-s segment of the heterodyned waveform contained a 1-s interval with sound pressure equal to the sum of the sound pressure from the source and the sound pressure from the SFOAE, and a 1-s interval with the sound pressure from the source alone, assuming that the suppressor tone completely suppressed the probe-evoked SFOAE. The complex-valued sound pressure was averaged over a 500-ms window extending from 0.4 to 0.9 s of the second 1-s interval and the resultant mean complex value was subtracted from each point of the 2-s waveform. The 500-ms duration of the averaging window was chosen arbitrarily except that it was important the window provided a robust sample of the ear-canal sound pressure when an SFOAE was likely suppressed. The vector subtraction provided the magnitude and phase of the SFOAE during the first 1-s interval and the magnitude and phase of the noise floor of the measurement during the second 1-s interval. 
To quantify the level of the noise floor and the extracted SFOAE magnitude, complex values within 100-ms windows placed in the middle of each of the two 1-s intervals were averaged separately. The magnitudes calculated from these values were scaled to dB SPL. The 100-ms window duration for quantifying the SFOAE level was used because variations in SFOAE magnitude across runs were monitored for any systematic changes that could contribute to MOC efferent effects estimated from elicitor runs. The duration of this window had to be equal to the duration of the window over which post-elicitor efferent effects were quantified (as will be discussed below). The 100-ms duration of the averaging window has been used in previous studies using the same technique for estimating efferent-induced changes in SFOAEs (Guinan et al. 2003; Backus and Guinan 2006; Lilaonitkul and Guinan 2009a, 2009b, 2012; Wojtczak et al. 2015). The recordings of the SFOAE fine structure were analyzed in the same way as the waveforms from the suppressor runs.
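The vector-subtraction step for a suppressor run can be sketched on a synthetic waveform. This is a simplified illustration, not the published heterodyne implementation: the 250-Hz zero-phase high-pass filter is omitted, the low-pass stage is replaced by averaging over a whole number of probe cycles, and the SFOAE level and phase (10 dB SPL, 60 degrees) are assumed values chosen only for the demonstration.

```python
import cmath
import math

FS = 48000          # sampling rate (Hz), as in the text
F_PROBE = 1000.0    # probe frequency (Hz)
P_REF = 20e-6       # reference pressure for dB SPL (Pa)


def rms_from_db_spl(level_db):
    return P_REF * 10.0 ** (level_db / 20.0)


def db_spl(p_rms):
    return 20.0 * math.log10(p_rms / P_REF)


def heterodyne(x, freq_hz, fs=FS):
    """Shift the component at freq_hz to 0 Hz; window means then give its rms pressure."""
    w = 2.0 * math.pi * freq_hz / fs
    return [math.sqrt(2.0) * s * cmath.exp(-1j * w * i) for i, s in enumerate(x)]


def window_mean(z, t_start, t_stop, fs=FS):
    """Complex (vector) average of z between t_start and t_stop, in seconds."""
    seg = z[int(round(t_start * fs)):int(round(t_stop * fs))]
    return sum(seg) / len(seg)


# Synthetic 2-s segment: source + SFOAE for 1 s, then source alone (SFOAE suppressed).
src_rms = rms_from_db_spl(40.0)
oae_rms, oae_phase = rms_from_db_spl(10.0), math.radians(60.0)  # assumed values
x = []
for i in range(2 * FS):
    t = i / FS
    p = math.sqrt(2.0) * src_rms * math.cos(2.0 * math.pi * F_PROBE * t)
    if t < 1.0:
        p += math.sqrt(2.0) * oae_rms * math.cos(2.0 * math.pi * F_PROBE * t + oae_phase)
    x.append(p)

z = heterodyne(x, F_PROBE)
baseline = window_mean(z, 1.4, 1.9)             # source alone, 500-ms window
sfoae = window_mean(z, 0.45, 0.55) - baseline   # 100-ms window in the middle of interval 1
sfoae_level = db_spl(abs(sfoae))
sfoae_phase_deg = math.degrees(cmath.phase(sfoae))
print(f"SFOAE level {sfoae_level:.2f} dB SPL, phase {sfoae_phase_deg:.1f} deg")
```

Because the source pressure is identical in both intervals, subtracting the complex baseline leaves only the emission, whose level and phase are recovered from a pressure roughly 30 dB below the probe.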

To extract the ΔSFOAE magnitude and phase, i.e., the change in the SFOAE due to the MOCR elicitor, waveforms from the elicitor runs were cut into 6-s segments and averaged within a run and across the runs. The average 6-s waveform was heterodyned resulting in a complex-valued sound pressure waveform consisting of a 1-s interval with the sum of the source sound pressure and the probe-evoked SFOAE, a 1-s interval with the sum of the source sound pressure and the probe-evoked SFOAE affected by the noise-induced effects (suppression on the basilar membrane by the ipsilateral elicitor and the MOCR), and a 4-s interval with the sum of the source sound pressure and the SFOAE affected by the decaying MOCR. The vector average of the complex sound pressure was then calculated within a 400-ms window ending 100 ms before the elicitor. This average complex value was subtracted from each point of the entire 6-s segment. After the vector subtraction, the magnitude of the noise floor was calculated from a 100-ms window that ended 50 ms before the onset of the noise. The magnitude of the effect of basilar-membrane suppression by the ipsilateral noise was calculated from a 100-ms window starting 20 ms after the noise onset. The magnitude of the combined effects of basilar-membrane suppression and the MOCR effect during the noise elicitor was calculated using a 100-ms window ending 20 ms before the noise offset. Finally, the magnitude of the efferent effect unaffected by the basilar-membrane suppression was calculated using a 100-ms window that started 20 ms after the noise offset (Backus and Guinan 2006). The magnitude of the noise floor and all the magnitudes of the ΔSFOAE were converted to dB SPL.
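The analysis windows for the elicitor runs follow directly from the timing rules above. The sketch below tabulates them, assuming that the elicitor occupies the interval from 1 to 2 s of each 6-s segment; the dictionary keys are hypothetical labels.

```python
ONSET_S, OFFSET_S = 1.0, 2.0  # assumed elicitor on/off times within the 6-s segment


def analysis_windows(onset=ONSET_S, offset=OFFSET_S):
    """Return {label: (start_s, stop_s)} for the windows described in the text."""
    return {
        # 400-ms vector-average window ending 100 ms before the elicitor
        "baseline": (onset - 0.100 - 0.400, onset - 0.100),
        # 100-ms noise-floor window ending 50 ms before the noise onset
        "NF": (onset - 0.050 - 0.100, onset - 0.050),
        # 100-ms suppression window starting 20 ms after the noise onset
        "Supp": (onset + 0.020, onset + 0.020 + 0.100),
        # 100-ms window ending 20 ms before the noise offset
        "Supp+MOCR": (offset - 0.020 - 0.100, offset - 0.020),
        # 100-ms post-elicitor window starting 20 ms after the noise offset
        "MOCR": (offset + 0.020, offset + 0.020 + 0.100),
    }


windows = analysis_windows()
for label, (t0, t1) in windows.items():
    print(f"{label:10s} {t0:.2f}-{t1:.2f} s")
```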

Results and Discussion

Figure 7 shows an example of the effects of the noise elicitor on a 1-kHz SFOAE, for one listener (S12).

Fig. 7

Changes in the SFOAE magnitude (top row) and phase (bottom row) due to a two-octave noise presented at levels provided on the top of upper panels. The ΔSFOAEs were measured using a 40-dB SPL probe tone with a nominal frequency of 1 kHz. Data are for one listener (S12). Red rectangles in the top right panel show temporal positions of windows over which the ΔSFOAE was averaged to estimate the noise floor, the suppression effects, and the post-elicitor MOCR effect. The gray rectangle shows the window used to estimate the combined effects of suppression and the MOCR

The top panels show the ΔSFOAE magnitude and the bottom panels show the ΔSFOAE phase for elicitor levels listed at the top of each column. The x-axis in each panel was limited to extend from 0.5 to 3 s for better readability of post-elicitor efferent effects that fully recovered within a few hundred milliseconds after the elicitor offset. The data in Fig. 7 are representative of the patterns observed in all the listeners. The first 0.5-s interval in each top panel shows the magnitude of the noise floor of the measurement. The interval from 1 to 2 s shows the ΔSFOAE during the noise elicitor. This change in emission represents the combined effects of the basilar-membrane suppression by the ipsilateral elicitor and the MOCR, when present. For all listeners, the ΔSFOAE during the noise was dominated by suppression of the response to the probe by the ipsilateral noise, as evidenced by an abrupt onset and no increase of the effect during the course of the elicitor. The interval from 2 to 3 s shows the post-elicitor ΔSFOAE. In the absence of significant efferent activation, the post-elicitor ΔSFOAE falls rapidly into the noise floor because suppression on the basilar membrane is essentially instantaneous (Sachs and Kiang 1968). This trend is shown by the data for elicitor levels of 30 and 40 dB SPL and is consistent with psychophysical evidence of significant two-tone suppression on the basilar membrane for suppressor levels at and below the level of the probe (Houtgast 1973). On the other hand, a gradual decrease in ΔSFOAE after the offset of the elicitor indicates that the MOCR was activated because the MOCR has a relatively slow decay time (Backus and Guinan 2006). The post-elicitor decay in ΔSFOAE magnitude is apparent for elicitor levels of 50 and 60 dB SPL (the right two panels). The bottom panels show the corresponding ΔSFOAE phases. 
As the noise effects become stronger, the phase coherence increases, as shown by the increasingly tight spread of the ΔSFOAE phase with increasing elicitor level. When the noise elicitor effect is absent or weak enough to fall into the noise floor, the ΔSFOAE phase spreads over a range from −180 to 180 degrees across time (Guinan et al. 2003).

For the nominal frequency of 1 kHz, efferent effects were measured in nine out of ten listeners because one (new) listener did not have an SFOAE with an SNR ≥ 20 dB around 1 kHz. For the nominal frequency of 6 kHz, the measurements of efferent effects were performed for only three listeners (S9, S17, and S20) since the other seven listeners did not have SFOAEs with an SNR ≥ 20 dB.

To determine statistical significance of the noise effects on SFOAEs for individual listeners, a bootstrap procedure was used to generate distributions of ΔSFOAE magnitudes for the noise floor (NF), the suppression effect elicited by the ipsilateral noise (Supp), and the post-elicitor MOCR effect (MOCR). The temporal positions of the windows over which the respective ΔSFOAE magnitudes representing these effects were calculated are shown by red rectangles in the top right panel of Fig. 7. Pairs of elicitor segments with opposite polarities from the elicitor runs were selected randomly with replacement 10,000 times. Each sampling used the number of samples equal to the total number of artifact-free segment pairs. After each resampling, the waveforms were averaged and analyzed the same way as described above in the “Waveform Analysis” section. The effects of the MOCR and basilar-membrane suppression were considered significant if the means of the MOCR and Supp distributions fell outside of the 95 % confidence interval for the NF distribution.
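The bootstrap logic can be sketched with synthetic data. The example is a simplification under stated assumptions: each artifact-free segment pair is represented by a single complex window average rather than a full waveform (the averaging and heterodyne steps are linear, so the vector mean is taken directly), the count of 28 pairs and the simulated values are hypothetical, and the confidence interval is a plain percentile interval.

```python
import cmath
import math
import random


def bootstrap_magnitudes(pair_values, n_boot, rng):
    """Magnitude of the vector mean for n_boot resamples drawn with replacement."""
    n = len(pair_values)
    out = []
    for _ in range(n_boot):
        sample = [pair_values[rng.randrange(n)] for _ in range(n)]
        out.append(abs(sum(sample) / n))
    return out


def percentile_interval(values, lo=2.5, hi=97.5):
    """Percentile interval read directly from the sorted bootstrap values."""
    s = sorted(values)

    def pick(q):
        return s[int(round(q / 100.0 * (len(s) - 1)))]

    return pick(lo), pick(hi)


rng = random.Random(1)
N_PAIRS, N_BOOT = 28, 10000  # 28 pairs is an assumed count; 10,000 resamples as in the text


def noisy(mean):
    return mean + complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))


# Synthetic per-pair complex window averages (arbitrary pressure units).
nf_pairs = [noisy(0.0) for _ in range(N_PAIRS)]            # noise only
effect = 3.0 * cmath.exp(1j * math.radians(40.0))          # assumed phase-coherent effect
mocr_pairs = [noisy(effect) for _ in range(N_PAIRS)]

nf_dist = bootstrap_magnitudes(nf_pairs, N_BOOT, rng)
mocr_dist = bootstrap_magnitudes(mocr_pairs, N_BOOT, rng)

ci_lo, ci_hi = percentile_interval(nf_dist)
mocr_mean = sum(mocr_dist) / len(mocr_dist)
significant = not (ci_lo <= mocr_mean <= ci_hi)
print(f"NF 95% interval: {ci_lo:.2f}-{ci_hi:.2f}, MOCR mean: {mocr_mean:.2f}, significant: {significant}")
```

Resampling pairs rather than single segments keeps the two elicitor polarities balanced in every resample, so the acoustic elicitor waveform still cancels in each bootstrap average.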

Figure 8 shows examples of the distributions from the bootstrap analysis for the nominal frequency of 1 and 6 kHz, in the top and bottom row of panels, respectively. Different columns show NF, MOCR, and Supp distributions for different elicitor levels. Data for each probe frequency are from one listener.

Fig. 8

Examples of bootstrapped distributions of ΔSFOAE magnitude for one listener: NF (gray) denotes distributions for the noise floor, MOCR (red) denotes distributions for the post-elicitor effect, and Supp (blue) denotes distributions for suppression effects. The top and bottom rows show data for the nominal probe frequency of 1 and 6 kHz, respectively

The individual means and standard deviations for the bootstrapped ΔSFOAE magnitudes from the NF, MOCR, and Supp distributions are shown in Fig. 9, for the 1-kHz nominal frequency. The different panels show data for different elicitor levels. The asterisks above the bars indicate the mean ΔSFOAE magnitudes that were significantly above the noise floor. For the 30-dB SPL elicitor, only one listener (S9) showed a significant MOCR effect but seven out of nine listeners exhibited significant suppression by the ipsilateral noise elicitor. For the 40-dB SPL elicitor, only two listeners (S9 and S13) exhibited significant MOCR effects, and all the listeners had significant suppression effects. The number of listeners with significant MOCR effects increased to seven out of nine for the 50-dB SPL elicitor and to all nine listeners for the 60-dB SPL elicitor. Suppression effects were significant for all nine listeners for elicitor levels ≥ 40 dB SPL. Thus, although the noise elicitor activated the MOCR and reduced the SFOAE in the 1-kHz region, significant post-elicitor efferent effects were only observed for elicitor levels > 40 dB SPL. For noise levels ≤ 40 dB SPL, i.e., for the range of precursor levels yielding significant release from AM masking in experiment 1, the noise did not produce significant efferent effects on the SFOAE evoked by the 40-dB SPL tone used as the carrier in that experiment.

Fig. 9

Means of the bootstrapped distributions for individual listeners, for the nominal probe frequency of 1 kHz. The error bars denote one standard deviation for the mean and the asterisks indicate cases where the mean ΔSFOAE was outside of the 95 % confidence interval for the noise-floor distribution

The individual means and standard deviations for the bootstrapped ΔSFOAEs in the 6-kHz region are shown in Fig. 10. The suppression effect was significant for all three listeners at all elicitor levels. However, none of the three listeners had a significant MOCR effect for elicitor levels ≤ 50 dB SPL, and only one (S17) had a significant MOCR effect for the 60-dB SPL elicitor.

Fig. 10

Same as Fig. 9 but for the nominal probe frequency of 6 kHz

Although SNRs for SFOAEs in the 6-kHz region were generally lower than those in the 1-kHz region, the three listeners for whom MOCR effects were measured met the SNR criterion. The SFOAEs were strong enough to show suppression effects, and yet, significant efferent effects were not observed in individual data for noise levels that produced AM unmasking in experiment 1.

To test whether significant efferent effects were present at a group level, a nonparametric Fisher’s sign test was performed on the median values of the measured ΔSFOAEs that corresponded to the noise floor and the post-elicitor MOCR effect. For the 1-kHz probe, the MOCR effects were not significant for the two lower noise levels, 30 and 40 dB SPL (p ≥ 0.18). Efferent effects were significant for the two higher noise levels, 50 and 60 dB SPL (p = 0.039 in both cases), consistent with findings from previous studies (e.g., Backus and Guinan 2006). For the 6-kHz probe, none of the noise levels produced significant MOCR effects at a group level (p ≥ 0.25), but the test was based on data from only three listeners.
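The limited resolution of the sign test with small samples can be made explicit. The sketch below computes the exact two-sided p value from the binomial distribution, assuming ties are dropped; the function name and the example splits are illustrative, and the actual numbers of positive and negative differences in the data are not stated in the text.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign-test p value; ties are assumed to have been dropped."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


p_8_of_9 = sign_test_p(8, 1)
p_7_of_9 = sign_test_p(7, 2)
p_3_of_3 = sign_test_p(3, 0)

for label, p in [("8 of 9", p_8_of_9), ("7 of 9", p_7_of_9), ("3 of 3", p_3_of_3)]:
    print(f"{label} differences with the same sign: p = {p:.3f}")
```

With only three listeners, the smallest attainable two-sided p value is 0.25 (all three differences sharing a sign), so the 6-kHz test could not have reached significance at the 0.05 level regardless of the size of the effect.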

In the above analyses, the post-elicitor ΔSFOAE magnitudes were used to test for significance of efferent effects elicited by the noise precursor, even though the tone was presented during a masking noise in experiment 1. Because the tonal probe was presented in quiet after the offset of the noise elicitor, the post-elicitor window should provide the best chance to observe efferent effects, if they were present, since cochlear gain was unaffected by any additional stimulus that would produce suppression on the basilar membrane. Based on the data in Figs. 7, 8, 9, and 10, the ipsilateral noise maskers in experiment 1 produced significant changes in the cochlear responses to the probes via basilar-membrane suppression for both probe frequencies, 1 and 6 kHz. To test if the noise precursor additionally reduced cochlear gain via the MOC efferent system when the tone was presented in noise, Fisher’s sign test was performed on the medians of the measured ΔSFOAE magnitudes during two 100-ms windows. One, positioned 20 ms after the noise onset (red rectangle in the rightmost panel of Fig. 7), provided the estimate of suppression on the basilar membrane, and the other, ending 20 ms before the noise offset (gray rectangle in Fig. 7), provided the estimate of combined effects of the MOCR and basilar-membrane suppression. No significant differences between the two ΔSFOAE magnitudes were observed for any of the noise levels at either probe frequency (p ≥ 0.51). Based on these statistical outcomes, the SFOAE measurements provide no support for the hypothesis that the MOCR contributed to the AM unmasking due to the noise precursor for the 40-dB SPL probe presented in a noise masker in experiment 1. Although the SFOAE measurements could not be performed for higher-level stimuli, it seems unlikely that the MOCR effects would have been stronger at higher levels because of lower cochlear gain.

GENERAL DISCUSSION

Perception of AM of a tonal carrier presented in noise is degraded by masking in the acoustic-frequency domain, which results in a lower effective sensation level of the modulated tone, and masking in the modulation domain due to the inherent envelope fluctuations in the noise masker (Dau et al. 1997a). Results from behavioral measurements of AM detection for a tone in noise with and without a precursor showed that when the noise level is such that it severely interferes with AM detection but does not completely mask the tonal carrier, adding a precursor with a level equal to that of the noise masker can significantly improve listeners’ ability to detect AM. The improvement depended on the type of the precursor. The largest decrease of AM detection thresholds was observed for the noise precursor with the same spectral content as the simultaneous noise masker. The amount of this AM unmasking was similar for the two carrier frequencies, 1 and 6 kHz, and three carrier levels, 40, 60, and 80 dB SPL, used in this study. A complex tone with seven components spanning the same spectral range as the noise masker produced a comparable AM unmasking when the carrier was presented at 80 dB SPL, but much smaller precursor effects were observed at the two lower levels. A tonal precursor had no significant effect on AM detection.

Comparison with Other Studies

The finding from this study is consistent with several recent reports on improvement in AM detection due to notched-noise (Almishaal et al. 2017; Jennings et al. 2018) and broadband-noise (Marrufo-Perez et al. 2018b) precursors. In our measurements, the level of the precursor was below the level of the tonal carrier for all listeners in all conditions (see Table 1). In an apparent contradiction to the sizeable and significant effects of the noise precursor in this study, no AM unmasking was observed by Almishaal et al. (2017) when the notched-noise precursor used in that study was 10–20 dB below the carrier level. The different findings are likely due to different stimulus configurations. Almishaal et al. (2017) used a narrowband-noise carrier that was presented with a low-level notched-noise masker to limit off-frequency listening. The notched noise likely produced much less modulation masking of the target AM than the noise that spectrally overlapped with the target in this study. The AM unmasking effects shown in the studies by Marrufo-Perez et al. (2018b) and Jennings et al. (2018) were smaller (2–4 dB) than the ~8-dB improvement in AM detection threshold by the noise precursor in our study. The larger effect may have been due to our requirement that the level of the simultaneous noise masker degrades AM detection to roughly the same level, resulting in AM detection thresholds in the range between −6 and −2 dB. This selection was based on our pilot study with only two listeners (data not shown) which showed that for masker levels yielding lower AM detection thresholds without a precursor, there was less or no benefit from the presence of a precursor. Most listeners in the previous studies (Jennings et al. 2018; Marrufo-Perez et al. 2018b) exhibited AM detection thresholds without a precursor that were below (better than) our targeted range of threshold modulation depths.

Previous studies of the effect of a precursor on AM detection investigated the effects of carrier level while keeping the precursor level constant (Almishaal et al. 2017; Jennings et al. 2018) and the effect of noise laterality for one noise level (Marrufo-Perez et al. 2018b). In this study, the precursor level was varied with the level of simultaneous noise masker and the carrier level, and only the diotic condition was used. Results from experiment 1 show that the AM unmasking did not depend on carrier level over a range from 40 to 80 dB SPL. Almishaal et al. (2017) used a notched-noise precursor fixed at 40 dB SPL and showed no significant effects of the precursor on AM detection for carrier levels below 60 dB SPL. A significant improvement in AM detection due to the precursor was only observed for carrier levels from 65 to 85 dB SPL. In an apparent contrast, Jennings et al. (2018) found an improvement in AM detection due to a fixed-level notched-noise precursor for carrier levels between 50 and 65 dB SPL but not for higher levels. Jennings et al. (2018) attributed the differences in the observed level dependence to the difference between the carrier frequencies used in the two studies and specifically to potentially stronger MOCR effects at 5 kHz than at 2 kHz. In this study, no significant effect of the carrier frequency on improvement of AM detection was found. However, direct comparisons of the outcomes are complicated by differences between the stimuli used in this and the previous studies, as described above.

The dependence of AM unmasking on the type of a precursor is broadly consistent with previous reports that efferent effects on otoacoustic emissions depend on the elicitor bandwidth (Maison et al. 2000; Lilaonitkul and Guinan 2009a). The complex-tone precursor had the same overall bandwidth as the noise precursor, but it was spectrally sparse. With only seven components over a two-octave range, the complex tone likely activated a smaller number of efferent fibers thereby having a smaller effect on the cochlear representation of the AM tone in noise. The complex tone was as effective as the noise precursor in improving AM detection for the highest elicitor level. This could be explained in terms of the excitation pattern which for a complex tone at high levels was more similar to that for a noise with the same bandwidth due to broader cochlear tuning at high levels (Glasberg and Moore 1990). However, because the cochlea applies little or no gain for 80-dB SPL tones, it seems doubtful that the large AM unmasking effect was due to the MOCR. A tonal precursor did not produce a significant change in AM detection, although there was a trend for AM unmasking for the highest carrier (and precursor) level.

Despite the fact that the pattern of the results from experiment 1 is somewhat consistent with known effects of efferent activation, SFOAE-based measurements from experiment 2 contradict the efferent-based explanation, as discussed below.

The MOCR and the Effect of the Noise Precursor on AM Detection in Noise

In a healthy cochlea, level-dependent cochlear gain results in compressive response growth with increasing stimulus level (e.g., Ruggero et al. 1997). For low-level stimuli, cochlear gain is typically assumed to be constant resulting in linear response growth over a limited range of levels (Plack et al. 2004; Almishaal et al. 2017). For these low levels, a decrease in cochlear gain could occur without a change in the amount of cochlear compression. While a reduced compression due to MOCR-induced gain reduction would likely enhance the neural representation of AM, a loss of compression is not necessary to produce the extended dynamic range of auditory-nerve fibers shown in Fig. 1. A reduction of cochlear gain due to MOCR activation combined with a decrease in auditory-nerve rate adaptation would be sufficient to improve coding of transient or brief fluctuating stimuli in noise by the auditory-nerve fibers. In experiment 2, the probe level was 40 dB SPL. Although for some listeners this level could fall into the linear range of cochlear responses, the efferent-related gain reduction would still be expected to improve the representation of AM for a tone in noise in the auditory nerve.

SFOAEs measured for the 40-dB SPL probe with and without a two-octave noise showed a change in SFOAE magnitude during the noise, but for noise levels ≤ 40 dB SPL, the effect disappeared rapidly after the offset of the noise. The effect during the noise elicitor always resulted in a decrease in ear-canal sound pressure. Although the noise components falling into the cochlear filter tuned to the probe frequency contributed to the overall excitation when both the probe and the noise were present, the net effect of the noise on the response to the probe was suppressive. As shown by the ΔSFOAEs, this effect did not change during the course of the noise elicitor for any noise level tested in experiment 2. The lack of changes in ΔSFOAE during the elicitor suggests that suppression by the ipsilateral noise dominated the reduction in cochlear gain and MOC efferent activation did not significantly add to it. Pure MOCR effects shown by post-elicitor changes in the SFOAE were observed only for noise levels of 50 and 60 dB SPL. Significant efferent effects were not observed for lower levels including those that resulted in significant and sizeable AM unmasking for the 40-dB SPL carriers in experiment 1. The lack of post-elicitor effects means that even for the largest amount of cochlear gain available (i.e., for the tone in quiet), no significant change to cochlear gain was observed for noise levels producing the AM unmasking. The measurement of noise effects on SFOAEs allowed us to monitor the cochlear channels tuned to the probe frequencies but not the adjacent channels. If the MOCR decreased cochlear responses to the noise in the adjacent channels without changing the cochlear gain in the channel with the target, such changes would not be reflected in the SFOAE measurements at the probe frequency. 
Such a reduction of cochlear responses to noise via MOCR activation could contribute to AM unmasking by setting the stage for an enhancement of the target via decreased lateral inhibition at higher stages of the auditory pathways. This interpretation has been used to explain a perceptual context effect known as auditory enhancement (e.g., Viemeister and Bacon 1982; Byrne et al. 2011; Beim et al. 2015), which is observed when a precursor is a copy of the simultaneous masker but has components removed from the spectral region around the target sound. However, this explanation of AM unmasking due to the MOCR is unlikely for at least two reasons. First, tuning of the MOCR effects has been shown to be very broad, spanning a frequency range greater than one octave (Lilaonitkul and Guinan 2012). Such broad tuning makes it unlikely that the MOCR would selectively affect only cochlear channels adjacent to the one with the target. Second, our SFOAE analysis of the waveforms during the elicitor shows that suppression on the basilar membrane swamps any additional changes that could be due to MOC efferent activation, even when the MOCR effects are observed after the offset of the noise elicitor. In short, changes in cochlear responses to noise due to MOCR activation are likely negligible compared with basilar-membrane suppression between components in the noise masker.

Other Mechanisms

Perceptual Grouping and Segregation

SFOAE-based measurements of efferent effects performed using stimuli that were used as precursors and probes in the psychophysical AM detection task suggest that MOCR effects did not contribute to the AM unmasking. The significant improvements in AM detection with a precursor appear to arise entirely from mechanisms central to the cochlea. One such mechanism could be related to grouping of the masker and the precursor based on their perceptual similarity (Bregman 1990). For all three precursor types, the tonal carrier was presented simultaneously with a two-octave noise masker. Because the noise precursor was perceptually similar to the masker in terms of timbre (the same spectral content), loudness (the same level), and the content of inherent envelope fluctuations, it could facilitate AM detection by helping listeners segregate the tonal carrier from the simultaneous noise masker (Carlyon 1989). However, the seven-tone complex was perceptually distinct from the noise masker, sounding more like a mistuned harmonic complex than a band of noise, and yet, the improvement in AM detection after this precursor for the highest level used was similar to that for the noise precursor. Because of a wide spacing of the components, the envelope of the complex tone at the output of cochlear filters tuned to each component was flat, but the envelope of the entire stimulus was fluctuating. Although there was some overlap between modulation rates in the envelopes of the complex tone and the noise masker, the level dependence of the AM unmasking by the complex-tone precursor makes grouping based on envelope similarities an unlikely explanation. A pure-tone precursor had the same frequency as the AM carrier and, thus, could be used to cue listeners to the pitch of the brief tone used in the AM detection task. The tone precursor could also serve as a more proximal unmodulated reference for the AM signal than the unmodulated tone in the other observation interval. Despite these potential benefits, the pure-tone precursor had no significant effect on AM detection. It is therefore unlikely that grouping based on perceptual similarity could account fully for the improvements in AM detection due to a precursor observed in experiment 1.
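The contrast drawn above between the flat envelope of a single resolved component and the fluctuating envelope of the whole complex tone can be illustrated with a minimal sketch. The component frequencies, sampling rate, and duration below are hypothetical values chosen only for illustration and are not the stimulus parameters of this study. The sketch also assumes that each cochlear filter passes exactly one component, so the filter output is idealized as an isolated sinusoid.

```python
import cmath
import math

# Hypothetical values for illustration only (not the study's stimulus parameters).
COMPONENT_FREQS_HZ = [1000.0, 1410.0, 1990.0, 2810.0, 3970.0, 5600.0, 7900.0]
SAMPLE_RATE_HZ = 48000
DURATION_S = 0.05


def envelope(freqs_hz, sample_rate_hz=SAMPLE_RATE_HZ, duration_s=DURATION_S):
    """Envelope of a sum of equal-amplitude cosines.

    The analytic signal of a sum of cosines is the sum of the matching
    complex exponentials, and its magnitude is the temporal envelope.
    """
    n_samples = int(sample_rate_hz * duration_s)
    env = []
    for n in range(n_samples):
        t = n / sample_rate_hz
        analytic = sum(cmath.exp(2j * math.pi * f * t) for f in freqs_hz)
        env.append(abs(analytic))
    return env


def peak_to_peak(values):
    """Difference between the largest and smallest envelope value."""
    return max(values) - min(values)


# Idealized output of a filter that resolves a single component.
single_env = envelope([COMPONENT_FREQS_HZ[0]])
# Envelope of the entire seven-component stimulus.
complex_env = envelope(COMPONENT_FREQS_HZ)

print("single component peak-to-peak:", peak_to_peak(single_env))
print("whole complex peak-to-peak:", peak_to_peak(complex_env))
```

Under these assumptions, the single-component envelope is constant to within numerical precision, whereas the envelope of the summed components varies substantially over time.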

Feedback-Based Neural Mechanisms

Neuronal feedback circuits are not limited to the auditory periphery, but their role remains largely unknown. One such circuit, observed in most mammals, involves collateral efferent projections from the MOC to stellate cells in the ventral cochlear nucleus (Brown et al. 1988). There are two types of stellate cells in the ventral cochlear nucleus, excitatory T stellate cells and inhibitory D stellate cells (Smith and Rhode 1989). Fujino and Oertel (2001) showed that in mice, narrowly tuned T stellate cells are part of a neuronal positive (excitatory) feedback loop through the ventral nucleus of the trapezoid body and the ventral cochlear nucleus. Broadly tuned D stellate cells are not affected by this feedback loop. This neural circuit may act to enhance encoding of spectral peaks for narrowband stimuli presented in noise, and it may do so after a sufficient time is allowed for slow cholinergic responses to build. Fujino and Oertel (2001) suggested that the enhancement of narrowly tuned responses in the ventral cochlear nucleus complements the MOCR-induced expansion of the dynamic range of auditory-nerve responses for tones in noise and is observed even when the MOCR feedback loop is disabled. This mechanism could produce AM unmasking effects observed in experiment 1 of this study in the absence of measurable changes to cochlear responses to the tonal probes. It is unclear at present how the enhancement of responses to tones in noise by this feedback loop depends on physical parameters of acoustic stimuli, such as level and frequency. However, the function ascribed to this mechanism by Fujino and Oertel (2001) could produce an improvement in AM detection for a tone presented in noise, even without significant changes to cochlear responses.

Neural Dynamic-Range Adaptation

Another possible explanation is in terms of neural adaptation to sound level statistics. This adaptation differs from classic auditory-nerve firing-rate adaptation that reduces the firing rate but does not result in an increased dynamic range for level coding (Smith and Zwislocki 1975; Smith 1977). Adaptation to level statistics has been demonstrated in a number of physiological studies at different levels of the auditory pathways (Dean et al. 2005; Dean et al. 2008; Watkins and Barbour 2008; Wen et al. 2009, 2012). Neural recordings show that rate-level functions shift depending on the most frequent level during prior stimulation in preparation for optimal coding of changes in the stimulus level. These shifts are a prevalent feature in auditory-nerve responses and are magnified in the responses of midbrain and cortical neurons. A precursor in our study could induce this type of adaptation. The noise precursor would have been most effective because its level within the auditory filter tuned to the carrier frequency was the same as the level of the simultaneous masker. Because all three precursors had the same overall level, the levels of the complex tone and the pure tone within the auditory filter tuned to the carrier frequency were higher than that for the noise masker. This mismatch could make the complex-tone and pure-tone precursors less effective at producing shifts in neural responses that were optimal for coding of subsequent level changes in an AM tone in noise. The neural adaptation to level statistics was implicated in a recent study of word recognition in noise by normal-hearing listeners with unprocessed and vocoded stimuli and by cochlear-implant users (Marrufo-Perez et al. 2018a). Cochlear-implant users rely on temporal envelope for speech perception, and because cochlear processing is bypassed in electric hearing, efferent effects on cochlear responses are absent in their auditory processing. Marrufo-Perez et al. (2018a) showed that improvement in word recognition due to a noise precursor was not significantly different in cochlear-implant users from that in normal-hearing individuals tested using vocoded stimuli. Their findings support the interpretation of AM unmasking arising entirely from post-cochlear sites in the auditory system.
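The level-matching argument above can be made concrete with a toy model. The sketch below assumes a sigmoidal rate-level function whose midpoint moves to the most frequent level of the preceding stimulation. The function shape, all parameter values, and the stimulus levels are hypothetical and chosen only for illustration; they are not fitted to any of the cited recordings.

```python
import math


def rate(level_db, midpoint_db, max_rate=200.0, slope_db=4.0):
    """Toy sigmoidal rate-level function in spikes/s (illustrative parameters)."""
    return max_rate / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))


def rate_change(base_db, delta_db, midpoint_db):
    """Firing-rate difference produced by a level swing of +/- delta_db around base_db."""
    return rate(base_db + delta_db, midpoint_db) - rate(base_db - delta_db, midpoint_db)


# Hypothetical within-filter level of the simultaneous masker (dB).
masker_level_db = 60.0
# Hypothetical level swing produced by the AM (dB).
am_swing_db = 2.0

# No precursor: midpoint left at a low level, so the neuron is near saturation.
unadapted = rate_change(masker_level_db, am_swing_db, midpoint_db=40.0)
# Precursor with the same within-filter level as the masker (noise precursor).
matched = rate_change(masker_level_db, am_swing_db, midpoint_db=60.0)
# Precursor with a higher within-filter level (complex-tone or pure-tone precursor).
mismatched = rate_change(masker_level_db, am_swing_db, midpoint_db=70.0)

print("unadapted:", round(unadapted, 2))
print("matched:", round(matched, 2))
print("mismatched:", round(mismatched, 2))
```

In this toy model, the same level swing produces the largest firing-rate change when the midpoint of the rate-level function coincides with the masker level, and a smaller change when the midpoint has shifted to a higher level. This is only a qualitative illustration of why a level mismatch could make a precursor less effective.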

Final Remarks

Noise stimuli that improved AM detection for a subsequent tone presented in noise did not elicit significant post-elicitor effects on SFOAEs expected from relatively slow decay of efferent effects. Changes in SFOAE magnitude due to basilar-membrane suppression by the ipsilateral noise remained constant throughout the noise duration, suggesting no additional MOCR-induced cochlear gain reduction. Due to possible confounding effects, such as those introduced by inadvertent MEMR activation, efferent effects on cochlear responses could be measured only for the lowest probe level. Although this is a limitation of the study, the lowest probe level should reveal efferent effects most easily since cochlear gain is near its maximum for low-level tones. The lack of significant MOCR effects for the noise and tonal carrier levels similar to those in the AM detection task suggests that cochlear processing does not contribute to the precursor effect on AM detection in noise but rather that the effect originates from post-cochlear sites. One possibility is that the SFOAE-based measurements are not sensitive enough to detect efferent effects that are relevant perceptually. Because suppression by an ipsilateral noise was consistently observed in these measurements, we find this possibility unlikely. Some likely post-cochlear mechanisms include a positive neuronal feedback loop that may lead to enhancement of responses to tones in noise in the ventral cochlear nucleus and neural dynamic-range adaptation at different levels of the auditory pathways. Because no support for a contribution of efferent effects to AM unmasking at the level of the cochlea was found in this study, it is unclear if hearing loss would reduce the benefits from prior stimulation exhibited here by listeners with normal hearing. Jennings et al. (2018) reported reduced improvements in AM detection after a notched-noise precursor in hearing-impaired listeners compared with normal-hearing controls, but the reduced effect might have been due to a better performance of the hearing-impaired listeners in the absence of a precursor. It is unclear whether individuals with cochlear hearing loss may exhibit differences in the operation of the post-cochlear neural processing. If present, such differences could contribute to the degraded processing of envelope fluctuations in noisy backgrounds and, thus, to the experienced difficulty in understanding speech in noise.