INTRODUCTION

Hearing in noise is a challenge for listeners both with and without hearing loss, yet the healthy auditory system accomplishes this task with remarkable ability. Previous studies using stimulus-based signal-processing strategies have identified envelope-based cues that play an important role in detection. The goal of this study was to determine whether nonlinear physiological models, which both distort and enhance envelope cues, can perform as well as the simple stimulus-based models in predicting listeners’ detailed performance.

Early studies of tone-in-noise detection (e.g., Blodgett et al. 1958, 1962; Dolan and Robinson 1967) used random noises for each trial to obtain listeners’ detection thresholds. Two listening conditions have been tested most extensively (Fig. 1): the diotic condition, in which identical noise-alone or tone-plus-noise stimuli are presented to the two ears, and the dichotic condition, in which out-of-phase tones and in-phase noises are presented to the two ears for the tone-plus-noise trials.

FIG. 1

A schematic diagram illustrates two basic listening conditions: A diotic condition, N0S0, and B dichotic condition, N0S\( \pi \). Note that the tone is inverted in the lower right panel (after Moore 2003).

Reproducible noises, a set of pre-generated random noises, have been used to explore listeners’ detailed detection performance across different noise waveforms (Pfafflin and Mathews 1966; Gilkey et al. 1985; Siegel and Colburn 1983; Isabelle and Colburn 1991; Evilsizer et al. 2002; Davidson et al. 2006; Mao et al. 2013). Listeners’ responses vary across repetitions of any given stimulus because of unknown factors, referred to as internal noise, that are assumed to be independent across trials. Each reproducible noise waveform is tested multiple times to yield average response patterns that reduce the effects of internal noise. Listeners’ performance can be described in terms of the set of hit (correct identification of “tone present”) and false alarm (FA) rates for each reproducible noise, referred to as a detection pattern (Davidson et al. 2006) (Fig. 2). Individuals’ detection patterns are highly consistent over time (Gilkey et al. 1985; Isabelle and Colburn 1991; Evilsizer et al. 2002; Davidson et al. 2006) and, in many conditions, across listeners, suggesting the use of similar strategies and cues. Several models have been proposed to predict listeners’ performance using specific cues. In each model, a decision variable (DV) is computed based on a feature of the stimulus, and DVs from the set of reproducible noises are compared to listeners’ detection patterns.

FIG. 2

Detection pattern (hit and FA rates) of the average listener for diotic narrowband waveforms. The horizontal axis shows noise index; the insets show examples of tone-plus-noise (top) and noise-alone (bottom) waveforms. Note that listeners’ responses vary across reproducible waveforms; responses were highly consistent within and across individual listeners for this stimulus condition.

For diotic detection, energy within a critical band (Fletcher 1940) predicts a significant amount of the variance in listeners’ performance across different maskers (Davidson et al. 2009; Mao et al. 2013). However, the energy cue cannot explain performance in roving-level conditions, in which the overall sound level varies in each trial (Kidd et al. 1989). An envelope slope (ES) cue (Richards 1992; Zhang 2004) is robust for the roving-level condition and predicts a significant amount of the variance in listeners’ detection patterns (Davidson et al. 2009; Mao et al. 2013). Models based on temporal fine structure such as the phase opponency model (Carney et al. 2002) also predict a smaller but significant portion of the variance in detection patterns. A recent study shows that predictions based on an optimal combination of energy and temporal cues approach the predictable variance in detection patterns (the common variance among different listeners’ performance) for the diotic condition (Mao et al. 2013).

For dichotic detection, interaural level difference (ILD) and/or interaural time difference (ITD) cues have been used to predict listeners’ performance (Davidson et al. 2009; Isabelle 1995); however, these predictions were substantially lower than the predictable variance. A binaural envelope slope cue, the slope of the interaural envelope difference (SIED), yields significantly better predictions than ILD and/or ITD cues (Mao and Carney 2014). Thus, among stimulus-based models, those using envelope cues are most successful in predicting listeners’ performance for both diotic and dichotic conditions.

Envelope cues are important for detection in noise, but these cues are affected by every stage of neural processing. Nonlinearities in the auditory periphery that affect envelopes, including compression, rate saturation, and rate and dynamic range adaptation, are included in recent computational models for the auditory nerve (AN) (Zilany and Bruce 2006, 2007; Zilany et al. 2009, 2014; Zilany and Carney 2010). Dau et al. (1996) and Breebaart et al. (2001) simulated auditory signal processing for masked detection using models that included linear band-pass filters, rectification, and adaptation. The current study combined recent nonlinear AN models with models for envelope processing in the cochlear nucleus and inferior colliculus (IC) (Nelson and Carney 2004), where envelope tuning is first observed along the ascending auditory pathway (reviewed in Joris et al. 2004). Using the detailed detection performance available from reproducible noise studies, it was possible to investigate the physiological processing of envelope cues for masked detection and to test the hypothesis that physiological models can perform as well as the simpler signal-processing-based models in predicting listeners’ performance. For physiological models, responses were assessed based on the average discharge rate and on temporal fluctuations in the responses of monaural and binaural model IC cells. Given the success of the stimulus-based ES and SIED models for predicting listeners’ performance, these models served as benchmarks for the physiological models to investigate the effectiveness of cues extracted using models with nonlinear neural mechanisms that influence envelope coding.

METHODS

Given that envelope cues for tone-in-noise detection are processed along the auditory pathway, it was hypothesized that cues extracted from the responses of model IC cells using basic neural mechanisms would predict listeners’ detection performance about as well as the stimulus-based envelope cues.

Datasets

Listeners’ detection performance for diotic and dichotic tone-in-noise detection was obtained from previous experiments (Davidson et al. 2006; Evilsizer et al. 2002; Mao and Carney 2014). Detection in the presence of reproducible noise maskers was tested for each listener in these studies. Each single-interval trial had a 50% chance of being either a noise-alone or tone-plus-noise trial, and the masker waveform was randomly chosen from the set of reproducible waveforms. Listeners responded “tone present” or “tone not present”, and their performance was described in terms of hit rate (proportion of “tone present” responses for tone-plus-noise waveforms) and FA rate (proportion of “tone present” responses for noise-alone waveforms) for each reproducible masker waveform. The set of hit and FA rates across the ensemble of maskers is referred to as a detection pattern (Davidson et al. 2006). Figure 2 shows the detection pattern of the average listener (i.e., averaged performance across individual listeners) for diotic narrowband waveforms.
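
The tallying of a detection pattern from single-interval trials can be sketched as follows. This is a minimal illustration, not the analysis code of the cited studies; the function name `detection_pattern`, the tuple layout of a trial, and the example responses are all hypothetical.

```python
from collections import defaultdict

def detection_pattern(trials):
    """Tally hit and false-alarm (FA) rates for each reproducible masker.

    trials: iterable of (masker_index, tone_present, said_tone_present).
    Returns {masker_index: {"hit": rate or None, "fa": rate or None}};
    a rate is None when that trial type was never presented.
    """
    counts = defaultdict(lambda: {"tone_n": 0, "tone_yes": 0,
                                  "noise_n": 0, "noise_yes": 0})
    for masker, tone_present, said_yes in trials:
        c = counts[masker]
        if tone_present:
            c["tone_n"] += 1
            c["tone_yes"] += int(said_yes)
        else:
            c["noise_n"] += 1
            c["noise_yes"] += int(said_yes)
    pattern = {}
    for masker, c in counts.items():
        hit = c["tone_yes"] / c["tone_n"] if c["tone_n"] else None
        fa = c["noise_yes"] / c["noise_n"] if c["noise_n"] else None
        pattern[masker] = {"hit": hit, "fa": fa}
    return pattern

# Made-up responses for two maskers (illustration only, not listener data).
example_trials = [
    (0, True, True), (0, True, True), (0, True, False), (0, True, True),
    (0, False, False), (0, False, True),
    (1, True, False), (1, True, True),
    (1, False, False), (1, False, False),
]
pattern = detection_pattern(example_trials)
print(pattern[0], pattern[1])
```

In the experiments each masker was presented many times, so the rates are far less coarse than in this toy example.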

Data from diotic and dichotic listening conditions were used in this study. Both narrowband (452–552 Hz) and wideband (100–3,000 Hz) waveforms were used for diotic and dichotic conditions. The narrowband maskers were created by extracting the 100-Hz band of the spectrum from 452 to 552 Hz from the wideband maskers. The spectrum level of the noise waveform was 40 dB sound pressure level (SPL) (overall noise level was 60 dB SPL for narrowband waveforms and approximately 75 dB SPL for wideband waveforms). For the predictions here, the 500-Hz tone level was set to each listener’s threshold. Listeners’ detection thresholds for the dichotic condition were approximately 10 dB lower than those in the diotic condition. This decrease in threshold, the well-known binaural masking level difference (Moore 2003), was due to the binaural differences introduced in the dichotic condition. In this study, data from a total of eight listeners for the diotic condition (S1–S4 from Evilsizer et al. 2002 and S5–S8 from Davidson et al. 2006) and six listeners for the dichotic condition (S1–S4 from Evilsizer et al. 2002 and S9 and S10 from Mao and Carney 2014) were analyzed.
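
The overall noise levels quoted above follow from the spectrum level and the bandwidth, assuming a flat noise spectrum within the band. A short check of that arithmetic (the helper name `overall_level_db` is hypothetical):

```python
import math

def overall_level_db(spectrum_level_db, bandwidth_hz):
    """Overall level of flat-spectrum noise: spectrum level + 10*log10(bandwidth)."""
    return spectrum_level_db + 10.0 * math.log10(bandwidth_hz)

narrowband_db = overall_level_db(40.0, 552.0 - 452.0)   # 100-Hz band
wideband_db = overall_level_db(40.0, 3000.0 - 100.0)    # 2900-Hz band
print(round(narrowband_db, 1), round(wideband_db, 1))
```

The narrowband result is 60 dB SPL and the wideband result is about 74.6 dB SPL, consistent with the stated values of 60 and approximately 75 dB SPL.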

Diotic Models for Tone-in-Noise Detection

Two types of envelope-based cues were used in this study: the modified stimulus-based ES cue and the physiologically based envelope cue from model IC responses.

Stimulus-Based Model

The original ES model (Richards 1992; Zhang 2004; Davidson et al. 2009) quantifies changes in envelope fluctuations. Because the addition of a tone to a narrowband noise waveform results in a decrease of the envelope fluctuations, a low value for the DV indicates that the testing waveform is more likely to be a tone-plus-noise stimulus. By inspecting the frequency components of envelopes from tone-plus-noise and noise-alone stimuli, it was determined that the largest differences in envelope energy were within 50–150 Hz (Mao et al. 2013). Thus, a sixth-order band-pass filter with a center frequency of 100 Hz (Q = 1) was used to extract the envelope frequency range of interest. Figure 3A shows the schematic diagram of the modified ES model. The Hilbert transform is used to compute the analytic signal from the output of a fourth-order gammatone filter (center frequency of 500 Hz). The envelope is obtained from the analytic signal, and the DV of the model was calculated as the integral of the half-wave rectified slope of the envelope. The difference between this modified model and the original ES model (Richards 1992; Zhang 2004; Davidson et al. 2009) is that a tenth-order low-pass filter (cutoff frequency at 250 Hz, aiming to exclude the high frequency fine-structure components) is replaced by the band-pass filter to extract envelope cues from the most informative frequency range. A previous study showed that predictions based on the modified ES model (Mao et al. 2013) were more consistent with listeners’ performance than those using the original ES model.
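
The final stage of the ES computation, the integral of the half-wave rectified envelope slope, can be sketched as follows. The sketch assumes that the envelope has already been extracted and band-pass filtered; the gammatone, Hilbert, and filter stages are omitted. The function name `envelope_slope_dv`, the finite-difference slope, and the choice to keep positive slopes are assumptions, since the text does not specify them.

```python
def envelope_slope_dv(envelope, fs):
    """Integral of the half-wave rectified slope of a sampled envelope.

    envelope: envelope samples (assumed already band-pass filtered).
    fs: sampling rate in Hz.
    Assumption: positive slopes are kept by the rectifier.
    """
    dt = 1.0 / fs
    total = 0.0
    for prev, curr in zip(envelope, envelope[1:]):
        slope = (curr - prev) / dt        # finite-difference slope
        total += max(slope, 0.0) * dt     # rectify, then integrate over time
    return total

# Toy envelopes (illustration only): a flat one and a fluctuating one.
flat_dv = envelope_slope_dv([1.0, 1.0, 1.0, 1.0, 1.0], fs=1000.0)
fluctuating_dv = envelope_slope_dv([1.0, 3.0, 2.0, 4.0, 1.0], fs=1000.0)
print(flat_dv, fluctuating_dv)
```

The flatter envelope yields the lower DV, which matches the reasoning that adding a tone to narrowband noise reduces envelope fluctuations and therefore lowers the DV.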

FIG. 3

Schematic diagrams of the monaural envelope models. A Stimulus-based modified ES model. The envelope was obtained from the analytic waveform computed from the Hilbert transform of a fourth-order gammatone-filtered waveform; a sixth-order band-pass filter with center frequency of 100 Hz was used to extract the envelope frequency range of interest. Cue values were computed as the integral of the half-wave rectified slope of the envelope fluctuations. B Physiological envelope model. The stimulus was passed through the AN and CN phenomenological models and the IC modulation filter to obtain the synapse output. The neural envelope cue was computed based on the rate and fluctuation of the model synapse output.

Physiological Model

In the physiological model, the stimulus was passed through a series of phenomenological models along the ascending auditory pathway (Fig. 3B). First, a human version of the AN model (Zilany et al. 2009, 2014; Ibrahim and Bruce 2010) was used to obtain the AN synapse output. The input to the AN model was first processed by a middle ear filter, followed by a set of band-pass filter paths that provided inputs to the inner hair cell (IHC). The IHC response provided the input to the synapse model, which provided the final model AN response. This AN model has been shown to simulate responses to a range of different stimuli accurately, including pure tones, forward-masked stimuli, and amplitude-modulated (AM) stimuli (Zilany et al. 2009). Next, model AN responses were used as inputs to a cochlear nucleus (CN) model (Nelson and Carney 2004). Inhibitory and excitatory AN responses tuned to the same frequency were processed through low-pass filters representing convolution with postsynaptic potential waveforms and were then weighted by excitatory and inhibitory strengths and combined to provide the response of the CN model. For the purposes of the current study, the function of the CN model was to provide modest temporal tuning (i.e., increased synchrony to the envelope) as observed for several cell types in the anteroventral cochlear nucleus (Frisina et al. 1990; Gai and Carney 2006, 2008); no attempt was made to match the model to a particular cell type or characteristic frequency (CF) (Gai and Carney 2006). Increased synchrony was accomplished using an inhibitory strength that was 0.6 times the excitatory strength. Higher values of the inhibitory input strength would yield stronger synchrony to the envelope (e.g., appropriate for onset cells) but little or no sustained response to noise stimuli, and a sustained response was needed to study responses to the reproducible maskers. Thus, an intermediate value of inhibitory strength that resulted in improved synchrony but still yielded sustained responses was used for all simulations here. This value was the same as that used for the CN stage of the same-frequency inhibitory-excitatory (SFIE) model in Nelson and Carney (2004).
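
The same-frequency inhibitory-excitatory combination described above can be sketched as follows. Only the 0.6 inhibitory-to-excitatory strength ratio is taken from the text. The alpha-function kernels, the time constants `tau_exc` and `tau_inh`, the inhibitory `delay`, and the output rectification are illustrative assumptions, and all function names are hypothetical.

```python
import math

def alpha_kernel(tau, fs, n_tau=10):
    """Alpha function t*exp(-t/tau), scaled to unit sum (unity gain at DC)."""
    n = int(n_tau * tau * fs)
    k = [(i / fs) * math.exp(-(i / fs) / tau) for i in range(n)]
    area = sum(k)
    return [v / area for v in k]

def causal_convolve(x, kernel):
    """Causal FIR filtering; the output has the same length as x."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for j in range(min(n + 1, len(kernel))):
            acc += kernel[j] * x[n - j]
        y.append(acc)
    return y

def sfie_stage(an_rate, fs, tau_exc=0.5e-3, tau_inh=2e-3, delay=1e-3,
               exc_strength=1.0, inh_ratio=0.6):
    """Excitation minus delayed, more slowly filtered inhibition.

    tau_exc, tau_inh, and delay are assumed values; inh_ratio = 0.6 is the
    inhibitory strength relative to the excitatory strength.
    """
    exc = causal_convolve(an_rate, alpha_kernel(tau_exc, fs))
    inh = causal_convolve(an_rate, alpha_kernel(tau_inh, fs))
    d = int(round(delay * fs))
    inh_delayed = ([0.0] * d + inh)[:len(inh)]
    # Assumption: the output is half-wave rectified so rates stay non-negative.
    return [max(exc_strength * e - inh_ratio * exc_strength * i, 0.0)
            for e, i in zip(exc, inh_delayed)]

# A constant 100 spikes/s input settles at 100 * (1 - 0.6) = 40 spikes/s.
steady = sfie_stage([100.0] * 500, fs=10000.0)
print(round(steady[-1], 3))
```

Because the inhibition is slower than the excitation, a steady input is attenuated while fast envelope fluctuations pass with less cancellation, which is the sense in which this stage sharpens synchrony to the envelope.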

Krishna and Semple (2000) and Nelson and Carney (2007) showed that approximately half of the cells in the IC have band-pass tuning to AM. The IC cell is simulated with a modulation filter to represent this tuning in the model. Specifically, the IC response is modeled by a band-pass modulation filter, centered at a peak or best modulation frequency (BMF), that receives its input from the synapse output of the CN model. The SFIE-type IC model used by Nelson and Carney (2004) acts as a modulation filter, and a number of center frequencies can be achieved by carefully choosing time constants for the excitatory and inhibitory inputs. The Q value of the SFIE-type model is approximately 1.5. Preliminary results showed that better predictions were obtained with a broader filter. In this study, a sixth-order band-pass filter was used that allowed more flexible adjustment of the center frequency, as compared to the original SFIE model, and a Q value of 1 was used (see Appendix). The order of the filter used here was determined by the phase range obtained from physiological recordings of IC cells in awake rabbits (unpublished observations). Two basic neural mechanisms were used to obtain envelope cues from the physiological model: average discharge rate, which was computed as the averaged synapse output, and a measure related to the fluctuations of the model’s peri-stimulus time histograms, which was obtained from the integral of the half-wave rectified derivative of the model response.

Dichotic Models for Tone-in-Noise Detection

Similar to the diotic condition, both stimulus-based and physiological envelope models were used in the dichotic condition. For noise-alone trials in the dichotic condition, listeners were presented with identical noise stimuli to the two ears; thus, there were no binaural stimulus differences in the noise-alone trials. Stimulus-based models that depend on binaural differences cannot make any predictions for dichotic noise-alone trials. As a result, only hit rates were predicted for the stimulus-based models in the dichotic condition. In contrast, for the physiological model, binaural differences were available in models that included cell inputs with mismatched center frequencies, and thus, FA rates could be predicted in that case.

Stimulus-Based Model

For the dichotic condition, binaural differences occur because of the addition of out-of-phase tones to in-phase noises at the two ears. The SIED model (Mao and Carney 2014) focuses on the binaural envelope difference cues. Figure 4A shows a schematic diagram of the SIED model, in which envelopes from the contralateral and ipsilateral sides are extracted from the analytic signal computed from a fourth-order gammatone-filtered stimulus. The binaural envelope difference was calculated based on the difference between the computed monaural envelopes. Specifically, the SIED cue value was calculated as the time integral of the half-wave rectified slope of the envelope differences. It has been shown that the SIED cue represents a nonlinear combination of interaural time and level differences (Mao and Carney 2014).
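
A minimal sketch of the SIED computation, assuming the two monaural envelopes are already available, is given below. The function name `sied`, the left-minus-right sign convention, and the choice to keep positive slopes are assumptions. The example also shows why the cue carries no information on noise-alone trials: identical envelopes at the two ears give a value of exactly zero.

```python
def sied(env_left, env_right, fs):
    """Time integral of the half-wave rectified slope of the interaural
    envelope difference.

    Assumptions: the difference is left minus right, and positive slopes
    are kept by the rectifier.
    """
    dt = 1.0 / fs
    diff = [l - r for l, r in zip(env_left, env_right)]
    total = 0.0
    for prev, curr in zip(diff, diff[1:]):
        total += max((curr - prev) / dt, 0.0) * dt
    return total

# Toy envelopes (illustration only).
same = [1.0, 1.5, 1.0, 1.5]
noise_alone = sied(same, same, fs=1000.0)                      # identical ears
tone_plus_noise = sied([1.0, 1.5, 1.0, 1.5],
                       [1.0, 1.0, 1.5, 1.0], fs=1000.0)        # differing ears
print(noise_alone, tone_plus_noise)
```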

FIG. 4

A schematic diagram of the dichotic envelope models. A Dichotic stimulus-based SIED cue. The envelope was extracted using the analytic signal computed from the Hilbert transform of the fourth-order gammatone-filtered waveforms. The SIED cue was computed as the time integral of the half-wave rectified slope of the envelope difference at the two ears. B Dichotic physiological envelope cue. Binaural stimuli were passed through the AN and CN models; excitatory contralateral and delayed inhibitory ipsilateral CN outputs were combined to compute the binaural envelope difference. Responses from the CN outputs were sent to the IC cell. The IC cell was simulated with a band-pass modulation filter, and the envelope cue from the IC cell was computed based on the average rate and fluctuation of the model’s response.

Physiological Model

Computation of the dichotic physiological envelope cue is shown in Figure 4B. Similar to the diotic physiological model (Fig. 3B), binaural stimuli are first passed through a series of phenomenological cell models along each monaural pathway. Model synapse outputs are obtained from AN and CN models for both contralateral and ipsilateral sides. The excitatory response from the contralateral CN model is combined with a delayed (2 ms) inhibitory response from the ipsilateral CN model via an inhibitory interneuron. The combination of the CN outputs represents the binaural difference, which includes both interaural time and level differences. The combined excitatory and inhibitory inputs are sent to the IC modulation filter. For the IC model, a band-pass modulation filter centered at 50 Hz was used; this modulation frequency contains the largest envelope differences related to tone presence in the dichotic condition (Mao and Carney 2014). The dichotic physiological model envelope cues based on either rate or response fluctuations were obtained from the model IC synapse output.
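
The binaural combination step can be sketched as a subtraction of the delayed ipsilateral CN output from the contralateral CN output. The 2-ms delay is taken from the text; the function name `binaural_difference`, the equal default weighting (`inh_weight`), and the zero padding during the delay are assumptions.

```python
def binaural_difference(contra, ipsi, fs, delay_s=2e-3, inh_weight=1.0):
    """Excitatory contralateral response minus the delayed inhibitory
    ipsilateral response.

    delay_s: inhibitory delay in seconds (2 ms in the text).
    inh_weight: relative inhibitory weight (assumed; not given in the text).
    """
    d = int(round(delay_s * fs))
    # Delay the ipsilateral response by d samples, padding with zeros.
    ipsi_delayed = ([0.0] * d + list(ipsi))[:len(ipsi)]
    return [c - inh_weight * i for c, i in zip(contra, ipsi_delayed)]

# Toy responses sampled at 1 kHz, so the 2-ms delay equals two samples.
combined = binaural_difference([1.0, 2.0, 3.0, 4.0, 5.0],
                               [1.0, 2.0, 3.0, 4.0, 5.0], fs=1000.0)
print(combined)
```

In the model, this combined signal would then be passed to the IC modulation filter before the rate or fluctuation cue is computed.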

Evaluation of Model Predictions

Model predictions based on envelope cues were evaluated by comparing them to listeners’ detection patterns. For each model, a DV was computed for each waveform. The proportion of the variance in the detection pattern explained by the model was computed as the squared Pearson product-moment correlation coefficient between the DVs and the z-scores of listeners’ detection patterns (Davidson et al. 2009; Mao et al. 2013). The variance explained by each model was compared with the level required for significance (p < 0.05). In addition, the variances explained by the stimulus-based and physiological models were compared to test the hypothesis that stimulus-based and physiological envelope cues could predict similar amounts of the variance in listeners’ responses.
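
The variance-explained metric can be sketched as follows. The z-score is taken here as the inverse standard normal of each hit or FA rate; clipping rates of exactly 0 or 1 is an assumption, as the text does not say how such values were handled. The function names and the example rates are hypothetical, and the significance test is not shown.

```python
import math
from statistics import NormalDist

def z_scores(rates, n_trials):
    """Convert hit or FA rates to z-scores via the inverse normal CDF.

    Assumption: rates are clipped to [0.5/n, 1 - 0.5/n] so that rates of
    0 or 1 map to finite z-scores.
    """
    lo, hi = 0.5 / n_trials, 1.0 - 0.5 / n_trials
    nd = NormalDist()
    return [nd.inv_cdf(min(max(r, lo), hi)) for r in rates]

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance_explained(dvs, rates, n_trials):
    """Squared correlation between model DVs and z-scored rates."""
    return pearson_r(dvs, z_scores(rates, n_trials)) ** 2

# Made-up hit rates and decision variables (illustration only).
hit_rates = [0.2, 0.5, 0.8, 0.9, 1.0]
dvs = [3.1, 2.4, 1.9, 1.2, 0.7]
r2 = variance_explained(dvs, hit_rates, n_trials=100)
print(round(r2, 3))
```

Because the correlation is squared, a DV that decreases with tone presence (as the ES cue does) and one that increases with it are scored in the same way.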

RESULTS

In this section, model predictions using stimulus-based and physiological envelope cues are shown. Model AN fibers with different center frequencies and IC band-pass modulation filters with different best modulation frequencies were used in the physiological models. Basic neural mechanisms were used to compute cues from the IC model responses: rate and synchrony from the synapse output and fluctuations of the model responses. Predictions computed using synchrony cues are not shown here because synchrony to the 500-Hz tone was not significantly correlated to listeners’ detection patterns.

Detection patterns were highly correlated across different pairs of listeners in the diotic narrowband and wideband and dichotic wideband conditions (Mao et al. 2013; Mao and Carney 2014), indicating that listeners used a similar strategy to detect tones in noise in each of these conditions. In this study, model predictions are only shown for the average listeners in these three conditions. For the dichotic narrowband condition, in which listeners’ patterns were not significantly correlated (Mao and Carney 2014), model predictions are shown for individual listeners. Model predictions using stimulus-based envelope cues were reported in previous studies (Mao et al. 2013; Mao and Carney 2014). Here, quantitative comparisons of stimulus-based and physiological envelope cues are shown in Tables 1 and 2.

TABLE 1 Stimulus-based and physiological model predictions of diotic and dichotic tone-in-noise detection patterns for the average listener
TABLE 2 Stimulus-based and physiological model predictions for individual listeners’ hit rates in the dichotic narrowband condition

Diotic Physiological Cues

Figure 5(A–C) shows model predictions of average listeners’ narrowband detection patterns using stimulus-based envelope cues, average rates, and fluctuations computed from the model IC cell responses. Predictions based on the same cues for the average listeners’ wideband detection patterns are shown in Figure 5(D–F). In each panel, the x-axis shows the model center frequencies and the y-axis shows the proportion of variance in the detection pattern that is explained by the model.

FIG. 5

Stimulus-based and physiological model predictions of the average listener’s hit (triangles) and FA (circles) rates based on stimulus-based envelope cues (A narrowband, D wideband), average rate (B narrowband, E wideband), and fluctuations (C narrowband, F wideband) computed from the model IC cell responses. The x-axis shows the center frequencies of the model cells, and the y-axis shows the proportion of variance explained by the model. The black dotted line indicates the level required for significant predictions (p < 0.05).

In both narrowband and wideband conditions, the trends of model predictions across different frequency channels were similar, with the highest correlation to listeners’ detection patterns obtained at or near the 500-Hz target tone frequency. In addition, maximal predictions from the stimulus-based envelope cue and the physiological rate and fluctuation cues were similar in these two conditions (Fig. 5(A–C) for narrowband; Fig. 5(D–F) for wideband). In the narrowband condition, the frequency channel that yielded the highest correlation to listeners’ hit rates was centered at approximately 530 Hz for both stimulus-based and physiological cues. The occurrence of the maximal correlation at the 530-Hz channel was likely due to the phase properties of the gammatone filter, as no significant difference in envelope energy was observed across these channels. In the wideband condition, maximal correlations to listeners’ detection patterns were obtained from model cells tuned near the frequency of the 500-Hz tone.

Dichotic Physiological Cues

For the stimulus-based SIED cue, FA rates cannot be predicted because there are no interaural differences in the identical noise-alone stimuli that were presented to the listener. In the physiological models, assuming that model IC cells receive inputs from AN synapse outputs with mismatched center frequencies (Joris et al. 2006), predictions of both hit and FA rates can be computed. In this section, each figure shows dichotic model predictions with different combinations of mismatched center frequencies. As described previously, the average listener was analyzed in the wideband condition because listeners’ detection patterns were highly correlated with each other, and individual listeners were analyzed for the narrowband condition because different listeners apparently use different strategies for detection of tones in narrowband stimuli. In particular, different binaural combinations of frequency channels provide better predictions for different individual listeners in the narrowband dichotic condition (Mao and Carney 2014).

In Figure 6, dichotic model predictions of hit and FA rates for the average listener in the wideband condition (A, B) and three individual listeners in the narrowband condition [S1 (C, D), S3 (E, F), and S4 (G, H)] are shown. These individual listeners were chosen because their results were representative of the three other listeners. In each panel, the axes show the center frequencies of the model cells that received stimuli presented to the left and right ears. Predictions for the binaural models that receive matched 500-Hz frequency channels from both ears are shown at the lower left of each panel; the other predictions are for binaural combinations of different (mismatched) frequency channels from the two ears.

FIG. 6

Physiological model predictions of the average listener’s wideband hit and FA rates (A, B) and individual listeners’ narrowband hit and FA rates (S1: C, D; S3: E, F; and S4: G, H) based on the average rate of the model IC synapse output. The x-axis shows the center frequencies of model cells receiving the stimuli presented to the left ear, and the y-axis shows the difference between the center frequencies of model cells for the right ear and those for the left ear. The grayscale value shows the proportion of variance explained by the model.

For the wideband condition, the trends of model predictions using mismatched AN inputs are similar: the highest correlation with hit rates was observed for model cells that received left and right inputs with AN tuning near the tone frequency of 500 Hz; for FA rates, predictions with ipsilateral inputs around 500 Hz were high (Fig. 6(A, B)). For the narrowband condition, the trends in the predictions varied across listeners: some were best predicted by frequency channels around 500 Hz, and others were best predicted by frequency channels away from the tone frequency for both ears (Fig. 6(C–G)). For instance, S3’s detection patterns were best predicted by rate information from frequency channels near 500 Hz, and S4’s hit rates were best predicted by frequency channels that were approximately one critical bandwidth apart. The diversity in these results implies that different listeners use different strategies for narrowband tone-in-noise detection, which also may explain the low correlations between listeners’ detection patterns for this condition. In both narrowband and wideband conditions, model predictions were not symmetric around the matched center frequencies of 500 Hz. This is partly because the physiological cues were obtained by combining positive contralateral and negative ipsilateral CN inputs; exchanging the contralateral and ipsilateral inputs yields slightly different results (i.e., they are not mirror symmetric) because of the differences across reproducible waveforms.

Predictions based on the model response fluctuations for the average listener in the wideband (Fig. 7(A, B)) and individual listeners in the narrowband condition [S1, Fig. 7(C, D); S3, (E, F); and S4, (G, H)] are shown in Figure 7. The overall trends in Figure 7 are similar to the results from Figure 6. Listeners’ detection patterns were best predicted by combining frequency channels both tuned near 500 Hz in the wideband condition, whereas different combinations of frequency channels yielded better predictions of listeners’ patterns in the narrowband condition. However, there are some detailed differences between the trends in Figures 6 and 7. In the wideband condition, for fluctuation cues, the best frequency channels are located closer to 500 Hz compared with rate cues, though predictions from 600-Hz contralateral and 400-Hz ipsilateral inputs also yielded good predictions. For the narrowband condition, model predictions for the dichotic conditions based on model response fluctuations yielded a substantially higher correlation for some listeners’ patterns than predictions using the rate cue.

FIG. 7

Physiological model predictions of the average listener’s wideband hit and FA rates (A, B) and individual listeners’ narrowband hit and FA rates (S1: C, D; S3: E, F; and S4: G, H) based on fluctuations of the model IC synapse output. The x-axis shows the center frequencies of model cells receiving the stimuli presented to the left ear, and the y-axis shows the difference between the center frequencies of model cells for the right ear and those for the left ear. The grayscale value shows the proportion of variance explained by the model.

Comparisons of model predictions for the average listeners using stimulus-based and physiological cues are shown in Table 1. Table 2 shows predictions for individuals in the dichotic narrowband condition, because the average listener was not used in this condition. Both hit and FA rates were predicted using the physiological envelope cues for both diotic and dichotic conditions. For the stimulus-based cues, hit and FA rates were predicted for the diotic condition, whereas only the hit rate could be predicted for the dichotic condition. As shown in Table 1, both types of physiological envelope models predicted similar amounts of variance of the average listeners’ detection patterns in the wideband condition, and these predictions were similar to those of the stimulus-based cues. For the narrowband dichotic condition, predictions from fluctuation cues were slightly, but not significantly, better than those from rate cues for some listeners, and predictions for S3 and S4 from these physiological cues were significantly improved in comparison to the stimulus-based SIED cue. Predictions were comparable (S6) or slightly higher (S1 and S5) for three other listeners and lower (S2) for only one listener. The improved predictions of the physiological model are likely due to the modulation gain that is present in this model, but not in the stimulus-based models. Modulation gain, in either the synchrony or rate responses, is present at every level of the physiological model.

DISCUSSION

Physiological and stimulus-based models that rely on envelope cues were tested with listeners’ detection patterns for diotic and dichotic reproducible noises. The physiological models included peripheral nonlinearities that distort envelope cues, such as cochlear compression and saturation in the responses of hair cells and auditory nerve synapses. These models also included mechanisms that distort, but enhance, the envelopes, including adaptation in the auditory nerve, dynamic range adaptation, and tuning for amplitude modulation rate at the level of the IC (Zilany et al. 2009; Zilany and Carney 2010; Nelson and Carney 2004). The physiological models explained comparable or greater amounts of variance in most listeners’ detection patterns as compared to the stimulus-based models, for both diotic and dichotic conditions.

In previous studies, model predictions for diotic and dichotic reproducible noises were computed from different types of cues. For the diotic condition, commonly used models are based on energy (e.g., Fletcher 1940; Gilkey and Robinson 1986), envelope (Richards 1992; Zhang 2004; Davidson et al. 2006), fine-structure (Carney et al. 2002), and template-matching-based temporal cues (Dau et al. 1996; Breebaart et al. 2001). For the dichotic condition, models are based on interaural cues, such as energy-based equalization and cancellation (Durlach 1963), the normalized cross-correlation (Isabelle 1995), interaural level and time difference cues (Isabelle 1995), and different types of linear combinations of interaural level and time differences (Isabelle and Colburn 1987, 2004; Goupell and Hartmann 2007; Davidson et al. 2009).

In the current study, physiological monaural and binaural envelope cues were analyzed, motivated by the robustness of stimulus-based envelope cues for predicting listeners’ performance in both diotic and dichotic conditions (Richards 1992; Davidson et al. 2006; Mao et al. 2013; Mao and Carney 2014). The physiological envelope cues studied here were based on the average rate and response fluctuations computed from a physiological model for the auditory pathway up to the level of the IC. The rate computed from the model IC responses can be interpreted as the response energy of an envelope-sensitive cell, and the fluctuations of the IC responses are similar to the envelope slope cues used in the stimulus-based models. In addition, the synchrony of IC responses to the 500-Hz tone and the synchronized rate (the product of synchrony and rate) were also computed, but the correlations between DVs based on these metrics and listeners’ detection performance were not significant.
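The synchrony and synchronized-rate metrics mentioned above can be sketched as follows. The text defines synchronized rate only as the product of synchrony and rate; the use of vector strength as the synchrony measure, the use of spike times as input (rather than a model rate function), and the function name `synchrony_and_sync_rate` are assumptions made for illustration.

```python
import numpy as np


def synchrony_and_sync_rate(spike_times_s, tone_hz, duration_s):
    """Return (synchrony, synchronized rate) for one response.

    Synchrony is computed as vector strength (an assumed metric): the
    length of the mean unit vector of spike phases relative to the tone.
    Synchronized rate is synchrony multiplied by average rate (spikes/s).
    """
    t = np.asarray(spike_times_s, dtype=float)
    if t.size == 0:
        return 0.0, 0.0
    phases = 2.0 * np.pi * tone_hz * t           # spike phase re: tone cycle
    vs = float(np.abs(np.mean(np.exp(1j * phases))))
    rate = t.size / duration_s                    # average rate in spikes/s
    return vs, vs * rate
```

For example, `synchrony_and_sync_rate(spikes, 500.0, 0.1)` would evaluate synchrony to a 500-Hz tone over a hypothetical 100-ms analysis window.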

For predictions in the diotic condition, both the average rate and the response fluctuation of the IC model yielded maximal correlations to listeners’ detection patterns that were similar to those for stimulus-based envelope cues. For the dichotic wideband condition, envelope slope-related cues estimated from response fluctuations resulted in slightly, but not significantly, better predictions than the energy of the envelope, estimated from model rates. In addition, the physiological fluctuation cues better predicted listeners’ detection patterns in the narrowband dichotic condition than did rate cues. These results, obtained with basic neural mechanisms involved in envelope processing, suggest that physiological envelope cues are as reliable as the stimulus-based envelope cues in predicting listeners’ tone-in-noise detection results.

The results in Figures 6 and 7 indicate that for the dichotic wideband condition, for which listeners’ patterns were highly correlated, similar frequency channels were used across listeners to detect tones in noise. In general, the frequency channels tuned near the target frequency of 500 Hz provided the best predictions for the wideband condition. In contrast, for the dichotic narrowband condition, different combinations of frequency channels yielded the highest correlations for different listeners’ patterns. Similar results were observed when predicting dichotic detection patterns using the stimulus-based SIED cue with gammatone filters that had mismatched center frequencies (Mao and Carney 2014). Both results suggest that different listeners use different combinations of frequency channels for the dichotic narrowband condition.

Although predictions based on envelope cues explained a significant amount of the variance in listeners’ detection patterns in the current study, these predictions were lower than the predictable variance for the diotic condition (Mao et al. 2013). The predictable variance describes the proportion of the variation in detection patterns that is common among all listeners and is used as a benchmark for model predictions. As shown in previous studies, predictions based on an optimal nonlinear combination of stimulus-based energy and temporal cues approach the predictable variance (Mao et al. 2013). In the current model, physiological envelope cues were analyzed from a single model IC neuron that received input from a single model AN channel. Assuming that physiological cues are Gaussian-distributed, it would be worth investigating model predictions based on an optimal combination of these cues across different frequency channels, using the likelihood ratio-based method as in Mao et al. (2013).
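As a rough illustration of the proposed extension, the following sketch computes a likelihood-ratio decision variable from a vector of cues (for example, one cue per frequency channel) under the Gaussian assumption stated above. It is not the specific procedure of Mao et al. (2013); the function name, the use of separate means and covariances for tone-plus-noise and noise-alone cues, and the assumption that these statistics are estimated elsewhere are all hypothetical.

```python
import numpy as np


def gaussian_log_likelihood_ratio(x, mean_tone, cov_tone, mean_noise, cov_noise):
    """Log-likelihood ratio of tone-plus-noise vs. noise-alone for cue vector x.

    Each class is modeled as a multivariate Gaussian (an assumption).
    Unequal covariances give a quadratic, i.e., nonlinear, combination
    of the cues; equal covariances reduce it to a linear combination.
    """
    x = np.asarray(x, dtype=float)

    def log_pdf(mean, cov):
        mean = np.asarray(mean, dtype=float)
        cov = np.asarray(cov, dtype=float)
        d = x - mean
        _, logdet = np.linalg.slogdet(cov)
        maha = d @ np.linalg.solve(cov, d)        # squared Mahalanobis distance
        return -0.5 * (maha + logdet + x.size * np.log(2.0 * np.pi))

    return float(log_pdf(mean_tone, cov_tone) - log_pdf(mean_noise, cov_noise))
```

The resulting value could serve as a combined DV for each reproducible noise and be compared with listeners’ detection patterns in the same way as the single-channel DVs.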

This study tested the hypothesis that envelope cues available in a physiological model are as reliable as stimulus-based envelope cues in predicting listeners’ tone-in-noise detection patterns. In conclusion, predictions from physiological cues were similar to those from stimulus-based cues in the diotic wideband and narrowband conditions and in the dichotic wideband condition (Table 1). For the dichotic narrowband condition, in which different listeners apparently use different strategies, predictions from physiological cues explained comparable amounts, and in some cases significantly more, of the variance of listeners’ detection patterns than the stimulus-based binaural envelope cue (Table 2). Based on these results, changes in physiological mechanisms that would affect the representation of envelopes, including sensorineural hearing loss and aging, would be expected to explain some of the deficits of these listeners in masked detection. It is interesting that these envelope-based cues may play the most important role for detection of low-frequency signals, such as the 500-Hz tone used here, given that fine structure is typically assumed to dominate coding of these signals.