Introduction

The interaural time difference (ITD) is an important acoustical property that is used by humans and other animals to localize the sources of sound. This chapter studies the ability to discriminate between positive and negative ITDs for sine tones as a function of tone frequency. The chapter focuses on the functional dependence in high- and low-frequency limits with the goal of testing different models of binaural hearing.

High Frequencies

It is well known that human listeners are not able to detect interaural time differences (ITDs) in sine tones with frequencies greater than about 1,500 Hz (Zwislocki and Feldman 1956; Klumpp and Eady 1956). Those articles showed that although the smallest ITD thresholds occurred near 1,000 Hz, thresholds became unmeasurably high when the frequency increased above 1,300 Hz. The results point to a dramatic failure in ITD processing over a very small frequency range. Our high-frequency experiments explored this dramatic dependence in detail and determined the functional dependence of the high-frequency failure.

Methods

The listener heard two tones and was required to say whether the second tone appeared to be to the left or the right of the first. The tones were 460 ms in overall duration including a 140-ms rise duration and a 140-ms fall. The long rise time was intended to prevent the onset (identical for the two ears) from affecting ITD judgements. Tones were presented to the listener by headphones at a level of 60 dB SPL – the same in both ears. The level of the electrical signals was increased at lower frequencies to compensate for the headphone frequency response.

The application of ITDs was symmetrical about zero. For example, in a left-right trial with a nominal ΔITD of 20 μs, the tone led in the left ear by 10 μs during the first interval, and it led in the right ear by 10 μs during the second interval.

The experiment used a three-down, one-up adaptive staircase procedure with variable increments. The trials of an experimental run continued until the staircase had made 14 turnarounds, and the final 10 were averaged to obtain a ΔITD threshold for the run. Six runs were averaged to find the final threshold at any given frequency.
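The staircase logic described above can be sketched as follows. This is a minimal simulation rather than the software used in the experiment: the multiplicative step factor, the starting ΔITD, and the `respond` callback that stands in for the listener are assumptions made for illustration (the chapter states only that the increments were variable).

```python
import random

def run_staircase(respond, start_us=100.0, factor=1.5,
                  n_reversals=14, n_average=10, max_trials=10_000):
    """Three-down, one-up staircase on Delta-ITD (microseconds).

    respond(level) returns True for a correct left/right judgement.
    Returns the mean of the last n_average turnaround levels.
    The fixed step factor is an assumption, not the authors' rule.
    """
    level = start_us
    streak = 0          # consecutive correct responses at this level
    direction = 0       # -1 descending, +1 ascending, 0 before the first move
    reversals = []
    for _ in range(max_trials):
        if respond(level):
            streak += 1
            if streak < 3:
                continue            # need three in a row before stepping down
            streak, step = 0, -1
        else:
            streak, step = 0, +1    # a single error steps up
        if direction and step != direction:
            reversals.append(level)  # level at which the track turned around
            if len(reversals) == n_reversals:
                return sum(reversals[-n_average:]) / n_average
        direction = step
        level = level / factor if step < 0 else level * factor
    raise RuntimeError("staircase did not reach the requested turnarounds")

def simulated_listener(delta_itd_us, true_threshold_us=20.0):
    """Hypothetical psychometric function rising from chance (0.5) toward 1."""
    p_correct = 1.0 - 0.5 * 2.0 ** (-(delta_itd_us / true_threshold_us) ** 2)
    return random.random() < p_correct

random.seed(1)
# Six runs averaged, as in the text
run_thresholds = [run_staircase(simulated_listener) for _ in range(6)]
final_threshold = sum(run_thresholds) / len(run_thresholds)
```

A three-down, one-up rule converges near the 79% correct point of the psychometric function, so the simulated threshold need not equal the `true_threshold_us` parameter of the hypothetical listener.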

Results

There were four listeners in the experiment. Their thresholds appear in Fig. 27.1. The new information in Fig. 27.1 is that in the high-frequency limit, the threshold ΔITD grows faster than exponentially with increasing frequency.

Fig. 27.1

ΔITD thresholds for four listeners are shown by symbols. Data for L2, L3, and L4 are offset vertically by 20, 50, and 60 μs, respectively. Error bars are two standard deviations in overall length. The dotted line is the maximum-likelihood fit to a 1/f law. The dashed line is the maximum-likelihood fit to the form \( 1/(f_c - f)^{\eta} \)

It is possible to fit the dramatic lateralization failure at 1,450 Hz with a signal processing theory that extends the Jeffress (1948) model of the binaural system. The theory has two main parts. One part is an array of coincidence cells in the midbrain operating as cross-correlators, as observed physiologically in the medial superior olive (MSO) (e.g., Goldberg and Brown 1969; Yin and Chan 1990; Coffee et al. 2006). The second part is a hypothetical binaural display that is a nexus between the coincidence cells and a spatial representation that is adequate to determine laterality for a listener. The display is imagined to have a wide distribution of best delays with only a weak frequency dependence.

Centroid Theory

The centroid lateralization display was introduced by Stern and Colburn (1978) and applied to the lateralization of 500-Hz tones with interaural time and level differences. It was modified and extended to other frequencies by Stern and Shear (1996) to fit the lateralization data of Schiano et al. (1986). In this model display, a sine tone with angular frequency ω and an ITD of Δt excites midbrain cross-correlators represented by a cross-correlation function c(ωτ), where τ is the lag and values of τ have a density distribution p(τ|ω), centered on τ = 0.

The operative measure of laterality is the centroid of the density-weighted cross-correlation,

$$\overline{\tau}=\frac{\displaystyle\int d\tau\, p(\tau|\omega)\,\tau\, c(\omega\tau-\omega\Delta t)}{\displaystyle\int d\tau\, p(\tau|\omega)\, c(\omega\tau-\omega\Delta t)},$$
(27.1)

and the integrals are over the range of minus to plus infinity.

Values of \( \overline{\tau} \) can be computed given reasonable choices for c(ωτ) and p(τ|ω). A reasonable choice for c(ωτ) is a cosine function, at least for frequencies of 750 Hz and higher,

$$c(\omega\tau)=1+m\cos(\omega\tau).$$
(27.2)

Parameter m (m ≤ 1) is the “rate-ITD modulation.” The density of lag values can be modelled as a constant for very small τ followed by an exponentially decaying function of τ, independent of frequency as per Colburn (1977).

As pointed out by Stern and Shear, if the width of p(τ) is chosen correctly, then as the tone frequency increases, more and more cycles of c(ωτ − ωΔt) fit within the range of lags given by p(τ). This has the effect of preventing the centroid from increasing much as Δt increases because of partial cancellation of the positive lobes of c(ωτ − ωΔt) by the negative lobes. Because the centroid is the cue to laterality available to the listener, limiting the centroid in this way limits the perceived laterality. That limit could be a key to the failure to lateralize at 1,450 Hz and above.

Values of the centroid \( \overline{\tau} \) computed from the model are shown in Fig. 27.2 for m = 0.4 and for six different tone frequencies. Predictions for a threshold ΔITD can be made if it is assumed that there is a threshold value of the centroid, \( \tau_T \). For example, if it is assumed that the centroid threshold is \( \tau_T = 8\ \text{ms} \), as shown by the dashed horizontal line in Fig. 27.2, then the model predicts that the threshold disappears altogether as the frequency approaches 1,450 Hz. However, in order to model the faster than exponential increase in threshold, the rate modulation m must decrease rapidly with increasing frequency.
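The centroid of Eq. 27.1 can be evaluated numerically, as in the following sketch. It assumes the cosine cross-correlation of Eq. 27.2 with m = 0.4 and a lag density that is flat for small |τ| and decays exponentially beyond. The flat half-width (0.2 ms), the decay constant (0.6 ms), the integration grid, and the three frequencies are hypothetical values chosen only for illustration; they are not the parameters behind Fig. 27.2.

```python
import numpy as np

def lag_density(tau, flat_half_width=0.2e-3, decay=0.6e-3):
    """Illustrative p(tau): constant for small |tau|, exponential tails beyond.

    Both width parameters are assumptions, not values from the chapter.
    """
    excess = np.maximum(np.abs(tau) - flat_half_width, 0.0)
    return np.exp(-excess / decay)

def centroid(freq_hz, itd_s, m=0.4, tau_max=10e-3, n=20001):
    """Evaluate Eq. 27.1 with c = 1 + m*cos(omega*tau - omega*itd)."""
    omega = 2.0 * np.pi * freq_hz
    tau = np.linspace(-tau_max, tau_max, n)
    p = lag_density(tau)
    c = 1.0 + m * np.cos(omega * (tau - itd_s))
    # On a uniform grid the spacing cancels in the ratio, so sums suffice.
    return np.sum(p * tau * c) / np.sum(p * c)

def peak_centroid(freq_hz, n_itd=51):
    """Largest centroid reached as the ITD runs from 0 to half a period."""
    itds = np.linspace(0.0, 0.5 / freq_hz, n_itd)
    return max(centroid(freq_hz, itd) for itd in itds)

for f in (250.0, 750.0, 1450.0):   # illustrative frequencies
    print(f"{f:6.0f} Hz: peak centroid = {peak_centroid(f) * 1e6:.1f} microseconds")
```

With a frequency-independent density, the largest centroid that any ITD can produce shrinks as the frequency rises, which is the cancellation effect described by Stern and Shear. A fixed centroid threshold then becomes unreachable above some frequency.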

Fig. 27.2

Interaural delay centroid as computed in the centroid display model for six tone frequencies. An illustrative value of centroid threshold \( \tau_T \) is shown at 8 ms. It predicts, for example, that for a 250-Hz tone, the ΔITD threshold is 76 μs, and that for 1,450 Hz, there is no threshold at all

Low Frequencies

All models for the neurophysiological processing of ITD depend on the cross-correlation between the neural inputs from the left and right ears. The cross-correlation function provides a measure of the difference in the phase or timing of signals arriving at the two ears and thereby encodes the azimuth of a source. But although there is general agreement about the importance of cross-correlation, there are differences of opinion about how it is applied functionally. The 1948 Jeffress model imagines a doubly tuned array of cross-correlators, tuned in best interaural delay and tuned in frequency. The tuning in best delay is normally thought to be influenced by the largest possible delay in free field given the head size, but is otherwise rather broad, enabling a place model for localization. Tones with different ITDs cause different neurons in the central auditory system to light up.

An alternative model abandons the concept of a place process. Physiological studies of single units in the inferior colliculus of guinea pigs show a strong correlation between the best interaural delay Δt and the best frequency f. The relationship is such that the phase angle fΔt is in the neighborhood of an interaural phase of 45° (McAlpine et al. 2001). The two different models correspond to different mathematical forms for the density p(τ).

The goal of the low-frequency experiments reported here was to test models of ITD encoding against experimental measurements of just-detectable ITDs. The advantage of low-frequency experiments is that the synchrony of inputs to the cross-correlators becomes very high and stable (Joris et al. 1994) so that the remaining frequency dependence can be attributed to p(τ|ω) and to differences between binaural display models.

Low-Frequency Theory

Binaural theories require three elements: a model cross-correlator c, a distribution function for cross-correlation units p(τ|ω), and a model display. Function c is always a function of phase, which means that temporal parameters such as the ITD, Δt, always enter in the form ωΔt.

In the low-frequency limit, the functional dependence on frequency can be extracted from c by expanding it in a Taylor series about ωτ. Then details of this cross-correlation function become unimportant, and attention can be focused on the distribution p and the display model. For our thresholds, the expansion in IPD, ωΔt, is valid because the largest value obtained was always less than 0.07 cycles.

Centroid Model

The centroid model of Stern and Colburn (1978) and Stern and Shear (1996) is a model of the Jeffress type. A priori, it incorporates cells with a wide range of best delay at any frequency, though the effective extent of the array is limited by p(τ|ω).

If the best delays, τ, scale with frequency such that ωτ is constant, then density p can be written in terms of phase lag only, that is,

$$p(\tau|\omega)\,d\tau=p(\omega\tau)\,d(\omega\tau).$$
(27.3)

Then because c and its derivative c′ are also functions of phase, the low-frequency limit of ΔITD threshold is independent of the tone frequency.

A second simple case occurs when the distribution of best delays does not depend at all on frequency, as for the high-frequency calculations in the previous section. Then the predicted threshold becomes

$$\Delta t=\Delta\overline{\tau}_{T}/[2m\langle\tau^{2}\rangle\omega^{2}],$$
(27.4)

where \( \langle\tau^{2}\rangle \) is the second moment of p(τ|ω). Therefore, the low-frequency limit of the ΔITD threshold varies inversely as the square of the tone frequency. When p(τ|ω) includes parts that scale inversely with ω and parts that are independent of ω, as in the form investigated by Stern and Shear (1996), low-frequency dependences that are intermediate between flat and \( 1/\omega^{2} \) are possible.
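The two limiting cases of the centroid model can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the cosine cross-correlation with m = 0.4, replaces the lag density by a plain two-sided exponential (hypothetical widths of 0.6 ms and 1 radian), and takes the threshold ΔITD to be inversely proportional to the small-ITD gain of the centroid. The three test frequencies are chosen far below the experimental range so that the low-frequency limit applies.

```python
import numpy as np

M = 0.4  # rate-ITD modulation, the illustrative value quoted for Fig. 27.2

def centroid_gain(freq_hz, density, tau_max, n=40001, itd=1e-6):
    """Centroid divided by ITD for a very small ITD (Eq. 27.1)."""
    omega = 2.0 * np.pi * freq_hz
    tau = np.linspace(-tau_max, tau_max, n)
    p = density(tau, omega)
    c = 1.0 + M * np.cos(omega * (tau - itd))
    return (np.sum(p * tau * c) / np.sum(p * c)) / itd

def fixed_density(tau, omega, width=0.6e-3):
    """Density independent of frequency (assumed exponential form)."""
    return np.exp(-np.abs(tau) / width)

def scaled_density(tau, omega, phase_width=1.0):
    """Density that depends on the phase lag omega*tau only (assumed form)."""
    return np.exp(-np.abs(omega * tau) / phase_width)

def threshold_slope(density, tau_max_for):
    """Log-log slope of predicted threshold (proportional to 1/gain) vs frequency."""
    freqs = np.array([5.0, 10.0, 20.0])
    thresholds = np.array(
        [1.0 / centroid_gain(f, density, tau_max_for(f)) for f in freqs]
    )
    slope, _ = np.polyfit(np.log(freqs), np.log(thresholds), 1)
    return slope

fixed_slope = threshold_slope(fixed_density, lambda f: 12e-3)
scaled_slope = threshold_slope(scaled_density, lambda f: 20.0 / (2.0 * np.pi * f))
print(f"frequency-independent density: slope = {fixed_slope:.3f}")
print(f"phase-scaled density:          slope = {scaled_slope:.3f}")
```

The frequency-independent density gives a slope close to −2 and the phase-scaled density gives a slope close to 0, in line with the two cases described in the text.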

Rate-Code Models

If function p(τ|ω) is narrowly distributed on one side, [e.g., p(τ|ω) ≫ p(−τ|ω)], the only available encoding for ITD is the difference in firing rates from left and right midbrain centers, \( E_R - E_L \), presumably computed at a higher center.

A development parallel to the centroid model above, applied to the same symmetrical discrimination experiment, uses the change in \( E_R - E_L \) between the two intervals, here defined as \( \overline{\Delta} \), as the discrimination statistic. In the low-frequency limit,

$$\overline{\Delta}=4\omega\Delta t\int_{0}^{\infty}d\tau\,\left[p(\tau|\omega)-p(-\tau|\omega)\right]\,|c^{\prime}(\omega\tau)|.$$
(27.5)

If p(τ|ω) is a function of the product ωτ, as it appears to be in the guinea pig, the integral is independent of ω, and the frequency dependence of the ITD threshold is given by

$$\Delta t\propto\frac{\overline{\Delta}_{\text{T}}}{[4m\omega]},$$
(27.6)

where \( \overline{\Delta}_{\text{T}} \) is a constant, the threshold value of the discriminator. Therefore, the ΔITD is inversely proportional to the first power of the frequency.

If p(τ|ω) is independent of ω, then

$$\Delta t=\frac{\overline{\Delta}_{\text{T}}}{[4m\langle\tau\rangle\omega^{2}]},$$
(27.7)

where \( \langle\tau\rangle \) is the first moment of p(τ|ω) − p(−τ|ω). The conclusions of the calculations in the low-frequency limit are given in Table 27.1.
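The steps from Eq. 27.5 to Eqs. 27.6 and 27.7 can be written out as follows. The sketch assumes the cosine form of Eq. 27.2, so that \( |c^{\prime}(\omega\tau)| = m|\sin(\omega\tau)| \). The function q and the constant I are notation introduced here for illustration and do not appear in the original.

```latex
% Case A: the density depends on phase lag only,
% p(\tau|\omega)\,d\tau = q(\omega\tau)\,d(\omega\tau). Substituting x = \omega\tau in Eq. 27.5:
\overline{\Delta} = 4m\,\omega\,\Delta t \int_{0}^{\infty} dx\,\bigl[q(x)-q(-x)\bigr]\,|\sin x|
                  \equiv 4m\,\omega\,\Delta t\; I ,
\qquad I \text{ independent of } \omega .
% Setting \overline{\Delta} = \overline{\Delta}_{T} at threshold:
\Delta t = \frac{\overline{\Delta}_{T}}{4m\,\omega\, I} \;\propto\; \frac{\overline{\Delta}_{T}}{[4m\omega]} .

% Case B: the density is independent of \omega. For \omega\tau \ll 1 and \tau > 0,
% |\sin(\omega\tau)| \approx \omega\tau, so Eq. 27.5 becomes
\overline{\Delta} \approx 4m\,\omega^{2}\,\Delta t \int_{0}^{\infty} d\tau\,\bigl[p(\tau)-p(-\tau)\bigr]\,\tau
                  = 4m\,\langle\tau\rangle\,\omega^{2}\,\Delta t ,
% and at threshold
\Delta t = \frac{\overline{\Delta}_{T}}{[4m\langle\tau\rangle\omega^{2}]} .
```

In Case A the frequency enters only through the prefactor ω, which gives the 1/ω law. In Case B the small-angle expansion contributes a second factor of ω, which gives the 1/ω² law.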

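Table 27.1 Predicted low-frequency slopes of ΔITD as a function of frequency

Model        Density of best delays       Predicted slope
Centroid     p(τ|ω) = p(ωτ)               0
Centroid     p(τ|ω) independent of ω      −2
Rate code    p(τ|ω) = p(ωτ)               −1
Rate code    p(τ|ω) independent of ω      −2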

Low-Frequency Experiment

Experimental stimuli were sine tones with 11 different frequencies spaced by one-third octave from 1,000 Hz down to 99 Hz. Methods were the same as for the high-frequency experiment. There were seven listeners in the experiment, with thresholds shown in Fig. 27.3. Maximum-likelihood slopes for the seven listeners were as follows: −0.97, −1.15, −1.43, −1.43, −1.08, −1.25, and −0.83. The average is −1.16 and the standard deviation is 0.23. This result is most evidently consistent with the rate-code prediction with a distribution of the form p(τ|ω) = p(ωτ), namely, a slope of −1. However, a final conclusion awaits the results of further experiments at still lower frequencies where the slopes appear to become steeper.
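The arithmetic in this paragraph can be reproduced in a few lines. The sketch uses only the numbers quoted in the text. It assumes that the one-third-octave series starts exactly at 1,000 Hz and that the reported standard deviation is the sample (n − 1) form.

```python
import statistics

# Eleven tone frequencies, one-third octave apart, descending from 1,000 Hz
frequencies = [1000.0 * 2.0 ** (-k / 3.0) for k in range(11)]

# Individual maximum-likelihood slopes as quoted in the text
slopes = [-0.97, -1.15, -1.43, -1.43, -1.08, -1.25, -0.83]

mean_slope = statistics.mean(slopes)
sd_slope = statistics.stdev(slopes)   # sample standard deviation (n - 1)

print(f"lowest frequency: {frequencies[-1]:.1f} Hz")
print(f"mean slope: {mean_slope:.2f}, standard deviation: {sd_slope:.2f}")
```

The mean lies within one standard deviation of −1 and several standard deviations away from both 0 and −2, the other two slopes predicted by the models above.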

Fig. 27.3

Low-frequency thresholds for seven listeners. Three slopes are shown

Conclusion

Mathematical models based on cross-correlation can successfully fit features of the mid- and high-frequency dependence of ITD thresholds, including the divergence at 1,450 Hz. They also predict the low-frequency limit of thresholds. The models gain simplicity by postulating thresholds internal to the binaural system. Incorporating variance, as in signal detection theory, can be expected to make the predictions fuzzier and more complicated.