1 Introduction

Perception in natural environments depends on the integration of multiple sensory cues within and across modalities. When sensory cues provide complementary information or are corrupted by independent noise, combining them can lead to improved performance (van Beers et al. 1999; Ernst and Banks 2002; Battaglia et al. 2003; Oruç et al. 2003; Alais and Burr 2004; Hillis et al. 2004; Knill and Pouget 2004; Gu et al. 2008; Landy et al. 2011; Ohshiro et al. 2011; Hollensteiner et al. 2015). This occurs, for example, when visual and auditory cues are combined to determine the position of an object (Battaglia et al. 2003; Alais and Burr 2004; Whitchurch and Takahashi 2006). Performance improvement through cue combination can also occur within a single modality. For example, multiple visual cues are combined for depth perception (Landy et al. 1995; Jacobs 1999) and multiple auditory cues are combined for sound localization (Moiseff 1989; Middlebrooks and Green 1991; Wightman and Kistler 1993) or auditory scene analysis (Bregman 1994; Roman et al. 2003). Although the performance benefit of cue combination is well understood, whether cue combination by neural populations approaches optimality is an open question.

Bayesian inference provides a framework for studying how multiple sensory cues and prior information about the environment can be integrated optimally to drive behavior (Knill and Pouget 2004; Angelaki et al. 2009). Bayesian inference relies on a posterior distribution, which describes what is known about the environment given the sensory input. In many studies of sensory-cue combination, cues are assumed to be conditionally independent (e.g., van Beers et al. 1999; Ernst and Banks 2002; Battaglia et al. 2003; Jacobs 1999; Alais and Burr 2004; Hillis et al. 2004). Conditional independence of sensory cues means that the probability of one cue does not depend on the value of the other, given the state of the environment. Bayes’ theorem specifies that the posterior distribution, given conditionally independent sensory inputs, is determined by a product of sensory-cue likelihoods and the prior over the environmental variables.

Previous studies have used linear and nonlinear approaches to model Bayes-optimal cue combination in neural circuits. Several proposals suggest that the neural basis of optimal cue combination for Bayesian inference is a linear combination of neural responses (Ma et al. 2006; Fetsch et al. 2011). Multiplying the likelihood functions in Bayes’ theorem can correspond to a linear combination of firing rates when the distributions of neural responses are Poisson-like (Ma et al. 2006; Beck et al. 2007). Alternatively, a linear combination of firing rates may also be optimal if firing rates match the log-likelihood function, where the product of the likelihoods is achieved through the sum of log-likelihoods. Modeling work has shown that a network may compute such an operation with a linear combination of firing rates (Jazayeri and Movshon 2006). Nonlinear Bayes-optimal cue combination models fall into two classes. The first class uses a hidden layer of neurons acting as basis functions for nonlinear computation (Eliasmith and Anderson 2004; Beck et al. 2011). These models rely on neurons with nonlinear tuning to the cues, which allow the network to perform accurate function approximation (Eliasmith and Anderson 2004) or information preservation (Beck et al. 2011), but the specific form of the nonlinearity remains undefined. The second class of nonlinear models for the neural implementation of Bayesian inference takes into account the ubiquitous non-uniformity in neural representations. In these models, referred to as non-uniform population codes, the statistical structure of the environment is encoded in the non-uniform distribution of preferred stimuli and tuning curve shapes across the population. Studies in birds and humans are consistent with this theory (Fischer and Peña 2011; Girshick et al. 2011; Cazettes et al. 2014). These studies have shown that the statistical relationship between environmental variables (such as image boundaries or the location of a sound source) and the sensory cues used to make inferences about these variables (edge orientation (Girshick et al. 2011) and interaural time difference (ITD) (Shi and Griffiths 2009; Fischer and Peña 2011), respectively) could be represented in the non-uniform tuning properties. In natural environments, the visual and auditory tasks of scene segmentation and sound localization may rely on the integration of multiple sensory cues (Moiseff 1989; Landy and Kojima 2001). However, the question of which operations neurons must perform for Bayes-optimal cue combination in the non-uniform population code model also remains open.

Here we investigate the neural basis of optimal cue combination in the owl’s sound localization system. To localize sounds, animals rely on the noisy and ambiguous sensory cues ITD and interaural level difference (ILD) (Knudsen and Konishi 1979; Moiseff 1989; Brainard et al. 1992). In the owl’s external nucleus of the inferior colliculus (ICx), and its direct projection site the optic tectum (OT), there is a map of auditory space (Knudsen and Konishi 1978; Knudsen 1982). It has been shown that the spatial selectivity of ICx and OT neurons depends on the tuning to ITD and ILD, where ITD is used for azimuth and ILD for elevation (Moiseff and Konishi 1983; Moiseff 1989; Brainard et al. 1992). The combination selectivity to ITD and ILD has been shown to emerge by an effective multiplication of inputs tuned to ITD and ILD (Peña and Konishi 2001; Fischer et al. 2007). Estimating sound direction from ITD and ILD involves several sources of uncertainty. First, the ITD and ILD computed in the brain are corrupted by noise from the environment (Spitzer et al. 2003; Cazettes et al. 2014) and neural computation (Christianson and Peña 2006; Pecka et al. 2010). Second, ITD and ILD are not uniquely related to a particular direction in space (Brainard et al. 1992; Fischer and Peña 2011). Additional ambiguity arises because the true ITD cannot be distinguished from other ITDs corresponding to equivalent interaural phase differences (IPD) in the owl’s early sound localization pathway, where ITD is computed in narrow frequency bands (Wagner et al. 1987; Brainard et al. 1992; Peña and Konishi 2000).

The owl’s reliance on sound localization for survival (Konishi 1993) and the observation that the owl’s localization behavior in the horizontal dimension is consistent with Bayesian inference (Fischer and Peña 2011) motivate the hypothesis that neurons in the owl’s localization pathway should combine the spatial cues IPD and ILD optimally. Furthermore, the multiplicative integration of ITD and ILD tunings underlying the spatial selectivity of the owl’s space-specific neurons (Peña and Konishi 2001) offers the opportunity to examine the role of nonlinear integration in optimal cue combination. Here we derive the optimal form of cue combination in a non-uniform population code that matches the statistics of the environment and is read out by a population vector (PV). We then show that IPD and ILD cues are approximately conditionally independent and that the owl’s localization behavior in the horizontal and vertical dimensions is described by an optimal combination of these cues (Fig. 1).

Fig. 1

Bayesian cue combination. a, b Components of the Bayesian model for two example stimulus directions a (0°,10°) and b (5°,80°). The diamond plots show the frontal hemisphere measured in double-polar coordinates corresponding to azimuth (horizontal) and elevation (vertical) directions. Azimuth and elevation each change in five-degree steps. The color scale ranges from zero (blue) to the maximum of the plotted function (red). ILD and IPD provide complementary information about sound location. Left, the IPD likelihood is primarily restricted in azimuth and the ILD likelihood is primarily restricted in elevation. The target direction is indicated by a white circle. Center top, the ILD-IPD likelihood is a product of each cue’s likelihood. It has a peak at the true source direction, but also has secondary peaks. Center bottom, the prior emphasizes directions at the center in azimuth and below the center in elevation. Right, the posterior is the product of the likelihood and prior and has a single dominant peak. For a source direction near the center (a), the posterior is more focused on the true source direction than is the likelihood. In contrast, for a source direction in the periphery (b), the posterior is biased away from the source direction toward the center of gaze

2 Materials and methods

2.1 IPD and ILD estimated from head-related transfer functions (HRTFs)

The HRTFs of ten barn owls were provided by Dr. Keller (Keller et al. 1998) from the University of Oregon. The ILD and IPD cues were computed from the HRTFs by first convolving a white noise stimulus (0.5–12 kHz) with the head-related impulse responses for the left and right ears at the target direction. The left and right ear inputs were then filtered with a gammatone filterbank having center frequencies covering 2–9 kHz, equal gains across frequency, and bandwidths estimated from barn owl auditory-nerve fiber responses (Köppl 1997b). IPD was calculated in each frequency channel from the delay at the peak of the cross-correlation of the left and right filterbank outputs. ILD was calculated in each frequency channel as the difference in the logarithms of the root-mean-square values of the filterbank outputs.
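
As an illustration of this per-channel cue extraction, the following is a minimal Python sketch; the simple gammatone impulse response, the conversion of the cross-correlation delay to a phase, the dB convention for ILD, and all parameter values are assumptions made here for illustration rather than the exact pipeline used in the study.

```python
import numpy as np
from scipy.signal import fftconvolve

def gammatone_ir(fc, bw, fs, dur=0.025, order=4):
    """Simple 4th-order gammatone impulse response with center frequency fc (Hz)
    and bandwidth parameter bw (Hz), sampled at fs (illustrative approximation)."""
    t = np.arange(int(dur * fs)) / fs
    g = t**(order - 1) * np.exp(-2 * np.pi * bw * t) * np.cos(2 * np.pi * fc * t)
    return g / np.sqrt(np.sum(g**2))                 # unit-energy normalization

def binaural_cues(left, right, fc, bw, fs):
    """IPD (rad) and ILD (dB) in one frequency channel: delay at the peak of the
    interaural cross-correlation converted to phase, and the difference of log
    RMS levels of the filtered left and right signals."""
    gl = fftconvolve(left,  gammatone_ir(fc, bw, fs), mode="same")
    gr = fftconvolve(right, gammatone_ir(fc, bw, fs), mode="same")
    xcorr = np.correlate(gl, gr, mode="full")
    lags = np.arange(-len(gr) + 1, len(gl))
    itd = lags[np.argmax(xcorr)] / fs                # delay at the correlation peak
    ipd = np.angle(np.exp(1j * 2 * np.pi * fc * itd))  # wrap to (-pi, pi]
    ild = 20 * np.log10(np.sqrt(np.mean(gl**2)) / np.sqrt(np.mean(gr**2)))
    return ipd, ild

# Example with a synthetic interaural delay and level difference
fs = 48000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs // 10)                # 100-ms broadband noise
delay = 40                                           # samples (~0.8 ms)
left = np.concatenate([noise, np.zeros(delay)])
right = 0.7 * np.concatenate([np.zeros(delay), noise])
print(binaural_cues(left, right, fc=4000, bw=400, fs=fs))
```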

2.2 Bayesian model of sound localization

To define the likelihood, we consider a model where the sensory observation made by the owl is given by the ILD and IPD spectra derived from barn owl head-related transfer functions (HRTFs) after the spectra are corrupted with additive noise. For a source azimuth θ and elevation ϕ, the observation vector s is expressed as

$$ \mathbf{s}=\left[\begin{array}{c}{\mathbf{s}}_{\mathrm{IPD}}\\ {\mathbf{s}}_{\mathrm{ILD}}\end{array}\right]=\left[\begin{array}{c}{\mathbf{IPD}}_{\theta, \phi}\\ {\mathbf{ILD}}_{\theta, \phi}\end{array}\right]+\left[\begin{array}{c}{\eta}_{\mathrm{IPD}}\\ {\eta}_{\mathrm{ILD}}\end{array}\right] $$
(1)

where the ILD spectrum \( {\mathbf{ILD}}_{\theta,\phi}=\left[{\mathrm{ILD}}_{\theta,\phi}(\omega_1), {\mathrm{ILD}}_{\theta,\phi}(\omega_2), \ldots, {\mathrm{ILD}}_{\theta,\phi}(\omega_K)\right] \) and the IPD spectrum \( {\mathbf{IPD}}_{\theta,\phi}=\left[{\mathrm{IPD}}_{\theta,\phi}(\omega_1), {\mathrm{IPD}}_{\theta,\phi}(\omega_2), \ldots, {\mathrm{IPD}}_{\theta,\phi}(\omega_K)\right] \) are specified at frequencies \( \omega_i \) between 3 and 9 kHz in steps of 0.6 kHz.

The noise corrupting the ILD spectrum, \( \eta_{\mathrm{ILD}} \), is modeled as a Gaussian random vector with independent components. The variance of each component is frequency- and direction-dependent. The IPD noise \( \eta_{\mathrm{IPD}} \) is assumed to have a circular Gaussian distribution with mean zero at each frequency. As is the case for ILD, the variance of each IPD noise component is frequency- and direction-dependent. We assume that the ILD and IPD noise terms \( \eta_{\mathrm{ILD}} \) and \( \eta_{\mathrm{IPD}} \) are mutually independent conditioned on the source direction.

We calculated the environmental variability of IPD and ILD over different directions of concurrent noise sources (Cazettes et al. 2014). For each target direction, a white noise stimulus (0.5–12 kHz) was convolved with the head-related impulse responses for the left and right ears at the target direction. The directions of the target sounds covered the frontal hemisphere using five-degree steps measured in double polar coordinates, leading to 685 target directions. Additionally, concurrent white noise sources were simulated as arising from directions surrounding the owl by convolving the noise with head-related impulse responses for the left and right ears at the direction of the second source. The directions of concurrent noise sources covered directions surrounding the owl using all possible elevations in the frontal hemisphere and elevations between −25 and 25° for directions in the rear hemisphere, leading to 952 total directions. The signals from the two sources were added together to form the input to the left and right ears. The left and right ear inputs were then filtered with a gammatone filterbank having center frequencies covering 2–9 kHz, equal gains across frequency, and bandwidths estimated from barn owl auditory-nerve fiber responses, as above (Köppl 1997b). We calculated the IPD variances \( \sigma^2_{\mathrm{IPD},\theta,\phi}(\omega_i) \) and ILD variances \( \sigma^2_{\mathrm{ILD},\theta,\phi}(\omega_i) \), as well as the correlation between IPD and ILD, over all directions of the second noise source. These variances are a function of the azimuth \( \theta \), elevation \( \phi \), and frequency \( \omega_i \) of the target sound.
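
A schematic of how these direction- and frequency-dependent statistics could be computed is sketched below; the helper `cue_fn`, the use of circular variance for IPD, and the synthetic demo data are illustrative assumptions, not the study’s actual code.

```python
import numpy as np

def cue_reliability(target_dir, masker_dirs, cue_fn):
    """Variability of IPD and ILD for one target over concurrent-source directions.
    `cue_fn(target_dir, masker_dir)` is a hypothetical helper returning the
    per-frequency (IPD, ILD) spectra of the two-source mixture."""
    ipds, ilds = [], []
    for masker in masker_dirs:
        ipd, ild = cue_fn(target_dir, masker)
        ipds.append(ipd)
        ilds.append(ild)
    ipds, ilds = np.asarray(ipds), np.asarray(ilds)              # (n_maskers, n_freqs)
    ipd_var = 1.0 - np.abs(np.mean(np.exp(1j * ipds), axis=0))   # circular variance
    ild_var = np.var(ilds, axis=0)
    # Per-frequency correlation between IPD and ILD across masker directions
    corr = np.array([np.corrcoef(ipds[:, k], ilds[:, k])[0, 1]
                     for k in range(ipds.shape[1])])
    return 1.0 / ipd_var, 1.0 / ild_var, corr                    # reliability = 1 / variance

# Tiny synthetic demo standing in for the HRTF-filtering pipeline
rng = np.random.default_rng(1)
fake_cues = lambda t, m: (rng.uniform(-np.pi, np.pi, 11), rng.normal(0.0, 3.0, 11))
ipd_rel, ild_rel, corr = cue_reliability((0, 0), range(952), fake_cues)
```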

The likelihood function of azimuth θ and elevation ϕ for observed cues s IPD and s ILD has the form

$$ {p}_{\mathbf{s}|\Theta, \Phi}\left(\mathbf{s}|\theta, \phi \right)={p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left({\mathbf{s}}_{\mathrm{IPD}}|\theta, \phi \right){p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left({\mathbf{s}}_{\mathrm{ILD}}|\theta, \phi \right) $$
(2)

where the ILD likelihood function is a Gaussian given by

$$ {p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left({\mathbf{s}}_{\mathrm{ILD}}|\theta, \phi \right)\propto \exp \left[-\frac{1}{2}{\sum}_{j=1}^{K}{\left(\frac{s_{\mathrm{ILD}}\left({\omega}_j\right)-{\mathrm{ILD}}_{\theta, \phi}\left({\omega}_j\right)}{\sigma_{\theta, \phi}\left({\omega}_j\right)}\right)}^2\right] $$
(3)

and the IPD likelihood function is a circular Gaussian given by

$$ {p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left({\mathbf{s}}_{\mathrm{IPD}}|\theta, \phi \right)\propto \exp \left[{\sum}_{j=1}^{K}{\kappa}_{\theta, \phi}\left({\omega}_j\right) \cos \left({s}_{\mathrm{IPD}}\left({\omega}_j\right)-{\mathrm{IPD}}_{\theta, \phi}\left({\omega}_j\right)\right)\right] $$
(4)

where \( \kappa_{\theta,\phi}(\omega_j) \) is a direction- and frequency-dependent parameter that determines the variance. We assume that the overall variance in IPD and ILD is a sum of the variance due to concurrent sources and a constant variance due to noise in neural computations: \( \sigma^2_{\theta,\phi}(\omega_j)=\sigma^2_{\mathrm{ILD},\theta,\phi}(\omega_j)+v_{\mathrm{ILD}} \) and \( \frac{1}{\kappa_{\theta,\phi}(\omega_j)}=\sigma^2_{\mathrm{IPD},\theta,\phi}(\omega_j)+v_{\mathrm{IPD}} \). The constant variances due to noise in neural computations, \( v_{\mathrm{ILD}} \) and \( v_{\mathrm{IPD}} \), do not depend on direction or frequency and are model parameters that were found by fitting the model to the owl’s localization behavior. Behavioral data were taken from published reports (Knudsen et al. 1979) of the absolute angular error for two owls performing head turns to sounds presented from speakers.
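
For concreteness, Eqs. (3) and (4) can be evaluated over a grid of candidate directions as in the following minimal sketch; array names and shapes are illustrative.

```python
import numpy as np

def log_likelihood(s_ipd, s_ild, ipd_tmpl, ild_tmpl, kappa, sigma2):
    """Log-likelihood of each candidate direction given observed cue spectra.

    s_ipd, s_ild       : observed IPD (rad) and ILD spectra, shape (n_freqs,)
    ipd_tmpl, ild_tmpl : HRTF-derived template spectra, shape (n_dirs, n_freqs)
    kappa              : circular concentration per direction and frequency (Eq. 4)
    sigma2             : total ILD variance per direction and frequency (Eq. 3)
    """
    ll_ipd = np.sum(kappa * np.cos(s_ipd - ipd_tmpl), axis=1)          # Eq. (4)
    ll_ild = -0.5 * np.sum((s_ild - ild_tmpl) ** 2 / sigma2, axis=1)   # Eq. (3)
    return ll_ipd + ll_ild    # conditionally independent cues: log-likelihoods add
```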

The prior density \( p_{\Theta,\Phi}(\theta,\phi) \) is proportional to the product of an elevation component \( p_{\Phi}(\phi) \) and an azimuth component \( p_{\Theta}(\theta) \): \( p_{\Theta,\Phi}(\theta,\phi)\propto p_{\Theta}(\theta)\,p_{\Phi}(\phi) \). The elevation component of the prior density is a combination of two Gaussian functions that may have different widths above and below the mode \( \mu_{\phi} \):

$$ {p}_{\Phi}\left(\phi \right)=\begin{cases}\exp \left(-\frac{1}{2{\sigma}_{\phi_1}^2}{\left(\phi -{\mu}_{\phi}\right)}^2\right) & \phi \le {\mu}_{\phi}\\ \exp \left(-\frac{1}{2{\sigma}_{\phi_2}^2}{\left(\phi -{\mu}_{\phi}\right)}^2\right) & \phi >{\mu}_{\phi}\end{cases} $$
(5)

The azimuth component of the prior density is a Laplace density

$$ {p}_{\Theta}\left(\theta \right)\propto \exp \left(-\frac{\left|\theta \right|}{\beta}\right), $$
(6)

with variance \( 2\beta^2 \). Parameters of the prior density were found by fitting the Bayesian model to the owl’s localization behavior.
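
A corresponding sketch of the unnormalized log prior of Eqs. (5) and (6) is given below; angles are in degrees and parameter names are illustrative.

```python
import numpy as np

def log_prior(theta, phi, beta, mu_phi, sigma_below, sigma_above):
    """Unnormalized log prior: Laplace in azimuth (Eq. 6) times a piecewise
    Gaussian in elevation with different widths below and above the mode (Eq. 5)."""
    lp_azimuth = -np.abs(theta) / beta
    sigma = np.where(phi <= mu_phi, sigma_below, sigma_above)
    lp_elevation = -0.5 * ((phi - mu_phi) / sigma) ** 2
    return lp_azimuth + lp_elevation
```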

The Bayesian estimate of stimulus direction in azimuth \( \theta \) and elevation \( \phi \) from the noisy IPD and ILD spectra is given by the mean direction under the posterior distribution \( p_{\Theta,\Phi|\mathbf{s}}(\theta,\phi|\mathbf{s}) \). The mean direction is found by first computing the vector \( BV \) that points in the mean direction as

$$ BV=\iint u\left(\theta, \phi \right){p}_{\Theta, \Phi |\mathbf{s}}\left(\theta, \phi |\mathbf{s}\right)\, d\theta\, d\phi \propto \iint u\left(\theta, \phi \right){p}_{\mathbf{s}|\Theta, \Phi}\left(\mathbf{s}|\theta, \phi \right){p}_{\Theta, \Phi}\left(\theta, \phi \right)\, d\theta\, d\phi $$
(7)

where u(θ, ϕ) is a unit vector pointing in direction (θ, ϕ) and the proportionality follows from Bayes’ rule. The direction estimate is the direction of the mean vector.
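
Combining the pieces, the posterior-mean direction of Eq. (7) can be computed on a discrete grid of directions as sketched below; treating the double-polar coordinates as ordinary spherical azimuth and elevation when forming unit vectors is a simplifying assumption made here for illustration.

```python
import numpy as np

def bayes_direction_estimate(log_like, log_prior_vals, theta_grid, phi_grid):
    """Direction of the mean vector BV (Eq. 7) over a flattened direction grid.
    log_like, log_prior_vals, theta_grid, phi_grid all have shape (n_dirs,)."""
    log_post = log_like + log_prior_vals
    w = np.exp(log_post - log_post.max())
    w /= w.sum()                                    # posterior weights on the grid
    az, el = np.deg2rad(theta_grid), np.deg2rad(phi_grid)
    u = np.stack([np.cos(el) * np.cos(az),          # unit vectors u(theta, phi)
                  np.cos(el) * np.sin(az),
                  np.sin(el)], axis=-1)
    bv = np.sum(w[:, None] * u, axis=0)             # mean-direction vector
    theta_hat = np.degrees(np.arctan2(bv[1], bv[0]))
    phi_hat = np.degrees(np.arcsin(bv[2] / np.linalg.norm(bv)))
    return theta_hat, phi_hat
```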

2.3 Conditional independence of IPD and ILD

We analyzed the conditional independence of IPD and ILD by comparing kernel density estimates of the full joint distribution and the joint distribution assuming conditional independence. The kernel density estimate of the full joint distribution at one frequency from observations \( {\left\{{\mathrm{IPD}}_n\left({\omega}_j\right),{\mathrm{ILD}}_n\left({\omega}_j\right)\right\}}_{n=1}^{N} \) was given by

$$ k\left(\mathrm{I}\mathrm{P}\mathrm{D},\mathrm{I}\mathrm{L}\mathrm{D}|\theta, \phi \right)\propto {\sum}_{n=1}^N \exp \left[\upalpha \cos \left(\mathrm{I}\mathrm{P}\mathrm{D}-{\mathrm{IPD}}_n\left({\omega}_j\right)\right)\right] \exp \left[-\frac{1}{2{\beta}^2}{\left(\mathrm{I}\mathrm{L}\mathrm{D}-{\mathrm{ILD}}_n\left({\omega}_j\right)\right)}^2\right] $$
(8)

and the kernel density estimate assuming conditional independence was given by

$$ {k}_{ind}\left(\mathrm{I}\mathrm{P}\mathrm{D},\mathrm{I}\mathrm{L}\mathrm{D}|\theta, \phi \right)\propto {\sum}_{n=1}^N \exp \left[\upalpha \cos \left(\mathrm{I}\mathrm{P}\mathrm{D}-{\mathrm{IPD}}_n\left({\omega}_j\right)\right)\right]{\sum}_{n=1}^N \exp \left[-\frac{1}{2{\beta}^2}{\left(\mathrm{I}\mathrm{L}\mathrm{D}-{\mathrm{ILD}}_n\left({\omega}_j\right)\right)}^2\right]. $$
(9)

The parameters α = 11 and β = 1 were selected so that the IPD and ILD components had widths that were the same percentage of the range of IPD and ILD, respectively. The similarity of the kernel density estimates was assessed using the Kullback–Leibler divergence (Kullback and Leibler 1951) and the fractional energy in the first singular value of the singular value decomposition of the joint density.
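
The comparison of Eqs. (8) and (9) can be carried out numerically as in the following sketch; the grid resolution, the toy demo data, and the definition of fractional energy as the squared first singular value over the sum of squared singular values are assumptions made here for illustration.

```python
import numpy as np

def independence_check(ipd_obs, ild_obs, alpha=11.0, beta=1.0, ngrid=60):
    """Compare the joint kernel density estimate (Eq. 8) with the estimate that
    assumes conditional independence (Eq. 9)."""
    ipd_grid = np.linspace(-np.pi, np.pi, ngrid)
    ild_grid = np.linspace(ild_obs.min() - 3.0, ild_obs.max() + 3.0, ngrid)
    kern_ipd = np.exp(alpha * np.cos(ipd_grid[:, None] - ipd_obs[None, :]))
    kern_ild = np.exp(-0.5 * ((ild_grid[:, None] - ild_obs[None, :]) / beta) ** 2)
    joint = kern_ipd @ kern_ild.T                          # Eq. (8): sum of kernel products
    indep = kern_ipd.sum(axis=1)[:, None] * kern_ild.sum(axis=1)[None, :]   # Eq. (9)
    joint /= joint.sum()
    indep /= indep.sum()
    kl_div = np.sum(joint * np.log(joint / indep))         # Kullback-Leibler divergence
    s = np.linalg.svd(joint, compute_uv=False)
    frac_energy = s[0] ** 2 / np.sum(s ** 2)               # first-singular-value energy
    return kl_div, frac_energy

# Toy demo with (nearly) conditionally independent samples
rng = np.random.default_rng(0)
ipd_s, ild_s = rng.vonmises(0.5, 4.0, 500), rng.normal(5.0, 2.0, 500)
print(independence_check(ipd_s, ild_s))
```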

2.4 Neural model

The neural model consisted of a population of 1000 direction-selective neurons that model the OT. The model population size is small relative to the number of neurons in OT (Knudsen 1983). The preferred directions were drawn independently from the prior distribution over direction p Θ,Φ(θ, ϕ).

The neural tuning curves are proportional to the likelihood function and are given by

$$ {f}_n\left({\mathbf{s}}_{\mathrm{IPD}},{\mathbf{s}}_{\mathrm{ILD}}\right)={f}_{\max }{p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left({\mathbf{s}}_{\mathrm{IPD}}|{\theta}_n,{\phi}_n\right){p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left({\mathbf{s}}_{\mathrm{ILD}}|{\theta}_n,{\phi}_n\right). $$
(10)

The maximum firing rate was set to 10 spikes/stimulus (Saberi et al. 1998). During simulations, the neurons have independent Poisson distributed firing rates \( r_n\left({\mathbf{s}}_{\mathrm{IPD}},{\mathbf{s}}_{\mathrm{ILD}}\right) \) with mean values given by the neural tuning curves \( f_n\left({\mathbf{s}}_{\mathrm{IPD}},{\mathbf{s}}_{\mathrm{ILD}}\right) \).

2.5 Population vector

The population vector is computed as a linear combination of the preferred direction vectors of the neurons, weighted by the firing rates

$$ PV\left({\mathbf{s}}_{\mathrm{IPD}},{\mathbf{s}}_{\mathrm{ILD}}\right)=\frac{1}{N}{\sum}_{n=1}^N\ u\left({\theta}_n,{\phi}_n\right){r}_n\left({\mathbf{s}}_{\mathrm{IPD}},{\mathbf{s}}_{\mathrm{ILD}}\right) $$
(11)

where \( u\left({\theta}_n,{\phi}_n\right) \) is a unit vector pointing in the \( n \)th neuron’s preferred direction. The direction estimate is the direction of the population vector (PV).
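
The neural model and PV readout (Eqs. 10–11) can be summarized in a short simulation sketch; `sample_prior` and `log_likelihood_fn` are hypothetical helpers (for example, the likelihood and prior sketches above), and normalizing the likelihood so that the best-matched neuron fires at the maximum rate is an illustrative choice.

```python
import numpy as np

def simulate_pv(n_neurons, sample_prior, log_likelihood_fn, s_ipd, s_ild,
                f_max=10.0, rng=np.random.default_rng(0)):
    """Population-vector estimate (Eq. 11) from a population whose preferred
    directions are drawn from the prior and whose mean rates follow Eq. (10)."""
    pref = sample_prior(n_neurons)                   # (n_neurons, 2): (theta, phi) in deg
    ll = np.array([log_likelihood_fn(s_ipd, s_ild, th, ph) for th, ph in pref])
    rates = f_max * np.exp(ll - ll.max())            # tuning proportional to the likelihood
    spikes = rng.poisson(rates)                      # independent Poisson responses
    az, el = np.deg2rad(pref[:, 0]), np.deg2rad(pref[:, 1])
    u = np.stack([np.cos(el) * np.cos(az),           # preferred-direction unit vectors
                  np.cos(el) * np.sin(az),
                  np.sin(el)], axis=-1)
    pv = np.mean(spikes[:, None] * u, axis=0)        # Eq. (11)
    theta_hat = np.degrees(np.arctan2(pv[1], pv[0]))
    phi_hat = np.degrees(np.arcsin(pv[2] / np.linalg.norm(pv)))
    return theta_hat, phi_hat
```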

2.6 Density of preferred directions in the midbrain

To determine the density of preferred directions in OT, we used measurements of the shape of the OT auditory space map (Knudsen 1982), as described previously for azimuth (Fischer and Peña 2011). Briefly, assuming that cell density is homogeneous in OT, the physical distance between points corresponding to different preferred directions in the auditory space map will be proportional to the number of cells that lie between those directions. The relationship between preferred elevation and position in the auditory-space map may be described by a curve that is proportional to the cumulative distribution function of the density of preferred directions. To estimate the density of preferred elevations, we fit the relationship between preferred elevation and position in the OT space map with a curve that is proportional to the cumulative distribution function of a piecewise Gaussian density as used in the Bayesian model (Eq. 5). We fit the relationship between preferred azimuth and position in the OT space map with a curve that is proportional to a cumulative Laplace distribution function. The overall density of preferred directions in azimuth is a mixture of the fitted Laplace density and its mirror image for the other hemisphere (Fischer and Peña 2011).
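
The fitting procedure can be illustrated for azimuth as follows; the map-position data points in this sketch are hypothetical, invented only to make the example runnable, and the offset/gain parameterization of the cumulative-distribution fit is an assumption.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import laplace

# Hypothetical example data: position along the OT map (mm) vs preferred azimuth (deg)
pref_az = np.array([0.0, 5.0, 12.0, 20.0, 32.0, 48.0, 70.0])
pos_mm = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])

def map_position(az, scale, gain, offset):
    # Map position modeled as proportional to the cumulative Laplace distribution
    # of preferred azimuth (homogeneous cell density along the map)
    return offset + gain * laplace.cdf(az, loc=0.0, scale=scale)

params, _ = curve_fit(map_position, pref_az, pos_mm, p0=[30.0, 6.0, -3.0])
print(f"fitted Laplace scale: {params[0]:.1f} deg")
```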

2.7 Extracellular recording

Methods for surgery, stimulus delivery, and data collection have been described previously (Fischer et al. 2007). Briefly, four barn owls (Tyto alba) were anesthetized with intramuscular injections of ketamine (20 mg/kg; Ketaject; Phoenix Pharmaceuticals, St. Joseph, MO) and xylazine (2 mg/kg; Xyla-Ject; Phoenix Pharmaceuticals). Extracellular recordings of single ICcl neurons (n = 77) were made with tungsten electrodes (1 MΩ, 0.005-in.; A-M Systems, Carlsborg, WA). All recordings took place in a double-walled sound-attenuating chamber (Industrial Acoustics, Bronx, NY). Acoustic stimuli were delivered by a stereo analog interface [DD1; Tucker Davis Technologies (TDT), Gainesville, FL] through a calibrated earphone assembly. Stimuli for both intracellular and extracellular recordings consisted of broadband noise (0.5–12 kHz) 100 ms in duration with 5-ms linear rise and fall ramps. Stimulus ILD was varied in steps of 3–5 dB.

2.8 Intracellular recording

Methods for in vivo intracellular recordings of ICx neurons (n = 12) were described in detail and published previously (Peña and Konishi 2001, 2002, 2004). Briefly, barn owls were anesthetized by intramuscular injection of ketamine hydrochloride (25 mg/kg; Ketaset; Phoenix Pharmaceuticals, Mountain View, CA) and diazepam (1.3 mg/kg; Steris Laboratories, Phoenix, AZ). ICx was approached through a hole made on the exoccipital bone, which provided easier access to the optic lobe. All experiments were performed in a double-walled sound-attenuating chamber.

Sharp borosilicate glass electrodes filled with 2 M potassium acetate and 4 % neurobiotin were used. Analog signals were amplified (Axoclamp 2A) and stored in the computer. The tracer neurobiotin was injected by iontophoresis at the end of the recording (3 nA positive 300 ms current steps, 3 per second for 5 to 30 min). After the experiment owls were overdosed with Nembutal and perfused with 2 % paraformaldehyde. Brain tissue was cut in 60 μm thick sections and processed according to standard protocols (Kita and Armstrong, 1991).

We computed the median membrane potential during the first 50 ms of the response to sound and averaged it over three to five stimulus presentations. Mean resting potentials are the means of median membrane potentials averaged over all trials within a period of 100 ms before each stimulus onset. ITD and intensity response curves of median membrane potential responses were made by custom software written in Matlab.

2.9 Sound stimulation

Acoustic stimuli were digitally synthesized by a computer and delivered to both ears through calibrated earphones. Auditory stimuli consisted of broadband noise bursts (0.5–12.0 kHz; 50–100 ms duration; 5 ms rise and decay times); sound level was 40–50 dB sound pressure level. The computer synthesized three random signals to obtain different values of binaural correlation. One noise signal was delivered to both ears, forming the correlated component of the sound. The other two signals were used as the uncorrelated component of the stimulus by adding them to the correlated sound in varying amounts, while keeping the sound level constant. Binaural correlation varies with the relative amplitude of the uncorrelated and correlated noises as \( 1/\left(1+{k}^2\right) \), where \( k \) is the ratio between the root-mean-square amplitudes of the uncorrelated and correlated noises.
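
This stimulus construction can be sketched as follows; mixing the common and independent noises with a \( 1/\sqrt{1+k^2} \) normalization to hold the level constant is an assumption about the exact implementation.

```python
import numpy as np

def binaural_correlation_stimulus(n_samples, bc, rng):
    """Left/right noise with interaural correlation `bc`, built from one common
    and two independent noises so that BC = 1 / (1 + k^2)."""
    common = rng.standard_normal(n_samples)
    left_ind = rng.standard_normal(n_samples)
    right_ind = rng.standard_normal(n_samples)
    if bc <= 0.0:                                   # fully uncorrelated stimulus
        return left_ind, right_ind
    k = np.sqrt(1.0 / bc - 1.0)                     # rms ratio, uncorrelated / correlated
    norm = np.sqrt(1.0 + k ** 2)                    # keeps the overall level constant
    return (common + k * left_ind) / norm, (common + k * right_ind) / norm

rng = np.random.default_rng(0)
left, right = binaural_correlation_stimulus(48000, bc=0.5, rng=rng)
print(round(np.corrcoef(left, right)[0, 1], 2))     # approximately 0.5
```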

3 Results

Below, we first use a theoretical approach to demonstrate that nonlinear combination of sensory cues allows for optimal Bayesian estimation provided the cues are conditionally independent. Next, we confirm that neural responses perform nonlinear cue combination and that IPD and ILD are approximately conditionally independent. Finally, we show that the owl’s localization behavior in azimuth and elevation is described by the Bayesian model.

3.1 Nonlinear integration supports optimal cue integration

We first examine whether, in this particular Bayesian framework, optimal cue integration within (unisensory) or across (multisensory) sensory modalities is nonlinear. Consider the problem of inferring the value of an environmental stimulus X from two sets of cues C1 and C2. The cues are ultimately encoded by neurons that have preferred stimuli \( X_n \) and mean responses to the cues given by \( f_n\left({\mathrm{C}}_1,{\mathrm{C}}_2\right) \). An example multisensory problem of this type is to infer the position of an object from auditory and visual cues. An example unisensory problem is to infer the position of a sound source from IPD and ILD. The Bayesian solution to this inference problem is to estimate the value of X from the posterior probability \( {p}_{\mathrm{X}|{\mathrm{C}}_1,{\mathrm{C}}_2}\left(X|{\mathrm{C}}_1,{\mathrm{C}}_2\right). \) If the preferred stimuli \( X_n \) of the neurons in the population are drawn from the prior distribution \( p_{\mathrm{X}}(X) \) and the population response is proportional to the likelihood \( {f}_n\left({\mathrm{C}}_1,{\mathrm{C}}_2\right)\propto {p}_{{\mathrm{C}}_1,{\mathrm{C}}_2|X}\left({\mathrm{C}}_1,{\mathrm{C}}_2|{X}_n\right) \), then a readout across the population with a PV will accurately approximate the mean of the posterior (Shi and Griffiths 2009; Fischer and Peña 2011). This analysis predicts that optimal cue combination is nonlinear because the likelihood will be a nonlinear function of the cues: \( {f}_n\left({\mathrm{C}}_1,{\mathrm{C}}_2\right)\propto {p}_{{\mathrm{C}}_1|X}\left({\mathrm{C}}_1|{X}_n\right){p}_{{\mathrm{C}}_2|X,{\mathrm{C}}_1}\left({\mathrm{C}}_2|{X}_n,{\mathrm{C}}_1\right)=g\left({\mathrm{C}}_1\right)h\left({\mathrm{C}}_1,{\mathrm{C}}_2\right). \) In general, cue combination can improve performance the most when cues have independent noise or provide different pieces of information about the stimulus. If the cues C1 and C2 are conditionally independent given X, then the likelihood factors and optimal cue combination is multiplicative:

$$ {f}_n\left({\mathrm{C}}_1,{\mathrm{C}}_2\right)\propto {p}_{{\mathrm{C}}_1|X}\left({\mathrm{C}}_1|{X}_n\right){p}_{{\mathrm{C}}_2|X}\left({\mathrm{C}}_2|{X}_n\right)={g}_n\left({\mathrm{C}}_1\right){h}_n\left({\mathrm{C}}_2\right). $$

Thus the main theoretical result is that for the PV to accurately approximate a Bayesian estimate, cue combination must be nonlinear. In particular, multiplicative neural responses are optimal for the combination of conditionally independent cues.
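
The argument can be summarized in a short derivation, sketched here under the stated assumptions (rates proportional to the likelihood, preferred stimuli sampled from the prior, and a population large enough for the sample average to approximate the integral):

$$ \mathrm{PV}\left({\mathrm{C}}_1,{\mathrm{C}}_2\right)=\frac{1}{N}{\sum}_{n=1}^{N}u\left({X}_n\right){r}_n\approx \frac{1}{N}{\sum}_{n=1}^{N}u\left({X}_n\right){f}_n\left({\mathrm{C}}_1,{\mathrm{C}}_2\right)\propto \frac{1}{N}{\sum}_{n=1}^{N}u\left({X}_n\right){p}_{{\mathrm{C}}_1,{\mathrm{C}}_2|X}\left({\mathrm{C}}_1,{\mathrm{C}}_2|{X}_n\right)\approx \int u(X)\,{p}_{{\mathrm{C}}_1,{\mathrm{C}}_2|X}\left({\mathrm{C}}_1,{\mathrm{C}}_2|X\right){p}_{\mathrm{X}}(X)\, dX\propto \int u(X)\,{p}_{\mathrm{X}|{\mathrm{C}}_1,{\mathrm{C}}_2}\left(X|{\mathrm{C}}_1,{\mathrm{C}}_2\right)\, dX $$

The first approximation replaces the Poisson responses by their means, the second treats the sum over preferred stimuli drawn from \( p_{\mathrm{X}}(X) \) as a Monte Carlo average, and the final proportionality is Bayes’ rule; the result is the unnormalized mean vector of the posterior, so the PV points in the posterior-mean direction.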

We now interpret this result in the context of estimating azimuth and elevation \( (\theta,\phi) \) from IPD and ILD, and specify the combination rule that is optimal for the neurons underlying this task in the owl’s ICx. Because the spatial selectivity of ICx neurons results from their tuning to IPD and ILD (Moiseff and Konishi 1983; Brainard et al. 1992), the response of an ICx neuron with preferred direction \( ({\theta}_n,{\phi}_n) \) to IPD and ILD can be described by a tuning function \( f_n\left(\mathrm{IPD},\mathrm{ILD}\right) \), where the tuning to IPD and ILD differs for neurons with different preferred directions. If the sensory cues IPD and ILD are conditionally independent given the stimulus direction, then the likelihood function factors as a product of an IPD-based likelihood and an ILD-based likelihood: \( {p}_{\mathbf{s}|\Theta, \Phi}\left(\mathrm{IPD},\mathrm{ILD}|\theta, \phi \right)={p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left(\mathrm{IPD}|\theta, \phi \right){p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{ILD}|\theta, \phi \right) \). In the non-uniform population code model we consider for the neural implementation of Bayesian inference (Shi and Griffiths 2009; Fischer and Peña 2011), the optimal neural representation of the sensory statistics requires tuning curves for IPD and ILD that are proportional to the likelihood function: \( f_n\left(\mathrm{IPD},\mathrm{ILD}\right)\propto {p}_{\mathbf{s}|\Theta, \Phi}\left(\mathrm{IPD},\mathrm{ILD}|{\theta}_n,{\phi}_n\right) \). Thus, optimal cue combination is given by a product of one function of IPD and one function of ILD: \( {f}_n\left(\mathrm{IPD},\mathrm{ILD}\right)\propto {p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left(\mathrm{IPD}|{\theta}_n,{\phi}_n\right){p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{ILD}|{\theta}_n,{\phi}_n\right)={g}_n\left(\mathrm{IPD}\right){h}_n\left(\mathrm{ILD}\right). \)

3.2 Statistics of spatial cues used for sound localization

Testing the Bayesian model for optimal combination of IPD and ILD cues requires a description of the variability in the sensory input and in the neural computations that determine the likelihood functions of IPD and ILD. For this, we considered the sensory cues used by the owl for sound localization, given by the IPD and ILD spectra derived from barn owl head-related transfer functions (HRTFs), and how these cues can be corrupted by noise. The relationship between directions in space and IPD and ILD is known to be highly frequency dependent (Brainard and Knudsen 1993; Keller et al. 1998). The relationship is also ambiguous: IPD and ILD cues near the center of gaze may be similar to those for directions above and below the owl on the vertical plane (Fig. 2; Brainard et al. 1992). In addition, the sound localization cues that the owl uses to infer the source direction are subject to variability due to the nature of the sound, the presence of background noise (Nix and Hohmann 2006; Cazettes et al. 2014), and noise in neural computation (Christianson and Peña 2006; Fischer and Konishi 2008). Neural noise is particularly important for IPD because it limits the frequency range over which IPD cues are useful for sound localization. While IPD is a well-defined parameter of the acoustic signals over the entire audible frequency range in any animal, it is only useful in practice for sound localization up to approximately 9 kHz in barn owls and 5 kHz or lower in mammals because of limits in the ability of neurons to phase lock to the stimulus at high frequencies (Johnson 1980; Palmer and Russell 1986; Köppl 1997a).

Fig. 2

Ambiguity of sound localization cues. a Sounds from three different directions on the vertical plane (cyan, (5°,85°); red, (0°,0°); black, (30°,25°)). b ILD spectra for the sound directions in (a). c IPD spectra for the directions in (a). While the peripheral direction (cyan) has ILD and IPD spectra similar to those of the frontal direction (red), a source at an intermediate location (black) can have very different ILD and IPD spectra

We used owl HRTFs (n = 10, kindly provided by Clifford Keller) to determine the form of the variability in IPD and ILD as a function of frequency for different sound source directions, assuming naturalistic environments where concurrent sounds are usually present. We computed the variability of IPD for directions covering the frontal hemisphere, as previously shown on the horizontal plane (Cazettes et al. 2014), and expanded the analysis to include the variability of ILD cues over different directions of concurrent noise sources. We calculated the reliability of IPD and ILD, defined as the inverse of the variance of each cue. The reliability of both IPD and ILD was largest for central directions at frequencies above approximately 4 kHz (Fig. 3). At lower frequencies, the reliability of IPD and ILD was less spatially dependent.

Fig. 3

Reliability of sound localization cues ILD and IPD. The reliability (inverse variance) of ILD (left) and IPD (right) at each target direction in the frontal hemisphere in the presence of concurrent sources from other directions is plotted separately for frequencies between 2 and 8 kHz. The color axis is on the same scale for all ILD and IPD plots. Overall, the reliability of both IPD and ILD was highest for central directions at high frequencies

IPD and ILD are also subject to variability due to the neural computation underlying the emergence of the tuning for these cues. We used extracellular recordings of neural responses to IPD and ILD to assess this variability. Because the selectivity to both IPD and ILD is created by processing in narrow frequency channels (Manley et al. 1988; Carr and Konishi 1990; Mogdans and Knudsen 1994), we analyzed the tuning variability to IPD and ILD on a frequency-by-frequency basis. A previous study found that the variability of IPD tuning was constant across frequency channels in the owl’s localization pathway (Cazettes et al. 2014), indicated by a lack of correlation between the Fano factor of responses to IPD and stimulus frequency in ICx. Here, we extended this analysis to ILD coding by examining the neural variability of ILD tuning in the lateral shell of the central nucleus of the inferior colliculus (ICcl), which projects directly to ICx (Knudsen 1983). ICcl neurons are tuned to ITD and ILD but, unlike ICx, they are narrowly tuned to frequency. We used the average Fano factor over ILD to quantify the variability of ILD tuning of each ICcl neuron (min = 0.13, median = 0.77, max = 2.55, n = 77). There was no significant correlation between the best frequency and the average Fano factor in the sample of ICcl neurons with best frequencies ranging from 500 to 7900 Hz (r = 0.11, p = 0.39, n = 77). Therefore, we assumed a frequency-independent level of variability in IPD and ILD due to neural computation.

3.3 Conditional independence of IPD and ILD

Our theoretical result specifies that multiplicative cue combination is optimal for conditionally independent sensory cues. To test this result in the owl, we must first determine whether IPD and ILD are conditionally independent cues. Dependence between IPD and ILD can be due to environmental variability or noise in neural computation.

We first examined whether IPD and ILD are conditionally independent when considering the variability due to the presence of environmental noise induced by concurrent sounds. For each target direction, sounds were filtered by owl HRTFs at the target direction and other directions of a second source, following the same approach used to measure the individual variability of IPD and ILD. Figure 4 shows kernel density estimates of the joint distribution of IPD and ILD at three target directions, along with density estimates assuming conditional independence. The examples are at the first, second, and third quartiles of the Kullback–Leibler divergence between the joint density and the conditional-independence approximation (Fig. 4a–c). The close match between the conditional-independence approximation and the joint density is also seen in the large fractional energy carried by the first singular value of the singular value decomposition of the joint density, which is a measure of how accurately the joint density can be approximated by a product of functions of IPD and ILD alone (median = 0.98, interquartile range = 0.038). Additionally, we found low correlation between IPD and ILD variability over different concurrent noise directions (mean absolute correlation = 0.15, s.d. = 0.12, \( p<10^{-3} \)). These results are consistent with the environmental cues IPD and ILD being approximately conditionally independent at the input. We then tested whether IPD and ILD remained conditionally independent downstream in the sound localization pathway.

Fig. 4

Conditional independence of IPD and ILD cues. a-c Environmental variability: (top) Kernel density estimates of the joint distribution of IPD and ILD induced by the presence of concurrent sources for three target sound directions; (bottom) Kernel density estimates assuming conditional independence for the same directions. The examples are at the first (a), second (b), and third (c) quartiles of the Kullback–Leibler divergence between the joint density and the independent approximation. d Three representative examples of normalized recordings of intracellular ICx membrane potential responses to broadband noise with varying ILD at a fixed IPD. Responses to correlated noise (dashed grey, BC = 1) match the responses when binaural correlation is 0 and IPD is undefined (solid black, BC = 0)

It is expected that IPD and ILD information remain conditionally independent within the sound localization system because IPD and ILD are processed in separate pathways (Takahashi et al. 1984). To test whether IPD and ILD cues remained conditionally independent down to the ICx space-specific neurons, we examined in vivo intracellular responses of these neurons to IPD and ILD. Specifically, we tested whether the ILD likelihood is independent of IPD, i.e., whether \( {p}_{{\mathbf{s}}_{\mathrm{ILD}}|{\mathbf{s}}_{\mathrm{IPD}},\Theta, \Phi}\left(\mathrm{ILD}|\mathrm{IPD},\theta, \phi \right)={p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{ILD}|\theta, \phi \right) \). In the barn owl’s auditory system, IPD and ILD are initially processed in parallel pathways before converging in the inferior colliculus (Moiseff and Konishi 1983; Takahashi et al. 1984). To test for conditional independence of IPD and ILD, we examined membrane potential responses of ICx neurons to ILD for different conditions on IPD. Varying binaural correlation changes the reliability of the IPD cue (Jeffress et al. 1962; Albeck and Konishi 1995; Saberi et al. 1998; Egnor 2001; Peña and Konishi 2004). When the sound signals at the left and right ears are uncorrelated, IPD is not defined because the sounds at the two ears are not delayed versions of each other; in this case, the responses of ICx neurons to ILD reflect the probability distribution \( {p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{ILD}|\theta, \phi \right) \). We therefore used responses to ILD measured with uncorrelated sounds as an estimate of the probability distribution of ILD that does not depend on IPD: \( {p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{ILD}|\theta, \phi \right) \). Conversely, we used responses to ILD with correlated sounds as a measure of the probability distribution of ILD given IPD, \( {p}_{{\mathbf{s}}_{\mathrm{ILD}}|{\mathbf{s}}_{\mathrm{IPD}},\Theta, \Phi}\left(\mathrm{ILD}|\mathrm{IPD},\theta, \phi \right) \). If ILD and IPD are conditionally independent, ILD should have the same distribution in these two cases and the responses should be highly correlated. In fact, the membrane potential responses of ICx neurons to ILD for uncorrelated sounds were highly correlated with the responses to ILD when IPD was present in the stimulus (n = 12; median correlation 0.97, interquartile range = 0.045; Fig. 4d). Additionally, if ILD and IPD are conditionally independent, then ILD should have the same distribution for any value of IPD and, similarly, IPD should have the same distribution for any value of ILD. This predicts that ILD tuning is invariant to changes in IPD and vice versa. Previous analyses have established that this holds in the owl’s ICx (Peña and Konishi 2001, 2004).

Thus the ILD cue is transmitted to the midbrain independently of IPD, consistent with the claim that the cues IPD and ILD, as processed and encoded by the owl’s sound localization pathway, are conditionally independent (Egnor 2001; Peña and Konishi 2004). Together, the measures of correlated variability in the input signals and in the neural responses suggest that IPD and ILD cues are approximately conditionally independent.

3.4 Sound localization in azimuth and elevation predicted by a Bayesian model

Because ILD and IPD cues are approximately conditionally independent, we modeled the noise corrupting these cues as Gaussian and circular Gaussian random vectors, respectively, with mutually independent components. With this noise model, the likelihood function is a product of IPD and ILD likelihoods (Eqs. 2–4). The noise variances are a sum of a frequency-dependent component determined by the environmental variability of IPD and ILD that we described above and a frequency-independent component modeling the neural noise that was fit to the owl’s behavior (Materials and Methods).

The prior in the Bayesian model of the owl’s sound localization predicts the bias for directions near the center of gaze (Edut and Eilam 2004; Fischer and Peña 2011). We modeled the prior as a product of two functions, one of azimuth and another of elevation. Based on a previous study of localization in azimuth (Fischer and Peña 2011), the azimuthal component emphasizes directions in the front. The density in azimuth was modeled as a Laplace density with zero mean and unknown variance (Eq. 6). In contrast to azimuth, where directions to the left and right of the owl have the same behavioral significance, directions above and below the owl may have different significance. Perched owls will spend time localizing prey from the perched position before flying to capture it (Ohayon et al. 2006; Fux and Eilam 2009a, b). Thus, while perched, directions below the owl may be more likely directions for prey. Therefore, we modeled the elevation component of the prior as a piecewise combination of two Gaussians with unknown variances and common mean to allow directions above and below the owl to be treated differently (Eq. 5). The prior thus has four unknown parameters corresponding to the variance in azimuth, the two width parameters in elevation, and the center in elevation.

After fitting the parameters of the model to the behavioral data, the performance of the Bayesian estimator matched the owl’s localization behavior (Fig. 5). The Bayesian model underestimated source directions in azimuth and elevation, similarly to the owl’s behavior (Knudsen et al. 1979). The root-mean-square error between the average estimates reported for two owls (Knudsen et al. 1979) and the Bayesian estimate was 2.4° in azimuth and 7.9° in elevation. The differences between the Bayesian model and the owl’s behavior lie within the angular discrimination limits of the owl in azimuth and elevation (Bala et al. 2007). Therefore, a Bayesian model relying on conditionally independent IPD and ILD cues describes the owl’s sound localization behavior in two dimensions.

Fig. 5

Bayesian model performance in two-dimensional localization. The Bayesian model (Bayes) and population vector (PV) match the performance of two owls (Knudsen et al. 1979) in elevation (a) and azimuth (b). Error bars represent standard deviations over trials

3.5 Neural implementation of Bayesian sound localization in azimuth and elevation

We tested the predictions of the neural implementation of optimal cue combination using a neural model of the owl’s sound localization pathway (Materials and Methods). In this model, the tuning of the neurons to IPD and ILD was determined by the form of the likelihood function (Fig. 6a), while the preferred directions across the population were drawn from the prior that was found by fitting the Bayesian model to the owl’s behavior (Fig. 6b–d). The source direction in azimuth and elevation was estimated by a population vector (PV; Eq. 11), which has been shown to accurately predict the owl’s localization behavior in azimuth (Fischer and Peña 2011). With this network, a PV estimate of the source direction in azimuth and elevation matched both the Bayesian estimate and the owl’s behavior (Fig. 5). This result is expected, based on the mathematical argument that the PV will accurately approximate the Bayesian estimate when the preferred directions are drawn from the prior and the population response is proportional to the likelihood function (Fischer and Peña 2011). We then compared the neural representation of IPD and ILD in the model network to that found in the owl’s midbrain.

Fig. 6

Neural implementation of the Bayesian model. a Model responses (top) are multiplicative, as seen in the owl’s ICx (bottom, data from Takahashi (2010), reproduced with permission). The ILD-alone and IPD-alone plots are responses when only ILD or IPD is allowed to vary. b The Bayesian prior in two dimensions. c, d The prior (red) matches the distribution of preferred directions in azimuth (c) in the owl’s OT (dashed blue; Knudsen 1982). The prior in elevation is wider than the distribution of preferred elevations in OT, but both emphasize directions slightly below the center of gaze, which is indicated by the vertical line (d)

The modeled multiplicative responses of ICx neurons encoded the product of IPD- and ILD-based likelihoods: \( {f}_n\left(\mathrm{I}\mathrm{P}\mathrm{D},\mathrm{I}\mathrm{L}\mathrm{D}\right)\propto {p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left(\mathrm{I}\mathrm{P}\mathrm{D}|{\theta}_n,{\phi}_n\right){p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{I}\mathrm{L}\mathrm{D}|{\theta}_n,{\phi}_n\right) \). The resulting multiplicative responses to IPD and ILD are consistent with the multiplicative responses reported previously in the owl’s midbrain (Peña and Konishi 2001, 2004; Fischer et al. 2007; Takahashi 2010) (Fig. 6a). This shows that the multiplicative responses observed in the owl’s midbrain can support optimal Bayesian cue combination.

The non-uniform prior distribution of direction in the Bayesian model and the distribution of preferred directions in the owl’s OT both emphasize directions near the center of gaze (Fig. 6b) (Knudsen et al. 1977; Knudsen 1982). As shown before (Fischer and Peña 2011), the density of preferred directions in azimuth predicted by the Bayesian model was consistent with the measured density in OT (Fig. 6c). However, the distribution of preferred directions in elevation was wider for the Bayesian model than the estimated density based on the shape of the OT auditory space map (Fig. 6d) (Knudsen 1982). The difference could be due to our method to estimate the prior from the shape of the space map. There are fewer data points in the periphery of the map for elevation than for azimuth, and this can cause more imprecision in the estimated shape of the distribution in elevation than in azimuth. Yet, for both the model and the owl’s space map, the highest density of directions in elevation was located below the horizontal plane. Therefore, the distribution of preferred elevations in OT is non-uniform and emphasizes directions below the owl, as seen in the prior distribution of the Bayesian model.

3.6 Nonlinear versus linear cue combination

To verify that multiplicative cue combination is necessary for optimal localization performance of the PV decoder, we tested this decoder on a population where model neurons combined IPD and ILD inputs additively (Fig. 7). Whether cue combination is linear or multiplicative greatly influences how sound source directions are represented in the auditory space map (Fig. 7a–h). Linear cue combination predicts increased activity at any place in the map where either preferred azimuth or elevation is consistent with the sensory IPD and ILD. By contrast, multiplicative cue combination predicts neural activity only at points in the map where both the preferred azimuth and elevation are consistent with the sensory IPD and ILD. With linear cue combination, the error increased dramatically in the periphery in both azimuth and elevation (Fig. 7i, j). This occurs because with additive responses, neurons with preferred directions near the center of gaze also respond when the source direction is in the periphery (Fig. 7g, h). Thus, additive responses cause source directions in the periphery to be confused with directions at the center of gaze, leading to unrealistically large errors in localization. These results show that for the PV decoder to perform optimally, cue combination must be multiplicative.
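
The consequence for the PV readout can be seen in a toy one-dimensional example; the tuning shapes below are illustrative and are not fitted to owl data.

```python
import numpy as np

pref = np.linspace(-90.0, 90.0, 181)               # preferred azimuths (deg)
source = 60.0                                      # peripheral source direction

g = np.exp(-0.5 * ((pref - source) / 10.0) ** 2)   # IPD-driven component: narrow
h = np.exp(-0.5 * (pref / 80.0) ** 2)              # ILD-driven component: broad,
                                                   # weakly informative about azimuth

def pv_azimuth(rates):
    """Direction of the population vector for a 1-D ring of preferred azimuths."""
    az = np.deg2rad(pref)
    return np.degrees(np.arctan2(np.sum(rates * np.sin(az)),
                                 np.sum(rates * np.cos(az))))

print("multiplicative:", round(pv_azimuth(g * h), 1), "deg")   # close to 60
print("additive:      ", round(pv_azimuth(g + h), 1), "deg")   # dragged toward 0
```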

Fig. 7

Localization with linear cue combination. Two examples (one in each row) of model ICx neuron responses where only ILD (a, b) or only IPD (c, d) was allowed to vary with space when computing the responses. A nonlinear (multiplicative) combination of the ILD-alone and IPD-alone responses (IPD × ILD; e, f) produces a response only at directions that are consistent with both cues. A linear (additive) combination (IPD + ILD; g, h) produces responses at directions that are consistent with either cue. (i, j) The population vector fails to match the owl’s performance (Knudsen et al. 1979) in elevation (i) and azimuth (j) when model neurons combine ILD and IPD inputs linearly

4 Discussion

This study specified the role that nonlinear operations can play in optimal cue combination. We determined that conditionally independent sensory cues combined multiplicatively can support optimal estimation of the value of an unknown stimulus. This result predicts the robust multiplicative tuning to IPD and ILD that is observed in the owl’s midbrain and provides further functional meaning to the nonlinear integration of sensory cues within and across sensory modalities.

We have previously shown that a Bayesian model describes the owl’s localization behavior in azimuth for stationary (Fischer and Peña 2011) and moving (Cox and Fischer 2015) sources. The current work is the first analysis of optimal cue combination in this particular Bayesian framework. Here we extended previous analyses to show that the Bayesian model describes behavior in both azimuth and elevation. We also derived conditions for the PV to perform Bayesian cue combination. Therefore, the Bayesian framework explains localization in azimuth of stationary sources using ITD (Fischer and Peña 2011), localization in azimuth of moving sources using ITD (Cox and Fischer 2015), and, here, localization in azimuth and elevation using IPD and ILD.

The primary assumptions of this modeling framework are that the prior is represented in the distribution of preferred directions, that the likelihood is represented in the pattern of activity across the population, and that the population is read out through a PV. There is clear experimental evidence for a non-uniform distribution of preferred directions in the midbrain space map that matches the prior distribution in the model (Knudsen 1982; Fischer and Peña 2011). We have also found experimental support for the likelihood being represented in the pattern of activity across the population. In particular, the selectivity of the neurons decreases when sensory noise increases, so that activity is spread over more of the map (Cazettes et al. 2016). Moreover, the selectivity of neurons is lower in the periphery of the space map, where IPD is a less reliable cue. The present work highlights that the well-known multiplicative tuning of space-specific neurons (Peña and Konishi 2001; Fischer et al. 2007) allows the space-map responses to represent the joint likelihood of IPD and ILD through a nonlinear combination of the two cues.

While our analysis leads to the prediction that optimal cue combination is nonlinear, some other approaches suggest that optimal cue combination is linear (Jazayeri and Movshon 2006; Ma et al. 2006; Beck et al. 2007). The differences are related to the way probabilities are assumed to be represented in neural populations. How probabilities are represented may depend on the type of inference problem that involves the neural population. Here, we describe optimal cue combination for a Bayesian estimation problem where the goal is to estimate a continuous variable (sound source direction) from the posterior distribution. Alternatively, others have studied cue combination in the context of a discrimination task where the goal is to choose between discrete stimulus conditions (Fetsch et al. 2011). An optimal approach to solving the discrimination task is to compare the log-likelihood ratio to a threshold (Van Trees 2004). This operation may be computed in a neural circuit using linear operations on neural responses (Gold and Shadlen 2001) and could generalize to the case of cue combination. Our prediction of nonlinear cue combination addresses the case of producing an optimal estimate of a continuous variable using a PV; we would not arrive at the prediction of nonlinear cue combination if the goal were to represent the log-likelihood ratio in order to perform a discrimination task. The different predictions for cue-combining neurons may also be related to different assumptions for how probabilities are represented in neural populations, even for the same task. To specify the differences, we consider an inference problem where the goal is to estimate an environmental variable X (e.g., object location) based on two sensory cues C1 and C2 (e.g., auditory and visual input) that are initially encoded in two separate populations with responses \( \boldsymbol{r}_1 \) and \( \boldsymbol{r}_2 \) and then combined in a population with response \( \boldsymbol{r}_3 \). In a probabilistic population code (PPC), it is assumed that the brain produces an estimate of X from the posterior distribution \( {p}_{\mathrm{X}|{\boldsymbol{r}}_3}\left(X|{\boldsymbol{r}}_3\right) \) (Ma et al. 2006; Beck et al. 2007). It has been shown that if the neurons have Poisson-like variability, then the optimal cue combination strategy is for \( \boldsymbol{r}_3 \) to be a linear combination of the input activities \( \boldsymbol{r}_1 \) and \( \boldsymbol{r}_2 \) (Ma et al. 2006; Beck et al. 2007). This prediction arises from the assumption that the brain is drawing inferences from a posterior distribution \( {p}_{\mathrm{X}|{\boldsymbol{r}}_3}\left(X|{\boldsymbol{r}}_3\right) \) that models the neural variability in the population \( \boldsymbol{r}_3 \). However, the Poisson-like variability describes the neural variability for repeated presentations of the same stimulus and does not include the statistical relationship between the environmental cause X and the sensory stimulus, which is a major component of the overall statistics of sensory information. Also, a distribution over the population response \( \boldsymbol{r}_3 \) is a very high-dimensional distribution, where the dimensionality matches the number of neurons in the population. Such a high-dimensional distribution may be difficult to learn.
A similar alternative model assumes that inferences are computed using a log-likelihood function \( \ln \left({p}_{{\boldsymbol{r}}_1,{\boldsymbol{r}}_2|X}\left({\boldsymbol{r}}_1,{\boldsymbol{r}}_2|X\right)\right)= \ln \left({p}_{{\boldsymbol{r}}_1|X}\left({\boldsymbol{r}}_1|X\right){p}_{{\boldsymbol{r}}_2|X}\left({\boldsymbol{r}}_2|X\right)\right)= \ln \left({p}_{{\boldsymbol{r}}_1|X}\left({\boldsymbol{r}}_1|X\right)\right)+ \ln \left({p}_{{\boldsymbol{r}}_2|X}\left({\boldsymbol{r}}_2|X\right)\right) \) (Gold and Shadlen 2001; Jazayeri and Movshon 2006). Here, the logarithm transforms the multiplication of likelihoods into the addition of log-likelihood functions for the input populations. Extending the model of Jazayeri and Movshon (2006) for the representation of a log-likelihood function to cue combination would allow for optimal cue combination using linear operations. This model also assumes that the brain represents high-dimensional probability distributions that only describe the Poisson-like neural variability. Our approach is based on an alternative framework that assumes that the population \( \boldsymbol{r}_3 \) represents the low-dimensional distribution \( {p}_{\mathrm{X}|{\mathrm{C}}_1,{\mathrm{C}}_2}\left(X|{\mathrm{C}}_1,{\mathrm{C}}_2\right) \) that describes the relationship between the environmental variables and the sensory cues. In this framework, the neural Poisson-like variability for repeated presentations of the same stimulus is treated as noise. While this is suboptimal, it simplifies the probability distribution that the brain must learn while representing the statistical relationship between the environment and the cues, which is the central component of the perceptual inference problem. Furthermore, for large populations, inferences made using the low-dimensional distribution \( {p}_{\mathrm{X}|{\mathrm{C}}_1,{\mathrm{C}}_2}\left(X|{\mathrm{C}}_1,{\mathrm{C}}_2\right) \) may closely approximate inferences made using the high-dimensional distribution \( {p}_{\mathrm{X}|{\boldsymbol{r}}_1,{\boldsymbol{r}}_2}\left(X|{\boldsymbol{r}}_1,{\boldsymbol{r}}_2\right) \) (Cazettes et al. 2016).

Nonlinear combinations have also been used in models of marginalization. In the PPC framework (Beck et al. 2011), the sound localization problem could be viewed as a marginalization problem, where the goal is to make inferences from the posterior \( {p}_{\Theta, \Phi |{\boldsymbol{r}}_3}\left(\theta, \phi |{\boldsymbol{r}}_3\right) \) as:

$$ {p}_{\Theta, \Phi |{\boldsymbol{r}}_3}\left(\theta, \phi |{\boldsymbol{r}}_3\right)=\int {p}_{\Theta, \Phi |\mathbf{s}}\left(\theta, \phi |\mathrm{IPD},\mathrm{ILD}\right){p}_{\mathbf{s}|{\boldsymbol{r}}_1,{\boldsymbol{r}}_2}\left(\mathrm{IPD},\mathrm{ILD}|{\boldsymbol{r}}_1,{\boldsymbol{r}}_2\right)\, d\mathrm{IPD}\, d\mathrm{ILD}=\int {p}_{\Theta, \Phi |\mathbf{s}}\left(\theta, \phi |\mathrm{IPD},\mathrm{ILD}\right){p}_{{\mathbf{s}}_{\mathrm{IPD}}|{\boldsymbol{r}}_1}\left(\mathrm{IPD}|{\boldsymbol{r}}_1\right){p}_{{\mathbf{s}}_{\mathrm{ILD}}|{\boldsymbol{r}}_2}\left(\mathrm{ILD}|{\boldsymbol{r}}_2\right)\, d\mathrm{IPD}\, d\mathrm{ILD}. $$

This may be accomplished using a basis function network with parameters that are optimized to preserve information (Beck et al. 2011). The basis function network may use multiplicative responses in the hidden layer, although a different form of nonlinearity may possibly provide better performance. In this framework, some form of nonlinear cue combination in the basis function layer would be required, but the analysis does not predict that multiplicative responses are optimal for cue combination in general. Experimental testing of the neural basis of marginalization will be necessary to prove the plausibility of these theories.

We propose that the responses of cue-combining neurons are given by a product of functions of the separate cues. Therefore, scaling the amplitude of one input scales the overall response. For cue reliability to influence behavior, a decrease in reliability must cause an increase in tuning widths for that cue. It has been shown that changing the reliability of IPD causes IPD tuning curves to widen, consistent with the widening of the likelihood function as reliability changes (Cazettes et al. 2016). This is distinct from the reweighting of inputs in linear neural responses that is predicted by the PPC model (Fetsch et al. 2011). The prediction of our optimal encoding model is that when the reliability of the cues changes, the optimal form of cue combination remains multiplicative. For example, changing the reliability of IPD will affect the IPD-based likelihood \( {p}_{{\mathbf{s}}_{\mathrm{IPD}}|\Theta, \Phi}\left(\mathrm{IPD}|{\theta}_n,{\phi}_n\right) \), but it will not change the ILD-based likelihood \( {p}_{{\mathbf{s}}_{\mathrm{ILD}}|\Theta, \Phi}\left(\mathrm{ILD}|{\theta}_n,{\phi}_n\right) \), nor will it change the prediction that neural responses should be multiplicative. Intracellular in vivo recordings of responses to ITD and ILD in ICx showed that multiplicative cue combination is in fact robust to changes in cue reliability (Peña and Konishi 2004), which is consistent with the optimal cue combination model. These results suggest that ICx responses are consistent with the product of IPD- and ILD-based likelihoods for conditionally independent cues at different levels of IPD reliability.

Linear and multiplicative cue combinations make different predictions for how sound source directions are represented in the midbrain auditory space map and how source direction can be optimally decoded. Multiplicative cue combination predicts that activity is more localized over the map than the activity predicted by linear cue combination (Fig. 7). The multiplicative model is therefore more energy efficient than the linear model because it reduces the number of neurons that spike in response to a stimulus (Niven and Laughlin 2008). The restricted activity in regions of the map where both the preferred IPD and ILD match the sensory input allows the sound source direction to be estimated optimally from the population responses with a PV. By contrast, if a linear combination rule is used, then an operation to detect the maximum is typically needed to decode direction, which is a highly noisy mechanism (Simoncelli 2009).

There are several possible mechanisms for generating multiplicative neural responses (Koch 2004). It is possible to approximate multiplication by addition and thresholding (Fischer et al. 2009). Also, network mechanisms that depend on stimulus saliency and descending inputs can select one region of activity in the population response (Mysore and Knudsen 2012, 2013) creating a multiplicative response. The evidence is consistent with the hypothesis that multiplicative tuning in the owl’s midbrain is generated in stages of linear-threshold neurons that first produce nonlinear tuning to IPD and ILD within frequency channels and then produce nonlinear tuning across frequency (Fischer et al. 2009). Further work is required to determine how nonlinear tuning first arises in the midbrain and the role that recurrent connections play in shaping the responses.

In summary, we provided a theoretical justification for conditions under which optimal cue combination must be nonlinear and showed that these conditions are met in the owl’s sound localization system. These results expand the functional implication of the robust multiplicative tuning to IPD and ILD that is observed in the owl’s midbrain (Peña and Konishi 2001, 2004) by showing that multiplicative responses can allow for neurons to represent environmental statistics of multiple conditionally independent cues. This finding may apply to other cases of nonlinear cue integration within and across sensory modalities (Stein and Stanford 2008; Xu et al. 2012).