1 Introduction

“Binocular rivalry” and “binocular fusion” are two fascinating perceptual phenomena that arise in the cortical processing of visual information. Binocular rivalry is a visual phenomenon that appears when two incompatible monocular images are presented, one image to the left eye and the other image to the right eye. Under such stimulation, the percept is not a fused image. Instead, only one of the images is perceived at a time, for a random duration of a few seconds, with irregular temporal transitions between the perception of one image and the other. This rivalry phenomenon was described centuries ago, for example by Della Porta (1593), as referenced by Wade (1998), and by Dutour (1760). It has since attracted extensive research, including psychophysical experiments (e.g., Blake and Fox 1974; Levelt 1965; O’Shea and Crassini 1981), electrophysiology in animals (e.g., Leopold and Logothetis 1996; Gail et al. 2004), and neuroimaging studies (e.g., fMRI) in humans (e.g., Polonsky et al. 2000; Tong and Engel 2001). Binocular fusion (or single vision), on the other hand, generates the perception of a single image by “fusing” the two monocular stimulations in certain ways. For example, position fusion, as discussed in this paper, is the phenomenon in which two images of oriented objects with similar orientations, presented binocularly, give rise to the perception of a single object that combines the two monocular orientations (Nelson 1975).

Oriented bars and gratings are common visual stimuli for testing binocular rivalry quantitatively. When the two gratings have similar orientations, fusion results; when their orientations are orthogonal, rivalry results.

The cortical mechanisms that underlie binocular rivalry and fusion are not completely understood. Most likely these mechanisms are initiated by front-end, “lower level” processing in the primary visual cortex (V1) (Xu et al. 2016) and are significantly modulated by top-down feedback from higher cortical regions, such as attention. See Brown and Norcia (1997), Zhang et al. (2011), Brascamp and Blake (2012), Cavanagh and Holcombe (2006), Ling and Blake (2012), and a review by Dieter et al. (2016). Interocular (or “cross-column”) inhibition plays a significant role in binocular rivalry, and the theoretical models proposed to explain rivalry (Laing and Chow 2002; Wilson 2003; Moreno-Bote et al. 2007) each rely on some form of interocular inhibition as a key mechanism. However, models of binocular rivalry for distinct stimuli that also include binocular fusion of similar stimuli are less common, and those that do study both rivalry and fusion all use the same mechanism (Said and Heeger 2013; Li et al. 2017; Wilson 2017). Two differing general conceptual ideas of fusion have been proposed previously: one by Blake and O’Shea (1988) and Blake (1989), and the other by Wolfe (1986). In Blake’s view, fusion is the default percept for similar stimulation, and rivalry results only when the two binocular stimuli are too dissimilar to fuse (Blake 1989). In Wolfe’s view, by contrast, rivalry is always present, and the perception of fusion results from the interaction of two pathways, the stereopsis and rivalry pathways (Wolfe 1986). In addition to these two conceptual ideas of fusion, we propose a third: introducing cross-column excitatory connections that overcome cross-column inhibition and allow similar binocular stimulation to fuse.

In this work, we investigate how each of these three conceptual mechanisms for binocular rivalry and fusion can be explicitly realized in a computational model, by developing three distinct neuronal network models. Each model consists of conductance-based, integrate-and-fire point neurons with firing rate adaptation, divided into subpopulations for each functional layer, each type (excitatory or inhibitory), and each ocular dominance “column”. These models are similar to more realistic large-scale neuronal models, but with two significant idealizations: i) a one-dimensional “ring” architecture, and ii) the explicit incorporation into each model of a mechanism designed to produce rivalry and fusion. Our goal for these models is to understand each potential conceptual mechanism in detail. Because these types of models are easy to compare with more realistic models, we also aim to develop some understanding of which conceptual framework is likely to emerge in large-scale comprehensive neuronal models of layers of primate V1. Such models (e.g., Cai et al. 2005; Zhou et al. 2013; Chariker et al. 2016) are constructed from anatomical and physiological constraints, not by inputting special architecture designed to achieve binocular rivalry and fusion. Through this comparison, we hope to develop intuition about which of the conceptual mechanisms are likely to be present in human or primate primary visual cortex.

Each of the three models captures both fusion and rivalry, and the hysteretic transitions between them (Wilson 2017; Buckthought et al. 2008), but through distinct underlying cortical mechanisms. These models address the front end of the visual system and do not contain higher-level feedback such as attention. They therefore address only the front-end mechanisms that initiate binocular rivalry and fusion. Each model consists of several “layers” of neurons (see Figs. 1, 2, and 3). There are left and right ocular dominance layers of monocular neurons that receive excitation from the left (right) eye, together with a binocular “summation” or “perception” layer that sums information from the two ocular dominance layers. In this Introduction, we give a general overview of each of the three models, with more detailed descriptions in the section Methods and models.

Fig. 1

Schematic of the First Model: A three-layer “ring” model, with two ocular dominance columns (“left” & “right”). In each layer, the neurons reside on a “ring”, with each neuron’s position on the ring labeled by the neuron’s orientation preference. The first (bottom) layer consists of monocular neurons (yellow, excitatory; green, inhibitory). The second (middle) layer consists of opponency neurons (yellow, excitatory; green, inhibitory), and the third (top) layer consists of binocular summation neurons (red, excitatory). The binocular and monocular layers also contain additional sets of 90 inhibitory neurons (not shown) that provide local recurrent in-column inhibition. All the projections have left-right symmetry, so only half of them are shown. Dotted projections are inactive under the stimulation shown. Cross-column projections 2 and 4 target neurons of the same orientation preference. Projection 1 – left excitatory monocular neurons to left excitatory opponency neurons; projection 2 – cross-column feedforward projection of right excitatory monocular neurons to NMDA receptors on left inhibitory opponency neurons; projection 3 – local projection of inhibitory neurons to excitatory neurons within the left opponency layer; projection 4 – cross-column feedback projection of left excitatory opponency neurons to NMDA receptors on right monocular inhibitory neurons; projection 5 – excitatory projection from left monocular excitatory neurons to binocular neurons; projection 6 – local projection of right inhibitory neurons to right excitatory neurons of all orientation preferences. The coupling strengths of the labeled projections are 1: 10 nS, 2: 20 nS, 3: 20 nS, 4: 16 nS, 5: 12 nS, 6: 30 nS. a Stimulus to the left “eye” is orthogonal to the stimulus to the right “eye”. With this orthogonal stimulation, the local inhibitory projection 3 is not active, and rivalry results. b Stimuli of similar orientations project to each “eye”. With this similar stimulation, feedback projection 4 and local inhibitory projection 6 are not active, and fusion results

Fig. 2

Schematic of the Second Model: A four-layer “ring” model, with two ocular dominance columns (“left” & “right”). In each layer, the neurons reside on a “ring”, with each neuron’s position on the ring labeled by the neuron’s orientation preference; color conventions follow Fig. 1. The top (fourth) layer consists of binocular summation neurons. The third layer’s excitatory neurons provide cross-column feedback excitation to monocular layer A. The second layer is monocular layer A, which has the same architecture of connections as the monocular layer of the first model. The first (bottom) layer is a new layer of monocular neurons, called monocular layer B. The binocular layer and monocular layer A also contain additional sets of 90 inhibitory neurons (not shown) that provide local recurrent in-column inhibition; all inhibitory neurons in monocular layer B are already present. As in model 1, all cross-column excitatory projections selectively target NMDA receptors on inhibitory neurons of the same orientation preference. All the projections have left-right symmetry, so only half of them are shown. Dotted projections are inactive under the stimulation shown. Projection 1 – feedback inhibition from monocular layer A, driven by cross-column excitation from excitatory neurons of the same orientation preference in the left layer A, and targeting inhibitory neurons of similar orientation preference in right layer B; projection 2 – local inhibition from inhibitory neurons in monocular layer B to excitatory neurons of similar orientation preference in the same layer; projection 3 – excitatory projection from right excitatory neurons in monocular layer B to binocular neurons of similar orientation preference. The coupling strengths of the labeled projections are 1: 50 nS, 2: 500 nS, 3: 3 nS. There are two additional feedback projections from layer B to layer A’s inhibitory neurons, one excitatory and one inhibitory (not shown). a Stimulus to the left “eye” is orthogonal to the stimulus to the right “eye”. With this orthogonal stimulation, the new layer is inactive. Projection 1 targets inhibitory neurons with orthogonal orientation preference, which receive no external stimulus, so projection 2 remains active. Hence, projection 3 is inactive and the binocular neurons rival. b Stimuli of similar orientations project to each “eye”. With this similar stimulation, inhibitory projection 2 is inactive, and the binocular neurons fuse

Fig. 3

Schematic of the Third Model: A two-layer “ring” model, with two ocular dominance columns (“left” & “right”). In each layer, the neurons reside on a “ring”, with each neuron’s position on the ring labeled by the neuron’s orientation preference; color conventions follow Fig. 1. The top layer consists of binocular summation excitatory neurons; the bottom layer is a monocular layer. Both layers also contain additional sets of 90 inhibitory neurons (not shown) that provide local recurrent in-column inhibition. All cross-column excitatory projections selectively target NMDA receptors on inhibitory neurons of similar orientation preference (standard deviation 3 degrees). All the projections have left-right symmetry, so only half of them are shown. Projection 1 – cross-column projection of left excitatory monocular neurons to NMDA receptors on right inhibitory neurons in the monocular layer; projection 2 – local projection of right inhibitory neurons to right monocular excitatory neurons of all orientation preferences; projection 3 – cross-column excitation, driven by cross-column projections from excitatory neurons in the left monocular layer, and targeting excitatory neurons of similar orientation preference in the right layer; projection 4 – excitatory projection from left excitatory neurons in the monocular layer to binocular neurons of similar orientation preference. The coupling strengths are 1: 20 nS, 2: 8 nS, 3: 7.8 nS, 4: 12 nS. a Stimulus to the left “eye” is orthogonal to the stimulus to the right “eye”. With this orthogonal stimulation, cross-column excitation fails to target the externally driven neurons. Cross-column inhibition therefore dominates, resulting in rivalry of the percepts. b Stimuli of similar orientations project to each “eye”. With this similar stimulation, cross-column excitation by projection 3 is sufficient to fuse the percepts

We begin with the features common to all three models. Locally, within each ocular dominance column, the excitatory synaptic connections between point neurons are mediated by (fast) AMPA type glutamate receptors, and local inhibition is mediated by (fast) GABA A type receptors. All long-range cross-column projections are excitatory, with synaptic connections mediated by (slow) NMDA type receptors. These projections selectively target both inhibitory and excitatory neurons whose orientation preference is similar to that of the projecting excitatory neuron. The models are idealized through the use of a “ring” architecture, in which the neurons of each subpopulation reside on a “ring” labeled by the neurons’ orientation preferences, \(\theta_j\), \(j = 1, \dots, N\). Theoretical studies (Shpiro et al. 2007; Moreno-Bote et al. 2007; Wilson 2003; Laing and Chow 2002) have convincingly argued that the primary mechanism underlying binocular rivalry is cross-column inhibition, which is common to all three models. This inhibition is generated by excitatory neurons in the left (right) ocular dominance column projecting to inhibitory neurons in the other [right (left)] column. These cross-column excitatory projections are presumably long-range. They therefore project selectively, according to orientation preference, to NMDA receptors on inhibitory neurons.

To prevent rivalry and allow fusion, some mechanism must overcome cross-column inhibition and allow similar stimuli to fuse. Each of the three models has its own distinct mechanism for doing so. The first model follows Said and Heeger (2013) and implements fusion as a default percept through a subpopulation of excitatory neurons (“opponency neurons”). These opponency neurons drive cross-column feedback inhibition, but they are themselves inhibited by “cross-column feedforward inhibition”. They therefore act as an “XOR” logic gate that is active only when the visual stimuli to the two eyes are sufficiently distinct. The second model realizes, for the first time, the conceptual framework of “two parallel pathways”. It replaces the opponency neurons with a second layer of monocular neurons that provides a second pathway to the summation layer. This additional monocular layer acts as an “AND” logic gate that is active only when similar images are presented to each eye. The third model does not use a distinct subpopulation of neurons as a gate. Instead, it introduces cross-column excitation between monocular neurons to overcome cross-column inhibition. These cross-column excitatory connections may be similar to connections suggested in the review by Tong et al. (2006), where the exact function of the connections is not specified. In this third model, a balance between cross-column excitation and cross-column inhibition is achieved (we believe for the first time), allowing distinct images to rival and similar images to fuse.

2 Methods and models

Our models are multilayer, with each layer containing two ocular dominance columns corresponding to the two eyes. (For convenience, we use the term “layers” to organize the description of the subpopulations of neurons. These “layers” do not correspond to different layers of V1. Instead, the subpopulations would most likely reside in a single layer of V1, such as 4Cα.) Each layer consists of conductance-based, integrate-and-fire point neurons with firing rate adaptation, divided into subpopulations for each layer and each type (excitatory or inhibitory). Each subpopulation consists of 90 neurons, each labeled by its orientation preference, \(\theta_j = j\Delta\theta\), \(j = 1, \dots, 90\), with \(\Delta\theta = 2\) degrees. Thus, each subpopulation can be thought of as residing on a “ring”. The membrane potential \(V^{j}_{\sigma }\) of each excitatory (\(\sigma = E\)) or inhibitory (\(\sigma = I\)) neuron satisfies

$$ \begin{array}{@{}rcl@{}} C_{\sigma} \frac{d}{dt}V^{j}_{\sigma} &=& - g_{L,\sigma}\left( V^{j}_{\sigma} - V_{L}\right) - g^{j}_{\sigma E}(t) \left( V^{j}_{\sigma} - V_{E}\right) \\ && - g^{j}_{\sigma I}(t) \left( V^{j}_{\sigma} - V_{I}\right) , \end{array} $$
(1)

where \(C_{\sigma}\), \(\sigma = E, I\), denote the capacitances, \(g_{L,\sigma}\) are the leak conductances, \(g^{j}_{\sigma E}(t)\) and \(g^{j}_{\sigma I}(t)\) are the time-dependent excitatory and inhibitory conductances, and \(V_L\), \(V_E\), and \(V_I\) are the reversal potentials. The neuron generates a spike when its voltage reaches a threshold value, \(V^{j}_{\sigma } (t = t^{j}_{\sigma })=V_{T}\). \(V^{j}_{\sigma }\) is then reset to \(V_{reset}\) and held there for \(t \in (t^{j}_{\sigma }, t^{j}_{\sigma } + \tau _{ref} )\). The biophysical parameters are \(C_E = 0.5\) nF, \(C_I = 0.2\) nF, \(g_{L,E} = 25\) nS, \(g_{L,I} = 20\) nS, \(V_L = V_{reset} = -70\) mV, \(V_E = 0\) mV, \(V_I = -70\) mV, \(V_T = -55\) mV, and \(\tau_{ref} = 2\) ms.
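With constant conductances, Eq. (1) is a linear ODE whose fixed point tells us whether a neuron can fire at all. The sketch below evaluates that fixed point and the membrane time constant \(C/g_L\) using the parameters above. Writing capacitances in pF (so that pF·mV/ms = nS·mV = pA) is our bookkeeping choice. The helper names are illustrative and do not come from the authors' code.

```python
# Parameters from the text. Capacitances are written in pF so that
# pF * (mV/ms) = pA = nS * mV, i.e. Eq. (1) can be evaluated in mV and ms.
CELLS = {
    "E": {"C": 500.0, "g_L": 25.0},  # C_E = 0.5 nF, g_L,E = 25 nS
    "I": {"C": 200.0, "g_L": 20.0},  # C_I = 0.2 nF, g_L,I = 20 nS
}
V_L, V_E, V_I, V_T = -70.0, 0.0, -70.0, -55.0  # mV


def leak_time_constant(cell):
    """Membrane time constant C / g_L in ms."""
    return CELLS[cell]["C"] / CELLS[cell]["g_L"]


def steady_state_voltage(cell, g_exc, g_inh=0.0):
    """Fixed point of Eq. (1) when g_sigmaE and g_sigmaI (nS) are held constant."""
    g_L = CELLS[cell]["g_L"]
    return (g_L * V_L + g_exc * V_E + g_inh * V_I) / (g_L + g_exc + g_inh)


def threshold_exc_conductance(cell, g_inh=0.0):
    """Constant excitatory conductance (nS) whose fixed point sits exactly at V_T.

    Above this value a neuron started at V_reset = V_L eventually reaches
    threshold and fires repetitively; below it, the neuron never spikes.
    """
    g_L = CELLS[cell]["g_L"]
    return (V_T * (g_L + g_inh) - g_L * V_L - g_inh * V_I) / (V_E - V_T)


print(leak_time_constant("E"), leak_time_constant("I"))  # 20.0 10.0
print(round(threshold_exc_conductance("E"), 3))           # 6.818
```

Because \(V_I = V_L\) in this parameter set, inhibition acts as a pure shunt. It cannot pull the voltage below rest, but it raises the excitatory conductance needed to reach threshold. For example, it doubles that conductance when \(g_{inh} = g_{L,E}\).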

The conductances \(g^{j}_{\sigma \sigma ^{\prime }}(t)\) contain an external drive, the cortico-cortical interactions within and between layers, and firing rate adaptation:

$$ \begin{array}{@{}rcl@{}} g^{j}_{\sigma E}(t) &=& g^{j}_{LGN}(t) + g^{j}_{\sigma E, c-c}(t), \\ g^{j}_{\sigma I}(t) &=& g^{j}_{\sigma I, c-c}(t) + g^{j}_{\sigma I, adapt}(t). \end{array} $$
(2)

The excitatory external drive \(g^{j}_{LGN}(t)\) loosely represents the drive from the visual grating stimuli, through the retina and the LGN, to V1. Only the neurons in the monocular layers are driven by visual stimuli. These stimuli are gratings presented to each “eye”, with the left (right) eye driving the left (right) monocular layers. This excitatory drive is represented by Poisson spike trains, producing AMPA synaptic inputs (with a decay time constant of 2 ms) to the monocular neurons. The strength of the external drive (i.e., the Poisson firing rate) to the \(j\)th monocular neuron, with orientation preference \(\theta_j\), decays with \(\theta - \theta_j\) (where \(\theta\) is the orientation of the grating) as a Gaussian with standard deviation \(\sigma = 12.0\) degrees (FWHM ≈ 28 degrees). These Poisson spike trains provide stochasticity; there is no other source of noise in the model.
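The tuned Poisson drive can be sketched as follows. This is a minimal standard-library version under two assumptions of ours. First, orientation differences wrap with period 180 degrees, which the “ring” picture suggests but the text does not state. Second, the 30 Hz peak rate is a placeholder value. The function names are hypothetical.

```python
import math
import random

N_NEURONS = 90      # neurons per subpopulation
D_THETA = 2.0       # spacing of preferred orientations (degrees)
SIGMA_DRIVE = 12.0  # width of the external-drive tuning (degrees)


def preferred_orientations(n=N_NEURONS, d_theta=D_THETA):
    """theta_j = j * d_theta for j = 1, ..., n (degrees)."""
    return [j * d_theta for j in range(1, n + 1)]


def orientation_difference(a, b):
    """a - b wrapped to [-90, 90); assumes orientations repeat every 180 degrees."""
    return (a - b + 90.0) % 180.0 - 90.0


def drive_rates(theta_stim, peak_rate_hz, sigma=SIGMA_DRIVE):
    """Poisson rate (Hz) delivered to each monocular neuron by one grating."""
    return [peak_rate_hz * math.exp(-orientation_difference(t, theta_stim) ** 2
                                    / (2.0 * sigma ** 2))
            for t in preferred_orientations()]


def poisson_spike_times(rate_hz, duration_ms, rng):
    """Homogeneous Poisson spike train built from exponential inter-spike intervals."""
    times = []
    if rate_hz <= 0.0:
        return times
    rate_per_ms = rate_hz / 1000.0
    t = rng.expovariate(rate_per_ms)
    while t < duration_ms:
        times.append(t)
        t += rng.expovariate(rate_per_ms)
    return times


fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * SIGMA_DRIVE  # about 28.3 degrees
rng = random.Random(1)
# One second of input to all 90 left-eye neurons for a grating at 36 degrees
# (the 30 Hz peak rate is a placeholder, not a value taken from the model).
trains = [poisson_spike_times(r, 1000.0, rng) for r in drive_rates(36.0, 30.0)]
```

Each spike time in `trains` would then trigger an AMPA conductance jump in the corresponding monocular neuron.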

There are three different types of visual stimulation. i) For rivalry, the left (right) column is stimulated by a grating with orientation \(\theta_L\) (\(\theta_R = \theta_L + 90\) degrees). ii) For fusion, each column is stimulated simultaneously by gratings with very similar orientations. iii) For the hysteretic transition, the left column is stimulated by a grating oriented at \(\theta_L\), while the right column is stimulated by a grating at \(\theta_R = \theta_L + \Delta\). We ramp away from fusion by slowly increasing \(\Delta\) from 0. We ramp back to fusion by fixing \(\theta_L\) sufficiently positive and slowly decreasing \(\Delta\). In both directions, \(\Delta\) changes at a speed of 2 degrees per second. In each model, the strength of stimulation is chosen to produce single-neuron firing rates in the range of 10–50 Hz.
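The hysteresis protocol reduces to a time-dependent orientation difference \(\Delta(t)\) that changes at 2 degrees per second. In the sketch below, the 90-degree maximum and the 90-degree starting value for the return ramp are hypothetical choices, because the text only says “sufficiently”. Wrapping \(\theta_R\) into [0, 180) is also our assumption.

```python
RAMP_SPEED = 2.0  # degrees per second, from the text


def ramp_away_delta(t_s, delta_max=90.0, speed=RAMP_SPEED):
    """Orientation difference Delta(t) while ramping away from fusion (starts at 0).

    delta_max is a hypothetical upper limit for the sweep.
    """
    return min(speed * t_s, delta_max)


def ramp_back_delta(t_s, delta_start=90.0, speed=RAMP_SPEED):
    """Delta(t) while ramping back toward fusion; delta_start is hypothetical."""
    return max(delta_start - speed * t_s, 0.0)


def grating_orientations(theta_left, delta):
    """(theta_L, theta_R) with theta_R = theta_L + Delta, wrapped to [0, 180)."""
    return theta_left, (theta_left + delta) % 180.0


# At 2 degrees per second, a full 0 -> 90 degree sweep takes 45 seconds.
sweep_duration_s = 90.0 / RAMP_SPEED
```

The slow rate matters. Each 45-second sweep spans many dominance periods of a few seconds each, so the network has time to settle at every intermediate orientation difference.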

Within each ocular dominance column, excitatory synaptic transmission, including the projections to binocular neurons, is mediated by AMPA type glutamate receptors. Local inhibition is mediated by GABA A type receptors. All long-range cross-column projections are excitatory, with synaptic transmission mediated by NMDA type receptors. Each AMPA or GABA A synaptic gating variable jumps instantaneously by 1 (zero rise time) when a presynaptic spike occurs. It then decays exponentially, with a time constant of 2 ms for AMPA and 10 ms for GABA A. Each neuron in the model also undergoes adaptation, which contributes to the inhibitory GABA A conductance. The adaptation efficacy increases by 0.15 with each spike and decays to zero with a time constant of 2000 ms. NMDA synaptic dynamics are modeled as follows:

$$ \begin{array}{@{}rcl@{}} \tau_{s}\frac{d}{dt} s &=& -s + \alpha \tau_{s} x (1-s), \\ \tau_{x} \frac{d}{dt} x &=& - x + \sum\limits_{j} \delta\left(t-t^{E}_{j}\right), \end{array} $$
(3)

where \(s\) is a gating variable with decay time constant \(\tau_s = 100\) ms. The rise of \(s\) to saturation is controlled by \(\alpha = 0.5\) ms\(^{-1}\). The intermediate variable \(x\) jumps instantaneously to magnitude 1 when an excitatory spike occurs presynaptically at spike time \(t^{E}_{j}\), and then decays exponentially with time constant \(\tau_x = 2\) ms.
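The synaptic variables described above can be sketched in two parts. The fast AMPA and GABA A gating variables and the adaptation variable are sums of decaying exponentials, which have a closed form. The NMDA pair \((s, x)\) of Eq. (3) is integrated with a small forward-Euler step. Following the prose, we model each presynaptic spike as incrementing \(x\) by 1. For an isolated spike this is the same as “jumping to magnitude 1”; treating bursts additively is our assumption. How these gating variables are scaled into conductances (presumably by the coupling strengths in the figure captions) is also an assumption, and it is not shown here.

```python
import math

# Time constants from the text (ms); ALPHA in 1/ms.
TAU_AMPA, TAU_GABA, TAU_ADAPT = 2.0, 10.0, 2000.0
ADAPT_INCREMENT = 0.15
TAU_S, TAU_X, ALPHA = 100.0, 2.0, 0.5


def exponential_trace(spike_times, t_eval, tau, jump=1.0):
    """Closed-form value at t_eval of a variable that jumps by `jump` at each
    presynaptic spike and otherwise decays exponentially with time constant tau
    (AMPA: tau=2, GABA_A: tau=10, adaptation: tau=2000 with jump=0.15)."""
    return sum(jump * math.exp(-(t_eval - ts) / tau)
               for ts in spike_times if ts <= t_eval)


def simulate_nmda(spike_times, t_end, dt=0.01):
    """Forward-Euler integration of the NMDA gating variables in Eq. (3).

    x is incremented by 1 at each presynaptic spike (our reading of the text)
    and decays with TAU_X; s rises toward 1 at rate ALPHA * x and decays with
    TAU_S. Returns a list of (t, s) samples in ms.
    """
    spikes = sorted(spike_times)
    k = 0
    s, x = 0.0, 0.0
    samples = []
    for step in range(int(round(t_end / dt)) + 1):
        t = step * dt
        while k < len(spikes) and spikes[k] <= t:  # deliver presynaptic spikes
            x += 1.0
            k += 1
        samples.append((t, s))
        ds = -s / TAU_S + ALPHA * x * (1.0 - s)
        dx = -x / TAU_X
        s += dt * ds
        x += dt * dx
    return samples
```

A single spike drives \(s\) to roughly 0.6, because \(x\) integrates to about \(\tau_x = 2\) ms and \(1 - e^{-\alpha \tau_x} \approx 0.63\). The factor \((1-s)\) keeps \(s\) below 1 under bursts, and the 100 ms decay makes NMDA the slow component of the cross-column input.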

The coupling strengths used in the three models are specified, together with the network architecture, in the captions of Figs. 1, 2, and 3. We emphasize that these are idealized models in which each neuron receives many fewer synaptic contacts than a real cortical neuron. We therefore do not use biophysical measurements to set the synaptic coupling strengths. Instead, the strengths are tuned to produce rivalry and fusion with realistic characteristics.

The numerical simulations are carried out with the Python package Brian 2 (Stimberg et al. 2013), using the fourth-order Runge-Kutta method. We use MATLAB (MATLAB 2018) for all data analysis.
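The plain-Python loop below is not Brian 2 code. It is a hedged single-neuron sketch of what a classical fourth-order Runge-Kutta step of Eq. (1) looks like, combined with the threshold, reset, and refractory rules above. For simplicity the conductances are held constant, whereas in the full model they vary in time. The time step of 0.05 ms is our choice. A closed-form first-spike time is included as a check on the integrator.

```python
import math

C_E, G_L_E = 500.0, 25.0           # pF and nS (C_E = 0.5 nF, g_L,E = 25 nS)
V_L = V_RESET = -70.0              # mV
V_E, V_I, V_T = 0.0, -70.0, -55.0  # mV
TAU_REF = 2.0                      # ms


def dv_dt(v, g_exc, g_inh):
    """Right-hand side of Eq. (1) for an excitatory neuron, in mV/ms."""
    return (-G_L_E * (v - V_L) - g_exc * (v - V_E) - g_inh * (v - V_I)) / C_E


def rk4_step(v, dt, g_exc, g_inh):
    """One classical fourth-order Runge-Kutta step with conductances fixed over dt."""
    k1 = dv_dt(v, g_exc, g_inh)
    k2 = dv_dt(v + 0.5 * dt * k1, g_exc, g_inh)
    k3 = dv_dt(v + 0.5 * dt * k2, g_exc, g_inh)
    k4 = dv_dt(v + dt * k3, g_exc, g_inh)
    return v + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def simulate(g_exc, g_inh=0.0, t_end=1000.0, dt=0.05):
    """Spike times (ms) of one neuron driven by constant conductances (nS)."""
    v = V_L
    refractory_steps = int(round(TAU_REF / dt))
    hold = 0
    spikes = []
    for step in range(int(round(t_end / dt))):
        if hold > 0:        # refractory: v stays at V_reset
            hold -= 1
            continue
        v = rk4_step(v, dt, g_exc, g_inh)
        if v >= V_T:        # threshold crossing: record, reset, go refractory
            spikes.append((step + 1) * dt)
            v = V_RESET
            hold = refractory_steps
    return spikes


def first_spike_time(g_exc):
    """Exact time (ms) for V to climb from V_reset to V_T under constant g_exc."""
    g_tot = G_L_E + g_exc
    v_inf = (G_L_E * V_L + g_exc * V_E) / g_tot
    return (C_E / g_tot) * math.log((v_inf - V_RESET) / (v_inf - V_T))
```

With `g_exc = 20.0` nS, the neuron fires at intervals of about 9.3 ms, which is the climb time from reset plus the 2 ms refractory period. Below roughly 6.8 nS it stays silent.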

Turning to the coupling architecture, we first describe the architecture that is common to all three models: All within-column excitatory connections target neurons of similar orientation preference; thus, these excitatory connections have weights following a Gaussian footprint as a function of difference in orientation preferences:

$$ W_{ij} = W(\theta_{i} - \theta_{j}) = \exp\left[-\frac{(\theta_{i} - \theta_{j})^{2}}{2\sigma^{2}}\right], $$
(4)

where \(\theta_i - \theta_j\) is the difference between the orientation preferences of the \(i\)th and \(j\)th neurons, with \(\sigma = 12.0\) degrees. Note that all synaptic projections across ocular dominance columns are excitatory and preferentially target neurons of similar orientation preference. The inhibitory neurons that receive cross-column excitatory projections often project non-selectively with respect to orientation themselves, with coupling weights that are equal for all differences in orientation preference. However, each of the first two models has a subpopulation of inhibitory neurons that project selectively, with a Gaussian footprint with \(\sigma = 12.0\) degrees. In primate visual cortex, inhibitory neurons near the pinwheel centers of the orientation preference map would project to all preferred orientations. Non-selective projection by inhibitory neurons is also consistent with the psychophysical observation that “cross-eye inhibition” suppresses all orientations equally (Blake and Lema 1978).
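The two footprints above can be written as 90 × 90 weight matrices. The first is the selective Gaussian of Eq. (4), and the second is the non-selective uniform footprint of the inhibitory projections. Two choices here are our assumptions. We wrap orientation differences around the 180-degree ring, and we leave the weights unnormalized, so that a projection's coupling strength (e.g., 10 nS for projection 1 in Fig. 1) would multiply \(W_{ij}\).

```python
import math

THETAS = [2.0 * j for j in range(1, 91)]  # theta_j = j * 2 degrees


def ring_difference(a, b, period=180.0):
    """a - b wrapped onto the orientation ring, in [-period/2, period/2)."""
    return (a - b + period / 2.0) % period - period / 2.0


def gaussian_weights(thetas, sigma=12.0):
    """W_ij = exp(-(theta_i - theta_j)^2 / (2 sigma^2)) as in Eq. (4), with ring wrap-around."""
    return [[math.exp(-ring_difference(ti, tj) ** 2 / (2.0 * sigma ** 2)) for tj in thetas]
            for ti in thetas]


def uniform_weights(thetas):
    """Non-selective footprint: equal weight for every difference in orientation preference."""
    return [[1.0] * len(thetas) for _ in thetas]


W_sel = gaussian_weights(THETAS)  # selective excitatory (and some inhibitory) projections
W_all = uniform_weights(THETAS)   # non-selective inhibitory projections
```

With wrap-around, every row of `W_sel` is a rotated copy of every other row, so each neuron on the ring receives the same total selective weight.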

The primary mechanism underlying binocular rivalry is common to all three models – cross-column inhibition. This inhibition is generated by excitatory neurons in the left (right) ocular dominance column projecting to inhibitory neurons in the other [right (left)] column. These cross-column excitatory projections project selectively (for orientation preference) to NMDA receptors on inhibitory neurons. To prevent rivalry, some mechanism(s) must overcome cross-column inhibition and allow similar visual stimuli to fuse. Each of the three models has its own distinct architecture and mechanism to overcome cross-column inhibition. We describe these separately for each model.

The first model (shown schematically in Fig. 1) is a three-layer model. Monocular neurons reside in the left (right) columns of the “lowest” layer. Opponency neurons reside in the left (right) columns of the middle layer; although they receive synaptic inputs from both left and right monocular neurons, we classify them by their source of excitation. Summation neurons are binocular, shared by the two columns, and constitute the model’s “top” layer. Neurons in the binocular layer receive and sum excitatory inputs from the layer of monocular neurons (projection 5 in Fig. 1, together with its symmetric partner (not shown)), and their activities are assumed to reflect the percepts. This top summation layer has 90 excitatory binocular neurons that receive inputs from excitatory monocular neurons of both eyes. It also has 90 inhibitory binocular neurons (not shown in Fig. 1) that provide local, recurrent inhibition of the excitatory binocular neurons.

External stimuli of oriented gratings on the left (right) “eye” provide Poisson spike trains that drive the left (right) monocular layer, as described above. In each ocular dominance column, the monocular layer has 90 excitatory neurons receiving external stimuli, together with 90 inhibitory neurons (not shown in Fig. 1) that receive external stimuli and recurrent excitation from excitatory monocular neurons in the same column, and in turn inhibit locally. The monocular layer also has 90 additional inhibitory neurons (explicitly shown in Fig. 1) that receive cross-column feedback excitation from excitatory opponency neurons of the other ocular dominance column (projection 4 in Fig. 1); in turn, these monocular inhibitory neurons locally inhibit the excitatory monocular neurons equally at all angles of preference (projection 6 in Fig. 1). This type of inhibition of the excitatory monocular neurons (projection 4 composed with projection 6 in Fig. 1, together with its symmetric partner (not shown)) will be referred to as cross-column feedback inhibition.

Each layer of opponency neurons contains 90 excitatory neurons and 90 inhibitory neurons. The excitatory opponency neurons receive excitatory inputs from monocular neurons of the same eye (projection 1 in Fig. 1). They also receive inhibitory inputs from inhibitory neurons of similar orientation preference in the opponency layer. These inhibitory neurons are themselves driven selectively by feedforward projections from excitatory monocular neurons of the other ocular dominance column. This pathway of inhibition of the excitatory opponency neurons (projection 2 composed with projection 3 in Fig. 1) will be referred to as cross-column feedforward inhibition.

We emphasize that, in this first model, the synaptic projections across ocular dominance columns are all excitatory and preferentially target neurons of the same orientation preferences. The monocular inhibitory neurons driven by cross-column excitation locally project to monocular neurons non-selectively with respect to orientation preference; while the inhibitory neurons in the opponency layer that participate in cross-column feedforward inhibition selectively project to excitatory opponency neurons of similar orientation preference through a Gaussian footprint with σ = 12.0 degrees.
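The “XOR” logic of the first model can be illustrated with a rate-level toy. This is not the spiking model. In the toy, monocular activity is replaced by the Gaussian drive profile. Opponency activity is same-eye excitation minus cross-column feedforward inhibition at the same orientation, rectified at zero. Feedback inhibition is the mean opponency activity of the other column, applied non-selectively. The unit weight `w_ff = 1.0` and all function names are hypothetical.

```python
import math

THETAS = [2.0 * j for j in range(1, 91)]


def ring_diff(a, b):
    return (a - b + 90.0) % 180.0 - 90.0


def monocular_profile(theta_stim, sigma=12.0):
    """Hypothetical rate stand-in for monocular activity: the Gaussian drive profile."""
    return [math.exp(-ring_diff(t, theta_stim) ** 2 / (2.0 * sigma ** 2)) for t in THETAS]


def opponency(mono_same, mono_other, w_ff=1.0):
    """Same-eye excitation (projection 1) minus cross-column feedforward inhibition
    at the same orientation preference (projections 2 and 3), rectified at zero."""
    return [max(0.0, a - w_ff * b) for a, b in zip(mono_same, mono_other)]


def feedback_inhibition(opp_other):
    """Non-selective cross-column feedback inhibition (projections 4 and 6): the mean
    opponency activity of the other column, applied to all orientation preferences."""
    return sum(opp_other) / len(opp_other)


def feedback_onto_right(theta_left, theta_right):
    """Toy feedback inhibition that left opponency neurons impose on the right column."""
    left, right = monocular_profile(theta_left), monocular_profile(theta_right)
    return feedback_inhibition(opponency(left, right))
```

For identical gratings the toy opponency neurons are silent, so no feedback inhibition is generated and fusion is the default. For orthogonal gratings they carry nearly the full monocular profile and drive strong non-selective suppression of the other eye.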

The second model (shown schematically in Fig. 2) replaces the opponency neurons with a second layer of monocular neurons that provides a second, parallel projection to the binocular neurons. It is a four-layer model, with two monocular layers (A and B), a layer of excitatory neurons providing cross-column excitation, and a binocular summation layer. The excitatory neurons in the new monocular layer B are inactive under dissimilar binocular stimulation. Their external stimulation is cancelled by local inhibition within the layer, whose inhibitory neurons are also stimulated externally. Under similar stimulation, layer B becomes active (through cross-column disinhibition) and provides a second active pathway to the binocular layer (projection 3 in Fig. 2). We find it easiest to describe this second model as a two-step conversion of the first. (i) Eliminate the opponency neurons by removing from the opponency layer the inhibitory neurons involved in cross-column feedforward inhibition. The excitatory neurons in the (formerly) opponency layer are retained as a source of cross-column excitation. (This layer is renamed the cross-column excitation layer. It provides a source of cross-column feedback inhibition, and its presence is just a matter of convenience, as this source could also be provided by the excitatory neurons in the “original” monocular layer.) (ii) Add monocular layer B. Its inhibitory neurons are themselves inhibited selectively (by projection 1 of Fig. 2) by projections from inhibitory neurons of similar orientation preference in the original monocular layer A of the same column. Its excitatory neurons are inhibited selectively by layer B’s inhibitory neurons of similar orientation preference (by projection 2 of Fig. 2). The excitatory neurons in both layers A and B project to the binocular summation neurons, providing the two parallel projection pathways.
When layer B’s inhibitory neurons are active, they silence layer B’s excitatory neurons and hence silence layer B’s projection to the summation layer. On the other hand, when layer B’s inhibitory neurons are inhibited, layer B’s excitatory neurons are disinhibited and produce a second active projection to the summation layer.

In addition to the projections that mediate the disinhibition mechanism described above, there are two additional projections (not shown in Fig. 2) from layer B to layer A’s inhibitory neurons, one excitatory and one inhibitory. These connections facilitate the disinhibition mechanism by forming a feedback loop. In this second model, all remaining local and global couplings retain their strengths from the first model. The newly introduced couplings all have strengths following a Gaussian footprint with \(\sigma = 12.0\) degrees.

In this second model, all excitatory cross-column projections selectively target neurons of the same orientation preference. The inhibitory neurons driven by cross-column excitation project to monocular layer A neurons non-selectively. By contrast, the inhibitory projections from monocular layer A to the new monocular layer B, and the inhibitory projections within layer B, are selective.
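The disinhibitory “AND” gate of layer B can also be caricatured at the rate level. This sketch is not the spiking model. In it, layer B's inhibitory neurons receive the same-eye drive and are inhibited (projection 1) in proportion to the other eye's drive at the same orientation. We use that drive as a stand-in for layer A's cross-column-driven inhibitory neurons. Layer B's excitatory neurons receive the same-eye drive minus their inhibition (projection 2). The unit weights `w1` and `w2` and the function names are hypothetical.

```python
import math

THETAS = [2.0 * j for j in range(1, 91)]


def ring_diff(a, b):
    return (a - b + 90.0) % 180.0 - 90.0


def drive(theta_stim, sigma=12.0):
    """Hypothetical rate stand-in: the Gaussian external drive of one eye."""
    return [math.exp(-ring_diff(t, theta_stim) ** 2 / (2.0 * sigma ** 2)) for t in THETAS]


def relu(v):
    return max(0.0, v)


def layer_b_excitatory(drive_same, drive_other, w1=1.0, w2=1.0):
    """Rate caricature of monocular layer B for one eye.

    inh_b: layer B inhibitory activity, driven by the same eye and inhibited
    (projection 1) in proportion to the other eye's drive at the same orientation.
    Output: layer B excitatory activity, driven by the same eye and inhibited
    by inh_b (projection 2).
    """
    out = []
    for d_s, d_o in zip(drive_same, drive_other):
        inh_b = relu(d_s - w1 * d_o)
        out.append(relu(d_s - w2 * inh_b))
    return out


def mean_layer_b(theta_left, theta_right):
    """Mean activity of the right-eye layer B for a given pair of gratings."""
    b = layer_b_excitatory(drive(theta_right), drive(theta_left))
    return sum(b) / len(b)
```

With unit weights the output reduces to \(\min(d_{same}, d_{other})\) at each orientation. This is a soft AND: layer B passes activity to the summation layer only where both eyes are driven, which for orthogonal gratings is essentially nowhere.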

The third model (shown schematically in Fig. 3) is a two-layer model, with two ocular dominance columns of monocular neurons and a binocular summation layer. There is neither an opponency layer nor a secondary monocular layer. Instead, excitatory monocular neurons in each ocular dominance column possess two types of cross-column connections, each selectively targeting neurons of similar orientation preference in the other column (standard deviation 3 degrees). The first type targets inhibitory neurons and is the source of cross-column inhibition (projection 1 in Fig. 3). The second type targets excitatory neurons and is the source of a new cross-column excitation (projection 3 in Fig. 3). In this third model, cross-column inhibition is not shut off. Instead, binocular stimulation with similar stimuli produces sufficient cross-column excitation, which, together with the external drive, overcomes the cross-column inhibition and allows fusion to occur. Under orthogonal stimulation, by contrast, cross-column excitation targets neurons that are not externally stimulated. It is therefore not strong enough to overcome cross-column inhibition, and rivalry results.
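Why does the narrow (3-degree) cross-column excitation help only similar stimuli? A quick calculation shows how much of it reaches the externally driven neuron. We assume, hypothetically, that the other column's excitatory activity follows its 12-degree Gaussian drive centred on \(\theta_L + \Delta\). Under that assumption, the excitation arriving at the neuron preferring \(\theta_L\) is a convolution of two Gaussians. In the continuum limit, it falls off with \(\Delta\) as a Gaussian of width \(\sqrt{12^2 + 3^2} \approx 12.4\) degrees.

```python
import math

THETAS = [2.0 * j for j in range(1, 91)]
SIGMA_DRIVE = 12.0  # external-drive tuning width (degrees)
SIGMA_CROSS = 3.0   # footprint of the cross-column projections (degrees)


def ring_diff(a, b):
    return (a - b + 90.0) % 180.0 - 90.0


def gauss(d, sigma):
    return math.exp(-d * d / (2.0 * sigma * sigma))


def cross_excitation(theta_target, theta_other):
    """Cross-column excitation (arbitrary units) reaching the neuron preferring
    theta_target, assuming the other column's activity profile equals its
    Gaussian drive centred on theta_other."""
    return sum(gauss(ring_diff(theta_target, tk), SIGMA_CROSS) *
               gauss(ring_diff(tk, theta_other), SIGMA_DRIVE)
               for tk in THETAS)


def relative_cross_excitation(delta):
    """Excitation onto the externally driven neuron at 36 degrees, relative to delta = 0."""
    return cross_excitation(36.0, 36.0 + delta) / cross_excitation(36.0, 36.0)


# Continuum prediction: a Gaussian in delta with sd sqrt(12^2 + 3^2), about 12.4 degrees.
SIGMA_EFF = math.sqrt(SIGMA_DRIVE ** 2 + SIGMA_CROSS ** 2)
```

For \(\Delta = 4\) degrees, about 95% of the maximal cross-column excitation still lands on the driven neuron. For \(\Delta = 90\) degrees it is smaller by more than ten orders of magnitude, which is why only cross-column inhibition remains effective under orthogonal stimulation.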

3 Results

We study each model’s responses to three different types of binocular stimulation: i) for rivalry, orthogonal gratings; ii) for fusion, similarly oriented gratings; iii) for the hysteretic transition, gratings whose orientation difference is ramped, slowly increasing (decreasing) it from a fused (rivaling) configuration.

3.1 Rivalry and fusion in the first model

Binocular rivalry of orthogonal gratings

First, we stimulate with orthogonal gratings oriented at \(\theta_L = 36\) degrees and \(\theta_R = 126\) degrees. Stimuli to the left and right eyes have identical strengths. Left monocular excitatory neurons with orientation preferences near 36 degrees (and right monocular excitatory neurons with preferences near 126 degrees) also receive stimulation through the left (right) eye. Their drive strength falls off as a Gaussian, as described in Methods.

As shown in the spike-time raster plots of Fig. 4, the model rivals when orthogonal gratings are presented dichoptically. Figure 4 shows the firing patterns of the excitatory neurons in each of the three layers (L & R monocular, L & R opponency, and binocular). The L & R monocular neurons rival: they fire in stochastic alternation, with average dominance durations of approximately 2–3 seconds (Fig. 4c). The R (L) opponency neurons have a firing pattern similar to that of the R (L) monocular neurons (Fig. 4b), because the former are driven directly by the latter. These alternations are caused by cross-column feedback inhibition (projection 4 composed with projection 6 in Fig. 1). Under this orthogonal stimulation, the excitatory opponency neurons are not inhibited by the nearby inhibitory neurons driven by cross-column feedforward projections from monocular neurons (projection 2 composed with projection 3 in Fig. 1). This is because those monocular neurons are themselves not excited (see Fig. 1). Figure 4a shows that the summation neurons also alternate stochastically. These alternations reflect the summed projections from the rivaling L and R monocular layers.

Fig. 4

The first model with orthogonal gratings: The spike-time raster plots of the excitatory neurons (labeled with their orientation preferences) of all the layers are shown in the center panel, with a top panel showing the average firing rate (averaged over the entire layer), and a right panel showing the normalized histogram of spike counts over orientation preference. a Binocular neurons are in black. b Left (right) opponency neurons are in blue (red). c Left (right) monocular neurons are in pale blue (pale red). All three layers rival

Performance details of the model’s rivalry agree qualitatively with experimental observations. The distribution of dominance durations simulated in our model follows a skewed gamma distribution (Fig. 5a), in agreement with previous work (Laing and Chow 2002). We also confirm that the model conforms to two of Levelt’s rules (Levelt 1965). Levelt’s rules 1 & 2 predict that if the stimulus contrast to one eye is fixed while that to the other eye is varied, (1) the mean dominance duration of the eye receiving the increasing input will also increase, and (2) changes in the mean dominance duration are more significant for the image with relatively higher contrast (Li et al. 2017). To test this within the model, the Poisson firing rate of the incoming spike train is held constant for the left ocular dominance column, while the firing rate to the right column is varied. As shown in Fig. 5b, the model captures the changes in mean dominance duration predicted by Levelt’s rules 1 & 2. Because the other two models realize rivalry by the same mechanism (see Methods), they also conform to these experimental observations (results not shown).

Fig. 5

Dominance duration in the first model: Panel (a) shows the distribution (blue bars) of dominance durations, obtained from a 400-second simulation of the model. The black curve is the fitted gamma distribution \(f(t) = \frac{\lambda^{r} t^{r-1} \exp(-\lambda t)}{\Gamma(r)}\) with parameters λ = 1.9104 and r = 5.0493. Panel (b) shows the mean dominance duration of the left and right eyes as a function of the rate of the Poisson spike trains to the right eye neurons. The rate of the Poisson spike trains to the left eye neurons is fixed at 30 Hz
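As a worked check on the fit in Fig. 5a, the closed-form moments of this gamma density can be computed directly from the reported λ and r. The Python sketch below assumes λ is in units of 1/s, so that durations come out in seconds; the function names are illustrative and not taken from the authors’ code.

```python
import math

def gamma_pdf(t: float, lam: float, r: float) -> float:
    """Gamma density f(t) = lam**r * t**(r - 1) * exp(-lam * t) / Gamma(r) for t > 0."""
    if t <= 0.0:
        return 0.0
    # Evaluate in log space so that lam**r and Gamma(r) cannot overflow.
    log_f = r * math.log(lam) + (r - 1.0) * math.log(t) - lam * t - math.lgamma(r)
    return math.exp(log_f)

def gamma_summary(lam: float, r: float) -> dict:
    """Closed-form statistics of the rate-parameterized gamma distribution."""
    return {
        "mean": r / lam,              # mean dominance duration
        "mode": (r - 1.0) / lam,      # peak of the density (valid for r > 1)
        "std": math.sqrt(r) / lam,    # standard deviation
        "cv": 1.0 / math.sqrt(r),     # coefficient of variation (std / mean)
    }

# Fitted parameters reported for Fig. 5a.
LAM, R = 1.9104, 5.0493
for name, value in gamma_summary(LAM, R).items():
    print(f"{name}: {value:.3f}")
```

With these parameters the mean is r/λ ≈ 2.64 s and the mode (r − 1)/λ ≈ 2.12 s, consistent with the average dominance duration of roughly 2–3 seconds noted for Fig. 4c; the coefficient of variation 1/√r ≈ 0.45 and the skewness 2/√r ≈ 0.89 quantify the spread and asymmetry of the histogram.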

Binocular fusion of similarly oriented gratings

Psychophysics (Kaufman and Arditi 1976) shows that similar images fuse. To study fusion in the first model, we dichoptically stimulate the two ocular dominance columns (L, R) with an oriented grating and a grating of nearby orientation, respectively. Figure 6 shows the first model’s response when the gratings are separated by only 2 degrees. Because the oriented gratings that stimulate each column are very similar, they evoke largely overlapping neuronal activity in the left and right ocular dominance columns. As a result, the excitatory and inhibitory drives of the excitatory opponency neurons cancel each other, so that the opponency neurons rarely fire (Fig. 6b&c). These silenced opponency neurons cannot drive cross-column feedback inhibition, which allows persistent firing of the excitatory monocular neurons (Fig. 6d&e). The binocular neurons sum the excitation from both eyes, and the resulting “percept” is a combination of the two stimulated orientations. As shown in Fig. 6a, the images fuse, as described in the experimental literature (Nelson 1975).

Fig. 6

The first model with similar gratings: The general layout follows Fig. 4. The two gratings are 2 degrees apart. a Binocular neurons are in black. b,c Left (right) opponency neurons are in blue (red), with several spikes at stimulus onset. d,e Left (right) monocular neurons are in pale blue (pale red). The binocular layer fuses the left and right monocular layers, while the opponency layer is generally inactive

More trivially, identical images (such as gratings at the same orientation, or identical plaids shown to both eyes) must fuse. In all three of our models, identical images fuse (not shown).

NMDA mediates opponency mechanism more effectively than AMPA

The models show that NMDA-type receptors, owing to their relatively long and sustained time course, facilitate the opponency mechanism underlying fusion. To demonstrate this within the first model, we replaced the NMDA decay time constant with a much smaller time constant (2 ms), like that of AMPA-type receptors, and found (not shown) that the monocular neurons always rival, even when identical stimuli are presented to both columns. This difference arises because NMDA’s longer, sustained time course provides a higher level of cross-column feedforward excitation to the inhibitory neurons, which in turn inhibit the excitatory opponency neurons more effectively than under the briefer AMPA time course. With AMPA-like kinetics, the opponency neurons therefore remain active and drive cross-column feedback inhibition, which results in rivalry even for identical stimuli. For fusion to occur with cross-column excitation mediated by AMPA rather than NMDA receptors, stronger cross-column excitation would be required.
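The time-constant argument can be made concrete with a generic shot-noise calculation. The Python sketch below treats a synapse as an exponentially decaying conductance driven by Poisson spikes, which is a simplification and not the model’s synapse. The 2 ms AMPA-like decay comes from the text, while the 80 ms NMDA-like decay, the 30 Hz input rate and the unit weight are illustrative assumptions. For such a synapse, Campbell’s theorem gives a mean conductance wντ and a coefficient of variation 1/√(2ντ), so a longer decay both raises the mean drive and smooths its fluctuations.

```python
import math
import random

def simulate_conductance(tau_ms, rate_hz, w=1.0, t_max_ms=20_000.0, dt_ms=0.1, seed=0):
    """Exponentially decaying conductance driven by Poisson input spikes.

    dg/dt = -g / tau + w * sum_k delta(t - t_k). Returns (mean, cv) of g over time.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt_ms / tau_ms)    # exact decay over one time step
    p_spike = rate_hz * dt_ms * 1e-3     # Bernoulli approximation of a Poisson train
    n_steps = int(t_max_ms / dt_ms)
    g = s1 = s2 = 0.0
    for _ in range(n_steps):
        g *= decay
        if rng.random() < p_spike:
            g += w
        s1 += g
        s2 += g * g
    mean = s1 / n_steps
    var = max(s2 / n_steps - mean * mean, 0.0)
    return mean, math.sqrt(var) / mean

def campbell_prediction(tau_ms, rate_hz, w=1.0):
    """Shot-noise theory: mean = w * rate * tau, cv = 1 / sqrt(2 * rate * tau)."""
    rt = rate_hz * tau_ms * 1e-3
    return w * rt, 1.0 / math.sqrt(2.0 * rt)

TAU_AMPA_MS = 2.0    # AMPA-like decay used in the text
TAU_NMDA_MS = 80.0   # illustrative assumption, not the model's value
RATE_HZ = 30.0       # illustrative input rate

for label, tau in (("AMPA-like", TAU_AMPA_MS), ("NMDA-like", TAU_NMDA_MS)):
    sim_mean, sim_cv = simulate_conductance(tau, RATE_HZ)
    th_mean, th_cv = campbell_prediction(tau, RATE_HZ)
    print(f"{label}: mean {sim_mean:.3f} (theory {th_mean:.3f}), "
          f"cv {sim_cv:.2f} (theory {th_cv:.2f})")
```

In this setting the NMDA-like conductance carries roughly τ_NMDA/τ_AMPA = 40 times the mean drive of the AMPA-like one for the same input, with much smaller relative fluctuations; whether that suffices to silence the opponency neurons depends on synaptic weights that this sketch does not model.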

3.2 Rivalry and fusion in the second model

The second model is realized as a four-layer model, with the opponency layer replaced by a purely excitatory layer, together with the addition of a second layer of monocular neurons (monocular layer B; the original monocular layer is layer A). Both the excitatory and inhibitory neurons in layer B are externally driven, and are balanced so that the external excitatory drive on the layer B excitatory neurons is cancelled by that layer’s local inhibition; thus, the projection from layer B to the binocular summation neurons is silent unless this balance is broken by the other layers of the model.

Because cross-column feedback inhibition is always present, the left and right column monocular neurons in layer A always rival, under both similar and distinct stimuli. In addition, the layer A inhibitory neurons also project selectively to the inhibitory neurons in monocular layer B. Thus, the inhibitory neurons in layer B can be inhibited, providing local “dis-inhibition” to that layer’s excitatory neurons. The layer B excitatory neurons are active only when this feedback dis-inhibition is present, which in turn occurs only when both columns receive similar stimuli. When active, this dis-inhibition breaks the excitatory-inhibitory balance in layer B and releases the layer B excitatory monocular neurons in the left (right) column in temporal phase with the layer A excitatory monocular neurons in the other [right (left)] column. That is, the rivalry of layer B neurons is out of phase with the rivalry of layer A neurons, and the two parallel projections to the binocular neurons sum together at the binocular layer into a coherent “perception” of fusion.
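The timing argument can be illustrated with binary rates. In the Python sketch below, each monocular population is either active (1) or silent (0), spiking and orientation tuning are ignored, and the function names are hypothetical. Layer B in each eye is released in phase with layer A of the other eye, as described in the text; the per-eye sum entering the binocular layer is then constant even though both monocular layers rival.

```python
import random

def layer_a_schedule(n_epochs, dt=0.01, shape=5.0493, rate=1.9104, seed=0):
    """Idealized layer-A rivalry: 1 while the left eye dominates, 0 while the right does.

    Epoch durations (s) are drawn from a gamma distribution; reusing the Fig. 5a
    fit here is purely an illustrative choice.
    """
    rng = random.Random(seed)
    left, samples = 1, []
    for _ in range(n_epochs):
        duration = rng.gammavariate(shape, 1.0 / rate)   # arguments: shape, scale = 1 / rate
        samples.extend([left] * max(1, int(duration / dt)))
        left = 1 - left
    return samples

def binocular_inputs(a_left_samples):
    """Per-eye drive (left, right) reaching the binocular layer in this toy version of model 2."""
    inputs = []
    for a_left in a_left_samples:
        a_right = 1 - a_left                  # layer A: left and right alternate
        b_left, b_right = a_right, a_left     # layer B: in phase with layer A of the other eye
        inputs.append((a_left + b_left, a_right + b_right))
    return inputs

schedule = layer_a_schedule(n_epochs=10)
print(set(binocular_inputs(schedule)))  # {(1, 1)}: each eye's contribution is constant
```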

Binocular rivalry of orthogonal gratings

The response of model 2 to orthogonal binocular stimulation is shown in Fig. 7. Indeed, the new monocular layer B is silent (Fig. 7c), the original monocular layer A rivals (Fig. 7b), and the summation layer inherits this rivalry (Fig. 7a). Because this rivalry is driven solely by monocular layer A, it is identical to the rivalry in model 1, with identical stochastic characteristics (not shown).

Fig. 7

The second model with orthogonal gratings: The general layout follows Fig. 4. a Binocular neurons are in black. b Left (right) monocular neurons of monocular layer A are in blue (red). c Left (right) monocular neurons of monocular layer B are in pale blue (pale red). The monocular layer A and the binocular layer rival within each layer, while monocular layer B is inactive

Binocular fusion of similarly oriented gratings

Figure 8 shows the response of model 2 to binocular stimulation by gratings of similar orientation, separated by 2 degrees. The original monocular layer A continues to rival (Fig. 8b); the new monocular layer B is now active and rivals out of phase with layer A (Fig. 8c); and the two parallel projections produce fusion in the binocular layer (Fig. 8a). The same mechanism causes identical binocular images, such as plaids, to fuse into a single image at the summation layer (not shown).

Fig. 8

The second model with similar gratings: The general layout follows Fig. 4. The two gratings are 2 degrees apart. a Binocular neurons are in black. b Left (right) monocular neurons of monocular layer A are in blue (red). c Left (right) monocular neurons of monocular layer B are in pale blue (pale red). Left (right) monocular neurons rival within each monocular layer (A and B), but the two rivalries are exactly out of phase. Thus, the responses to each eye’s stimulus fuse in the binocular layer

3.3 Rivalry and fusion in the third model

The third model is a two-layer model: an externally driven monocular layer that projects to a binocular summation layer (see Fig. 3). The excitatory monocular neurons have two essential types of cross-column projections: i) cross-column inhibition, which broadly inhibits the receiving column’s monocular excitatory neurons; and ii) cross-column excitation, in which the monocular excitatory neurons selectively target monocular excitatory neurons of similar orientation preference in the other column. Under similar binocular stimulation, this additional direct and selective cross-column excitation overcomes the cross-column inhibition and allows similar stimuli to fuse.

Binocular rivalry of orthogonal gratings

When the model is binocularly stimulated by orthogonal gratings, cross-column excitation is not effective in driving the monocular excitatory neurons in the other column. To see this, recall that the monocular excitatory neurons in the left column are driven by external gratings of orientation 𝜃L, and thus their cross-column excitatory projections selectively target excitatory neurons in the right column with orientation preference near 𝜃L. However, the orthogonal external stimulus to the right column does not drive these neurons; thus, the total excitation of these right-column excitatory neurons is not strong enough to overcome their cross-column inhibition, and rivalry occurs (see Fig. 9).

Fig. 9

The third model with orthogonal gratings: The general layout follows Fig. 4. a Binocular neurons are in black. b Left (right) monocular neurons are in blue (red). Both layers rival

Binocular fusion of similarly oriented gratings

On the other hand, when the model is binocularly stimulated by gratings of similar orientation, the monocular neurons are driven strongly by both the external drive and the cross-column excitation, which together overcome cross-column inhibition and allow similarly oriented gratings to fuse (see Fig. 10). We note that in the fused state of the third model, the spread of the monocular neurons’ raster plot across orientation preference is much narrower than in rivalry. This is because the spread is determined by the cross-column excitation, which has tight orientation selectivity (standard deviation of 3 degrees), so the driven neurons common to both columns benefit most from it.
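To see why cross-column excitation helps only for similar orientations, one can evaluate a selective excitatory profile on a 180-degree orientation ring. The Python sketch below assumes a Gaussian profile normalized to a peak of 1, which is our illustrative choice; only the 3-degree standard deviation comes from the text.

```python
import math

def ring_distance(theta1, theta2, period=180.0):
    """Smallest orientation difference (degrees) on a ring with the given period."""
    d = abs(theta1 - theta2) % period
    return min(d, period - d)

def cross_column_weight(theta_pre, theta_post, sd=3.0):
    """Assumed Gaussian profile of selective cross-column excitation (peak = 1)."""
    d = ring_distance(theta_pre, theta_post)
    return math.exp(-d * d / (2.0 * sd * sd))

# Left-column neurons driven at 36 degrees target right-column neurons near 36 degrees.
for theta_right in (38.0, 46.0, 126.0):   # similar, moderately different, orthogonal
    print(theta_right, f"{cross_column_weight(36.0, theta_right):.3g}")
```

With a 3-degree width, a 2-degree offset keeps about 80% of the peak weight (exp(−4/18) ≈ 0.80), a 10-degree offset keeps well under 1%, and an orthogonal offset gives essentially zero, which matches the qualitative contrast between the orthogonal and similar-grating cases.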

Fig. 10

The third model with similar gratings: The general layout follows Fig. 4. The two gratings are 2 degrees apart. a Binocular neurons are in black. b,c Left (right) monocular neurons are in blue (red). The binocular layer fuses the left and right monocular layers

3.4 Hysteretic transition between rivalry and fusion

In all three models, the transitions between fusion and rivalry are hysteretic, as they are in psychophysical experiments (Buckthought et al. 2008). To investigate these transitions, we follow Wilson (2017) and very slowly (2 degrees per second) change the difference in angle between the gratings stimulating the left and right eyes: first by slowly increasing this difference from identical angles (to study the transition from the fused state to the rivalry state), and then by slowly decreasing it from distinct angles (to study the transition from rivalry to fusion). The results for all three models are shown in Figs. 11 and 12, where, for each model, the average firing rates of the binocular layer are shown for both cases [“diverging” (increasing Δ𝜃) and “converging” (decreasing Δ𝜃)]. In all three models, the hysteretic response is clear, with the transition on the converging (Δ𝜃 decreasing) branch occurring at a smaller angle than on the diverging branch. The transition values of Δ𝜃 (from fusion to rivalry and from rivalry to fusion) and the width in Δ𝜃 of the bistable region are in reasonable agreement with psychophysics (Buckthought et al. 2008). Note that in model 3, the average firing rate of the binocular neurons is very similar in the rivalry and fusion states, because cross-column inhibition is always present, even for similar stimuli that fuse. We therefore find it clearest to use the average firing rates of the monocular neurons to identify the transition angles (see Fig. 12b & c).
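The ramp protocol and the read-out of transition angles can be sketched as follows. The speed of 2 degrees per second and the 0 to 40 degree range follow Fig. 11; the 0.1 s sampling step and the hypothetical `toy_switch` function, with made-up fusion and rivalry thresholds of 8 and 12 degrees, are assumptions that merely stand in for a model’s fused/rival classification, which in practice would have to be derived from the simulated firing rates.

```python
def ramp(delta_start, delta_end, speed=2.0, dt=0.1):
    """Orientation difference (degrees), sampled every dt seconds, ramped at `speed` deg/s."""
    n = int(round(abs(delta_end - delta_start) / (speed * dt)))
    step = (delta_end - delta_start) / n
    return [delta_start + i * step for i in range(n + 1)]

def toy_switch(deltas, fuse_below, rival_above, start_fused):
    """Hypothetical hysteretic switch standing in for a model's fused (True) / rival state."""
    fused, states = start_fused, []
    for d in deltas:
        if fused and d > rival_above:
            fused = False
        elif not fused and d < fuse_below:
            fused = True
        states.append(fused)
    return states

def transition_angle(deltas, states):
    """First orientation difference at which the state departs from its initial value."""
    for d, s in zip(deltas, states):
        if s != states[0]:
            return d
    return None

diverging = ramp(0.0, 40.0)      # fusion-to-rivalry branch, 20 s long
converging = ramp(40.0, 0.0)     # rivalry-to-fusion branch
div_angle = transition_angle(diverging, toy_switch(diverging, 8.0, 12.0, True))
conv_angle = transition_angle(converging, toy_switch(converging, 8.0, 12.0, False))
print(f"diverging {div_angle:.1f}, converging {conv_angle:.1f}, "
      f"width {div_angle - conv_angle:.1f}")
```

Because the state is sampled on a discrete grid, the recovered transition angles (here about 12.2 and 7.8 degrees) can differ from the underlying thresholds by up to one ramp step (0.2 degrees), so the measured hysteresis width is resolved only to that precision.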

Fig. 11

Hysteresis in models 1 and 2: Two gratings are presented dichoptically; they start 40 degrees (0 degrees) apart and converge (diverge) to 0 degrees (40 degrees) apart at a speed of 2 degrees per second. Average firing rates (averaged over the entire binocular layer) are plotted. a The binocular layer’s average firing rate in model 1 with converging (diverging) stimuli is plotted in blue (red). b The binocular layer’s average firing rate in model 2 with converging (diverging) stimuli is plotted in blue (red). The vertical dotted lines indicate the transition angles between rivalry and fusion; the widths of hysteresis are around 4.0 and 6.6 degrees for models 1 and 2, respectively

Fig. 12

Hysteresis in model 3: The stimuli follow Fig. 11. a The firing rate of the binocular neurons under the converging (diverging) stimuli is plotted in blue (red). b The firing rate of left (right) monocular neurons under the diverging stimuli is plotted in blue (cyan). c The firing rate of left (right) monocular neurons under the converging stimuli is plotted in red (magenta). The vertical dotted lines indicate the transition angles between rivalry and fusion, and the width of hysteresis is around 10.65 degrees. The monocular neurons’ firing rates are much more informative about the transition angles of the hysteresis

4 Discussion

Summary

We have studied point neuron ring models of the front-end initiation of binocular rivalry, fusion, and the hysteretic transition between rivalry and fusion. It is generally accepted (see, e.g., Laing and Chow 2002; Wilson 2003; Shpiro et al. 2007; Moreno-Bote et al. 2007) that the mechanism underlying binocular rivalry is cross-column inhibition; thus, the issue is how this inhibition is overcome under similar stimulation to each eye, allowing fusion. Model 1 is a three-layer model that uses a layer of opponency neurons to shut off cross-column inhibition. Model 2 is a four-layer model that allows cross-column inhibition to be always present (by eliminating opponency neurons), and introduces an additional layer of monocular neurons as a parallel pathway projecting to the binocular neurons; the two pathways work together to fuse similar images at the binocular layer. Model 3 is a two-layer model that introduces no additional layers of neurons, but includes selective cross-column excitation to overcome cross-column inhibition; together, the cross-column excitation and inhibition allow similar images to fuse.

Interpretation as logic gates

The first model (shown schematically in Fig. 1) realizes an “exclusive or” (XOR) circuit through opponency neurons, as can be seen explicitly as follows. The layer of opponency neurons serves as logic gates, one gate for each pair of opponency neurons of identical orientation preference. Each gate has two input ports: i) the feedforward excitation & inhibition to the left & right opponency neurons from left monocular neurons, and ii) the feedforward excitation & inhibition to the right & left opponency neurons from right monocular neurons; and one output: the feedback inhibition on monocular neurons. This output is active only when either the left OR the right monocular neuron (but not both) provides active drive. It is inactive when neither the left nor the right monocular neuron fires (no excitatory drive), or when both are active (feedforward cross-column inhibition on the opponency neurons cancels same-column excitation).
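The XOR behavior can be checked with a rectified-linear rate caricature of one gate. In the Python sketch below, unit gains and exact cancellation of same-column excitation by cross-column inhibition are simplifying assumptions, and the function names are hypothetical; the spiking model realizes this behavior only approximately.

```python
def relu(x: float) -> float:
    """Rectification: firing rates cannot be negative."""
    return max(0.0, x)

def opponency_gate(left: float, right: float) -> float:
    """Toy rate version of one XOR gate (one orientation preference).

    left, right: activity of the L and R monocular neurons (arbitrary units).
    The L opponency neuron gets same-column excitation `left` and cross-column
    inhibition proportional to `right`; the R opponency neuron gets the reverse.
    Returns the total drive for cross-column feedback inhibition.
    """
    left_opp = relu(left - right)
    right_opp = relu(right - left)
    return left_opp + right_opp

for left_in in (0.0, 1.0):
    for right_in in (0.0, 1.0):
        print(int(left_in), int(right_in), opponency_gate(left_in, right_in))
```

For graded inputs the output is |left − right|, so nearly equal monocular activity (for example 1.0 and 0.9) yields only a weak drive for feedback inhibition, mirroring how similar stimuli silence the opponency layer.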

Alternatively, the second model (shown schematically in Fig. 2) replaces the layer of opponency neurons (the XOR gate) with a second monocular layer B whose excitatory neurons serve as AND gates, one for each orientation preference, as can be seen explicitly as follows. The excitatory neurons in monocular layer B have two input ports: i) the excitation by the visual stimuli and ii) the feedback dis-inhibition from monocular layer A; and one output: the projection to the binocular summation layer. In this case, the output projection is active if and only if the excitation from the visual stimuli and the feedback dis-inhibition from monocular layer A are both active. In all other cases, the output is inactive, owing to the absence of stimulation, of dis-inhibition, or of both. This logic-gate interpretation provides an intuitive description of the different mechanisms by which the two models overcome cross-column inhibition: under similar stimulation, the XOR mechanism “shuts off” cross-column inhibition, while the AND mechanism “turns on” the second pathway from monocular layer B to the binocular layer.
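The layer B gate admits an analogous rate caricature. The Python sketch below assumes that dis-inhibition removes a fraction of the balanced local inhibition and that the output is rectified; both are simplifications, and the function name is hypothetical.

```python
def layer_b_gate(stimulus: float, disinhibition: float) -> float:
    """Toy rate version of one layer-B AND gate (one orientation preference).

    stimulus: external drive (arbitrary units) shared by layer-B E and I cells.
    disinhibition: fraction in [0, 1] of layer-B local inhibition removed by
    feedback from layer A. With no dis-inhibition the local inhibition exactly
    balances the external drive, so the output is silent.
    """
    local_inhibition = stimulus * (1.0 - disinhibition)
    return max(0.0, stimulus - local_inhibition)   # rectified output to binocular layer

for stim in (0.0, 1.0):
    for disinh in (0.0, 1.0):
        print(int(stim), int(disinh), layer_b_gate(stim, disinh))
```

In this form the output equals stimulus × disinhibition for non-negative inputs, which reduces to AND on binary inputs.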

From the perspective of dynamical systems

Each of the three models is a high-dimensional stochastic dynamical system of dimension 3 × N, where N is the total number of neurons, each described by three variables (voltage v and conductances gE and gI). The source of noise is the Poisson process representing the visual (LGN) drive to the system. In all three models, the rivalry state consists of two meta-stable states of the dynamical system, representing the two (left and right) percepts. The long-time rivalry state approached by the dynamical system has random, noise-induced temporal jumps between these two meta-stable states. Without noise, under distinct dichoptic visual stimulation, each model would have two stable states (one for each percept), each with its own “basin of attraction”, and the stable state approached would depend on which basin of attraction contains the initial condition.
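The distinction between stable states and noise-driven switching between meta-stable states can be illustrated with a one-dimensional caricature. The Python sketch below integrates a particle in a double-well potential, dx = (x − x³) dt + σ dW, with the Euler–Maruyama method; the two wells at x = ±1 stand in for the two percepts. This is not a reduction of the spiking models, and the potential, noise amplitude and switch thresholds are assumptions.

```python
import math
import random

def simulate_double_well(x0, sigma, t_max=2000.0, dt=0.01, seed=0):
    """Euler-Maruyama integration of dx = (x - x**3) dt + sigma dW.

    The stable states x = -1 and x = +1 stand in for the two percepts.
    Returns the trajectory sampled at every step.
    """
    rng = random.Random(seed)
    x, xs = x0, []
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(t_max / dt)):
        x += (x - x ** 3) * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs

def count_switches(xs, threshold=0.5):
    """Count transitions between the wells, using +/-threshold to ignore jitter near x = 0."""
    state, switches = None, 0
    for x in xs:
        new = 1 if x > threshold else (-1 if x < -threshold else state)
        if state is not None and new != state:
            switches += 1
        state = new
    return switches

print(simulate_double_well(0.3, 0.0, t_max=50.0)[-1])    # no noise: settles near +1
print(simulate_double_well(-0.3, 0.0, t_max=50.0)[-1])   # no noise: settles near -1
print(count_switches(simulate_double_well(1.0, 0.8)))    # with noise: random switches
```

With σ = 0 the final state is fixed by the sign of the initial condition, that is, by its basin of attraction; with σ > 0 the trajectory hops between the wells at random times, analogous to rivalry alternations.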

On the other hand, the dynamics of binocular fusion of similarly oriented gratings in model 2 is very different from that in models 1 and 3. In the first and third models, a single dynamical state, which is stable, represents the fused percept. In contrast, the second model has two distinct meta-stable states, which rival out of phase, as is apparent in the raster plots of the two monocular layers in Fig. 8. Note, however, that these two states possess identical firing patterns in the layer of binocular summation neurons, representing the fused percept. We also note that in all three models, with low-contrast orthogonal input, the rivalry state gives way to a fused state (not shown). This is consistent with the psychophysical observation (Liu et al. 1992) that rivalry is lost to fusion near the detection threshold.

Possible experimental tests of the model mechanisms

Could differences in the response properties of the three models be detected experimentally, and thus identify which mechanism is present in the real cortex? The three models have very similar response properties in the rivalry regime, that is, when stimulated by distinct visual patterns such as orthogonal gratings. However, they differ in how they fuse similar visual patterns and in how they transition between fusion and rivalry. These differences are not apparent in the response properties of the binocular summation neurons, but they do appear in the response properties of the models’ (left vs right) monocular neurons. In the first and third models, the monocular neurons do not rival in the fused state (Figs. 6 and 10). In the second model, with two monocular layers, the monocular neurons always rival in the fused state, with the rivalry in the two monocular layers out of phase with each other (Fig. 8). This out-of-phase rivalry might be measurable experimentally. (An fMRI experiment (Xu et al. 2016) shows that large patches of V1 cortex do not rival during fusion, although “monocular layer B” could be below the resolution of fMRI.) Distinguishing between the mechanisms that underlie fusion in the first and third models seems more difficult. One distinction is that in the first model, fusion is allowed by shutting off cross-column feedback inhibition, while in the third model, cross-column inhibition is always present but is overcome by cross-column excitation in the fusion regime. This distinction with respect to inhibition might be measurable, perhaps by combining pharmacology with electrophysiology.

Relationship to other work

Although our models have similarities to earlier models in the literature, we believe that our work is the first to study three distinct mechanisms for overcoming the cross-column inhibition underlying rivalry, and thus the first to construct explicitly three distinct computational models that combine fusion, rivalry, and the transition between them. While our point neuron ring models are idealized, they employ the same components as realistic large-scale models of V1: conductance-based integrate-and-fire point neurons with adaptation; excitatory, orientation-selective, long-range connections mediated at the time scales of NMDA-type receptors; and inhibition mediated at the time scales of GABAA-type receptors. The one-dimensional ring architecture is the primary idealization in our models. Most earlier models addressed rivalry, but not fusion. While mean-field firing rate models of rivalry were most common, some previous studies have employed point neuron models (e.g., Wilson 2003; Moreno-Bote et al. 2007), including point neuron models with ring structures (e.g., Laing and Chow 2002; Cohen et al. 2019). The work of Laing and Chow (2002) provides a rather complete study of rivalry between Hodgkin-Huxley point neurons driven by two gratings with distinct orientations; however, they do not model ocular dominance columns and do not study fusion. The work of Cohen et al. (2019) addresses the operating states of idealized balanced cortical networks displaying rivalry, which is important because large-scale comprehensive neuronal networks mostly operate in a balanced state. While we have not focused on the operating points of our ring models, we note that the monocular layers exhibit asynchronous firing, one of the most important properties of balanced networks. Said and Heeger (2013) and Li et al. (2017) used mean firing rate models with divisive normalization to realize opponency neurons, which they introduced so that distinctly oriented gratings rival while identical plaids fuse. Recently, Wilson (2017) developed a rather complete rate model that captures rivalry, fusion of plaids, and the hysteretic transition between fusion and rivalry within a single model with XOR circuitry. That model’s realization of fusion differs somewhat from that of our first model, in that the inhibitory neurons mediating rivalry in Wilson’s model target only orientations away from the orientation preference of the projecting neuron. This causes rivalry to cease under similar stimulation and allows fusion. In our first model, the inhibitory neurons mediating mutual inhibition target all orientations, and fusion is achieved by using similar stimulation to “shut off” that inhibition.

Remark on “perceived contrast”

In this work, we have not studied contrast perception during monocular and binocular viewing. Nonetheless, model 3 has an interesting response property related to “perceived contrast”: the binocular neurons’ firing rates (interpreted as “perceived contrast”) have similar values in the fused and rivalrous states under the same input contrast (Fig. 12), similar to what experiments have found during binocular and monocular viewing (Legge and Rubin 1981; Wilson 2017). This contrasts with models 1 and 2, whose binocular response in the rivalry state is much lower than in the fused state (Fig. 11). This one special case (with equal contrast to each eye) hints that model 3 might have realistic contrast perception. In this case, we note that in model 3 the balance between cross-column excitation and inhibition itself normalizes the binocular responses, instead of requiring an additional global inhibition such as the divisive normalization that has been shown to produce realistic contrast perception in the firing rate model of Wilson (2017).

An AND pathway may be necessary

There has been a discussion in the literature (Wolfe 1986; Blake and O’Shea 1988; Blake 1989; Wolfe 1988) about whether AND or XOR is the more likely biological mechanism. Although Wolfe argued for the AND pathway, later experiments and computational studies focused almost solely on the XOR theory. Psychophysical experiments (Blake and Boothroyd 1985) were key to the debate, with Blake and O’Shea (1988) arguing that the AND theory would have difficulty explaining them. Here, by comparing the performance of our first and second models on one variant of these experiments, we re-examine the almost forgotten, but possibly valid, AND mechanism of binocular vision (see the extensive review by Wolfe 1986).

The psychophysical finding in Blake and Boothroyd (1985) may be briefly summarized as follows. A vertical grating is presented to each eye, with an additional horizontal grating presented to only one of the two eyes. The percept is a stable plaid. When the contrast of each of the three gratings is separately and abruptly decreased to a very low contrast, no difference is observed in the ability to detect the abrupt contrast reduction of any of the three gratings, and detection is comparable to that under fusion (when only the two vertical gratings are present). This result leads to the conclusion that the horizontal grating is not suppressed under this stimulus setup; if it were suppressed, say by rivalry with the other eye’s vertical grating, its contrast reduction would be more difficult to detect. Under the AND conceptual theory, the horizontal grating could activate only the rivalry pathway (having no similar stimulus in the other eye), but not the fusion pathway, and thus the horizontal grating should be intermittently suppressed; this leads to the conclusion that the AND theory would predict that the contrast decrement of the horizontal grating is more difficult to detect. However, Wolfe argued that this is not the case: the horizontal grating would not be suppressed because of a permanent dominance of the eye receiving both vertical and horizontal stimulation (Wolfe 1988).

When presenting the stimuli described above to our first two models, we found that both models realize Wolfe’s explanation: the eye receiving both the horizontal and vertical gratings remains constantly dominant (Fig. 13). This supports Wolfe’s argument that the eye with the horizontal grating can always be dominant. Nevertheless, the two models respond differently to these stimuli. (i) In the first (XOR) model, this constant dominance nullifies fusion of the two vertical gratings (see Fig. 13a&b). The cross-column feedback inhibition originating from the eye stimulated with both vertical and horizontal gratings is too strong to be shut off by the other eye, which is driven only by the single vertical grating; thus, in model 1 the monocular neurons driven by that eye’s single vertical grating soon become inactive. We note that model 3 performs similarly to model 1 for these visual stimuli. (ii) In the second (AND) model, the pathway for the eye with a single vertical grating remains dis-inhibited owing to the constant dominance of the other eye, allowing fusion of the vertical stimuli from the two eyes (see Fig. 13c&d). Thus, in the second model all three stimuli are “perceived”, while in the first and third models only the vertical-horizontal (plaid) stimulus from one eye is “perceived”, and the lone vertical stimulus in the other eye is not. Model 2 would predict no difference in the ability to detect an abrupt decrease in the contrast of any one of the three grating stimuli; in contrast, in models 1 and 3 the lone vertical grating would not be detected. This does not account for the full list of experiments and interpretations discussed in O’Shea (1987); however, it does suggest that the XOR mechanism does not provide a complete view of binocular vision, and that some form of parallel pathway mechanism might be needed in the visual system.

Fig. 13

Monocular responses to the stimuli of Blake and Boothroyd (1985): The stimuli are as follows: two orthogonal gratings (36 degrees and 126 degrees) to the left eye, and a single grating (36 degrees) to the right eye. The raster plots show the responses of excitatory neurons in the monocular layers of model 1 (left panel) and model 2 (right panel). a, b Left (right) monocular neurons of model 1 are in blue (red, with very few spikes). c Monocular neurons of the left monocular layer A (B) of model 2 are shown as blue dots (cyan triangles; all are inactive, so none are shown). d Monocular neurons of the right monocular layer B (A) of model 2 are shown as red dots (magenta triangles, only at stimulus onset). In model 1, the lone 36 degree grating in the right eye is suppressed indefinitely, while in model 2 the same grating is preserved by monocular layer B of the right eye, consistent with Blake and Boothroyd (1985)

Major assumptions in our models

First, in all three models, all cross-column projections are assumed to be excitatory. Since these cross-column projections must be at least moderately long-range, we consider it more realistic to model them as excitatory, selectively targeting NMDA receptors on local inhibitory neurons with similar orientation preferences. These assumptions have a basis in experiment. Long-distance excitatory projections are known to target neurons of similar orientation preference (Bosking et al. 1997), and are known to target NMDA (as well as AMPA) receptors. Moreover, NMDA-type receptors do occur on inhibitory interneurons in the cerebral cortex (see, e.g., Jones and Bühl 1993; Maccaferri and Dingledine 2002; Homayoun and Moghaddam 2007; Wang and Gao 2009), and in visual cortex (see, e.g., Huntley et al. 1994; Wong-Riley et al. 1998). There is also some evidence (Homayoun and Moghaddam 2007) that the NMDA receptors on inhibitory neurons can dominate those on excitatory neurons, an additional assumption necessary in our models. Also, all three models require that local inhibition act selectively in some sub-populations and broadly, across all orientation preferences, in other sub-populations. While there is evidence supporting these assumptions, taken together they do constitute rather strong theoretical assumptions underlying the models. Alternatively, one could represent cross-column projections as inhibitory, with projections of moderate spatial extent that activate slow-time-scale inhibitory receptors. As the existence of long-range, orientation-selective excitatory projections is more certain, we chose to represent the cross-column projections as excitatory.

Shortcomings of our models

One notable weakness of all three models is that they fail to capture the psychophysical observation that decreasing the angle between two nearly orthogonal gratings slightly increases the mean dominance durations of rivalry (Andrews and Purves 1997). (Note that the model of Laing and Chow (2002) captures this effect.) In our models, the strengths of mutual inhibition are equal for all neurons. To capture this observation, the strength of inhibition would need to fall off with the difference in orientation preference between neurons, but only broadly, because neurons with orthogonal preferences must still receive enough mutual inhibition to be suppressed and thereby produce rivalry.
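A hypothetical way to express the broad fall-off described above is to let mutual inhibition decay as a wide Gaussian in orientation difference. The Python sketch below compares two illustrative widths, 15 and 60 degrees; the Gaussian shape, the widths and the function names are assumptions, not fitted or model values.

```python
import math

def ring_distance(theta1, theta2, period=180.0):
    """Smallest orientation difference (degrees) on a ring with the given period."""
    d = abs(theta1 - theta2) % period
    return min(d, period - d)

def inhibition_weight(theta_pre, theta_post, sd):
    """Hypothetical orientation-dependent mutual inhibition (peak = 1)."""
    d = ring_distance(theta_pre, theta_post)
    return math.exp(-d * d / (2.0 * sd * sd))

for sd in (15.0, 60.0):
    w90 = inhibition_weight(36.0, 126.0, sd)   # orthogonal pair
    w80 = inhibition_weight(36.0, 116.0, sd)   # nearly orthogonal pair
    print(f"sd = {sd:.0f} deg: w(90) = {w90:.3g}, w(80) = {w80:.3g}")
```

The narrow profile leaves orthogonal neurons almost uninhibited (about exp(−18) ≈ 1.5 × 10⁻⁸ of the peak), which would not support rivalry of orthogonal gratings, whereas the broad profile keeps roughly a third of the peak at 90 degrees while making inhibition somewhat stronger at 80 degrees.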

Conclusion

Of the three models, we believe that the mechanism of the third model is the most likely to emerge in large-scale comprehensive models, and hence the most likely to underlie rivalry and fusion in the real visual cortex. The third model does not rely on additional layers of neurons with additional architectures (as do models 1 and 2). Rather, it simply relies on a balance between cross-column inhibition and cross-column excitation, a balance that changes depending on the similarity or distinctness of the visual stimuli.