1 Introduction

Recent studies in vision research have shown that many, if not most, popular vision models can be described by a cascade of linear and nonlinear (L \(+\) NL) operations [38]. This is the case for several reference models describing visual perception—e.g. the Oriented Difference of Gaussians (ODOG) [13] or the Brightness Induction Wavelet Model (BIWaM) [41]—and, analogously, for models describing neural activities [20]. These L \(+\) NL models are suitable in many cases for describing retinal and thalamic activity, but they have been shown to have low predictive power for modelling neural activity in the primary visual cortex (V1), explaining less than 40% of the variance of the data [20]. On the other hand, there exist several models in vision research which cannot be expressed as a combination of (L \(+\) NL) operations. Prominent examples are models describing neural dynamics via Wilson–Cowan equations [18, 45, 55]. Although these models have been extensively studied by the neuroscience community to describe cortical low-level dynamics, see, for example, [24], their use in the context of psychophysics to describe, for example, visual illusions has been considered only very recently [8].

In [6, 10, 11], the authors show how a slight, yet effective, modification of the Wilson–Cowan equation that does not consider orientation admits a variational formulation through an associated energy functional which can be linked to histogram equalisation, visual adaptation and the efficient representation principle, an important school of thought in vision science [40]. This principle, introduced by Attneave [2] and Barlow [4], is based on viewing neural systems through the lens of information theory and states that neural responses aim to overcome neurobiological constraints and to optimise the limited biological resources by self-adapting to the statistics of the images that the individual typically encounters, so that the visual information can be encoded in the most efficient way. Natural images (and, more generally, images in urban environments) are in fact not random arrays of values, since they present a significant statistical structure. Within this statistical structure, nearby points tend to have similar values; as a result, there is significant correlation among pixels, with a redundancy of \(90\%\) or more [1], and it would be highly inefficient and detrimental for the visual system to simply encode each pixel independently. Another very important reason to remove redundant statistical information from the representation is that the statistical rules impose constraints on the image values that are produced, preventing the encoded signal from utilising the full capacity of the visual channel, which is another inefficient or even wasteful use of biological resources. By removing what is redundant or predictable from the statistics of the visual stimulus, the visual system can concentrate on what is actually informative [44].
Remarkably, the efficient representation principle has correctly predicted a number of neural processing aspects and phenomena and is the only framework able to predict the functional properties of neurons from a very simple principle. In [1], Atick makes the point that one of the two different types of redundancy or inefficiency in the visual system is the one that happens if some neural response levels are used more frequently than others: for this type of redundancy, the optimal code is the one that performs histogram equalisation, which can be obtained by means of the modification of the WC model described above.
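The optimal code Atick refers to can be illustrated with plain global histogram equalisation: grey levels are remapped through the empirical CDF so that response levels are used (roughly) equally often. The following is a minimal NumPy sketch, purely illustrative and not the WC-type model discussed below:

```python
import numpy as np

def equalise(img, levels=256):
    """Global histogram equalisation: remap grey levels through the
    empirical CDF so that output levels are used roughly equally often."""
    hist = np.bincount(img.ravel(), minlength=levels)
    cdf = np.cumsum(hist) / img.size   # empirical CDF, values in (0, 1]
    return cdf[img]                    # monotone remapping of each pixel

# A tiny 4-level example: levels 0, 1, 3 are mapped to 0.5, 0.75, 1.0.
img = np.array([[0, 0], [1, 3]])
out = equalise(img, levels=4)
```

The local histogram equalisation model recalled later performs an analogous contrast redistribution, but within a neighbourhood defined by the interaction kernel rather than globally.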

Contribution The first contribution of this paper is to formally prove, in a completely general setting, that Wilson–Cowan equations are non-variational, i.e. they cannot be written as the gradient flow of an \(L^2\) energy functional. For this reason, their solutions do not provide a representation as efficient as the solutions to the local histogram equalisation model.

As a second contribution, we introduce an explicit orientation dependence both into the WC equations and into this modification via a lifting procedure inspired by the neurophysiological modelling of V1 [23, 27, 43], which has also been applied to several image processing problems [15, 57]. The lifting procedure, illustrated in Fig. 1, consists in associating with each point \(x \in \mathbb {R}^2\) of the retinal plane the tangent direction \(\theta \) of the contour at point x, thus “lifting” the retinal plane \(\mathbb {R}^2\) to the feature space \(\mathbb {R}^2 \times \mathbb {P}^1\) of positions and orientations. This mathematical construction mimics the neural representations of the image features that the visual cortex performs, as is well known from the studies in vision neuroscience by Hubel and Wiesel [35].

Fig. 1

Pipeline for cortical-inspired image processing: each \(x \in \mathbb {R}^2\) is lifted in the space of positions and orientations \(\mathbb {R}^2 \times \mathbb {P}^1\) according to the corresponding tangent direction of the curve at point x. In the lifted space, many operations can be performed, such as the completion of the given broken curve. Then, the information retrieved within the lifted space can be projected back to the \(\mathbb {R}^2\) plane

We then report some numerical evidence showing how the proposed model is able to better reproduce several visual perception biases than both its orientation-independent version and some reference (L \(+\) NL) models. In particular, after reporting some numerical results for classical non-orientation-dependent illusions, we test our model on orientation-dependent grating induction (GI) phenomena (generalising the ones presented in [13, Figure 3], see also [39]) and show a direct dependence of the output image on the local orientation, which cannot be described by orientation-independent models.

We then test the proposed model on a modified version of the Poggendorff illusion, a geometrical optical effect where a misalignment of two collinear segments is induced by the presence of a surface [52, 53], see Fig. 10a. For this modified version, our model is able to integrate the contrast feature better than state-of-the-art models such as those based on filtering techniques [13, 41], on natural image statistics [34] and cortical-based ones [29, 30]. Moreover, we also show that this feature is not correctly integrated by the classical WC equations even when orientation is explicitly taken into account in the modelling.

Finally, we report an empirical study concerning the sensitivity of the model to its parameters, showing the existence of threshold values that change the nature of the completion properties of the model, making it switch, for example, from inpainting-type (geometrical completion) to perception-type (perceptual completion) behaviour.

A preliminary version of this work, including some of the tests presented here, appeared in [9].

2 Variational and Evolution Methods in Vision Research

The use of variational methods for solving ill-posed imaging problems is nowadays very classical within the imaging community. For a given degraded image f and a (possibly nonlinear) degradation operator \(\mathcal {T}\) modelling noise, blur and/or under-sampling in the data, the solution of the problem

$$\begin{aligned} \text {find } u \quad \text {s.t.}\quad f=\mathcal {T}(u) \end{aligned}$$
(1)

often lacks fundamental properties such as existence, uniqueness and stability, requiring alternative strategies to be used in order to reformulate the problem in a well-posed way.

In the context of variational regularisation approaches, for instance, one looks for an approximation \(u_\star \) of the real solution u by solving a suitable optimisation problem, so that

$$\begin{aligned} u_\star \in \arg \min \mathcal E(u), \end{aligned}$$
(2)

where \(\mathcal E\) is a (possibly non-convex) energy functional which typically combines prior information available both on the image and on the physical nature of the signal (in terms, for instance, of its noise statistics), see, for example, [21] for a review.

In convex and smooth scenarios, a common alternative consists in considering the steepest descent of \(\mathcal {E}\) defined in terms of the Fréchet derivative \(\nabla \mathcal {E}\) calculated w.r.t. some norm, which reduces the problem to the form

$$\begin{aligned} \frac{\partial }{\partial t} u = - \nabla \mathcal E(u), \quad u|_{t=0} = f, \end{aligned}$$
(3)

under appropriate conditions on the boundary of the image domain. Then, solutions \(u_\star \) to (2) correspond to stationary solutions of (3). We remark that while the connection between variational problems and parabolic PDEs is always guaranteed by taking the gradient descent of the corresponding energy functional as above, the reverse is not always possible, as it requires some additional structure of the functional space considered that may be lacking in several cases. We will comment on this issue in the next section, where we will provide some examples in this respect, focusing on some neurophysiologically inspired models for vision.

In this context, evolution equations were originally used as a tool to describe the physical transmission, diffusion and interaction phenomena of stimuli in the visual cortex, see, for example, [24]. Similarly, variational methods have been studied by the vision community to describe efficient neural coding properties, see, for example, [40, 51], i.e. all the mechanisms used by the human visual system to optimise the visual experience via the reduction of redundant spatio-temporal biases linked to the perceived stimulus.

In the context of vision, a first study on the efficient representation aspects of a neurophysiological model analogous to the one considered in this work has recently been performed by the authors in [8], where several visual illusions are studied.

2.1 Wilson–Cowan-Type Models for Neuronal Activation

A prominent example of evolution models describing neuronal dynamics is given by the Wilson–Cowan (WC) equations [18, 55], which we present here in a general context.

Consider a neuronal population parametrised by a set \(\varOmega \), endowed with a measure \(\mathrm{d}\xi \) supported on the whole \(\varOmega \). In the following sections, we will be interested in the two cases: \(\varOmega = {{\,\mathrm{\mathbb {R}}\,}}^2\) and \(\varOmega = {{\,\mathrm{\mathbb {R}}\,}}^2\times \mathbb {P}^1\), both endowed with the corresponding Lebesgue measure. Denoting by \(a(\xi ,t)\in {{\,\mathrm{\mathbb {R}}\,}}\) the state of a population of neurons with coordinates \(\xi \in \varOmega \) at time \(t>0\), the Wilson–Cowan model reads

$$\begin{aligned} \frac{\partial }{\partial t} a(\xi ,t)&= -\beta a(\xi ,t)\nonumber \\&\quad +\,\nu \int _{\varOmega } \omega (\xi \Vert \xi ') \sigma (a(\xi ',t))\,\mathrm{d}\xi ' + h(\xi ,t). \end{aligned}$$
(WC)

Here, \(\beta >0\) and \(\nu \in \mathbb {R}\) are fixed parameters, \(\omega (\xi \Vert \xi ')\) is a kernel that models interactions at two different locations \(\xi \) and \(\xi '\), the function h represents an external stimulus, and \(\sigma {:}\,\mathbb {R}\rightarrow \mathbb {R}\) is a nonlinear sigmoid saturation function.

In the following, we further assume that the interaction kernel \(\omega \) is non-negative and normalised:

$$\begin{aligned} \int _{\varOmega } \omega (\xi \Vert \xi ')\,\mathrm{d}\xi ' = 1, \quad \text {for a.e.\ }\xi \in \varOmega . \end{aligned}$$
(4)

Moreover, as a sigmoid \(\sigma \), we consider the following odd function:

$$\begin{aligned} \sigma (\rho ) := \min \{1,\max \{\alpha \rho , -1\}\}, \quad \alpha >1, \end{aligned}$$
(5)

which has been previously considered, for example, in [10]. Observe that, depending on the sign of \(\nu \), model (WC) is able to describe both excitatory (\(\nu >0\)) and inhibitory local interactions (\(\nu <0\)), see, for example, [17, Section 3]. Due to the oddness of \(\sigma \), this latter case can be equivalently expressed by keeping \(\nu >0\) and replacing \(\sigma \) with its “mirrored” version \(\hat{\sigma }(\rho ) = \sigma (-\rho ), ~\rho \in \mathbb {R}\), see Fig. 2.
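The saturation (5) and its mirrored version can be written in a couple of lines; the following NumPy sketch fixes \(\alpha = 5\) as in Fig. 2:

```python
import numpy as np

ALPHA = 5.0  # slope parameter alpha > 1, as in Fig. 2

def sigma(rho):
    """Piecewise-linear sigmoid (5): min{1, max{alpha*rho, -1}}."""
    return np.clip(ALPHA * np.asarray(rho, dtype=float), -1.0, 1.0)

def sigma_hat(rho):
    """'Mirrored' inhibitory version: sigma_hat(rho) = sigma(-rho)."""
    return sigma(-np.asarray(rho, dtype=float))
```

Since `sigma` is odd, `sigma_hat(rho)` coincides with `-sigma(rho)`, which is exactly the equivalence between flipping the sign of \(\nu \) and mirroring \(\sigma \) described above.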

Fig. 2

Symmetric behaviour of excitatory and inhibitory sigmoid functions in the form (5) with \(\alpha =5\)

Equation (WC) has been studied intensively over the last decades to describe several neuronal mechanisms in V1, see, for example, [3, 24, 28, 45, 50]. However, one interesting aspect which, to the best of our knowledge, has not been previously investigated, is whether (WC) complies with any efficient representation principle, or, in more mathematical terms, whether this model can be interpreted as the gradient descent in the form (3) of some energy functional defined on \(L^2(\varOmega )\).

As a first result, we show in the following that the model (WC) does not satisfy a variational principle. As a consequence, it does not implement an efficient neural coding mechanism. A preliminary study was performed by the authors in [8] in a completely discrete setting. Here, we make these considerations rigorous via the following theorem.

Theorem 1

Assume that there exist two subsets of positive measure \(U_1,U_2\subset \varOmega \), \(U_1\cap U_2 = \varnothing \), such that \(\omega (\xi \Vert \xi ')> 0\) for any \(\xi \in U_1\) and \(\xi '\in U_2\). Then, for \(\sigma \) chosen as above and any \(\nu \ne 0\), the Wilson–Cowan equation (WC) does not admit a variational formulation, that is, it cannot be expressed as the gradient descent in the Fréchet sense of any densely defined energy \(\mathcal {E}\).

Proof

We proceed by contradiction and assume that there exists a densely defined energy \(\mathcal E\) on \(L^2(\varOmega )\) such that (WC) can be expressed in the form (3).

Let \(\chi _i{:}\,\varOmega \rightarrow \{0,1\}\) be the characteristic function of \(U_i\), \(i=1,2\). Since, up to reducing them, we can always assume the sets \(U_i\) to have finite measure, we have that \(\chi _i\in L^2(\varOmega )\). Then, we define \(J{:}\,{{\,\mathrm{\mathbb {R}}\,}}^2\rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) by

$$\begin{aligned} J(v) := \mathcal {E}(v_1\chi _1 + v_2\chi _2), \quad v=(v_1,v_2)\in {{\,\mathrm{\mathbb {R}}\,}}^2. \end{aligned}$$
(6)

By definition, we have

$$\begin{aligned} \partial _i J(v)=\langle \nabla \mathcal {E}(v_1\chi _1 + v_2\chi _2), \chi _i\rangle , \quad i=1,2. \end{aligned}$$
(7)

Here, \(\nabla \mathcal E\) denotes the Fréchet derivative of \(\mathcal E\), and \(\langle \cdot ,\cdot \rangle \) denotes the scalar product in \(L^2(\varOmega )\). Thus, by (3) and (WC), there holds

$$\begin{aligned} \partial _i J(v)= & {} \beta v_i - \langle h,\chi _i\rangle \nonumber \\&-\,\nu \int _\varOmega \int _\varOmega \omega (\xi \Vert \xi ')\chi _i(\xi ) \sigma \left( \sum _{k=1}^2 v_k \chi _k(\xi ')\right) \,\mathrm{d}\xi \,\mathrm{d}\xi '.\nonumber \\ \end{aligned}$$
(8)

We now aim to differentiate the above again w.r.t. the jth variable, \(j=1,2\). Observe that, since \(\sigma \) is Lipschitz continuous, it is differentiable almost everywhere and, thanks to the fact that \(U_1\cap U_2=\varnothing \), for a.e. \(v\in {{\,\mathrm{\mathbb {R}}\,}}^2\) we have

$$\begin{aligned} \sigma '\left( v_1\chi _1(\xi ')+v_2\chi _2(\xi ')\right) = \sigma '(v_j )\quad \forall \xi ' \in U_j. \end{aligned}$$
(9)

This shows that for a.e. \(v\in {{\,\mathrm{\mathbb {R}}\,}}^2\) it holds

$$\begin{aligned} \partial _{ji} J(v) = \beta \delta _{ij} -\nu \sigma '\left( v_j \right) c_{ij}, \end{aligned}$$
(10)

where \(\delta _{ij}\) is the Kronecker delta symbol and \(c_{ij}\) is defined as

$$\begin{aligned} c_{ij} := \int _{U_i}\int _{U_j}\omega (\xi \Vert \xi ')\,\mathrm{d}\xi '\,\mathrm{d}\xi . \end{aligned}$$
(11)

Observe that, by (4), the assumption on \(U_1\) and \(U_2\) and the fact that they have finite measure, it holds that \(0\le c_{ij}<+\infty \) for any \(i,j\in \{1,2\}\). Moreover, since \(\omega (\xi \Vert \xi ')>0\) for \(\xi \in U_1\) and \(\xi '\in U_2\), we have that \(c_{12}> 0\).

We now claim that \(J\in C^2(A\times A)\), where \(A = \{t\in {{\,\mathrm{\mathbb {R}}\,}}: |t|\ne 1/\alpha \}\) is the set of differentiability of \(\sigma \). To this purpose, we compute

$$\begin{aligned} \sigma '(t) = {\left\{ \begin{array}{ll} \alpha &{}\quad \text {if } |t|<1/\alpha ,\\ 0 &{}\quad \text {if } |t|>1/\alpha , \end{array}\right. } \quad \forall t\in A. \end{aligned}$$
(12)

Then, by (10), for any \(v\in A\times A\) and \(i,j\in \{1,2\}\) it holds

$$\begin{aligned} \partial _{ji} J(v)= {\left\{ \begin{array}{ll} \beta \delta _{ij} - \nu \alpha c_{ij} &{}\quad \text {if } |v_j|<1/\alpha ,\\ \beta \delta _{ij}&{}\quad \text {if } |v_j|>1/\alpha . \end{array}\right. } \end{aligned}$$
(13)

As a consequence, \(\partial _{ji}J\) is continuous on \(A\times A\), proving the claim.

To conclude the proof, we show that \(\partial _{21}J\not \equiv \partial _{12}J\), which contradicts the \(C^2\) differentiability of J by the Schwarz theorem and thus shows that the r.h.s. of (WC) cannot be the Fréchet derivative of an energy. Indeed, it suffices to consider \(v\in A\times A\) with \(v_1>1/\alpha \) and \(v_2<1/\alpha \), which by (13) implies

$$\begin{aligned} \partial _{12} J(v) - \partial _{21} J(v)= \nu \alpha c_{12} \ne 0. \end{aligned}$$
(14)

This completes the proof of the statement. \(\square \)
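The mechanism behind Theorem 1 can be checked numerically on a finite discretisation: a gradient field must have a symmetric Jacobian, while the Jacobian of the discretised right-hand side of (WC) is not symmetric. Below is a self-contained NumPy sketch for two populations (the weights and parameter values are illustrative, not taken from the paper), evaluated at a point where one population is in the linear regime of \(\sigma \) and the other is saturated, as in the proof:

```python
import numpy as np

beta, nu, alpha = 1.0, 1.0, 5.0
w = np.array([[0.5, 0.5], [0.5, 0.5]])  # normalised weights, rows sum to 1 (cf. (4))
h = np.zeros(2)

def sigma(rho):
    return np.clip(alpha * rho, -1.0, 1.0)

def wc_rhs(a):
    """Right-hand side of (WC) for two populations."""
    return -beta * a + nu * w @ sigma(a) + h

# |a_1| < 1/alpha (sigma' = alpha) and |a_2| > 1/alpha (sigma' = 0),
# as in the proof of Theorem 1.
a = np.array([0.05, 0.5])
eps = 1e-6
J = np.empty((2, 2))
for j in range(2):  # central finite-difference Jacobian
    e = np.zeros(2); e[j] = eps
    J[:, j] = (wc_rhs(a + e) - wc_rhs(a - e)) / (2 * eps)

# If (WC) were a gradient flow, J would be symmetric (Schwarz theorem).
asym = J[1, 0] - J[0, 1]  # ~ nu * alpha * w_21 = 2.5, not 0
```

The nonzero off-diagonal mismatch is precisely the quantity \(\nu \alpha c_{12}\) appearing at the end of the proof, here in a two-point discretisation.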

Remark 1

The above argument can be easily extended to any Lipschitz continuous sigmoid \(\sigma \) with non-constant derivative.

Remark 2

The variational nature of physical models describing neural interaction has been investigated in other contexts. For instance, in [33], the authors consider neural models possibly arising from asymmetric interaction kernels. We also refer to [32] for the identification of a Lyapunov functional for a Wilson–Cowan-like equation.

To overcome the non-existence of an underlying energy for (WC) and deal with a model complying with the efficient representation principle, we will consider in the following a variation of (WC), which has been introduced in [10] for Local Histogram Equalisation (LHE) of images in the particular case where \(\varOmega \) is a square domain in \({{\,\mathrm{\mathbb {R}}\,}}^2\). Keeping now \(\varOmega \) general and using the same notation as above, this model can be written as

$$\begin{aligned} \frac{\partial }{\partial t} a(\xi ,t)&= -\beta a(\xi ,t) \nonumber \\&\quad + \nu \int _{\varOmega } \omega (\xi \Vert \xi ') \sigma (a(\xi ,t)-a(\xi ',t))\,\mathrm{d}\xi ' + h(\xi ,t). \end{aligned}$$
(LHE)

We note that the only difference between (LHE) and (WC) is the input of the sigmoid \(\sigma \) appearing inside the integral: while in (WC) this is the stimulus intensity at location \(\xi '\), in (LHE) it is the difference between the activation at the point under consideration and that at the neighbouring ones.

Following the same line of proof as in [10] (which requires the interaction kernel \(\omega \) to be symmetric) and letting \(\varSigma {:}\,\mathbb {R}\rightarrow \mathbb {R}\) be any (even) primitive function of \(\sigma \), it is easy to show that, independently of the choice of \(\varOmega \), equation (LHE) is the gradient descent in the sense of (3) of the following energy functional

$$\begin{aligned} \mathcal {E}(a)= & {} \frac{\beta -1}{2}\int _{\varOmega } \left| a(\xi )\right| ^2 \, \mathrm{d}\xi +\frac{1}{2}\int _{\varOmega } \left| a(\xi )-h(\xi )\right| ^2 \, \mathrm{d}\xi \nonumber \\&- \frac{\nu }{2}\int _{\varOmega }\int _{\varOmega } \omega (\xi \Vert \xi ')\varSigma (a(\xi )-a(\xi ')) \, \mathrm{d}\xi '\,\mathrm{d}\xi . \end{aligned}$$
(15)

The functional \(\mathcal {E}\) is the sum of three different terms: the first two can be thought of as data fitting terms, whose minimisation forces the minimisers of (15) to stay close to the given stimulus and, possibly, to global average brightness intensity levels; the third one is an interaction term, whose minimisation corresponds to maximising the local contrast (see [10] for more details). In the following section, we will make precise some specific choices of h, which will clarify the different ingredients of model (15) in more detail.
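The gradient-flow relation between (LHE) and an energy of the form (15) can be verified numerically: one explicit Euler step along the right-hand side of (LHE) should decrease a discrete version of the energy. The NumPy sketch below uses a periodic 1D domain and illustrative parameters; in this sketch, the contrast term enters the discrete energy with a negative sign, so that its gradient descent coincides with (LHE) and its minimisation increases local contrast:

```python
import numpy as np

rng = np.random.default_rng(0)
n, beta, nu, alpha, dt = 32, 2.0, 1.0, 5.0, 1e-3

# Symmetric, row-normalised Gaussian kernel on a ring (periodic Omega).
idx = np.arange(n)
d = np.minimum(np.abs(idx[:, None] - idx[None, :]),
               n - np.abs(idx[:, None] - idx[None, :]))
w = np.exp(-d**2 / (2 * 4.0**2)); w /= w.sum(axis=1, keepdims=True)

def sig(rho):   # sigmoid (5)
    return np.clip(alpha * rho, -1.0, 1.0)

def Sig(rho):   # even primitive of sig
    return np.where(np.abs(rho) <= 1/alpha,
                    alpha * rho**2 / 2, np.abs(rho) - 1/(2*alpha))

def energy(a, h):   # discrete version of (15), contrast term with -nu/2
    diff = a[:, None] - a[None, :]
    return ((beta - 1)/2 * np.sum(a**2) + 0.5 * np.sum((a - h)**2)
            - nu/2 * np.sum(w * Sig(diff)))

def lhe_rhs(a, h):  # right-hand side of (LHE)
    diff = a[:, None] - a[None, :]
    return -beta * a + nu * np.sum(w * sig(diff), axis=1) + h

a = rng.standard_normal(n); h = rng.standard_normal(n)
e0 = energy(a, h)
a1 = a + dt * lhe_rhs(a, h)  # one explicit Euler step of the gradient descent
assert energy(a1, h) < e0    # the energy decreases along the flow
```

Running the same step with the (WC) right-hand side would not, in general, decrease any such functional, consistently with Theorem 1.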

2.1.1 Orientation-Independent Modelling

We now focus on the application of (LHE) to describe contrast perception phenomena independent of local orientation information. To do that, we recall in the following the specific instance of the (LHE) model introduced in [10]. We model the visual plane as a rectangular domain \(Q\subset {{\,\mathrm{\mathbb {R}}\,}}^2\) and consider grey-scale visual stimuli to be functions \(f{:}\,Q\rightarrow [0,1]\), such that f(x) encodes the brightness intensity at x. For a given initial stimulus \(f_0\), we then denote by \(\mu \) its local average intensity computed as the convolution \(\mu = g\star f_0\) of \(f_0\) with some filter \(g\in L^1(Q)\) with \(\int _Q g(x)\,\mathrm{d}x = 1\).

In [10], the filter g was chosen to be uniform, while in [6] it was changed to a simple Gaussian in order to reproduce visual induction effects; another possibility for g would be to use a sum of Gaussians, which has been shown to better approximate lateral inhibition effects happening at the retinal level [56]. We also take the activation in (LHE) to be \(a: = f-1/2\), corresponding to the way our visual system encodes contrast (i.e. as the difference with respect to the average, which we take to be \(\frac{1}{2}\)). For the external stimulus h, we take a weighted sum of the initial stimulus \(a|_{t=0} = f_0-1/2\) and its filtering via g; in the visual system, this corresponds to a combination of magnocellular (spatially averaged) and parvocellular (fine detail) pathway information. Namely, for \(\lambda >0\), we consider:

$$\begin{aligned} h = (g\star a)|_{t=0} + \lambda a|_{t=0} = \mu +\lambda f_0 - \frac{1+\lambda }{2}. \end{aligned}$$
(16)

We stress that the input h is time independent. This simplification follows from considering in our modelling the very short time frame during which the stimulus is presented and retained by the visual system, a time span typically known as iconic memory. For visual illusions such as the ones presented in Sect. 3, this time frame typically spans less than 200 ms [48], which corresponds to the fixation time between rapid eye movements; therefore, the temporal changes in h can be neglected.

By plugging the above ingredients in Eq. (LHE), and letting \(\beta = 1+\lambda \), we obtain the following (orientation-independent) LHE evolution model:

$$\begin{aligned} \frac{\partial }{\partial t} f(x,t)&= -(1+\lambda ) f(x,t)\nonumber \\&\quad + \nu \int _{Q} \omega (x,y)\sigma \big ( f(x,t)-f(y,t) \big )\,\mathrm{d}y +\left( \mu (x)+ \lambda f_0(x)\right) . \end{aligned}$$
(LHE-2D)

Remark 3

Re-arranging the (LHE-2D) equation as

$$\begin{aligned} \frac{\partial }{\partial t} f(x,t)&= \mu (x) - f(x,t)\nonumber \\&\quad + \nu \int _{Q} \omega (x,y)\sigma \big ( f(x,t)-f(y,t) \big )\,\mathrm{d}y \nonumber \\&\quad + \lambda \left( f_0(x) - f(x,t) \right) , \end{aligned}$$
(17)

we can better see the effect of each of its terms: the one multiplied by the parameter \(\nu \) enhances local contrast, the one multiplied by \(\lambda \) penalises the departure from the original function \(f_0\), and the term \(\mu (x) - f(x,t)\) pushes the solution towards the local mean. Note that if \(\mu (x)\) is the constant value 1/2, it can be considered as a global average, and the solution is then consistent with the so-called grey world principle that states that in a sufficiently varied scene the average perceived color is a mid-grey, i.e. a mean value of \(\frac{1}{2}\) for each color channel [7, 10].
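One explicit Euler step of (LHE-2D), with the three terms of (17) kept separate, can be sketched as follows. The sketch is a toy implementation on a small periodic grid with illustrative parameters; the Gaussian filtering is performed in the Fourier domain, and the check at the end illustrates that a constant image is a fixed point of (17), since every term then vanishes:

```python
import numpy as np

N, lam, nu, alpha, dt = 12, 0.5, 0.5, 5.0, 0.1

def gauss_blur(img, s):
    """Periodic Gaussian filtering, applied in the Fourier domain."""
    fx = np.fft.fftfreq(N); FX, FY = np.meshgrid(fx, fx, indexing="ij")
    ker = np.exp(-2 * (np.pi * s)**2 * (FX**2 + FY**2))
    return np.fft.ifft2(np.fft.fft2(img) * ker).real

def sigma(rho):
    return np.clip(alpha * rho, -1.0, 1.0)

def lhe2d_step(f, f0, mu, w):
    """One explicit Euler step of (LHE-2D), in the re-arranged form (17)."""
    diff = f.ravel()[:, None] - f.ravel()[None, :]      # f(x) - f(y), all pairs
    contrast = (w * sigma(diff)).sum(axis=1).reshape(N, N)
    return f + dt * ((mu - f) + nu * contrast + lam * (f0 - f))

# Row-normalised spatial Gaussian weights omega(x, y) on the periodic grid.
idx = np.arange(N)
d = np.minimum(np.abs(idx[:, None] - idx[None, :]),
               N - np.abs(idx[:, None] - idx[None, :]))
d2 = (d[:, None, :, None]**2 + d[None, :, None, :]**2).reshape(N * N, N * N)
w = np.exp(-d2 / (2 * 3.0**2)); w /= w.sum(axis=1, keepdims=True)

f0 = np.full((N, N), 0.5)      # constant image: a fixed point of (17)
mu = gauss_blur(f0, 2.0)
f1 = lhe2d_step(f0, f0, mu, w)
# every term of (17) vanishes, so f1 coincides with f0
```

The all-pairs interaction above is quadratic in the number of pixels and is only meant to expose the structure of (17); efficient implementations are discussed in Sect. 2.2.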

As far as the interaction kernel \(\omega \) is concerned, in [36] the authors consider a kernel \(\omega \) which is a convex combination of two bi-dimensional Gaussians with different standard deviations. While this variation of the model (LHE-2D) is effective in describing assimilation effects, the lack of dependence on the local orientation makes such modelling intrinsically unsuited to explaining orientation-induced contrast and colour perception effects such as the ones described in [13, 41, 46]. Reference models capable of explaining these effects are mostly based on oriented difference of Gaussian linear filtering coupled with some nonlinear processing, such as the ODOG and the BIWaM models described in [12, 13] and [41], respectively. However, despite their good effectiveness in reproducing several visual perception phenomena, these models are not based on any neuronal evolution modelling, nor on any efficient representation (i.e. variational) principle.

2.1.2 Orientation-Dependent Modelling

We now focus on orientation-dependent models. For a given visual stimulus f, we let \(Lf{:}\,Q \times [0,\pi ) \rightarrow \mathbb {R}\) be the corresponding cortical activation in V1, where \(Lf(x,\theta )\) encodes the response of the neuron with spatial preference x and orientation preference \(\theta \) to the stimulus f. Such activation is obtained via convolution with the receptive fields of V1 neurons, as explained in “Appendix A”, see also [23, 27, 42, 43]. Then, similarly to the above, the model (LHE) for a cortical activation \(a(x,\theta )\) depending on the local V1 coordinate \(\xi =(x,\theta )\) is obtained as follows: We define \(F:= a+1/2\) to be the visual stimulus and take as external stimulus \(h = L\mu + \lambda Lf_0 - (1+\lambda )/2\) [compare with (16)]. This, combined with the choice \(\beta = 1+\lambda \), yields the equation

$$\begin{aligned} \frac{\partial }{\partial t} F(x,\theta ,t)&= -(1+ \lambda ) F(x,\theta ,t) \nonumber \\&\qquad +\,\nu \int _0^\pi \int _{Q} \omega (x,\theta \Vert y,\phi )\sigma \nonumber \\&\qquad \big ( F(x,\theta ,t)-F(y,\phi ,t) \big )\,\mathrm{d}y\,\mathrm{d}\phi \nonumber \\&\qquad +\left( L\mu (x,\theta )+ \lambda Lf_0(x,\theta )\right) . \end{aligned}$$
(LHE-3D)

Here, the kernel \(\omega \) depends both on positions \(x,y\in {{\,\mathrm{\mathbb {R}}\,}}^2\) and on orientations \(\theta ,\phi \in [0,\pi )\). A typical choice for this kernel would be the anisotropic heat kernel naturally associated with the V1 connectivity, as considered in [45]. However, for numerical reasons, the results presented in the following are obtained by considering simply 3D Gaussians.

We remark once again that the model above describes the dynamic behaviour of neuronal activations in the 3D space of positions and orientation. As explained in “Appendix A”, once a stationary solution is found, the two-dimensional perceived image can be efficiently found by

$$\begin{aligned} f(x) = \frac{1}{\pi }\int _0^\pi F(x,\theta )\,\mathrm{d}\theta . \end{aligned}$$
(18)

Remark 4

In the following, we will consider the interaction to be excitatory (i.e. \(\nu >0\)) for both (LHE-2D) and (LHE-3D) models. Indeed, the integral term in both models is positive at x if, for example, \(f(x,t)>f(y,t)\). Thus, in order to enhance the contrast between x and its surround we need to have \(\nu >0\).

We now discuss the numerical aspects required to implement model (LHE-3D).

2.2 Discretisation Via Gradient Descent

We discretise the initial (square) image \(f_0\) as an \(N\times N\) matrix. For simplicity, we assume here periodic boundary conditions. We additionally consider \(K\in \mathbb N\) orientations, parametrised by

$$\begin{aligned} k \in \{1,\ldots ,K\}\mapsto \theta _k := \frac{(k-1)\pi }{K}. \end{aligned}$$
(19)

The discretised lift operator, still denoted by L, transforms \(N\times N\) matrices into \(N\times N\times K\) arrays. Its action on an \(N\times N\) matrix f is defined for \(n,m\in \{1,\ldots , N\}\) and \(k\in \{1,\ldots ,K\}\) by

$$\begin{aligned} (Lf)_{n,m,k} = \mathcal F^{-1}\left( (\mathcal F f) \odot (R_{\theta _k} \mathcal F\varPsi ^{\text {cake}}) \right) _{n,m}, \end{aligned}$$
(20)

where \(\odot \) is the Hadamard (i.e. element-wise) product of matrices, \(\mathcal F\) denotes the discrete Fourier transform, \(R_{\theta _k}\) is the rotation of angle \(\theta _k\) and \(\varPsi ^{\text {cake}}\) is the cake mother wavelet (“Appendix A”).
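The structure of (20) can be sketched as follows. Since the cake wavelets of [5] are somewhat involved to construct, the sketch uses a simple angular-Gaussian frequency filter as an illustrative stand-in for \(R_{\theta _k} \mathcal F\varPsi ^{\text {cake}}\), built directly around each \(\theta _k\) in the Fourier domain; the array sizes and the angular width are assumptions of this sketch:

```python
import numpy as np

N, K, sig_ang = 64, 8, 0.3
thetas = np.arange(K) * np.pi / K          # theta_k (0-indexed version of (19))

fx = np.fft.fftfreq(N)
FX, FY = np.meshgrid(fx, fx, indexing="ij")
ang = np.mod(np.arctan2(FY, FX), np.pi)    # orientation of each frequency, mod pi

def lift(f):
    """Lf[:, :, k]: inverse FFT of (FFT f) times an oriented frequency filter,
    mimicking (20) with an angular Gaussian in place of the cake wavelet."""
    Ff = np.fft.fft2(f)
    L = np.empty((N, N, K))
    for k, th in enumerate(thetas):
        dist = np.abs(ang - th); dist = np.minimum(dist, np.pi - dist)
        filt = np.exp(-dist**2 / (2 * sig_ang**2))
        filt[0, 0] = 0.0                   # discard the orientation-less DC component
        L[:, :, k] = np.fft.ifft2(Ff * filt).real
    return L

# Stripes varying along the second axis: their spatial frequency lies on the
# fy axis, i.e. orientation pi/2, so the response should peak at theta_{K/2}.
m = np.arange(N)
f = np.cos(2 * np.pi * 8 * m / N)[None, :] * np.ones((N, 1))
Lf = lift(f)
best_k = int(np.argmax([(Lf[:, :, k]**2).sum() for k in range(K)]))  # K//2
```

Unlike cake wavelets, this stand-in filter bank does not exactly tile the frequency plane, so the projection (18) would only approximately reconstruct f; it is meant to illustrate the lift, not to replace the construction of “Appendix A”.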

We let \(F^0 = Lf_0\), and \(G_0 = L\mu \), where the local average intensity \(\mu \) is given by a Gaussian filtering of \(f_0\). The explicit time discretisation of the gradient descent (LHE-3D) is, for \(\Delta t\ll 1\) and \(\ell \in \mathbb N\),

$$\begin{aligned} \frac{F^{\ell +1} - F^\ell }{\Delta t} = -(1+\lambda )F^\ell + G_0 +\lambda F^0 + \nu \mathcal R_{F^\ell }, \end{aligned}$$
(21)

where \(\mathcal R_{F^\ell }\) is the discretisation of the integral term in (LHE-3D). That is, for a given 3D Gaussian matrix W encoding the weight \(\omega \) and an \(N\times N\times K\) matrix F, we let, for any \(n,m\in \{1,\ldots , N\}\) and \(k\in \{1,\ldots , K\}\),

$$\begin{aligned}&(\mathcal R_{F})_{n,m,k} \nonumber \\&\quad := \sum _{n',m'=1}^N\sum _{k'=1}^K W_{n-n', m-m', k-k'} \sigma ( F_{n,m,k} - F_{n',m',k'} ).\nonumber \\ \end{aligned}$$
(22)

We refer to [10, Section IV.A] for the description of an efficient numerical approach used to compute the above quantity in the 2D case that can be translated verbatim to the 3D case under consideration.
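A direct (non-optimised) implementation of (22) with periodic wrap-around can be written via cyclic shifts; this is quadratic in the number of lifted points \(N^2K\) and only suited to tiny arrays, unlike the efficient approach of [10], but it makes the index convention explicit:

```python
import numpy as np

N, K, alpha = 8, 4, 5.0

def sigma(rho):
    return np.clip(alpha * rho, -1.0, 1.0)

def interaction(F, W):
    """(R_F)_{n,m,k} = sum over (n',m',k') of
    W_{n-n', m-m', k-k'} * sigma(F_{n,m,k} - F_{n',m',k'}),
    with index differences taken modulo the array sizes (periodic boundary)."""
    R = np.zeros_like(F)
    for dn in range(N):
        for dm in range(N):
            for dk in range(K):
                # Fs[n, m, k] = F[n-dn, m-dm, k-dk] (mod sizes)
                Fs = np.roll(F, shift=(dn, dm, dk), axis=(0, 1, 2))
                R += W[dn, dm, dk] * sigma(F - Fs)
    return R

rng = np.random.default_rng(1)
W = rng.random((N, N, K)); W /= W.sum()  # non-negative, normalised weights, cf. (4)
R0 = interaction(np.ones((N, N, K)), W)  # constant F: sigma(0) = 0, so R0 is zero
```

The inner shift `np.roll(F, (dn, dm, dk))` realises exactly the difference indices \(n-n'\), \(m-m'\), \(k-k'\) of (22) under the periodic boundary conditions assumed in Sect. 2.2.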

After a suitable number of iterations \( \bar{\ell }\) of the above algorithm (measured by the stopping criterion \(\Vert F^{\ell +1}-F^\ell \Vert _2/\Vert F^\ell \Vert _2\le \tau \), for a fixed tolerance \(\tau \ll 1\)), the output image is then found via (18) as

$$\begin{aligned} \bar{f}_{n,m} = \frac{1}{K}\sum _{k=1}^K F^{\bar{\ell }}_{n,m,k}. \end{aligned}$$
(23)
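Putting the pieces together, the iteration (21) with the relative-change stopping rule and the final projection can be sketched as follows. Random lifted arrays stand in for \(Lf_0\) and \(L\mu \), the weights are a generic normalised array rather than a 3D Gaussian, and the parameter values are illustrative:

```python
import numpy as np

N, K = 8, 4
lam, nu, alpha, dt, tau = 0.5, 0.1, 5.0, 0.1, 1e-2

rng = np.random.default_rng(2)
F0 = rng.random((N, N, K))                 # stand-in for L f_0
G0 = rng.random((N, N, K))                 # stand-in for L mu
W = rng.random((N, N, K)); W /= W.sum()    # normalised interaction weights

def sigma(rho):
    return np.clip(alpha * rho, -1.0, 1.0)

def interaction(F):
    """Discretised integral term (22), with periodic wrap-around."""
    R = np.zeros_like(F)
    for dn in range(N):
        for dm in range(N):
            for dk in range(K):
                R += W[dn, dm, dk] * sigma(F - np.roll(F, (dn, dm, dk),
                                                       axis=(0, 1, 2)))
    return R

F = F0.copy()
while True:  # explicit scheme (21)
    F_new = F + dt * (-(1 + lam) * F + G0 + lam * F0 + nu * interaction(F))
    rel = np.linalg.norm(F_new - F) / np.linalg.norm(F)
    F = F_new
    if rel <= tau:  # relative-change stopping criterion of Sect. 2.2
        break

f_bar = F.mean(axis=2)  # projection onto the image plane (averaged over k)
```

With these parameter values the explicit map is a contraction, so the loop terminates after a few tens of iterations; for larger \(\nu \) or \(\Delta t\), the step size must be reduced accordingly.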
Fig. 3

Model output for non-orientation-dependent examples. First column: original image. Second column: output of (LHE-2D) model. Third column: output of (LHE-3D) model. Parameters for (LHE-3D): \(\sigma _\mu =3\), \(\sigma _\omega =8\), \(\lambda =0.5\)

Fig. 4

Line profiles of outputs in Fig. 3

3 Experiments

In this section, we present the results obtained by applying the cortical-inspired model presented in the previous section to some well-known phenomena where contrast perception may be affected by local orientations.

We compare the results obtained by our orientation-dependent 3D model (LHE-3D) with the corresponding 2D model (LHE-2D) already considered in [6, 36] for histogram equalisation and contrast enhancement. We further compare the performance of these models with two standard reference models based on oriented Gaussian filtering: the ODOG [13] and the BIWaM model [41]. In the former, the output is computed via a convolution of the input image with oriented difference of Gaussian filters at six orientations and seven spatial frequencies. The filtering outputs within the same orientation are then summed in a nonlinear fashion privileging higher frequencies. The BIWaM model is a variation of the ODOG model, the main difference being that the contrast sensitivity function depends on the local surround orientation.

Prediction of the perceptual outcome In this study, our objective is to understand the capability of these models to replicate the visual illusions under consideration. That is, we are interested in whether the output produced by the models qualitatively agrees with the human perception of the phenomena in some specific and clearly visible region of the image called the target. Examples of targets are the grey central rectangles of Fig. 3a (left). We stress that our study is purely qualitative; it is intended as a proof of concept showing how the discussed models can be effectively used to replicate the perceptual effects according to our notion above. To do so, we use line profiles, which qualitatively predict the presence of a perceived illusory phenomenon by assessing a change of intensity grey levels in the target of each illusion. We do not address here the match of our numerical outcomes with empirical data, since those depend on several further experimental conditions (image size, luminance of the presented stimulus, duration of the stimulus, etc.) for which a correspondence with the model parameters is not clear. A dedicated study on experiments motivated by psychophysics, addressing the validation of our models and, possibly, allowing for the creation of ground-truth references for a quantitative assessment, is outside the scope of this paper.

Parameters All the images considered in the following numerical experiments have size \(200 \times 200\) pixels. The lifting procedure to the space of positions and orientations is obtained by discretising \([0,\pi ]\) into \(K=30\) orientations. The parameter \(\nu \) is set to \(\nu =1/2\). The relevant cake wavelets are then computed following [5], setting the frequency band \(\texttt {bw}=4\) for all experiments. In (LHE-3D), we compute the local mean \(\mu \) by a 2D Gaussian filtering with standard deviation \(\sigma _\mu \), and the integral term by a 3D Gaussian filtering with standard deviation \(\sigma _\omega \). The gradient descent algorithm stops when the relative stopping criterion defined in Sect. 2.2 is verified with a tolerance \(\tau = 10^{-2}\).
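Concretely, the orientation discretisation and the relative stopping rule can be sketched as follows. The descent direction `grad` is a placeholder for the model's actual gradient, and the toy quadratic example in the test is ours.

```python
import numpy as np

def gradient_descent(f0, grad, step=0.1, tau=1e-2, max_iter=1000):
    """Explicit gradient descent stopped by the relative criterion
    ||f_{k+1} - f_k|| <= tau * ||f_k||, as used in the experiments.
    grad is a placeholder for the model's descent direction."""
    f = f0.astype(float).copy()
    for _ in range(max_iter):
        f_new = f - step * grad(f)
        if np.linalg.norm(f_new - f) <= tau * np.linalg.norm(f):
            return f_new
        f = f_new
    return f

# K = 30 orientations discretising [0, pi), as in the experiments
K = 30
thetas = np.arange(K) * np.pi / K
```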

3.1 Non-orientation-Dependent Examples

In this section, we test (LHE-2D) and (LHE-3D) on some classical non-orientation-dependent illusions. In particular, we focus on the following three examples:

  1. White’s illusion [54], presented in Fig. 3a. Here, the left grey rectangle appears darker than the right one, although their brightness intensities are identical.

  2. The simultaneous brightness contrast illusion [19, 22], presented in Fig. 3b. Here, the left grey square appears lighter than the right one.

  3. The luminance illusion [37], presented in Fig. 3c. It consists of four identical dots over a background whose brightness smoothly increases from left to right: the dots on the left are perceived as lighter than the ones on the right.

We refer the reader to [8], where more non-orientation-dependent examples are studied.
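As an illustration of the stimuli involved, a luminance-illusion-type input (identical dots over a smooth ramp, cf. Fig. 3c) can be synthesised as follows. The dot layout, radius and grey level are our assumptions, not the exact figure used in the paper.

```python
import numpy as np

def luminance_illusion_stimulus(n=200, n_dots=4, dot_value=0.5, radius=10):
    """Identical grey dots over a left-to-right luminance ramp: a sketch of a
    luminance-illusion stimulus (layout and sizes are assumptions)."""
    img = np.tile(np.linspace(0.0, 1.0, n), (n, 1))  # smooth background ramp
    y, x = np.mgrid[0:n, 0:n]
    # evenly spaced dot centres along one horizontal row (hypothetical layout)
    centres = [(n // 4, (k + 1) * n // (n_dots + 1)) for k in range(n_dots)]
    for cy, cx in centres:
        img[(y - cy)**2 + (x - cx)**2 <= radius**2] = dot_value
    return img
```

Since all dots share the exact same grey level, any left/right difference in a model's output over the dots is a predicted illusory effect.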

Discussion As Figs. 3 and 4 show, both (LHE-2D) and (LHE-3D) predict the three described illusions. Figure 4 contains the line profiles relative to the results of Fig. 3. In Fig. 4a, we plot the central horizontal line of the images in Fig. 3a, which crosses both grey patches. As the plots show, both models correctly predict the left target to be perceived as darker than the right one. Figure 4b contains the plot of the central horizontal line profile of the images in Fig. 3b, which crosses the two grey squares: both (LHE-2D) and (LHE-3D) correctly predict the left square to be lighter than the right one. Finally, in Fig. 4c we plot horizontal profiles crossing the top left and right targets (grey circles) of the images in Fig. 3c. For each pair of targets, both models replicate the perception of the left target as brighter than the right one.

Notice that the BIWaM and ODOG methods can also correctly reproduce these illusions (see [13, 41] for numerical results).

Fig. 5

Grating inductions with different orientations of the background grating w.r.t. the central bar

Fig. 6

Model outputs for input in Fig. 5a. Parameters for (d): \(\sigma _{\mu } = 10\), \(\sigma _\omega = 5\), \(\lambda = 0.5\)

Fig. 7

Model outputs for input in Fig. 5b. Parameters for (d): \(\sigma _{\mu } = 10\), \(\sigma _\omega = 5\), \(\lambda = 0.5\).

3.2 Grating Induction with Oriented Background

Grating induction (GI) is a contrast effect which was first described in [39] and later studied, among others, in [13]. As the name suggests, the phenomenon describes the induction of a regular alternation of intensity changes on a constant image region due to the presence of an inducing background.

In this section, we describe our results on a variation of GI where a relative orientation \(\theta \) describes how much the background is oriented with respect to a constant grey bar in the middle of the image, see Fig. 5. When the background has a different orientation from the central grey bar (i.e. \(\theta >0\)), the observer perceives an alternation of dark grey/light grey patterns within the central bar. This phenomenon is contrast dependent, as the induced grey patterns (dark grey/light grey) are in counter-phase with the background grating. Moreover, it is also orientation dependent, since the perceived intensity of the phenomenon varies with the background orientation and, in particular, is maximal when the background bars are orthogonal to the central one.
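A stimulus of this kind can be generated as below. The period, bar height and grey level used in the paper's figures are not specified here, so the values below are assumptions; `theta` is the angle of the grating stripes with respect to the horizontal bar, so `theta = pi/2` gives the orthogonal configuration.

```python
import numpy as np

def grating_induction_stimulus(n=200, period=40, theta=np.pi / 2,
                               bar_height=20, grey=0.5):
    """Oriented square-wave background with a constant grey horizontal bar:
    a sketch of the grating-induction inputs (sizes are assumptions)."""
    y, x = np.mgrid[0:n, 0:n]
    # square-wave grating whose stripes form angle theta with the bar
    phase = (-x * np.sin(theta) + y * np.cos(theta)) * 2 * np.pi / period
    img = 0.5 * (1 + np.sign(np.sin(phase)))
    # overlay the constant grey test bar in the middle rows
    top = n // 2 - bar_height // 2
    img[top:top + bar_height, :] = grey
    return img
```

Any grey-level alternation a model produces inside the physically constant bar is then the induced counter-phase grating discussed in the text.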

Discussion We observe that, in accordance with visual perception, model (LHE-3D) predicts the appearance of a counter-phase grating in the central grey bar, see Figs. 6d and 7d. The same result is obtained by the ODOG model, see Figs. 6a and 7a. Figures 8 and 9 show a higher-intensity profile when the background grating is orthogonal to the central line than when the background angle equals \(\pi /3\), see the orange and green dashed lines. On the other hand, the BIWaM and (LHE-2D) models do not appear suitable to describe this phenomenon; see for comparison the red and blue dashed lines in Figs. 8 and 9.

Fig. 8

Middle line profiles of outputs in Fig. 6

Fig. 9

Middle line profiles of outputs in Fig. 7

We now consider a similar example, focusing more precisely on the illusory completion of collinear lines of the background across the central grey bar.

3.3 Poggendorff Illusion

The Poggendorff illusion (Fig. 10b) consists in the perceived misalignment of two segments of the same continuous line due to the presence of a superposed surface. The perceptual bias of the phenomenon has been investigated via neurophysiological experiments, see, for example, [52, 53]. Recently, in [29, 30], a sub-Riemannian framework where orientations are computed via Gabor filtering has been used to study the geometrical versus perceptual completion effects induced by the illusion, successfully mimicking human perception. Here, we consider a modified version of the Poggendorff illusion, where the background consists of a grating pattern, see Fig. 10a, in order to account for both contrast and orientation features.

Note that this example is actually similar to the one considered in the previous section, the only difference being the width of the central grey bar, which is responsible for the perceived misalignment.

Fig. 10

Poggendorff illusion: input, detail extraction and result obtained by (LHE-3D). Parameters: \(\sigma _{\mu } = 3\), \(\sigma _\omega = 10\), \(\lambda = 0.5\)

Fig. 11

Model outputs for the Poggendorff illusion in Fig. 10a via reference models, (LHE-2D), and (LHE-3D)

Fig. 12

Middle line profiles of outputs in Fig. 11

Discussion The result obtained by applying (LHE-3D) to Fig. 10a is presented in Figs. 10c and 11d. As for the results on the grating induction presented in Sect. 3.2, we observe an induced counter-phase grating in the central grey bar.

However, the objective of this experiment goes further, the question being whether it is possible to numerically compute an image output reproducing the perceived misalignment between a fixed black stripe in the bottom part of Fig. 10a and its collinear prosecution in the upper part. Note that the perceived alignment differs from the actual geometrical one: for a fixed black stripe in the bottom part, the corresponding collinear top stripe is in fact perceived as slightly shifted to the left, see Fig. 10b, where single stripes have been isolated for better visualisation. The problem here is therefore not an inpainting problem, which is classical in the imaging community, but, rather, the reconstruction of the perceptual output from the given input in Fig. 10a.

We now look at the results in Fig. 10c and mark by a continuous green line a fixed black stripe in the bottom part of the image. In order to find the corresponding perceived collinear stripe in the upper part, we follow how the model propagates the marked stripe across the central surface (dashed green line). We notice that the prosecution computed via the (LHE-3D) model does not correspond to the actual collinear prosecution but is, rather, in agreement with our perception. Comparisons with the reference models are presented in Fig. 11, and the corresponding middle line profiles are shown in Fig. 12. We observe that the results obtained via the proposed (LHE-3D) model can be reproduced by neither the BIWaM nor the (LHE-2D) model, which, moreover, induce a non-counter-phase grating in the central grey bar, differing from the expected perceptual result. On the other hand, the result obtained by the ODOG model is consistent with ours, but presents a much less evident alternating grating within the central grey bar. In particular, the induced oblique bands are not visibly connected across the whole grey bar, i.e. their induced contrast is very poor and, consequently, the induced edges are not as sharp as those reconstructed via our model, see Fig. 12 for the middle line profile.

We further remark that a numerical implementation of the standard (WC) model, whose result is presented in Fig. 13, is not able to reproduce the desired perceptual completion. The model (LHE-3D) reproduces the visual illusion presented in this example better than (WC): this is consistent with the variational nature of the model discussed before.

Fig. 13

Model output of the standard (WC) model for the input in Fig. 10a. See [8] for other results via the (WC) models and details on the implementation

Threshold for inpainting versus perceptual completion in Poggendorff grating Interestingly, the capability of model (LHE-3D) to reproduce the visual perception bias in the Poggendorff grating example strongly depends on the choice of the parameter \(\sigma _{\omega }\), which accounts for the width of the interaction kernel.

As pointed out by the seminal works of Hubel, Wiesel and Bosking [16, 35, 49], it is possible to identify at least two main types of connectivity in the visual cortex: the intra-cortical connectivity, which selects the preferred orientation among cells belonging to the same hypercolumn, and the long-range connectivity, which connects simple cells belonging to different hypercolumns (Fig. 14).

Fig. 14

Examples of long-range (a) and intra-cortical (b) connectivity from [16]. Intra-cortical connectivity isotropically connects neurons belonging to the same hypercolumn, while long-range connectivity connects neurons belonging to different hypercolumns but sensitive to the same orientation. Image Copyright 1997 Society for Neuroscience

Perceptual phenomena such as those presented in this work arise due to both these connectivities, modelled in (LHE-3D) by the parameter \(\sigma _{\omega }\) (i.e. the standard deviation of the Gaussian \(\omega \)), which therefore accounts for smaller or larger local interactions. This parameter can thus be modulated to vary the width of the connectivity between different hypercolumns: when \(\sigma _\omega \) is small with respect to the overall size of the processed image, the geometrical completion (inpainting) is reproduced; when \(\sigma _\omega \) is large, perceptual-oriented phenomena such as illusory contours or geometrical optical illusions can be modelled. The change between these two types of interactions as the parameter \(\sigma _\omega \) grows is shown in Fig. 15.
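The role of \(\sigma_\omega\) can be made concrete with a small sketch comparing the kernel's effective support to the image size; the \(\pm 3\sigma\) support convention and the helper names below are our own.

```python
import numpy as np

def interaction_weight(sigma_omega, radius=None):
    """1D section of the Gaussian interaction kernel omega, normalised to
    unit mass; truncated at +/- 3 sigma by default."""
    if radius is None:
        radius = int(np.ceil(3 * sigma_omega))
    t = np.arange(-radius, radius + 1)
    w = np.exp(-t**2 / (2 * sigma_omega**2))
    return w / w.sum()

def effective_support_fraction(sigma_omega, image_size=200):
    """Fraction of the image covered by the kernel's +/- 3 sigma support:
    small fractions mean local (inpainting-type) interactions, large ones
    long-range (perceptual-type) interactions across hypercolumns."""
    return min(1.0, 6 * sigma_omega / image_size)
```

For the \(200 \times 200\) images used here, \(\sigma_\omega = 5\) covers only 15% of the image width, while \(\sigma_\omega = 50\) covers it entirely, which is one way to read the inpainting-versus-perception transition reported in Fig. 15.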

This example also highlights the flexibility of our models to adapt to image processing problems and, at the same time but for different choices of parameters, to the modelling of the neural activity in V1.

Fig. 15

Sensitivity of the (LHE-3D) model to the parameter \(\sigma _\omega \). The completion inside the middle grey bar changes from geometrical (inpainting type) to illusory (perception type). The transition can be observed when \(\sigma _\omega \) varies from 5 to 6, while the illusory phenomenon persists for \(\sigma _\omega \) larger than 6. The other parameters are fixed across experiments: \(\sigma _\mu = 2\), \(\lambda = 0.8\)

Fig. 16

Reconstruction of the Shapley and Gordon illusions, see [47, Figures 1(A), 1(B)]. First row: image 1(A) in [47]. From left to right: original image, reconstruction via the (LHE-2D) model, reconstruction via the (LHE-3D) model. Second row: image 1(B) in [47]. From left to right: original image, reconstruction via the (LHE-2D) model, reconstruction via the (LHE-3D) model. Parameters for (LHE-3D): \(\sigma _\mu =10\), \(\sigma _\omega =50\), \(\lambda =0.5\)

Fig. 17

Line profiles of the images in Fig. 16

4 Conclusions

In this paper, we considered a neurophysiological evolution model to study the visual perception bias induced by contrast and, possibly, local orientation dependence. The proposed model was originally introduced in [10] in the context of image processing for local histogram equalisation (LHE), and it is a variation of the celebrated Wilson and Cowan equations, formulated in [55] to describe the evolution of a population of neurons in V1.

Firstly, in Sect. 2 we investigated the efficient representation properties of the original WC model. In mathematical terms, this consists in interpreting the corresponding dynamics as the gradient descent of a suitable energy functional. We rigorously proved that there is no energy functional minimised by the WC dynamics (Theorem 1), while an energy functional minimised by the stationary solutions of the LHE variant exists [see formula (15)].

Secondly, by mimicking the structure of V1, we extended the mathematical formulation of the LHE model to a third dimension in order to describe local orientation preferences. This new model, denoted by (LHE-3D), can be efficiently implemented via convolution with appropriate kernels and solved numerically via standard explicit schemes. The information on the local orientation allows us to describe contrast phenomena as well as orientation-dependent illusions.
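As a sketch of such an explicit scheme, one Euler step of a generic Wilson–Cowan-type evolution on the lifted domain \((x,y,\theta)\) can be written as below. This is the textbook WC form, not the paper's exact (LHE-3D) functional: the separable Gaussian interaction, the tanh sigmoid and all parameter values are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def wc_step(a, h, dt=0.1, beta=1.0, sigma_xy=2.0, sigma_theta=1.0):
    """One explicit Euler step of a Wilson-Cowan-type evolution on the lifted
    domain (x, y, theta): da/dt = -a + sigmoid(omega * a) + h, with omega a
    separable 3D Gaussian and tanh standing in for the sigmoid."""
    s = gaussian_filter1d(a, sigma_xy, axis=0, mode="nearest")
    s = gaussian_filter1d(s, sigma_xy, axis=1, mode="nearest")
    s = gaussian_filter1d(s, sigma_theta, axis=2, mode="wrap")  # periodic orientations
    return a + dt * (-a + np.tanh(beta * s) + h)
```

Iterating such steps until a relative stopping criterion is met is the "standard explicit scheme" sense in which these models are solved numerically.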

In Sect. 3, we tested this extension of LHE on some orientation-independent brightness illusions, showing that it is able to reproduce the perceptual results as well as standard linear + nonlinear filtering models (such as the ODOG and BIWaM models [13, 41]). Then, we performed further tests on orientation-dependent illusions (such as grating induction and the Poggendorff illusion), observing that only the proposed orientation-dependent extension of the LHE model is capable of replicating the perceived visual bias. In agreement with the theoretical sub-optimality of the standard WC model with respect to the efficient representation principle pointed out above, it turns out that, among the neural field models tested, (LHE-3D) is the one capable of replicating the largest number of illusions.

Finally, we reported a preliminary empirical discussion on the sensitivity of model (LHE-3D) to parameters describing different connectivity properties between hypercolumns in V1. Our experiment revealed the existence of a threshold parameter at which the completion properties of model (LHE-3D) switch from inpainting type to perceptual type. A more accurate theoretical study, based, for example, on bifurcation and stability analysis of the equilibria of the model, is left for future research.

Further investigations should also address a more accurate modelling reflecting the actual structure of V1. In particular, this concerns the lift operation, where the cake wavelet filters could be replaced by Gabor filters as in [30], as well as the interaction weight \(\omega \), which could be taken to be the anisotropic heat kernel of [23]. Furthermore, more features of the image (e.g. scale, frequency, colour) should also be considered in future work. According to a preliminary analysis that we performed in this direction, (LHE-3D) seems promising when it comes to accounting for scale. In Fig. 16a, we present a variant of the luminance illusion where only one foreground round patch appears in the image, at a larger scale, while in Fig. 16d we present a variant where the target brightness has a gradient opposed to that of the background. The results obtained by applying (LHE-2D) and (LHE-3D) to Fig. 16a, d are then reported. We observe that both models correctly reproduce a change of sign in the contrast of the foreground patch, enhancing a 3D effect of the central grey patch. Moreover, (LHE-3D) seems to correctly predict the appearance of an illusory contour at the boundary of the central round patch, as showcased by the line profiles in Fig. 17b, d.

Extensive numerical experiments should also be performed to assess the compatibility of the model outputs with the psychophysical tests measuring the perceptual bias induced by these and other phenomena such as the ones discussed in [8]. This would provide insights about the robustness of the model in reproducing the visual pathway behaviour.