Abstract
We consider the evolution model proposed in Bertalmío (Front Comput Neurosci 8:71, 2014), Bertalmío et al. (IEEE Trans Image Process 16(4):1058–1072, 2007) to describe illusory contrast perception phenomena induced by surrounding orientations. Firstly, we highlight its analogies and differences with the widely used Wilson–Cowan equations (Wilson and Cowan in BioPhys J 12(1):1–24, 1972), mainly in terms of efficient representation properties. Then, in order to explicitly encode local directional information, we exploit the model of the primary visual cortex (V1) proposed in Citti and Sarti (J Math Imaging Vis 24(3):307–326, 2006) and largely used over the last years for several image processing problems (Duits and Franken in Q Appl Math 68(2):255–292, 2010; Prandi and Gauthier in A semidiscrete version of the Petitot model as a plausible model for anthropomorphic image reconstruction and pattern recognition. SpringerBriefs in Mathematics, Springer, Cham, 2017; Franceschiello et al. in J Math Imaging Vis 60(1):94–108, 2018). The resulting model is thus defined in the space of positions and orientations, and it is capable of describing assimilation and contrast visual bias at the same time. We report several numerical tests showing the ability of the model to reproduce, in particular, orientation-dependent phenomena such as grating induction and a modified version of the Poggendorff illusion. For this latter example, we empirically show the existence of a set of threshold parameters differentiating between inpainting-type and perception-type reconstructions and describing long-range connectivity between different hypercolumns in V1.
1 Introduction
Recent studies on vision research have shown that many, if not most, popular vision models can be described by a cascade of linear and nonlinear (L \(+\) NL) operations [38]. This is the case for several reference models describing visual perception—e.g. the Oriented Difference of Gaussians (ODOG) [13] or the Brightness Induction Wavelet Model (BIWaM) [41]—and, analogously, for models describing neural activities [20]. These L \(+\) NL models are suitable in many cases for describing retinal and thalamic activity, but they have been shown to have low predictive power for modelling the neural activity in the primary visual cortex (V1), explaining less than 40% of the variance of the data [20]. On the other hand, there exist several models in vision research which cannot be expressed as a combination of (L \(+\) NL) operations. Prominent examples are models describing neural dynamics via Wilson–Cowan equations [18, 45, 55]. Although these models have been extensively studied by the neuroscience community to describe cortical low-level dynamics, see, for example, [24], their use in the context of psychophysics to describe, for example, visual illusions has been considered only very recently [8].
In [6, 10, 11], the authors show how a slight, yet effective, modification of the Wilson–Cowan equation that does not consider orientation admits a variational formulation through an associated energy functional which can be linked to histogram equalisation, visual adaptation and the efficient representation principle, an important school of thought in vision science [40]. This principle, introduced by Attneave [2] and Barlow [4], is based on viewing neural systems through the lens of information theory and states that neural responses aim to overcome neurobiological constraints and to optimise the limited biological resources by self-adapting to the statistics of the images that the individual typically encounters, so that the visual information can be encoded in the most efficient way. Natural images (and, more generally, images in urban environments) are in fact not random arrays of values, since they present a significant statistical structure. With respect to such statistics, nearby points tend to have similar values; as a result, there is significant correlation among pixels, with a redundancy of \(90\%\) or more [1], and it would be highly inefficient and detrimental for the visual system to simply encode each pixel independently. Another very important reason to remove redundant statistical information from the representation is that the statistical rules impose constraints on the image values that are produced, preventing the encoded signal from utilising the full capacity of the visual channel, which is another inefficient or even wasteful use of biological resources. By removing what is redundant or predictable from the statistics of the visual stimulus, the visual system can concentrate on what’s actually informative [44]. 
Remarkably, the efficient representation principle has correctly predicted a number of neural processing aspects and phenomena and is the only framework able to predict the functional properties of neurons from a very simple principle. In [1], Atick makes the point that one of the two different types of redundancy or inefficiency in the visual system is the one that happens if some neural response levels are used more frequently than others: for this type of redundancy, the optimal code is the one that performs histogram equalisation, which can be obtained by means of the modification of the WC model described above.
Contribution The first contribution of this paper is to formally prove, in a completely general setting, that Wilson–Cowan equations are non-variational, i.e. they cannot be written as the gradient flow of an \(L^2\) energy functional. For this reason, their solutions do not provide a representation as efficient as the solutions to the local histogram equalisation model.
As a second contribution, we introduce an explicit orientation dependence both into the WC equations and into this modification via a lifting procedure inspired by the neurophysiological modelling of V1 [23, 27, 43], which has also been applied to several image processing problems [15, 57]. The lifting procedure, illustrated in Fig. 1, consists in associating with each point of the retinal plane \(x \in \mathbb {R}^2\), the tangent direction \(\theta \) of the contour at point x, thus “lifting” the retinal plane \(\mathbb {R}^2\) to the feature space \(\mathbb {R}^2 \times \mathbb {P}^1\) of positions and orientations. This mathematical construction mimics the neural representations of the image features that the visual cortex performs, as it is well known from the studies in vision neuroscience by Hubel and Wiesel [35].
We then report some numerical evidence showing how the proposed model is able to better reproduce several visual perception biases than both its orientation-independent version and some reference (L \(+\) NL) models. In particular, after reporting some numerical results for classical non-orientation-dependent illusions, we test our model on orientation-dependent grating induction (GI) phenomena (generalising the ones presented in [13, Figure 3], see also [39]) and show a direct dependence of the output image on the local orientation, which cannot be described by orientation-independent models.
We then test the proposed model on a modified version of the Poggendorff illusion, a geometrical optical effect where a misalignment of two collinear segments is induced by the presence of a surface [52, 53], see Fig. 10a. For this modified version, our model is able to integrate the contrast feature better than state-of-the-art models such as those based on filtering techniques [13, 41], on natural images statistics [34] and cortical-based ones [29, 30]. Moreover, we also show that such feature is not correctly integrated by the classical WC equations even when orientation is explicitly taken into account in the modelling.
Finally, we report an empirical study concerning the sensitivity of the model to parameters showing the existence of threshold values able to change the nature of the completion properties of the model, for example, to make it switch from inpainting-type (geometrical completion) to perception-type (perceptual completion).
A preliminary version of this work, including some of the tests presented here, appeared in [9].
2 Variational and Evolution Methods in Vision Research
The use of variational methods for solving ill-posed imaging problems is nowadays very classical within the imaging community. For a given degraded image f and a (possibly nonlinear) degradation operator \(\mathcal {T}\) modelling noise, blur and/or under-sampling in the data, the solution of the problem

$$\text {find } u \text { such that } \mathcal {T}(u) = f \qquad (1)$$
often lacks fundamental properties such as existence, uniqueness and stability, requiring alternative strategies to be used in order to reformulate the problem in a well-posed way.
In the context of variational regularisation approaches, for instance, one looks for an approximation \(u_\star \) of the real solution u by solving a suitable optimisation problem, so that

$$u_\star \in \mathop {\mathrm {argmin}}_{u}\, \mathcal {E}(u), \qquad (2)$$
where \(\mathcal E\) is a (possibly non-convex) energy functional which typically combines prior information available both on the image and on the physical nature of the signal (in terms, for instance, of its noise statistics), see, for example, [21] for a review.
In convex and smooth scenarios, a common alternative consists in considering the steepest descent of \(\mathcal {E}\), defined in terms of the Fréchet derivative \(\nabla \mathcal {E}\) computed w.r.t. some norm, which reduces the problem to the form

$$\partial _t u = -\nabla \mathcal {E}(u), \qquad t>0, \qquad (3)$$

under appropriate conditions on the boundary of the image domain. Then, solutions \(u_\star \) to (2) correspond to stationary solutions of (3). We remark that while the connection between variational problems and parabolic PDEs is always guaranteed by taking the gradient descent of the corresponding energy functional as above, the reverse is not always possible, as it requires some additional structure on the functional space considered, which may be lacking in several cases. We will comment on this issue in the next section, where we will provide some examples in this respect, focusing on some neurophysiologically inspired models for vision.
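As a toy illustration of the correspondence between (2) and (3), the following sketch (not from the paper; the quadratic energy and all parameters are illustrative) runs an explicit gradient descent on a Tikhonov-type functional and checks that the iterates approach a stationary point, i.e. a minimiser:

```python
import numpy as np

# Toy example: E(u) = 0.5*||u - f||^2 + 0.5*mu*||Du||^2, with D the periodic
# forward-difference operator. Its Frechet derivative is
# grad E(u) = (u - f) + mu*(2u - roll(u, 1) - roll(u, -1)).
def grad_E(u, f, mu):
    return (u - f) + mu * (2 * u - np.roll(u, 1) - np.roll(u, -1))

def gradient_descent(f, mu=1.0, dt=0.1, n_iter=500):
    u = np.zeros_like(f)
    for _ in range(n_iter):
        u = u - dt * grad_E(u, f, mu)  # explicit Euler step of the flow (3)
    return u

f = np.sin(np.linspace(0, 2 * np.pi, 32, endpoint=False))
u_star = gradient_descent(f)
# At convergence, grad E(u_star) ~ 0: the stationary point of (3) solves (2).
```

The step size must be small enough for stability (here \(\Delta t = 0.1 < 2/L\), with L the Lipschitz constant of the gradient).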
In such context, evolution equations have been originally used as a tool to describe the physical transmission, diffusion and interaction phenomena of stimuli in the visual cortex, see, for example, [24]. Similarly, variational methods have been studied by the vision community to describe efficient neural coding properties, see, for example, [40, 51], i.e. all the mechanisms used by the human visual system to optimise the visual experience via the reduction in redundant spatio-temporal biases linked to the perceived stimulus.
In the context of vision, a first study on the efficient representation aspects of some neurophysiological model analogous to the one considered in this work has been recently performed by the authors in [8] where several visual illusions are studied.
2.1 Wilson–Cowan-Type Models for Neuronal Activation
A prominent example of evolution models describing neuronal dynamics is the Wilson–Cowan (WC) equations [18, 55] that we present here in a general context.
Consider a neuronal population parametrised by a set \(\varOmega \), endowed with a measure \(\mathrm{d}\xi \) supported on the whole \(\varOmega \). In the following sections, we will be interested in the two cases: \(\varOmega = {{\,\mathrm{\mathbb {R}}\,}}^2\) and \(\varOmega = {{\,\mathrm{\mathbb {R}}\,}}^2\times \mathbb {P}^1\), both endowed with the corresponding Lebesgue measure. Denoting by \(a(\xi ,t)\in {{\,\mathrm{\mathbb {R}}\,}}\) the state of a population of neurons with coordinates \(\xi \in \varOmega \) at time \(t>0\), the Wilson–Cowan model reads

$$\partial _t a(\xi ,t) = -\beta \, a(\xi ,t) + \nu \int _\varOmega \omega (\xi \Vert \xi ')\,\sigma \big (a(\xi ',t)\big )\,\mathrm{d}\xi ' + h(\xi ), \qquad t>0. \qquad (\text {WC})$$
Here, \(\beta >0\) and \(\nu \in \mathbb {R}\) are fixed parameters, \(\omega (\xi \Vert \xi ')\) is a kernel that models interactions at two different locations \(\xi \) and \(\xi '\), the function h represents an external stimulus, and \(\sigma {:}\,\mathbb {R}\rightarrow \mathbb {R}\) is a nonlinear sigmoid saturation function.
In the following, we further assume that the interaction kernel \(\omega \) is non-negative and normalised:

$$\omega (\xi \Vert \xi ')\ge 0 \quad \text {and}\quad \int _\varOmega \omega (\xi \Vert \xi ')\,\mathrm{d}\xi ' = 1 \quad \text {for a.e. } \xi \in \varOmega . \qquad (4)$$
Moreover, as a sigmoid \(\sigma \), we consider, for a fixed parameter \(\alpha >0\), the following odd function:

$$\sigma (\rho ) := \min \{1, \max \{\alpha \rho , -1\}\}, \qquad \rho \in \mathbb {R},$$
which has been previously considered, for example, in [10]. Observe that, depending on the sign of \(\nu \), model (WC) is able to describe both excitatory (\(\nu >0\)) and inhibitory local interactions (\(\nu <0\)), see, for example, [17, Section 3]. Due to the oddness of \(\sigma \), this latter case can be equivalently expressed by keeping \(\nu >0\) and replacing \(\sigma \) with its “mirrored” version \(\hat{\sigma }(\rho ) = \sigma (-\rho ), ~\rho \in \mathbb {R}\), see Fig. 2.
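In code, this sigmoid and its mirrored version read as follows (a minimal sketch, assuming the piecewise-linear saturation \(\sigma (\rho ) = \min \{1,\max \{\alpha \rho ,-1\}\}\); the slope value `alpha=5.0` is illustrative):

```python
def sigma(rho, alpha=5.0):
    # Piecewise-linear odd saturation min{1, max{alpha*rho, -1}};
    # non-differentiable exactly at |rho| = 1/alpha.
    return min(1.0, max(alpha * rho, -1.0))

def sigma_hat(rho, alpha=5.0):
    # "Mirrored" sigmoid: models inhibitory interactions while keeping nu > 0.
    return sigma(-rho, alpha)
```

Oddness gives `sigma(-r) == -sigma(r)`, which is what allows inhibition to be re-expressed through the mirrored sigmoid.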
Equation (WC) has been studied intensively over the last decades to describe several neuronal mechanisms in V1, see, for example, [3, 24, 28, 45, 50]. However, one interesting aspect which, to our knowledge, has not been previously investigated, is whether (WC) complies with any efficient representation principle, or, in more mathematical terms, whether such a model can be interpreted as the gradient descent in the form (3) of some energy functional defined on \(L^2(\varOmega )\).
As a first result, we show in the following that the model (WC) does not satisfy a variational principle. As a consequence, it does not implement an efficient neural coding mechanism. A preliminary study has been performed by the authors in [8], in a completely discrete setting. Here, we make these considerations more rigorous by the following theorem.
Theorem 1
Assume that there exist two subsets of positive measure \(U_1,U_2\subset \varOmega \), \(U_1\cap U_2 = \varnothing \) such that \(\omega (\xi \Vert \xi ')> 0\) for any \(\xi \in U_1\) and \(\xi '\in U_2\). Then, for \(\sigma \) chosen as above, the Wilson–Cowan equation (WC) does not admit a variational formulation, that is, it cannot be expressed as the gradient descent in the Fréchet sense of any densely defined energy \(\mathcal {E}\).
Proof
We proceed by contradiction and assume that there exists a densely defined energy \(\mathcal E\) on \(L^2(\varOmega )\) such that (WC) can be expressed in the form (3).
Let \(\chi _i{:}\,\varOmega \rightarrow \{0,1\}\) be the characteristic function of \(U_i\), \(i=1,2\). Since, up to shrinking them, we can always assume the \(U_i\) to have finite measure, we have that \(\chi _i\in L^2(\varOmega )\). Then, we define \(J{:}\,{{\,\mathrm{\mathbb {R}}\,}}^2\rightarrow {{\,\mathrm{\mathbb {R}}\,}}\) by

$$J(v) := \mathcal {E}(v_1\chi _1 + v_2\chi _2), \qquad v=(v_1,v_2)\in {{\,\mathrm{\mathbb {R}}\,}}^2.$$
By definition, we have
Here, \(\nabla \mathcal E\) denotes the Fréchet derivative of \(\mathcal E\), and \(\langle \cdot ,\cdot \rangle \) denotes the scalar product in \(L^2(\varOmega )\). Thus, by (3) and (WC), there holds
We now aim to differentiate again the above w.r.t. the jth variable, \(j=1,2\). Observe that since \(\sigma \) is Lipschitz continuous, it is differentiable almost everywhere and, thanks to the fact that \(U_1\cap U_2=\varnothing \), for a.e. \(v\in {{\,\mathrm{\mathbb {R}}\,}}^2\) we have
This shows that for a.e. \(v\in {{\,\mathrm{\mathbb {R}}\,}}^2\) it holds
where \(\delta _{ij}\) is the Kronecker delta symbol and \(c_{ij}\) is defined as
Observe that, by (4), the assumption on \(U_1\) and \(U_2\) and the fact that they have finite measure, it holds that \(0\le c_{ij}<+\infty \) for any \(i,j\in \{1,2\}\). Moreover, since \(\omega (\xi \Vert \xi ')>0\) for \(\xi \in U_1\) and \(\xi '\in U_2\), we have that \(c_{21}> 0\).
We now claim that \(J\in C^2(A\times A)\), where \(A = \{t\in {{\,\mathrm{\mathbb {R}}\,}}: |t|\ne 1/\alpha \}\) is the set of differentiability of \(\sigma \). To this purpose, we compute
Then, by (10), for any \(v\in A\times A\) and \(i,j\in \{1,2\}\) it holds
As a consequence, \(\partial _{ji}J\) is continuous on \(A\times A\), proving the claim.
To conclude the proof, we show that \(\partial _{21}J\not \equiv \partial _{12}J\), which contradicts the \(C^2\) differentiability of J by the Schwarz theorem and thus shows that the r.h.s. of (WC) cannot be the Fréchet derivative of an energy. Indeed, it suffices to consider \(v\in A\times A\) with \(v_1>1/\alpha \) and \(|v_2|<1/\alpha \), which by (13) implies
This completes the proof of the statement. \(\square \)
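The obstruction exploited in the proof can also be observed numerically: on a two-population discretisation of (WC), the Jacobian of the right-hand side is asymmetric wherever \(\sigma '\) takes different values at the two states, so the field cannot be (minus) the gradient of an energy. The following sketch checks this with finite differences (illustrative parameters; the piecewise-linear sigmoid is assumed):

```python
import numpy as np

alpha, beta, nu = 5.0, 1.0, 1.0
W = np.array([[0.5, 0.5], [0.5, 0.5]])   # symmetric, strictly positive weights

def sigma(rho):
    # piecewise-linear odd sigmoid, as in the text
    return np.minimum(1.0, np.maximum(alpha * rho, -1.0))

def wc_rhs(a):
    # right-hand side of a two-population discretisation of (WC), with h = 0
    return -beta * a + nu * W @ sigma(a)

def numerical_jacobian(G, a, eps=1e-6):
    # central finite-difference Jacobian of G at a
    n = a.size
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        J[:, j] = (G(a + e) - G(a - e)) / (2 * eps)
    return J

# State where sigma is saturated for population 1 and linear for population 2:
a = np.array([1.0, 0.0])          # sigma'(a[0]) = 0, sigma'(a[1]) = alpha
J = numerical_jacobian(wc_rhs, a)
# J[0, 1] = nu * W[0, 1] * alpha != 0 = J[1, 0]: asymmetric Jacobian,
# hence the field is not a gradient, mirroring Theorem 1.
```

Note that the asymmetry appears even for a perfectly symmetric kernel W: it is produced by the nonlinearity acting *inside* the integral.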
Remark 1
The above argument can be easily extended to any Lipschitz continuous sigmoid \(\sigma \) with non-constant derivative.
Remark 2
The variational nature of physical models describing neural interaction has been investigated in other contexts. For instance, in [33], the authors consider neural models eventually arising from asymmetric interaction kernels. We also refer to [32] for the identification of a Lyapunov functional for a Wilson–Cowan-like equation.
To overcome the non-existence of an underlying energy for (WC) and deal with a model complying with the efficient representation principle, we will consider in the following a variation of (WC), which has been introduced in [10] for Local Histogram Equalisation (LHE) of images in the particular case where \(\varOmega \) is a square domain in \({{\,\mathrm{\mathbb {R}}\,}}^2\). Keeping now \(\varOmega \) general and using the same notation as above, this model can be written as

$$\partial _t a(\xi ,t) = -\beta \, a(\xi ,t) + \nu \int _\varOmega \omega (\xi \Vert \xi ')\,\sigma \big (a(\xi ,t)-a(\xi ',t)\big )\,\mathrm{d}\xi ' + h(\xi ), \qquad t>0. \qquad (\text {LHE})$$
We note that the only difference between (LHE) and (WC) is the argument of the sigmoid \(\sigma \) appearing inside the integral. While in (WC) this is the activation at location \(\xi '\), in (LHE) it is the difference between the activation at the point under consideration and that at the neighbouring ones.
Following the same line of proof as in [10] and letting \(\varSigma {:}\,\mathbb {R}\rightarrow \mathbb {R}\) be any (even) primitive function of \(\sigma \), it is easy to show that, independently of the choice of \(\varOmega \), equation (LHE) is the gradient descent in the sense of (3) of the following energy functional:

$$\mathcal {E}(a) = \frac{\beta }{2}\int _\varOmega |a(\xi )|^2\,\mathrm{d}\xi - \int _\varOmega h(\xi )\,a(\xi )\,\mathrm{d}\xi - \frac{\nu }{2}\int _\varOmega \int _\varOmega \omega (\xi \Vert \xi ')\,\varSigma \big (a(\xi )-a(\xi ')\big )\,\mathrm{d}\xi '\,\mathrm{d}\xi . \qquad (15)$$
The functional \(\mathcal {E}\) is the sum of three different terms: the first two can be thought of as data fitting terms whose minimisation forces the solution of (15) to stay close to the given stimulus and, possibly, to global average brightness intensity levels; the third one is an interaction term whose minimisation corresponds to maximising the local contrast (see [10] for more details). In the following section, we will make precise some specific choices of h, which will clarify the different ingredients of model (15) in more detail.
2.1.1 Orientation-Independent Modelling
We now focus on the application of (LHE) to describe contrast perception phenomena independent of local orientation information. To do that, we recall in the following the specific instance of the (LHE) model introduced in [10]. We model the visual plane as a rectangular domain \(Q\subset {{\,\mathrm{\mathbb {R}}\,}}^2\) and consider grey-scale visual stimuli to be functions \(f{:}\,Q\rightarrow [0,1]\), such that f(x) encodes the brightness intensity at x. For a given initial stimulus \(f_0\), we then denote by \(\mu \) its local average intensity computed as the convolution \(\mu = g\star f_0\) of \(f_0\) with some filter \(g\in L^1(Q)\) with \(\int _Q g(x)\,\mathrm{d}x = 1\).
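A minimal sketch of this local average, with a normalised Gaussian g and, for simplicity, periodic boundary conditions (the standard deviation is illustrative):

```python
import numpy as np

def local_mean(f0, sigma_mu=3.0):
    # mu = g * f0 with a normalised Gaussian g, computed as a circular
    # convolution via FFT; sigma_mu is an illustrative choice.
    N, M = f0.shape
    x = np.minimum(np.arange(N), N - np.arange(N))  # periodic distances
    y = np.minimum(np.arange(M), M - np.arange(M))
    g = np.exp(-(x[:, None] ** 2 + y[None, :] ** 2) / (2 * sigma_mu ** 2))
    g /= g.sum()  # enforce the normalisation \int g = 1
    return np.real(np.fft.ifft2(np.fft.fft2(f0) * np.fft.fft2(g)))
```

Since g integrates to one, a constant stimulus is left unchanged and the global mean of any stimulus is preserved.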
In [10], the filter g was chosen to be uniform, while in [6] it was changed to a simple Gaussian in order to reproduce visual induction effects; another possibility for g would be to use a sum of Gaussians, which has been shown to better approximate lateral inhibition effects happening at the retinal level [56]. We also take the activation in (LHE) to be \(a := f-1/2\), corresponding to the way our visual system encodes contrast (i.e. as the difference with respect to the average, which we take to be \(\frac{1}{2}\)). For the external stimulus h, we take a weighted sum of the initial stimulus \(a|_{t=0} = f_0-1/2\) and its filtering via g; in the visual system, this corresponds to a combination of magnocellular (spatially averaged) and parvocellular (fine detail) pathway information. Namely, for \(\lambda >0\), we consider:

$$h = \Big (\mu - \frac{1}{2}\Big ) + \lambda \Big (f_0 - \frac{1}{2}\Big ). \qquad (16)$$
We stress that the input h is time independent. Such simplification follows from considering in our modelling the very short time frame where the stimulus is presented and retained by the visual system, a time length that is typically known as iconic memory. For visual illusions such as the ones presented in Sect. 3, this time frame typically spans less than 200 ms [48], which corresponds to the fixation time between rapid eye movements, and therefore, the temporal changes in h can be neglected.
By plugging the above ingredients into Eq. (LHE), and letting \(\beta = 1+\lambda \), we obtain the following (orientation-independent) LHE evolution model:

$$\partial _t f(x,t) = -(1+\lambda )\,f(x,t) + \nu \int _Q \omega (x,y)\,\sigma \big (f(x,t)-f(y,t)\big )\,\mathrm{d}y + \mu (x) + \lambda f_0(x). \qquad (\text {LHE-2D})$$
Remark 3
Re-arranging the (LHE-2D) equation as

$$\partial _t f(x,t) = \nu \int _Q \omega (x,y)\,\sigma \big (f(x,t)-f(y,t)\big )\,\mathrm{d}y + \lambda \big (f_0(x)-f(x,t)\big ) + \big (\mu (x)-f(x,t)\big ),$$
we can better see the effect of each of its terms: the one multiplied by the parameter \(\nu \) enhances local contrast, the one multiplied by \(\lambda \) penalises the departure from the original function \(f_0\), and the term \(\mu (x) - f(x,t)\) pushes the solution towards the local mean. Note that if \(\mu (x)\) is the constant value 1/2, it can be considered as a global average, and the solution is then consistent with the so-called grey world principle, which states that in a sufficiently varied scene the average perceived colour is a mid-grey, i.e. a mean value of \(\frac{1}{2}\) for each colour channel [7, 10].
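As an illustration of the three terms just discussed, one explicit Euler step of (LHE-2D) can be sketched as follows (naive evaluation of the integral term, periodic boundary; parameters and the piecewise-linear sigmoid are illustrative choices):

```python
import numpy as np

def sigma(r, alpha=5.0):
    # piecewise-linear odd saturation, applied elementwise
    return np.minimum(1.0, np.maximum(alpha * r, -1.0))

def lhe2d_step(f, f0, mu, omega, nu=0.5, lam=0.7, dt=0.1):
    # omega: interaction kernel centred at (0, 0), periodic, normalised.
    N, M = f.shape
    interact = np.zeros_like(f)
    for n in range(N):
        for m in range(M):
            w = np.roll(np.roll(omega, n, axis=0), m, axis=1)  # centre at (n, m)
            interact[n, m] = np.sum(w * sigma(f[n, m] - f))    # contrast term
    # contrast enhancement + fidelity to f0 + attachment to the local mean mu
    return f + dt * (nu * interact + lam * (f0 - f) + (mu - f))
```

A constant image with \(f = f_0 = \mu \) is stationary, while on a binary pattern with \(f_0 = \mu = f\) the bright pixels get brighter: this is the local contrast enhancement driven by \(\nu \).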
As far as the interaction kernel \(\omega \) is concerned, in [36] the authors consider a kernel \(\omega \) which is a convex combination of two bi-dimensional Gaussians with different standard deviations. While this variation of the model (LHE-2D) is effective in describing assimilation effects, the lack of dependence on the local orientation makes such modelling intrinsically not adapted to explain orientation-induced contrast and colour perception effects such as the ones described in [13, 41, 46]. Reference models capable of explaining these effects are mostly based on oriented difference of Gaussian linear filtering coupled with some nonlinear processing, such as the ODOG and the BIWaM models described in [12, 13, 41], respectively. However, despite their good effectiveness in the reproduction of several visual perception phenomena, these models are not based on any neuronal evolution modelling nor on any efficient representation (i.e. variational) principle.
2.1.2 Orientation-Dependent Modelling
We now focus on orientation-dependent models. For a given visual stimulus f, we let \(Lf{:}\,Q \times [0,\pi ) \rightarrow \mathbb {R}\) be the corresponding cortical activation in V1, where \(Lf(x,\theta )\) encodes the response of the neuron with spatial preference x and orientation preference \(\theta \) to the stimulus f. Such activation is obtained via convolution with the receptive fields of V1 neurons, as explained in “Appendix A”, see also [23, 27, 42, 43]. Then, similarly to above, the model (LHE) for a cortical activation \(a(x,\theta )\) depending on the local V1 coordinate \(\xi =(x,\theta )\) is obtained as follows: We define \(F:= a+1/2\) to be the visual stimulus and take as external stimulus \(h = L\mu + \lambda Lf_0 - (1+\lambda )/2\) [compare with (16)]. This, combined with the choice \(\beta = 1+\lambda \), yields the equation

$$\partial _t F(x,\theta ,t) = -(1+\lambda )\,F(x,\theta ,t) + \nu \int _Q \int _0^\pi \omega (x,\theta \Vert y,\phi )\,\sigma \big (F(x,\theta ,t)-F(y,\phi ,t)\big )\,\mathrm{d}\phi \,\mathrm{d}y + L\mu (x,\theta ) + \lambda \,Lf_0(x,\theta ). \qquad (\text {LHE-3D})$$
Here the kernel \(\omega \) depends both on positions \(x,y\in {{\,\mathrm{\mathbb {R}}\,}}^2\) and orientations \(\theta ,\phi \in [0,\pi )\). A typical choice for this kernel would be the anisotropic heat kernel naturally associated with the V1 connectivity, as considered in [45]. However, for numerical reasons, the results presented in the following are obtained by considering simply 3D Gaussians.
We remark once again that the model above describes the dynamic behaviour of neuronal activations in the 3D space of positions and orientations. As explained in “Appendix A”, once a stationary solution is found, the two-dimensional perceived image can be efficiently recovered by
Remark 4
In the following, we will consider the interaction to be excitatory (i.e. \(\nu >0\)) for both (LHE-2D) and (LHE-3D) models. Indeed, the integral term in both models is positive at x if, for example, \(f(x,t)>f(y,t)\). Thus, in order to enhance the contrast between x and its surround we need to have \(\nu >0\).
We now discuss the numerical aspects required to implement model (LHE-3D).
2.2 Discretisation Via Gradient Descent
We discretise the initial (square) image \(f_0\) as an \(N\times N\) matrix. For simplicity, we assume here periodic boundary conditions. We additionally consider \(K\in \mathbb N\) orientations, parametrised by

$$\theta _k := \frac{(k-1)\pi }{K}, \qquad k\in \{1,\ldots ,K\}.$$
The discretised lift operator, still denoted by L, transforms \(N\times N\) matrices into \(N\times N\times K\) arrays. Its action on an \(N\times N\) matrix f is defined for \(n,m\in \{1,\ldots , N\}\) and \(k\in \{1,\ldots ,K\}\) by
where \(\odot \) is the Hadamard (i.e. element-wise) product of matrices, \(\mathcal F\) denotes the discrete Fourier transform, \(R_{\theta _k}\) is the rotation of angle \(\theta _k\) and \(\varPsi ^{\text {cake}}\) is the cake mother wavelet (“Appendix A”).
We let \(F^0 = Lf_0\) and \(G_0 = L\mu \), where the local average intensity \(\mu \) is given by a Gaussian filtering of \(f_0\). The explicit time discretisation of the gradient descent (LHE-3D) is, for \(\Delta t\ll 1\) and \(\ell \in \mathbb N\),

$$F^{\ell +1} = F^{\ell } + \Delta t\,\Big (-(1+\lambda )\,F^{\ell } + \nu \,\mathcal R_{F^{\ell }} + G_0 + \lambda F^0\Big ),$$
where \(\mathcal R_{F^\ell }\) is the discretisation of the integral term in (LHE-3D). That is, for a given 3D Gaussian matrix W encoding the weight \(\omega \) and an \(N\times N\times K\) matrix F, we let, for any \(n,m\in \{1,\ldots , N\}\) and \(k\in \{1,\ldots , K\}\),

$$\mathcal R_{F}(n,m,k) := \sum _{n',m'=1}^{N}\,\sum _{k'=1}^{K} W(n-n',m-m',k-k')\,\sigma \big (F(n,m,k)-F(n',m',k')\big ),$$

where index differences are taken modulo N (resp. K), consistently with the periodic boundary conditions.
We refer to [10, Section IV.A] for the description of an efficient numerical approach used to compute the above quantity in the 2D case that can be translated verbatim to the 3D case under consideration.
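The efficient approach of [10, Section IV.A] rests on approximating \(\sigma \) by a polynomial: the binomial expansion then turns the double sum into a handful of convolutions, computable by FFT. A minimal 1D sketch of the idea (not the paper's implementation; a symmetric kernel and illustrative polynomial coefficients are assumed):

```python
import numpy as np
from math import comb

def interaction_naive(F, w, c):
    # Direct double sum: sum_q w(p - q) * sigma(F[p] - F[q]), with a
    # polynomial sigma(t) = sum_j c[j] * t**j and a symmetric, periodic w.
    sigma = lambda t: sum(cj * t ** j for j, cj in enumerate(c))
    return np.array([np.sum(np.roll(w, p) * sigma(F[p] - F))
                     for p in range(F.size)])

def interaction_fft(F, w, c):
    # Binomial expansion: sum_q w(p-q) (F[p]-F[q])^j
    #   = sum_i C(j,i) (-1)^(j-i) F[p]^i * (w conv F^(j-i))(p),
    # so each term is one circular convolution, evaluated by FFT.
    out = np.zeros_like(F)
    wh = np.fft.fft(w)
    conv = lambda v: np.real(np.fft.ifft(wh * np.fft.fft(v)))
    for j, cj in enumerate(c):
        for i in range(j + 1):
            out += cj * comb(j, i) * (-1) ** (j - i) * F ** i * conv(F ** (j - i))
    return out
```

For a degree-d polynomial this costs O(d² N log N) instead of O(N²), which is what makes the 3D computation tractable.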
After a suitable number of iterations \( \bar{\ell }\) of the above algorithm (measured by the stopping criterion \(\Vert F^{\ell +1}-F^\ell \Vert _2/\Vert F^\ell \Vert _2\le \tau \), for a fixed tolerance \(\tau \ll 1\)), the output image is then found via (18) as
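The overall iteration with this stopping rule can be sketched as follows (the right-hand side is passed as a callable, so any of the discretised fields above can be plugged in; the step size and tolerance are illustrative):

```python
import numpy as np

def evolve(F0, rhs, dt=0.15, tau=1e-2, max_iter=1000):
    # Explicit scheme with relative stopping criterion
    # ||F^{l+1} - F^l|| / ||F^l|| <= tau, as in Sect. 2.2.
    F = F0.copy()
    for _ in range(max_iter):
        F_new = F + dt * rhs(F)
        if np.linalg.norm(F_new - F) / np.linalg.norm(F) <= tau:
            return F_new
        F = F_new
    return F
```

With a contractive right-hand side, the relative-change criterion triggers once the iterates have essentially settled onto the stationary solution.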
3 Experiments
In this section, we present the results obtained by applying the cortical-inspired model presented in the previous section to some well-known phenomena where contrast perception may be affected by local orientations.
We compare the results obtained by our orientation-dependent 3D model (LHE-3D) with the corresponding 2D model (LHE-2D) already considered in [6, 36] for histogram equalisation and contrast enhancement. We further compare the performance of these models with two standard reference models based on oriented Gaussian filtering: the ODOG [13] and the BIWaM model [41]. In the former, the output is computed via a convolution of the input image with oriented difference of Gaussian filters in six orientations and seven spatial frequencies. The filtering outputs within the same orientation are then summed in a nonlinear fashion privileging higher frequencies. The BIWaM model is a variation of the ODOG model, the difference being the dependence on the local surround orientation of the contrast sensitivity function.
Prediction of the perceptual outcome In this study, our objective is to understand the capability of these models to replicate the visual illusions under consideration. That is, we are interested in whether the output produced by the models qualitatively agrees with the human perception of the phenomena in some specific and clearly visible region of the image, called the target. Examples of targets are the grey central rectangles of Fig. 3a (left). We stress that our study is purely qualitative; it should be intended as a proof of concept showing how the discussed models can be effectively used to replicate the perceptual effects according to our notion above. To do so, we use line profiles, which qualitatively predict the presence of a perceived illusory phenomenon by assessing a change of intensity grey levels in the target of each illusion. We do not address here the match of our numerical outcomes with empirical data, since those depend on several further experimental conditions (image size, luminance of the presented stimulus, duration of the stimulus, etc.) for which a correspondence with the model parameters is not clear. A dedicated study on experiments motivated by psychophysics, addressing the validation of our models and, possibly, allowing for the creation of ground-truth references for a quantitative assessment, is outside of the scope of this paper.
Parameters All the images considered in the following numerical experiments have size \(200 \times 200\) pixels. The lifting procedure to the space of positions and orientations is obtained by discretising \([0,\pi ]\) into \(K=30\) orientations. The parameter \(\nu \) is set to \(\nu =1/2\). The relevant cake wavelets are then computed following [5], setting the frequency band \(\texttt {bw}=4\) for all experiments. In (LHE-3D), we compute the local mean average \(\mu \) by a 2D Gaussian filtering with standard deviation \(\sigma _\mu \) and the integral term by a 3D Gaussian filtering with standard deviation \(\sigma _\omega \). The gradient descent algorithm stops when the relative stopping criterion defined in Sect. 2.2 is verified with a tolerance \(\tau = 10^{-2}\).
3.1 Non-orientation-Dependent Examples
In this section, we test (LHE-2D) and (LHE-3D) on some classical non-orientation-dependent illusions. In particular, we focus on the three following examples:
1. White’s illusion [54], presented in Fig. 3a. Here, the left grey rectangle appears darker than the right one, although their brightness intensity is identical.
2. The simultaneous brightness contrast illusion [19, 22], presented in Fig. 3b. Here, the left grey square appears lighter than the right one, although both have identical intensity.
3. The luminance illusion [37], presented in Fig. 3c. It consists of four identical dots over a background whose brightness smoothly increases from left to right: the dots on the left are perceived as lighter than the ones on the right.
We refer the reader to [8] where more non-orientation-dependent examples are studied.
Discussion As Figs. 3 and 4 show, both (LHE-2D) and (LHE-3D) predict the three described illusions. Figure 4 contains the line profiles relative to the results of Fig. 3. In Fig. 4a, we plot the central horizontal line of the images in Fig. 3a, which crosses both grey patches. As the plots show, both models correctly predict the left target to be perceived darker than the right one. Figure 4b contains the plot of the central horizontal line profile of the images in Fig. 3b, which crosses the two grey squares: both (LHE-2D) and (LHE-3D) correctly predict the left square to be lighter than the right one. Finally, in Fig. 4c we plot horizontal profiles crossing top left and right targets (grey circles) of the images in Fig. 3c. For each target, both models replicate the brighter perception of the left target with respect to the right one.
Notice that also the BIWaM and ODOG methods can correctly reproduce these illusions (see [13, 41] for numerical results).
3.2 Grating Induction with Oriented Background
Grating induction (GI) is a contrast effect first described in [39] and later studied, among others, in [13]. As the name suggests, the phenomenon describes the induction of a regular alternation of intensity changes on a constant image region due to the presence of an inducing background.
In this section, we describe our results on a variation of GI where a relative orientation \(\theta \) describes how much the background is oriented with respect to a constant grey bar in the middle of the image, see Fig. 5. In such situations, when the background has a different orientation from the central grey bar (i.e. \(\theta >0\)), an alternation of dark grey/light grey patterns within the central bar is produced and perceived by the observer. This phenomenon is contrast dependent, as the intensity of the induced grey patterns (dark grey/light grey) is in opposition with the background grating. Moreover, it is also orientation dependent, since the perceived intensity of the phenomenon varies depending on the background orientation, and, in particular, it is maximal when the background bars are orthogonal to the central one.
Discussion We observe that, in accordance with visual perception, model (LHE-3D) predicts the appearance of a counter-phase grating in the central grey bar, see Figs. 6d and 7d. The same result is obtained by the ODOG model, see Figs. 6a and 7a. Figures 8 and 9 show a higher-intensity induced profile when the background grating is orthogonal to the central line than when the background angle equals \(\pi /3\), see the orange and green dashed lines. On the other hand, the BIWaM and (LHE-2D) models do not appear suitable to describe this phenomenon; see for comparison the red and blue dashed lines in Figs. 8 and 9.
We will now consider a similar example, focusing more precisely on the illusory completion of collinear lines of the background in correspondence of the central grey bar.
3.3 Poggendorff Illusion
The Poggendorff illusion (Fig. 10b) consists in the perceived misalignment of two segments of the same continuous line due to the presence of a superposed surface. The perceptual bias of the phenomenon has been investigated via neurophysiological experiments, see, for example, [52, 53]. Recently, in [29, 30], a sub-Riemannian framework where orientations are computed via Gabor filtering has been used to study the geometrical versus perceptual completion effects induced by the illusion, successfully mimicking human perception. Here, we consider a modified version of the Poggendorff illusion, where the background is constituted by a grating pattern, see Fig. 10a, in order to account for both contrast and orientation features.
Note that this example is actually similar to the one considered in the previous section, the only difference being the width of the central grey bar, which is responsible for the perceived misalignment.
Discussion The result obtained by applying (LHE-3D) to Fig. 10a is presented in Figs. 10c and 11d. As for the results on the grating induction presented in Sect. 3.2, we observe an induced counter-phase grating in the central grey bar.
However, the objective of this experiment goes further, the question being whether it is possible to numerically compute an image output reproducing the perceived misalignment between a fixed black stripe in the bottom part of Fig. 10a and its collinear prosecution in the upper part. Note that the perceived alignment differs from the actual geometrical one: for a fixed black stripe in the bottom part, the corresponding collinear top stripe is in fact perceived as slightly shifted to the left, see Fig. 10b, where single stripes have been isolated for better visualisation. The problem here is therefore not an inpainting problem, which is classical in the imaging community, but rather that of reconstructing the perceptual output from the given input in Fig. 10a.
We now look at the results in Fig. 10c and mark by a continuous green line a fixed black stripe in the bottom part of the image. In order to find the corresponding perceived collinear stripe in the upper part, we follow how the model propagates the marked stripe across the central surface (dashed green line). We notice that the prosecution computed via the (LHE-3D) model does not correspond to its actual collinear prosecution, but, rather, it is in agreement with our perception. Comparisons with reference models are presented in Fig. 11, and the corresponding middle line profiles are shown in Fig. 12. We observe that the results obtained via the proposed (LHE-3D) model cannot be reproduced by the BIWaM nor the (LHE-2D) models, which, moreover, induce a non-counter-phase grating in the central grey bar which is different from the expected perceptual result. On the other hand, the result obtained by the ODOG model is consistent with ours, but presents a much less evident alternating grating within the central grey bar. In particular, the induced oblique bands are not visibly connected across the whole grey bar, i.e. their induced contrast is very poor and, consequently, the induced edges are not as sharp as the ones reconstructed via our model, see Fig. 12 for the middle line profile.
We further remark that a numerical implementation of the standard (WC) model, whose result is presented in Fig. 13, is not able to reproduce the desired perceptual completion. The model (LHE-3D) reproduces the visual illusion presented in this example better than (WC): this is consistent with the variational nature of the model discussed before.
Threshold for inpainting versus perceptual completion in Poggendorff grating Interestingly, the capability of model (LHE-3D) to reproduce the visual perception bias on the Poggendorff grating example is very much dependent on the choice of the parameter \(\sigma _{\omega }\) which accounts for the width of the interaction kernel.
As pointed out by the seminal works of Hubel, Wiesel and Bosking [16, 35, 49], it is possible to identify at least two main types of connectivity in the visual cortex: the intra-cortical connectivity, which selects the preferred orientation among cells belonging to the same hypercolumn, and the long-range connectivity, which connects simple cells belonging to different hypercolumns (Fig. 14).
Perceptual phenomena such as those presented in this work arise from both these connectivities, modelled in (LHE-3D) by the parameter \(\sigma _{\omega }\) (the standard deviation of the Gaussian \(\omega \)), which accounts for smaller or larger local interactions. This parameter can thus be modulated to vary the width of the connectivity between different hypercolumns: when \(\sigma _\omega \) is small with respect to the overall size of the processed image, geometrical completion (inpainting) is reproduced; when \(\sigma _\omega \) is large, perceptual phenomena such as illusory contours or geometrical optical illusions can be modelled. The change between these two types of interaction as \(\sigma _\omega \) grows is shown in Fig. 15.
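The role of \(\sigma_\omega\) can be sketched as follows. This is a minimal illustration of the interaction kernel \(\omega\) as an isotropic, normalised Gaussian; the truncation at \(3\sigma\) and the two sample values of \(\sigma_\omega\) are our assumptions, not the values used in the experiments.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Normalised 2D Gaussian interaction kernel with standard deviation
    sigma (in pixels), truncated at 3*sigma by default."""
    if radius is None:
        radius = int(3 * sigma)
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    k = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return k / k.sum()

# Small sigma: interactions confined to a hypercolumn-scale neighbourhood
# (inpainting-type completion); large sigma: long-range interactions across
# hypercolumns (perceptual-type completion).
local_kernel = gaussian_kernel(sigma=2.0)
long_range_kernel = gaussian_kernel(sigma=16.0)
```

Sweeping `sigma` between these two regimes reproduces, in this schematic setting, the transition observed in Fig. 15.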
This example also highlights the flexibility of our models, which can adapt to image processing problems and, for different choices of parameters, to the modelling of neural activity in V1.
4 Conclusions
In this paper, we considered a neurophysiological evolution model to study the visual perception bias induced by contrast and, possibly, by local orientations. The model was originally introduced in [10] in the context of image processing for local histogram equalisation (LHE), and it is a variation of the celebrated Wilson and Cowan equations, formulated in [55] to describe the evolution of a population of neurons in V1.
Firstly, in Sect. 2 we investigated the efficient representation properties of the original WC model. In mathematical terms, this consists in interpreting the corresponding dynamics as the gradient descent of suitable energy functionals. We rigorously proved that there is no energy functional minimised by the WC dynamics (Theorem 1), whereas an energy functional minimised by the stationary solutions of the LHE variant does exist [see formula (15)].
Secondly, by mimicking the structure of V1, we extended the mathematical formulation of the LHE model to a third dimension in order to describe local orientation preferences. This new model, denoted by (LHE-3D), can be efficiently implemented via convolution with appropriate kernels and solved numerically via standard explicit schemes. The information on local orientations allows us to describe contrast phenomena as well as orientation-dependent illusions.
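The structure of such an explicit scheme can be sketched as follows. This is a schematic Wilson–Cowan/LHE-type update with a simplified right-hand side \(-\beta(a-a_0)+\lambda\,\omega * \sigma(a)\); the actual (LHE-3D) equation contains further terms, so the coefficients, the `tanh` sigmoid and all function names below are assumptions for illustration only.

```python
import numpy as np

def periodic_conv2(f, k):
    """Circular 2D convolution of f with a small kernel k centred at the origin,
    computed via the FFT."""
    K = np.zeros_like(f)
    r, c = k.shape
    K[:r, :c] = k
    K = np.roll(K, (-(r // 2), -(c // 2)), axis=(0, 1))
    return np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(K)))

def sigmoid(s, slope=5.0):
    # Saturating nonlinearity; a stand-in for the sigma of the model.
    return np.tanh(slope * s)

def explicit_step(a, a0, omega, dt=0.1, beta=1.0, lam=0.5):
    """One explicit-Euler step of the schematic evolution
        da/dt = -beta * (a - a0) + lam * (omega * sigmoid(a)),
    where a, a0 have shape (H, W, K): positions x orientations.
    The spatial interaction is applied orientation-wise."""
    interaction = np.stack(
        [periodic_conv2(sigmoid(a[..., k]), omega) for k in range(a.shape[-1])],
        axis=-1)
    return a + dt * (-beta * (a - a0) + lam * interaction)
```

Iterating `explicit_step` until the update is below a tolerance yields the stationary state whose projection back to the image plane is the model output.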
In Sect. 3, we tested this extension of LHE on some orientation-independent brightness illusions, showing that it is able to reproduce the perceptual results as well as standard linear + nonlinear filtering (such as the ODOG and the BIWaM models [13, 41]). Then, we performed further tests on orientation-dependent illusions (such as grating induction and the Poggendorff illusion), observing that only the proposed orientation-dependent extension of the LHE model is capable of replicating the perceived visual bias. In agreement with the theoretical sub-optimality of the standard WC model with respect to the efficient representation principle pointed out above, the (LHE-3D) model turns out to be the neural field model replicating the largest number of illusions among those tested.
Finally, we reported a preliminary empirical discussion on the sensitivity of model (LHE-3D) to the parameters describing connectivity between hypercolumns in V1. Our experiments revealed the existence of a threshold parameter at which the completion properties of model (LHE-3D) switch from inpainting type to perceptual type. A more accurate theoretical study, based, for example, on bifurcation and stability analysis of the equilibria of the model, is left for future research.
Further investigations should also address a more accurate modelling reflecting the actual structure of V1. In particular, this concerns the lift operation, where the cake wavelet filters could be replaced by Gabor filters as in [30], as well as the interaction weight \(\omega \), which could be taken to be the anisotropic heat kernel of [23]. Furthermore, more features of the image (e.g. scale, frequency, colour) should be considered in future work. According to a preliminary analysis performed in this direction, (LHE-3D) seems promising in accounting for scale. In Fig. 16a, we present a variant of the luminance illusion where a single foreground round patch appears in the image at a larger scale, while in Fig. 16d we present a variant where the target brightness has a gradient opposed to that of the background. The results obtained by applying (LHE-2D) and (LHE-3D) to Fig. 16a, d are then reported. We observe that both models correctly reproduce a change of sign in the contrast of the foreground patch, enhancing a 3D effect of the central grey patch. Moreover, (LHE-3D) seems to correctly predict the appearance of an illusory contour at the boundary of the central round patch, as showcased by the line profiles in Fig. 17b, d.
Extensive numerical experiments should also be performed to assess the compatibility of the model outputs with the psychophysical tests measuring the perceptual bias induced by these and other phenomena such as the ones discussed in [8]. This would provide insights about the robustness of the model in reproducing the visual pathway behaviour.
Notes
For our comparisons, we used the ODOG and BIWaM codes freely available at https://github.com/TUBvision/betz2015_noise.
References
Atick, J.J.: Could information theory provide an ecological theory of sensory processing? Netw. Comput. Neural Syst. 3(2), 213–251 (1992)
Attneave, F.: Some informational aspects of visual perception. Psychol. Rev. 61(3), 183 (1954)
Barbieri, D., Citti, G., Cocci, G., Sarti, A.: A cortical-inspired geometry for contour perception and motion integration. J. Math. Imaging Vis. 49(3), 511–529 (2014). https://doi.org/10.1007/s10851-013-0482-z
Barlow, H.B., et al.: Possible principles underlying the transformation of sensory messages. Sens. Commun. 1, 217–234 (1961)
Bekkers, E., Duits, R., Berendschot, T., ter Haar Romeny, B.: A multi-orientation analysis approach to retinal vessel tracking. J. Math. Imaging Vis. 49(3), 583–610 (2014)
Bertalmío, M.: From image processing to computational neuroscience: a neural model based on histogram equalization. Front. Comput. Neurosci. 8, 71 (2014)
Bertalmío, M.: Image Processing for Cinema. Chapman and Hall/CRC, London (2014)
Bertalmío, M., Calatroni, L., Franceschi, V., Franceschiello, B., Gomez Villa, A., Prandi, D.: Visual illusions via neural dynamics: Wilson–Cowan-type models and the efficient representation principle. J. Neurophysiol. 123(5), 1606–1618 (2020). https://doi.org/10.1152/jn.00488.2019
Bertalmío, M., Calatroni, L., Franceschi, V., Franceschiello, B., Prandi, D.: A cortical-inspired model for orientation-dependent contrast perception: a link with Wilson–Cowan equations. In: Lellmann, J., Burger, M., Modersitzki, J. (eds.) Scale Space and Variational Methods in Computer Vision, pp. 472–484. Springer, Cham (2019)
Bertalmío, M., Caselles, V., Provenzi, E., Rizzi, A.: Perceptual color correction through variational techniques. IEEE Trans. Image Process. 16(4), 1058–1072 (2007)
Bertalmío, M., Cowan, J.D.: Implementing the retinex algorithm with Wilson–Cowan equations. J. Physiol. 103(1), 69–72 (2009)
Blakeslee, B., Cope, D., McCourt, M.E.: The oriented difference of gaussians (ODOG) model of brightness perception: overview and executable Mathematica notebooks. Behav. Res. Methods 48(1), 306–312 (2016)
Blakeslee, B., McCourt, M.E.: A multiscale spatial filtering account of the White effect, simultaneous brightness contrast and grating induction. Vis. Res. 39(26), 4361–4377 (1999)
Bohi, A., Prandi, D., Guis, V., Bouchara, F., Gauthier, J.P.: Fourier descriptors based on the structure of the human primary visual cortex with applications to object recognition. J. Math. Imaging Vis. 57(1), 117–133 (2017). https://doi.org/10.1007/s10851-016-0669-1
Boscain, U.V., Chertovskih, R., Gauthier, J.P., Prandi, D., Remizov, A.: Highly corrupted image inpainting through hypoelliptic diffusion. J. Math. Imaging Vis. 60(8), 1231–1245 (2018). https://doi.org/10.1007/s10851-018-0810-4
Bosking, W.H., Zhang, Y., Schofield, B., Fitzpatrick, D.: Orientation selectivity and the arrangement of horizontal connections in tree shrew striate cortex. J. Neurosci. 17(6), 2112–2127 (1997)
Bressloff, P.C., Cowan, J.D.: An amplitude equation approach to contextual effects in visual cortex. Neural Comput. 14(3), 493–525 (2002)
Bressloff, P.C., Cowan, J.D., Golubitsky, M., Thomas, P.J., Wiener, M.C.: Geometric visual hallucinations, Euclidean symmetry and the functional architecture of striate cortex. Philos. Trans. R. Soc. Lond. Ser. B Biol. Sci. 356, 299–330 (2001)
Brücke, E.: Über Ergänzungs- und Contrastfarben. In: Sitzungsberichte der Mathematisch-naturwissenschaftlichen Classe der Kaiserlichen, vol. 51, pp. 461–501. Akademie der Wissenschaften, Vienna (1865)
Carandini, M., Demb, J.B., Mante, V., Tolhurst, D.J., Dan, Y., Olshausen, B.A., Gallant, J.L., Rust, N.C.: Do we know what the early visual system does? J. Neurosci. 25(46), 10577–10597 (2005)
Chan, T., Shen, J.: Image Processing and Analysis. Society for Industrial and Applied Mathematics, Philadelphia (2005). https://doi.org/10.1137/1.9780898717877
Chevreul, M.E.: De la loi du contraste simultané des couleurs et de l’assortiment des object colorés [The law of simultaneous contrast of colors and the assortment of colored objects]. Pitois-Levreault, Paris, France (1839)
Citti, G., Sarti, A.: A cortical based model of perceptual completion in the roto-translation space. J. Math. Imaging Vis. 24(3), 307–326 (2006)
Cowan, J.D., Neuman, J., van Drongelen, W.: Wilson–Cowan equations for neocortical dynamics. J. Math. Neurosci. 6(1), 1 (2016)
Daugman, J.G.: Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. J. Opt. Soc. Am. A Opt. Image Sci. 2(7), 1160–1169 (1985)
Duits, R., Felsberg, M., Granlund, G., ter Haar Romeny, B.: Image analysis and reconstruction using a wavelet transform constructed from a reducible representation of the Euclidean motion group. Int. J. Comput. Vis. 72(1), 79–102 (2007). https://doi.org/10.1007/s11263-006-8894-5
Duits, R., Franken, E.: Left-invariant parabolic evolutions on \(SE(2)\) and contour enhancement via invertible orientation scores. Part I: linear left-invariant diffusion equations on \(SE(2)\). Q. Appl. Math. 68(2), 255–292 (2010)
Faugeras, O., Touboul, J., Cessac, B.: A constructive mean-field analysis of multi-population neural networks with random synaptic weights and stochastic inputs. Front. Comput. Neurosci. 3, 1 (2009). https://doi.org/10.3389/neuro.10.001.2009
Franceschiello, B., Mashtakov, A., Citti, G., Sarti, A.: Modelling of the Poggendorff illusion via sub-Riemannian geodesics in the roto-translation group. In: International Conference on Image Analysis and Processing, pp. 37–47. Springer, Berlin (2017)
Franceschiello, B., Mashtakov, A., Citti, G., Sarti, A.: Geometrical optical illusion via sub-Riemannian geodesics in the roto-translation group. Differ. Geom. Appl. 65, 55–77 (2019)
Franceschiello, B., Sarti, A., Citti, G.: A neuromathematical model for geometrical optical illusions. J. Math. Imaging Vis. 60(1), 94–108 (2018)
French, D.: Identification of a free energy functional in an integro-differential equation model for neuronal network activity. Appl. Math. Lett. 17(9), 1047–1051 (2004). https://doi.org/10.1016/j.aml.2004.07.007
Hopfield, J.J.: Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. 79(8), 2554–2558 (1982). https://doi.org/10.1073/pnas.79.8.2554
Howe, C.Q., Yang, Z., Purves, D.: The Poggendorff illusion explained by natural scene geometry. Proc. Natl. Acad. Sci. 102(21), 7707–7712 (2005)
Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195(1), 215–243 (1968)
Kim, J., Batard, T., Bertalmío, M.: Retinal processing optimizes contrast coding. J. Vis. 16(12), 1151–1151 (2016)
Kitaoka, A.: Adelson’s checker-shadow illusion-like gradation lightness illusion. http://www.psy.ritsumei.ac.jp/~akitaoka/gilchrist2006mytalke.html (2006). Accessed: 03 Nov 2018
Martinez-Garcia, M., Cyriac, P., Batard, T., Bertalmío, M., Malo, J.: Derivatives and inverse of cascaded linear+nonlinear neural models. PLOS ONE 13(10), 1–49 (2018)
McCourt, M.E.: A spatial frequency dependent grating-induction effect. Vis. Res. 22(1), 119–134 (1982)
Olshausen, B.A., Field, D.J.: Vision and the coding of natural images: the human brain may hold the secrets to the best image-compression algorithms. Am. Sci. 88(3), 238–245 (2000)
Otazu, X., Vanrell, M., Parraga, C.A.: Multiresolution wavelet framework models brightness induction effects. Vis. Res. 48(5), 733–751 (2008)
Petitot, J.: Elements of Neurogeometry: Functional Architectures of Vision. Lecture Notes in Morphogenesis. Springer, Berlin (2017)
Prandi, D., Gauthier, J.P.: A Semidiscrete Version of the Petitot Model as a Plausible Model for Anthropomorphic Image Reconstruction and Pattern Recognition. Springer Briefs in Mathematics. Springer International Publishing, Cham (2017)
Rucci, M., Victor, J.D.: The unsteady eye: an information-processing stage, not a bug. Trends Neurosci. 38(4), 195–206 (2015)
Sarti, A., Citti, G.: The constitution of visual perceptual units in the functional architecture of V1. J. Comput. Neurosci. 38(2), 285–300 (2015). https://doi.org/10.1007/s10827-014-0540-6
Self, M.W., Lorteije, J.A., Vangeneugden, J., van Beest, E.H., Grigore, M.E., Levelt, C.N., Heimel, J.A., Roelfsema, P.R.: Orientation-tuned surround suppression in mouse visual cortex. J. Neurosci. 34(28), 9290–9304 (2014)
Shapley, R., Gordon, J.: Nonlinearity in the perception of form. Percept. Psychophys. 37(1), 84–88 (1985). https://doi.org/10.3758/BF03207143
Sugita, Y., Hidaka, S., Teramoto, W.: Visual percepts modify iconic memory in humans. Sci. Rep. 8, 1–7 (2018)
Ts’o, D.Y., Gilbert, C.D., Wiesel, T.N.: Relationships between horizontal interactions and functional architecture in cat striate cortex as revealed by cross-correlation analysis. J. Neurosci. 6(4), 1160–1170 (1986)
Veltz, R., Faugeras, O.: Local/global analysis of the stationary solutions of some neural field equations. SIAM J. Appl. Dyn. Syst. 9(3), 954–998 (2009). https://doi.org/10.1137/090773611
Webster, M.A.: Visual adaptation. Annu. Rev. Vis. Sci. 1(1), 547–567 (2015). https://doi.org/10.1146/annurev-vision-082114-035509
Weintraub, D.J., Krantz, D.H.: The Poggendorff illusion: amputations, rotations, and other perturbations. Atten. Percept. Psychophys. 10(4), 257–264 (1971)
Westheimer, G.: Illusions in the spatial sense of the eye: geometrical-optical illusions and the neural representation of space. Vis. Res. 48(20), 2128–2142 (2008)
White, M.: A new effect of pattern on perceived lightness. Perception 8(4), 413–416 (1979)
Wilson, H.R., Cowan, J.D.: Excitatory and inhibitory interactions in localized populations of model neurons. BioPhys. J. 12(1), 1–24 (1972)
Yeonan-Kim, J., Bertalmío, M.: Retinal lateral inhibition provides the biological basis of long-range spatial induction. PLOS ONE 11(12), 1–23 (2016)
Zhang, J., Duits, R., Sanguinetti, G., ter Haar Romeny, B.M.: Numerical approaches for linear left-invariant diffusions on SE(2), their comparison to exact solutions, and their applications in retinal imaging. Numer. Math. Theory Methods Appl. 9(1), 1–50 (2016)
Acknowledgements
The authors acknowledge the anonymous referees for their suggestions which improved significantly the quality of their manuscript. M. B. acknowledges the support of the European Union’s Horizon 2020 research and innovation programme under Grant Agreement No. 761544 (Project HDR4EU) and under Grant Agreement No. 780470 (Project SAUCE), and of the Spanish government and FEDER Fund, Grant Ref. PGC2018-099651-B-I00 (MCIU/AEI/FEDER, UE). L. C., V. F. and D. P. acknowledge the support of a public grant overseen by the French National Research Agency (ANR) as part of the Investissement d’avenir program, through the iCODE project funded by the IDEX Paris-Saclay, ANR-11-IDEX-0003-02 and of the research project LiftME funded by INS2I, CNRS. V. F. acknowledges the support received from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Grant No. 794592 and from the INdAM project Problemi isoperimetrici in spazi Euclidei e non. V. F. and D. P. also acknowledge the support of ANR-15-CE40-0018 project SRGI - Sub-Riemannian Geometry and Interactions. B. F. acknowledges the support of the Fondation Asile des Aveugles.
A Orientation-Dependent Model of V1
Let \(D_R\subset {{\,\mathrm{\mathbb {R}}\,}}^2\) be the disk \(D_R:=\{x_1^2+x_2^2 \le R^2\}\), where the size \(R>0\) of the visual plane is fixed such that \(Q\subset D_R\). In order to exploit the properties of the roto-translation group \(\mathrm{SE}(2)\) on images, we now consider them to be elements of the set:
We remark that fixing \(R>0\) is necessary, since contrast perception is strongly dependent on the scale of the features under consideration w.r.t. the visual plane.
Orientation dependence of the visual stimulus is encoded via cortical inspired techniques, following, for example, [14, 23, 27, 42, 43]. The main idea at the base of these works goes back to the 1959 paper [35] by Hubel and Wiesel (Nobel prize in 1981) who discovered the so-called hypercolumn functional architecture of the visual cortex V1.
Each neuron \(\xi \) in V1 is assumed to be associated with a receptive field (RF) \(\psi _\xi \in L^2({{\,\mathrm{\mathbb {R}}\,}}^2)\) such that its response under a visual stimulus \(f\in \mathcal I\) is given by
Since each neuron is sensitive to a preferred position and orientation in the visual plane, we let \(\xi =(x,\theta )\in \mathcal {M} = \mathbb R^2\times \mathbb P^1\). Here, \(\mathbb P^1\) is the projective line, which we represent as \([0,\pi ]/\sim \), with \(0\sim \pi \). Moreover, in order to respect the shift-twist symmetry [17, Section 4], we assume that the RFs of different neurons are deducible one from the other via a linear transformation. Let us explain this in detail.
The double covering of \(\mathcal {M}\) is given by the Euclidean motion group \(\mathrm{SE}(2)={{\,\mathrm{\mathbb {R}}\,}}^2\rtimes \mathbb {S}^1\) that we consider endowed with its natural semi-direct product structure. That is, for \((x,\theta ),(y,\varphi )\in \mathrm{SE}(2)\), we let
In particular, the above operation induces an action of \(\mathrm{SE}(2)\) on \(\mathcal {M}\), which is thus a homogeneous space. Observe that \(\mathrm{SE}(2)\) is unimodular and that its Haar measure (the left- and right-invariant measure, up to scalar multiples) is \(\mathrm{d}x\,\mathrm{d}\theta \).
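For concreteness, the semidirect-product operation on \(\mathrm{SE}(2)\) is the standard one, \((x,\theta)\cdot(y,\varphi)=(x+R_\theta y,\,\theta+\varphi)\), where \(R_\theta\) denotes the rotation by angle \(\theta\). A minimal numerical check of this law (the function names are ours, introduced only for illustration):

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_mul(g, h):
    """Semidirect-product law: (x, th) . (y, ph) = (x + R_th y, th + ph)."""
    (x, th), (y, ph) = g, h
    return (x + rot(th) @ y, (th + ph) % (2 * np.pi))

def se2_inv(g):
    """Inverse element: (x, th)^{-1} = (-R_{-th} x, -th)."""
    x, th = g
    return (-rot(-th) @ x, (-th) % (2 * np.pi))
```

Associativity and the inverse law can be verified numerically on random group elements, which is a quick sanity check when implementing lifts and left-regular actions.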
We now denote by \(\mathcal {U}(L^2({{\,\mathrm{\mathbb {R}}\,}}^2)) \subset \mathcal {L}(L^2({{\,\mathrm{\mathbb {R}}\,}}^2))\) the space of linear unitary operators on \(L^2({{\,\mathrm{\mathbb {R}}\,}}^2)\) and let \({\varPi }{:}\,\mathrm{SE}(2)\rightarrow \mathcal {U}(L^2({{\,\mathrm{\mathbb {R}}\,}}^2))\) be the quasi-regular representation of \(\mathrm{SE}(2)\). That is, \({\varPi }(x,\theta )\in \mathcal {U}(L^2({{\,\mathrm{\mathbb {R}}\,}}^2))\) is the unitary operator encoding the action of the roto-translation \((x,\theta )\in \mathrm{SE}(2)\) on square-integrable functions on \(\mathbb R^2\). The action of \({\varPi }(x,\theta )\) on \(\psi \in L^2({{\,\mathrm{\mathbb {R}}\,}}^2)\) is
Moreover, we let \(\varLambda {:}\,\mathrm{SE}(2)\rightarrow \mathcal {U}(L^2(\mathrm{SE}(2)))\) be the left-regular representation, which acts on functions \(F\in L^2(\mathrm{SE}(2))\) as
Letting \(L{:}\,L^2({{\,\mathrm{\mathbb {R}}\,}}^2)\rightarrow L^2(\mathcal {M})\) be the operator that transforms visual stimuli into cortical activations, one can formalise the shift-twist symmetry by requiring
Under mild continuity assumptions on L, it has been shown in [43] that L is then a continuous wavelet transform. That is, there exists a mother wavelet \(\varPsi \in L^2({{\,\mathrm{\mathbb {R}}\,}}^2)\) satisfying \({\varPi }(x,\theta )\varPsi = {\varPi }(x,\theta +\pi )\varPsi \) for all \((x,\theta )\in \mathrm{SE}(2)\) such that
Observe that the operation \({\varPi }(x,\theta )\varPsi \) above is well defined for \((x,\theta )\in \mathcal {M}\) thanks to the assumption on \(\varPsi \). By (25), the above representation of L is equivalent to the fact that the RF associated with the neuron \((x,\theta )\in \mathcal {M}\) is the roto-translation of the mother wavelet, i.e. \(\psi _{(x,\theta )}={\varPi }(x,\theta )\varPsi \).
Remark 5
Letting \(\varPsi ^*(x):=\overline{\varPsi (-x)}\), the above formula can be rewritten as
where \(f*g\) denotes the standard convolution on \(L^2({{\,\mathrm{\mathbb {R}}\,}}^2)\).
Neurophysiological evidence shows that a good fit for the RFs is given by Gabor filters, whose Fourier transform is simply the product of a Gaussian with an oriented plane wave [25]. However, these filters are quite challenging to invert and are parametrised on a bigger space than \(\mathcal M\), which also takes into account the frequency of the plane wave, not only its orientation. For this reason, in this work we chose as wavelets the cake wavelets introduced in [26], see also [5]. These are obtained via a mother wavelet \(\varPsi ^{\text {cake}}\) whose support in the Fourier domain is concentrated on a fixed angular slice, which depends on the number of orientations one aims to consider in the numerical implementation. To recover integrability properties, the Fourier transform of this mother wavelet is then smoothly cut off via low-pass filtering, see [5, Section 2.3] for details. Observe, however, that in order to lift to \(\mathcal M\) and not to \(\mathrm{SE}(2)\), we consider a non-oriented version of the mother wavelet, given by \(\tilde{\psi }^{\mathrm{cake}}({\omega }) + \tilde{\psi }^{\mathrm{cake}}(e^{i\pi }{\omega })\), in the notation of [5].
An important feature of cake wavelets is that, in order to recover the original image, it suffices to consider the projection operator defined by
Indeed, by construction of cake wavelets, Fubini’s theorem shows that \((P\circ L)f = f\) for all \(f\in \mathcal I\).
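The exact-reconstruction property \((P\circ L)f = f\) can be illustrated with a drastically simplified stand-in for cake wavelets: hard angular slices in the Fourier domain, without the radial low-pass or smooth angular profile of [26]. Frequencies are binned by their angle taken modulo \(\pi\), so each slice is glued with its antipodal counterpart, mirroring the lift to \(\mathcal M\) rather than \(\mathrm{SE}(2)\); since the slices partition the Fourier plane, summing the orientation channels recovers the image exactly. All function names here are ours, for illustration only.

```python
import numpy as np

def lift_fourier_slices(f, n_theta=8):
    """Lift image f to n_theta orientation channels via hard angular Fourier
    slices (a crude stand-in for cake wavelets).  Antipodal slices are glued
    (angles mod pi), so each channel is real for real input."""
    F = np.fft.fft2(f)
    fy = np.fft.fftfreq(f.shape[0])[:, None]
    fx = np.fft.fftfreq(f.shape[1])[None, :]
    ang = np.mod(np.arctan2(fy, fx), np.pi)      # frequency angle in [0, pi)
    edges = np.linspace(0.0, np.pi, n_theta + 1)
    channels = []
    for k in range(n_theta):
        mask = (edges[k] <= ang) & (ang < edges[k + 1])
        channels.append(np.real(np.fft.ifft2(F * mask)))
    return np.stack(channels, axis=-1)

def project(channels):
    """Projection P: sum over orientations.  Exact reconstruction follows
    because the angular masks form a partition of unity in Fourier space."""
    return channels.sum(axis=-1)
```

In the paper's setting, the smooth angular profile and the low-pass correction of the actual cake wavelets play the role of the hard masks above, and the Fubini argument replaces the elementary partition-of-unity one.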
Bertalmío, M., Calatroni, L., Franceschi, V. et al. Cortical-Inspired Wilson–Cowan-Type Equations for Orientation-Dependent Contrast Perception Modelling. J Math Imaging Vis 63, 263–281 (2021). https://doi.org/10.1007/s10851-020-00960-x