1 Introduction

Contour refers to the lines that constitute the outer edge of an image or object. Contours not only help to intuitively understand the information contained in an image but also affect its subsequent analysis and processing. Contour detection is a fundamental task in image processing. It aims to extract the target contour as completely as possible while removing the background and internal texture noise. It is used in computer vision tasks such as semantic edge detection [21], semantic segmentation [30], and image repair [39].

Early contour detection methods mainly used low-level image features, relying on brightness, color, and contrast to judge the difference between adjacent pixels; examples include the Canny algorithm [7] and the Pb algorithm [22]. Such pixel-level methods are insufficiently accurate. Therefore, some researchers proposed contour detection methods that use mid-level (local) image features, such as the Sketch Token method [16] and Structured Forests for Fast Edge Detection (SE) [10]. With the popularity of deep learning, contour detection models have paid more attention to high-level abstract image features. The DeepEdge model [5] adopted local high-level features, using a multi-scale deep network composed of a local feature extraction network and a fully connected network to extract contours. The HED model [32] used global high-level features and was the first to successfully apply end-to-end training to contour detection. By combining saliency detection and contour detection, the MLM model [31] improved the accuracy of contour detection and salient-object boundary detection at the same time. In addition, a number of studies have used generative adversarial networks (GANs) to improve contour quality [19, 35, 38].

Thanks to the continuous development of biological vision research, various vision mechanisms have been applied to image processing. Most biologically inspired contour extraction research has focused on optimizing the receptive field and exploiting color antagonism. However, these studies stayed at the level of a single visual cortex and considered only the feedforward effect of the visual pathway, ignoring the role of horizontal and feedback connections, so the resulting models are incomplete. To address this, this paper proposes a contour perception model that simulates the complex connection pattern of the visual cortex. First, the sparse coding characteristics of LGN neurons are simulated to obtain the primary perception of the image contour by the LGN. Second, the orientation selectivity of the primary visual cortex receptive field is simulated, and the optimal-orientation contour responses of the classical receptive field (CRF) are obtained. Third, simulating the hue perception characteristics of the advanced visual cortex, a three-channel hue perception model including surround suppression is constructed, which obtains the contour response through the action of advanced visual cortex neurons. Finally, a visual cortex model fusing feedforward, feedback, and horizontal connections is constructed, in which the horizontal connection part adjusts the enhancement and inhibition of neurons by simulating the windmill-like functional structure of the primary visual cortex. Using the interaction between the three layers of neurons, we correct the discharge strength of visual cortex neurons to improve the accuracy of contour extraction.

The contributions are as follows:

  1. Simulated the windmill-like functional structure of the primary visual cortex, adjusting the enhancement and inhibition of the neuron action to simulate the horizontal connection of the visual cortex.

  2. Simulated the advanced visual cortex hue perception characteristics to construct a three-channel hue perception model that includes surround suppression.

  3. Constructed a contour perception model that integrates the feedforward, feedback, and horizontal connection patterns of the visual cortex.

The rest of this paper is organized as follows. Section 2 reviews related work and summarizes the advantages and limitations of previous models. Section 3 presents the overall structure of the proposed model. Section 4 reports the comparative experiments and analyzes the results. Sections 5 and 6 present the discussion and conclusion for the proposed model.

2 Related work

The innovations of the contour detection methods based on the biological vision mechanism can be roughly divided into three categories: optimization of the receptive field, application of the color antagonism, and connection mode of the visual cortex.

2.1 Optimization of the receptive field

Since Hubel and Wiesel [13] proposed the concept of the receptive field and pointed out that it has the characteristic of orientation selection, the receptive field has gradually become a focus of biologists and machine vision researchers. Building on the research of Hubel and Wiesel, Grigorescu [12] proposed using the two-dimensional Gabor energy function to describe the orientation selection characteristics of the CRF and, further, used the difference of Gaussians (DoG) function to simulate the lateral inhibition of the non-classical receptive field (nCRF). Since then, biological vision and mathematical models have been linked together. In recent years, research on the form and function of the receptive field has continued. Melotti [23] focused on receptive field interaction and built a model to simulate the push-pull inhibition and surround inhibition of the receptive field. This model can suppress texture while keeping the strongest responses to lines and edges. Regarding the form of the receptive field, Fang [11], inspired by the asymmetric receptive field of frogs, simulated the R3 cell model and proposed an information fusion strategy based on bilateral asymmetric receptive fields. The R3 cell model can suppress texture of different intensities in the primary contour response of the image. Zhang [40] focused on the dynamic regulation of the receptive field under external stimuli and on depth determination by binocular cells, and proposed the DIDY model, which combines the luminance and disparity information responses based on the optimal orientation. Based on a determination of the relative importance of the receptive field responses, Liu [20] modeled the response of the classical and non-classical anisotropic receptive fields to stereoscopic image content. However, the above contour detection models cannot meet the needs of real-scene image processing: they suffer from insufficient generalization ability and unsatisfactory extraction results.

2.2 Application of the color antagonism

There are two complementary theories in the study of color vision, namely the trichromatic theory [28] and the opponent-process theory [6]. Together they underpin the color antagonism theory, which has been applied to contour detection tasks by many researchers. Yang [34] proposed the SCO model based on the color antagonism mechanisms of the biological visual system. Whereas Yang used color antagonistic information only, Lin [17] combined color information with brightness information and proposed a color antagonism model based on dark-light adaptivity, which used a dark and light channel structure to simulate the light and dark processing mechanism of the visual cortex. In addition, Yuan [37] simulated the human visual system and proposed a saliency model based on the color-opponent mechanism in the primary visual cortex. However, research on color antagonism has stayed at the level of a single visual cortex, which cannot reflect the complete process of image processing by the biological vision system.

2.3 Connection mode of the visual cortex

A complete biological vision model not only needs to simulate the internal mode of one visual cortex but also needs to consider the interaction between visual cortices. Due to the complexity of the biological vision pathway, its mathematical model is constantly updated with the progress of biological experiments. In recent studies, Capparelli [9] built a neural network of the primary visual cortex in which two neural populations were organized in different layers within orientation hypercolumns that were connected by local, short-range, and long-range recurrent interactions. When trained with natural images, the model predicted a connectivity structure linking neurons with similar orientation preferences, matching the typical patterns found for long-range horizontal axons and feedback projections in the visual cortex. Akbarinia [1] proposed a biologically inspired edge detection model based on feedback and surround modulation, which modeled a contrast-dependent surround modulation of V1 receptive fields by accounting for full, far, iso-, and orthogonal-orientation surround. Cao [8] put forward a method for extracting local center-surround contrast information from natural images by using a normalized DoG function and a sigmoid activation function. Compared with previous contour detection models, this method suppresses textures more quickly and accurately. These models simulated the structure of the biological visual pathways well, but they all suffered from high computational complexity, which led to low computational efficiency.

3 Materials and methods

The biological vision system is a multi-channel, multi-cascade, and cross-channel complex information processing system [29]. For the sake of simplicity, most traditional computational models paid attention to the construction of the neural network structure within the functional areas, thus ignoring the connection mode between the layers. In reality, there are complex connection patterns within and between functional areas of the visual cortices [14], which are of great significance to the coding mechanism of the visual nervous system. Azzopardi [4] pointed out that the LGN is the center of visual information processing and plays an important role in the initial processing and transmission of visual stimuli. Murgas [27] showed that the advanced visual cortex mainly plays a role in information processing and integration, gathering the input signals from all parts. Therefore, focusing on the application of target contour perception, this paper simulated the information transmission relationship among the LGN, the primary visual cortex, and the advanced visual cortex in the visual pathway, and constructed a three-layer neural network to simulate the response and transmission process of the biological visual system. The contour perception model is shown in Fig. 1.

Fig. 1 A contour perception model that simulates the complex connection patterns of the visual cortex

3.1 LGN sparse measurement method based on statistical characteristics of the receptive field

The LGN is the core area connecting the retina and the brain. The number of neurons in the LGN is significantly smaller than in the preceding and following layers of the visual pathway, which reflects the sparse coding characteristics of visual information transmission. In visual computing models, the common sparse measurement method has a limited effect on distinguishing contour regions from texture regions. To improve this, this paper incorporated a weight factor into the sparse measurement, simulated the neuron discharge coding of the LGN local area, and proposed a sparse measurement method based on the statistical characteristics of the receptive field.

This paper uses the sparse measurement method proposed by Alpert [2], as shown in Formulas (1) and (2). The receptive field window is denoted by \(S_{ij}\), and σ denotes the half-length of the receptive field window.

$$\tau \frac{d}{{dt}}E_{{ij}}^{{{\text{LGN}}}}\left( t \right)= - E_{{ij}}^{{{\text{LGN}}}}\left( t \right)+{I_{ij}}$$
(1)
$$spa{r_{ij}}=\frac{1}{{\sqrt n - 1}}\left( {\sqrt n - \frac{{{{\left\| {\overrightarrow {{h_{ij}}} } \right\|}_1}}}{{{{\left\| {\overrightarrow {{h_{ij}}} } \right\|}_2}}}} \right)$$
(2)

In the formulas, τ denotes the neuron membrane potential constant; \(E_{ij}^{\mathrm{LGN}}\) denotes the membrane potential of the neuron at (i,j) in the discharge coding model, which simulates the LGN neuron; \(I_{ij}\) denotes the gray value of the external input image at (i,j); \(\overrightarrow {{h_{ij}}}\) denotes the histogram of \(E_{ij}^{\mathrm{LGN}}\) in the receptive field window \(S_{ij}\); n denotes the dimension of \(\overrightarrow {{h_{ij}}}\); \(\left\| \cdot \right\|_p\) denotes the p-norm.
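As an illustration of Formula (2), the following minimal sketch computes the sparseness of a histogram built from the values in one window. The helper names, the bin count, and the value range are assumptions made for the example, not values taken from this paper.

```python
import math

def sparseness(hist):
    """Formula (2): (sqrt(n) - L1/L2) / (sqrt(n) - 1) for a histogram vector."""
    n = len(hist)
    l1 = sum(abs(h) for h in hist)
    l2 = math.sqrt(sum(h * h for h in hist))
    if n < 2 or l2 == 0.0:
        return 0.0  # assumption: undefined cases are mapped to zero sparseness
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)

def window_histogram(values, n_bins=8, lo=0.0, hi=1.0):
    """Histogram of the membrane potentials in one window (bin count is assumed)."""
    hist = [0] * n_bins
    for v in values:
        k = int((v - lo) / (hi - lo) * n_bins)
        hist[min(max(k, 0), n_bins - 1)] += 1
    return hist

# A uniform window fills a single bin; a window whose values spread evenly
# over all bins gives a flat histogram.
flat_patch = [0.5] * 25
spread_patch = [(k % 8) / 8 + 0.01 for k in range(24)]

spar_flat = sparseness(window_histogram(flat_patch))
spar_spread = sparseness(window_histogram(spread_patch))
print(spar_flat, spar_spread)
```

A histogram concentrated in one bin reaches the maximum sparseness of 1, while a perfectly flat histogram gives a value of approximately 0.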

The sparse measurement method was originally proposed for image segmentation, where it tests whether each region has texture features. Its purpose was to exclude texture regions from the segmentation points through the setting of sparsity; it does not consider the extraction of the image contour, so its effect on contour optimization is limited. Therefore, this paper set a weight factor and a sparse measurement threshold to improve the distinction between image contour and texture by neurons, as shown in Formulas (3) to (5).

$$sparsity_{ij}=\left\{\begin{array}{*{20}l} \frac{\delta_{ij}^2}{\mu_{ij}}\cdot spar_{ij}\;\;,\;if\;spar_{ij}\geqslant threshold\\ 0\;\;,\;if\;spar_{ij} < threshold\end{array}\right.$$
(3)
$$threshold=\frac{{\sum\nolimits_{i} {\sum\nolimits_{j} {spa{r_{ij}}} } }}{{{\sigma ^2}}}$$
(4)
$$V_{{ij}}^{{{\text{LGN}}}}=E_{{ij}}^{{{\text{LGN}}}} \cdot sparsit{y_{ij}}$$
(5)

In the formulas, \(\delta_{ij}\) and \(\mu_{ij}\) denote the variance and mean of \(E_{ij}^{\mathrm{LGN}}\) in the receptive field window \(S_{ij}\). The variance reflects the inconsistency between the pixel at (i,j) and its surroundings: the greater the variance, the greater the probability that (i,j) lies on a contour. The mean reflects the brightness of the receptive field range centered on (i,j): the smaller the mean, the darker the image. The variance term of the weight factor distinguishes contour from background, and the mean term balances the brightness and darkness of the picture. threshold denotes the threshold of the sparse measurement; its value is the mean of the sparse measurement in the receptive field window \(S_{ij}\). \(V_{ij}^{\mathrm{LGN}}\) denotes the neuronal membrane potential at (i,j) after sparse measurement.
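The thresholding and weighting in Formulas (3) and (5) can be sketched as follows. The function names and the numeric inputs are hypothetical. The code follows Formula (3) as written, that is, it squares \(\delta_{ij}\) before dividing by \(\mu_{ij}\), and it treats a zero mean as a zero response, which is an assumption added here to avoid division by zero.

```python
def weighted_sparsity(spar, delta, mu, threshold):
    """Formula (3): keep spar only at or above the threshold, scaled by delta^2 / mu."""
    if spar < threshold or mu == 0.0:
        return 0.0  # assumption: zero mean is handled like a sub-threshold response
    return (delta ** 2 / mu) * spar

def lgn_output(potential, spar, delta, mu, threshold):
    """Formula (5): V_LGN = E_LGN * sparsity."""
    return potential * weighted_sparsity(spar, delta, mu, threshold)

# Hypothetical values for a single pixel.
v_kept = lgn_output(potential=0.8, spar=0.5, delta=0.2, mu=0.4, threshold=0.3)
v_dropped = lgn_output(potential=0.8, spar=0.2, delta=0.2, mu=0.4, threshold=0.3)
print(v_kept, v_dropped)
```

In this example the first pixel passes the threshold and keeps a weighted response, while the second falls below it and is set to zero.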

3.2 Lateral adjustment of the V1 windmill-like receptive field

The visual cortex of higher mammals shows a clear orientation preference. Most neurons respond only to stimuli of their preferred orientation. The functional columns of different preferences are arranged in an orderly manner in space and expand periodically in the cortex around a singular point. This spatial arrangement according to cell function is called the windmill-like functional structure of the cortex [25]. Each subregion of the windmill is structurally differentiated and functionally divided [15] to adjust the input visual information. Most previous methods focused on the orientation-selective and non-orientation-selective inhibition of the primary visual cortex; these methods depended excessively on the judgment of the best orientation and were vulnerable to regional noise. This paper argues that the enhancement and inhibition of neurons can be obtained by combining the windmill-like functional structure of the cortex with the optimal angle between two neurons in the receptive field, which can effectively avoid the errors in best-orientation judgment caused by noise. Therefore, this paper constructed a lateral adjustment model integrating the receptive field of the windmill structure to simulate the functional column characteristics of the windmill-like structure in the biological visual cortex.

The horizontal modulation exerted by the neuron at (i’, j’) in the windmill-structure receptive field on the neuron at (i,j) is related to the distance between the two neurons and the angle between their optimal response directions. Taking (i,j) as the center of the windmill, the angle between the optimal response directions of the two neurons was calculated. If the angle is in the range of 0 ~ π/4 or 3π/4 ~ π, it has an enhancement effect, and if the angle is in the range of π/4 ~ 3π/4, it has an inhibitory effect. The enhancement and inhibition of neurons in the receptive field at (i,j) are shown in Formula (6). Formulas (7) and (8) give the distance term between neurons and the enhancement/inhibition functions between neurons, respectively.

$$\left\{\begin{array}{*{20}l} \Delta {E_{ij}} ={\sum\nolimits_{\left( {{i}^{'},{j}^{'}} \right) \in {S_{ij}}}}{D * {\omega_{\text{E}}}} \\ \Delta {I_{ij}} = {\sum\nolimits_{\left( {{i}^{'},{j}^{'}} \right) \in {S_{ij}}}}{D * {\omega_{\text{I}}}}\end{array}\right.$$
(6)
$$D=\exp \left( { - \frac{{{{\left( {i - {{i}^{'}}} \right)}^2}+{{\left( {j - {{j}^{'}}} \right)}^2}}}{{{\sigma ^2}}}} \right)$$
(7)
$$\left\{ \begin{array}{*{20}l} {\omega _{\text{E}}}=\exp \left( { - \frac{{\left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right|}}{{\sigma _{{\Delta \varphi }}^{2}}}} \right)\quad ,\;0 \leqslant \left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right| \leqslant \frac{\pi }{4} \hfill \\ {\omega _{\text{E}}}=\exp \left( { - \frac{{\pi - \left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right|}}{{\sigma _{{\Delta \varphi }}^{2}}}} \right)\quad ,\;\frac{{3\pi }}{4} \leqslant \left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right| \leqslant \pi \hfill \\ {\omega _{\text{I}}}=\exp \left( { - \frac{{\frac{\pi }{2} - \left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right|}}{{\sigma _{{\Delta \varphi }}^{2}}}} \right)\quad ,\;\frac{\pi }{4} \leqslant \left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right| \leqslant \frac{\pi }{2} \hfill \\ {\omega _{\text{I}}}=\exp \left( { - \frac{{\left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right| - \frac{\pi }{2}}}{{\sigma _{{\Delta \varphi }}^{2}}}} \right)\quad ,\;\frac{\pi }{2} \leqslant \left| {{\varphi _{ij}} - {\varphi _{{{i}^{'}}{{j}^{'}}}}} \right| \leqslant \frac{{3\pi }}{4} \hfill \\ \end{array} \right.$$
(8)

In the formulas, D denotes the distance-dependent weight between two neurons: by Formula (7), the larger D is, the closer the two neurons are and the stronger their interaction. \(*\) denotes the convolution operation. \(\omega_{\text{E}}\) and \(\omega_{\text{I}}\) denote the inter-neuronal enhancement and inhibition functions. \(\sigma_{\Delta \varphi}\) denotes the decay rate, which determines how fast the peripheral inhibition intensity decays as the orientation difference increases. \(\varphi_{ij}\) denotes the connection direction between (i,j) and (i’, j’). \(\varphi_{i'j'}\) denotes the best orientation of the neuron at (i’, j’), as shown in Fig. 2.

Fig. 2 The physical orientation between two neurons in connection and the optimal orientation angle of a certain neuron
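The windmill-style modulation of Formulas (6) to (8) can be sketched as below. This is a simplified reading under stated assumptions: the operation \(*\) is taken as a plain product summed over the window, the connection direction is computed with `atan2` and folded into [0, π), angles are assumed to lie in [0, π], and the shared boundary angles π/4 and 3π/4 are assigned to the enhancement branch. All function names and parameter values are hypothetical.

```python
import math

def distance_weight(i, j, ip, jp, sigma):
    """Formula (7): Gaussian fall-off with the squared distance between neurons."""
    return math.exp(-((i - ip) ** 2 + (j - jp) ** 2) / sigma ** 2)

def orientation_weights(phi_link, phi_pref, sigma_dphi):
    """Formula (8): return (omega_E, omega_I) for one pair of neurons."""
    d = abs(phi_link - phi_pref)
    s2 = sigma_dphi ** 2
    if d <= math.pi / 4:                      # near-parallel: enhancement
        return math.exp(-d / s2), 0.0
    if d >= 3 * math.pi / 4:                  # near-parallel modulo pi: enhancement
        return math.exp(-(math.pi - d) / s2), 0.0
    return 0.0, math.exp(-abs(d - math.pi / 2) / s2)  # near-orthogonal: inhibition

def lateral_terms(i, j, preferred, sigma, sigma_dphi, half):
    """Formula (6): accumulate Delta E and Delta I over the window around (i, j)."""
    rows, cols = len(preferred), len(preferred[0])
    d_e = d_i = 0.0
    for ip in range(max(0, i - half), min(rows, i + half + 1)):
        for jp in range(max(0, j - half), min(cols, j + half + 1)):
            if (ip, jp) == (i, j):
                continue
            link = math.atan2(ip - i, jp - j) % math.pi  # connection direction
            w_e, w_i = orientation_weights(link, preferred[ip][jp], sigma_dphi)
            dist = distance_weight(i, j, ip, jp, sigma)
            d_e += dist * w_e
            d_i += dist * w_i
    return d_e, d_i

# Neighbours preferring orientation 0, arranged collinearly (a row) ...
row_e, row_i = lateral_terms(0, 1, [[0.0, 0.0, 0.0]], 1.0, 1.0, 1)
# ... and the same preference arranged side by side (a column).
col_e, col_i = lateral_terms(1, 0, [[0.0], [0.0], [0.0]], 1.0, 1.0, 1)
print(row_e, row_i, col_e, col_i)
```

With these inputs the collinear arrangement yields only enhancement and the side-by-side arrangement yields only inhibition, which is the qualitative behavior described for the windmill structure.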

Primary visual cortex neurons have orientation selection characteristics and have different response intensities for different orientation stimulus signals. When the orientation of stimulus signals is equal to the optimal orientation of neurons, the responses of neurons are the strongest. In this paper, a two-dimensional Gabor function was used to simulate the orientation selection characteristics of CRF in the primary visual cortex, and the maximum response was selected as the edge response, as shown in Formula (9).

$$\left\{ \begin{array}{*{20}l} {g_{\widetilde {i},\widetilde {j}}}\left( {{\theta _k}} \right)=\frac{1}{{2\pi {\sigma ^2}}}\exp \left( { - \frac{{{{\widetilde {i}}^2}+{\varepsilon ^2}{{\widetilde {j}}^2}}}{{2{\sigma ^2}}}} \right) \hfill \\ {\theta _k}=\frac{{\left( {k - 1} \right)\pi }}{{{N_\theta }}}\quad ,\;k=1,2,\ldots ,{N_\theta } \hfill \\ {e_{ij}}\left( {{\theta _k}} \right)=\left| {\sum\nolimits_{{i,j \in {S_{ij}}}} {{I_{ij}}\left( t \right) * \frac{{\partial {g_{\widetilde {i},\widetilde {j}}}\left( {{\theta _k}} \right)}}{{\partial \widetilde {i}}}} } \right| \hfill \\ V_{{ij}}^{{{\text{max}}}}=\max \left( {{e_{ij}}\left( {{\theta _k}} \right)} \right) \hfill \\ \end{array} \right.$$
(9)

In the formula, ε denotes the spatial compression ratio, which controls the aspect ratio of the filter. \(\widetilde {i}=i\cos \left( {{\theta _k}} \right)+j\sin \left( {{\theta _k}} \right)\), \(\widetilde {j}= - i\sin \left( {{\theta _k}} \right)+j\cos \left( {{\theta _k}} \right)\). \(\theta_k\) denotes the preferred orientation of the filter, k denotes the orientation index, and \(N_\theta\) denotes the number of orientations of the CRF filter. \(e_{ij}(\theta_k)\) denotes the response of the neuron at (i,j) for the orientation \(\theta_k\). \(V_{ij}^{\mathrm{max}}\) denotes the neuronal membrane potential at (i,j) after taking the maximum response over all orientations.
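The orientation sampling and maximum selection of Formula (9) can be illustrated with a small sketch. It assumes the standard rotation \(\widetilde{i}=i\cos\theta_k+j\sin\theta_k\), \(\widetilde{j}=-i\sin\theta_k+j\cos\theta_k\), and replaces the convolution with a direct windowed sum over one patch. The function names, patch size, and parameter values are hypothetical.

```python
import math

def orientations(n_theta):
    """theta_k = (k - 1) * pi / N_theta for k = 1..N_theta."""
    return [(k - 1) * math.pi / n_theta for k in range(1, n_theta + 1)]

def rotate(i, j, theta):
    """Rotated coordinates used inside the filter (standard rotation, assumed)."""
    return (i * math.cos(theta) + j * math.sin(theta),
            -i * math.sin(theta) + j * math.cos(theta))

def d_filter(i, j, theta, sigma, eps):
    """Partial derivative of g with respect to the rotated first coordinate."""
    it, jt = rotate(i, j, theta)
    g = math.exp(-(it ** 2 + eps ** 2 * jt ** 2) / (2 * sigma ** 2))
    g /= 2 * math.pi * sigma ** 2
    return -it / sigma ** 2 * g

def best_response(patch, n_theta, sigma, eps):
    """Maximum over k of |sum of patch * dg| for a square patch centred on the pixel."""
    half = len(patch) // 2
    best = 0.0
    for theta in orientations(n_theta):
        acc = 0.0
        for a, row in enumerate(patch):
            for b, value in enumerate(row):
                acc += value * d_filter(a - half, b - half, theta, sigma, eps)
        best = max(best, abs(acc))
    return best

flat = [[1.0] * 5 for _ in range(5)]                    # no edge
edge = [[1.0 if a > 2 else 0.0] * 5 for a in range(5)]  # step edge between rows
r_flat = best_response(flat, n_theta=4, sigma=1.0, eps=1.0)
r_edge = best_response(edge, n_theta=4, sigma=1.0, eps=1.0)
print(r_flat, r_edge)
```

Because the derivative filter is antisymmetric, a uniform patch produces a response near zero, whereas the step edge produces a clearly larger response at its best-matching orientation.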

Although the main pathway of the biological visual system is perpendicular to the cortical surface, there are still highly convergent clusters of pyramidal axons in the horizontal plane. When the target to be detected is located in a complex environment, the response intensity of the neurons it activates depends on the modulation of the surrounding neurons. The lateral adjustment model integrating the windmill-like receptive field proposed in this paper enhances the target contour and suppresses texture noise by adjusting the horizontal connections between neurons, as shown in Formula (10).

$$V_{{ij}}^{{{\text{HC}}}}=V_{{ij}}^{{\hbox{max} }}+\delta \left( {\Delta {E_{ij}} - \Delta {I_{ij}}} \right)$$
(10)

In the formula, \(V_{ij}^{\mathrm{HC}}\) denotes the membrane potential of the neuron at (i,j) after lateral regulation. δ denotes the intensity coefficient of neuronal interaction. \(\Delta E_{ij}\) and \(\Delta I_{ij}\) denote the enhancement and inhibition received by the neuron at (i,j) within the receptive field, respectively.

3.3 Hue perception model of the advanced visual cortex

The advanced visual cortex receives the information from the primary visual cortex, which is fed back to the primary visual cortex after a series of actions, forming a circulatory regulation mechanism [26]. The traditional color antagonism model simply integrates the input of antagonistic cells based on the receptive field characteristics of neurons at all levels, without considering the odd and even channel structure of the visual pathway and the effect of complex cells on the input stimulation. Therefore, a three-channel advanced visual cortex hue perception system containing surround inhibition was constructed to simulate color antagonism and surround inhibition of complex cells in the advanced visual cortex, as shown in Fig. 3.

Fig. 3 Three-channel advanced visual cortex hue perception system with surround suppression

Considering the interaction between the CRF and nCRF in the visual cortex, the difference of Gaussians (DoG) function was integrated into the double antagonistic receptive field model. The competition coefficient of the neuron at position (i,j) of the r+g channel is shown in Formula (11).

$$\left\{ \begin{array}{*{20}l} {\text{Do}}{{\text{G}}_{pq}}=\frac{1}{{\sqrt {2\pi } {\sigma _{\text{E}}}}}\exp \left( { - \frac{{{p^2}+{q^2}}}{{2\sigma _{{\text{E}}}^{2}}}} \right) - \frac{1}{{\sqrt {2\pi } {\sigma _{\text{I}}}}}\exp \left( { - \frac{{{p^2}+{q^2}}}{{2\sigma _{{\text{I}}}^{2}}}} \right) \hfill \\ C_{{ij}}^{{{\text{rg}}}}=\frac{{\sum\nolimits_{{p,q \in {S_{ij}}}} {{{\left[ {{\text{Do}}{{\text{G}}_{pq}}} \right]}^+}{R_{pq}} - \sum\nolimits_{{p,q \in {S_{ij}}}} {{{\left[ {{\text{-Do}}{{\text{G}}_{pq}}} \right]}^+}{G_{pq}}} } }}{{{{\text{A}}_1}+\sum\nolimits_{{p,q \in {S_{ij}}}} {{{\left[ {{\text{Do}}{{\text{G}}_{pq}}} \right]}^+}{R_{pq}}+\sum\nolimits_{{p,q \in {S_{ij}}}} {{{\left[ {{\text{-Do}}{{\text{G}}_{pq}}} \right]}^+}{G_{pq}}} } }} \hfill \\ \end{array} \right.$$
(11)

In the formula, \(\mathrm{DoG}_{pq}\) denotes the difference of Gaussians function of the neuron at (p,q). \(\sigma_{\text{E}}\) denotes the radius of the excitatory receptive field. \(\sigma_{\text{I}}\) denotes the radius of the inhibitory receptive field. \([a]^+=\max(0,a)\). \(R_{pq}\) and \(G_{pq}\) denote the red and green component inputs of the neuron at (p,q), respectively. \(A_1\) denotes the attenuation coefficient. For the g+r, b+y, and y+b channels, the expression of neuronal activity at position (i,j) is similar to Formula (11): replacing \(R_{pq}\) and \(G_{pq}\) with the corresponding color component inputs gives the neuronal competition coefficients \(C_{ij}^{\mathrm{gr}}\), \(C_{ij}^{\mathrm{by}}\) and \(C_{ij}^{\mathrm{yb}}\) at position (i,j) of each channel.
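A minimal sketch of the competition coefficient in Formula (11) for one window is given below. The window is passed as two small square patches of red and green inputs. The function names, patch size, and parameter values are assumptions for illustration only.

```python
import math

def dog(p, q, sigma_e, sigma_i):
    """DoG_pq of Formula (11); (p, q) are offsets from the window centre."""
    r2 = p * p + q * q
    exc = math.exp(-r2 / (2 * sigma_e ** 2)) / (math.sqrt(2 * math.pi) * sigma_e)
    inh = math.exp(-r2 / (2 * sigma_i ** 2)) / (math.sqrt(2 * math.pi) * sigma_i)
    return exc - inh

def competition(red, green, sigma_e, sigma_i, a1):
    """C^rg: rectified-DoG red drive against rectified-negative-DoG green drive."""
    half = len(red) // 2
    excite = inhibit = 0.0
    for a in range(len(red)):
        for b in range(len(red[0])):
            w = dog(a - half, b - half, sigma_e, sigma_i)
            excite += max(w, 0.0) * red[a][b]       # [DoG]^+ * R
            inhibit += max(-w, 0.0) * green[a][b]   # [-DoG]^+ * G
    return (excite - inhibit) / (a1 + excite + inhibit)

ones = [[1.0] * 5 for _ in range(5)]
zeros = [[0.0] * 5 for _ in range(5)]
c_red = competition(ones, zeros, sigma_e=1.0, sigma_i=2.0, a1=1.0)
c_green = competition(zeros, ones, sigma_e=1.0, sigma_i=2.0, a1=1.0)
print(c_red, c_green)
```

For non-negative inputs and a positive \(A_1\), the divisive form keeps the coefficient strictly between -1 and 1: a purely red window gives a positive value and a purely green window a negative one.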

Color information and brightness information play complementary roles in distinguishing contour from texture: the dark region is dominated by color information, and the bright region is dominated by brightness information. Therefore, a brightness channel is added to the color antagonistic channels to obtain color information and brightness information at the same time. The brightness channel is divided into an on (opening) channel and an off (closing) channel. The on channel is responsible for enhancing information whose brightness is higher than that of the surrounding area, with \(\sigma_{\text{E}} < \sigma_{\text{I}}\). The off channel is responsible for suppressing information whose brightness is lower than that of the surrounding area, with \(\sigma_{\text{E}} > \sigma_{\text{I}}\). Letting \(R_{ij}=G_{ij}=I_{ij}\) in Formula (11) gives the neuron competition coefficients \(C_{ij}^{\mathrm{on}}\) and \(C_{ij}^{\mathrm{off}}\) at position (i,j) of the on and off channels, respectively.

The feature extraction of a visual system is a calculation process of local energy. If the activities of the three-channel neurons were simply added, the contour and detail areas would be confused. The traditional local energy model uses the square root of the sum of squares of the responses of two Gabor filters with orthogonal phases to represent the Gabor energy and directly fuses the outputs of the odd and even filters, which offers no advantage in processing image details. Inspired by the multi-channel filtering theory for the early processing of visual information in the human visual system [24], this paper proposes to simulate the structure of odd and even channels in the visual cortex and applies the local energy model of multi-channel filtering to integrate the obtained color antagonism and brightness antagonism information.

The two-dimensional Gabor filter was used to filter the input information, as shown in Formula (12). The information obtained by the filter was combined with the neuronal activity of each channel to obtain the simple cell activity of each odd and even component. The simple cell activities of the odd and even components at position (i,j) of the r+g channel are shown in Formula (13).

$${g_{ij}}=\frac{1}{{2\pi {\sigma ^2}}}\exp \left( { - \frac{{{i^2}+{\varepsilon ^2}{j^2}}}{{2{\sigma ^2}}}} \right)\cos \left( {2\pi \frac{i}{\lambda }+\varphi } \right)$$
(12)
$$\left\{ \begin{array}{*{20}l} o_{{ij}}^{{{\text{rg}}}}=\frac{{\sum\nolimits_{{p,q \in {S_{ij}}}} {g_{{pq}}^{{\text{O}}}c_{{pq}}^{{{\text{rg}}}}} }}{{{{\text{A}}_2}+\sum\nolimits_{{p,q \in {S_{ij}}}} {\left| {g_{{pq}}^{{\text{O}}}} \right|c_{{pq}}^{{{\text{rg}}}}} }} \hfill \\ e_{{ij}}^{{{\text{rg}}}}=\frac{{\sum\nolimits_{{p,q \in {S_{ij}}}} {g_{{pq}}^{{\text{E}}}c_{{pq}}^{{{\text{rg}}}}} }}{{{{\text{A}}_2}+\sum\nolimits_{{p,q \in {S_{ij}}}} {\left| {g_{{pq}}^{{\text{E}}}} \right|c_{{pq}}^{{{\text{rg}}}}} }} \hfill \\ \end{array} \right.$$
(13)

In the formulas, φ denotes the phase parameter: φ = -π/2 or π/2 for the odd symmetric filter, and φ = 0 or π for the even symmetric filter. \(g_{pq}^{\text{O}}\) and \(g_{pq}^{\text{E}}\) denote the odd and even filter components at position (p,q), respectively. For the g+r, b+y, and y+b channels, the simple cell activity expressions of the odd and even components at position (i,j) are similar to Formula (13). One only needs to replace \(C_{ij}^{\mathrm{rg}}\) with \(C_{ij}^{\mathrm{gr}}\), \(C_{ij}^{\mathrm{by}}\) and \(C_{ij}^{\mathrm{yb}}\) to obtain the odd and even components of each channel, \(o_{ij}^{\mathrm{gr}}\), \(e_{ij}^{\mathrm{gr}}\), \(o_{ij}^{\mathrm{by}}\), \(e_{ij}^{\mathrm{by}}\) and \(o_{ij}^{\mathrm{yb}}\), \(e_{ij}^{\mathrm{yb}}\). For the simple cell activity of the odd and even components of the brightness channel, the on channel and the off channel need to be fused, as shown in Formula (14).

$$\left\{ \begin{array}{*{20}l} o_{{ij}}^{{\text{L}}}=\frac{{\sum\nolimits_{{p,q \in {S_{ij}}}} {g_{{pq}}^{{\text{O}}}\left( {{{\left[ {c_{{ij}}^{{{\text{on}}}}} \right]}^+} - {{\left[ {c_{{ij}}^{{{\text{off}}}}} \right]}^+}} \right)} }}{{{{\text{A}}_2}+\sum\nolimits_{{p,q \in {S_{ij}}}} {\left| {g_{{pq}}^{{\text{O}}}} \right|\left( {{{\left[ {c_{{ij}}^{{{\text{on}}}}} \right]}^+}+{{\left[ {c_{{ij}}^{{{\text{off}}}}} \right]}^+}} \right)} }} \hfill \\ e_{{ij}}^{{\text{L}}}=\frac{{\sum\nolimits_{{p,q \in {S_{ij}}}} {g_{{pq}}^{{\text{E}}}\left( {{{\left[ {c_{{ij}}^{{{\text{on}}}}} \right]}^+} - {{\left[ {c_{{ij}}^{{{\text{off}}}}} \right]}^+}} \right)} }}{{{{\text{A}}_2}+\sum\nolimits_{{p,q \in {S_{ij}}}} {\left| {g_{{pq}}^{{\text{E}}}} \right|\left( {{{\left[ {c_{{ij}}^{{{\text{on}}}}} \right]}^+}+{{\left[ {c_{{ij}}^{{{\text{off}}}}} \right]}^+}} \right)} }} \hfill \\ \end{array} \right.$$
(14)

Considering only the role of simple cells may lead to incomplete contour extraction and excessive texture in the details. Therefore, this paper used two layers of complex cells to simulate the role of the advanced visual cortex and process the input information from simple cells. The first layer is responsible for fusing the ten groups of simple cell responses of the three channels and unifying the contour features extracted from each channel. The second layer adopts surround suppression and uses the competitive network in the surrounding environment outside the cell receptive field to suppress texture, as shown in Formulas (15) and (16).

$${v_{ij}}=\sum\nolimits_{{k{\text{=L,rg,gr,by,yb}}}} {\left( {{{\left[ {o_{{ij}}^{k}} \right]}^+}+{{\left[ {e_{{ij}}^{k}} \right]}^+}} \right)}$$
(15)
$$V_{{ij}}^{{\text{G}}}=\frac{{{v_{ij}} - \varsigma \cdot \sum\nolimits_{{p,q \in {S_{ij}}}} {\left( {{\text{Do}}{{\text{G}}_{pq}} \cdot {v_{pq}}} \right)} }}{{{{\text{A}}_3}+{v_{ij}}+\sum\nolimits_{{p,q \in {S_{ij}}}} {\left( {{\text{Do}}{{\text{G}}_{pq}} \cdot {v_{pq}}} \right)} }}$$
(16)

In the formulas, \(V_{ij}^{\mathrm{G}}\) denotes the neuronal membrane potential at (i,j) after hue perception. A3 denotes the model coefficient. ς denotes inhibition constant.
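The two complex-cell stages of Formulas (15) and (16) can be sketched for a single pixel as follows. The DoG-weighted surround sum is passed in as one number to keep the example short. The function names and all numeric values are hypothetical.

```python
def fuse_channels(odd, even):
    """Formula (15): sum of rectified odd and even responses over all channels."""
    return sum(max(odd[k], 0.0) + max(even[k], 0.0) for k in odd)

def surround_suppress(v_centre, surround_sum, a3, zeta):
    """Formula (16); surround_sum stands for sum(DoG_pq * v_pq) over the window."""
    return (v_centre - zeta * surround_sum) / (a3 + v_centre + surround_sum)

# Hypothetical odd and even simple-cell activities for the five channel labels.
odd = {"L": 0.5, "rg": -0.25, "gr": 0.25, "by": 0.0, "yb": 0.0}
even = {"L": 0.0, "rg": 0.5, "gr": -0.5, "by": 0.25, "yb": 0.0}

v = fuse_channels(odd, even)
v_isolated = surround_suppress(v, surround_sum=0.0, a3=1.0, zeta=1.0)
v_textured = surround_suppress(v, surround_sum=0.5, a3=1.0, zeta=1.0)
print(v, v_isolated, v_textured)
```

Negative simple-cell activities are rectified away before fusion, and the same fused response is reduced when the surround is active, which is the intended texture suppression effect.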

3.4 Contour extraction model that simulates the complex connection of visual pathway

Physiological studies have shown that the complex and changeable pathway structure is the basis for the formation of the visual system. Most traditional visual nerve calculation models focused on the construction of the neural network within functional areas, ignoring the interaction between functional areas. In order to better simulate the biological vision system, this paper proposed a contour perception model to simulate the complex connection pattern of the visual cortex. A three-layer neural network was constructed to simulate the response and transmission process of LGN, the primary visual cortex, and the advanced visual cortex. The output was integrated with the response of advanced visual cortex to correct background contour and texture noises, as shown in Formula (17).

$${E_{ij}}=A \cdot \left[ {e^{-t /\tau}+\left( {\frac{{\alpha \cdot V_{{ij}}^{{{\text{LGN}}}}+\beta \cdot V_{{ij}}^{{\text{G}}}+\gamma \cdot V_{{ij}}^{{{\text{HC}}}}}}{\tau }} \right)t} \right]+B \cdot V_{{ij}}^{{\text{G}}}$$
(17)

In the formula, \(E_{ij}\) denotes the output contour response. α, β and γ denote the feedforward, feedback, and horizontal connection coefficients, respectively. A and B denote the integration coefficients.
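Formula (17) can be evaluated directly for one pixel, as in the sketch below. The function name and every numeric value are placeholders chosen for illustration and are not the parameter values used in this paper.

```python
import math

def contour_response(v_lgn, v_g, v_hc, t, tau, alpha, beta, gamma, a, b):
    """Formula (17): fuse the LGN, advanced-cortex and horizontal-connection outputs."""
    drive = (alpha * v_lgn + beta * v_g + gamma * v_hc) / tau
    return a * (math.exp(-t / tau) + drive * t) + b * v_g

# Placeholder inputs for a single pixel.
e_start = contour_response(0.5, 0.25, 0.5, t=0.0, tau=1.0,
                           alpha=1.0, beta=1.0, gamma=1.0, a=1.0, b=2.0)
e_later = contour_response(0.5, 0.25, 0.5, t=1.0, tau=1.0,
                           alpha=1.0, beta=1.0, gamma=1.0, a=1.0, b=2.0)
print(e_start, e_later)
```

At t = 0 the bracketed term equals 1, so the output reduces to \(A + B \cdot V_{ij}^{\text{G}}\); for t > 0 the weighted sum of the three pathway outputs contributes linearly in t.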

3.5 Parameters description

As this paper contained many parameters, we added Table 1 to illustrate the definitions and values of related parameters used in this paper.

Table 1 Definitions and values of related parameters used in this paper

4 Experiments

To verify the effectiveness of the proposed method, mainstream biological vision methods and deep learning methods were selected for comparison. The biological vision methods were the contour detection model based on multiple-cue inhibition (MCI) [33], the contour detection model based on DO and spatial sparseness constraint (SCO) [34], the boundary detection model based on feedback and surround modulation (SED) [1], and the contour detection model based on the bilateral asymmetric receptive field (BAR) [11]. The deep learning methods were the HED [32] network and the RCF [18] network. The BSDS500 [3] dataset and the NYUD dataset were used for the comparative analysis of each method.

4.1 BSDS500 dataset

The BSDS500 dataset contains 500 images. The corresponding 500 ground-truth maps are manually annotated contours, which are used to evaluate the effectiveness of the method.

4.1.1 Coefficient selection experiments

To determine how the fusion coefficients A and B in Eq. 17 influence the overall performance of the model, a coefficient selection experiment was carried out. Precision, recall, and the F-score evaluation index proposed in [22] were used to analyze the results quantitatively. The calculation of the F-score is shown in Formulas (18) to (20). Since the final contour map may deviate slightly from the standard contour map, a detected pixel is judged to be labeled correctly if it appears in the 5 × 5 neighborhood of a standard contour pixel. The contour set detected by the algorithm is denoted by ED, and the standard contour pixel set is denoted by EGT. The correct pixel set detected by the algorithm is \(E={E_D} \cap \left( {{E_{GT}} \oplus T} \right)\), where ⊕ denotes the dilation operation and T denotes the 5 × 5 structuring element. The error pixel set EFP detected by the algorithm is \({E_{FP}}={E_D} - E\). The missing pixel set EFN of the algorithm is \({E_{FN}}={E_{GT}} - \left( {{E_{GT}} \cap \left( {{E_D} \oplus T} \right)} \right)\).

$${P_r}=\frac{{{\text{card}}\left( E \right)}}{{{\text{card}}\left( E \right)+{\text{card}}\left( {{E_{FP}}} \right)}}$$
(18)
$${R_c}=\frac{{{\text{card}}\left( E \right)}}{{{\text{card}}\left( E \right)+{\text{card}}\left( {{E_{FN}}} \right)}}$$
(19)
$$F=\frac{{2{P_r}{R_c}}}{{{P_r}+{R_c}}}$$
(20)

In the formulas, card(E) denotes the number of elements in the set E, Pr denotes the precision, and Rc denotes the recall. F measures the consistency between the contour detected by the algorithm and the standard contour. A series of precision and recall values can be obtained by adjusting the threshold. With recall as the horizontal axis and precision as the vertical axis, the precision–recall (P-R) curve is plotted, as shown in Fig. 4. The larger the area under the P-R curve, the better the contour extraction performance of the algorithm.
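The set definitions above and Formulas (18) to (20) translate almost directly into Python sets of pixel coordinates. The following is a minimal sketch rather than the evaluation code used in the paper. The function names and the toy contours are hypothetical, and the 5 × 5 structuring element T is modelled as a square of radius 2.

```python
def dilate(pixels, radius=2):
    """Dilation by a (2*radius+1) x (2*radius+1) square; radius=2 gives 5 x 5."""
    return {(i + di, j + dj)
            for (i, j) in pixels
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)}


def precision_recall_f(detected, ground_truth, radius=2):
    """Formulas (18)-(20) with a tolerance neighbourhood.

    detected and ground_truth are sets of (row, col) contour pixels.
    """
    correct = detected & dilate(ground_truth, radius)                        # E
    false_pos = detected - correct                                           # E_FP
    false_neg = ground_truth - (ground_truth & dilate(detected, radius))     # E_FN
    n_e, n_fp, n_fn = len(correct), len(false_pos), len(false_neg)
    precision = n_e / (n_e + n_fp) if n_e + n_fp else 0.0
    recall = n_e / (n_e + n_fn) if n_e + n_fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


# Toy example: the detection is shifted by one pixel and has two spurious pixels.
ground_truth = {(r, 5) for r in range(10)}
detected = {(r, 6) for r in range(8)} | {(0, 20), (1, 20)}
print(precision_recall_f(detected, ground_truth))
```

In this toy case the eight shifted pixels fall inside the tolerance neighbourhood and count as correct, the two distant pixels are false detections, and no ground-truth pixel is missed, so the precision is 0.8 and the recall is 1.0. Repeating the computation for a range of binarization thresholds yields the points of a P-R curve.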

Fig. 4 P-R curves of different coefficients for the BSDS500 dataset. The notation [F= ] in the legend indicates the best F-score

To determine the fusion coefficients A and B, six images were randomly selected from the BSDS500 dataset as experimental objects. The curves and F-scores in Fig. 4 show that the proposed method achieves a better detection result when A = 0.6 and B = 0.4.

4.1.2 Comparative experiments

To verify the effectiveness of the proposed method, the BSDS500 dataset was used for comparative experiments. The results are shown in Fig. 5. To show the contour extraction effect of each method more clearly, the optimal binarization operation was carried out on each contour map. We refer to the method proposed in this paper as CCP.

Fig. 5 Contour detection results of the BSDS500 dataset

The results in Fig. 5 show that the MCI method combined a variety of visual features and extracted the key contour information well, but its processing of details was insufficient, as in the red-circled area of image 8068. The SCO method extracted contours based on color information and added a texture suppression stage, which better balanced contour extraction and texture suppression. However, in some complex images the contours and textures still could not be completely separated, as in the red-circled area of image 176035. The BAR method added the asymmetric receptive field mechanism to the MCI method, so its detection effect was slightly improved, but it still retained many textures, as in the red-circled area of image 113044. The contour obtained by the SED method was relatively complete, but some textures remained around the contour, as in the red-circled areas of images 38092 and 118035. The CCP method took the feedforward, feedback, and horizontal connections into consideration, using multi-level interaction and dynamic connection in the visual cortex to complete the contour extraction task. As a result, the obtained contour was more complete than those of the other methods and contained relatively fewer textures.

For the contours obtained by each method on the BSDS500 dataset, precision and recall were used to analyze the results quantitatively. Figure 6 shows the P-R curves and F-scores of the different algorithms. Table 2 reports the performance of each method on the BSDS500 dataset: the optimal dataset scale (ODS), the optimal image scale (OIS), and the average precision (AP) [3].
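The difference between ODS and OIS can be illustrated with a minimal sketch. It assumes the usual BSDS benchmark convention, which the paper does not spell out: ODS uses one threshold shared by the whole dataset, OIS uses the best threshold of each image, and in both cases the match counts are pooled over images before the F-score is computed. The `(tp, fp, fn)` triples and all function names are hypothetical.

```python
def prf(tp, fp, fn):
    """Precision, recall and F-score from pooled match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def pool(triples):
    """Sum a list of (tp, fp, fn) triples component-wise."""
    return tuple(sum(column) for column in zip(*triples))


def ods_ois(counts):
    """counts[image][threshold] = (tp, fp, fn) for that image and threshold."""
    n_thresholds = len(counts[0])
    # ODS: a single threshold is fixed for every image in the dataset.
    ods = max(prf(*pool([image[k] for image in counts]))[2]
              for k in range(n_thresholds))
    # OIS: each image contributes the counts of its own best threshold.
    best_per_image = [max(image, key=lambda c: prf(*c)[2]) for image in counts]
    ois = prf(*pool(best_per_image))[2]
    return ods, ois


# Two images evaluated at two thresholds (made-up counts).
counts = [
    [(8, 2, 2), (5, 0, 5)],
    [(4, 6, 0), (3, 0, 1)],
]
print(ods_ois(counts))
```

Because OIS may choose a different threshold for every image, it is never lower than ODS under this convention.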

Fig. 6 P-R curves of different models for the BSDS500 dataset

Table 2 Comparison of the models for the BSDS500 dataset

The graph and table show that the CCP method outperforms the biological vision methods (MCI, SCO, BAR and SED) on all performance indicators. Compared with deep learning methods such as RCF and HED, the CCP method scores slightly lower, but deep learning methods have several limitations. First, the results obtained through deep learning methods cannot be corrected, and their portability is weak: if the dataset is replaced, the model needs to be retrained from scratch. Second, it is generally accepted that deep learning methods such as RCF and HED are closer to black-box network structures and lack the interpretability of brain-like methods. To a certain extent, deep learning methods also deprive humans of the opportunity to find errors. Finally, because of the complexity of deep learning models, the time complexity of the algorithms increases dramatically. Ensuring real-time performance requires higher programming skills, more training time and better hardware support. In contrast, the method proposed in this paper used a visual neural computing model to simulate the mechanism of visual information transmission and processing in the visual pathway. Therefore, the CCP method is more practical when only a small dataset is available.

4.1.3 Ablation experiments

In order to verify the contribution of each part of the method to contour extraction, ablation experiments were carried out. The complete model was compared with the removal of the horizontal connection module, the removal of the feedback connection module, and the removal of the horizontal connection and feedback connection module at the same time. The P-R curve is shown in Fig. 7, and the indexes of ODS, OIS, and AP are shown in Table 3.

Fig. 7 P-R curves of ablation experiments for the BSDS500 dataset

Table 3 Results of ablation experiments for the BSDS500 dataset

It can be seen from the curve in Fig. 7 and the data in Table 3 that both the horizontal connection module and the feedback connection module contribute to the model constructed in this paper. When the horizontal connection module or the feedback connection module is removed, the F-score drops to 0.70 and 0.68, respectively. When the horizontal connection and feedback connection module are removed at the same time, the F-score drops to 0.64. The horizontal connection module simulates the lateral adjustment function of the biological visual receptive field, which helps to avoid the influence of regional noise and enhance the robustness of the model. The feedback connection module simulates the feedback process of the biological visual system, which uses the information obtained by the advanced visual cortex to adjust the primary visual cortex to enhance the overall stability of the model.

4.2 NYUD dataset

To further demonstrate the effectiveness of the proposed model, the NYUD dataset was used for additional analysis. The NYUD dataset consists of video sequences of various indoor scenes recorded by Microsoft Kinect's RGB and depth cameras, and contains 1449 annotated RGB images and depth maps from 464 scenes in 3 cities. In this paper, five pictures were randomly selected from the NYUD dataset for results display and data analysis, as shown in Fig. 8.

Fig. 8 Contour detection results for the NYUD dataset

The results in Fig. 8 show that the MCI method extracted relatively complete image contours, but in areas with weak contrast, such as the red-framed areas of images 5415 and 5598, the contours were missing. The SCO method used color information to extract the contour, so it tended to miss contour information where the color difference was small, as in the red-framed areas of images 5341 and 6138. The BAR method could not balance contour extraction and texture suppression: the red-framed area of image 5415 contains excess texture information, while the red-framed area of image 5450 has clearly missing contours. Because of the richness of image details in the NYUD dataset, the contours obtained by the CCP method did not reach high completeness and accuracy. Nevertheless, compared with the three biological vision methods MCI, SCO and BAR, the contours obtained by the CCP method were more complete and contained relatively less texture information.

The method proposed by Grigorescu [12] was used to evaluate the obtained contours quantitatively. The error rate eFP, the missed rate eFN, and the overall performance index Performance-value were calculated by Formulas (21) to (23). The specific results are shown in Table 4.

$${e_{FP}}=\frac{{{\text{card}}\left( {{E_{FP}}} \right)}}{{{\text{card}}\left( E \right)}}$$
(21)
$${e_{FN}}=\frac{{{\text{card}}\left( {{E_{FN}}} \right)}}{{{\text{card}}\left( {{E_{GT}}} \right)}}$$
(22)
$${\text{Performance}}=\frac{{{\text{card}}\left( E \right)}}{{{\text{card}}\left( E \right)+{\text{card}}\left( {{E_{FP}}} \right)+{\text{card}}\left( {{E_{FN}}} \right)}}$$
(23)

Table 4 Comparison of the models for the NYUD dataset

On the more complex NYUD image set, the MCI method extracted image contour information well, but its false detection rate eFP was higher on images with more details. The Performance-value of the BAR method was slightly higher than that of the MCI method, but its contour extraction was incomplete. The SCO method used color information and performed well on most images in the NYUD dataset, but it had difficulty balancing contour extraction and texture suppression on dim images with a large number of details. The CCP method achieved a higher Performance-value than the other methods, while its false detection rate and missed detection rate were lower. According to the above analysis, the CCP method performs better in both contour extraction and texture suppression.
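The three indices in Formulas (21) to (23) can be computed from the same pixel sets that were defined for the F-score. The sketch below is illustrative only: it reuses the 5 × 5 tolerance neighbourhood from Section 4.1.1, and the function names and toy contours are hypothetical.

```python
def dilate(pixels, radius=2):
    """Dilation by a (2*radius+1) x (2*radius+1) square; radius=2 gives 5 x 5."""
    return {(i + di, j + dj)
            for (i, j) in pixels
            for di in range(-radius, radius + 1)
            for dj in range(-radius, radius + 1)}


def contour_indices(detected, ground_truth, radius=2):
    """Return (e_FP, e_FN, Performance) as in Formulas (21)-(23)."""
    correct = detected & dilate(ground_truth, radius)                        # E
    false_pos = detected - correct                                           # E_FP
    false_neg = ground_truth - (ground_truth & dilate(detected, radius))     # E_FN
    n_e, n_fp, n_fn = len(correct), len(false_pos), len(false_neg)
    e_fp = n_fp / n_e if n_e else float("inf")          # Formula (21)
    e_fn = n_fn / len(ground_truth) if ground_truth else 0.0   # Formula (22)
    total = n_e + n_fp + n_fn
    performance = n_e / total if total else 0.0         # Formula (23)
    return e_fp, e_fn, performance


# Toy example: 8 correct pixels, 2 false detections, 2 missed pixels.
ground_truth = {(r, 5) for r in range(10)} | {(30, 30), (31, 30)}
detected = {(r, 6) for r in range(8)} | {(0, 20), (1, 20)}
print(contour_indices(detected, ground_truth))
```

For this toy case the error rate is 2/8, the missed rate is 2/12, and the Performance-value is 8/12. A Performance-value of 1 would require that there be neither false detections nor missed pixels.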

By using each method to extract the contours of the images in the NYUD dataset, 1449 best Performance-values can be obtained. In order to verify the stability of each method, the Performance-values obtained through HED, RCF, MCI, SCO, BAR and CCP were statistically calculated in the form of box-and-whisker statistics. The top and bottom of the box represent the maximum and minimum values after removing the outliers, and the middle horizontal line represents the median of the Performance-value. The shorter the box, the better the stability of the method [36] (Fig. 9).
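The summary behind such a plot can be sketched with the Python standard library. The outlier rule is an assumption, since the paper does not state which one was used: values farther than 1.5 times the interquartile range from the quartiles are treated as outliers (Tukey's rule). The function name and the sample values are made up for illustration.

```python
import statistics


def box_summary(values, whisker=1.5):
    """Median and non-outlier extent of a list of Performance-values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    # Assumed outlier rule: drop values outside the Tukey fences.
    kept = [v for v in values if low_fence <= v <= high_fence]
    return {
        "median": median,
        "low": min(kept),
        "high": max(kept),
        "spread": max(kept) - min(kept),
        "outliers": len(values) - len(kept),
    }


# Made-up Performance-values for a stable and a less stable method.
stable = [0.60, 0.61, 0.62, 0.63, 0.64, 0.65, 0.30]
unstable = [0.40, 0.50, 0.55, 0.62, 0.70, 0.75, 0.85]
print(box_summary(stable))
print(box_summary(unstable))
```

In this example the two lists have the same median, but the first has a much smaller spread once its single outlier is removed, which is the sense in which a shorter box indicates a more stable method.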

Fig. 9 Box-and-whisker statistics of Performance-value in contour detection for the NYUD dataset

The box-and-whisker statistics show that the deep learning methods (HED, RCF) are superior in performance to the biological vision methods. However, their excellent performance requires a large amount of training time, and if the dataset is replaced, they need to be retrained from scratch. In contrast, the biological vision methods do not require training, and the models are constructed by simulating the biological vision system, so they are fully interpretable. Among the biological vision methods, the CCP method obtains a higher Performance-value, which indicates good performance. In addition, the box obtained by the proposed method is shorter than the others, which means that the method is more stable.

5 Discussion

In this paper, we chose the BSDS500 dataset and NYUD dataset to compare the advantages and limitations of the proposed method, mainstream biological vision methods and popular deep learning methods.

Among the methods based on biological vision, the MCI method fully considers the influence of various factors on contour extraction. However, it relies excessively on the local features of the image, so its contour extraction on texture boundaries and low-contrast regions needs to be improved. The SCO method constructed a contour detection model based on color antagonism and spatial sparsity constraint suppression. Nevertheless, it can only reflect the local contrast of the images and ignores their overall characteristics. Based on feedback and surround modulation, the SED method constructed a contour detection model with a contrast-dependent surround modulation of V1 receptive fields. The contours obtained by this method still contain numerous textures at the boundary. The BAR method constructed a contour detection model based on the bilateral asymmetric receptive fields of the visual pathway, which helps to highlight contrast differences in local areas. However, this method only considers brightness and contrast information and ignores the other features, which makes it unable to balance contour extraction and texture suppression.

The CCP method simulated the complex connection pattern of the visual cortex, taking feedforward, feedback, and horizontal connections into consideration, to construct a contour calculation model inspired by biological vision. In addition, the application of sparse coding, optimal orientation selection and other visual mechanisms helped the model to obtain full contours efficiently. Furthermore, benefitting from hue perception based on the surround suppression mechanism, the CCP method suppressed textures more effectively. However, some limitations remain. The CCP method is essentially an unsupervised approach that does not require a labeled sample set, so it lacks prior knowledge or experience of the data. Therefore, after the difference in the receptive field response is used to extract the primary contour, the subsequent processing cannot improve the closure and continuity of the target contour. In addition, the CCP method did not consider the binocular disparity structure. When images containing depth information are processed, some important information is lost, resulting in incomplete contour lines.

6 Conclusion

Based on biological vision, this paper discussed the influence of the interaction between different visual cortices on visual information transmission. First, the sparse coding characteristic of the LGN was simulated to obtain the primary perception of LGN neurons on the contour. Second, the directional selectivity of the CRF in the primary visual cortex was simulated, and a lateral adjustment model integrating windmill-like receptive field structures was constructed to obtain the perception of the primary visual cortex on the image contour. Third, by simulating the hue perception characteristics of the advanced visual cortex, a three-channel hue perception model including surround suppression was constructed. By fusing the complex cells' responses to the color antagonism channels and the brightness channel, together with their surround suppression characteristics, we obtained the response of the advanced visual cortex to the image contour, which effectively highlights the target contour and suppresses texture noise. Finally, a contour detection model was constructed to simulate the complex connection of the visual cortex. The connection includes the feedforward input from the LGN, the feedback input from the advanced visual cortex and the horizontal input from neurons in the same layer. The interaction between the visual cortices was used to correct the discharge strength of the visual cortex neurons in response to the input image stimulus, which improved the accuracy of the contour.

Some improvements to the CCP model can be implemented in future work. (1) Develop the role of labeled samples in the contour recognition process, and explore the construction and implementation of semi-supervised learning methods for visual computing models, in order to realize image contour extraction based on prior knowledge. (2) Introduce the binocular disparity structure into the visual computing model to extract the depth information of the target images, so as to achieve an accurate and complete expression of contours.