1 Introduction

Visual saliency detection (VSD) is a computational model that simulates the human visual system (HVS), which aims to identify important regions or objects in an image or video. In the past few decades, it has been applied to computer vision and neurosciences, including image segmentation [3, 19], image matching [38], video tracking [22, 27, 28], visual relation detection [37, 46], event-based multimedia analysis [23, 25], etc.

According to whether prior knowledge is used or not, the current saliency-detection models can be divided into two categories: bottom-up and top-down ones. Bottom-up models [1, 4, 7, 8, 10, 11, 20, 21, 24, 26, 39, 45, 47] are stimulus-driven, using low-level features, such as color, texture, contrast and spatial location cues. On the other hand, top-down based models usually exploit high-level human perceptual knowledge, semantics and object priors to guide saliency computation [6, 9, 42], which are often task-driven and special application-oriented with huge training data and prior knowledge.

It was commonly accepted that the efficacious integration of multiple low-level cues can produce reliable and robust saliency-detection results [14]. Thus, in this paper, we obtain preliminary saliency maps by integrating multiple layers of prior knowledge. Then we exploited the local and global contrast based on the colour and texture feature of the image region [4, 26] to calculate a tentative saliency map. Since human beings are subjective when looking at an image, the simple contrast calculation may not be enough to provide adequate visual clues. In fact, human beings are accustomed to focus on more colourful areas in the center of the image, whereas the dark areas in the boundary of the image are easily ignored. Therefore, we adopt three low-level prior-knowledge cues to generate an initial saliency map, including colour prior and spatial location prior and background prior. Then, we refine the preliminary saliency map based on Cellular Automata (CA).

In contrast to previous works, in this paper, we propose an effective and compound saliency-detection model, which is based on multi-level prior and a propagation mechanism relying on Cellular Automata. In our method, we firstly divide the input image into compact uniform elements. Then, we incorporate the Convex-Hull-Based center method to obtain the spatial location of the salient objects. According to psychology research, the warm colors such as red and yellow are more remarkable and distinct for human visual perception. In order to use this fundamental information, we also exploit color prior during the procedure of saliency detection. Meanwhile, in order to capture the boundary details of salient object, we generate the background-prior map by calculating the probability that the image region is connected to the image boundaries. Finally, these three typical maps are integrated to obtain a multi-level saliency map, and then we apply a propagation mechanism based on CA to reduce the redundant information in similar regions to obtain a refined saliency map for salient-object detection. In order to evaluate the proposed saliency-detection model, we carried out extensive experiments on several widely used datasets and experimental results show that the proposed method outperforms other state-of-the-art saliency-detection approaches.

The rest of this paper is organized as follows. In Section 2, we review the related work on saliency detection models. In Section 3, we describe the proposed framework in detail. Section 4 shows the experimental results. Finally, conclusion and discussion are presented in Section 5.

2 Related work

Most of the existing visual saliency-detection (VSD) methods are based on the bottom-up computational scheme, which uses low-level visual features in images or videos. Early work of saliency detection was proposed by Itti et al. [13]. In this model, based on visual characteristics (include colour, intensity and orientation maps), center-surrounding difference of low-level image features were used to obtain the final saliency map of an input image. In [26], Ma et al. proposed an approach for visual attention area detection in images based on local contrast analysis, which was computed based on center-surrounding and colour difference. Harel and Perona [8] proposed a graph-based visual saliency-detection (GBVS) model aiming to fuse different feature maps so as to strengthen those salient regions in an image. However, this method produces a lower resolution saliency map. Liu et al. [24] used Conditional Random Fields (CRF) to identify salient region in an image, which employed three types of low-level features including the multi-resolution contrast, the center-surround histogram, and the colour spatial distribution. Hou et al. [11] proposed a spectral-residual (SR) saliency-detection approach by analysing the log-spectrum of an image, with the intent of focusing on the contrast while ignoring the redundant information. In [1], a frequency-tuned (FT) based model for saliency detection was proposed by calculating the colour difference from the mean image colour at the pixel level. Goferman et al. [7] proposed a context-aware (CA) saliency-detection method based on local contrast and human visual attention rules to detect salient objects. In [45], Zhai and Shah proposed a fusion spatial attention model and temporal attention model for video saliency detection. In [20], a high-dimensional colour transform method was proposed for salient object detection. Cheng et al. [4] proposed a histogram-based contrast (HC) and a region-based contrast (RC) method for saliency detection. Li et al. [21] proposed a salient object detection model by combing meanshift filtering and fusing of colour information in an image. In [47], Zhang and Sclaroff designed a boolean-map based method by exploiting surroundedness for saliency detection. Wei et al. [39] exploited background priors and geosdesic distance for saliency detection. Zhu et.al [48] integrated boundary connectivity and multiple low-level cues to estimate saliency of images. Shen et al. [34] proposed a saliency-detection model via unifying high-level priors into a low-rank framework. In [31], a novel structured matrix decomposition model (SMD) with two structural regularizations, namely tree-structural sparsity-inducing regularizations and Laplacian regularizations, was developed for salient object detection. Qin et al. [32] proposed a model by using global distance matrices and applied Cellular Automata to optimize the prior map for salient object detection. In [44], Yang, et al. proposed a graph-based manifold ranking method, which used boundary regions consider both foreground and background cues. Song et al. [36] proposed a co-saliency method based on depth cues to detect the salient object objects. In [12], by taking the proposals as the bags of instances of multiple instances learning (MIL), where the instances were the superpixels contained in the proposals, Huang et al. formulated saliency detection problem as a MIL task. Chang et al. [3] proposed a novel computational framework for saliency detection with the aid of minimizing energy function and knowledge prior. In [17], a salient object segmentation method was proposed by integrating both bottom-up salient stimuli and object-level shape prior. Based on a set of representative texture atoms, Scharfenberger et al. [33] proposed a sparse texture model for the saliency object detection through extracting rotationa-invariant neighborhood-based textural representations. In [41], Yan et al. proposed a multi-layer approach to analyse saliency cues for saliency detection. Cheng et al. [5] proposed a method using a soft image abstraction representation to decompose an image into large scale perceptually homogeneous elements for efficient salient region detection. Recently, a saliency detection method, using the quaternionic distance based weber local descriptor and low-level priors, is proposed for natural images [16]. In [15], Jian et al. proposed a saliency-detection model by integrating quaternionic distance based weber descriptor (QDWD) with pattern distinctness and local contrast for underwater vision saliency detection. In [18], a novel computational model for saliency detection is proposed by combining the local and global kernel density estimations during the saliency computation process.

Compared to bottom-up saliency-detection model, top-down methods are based on the high-level priors of images and therefore is always specific task-driven with training and learning scheme. In [42], Yang et al. proposed a top-down saliency detection model by jointly learning a Conditional Random Field (CRF) and a visual dictionary for salient object detection. Cholakkal et al. [16] designed a top-down saliency detection method based on locality-constrained contextual sparse coding. In their approach, three locality constraints were combined to label an object category. In [15], a two-stage deep method was exploited to learn the intra-class association between the exemplars and query objects for top-down saliency detection.

In this paper, we concentrate on the popular category of the bottom-up computational saliency detection model. Different from conventional approaches, the proposed scheme can ameliorate the accuracy and consistency of the estimated saliency results by fusing multi-level saliency priors and local similarity cues. The main contributions of the proposed approach are twofold:

  • An efficient and compound framework by integrating low-level saliency priors and local similarity cue is proposed for visual saliency detection.

  • With the merit of propagation mechanism based on CA, a principle refinement framework is devised to combine multi-level prior maps and local similarity cues together for uniformly highlighting salient objects. As a result, the scheme can reduce the difference among similar regions and suppress the background by ameliorating the multi-level prior maps.

3 The proposed method

In this section, we describe the designed saliency-detection framework, which incorporates multiple low-level prior maps together through a propagation mechanism based on Cellular Automata and saliency refinement procedure. Multi-level priors will first be described, including center prior, colour prior and background prior. All these different types of information are fused to obtain an integrated saliency map.

3.1 Multi-level prior Integration

Given an input image, we use the simple linear iterative clustering (SLIC) algorithm [2] to segment the image into N superpixels P = {p1, p2, .......pN}.

3.1.1 Center-prior

Plenty of existing center prior methods directly use the image center [7, 24, 26]. Actually, when the salient objects are in the image center, these methods will produce good results. However, there are many salient objects that deviate from the center of image in real situation; in this case these methods are no longer applicable. In order to tackle this issue, in this paper, we adopt the convex hull of interest points to estimate the real spatial center of the salient object [43]. Firstly, linear iterative clustering (SLIC) algorithm [2] is applied to segment the image into N superpixels, and then compute a convex hull which encloses interesting points to orientate the location of salient object. Secondly, we regard the centroid of the convex hull as the center to achieve the convex-hull-based center prior estimation. The center-prior saliency value of a pixel can be defined as follows:

$$ {CenP}_i=\exp \left(-{\frac{\left\Vert {x}_i-{x}_0\right\Vert }{2{\theta}_x^2}}^2-{\frac{\left\Vert {y}_i-{y}_0\right\Vert }{2{\theta}_y^2}}^2\right) $$
(1)

where xi and yi denote the mean horizontal and vertical positions of the superpixel i, (x0, y0) denote the center of the superpixel i, θx and θy indicate the horizontal and vertical variances. We use a centered anisotropic Gaussian distribution to model the center prior map and set θx = θy with pixel coordinate normalize to [0, 1]. The convex hull provides a rough location of the salient object, and the convex-hull-based center prior map is more reasonable and robust, which achieves a good performance. Some examples of center prior map are illustrated in Fig. 1. It shows that the convex-hull-based center prior is able to localize the spatial position of the salient object. Even if the object is at the edge of the image, it can still achieve satisfactory results.

Fig. 1
figure 1

Examples of center prior maps. The first line represents the input image; the second line is the center-prior cues based on the centroid of the convex hull

3.1.2 Colour-prior

According to the human vision system, human eyes are more sensitive to warm colour such as red and yellow colour. Warm colours are thus more attention-getting for human perception. To exploit this information, we used colour-prior proposed in [34] and obtain a 2-D histogram distribution H(S) in nR-nG colour space, which can be expressed as follows:

$$ nR=\frac{R}{R+G+B} $$
(2)
$$ nG=\frac{G}{R+G+B} $$
(3)

Similarly, a histogram indicating the colour distribution of the background H(퐵) is generated as well. For each quantized colour, we obtain the values from two histograms, denoted by hs and hB, respectively. Then, the color prior for saliency estimation can be expressed as follows:

$$ {ColP}_i=\exp \left(\left({h}_S\left({c}_x\right)-{h}_B\left({c}_x\right)\right)/{\lambda}_3^2\right) $$
(4)

where cx indicates the color at location x. Figure 2c shows some samples of color prior maps.

Fig. 2
figure 2

Samples of multiple low-level prior maps. a Initial input images; b center prior maps; c color prior maps; d background prior maps; e Fusion results of prior maps; f Saliency maps of propagation mechanism; g Saliency refinement; and h Ground truths

3.1.3 Background-prior

Recently, some research works [39, 44] introduce boundary prior and treat image boundary regions as background. In [39], the boundary superpixels are used as background queries. While in [44], the authors calculate the shortest-path distance to the image boundary for saliency detection via graph-based manifold ranking. Inspired by these previous works, we calculate the probabilities of image regions connected to image boundaries [48]. The background-prior can be expressed as below:

$$ {BgP}_i=\exp \left(-\frac{bdco{n}^2\left({p}_i\right)}{2{\sigma}^2}\right), $$
(5)

where bdcon denote the boundary connectivity, parameter σ=0.3. The background-prior for saliency detection is shown in Fig. 1d.

3.1.4 Multi-level prior fusion

To enhance saliency map, we integrate these three low-level visual cues simultaneously, namely binding color prior, center prior and background prior to obtain multiple low-level prior Sprior(i) as follows:

$$ {S}_{prior(i)}= Cen{P}_i.\ast Col{P}_i.\ast Bg{P}_i. $$
(6)

where ⋅∗ is the pixel-by-pixel multiplication operator. Figure 2e displays some samples of multiple low-level priors.

3.2 Propagation mechanism

In order to strengthen the relevance of similar regions, we employ a propagation mechanism based on Cellular Automata [32, 40]. We detect the neighbors of the superpixel as well as sharing common boundaries with its adjacent superpixels as a cell. If two adjacent image superpixels are similar, we utilize the affinity matrix to represent the similarity of superpixels pair (pi, pj).The affinity matrix W = (wi, j ∈ RN × N), with wi. j is defined as:

$$ {w}_{i,j}=\Big\{{\displaystyle \begin{array}{l}\exp \left(-{\frac{\left\Vert {p}_i-{p}_j\right\Vert }{2{\lambda}^2}}^2\right), if\left({p}_i,{p}_j\right)\in V\\ {}0, otherwise\end{array}} $$
(7)

where pi, pj denotes superpixel i and j, V denotes the set of adjacent superpixel pairs which are either neighbors on the image. In order to normalize affinity matrix, a degree matrix D =  diag {d1, d2, ......dN} is generated, where di = ∑jwi, j, to form a row-normalized impact affinity matrix [32], which can be written as blow:

$$ W\ast ={D}^{-1}\cdot W $$
(8)

Considering that each superpixel of the next state is determined by itself and neighbours, we need to balance the importance of the two factors relationship. If a superpixel is quite different from all neighbours in the color space, the next state will primarily rely on itself. On the contrary, if a superpixel is similar to neighbors, it is more likely to be assimilated by the local environment. So we build a coherence matrix C =  diag {c1, c2, ......cN} to better promote the evolution of all neighbours, each superpixel coherence towards its current state ci = 1/ max(wi, j), then we can construct the coherence matrix \( C\ast =\mathit{\operatorname{diag}}\left\{{c}_1^{\ast },{c}_2^{\ast },......{c}_N^{\ast}\right\} \) by the formulation as in [32]:

$$ {c}_i^{\ast }=a\cdot \frac{c_i-\min \left({c}_j\right)}{\max \left({c}_j\right)-\min \left({c}_j\right)}+b $$
(9)

where j = 1, 2, ...N, and set a = 0.6, b = 0.2. With coherence matrix C∗, each superpixel can evolve into a more accurate and stable state. Under the influence of neighbouring superpixels, the salient objects can be more easily detected. According to the affinity matrix and coherence matrix, it can be defined the propagation mechanism [32] as follows:

$$ {S}_c=C\ast \cdot {S}^p+\left(I-C\ast \right)\cdot W\ast \cdot {S}^p $$
(10)

where Sp= {Sprior(1), Sprior(2), ...Sprior(i)...., Sprior(N)}, I is an identity matrix, C∗ and W∗ represent coherence matrix and affinity matrix, respectively. Figure 2f shows some saliency maps of propagation mechanism.

We also evaluated the performance of the propagation mechanism based on and without CA. As shown in the Fig. 3, the first line represents the input images, while the second line is the saliency map without CA, and the third line shows the saliency maps with CA. By using the CA processing, the effect of image background noises can be reduced, and meanwhile salient object in the image can be highlighted more uniformly. Therefore, salient maps produced by CA contain a clear outlines, so the cellular automata (CA) plays an important role in our devised scheme.

Fig. 3
figure 3

Comparison experiments by using CA and without CA. The first line are the input images, the second line is the saliency maps without CA, while the third line shows the saliency maps with CA

Furthermore, the proposed updating rule applies to most images. Through exploiting the intrinsic relationship in the neighborhood superpixels, our CA-based model can enhance saliency consistency among similar regions. As shown in Fig. 3, CA can suppress the background and uniformly highlight the salient object in an image. In addition, even if the image is at the boundary of the image, such as the first image and the fourth image, satisfactory results can also be obtained.

3.3 Saliency refinement

To obtain a high-quality saliency map, we finally combine multiple low-level saliency prior and propagation mechanism together, and the saliency refinement procedure can be expressed as follows:

$$ {S}_{(i)}=\frac{\left(M+{S_i}^p\ast +{S}_{c(i)}\ast \right)}{{S_{(i)}}^p\ast } $$
(11)

where M is a Laplacian matrix, Sip∗ denotes the value of multiple low-level prior diagonal matrix, Sc(i)∗ denotes the diagonal matrix value of Sc. Figure 2g shows some results of saliency refinement, compared with the ground truths in Fig. 2g, saliency refinement is capable of further enhancing the saliency maps.

4 Experimental results

In order to evaluate the performance of our algorithm, we conduct a series of experiments using three widely used datasets, including MSRA [30], ECCSD [35], SOD [29]. The proposed saliency-detection method is compared with other state-of-the-art methods, i.e. Frequency-tuned: FT [1], Robust background detection: RBD [48], Spectral residual: SR [11], Global contrast: RC [4], Manifold ranking: MR [44], Local contrast: LC [4], Histogram contrast: HC [4]. For comprehensive evaluation, we use precision-recall (P-R), F-measure to assess the quality of the saliency maps. Precision is related to the saliency detection correctly of the computation model, Recall is the ration of correctly detection to ground-truth, F-measure is a performance measure harmonic mean of precision (P) and recall (R) [24] quantificationally, which is defined as follows:

$$ {F}_{\beta }=\frac{\left(1+{\beta}^2\right)\times P\times R}{\beta^2\times P+R} $$
(12)

where β2 is set to 0.3 as suggested in [1]. The P-R curve is an important indicator of the model performance. The larger the area under the curve, the more obvious the superiority and robust of the algorithm.

4.1 Evaluation on MSAR 5000 dataset

Firstly, we evaluated our method on the MSAR 5000 dataset for each image with a single salient object and compare our results with other seven methods, including FT, LC, RC, HC, RBD, MR and SR. The visual comparison is given in Fig. 4, it can be seen that the proposed method is able to produce reliable and high quality saliency-detection results. Figure 5 shows the precision, recall and the F-measure values of all the different methods. From the comparison, our method can achieve the top precision. In addition, the proposed model outperforms the other existing methods in terms of detection accuracy. Experiment results performed on MSAR 5000 database show that our model is efficient and accurate.

Fig. 4
figure 4

Comparison of different saliency-detection methods. a Input images. b-h saliency maps generated by FT [1], HC [4], LC [4], SR [11], RC [4], MR [44], RBD [48], accordingly. i Saliency maps of our method. j Ground truth. The first to four rows are from MASK dataset, and five to eight rows are from ECSSD dataset, while nine to twelve rows are from SOD dataset, respectively

Fig. 5
figure 5

Comparison of different saliency-detection methods in terms of average precision, recall, and F-measure. Left: P-R curve on MSAR-5000 dataset; Right: F-measure on MSAR-5000 dataset

4.2 Evaluation on ECSSD dataset

We also tested our method on the publicly available saliency-detection dataset ECSSD, which contains 1000 images with complex foreground and background. We also quantitatively compared our method with other methods of FT, LC, RC, HC, RBD, MR and SR. Figure 4 (five to eight rows) shows some results of saliency maps generated by different methods. It can be observed that the proposed method can handle complex foreground and background, generating the detection results of almost entire salient objects and preserving the outline of salient object more clearly.

Figure 6 gives the precision, recall and the F-measure values on the dataset ECSSD of individual algorithms. According to P-R curve, the proposed model produces the optimal saliency-detection result. In terms of F-measure, the proposed method also yields the best result compared with other existing saliency-detection algorithms.

Fig. 6
figure 6

Comparison of different saliency-detection methods in terms of average precision, recall, and F-measure. Left: P-R curve on ECSSD dataset; Right: F-measure on ECSSD dataset

4.3 Evaluation on SOD dataset

We also compared our method with other state-of-the-art methods on SOD dataset [29]. This dataset contains many multiple object images with complicated backgrounds. The P-R curve and F-measure histogram are shown in Fig. 7. From Fig. 7, we can see that the proposed model is able to generate comparable saliency-detection results and even give a superior performance to other existing methods.

Fig. 7
figure 7

Comparison of different saliency-detection methods in terms of average precision, recall, and F-measure. Left: P-R curve on SOD dataset; Right: F-measure on SOD dataset

5 Conclusion and discussion

In this paper, we have proposed a novel scheme combining multiple low-level saliency priors and local similarity cue for efficient and accurate saliency detection. We first utilize multi-low-level prior maps and propagation mechanism based on Cellular Automata to achieve reliable saliency detection, then saliency refinement is utilized to further augment the saliency maps more smoothly and preserve the salient object outlines completely. We have tested the designed model on three publicly available datasets, and experimental results show that the proposed method is effective and efficient.