1 Introduction

Saliency detection aims to detect the most visually attractive objects in an image. Recently, this area has witnessed rapid progress. As a preprocessing step, automatic saliency detection has been widely used in a variety of computer vision tasks such as image segmentation [1], object recognition [2], compression [3], image retrieval [4], de-blurring [5], and others.

Numerous saliency models have been proposed in the past years. Due to the lack of a uniform definition of salient objects, most salient object detection methods rely on heuristic assumptions. The contrast prior is one of the most popular principles and has been adopted by various models from either a local or a global view [12, 13]. Local contrast-based methods [14] tend to respond to high-frequency information such as edges and therefore fail to pop out the holistic salient object, as shown in Fig. 1b. In contrast, global contrast-based methods can locate the salient object, but their performance degrades when the foreground regions are complex and have diverse appearance, as shown in Fig. 1c [13].

Fig. 1

Examples to illustrate the differences among various saliency models. (a) Input; (b) FT [6]; (c) PCA [7]; (d) GS [8]; (e) LPS [9]; (f) RR [10]; (g) MAP [11]; (h) Ours; (i) Ground-truth

Fig. 2

Main pipeline of the proposed saliency detection algorithm on an example image

To address the limitation of the contrast cue, the boundary prior has also been applied to detect salient regions, where the image boundary areas are regarded as background [15,16,17]. For example, Yang et al. [15] generated the saliency of image regions via manifold ranking, in which the regions on the four boundary sides are treated as queries. However, such methods are fragile when the object appears on the image boundary. To overcome this limitation, several improvement mechanisms have been proposed. Lu et al. [18] proposed an effective method based on dense and sparse reconstruction using background templates. Xia et al. [19] exploited corner information together with the convex hull to extract background seeds. Li et al. [9] removed the 30% of border pixels with the largest color difference within the border set. Wang et al. [20] dropped untrustworthy superpixels with sharp edges from the border set to obtain reliable background regions. Zhou et al. [21] used diffusion-based techniques on a proposed sparse graph. All of these methods perform well in easy cases, but they still struggle in complex scenes for two reasons: (1) their distinctiveness computation is confined to the border set instead of the entire image; (2) their similarity metric is weak, as shown in Fig. 1d-e.

Furthermore, the simple boundary prior alone is insufficient for saliency detection, especially when the image has an intricate background structure or the salient object has an appearance similar to the non-salient regions. Accordingly, a variety of methods have been proposed. Some researchers focus on constructing a robust discriminative matrix based on high-level information to enhance the difference between object and background. Lee et al. [22] concatenated an encoded low-level distance map and high-level features to compute saliency. Hou et al. [23] introduced short connections into the Holistically-Nested Edge Detector (HED), which can effectively combine rich multi-scale features to identify the whole salient object and accurately capture its boundary. Liu et al. [24] proposed Deeply-supervised Nonlinear Aggregation (DNA) to make better use of the complementary information in various side-outputs via nonlinear side-output prediction aggregation. Even though these models show promising performance on benchmark datasets, they require large collections of hand-labeled images and expensive learning frameworks. In contrast, based on the boundary prior, different propagation models have been put forward to improve the visual quality of saliency maps. Li et al. [10] estimated saliency via regularized random walks ranking, Jiang et al. [11] formulated saliency detection based on Markov absorption probabilities on an image graph model, and Yu et al. [25] presented a cross-diffusion process for salient object detection. However, they still suffer from saliency overestimation of the background when the background is cluttered and the similarity metric is inefficient. Besides, they may fail to fully highlight salient objects and may even produce erroneous results, because the graph-based approaches ignore the overall consistency between different regions of the salient object, as depicted in Fig. 1f and g.

In this paper, we propose a novel saliency detection method that hybridizes background scatter and foreground contour completeness. The pipeline of the proposed method is depicted in Fig. 2. First, we derive superpixels from an edge-guided segmentation. Different from previous methods for extracting a robust background, we consider it from the global view: the background scatter of the whole image is introduced to remove foreground noise from the border set (BS) by setting different thresholds for different boundaries, so that a solid boundary background (BG) can be obtained. Based on the boundary prior, a boundary contrast saliency map is then generated. Besides, with the help of the BG together with the boundary contrast saliency map, a contour completeness map is produced by considering the expected number of times a region is activated in a hierarchical segmentation space. Finally, we refine the integrated result within an iterative Bayesian optimization framework to obtain a smoother and more accurate saliency map. The main contributions of this research lie in the following aspects:

  (1)

    We propose a novel model for saliency detection via background scatter and foreground contour completeness. A selection mechanism for robust background nodes is presented based on background scatter. To maintain the completeness of the salient object, a contour completeness saliency map is derived from the completely closed shape of the object.

  (2)

    An iterative Bayesian optimization framework is proposed to achieve improved performance by refining the integration of the background-based saliency map and the contour completeness saliency map into a more favorable result.

  (3)

    Extensive experiments and comprehensive analysis on four public datasets demonstrate the effectiveness of the proposed method against state-of-the-art saliency detection methods.

The work in this paper is a substantial extension of our preliminary study titled “A hybrid of background scatter and foreground contour completeness for salient object detection” [26] (referred to as BSFC). Three-fold extensions are made upon BSFC in this paper, covering the algorithm, the framework, and the experiments. Specifically, this paper additionally develops a discriminative metric to measure the similarity between different nodes, which improves the effectiveness of the proposed algorithm (as demonstrated in Section 2.1). Secondly, we improve the original framework by adding a refinement step, which highlights the foreground while suppressing the background (as demonstrated in Section 2.5). Lastly, we conduct more extensive experiments, comparing with more state-of-the-art methods and providing a more comprehensive and in-depth analysis of the algorithm.

2 The proposed approach

This section describes the proposed algorithm as follows. First, a mechanism for extracting robust boundary background regions based on the scatter prior is presented. Next, two saliency cue maps (a boundary contrast map and a contour completeness map) are generated via two different computational schemes. Lastly, a refinement step that produces the final saliency map is described.

2.1 Background scatter

We adopt the structured random forest edge detector [27] to derive the probability of boundary (PB), and superpixels are then obtained by thresholding the ultrametric contour map (UCM) built from PB. Let the initial segmentation be \(P_{0} =\{R_{i}\}\) with K0 regions. Based on the superpixel segmentation result, an undirected weighted graph G =< V,E > is constructed, where V is the set of superpixels and E is weighted by an affinity matrix.

The boundary prior is a handy cue that has been widely used in saliency detection; it assumes the image boundary to be background. However, directly using all border nodes as background may lead to adverse effects when the object appears on the border. Accordingly, we perform boundary analysis to exclude salient superpixels on the four boundaries of the image. In an image, salient regions usually exhibit similar colors and a compact spatial distribution, while non-salient regions generally show varied appearance and are loosely scattered. Hence, we introduce the scatter prior, measured by spatial variance, which takes the complete image as the research object instead of the border set (BS). In detail, it is defined as below:

$$ Div_{c}(i)=normalize(\sum\limits_{j=0}^{N}w_{ij}\cdot|s_{j}-\rho_{i}|), $$
(1)
$$ \rho_{i}=normalize(\sum\limits_{j=0}^{N}w_{ij}\cdot s_{j}). $$
(2)

Herein, si and ρi indicate the position and the weighted mean position of superpixel i, respectively; N = K0 represents the number of superpixels; normalize(x) is a function that normalizes x to [0,1]; wij is the edge weight used to measure the similarity between different nodes. Here, a joint metric taking color difference, spatial distance, and edge information into consideration is adopted to compute the edge weight wij. The following aspects are considered:

  (1)

    According to the cognitive property of color similarity, image regions with similar colors usually belong to the same class.

  (2)

    According to the spatial proximity property, spatially adjacent image regions are likely to share the same label.

  (3)

    In some cases, the edge map can highlight the outline between foreground and background better than (1) and (2), as shown in Fig. 3.

Fig. 3

Effects of edge weight. (a) Input; (b) Only using the color affinity matrix; (c) Only using the spatial distance affinity matrix; (d) Using both color and spatial distance affinity matrices; (e) Edge detection by [27]; (f) The proposed affinity matrix

Based on the above considerations, the proposed edge weight is formulated as:

$$ w_{ij}=exp(-\frac{d_{c}(i,j)+d_{s}(i,j)+d_{e}(i,j)}{2{\sigma_{w}^{2}}}), $$
(3)

where σw controls the strength of the weight between a pair of nodes and is empirically set to σw = 0.1; dc(i,j), ds(i,j), and de(i,j) represent the color difference, spatial distance, and contour magnitude, respectively. The color difference dc(i,j) is defined as:

$$ d_{c}(i,j)=|R_{i}- R_{j}|, $$
(4)

where |Ri − Rj| denotes the Euclidean distance between superpixels i and j in CIELab color space. For the spatial distance ds(i,j), we combine geodesic distance and a sine spatial distance to measure the distance between superpixels i and j:

$$ \begin{array}{l} s_{ij}=\sqrt{\min\left( X_{ij},I_{x}-X_{ij}\right)^{2}+\min\left( Y_{ij},I_{y}-Y_{ij}\right)^{2}},\\ s.t.\ \ X_{ij}=|\sin(\pi\cdot |x_{i}-x_{j}|)|\\ \ \ \ \ \ \ Y_{ij}=|\sin(\pi\cdot |y_{i}-y_{j}|)| \end{array} $$
(5)

where xi and yi refer to the x- and y-coordinates of region i, and Ix and Iy are the corresponding dimensions of the image. This calculation mechanism connects regions of the same structure well, regardless of their shape and range. de(i,j) is defined as the accumulated weight along the shortest path on the graph, i.e.

$$ d_{e}(i,j)=\min_{u_{1}=i, \cdots, u_{k}=j}\sum\limits_{j=1}^{k-1}e_{u_{j}u_{j+1}}, $$
(6)

where the weight of graph edge eij is assigned as the strength of the image edge between i and j in terms of the UCM.
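To make the construction of the affinity matrix concrete, the following minimal sketch assembles Eqs. (3)-(6). It is written in Python with NumPy/SciPy (the paper's own implementation is in MATLAB), the input names mean_lab, centroids, adjacency, and ucm_edge_strength are hypothetical helper arrays, and region coordinates are assumed normalized to [0,1]; it is an illustration under these assumptions rather than the exact implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

def affinity_matrix(mean_lab, centroids, adjacency, ucm_edge_strength,
                    sigma_w=0.1):
    """Sketch of the joint affinity of Eq. (3).

    mean_lab:   (N, 3) mean CIELab color of each superpixel
    centroids:  (N, 2) centroid (x, y) of each superpixel, normalized to [0, 1]
    adjacency:  (N, N) boolean matrix, True for spatially adjacent superpixels
    ucm_edge_strength: (N, N) UCM strength on the shared boundary of adjacent
                superpixels (only meaningful where adjacency is True)
    """
    # d_c: Euclidean color difference in CIELab, Eq. (4)
    d_c = np.linalg.norm(mean_lab[:, None, :] - mean_lab[None, :, :], axis=2)

    # d_s (here s_ij): sine-based spatial distance with wrap-around, Eq. (5);
    # image size is taken as 1 because coordinates are normalized
    X = np.abs(np.sin(np.pi * np.abs(centroids[:, None, 0] - centroids[None, :, 0])))
    Y = np.abs(np.sin(np.pi * np.abs(centroids[:, None, 1] - centroids[None, :, 1])))
    d_s = np.sqrt(np.minimum(X, 1.0 - X) ** 2 + np.minimum(Y, 1.0 - Y) ** 2)

    # d_e: accumulated UCM strength along the shortest path on the
    # region-adjacency graph, Eq. (6)
    graph = csr_matrix(np.where(adjacency, ucm_edge_strength, 0.0))
    d_e = shortest_path(graph, directed=False)

    # Eq. (3): joint affinity combining the three distances
    return np.exp(-(d_c + d_s + d_e) / (2.0 * sigma_w ** 2))
```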

As shown in Table 1, the object appears at different boundaries with different probabilities. Accordingly, we choose different threshold values for the four sides instead of a single value, so that a robust boundary background set BG can be obtained from the border set BS. As shown in Fig. 4, the proposed method is superior to other methods.

Table 1 Probability distribution on different datasets
Fig. 4

Robust boundary background extraction. (a) Input; (b) Original border set BS; (c)-(e) Boundary background extracted by different methods: BFS [20], LPS [9] and Ours
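For illustration, the sketch below (same hedged Python setting as above; the per-side thresholds are assumed example values, not the paper's calibrated ones) computes the scatter prior of Eqs. (1)-(2) from the affinity matrix and keeps only the border superpixels whose scatter exceeds the threshold of their side, yielding the robust background set BG.

```python
import numpy as np

def background_scatter(W, centroids):
    """Scatter prior Div_c of Eqs. (1)-(2).

    W:         (N, N) affinity matrix from Eq. (3)
    centroids: (N, 2) superpixel positions s_j
    """
    def normalize(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    # rho_i: weighted mean position of each superpixel, Eq. (2)
    # (normalized jointly over both coordinates for simplicity)
    rho = normalize(W @ centroids)
    # Div_c(i): weighted spread of positions around rho_i, Eq. (1)
    div = np.array([np.sum(W[i] * np.linalg.norm(centroids - rho[i], axis=1))
                    for i in range(W.shape[0])])
    return normalize(div)

def robust_background(div, border_sides, thresholds):
    """Keep only border superpixels whose scatter exceeds a per-side threshold.

    border_sides: dict mapping 'top'/'bottom'/'left'/'right' to index arrays
    thresholds:   dict with one (assumed) threshold per side
    """
    bg = []
    for side, idx in border_sides.items():
        # low scatter -> compact region -> likely foreground, so drop it
        bg.extend(i for i in idx if div[i] >= thresholds[side])
    return np.array(bg, dtype=int)
```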

2.2 Boundary contrast map

According to the boundary prior, a saliency map can be obtained by measuring the contrast with the robust background BG. However, a background superpixel may be similar to only a portion of the border set rather than to all background nodes. Therefore, it is unwise to measure the saliency value of a superpixel by accumulating all differences between the superpixel and the background seeds. Several methods have been proposed to address this in previous works [10, 28]. In this paper, we put forward a distinctive way of measuring the boundary contrast map. First, the K-means algorithm is applied to divide the set BG into K clusters (empirically, K = 3). For each cluster k, we compute the mean color \(\bar {c}_{k}^{m}, m \in \{L,a,b,Lab\}\) in CIELab color space and the color covariance matrix Vark. For each superpixel, we then select the cluster among the K that has the minimum difference from the superpixel. The saliency of superpixel Ri is obtained based on the Mahalanobis distance:

$$ S_{B}(R_{i})=\underset{m \in \{L,a,b,Lab\}}{\sum} v^{m}(R_{i}), $$
(7)

and

$$ \begin{array}{l l} & v^{m}(R_{i})=\\ &\!\! \sum\limits_{(w,h) \in R_{i}}^{|R_{i}|}P_{R_{i}}\sqrt{({c_{g}^{m}}(w,h) - \bar{c}_{g}^{m})(Va{r_{g}^{m}})^{-1}({c_{g}^{m}}(w,h) - \bar{c}_{g}^{m})^{T}}, \end{array} $$
(8)

where g represents the index of the cluster that has the smallest difference with Ri; (w,h) indicates a pixel in superpixel Ri; |Ri| is the cardinality of Ri; \(P_{R_{i}}\) is the proportion of superpixel Ri with respect to the whole image. The boundary contrast saliency map SB is then obtained by normalizing the resulting saliency values to [0,1]. Fig. 5 shows the results of SB; it can be seen that SB highlights the salient regions.
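A minimal sketch of this clustering-and-Mahalanobis computation follows (Python; scikit-learn's KMeans stands in for the K-means step, the input names are hypothetical, and only the joint Lab term of Eq. (7) is shown, the per-channel L, a, b terms being analogous; each background cluster is assumed to contain several superpixels).

```python
import numpy as np
from sklearn.cluster import KMeans

def boundary_contrast(lab_pixels_per_sp, bg_mean_lab, area_ratio, K=3):
    """Sketch of the boundary contrast map S_B of Eqs. (7)-(8).

    lab_pixels_per_sp: list of (n_i, 3) CIELab pixel arrays, one per superpixel
    bg_mean_lab:       (M, 3) mean CIELab colors of the background superpixels BG
    area_ratio:        (N,) proportion P_{R_i} of each superpixel w.r.t. the image
    """
    # Cluster the background superpixels into K groups (K = 3 as in the paper)
    km = KMeans(n_clusters=K, n_init=10).fit(bg_mean_lab)
    means = [bg_mean_lab[km.labels_ == k].mean(axis=0) for k in range(K)]
    # covariance of each cluster (small ridge so the inverse exists)
    covs = [np.cov(bg_mean_lab[km.labels_ == k].T) + 1e-6 * np.eye(3)
            for k in range(K)]

    S_B = np.zeros(len(lab_pixels_per_sp))
    for i, pix in enumerate(lab_pixels_per_sp):
        # pick the background cluster closest to this superpixel's mean color
        g = int(np.argmin([np.linalg.norm(pix.mean(axis=0) - m) for m in means]))
        diff = pix - means[g]                       # c^m(w,h) - \bar c^m_g
        inv = np.linalg.inv(covs[g])
        maha = np.sqrt(np.einsum('nd,dk,nk->n', diff, inv, diff))
        S_B[i] = area_ratio[i] * maha.sum()         # Eq. (8), joint Lab term
    return (S_B - S_B.min()) / (S_B.max() - S_B.min() + 1e-12)
```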

Fig. 5

Boundary contrast map and contour completeness map. From left to right: (a) Input; (b) SB; (c) Contour completeness map without processing [29]; (d) SC; (e) Ground-truth

2.3 Contour completeness map

The boundary contrast map SB in collaboration with BG works well in most cases, but it may fail in some complex scenes. As shown in the second example of Fig. 5b, relying solely on the boundary prior may erroneously highlight background regions. Generally, a salient object has a well-defined closed boundary [30]. As a consequence, the contour completeness cue is introduced to facilitate salient object detection.

A hierarchical segmentation Pξ can be obtained by thresholding the derived hierarchical UCM at different levels ξ ∈ [0,ξN]. In this paper, we compute the expected number of times a region is activated over the whole hierarchical segmentation space to determine whether a region of segmentation Pξ = {Ri} has a closed outline. That is, the more often a holistic homogeneous region is activated, the more salient it is. The indicator map Q at level ξ is defined as:

$$ Q_{\xi}(x)=\left\lbrace \begin{array}{ll} 1 & \text{if}\ x \in R,\ R \in P_{\xi}^{in} \\ s & \text{if}\ x \in R,\ R \in P_{\xi}^{out}, \end{array} \right. $$
(9)

where \(P_{\xi }^{in} =\{R_{i}|R_{i}\cap BG=\emptyset \}\) is the set of inner regions and \(P_{\xi }^{out} =P_{\xi } \setminus P_{\xi }^{in}\) is the set of outer regions. Different from [29], we adopt BG instead of BS in \(P_{\xi }^{in}\). The rationale is that adopting BS in \(P_{\xi }^{in}\) may yield a bad result when the salient objects connect to the boundary (see the third row in Fig. 5d).

Because the accuracy of the saliency map is sensitive to the number of superpixels, an optimal threshold must be selected for the thresholding segmentation. However, it is difficult to obtain an optimal threshold value directly. To handle this problem, we first introduce saliency weights that assign to each pixel a probability of belonging to the salient object; here, we adopt the saliency value of SB as the weight of each pixel, denoted s. By averaging over the entire hierarchical segmentation space, we obtain the contour completeness saliency map:

$$ S_{C}(x)=\int Q_{\xi}(x)\cdotp p(\xi)d\xi, $$
(10)

where ξ obeys a uniform distribution with probability density function p(ξ). Besides, we compute the saliency map at different resolutions to further deal with the fact that salient objects may appear at different scales; the maps are then resized to a standard size and integrated to obtain the final map SC, as shown in Fig. 5e. We note that the proposed method works well even in cases where salient regions appear at the boundary.
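The following sketch illustrates Eqs. (9)-(10) under simplifying assumptions: connected components of the thresholded UCM stand in for the hierarchical segmentation, the levels are sampled uniformly (matching the uniform p(ξ)), and the multi-resolution step is omitted. The input names are hypothetical.

```python
import numpy as np
from skimage.measure import label

def contour_completeness(ucm, bg_mask, s_weights, levels):
    """Sketch of the contour completeness map S_C of Eqs. (9)-(10).

    ucm:       (H, W) ultrametric contour map in [0, 1]
    bg_mask:   (H, W) boolean mask of the robust boundary background BG
    s_weights: (H, W) per-pixel saliency weights s (taken from S_B)
    levels:    iterable of thresholds xi drawn uniformly from [0, xi_N]
    """
    S_C = np.zeros_like(ucm, dtype=float)
    for xi in levels:
        # regions of the segmentation P_xi: connected components of the
        # thresholded UCM (a simple stand-in for the hierarchical segmentation)
        regions = label(ucm <= xi, connectivity=1)
        Q = s_weights.copy()          # default: soft weight (outer regions)
        for r in range(1, regions.max() + 1):
            mask = regions == r
            if not np.any(mask & bg_mask):
                Q[mask] = 1.0         # inner region (disjoint from BG), Eq. (9)
        S_C += Q
    # uniform p(xi): the integral of Eq. (10) becomes an average over levels
    return S_C / len(levels)
```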

2.4 Integration

The saliency maps measured by boundary contrast and contour completeness are complementary to each other: the former prefers to highlight the object while the latter aims to suppress noise. Thus, an effective integration is needed. Although previous works adopt weighted linear combinations to fuse individual saliency maps for simplicity, this is not ideal because such processing either weakens the suppression of background noise or interferes with foreground detection.

To better seek common ground while preserving differences, we design a fine-grained fusion method, defined as follows:

$$ \begin{array}{ll} S_{I}=&{\Upsilon}\{S_{C}^{*}\cdot S_{B}^{*}==1\}max(S_{C},S_{B})\\ &+{\Upsilon}\{S_{C}^{*}+S_{B}^{*}==0\}min(S_{C},S_{B})\\ &+{\Upsilon}\{S_{C}^{*}\neq S_{B}^{*}\}exp(S_{C})\cdot exp(S_{B}). \end{array} $$
(11)

Here Υ{x} = 1 if x is true and Υ{x} = 0 otherwise. S* denotes the binary map obtained by thresholding the corresponding saliency map S with a threshold computed from S using the algorithm in [31]. Figure 6d demonstrates that the integration map detects foreground and background more accurately than the saliency maps in Fig. 6b and c, which further proves the effectiveness of the proposed integration method.
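A minimal sketch of this fusion rule is given below (Python; the binarization thresholds are assumed inputs rather than the exact thresholds of [31], and the result may be renormalized to [0,1] since the exponential branch can exceed 1).

```python
import numpy as np

def fuse(S_C, S_B, thr_C, thr_B):
    """Sketch of the fine-grained fusion of Eq. (11).

    S_C, S_B:     contour completeness / boundary contrast maps in [0, 1]
    thr_C, thr_B: binarization thresholds (assumed inputs, e.g. Otsu-style)
    """
    Cb = (S_C >= thr_C).astype(float)   # S_C^*
    Bb = (S_B >= thr_B).astype(float)   # S_B^*

    both_fg = (Cb * Bb) == 1            # both maps agree on foreground
    both_bg = (Cb + Bb) == 0            # both maps agree on background
    disagree = Cb != Bb                 # the two maps disagree

    S_I = np.zeros_like(S_C)
    S_I[both_fg] = np.maximum(S_C, S_B)[both_fg]
    S_I[both_bg] = np.minimum(S_C, S_B)[both_bg]
    S_I[disagree] = (np.exp(S_C) * np.exp(S_B))[disagree]
    return S_I
```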

Fig. 6

Integration and refinement. (a) Input; (b) SB; (c) SC; (d) SI; (e) Final map without iteration Fal(t0); (f) Final map Fal(t); (g) Ground-truth

2.5 Iterative Bayesian optimization framework

In order to optimize the integration result, an iterative Bayesian optimization framework is proposed. According to Bayesian inference, the posterior probability \(P(\varTheta _{k}|x_{i}), k\in \{F,B\}\), representing the probability that region xi belongs to the foreground ΘF (background ΘB), can be computed as:

$$ P(\varTheta_{F}|x_{i})=\frac{P(\varTheta_{F})P(x_{i}|\varTheta_{F})}{{P(\varTheta_{F})P(x_{i}|\varTheta_{F})}+P(\varTheta_{B})P(x_{i}|\varTheta_{B})}, $$
(12)

where \(P(x_{i}|\varTheta _{k}), k\in \{F,B\}\), is a conditional probability, and P(ΘF) and P(ΘB) = 1 − P(ΘF) are the prior probabilities of being salient and background, respectively. In this work, we set the fusion result SI as the initial prior probability, i.e., P(ΘF) = SI and P(ΘB) = 1 − SI. How to calculate the conditional probabilities is described below.

The key to a Bayesian-framework method is obtaining the conditional probabilities. The conventional approach computes a conditional probability by calculating the proportion of each quantized feature in a binary image. However, a simple thresholding operation may mislead further inference, and it is also challenging to select a reliable threshold. Therefore, a soft approach based on iterative optimization is proposed to solve this problem. We define the probability of each region as the weighted average of all saliency values. The conditional probabilities are defined following [32]:

$$ P(x_{i}|\varTheta_{F})=\frac{{\sum}_{j}f(SF_{j})w_{ij}}{{\sum}_{j}w_{ij}}, $$
(13)
$$ P(x_{i}|\varTheta_{B})=\frac{{\sum}_{j}f(1-SF_{j})w_{ij}}{{\sum}_{j}w_{ij}}, $$
(14)
$$ f(z)=\frac{\theta_{1}Th^{\tau}}{\theta_{2}(1-z)^{\tau}+Th^{\tau}}+\theta_{3}. $$
(15)

f(z) is an enhancement function that makes the difference between foreground and background more apparent while keeping saliency values near the threshold Th roughly unchanged. The parameters 𝜃i ensure that f(z) passes through three defined coordinate points ((0, 0), (1, 1), and (Th, Th)), and τ = 4. In this paper, we obtain the optimal threshold Th via size prediction and feature weighting:

$$ Th=\arg\min_{T}{W}^{f}_{T}{W}^{b}_{T}[\rho_{i}({S}^{i}_{T}-{B}^{i}_{T})]^{2}, $$
(16)

where \({S}^{i}_{T}\) (\({B}^{i}_{T}\)) and \({W}^{f}_{T}\) (\({W}^{b}_{T}\)) are the mean value and the weight of the foreground (background) histogram in channel i of the opponent color space [33], respectively. T denotes the percentage of potential salient regions, and we set T ∈ [0.1,0.5]; this range is reasonable because the size proportion of salient objects usually lies between 0.05 and 0.6 [34]. ρi is the feature weight, measured by computing the separating power of channel i based on the variance ratio:

$$ \rho_{i}=\frac{1}{\Gamma}\frac{Vr(Lg ,\frac{s+b}{2})}{Vr(Lg,s)+Vr(Lg,b)}, $$
(17)
$$ Vr(y,x)=\underset{i}{\sum}x(i)y^{2}(i)-[\underset{i}{\sum}x(i)y(i)]^{2}, $$
(18)
$$ Lg(i)=log\frac{s(i)+\phi}{b(i)+\phi}, $$
(19)

where ϕ is a small constant used to prevent the numerator and denominator from being 0, and \({\Gamma }={\sum }_{i}\rho _{i}\) is a normalization factor. Lg(i) is the log-likelihood ratio of foreground to background, where s(i) and b(i) denote the discrete probability distributions of the foreground and background, respectively.

Based on the above formulas, we obtain \(P(\varTheta _{F}|x_{i})\). To optimize the result, we iteratively re-assign the prior with P(ΘF) = \(P(\varTheta _{F}|x_{i})\) and update the feature weights ρi. To eliminate disconnectedness and small fragments and to improve detection accuracy, we use a smoothed version of \(P(\varTheta _{F}|x_{i})\) to re-assign the prior in the experiments; the guided filter [35] is adopted for smoothing because it preserves the integrity of object edges well. Finally, we denote by Fal(t), t ∈ {t0,t1,⋯ ,tn}, the saliency map after iteration t, where Fal(t0) is the saliency map without iteration. Figure 6e-f show the results refined by the proposed iterative Bayesian optimization framework. Our model consistently highlights the full salient objects while suppressing the background, which illustrates the effectiveness of our system.
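To summarize the update loop, the sketch below iterates Eqs. (12)-(15) (Python; Th and the 𝜃 parameters are assumed to be pre-computed, the enhancement is applied inside the affinity-weighted averages following the "weighted average of all saliency values" description, and the guided-filter smoothing of the posterior is omitted for brevity).

```python
import numpy as np

def enhance(z, Th, theta, tau=4):
    """Enhancement function f(z) of Eq. (15); theta = (theta1, theta2, theta3)
    is assumed pre-solved so that f passes through (0,0), (1,1) and (Th,Th)."""
    t1, t2, t3 = theta
    return t1 * Th ** tau / (t2 * (1.0 - z) ** tau + Th ** tau) + t3

def iterative_bayes(S_I, W, Th, theta, n_iter=3):
    """Sketch of the iterative Bayesian refinement of Eqs. (12)-(15).

    S_I: (N,) fused saliency per superpixel, used as the initial prior P(Theta_F)
    W:   (N, N) affinity matrix from Eq. (3)
    """
    prior_F = S_I.copy()
    row_sum = W.sum(axis=1)
    for _ in range(n_iter):
        SF = prior_F
        # conditional probabilities, Eqs. (13)-(14): affinity-weighted averages
        p_xF = (W @ enhance(SF, Th, theta)) / row_sum
        p_xB = (W @ enhance(1.0 - SF, Th, theta)) / row_sum
        # posterior, Eq. (12)
        num = prior_F * p_xF
        post = num / (num + (1.0 - prior_F) * p_xB + 1e-12)
        # in the paper, a guided-filter-smoothed [35] version of the posterior
        # re-assigns the prior; the smoothing is omitted in this sketch
        prior_F = post
    return prior_F
```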

3 Experimental results

We extensively compare the proposed algorithm against 15 state-of-the-art saliency detection methods, namely BFS [20], BL [36], BSCA [28], BSFC [26], FCB [37], CGV [38], GS [8], HCC [29], LPS [9], MR [15], MB [39], MILPS [34], RCRR [40], SF [41], and SRD [42], on the ASD [6], DUT-OMRON [15], SOD [43], and ECSSD [44] datasets. For a fair evaluation, we either directly use the results provided by the original authors or run their publicly available source codes. Admittedly, compared with traditional methods, deep learning-based saliency methods such as MDF [45] and ILS [46] can achieve better performance, but they require a large amount of training data. Since this paper focuses on unsupervised learning, the proposed method is compared only with traditional saliency methods.

3.1 Evaluation metrics

To compare the performance, several popular evaluation metrics are used in our experiments: the precision-recall (PR) curve, average precision (AP), F-measure, overlapping ratio (OR), and mean absolute error (MAE).

We first use precision-recall (PR) curves to evaluate the performance of the proposed method. For each saliency map, binary maps are generated by thresholding the saliency map with fixed thresholds varying from 0 to 255, and the PR curve is obtained by comparing the binary maps with the ground truth.

For saliency detection, precision and recall are both required to be high, though these two indicators are usually negatively correlated. To comprehensively assess a salient object detection model, the weighted harmonic mean of precision (P) and recall (R), called the F-measure, is calculated as:

$$ F_{\beta}=\frac{(1+\beta^{2})P\cdot R}{\beta^{2}P+R}, $$
(20)

where β2 is set to 0.3 to emphasize precision [13], since a high recall can easily be obtained by labeling the whole image as salient, which does not guarantee accuracy.

The above metrics mainly focus on whether salient pixels are correctly detected and ignore the correct assignment of non-salient pixels; therefore, OR and MAE are also adopted. MAE measures the dissimilarity between the saliency map S and the ground truth G [41]: MAE = mean(|S − G|). OR is defined as the overlapping ratio between the segmented object mask S′ and the ground truth G: OR = |S′ ∩ G|/|S′ ∪ G|, where S′ is obtained by binarizing S with an adaptive threshold, i.e., twice the mean value of S, as in [47].
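For reference, the metrics of this subsection can be computed as in the following sketch (Python; the adaptive threshold of twice the mean saliency follows [47], and small epsilon terms guard against empty masks).

```python
import numpy as np

def evaluate(saliency, gt, beta2=0.3):
    """Sketch of the F-measure, MAE, and OR metrics of Section 3.1.

    saliency: (H, W) saliency map in [0, 1]
    gt:       (H, W) binary ground-truth mask
    """
    gt = gt.astype(bool)

    # MAE: mean absolute difference between saliency map and ground truth
    mae = np.abs(saliency - gt.astype(float)).mean()

    # adaptive binarization at twice the mean saliency for F-measure and OR
    binary = saliency >= min(2.0 * saliency.mean(), 1.0)
    tp = np.logical_and(binary, gt).sum()
    precision = tp / (binary.sum() + 1e-12)
    recall = tp / (gt.sum() + 1e-12)
    f_measure = (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)

    # OR: intersection over union of the binarized map and the ground truth
    overlap = tp / (np.logical_or(binary, gt).sum() + 1e-12)
    return dict(MAE=mae, F=f_measure, OR=overlap)

def pr_curve(saliency, gt):
    """PR pairs obtained by sweeping fixed thresholds from 0 to 255."""
    gt = gt.astype(bool)
    pts = []
    for t in range(256):
        binary = (saliency * 255) >= t
        tp = np.logical_and(binary, gt).sum()
        pts.append((tp / (binary.sum() + 1e-12), tp / (gt.sum() + 1e-12)))
    return pts  # list of (precision, recall)
```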

3.2 Quantitative results

We extensively compare the proposed algorithm against 15 recently proposed algorithms on four datasets; the experimental results are summarized in Fig. 7 and Table 2. The results show that, in most cases, our algorithm ranks first or second under the different evaluation metrics on the test datasets.

Fig. 7

Quantitative evaluations by precision-recall curves (left two columns), F-measure (third column), and mean absolute error (MAE) (right column) on four benchmark datasets: from top to bottom are ASD, DUT, ECSSD, and SOD

Table 2 Results on three datasets in terms of AP↑, F-Measure↑, OR↑, and MAE↓

Specifically, we observe that our model achieves the best performance in terms of AP (0.9372, 0.8158, 0.7848), F-measure (0.9299, 0.7851, 0.7171), and OR (0.85159, 0.58105, 0.44333) on the ASD, ECSSD, and SOD datasets. For the challenging DUT-OMRON dataset, our method achieves the highest F-measure (0.669) and OR (0.48892), and the second-best MAE (0.11734), with a minor margin (0.00259) relative to the best result, HCC (0.11475). Besides, except for the MAE on the SOD dataset, all indicators of the algorithm on the other datasets are better than those of BSFC. Moreover, our method achieves high precision and recall simultaneously, which ensures that its results are closer to the ground truth and makes the proposed method more applicable in practice. Furthermore, our method achieves relatively superior PR curves compared to the other methods on the four datasets.

The above quantitative comparisons show that the proposed method consistently outperforms all the compared state-of-the-art approaches on various scenes.

3.3 Qualitative results

We provide some saliency maps of the proposed algorithm and thirteen other state-of-the-art algorithms in Fig. 8. From the results, we can see that the saliency maps generated by our algorithm are the most similar to the ground truth. For images with a single object, in the first two rows, our method retains the salient objects completely while suppressing background noise. When an image has a complex background structure, the proposed algorithm can still obtain favorable results with a less noisy background. For example, in the 3rd and 4th rows, our saliency maps uniformly pop out the foreground object, while other algorithms fail to extract the salient object from the scattered background. Also, when salient objects and backgrounds share a similar appearance, our algorithm can detect salient regions accurately, while other methods either fail to identify the salient objects or incorrectly include background regions in the detection results, as shown in the 5th and 6th rows. Furthermore, as shown in the 7th and 8th rows, our results preserve object completeness even when salient objects lie on the image boundaries, whereas other methods perform poorly, which demonstrates the robustness of our background selection. All the above results demonstrate the robustness and effectiveness of our method in highlighting salient objects and restraining background regions.

Fig. 8

Comparison of our saliency maps with other thirteen state-of-the-art algorithms. (a) Input; (b) SF; (c) GS; (d) MR; (e) BFS; (f) LPS; (g) BSCA; (h) MB; (i) RR; (j) BL; (k) CGV; (l) SRD; (m) HCC; (n) MILPS; (o) Ours; (p) Ground-truth

3.4 Analysis of our algorithm

  • (1) Effectiveness of the proposed boundary prior map To demonstrate the effectiveness of the proposed boundary prior map, we compute the PR curves and quantitative F-measure results for the boundary prior map on the ASD dataset. Three methods (BFS [20], MR [15], and LPS [9]) are run with different background seeds, and their corresponding boundary-based saliency maps are constructed. The resulting curves and quantitative results are shown in Fig. 9a, where the dotted lines show the performance of each method using its own boundary set, the solid lines show the performance using our boundary nodes, and the red line shows the performance of our boundary contrast map. As shown by these curves, the proposed method outperforms the competing methods. Moreover, the performance of the three saliency models is also conspicuously improved when using our boundary nodes, illustrating the effectiveness of both the selection of boundary nodes and the computation of the background-based saliency map.

  • (2) Validation of components in contour completeness map Since both the robust boundary background set BG and the saliency weights are introduced into the contour completeness map, we further examine the effectiveness of each component. Herein, we consider the following variants for comparative analysis:

    (a)

      contour completeness map based on BS without saliency weights (SCBS),

    (b)

      contour completeness map based on BG without saliency weights (SCBG),

    (c)

      contour completeness map based on BS with saliency weights (SCBSW),

    (d)

      contour completeness map based on BG with saliency weights (SCBGW).

    Fig. 9

    Precision-recall curves and quantitative results of F-measure to measure the effectiveness of our algorithm. (a) Evaluation of our background-based saliency map on the ASD dataset; (b) Evaluation of our contour completeness saliency map on the SOD dataset; (c) Evaluation of each component in the proposed algorithm on the SOD dataset.

    The resulting curves on SOD and the corresponding quantitative results are shown in Fig. 9b. We note that the performance of SCBG (SCBGW) is higher than that of SCBS (SCBSW), since the former takes the issue of the object connecting to the boundary into consideration while the latter directly uses all boundary nodes as background, demonstrating the effectiveness of BG in improving the contour completeness map. Besides, the performance of SCBSW (SCBGW) is higher than that of SCBS (SCBG), owing to the introduced saliency weights, which assign to each pixel a probability of belonging to the salient object. Furthermore, compared with SCBS, SCBG, and SCBSW, SCBGW achieves the best performance, which conclusively proves the necessity of considering both components (BG and the saliency weights) in the calculation of the contour completeness map.

  • (3) Examination of refinement Since the optimization adopts an iterative mechanism, we first verify the stability of the system. To this end, we compute the performance of the proposed method at different iteration steps with both the PR curve and the F-measure on two datasets (SOD and ECSSD), as shown in Fig. 10. We note that the system reaches stable performance with only three iterations (i.e., Fal(t3)), since further iterations do not improve the performance. Furthermore, to validate the effectiveness of the refinement, the final result is compared with the other steps both quantitatively and qualitatively, as shown in Fig. 9c, where the quantitative results of the saliency refinement mechanism on the SOD dataset show a clear performance enhancement of the saliency map. As also shown in Fig. 6e and f, the salient object is highlighted uniformly and the background is suppressed effectively by the refinement, which qualitatively verifies the effectiveness of the refinement mechanism.

Fig. 10

Quantitative evaluation on two salient object datasets for our method with various iteration steps (a) SOD and (b) ECSSD

3.5 Running time

The execution time test is conducted on a 64-bit PC with an Intel Core i5-4460 CPU @ 3.20 GHz and 8 GB of memory. All codes for the experimental tests are provided by the corresponding authors and executed without any change in MATLAB R2015a with C++ MEX implementations, and the average running time is computed on the ASD dataset.

We selected several methods that are either competitive in accuracy or similar to the proposed algorithm, and the results are shown in Table 3. Our method is significantly faster than LPS, RR, and BFS; although it is slower than MR and GS, it outperforms both in terms of the overall evaluation performance. Therefore, the proposed algorithm achieves a reasonable balance between accuracy and efficiency.

Table 3 Average running time (seconds per image) on the ASD dataset

3.6 Limitation and analysis

Although the proposed method performs well in most cases, there remain some challenging scenarios in which it cannot accurately extract the complete salient objects. The last two rows of Fig. 8 show some failure cases. When an image contains multiple salient objects in a complex scene, the proposed method may not highlight all objects due to the differences and diversity of the objects in terms of size and position; however, this is also challenging for state-of-the-art saliency detection methods. For an image with a complex background structure and small differences between object and background, the proposed low-level and mid-level features are inadequate for achieving satisfactory detection results. Hence, a promising direction is to design an even more discriminative similarity metric by incorporating traditional low-level features with high-level deep learning features. Besides, studying the reliability of edges and the selection of an optimal segmentation threshold can also improve saliency detection performance.

4 Conclusions and future work

In this paper, we present an efficient salient object detection algorithm based on background scatter and foreground contour completeness. To accurately extract a robust background, background scatter is proposed, and the corresponding boundary contrast map is obtained via the boundary prior. Besides, the contour completeness cue is combined with the boundary contrast map and the robust boundary nodes to generate the contour completeness saliency map, and finally an optimization framework is proposed to highlight salient objects uniformly. Experimental results demonstrate not only the superior performance of the proposed method but also a good balance between accuracy and computation cost.

As future work, we will investigate efficient methods that incorporate high-level features with the support of deep learning to achieve higher performance. Considering that image depth information plays an essential role in saliency detection and that it is challenging to collect depth information via cameras, ways to obtain image depth information without depth cameras and to use it to improve saliency detection will also be investigated and validated.