1 Introduction

Studying, biking, running and most other human activities depend on a concentration mechanism of the visual system called visual attention, which focuses processing on specific areas while ignoring other regions [1]. Advances in visual attention modeling have given rise to new problems and research directions. One popular branch of visual attention modeling is salient object detection [2], which concentrates on extracting the important objects in a scene. Salient object detection has been an attractive research area in recent decades due to its various applications in computer vision, such as visual tracking [3], video quality assessment [4], image retrieval [5], flame detection [6], motion detection [7] and superpixel generation [8].

Taken together, the variety of graph types and the broad mathematical machinery available for graphs make them a natural choice for saliency modeling. In graph-based approaches, saliency detection is mostly performed in four main steps: (1) node set creation, (2) feature extraction, (3) graph construction and (4) a saliency detection strategy. To produce a node set in the first step, image partitioning methods such as simple linear iterative clustering (SLIC) [9], Laplacian sparse subspace clustering (LSSC) [10], fixed-size blocks and other segmentation approaches [11, 12] have been employed. In the second step, specific features such as global contrast [13, 14], local contrast [15, 16], spatial distribution [15, 17] and the mean histogram of optical flow [18] are extracted as inputs for the next step. To form the graphs in the third step, different structures such as complete weighted graphs [14, 19, 20], k-regular graphs [15, 17, 18, 21] and hypergraphs [22] have been suggested. In the fourth step, four strategies are frequently used to determine the saliency values:

  1. Random walk distribution [13, 14, 16, 20, 21]: In this class, a random walker was assumed to walk on the graph nodes. Saliency values were then computed from the equilibrium distribution of the resulting Markov chain.

  2. Manifold ranking graphs [15, 23]: In these methods, prior background/foreground queries (nodes) were determined. Then, a ranking function was defined to rank all the other nodes of the graph. Finally, the saliency map was generated using the node ranks.

  3. Convex hull producing [10, 17]: Some regions were labeled as background/foreground regions by assuming prior facts or by applying an initial process. Then, a convex hull was created to form the saliency maps.

  4. Cost function optimization [22]: A cost function on the graph nodes was defined. Then, by optimizing the cost function, the salient objects were detected.

In the graph-based approaches, some methods used the SLIC algorithm to create the node set. Yang et al. [15] ranked the similarity of image elements based on foreground and background cues via a k-regular graph. They defined different query sets for foreground and background nodes. According to these queries, the other image nodes were ranked and their saliency values were defined in terms of their relevance to the foreground or background nodes. Fu et al. [18] proposed a salient object detection method for videos. They extracted the mean histogram of optical flow as a feature and then applied the graph-based manifold ranking approach [15] for salient object detection. Yang et al. [17] proposed a three-stage salient object detection framework. They constructed an initial saliency map and then created a convex hull of interest points to localize the objects. Finally, they minimized a continuous energy function via graph regularization to detect the salient object. They investigated local and global features to learn the similarity values between nodes and thereby obtained a faithful graph affinity model; a semi-supervised manifold algorithm was applied to learn the weights, and foreground/background seeds were utilized for saliency detection. Yan et al. [23] developed a linear propagation algorithm that created a weighted graph based on the similarity of regions. The border of the image was labeled as the background prior, and saliency was derived by propagating labels on the graph. Jiang et al. [21] detected saliency by defining a random walk prior on the complete graph and employing virtual nodes as absorbing nodes. The saliency map was then created by normalizing the absorbed time for each node (superpixel). Roy et al. [24] suggested a bottom-up saliency detection technique that extracts three low-level cues. The similarity of regions was computed from the exponential of the color difference and used as the weight matrix of the image graph. Rarity was exploited using an eigenvector of the graph Laplacian. Spatial compactness and the background prior were extracted by combining color and spatial disparity and by computing the Mahalanobis distance from the mean background color, respectively. Nouri et al. [25] presented a salient object detection method based on a global contrast graph. The global contrast was extracted to form a complete weighted graph; the global contrast graph was then defined by omitting high-weight edges, and the degree vector was computed to create the saliency map.

In some other methods, LSSC was used to create the graphs. Xie et al. [10] introduced the LSSC algorithm for image partitioning based on extracted local features; low-level cues were then applied within the convex hull of interest points to create the saliency map. Li et al. [22] generated superpixels by the hypergraph-based method of [12] and modeled the saliency detection task as finding salient vertices or hyperedges. They also proposed an alternative approach based on center-surround contextual contrast analysis and optimized an objective cost function of a support vector machine (SVM).

As mentioned before, some other methods produced regions by a specific graph-based segmentation approach [11]. Lu et al. [26] exploited the observation that the region on the convex side of a curved boundary tends to belong to an object. A hierarchical model was designed, based on the multi-scale analysis of human visual perception, to accumulate concavity context at different scales. The graph-cut algorithm [27] was adopted to extract the salient objects. Yu et al. [14] also introduced a maximal entropy random walk prior on fully connected graphs with two different weight matrices and assigned region saliency values in terms of the stationary probabilities.

Partitioning the image into equal-sized blocks is another way to produce the node set. Kim et al. [16] modeled eye movements as a random walk on the graph. In their Random Walk with Restart (RWR) prior, a high weight was assigned to an edge if its connected nodes had dissimilar intensity and color features and if the ending node was more compact than the starting node. They then computed the stationary distribution of the Markov chain as the saliency map.

The following sections describe the proposed method for salient object detection, the experimental results and concluding remarks, respectively.

Fig. 1 The flowchart of the proposed method

2 Proposed method

In this paper, we propose a graph-based salient object detection method using multiple graphs. As shown in Fig. 1, the proposed method consists of three major stages. In the first stage, preprocessing, the input image is partitioned into superpixels to create a node set for the global, local and high contrast graphs; color, spatial and background features are then extracted from the superpixels to form color, spatial and background labels. In the second stage, three different graphs, a global contrast graph, a local contrast graph and a high contrast graph, are produced using the superpixels as nodes and the similarity of superpixels as edge weights. Finally, in the third stage, the most salient nodes are identified based on the graphs obtained in the two previous stages, and an integration approach is employed to combine the degree vectors of the graphs into saliency maps. The saliency value of each node is assigned to its pixels. The performance of the proposed method is evaluated in terms of the resulting saliency maps on three publicly available benchmark datasets.

2.1 Preprocessing

In order to provide the necessary material for graph construction, the input image is partitioned into superpixels using SLIC [9]. SLIC is a region clustering method that adopts the k-means algorithm to create regular superpixels with precise boundaries. By applying this algorithm, the input image is broken into N superpixels \(V=\{v_1 ,v_2 ,...,v_N \}\), which are used as the node set for graph construction. Next, the average color of the pixels forming each superpixel is transformed from RGB into the CIE Lab color space, due to the closeness of the Lab space to human visual perception, and is considered as the color label of the superpixel: \(C=\{c_1 ,c_2 ,...,c_N \}\). In addition, the spatial label \(l_i =(x_i ,y_i )\) of each superpixel is computed as the coordinates of its first (top-left) pixel; together they form the spatial label set \(L=\{l_1 ,l_2 ,...,l_N \}\). Finally, all superpixels on the image border, i.e., in the first row, last row, first column and last column of the image, are considered as background nodes [21]. The color labels of the M background nodes are used to obtain the background label \(B=\frac{1}{M}\mathop \sum \nolimits _{i=1}^M {c_i } \).
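For concreteness, a minimal preprocessing sketch in Python is given below, assuming scikit-image's SLIC implementation and Lab conversion; the number of superpixels, the compactness value and the choice of the top-left-most pixel as the "first" pixel of each superpixel are illustrative assumptions rather than values prescribed by the paper.

```python
import numpy as np
from skimage.color import rgb2lab
from skimage.segmentation import slic

def preprocess(image, n_superpixels=200):
    """image: H x W x 3 float RGB array in [0, 1]."""
    # Partition the image into roughly N superpixels, the nodes of all three graphs.
    segments = slic(image, n_segments=n_superpixels, compactness=10, start_label=0)
    lab = rgb2lab(image)

    color_labels, spatial_labels = [], []
    for k in np.unique(segments):
        mask = segments == k
        color_labels.append(lab[mask].mean(axis=0))   # c_k: mean Lab color of the superpixel
        ys, xs = np.nonzero(mask)                     # row-major order: first entry is the top-left-most pixel
        spatial_labels.append((xs[0], ys[0]))         # l_k = (x_k, y_k)
    color_labels = np.asarray(color_labels)
    spatial_labels = np.asarray(spatial_labels, dtype=float)

    # Superpixels touching the image border are the background nodes [21];
    # B is the mean of their color labels.
    border = np.unique(np.concatenate([segments[0, :], segments[-1, :],
                                       segments[:, 0], segments[:, -1]]))
    B = color_labels[border].mean(axis=0)
    return segments, color_labels, spatial_labels, B
```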

2.2 Graphs construction

Three different graphs are constructed to exploit specific features, where the node set is V and the edge weights are derived from region similarities. The distances between color and spatial labels are used to define the region similarities. The degree vector of each graph is computed and normalized to be used as input for the next stage (i.e., saliency detection).

2.2.1 Region similarities

The similarity between nodes \(V=\{v_1 ,v_2 ,...,v_N \}\) is defined as follows:

$$\begin{aligned} w_{ij} =\left\{ \begin{array}{ll} \exp \left( -\frac{D_c (c_i ,c_j )/D_s (l_i ,l_j )}{\sigma ^{2}} \right) &\quad i\ne j \\ 1 &\quad i=j \\ \end{array} \right. \end{aligned}$$
(1)

where \(w_{ij} \) is the similarity between regions i and j, and \(D_s (l_i ,l_j )\) is the Euclidean distance between the spatial labels of regions i and j. \(D_c (c_i ,c_j )\) is the color distance between regions i and j with background influence, obtained by \(D_c (c_i ,c_j )=(c_i -B)^{2}\times \left\| {c_i -c_j } \right\| \).

In the region similarity measurement, three factors play important roles: (1) the color distance, (2) the spatial distance and (3) the closeness of the color label to the background label. If the color distance is low, the region similarity is high, as expected. If the spatial distance is high, the region similarity is also high, which decreases the effect of distant regions on each other under the graph construction rules described in the following subsections. Finally, if the color label \(c_i \) of a region is close to the background label B, the similarities \(w_{ij} \) between node i and the other nodes are increased, which decreases the influence of that region in the saliency detection process, again according to the graph construction rules described in the next subsections.
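A short sketch of Eq. (1) follows; the value of \(\sigma ^{2}\) is an assumed free parameter, and reducing the background term \((c_i -B)^{2}\) to a scalar with a squared Euclidean norm is our interpretation of the paper's notation.

```python
import numpy as np

def region_similarity(color_labels, spatial_labels, B, sigma2=0.1):
    """Builds the N x N similarity matrix W of Eq. (1)."""
    N = len(color_labels)
    W = np.ones((N, N))                     # w_ii = 1 on the diagonal
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            # Spatial (Euclidean) distance between the spatial labels.
            D_s = np.linalg.norm(spatial_labels[i] - spatial_labels[j])
            # Color distance with background influence: (c_i - B)^2 * ||c_i - c_j||.
            D_c = (np.linalg.norm(color_labels[i] - B) ** 2) * \
                  np.linalg.norm(color_labels[i] - color_labels[j])
            W[i, j] = np.exp(-(D_c / max(D_s, 1e-6)) / sigma2)
    return W
```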

Fig. 2 The block diagram of the proposed graph integration technique

2.2.2 Global contrast graph

The image can be represented as a complete weighted graph \(G=(V,E)\), where the nodes \(V=\{v_1 ,v_2 ,...,v_N \}\) are superpixels and E is the set of all possible edges between nodes. The element \(w_{ij} \) of the region similarity matrix W is considered as the weight of the edge \(e_{ij} \). In this work, we use the global contrast graph [25] as one of the graphs combined with two others to create the saliency maps. In order to form the global contrast graph, a threshold on the matrix W is defined as follows:

$$\begin{aligned} \mathrm{Thr}_\mathrm{GCG} =\frac{\alpha }{N^{2}}\sum _i {\sum _j {w_{ij} } } \end{aligned}$$
(2)

All edges with weights greater than the threshold \(\mathrm{Thr}_\mathrm{GCG} \) are omitted. The remaining graph is named the global contrast graph, which is an undirected, unweighted graph. Notice that by omitting the high-weight connections, the most similar and common connections are removed while the edges representing more discriminative regions are kept; thus, nodes with more connections are more dominant in the image. The degree vector of the graph nodes \(D=\{d_1 ,d_2 ,...,d_N \}\) is created by counting the number of edges connected to each node and is normalized using \(D_N =\frac{D-\min (D)}{\max (D)-\min (D)}\). Low degree values in \(D_N \) correspond to regions belonging to the background, which therefore differ strongly from the dominant regions. In order to remove the effect of background degrees in further processing, an adaptive threshold is adopted as \(t_{adp} =\frac{\eta }{N}\sum _{i=1}^N {D_N (i)} \), where \(\eta \) is a constant value. As mentioned in Sect. 2.2.1, high spatial distances and color labels close to the background label yield high region similarity; since high-weight connections are omitted when creating the global contrast graph, such nodes end up with fewer connections.
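A sketch of this construction is given below, with \(\alpha \) and \(\eta \) as assumed constants; zeroing out the degrees below the adaptive threshold is our reading of "removing the effect of background degrees", since the paper does not spell out that last step.

```python
import numpy as np

def global_contrast_degrees(W, alpha=1.0, eta=1.0):
    N = W.shape[0]
    thr_gcg = alpha * W.sum() / N ** 2                # Eq. (2): scaled mean edge weight
    A = (W <= thr_gcg).astype(float)                  # omit edges whose weight exceeds the threshold
    np.fill_diagonal(A, 0)
    D = A.sum(axis=1)                                 # degree of each node
    D_n = (D - D.min()) / (D.max() - D.min() + 1e-12) # min-max normalization
    t_adp = eta * D_n.mean()                          # adaptive threshold t_adp = (eta / N) * sum_i D_N(i)
    D_n[D_n < t_adp] = 0                              # suppress background (low-degree) nodes
    return D_n
```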

2.2.3 High contrast graph

The same procedure as in the global contrast graph construction is followed, with a lower threshold on the edge weights, to generate the high contrast graph. The lower threshold is defined as follows:

$$\begin{aligned} \mathrm{Thr}_\mathrm{HGC} =\frac{\beta }{N^{2}}\sum _i {\mathop {\min }\limits _j w_{ij} } \end{aligned}$$
(3)

where \(\beta \) is a constant. Similar to the global contrast graph, the degree vector is constructed and the low degrees of the high contrast graph nodes are removed by another adaptive threshold. Then, the normalization described in Sect. 2.2.2 is applied and the resulting degree vector is obtained.
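A brief sketch of Eq. (3) is shown below; the subsequent degree, normalization and adaptive-threshold steps mirror the global contrast sketch above, and \(\beta \) is again an assumed constant.

```python
import numpy as np

def high_contrast_adjacency(W, beta=1.0):
    N = W.shape[0]
    thr_hgc = beta * W.min(axis=1).sum() / N ** 2   # Eq. (3): scaled sum of per-row minimum weights
    A = (W <= thr_hgc).astype(float)                # keep only the lowest-weight (highest-contrast) edges
    np.fill_diagonal(A, 0)
    return A
```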

2.2.4 Local contrast graph

Regions in the vicinity of sharp edges play an important role in the saliency detection process. Between each pair of nodes with color distance \(D_c (c_i ,c_j )\) greater than \(\lambda =\frac{1}{5}\sum _i \max _j D_c (c_i ,c_j )\) and spatial distance \(D_s (l_i ,l_j )\) less than \(\gamma =5\), the edge \(e_{ij} \) is drawn to create the local contrast graph. The degree vector of the local contrast graph nodes is calculated as described in Sect. 2.2.2 for the global contrast graph. Then, an adaptive threshold, as used in the high contrast graph, is applied to omit small degrees.
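A sketch of the local contrast graph follows, reading the definition of \(\lambda \) literally; \(\gamma =5\) is the spatial threshold from the text, while \(\eta \) and the zeroing of small degrees are assumptions carried over from the earlier sketches.

```python
import numpy as np

def local_contrast_degrees(color_labels, spatial_labels, B, gamma=5.0, eta=1.0):
    N = len(color_labels)
    # Color distance with background influence (Sect. 2.2.1) and spatial distance.
    D_c = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            D_c[i, j] = (np.linalg.norm(color_labels[i] - B) ** 2) * \
                        np.linalg.norm(color_labels[i] - color_labels[j])
    D_s = np.linalg.norm(spatial_labels[:, None, :] - spatial_labels[None, :, :], axis=2)

    lam = D_c.max(axis=1).sum() / 5.0                # lambda = (1/5) sum_i max_j D_c(c_i, c_j)
    A = ((D_c > lam) & (D_s < gamma)).astype(float)  # edge: high color contrast and small spatial distance
    np.fill_diagonal(A, 0)

    D = A.sum(axis=1)                                # degree vector, as in Sect. 2.2.2
    D_n = (D - D.min()) / (D.max() - D.min() + 1e-12)
    D_n[D_n < eta * D_n.mean()] = 0                  # adaptive threshold on small degrees
    return D_n
```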

2.3 Salient object detection

In this section, a novel approach, named the integration technique, is presented as the saliency detection strategy. In this strategy, the most salient nodes are first identified. Then, according to these nodes, saliency values are assigned to all the other nodes.

2.4 Salient node definition

The largest degrees in the high contrast and local contrast graphs belong to the regions that are most dominant in terms of global and local contrast, respectively. These regions are considered the most salient nodes, and based on them, an integration technique is proposed to create the saliency map.

2.4.1 Graphs integration technique

Figure 2 shows the integration technique for producing the saliency map. In the first step, the degree vector of the most salient nodes is rescaled to [0, 255] to create the most salient map; each rescaled degree is assigned to the pixels of its region as the saliency value. Second, the global contrast map is generated using the degree vector of the global contrast graph. Then, the distance of each pixel from the nearest most salient pixel is computed; if it is lower than a specific threshold r, the global contrast value is assigned as the saliency value, otherwise the saliency value is zero. The parameter r trades off coverage against precision: it should be large enough to cover all the object pixels and, at the same time, small enough to exclude redundant (background) pixels.
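A minimal sketch of this step is given below, with r = 30 as reported in Sect. 3.1; most_salient and global_contrast stand for per-pixel maps obtained by assigning each superpixel's rescaled degree to its pixels, and the names are illustrative.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def integrate_maps(most_salient, global_contrast, r=30):
    """most_salient, global_contrast: H x W arrays in [0, 255]."""
    # Distance of every pixel to the nearest most-salient pixel
    # (a pixel with a nonzero value in the most salient map).
    dist = distance_transform_edt(most_salient == 0)
    # Within radius r, keep the global contrast value; elsewhere, zero.
    return np.where(dist < r, global_contrast, 0)
```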

3 Experiments and results

In order to assess the performance of the proposed method, we performed extensive experiments on three challenging benchmark datasets, MSRA10K, SED2 and ECSSD [28]. The results were compared with 6 state-of-the-art saliency detection models: histogram-based contrast (HC) [29], context-aware (CA) [30], salient region detection and segmentation (AC) [31], sparse salient region (SS) [32], saliency detection by self-resemblance (SeR) [33] and frequency-tuned (FT) [34]. The saliency maps of the other methods were downloaded from [28].

Fig. 3 Influence of each map to create high-resolution saliency maps

Fig. 4 The effect of maps on SED2 dataset. Left to right: ROC, precision–recall curve and precision, recall and f-measure chart

  1. MSRA10K [28] contains 10,000 different images, which makes it a large-scale dataset. It also covers the images of the ASD dataset [34]. MSRA10K is large and varied enough to serve as a benchmark for research on saliency detection methods.

  2. SED2 [35] includes 100 images with two salient objects of varying sizes and locations. As mentioned in [28], its salient objects have the largest distance among the six other datasets used as benchmarks.

  3. ECSSD [36] has 1000 natural images with complex structures. It contains a large number of semantically meaningful images [28].

3.1 Performance analysis

The proposed method first generates the global contrast map using the degree vector of the global contrast graph. Next, the most salient region map is created via the local and high contrast graphs. Then, the maps are combined by the graph integration technique to produce the saliency maps. According to our experiments, we chose \(r=30\) in this paper to cover the object pixels while excluding redundant (background) pixels.

Fig. 5 Saliency maps on different datasets. a Input image, b ground truth, c–h other method saliency maps, i saliency maps of proposed method

Fig. 6 The ROC curve and precision, recall and F-measure charts on MSRA10K (a, b), SED2 (c, d) and ECSSD (e, f)

Figure 3 shows the results of each step of the proposed method: the third column shows the effect of the local and high contrast graphs, i.e., the salient region (SR) map; the fourth column illustrates the influence of the global contrast graph, i.e., the global contrast (GC) map; and the last column shows the saliency maps resulting from the graph integration technique. As can be seen, applying both the global contrast and salient region maps and combining them with the graph integration technique provides more appropriate results than applying only one type of contrast. In order to evaluate the contribution of each part of the proposed method to salient object detection, we employed the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) versus the false positive rate (FPR), and the precision–recall curve, which plots the precision versus the recall. The ROC curves were generated by thresholding the obtained saliency maps; the thresholds are fixed values with equal steps from 1 to 255. The means of precision, recall and f-measure were also calculated. Figure 4 illustrates the ROC and precision–recall (PR) curves and the precision, recall and f-measure chart for salient object detection using the proposed method (saliency maps), the global contrast maps and the most salient region maps on the SED2 dataset. As illustrated in Fig. 4, the precision values were more than 0.6 in the PR curve. In addition, the FPR values were less than 0.12 in the ROC curve for the proposed method.
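The curves can be reproduced, in principle, by sweeping the fixed thresholds from 1 to 255 over each saliency map against its binary ground truth; the sketch below shows one way to collect the ROC and precision–recall points.

```python
import numpy as np

def roc_pr_points(saliency, gt, thresholds=range(1, 256)):
    """saliency: H x W array in [0, 255]; gt: boolean H x W ground-truth mask."""
    points = []
    for t in thresholds:
        pred = saliency >= t
        tp = np.logical_and(pred, gt).sum()
        fp = np.logical_and(pred, ~gt).sum()
        fn = np.logical_and(~pred, gt).sum()
        tn = np.logical_and(~pred, ~gt).sum()
        tpr = tp / (tp + fn + 1e-12)       # recall / true positive rate
        fpr = fp / (fp + tn + 1e-12)       # false positive rate
        prec = tp / (tp + fp + 1e-12)      # precision
        points.append((fpr, tpr, prec))
    return points
```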

3.2 Subjective evaluation

For images with a simple background or homogeneous objects, most saliency methods generate high-resolution saliency maps. However, in complicated cases with heterogeneous objects (rows 1, 4, 9, 10 and 11 in Fig. 5), cluttered backgrounds (rows 1, 3, 4, 8, 9, 10 and 12) or digits as salient objects (row 8), the proposed method removed background regions effectively and detected salient objects with precise boundaries closer to the ground truth than the other saliency detection methods. Furthermore, the proposed method highlighted both large-scale salient objects (rows 1, 2, 3, 5 and 9) and small-scale salient objects (row 7); moreover, salient objects with different colors in a scene (rows 2, 7 and 11) were detected more appropriately by the proposed method.

3.3 Objective evaluation

In order to objectively evaluate the proposed method, we adopted the ROC curve, the precision, recall and f-measure chart, and the mean absolute error (MAE). For the precision, recall and f-measure chart, we averaged over all thresholds and all images of each dataset. The original saliency maps with continuous values from 0 to 255 were used to compute the mean absolute error. In the ROC curve of the proposed method, the TPR–FPR pairs are closely and densely spaced. For the MSRA10K dataset, the best performance in the ROC curve (Fig. 6a) and the MAE chart (Fig. 7) is obtained by the proposed method. For the SED2 dataset (Fig. 6c, d), FT and the proposed method are close to each other and above the other methods in the ROC curve; moreover, even the largest FPR values over all thresholds are significantly low. For precision and f-measure, the proposed method obtained the highest values, and it yielded the lowest value on the MAE chart (Fig. 7). On the ECSSD dataset (Fig. 6e, f), CA and the proposed method are the highest ones; the first rank in precision and f-measure and the lowest MAE also belong to the proposed method (Fig. 7).
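A sketch of the MAE computation on the continuous-valued maps is shown below; the f-measure with \(\beta ^{2}=0.3\), common in this literature, is included only as an assumption since the paper does not state the value it uses.

```python
import numpy as np

def mae(saliency, gt):
    """saliency: H x W in [0, 255]; gt: boolean ground-truth mask."""
    return np.abs(saliency / 255.0 - gt.astype(float)).mean()

def f_measure(precision, recall, beta2=0.3):
    # Weighted harmonic mean of precision and recall.
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-12)
```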

Fig. 7 The MAE scores (from top to bottom) on MSRA10K, SED2 and ECSSD datasets

4 Conclusion

In this paper, a novel salient object detection method in a graph framework was developed using three different graphs. Global contrast, high contrast and local contrast were exploited while considering the background label, which was calculated as the average of the color labels of the background nodes. High degree nodes in the local contrast and high contrast graphs were adopted as the foreground nodes to detect the most salient regions. Then, an integration technique was presented to assign saliency values to the other regions. Subjective and objective evaluations on challenging datasets demonstrated that the proposed method achieves higher salient object detection performance than the state-of-the-art methods and creates high-resolution saliency maps.