Abstract
In this paper, we propose a novel multi-graph-based method for salient object detection in natural images. Starting from image decomposition via a superpixel generation algorithm, we use color, spatial and background labels to calculate the edge weight matrices of the graphs. By considering superpixels as nodes and region similarities as edge weights, local, global and high contrast graphs are created. Then, an integration technique is applied to form the saliency maps using the degree vectors of the graphs. Extensive experiments on three challenging datasets show that the proposed unsupervised method outperforms several state-of-the-art unsupervised methods.
1 Introduction
Studying, biking, running and most other human activities depend on a mechanism of the visual system called visual attention: focusing on specific areas while ignoring other regions [1]. The development of visual attention modeling has given rise to new problems and research directions. One popular branch of visual attention modeling is salient object detection [2], which concentrates on extracting important objects. Salient object detection has been an attractive research area in recent decades due to its various applications in computer vision, such as visual tracking [3], video quality assessment [4], image retrieval [5], flame detection [6], motion detection [7] and superpixel generation [8].
Taken together, the variety of graph types and the rich mathematical foundations of graph theory make graphs well suited to saliency modeling. In graph-based approaches, saliency detection is mostly performed in four main steps: (1) node set creation, (2) feature extraction, (3) graph construction and (4) saliency detection strategy. To produce a node set in the first step, image partitioning methods such as simple linear iterative clustering (SLIC) [9], Laplacian sparse subspace clustering (LSSC) [10], blocks and other segmentation approaches [11, 12] were employed. In the second step, specific features such as global contrast [13, 14], local contrast [15, 16], spatial distribution [15, 17] and the mean histogram of optical flow [18] were extracted as inputs for the next step. To form the graphs in the third step, different graph types such as the complete weighted graph [14, 19, 20], k-regular graph [15, 17, 18, 21] and hypergraph [22] were suggested. In the fourth step, four strategies were frequently used to determine saliency values:
1. Random walk distribution [13, 14, 16, 20, 21]: In this class, a random walker was assumed to walk on the graph nodes. Then, based on the equilibrium distribution of the Markov chain, saliency values were computed.
2. Manifold ranking graphs [15, 23]: In these methods, prior background/foreground queries (nodes) were determined. Then, a ranking function was defined to rank all other nodes of the graph. Finally, the saliency map was generated using the node ranks.
3. Convex hull production [10, 17]: Several assumptions or an initial process labeled some regions as background/foreground regions. Then, the convex hull was created to form saliency maps.
4. Cost function optimization [22]: A cost function on the graph nodes was defined. Then, by optimizing the cost function, the salient objects were detected.
In the graph-based approaches, some methods used the SLIC algorithm to create a node set. Yang et al. [15] ranked the similarity of image elements based on foreground and background cues via a k-regular graph. They defined different query sets for foreground and background nodes. According to these queries, the other image nodes were ranked and their saliency values were defined in terms of the relevance of the nodes to the foreground or background nodes. Fu et al. [18] proposed a salient object detection method for videos. They extracted the "mean histogram of optical flow" as a feature and then applied the graph-based manifold ranking approach [15] for salient object detection. Yang et al. [17] proposed a three-stage salient object detection framework. They constructed an initial saliency map, and then a convex hull of interest points was created to localize the objects. Finally, they minimized a continuous energy function via graph regularization to detect the salient object. They investigated local and global features to learn the similarity values for nodes and thereby obtained a faithful graph affinity model. A semi-supervised manifold algorithm was applied to learn the weights, and foreground/background seeds were utilized for saliency detection. Yan et al. [23] developed a linear propagation algorithm that created a weighted graph based on the similarity of regions. The border of the image was labeled as the background prior, and saliency was derived by propagating labels on the graph. Jiang et al. [21] detected saliency by defining a random walk prior on the complete graph and employing virtual nodes as absorbing nodes. The saliency map was then created by normalizing the absorbed time for each node (superpixel). Roy et al. [24] suggested a bottom-up saliency detection technique that extracts three low-level cues. The similarity of regions was computed as the exponential of the color difference and used for the weight matrix of the image graph.
Rarity was exploited using the eigenvectors of the graph Laplacian. Spatial compactness and the background prior were extracted by combining color and spatial disparity and by computing the Mahalanobis distance from the mean background color, respectively. Nouri et al. [25] presented a salient object detection method by introducing a global contrast graph. The global contrast was extracted to form a complete weighted graph. Then, the global contrast graph was defined by omitting high-weight edges, and the degree vector was computed to create the saliency map.
In some other methods, LSSC was used to create graphs. Xie et al. [10] introduced the LSSC algorithm for image partitioning based on extracted local features. Then, low-level cues were adopted on the convex hull of interest points to create the saliency map. Li et al. [22] generated superpixels by the hypergraph method [12] and modeled the saliency detection task as finding salient vertices and hyperedges. They also proposed an alternative approach based on center-surround contextual contrast analysis that optimizes the objective cost function of a support vector machine (SVM).
As mentioned before, some other methods produced regions by a specific graph-based segmentation approach [11]. Lu et al. [26] explored the observation that the region on the convex side of a curved boundary tends to belong to an object. A hierarchical model was designed, based on multi-scale analysis of human visual perception, to accumulate concavity context at different scales. The graph-cut algorithm [27] was adopted to extract salient objects. Yu et al. [14] also introduced a maximal entropy random walk prior on fully connected graphs with two different weight matrices. They assigned region saliency values in terms of stationary probabilities.
Partitioning the image into equally sized blocks is another way to produce the node set. Kim et al. [16] modeled eye movements as a random walk on the graph. In the Random Walk with Restart (RWR) prior, a high weight was assigned to an edge if its connected nodes had dissimilar intensity and color features and if the ending node was more compact than the starting node. Then, they computed the stationary distribution of the Markov chain as the saliency map.
The following sections describe the proposed method for salient object detection, the experimental results and the concluding remarks, respectively.
2 Proposed method
In this paper, we propose a graph-based salient object detection method using multiple graphs. As shown in Fig. 1, the proposed method consists of three major stages. In the first stage, preprocessing, the input image is partitioned into superpixels to create a node set for the global, local and high contrast graphs. Then, color, spatial and background features are extracted from the superpixels to form the color, spatial and background labels. In the second stage, three different graphs, i.e., the global contrast graph, the local contrast graph and the high contrast graph, are produced using superpixels as nodes and superpixel similarities as edge weights. Finally, in the third stage, the most salient nodes are identified based on the graphs obtained in the two previous stages, and an integration approach is employed to combine the degree vectors of the graphs into saliency maps. The saliency value of each node is assigned to its pixels. The performance of the proposed method is evaluated in terms of the resulting saliency maps on three publicly available benchmark datasets.
2.1 Preprocessing
In order to provide the necessary material for graph construction, the input image is partitioned into superpixels using SLIC [9]. SLIC is a region clustering method that adopts the k-means algorithm to create regular superpixels with precise boundaries. By applying this algorithm, the input image is broken into N superpixels \(V=\{v_1 ,v_2 ,...,v_N \}\) used as the node set for graph construction. Next, the average color of the pixels forming each superpixel is transformed from RGB into the CIE Lab color space, due to the closeness of the Lab space to human visual perception, and is considered as the color label of the superpixel: \(C=\{c_1 ,c_2 ,...,c_N \}\). In addition, the spatial labels \(l_i =(x_i ,y_i )\) are computed as the coordinates \((x_i ,y_i )\) of the first (top-left) pixel of each superpixel; they form the spatial label set \(L=\{l_1 ,l_2 ,...,l_N \}\). Finally, all superpixels on the image border, i.e., the first row, last row, first column and last column of the image, are considered background nodes [21]. The color labels of the M background nodes are used to obtain the background label \(B=\frac{1}{M}\mathop \sum \nolimits _{i=1}^M {c_i } \).
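The preprocessing stage can be sketched as follows. A simple grid partition stands in for SLIC here so the sketch stays dependency-free (the paper itself uses SLIC [9]), and the function names are illustrative, not from the paper:

```python
import numpy as np

def grid_superpixels(h, w, k=8):
    """Stand-in for SLIC: partition an h-by-w image into a k-by-k grid of
    regions. The paper uses SLIC; a grid keeps this sketch dependency-free."""
    rows = np.minimum(np.arange(h) * k // h, k - 1)
    cols = np.minimum(np.arange(w) * k // w, k - 1)
    return rows[:, None] * k + cols[None, :]

def extract_labels(image, labels):
    """Color labels C, spatial labels L (top-left pixel of each region) and
    background label B (mean color of border regions), as in Sect. 2.1."""
    ids = np.unique(labels)
    # Color label c_i: mean color of each region (Lab in the paper).
    C = np.array([image[labels == i].mean(axis=0) for i in ids])
    # Spatial label l_i: coordinates of the first (top-left) pixel.
    L = np.array([np.argwhere(labels == i)[0] for i in ids])
    # Background nodes: regions touching the image border.
    border = np.zeros(labels.shape, bool)
    border[0, :] = border[-1, :] = border[:, 0] = border[:, -1] = True
    bg = np.unique(labels[border])
    # Background label B: mean color label over the M border regions.
    B = C[bg].mean(axis=0)
    return C, L, B
```

In practice the grid partition would be replaced by a real SLIC implementation and the image converted to CIE Lab before averaging.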
2.2 Graphs construction
Three different graphs are constructed to exploit specific features, where the node set is V and the edge weights are given by region similarities. The distances between color and spatial labels are used to define the region similarities. The degree vector of each graph is computed and normalized to be used as input for the next stage (i.e., saliency detection).
2.2.1 Region similarities
The similarity between the nodes \(V=\{v_1 ,v_2 ,...,v_N \}\) is defined as follows:

$$w_{ij} =\frac{D_s (l_i ,l_j )}{D_c (c_i ,c_j )}$$

where \(w_{ij} \) is the similarity between regions i and j, and \(D_s (l_i ,l_j )\) is the Euclidean distance between the spatial labels of regions i and j. \(D_c (c_i ,c_j )\) is the color distance between regions i and j with background influence, obtained by \(D_c (c_i ,c_j )=(c_i -B)^{2}\times \left\| {c_i -c_j } \right\| \).
In the region similarity measurement, three factors play important roles: (1) the color distance, (2) the spatial distance and (3) having a color label close to the background label. If the color distance is low, the region similarity is high, as expected. If the spatial distance is high, the region similarity is also high, which decreases the effect of far regions on each other according to the graph construction rules described in the following subsections. Finally, if the color label \(c_i \) of a region is close to the background label B, the similarities \(w_{ij} \) between node i and the other nodes increase, which decreases the influence of that region in the saliency detection process according to the same graph construction rules.
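A minimal sketch of the similarity computation. The ratio form \(D_s/D_c\) is an assumption inferred from the three qualitative properties described above (not a formula quoted verbatim from the paper), and the function name is illustrative:

```python
import numpy as np

def similarity_matrix(C, L, B, eps=1e-8):
    """Region similarity w_ij: high when the color distance is low, the
    spatial distance is high, or c_i is close to the background label B.
    C: (N, 3) color labels, L: (N, 2) spatial labels, B: (3,) background."""
    # Spatial distance D_s(l_i, l_j): Euclidean distance of spatial labels.
    Ds = np.linalg.norm(L[:, None, :] - L[None, :, :], axis=-1)
    # Color distance with background influence:
    # D_c(c_i, c_j) = (c_i - B)^2 * ||c_i - c_j||.
    bg = np.sum((C - B) ** 2, axis=1)                    # (c_i - B)^2 term
    cc = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=-1)
    Dc = bg[:, None] * cc
    # Small eps guards against division by zero for identical colors.
    return Ds / (Dc + eps)
```

Note that W is not symmetric: the background factor \((c_i-B)^2\) depends only on the row index i, matching the text's statement that closeness of \(c_i\) to B raises all of node i's similarities.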
2.2.2 Global contrast graph
The image can be represented as a complete weighted graph \(G=(V,E)\), where the nodes \(V=\{v_1 ,v_2 ,...,v_N \}\) are superpixels and E is the set of all possible edges between them. The \(w_{ij} \) element of the region similarity matrix W is considered as the weight of the edge \(e_{ij} \). In this work, we use the global contrast graph [25], combined with two other graphs, to create the saliency maps. In order to form the global contrast graph, a threshold \(\mathrm{Thr}_\mathrm{GCG} \) on the matrix W is defined as follows:
All edges with weights greater than the threshold \(\mathrm{Thr}_\mathrm{GCG} \) are omitted. The remaining graph, named the global contrast graph, is undirected and unweighted. Notice that by omitting the high-weight connections, the most similar, common connections are removed while the edges representing more discriminative regions are preserved; thus, nodes with more connections are more dominant in the image. The degree vector of the graph nodes \(D=\{d_1 ,d_2 ,...,d_N \}\) is created by counting the number of edges connected to each node and is normalized using \(D_N =\frac{D-\min (D)}{\max (D)-\min (D)}\). Low values in \(D_N \) correspond to regions belonging to the background, which separates them clearly from the dominant regions. In order to remove the effect of background degrees in further processing, an adaptive threshold is adopted as \(t_{adp} =\frac{\eta }{N}\sum _{i=1}^N {D_N (i)} \), where \(\eta \) is a constant. As mentioned in Sect. 2.2.1, high spatial distances and color labels close to the background label yield high region similarities; since high-weight connections are omitted when creating the global contrast graph, such nodes end up with fewer connections.
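The thresholding-and-degree procedure above can be sketched as below. The concrete value of `thr` and the constant `eta` are left as parameters, since the text defines them only qualitatively:

```python
import numpy as np

def contrast_graph_degrees(W, thr, eta=1.0):
    """Build an unweighted graph by dropping edges with weight above `thr`
    (Sect. 2.2.2), then return the normalized degree vector with the
    adaptive threshold t_adp suppressing low (background) degrees."""
    A = (W <= thr).astype(int)             # keep only low-weight edges
    np.fill_diagonal(A, 0)                 # no self-loops
    D = A.sum(axis=1).astype(float)        # degree of each node
    # Min-max normalization D_N = (D - min D) / (max D - min D).
    Dn = (D - D.min()) / (D.max() - D.min() + 1e-12)
    # Adaptive threshold t_adp = (eta / N) * sum_i D_N(i).
    t_adp = eta * Dn.mean()
    Dn[Dn < t_adp] = 0.0                   # drop background degrees
    return Dn
```

The same routine, called with a lower `thr`, yields the high contrast graph degrees of Sect. 2.2.3.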
2.2.3 High contrast graph
The same procedure as in the global contrast graph construction is followed, with a lower threshold on the edge weights, to generate the high contrast graph. The lower threshold is defined as follows:

$$\mathrm{Thr}_\mathrm{HCG} =\beta \times \mathrm{Thr}_\mathrm{GCG}$$

where \(\beta <1\) is a constant. As for the global contrast graph, the degree vector is constructed and the low degrees of the high contrast graph nodes are omitted by another adaptive threshold. Then, normalization is applied as in Eq. (3) and the resulting degree vector is created.
2.2.4 Local contrast graph
Regions in the vicinity of sharp edges play an important role in the saliency detection process. Between each pair of nodes with color distance \(D_c (c_i ,c_j )\) greater than \(\lambda =\frac{1}{5}\sum _i {\max _j } D_c (c_i ,c_j )\) and spatial distance \(D_s (l_i ,l_j )\) less than \(\gamma =5\), the edge \(e_{ij} \) is drawn to create the local contrast graph. The degree vector of the local graph nodes is calculated as described in Sect. 2.2.2 for the global contrast graph. Then, an adaptive threshold, the same as the one used for the high contrast graph, is applied to omit small degrees.
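The local contrast graph construction can be sketched as follows, with the \(\lambda\) formula implemented exactly as written above (sum over rows of the per-row maximum, divided by 5); the function name is illustrative:

```python
import numpy as np

def local_contrast_degrees(Dc, Ds, gamma=5.0):
    """Local contrast graph of Sect. 2.2.4: connect nodes i and j when the
    color distance exceeds lambda = (1/5) * sum_i max_j Dc[i, j] and the
    spatial distance is below gamma. Returns the degree vector."""
    lam = Dc.max(axis=1).sum() / 5.0
    A = ((Dc > lam) & (Ds < gamma)).astype(int)
    np.fill_diagonal(A, 0)                 # no self-loops
    return A.sum(axis=1)
```

The adaptive degree thresholding and normalization of Sect. 2.2.2 would then be applied to the returned degree vector.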
2.3 Salient object detection
In this section, a novel approach, named the integration technique, is presented as a strategy for saliency detection. In this strategy, the most salient nodes are identified first. Then, according to the most salient nodes, saliency values are assigned to all other nodes.
2.4 Salient node definition
The greatest degrees in the high contrast and local contrast graphs belong to the regions that are most dominant in terms of global and local contrast, respectively. These regions are considered the most salient nodes, and based on them an integration technique is proposed to create the saliency map.
2.4.1 Graphs integration technique
Figure 2 shows the integration technique for producing the saliency map. In the first step, the degree vector of the most salient nodes is adopted to create the most salient map: the degree vector is rescaled to [0, 255], and each rescaled degree is assigned as the saliency value of the pixels of the corresponding region. Second, the global contrast map is generated using the degree vector of the global contrast graph. Then, the distance of each pixel from the nearest most salient pixel is computed; if it is lower than a specific threshold r, the global contrast value is assigned as the saliency value, otherwise the saliency value is zero. The parameter r should be large enough to cover all pixels of the object and, at the same time, small enough to exclude redundant (background) pixels.
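The integration step can be sketched as follows. A brute-force nearest-distance computation is used for clarity (a distance transform would be faster in practice), and the function name is illustrative:

```python
import numpy as np

def integrate_maps(salient_map, global_map, r=30):
    """Graph integration of Sect. 2.4.1: a pixel keeps its global-contrast
    value only if it lies within distance r of a most-salient pixel."""
    ys, xs = np.nonzero(salient_map)                 # most-salient pixels
    pts = np.stack([ys, xs], axis=1)
    h, w = global_map.shape
    grid = np.stack(np.meshgrid(np.arange(h), np.arange(w),
                                indexing="ij"), axis=-1)
    # Distance of every pixel to its nearest most-salient pixel.
    d = np.linalg.norm(grid[:, :, None, :] - pts[None, None, :, :],
                       axis=-1).min(axis=-1)
    # Keep the global contrast value inside radius r, zero elsewhere.
    return np.where(d < r, global_map, 0)
```

The brute-force distance is O(pixels × salient pixels), which is acceptable for a sketch but would be replaced by a Euclidean distance transform for full-size images.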
3 Experiments and results
In order to assess the performance of the proposed method, we performed extensive experiments on three challenging datasets, MSRA10K, SED2 and ECSSD, introduced as benchmark datasets in [28]. The results were compared with six state-of-the-art saliency detection models: histogram-based contrast (HC) [29], context-aware (CA) [30], salient region detection and segmentation (AC) [31], sparse salient region (SS) [32], saliency detection by self-resemblance (SeR) [33] and frequency-tuned (FT) [34]. The saliency maps of the other methods were downloaded from [28].
1. MSRA10K [28] contains 10,000 different images, which makes it a large-scale dataset. It also covers the images of the ASD dataset [34]. MSRA10K is sufficiently large and diverse to serve as a benchmark for research on saliency detection methods.
2. SED2 [35] includes 100 images with two salient objects of varying sizes and locations. As mentioned in [28], its salient objects have the longest distance among the six other datasets used as benchmarks.
3. ECSSD [36] has 1000 natural images with complex structures. It contains a large number of semantically meaningful images [28].
3.1 Performance analysis
The proposed method first generates the global contrast map using the degree vector of the global contrast graph. Next, the most salient region map is created via the local and high contrast graphs. Then, the maps are combined by the graph integration technique to produce the saliency maps. According to our experiments, we chose \(r=30\) in this paper to cover the pixels of the object and, at the same time, exclude redundant (background) pixels.
Figure 3 shows the results of each step of the proposed method: the third column shows the effect of the local and high contrast graphs, the salient region (SR) map; the fourth column illustrates the influence of the global contrast graph, the global contrast (GC) map; and the last column shows the saliency maps resulting from the graph integration technique. As can be seen, combining the global contrast and salient region maps via the graph integration technique provides better results than applying either contrast alone. In order to evaluate the contribution of each part of the proposed method, we employed the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) versus the false positive rate (FPR), and the precision–recall (PR) curve, which plots precision versus recall. The ROC curves were generated by thresholding the obtained saliency maps, with fixed thresholds in equal steps from 1 to 255. In addition, the means of precision, recall and f-measure were calculated. Figure 4 illustrates the ROC and PR curves and the precision, recall and f-measure chart for salient object detection using the proposed method (saliency maps), the global contrast maps and the most salient region maps on the SED2 dataset. As illustrated in Fig. 4, the precision values were above 0.6 in the PR curve, and the FPR values were below 0.12 in the ROC curve for the proposed method.
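The thresholded precision–recall evaluation described above can be sketched as (function name illustrative):

```python
import numpy as np

def pr_curve(saliency, gt, thresholds=range(1, 256)):
    """Precision-recall pairs from thresholding a saliency map at fixed
    levels 1..255, as in Sect. 3.1. `saliency` is in [0, 255], `gt` binary."""
    precisions, recalls = [], []
    for t in thresholds:
        pred = saliency >= t                      # binarize at threshold t
        tp = np.logical_and(pred, gt).sum()       # true positives
        precisions.append(tp / max(pred.sum(), 1))
        recalls.append(tp / max(gt.sum(), 1))
    return np.array(precisions), np.array(recalls)
```

The TPR/FPR pairs for the ROC curve are obtained analogously, replacing precision with the false positive rate over the background pixels.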
3.2 Subjective evaluation
For images with a simple background or homogeneous objects, most saliency methods generate high-resolution saliency maps. However, in complicated cases with heterogeneous objects (rows 1, 4, 9, 10 and 11 in Fig. 5), cluttered backgrounds (rows 1, 3, 4, 8, 9, 10 and 12) or digits as salient objects (row 8), the proposed method removed background regions effectively and detected salient objects with precise boundaries closer to the ground truth than the other saliency detection methods. Furthermore, the proposed method highlighted both large-scale salient objects (rows 1, 2, 3, 5 and 9) and small-scale salient objects (row 7); moreover, salient objects with different colors in a scene (rows 2, 7 and 11) were detected more appropriately by the proposed method.
3.3 Objective evaluation
In order to objectively evaluate the proposed method, we adopted the ROC curve, the precision, recall and f-measure chart and the mean absolute error (MAE). For the precision, recall and f-measure chart, we averaged over all thresholds and all images of the dataset. The original saliency maps, with continuous values from 0 to 255, were used to compute the mean absolute error. In the ROC curve of the proposed method, the TPR and FPR pairs are close and dense. For the MSRA10K dataset, the best performance in the ROC curve (Fig. 6a) and the MAE chart is obtained by the proposed method (Fig. 7). For the SED2 dataset (Fig. 6c, d), FT and the proposed method are close and above the other methods in the ROC curve; moreover, the largest FPR values over all thresholds are significantly low. For precision and f-measure, the proposed method obtained the highest values, and on the MAE chart it yielded the lowest value (Fig. 7). On the ECSSD dataset (Fig. 6e, f), CA and the proposed method are the highest; the first rank in precision and f-measure and the lowest MAE also belong to the proposed method (Fig. 7).
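The MAE computation on the unthresholded maps can be sketched as:

```python
import numpy as np

def mae(saliency, gt):
    """Mean absolute error (Sect. 3.3): the continuous saliency map
    (here in [0, 255]) is compared with the binary ground truth without
    any thresholding, after rescaling to [0, 1]."""
    return np.abs(saliency / 255.0 - gt.astype(float)).mean()
```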
4 Conclusion
In this paper, a novel salient object detection method in a graph framework was developed using three different graphs. Global contrast, high contrast and local contrast were exploited while taking into account the background label, calculated as the average of the background nodes' color labels. High-degree nodes in the local contrast and high contrast graphs were adopted as foreground nodes to detect the most salient regions. Then, an integration technique was presented to assign the saliency values of the other regions. Subjective and objective evaluations on challenging datasets demonstrated that the proposed method achieves higher salient object detection performance compared to the state-of-the-art methods and creates high-resolution saliency maps.
References
Borji, A., Itti, L.: State-of-the-art in visual attention modeling. IEEE Trans. Pattern Anal. Mach. Intell. 35, 185–207 (2013)
He, C., Chen, Z., Liu, C.: Salient object detection via images frequency domain analyzing. Signal Image Video Process. 10, 1295–1302 (2016)
Chattopadhyay, C., Das, S.: Prominent moving object segmentation from moving camera video shots using iterative energy minimization. Signal Image Video Process. 9, 1927–1934 (2015)
Qi, F., Zhao, D., Fan, X., Jiang, T.: Stereoscopic video quality assessment based on visual attention and just-noticeable difference models. Signal Image Video Process. 10, 737–744 (2016)
Le, H.T., Urruty, T., Gbèhounou, S., Lecellier, F., Martinet, J., Fernandez-Maloigne, C.: Improving retrieval framework using information gain models. Signal Image Video Process. 11(2), 309–316 (2016)
Liu, Z.-G., Yang, Y., Ji, X.-H.: Flame detection algorithm based on a saliency detection technique and the uniform local binary pattern in the YCbCr color space. Signal Image Video Process. 10, 277–284 (2016)
Bhattacharya, S., Venkatsh, K.S., Gupta, S.: Background estimation and motion saliency detection using total variation-based video decomposition. Signal Image Video Process. 11, 113–121 (2017)
Xu, L., Zeng, L., Wang, Z.: Saliency-based superpixels. Signal Image Video Process. 8, 181–190 (2014)
Achanta, R., Shaji, A., Smith, K., Lucchi, A., Fua, P., Süsstrunk, S.: SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Trans. Pattern Anal. Mach. Intell. 34, 2274–2281 (2012). https://doi.org/10.1109/TPAMI.2012.120
Xie, Y., Lu, H., Yang, M.-H.: Bayesian saliency via low and mid level cues. IEEE Trans. Image Process. 22, 1689–1698 (2013)
Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient graph-based image segmentation. Int. J. Comput. Vis. 59, 167–181 (2004)
Zhou, D., Huang, J., Schölkopf, B.: Learning with hypergraphs: clustering, classification, and embedding. In: Advances Neural Information Processing Systems (2006)
Kim, H., Kim, Y., Sim, J.-Y., Kim, C.-S.: Spatiotemporal saliency detection for video sequences based on random walk with restart. IEEE Trans. Image Process. 24, 2552–2564 (2015)
Yu, J., Zhao, J., Tian, J., Tan, Y.: Maximal entropy random walk for region-based visual saliency. IEEE Trans. Cybern. 44, 1661–1672 (2014)
Yang, C., Zhang, L., Lu, H., Ruan, X., Yang, M.H.: Saliency detection via graph-based manifold ranking. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 3166–3173 (2013)
Kim, C., Kim, J., Sim, J.: Multiscale saliency detection using random walk with restart. IEEE Trans. Circuits Syst. Video Technol. 24, 198–210 (2014)
Yang, C., Zhang, L., Lu, H.: Graph-regularized saliency detection with convex-hull-based center prior. Signal Process. Lett. IEEE. 20, 637–640 (2013)
Fu, K., Gu, I.Y.H., Yun, Y., Gong, C., Yang, J.: Graph construction for salient object detection in videos. In: 2014 22nd International Conference on Pattern Recognition, pp. 2371–2376 (2014)
Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: Advances in Neural Information Processing Systems, pp. 545–552 (2006)
Gopalakrishnan, V., Hu, Y., Rajan, D.: Random walks on graphs for salient object detection in images. IEEE Trans. Image Process. 19, 3232–3242 (2010)
Jiang, B., Zhang, L., Lu, H., Yang, C., Yang, M.-H.: Saliency detection via absorbing markov chain. In: 2013 IEEE International Conference on Computer Vision (ICCV), pp. 1665–1672 (2013)
Li, X., Li, Y., Shen, C., Dick, A., Van Den Hengel, A.: Contextual hypergraph modeling for salient object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3328–3335 (2013)
Yan, Y., Jin, Z., Zhou, J., Gao, S.: Saliency detection framework via linear neighbourhood propagation. IET Image Process. 8, 804–814 (2014)
Roy, S., Das, S.: Saliency detection in images using graph-based rarity, spatial compactness and background prior. In: Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), vol. 1, pp. 523–530. IEEE (2014)
Nouri, F., Kazemi, K., Danyali, H.: Salient object detection via global contrast graph. In: 2015 Signal Processing and Intelligent Systems Conference (SPIS), pp. 159–163. IEEE (2015)
Lu, Y., Zhang, W., Lu, H., Xue, X.: Salient object detection using concavity context. In: Proceedings of IEEE International Conference on Computer Vision, pp. 233–240 (2011)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22, 888–905 (2000)
Borji, A., Cheng, M.-M., Jiang, H., Li, J.: Salient object detection: a benchmark. IEEE Trans. Image Process. 24, 5706–5722 (2015)
Cheng, M.-M., Mitra, N.J., Huang, X., Torr, P.H.S., Hu, S.-M.: Global contrast based salient region detection. IEEE Trans. Pattern Anal. Mach. Intell. 37, 569–582 (2015)
Goferman, S., Zelnik-Manor, L., Tal, A.: Context-aware saliency detection. IEEE Trans. Pattern Anal. Mach. Intell. 34, 1915–1926 (2012)
Achanta, R., Estrada, F., Wils, P., Süsstrunk, S.: Salient region detection and segmentation. In: Computer Vision Systems. pp. 66–75. Springer, Berlin (2008)
Hou, X., Harel, J., Koch, C.: Image signature: highlighting sparse salient regions. IEEE Trans. Pattern Anal. Mach. Intell. 34, 194–201 (2012)
Seo, H.J., Milanfar, P.: Static and space-time visual saliency detection by self-resemblance. J. Vis. 9, 15 (2009)
Achanta, R., Hemami, S., Estrada, F., Susstrunk, S.: Frequency-tuned salient region detection. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1597–1604. IEEE (2009)
Alpert, S., Galun, M., Basri, R., Brandt, A.: Image segmentation by probabilistic bottom-up aggregation and cue integration. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)
Shi, J., Yan, Q., Xu, L., Jia, J.: Hierarchical image saliency detection on extended CSSD. IEEE Trans. Pattern Anal. Mach. Intell. 38, 717–729 (2016)
Acknowledgements
This work was supported by the Cognitive Science and Technology Council (CSTC) of Iran under the grant number 3232.
Nouri, F., Kazemi, K. & Danyali, H. Salient object detection using local, global and high contrast graphs. SIViP 12, 659–667 (2018). https://doi.org/10.1007/s11760-017-1205-5