1 Introduction

Visual saliency detection is the process of identifying the most conspicuous objects or regions in an image, and it has become an active research area in recent years. Serving as a pre-processing step, it can efficiently focus attention on the image regions or objects relevant to the current task, and it has been broadly applied to various computer vision tasks, such as image classification/retrieval, image retargeting, video coding [35, 36], visual tracking, video processing [37], visual motion estimation [38], etc.

Humans can accurately identify the salient regions or objects in image scenes. Simulating this ability of the human visual attention system in computer vision is critical for many real-world multimedia applications. The key issue in simulating the human visual system (HVS) is the distinctness of image regions, i.e., the degree to which they stand out from their neighbors and trigger visual stimuli [14]. Recently, many research papers have been published on simulating such intelligent behavior in the HVS. However, effectively and accurately modeling the process of salient-object detection remains a challenge, especially when no prior knowledge or image context is available.

In this paper, a novel bottom-up saliency-detection framework, integrating the Quaternionic Distance Based Weber Local Descriptor (QDWLD), a center prior, and color cues, is proposed. The proposed model combines the quaternion number system with object cues, aiming to simulate how humans process visual stimuli during saliency detection. In the proposed method, we first compute the QDWLD, which was initially designed for detecting outliers in color images, to represent the directional cues in an image. Secondly, we incorporate the Convex-Hull-Based center method to estimate the location of the salient objects and apply color contrast to estimate the color saliency. Finally, these three different maps are combined to represent the most important stimuli for saliency detection. To evaluate the performance of our proposed method, we carried out extensive experiments on two publicly available and widely used datasets; the comparisons with other state-of-the-art saliency-detection algorithms show that the proposed approach is effective and efficient.

A preliminary conference version appeared in [15]. In contrast to that preliminary work, we adopt a more precise center-cue estimation algorithm, called Convex-Hull-Based center estimation, to compute the spatial position of the salient object more accurately. In addition, we have analyzed the proposed model and the individual contributions of the QDWLD, the center prior, and the color prior, which show that the new saliency model generates more accurate results than the preliminary one. Finally, we performed extensive experiments on another widely used database, comparing against eight other state-of-the-art saliency-detection methods, to show that our method is efficient and effective.

The remainder of the paper is organized as follows. In Section 2, we introduce related work on saliency detection. In Section 3, we present the proposed framework in detail. Experimental results are presented in Section 4. The paper closes with a conclusion and discussion in Section 5.

2 Related work

Generally, there are two approaches to modeling the saliency-detection mechanism: top-down and bottom-up algorithms. Currently, most existing visual saliency modeling methods [1, 5, 7, 10, 12, 13, 22, 24, 41] are based on the bottom-up computational scheme, which uses low-level visual features in images, such as intensity, color, contrast, and orientation, to achieve saliency detection. A pioneering work on saliency detection was proposed by Itti et al. [13]. In that model, a final saliency map, representing the saliency of each pixel in an image, is produced from three individual feature maps: the color, intensity, and orientation maps. In [24], an alternative local contrast analysis approach was proposed for calculating saliency maps based on a fuzzy growth algorithm for saliency-attention detection. Harel et al. [10] proposed a model, namely Graph-Based Visual Saliency (GBVS), which aims to fuse different feature maps so as to strengthen the remarkable regions in an image. In [41], a global contrast method, using the luminance channel, was proposed to estimate the salient regions in an image. Liu et al. [22] formulated saliency detection as an image segmentation problem by segmenting salient objects from input images. In this method, the multi-scale contrast, the center-surround histogram, and the color spatial distribution are considered simultaneously for detecting a salient object. Hou and Zhang proposed a spectral-residual algorithm that analyzes the log-spectrum of an image to estimate the corresponding saliency map [12]. In [1], a frequency-tuned model for saliency detection was proposed, which calculates the color difference from the average color in an image. Goferman et al. [7] proposed a method based on local contrast and visual organization rules to detect salient objects. Cheng et al. [5] proposed a histogram-based contrast (HC) and a region-based contrast (RC) method for saliency detection. In [14], a visual-patch, attention-aware saliency-detection model was proposed. Later, Zhong et al. proposed a generalized nonlocal mean framework with object-level cues for saliency detection in [43]. A spatio-temporal saliency detection method based on the amplitude or phase spectrum of the Fourier transform of an image in the frequency domain was proposed in [8]. In [23], a Gaussian-based saliency detection approach was proposed for salient object segmentation. In [25], an unsupervised model was proposed to estimate the pixel-wise saliency in an image. In [45], a saliency detection method based on linear neighbourhood propagation was proposed. Li et al. designed a salient object detection model by combining mean-shift filtering with colour information in an image [20]. Recently, Song et al. [29] proposed an RGBD co-saliency method via bagging-based clustering to detect the salient objects in an image. Zhang and Sclaroff designed a Boolean-map-based method for saliency detection in [42]. In [31], a context-feature auto-encoding algorithm based on regression trees was proposed to handle numerical features and select effective features. In [30], a multistage saliency detection framework based on multilayer cellular automata (MCA) was proposed to detect the saliency of images. A novel saliency-detection model based on probabilistic object boundaries was proposed in [19]. Li et al. [21] presented an effective deep neural network framework embedded with low-level features (LCNN) for salient object detection. In [44], Zhong et al. proposed a novel video saliency detection model based on perceptual orientation to detect the attended regions in video sequences.

In contrast with the bottom-up visual-attention model, which is driven by low-level visual features, the top-down approach is based on high-level cues and usually requires prior knowledge and context-aware understanding [27]. In general, the top-down model is task dependent or application oriented, so it is difficult to refine the attention targets in an image. Furthermore, the top-down mechanism usually needs supervised learning and lacks extendibility and scalability, so comparatively little work has been based on the top-down model. In [24], a top-down saliency method using the global scene configuration was proposed for saliency detection. In [11], a novel saliency-detection model was proposed to learn the intra-class association between exemplars and query objects. Rahman et al. [28] proposed a top-down contextual weighting (TDCoW) saliency model, which incorporates high-level knowledge of the gist context of images to assign appropriate weights to the features. In [39], a top-down saliency model, which learns a Conditional Random Field (CRF) together with a visual dictionary, was proposed for saliency detection. Overall, top-down methods lack expandability and scalability and also require a long time to train the corresponding model parameters.

According to the attention mechanism of the human visual system (HVS), the directional information of an image is an important cue for saliency detection. Compared with our preliminary work [15], this paper introduces a more precise center cue to produce better saliency estimation. In this paper, an efficient saliency-detection method is designed by utilizing both visual-directional stimuli and two low-level object cues. The proposed model, which is independent of image contents and scenes, is biologically plausible, and the directional cues can serve as neuronal visual stimuli for the HVS to detect salient objects. The main contributions of our proposed saliency detection model are as follows:

  • We utilize the Quaternionic Distance Based Weber Local Descriptor (QDWLD), which was initially designed for detecting outliers in color images, to represent the directional cues in an image for saliency detection. To the best of our knowledge, this is the first time QDWLD has been incorporated into saliency detection.

  • We apply a simple but powerful algorithm to estimate the location of the salient objects, which is more precise than the traditional center-prior estimation method.

  • We further employ color contrast to define the color saliency of an image, which plays an important role in saliency detection tasks.

3 The proposed method

In this section, we present a novel framework, which incorporates the Quaternionic Distance Based Weber Local Descriptor (QDWLD) with object cues, for saliency detection. QDWLD is first described, followed by the object cues based on the center prior and the color prior. All these different types of information are then fused to form a saliency map.

3.1 Quaternionic distance based weber local descriptor (QDWLD)

Neurobiological studies have indicated that the directional information of an image, such as edges and outliers, plays an important role in the HVS for saliency detection [14]. In our proposed framework, QDWLD, which was initially proposed for detecting outliers and edges in an image [18] and has also been applied to texture classification and face recognition, is utilized to represent the directional cues for the HVS to detect saliency. In this section, we briefly introduce QDWLD.

According to quaternion algebra, a quaternion \( \dot{q} \) is made up of one real part and three imaginary parts as follows:

$$ \dot{q}=a+ib+jc+kd, $$
(1)

where \( a, b, c, d \in \mathbb{R} \); i, j, k are the imaginary units; a is the real part; and {ib, jc, kd} are the imaginary parts. In the polar form with \( S\left(\dot{q}\right)=a \) and \( V\left(\dot{q}\right)=ib+jc+kd \), the quaternion \( \dot{q} \) can be rewritten as:

$$ \dot{q}=\mid \dot{q}\mid {\mathrm{e}}^{\dot{u}\theta }=\mid \dot{q}\mid \left( \cos \theta +\dot{u} \sin \theta \right), $$
(2)

where \( \dot{u}=V\left(\dot{q}\right)/\mid V\left(\dot{q}\right)\mid \) and \( \theta ={ \tan}^{-1}\left(|V\left(\dot{q}\right)|/S\left(\dot{q}\right)\right) \) are the eigenaxis and phase of \( \dot{q} \), respectively. In [18], \( V\left(\dot{q}\right) \) is used to encode an RGB image as follows:

$$ \dot{Q}\left(x,y\right)=iR\left(x,y\right)+jG\left(x,y\right)+kB\left(x,y\right), $$
(3)

where \( \dot{Q}\left(x,y\right) \) is the Quaternionic Representation (QR) of the pixel at location (x, y), and R(x, y), G(x, y), and B(x, y) are the red, green, and blue components of a color image, respectively.

Considering two pixels in a color image, \( {\dot{q}}_1={r}_1i+{g}_1j+{b}_1k \) and \( {\dot{q}}_2={r}_2i+{g}_2j+{b}_2k \), different types of distances can be defined to measure the distance between quaternions. Let \( {D}_t\left({\dot{q}}_1,{\dot{q}}_2\right) \) denote the t-th type of quaternionic distance between \( {\dot{q}}_1 \) and \( {\dot{q}}_2 \). Since the modulus of a quaternion \( \dot{q} \) is nonnegative, the quaternionic distance (QD), denoted as \( {D}_1\left({\dot{q}}_1,{\dot{q}}_2\right) \), can be defined as follows:

$$ {D}_1\left({\dot{q}}_1,{\dot{q}}_2\right)=\mid {\dot{q}}_1-{\dot{q}}_2\mid . $$
(4)
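
To make the quaternionic representation and the distance \( {D}_1 \) concrete, the following minimal Python sketch (our illustration, not code from the paper) stores a quaternion as a 4-vector (real, i, j, k), encodes an RGB pixel per Eq. (3), and computes \( {D}_1 \) per Eq. (4):

```python
import numpy as np

def qr_encode(r, g, b):
    """Quaternionic Representation of an RGB pixel (Eq. (3)):
    a pure quaternion (0, R, G, B) stored as a 4-vector (real, i, j, k)."""
    return np.array([0.0, r, g, b], dtype=float)

def d1(q1, q2):
    """Quaternionic distance D1 (Eq. (4)): the modulus of the quaternion
    difference, i.e. the Euclidean norm of the 4-vector difference."""
    return np.linalg.norm(q1 - q2)

# Example: distance between a pure-red and a pure-green pixel.
print(d1(qr_encode(255, 0, 0), qr_encode(0, 255, 0)))
```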

Let \( \dot{\rho}={e}^{\frac{\pi }{4}\dot{u}} \). By defining the intensity and the chromaticity components of \( {\dot{q}}_1 \) as \( \left(\dot{\rho}{\dot{q}}_1{\dot{\rho}}^{\ast }+{\dot{\rho}}^{\ast }{\dot{q}}_1\dot{\rho}\right)/2 \) and \( \left(\dot{\rho}{\dot{q}}_1{\dot{\rho}}^{\ast }-{\dot{\rho}}^{\ast }{\dot{q}}_1\dot{\rho}\right)/2 \), respectively, Quaternionic distances \( {D}_2\left({\dot{q}}_1,{\dot{q}}_2\right) \) and \( {D}_3\left({\dot{q}}_1,{\dot{q}}_2\right) \) can be expressed as follows [3]:

$$ {D}_2\left({\dot{q}}_1,{\dot{q}}_2\right)=\frac{1}{2}\mid \left(\dot{\rho}{\dot{q}}_1{\dot{\rho}}^{\ast }+{\dot{\rho}}^{\ast }{\dot{q}}_1\dot{\rho}\right)-\left(\dot{\rho}{\dot{q}}_2{\dot{\rho}}^{\ast }+{\dot{\rho}}^{\ast }{\dot{q}}_2\dot{\rho}\right)\mid, $$
(5)
$$ {D}_3\left({\dot{q}}_1,{\dot{q}}_2\right)=\frac{1}{2}\mid \left(\dot{\rho}{\dot{q}}_1{\dot{\rho}}^{\ast }-{\dot{\rho}}^{\ast }{\dot{q}}_1\dot{\rho}\right)-\left(\dot{\rho}{\dot{q}}_2{\dot{\rho}}^{\ast }-{\dot{\rho}}^{\ast }{\dot{q}}_2\dot{\rho}\right)\mid . $$
(6)

By considering the differences between the intensity and chromaticity components simultaneously, the quaternionic distance \( {D}_4\left({\dot{q}}_1,{\dot{q}}_2\right) \) is defined as follows [6]:

$$ {D}_4\left({\dot{q}}_1,{\dot{q}}_2\right)={D}_2\left({\dot{q}}_1,{\dot{q}}_2\right)+{D}_3\left({\dot{q}}_1,{\dot{q}}_2\right). $$
(7)
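
The distances \( {D}_2 \)-\( {D}_4 \) can be sketched directly from Eqs. (5)-(7) once a Hamilton product is available. The helper names below (qmul, conj, RHO) are ours, and the sketch reuses the 4-vector convention from the snippet above:

```python
import numpy as np

def qmul(p, q):
    """Hamilton product of quaternions stored as (real, i, j, k) 4-vectors."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return np.array([
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    ])

def conj(q):
    """Quaternion conjugate (a, -b, -c, -d)."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

# rho = exp((pi/4) * u_dot) with u_dot = (i + j + k) / sqrt(3); |rho| = 1.
_s = np.sin(np.pi / 4) / np.sqrt(3)
RHO = np.array([np.cos(np.pi / 4), _s, _s, _s])

def intensity(q):
    """Intensity component (rho q rho* + rho* q rho) / 2."""
    return (qmul(qmul(RHO, q), conj(RHO)) + qmul(qmul(conj(RHO), q), RHO)) / 2

def chroma(q):
    """Chromaticity component (rho q rho* - rho* q rho) / 2."""
    return (qmul(qmul(RHO, q), conj(RHO)) - qmul(qmul(conj(RHO), q), RHO)) / 2

def d2(q1, q2):
    """Eq. (5): distance between intensity components
    (the 1/2 factor is already inside intensity())."""
    return np.linalg.norm(intensity(q1) - intensity(q2))

def d3(q1, q2):
    """Eq. (6): distance between chromaticity components."""
    return np.linalg.norm(chroma(q1) - chroma(q2))

def d4(q1, q2):
    """Eq. (7): sum of the intensity and chromaticity distances."""
    return d2(q1, q2) + d3(q1, q2)
```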

Let \( \dot{u}=\left(i+j+k\right)/\sqrt{3} \), which represents the gray axis in RGB space (where R = G = B). In [16], by rotating one quaternion towards the gray line \( \dot{u} \), the resulting quaternion \( {\dot{q}}_3 \) should be close to the gray line. If \( {\dot{q}}_1 \) and \( {\dot{q}}_2 \) are close to each other, \( {\dot{q}}_3={\dot{q}}_2=\dot{u}{\dot{q}}_1{\dot{u}}^{\ast }={r}_3i+{g}_3j+{b}_3k \). Then, the quaternionic distance \( {D}_5\left({\dot{q}}_1,{\dot{q}}_2\right) \) can be expressed as follows:

$$ {D}_5\left({\dot{q}}_1,{\dot{q}}_2\right)=\mid \left({r}_3-\mu \right)i+\left({g}_3-\mu \right)j+\left({b}_3-\mu \right)k\mid, $$
(8)

where \( \mu =\left({r}_3+{g}_3+{b}_3\right)/3 \).

By adding the luminance component into the quaternionic distance \( {D}_5\left({\dot{q}}_1,{\dot{q}}_2\right) \), a new quaternionic distance \( {D}_6\left({\dot{q}}_1,{\dot{q}}_2\right) \) is proposed in [17], as follows:

$$ {D}_6\left({\dot{q}}_1,{\dot{q}}_2\right)=\lambda {D}_5\left({\dot{q}}_1,{\dot{q}}_2\right)+\left(1-\lambda \right)\mid L\left({\dot{q}}_1,{\dot{q}}_2\right)\mid, $$
(9)

where \( L\left({\dot{q}}_1,{\dot{q}}_2\right)={k}_1\left({r}_2-{r}_1\right)+{k}_2\left({g}_2-{g}_1\right)+{k}_3\left({b}_2-{b}_1\right) \) is the luminance difference, λ is a weight that balances the chromaticity and luminance components, and \( {k}_1 \), \( {k}_2 \), and \( {k}_3 \) are the weights of the different color channels in the luminance.

An extensive analysis of the above quaternionic distances is beyond the scope of this paper. It should be noted, however, that Lan et al. [18] proved that the quaternionic distances \( {D}_3\left({\dot{q}}_1,{\dot{q}}_2\right) \) and \( {D}_5\left({\dot{q}}_1,{\dot{q}}_2\right) \), although defined from different viewpoints, are equal, i.e., \( {D}_3\left({\dot{q}}_1,{\dot{q}}_2\right)={D}_5\left({\dot{q}}_1,{\dot{q}}_2\right) \).
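
Given this equivalence, a sketch of \( {D}_5 \) and \( {D}_6 \) can lean on d3 from the previous snippet; the values of λ and the channel weights \( {k}_1 \)-\( {k}_3 \) are not fixed by the paper, so the defaults below (λ = 0.5 and Rec. 601 luma weights) are placeholder assumptions:

```python
def d5(q1, q2):
    """Eq. (8); by the proof in [18], D5 equals the chromaticity distance D3."""
    return d3(q1, q2)

def d6(q1, q2, lam=0.5, k=(0.299, 0.587, 0.114)):
    """Eq. (9): blend of the chromaticity distance D5 and a weighted
    luminance difference. lam and k are placeholder values (assumptions)."""
    lum = (k[0] * (q2[1] - q1[1])
           + k[1] * (q2[2] - q1[2])
           + k[2] * (q2[3] - q1[3]))
    return lam * d5(q1, q2) + (1.0 - lam) * abs(lum)
```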

To ensure that the derived features are robust and discriminative, we take Weber's law (WL) into consideration. Weber's law, proposed by the German physiologist Ernst Weber, states that the ratio between the smallest perceptual change in a stimulus (\( \Delta {I}_{\min } \)) and the background level of the stimulus I is a constant. Weber's law can be written as follows:

$$ \frac{\Delta {I}_{\min }}{I}=u $$
(10)

where u is called the Weber fraction and is a constant.

Inspired by Weber's law, the quaternionic distances described above, which can be regarded as increments between two quaternions, can also be used to measure the similarity between quaternions from different viewpoints. That is, the quaternionic increment between quaternions can be computed using the quaternionic distances (QDs). By integrating the QDs with Weber's law, the Quaternionic Distance Based Weber Local Descriptor (QDWLD) can be derived for image feature extraction. Assume that \( {\dot{q}}_c \) denotes the center quaternion in a local patch, and \( {\dot{q}}_l \) (l ∈ L, where L is the index set of the eight neighbors in a 3 × 3 patch) represents the remaining quaternions in the patch. Let \( {\xi}_2^t\left({\dot{q}}_c\right) \) denote the differential feature of QDWLD defined by \( {D}_t \). Then, the total quaternionic increment in a local patch can be written as \( {\sum}_{l\in L}{D}_t\left({\dot{q}}_c,{\dot{q}}_l\right) \). With \( \mid {\dot{q}}_c\mid \) as the quaternionic intensity, \( {\xi}_2^t\left({\dot{q}}_c\right) \) can be represented as follows:

$$ {\xi}_2^t\left({\dot{q}}_c\right)= \arctan \left(\frac{\sum_{l\in L}{D}_t\left({\dot{q}}_c,{\dot{q}}_l\right)}{\mid {\dot{q}}_c\mid}\right). $$
(11)

The nonlinear mapping arctan(⋅) makes \( {\xi}_2^t\left({\dot{q}}_c\right) \) more robust [9, 18]. Note that the denominator \( \mid {\dot{q}}_c\mid \) in Eq. (11) is the Euclidean distance between the center quaternion \( {\dot{q}}_c \) and the origin of the color space. With the differential features \( {\xi}_2^t\left({\dot{q}}_c\right) \) of QDWLD produced using the different \( {D}_t \) (t = 1, 2, 3, 4, 6), the increments in all directions can be captured.
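
A direct (unoptimized) sketch of the differential feature map of Eq. (11), looping over 3 × 3 patches and reusing the distance functions from the snippets above; the small eps guarding against a zero-modulus center pixel is our addition:

```python
import numpy as np

def qdwld_map(img, dist_fn, eps=1e-6):
    """Differential QDWLD feature map (Eq. (11)) for an HxWx3 float RGB
    image, using one of d1, d2, d3, d4, d6 as dist_fn."""
    h, w, _ = img.shape
    q = np.concatenate([np.zeros((h, w, 1)), img], axis=2)  # pure quaternions
    out = np.zeros((h, w))
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            qc = q[y, x]
            # total quaternionic increment over the 8 neighbours (l in L)
            inc = sum(dist_fn(qc, q[y + dy, x + dx])
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                      if (dy, dx) != (0, 0))
            out[y, x] = np.arctan(inc / (np.linalg.norm(qc) + eps))
    return out
```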

Figure 1a shows an input color image, and Fig. 1b-f show its differential feature maps \( {\xi}_2^t\left({\dot{q}}_c\right) \) of QDWLD obtained using \( {D}_t \) (t = 1, 2, 3, 4, 6). The map generated by the 5th QD is not shown since it is equivalent to that of the 3rd QD. As shown in Fig. 1b-f, these differential features reflect the directional information of an image. We then normalize these directional maps and add them to form an integrated holistic directional map Dir. Note that we give the five different quaternionic distances (QDs) equal weights, so the final integrated directional map is the mean of the different QDWLD feature maps obtained using the different \( {D}_t \) (t = 1, 2, 3, 4, 6). Figure 1g shows the fusion of the different directionality maps into an integrated directional map, which can be utilized for saliency detection.
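
The equal-weight fusion into the holistic directional map Dir can then be sketched as the mean of the min-max-normalized feature maps (the normalization choice here is our assumption):

```python
def directional_map(img):
    """Integrated holistic directional map Dir: the mean of the normalized
    QDWLD feature maps for t = 1, 2, 3, 4, 6 (equal weights, as in the text)."""
    def norm(m):
        return (m - m.min()) / (m.max() - m.min() + 1e-12)
    maps = [qdwld_map(img, f) for f in (d1, d2, d3, d4, d6)]
    return sum(norm(m) for m in maps) / len(maps)
```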

Fig. 1 An example of the QDWLD features. a An input color image. b-f The differential feature maps of QDWLD produced by different QDs. g The corresponding integrated directional map

3.2 Object cues – center prior

Spatial position is also a useful cue for saliency detection. The pristine center priors proposed in [2, 7, 22, 24] are generally based on a Gaussian distribution centered at the image center. These typical center cues can strengthen saliency detection in human-captured photographs. However, in many circumstances, a salient object may be far away from the image center, and such priors are then weakened or fail entirely. Considering this, an improved method that chooses the top N bounding boxes detected in an image [4, 43], which estimates the object's spatial position slightly more precisely than the traditional center prior, was proposed in [43] and also employed in [15]. However, extensive experimental results show that, although the improved method [15, 43] estimates the object's spatial position more effectively than the traditional center prior, it is still far from accurate, as shown in Fig. 2b.

Fig. 2 Comparison of center-prior estimation using different methods. a An input color image. b The method in [15, 43]. c The Convex-Hull-Based center prior

To alleviate this problem, we introduce and employ the Convex-Hull-Based center prior instead of the previous one [15, 43]. The convex hull was proposed in [32, 33, 40] to improve the inference of a Bayesian saliency model, achieving robust and promising performance. Its main advantage is that the convex hull of interest points is used to estimate the center of the salient object, rather than directly using the image center, which makes the saliency result more robust to the location of objects. In this paper, we first compute a convex hull that encloses the interest points to locate the salient region. Then, we regard the centroid of the convex hull as the center, yielding the convex-hull-based center prior.

The saliency of a pixel can be defined as follows:

$$ {Cen}_i= \exp \left(-\frac{{\mid \mid {x}_i-{x}_0\mid \mid}^2}{2{\psi}_x^2}-\frac{{\mid \mid {y}_i-{y}_0\mid \mid}^2}{2{\psi}_y^2}\right), $$
(12)

where \( \left({x}_0,{y}_0\right) \) is the centroid of the convex hull, taken as the estimated object center; \( {x}_i \) and \( {y}_i \) are the horizontal and vertical coordinates of pixel i, respectively; and \( {\psi}_x \) and \( {\psi}_y \) denote the horizontal and vertical variances. We model the center prior with a Gaussian distribution centered at this point, and we set \( {\psi}_x={\psi}_y \), with pixel coordinates normalized to [0, 1], in our implementation. The convex-hull-based center prior map is more reasonable and robust because the convex hull provides a rough location of the salient object. With this reliable center prior, the object's spatial position can be estimated more accurately than with traditional center priors, as shown in Fig. 2c.
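
A minimal sketch of the convex-hull-based center prior, assuming scipy is available and that interest points have already been detected (the choice of detector, e.g. a corner detector, is left open by us); the hull centroid is approximated by the mean of the hull vertices:

```python
import numpy as np
from scipy.spatial import ConvexHull

def center_prior(points, h, w, psi=0.2):
    """Convex-Hull-Based center prior (Eq. (12)).
    points: (N, 2) array of interest-point (x, y) pixel coordinates, N >= 3;
    h, w: image size; psi: common variance parameter (placeholder value)."""
    hull = ConvexHull(points)
    cx, cy = points[hull.vertices].mean(axis=0)  # approximate hull centroid
    x0, y0 = cx / w, cy / h                      # normalize to [0, 1]
    xs, ys = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h))
    return np.exp(-((xs - x0) ** 2) / (2 * psi ** 2)
                  - ((ys - y0) ** 2) / (2 * psi ** 2))
```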

3.3 Object cues – color prior

Color is another important cue for saliency detection [5, 14, 43]. For each patch \( {v}_i \) (1 ≤ i ≤ n), a feature vector \( {c}_i \) is computed as its mean color in a color space [43]. A global color contrast with spatial weighting can then be expressed as follows:

$$ {Cor}_i=\sum_{j=1}^n\phi \left({v}_j\right)\varphi \left({v}_i,{v}_j\right)\mid \mid {c}_i-{c}_j\mid \mid, \kern1em i=1,2,\dots, n, $$
(13)

where \( \phi \left({v}_j\right) \) is the total number of pixels in \( {v}_j \). The idea is that regions with more pixels should contribute more than those with fewer pixels. The local contrast weight \( \varphi \left({v}_i,{v}_j\right) \) is set to \( \exp \left(-d\left({v}_i,{v}_j\right)/{\delta}^2\right) \), where \( d\left({v}_i,{v}_j\right) \) is the spatial Euclidean distance between the patch pair \( \left({v}_i,{v}_j\right) \).
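
The global color contrast of Eq. (13) vectorizes naturally over patch pairs; the patch features (mean colors, normalized patch centers, pixel counts) are assumed precomputed, and δ is a placeholder parameter:

```python
import numpy as np

def color_contrast(colors, centers, sizes, delta=0.25):
    """Color prior Cor (Eq. (13)) per patch.
    colors: (n, 3) mean colors c_i; centers: (n, 2) normalized patch centers;
    sizes: (n,) pixel counts phi(v_j); delta: spatial falloff (assumption)."""
    color_dist = np.linalg.norm(colors[:, None, :] - colors[None, :, :], axis=2)
    spatial = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    weight = np.exp(-spatial / delta ** 2)  # varphi(v_i, v_j)
    return (sizes[None, :] * weight * color_dist).sum(axis=1)
```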

3.4 Final saliency fusion

In the final stage of our algorithm, the integrated holistic directional map Dir is fused with the object cues of the center prior map, Cen, and the color prior map, Cor, to produce the final saliency map as follows:

$$ Sal= Dir \cdot \ast\; Cen \cdot \ast\; Cor, $$
(14)

where ⋅∗ is the pixel-by-pixel multiplication operator.
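
The fusion step is thus a pixel-wise product of the three cue maps; normalizing each map to [0, 1] before multiplying is our assumption, made so that no single cue dominates:

```python
def saliency_map(dir_map, cen_map, cor_map):
    """Final saliency map (Eq. (14)): element-wise product of the directional
    map, the center-prior map, and the (pixel-rendered) color-prior map."""
    def norm(m):
        return (m - m.min()) / (m.max() - m.min() + 1e-12)
    return norm(dir_map) * norm(cen_map) * norm(cor_map)
```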

4 Experimental results

In this section, we present experimental results on two publicly available benchmark datasets, MSRA [26] and ECSSD [34], and compare our method with eight other state-of-the-art methods: Itti [13], fuzzy growing (FG) [24], graph-based (GB) [10], multi-scale contrast (MC) [22], spectral residual (SR) [12], linear neighbourhood propagation (LNP) [45], mean-shift filtering (MF) [20], and multi-scale contrast and colour information (MCI) [46]. To quantitatively compare the saliency-detection methods, the average precision, recall, and F-measure are used to measure the quality of the saliency maps. The adaptive threshold is set to twice the average value of the whole saliency map. Each image is segmented into superpixels, and a superpixel is masked out when its mean saliency value is lower than the adaptive threshold. The F-measure is defined as follows:

$$ {F}_{\beta }=\frac{\left(1+{\beta}^2\right)\times \mathrm{Precision}\times \mathrm{Recall}}{\beta^2\times \mathrm{Precision}+\mathrm{Recall}} $$
(15)

where β is a positive real value, with β² set to 0.3 to weight precision more than recall, following [5, 34].
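
For a single image, this evaluation protocol can be sketched as follows; the simplified version below thresholds pixels directly rather than masking superpixels, which is our approximation of the procedure described in the text:

```python
import numpy as np

def evaluate(sal, gt, beta2=0.3):
    """Precision, recall, and F-measure (Eq. (15)) for one saliency map.
    sal: float map in [0, 1]; gt: boolean ground-truth mask;
    beta2: the beta^2 weighting from [5, 34]."""
    mask = sal >= 2.0 * sal.mean()  # adaptive threshold: twice the mean
    tp = np.logical_and(mask, gt).sum()
    precision = tp / max(mask.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f = (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-12)
    return precision, recall, f
```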

4.1 MSRA database

We first evaluate our proposed algorithm on the MSRA database, which contains 5000 images with pixel-wise ground truth and is widely used for saliency detection. Most of its images include only one salient object with high contrast against the background.

Figure 3 shows some saliency-detection results on the MSRA database. It can be seen that the proposed scheme produces reliable and promising results. We also compared the performance of the proposed method with the eight other state-of-the-art saliency-detection methods. Figure 4 shows the precision, recall, and F-measure values of all the methods. From the comparison, we can see that most methods, namely FG [24], GB [10], MC [22], SR [12], LNP [45], MF [20], MCI [46], and our proposed algorithm, achieve results higher than 55%. For the overall F-measure, all the methods except Itti [13] score higher than 50%, and our proposed model achieves the highest performance among the nine state-of-the-art saliency-detection models. Our proposed method thus outperforms the other eight methods in terms of detection accuracy, and the extensive experimental results show that our model is efficient and accurate.

Fig. 3 Some distinctive saliency-detection results using our method on the MSRA database

Fig. 4 Comparison of different saliency-detection methods (Itti [13], FG [24], GB [10], MC [22], SR [12], LNP [45], MF [20], MCI [46], and the proposed method) in terms of average precision, recall, and F-measure on the MSRA dataset

4.2 ECSSD database

We also test the proposed model on the publicly available ECSSD database [34], which includes many semantically meaningful but structurally complex images for performance evaluation. The database contains 1000 images, acquired from the Internet, with ground-truth masks produced by five helpers.

Figure 5 shows some saliency maps generated by our method on the ECSSD database. The results show that our final saliency maps can accurately detect almost the entire salient objects and clearly preserve their contours. As in the experiments on the MSRA database, we use precision, recall, and the F-measure to evaluate the performance of our proposed method. Figure 6 compares the different methods according to these evaluation criteria. As can be seen from Fig. 6, our proposed method outperforms the other eight methods in terms of detection accuracy and achieves the best overall saliency-detection performance, with precision = 62.0%, recall = 63.0%, and F-measure = 62.2%. These experimental results show that the proposed model is efficient and effective.

Fig. 5 Samples of the saliency-detection results on the ECSSD database

Fig. 6 Performance of the proposed model compared with state-of-the-art methods on the ECSSD dataset in terms of average precision, recall, and F-measure

4.3 Model analysis and limitation

In this subsection, we analyze the performance of each individual component of the proposed scheme. Figure 7a shows the performance of each step in our algorithm; we can see that the Quaternionic Distance Based Weber Local Descriptor (QDWLD), the center prior, and the color prior all contribute to the final saliency maps and are complementary to each other. In Fig. 7b, we also analyze the results of the proposed method for the different quaternionic distances (QDs). From the comparison, we can see that each individual quaternionic distance makes its own contribution to the final saliency-detection results. To achieve a robust and better performance, the fusion of QDWLD features from the different quaternionic distances (QDs) is used to represent the final directional cues in an image for saliency detection.

Fig. 7 Model analysis. a Performance of each component in the proposed method. b Results of our method with different quaternionic distances. Both are evaluated on the MSRA dataset

5 Conclusion and discussion

In this paper, we have proposed a new bottom-up method for efficient and accurate saliency detection. In the proposed approach, the integrated holistic directional map generated by QDWLD and the object cues are utilized to estimate the final saliency map. We have evaluated our method on two publicly available datasets, and the experimental results show that our algorithm is effective and efficient.