1 Introduction

Saliency detection also plays important roles in many computer vision applications such as moving object detection [13], tracking [46], and action recognition [7, 8]. In the late 1990s, Itti proposed biological visual attention mechanism and image saliency theory based on the biology for the first time [9]. Itti’s model studied man’s visual attention transfer mechanism when people observe things in the nature. And it obtains saliency maps by using intensity, color, and direction channels of the input image [10]. Then, it fused all saliency maps and obtained ideal detection effect. In recent years, machine vision got great development and was applied to many fields of image processing, artificial intelligence, and others. At the same time, image saliency theory, as an important branch of machine vision, has made great progress. Many new detection methods explain saliency from the perspectives of biological vision, statistics, and information theory.

The paper is organized as follows. In Sect. 2, we briefly review the previous researches that have made a contribution to image saliency theories and target detection methods. Then, we carefully describe our method for image saliency target detection in Sects. 3, 4, 5, and 6. The results of our experiments could be found in Sects. 7 and 8. Finally, we summarize this paper in Sect. 9.

2 Previous related works

The existed saliency researches could be divided into two categories [11, 12]: (1) the bottom–up model based on stimulating and data-driving. Itti’s biological visual saliency model [13] detects salient object by extracting image’s primary characteristics of colors, intensity, and directions. Then, feature maps were multi-scale analyzed by establishing a nine-layer Gaussian pyramid. Finally, feature maps were obtained by using center-around operator. They were fused to obtain finally saliency map. Harel’s saliency detection was based on graph theory [14], and it used the characteristics of Markov random field to construct two-dimensional images of Markov chain on the basis of Itti model. Yang et al. [15] used background and foreground as prior knowledge and quadrilateral node as benchmark for the first round of sorting. The signification maps were binary-transformed and was the second sort benchmark. Finally, saliency map was obtained, and the significant detection model based on graph sort was set up. Hou’s spectral residual method [16] did not need to deal with the color, intensity, and other characteristics, but analyzed images logarithmic spectrum and calculated residual. Then, the final saliency maps were obtained by Fourier inverse transform. Achanta proposed saliency detection method [17] which was based on the images frequency analysis. (2) The top–down model was based on task and knowledge such as Maybank proposed image significant area matching method [18], and Yang and others obtained specific target by dictionary learning conditions and random testing method [19].

Currently, most of saliency models use early visual features to analysis images. This includes colors, texture, intensity, and directions. In this paper, we use the HSV color space to analyze visual color and intensity features. The texture map is obtained by using texture filter which can be found by a MATLAB function called rangefilt. The directions feature is obtained by wavelet transforming. Therefore, wavelet transform has the ability to extract the images’ horizontal, vertical, and diagonal direction information.

In general, the salient object usually locates in the central field of the vision when people observe the surrounding scenery. Based on this, many papers put forward center-priori method to obtain saliency objects. But, in the reality captured images, the most significant things in the off-center position even appear in the image boundary. This leads to the traditional center-priori theory which is not consistent with actual image characteristics. So, this paper gets image’s saliency points by using Harris [20] corner detection algorithm and obtains convex hull of the regional [21]. Then, the area within the convex hull is regarded as false foreground information and the outside is false background information. Finally, we could get an improved center-priori coefficient by using frequency-tuned method.

Shannon information coding theory [22] thinks that any information should contain important part and redundant part, and it is a key problem how to extract the effective information and remove redundant part in the field of modern information code. As a kind of complex information, images can be identified by people directly and divided into sensitive and insensitive parts which correspond to the significant and not-significant parts of images. Wavelet transform has good local information representation ability [23]. Especially in recent years, because of its ability of analysis on spatial and frequency domain in multi-scale transform at the same time, it is concerned by scholars [2427] in the field of visual attention. Wavelet transform can effectively suppress the image’s high-frequency information by analyzing images’ local visual features (colors, texture, directions, intensity), and reserve the significant part. We can obtain local features saliency maps by wavelet transforming. Hou’s spectral residual theory would not analyze specific features. But to get global saliency map, it is necessary for image logarithmic transformation. The two methods have their advantages and disadvantages for different types of images. Therefore, this paper combines local features and global features to get saliency map. To enhance salient detection results, we use improved center-priori knowledge based on frequency-tuned method. The experimental results show that our method is better.

3 The improved center-priori coefficient based on frequency-tuned method

Traditional center-priori theories [28] think the salient object usually located in the center of vision, as shown in Fig. 1a. Therefore, the paper regards it as priori knowledge which is simulated by Gaussian function. Gaussian model can highlight center part and suppress surrounding, as shown in Fig. 2a.

$$\begin{aligned}&G(x)=\frac{1}{\sqrt{2\pi \sigma ^{2}}}e^{-\frac{(x-\mu )^{2}}{2\sigma ^{2}}} \end{aligned}$$
(1)
$$\begin{aligned}&G(x,y)=\frac{1}{2\pi \sigma ^{2}}e^{-\frac{(x-x_0 )^{2}+(y-y_0 )^{2}}{2\sigma ^{2}}} \end{aligned}$$
(2)

Gaussian model can simulate the visual center-priori theory, but the image is two dimensional. Therefore, the two-dimensional Gaussian function is used to implement it, such as Formula (2). Gaussian model can match to the priori knowledge. It is shown in Fig. 2b.

Fig. 1
figure 1

Saliency target images

In fact, the salient region usually deviates from center in many images, as shown in Fig. 1b. Traditional center-priori knowledge and actual situation have a huge deviation, which causes detection results to deteriorate. In response to this problem, the paper proposes Harris corner detection and convex hull theory to get image salient object center coordinate. Harris corner detection algorithm calculates grayscale image curvature distribution and the maximum curvature key angular point. Then, it calculates convex hull geometric center which is regard as the center of the salient object.

Fig. 2
figure 2

Center-priori model

Fig. 3
figure 3

Improved center-priori coefficient

This paper gets key angular point and convex hull in original image, such as Fig. 3a. Then, we could obtain Fig. 3b by binary image segmenting. In Fig. 3b, the white region is convex hull area which could be regarded as false foreground; the black region is false background. We establish a two-dimensional Gaussian model by computing the center coordinates of the convex hull. The model G(xy) is preliminary center-priori coefficient. It is mapped to the two-dimensional space in Fig. 3c. Figure 3f is artificial significant target segmentation map. Compared with Fig. 3b, f, we can find that the false foreground region includes a part of the background information, but in most images, the false background region contained the real background. Based on this, compared to the enter image, the false background region having more similar visual features to the real background region. Therefore, we compute the mean false background feature vector \(I_\mu \) not the enter image and get frequency-tuned coefficient S(xy) by formula (3). \(I_{\omega hc} \) is the corresponding image pixel vector value in the Gaussian blurred version (using a 5*5 separable binomial kernel) of the original image, and \(\Vert \Vert \) is the L2 norm. The frequency-tuned coefficient is mapped to the two-dimensional space in Fig. 3d. Figure 3e is improved by center-priori coefficient \(\Theta (x,y)\) based on frequency-tuned method. It can be obtained by formula(4).

$$\begin{aligned} S(x,y)= & {} \left\| {I_\mu -I_{\omega hc} (x,y)} \right\| \end{aligned}$$
(3)
$$\begin{aligned} \Theta (x,y)= & {} e^{[g(x,y)\times S(x,y)]} \end{aligned}$$
(4)

4 Local features saliency model based on wavelet transform

Wavelet transform can transform features image from spatial domain to frequency domain. And it can break images down into high- and low-frequency parts. The paper detects features of colors, intensity, textures, and direction in HSV color space. To get multi-scale feature image, low-frequency part of each layer needs a secondary decomposition [29]. Each low-frequency feature map is dealt by inverse wavelet transform to get local feature saliency maps. (In this paper, four-layer wavelet decomposition structure is established).

$$\begin{aligned}{}[A_N^c ,H_S^c ,V_S^c ,D_S^c ]=WT_N [I_c (x)] \end{aligned}$$
(5)

The respective characteristics of image are obtained by Formula (5). Then, two-dimensional wavelet reconstruction algorithm is used to reconstruct feature maps in order to get feature saliency maps.

$$\begin{aligned} S_i (x)=g(x)*IWT[c(x)*A_S^{\mathrm{c}} (x)] \end{aligned}$$
(6)

where g(x) is Gaussian filter; c(x) is mean filter.

Finally, four kinds of feature saliency maps are fused by Formula (7) for final local feature saliency maps.

$$\begin{aligned} \mathrm{LsMap}=g(x){*}\left( \sum _{i=1}^4 {k_i {\cdot }S_i } /{\max \{S_i ,0.5\}}\right) \end{aligned}$$
(7)

5 Global features saliency model based on spectral residual

Hou’s image spectral residual analysis method is based on the properties of natural image similarity. It is a statistical method of image saliency detection. It need not analyze the features. Its core idea is to remove redundant information and enhance regions of interest (ROI) in the image. This property is also known as 1/f law. It states that the amplitude A(f) of the averaged Fourier spectrum of the ensemble of natural images obeys a distribution: \(E\{A(f)\}\propto 1/f\). Hou found the average amplitude spectrum logarithmic was local linear by studying a lot of images. The spectral residual is obtained by calculating the difference in images’ amplitude and average logarithm amplitude spectrum. Finally, the global saliency map is obtained by logarithmic inverse transformation of spectral residual [30]. Spectral residual calculation Formula:

$$\begin{aligned} R(f)=\hbox {log}[A(f)]-h_n {*}\hbox {log}[A(f)] \end{aligned}$$
(8)

where \(h_n \) is mean filter.

The steps of Hou’s saliency detection are shown as follows:

  • Step 1 The input images are Fourier-transformed, and their amplitudes A(f) are calculated.

    $$\begin{aligned} A(f)=\mathfrak {R}(F[I(x)]) \end{aligned}$$
    (9)
  • Step 2 Images’ phase spectrums P(f) are calculated.

    $$\begin{aligned} P(f)=\Gamma (F[I(x)]) \end{aligned}$$
    (10)
  • Step 3 Spectral residuals R(f) are calculated by Formula (8).

  • Step 4 Global feature maps GsMap(x) are obtained.

    $$\begin{aligned} \mathrm{GsMap}(x)=g(x)*F^{-1}[\exp (R(f)+P(f))]^{2} \end{aligned}$$
    (11)

6 The model of images saliency detection

Existed researches show [31] that in the process of biology observing around things, the images are equal bandwidth processed by retina. The researchers also found that eyeball is more sensitive to low-frequency elements. Therefore, the low-frequency elements are usually regarded as visual significant goals. The low- and high-frequency information of multi-scale image features is obtained by wavelet decomposition. Local features saliency maps are obtained by removing high-frequency information and reconstructing low-frequency images. It simulates the process of biological vision. However, wavelet transform has good local information representation capability. It is easy to cause redundant information is detected. So, we enhance the saliency target by fusing global feature saliency information and suppressing redundant information. Hou’s method is based on information theory. It uses logarithmic spectrum and some mathematical operation to analysis original images. It effectively removes the image background and obtains saliency maps. The final saliency map is obtained by fusion algorithm (13). Images’ significant detection is shown in Fig. 4. The experimental results show that the precision is higher by fusing local and global features saliency maps.

$$\begin{aligned}&\mathrm{FsMap}=M[\mathrm{LsMap}(x)\times e^{\mathrm{GsMap}(x)}]{*}g(x) \end{aligned}$$
(12)
$$\begin{aligned}&\mathrm{SaliencyMap}=\Theta _{(x)} \times F\hbox {s}Map \end{aligned}$$
(13)

FsMap is non-center-priori saliency map; SaliencyMap is the center-priori saliency map. \(M({\cdot })={({\cdot })^{\ln \sqrt{2}}}/{\sqrt{2}}\) is a fusion algorithm [23].

In general, visual focus is more close to the center of people’s view. But, due to the shooting angle or actual demand, saliency target maybe deviated from center of image and even appear in the image border area. So, the improved center-priori saliency model is built by Harris corner detection and convex hull based on the frequency-tuned method.

Fig. 4
figure 4

Model of saliency detection

7 Experiment results and analysis of the MSRA database

This paper used MSRA database [32] for different measurement model. It contains 1000 RGB images which are used to test the detection results. In addition, the database contains binary images which are marked manually. The significant target is marked as “1.” Background is marked as “0”. In addition, this paper adopts evaluation mechanism that is widely recognized: PR curve, F-Measure chart, ROC curve [33] and AUC value. This paper adopts F-Measure to evaluate the detection performance of local features saliency map, global feature saliency map, non-center-priori fusion model and center-priori model.

Fig. 5
figure 5

Feature saliency maps. a is original map. b is local saliency map. c is global saliency map. d is non-center-priori saliency map. e is final saliency map

Figure 5 contains local saliency map (LsMap), global saliency map (GsMap), saliency map (FsMap), and center-priori saliency map (SaleincyMap). Wavelet transform saliency model can effectively detect significant target and get target outline. But, at the same time, background texture was detected because of its strong local information detection ability. It reduces detection accuracy as shown in Fig. 5b. Hou’s method is almost not influenced by complex texture, but can cause significant target detection which is incomplete as shown in Fig. 5c. Therefore, we fused two kinds of feature maps by Formula (12), and the effect was shown in Fig. 5d. Fusion algorithm makes saliency maps more layered and makes it easier to distinguish between salient object and background. In order to further increase significantly the detection precision, we would enhance saliency target by adopting center-priori method based on the frequency-tuned method. The results show that the background is suppressed and the saliency target is further highlighted by adopting improved center-priori coefficient, such as Fig. 4e.

Fig. 6
figure 6

Features saliency maps PR curve

Figure 5 shows features maps for a few images, and our method is better intuitively. In order to further enhance the persuasiveness, we use 1000 images in the MSRA10K database for the testing experiments and PR curve was obtained. (PR algorithm as Formula (14), the Precision P refers to accuracy; the Recall R is recall rate, it is overlap ratio for significant regional and artificial mark target). In the process of calculation PR parameter, the paper used fixed threshold segmentation algorithm. We need to calculate a set of P, R for each threshold. Finally, the PR curve is obtained as shown in Fig. 6. It shows that our method can improve the effect of detection. The precision is improved effectively by using center-priori coefficient and histogram of the image enhancement process. In addition, the F-Measure chart is a comprehensive evaluation for PR value. We adopted the Otsu threshold segmentation algorithm [34] for getting F-Measure value. The saliency map is needed to be transformed to binary image M from grayscale image and is dealt by Formula (14) with manual marked image G. Figure 7 shows that shooting rate of our approach is higher. And, the comprehensive evaluation PR is better. The method in this paper can highlight saliency target and suppress background information, as shown in Fig 5.

$$\begin{aligned}&\hbox {Precision}=\frac{\left| {M\cap G} \right| }{\left| M \right| }\nonumber \\&\hbox {Recall}=\frac{\left| {M\cap G} \right| }{\left| G \right| },\nonumber \\&F_\mathrm{measure} =\frac{(1+\alpha )\times P\times R}{\alpha \times P+R} \end{aligned}$$
(14)

where M means saliency map to a binary mask, G means ground-truth map. The \(F_\alpha \) is proposed as a weighted harmonic mean of Precision and Recall with a nonnegative weight \(\alpha \). It is often set to 0.3 to raise more importance to the Precision value. The reason for weighting precision more than recall is that recall rate is not as important as precision.

Fig. 7
figure 7

F-Measure histogram

In order to measure the performance of the proposed model, we use ROC curve to measure Itti’s model, wavelet transform method, Hou’s spectral residual, and Achanta’ FT model. The calculations of TPR and FPR are described as Formula (15). AUC is refers to the area under ROC curve. If the AUC value is bigger, the detection result is better. So, Fig. 8 shows that our method is better, but images’ edge is blurry sometimes.

$$\begin{aligned} \mathrm{TPR}=\frac{\left| {M\cap G} \right| }{\left| M \right| } \quad \ \mathrm{FPR}=\frac{\left| {M\cap G} \right| }{\left| {M\cap G} \right| +\left| {\overline{M} \cap \overline{G} } \right| } \end{aligned}$$
(15)

where TPR is true-positive rate, FPR is false-positive rate, and AUC is area under ROC curve.

Fig. 8
figure 8

Detection effects of various models, a Input b Itti [13] c WT [23] d SR [16] e FT [17] f Ours

Figure 9 shows ROC curve of several models which shows that our method is better. The area under ROC curve of our method is maximum, which is shown in Fig. 10.

8 Experimental results and analysis for the CSSD

Although images from MSRA database have a large variety in their content, background structures are primarily simple and smooth. And, in this database, most images contain only a single salient object. To represent the situations that natural images generally fall into, we use Complex Scene Saliency Dataset (CSSD) [35]. It includes many semantically meaningful but also structurally complex images for evaluation. Figures 11, 12, and 13 are the results of the qualitative and quantitative analyses.

Fig. 9
figure 9

ROC curve of several models

Fig. 10
figure 10

AUC value of several models

Fig. 11
figure 11

Detection effects of various models, a Input b Itti [13] c WT [23] d SR [16] e FT [17] f Ours

Figure 11 shows that our model is effective for the complicated scenarios and multi-object saliency detection. Compared with other models, our method is more effective to highlight salient object and suppress background area, but also retain original images’ information instead of a vague brightness area. In addition, Figs. 12 and  13 show that our approach is better.

9 Conclusions

The paper extracts image feature and obtains low-frequency information of image features by wavelet decomposing. Then, the local features saliency maps are obtained by wavelet reconstruction. Wavelet transform has obvious advantage in detecting images texture. But, it is not effective to highlight saliency target. Hou’s method can effectively makes target prominent. But target edge texture information is lack. Therefore, we fuse the two methods to enhance the detection effects. In the end, it is conducive to further highlight the salient object and suppress background by using center-priori based on the frequency-tuned method. The final results show that our model detection effect is better.

Fig. 12
figure 12

PR curve of the several models

Fig. 13
figure 13

F-Measure histogram