
1 Introduction

Visual object tracking remains a challenging task, especially in sequences with deformation and rotation. Mean-shift (MS) tracking performs well and is easy to implement. It tracks by minimizing the Bhattacharyya distance between two probability density functions represented by the target and target-candidate histograms. Because the histogram is a statistical feature that does not depend on the spatial structure within the search window, MS tracking is robust to such appearance changes. However, it lacks an essential template update and ignores each pixel's neighborhood information, which degrades accuracy.

The mean-shift algorithm is a nonparametric mode-seeking method for density functions proposed by Fukunaga and Hostetler [1]. Comaniciu et al. [2, 3] applied it to object tracking and adapted the window size by rerunning the tracker with the window scaled by a constant factor; this has little effect because smaller windows usually yield higher similarity. Collins [4] used image pyramids and an additional mean-shift step for scale selection after estimating the position, but this is slower than the conventional MS algorithm. Vojir et al. [5] proposed a new histogram that exploits the object neighborhood to help discriminate the target, called background ratio weighting (BRW); this approach is faster than the others and performs well on sequences with scale change, but poorly on grayscale sequences.

Gradient information is crucial for appearance representation because it encodes each pixel's neighborhood and is insensitive to illumination variation, yet it is usually ignored. Based on this observation, we present a novel adaptive scale MS algorithm with a gradient histogram. The gradient information is computed with the Canny edge detector [6].

Moreover, the BRW algorithm is used to improve performance on videos with scale change. The target template is updated by linear interpolation only when certain conditions are met, to avoid adding incorrect information; this template update also copes with appearance variation of the target. The proposed tracker is compared with several algorithms, and the experimental results show that it is more robust and accurate.

2 Canny Edge Detector

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. To remove noise, a Gaussian filter is first applied to the image; the Gaussian filter kernel of size (2k + 1) × (2k + 1) is given by:

$$H_{ij} = \frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{\left(i-(k+1)\right)^{2}+\left(j-(k+1)\right)^{2}}{2\sigma^{2}}\right),\quad 1 \le i,\,j \le 2k+1$$
(1)
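As a concrete illustration, a minimal NumPy sketch of the kernel in Eq. (1) could look as follows; the function name and the particular choices of k and σ are ours, not taken from the paper, and in practice the kernel is usually renormalized to sum to one:

```python
import numpy as np

def gaussian_kernel(k=2, sigma=1.4):
    """Build the (2k+1) x (2k+1) Gaussian kernel of Eq. (1)."""
    size = 2 * k + 1
    # i and j run from 1 to 2k+1, as in Eq. (1)
    i, j = np.mgrid[1:size + 1, 1:size + 1]
    H = np.exp(-(((i - (k + 1)) ** 2 + (j - (k + 1)) ** 2) / (2.0 * sigma ** 2)))
    H /= 2.0 * np.pi * sigma ** 2
    return H / H.sum()  # renormalize so filtering preserves image brightness

H = gaussian_kernel(k=2, sigma=1.4)  # a 5 x 5 smoothing kernel
```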

Let \(I_{o}\) denote the original image and \(H\) the Gaussian filter matrix above; the denoised image is \(I = H * I_{o}\). We then extract the intensity gradient of the image with the Sobel operator [7]. The gradients in the horizontal and vertical directions are \(G_{x}\) and \(G_{y}\). From these, the edge gradient magnitude is determined by

$$G = \sqrt{G_{x}^{2} + G_{y}^{2}}$$
(2)

where

$$G_{x} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * I \quad\text{and}\quad G_{y} = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * I.$$
(3)

After Gaussian smoothing and Sobel filtering, the edges extracted from the gradient magnitude \(G\) are still quite blurred. Non-maximum suppression and double thresholding are therefore applied to thin the edges and suppress spurious responses, yielding the gradient image \(I_{g}\) used to track the target.
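For reference, the gradient image \(I_{g}\) described above can be obtained with OpenCV in a few lines. This is only a sketch of one possible realization, not the authors' implementation; the threshold values and the way \(I_{e}\) is assembled below are illustrative assumptions:

```python
import cv2
import numpy as np

def gradient_image(frame, low=50, high=150, sigma=1.4):
    """Gaussian smoothing, Sobel gradients, and Canny edges (Eqs. 1-3)."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    smoothed = cv2.GaussianBlur(gray, (5, 5), sigma)      # Eq. (1)
    gx = cv2.Sobel(smoothed, cv2.CV_64F, 1, 0, ksize=3)   # Eq. (3)
    gy = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)                # Eq. (2)
    # cv2.Canny performs non-maximum suppression and double thresholding
    edges = cv2.Canny(smoothed, low, high)
    return edges, magnitude

# One way to form the combined image I_e of Sect. 3 is to stack I_g as an
# extra channel alongside the original frame, e.g.:
# I_e = np.dstack([frame, gradient_image(frame)[0]])
```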

Fig. 1. Results of six trackers on six sequences (the sequence names from top to bottom are dog1, freeman1, freeman3, jogging, mountainbike, and singer1).

3 The Tracking Algorithm

Unlike the conventional MS tracking algorithm, we append \(I_{g}\) to the original image to obtain \(I_{e}\). After combining the images, we extract the histogram \(\hat{\mathbf{q}}\) from \(I_{e}\). To cope with changes in the size of the target, we use BRW-MS instead of the conventional MS. We obtain \(\hat{\mathbf{q}}\) from:

$$\hat{q}_{u} = C\sum_{i=1}^{N} k\left(\frac{\left(x_{i}^{*1}\right)^{2}}{a^{2}} + \frac{\left(x_{i}^{*2}\right)^{2}}{b^{2}}\right)\delta\left[b\left(\mathbf{x}_{i}^{*}\right) - u\right]$$
(4)

where an ellipsoidal region \(\frac{\left(x_{i}^{*1}\right)^{2}}{a^{2}} + \frac{\left(x_{i}^{*2}\right)^{2}}{b^{2}} < 1\) is used to represent the target in the current frame. The target candidate is given by

$$\hat{p}_{u}\left(\mathbf{y},h\right) = C_{h}\sum_{i=1}^{N} k\left(\frac{\left(y^{1}-x_{i}^{1}\right)^{2}}{a^{2}h^{2}} + \frac{\left(y^{2}-x_{i}^{2}\right)^{2}}{b^{2}h^{2}}\right)\delta\left[b\left(\mathbf{x}_{i}\right) - u\right]$$
(5)

where \(h\) is the scale factor. The location of the target is obtained by

$$\hat{y}_{1}^{1} = \frac{1}{a^{2}}m_{k}^{1}\left(\hat{\mathbf{y}}_{0},h_{0}\right) + \hat{y}_{0}^{1},\qquad \hat{y}_{1}^{2} = \frac{1}{b^{2}}m_{k}^{2}\left(\hat{\mathbf{y}}_{0},h_{0}\right) + \hat{y}_{0}^{2}$$
(6)
$$h_{1} = \left[1 - \frac{A}{K}\right]h_{0} + \frac{1}{h_{0}}\frac{B}{K}$$
(7)

where A is

$$\sum_{i=1}^{N} w_{i}\,k\left(\frac{\left(y_{0}^{1}-x_{i}^{1}\right)^{2}}{a^{2}h_{0}^{2}} + \frac{\left(y_{0}^{2}-x_{i}^{2}\right)^{2}}{b^{2}h_{0}^{2}}\right),$$
(8)

B is

$$\sum_{i=1}^{N} w_{i}\left(\frac{\left(y_{0}^{1}-x_{i}^{1}\right)^{2}}{a^{2}} + \frac{\left(y_{0}^{2}-x_{i}^{2}\right)^{2}}{b^{2}}\right)g\left(\frac{\left(y_{0}^{1}-x_{i}^{1}\right)^{2}}{a^{2}h_{0}^{2}} + \frac{\left(y_{0}^{2}-x_{i}^{2}\right)^{2}}{b^{2}h_{0}^{2}}\right).$$
(9)

Here \(g\left(x\right) = -k^{\prime}\left(x\right)\) is the negative of the derivative of \(k\left(x\right)\). The weight \(w_{i}\) is obtained by

$$w_{i} = \max\left(0, W\right)$$
(10)

where W is

$$W = \left(\sum_{u=1}^{m}\frac{1}{\rho\left[\hat{\mathbf{p}}\left(\hat{\mathbf{y}}_{0},h_{0}\right),\hat{\mathbf{q}}\right]}\sqrt{\frac{\hat{q}_{u}}{\hat{p}_{u}\left(\hat{\mathbf{y}}_{0},h_{0}\right)}} - \frac{1}{\rho\left[\hat{\mathbf{p}}\left(\hat{\mathbf{y}}_{0},h_{0}\right),\widehat{\mathbf{bg}}\right]}\sqrt{\frac{\widehat{bg}_{u}}{\hat{p}_{u}\left(\hat{\mathbf{y}}_{0},h_{0}\right)}}\right)\delta\left[b\left(\mathbf{x}_{i}\right) - u\right]$$
(11)

Here \(\widehat{\mathbf{bg}}\) is the histogram of the background, computed over the neighborhood of the target in the first frame. Let us denote

$$K = \sum_{i=1}^{N} w_{i}\,g\left(\frac{\left(y_{0}^{1}-x_{i}^{1}\right)^{2}}{a^{2}h_{0}^{2}} + \frac{\left(y_{0}^{2}-x_{i}^{2}\right)^{2}}{b^{2}h_{0}^{2}}\right)$$
(12)

and

$$\mathbf{m}_{k}\left(\hat{\mathbf{y}}_{0},h_{0}\right) = \frac{\sum_{i=1}^{N} \mathbf{x}_{i}\, w_{i}\,g\left(\frac{\left(y_{0}^{1}-x_{i}^{1}\right)^{2}}{a^{2}h_{0}^{2}} + \frac{\left(y_{0}^{2}-x_{i}^{2}\right)^{2}}{b^{2}h_{0}^{2}}\right)}{K} - \hat{\mathbf{y}}_{0}$$
(13)
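To make Eqs. (4)-(13) concrete, the sketch below runs one BRW mean-shift iteration with an Epanechnikov profile, so that \(g(x) = 1\) inside the ellipse and 0 outside. For brevity the histogram is built over a single quantized channel rather than the combined image \(I_{e}\), the update keeps the \(1/a^{2}\) and \(1/b^{2}\) factors of Eq. (6), and all names, bin counts, and numerical guards are our own illustrative choices rather than the authors' code:

```python
import numpy as np

NBINS = 16   # quantized intensity bins (illustrative choice)
EPS = 1e-12  # numerical guard

def weighted_hist(img, cy, cx, a, b, h=1.0):
    """Kernel-weighted histogram over the ellipse, Eqs. (4)/(5)."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    d = ((xs - cx) / (a * h)) ** 2 + ((ys - cy) / (b * h)) ** 2
    k = np.clip(1.0 - d, 0.0, None)                   # Epanechnikov profile
    bins = img.astype(int) * NBINS // 256
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=NBINS)
    return hist / max(hist.sum(), EPS)

def brw_ms_step(img, q, bg, y0, h0, a, b):
    """One position and scale update following Eqs. (6)-(13)."""
    cy, cx = y0
    p = weighted_hist(img, cy, cx, a, b, h0)          # Eq. (5)
    rho_q = max(np.sum(np.sqrt(p * q)), EPS)          # Bhattacharyya coefficients
    rho_bg = max(np.sum(np.sqrt(p * bg)), EPS)
    w_u = np.maximum(0.0, np.sqrt(q / (p + EPS)) / rho_q
                          - np.sqrt(bg / (p + EPS)) / rho_bg)   # Eqs. (10)-(11)
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]].astype(float)
    d0 = ((xs - cx) / (a * h0)) ** 2 + ((ys - cy) / (b * h0)) ** 2
    g = (d0 < 1.0).astype(float)                      # g(x) = -k'(x) = 1 inside
    w = w_u[img.astype(int) * NBINS // 256]           # per-pixel weights w_i
    K = max((w * g).sum(), EPS)                       # Eq. (12)
    A = (w * np.clip(1.0 - d0, 0.0, None)).sum()      # Eq. (8)
    B = (w * (((xs - cx) / a) ** 2 + ((ys - cy) / b) ** 2) * g).sum()  # Eq. (9)
    mx = (w * g * xs).sum() / K - cx                  # Eq. (13), x component
    my = (w * g * ys).sum() / K - cy                  # Eq. (13), y component
    y1 = (cy + my / b ** 2, cx + mx / a ** 2)         # Eq. (6)
    h1 = (1.0 - A / K) * h0 + B / (K * h0)            # Eq. (7)
    return y1, max(h1, 0.1)                           # keep the scale positive
```

In practice this step is iterated from \(\hat{\mathbf{y}}_{0}, h_{0}\) until the position shift falls below a small threshold, as in conventional mean-shift tracking.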

Once the location is determined, \(\hat{\mathbf{q}}\) is updated by

$$\hat{\mathbf{q}}_{\mathrm{new}} = \begin{cases} \hat{\mathbf{q}}_{\mathrm{old}} & \text{if } \rho\left[\hat{\mathbf{p}}\left(\mathbf{y}\right),\hat{\mathbf{q}}_{\mathrm{old}}\right] \le \alpha \\ \left(1-\lambda\right)\hat{\mathbf{q}}_{\mathrm{old}} + \lambda\,\hat{\mathbf{p}}\left(\mathbf{y}\right) & \text{if } \rho\left[\hat{\mathbf{p}}\left(\mathbf{y}\right),\hat{\mathbf{q}}_{\mathrm{old}}\right] > \alpha \end{cases}$$
(14)
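A direct transcription of the update rule (14) is sketched below; the values of α and λ are placeholders chosen for illustration, not the parameters used in the experiments:

```python
import numpy as np

def update_template(q_old, p_y, alpha=0.5, lam=0.1):
    """Template update of Eq. (14): blend the candidate into the template
    only when the Bhattacharyya coefficient exceeds the threshold alpha."""
    rho = np.sum(np.sqrt(p_y * q_old))        # Bhattacharyya coefficient
    if rho <= alpha:
        return q_old                          # likely occlusion or drift: keep the old model
    q_new = (1.0 - lam) * q_old + lam * p_y   # linear interpolation
    return q_new / q_new.sum()                # keep it a valid histogram
```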

4 Experiment

Experiments are conducted on sequences from the Object Tracking Benchmark 2013 (OTB2013) dataset [8]. Besides deformation, the OTB2013 sequences exhibit other challenges such as fast motion, background clutter, and motion blur. We selected six sequences from OTB2013 to present the results and compared the proposed algorithm with conventional and state-of-the-art trackers whose source code is available: the conventional mean-shift algorithm [2], ASMS [5], OAB [9], LOT [10], and CSK [11]. The parameters of these algorithms are left at their default values.

Figure 1 shows the results of the six trackers on these sequences. The distance precision (DP) rate, overlap success (OS) rate, and center location error (CLE) are reported in Table 1.
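For reference, DP, OS, and CLE are the standard OTB quantities. A minimal sketch of how such scores are typically computed is given below; the 20-pixel DP threshold and 0.5 overlap threshold are the usual OTB defaults and are assumed here rather than stated in the paper:

```python
import numpy as np

def center_error(pred, gt):
    """CLE: distance between predicted and ground-truth box centers; boxes are (x, y, w, h)."""
    return np.hypot(pred[0] + pred[2] / 2 - gt[0] - gt[2] / 2,
                    pred[1] + pred[3] / 2 - gt[1] - gt[3] / 2)

def overlap(pred, gt):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    x1, y1 = max(pred[0], gt[0]), max(pred[1], gt[1])
    x2 = min(pred[0] + pred[2], gt[0] + gt[2])
    y2 = min(pred[1] + pred[3], gt[1] + gt[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = pred[2] * pred[3] + gt[2] * gt[3] - inter
    return inter / union if union > 0 else 0.0

def sequence_scores(preds, gts, dp_thresh=20.0, os_thresh=0.5):
    """DP rate, OS rate, and mean CLE over one sequence."""
    cle = np.array([center_error(p, g) for p, g in zip(preds, gts)])
    iou = np.array([overlap(p, g) for p, g in zip(preds, gts)])
    return (cle <= dp_thresh).mean(), (iou >= os_thresh).mean(), cle.mean()
```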

In general, the gradient histogram improves the performance of the MS algorithm. The data in Table 1 show that the proposed algorithm scores higher than the others in DP, OS, and CLE, which indicates that our tracker is more accurate. The sequence dog1 undergoes scale change, and the results show that the standard MS tracker fails because it relies only on gray-level histograms and lacks a template update; our tracker handles this case well thanks to the gradient histogram and the BRW algorithm. The proposed tracker also outperforms the others on singer1, which undergoes illumination variation, because the gradient histogram depends on differences between adjacent pixels rather than on absolute pixel values. Moreover, our tracker shows a larger improvement over ASMS on grayscale sequences than on color ones, since ASMS can draw only on the single gray channel, which carries limited information.

Table 1. Scores of six trackers in six sequences

5 Conclusion

In this paper, an adaptive scale mean-shift algorithm with a gradient histogram has been proposed to improve the tracking performance of MS-like algorithms. The gradient histogram combines the color histogram with the gradient feature computed by the Canny edge detector. To handle scale change, the BRW algorithm is adopted. Template update copes with appearance variation: the template is updated only when the Bhattacharyya coefficient between the current frame and the template exceeds a threshold, which makes the tracker more robust to incorrect information. The proposed tracker is compared with several algorithms on the OTB2013 dataset, and the experimental results show its effectiveness under deformation, rotation, scale change, and illumination variation. Moreover, our tracker performs better on grayscale sequences than conventional MS-like algorithms.