
1 Introduction

Target tracking and locating is a flourishing research field in computer vision, with applications in many areas such as pedestrian tracking [1], security surveillance, intelligent transportation, and the military. However, restricted by complex environments, positioning accuracy, and device factors, target tracking and locating methods still need further improvement.

Tracking and locating research can be divided into two parts: target tracking and target positioning. In target tracking, no single visual feature can describe the target information completely. Tracking algorithms based on a single feature are therefore often limited by the practical application environment. By exploiting the intrinsic differences between features, a feasible solution can be obtained through multi-feature fusion [2] to improve the accuracy and robustness of the tracking algorithm. In target positioning research, the commonly used methods include the global positioning system (GPS), Bluetooth, radio frequency identification (RFID), infrared, and computer vision. The GPS signal is easily weakened indoors or in densely constructed areas. Infrared positioning needs to emit modulated infrared rays and supports only line-of-sight transmission because the rays cannot pass through obstacles. In addition, most positioning technologies require expensive receiving terminal devices and relevant communication sensors on the targets, which limits their wide use. Traditional positioning technology [3] based on computer vision can be divided into three sub-processes, i.e., stereo matching, three-dimensional reconstruction, and calculation of spatial coordinates. Because stereo matching is time-consuming in extracting feature points and suffers from mismatched feature points, the real-time performance and robustness of the algorithm are reduced.

To solve this problem, an algorithm combining an adaptively weighted particle filter with multi-feature fusion is presented, based on the fusion of observation probability densities in the particle filter. In the tracking model, the color histogram [4], local binary pattern (LBP), and edge feature are chosen to describe the target area. The color histogram, LBP, and edge feature describe the color distribution of the target area, the texture information, and the association between surrounding pixels and target edge information, respectively. In the fusion strategy, the adaptive weights are obtained by calculating the probability density distributions of the corresponding features for the target area and the background area. The adaptively weighted particle filter based on multi-feature fusion not only focuses on the similarity between the candidate area and the target area, but also considers the distinction between the target region and the background region. The proposed tracking algorithm enhances the target tracking precision and estimates the centroid position of the moving target in the image, which avoids the complex stereo matching process in visual positioning. On this basis, the spatial location of the target can be calculated using binocular vision technology, and the performance of the proposed algorithm is verified by a practical experimental test.

The paper is organized as follows. The feature extraction is presented in Sect. 2. Section 3 focuses on the target tracking with multi-feature fusion. Section 4 states how to calculate the spatial location. Section 5 gives the experiment analysis. Finally, we draw some conclusions and shed light on future work in Sect. 6.

2 Target Feature Extraction

In image sequences, there are many typical features that can be used for target tracking. Because each feature has its own special properties and a different sensitivity to the environment, the choice of features is a significant problem. Taking the distinguishability, stability, and independence of the features into account, the color histogram, LBP, and edge feature are chosen as the target visual features in this paper.

2.1 Color Histogram

The color histogram describes the color distribution in the form of a histogram by computing the value of each pixel in the target area. To increase the reliability of the distribution, a kernel function is used to compute a weight for every pixel of the target. The target model is then described as follows:

$$ q_{\text{color}} (u) = f\frac{1}{N}\sum\limits_{i = 1}^{N} {k\left( {\left\| {\frac{{x_{i} - x_{0} }}{h}} \right\|} \right)} \delta \left[ {b\left( {x_{i} } \right) - u} \right],\quad u = 1,2, \ldots ,m $$
(2.1)
$$ k(x) = \left\{ {\begin{array}{*{20}l} {1 - x^{2} } \hfill & {x \le 1} \hfill \\ 0 \hfill & {\text{others}} \hfill \\ \end{array} } \right. $$
(2.2)
$$ f = 1\Bigg/\sum\limits_{i = 1}^{N} {k\left( {\left\| {\frac{{x_{i} - x_{0} }}{h}} \right\|} \right)} $$
(2.3)

where \( k(x) \) is the weighted kernel function, \( \delta \) is the Kronecker delta function, f is the normalization factor, and h is defined by \( h = \sqrt {h_{x}^{2} + h_{y}^{2} } \), in which \( h_{x} \) and \( h_{y} \) are the half width and half height of the target area, respectively.
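
For illustration, a minimal Python sketch of the kernel-weighted color histogram of Eqs. (2.1)–(2.3) is given below. It is a sketch under our own assumptions: the function name, the bin count m, and the quantization of pixel values into bins are illustrative, and the histogram is normalized to sum to one, which is the intent of the normalization factor f.

```python
import numpy as np

def kernel_weighted_histogram(patch, m=16):
    """Kernel-weighted color histogram of a target patch (Eqs. 2.1-2.3).

    `patch` is an (H, W) array of gray/quantized color values in [0, 255];
    `m` is the number of histogram bins. Names are illustrative only.
    """
    H, W = patch.shape
    y0, x0 = (H - 1) / 2.0, (W - 1) / 2.0        # patch center x_0
    h = np.hypot(W / 2.0, H / 2.0)               # bandwidth h = sqrt(h_x^2 + h_y^2)

    ys, xs = np.mgrid[0:H, 0:W]
    r = np.hypot(xs - x0, ys - y0) / h           # ||(x_i - x_0) / h||
    k = np.where(r <= 1.0, 1.0 - r**2, 0.0)      # kernel k(x) of Eq. (2.2)

    bins = (patch.astype(np.int64) * m) // 256   # b(x_i): bin index of pixel x_i
    q = np.bincount(bins.ravel(), weights=k.ravel(), minlength=m)
    return q / max(q.sum(), 1e-12)               # normalize so that sum_u q(u) = 1

# usage (hypothetical): q_color = kernel_weighted_histogram(gray_patch, m=16)
```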

2.2 Local Binary Pattern

The LBP operator [5] is a powerful image measure that can extract the texture feature of a target easily. It labels each pixel in an image by comparing its neighborhood with the center pixel value and interpreting the result as a binary number (binary pattern). The LBP operator is described by

$$ {\text{LBP}}\left( {x_{\text{c}} ,y_{\text{c}} } \right) = \sum\limits_{p = 0}^{{{P} - 1}} {2^{p} s\left( {i_{\text{p}} - i_{\text{c}} } \right)} $$
(2.4)

where \( i_{\text{c}} \) denotes the gray value of the center pixel \( (x_{\text{c}} ,y_{\text{c}} ) \), \( i_{p} \) denotes the gray value of the pth neighboring pixel, and P is the number of neighboring pixels. The function \( s(x) \) is defined as follows

$$ s(x) = \left\{ {\begin{array}{*{20}l} 1 \hfill & {\text{if}} \hfill & {x \ge 0} \hfill \\ 0 \hfill & {\text{if}} \hfill & {x < 0} \hfill \\ \end{array} } \right. $$
(2.5)

After calculating the LBP value of each pixel according to Eq. (2.4), the normalized texture histogram of the LBP feature is given by

$$ q_{\text{texture}} (u) = \frac{1}{N}\sum\limits_{i = 1}^{N} {\delta \left[ {b(x_{i} ) - u} \right]} $$
(2.6)

where \( b(x_{i} ) \) is the index function, returning the histogram bin number of pixel \( x_{i} \), and N is the total number of pixels in the target area.
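
A minimal sketch of the basic 8-neighborhood LBP operator of Eq. (2.4) and the normalized texture histogram of Eq. (2.6) might look as follows; the neighborhood ordering and the 256-bin histogram are our assumptions, not the authors' implementation.

```python
import numpy as np

def lbp_texture_histogram(gray, m=256):
    """8-neighbour LBP (Eq. 2.4) and its normalized histogram (Eq. 2.6)."""
    g = gray.astype(np.int16)
    c = g[1:-1, 1:-1]                              # center pixels i_c
    # the 8 neighbours, enumerated clockwise from the top-left corner (assumed order)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    lbp = np.zeros_like(c, dtype=np.int64)
    for p, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        lbp += (neigh >= c).astype(np.int64) << p  # s(i_p - i_c) * 2^p
    hist = np.bincount(lbp.ravel(), minlength=m).astype(float)
    return hist / max(hist.sum(), 1e-12)           # q_texture(u), Eq. (2.6)

# usage (hypothetical): q_texture = lbp_texture_histogram(gray_patch)
```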

2.3 Edge Feature

The edge feature [6] can be described by the histogram of oriented gradients. In a gray image, the horizontal edge \( G_{x} (x_{i} ) \) and vertical edge \( G_{y} (x_{i} ) \) can be acquired by applying edge detection operators in the X and Y directions, respectively. The gradient magnitude \( G(x_{i} ) \) and orientation \( \theta \) of each pixel are defined as

$$ \begin{aligned} G(x_{i} ) & = \sqrt {G_{x}^{2} (x_{i} ) + G_{y}^{2} (x_{i} )} \\ \theta & = \arctan \left( {\frac{{G_{y} (x_{i} )}}{{G_{x} (x_{i} )}}} \right) \\ \end{aligned} $$
(2.7)

The weighted oriented gradient histogram is calculated by dividing the orientation into m bins. The histogram of oriented gradient is given by

$$ q_{\text{edge}} (u) = C\sum\limits_{i = 1}^{N} {k\left( {\left\| {\frac{{x_{i} - x_{0} }}{h}} \right\|} \right)} G(x_{i} )\delta \left[ {b(x_{i} ) - u} \right] $$
(2.8)

where the kernel function \( k(x) \) is given in Eq. (2.2), and the normalization factor C is defined as

$$ C = 1\Bigg/\sum\limits_{i = 1}^{N} {G(x_{i} )} $$
(2.9)
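
For completeness, the weighted histogram of oriented gradients of Eqs. (2.7)–(2.9) can be sketched as follows. Central differences are used here as the edge detection operator and the orientation is folded into [0, π) over m bins; these choices, and the normalization of the histogram to unit sum, are assumptions made for illustration.

```python
import numpy as np

def oriented_gradient_histogram(gray, m=9):
    """Kernel- and magnitude-weighted orientation histogram (Eqs. 2.7-2.9)."""
    g = gray.astype(float)
    gx = np.zeros_like(g)
    gy = np.zeros_like(g)
    gx[:, 1:-1] = g[:, 2:] - g[:, :-2]            # horizontal edge G_x
    gy[1:-1, :] = g[2:, :] - g[:-2, :]            # vertical edge G_y
    mag = np.hypot(gx, gy)                        # gradient magnitude G(x_i)
    theta = np.mod(np.arctan2(gy, gx), np.pi)     # orientation folded into [0, pi)

    H, W = g.shape
    y0, x0 = (H - 1) / 2.0, (W - 1) / 2.0
    h = np.hypot(W / 2.0, H / 2.0)
    ys, xs = np.mgrid[0:H, 0:W]
    r = np.hypot(xs - x0, ys - y0) / h
    k = np.where(r <= 1.0, 1.0 - r**2, 0.0)       # same kernel as Eq. (2.2)

    bins = np.minimum((theta / np.pi * m).astype(int), m - 1)
    q = np.bincount(bins.ravel(), weights=(k * mag).ravel(), minlength=m)
    return q / max(q.sum(), 1e-12)                # normalized q_edge(u)

# usage (hypothetical): q_edge = oriented_gradient_histogram(gray_patch, m=9)
```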

3 Target Tracking with Adaptively Multi-feature Fusion

3.1 Multi-feature Fusion

After extracting the target features, the multi-feature fusion is realized by a Gaussian weighting strategy. Given the target state \( x_{k} \) at time \( k \), the overall observation likelihood is calculated by

$$ p(z_{k} \left| {x_{k} } \right.) = \exp \left\{ { - \frac{1}{2}\left( {\alpha \frac{{d_{\text{color}}^{2} }}{{\sigma_{\text{c}}^{2} }} + \beta \frac{{d_{\text{texture}}^{2} }}{{\sigma_{\text{t}}^{2} }} + \gamma \frac{{d_{\text{edge}}^{2} }}{{\sigma_{\text{e}}^{2} }}} \right)} \right\} $$
(3.10)

where \( \sigma_{\text{c}}^{2} ,\sigma_{\text{t}}^{2} \) and \( \sigma_{\text{e}}^{2} \) denote the noise variances of the color histogram, LBP feature, and edge feature, respectively; \( d_{\text{color}} ,d_{\text{texture}} \) and \( d_{\text{edge}} \) denote the similarity distances of the color histogram, LBP feature, and edge feature between the target area and the candidate area; and \( \alpha ,\beta \) and \( \gamma \) are the weights of the color, texture, and edge features, respectively.

The similarity distance \( d_{\text{f}} \) is calculated from the Bhattacharyya coefficient. In the current frame, assuming that the feature distribution of the candidate area is \( p_{\text{f}} \) and that of the target area is \( q_{\text{f}} \), the similarity between the candidate area and the target area is described as follows

$$ \rho_{\text{f}} \left( {q_{\text{f}} ,p_{\text{f}} } \right) = \sum\limits_{u = 1}^{m} {\sqrt {q_{\text{f}} (u)p_{\text{f}} (u)} } , \quad {\text{f}}\, \in \left\{ {\begin{array}{*{20}c} {{\text{color}},} & {\text{texture,}} & {\text{edge}} \\ \end{array} } \right\} $$
(3.11)

and the corresponding similarity distance is

$$ d_{\text{f}} = \sqrt {1 - \rho_{\text{f}} (q_{\text{f}} ,p_{\text{f}} )} ,\quad {\text{f}} \, \in \left\{ {\begin{array}{*{20}c} {{\text{color}},} & {{\text{texture}},} & {\text{edge}} \\ \end{array} } \right\} $$
(3.12)

A smaller value of \( d_{\text{f}} \) indicates a higher similarity of the feature between the candidate area and the target area, and the particle then receives a larger weight in Eq. (3.10).
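
A short sketch of this fusion strategy, assuming all feature histograms are already normalized, is given below; the dictionary-based interface and function names are ours, chosen only for illustration.

```python
import numpy as np

def bhattacharyya_distance(q, p):
    """d_f = sqrt(1 - rho_f(q_f, p_f)), Eqs. (3.11)-(3.12)."""
    rho = np.sum(np.sqrt(q * p))
    return np.sqrt(max(1.0 - rho, 0.0))

def fused_likelihood(target, candidate, weights, sigmas):
    """Gaussian-weighted multi-feature observation likelihood, Eq. (3.10).

    `target` and `candidate` map the feature names to normalized histograms,
    `weights` holds (alpha, beta, gamma) and `sigmas` the noise std-devs.
    """
    expo = 0.0
    for f in ("color", "texture", "edge"):
        d = bhattacharyya_distance(target[f], candidate[f])
        expo += weights[f] * d**2 / sigmas[f]**2
    return np.exp(-0.5 * expo)
```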

3.2 Adaptive Weights

If the multiple features are combined with equal or fixed weights without discrimination, the target tracking can be seriously disturbed by features that are sensitive to a varying environment. In this paper, an evaluation function reflecting the discriminative ability of each feature is introduced to calculate the feature weights [7].

Based on the intraclass and interclass variances, the evaluation function for the effectiveness of a feature is defined as follows:

$$ \text{var} \left( {q_{\text{f}} ; \, \mu_{\text{f}}^{\text{t}} ,\mu_{\text{f}}^{\text{b}} } \right) = \frac{{w_{\text{f}}^{\text{t}} w_{\text{f}}^{b} \left( {\mu_{\text{f}}^{\text{t}} - \mu_{\text{f}}^{\text{b}} } \right)^{2} }}{{w_{\text{f}}^{\text{t}} \sigma_{\text{ft}}^{2} + w_{\text{f}}^{\text{b}} \sigma_{\text{fb}}^{2} }},\quad {\text{f}}\, \in \left\{ {\begin{array}{*{20}c} {\text{color,}} & {\text{texture,}} & {\text{edge}} \\ \end{array} } \right\} $$
(3.13)

where \( \mu_{\text{f}}^{\text{t}} \) and \( \mu_{\text{f}}^{\text{b}} \) denote the means of the corresponding feature values in the target and background areas; \( w_{\text{f}}^{\text{t}} \) and \( w_{\text{f}}^{\text{b}} \) denote the weights of the corresponding feature values in the target and background areas; and \( \sigma_{\text{ft}}^{2} \) and \( \sigma_{\text{fb}}^{2} \) denote the variances of the corresponding feature values in the target and background areas. In Eq. (3.13), the numerator is the interclass variance and the denominator is the weighted sum of the intraclass variances. According to pattern recognition theory, the larger the interclass variance and the smaller the intraclass variance, the more robust the discrimination between target and background.

As mentioned above, the weights can be calculated by

$$ \left\{ {\begin{array}{*{20}l} {\alpha = \frac{{\text{var} \left( {q_{\text{color}} ; \, \mu_{\text{color}}^{\text{t}} ,\mu_{\text{color}}^{\text{b}} } \right)}}{{\sum {\text{var} \left( {q_{\text{f}} ; \, \mu_{\text{f}}^{\text{t}} ,\mu_{\text{f}}^{\text{b}} } \right)} }}} \hfill \\ {\beta = \frac{{\text{var} \left( {q_{\text{texture}} ; \, \mu_{\text{texture}}^{\text{t}} ,\mu_{\text{texture}}^{\text{b}} } \right)}}{{\sum {\text{var} \left( {q_{\text{f}} ; \, \mu_{\text{f}}^{\text{t}} ,\mu_{\text{f}}^{\text{b}} } \right)} }}} \hfill \\ {\gamma = \frac{{\text{var} \left( {q_{\text{edge}} ; \, \mu_{\text{edge}}^{\text{t}} ,\mu_{\text{edge}}^{\text{b}} } \right)}}{{\sum {\text{var} \left( {q_{\text{f}} ; \, \mu_{\text{f}}^{\text{t}} ,\mu_{\text{f}}^{\text{b}} } \right)} }}} \hfill \\ \end{array} } \right.,\quad f = \left\{ {\begin{array}{*{20}c} {\text{color,}} & {\text{texture,}} & {\text{edge}} \\ \end{array} } \right\} $$
(3.14)
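
The adaptive weights of Eqs. (3.13)–(3.14) could be computed as in the following sketch, where the class weights \( w_{\text{f}}^{\text{t}} \) and \( w_{\text{f}}^{\text{b}} \) are taken as the relative sample counts of the target and background regions; this is one plausible reading of the evaluation function rather than the authors' exact procedure.

```python
import numpy as np

def feature_score(target_values, background_values):
    """Variance-ratio score of one feature, in the spirit of Eq. (3.13)."""
    nt, nb = len(target_values), len(background_values)
    wt, wb = nt / (nt + nb), nb / (nt + nb)        # assumed class weights
    mt, mb = np.mean(target_values), np.mean(background_values)
    vt, vb = np.var(target_values), np.var(background_values)
    return (wt * wb * (mt - mb) ** 2) / max(wt * vt + wb * vb, 1e-12)

def adaptive_weights(scores):
    """Normalize the per-feature scores into (alpha, beta, gamma), Eq. (3.14)."""
    total = max(sum(scores.values()), 1e-12)
    return {f: s / total for f, s in scores.items()}

# usage (hypothetical):
# scores = {f: feature_score(tgt[f], bkg[f]) for f in ("color", "texture", "edge")}
# weights = adaptive_weights(scores)   # weights["color"] plays the role of alpha, etc.
```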

3.3 Tracking Algorithm Based on Particle Filter

This paper uses the color histogram, LBP texture, and histogram of oriented gradients to describe the target information and embeds them into the particle filter tracking framework.

The diagram of the tracking algorithm is shown in Fig. 1.

Fig. 1 The structure of the tracking algorithm based on particle filter
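
To make the structure in Fig. 1 concrete, one iteration of the particle filter with the fused observation model could be sketched as follows. The random-walk motion model, the systematic resampling step, and all function names (extract_features, and fused_likelihood from the sketch in Sect. 3.1) are illustrative assumptions, not the authors' code.

```python
import numpy as np

def particle_filter_step(particles, frame, extract_features, target_model,
                         weights, sigmas, motion_std=8.0):
    """One iteration of the multi-feature particle filter tracker (Fig. 1).

    `particles` is an (N, 2) array of candidate centers; `extract_features`
    returns the three normalized histograms of a candidate region.
    """
    N = len(particles)
    # 1) propagate particles with a simple random-walk motion model
    particles = particles + np.random.normal(0.0, motion_std, particles.shape)
    # 2) weight each particle by the fused observation likelihood, Eq. (3.10)
    w = np.array([fused_likelihood(target_model,
                                   extract_features(frame, p),
                                   weights, sigmas) for p in particles])
    w = w / max(w.sum(), 1e-12)
    # 3) state estimate: weighted mean of the particle positions
    estimate = (particles * w[:, None]).sum(axis=0)
    # 4) systematic resampling against particle degeneracy
    positions = (np.arange(N) + np.random.rand()) / N
    idx = np.minimum(np.searchsorted(np.cumsum(w), positions), N - 1)
    return particles[idx], estimate
```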

4 Target Positioning Based on Binocular Vision

After robust tracking, the centroid coordinates of the target in the binocular video sequences can be acquired. On this basis, the spatial location of the target can be calculated using binocular vision technology [8].

In a stereo vision system, an arbitrary point \( P(X_{\text{w}} ,Y_{\text{w}} ,Z_{\text{w}} ) \) is projected onto the left and right images at \( P_{\text{L}} (u_{\text{l}} ,v_{\text{l}} ) \) and \( P_{\text{R}} (u_{\text{r}} ,v_{\text{r}} ) \), respectively. Assume that the projection matrices \( M_{\text{L}} \) and \( M_{\text{R}} \) denote the geometric relationships from P to \( P_{\text{L}} \) and from P to \( P_{\text{R}} \), respectively; then the model is given as follows

$$ z_{\text{cl}} \left[ {\begin{array}{*{20}c} {u_{\text{l}} } \\ {v_{\text{l}} } \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {m_{11}^{\text{l}} } & {m_{12}^{\text{l}} } & {m_{13}^{\text{l}} } & {m_{14}^{\text{l}} } \\ {m_{21}^{\text{l}} } & {m_{22}^{\text{l}} } & {m_{23}^{\text{l}} } & {m_{24}^{\text{l}} } \\ {m_{31}^{\text{l}} } & {m_{32}^{\text{l}} } & {m_{33}^{\text{l}} } & {m_{34}^{\text{l}} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{\text{w}} } \\ {Y_{\text{w}} } \\ {Z_{\text{w}} } \\ 1 \\ \end{array} } \right] = M_{\text{L}} \left[ {\begin{array}{*{20}c} {X_{\text{w}} } \\ {Y_{\text{w}} } \\ {Z_{\text{w}} } \\ 1 \\ \end{array} } \right] $$
(4.15)
$$ z_{\text{cr}} \left[ {\begin{array}{*{20}c} {u_{\text{r}} } \\ {v_{\text{r}} } \\ 1 \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {m_{11}^{\text{r}} } & {m_{12}^{\text{r}} } & {m_{13}^{\text{r}} } & {m_{14}^{\text{r}} } \\ {m_{21}^{\text{r}} } & {m_{22}^{\text{r}} } & {m_{23}^{\text{r}} } & {m_{24}^{\text{r}} } \\ {m_{31}^{\text{r}} } & {m_{32}^{\text{r}} } & {m_{33}^{\text{r}} } & {m_{34}^{\text{r}} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {X_{\text{w}} } \\ {Y_{\text{w}} } \\ {Z_{\text{w}} } \\ 1 \\ \end{array} } \right] = M_{\text{R}} \left[ {\begin{array}{*{20}c} {X_{\text{w}} } \\ {Y_{\text{w}} } \\ {Z_{\text{w}} } \\ 1 \\ \end{array} } \right] $$
(4.16)

where \( M_{\text{i}} ,\;{\text{i}} = {\text{L}},{\text{R}} \) is a \( 3 \times 4 \) projection matrix, and \( z_{\text{cl}} \) and \( z_{\text{cr}} \) denote the Z-axis coordinates of \( P \) in the corresponding camera coordinate systems.

Eliminating \( z_{\text{cl}} \) and \( z_{\text{cr}} \), the model can be rewritten as follows

$$ \left\{ {\begin{array}{*{20}c} {u_{\text{l}} = \frac{{m_{11}^{\text{l}} X_{\text{w}} + m_{12}^{\text{l}} Y_{\text{w}} + m_{13}^{\text{l}} Z_{\text{w}} + m_{14}^{\text{l}} }}{{m_{31}^{\text{l}} X_{\text{w}} + m_{32}^{\text{l}} Y_{\text{w}} + m_{33}^{\text{l}} Z_{\text{w}} + m_{34}^{\text{l}} }}} \\ {v_{\text{l}} = \frac{{m_{21}^{\text{l}} X_{\text{w}} + m_{22}^{\text{l}} Y_{\text{w}} + m_{23}^{\text{l}} Z_{\text{w}} + m_{24}^{\text{l}} }}{{m_{31}^{\text{l}} X_{\text{w}} + m_{32}^{\text{l}} Y_{\text{w}} + m_{33}^{\text{l}} Z_{\text{w}} + m_{34}^{\text{l}} }}} \\ \end{array} } \right. $$
(4.17)
$$ \left\{ {\begin{array}{*{20}c} {u_{\text{r}} = \frac{{m_{11}^{\text{r}} X_{\text{w}} + m_{12}^{\text{r}} Y_{\text{w}} + m_{13}^{\text{r}} Z_{\text{w}} + m_{14}^{\text{r}} }}{{m_{31}^{r} X_{\text{w}} + m_{32}^{\text{r}} Y_{\text{w}} + m_{33}^{\text{r}} Z_{\text{w}} + m_{34}^{\text{r}} }}} \\ {v_{\text{r}} = \frac{{m_{21}^{\text{r}} X_{\text{w}} + m_{22}^{\text{r}} Y_{\text{w}} + m_{23}^{\text{r}} Z_{\text{w}} + m_{24}^{\text{r}} }}{{m_{31}^{\text{r}} X_{\text{w}} + m_{32}^{\text{r}} Y_{\text{w}} + m_{33}^{\text{r}} Z_{\text{w}} + m_{34}^{\text{r}} }}} \\ \end{array} } \right. $$
(4.18)

The model can be expressed in matrix form as

$$ A_{\text{i}} P = b_{\text{i}} $$
(4.19)

where \( A_{\text{i}} = \left[ {\begin{array}{*{20}c} {m_{31}^{\text{i}} u_{\text{i}} - m_{11}^{\text{i}} } & {m_{32}^{\text{i}} u_{\text{i}} - m_{12}^{\text{i}} } & {m_{33}^{\text{i}} u_{\text{i}} - m_{13}^{\text{i}} } \\ {m_{31}^{\text{i}} v_{\text{i}} - m_{21}^{\text{i}} } & {m_{32}^{\text{i}} v_{\text{i}} - m_{22}^{\text{i}} } & {m_{33}^{\text{i}} v_{\text{i}} - m_{23}^{\text{i}} } \\ \end{array} } \right] \), \( b_{\text{i}} = \left[ {\begin{array}{*{20}c} {m_{14}^{\text{i}} - m_{34}^{\text{i}} u_{\text{i}} } \\ {m_{24}^{\text{i}} - m_{34}^{\text{i}} v_{\text{i}} } \\ \end{array} } \right] \), \( {\text{i}} = {\text{l}},{\text{r}} \).

Let \( A = (A_{\text{l}} ,A_{\text{r}} )^{\text{T}} ,b = (b_{\text{l}} ,b_{\text{r}} )^{\text{T}} \), then

$$ AP = b $$
(4.20)

Finally, based on the least squares method, the coordinates of \( P \) can be obtained by \( P = (A^{\text{T}} A)^{ - 1} A^{\text{T}} b \).
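
A minimal sketch of this linear triangulation, building A and b from Eqs. (4.17)–(4.18) and solving Eq. (4.20) in the least-squares sense, is given below; the function name and the use of numpy.linalg.lstsq (numerically equivalent to \( P = (A^{\text{T}} A)^{ - 1} A^{\text{T}} b \) for full-rank A) are our own choices.

```python
import numpy as np

def triangulate(M_left, M_right, uv_left, uv_right):
    """Recover P = (Xw, Yw, Zw) from matched image points, Eqs. (4.15)-(4.20).

    `M_left` / `M_right` are the 3x4 projection matrices, and `uv_left` /
    `uv_right` are the tracked centroid coordinates (u, v) in each view.
    """
    rows_A, rows_b = [], []
    for M, (u, v) in ((M_left, uv_left), (M_right, uv_right)):
        # (m31*u - m11) Xw + (m32*u - m12) Yw + (m33*u - m13) Zw = m14 - m34*u
        rows_A.append(u * M[2, :3] - M[0, :3])
        rows_b.append(M[0, 3] - M[2, 3] * u)
        # (m31*v - m21) Xw + (m32*v - m22) Yw + (m33*v - m23) Zw = m24 - m34*v
        rows_A.append(v * M[2, :3] - M[1, :3])
        rows_b.append(M[1, 3] - M[2, 3] * v)
    A, b = np.array(rows_A), np.array(rows_b)
    P, *_ = np.linalg.lstsq(A, b, rcond=None)      # least-squares solution
    return P
```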

5 Experiment

To evaluate the performance of the proposed algorithm, a series of experiments was carried out. The experiments are divided into two parts: the former demonstrates the accuracy and robustness of the particle filter tracking, and the latter verifies the effectiveness of the positioning.

Test video 1, which contains a complicated environment, is from the Visual Tracker Benchmark (available at http://www.visual-tracking.net), and its resolution is \( 320 \times 240 \) pixels. The test result is shown in Fig. 2.

Fig. 2 The results of target tracking: a particle filter based on color histogram in literature [3]; b particle filter based on multi-feature fusion

As can be seen from Fig. 2, the particle filter based on the color histogram alone is easily disturbed when the target's color is similar to the background, whereas the algorithm presented in this paper tracks the target accurately.

Test video 2, which is used to test the positioning accuracy, is from the Digital Navigation Center of Beihang University, and its resolution is \( 1280 \times 960 \) pixels. The test result is shown in Fig. 3.

Fig. 3 The tracking results in the left and right cameras

Figure 3 gives the results of indoor human tracking. After tracking the target accurately, we can estimate its geometric center coordinates, and the target location can then be calculated from Eq. (4.20).

In this experiment, the front-left corner of the room was chosen as the coordinate origin for binocular positioning. The related parameters were measured as follows: the camera installation height is 2 m, the lens optical axis is inclined downward at an angle of \( \pi /3 \) to the vertical plane, the two camera axes are parallel with a horizontal baseline of 45 cm, and the calibrated focal length is 3.6 mm. The calculated results of the target location are shown in Table 1.

Table 1 The calculated results of target positioning

Some samples of the final positioning results are listed in Table 1, for frames 18, 25, 34, 53, 58, 76, 83, and 97. The calculated mean absolute error over the first 100 frames is (13.32, 12.76, 8.15). The main source of error is that the estimated geometric center coordinates of the target may not correspond exactly between the left and right video sequences, which leads to calculation errors in binocular vision positioning. As Table 1 shows, the proposed method can realize target positioning within the permitted error range. The error mainly comes from the tracking discrepancy between the left and right cameras, and partly from measurement error.

6 Conclusion

In this paper, a target tracking and locating algorithm was presented, comprising feature extraction, multi-feature fusion, particle filter tracking, and target positioning. The video test results show that the proposed algorithm can realize target tracking and positioning. However, the algorithm has certain limitations. First, tracking errors may cause estimation errors in the target's pixel coordinates. Second, real-time target positioning is still challenging. These issues need further improvement and are under investigation.