1 Introduction

Tracking is the task of generating the trajectories of moving objects by computing their motion across a sequence of images. Numerous approaches have been proposed for locating an object in a sequence of frames, and mean shift (MS) is a common one. It is easy to implement and gives robust tracking performance [1]. The MS algorithm compares the target model with the current frame to find the region occupied by the selected object. However, it has difficulty handling occlusion of the object and loss of the object from the frame [1], so the MS procedure is improved by using a Kalman filter (KF). The Kalman filter estimates the state of a dynamic system even when the exact form of the system is not known. Another limitation of the MS approach is that it is prone to local minima when some features of the target are also present in the backdrop. The background-weighted histogram (BWH) was introduced to reduce backdrop interference in the target representation. Unfortunately, its transformation formula is incorrect, and BWH is equivalent to MS tracking with the usual target representation [14]. To achieve improved target localization, the corrected background-weighted histogram (CBWH) MS algorithm is implemented, which transforms only the target model and leaves the target candidate model unchanged. The advantage of CBWH is that it works robustly even when the target region contains much background information. The object position is tracked by the Kalman filter, and that position is observed by the MS algorithm. The Kalman filter consists of a set of mathematical equations that provide an efficient computational means of estimating the state of a process [5–8]. The equations fall into two groups: time update equations and measurement update equations. The time update equations project the current state and error covariance estimates forward to obtain the prior estimate for the next time step, and the measurement update equations correct that estimate with the new measurement, which gives an optimal solution [1, 9, 10].
The remainder of the paper first gives an overview of the traditional MS algorithm and then introduces the CBWH scheme in detail. Finally, it explains the fundamentals of the KF through its equations and describes the proposed method of tracking an object using the CBWH algorithm and the KF [4, 11–13].
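As a rough illustration of the two groups of Kalman filter equations described above, the following Python sketch runs the time update and the measurement update for a scalar state. The scalar random-walk model, the function names and the noise values `q` and `r` are assumptions made for this example only; in the tracker, the state would hold the object position and the measurement would be the position found by the MS algorithm.

```python
def kalman_predict(x, p, q):
    """Time update: project the state and error covariance ahead.

    Assumes a random-walk model, so the predicted state equals the
    previous estimate and the covariance grows by the process noise q.
    """
    return x, p + q


def kalman_update(x_prior, p_prior, z, r):
    """Measurement update: correct the prediction with measurement z."""
    k = p_prior / (p_prior + r)        # Kalman gain
    x = x_prior + k * (z - x_prior)    # corrected state estimate
    p = (1.0 - k) * p_prior            # corrected error covariance
    return x, p, k


# Three identical measurements pull the estimate from 0 toward 1.
x, p = 0.0, 1.0
for z in [1.0, 1.0, 1.0]:
    x_prior, p_prior = kalman_predict(x, p, q=0.0)
    x, p, k = kalman_update(x_prior, p_prior, z, r=1.0)
```

With no process noise, the gain shrinks at every step, so each new measurement moves the estimate less than the previous one.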

2 Literature Survey

Wen and Cai studied an MS algorithm with a Gaussian kernel and applied it to object tracking, providing a convergence theorem and its proof [1]. In their experiments, the object is found even in the presence of occlusions, as shown in Fig. 1 [14–18].

Fig. 1 Tracking results for various frames: a frame 20, b frame 30, c frame 70, d frame 120

Jeyakar and Babu noted that the MS algorithm has proved efficient for tracking an object in a video sequence and proposed a robust tracking algorithm that overcomes the drawbacks of color histogram-based tracking [9]. Figure 2 shows that a multi-fragment representation of the target and candidate models is used to increase the robustness of object tracking [19–21].

Fig. 2 Robustness to illumination change

Comaniciu et al. presented a new method for target representation and localization, the central component in the visual tracking of non-rigid objects, as shown in Fig. 3 [22].

Fig. 3 Tracking results extracted from the frames of a football video sequence

Yang et al. proposed a new algorithm for tracking an object in a complex environment using color as the feature. An iterative procedure is followed to find the location of the object, and a CBWH method is introduced into the MS algorithm to decrease the interference of the backdrop in target localization [23]. In their experiments, tracking is not influenced by changes in scale and is less susceptible to clutter.

Ning et al. showed that BWH, an MS improvement put forward to reduce the interference of the backdrop in the frame, does not actually introduce any improvement and is equivalent to the usual MS. They therefore proposed CBWH, which transforms only the target model and leaves the target candidate model unchanged [10]. The CBWH algorithm localizes the target more reliably and accurately, and it achieves smaller errors and standard deviations than the BWH MS algorithm.

3 Proposed Methodology

For robust tracking, the mean-shift algorithm is an effective approach for objects whose appearance is described by histograms. BWH- and CBWH-based MS tracking is implemented to decrease the influence of the background on target localization. The following objectives are formulated for the BWH and CBWH MS tracking algorithms.

  • To study and analyze the conventional robust object tracking algorithms.

  • To implement existing mean-shift tracking algorithm.

  • To implement CBWH- and BWH-based MS tracking algorithms.

  • To compare the CBWH- and BWH-based MS tracking algorithms with the existing mean-shift tracking algorithm.

Tracking with BWH and CBWH MS algorithms. In this methodology, the BWH algorithm transforms both the target model and the target candidate model, but it does not actually decrease the interference of background features and so does not improve target localization. The CBWH algorithm is then introduced to overcome this issue by transforming only the target model and leaving the candidate model unchanged. The basic goal of the CBWH algorithm is to improve target localization by reducing the interference of the background, as shown in Fig. 4.

MS algorithm. The MS algorithm is a kernel-based deterministic procedure: under certain assumptions on the behavior of the kernel, the MS iteration converges to a local maximum of the measurement function. It is a simple algorithm that gives a reliable and general solution to object tracking and is independent of the target representation.

Fig. 4 Representation of the target in the MS algorithm

Let \(\left\{ {y_{i}^{*} } \right\}_{i = 1 \ldots n}\) denote the normalized positions of the pixels in the target region, which is centered at the origin. The target model \(\hat{q}\) corresponding to the target region is computed as follows:

$$\hat{q} = \left\{ {\hat{q}_{v} } \right\}_{v = 1 \ldots m}$$
(1)
$$\hat{q}_{v} = C\sum\limits_{i = 1}^{n} {k\left( {\left\| {y_{i}^{*} } \right\|^{2} } \right)\delta \left( {b\left( {y_{i}^{*} } \right) - v} \right)}$$
(2)

Here, \(\hat{q}_{v}\) is the probability of feature v in the target model \(\hat{q}\), m is the number of bins in the feature space, δ is the Kronecker delta function, \(b\left( {y_{i}^{*} } \right)\) associates the pixel \(y_{i}^{*}\) with its histogram bin, k is an isotropic kernel profile and C is a normalization constant defined by

$$C = \frac{1}{{\sum_{i = 1}^{n} k\left( {\left\| {y_{i}^{*} } \right\|^{2} } \right)}}$$
(3)
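Equations (2) and (3) can be sketched in Python as a kernel-weighted histogram. This is a minimal sketch, assuming an Epanechnikov-type profile, zero-based bin indices, and pixel positions already normalized to the unit circle; the function and variable names are illustrative and do not come from the original implementation.

```python
def epanechnikov_profile(r2):
    """Kernel profile k(r2): proportional to 1 - r2 inside the unit ball, 0 outside."""
    return 1.0 - r2 if r2 < 1.0 else 0.0


def target_model(positions, bins, m):
    """Kernel-weighted, normalized histogram over m bins (Eqs. 2 and 3).

    positions: normalized pixel coordinates (x, y), centered at the origin
    bins: zero-based histogram bin index b(y_i*) of each pixel
    """
    q = [0.0] * m
    for (px, py), v in zip(positions, bins):
        q[v] += epanechnikov_profile(px * px + py * py)
    total = sum(q)                 # equals 1 / C in Eq. (3)
    return [qv / total for qv in q]


# Four pixels: one at the center, two at mid radius, one near the border.
positions = [(0.0, 0.0), (0.5, 0.0), (0.0, -0.5), (0.9, 0.0)]
bins = [0, 1, 1, 2]
q_hat = target_model(positions, bins, m=3)
```

Pixels near the border of the target region contribute less than pixels near the center, which is why the bin of the outermost pixel receives the smallest probability.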

Figure 4 illustrates the representation of a color feature ‘v’ in the target model of the MS algorithm. Similarly, the target candidate model \(\hat{p}(x)\) corresponding to the candidate region is given by

$$\hat{p}\left( x \right) = \left\{ {\hat{p}_{v} \left( x \right)} \right\}_{v = 1 \ldots m}$$
(4)
$$\hat{p}_{v} \left( x \right) = C_{h} \sum\limits_{i = 1}^{{n_{h} }} {k\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)} \delta \left( {b\left( {y_{i} } \right) - v} \right)$$
(5)

where

$$C_{h} = \frac{1}{{\sum_{i = 1}^{{n_{h} }} k\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)}}$$
(6)

where \(\hat{p}_{v} (x)\) is the probability of feature v in the target candidate model, \(\left\{ {y_{i} } \right\}_{i = 1 \ldots n_{h} }\) are the pixel positions in the target candidate region centered at x, h is the bandwidth and the constant \(C_{h}\) is a normalization function. To measure the similarity between the target model and the candidate model, the Bhattacharyya coefficient between the two normalized histograms \(\hat{q}\) and \(\hat{p}(x)\) is defined as follows:

$$\rho \left[ {\hat{p}\left( x \right),\hat{q}} \right] = \sum\limits_{v = 1}^{m} {\sqrt {\hat{p}_{v} \left( x \right)\hat{q}_{v} } }$$
(7)

The distance between \(\hat{p}(x)\) and \(\hat{q}\) is defined as

$$d\left( x \right) = \sqrt {1 - \rho \left[ {\hat{p}\left( x \right),\hat{q}} \right]}$$
(8)
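The similarity measure of Eq. (7) and the distance of Eq. (8) can be illustrated with a short Python sketch. The histograms below are made-up values chosen only to show the two extreme cases, and the function names are illustrative.

```python
import math


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms (Eq. 7)."""
    return sum(math.sqrt(pv * qv) for pv, qv in zip(p, q))


def distance(p, q):
    """Distance between the candidate and target models (Eq. 8)."""
    # max() guards against a tiny negative argument caused by rounding.
    return math.sqrt(max(0.0, 1.0 - bhattacharyya(p, q)))


q_hat = [0.5, 0.5, 0.0]          # target model
p_same = [0.5, 0.5, 0.0]         # candidate identical to the target
p_disjoint = [0.0, 0.0, 1.0]     # candidate sharing no bin with the target
```

Identical histograms give a coefficient of 1 and a distance of 0, whereas histograms with no common non-empty bin give a coefficient of 0 and a distance of 1.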

3.1 MS Tracking

The core of the MS tracking algorithm is the computation of an offset that moves the tracker from the current location x to a new location \(x_{1}\) according to the iteration equation

$$x_{1} = \frac{{\sum_{i = 1}^{{n_{h} }} y_{i} w_{i} g\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)}}{{\sum_{i = 1}^{{n_{h} }} w_{i} g\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)}}$$
(9)

where

$$w_{i} = \sum\limits_{v = 1}^{m} {\sqrt {\frac{{\hat{q}_{v} }}{{\hat{p}_{v} \left( x \right)}}} \, \delta \left[ {b\left( {y_{i} } \right) - v} \right]}$$
(10)

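Here \(g( \cdot ) = - k^{\prime}( \cdot )\) is the negative derivative of the kernel profile k. For a kernel with the Epanechnikov profile, g is constant, and Eq. (9) reduces to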

$$x_{1} = \frac{{\sum_{i = 1}^{{n_{h} }} y_{i} w_{i} }}{{\sum_{i = 1}^{{n_{h} }} w_{i} }}$$
(11)

Using Eq. (11), the algorithm finds in the new frame the region that is most similar to the object. The target localization is illustrated in Fig. 5.
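One iteration of Eqs. (10) and (11) can be sketched as follows. This is only an illustrative sketch: the pixel positions, bin assignments and histograms are invented, bins are zero-based, and a pixel whose bin is empty in the candidate model is given weight 0, which is an assumption of this example.

```python
import math


def ms_weights(bins, q, p):
    """Weight of each candidate pixel from its bin v = b(y_i) (Eq. 10)."""
    return [math.sqrt(q[v] / p[v]) if p[v] > 0 else 0.0 for v in bins]


def ms_step(positions, weights):
    """New location as the weighted mean of the pixel positions (Eq. 11)."""
    total = sum(weights)
    x1 = sum(w * px for (px, _), w in zip(positions, weights)) / total
    y1 = sum(w * py for (_, py), w in zip(positions, weights)) / total
    return x1, y1


# Four candidate pixels on a line; bin 1 is more prominent in the target
# model than in the candidate model, so its pixels get larger weights.
positions = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
bins = [0, 0, 1, 1]
q_hat = [0.2, 0.8]
p_hat = [0.5, 0.5]
w = ms_weights(bins, q_hat, p_hat)
new_center = ms_step(positions, w)
```

The unweighted mean of the four positions is 1.5, and the new center moves beyond it toward the pixels whose feature is under-represented in the candidate model relative to the target model.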

Fig. 5 Localization of the target in MS tracking

Performance. The target localization accuracy of the algorithm is evaluated with the two measures described below, which are illustrated in Fig. 6.

Fig. 6 Position of the tracker (red) and its associated ground truth bounding box (blue). The centroid distance is shown by the dark blue line and the overlap is shown in orange

3.2 Normalized Centroid Distance (NCD)

The normalized centroid distance (NCD) is defined for a tracker centered at \((x_{t}, y_{t})\) and a ground truth bounding box with center \((x_{b}, y_{b})\). In terms of the height \(h_{b}\) and the width \(w_{b}\) of the bounding box, the NCD is given as

$${\text{NCD}} = \left( {\frac{{x_{t} - x_{b} }}{{w_{b} }}} \right)^{2} + \left( {\frac{{y_{t} - y_{b} }}{{h_{b} }}} \right)^{2}$$
(12)
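Equation (12) translates directly into code. The sketch below uses made-up coordinates, and the function name and argument layout are illustrative choices rather than part of the original implementation.

```python
def normalized_centroid_distance(tracker_center, box_center, box_size):
    """NCD of Eq. (12); box_size is (width, height) of the ground truth box."""
    xt, yt = tracker_center
    xb, yb = box_center
    wb, hb = box_size
    return ((xt - xb) / wb) ** 2 + ((yt - yb) / hb) ** 2


# Tracker off by 5 pixels horizontally and 10 pixels vertically
# for a ground truth box that is 20 pixels wide and 40 pixels high.
ncd = normalized_centroid_distance((55.0, 40.0), (50.0, 50.0), (20.0, 40.0))
```

Because each offset is divided by the corresponding box dimension, the same pixel error counts for less on a larger target.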

Overlap. A useful measure of tracker accuracy is the fraction of the ground truth bounding box that is occupied by the tracker in a given frame, which is referred to as the overlap.

$${\text{Overlap}} = \frac{{{\text{area}}_{\text{Common}} }}{{{\text{area}}_{{{\text{bounding}\_\text{box}}}} }}$$
(13)
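The overlap of Eq. (13) can be computed for axis-aligned rectangles as sketched below. The `(x_min, y_min, width, height)` box layout and the function name are assumptions of this example.

```python
def overlap(tracker_box, truth_box):
    """Fraction of the ground truth box covered by the tracker box (Eq. 13).

    Both boxes are given as (x_min, y_min, width, height).
    """
    tx, ty, tw, th = tracker_box
    bx, by, bw, bh = truth_box
    common_w = max(0.0, min(tx + tw, bx + bw) - max(tx, bx))
    common_h = max(0.0, min(ty + th, by + bh) - max(ty, by))
    return (common_w * common_h) / (bw * bh)


# The tracker is shifted right by half the width of the ground truth box.
ov = overlap((5.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 10.0))
```

Note that the measure is normalized by the area of the ground truth box only, so it differs from the intersection-over-union score used in some other evaluations.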

3.3 BWH MS Tracking

BWH MS tracking. To decrease the interference of salient background features in target localization, Comaniciu et al. proposed a representation model of the background features and used it to select discriminative features from the target region and the target candidate region. The background is represented by the histogram \(\left\{ {\hat{o}_{v} } \right\}_{v = 1 \ldots m}\), which is computed from the area surrounding the target. The backdrop region is taken to be twice the size of the target. The minimal non-zero value in \(\left\{ {\hat{o}_{v} } \right\}_{v = 1 \ldots m}\) is denoted by \(\hat{o}^{*}\). The following coefficients are used to transform the representations of the target model and the target candidate model.

$$\left\{ {u_{v} = \min \left( {\frac{{\hat{o}^{*} }}{{\hat{o}_{v} }}, 1} \right)} \right\}_{v = 1 \ldots m}$$
(14)

This transformation reduces the weights of the features with low \(u_{v}\), i.e., the salient features of the background. The new target model is then defined as:
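The coefficients of Eq. (14) can be computed from a background histogram as sketched below. The histogram values are invented, and giving a coefficient of 1 to bins that do not occur in the background is an assumption of this sketch (the ratio in Eq. 14 is unbounded there, so the minimum is 1).

```python
def background_weights(o_hat):
    """Coefficients u_v of Eq. (14) from the background histogram o_hat."""
    o_star = min(ov for ov in o_hat if ov > 0)   # minimal non-zero value
    return [min(o_star / ov, 1.0) if ov > 0 else 1.0 for ov in o_hat]


# Bin 0 dominates the background, bin 3 does not occur in it at all.
o_hat = [0.6, 0.3, 0.1, 0.0]
u = background_weights(o_hat)
```

The more prominent a feature is in the background, the smaller its coefficient, so the dominant background bin is attenuated the most.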

$$\hat{q}_{v}^{'} = C^{\prime}u_{v} \sum\limits_{i = 1}^{n} {k\left( {\left\| {y_{i}^{*} } \right\|^{2} } \right)\delta \left( {b\left( {y_{i}^{*} } \right) - v} \right)}$$
(15)

where

$$C^{\prime} = \frac{1}{{\sum_{i = 1}^{n} k\left( {\left\| {y_{i}^{*} } \right\|^{2} } \right)\sum_{v = 1}^{m} u_{v} \delta \left( {b\left( {y_{i}^{*} } \right) - v} \right)}}$$
(16)

The new target candidate model is:

$$\hat{p}_{v}^{'} (x) = C_{h}^{'} u_{v} \sum\limits_{i = 1}^{{n_{h} }} {k\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)\delta \left( {b\left( {y_{i} } \right) - v} \right)}$$
(17)

where

$$C_{h}^{'} = \frac{1}{{\sum_{i = 1}^{{n_{h} }} k\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)\sum_{v = 1}^{m} u_{v} \delta \left( {b\left( {y_{i} } \right) - v} \right)}}$$
(18)

This transformation is intended to reduce the effect of the background features contained in the target model and the target candidate model on target localization.
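Because the kernel-weighted sums in Eqs. (15) and (17) are proportional to the usual histograms, the BWH transformation amounts to scaling each bin by its coefficient and renormalizing. The sketch below shows this for a target model; the histogram and coefficient values are invented and the function name is illustrative.

```python
def bwh_transform(hist, u):
    """Scale each bin of a normalized histogram by u_v and renormalize.

    Applied to the usual target model this gives Eq. (15); applied to the
    usual candidate model it gives Eq. (17).
    """
    scaled = [uv * hv for uv, hv in zip(u, hist)]
    total = sum(scaled)            # renormalization, as in Eqs. (16) and (18)
    return [s / total for s in scaled]


q_hat = [0.5, 0.3, 0.2]            # usual target model
u = [0.2, 1.0, 1.0]                # bin 0 is a salient background feature
q_prime = bwh_transform(q_hat, u)
```

The probability of the background-dominated bin drops from 0.5 to about 0.17, and the remaining bins grow so that the histogram still sums to 1.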

3.4 Equivalence of the BWH Representation to the Usual Representation

In MS tracking, the weights of the points in the target candidate region determine the convergence of the iteration formula, so background information in target localization is reduced only if the weights of the backdrop features are reduced relative to the others. To evaluate how BWH changes these weights, let \(w_{i}^{'}\) denote the weight of point \(y_{i}\) in the target candidate region computed by the BWH:

$$w_{i}^{'} = \sum\limits_{v = 1}^{m} {\sqrt {\frac{{\hat{q}_{v}^{'} }}{{\hat{p}_{v}^{'} \left( x \right)}}} \, \delta \left[ {b\left( {y_{i} } \right) - v} \right]}$$
(19)

Let \(v^{\prime}\) be the bin index in the feature space that corresponds to point \(y_{i}\) in the candidate region, so that \(\delta \left( {b\left( {y_{i} } \right) - v^{\prime}} \right) = 1\). Then Eq. (19) can be simplified as

$$w_{i}^{'} = \sqrt {\frac{{\hat{q}_{{v^{\prime}}}^{'} }}{{\hat{p}_{{v^{\prime}}}^{'} \left( x \right)}}}$$
(20)

Substituting Eqs. (15) and (17) into Eq. (20) gives

$$w_{i}^{'} = \sqrt{\frac{C^{\prime} u_{v^{\prime}} \sum_{j=1}^{n} k\left(\left\| y_{j}^{*} \right\|^{2}\right) \delta\left[b\left(y_{j}^{*}\right) - v^{\prime}\right]}{C_{h}^{'} u_{v^{\prime}} \sum_{j=1}^{n_{h}} k\left(\left\| \frac{x - y_{j}}{h} \right\|^{2}\right) \delta\left[b\left(y_{j}\right) - v^{\prime}\right]}}$$
(21)

Removing the common factor \(u_{{v^{\prime}}}\) from the numerator and the denominator and introducing the normalization constants C and \(C_{h}\) of Eqs. (3) and (6), we have

$$w_{i}^{'} = \sqrt{\frac{C C_{h}}{C C_{h}} \cdot \frac{C^{\prime} \sum_{j=1}^{n} k\left(\left\| y_{j}^{*} \right\|^{2}\right) \delta\left[b\left(y_{j}^{*}\right) - v^{\prime}\right]}{C_{h}^{'} \sum_{j=1}^{n_{h}} k\left(\left\| \frac{x - y_{j}}{h} \right\|^{2}\right) \delta\left[b\left(y_{j}\right) - v^{\prime}\right]}} = \sqrt{\frac{C^{\prime} C_{h}}{C C_{h}^{'}}} \cdot \sqrt{\frac{\hat{q}_{v^{\prime}}}{\hat{p}_{v^{\prime}}\left(x\right)}} = \sqrt{\frac{C^{\prime} C_{h}}{C C_{h}^{'}}} \, w_{i}$$
(22)

where \(w_{i}\), calculated by Eq. (10), is the weight of the point under the usual representation of the target model and the target candidate model. Equation (22) shows that \(w_{i}^{'}\) is proportional to \(w_{i}\). Moreover, writing \(g_{i} = g\left( {\left\| {\frac{{x - y_{i} }}{h}} \right\|^{2} } \right)\) and combining this result with the MS iteration of Eq. (9), we have

$$x_{1} = \frac{\sum_{i=1}^{n_{h}} y_{i} g_{i} w_{i}^{'}}{\sum_{i=1}^{n_{h}} g_{i} w_{i}^{'}} = \frac{\sum_{i=1}^{n_{h}} y_{i} g_{i} w_{i} \sqrt{\frac{C^{\prime} C_{h}}{C C_{h}^{'}}}}{\sum_{i=1}^{n_{h}} g_{i} w_{i} \sqrt{\frac{C^{\prime} C_{h}}{C C_{h}^{'}}}} = \frac{\sum_{i=1}^{n_{h}} y_{i} g_{i} w_{i}}{\sum_{i=1}^{n_{h}} g_{i} w_{i}}$$
(23)

Equation (23) shows that the MS iteration formula is invariant to a scale transformation of the weights. Therefore, BWH does not actually improve MS tracking by transforming the representations of both the target model and the target candidate model, and its results are the same as those obtained without BWH (Fig. 7).
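The equivalence derived in Eqs. (19) to (23) can be checked numerically. The sketch below is only an illustration under simplifying assumptions: positions are one-dimensional, \(g_{i}\) is taken as constant (Epanechnikov profile), bins are zero-based, and all histogram and coefficient values are invented.

```python
import math


def normalize(hist):
    total = sum(hist)
    return [h / total for h in hist]


def weights(bins, q, p):
    """Pixel weights sqrt(q_v / p_v) for the bin v of each pixel."""
    return [math.sqrt(q[v] / p[v]) for v in bins]


def ms_step_1d(positions, w):
    """Weighted mean of the positions, i.e. the MS update with constant g."""
    return sum(y * wi for y, wi in zip(positions, w)) / sum(w)


q_hat = [0.5, 0.3, 0.2]            # usual target model
p_hat = [0.2, 0.3, 0.5]            # usual candidate model
u = [0.2, 1.0, 0.5]                # background coefficients of Eq. (14)

# BWH transforms BOTH models with the same coefficients.
q_bwh = normalize([uv * qv for uv, qv in zip(u, q_hat)])
p_bwh = normalize([uv * pv for uv, pv in zip(u, p_hat)])

positions = [0.0, 1.0, 2.0, 3.0]
bins = [0, 1, 2, 2]
w_usual = weights(bins, q_hat, p_hat)
w_bwh = weights(bins, q_bwh, p_bwh)

ratios = [wb / wu for wb, wu in zip(w_bwh, w_usual)]
step_usual = ms_step_1d(positions, w_usual)
step_bwh = ms_step_1d(positions, w_bwh)
```

Every BWH weight is the usual weight multiplied by the same constant, so the two updates coincide, which is the content of Eq. (23).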

Fig. 7 Flowchart of the proposed methodology

3.5 CBWH Algorithm

As discussed above, the BWH algorithm does not improve target localization. To truly achieve this, a new formula known as CBWH is introduced, as shown in Figs. 8 and 9. It transforms only the target model and not the target candidate model. In this way, it reduces the salient features of the backdrop in the target region.

Fig. 8 Algorithm of CBWH

Fig. 9 Feature density example of MS clustering

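With the transformed target model \(\hat{q}^{\prime}\) of Eq. (15) and the unchanged target candidate model \(\hat{p}(x)\), the new weight formula is

$$w_{i}^{\prime\prime} = \sqrt{\frac{\hat{q}_{v^{\prime}}^{'}}{\hat{p}_{v^{\prime}}\left(x\right)}}$$
(24)

Therefore, up to the constant factor \(\sqrt {C^{\prime}/C}\), which by Eq. (23) does not affect the MS iteration,

$$w_{i}^{\prime\prime} = \sqrt{u_{v^{\prime}}} \cdot w_{i}$$
(25)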
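The effect of Eqs. (24) and (25) can be illustrated numerically. As before, this is a sketch under simplifying assumptions: one-dimensional positions, constant \(g_{i}\), zero-based bins and invented histogram values. Only the target model is transformed; the candidate model is left as it is.

```python
import math


def normalize(hist):
    total = sum(hist)
    return [h / total for h in hist]


def ms_step_1d(positions, w):
    """Weighted mean of the positions, i.e. the MS update with constant g."""
    return sum(y * wi for y, wi in zip(positions, w)) / sum(w)


q_hat = [0.5, 0.3, 0.2]            # usual target model
p_hat = [0.2, 0.3, 0.5]            # candidate model (left unchanged by CBWH)
u = [0.2, 1.0, 0.5]                # background coefficients of Eq. (14)
q_cbwh = normalize([uv * qv for uv, qv in zip(u, q_hat)])

positions = [0.0, 1.0, 2.0, 3.0]
bins = [0, 1, 2, 2]
w_usual = [math.sqrt(q_hat[v] / p_hat[v]) for v in bins]
w_cbwh = [math.sqrt(q_cbwh[v] / p_hat[v]) for v in bins]       # Eq. (24)

# Eq. (25): w'' equals sqrt(u_v') * w up to one constant factor.
factors = [wc / (math.sqrt(u[v]) * wu)
           for wc, wu, v in zip(w_cbwh, w_usual, bins)]
step_usual = ms_step_1d(positions, w_usual)
step_cbwh = ms_step_1d(positions, w_cbwh)
```

Unlike BWH, the weights now change relative to one another: the pixel in the background-dominated bin 0 is down-weighted, so the CBWH update moves away from it compared with the usual update.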

The CBWH MS tracking algorithm is summarized in Fig. 8.

3.6 Applications

MS can also be applied as a procedure for finding the modes of a density function: starting from the data points, run the mean-shift procedure to find the stationary points of the density, and prune these points by retaining only the local maxima. The basin of attraction of a mode is the set of all locations that converge to that mode. Points that lie in the same basin of attraction are associated with the same cluster. Figure 9 shows a feature density example of MS clustering.
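The mode-finding procedure can be sketched for one-dimensional data as follows. The flat (uniform) kernel, the bandwidth, the data values and the rounding used to merge nearby stationary points are all assumptions of this example.

```python
def mean_shift_mode(start, data, bandwidth, tol=1e-6, max_iter=100):
    """Follow the mean-shift iterations from `start` to a stationary point.

    Uses a flat kernel: each step moves to the mean of the data points
    that lie within `bandwidth` of the current location.
    """
    x = start
    for _ in range(max_iter):
        window = [d for d in data if abs(d - x) <= bandwidth]
        new_x = sum(window) / len(window)
        if abs(new_x - x) < tol:
            break
        x = new_x
    return new_x


data = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9]

# Start from every data point; points reaching the same mode share a cluster.
modes = [round(mean_shift_mode(d, data, bandwidth=1.0), 3) for d in data]
distinct_modes = sorted(set(modes))
labels = [distinct_modes.index(mo) for mo in modes]
```

The first three points converge to a mode near 1 and the last three to a mode near 5, so the data fall into two basins of attraction and hence two clusters.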

4 Results

In the analysis of MS tracking, the RGB color model is used as the feature space in all the experiments, quantized into 16 × 16 × 16 bins. The tracking results for the four video sequences, i.e., the skating, cube, smiley, and basketball sequences, are shown in Figs. 10, 11, 12 and 13. Table 1 shows that the CBWH model achieves greater localization accuracy than the BWH model and the usual MS model, with a low centroid distance error and a high overlap, because the CBWH model truly exploits the background information in target localization. Figures 14 and 15 show the plots of the target localization accuracies. Figure 16 plots the number of iterations of the three methods for the basketball video sequence. Table 1 also lists the average number of iterations of the three methods; it is lower for CBWH than for BWH and the usual MS model, as can also be seen in the figures. In CBWH, the salient features of the target model are enhanced while the features of the background are suppressed, so the MS algorithm can locate the target more accurately. All experiments were run in MATLAB 7.12 with the Image Processing Toolbox and the Computer Vision Toolbox on a computer with the following specification: Processor: Intel(R) Core(TM) i3; Operating System: Windows 8; Hard Disk: 910 GB; RAM: 4 GB.
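The 16 × 16 × 16 quantization of the RGB color space, which plays the role of the bin function b(·) in the equations of Sect. 3, can be sketched as follows. The sketch assumes 8-bit color channels and a zero-based bin index with red as the most significant component; the ordering and the function name are illustrative choices, not taken from the MATLAB implementation.

```python
def rgb_bin_index(r, g, b, bins_per_channel=16):
    """Map an 8-bit RGB pixel to one of bins_per_channel**3 histogram bins."""
    step = 256 // bins_per_channel         # 16 intensity levels per bin
    return ((r // step) * bins_per_channel ** 2
            + (g // step) * bins_per_channel
            + (b // step))


darkest = rgb_bin_index(0, 0, 0)
brightest = rgb_bin_index(255, 255, 255)
```

With 16 bins per channel, the histogram has 4096 bins in total, and pixels whose channel values differ by less than one quantization step fall into the same bin.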

Fig. 10 Comparison of the three methods using color feature space for the skating sequence. Frames 10, 24, 40 are displayed. As (c) shows, the CBWH MS tracking method has better tracking accuracy than the BWH and usual MS tracking methods. a MS tracking, b BWH MS tracking, c CBWH MS tracking

Fig. 11 Comparison of the three methods using color feature space for the cube sequence. Frames 19, 25, 36 are displayed. As (c) shows, the CBWH MS tracking method has better tracking accuracy than the BWH and usual MS tracking methods. a MS tracking, b BWH MS tracking, c CBWH MS tracking

Fig. 12 Comparison of the three methods using color feature space for the smiley sequence. Frames 15, 36, 55 are displayed. As (c) shows, the CBWH MS tracking method has better tracking accuracy than the BWH and usual MS tracking methods. a MS tracking, b BWH MS tracking, c CBWH MS tracking

Fig. 13 Comparison of the three methods using color feature space for the basketball video sequence. Frames 26, 35, 45 are displayed. As (c) shows, the CBWH MS tracking method has better tracking accuracy than the BWH and usual MS tracking methods. a MS tracking, b BWH MS tracking, c CBWH MS tracking

Table 1 Comparison of the analysis of mean-shift-based algorithms

Fig. 14 Plot of normalized centroid distance error versus different video frames using color feature space

Fig. 15 Plot of overlap versus different video frames using color feature space

Fig. 16 Plot of average number of iterations versus different video sequences using color feature space

5 Conclusion

Object tracking is a popular research subject in the field of computer vision. It has important applications in intelligent transportation, defense security, intelligent video surveillance, and robot navigation. Many algorithms for object detection and tracking have been proposed in the literature. The mean-shift object tracking algorithm based on the color feature histogram has a wide range of applications owing to its simplicity and good real-time performance. In this work, mean-shift tracking in the color feature space is analyzed, and BWH and CBWH are examined as means of improving target localization and decreasing background interference in the target representation. The experimental results confirm that CBWH reduces the number of MS iterations and provides 90.12% tracking accuracy. A further advantage of CBWH is its reduced sensitivity to target initialization in mean-shift tracking, i.e., CBWH tracks robustly even when the target is not initialized properly.

Concerning future research, the MS method can be further improved by tracking objects with a fuzzy coding histogram to overcome the quantization effects inherent in a fixed-bin histogram. In the initial stage of such an algorithm, fuzzy clustering is applied to the initial detection region to obtain the cluster prototypes, which are regarded as a fuzzy codebook. During tracking, the candidate image region is also represented by a histogram constructed from the fuzzy memberships. In addition, a cumulative distribution function between the fuzzy coding histograms is used to construct a cross-bin metric, and mean-shift iteration is then used to realize robust visual tracking. The MS concept can also be made more robust by including tracking of multiple objects, occlusion detection, and handling of scale changes to improve the tracking accuracy.