1 Introduction

Multi-object tracking is an important research topic in computer vision, with a wide range of applications such as video surveillance, human-computer interaction, and autonomous driving. The main task of multi-object tracking is to associate the moving objects detected in a video sequence and plot the trajectory of each moving object, as shown in Fig. 1. These trajectories contribute to abnormal event reporting, traffic control and other applications.

Fig. 1. Vehicle detection and tracking on a highway

In practical highway scenes, most surveillance cameras are installed on the roadside, so vehicles often occlude each other during video surveillance. When a target vehicle reappears in the video after occlusion, it is likely to be recognized as a new target, which results in the loss of the original one. A simplified and efficient correlation algorithm is proposed in this paper. After calculating the degree of correlation between objects in different frames from their position, motion and color information, we match and merge objects with a high degree of correlation as one identified target.

The main contributions of this paper are listed below:

  • A vehicle location prediction model is proposed to accurately predict the location of a vehicle when it is occluded.

  • A simplified and efficient data association method based on multi-feature fusion is proposed.

  • The improved tracking algorithm is evaluated in experiments on real data sets.

2 Related Work

From the R-CNN series [1] to SSD [2] and the YOLO series [3], many achievements have appeared in object detection. As YOLOv3 is an end-to-end object detection method, it runs very fast [4, 5].

For video analysis, multi-object tracking has been a research focus; it consists of Detection-Free Tracking (DFT) and Detection-Based Tracking (DBT) [6]. Because the vehicles in the video are tracked continuously, DBT is adopted in this paper.

Based on DBT, many researchers have proposed different solutions [7,8,9]. However, these methods do not involve image features, so their results are not acceptable under occlusion.

CNNs have also been introduced for multi-object tracking [10,11,12,13]. Building on SORT [8], the methods of [14, 15] use a CNN to extract image features of the tracking target. Although a CNN provides more precision, it struggles to meet real-time requirements. Because performance on single-object tracking in complex scenarios has improved [16,17,18], [19] transforms the multi-object tracking problem into multiple single-object tracking problems and also achieves good performance.

For location revision, the Kalman filter [20] can filter out noise and interference and optimize the target state. However, the Kalman filter assumes Gaussian distributions and deals only with linear models.

Based on the above works, this paper proposes a multi-object tracking method that correlates color and location features. It predicts the location when the target is lost in a frame, and then applies the unscented Kalman filter (UKF) for calibration.

3 Data Association

Given an image sequence, we employ an object detection module to obtain the n detected objects in the t-th frame as \( S^{t} = \{ S_{1}^{t} ,S_{2}^{t} , \ldots ,S_{n}^{t} \} \). The set of all detected objects from time \( t_{s} \) to \( t_{e} \) is \( S^{{t_{s} :t_{e} }} = \left\{ {S_{1}^{{t_{s} :t_{e} }} ,S_{2}^{{t_{s} :t_{e} }} , \ldots ,S_{n}^{{t_{s} :t_{e} }} } \right\} \), where \( S_{i}^{{t_{s} :t_{e} }} = \left\langle {S_{i}^{{t_{s} }} ,S_{i}^{{t_{s} + 1}} , \ldots ,S_{i}^{{t_{e} }} } \right\rangle \left( {i \in n} \right) \) is the trajectory of the i-th object from \( t_{s} \) to \( t_{e} \). The purpose of multi-object tracking is to find the best state sequence of all objects, which can be modeled by MAP (maximum a posteriori) estimation over the conditional distribution of the states given all observed sequences:

$$ \bar{S}^{{t_{s} :t_{e} }} = \mathop {\arg \hbox{max} }\limits_{{S^{{t_{s} :t_{e} }} }} \text{P}\left( {S^{{t_{s} :t_{e} }} |O^{{t_{s} :t_{e} }} } \right) $$
(1)

For more accuracy, prediction on the object position according to the historical trajectory is necessary. Equations (2) and (3) show the prediction method.

$$ S_{i}^{t} = \left\langle {Loc_{{S_{i}^{t} }} ,MotionInfo_{{S_{i}^{t} }} ,Features_{{S_{i}^{t} }} } \right\rangle $$
(2)
$$ \left\{ {\begin{array}{*{20}l} {Loc_{{S_{i}^{t} }} = \left[ {\left( {x_{{S_{i}^{t} }} ,y_{{S_{i}^{t} }} } \right),\left( {\left( {x + w} \right)_{{S_{i}^{t} }} ,\left( {y + h} \right)_{{S_{i}^{t} }} } \right)} \right]} \hfill \\ {MotionInfo_{{S_{i}^{t} }} = \left[ {\bar{v}_{{S_{i}^{t} }} ,\bar{a}_{{S_{i}^{t} }} ,k} \right]} \hfill \\ {Features_{{S_{i}^{t} }} = \left[ {color_{{S_{i}^{t} }} ,type_{{S_{i}^{t} }} } \right]} \hfill \\ \end{array} } \right. $$
(3)

\( Loc_{{S_{i}^{t} }} \) is the location of the i-th object in the t-th frame, including the upper-left coordinates \( \left( {x_{{S_{i}^{t} }} ,y_{{S_{i}^{t} }} } \right) \) and the lower-right coordinates \( \left( {\left( {x + w} \right)_{{S_{i}^{t} }} ,\left( {y + h} \right)_{{S_{i}^{t} }} } \right) \); \( MotionInfo_{{S_{i}^{t} }} \) is the motion information of the i-th object in the t-th frame, including the average velocity \( \bar{v}_{{S_{i}^{t} }} \), the average acceleration \( \bar{a}_{{S_{i}^{t} }} \) and the motion direction k; \( Features_{{S_{i}^{t} }} \) is the characteristic information of the i-th object in the t-th frame, including the color features \( color_{{S_{i}^{t} }} \) and the object class \( type_{{S_{i}^{t} }} \).
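For concreteness, the per-frame state of Eqs. (2) and (3) can be held in a small data structure. The sketch below is purely illustrative; the field names mirror the notation, and the scalar types chosen for velocity, acceleration and direction are assumptions:

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np


@dataclass
class ObjectState:
    """State S_i^t of one object in frame t, mirroring Eqs. (2)-(3)."""
    top_left: Tuple[float, float]      # (x, y)
    bottom_right: Tuple[float, float]  # (x + w, y + h)
    avg_velocity: float                # v-bar
    avg_acceleration: float            # a-bar
    direction: float                   # motion direction k
    color: np.ndarray                  # color feature vector
    obj_type: str                      # detected object class

    def center(self) -> Tuple[float, float]:
        """Box center, convenient for motion prediction."""
        return ((self.top_left[0] + self.bottom_right[0]) / 2,
                (self.top_left[1] + self.bottom_right[1]) / 2)
```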

The Kalman filter is used to calibrate the position. However, the Kalman filter applies only to linear models and does not suit multi-object tracking. Therefore, the UKF is used, which performs statistical linearization via the unscented transformation. The UKF first selects a set of sample (sigma) points from the prior distribution, and uses linear regression of the non-linear function of the random variable to achieve higher accuracy.
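As a concrete sketch of this step, the snippet below uses the third-party filterpy library with a constant-velocity state \( [c_{x} ,c_{y} ,v_{x} ,v_{y} ] \) over the box center. The state vector and the sigma-point parameters are assumptions, since the paper does not specify them:

```python
import numpy as np
from filterpy.kalman import UnscentedKalmanFilter, MerweScaledSigmaPoints

dt = 1.0  # one frame per step


def fx(x, dt):
    """Constant-velocity transition for state [cx, cy, vx, vy]."""
    F = np.array([[1., 0., dt, 0.],
                  [0., 1., 0., dt],
                  [0., 0., 1., 0.],
                  [0., 0., 0., 1.]])
    return F @ x


def hx(x):
    """The detector observes only the box center (cx, cy)."""
    return x[:2]


points = MerweScaledSigmaPoints(n=4, alpha=0.1, beta=2.0, kappa=-1.0)
ukf = UnscentedKalmanFilter(dim_x=4, dim_z=2, dt=dt, fx=fx, hx=hx,
                            points=points)
ukf.x = np.array([100., 50., 0., 0.])  # initial center from first detection
ukf.P *= 10.

for z in [np.array([104., 51.]), np.array([108., 52.])]:  # detected centers
    ukf.predict()
    ukf.update(z)  # calibrated position is in ukf.x[:2]
```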

In this paper, three kinds of information are used for data matching: location information, IOU (Intersection over Union) and color features. Given the position information of \( S_{i}^{t - 1} \) and \( S_{i}^{t} \), we use the adjusted cosine similarity to calculate their position similarity. The position similarity \( L_{confidence} \) between \( S_{i}^{t - 1} \) and \( S_{i}^{t} \) is defined as (4).

$$ L_{confidence} = \frac{{\sum\limits_{k = 1}^{n} {\left( {Loc_{{S_{i}^{t} }} - \overline{Loc}_{{S_{i}^{t} }} } \right)_{k} \left( {Loc_{{S_{i}^{t - 1} }} - \overline{Loc}_{{S_{i}^{t - 1} }} } \right)_{k} } }}{{\sqrt {\sum\limits_{k = 1}^{n} {\left( {Loc_{{S_{i}^{t} }} - \overline{Loc}_{{S_{i}^{t} }} } \right)_{k}^{2} } } \sqrt {\sum\limits_{k = 1}^{n} {\left( {Loc_{{S_{i}^{t - 1} }} - \overline{Loc}_{{S_{i}^{t - 1} }} } \right)_{k}^{2} } } }} $$
(4)
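A direct implementation of Eq. (4); here the location vectors are taken to be the four corner coordinates [x1, y1, x2, y2] of Eq. (3), which is our reading since the paper does not enumerate the components:

```python
import numpy as np


def location_confidence(loc_prev, loc_curr):
    """Adjusted cosine similarity (Eq. 4): each coordinate vector is
    centered by its own mean before the cosine is taken."""
    a = np.asarray(loc_curr, dtype=float)
    b = np.asarray(loc_prev, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```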

The object association confidence between \( S_{i}^{t - 1} \) and all detections \( S^{t} \) is shown as:

$$ \text{Confidence} \left( {S_{i}^{t - 1} ,S^{t} } \right) = \left\{ {\text{c}_{1} \left( {S_{i}^{t - 1} ,S_{1}^{t} } \right),\text{c}_{2} \left( {S_{i}^{t - 1} ,S_{2}^{t} } \right), \ldots ,\text{c}_{n} \left( {S_{i}^{t - 1} ,S_{n}^{t} } \right)} \right\} $$
(5)

Thereby \( Loc_{{S_{i}^{t} }} \) is:

$$ Loc_{{S_{i}^{t} }} = Loc_{{S_{{\text{maxIndex} \left( {\text{Confidence} \left( {S_{i}^{t - 1} ,S^{t} } \right)} \right)}}^{t} }} $$
(6)
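Equations (5) and (6) pick, for each object in frame t-1, the frame-t detection with the highest fused confidence. A minimal greedy sketch of this association, assuming a hypothetical fused confidence function and a threshold parameter that the paper does not state:

```python
def associate(prev_objs, curr_objs, confidence, threshold=0.5):
    """Greedy association per Eqs. (5)-(6): each S_i^{t-1} is matched
    to the unmatched S_j^t with the highest fused confidence.
    `confidence` and `threshold` are assumed interfaces/parameters."""
    matches = []
    unmatched = set(range(len(curr_objs)))
    for i, prev in enumerate(prev_objs):
        if not unmatched:
            break
        # maxIndex(Confidence(S_i^{t-1}, S^t)), restricted to unmatched boxes
        j = max(unmatched, key=lambda k: confidence(prev, curr_objs[k]))
        if confidence(prev, curr_objs[j]) >= threshold:
            matches.append((i, j))
            unmatched.discard(j)
    return matches, sorted(unmatched)
```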

4 Experiments

Real highway surveillance videos are used as the experimental data set, and three different scenes are extracted: an ordinary road section, a section with frequent occlusion, and a congested section. Each video lasts about 2 min, giving a total of 11338 frames annotated in the MOT Challenge standard data set format.

The performance indicators of multi-object tracking reflect the accuracy of the predicted locations and the temporal consistency of the tracking algorithm. The evaluation indicators include [6]:

  • MOTA: combines false negatives, false positives and the mismatch rate (see the formula after this list);

  • MOTP: overlap between the estimated positions and the ground truth, averaged over the matches;

  • MT: percentage of ground-truth trajectories covered by the tracker output for more than 80% of their length;

  • ML: percentage of ground-truth trajectories covered by the tracker output for less than 20% of their length;

  • IDS: number of times a tracked trajectory changes its matched ground-truth identity.
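For reference, MOTA combines these error counts over all frames t in the standard MOT Challenge way:

$$ \text{MOTA} = 1 - \frac{{\sum\nolimits_{t} {\left( {FN_{t} + FP_{t} + IDSW_{t} } \right)} }}{{\sum\nolimits_{t} {GT_{t} } }} $$

where \( FN_{t} \), \( FP_{t} \) and \( IDSW_{t} \) are the false negatives, false positives and identity switches in frame t, and \( GT_{t} \) is the number of ground-truth objects in frame t.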

First, we implement the basic IOU algorithm to calculate the association of targets; a reference implementation is sketched below. The comparison is among IOU, SORT and IOU17. In the experiment, our method sets \( T_{minhits} \) (the shortest life length of a generated object) to 8 and \( T_{maxdp} \) to 30; the same settings are used for SORT. IOU17 sets \( T_{minhits} \) to 8 and \( \sigma_{iou} \) to 0.5. The object detection confidence threshold is set to 0.5, and the results are shown in Tables 1, 2 and 3.
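The basic IOU association scores the overlap of two boxes in the corner format of \( Loc \) in Eq. (3); a standard implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as
    (x1, y1, x2, y2) corner coordinates."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```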

Table 1. Accuracy on the ordinary section
Table 2. Accuracy on the frequent-occlusion section
Table 3. Accuracy on the congested section

Tables 1, 2 and 3 show that the method proposed in this paper, with predicted positions and color features, outperforms the other methods. In the frequent-occlusion scenario, the position prediction greatly improves accuracy. In congested sections, color features help accuracy more than location prediction does. Our method does not predict the position when vehicles are crawling, because the deviation is large; however, the color feature does not change in slow-moving traffic, so in this case it significantly increases accuracy.
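The paper does not specify the color descriptor. One plausible realization, sketched here purely as an assumption, is an HSV hue-saturation histogram compared with OpenCV's correlation metric:

```python
import cv2
import numpy as np


def color_feature(patch_bgr):
    """HSV hue-saturation histogram of a vehicle patch (assumed descriptor)."""
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [30, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()


def color_similarity(f_prev, f_curr):
    """Correlation between two histograms, in [-1, 1]."""
    return float(cv2.compareHist(f_prev.astype(np.float32),
                                 f_curr.astype(np.float32),
                                 cv2.HISTCMP_CORREL))
```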

SORT [8] and IOU17 [9] only use the position information of the target to associate the data, without image features or position prediction. In the case of frequent congestion and occlusion, the vehicle cannot be tracked effectively because of the decrease in object detection accuracy and the occlusion of the tracking target.

5 Conclusions

Target loss happens when occlusion and other events occur in video surveillance. In this paper, we use linear regression to analyze the vehicle's historical trajectory, predict the position of the vehicle when it disappears from a video frame, and recognize the target when the vehicle appears again by the predicted position and color feature. Experiments show that the algorithm remains effective under occlusion and congestion.

This method only uses the color feature as the extracted image feature. Although the computation is light, deviation in the color similarity of targets exists. In future work, we will investigate shallow CNNs to extract image features and enhance performance in terms of efficiency and differentiation.