Abstract
This paper addresses multi-target tracking with a monocular vision sensor. To overcome the fundamental observability limitation of monocular vision, a convolutional neural network (CNN)-based method is proposed that integrates CNN-based multi-target detection into a model-based multi-target tracking framework. While previous CNN applications to image-based object recognition and tracking focused on predicting regions of interest (RoIs), the proposed method predicts the three-dimensional positions of the moving objects of interest. This is achieved by appropriately constructing a network tailored to moving-object tracking problems with potentially occluded objects. In addition, a cubature Kalman filter integrated with a data association scheme is adopted for effective tracking of the objects' nonlinear motion using the measurement information from the learned network. A virtual simulator that generates target-motion trajectories and a sequence of images of the scene has been developed and used to test and verify the proposed CNN scheme. Simulation case studies demonstrate that the proposed CNN substantially improves position accuracy in the depth direction.
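For reference, the measurement-update step of a cubature Kalman filter — the filter family the paper adopts — can be sketched as follows. This is the standard third-degree cubature rule, not the authors' specific implementation, and the function names are illustrative:

```python
import numpy as np

def cubature_points(x, P):
    """2n cubature points for mean x and covariance P (third-degree rule)."""
    n = x.size
    S = np.linalg.cholesky(P)                             # square root of P
    xi = np.sqrt(n) * np.hstack([np.eye(n), -np.eye(n)])  # unit cubature set
    return x[:, None] + S @ xi                            # shape (n, 2n)

def ckf_measurement_update(x, P, z, h, R):
    """One CKF measurement update with a nonlinear measurement function h."""
    n = x.size
    pts = cubature_points(x, P)
    Z = np.column_stack([h(p) for p in pts.T])  # propagate points through h
    z_pred = Z.mean(axis=1)                     # predicted measurement
    dZ = Z - z_pred[:, None]
    dX = pts - x[:, None]
    Pzz = dZ @ dZ.T / (2 * n) + R               # innovation covariance
    Pxz = dX @ dZ.T / (2 * n)                   # cross covariance
    K = Pxz @ np.linalg.inv(Pzz)                # cubature Kalman gain
    return x + K @ (z - z_pred), P - K @ Pzz @ K.T
```

With a linear measurement function the update reduces to the ordinary Kalman filter correction; the cubature points matter only when h is nonlinear, as in camera projection models.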
Additional information
Recommended by Associate Editor Vu Nguyen under the direction of Editor Won-jong Kim. This work was supported in part by ICT R&D program of the Ministry of Science & ICT via Institute for ICT Planning and Evaluation (#R-20150223-000167), and in part by Defense Acquisition Program Administration via High-speed Vehicle Research Center and Agency for Defense Development (#UD170018CD).
Sang-Hyeon Kim is a senior researcher at Samsung Electronics. He received the Ph.D. degree in Aerospace Engineering from KAIST (Korea Advanced Institute of Science and Technology) in 2018. Prior to this, he received the B.S. degree in Aerospace Engineering from Inha University, Incheon, Korea, in 2011 and the M.S. degree in Aerospace Engineering from KAIST, Daejeon, Korea, in 2013. His research interests include vision-based estimation and control for autonomous systems and deep learning techniques.
Han-Lim Choi is an Associate Professor of Aerospace Engineering at KAIST (Korea Advanced Institute of Science and Technology). He received the B.S. and M.S. degrees in aerospace engineering from KAIST, Daejeon, Korea, in 2000 and 2002, respectively, and the Ph.D. degree in aeronautics and astronautics from MIT (Massachusetts Institute of Technology), Cambridge, in 2009. His research interests include decision making for multi-agent systems and machine learning methods for dynamic systems.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kim, SH., Choi, HL. Convolutional Neural Network for Monocular Vision-based Multi-target Tracking. Int. J. Control Autom. Syst. 17, 2284–2296 (2019). https://doi.org/10.1007/s12555-018-0134-6