Abstract
Object detection is widely used in object tracking; anchor-free detection offers an end-to-end approach to single-object tracking. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the features of the reference object in the initial frame, we directly predict the center point and size of the object in subsequent frames with a Siamese-structured network, without per-frame post-processing. Unlike other anchor-free tracking approaches that build on semantic segmentation and achieve anchor-free tracking through pixel-level prediction, SiamCPN directly obtains all the information required for tracking, which greatly simplifies the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the different branches of the Siamese network. The model can accurately predict the object location, apply appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient because it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on the GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100 while improving inference speed by a factor of 1.5 to 2.
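The pipeline the abstract describes — correlate template features against the search region, then read the object's center and size directly from the resulting maps — can be sketched in a few lines. This is an illustrative simplification under assumed single-channel features and hypothetical function names, not the authors' implementation:

```python
import numpy as np

def cross_correlation(template, search):
    """Naive 2D cross-correlation of a template feature map slid over a
    larger search-region feature map (single channel, stride 1)."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out

def decode_center(heatmap, size_map, stride=8):
    """Anchor-free decoding: take the heatmap peak as the object center
    and read the box size regressed at that same location."""
    cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    w, h = size_map[:, cy, cx]
    # map feature-grid coordinates back to image coordinates
    return cx * stride, cy * stride, w, h
```

Because the center and size come straight from an argmax and a lookup, no anchor boxes or post-processing such as non-maximum suppression are needed, which is the simplification the abstract emphasizes.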
References
Danelljan, M.; Häger, G.; Shahbaz Khan, F.; Felsberg, M. Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference, 2014.
Henriques, J. F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 3, 583–596, 2015.
Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 34, No. 7, 1409–1422, 2012.
Fan, R. C.; Zhang, F. L.; Zhang, M.; Martin, R. R. Robust tracking-by-detection using a selection and completion mechanism. Computational Visual Media Vol. 3, No. 3, 285–294, 2017.
Bertinetto, L.; Valmadre, J.; Henriques, J. F.; Vedaldi, A.; Torr, P. H. S. Fully-convolutional Siamese networks for object tracking. In: Computer Vision — ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9914. Hua, G.; Jégou, H. Eds. Springer Cham, 850–865, 2016.
Li, B.; Wu, W.; Wang, Q.; Zhang, F. Y.; Xing, J. L.; Yan, J. J. SiamRPN++: Evolution of Siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4277–4286, 2019.
Li, B.; Yan, J. J.; Wu, W.; Zhu, Z.; Hu, X. L. High performance visual tracking with Siamese region proposal network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8971–8980, 2018.
Tao, R.; Gavves, E.; Smeulders, A. W. M. Siamese instance search for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1420–1429, 2016.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 6, 1137–1149, 2017.
Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J. J.; Hu, W. M. Distractor-aware Siamese networks for visual object tracking. In: Computer Vision — ECCV 2018. Lecture Notes in Computer Science, Vol. 11213. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 103–119, 2018.
Zhang, Z. P.; Peng, H. W. Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4586–4595, 2019.
Guo, D. Y.; Wang, J.; Cui, Y.; Wang, Z. H.; Chen, S. Y. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6268–6276, 2020.
Zhang, Z. P.; Peng, H. W.; Fu, J. L.; Li, B.; Hu, W. M. Ocean: Object-aware anchor-free tracking. In: Computer Vision — ECCV 2020. Lecture Notes in Computer Science, Vol. 12366. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 771–787, 2020.
Han, G.; Du, H.; Liu, J. X.; Sun, N.; Li, X. F. Fully conventional anchor-free Siamese networks for object tracking. IEEE Access Vol. 7, 123934–123943, 2019.
Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W. M.; Torr, P. H. S. Fast online object tracking and segmentation: A unifying approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1328–1338, 2019.
Xu, Y. D.; Wang, Z. Y.; Li, Z. X.; Yuan, Y.; Yu, G. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 7, 12549–12556, 2020.
Peng, S. Y.; Yu, Y. X.; Wang, K.; He, L. Accurate anchor free tracking. arXiv preprint arXiv: 2006.07560, 2020.
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM Vol. 60, No. 6, 84–90, 2017.
He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43, No. 1, 172–186, 2018.
Newell, A.; Yang, K. Y.; Deng, J. Stacked hourglass networks for human pose estimation. In: Computer Vision — ECCV 2016. Lecture Notes in Computer Science, Vol. 9912. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 483–499, 2016.
Papandreou, G.; Zhu, T.; Kanazawa, N.; Toshev, A.; Tompson, J.; Bregler, C.; Murphy, K. Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3711–3719, 2017.
Zhou, X. Y.; Wang, D. Q.; Krähenbuhl, P. Objects as points. arXiv preprint arXiv: 1904.07850, 2019.
Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In: Computer Vision — ECCV 2018. Lecture Notes in Computer Science, Vol. 11218. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 765–781, 2018.
Lin, T. Y.; Goyal, P.; Girshick, R.; He, K. M.; Dollár, P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2999–3007, 2017.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211–252, 2015.
Huang, L. H.; Zhao, X.; Huang, K. Q. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: https://doi.org/10.1109/TPAMI.2019.2957464, 2019.
Fan, H.; Lin, L. T.; Yang, F.; Chu, P.; Deng, G.; Yu, S. J.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5369–5378, 2019.
Lin, T. Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common objects in context. In: Computer Vision — ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 740–755, 2014.
Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7464–7473, 2017.
Wu, Y.; Lim, J.; Yang, M. H. Online object tracking: A benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2411–2418, 2013.
Wu, Y.; Lim, J.; Yang, M. H. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 9, 1834–1848, 2015.
Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin, L.; Vojíř, T.; Häger, G.; Lukežič, A.; Fernández, G. et al. The visual object tracking VOT2016 challenge results. In: Computer Vision — ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9914. Hua, G.; Jégou, H. Eds. Springer Cham, 777–823, 2016.
Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In: Computer Vision — ECCV 2016. Lecture Notes in Computer Science, Vol. 9905. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 445–461, 2016.
Danelljan, M.; Hager, G.; Khan, F. S.; Felsberg, M. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 8, 1561–1575, 2017.
Danelljan, M.; Häger, G.; Khan, F. S.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, 4310–4318, 2015.
Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P. H. S. Staple: Complementary learners for realtime tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1401–1409, 2016.
Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P. H. S. End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5000–5008, 2017.
Nam, H.; Han, B. Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4293–4302, 2016.
Danelljan, M.; Bhat, G.; Khan, F. S.; Felsberg, M. ECO: Efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6931–6939, 2017.
Danelljan, M.; Robinson, A.; Shahbaz Khan, F.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: Computer Vision — ECCV 2016. Lecture Notes in Computer Science, Vol. 9909, Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 472–488, 2016.
Wang, G. T.; Luo, C.; Xiong, Z. W.; Zeng, W. J. SPM-tracker: Series-parallel matching for realtime visual object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3638–3647, 2019.
Danelljan, M.; Bhat, G.; Khan, F. S.; Felsberg, M. ATOM: Accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4655–4664, 2019.
Miller, G. A. WordNet. Communications of the ACM Vol. 38, No. 11, 39–41, 1995.
Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning dynamic Siamese network for visual object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, 1781–1789, 2017.
Hong, Z. B.; Zhe, C.; Wang, C. H.; Mei, X.; Prokhorov, D.; Tao, D. C. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 749–758, 2015.
Zhang, J. M.; Ma, S. G.; Sclaroff, S. MEEM: Robust tracking via multiple experts using entropy minimization. In: Computer Vision — ECCV 2014. Lecture Notes in Computer Science, Vol. 8694. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 188–203, 2014.
Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M. M.; Hicks, S. L.; Torr, P. H. S. Struck: Structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 38, No. 10, 2096–2109, 2016.
Acknowledgements
Code and experimental data are available at https://github.com/KevinDongDong/SCPN. Other data required for the experiments (including training and test data) are available from the websites given in the references.
We thank the anonymous reviewers for their valuable comments. This work was supported by the National Key R&D Program of China (Grant No. 2018YFC0807500), and the National Natural Science Foundation of China (Grant Nos. U20B2070 and 61832016).
Author information
Dong Chen is a student in the School of Artificial Intelligence, University of the Chinese Academy of Sciences. He received his B.E. degree in computer science and technology from Shihezi University in 2017. He is currently working towards an M.Eng. degree at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision and machine learning.
Fan Tang is an assistant professor in the School of Artificial Intelligence, Jilin University. He received his B.Sc. degree in computer science from North China Electric Power University in 2013 and his Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in 2019. His research interests include computer graphics, computer vision, and machine learning.
Weiming Dong is a professor in the Sino-European Lab in Computer Science, Automation and Applied Mathematics (LIAMA) and National Laboratory of Pattern Recognition (NLPR) at the Institute of Automation, Chinese Academy of Sciences. He received his B.Sc. and M.Sc. degrees in computer science in 2001 and 2004, both from Tsinghua University, China. He received his Ph.D. degree in computer science from the University of Lorraine, France, in 2007. His research interests include computational visual media and computational creativity.
Hanxing Yao received his B.Sc. degree in architectural engineering in 1999 and his M.Sc. degree in computer science in 2002, both from Chongqing University, China. He is the director of the AI Department of Beijing LLVISION Technology Co., Ltd. His research interests include computer vision and video retrieval.
Changsheng Xu is a professor in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences and Executive Director of the China-Singapore Institute of Digital Media. His research interests include multimedia content analysis, indexing and retrieval, pattern recognition, and computer vision. He holds 30 granted or pending patents and has published over 200 refereed research papers in these areas. He is an Associate Editor of IEEE Trans. on Multimedia, ACM Trans. on Multimedia Computing, Communications and Applications, and ACM/Springer Multimedia Systems Journal.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, D., Tang, F., Dong, W. et al. SiamCPN: Visual tracking with the Siamese center-prediction network. Comp. Visual Media 7, 253–265 (2021). https://doi.org/10.1007/s41095-021-0212-1