Abstract
Object detection is widely used in object tracking; anchor-free detection offers an end-to-end approach to single-object tracking. In this study, we propose a new anchor-free network, the Siamese center-prediction network (SiamCPN). Given the features of the reference object in the initial frame, we directly predict the center point and size of the object in subsequent frames with a Siamese-structured network, without per-frame post-processing. Unlike other anchor-free tracking approaches that build on semantic segmentation and achieve anchor-free tracking through pixel-level prediction, SiamCPN directly obtains all the information required for tracking, which greatly simplifies the model. A center-prediction sub-network is applied to multiple stages of the backbone to adaptively learn from the different branches of the Siamese network. The model can accurately predict the object location, apply appropriate corrections, and regress the size of the target bounding box. Compared to other leading Siamese networks, SiamCPN is simpler, faster, and more efficient because it uses fewer hyperparameters. Experiments demonstrate that our method outperforms other leading Siamese networks on the GOT-10K and UAV123 benchmarks, and is comparable to other excellent trackers on LaSOT, VOT2016, and OTB-100 while improving inference speed by a factor of 1.5 to 2.
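The pipeline the abstract describes — correlate template features against the search region, then read the object's center and size directly from the resulting maps — can be sketched in a few lines. This is an illustrative simplification under assumed single-channel features and hypothetical function names, not the authors' implementation:

```python
import numpy as np

def cross_correlation(template, search):
    """Naive 2D cross-correlation of a template feature map slid over a
    larger search-region feature map (single channel, stride 1)."""
    th, tw = template.shape
    sh, sw = search.shape
    out = np.empty((sh - th + 1, sw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(template * search[i:i + th, j:j + tw])
    return out

def decode_center(heatmap, size_map, stride=8):
    """Anchor-free decoding: take the heatmap peak as the object center
    and read the box size regressed at that same location."""
    cy, cx = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    w, h = size_map[:, cy, cx]
    # map feature-grid coordinates back to image coordinates
    return cx * stride, cy * stride, w, h
```

Because the center and size come straight from an argmax and a lookup, no anchor boxes or post-processing such as non-maximum suppression are needed, which is the simplification the abstract emphasizes.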
References
Danelljan, M.; Häger, G.; Shahbaz Khan, F.; Felsberg, M. Accurate scale estimation for robust visual tracking. In: Proceedings of the British Machine Vision Conference, 2014.
Henriques, J. F.; Caseiro, R.; Martins, P.; Batista, J. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 3, 583–596, 2015.
Kalal, Z.; Mikolajczyk, K.; Matas, J. Tracking-learning-detection. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 34, No. 7, 1409–1422, 2012.
Fan, R. C.; Zhang, F. L.; Zhang, M.; Martin, R. R. Robust tracking-by-detection using a selection and completion mechanism. Computational Visual Media Vol. 3, No. 3, 285–294, 2017.
Bertinetto, L.; Valmadre, J.; Henriques, J. F.; Vedaldi, A.; Torr, P. H. S. Fully-convolutional Siamese networks for object tracking. In: Computer Vision — ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9914. Hua, G.; Jégou, H. Eds. Springer Cham, 850–865, 2016.
Li, B.; Wu, W.; Wang, Q.; Zhang, F. Y.; Xing, J. L.; Yan, J. J. SiamRPN++: Evolution of Siamese visual tracking with very deep networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4277–4286, 2019.
Li, B.; Yan, J. J.; Wu, W.; Zhu, Z.; Hu, X. L. High performance visual tracking with Siamese region proposal network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8971–8980, 2018.
Tao, R.; Gavves, E.; Smeulders, A. W. M. Siamese instance search for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1420–1429, 2016.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 6, 1137–1149, 2017.
Zhu, Z.; Wang, Q.; Li, B.; Wu, W.; Yan, J. J.; Hu, W. M. Distractor-aware Siamese networks for visual object tracking. In: Computer Vision — ECCV 2018. Lecture Notes in Computer Science, Vol. 11213. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 103–119, 2018.
Zhang, Z. P.; Peng, H. W. Deeper and wider Siamese networks for real-time visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4586–4595, 2019.
Guo, D. Y.; Wang, J.; Cui, Y.; Wang, Z. H.; Chen, S. Y. SiamCAR: Siamese fully convolutional classification and regression for visual tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6268–6276, 2020.
Zhang, Z. P.; Peng, H. W.; Fu, J. L.; Li, B.; Hu, W. M. Ocean: Object-aware anchor-free tracking. In: Computer Vision — ECCV 2020. Lecture Notes in Computer Science, Vol. 12366. Vedaldi, A.; Bischof, H.; Brox, T.; Frahm, J. M. Eds. Springer Cham, 771–787, 2020.
Han, G.; Du, H.; Liu, J. X.; Sun, N.; Li, X. F. Fully conventional anchor-free Siamese networks for object tracking. IEEE Access Vol. 7, 123934–123943, 2019.
Wang, Q.; Zhang, L.; Bertinetto, L.; Hu, W. M.; Torr, P. H. S. Fast online object tracking and segmentation: A unifying approach. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1328–1338, 2019.
Xu, Y. D.; Wang, Z. Y.; Li, Z. X.; Yuan, Y.; Yu, G. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines. In: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, No. 7, 12549–12556, 2020.
Peng, S. Y.; Yu, Y. X.; Wang, K.; He, L. Accurate anchor free tracking. arXiv preprint arXiv: 2006.07560, 2020.
Krizhevsky, A.; Sutskever, I.; Hinton, G. E. ImageNet classification with deep convolutional neural networks. Communications of the ACM Vol. 60, No. 6, 84–90, 2017.
He, K. M.; Zhang, X. Y.; Ren, S. Q.; Sun, J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778, 2016.
Cao, Z.; Hidalgo, G.; Simon, T.; Wei, S.; Sheikh, Y. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 43, No. 1, 172–186, 2018.
Newell, A.; Yang, K. Y.; Deng, J. Stacked hourglass networks for human pose estimation. In: Computer Vision — ECCV 2016. Lecture Notes in Computer Science, Vol. 9912. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 483–499, 2016.
Papandreou, G.; Zhu, T.; Kanazawa, N.; Toshev, A.; Tompson, J.; Bregler, C.; Murphy, K. Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 3711–3719, 2017.
Zhou, X. Y.; Wang, D. Q.; Krähenbuhl, P. Objects as points. arXiv preprint arXiv: 1904.07850, 2019.
Law, H.; Deng, J. CornerNet: Detecting objects as paired keypoints. In: Computer Vision — ECCV 2018. Lecture Notes in Computer Science, Vol. 11218. Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y. Eds. Springer Cham, 765–781, 2018.
Lin, T. Y.; Goyal, P.; Girshick, R.; He, K. M.; Dollár, P. Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, 2999–3007, 2017.
Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M. ImageNet large scale visual recognition challenge. International Journal of Computer Vision Vol. 115, No. 3, 211–252, 2015.
Huang, L. H.; Zhao, X.; Huang, K. Q. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild. IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: https://doi.org/10.1109/TPAMI.2019.2957464, 2019.
Fan, H.; Lin, L. T.; Yang, F.; Chu, P.; Deng, G.; Yu, S. J.; Bai, H.; Xu, Y.; Liao, C.; Ling, H. LaSOT: A high-quality benchmark for large-scale single object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 5369–5378, 2019.
Lin, T. Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C. L. Microsoft COCO: Common objects in context. In: Computer Vision — ECCV 2014. Lecture Notes in Computer Science, Vol. 8693. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 740–755, 2014.
Real, E.; Shlens, J.; Mazzocchi, S.; Pan, X.; Vanhoucke, V. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 7464–7473, 2017.
Wu, Y.; Lim, J.; Yang, M. H. Online object tracking: A benchmark. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2411–2418, 2013.
Wu, Y.; Lim, J.; Yang, M. H. Object tracking benchmark. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 37, No. 9, 1834–1848, 2015.
Kristan, M.; Leonardis, A.; Matas, J.; Felsberg, M.; Pflugfelder, R.; Čehovin, L.; Vojíř, T.; Häger, G.; Lukežič, A.; Fernández, G. et al. The visual object tracking VOT2016 challenge results. In: Computer Vision — ECCV 2016 Workshops. Lecture Notes in Computer Science, Vol. 9914. Hua, G.; Jégou, H. Eds. Springer Cham, 777–823, 2016.
Mueller, M.; Smith, N.; Ghanem, B. A benchmark and simulator for UAV tracking. In: Computer Vision — ECCV 2016. Lecture Notes in Computer Science, Vol. 9905. Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 445–461, 2016.
Danelljan, M.; Hager, G.; Khan, F. S.; Felsberg, M. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 39, No. 8, 1561–1575, 2017.
Danelljan, M.; Häger, G.; Khan, F. S.; Felsberg, M. Learning spatially regularized correlation filters for visual tracking. In: Proceedings of the IEEE International Conference on Computer Vision, 4310–4318, 2015.
Bertinetto, L.; Valmadre, J.; Golodetz, S.; Miksik, O.; Torr, P. H. S. Staple: Complementary learners for realtime tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 1401–1409, 2016.
Valmadre, J.; Bertinetto, L.; Henriques, J.; Vedaldi, A.; Torr, P. H. S. End-to-end representation learning for correlation filter based tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 5000–5008, 2017.
Nam, H.; Han, B. Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 4293–4302, 2016.
Danelljan, M.; Bhat, G.; Khan, F. S.; Felsberg, M. ECO: Efficient convolution operators for tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 6931–6939, 2017.
Danelljan, M.; Robinson, A.; Shahbaz Khan, F.; Felsberg, M. Beyond correlation filters: Learning continuous convolution operators for visual tracking. In: Computer Vision — ECCV 2016. Lecture Notes in Computer Science, Vol. 9909, Leibe, B.; Matas, J.; Sebe, N.; Welling, M. Eds. Springer Cham, 472–488, 2016.
Wang, G. T.; Luo, C.; Xiong, Z. W.; Zeng, W. J. SPM-tracker: Series-parallel matching for realtime visual object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3638–3647, 2019.
Danelljan, M.; Bhat, G.; Khan, F. S.; Felsberg, M. ATOM: Accurate tracking by overlap maximization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4655–4664, 2019.
Miller, G. A. WordNet. Communications of the ACM Vol. 38, No. 11, 39–41, 1995.
Guo, Q.; Feng, W.; Zhou, C.; Huang, R.; Wan, L.; Wang, S. Learning dynamic Siamese network for visual object tracking. In: Proceedings of the IEEE International Conference on Computer Vision, 1781–1789, 2017.
Hong, Z. B.; Zhe, C.; Wang, C. H.; Mei, X.; Prokhorov, D.; Tao, D. C. MUlti-Store Tracker (MUSTer): A cognitive psychology inspired approach to object tracking. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 749–758, 2015.
Zhang, J. M.; Ma, S. G.; Sclaroff, S. MEEM: Robust tracking via multiple experts using entropy minimization. In: Computer Vision — ECCV 2014. Lecture Notes in Computer Science, Vol. 8694. Fleet, D.; Pajdla, T.; Schiele, B.; Tuytelaars, T. Eds. Springer Cham, 188–203, 2014.
Hare, S.; Golodetz, S.; Saffari, A.; Vineet, V.; Cheng, M. M.; Hicks, S. L.; Torr, P. H. S. Struck: Structured output tracking with kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence Vol. 38, No. 10, 2096–2109, 2016.
Acknowledgements
Code and experimental data are available at https://github.com/KevinDongDong/SCPN. Other data required for the experiments (including training and test data) are available from the websites given in the references.
We thank the anonymous reviewers for their valuable comments. This work was supported by the National Key R&D Program of China (Grant No. 2018YFC0807500), and the National Natural Science Foundation of China (Grant Nos. U20B2070 and 61832016).
Author information
Dong Chen is a student in the School of Artificial Intelligence, University of the Chinese Academy of Sciences. He received his B.E. degree in computer science and technology from Shihezi University in 2017. He is currently working towards an M.Eng. degree at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences. His research interests include computer vision and machine learning.
Fan Tang is an assistant professor in the School of Artificial Intelligence, Jilin University. He received his B.Sc. degree in computer science from North China Electric Power University in 2013 and his Ph.D. degree from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in 2019. His research interests include computer graphics, computer vision, and machine learning.
Weiming Dong is a professor in the Sino-European Lab in Computer Science, Automation and Applied Mathematics (LIAMA) and National Laboratory of Pattern Recognition (NLPR) at the Institute of Automation, Chinese Academy of Sciences. He received his B.Sc. and M.Sc. degrees in computer science in 2001 and 2004, both from Tsinghua University, China. He received his Ph.D. degree in computer science from the University of Lorraine, France, in 2007. His research interests include computational visual media and computational creativity.
Hanxing Yao received his B.Sc. degree in architectural engineering in 1999 and his M.Sc. degree in computer science in 2002, both from Chongqing University, China. He is the director of the AI Department of Beijing LLVISION Technology Co., Ltd. His research interests include computer vision and video retrieval.
Changsheng Xu is a professor in the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences and Executive Director of the China-Singapore Institute of Digital Media. His research interests include multimedia content analysis, indexing and retrieval, pattern recognition, and computer vision. He holds 30 granted or pending patents and has published over 200 refereed research papers in these areas. He is an Associate Editor of IEEE Trans. on Multimedia, ACM Trans. on Multimedia Computing, Communications and Applications, and ACM/Springer Multimedia Systems Journal.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Chen, D., Tang, F., Dong, W. et al. SiamCPN: Visual tracking with the Siamese center-prediction network. Comp. Visual Media 7, 253–265 (2021). https://doi.org/10.1007/s41095-021-0212-1