Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models

Beleznai, Csaba; Steininger, Daniel; Broneder, Elisabeth

doi:10.1007/978-3-030-04946-1_47

Part of the book series: Studies in Computational Intelligence (SCI, volume 810)

Included in the following conference series:

International Symposium on Artificial Intelligence and Robotics

Abstract

Human detection in crowded situations is a challenging task in many practically relevant scenarios. In this paper, we propose a passive stereo depth-based human detection scheme that employs a hierarchically structured tree of learned shape templates to delineate clusters corresponding to humans. To enhance the specificity of the depth-based detection approach towards humans, we also incorporate a visual object recognition modality in the form of a deeply-trained model. We propose a simple way to combine the depth and appearance modalities to better cope with complex effects such as heavily occluded humans, small-sized humans, and clutter. We analyze the obtained results in terms of the improvements and shortcomings introduced by the individual detection modalities. The proposed combination achieves good accuracy at a reasonable computational speed in difficult scenarios exhibiting crowded situations. Hence, in our view, the presented concepts constitute a detection scheme of practical relevance.
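
The abstract does not specify how the depth and appearance detections are combined. The following is a minimal sketch of one generic late-fusion scheme, built on assumptions that are ours and not the authors': each modality outputs axis-aligned bounding boxes with confidence scores, a box confirmed by the other modality (intersection-over-union at or above `match_iou`) receives a fixed score `bonus`, and duplicates are then removed by greedy non-maximum suppression. The names `Detection`, `fuse`, `nms`, `match_iou`, and `bonus` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Detection:
    """Axis-aligned box (x1, y1, x2, y2) in pixels with a confidence score."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float
    source: str  # "depth" or "appearance" (hypothetical labels)


def iou(a: Detection, b: Detection) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def _rescore(dets: List[Detection], others: List[Detection],
             match_iou: float, bonus: float) -> List[Detection]:
    """Add `bonus` (capped at 1.0) to each box that overlaps a box of the other modality."""
    out = []
    for d in dets:
        confirmed = any(iou(d, o) >= match_iou for o in others)
        new_score = min(1.0, d.score + bonus) if confirmed else d.score
        out.append(replace(d, score=new_score))
    return out


def fuse(depth_dets: List[Detection], app_dets: List[Detection],
         match_iou: float = 0.5, bonus: float = 0.2) -> List[Detection]:
    """Hypothetical late fusion: pool both modalities, boosting cross-confirmed boxes."""
    return (_rescore(depth_dets, app_dets, match_iou, bonus)
            + _rescore(app_dets, depth_dets, match_iou, bonus))


def nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy non-maximum suppression: visit boxes in descending score order and
    drop any box that overlaps an already-kept box by iou_thr or more."""
    kept: List[Detection] = []
    for d in sorted(dets, key=lambda t: t.score, reverse=True):
        if all(iou(d, k) < iou_thr for k in kept):
            kept.append(d)
    return kept


if __name__ == "__main__":
    depth = [Detection(0, 0, 10, 20, 0.6, "depth")]
    appearance = [
        Detection(1, 0, 11, 20, 0.7, "appearance"),    # same person as the depth box
        Detection(50, 50, 60, 70, 0.4, "appearance"),  # seen by appearance only
    ]
    for det in nms(fuse(depth, appearance)):
        print(det.source, round(det.score, 2))
```

On this toy input the overlapping depth and appearance boxes (IoU of about 0.82) confirm each other, and after suppression only the boosted appearance box remains for that person (score 0.9). The isolated appearance-only box, which could stand for a small, distant person for whom stereo depth is unreliable, survives with its original score of 0.4. In practice, thresholds such as `match_iou`, `bonus`, and `iou_thr` would have to be tuned on data.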

Acknowledgements

The authors thank the Austrian Federal Ministry for Transport, Innovation and Technology and the Austrian Research Promotion Agency (FFG) for co-funding the research project “LEAL” (FFG No. 850218) within the National Research Development Programme KIRAS Austria.

Author information

Authors and Affiliations

Center for Vision, Automation & Control, AIT Austrian Institute of Technology, Vienna, Austria
Csaba Beleznai & Daniel Steininger
Center for Digital Safety & Security, AIT Austrian Institute of Technology, Vienna, Austria
Elisabeth Broneder

Corresponding author

Correspondence to Csaba Beleznai.

Editor information

Editors and Affiliations

Department of Mechanical and Control Engineering, Kyushu Institute of Technology, Kitakyushu, Japan
Huimin Lu

About this chapter

Cite this chapter

Beleznai, C., Steininger, D., Broneder, E. (2020). Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models. In: Lu, H. (ed.) Cognitive Internet of Things: Frameworks, Tools and Applications. ISAIR 2018. Studies in Computational Intelligence, vol 810. Springer, Cham. https://doi.org/10.1007/978-3-030-04946-1_47

DOI: https://doi.org/10.1007/978-3-030-04946-1_47
Published: 19 February 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04945-4
Online ISBN: 978-3-030-04946-1
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
