Abstract
Human detection in crowded situations is a challenging task in many practically relevant scenarios. In this paper, we propose a human detection scheme based on passive stereo depth, which employs a hierarchically structured tree of learned shape templates to delineate clusters corresponding to humans. To make the depth-based detection more specific to humans, we also incorporate a visual object recognition modality in the form of a deeply trained model. We propose a simple way to combine the depth and appearance modalities to better cope with complex effects such as heavy occlusion, small-sized humans, and clutter. The obtained results are analyzed in terms of the improvements and shortcomings introduced by the individual detection modalities. Our proposed combination achieves good accuracy at a decent computational speed in difficult scenarios exhibiting crowded situations. Hence, in our view, the presented concepts constitute a detection scheme of practical relevance.
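The abstract does not spell out how the depth and appearance detections are combined. As a point of reference only, here is one generic late-fusion strategy, under stated assumptions: both modalities output axis-aligned boxes with confidence scores in [0, 1], depth and appearance boxes are matched greedily by intersection over union (IoU), matched pairs receive a weighted average of their boxes and scores, unmatched detections are kept with a penalized score, and greedy non-maximum suppression (NMS) removes remaining duplicates. The names `Detection` and `fuse` and the parameters `match_thr`, `w_depth`, and `unmatched_penalty` are hypothetical; this is a sketch, not the authors' combination rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    # Axis-aligned box (x1, y1) to (x2, y2) with a confidence score in [0, 1].
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a.x1, b.x1), max(a.y1, b.y1)
    ix2, iy2 = min(a.x2, b.x2), min(a.y2, b.y2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a.x2 - a.x1) * (a.y2 - a.y1)
    area_b = (b.x2 - b.x1) * (b.y2 - b.y1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], iou_thr: float = 0.5) -> List[Detection]:
    """Greedy NMS: keep boxes in descending score order, drop heavy overlaps."""
    kept: List[Detection] = []
    for d in sorted(dets, key=lambda d: d.score, reverse=True):
        if all(iou(d, k) < iou_thr for k in kept):
            kept.append(d)
    return kept


def fuse(depth_dets: List[Detection], app_dets: List[Detection],
         match_thr: float = 0.5, w_depth: float = 0.5,
         unmatched_penalty: float = 0.5) -> List[Detection]:
    """Hypothetical late fusion of depth-based and appearance-based detections."""
    fused: List[Detection] = []
    used = set()  # indices of appearance detections already matched
    for d in depth_dets:
        # Greedy matching: each depth box takes its best free appearance box
        # (not a globally optimal assignment such as the Hungarian method).
        best_j, best_iou = -1, 0.0
        for j, a in enumerate(app_dets):
            if j in used:
                continue
            o = iou(d, a)
            if o > best_iou:
                best_j, best_iou = j, o
        if best_j >= 0 and best_iou >= match_thr:
            a = app_dets[best_j]
            used.add(best_j)
            w = w_depth
            # Both modalities agree: average the boxes and weight the scores.
            fused.append(Detection(
                w * d.x1 + (1 - w) * a.x1, w * d.y1 + (1 - w) * a.y1,
                w * d.x2 + (1 - w) * a.x2, w * d.y2 + (1 - w) * a.y2,
                w * d.score + (1 - w) * a.score))
        else:
            # Depth-only evidence: keep it, but with reduced confidence.
            fused.append(Detection(d.x1, d.y1, d.x2, d.y2,
                                   d.score * unmatched_penalty))
    for j, a in enumerate(app_dets):
        if j not in used:
            # Appearance-only evidence: also kept with reduced confidence.
            fused.append(Detection(a.x1, a.y1, a.x2, a.y2,
                                   a.score * unmatched_penalty))
    return nms(fused)
```

For example, a depth box (0, 0, 10, 20) with score 0.8 and an appearance box (1, 0, 11, 20) with score 0.9 overlap with an IoU of about 0.82, so with the default parameters they merge into a single box with score 0.85, while detections seen by only one modality survive at half their original score. Down-weighting rather than discarding single-modality hits is one way to retain, for instance, small or heavily occluded people that only one cue picks up.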
Acknowledgements
The authors thank the Austrian Federal Ministry for Transport, Innovation and Technology and the Austrian Research Promotion Agency (FFG) for co-funding the research project “LEAL” (FFG Nr. 850218) within the National Research Development Programme KIRAS Austria.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Beleznai, C., Steininger, D., Broneder, E. (2020). Human Detection in Crowded Situations by Combining Stereo Depth and Deeply-Learned Models. In: Lu, H. (eds) Cognitive Internet of Things: Frameworks, Tools and Applications. ISAIR 2018. Studies in Computational Intelligence, vol 810. Springer, Cham. https://doi.org/10.1007/978-3-030-04946-1_47
DOI: https://doi.org/10.1007/978-3-030-04946-1_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04945-4
Online ISBN: 978-3-030-04946-1
eBook Packages: Intelligent Technologies and Robotics (R0)