Abstract
This study presents a unified network that simultaneously localizes dynamic and static traffic objects and pedestrians, classifies traffic light colors, and detects the drivable area and lane lines. In the network architecture, a traffic object branch localizes dynamic objects such as cars, trucks, buses, motorcycles, and bicycles. Static objects are categorized as traffic signs and traffic lights, and pedestrians are localized as a separate traffic object group. Traffic light color is classified whenever the light is visible. The network has a unified design: one shared encoder for feature extraction and three decoders, one per task. The BDD100K dataset is used for benchmarking. The presented model ranks second for drivable area segmentation, lane line detection, and inference speed among publicly available multi-task networks. Compared with state-of-the-art segmentation models re-trained on the BDD100K dataset, the dynamic object localization task reaches an MIoU of 73.54%, which is 40% higher than the results of the re-trained segmentation methods.
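The shared-encoder, three-decoder layout described in the abstract can be sketched as follows. This is a minimal illustrative skeleton, not the authors' implementation: the encoder is replaced by a placeholder pooling operation, and the decoder names and shapes are assumptions made here for clarity.

```python
import numpy as np

def shared_encoder(image):
    """Stand-in for the CNN backbone: produce a downsampled feature map.

    A real encoder (e.g. a VGG-style backbone) would learn features;
    here 8x average pooling is used purely as a placeholder.
    """
    h, w, c = image.shape
    return image.reshape(h // 8, 8, w // 8, 8, c).mean(axis=(1, 3))

def object_decoder(feat):
    # Would predict boxes and classes for dynamic objects, static
    # objects (signs, lights), and pedestrians.
    return {"task": "traffic_objects", "feat_shape": feat.shape}

def drivable_decoder(feat):
    # Would predict a per-pixel drivable-area mask.
    return {"task": "drivable_area", "feat_shape": feat.shape}

def lane_decoder(feat):
    # Would predict lane line segmentation.
    return {"task": "lane_lines", "feat_shape": feat.shape}

def forward(image):
    feat = shared_encoder(image)  # computed once, shared by all heads
    return [object_decoder(feat), drivable_decoder(feat), lane_decoder(feat)]

# One encoder pass serves all three task heads.
outputs = forward(np.zeros((384, 640, 3)))
```

The key property this sketch captures is that the backbone runs once per frame while each task head reads the same feature map, which is what makes the joint network cheaper than running three separate models.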
Acknowledgements
This work has been supported by the Scientific Research Projects Commission of Galatasaray University under grant #19.401.005.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Çaldıran, B.E., Acarman, T. (2023). Multi-network for Joint Detection of Dynamic and Static Objects in a Road Scene Captured by an RGB Camera. In: Ranganathan, G., Fernando, X., Rocha, Á. (eds) Inventive Communication and Computational Technologies. Lecture Notes in Networks and Systems, vol 383. Springer, Singapore. https://doi.org/10.1007/978-981-19-4960-9_63
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-4959-3
Online ISBN: 978-981-19-4960-9
eBook Packages: Intelligent Technologies and Robotics (R0)