Multi-network for Joint Detection of Dynamic and Static Objects in a Road Scene Captured by an RGB Camera

  • Conference paper
Inventive Communication and Computational Technologies

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 383)

Abstract

This study presents a unified network that simultaneously localizes dynamic traffic objects, static traffic objects, and pedestrians, classifies traffic light colors, and detects the drivable area and lane lines. In the network architecture, a traffic object branch classifies dynamic objects such as cars, trucks, buses, motorcycles, and bicycles. Static objects are categorized as traffic signs and traffic lights, and pedestrians are localized as a separate traffic object group. The color of a traffic light is classified whenever the light is visible. The network has a unified design: one shared encoder for feature extraction and three decoders, one per task. The BDD100K dataset is used for benchmarking. Against publicly available multi-task networks, the presented model ranks second in drivable area segmentation, lane line detection, and inference speed. Compared with state-of-the-art segmentation models re-trained on the BDD100K dataset, the dynamic object localization task reaches an mIoU of 73.54%, which is 40% higher than the results of the re-trained segmentation methods.
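
A minimal sketch of the shared-encoder, three-decoder layout described in the abstract, written in PyTorch. The layer counts, channel widths, class counts, and head names below are illustrative assumptions; the paper's actual backbone, decoder designs, and loss functions are not reproduced here.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared encoder feeding three task-specific decoders.

    All sizes are illustrative assumptions, not the paper's exact
    configuration.
    """

    def __init__(self, num_object_classes: int = 8):
        super().__init__()
        # Shared encoder: computes one feature map reused by every task head.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

        def make_decoder(out_channels: int) -> nn.Sequential:
            # Upsamples the shared features back to input resolution.
            return nn.Sequential(
                nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, out_channels, 4, stride=2, padding=1),
            )

        # One decoder per task: traffic objects (dynamic, static, pedestrians),
        # drivable area, and lane lines.
        self.object_head = make_decoder(num_object_classes)
        self.drivable_head = make_decoder(2)  # drivable area vs. background
        self.lane_head = make_decoder(2)      # lane line vs. background

    def forward(self, x: torch.Tensor) -> dict:
        features = self.encoder(x)  # extracted once, shared by all heads
        return {
            "objects": self.object_head(features),
            "drivable": self.drivable_head(features),
            "lanes": self.lane_head(features),
        }

# Example: one 640x384 RGB frame through the network.
net = MultiTaskNet()
out = net(torch.randn(1, 3, 384, 640))
print({k: tuple(v.shape) for k, v in out.items()})
```

Sharing the encoder means the expensive feature extraction runs once per frame rather than three times, which is what lets a joint network stay competitive in inference speed across all three tasks.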

Acknowledgements

This work has been supported by the Scientific Research Projects Commission of Galatasaray University under grant #19.401.005.

Author information

Corresponding author

Correspondence to Bekir Eren Çaldıran.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Çaldıran, B.E., Acarman, T. (2023). Multi-network for Joint Detection of Dynamic and Static Objects in a Road Scene Captured by an RGB Camera. In: Ranganathan, G., Fernando, X., Rocha, Á. (eds) Inventive Communication and Computational Technologies. Lecture Notes in Networks and Systems, vol 383. Springer, Singapore. https://doi.org/10.1007/978-981-19-4960-9_63
