Abstract
Deep neural networks have greatly advanced semantic segmentation. However, most existing networks achieve higher accuracy at the cost of heavy computation and large parameter counts, which resource-constrained edge devices cannot afford. To strike an appropriate trade-off between computing efficiency and segmentation accuracy, we propose an effective lightweight attention-guided network (ELANet) for real-time semantic segmentation, built on an asymmetric encoder–decoder framework. In the encoder, we combine atrous convolution and depth-wise convolution to design two types of effective context guidance blocks that learn contextual semantic information. In the decoder, a refined feature fusion module with a dual attention-guided fusion (DAF) unit exploits features from different levels. Without any pretraining, we evaluate the multi-attention ELANet through extensive experiments on the Cityscapes dataset: at an input resolution of 512 × 1024, it achieves 75.4% mIoU at an inference speed of 83 FPS on a single 3090 GPU, with only 0.76 M parameters and 10.34 GFLOPs. The code is publicly available at https://github.com/DGS666/ELANet.
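To give a rough sense of why depth-wise and atrous convolutions suit a lightweight design, the following sketch counts the weights of a standard convolution against a depth-wise separable one and computes the effective kernel size of a dilated kernel. It uses only the standard textbook formulas; the function names and the 64-channel, 3 × 3 layer are hypothetical choices for illustration and are not taken from ELANet's context guidance blocks.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k convolution (one filter per input
    channel) followed by a 1 x 1 point-wise convolution (bias omitted)."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


def effective_kernel_size(k, dilation):
    """Span covered by a k-tap kernel with the given dilation rate.
    Dilation widens the receptive field without adding weights."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    c_in, c_out, k = 64, 64, 3

    standard = conv_params(c_in, c_out, k)
    separable = depthwise_separable_params(c_in, c_out, k)
    print(f"standard conv weights:       {standard}")
    print(f"depth-wise separable weights: {separable}")
    print(f"reduction factor:             {standard / separable:.1f}x")

    for d in (1, 2, 4, 8):
        print(f"dilation {d}: effective kernel size {effective_kernel_size(k, d)}")
```

For this hypothetical layer the separable form needs roughly an eighth of the weights, while a dilation rate of 4 lets the same nine-tap kernel span a 9 × 9 window. This is the general trade-off such designs exploit; the actual block structure is described in the paper and the linked repository.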
Acknowledgements
This work was supported by Guangzhou Leading Talents in Innovation and Entrepreneurship under Grant No. 2019019, the National Natural Science Foundation of China under Grant No. 62002134, Guangdong Basic and Applied Basic Research Foundation under Grant No. 2020A1515110645, the Key Laboratory of New Semiconductors and Devices of Guangdong Higher Education Institutes under Grant No. 2021KSY001, the Fundamental Research Funds for the Central Universities at Jinan University under Grant No. 21620353, and the JNU-Techtotop Joint Foundation of Postgraduates Training Base under Grant No. 82621176.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yi, Q., Dai, G., Shi, M. et al. ELANet: Effective Lightweight Attention-Guided Network for Real-Time Semantic Segmentation. Neural Process Lett 55, 6425–6442 (2023). https://doi.org/10.1007/s11063-023-11145-z
DOI: https://doi.org/10.1007/s11063-023-11145-z