Abstract
In recent years, convolutional neural networks (CNNs) are leading the way in many computer vision tasks, such as image classification, object detection, and face recognition. In order to produce more refined semantic image segmentation, we survey the powerful CNNs and novel elaborate layers, structures and strategies, especially including those that have achieved the state-of-the-art results on the Pascal VOC 2012 semantic segmentation challenge. Moreover, we discuss their different working stages and various mechanisms to utilize the structural and contextual information in the image and feature spaces. Finally, combining some popular underlying referential methods in homologous problems, we propose several possible directions and approaches to incorporate existing effective methods as components to enhance CNNs for the segmentation of specific semantic objects.
Article PDF
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
References
Liang G Q, Ca J N, Liu X F, et al. Smart world: a better world. Sci China Inf Sci, 2016, 59: 043401
Wang J L, Lu Y H, Liu J B, et al. A robust three-stage approach to large-scale urban scene recognition. Sci China Inf Sci, 2017, 60: 103101
Wang W, Hu L H, Hu Z Y. Energy-based multi-view piecewise planar stereo. Sci China Inf Sci, 2017, 60: 032101
Hoiem D, Efros A A, Herbert M. Recovering surface layout from an image. Int J Comput Vis, 2007, 75: 151–172
Saxena A, Sun M, Ng A Y. Make3d: learning 3d scene structure from a single still image. IEEE Trans Patt Anal Mach Intell, 2009, 31: 824–840
Gould S, Fulton R, Koller D. Decomposing a scene into geometric and semantically consistent regions. In: Proceedings of the IEEE International Conference on Computer Vision, Kyoto, 2009. 1–8
Gupta A, Efros A A, Hebert M. Blocks world revisited: image understanding using qualitative geometry and mechanics. In: Proceedings of European Conference on Computer Vision, Crete, 2010. 482–496
Zhao Y B, Zhu S C. Image parsing via stochastic scene grammar. In: Proceedings of the Conference and Workshop on Neural Information Processing System, Granada, 2011. 73–81
Liu C, Yuen J, Torralba A. Nonparametric scene parsing via label transfer. IEEE Trans Patt Anal Mach Intell, 2011, 33: 2368–2382
Stella X Y, Zhang H, Malik J. Inferring spatial layout from a single image via depth-ordered grouping. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Anchorage, 2008
Lee D C, Hebert M, Kanade T. Geometric reasoning for single image structure recovery. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Miami, 2009. 2136–2143
Zheng Y, Byeungwoo J, Xu D, et al. Image segmentation by generalized hierarchical fuzzy C-means algorithm. J Intell Fuzzy Syst, 2015, 28: 4024–4028
Liu C, Yuen J, Torralba A. SIFT flow: dense correspondence across scenes and its applications. IEEE Trans Softw Eng, 2010, 33: 978–994
Papandreou G, Chen L C, Murphy K P, et al. Weakly-and semi-supervised learning of a deep convolutional network for semantic image segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1742–1750
Ghiasi G, Fowlkes C C. Laplacian pyramid reconstruction and refinement for semantic segmentation. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 519–534
Peng C, Zhang X Y, Yu G, et al. Large kernel matters—improve semantic segmentation by global convolutional network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 4353–4361
Everingham M, van Gool L, Williams C K I, et al. The Pascal visual object classes (VOC) challenge. Int J Comput Vis, 2010, 88: 303–338
Zheng S, Jayasumana S, Romera-Paredes B, et al. Conditional random fields as recurrent neural networks. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1529–1537
Lin G S, Shen C H, van den Hengel A, et al. Efficient piecewise training of deep structured models for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3194–3203
Liu Z W, Li X X, Luo P, et al. Semantic image segmentation via deep parsing network. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1377–1385
Lin G S, Shen C H, Reid I, et al. Deeply learning the messages in message passing inference. Comput Sci, 2015, 71: 866–872
Shuai B, Zuo Z, Wang B, et al. Dag-recurrent neural networks for scene labeling. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3620–3629
Kuen J, Wang Z H, Wang G. Recurrent attentional networks for saliency detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3668–3677
Liang X D, Shen X H, Xiang D L, et al. Semantic object parsing with local-global long short-term memory. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3185–3193
Noh H, Hong S, Han B. Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1520–1528
Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions. In: Proceedings of International Conference on Learning Representations, San Juan, 2016
Chen L C, Papandreou G, Kokkinos I, et al. Deeplab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. arXiv:1606.00915, 2016
Sermanet P, Fergus R, LeCun Y, et al. Overfeat: integrated recognition, localization and detection using convolutional networks. In: Proceedings of International Conference on Learning Representations, Banff, 2014
Zeiler M D, Fergus R. Visualizing and understanding convolutional networks. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 818–833
Krähenbühl P, Koltun V. Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Proceedings of Advances in Neural Information Processing Systems, Granada, 2011. 109–117
Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. In: Proceedings of International Conference on Learning Representations, San Diego. 2015
He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 770–778
Gao W, Zhou Z H. Dropout Rademacher complexity of deep neural networks. Sci China Inf Sci, 2016, 59: 072104
Wu Z F, Shen C H, Hengel A. High-performance semantic segmentation using very deep fully convolutional networks. arXiv:1604.04339, 2016
Hariharan B, Arbeláez P, Girshick R, et al. Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 447–456
Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3431–3440
Xie S N, Tu Z W. Holistically-nested edge detection. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1395–1403
Lin G S, Milan A, Shen C H, et al. RefineNet: multi-path refinement networks with identity mappings for highresolution semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 1925–1934
Wu Z F, Shen C H, Hengel A. Wider or deeper: revisiting the ResNet model for visual recognition. arXiv:1611.10080, 2016
Hong S, Oh J, Lee H, et al. Learning transferrable knowledge for semantic segmentation with deep convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3204–3212
Chen L C, Yang Y, Wang J, et al. Attention to scale: scale-aware semantic image segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3640–3649
Liu S, Qi X J, Shi J P, et al. Multi-scale patch aggregation (MPA) for simultaneous detection and segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3141–3149
Bertasius G, Shi J, Torresani L. Semantic segmentation with boundary neural fields. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3602–3610
Mostajabi M, Yadollahpour P, Shakhnarovich G. Feedforward semantic segmentation with zoom-out features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, 2015. 3376–3385
Hong S, Noh H, Han B. Decoupled deep neural network for semi-supervised semantic segmentation. In: Proceedings of Advances in Neural Information Processing Systems, Montreal, 2015. 1495–1503
Arnab A, Jayasumana S, Zheng S, et al. Higher order conditional random fields in deep neural networks. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 524–540
Vemulapalli R, Tuzel O, Liu M Y, et al. Gaussian conditional random field network for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3224–3233
Zhao H S, Shi J P, Qi X J, et al. Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, 2017. 2881–2890
Yang J, Price B, Cohen S, et al. Object contour detection with a fully convolutional encoder-decoder network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 193–202
Lee C Y, Xie S, Gallagher P, et al. Deeply-supervised nets. In: Proceedings of Artificial Intelligence and Statistics, San Diego, 2015. 562–570
Kokkinos I. Pushing the boundaries of boundary detection using deep learning. In: Proceedings of International Conference on Learning Representations, San Juan, 2016
Giusti A, Ciresan D C, Masci J, et al. Fast image scanning with deep max-pooling convolutional neural networks. In: Proceedings of the 20th IEEE International Conference on Image Processing (ICIP), Melbourne, 2013. 4034–4038
Sutton C, Mccallum A. Piecewise training for undirected models. In: Proceedings of the Conference on Uncertainty in Artificial Intelligence. Edinburgh: AUAI Press, 2005. 568–575
Adams A, Baek J, Davis M A. Fast high-dimensional filtering using the permutohedral lattice. Comput Graph Forum, 2010, 29: 753–762
Dai J F, He K M, Sun J. Boxsup: exploiting bounding boxes to supervise convolutional networks for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, Santiago, 2015. 1635–1643
Rother C, Kolmogorov V, Blake A. Grabcut: interactive foreground extraction using iterated graph cuts. ACM Trans Graph, 2004, 23: 309–314
Uijlings J R R, van de Sande K E A, Gevers T, et al. Selective search for object recognition. Int J Comput Vis, 2013, 104: 154–171
Arbeláez P, Pont-Tuset J, Barron J, et al. Multiscale combinatorial grouping. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, 2014. 328–335
Krahenbhl P, Koltun V. Geodesic object proposals. In: Proceedings of European Conference on Computer Vision, Zurich, 2014. 725–739
Lin D, Dai J F, Jia J Y, et al. Scribblesup: scribble-supervised convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3159–3167
Romera-Paredes B, Torr P H S. Recurrent instance segmentation. In: Proceedings of European Conference on Computer Vision, Amsterdam, 2016. 312–329
Dai J F, He K M, Sun J. Instance-aware semantic segmentation via multi-task network cascades. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, 2016. 3150–3158
Acknowledgements
This work was supported by National High-tech R&D Program of China (863 Program) (Grant No. 2015AA016403) and National Natural Science Foundation of China (Grant Nos. 61572061, 61472020).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Geng, Q., Zhou, Z. & Cao, X. Survey of recent progress in semantic image segmentation with CNNs. Sci. China Inf. Sci. 61, 051101 (2018). https://doi.org/10.1007/s11432-017-9189-6
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11432-017-9189-6