A survey on deep learning-based fine-grained object classification and semantic segmentation

Zhao, Bo; Feng, Jiashi; Wu, Xiao; Yan, Shuicheng

doi:10.1007/s11633-017-1053-3


Review
Published: 18 January 2017

Volume 14, pages 119–135 (2017)

International Journal of Automation and Computing


Bo Zhao¹,², Jiashi Feng², Xiao Wu¹ & Shuicheng Yan²


Abstract

Deep learning has shown impressive performance in various vision tasks such as image classification, object detection and semantic segmentation. In particular, recent advances in deep learning techniques have brought encouraging performance to fine-grained image classification, which aims to distinguish subordinate-level categories such as bird species or dog breeds. This task is extremely challenging due to high intra-class and low inter-class variance. In this paper, we review four types of deep learning-based fine-grained image classification approaches: those based on general convolutional neural networks (CNNs), part detection, ensembles of networks, and visual attention. We also cover deep learning-based semantic segmentation, introducing region proposal-based and fully convolutional network-based approaches in turn.



Author information

Authors and Affiliations

School of Information Science and Technology, Southwest Jiaotong University, Chengdu, 613000, China
Bo Zhao & Xiao Wu
Department of Electrical and Computer Engineering, National University of Singapore, Singapore, 117583, Singapore
Bo Zhao, Jiashi Feng & Shuicheng Yan


Corresponding author

Correspondence to Xiao Wu.

Additional information

This work was supported by the National Natural Science Foundation of China (Nos. 61373121 and 61328205), Program for Sichuan Provincial Science Fund for Distinguished Young Scholars (No. 13QNJJ0149), the Fundamental Research Funds for the Central Universities, and China Scholarship Council (No. 201507000032).

Recommended by Associate Editor Nazim Mir-Nasiri

Bo Zhao received the B.Sc. degree in networking engineering from Southwest Jiaotong University in 2010. He is a Ph.D. candidate at the School of Information Science and Technology, Southwest Jiaotong University, China. He is currently a visiting scholar at the Department of Electrical and Computer Engineering, National University of Singapore, Singapore.

His research interests include multimedia, computer vision and machine learning.

ORCID iD: 0000-0002-2120-2571

Jiashi Feng received the B.Eng. degree from University of Science and Technology, China in 2007, and the Ph.D. degree from National University of Singapore, Singapore in 2014. He was a postdoctoral researcher at University of California, USA from 2014 to 2015. He is currently an assistant professor at the Department of Electrical and Computer Engineering, National University of Singapore, Singapore.

His research interests include machine learning and computer vision techniques for large-scale data analysis. Specifically, he has done work in object recognition, deep learning, machine learning, high-dimensional statistics and big data analysis.

Xiao Wu received the B.Eng. and M.Sc. degrees in computer science from Yunnan University, China in 1999 and 2002, respectively, and the Ph.D. degree in computer science from City University of Hong Kong, China in 2008. He is an associate professor at Southwest Jiaotong University, China, where he is the assistant dean of the School of Information Science and Technology and the head of the Department of Computer Science and Technology. He is currently a visiting associate professor at the School of Information and Computer Science, University of California, USA. He was a research assistant and a senior research associate at the City University of Hong Kong, China from 2003 to 2004 and from 2007 to 2009, respectively. From 2006 to 2007, he was a visiting scholar at the School of Computer Science, Carnegie Mellon University, USA. He was with the Institute of Software, Chinese Academy of Sciences, China from 2001 to 2002. He received the second prize of the Natural Science Award of the Ministry of Education, China in 2015.

His research interests include multimedia information retrieval, image/video computing and data mining.

ORCID iD: 0000-0002-8322-8558

Shuicheng Yan is currently an associate professor at the Department of Electrical and Computer Engineering, National University of Singapore, Singapore, and the founding lead of the Learning and Vision Research Group (http://www.lvnus.org). He has authored or co-authored nearly 400 technical papers over a wide range of research topics, with more than 12 000 Google Scholar citations. He is an ISI Highly Cited Researcher (2014) and an IAPR Fellow (2014). He has been serving as an associate editor of IEEE Transactions on Knowledge and Data Engineering, Computer Vision and Image Understanding, and IEEE Transactions on Circuits and Systems for Video Technology. He received Best Paper Awards from ACM MM’13 (Best paper and Best student paper), ACM MM’12 (Best demo), PCM’11, ACM MM’10, ICME’10 and ICIMCS’09, the runner-up prize of ILSVRC’13, the winner prizes of the classification task in PASCAL VOC 2010–2012, the winner prize of the segmentation task in PASCAL VOC 2012, the honorable mention prize of the detection task in PASCAL VOC’10, the 2010 TCSVT Best Associate Editor (BAE) Award, the 2010 Young Faculty Research Award, the 2011 Singapore Young Scientist Award, and the 2012 NUS Young Researcher Award.

His research interests include machine learning, computer vision and multimedia.


About this article

Cite this article

Zhao, B., Feng, J., Wu, X. et al. A survey on deep learning-based fine-grained object classification and semantic segmentation. Int. J. Autom. Comput. 14, 119–135 (2017). https://doi.org/10.1007/s11633-017-1053-3


Received: 01 July 2016
Accepted: 30 September 2016
Published: 18 January 2017
Issue Date: April 2017
DOI: https://doi.org/10.1007/s11633-017-1053-3
