Abstract
Deep learning has emerged as an efficient machine learning approach that learns several layers of features, or representations of the data, and delivers state-of-the-art results. It has shown strong performance in many application areas, especially image classification, segmentation, and object detection. Recent developments in deep learning methods are improving fine-grained image classification, which aims to differentiate between sub-categories. In this paper, we present a thorough analysis of different deep architectures and frameworks, together with their model specifications. The convolutional neural network (CNN) has been the fundamental approach for object detection and many other computer vision tasks. However, as data have become more complex, the classical CNN can no longer deliver satisfactory results. This review therefore aims to bring some prominent models and techniques back into the light and report their results on several popular datasets. Key findings are discussed throughout the paper.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aditi, Dureja, A. (2021). A Review: Image Classification and Object Detection with Deep Learning. In: Gao, XZ., Kumar, R., Srivastava, S., Soni, B.P. (eds) Applications of Artificial Intelligence in Engineering. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-33-4604-8_6
DOI: https://doi.org/10.1007/978-981-33-4604-8_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4603-1
Online ISBN: 978-981-33-4604-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)