A Review: Image Classification and Object Detection with Deep Learning

  • Conference paper
Applications of Artificial Intelligence in Engineering

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Deep learning has emerged as an efficient machine learning approach that stacks several layers of feature or data representation and delivers state-of-the-art results. It has shown strong performance in many application areas, especially image classification, segmentation, and object detection. Recent advances in deep learning methods are also improving fine-grained image classification, which aims to distinguish between sub-categories. In this paper, we present a thorough analysis of different deep architectures and frameworks, along with their model specifications. The convolutional neural network (CNN) has been the fundamental approach for object detection, computer vision, and much more. However, as data has grown more complex, the classical CNN is no longer able to deliver satisfactory results. This review therefore aims to bring some prominent models and techniques back into focus and to report their results on several popular datasets. Key findings are discussed throughout the paper.

Author information

Corresponding author

Correspondence to Aditi.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Aditi, Dureja, A. (2021). A Review: Image Classification and Object Detection with Deep Learning. In: Gao, XZ., Kumar, R., Srivastava, S., Soni, B.P. (eds) Applications of Artificial Intelligence in Engineering. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-33-4604-8_6
