Deep Learning Architectures for Object Detection and Classification

Vaidya, Bhaumik; Paunwala, Chirag

doi:10.1007/978-3-030-03131-2_4

Bhaumik Vaidya⁶ &
Chirag Paunwala⁷

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 374))

574 Accesses
9 Citations

Abstract

Object detection and classification have observed large amount of transformation and research after the advances in machine learning algorithms. The advancement in the computing power and data availability is complimenting this transformation in object detection. In recent times, research in the field of object detection is dominated by special type of neural network called Convolutional Neural Network (CNN). The object detection system has to localize objects in an image and accurately classify it. CNN is well suited for this task as it can accurately find features like edges, corners and even more advanced features needed to detect object. This chapter provides detailed overview on how CNN works and how it is useful in object detection and classification task. After that popular deep networks based on CNN like ResNet, VGG16, VGG19, GoogleNet and MobileNet are explained in detail. These networks worked well for object classification task but needed sliding window technique for localizing object in an image. It worked slowly as it needed to process many windows for a single image. This led to more advanced algorithms for object detection based on CNN like Convolutional Neural Network with Region proposals (R-CNN), fast R-CNN, faster R-CNN, Single shot multi-box detector (SSD) and You Only Look Once (YOLO). This chapter provides a detail explanation of how these algorithms work and comparison between them. Most of the deep learning algorithms require large amount of data and dedicated hardware like GPUs to train. To overcome this, the concept of transfer learning is discovered. In that pre-trained models of popular CNN architecture are used to solve new problems. So in the last part of the chapter this concept of transfer learning and when it is useful is explained.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

A Review of Object Detection Models Based on Convolutional Neural Network

Comparative Analysis of Convolutional Neural Network in Object Detection

Survey on Convolutional Neural Networks-Based Object Detection Methods

References

Kpcb Internet Trends Report 2014. http://www.kpcb.com/blog/2014-internet-trends. Accessed 20 June 2017
Szeliski, R.: Computer Vision: Algorithms and Applications. Springer Science & Business Media, Berlin (2010)
Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Walther, D., Itti, L., Riesenhuber, M., Poggio, T., Koch, C.: Attentional selection for object recognition - a gentle way. In: International Workshop on Biologically Motivated Computer Vision, pp. 472–479. Springer (2002)
Google Scholar
Lowe, D. G.: Object recognition from local scale-invariant features. In: The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1150–1157. IEEE (1999)
Google Scholar
Bay, H., Tuytelaars, T., Van Gool, L.: Surf: speeded up robust features. In: Computer Vision ECCV 2006, pp. 404–417 (2006)
Chapter Google Scholar
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), vol. 1, pp. 886–893. IEEE (2005)
Google Scholar
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press, Cambridge (2016). http://www.deeplearningbook.org
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich: feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., Berg, A.C.: Ssd: single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer, Berlin (2016)
Chapter Google Scholar
LeCun, Y.: LeNet-5, Convolutional Neural Networks (2015). http://yann.lecun.com/exdb/lenet
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Rabinovich, A.: Going deeper with convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–9 (2015)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014). arXiv:1409.1556
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: European Conference on Computer Vision, pp. 818–833. Springer, Cham (2014)
Google Scholar
Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Adam, H.: Mobilenets: efficient convolutional neural networks for mobile vision applications (2019). arXiv:1704.04861
Steinkraus, D., Buck, I., Simard, P.: Using GPUs for machine learning algorithms. In: Proceedings of the Eighth International Conference on Document Analysis and Recognition, pp. 1115–1120. IEEE (2005)
Google Scholar
Rojas, R.: Neural Networks - A Systematic Introduction. Springer, Berlin (1996)
Chapter Google Scholar
Bishop, C.M.: Pattern recognition and machine learning. Information Science and Statistics. Springer, New York Inc, Secaucus (2006)
Google Scholar
LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1, 541–551 (1989)
Article Google Scholar
Fukushima, K.: Neocognitron: a hierarchical neural network capable of visual pattern recognition. Neural Netw. 1, 119–130 (1988)
Article Google Scholar
Hubel, D.H., Wiesel, T.N.: Receptive fields and functional architecture of monkey striate cortex. J. Physiol. 195, 215–243 (1968)
Article Google Scholar
Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the de-tails: delving deep into convolutional nets (2014). arXiv:1405.3531
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.A.: Striving for simplicity: the all convolutional net (2014). CoRR labs/ arXiv:1412.6806
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
MathSciNet MATH Google Scholar
Rumelhart, D.E., Hinton, G.E., Williams, R.J., et al.: Learning representations by back-propagating errors. Cognitive Modeling (1988)
Google Scholar
Sebe, N.: Machine learning in computer vision, vol. 29. Springer Science & Business Media, Berlin (2005)
Google Scholar
Imagenet large scale visual recognition challenge. http://image-net.org/challenges/LSVRC/
Imagenet database statistics. http://image-net.org/about-stats
Van de Sande, K.E., Uijlings, J.R., Gevers, T., Smeulders, A.W.: Segmentation as selective search for object recognition. In: IEEE International Conference on Computer Vision (ICCV), pp. 1879–1886. IEEE (2011)
Google Scholar
Zitnick, C.L., Dollar, P.: Edge boxes: locating object proposals from edges, pp. 391–405. Springer, Berlin (2014)
Google Scholar
PASCAL VOC image dataset. http://host.robots.ox.ac.uk/pascal/VOC/
Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Murphy, K.: Speed/accuracy trade-offs for modern convolutional object detectors (2016). arXiv:1611.10012
COCO object detection dataset. http://cocodataset.org/#home
Google object detection API. https://github.com/tensorflow/models/tree/master/research/object_detection

Download references

Author information

Authors and Affiliations

Research Scholar, Gujarat Technological University, Ahmedabad, India
Bhaumik Vaidya
SCET, Surat, India
Chirag Paunwala

Authors

Bhaumik Vaidya
View author publications
You can also search for this author in PubMed Google Scholar
Chirag Paunwala
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bhaumik Vaidya .

Editor information

Editors and Affiliations

School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Manoj Kumar Mishra
School of Computer Engineering, KIIT University, Bhubaneswar, Odisha, India
Bhabani Shankar Prasad Mishra
Department of Computer Science and Engineering, IIT Patna, Patna, Bihar, India
Yashwant Singh Patel
Department of Computer Science and Engineering, IIT Patna, Patna, Bihar, India
Rajiv Misra

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Vaidya, B., Paunwala, C. (2019). Deep Learning Architectures for Object Detection and Classification. In: Mishra, M., Mishra, B., Patel, Y., Misra, R. (eds) Smart Techniques for a Smarter Planet. Studies in Fuzziness and Soft Computing, vol 374. Springer, Cham. https://doi.org/10.1007/978-3-030-03131-2_4

Download citation

DOI: https://doi.org/10.1007/978-3-030-03131-2_4
Published: 30 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03130-5
Online ISBN: 978-3-030-03131-2
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics

Deep Learning Architectures for Object Detection and Classification

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A Review of Object Detection Models Based on Convolutional Neural Network

Comparative Analysis of Convolutional Neural Network in Object Detection

Survey on Convolutional Neural Networks-Based Object Detection Methods

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Deep Learning Architectures for Object Detection and Classification

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

A Review of Object Detection Models Based on Convolutional Neural Network

Comparative Analysis of Convolutional Neural Network in Object Detection

Survey on Convolutional Neural Networks-Based Object Detection Methods

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation