Abstract
Object detection and classification have undergone substantial transformation and attracted extensive research following advances in machine learning algorithms. Growth in computing power and data availability complements this transformation. In recent years, research in object detection has been dominated by a special type of neural network called the Convolutional Neural Network (CNN). An object detection system has to localize the objects in an image and classify them accurately. A CNN is well suited to this task because it can accurately find features such as edges, corners, and even more advanced features needed to detect objects. This chapter provides a detailed overview of how a CNN works and how it is useful in object detection and classification tasks. Popular deep networks based on CNNs, such as ResNet, VGG16, VGG19, GoogLeNet, and MobileNet, are then explained in detail. These networks worked well for object classification but needed a sliding window technique to localize objects in an image, which was slow because many windows had to be processed for a single image. This led to more advanced CNN-based object detection algorithms such as Convolutional Neural Network with Region proposals (R-CNN), Fast R-CNN, Faster R-CNN, Single Shot MultiBox Detector (SSD), and You Only Look Once (YOLO). This chapter provides a detailed explanation of how these algorithms work and a comparison between them. Most deep learning algorithms require a large amount of data and dedicated hardware such as GPUs to train. To overcome this, the concept of transfer learning was introduced, in which pre-trained models of popular CNN architectures are used to solve new problems. The last part of the chapter explains transfer learning and when it is useful.
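To make the cost of the sliding window technique concrete, the following minimal Python sketch enumerates the windows a classifier would have to process for a single image. The function names (`sliding_windows`, `count_windows`), the 640×480 image size, the three window sizes, and the stride of 16 pixels are illustrative assumptions, not values taken from the chapter.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield (x, y, w, h) boxes for a square window slid across the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, win, win)


def count_windows(img_w, img_h, sizes, stride):
    """Count the windows over all window sizes (each one is a classifier call)."""
    return sum(1 for s in sizes for _ in sliding_windows(img_w, img_h, s, stride))


# Hypothetical setup: one 640x480 image, three window scales, stride of 16 px.
total = count_windows(640, 480, sizes=(64, 128, 256), stride=16)
print(f"classifier evaluations for one image: {total}")
```

Even this coarse setup requires on the order of a couple of thousand classifier evaluations per image, and halving the stride roughly quadruples that number, which is the inefficiency that region-proposal and single-shot detectors aim to avoid.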
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Vaidya, B., Paunwala, C. (2019). Deep Learning Architectures for Object Detection and Classification. In: Mishra, M., Mishra, B., Patel, Y., Misra, R. (eds) Smart Techniques for a Smarter Planet. Studies in Fuzziness and Soft Computing, vol 374. Springer, Cham. https://doi.org/10.1007/978-3-030-03131-2_4
DOI: https://doi.org/10.1007/978-3-030-03131-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03130-5
Online ISBN: 978-3-030-03131-2
eBook Packages: Intelligent Technologies and Robotics (R0)