Abstract
In this paper, we present a Deep Learning-based method for locating and recognizing hand gestures in images. Our goal is to provide an intuitive and accessible way to interact with Computer Vision-based mobile applications that assist visually impaired people (e.g., pointing a finger at an object in a real scene to zoom in for a close-up of that object). First, we define a set of hand gestures that can be assigned to different actions. We then create a database of images corresponding to these gestures. Finally, we use this database to train Neural Networks with different topologies, testing different input sizes, weight initializations, and data augmentation processes. In our experiments, we obtain high accuracy in both localization (96%–100%) and recognition (99.45%) with networks that are suitable for porting to mobile devices.
Acknowledgements
This work was partially supported by the project TIN2015-69077-P of the Spanish Government.
Copyright information
© 2019 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Alashhab, S., Gallego, AJ., Lozano, M.Á. (2019). Hand Gesture Detection with Convolutional Neural Networks. In: De La Prieta, F., Omatu, S., Fernández-Caballero, A. (eds) Distributed Computing and Artificial Intelligence, 15th International Conference. DCAI 2018. Advances in Intelligent Systems and Computing, vol 800. Springer, Cham. https://doi.org/10.1007/978-3-319-94649-8_6
DOI: https://doi.org/10.1007/978-3-319-94649-8_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94648-1
Online ISBN: 978-3-319-94649-8
eBook Packages: Intelligent Technologies and Robotics (R0)