Abstract
Deep learning (DL) with artificial neural networks has made remarkable progress, fueled by powerful GPUs and the availability of abundant online data. This progress has substantially improved machine performance across many fields, with computer vision being a prominent area of research and development (R&D). In particular, human activity recognition plays a pivotal role in applications such as healthcare monitoring, surveillance and security systems, and human–machine interfaces. However, challenges persist in unconstrained environments, where occlusions, variations in clothing, and background noise make the task difficult. This review article offers a succinct examination of deep learning algorithms, with a specific emphasis on convolutional neural networks (CNNs), which have been proposed as a solution to classical artificial intelligence problems. It also surveys the notable outcomes and contributions of the methodologies that apply DL techniques to human activity classification. The paper concludes by emphasizing the potential of a hybrid approach that combines convolutional and recurrent neural networks (RNNs) for future human action/activity recognition solutions. By combining the strengths of CNNs in extracting spatial features and of RNNs in capturing temporal dependencies, hybrid CNN-RNN models hold promise for analyzing video data effectively and improving the accuracy of human activity classification. Ongoing research aims to further enhance these hybrid models to tackle the challenges of unconstrained environments and advance the field of human activity recognition.
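The hybrid CNN-RNN idea described in the abstract can be illustrated with a toy sketch of the data flow: a convolutional stage turns each video frame into a small spatial feature vector, and a recurrent stage accumulates those vectors over time before a linear layer scores the activity classes. This is not the chapter's implementation. The function names, the layer sizes, the random untrained weights, and the choice of a plain Elman RNN (rather than an LSTM or GRU) are all assumptions made for illustration, and the code uses only the Python standard library.

```python
import math
import random

def conv2d_valid(frame, kernel):
    """'Valid' 2D cross-correlation of one grayscale frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(frame) - kh + 1, len(frame[0]) - kw + 1
    return [[sum(frame[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def spatial_features(frame, kernels):
    """CNN stage: convolution, ReLU, then global average pooling (one value per kernel)."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(frame, kernel)
        relu = [max(0.0, v) for row in fmap for v in row]
        feats.append(sum(relu) / len(relu))
    return feats

def rnn_step(x, h, w_xh, w_hh):
    """Elman RNN update: h_t = tanh(W_xh x_t + W_hh h_{t-1})."""
    return [math.tanh(sum(w_xh[i][j] * x[j] for j in range(len(x))) +
                      sum(w_hh[i][j] * h[j] for j in range(len(h))))
            for i in range(len(h))]

def classify_clip(frames, kernels, w_xh, w_hh, w_out):
    """Run the CNN on every frame, feed the features to the RNN, score the classes."""
    h = [0.0] * len(w_hh)                      # initial hidden state
    for frame in frames:                       # temporal loop over the clip
        h = rnn_step(spatial_features(frame, kernels), h, w_xh, w_hh)
    logits = [sum(w * v for w, v in zip(row, h)) for row in w_out]
    return logits.index(max(logits)), logits   # predicted class index, raw scores

def rand_matrix(rows, cols, rng):
    """Hypothetical helper: random weights stand in for trained parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
kernels = [rand_matrix(3, 3, rng) for _ in range(4)]   # 4 convolution filters
w_xh = rand_matrix(5, 4, rng)                          # feature (4) -> hidden (5)
w_hh = rand_matrix(5, 5, rng)                          # hidden -> hidden
w_out = rand_matrix(3, 5, rng)                         # hidden -> 3 activity classes
clip = [rand_matrix(8, 8, rng) for _ in range(6)]      # 6 synthetic 8x8 frames
label, logits = classify_clip(clip, kernels, w_xh, w_hh, w_out)
print("predicted class index:", label)
```

Because the weights are random, the predicted index carries no meaning; the sketch only shows how spatial features from each frame become the input sequence of the recurrent stage. A practical model would learn all weights from labelled video and would typically use a deep pretrained CNN and a gated recurrent unit.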
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Haddad, L.E., Hanoune, M., Ettaoufik, A. (2024). Computer Vision with Deep Learning for Human Activity Recognition: Features Representation. In: Chakir, A., Andry, J.F., Ullah, A., Bansal, R., Ghazouani, M. (eds) Engineering Applications of Artificial Intelligence. Synthesis Lectures on Engineering, Science, and Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-50300-9_3
DOI: https://doi.org/10.1007/978-3-031-50300-9_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50299-6
Online ISBN: 978-3-031-50300-9
eBook Packages: Synthesis Collection of Technology (R0)