Abstract
Human action recognition plays a major role in enabling effective and safe collaboration between humans and robots. In a collaborative assembly task, for example, the human worker can use gestures to communicate with the robot, while the robot can exploit the recognized actions to anticipate the next steps in the assembly process, improving both safety and overall productivity. In this work, we propose a novel framework for human action recognition based on 3D pose estimation and ensemble techniques. In this framework, we first estimate the 3D coordinates of the human hands and body joints by means of OpenPose and RGB-D data. The estimated joints are then fed to a set of graph convolutional networks derived from Shift-GCN, one network for each set of joints (i.e., body, left hand and right hand). Finally, using an ensemble approach, we average the output scores of all the networks to predict the final human action. The proposed framework was evaluated on a dedicated dataset, named the IAS-Lab Collaborative HAR dataset, which includes both actions and gestures commonly used in human-robot collaboration tasks. The experimental results demonstrate how the ensemble of different action recognition models helps improve the accuracy and robustness of the overall system.
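The score-level ensemble described above can be sketched as follows: each of the three networks (body, left hand, right hand) produces per-class scores for a clip, the scores are normalized, and their average determines the predicted action. This is a minimal illustrative sketch, not the authors' implementation; the function names, the four-class setup, and the toy score values are assumptions made here for demonstration.

```python
import numpy as np

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_per_stream):
    """Average per-stream class probabilities and return the winning class index."""
    probs = [softmax(l) for l in logits_per_stream]
    avg = np.mean(probs, axis=0)
    return int(np.argmax(avg))

# Toy example: 3 skeleton streams (body, left hand, right hand), 4 action classes.
body_scores = np.array([2.0, 0.5, 0.1, 0.0])   # body stream favors class 0
left_scores = np.array([1.5, 1.8, 0.2, 0.1])   # left-hand stream is ambiguous
right_scores = np.array([0.3, 2.2, 0.4, 0.2])  # right-hand stream favors class 1

pred = ensemble_predict([body_scores, left_scores, right_scores])
```

Averaging normalized scores (rather than taking a hard majority vote) lets a stream that is confidently right outweigh streams that are weakly wrong, which is one reason score-level ensembling tends to improve robustness.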
Acknowledgment
The research leading to these results has received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 101006732. Part of this work was supported by MIUR (Italian Ministry of Education) under the initiative "Departments of Excellence" (Law 232/2016).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Terreran, M., Lazzaretto, M., Ghidoni, S. (2023). Skeleton-Based Action and Gesture Recognition for Human-Robot Collaboration. In: Petrovic, I., Menegatti, E., Marković, I. (eds) Intelligent Autonomous Systems 17. IAS 2022. Lecture Notes in Networks and Systems, vol 577. Springer, Cham. https://doi.org/10.1007/978-3-031-22216-0_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22215-3
Online ISBN: 978-3-031-22216-0
eBook Packages: Intelligent Technologies and Robotics (R0)