An Integrated Analysis for Identifying Iconic Gestures in Human-Robot Interactions

  • Conference paper
  • In: Intelligent Systems and Applications (IntelliSys 2023)
  • Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 825)

Abstract

Co-speech gesture analysis is an important aspect of social conversational interaction through human-robot interfaces. Co-speech gestures require the synchronous integration of speech, posture, and motion. Iconic gestures, a major subclass of co-speech gestures, express attributes of entities and actions, such as shape contours, magnitude, and proximity, through finger and palm motions synchronized with spoken phrases. These attributes correlate directly with the displayed contours. In this research, we describe an integrated technique that combines motion analysis to derive contours, motion-speech synchronization to identify the words corresponding to iconic gestures, and conceptual-dependency analysis of the action words that drive iconic gestures. The technique models a motion-sketched contour as a combination of a synchronous colored Petri net, extended to handle composite motions, and contour-segment patterns. We present high-level algorithms and a corresponding implementation of the proposed technique and evaluate its performance. The results show approximately 90% recognition accuracy on simple contours, including closed contours.
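As a concrete illustration of the contour-derivation step, the Python sketch below is a minimal, hypothetical pipeline under stated assumptions, not the authors' implementation: it tracks the index fingertip with MediaPipe, quantizes the fingertip trajectory into four compass directions to form a contour-segment string, and matches that string against toy contour templates by edit distance. MediaPipe and OpenCV are assumed tooling; the template strings, the jitter threshold, and the helper names (quantize, contour_string, classify) are illustrative assumptions, and the speech-synchronization and conceptual-dependency stages of the paper's technique are omitted.

```python
# Hypothetical sketch of the contour-derivation step only; not the
# authors' implementation. Assumes MediaPipe for hand tracking and
# OpenCV for video I/O.
import math

import cv2              # pip install opencv-python
import mediapipe as mp  # pip install mediapipe

mp_hands = mp.solutions.hands

DIRS = "ENWS"  # four coarse directions: east, north, west, south

# Toy contour templates as direction strings; these are placeholders
# for exposition, not tuned patterns from the paper.
TEMPLATES = {
    "line":   "E",      # single horizontal sweep
    "circle": "ENWS",   # one counterclockwise revolution (placeholder)
    "zigzag": "ENEN",   # placeholder
}

def quantize(dx: float, dy: float) -> str:
    """Map a displacement to E/N/W/S (image y grows downward)."""
    angle = math.degrees(math.atan2(-dy, dx)) % 360.0
    return DIRS[int((angle + 45.0) // 90.0) % 4]

def contour_string(points, min_step: float = 0.02) -> str:
    """Collapse a fingertip trajectory into a contour-segment string."""
    segments = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if math.hypot(x1 - x0, y1 - y0) < min_step:
            continue  # drop sub-threshold jitter
        d = quantize(x1 - x0, y1 - y0)
        if not segments or segments[-1] != d:
            segments.append(d)  # merge consecutive equal directions
    return "".join(segments)

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance for approximate pattern matching."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def classify(points) -> str:
    """Pick the template closest to the observed contour string."""
    s = contour_string(points)
    return min(TEMPLATES, key=lambda name: edit_distance(s, TEMPLATES[name]))

def track_fingertip(video_path: str):
    """Yield normalized (x, y) index-fingertip positions per frame."""
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(max_num_hands=1) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                tip = result.multi_hand_landmarks[0].landmark[
                    mp_hands.HandLandmark.INDEX_FINGER_TIP]
                yield (tip.x, tip.y)
    cap.release()

if __name__ == "__main__":
    # "gesture.mp4" is a hypothetical input clip.
    print(classify(list(track_fingertip("gesture.mp4"))))
```

A full system along the lines of the abstract would additionally time-align these contour segments with the speech stream and gate recognition on the conceptual-dependency class of the co-occurring action word; this sketch deliberately isolates the vision component.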

Author information

Corresponding author

Correspondence to Aditi Singh.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, A., Bansal, A.K. (2024). An Integrated Analysis for Identifying Iconic Gestures in Human-Robot Interactions. In: Arai, K. (ed.) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 825. Springer, Cham. https://doi.org/10.1007/978-3-031-47718-8_18
