An Integrated Analysis for Identifying Iconic Gestures in Human-Robot Interactions

  • Conference paper
  • In: Intelligent Systems and Applications (IntelliSys 2023)
  • Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 825)

Abstract

Co-speech gesture analysis is an important aspect of social conversational interaction through human-robot interfaces. Co-speech gestures require the synchronous integration of speech, posture, and motion. Iconic gestures, a major subclass of co-speech gestures, express attributes of entities and actions, such as shape contours, magnitude, and proximity, through finger and palm motions synchronized with spoken phrases. These attributes correlate directly with the displayed contours. In this research, we describe an integrated technique that combines motion analysis to derive contours, motion-speech synchronization to identify the words corresponding to iconic gestures, and conceptual-dependency analysis of the action words that drive iconic gestures. The technique models a motion-sketched contour as a combination of a synchronous colored Petri net, extended to handle composite motions, and contour-segment patterns. We present high-level algorithms and a corresponding implementation of the proposed technique and evaluate its performance. The results show approximately 90% recognition accuracy on simple contours, including closed contours.
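As a concrete illustration of the contour-derivation step, the Python sketch below is a minimal, hypothetical pipeline under stated assumptions, not the authors' implementation: it tracks the index fingertip with MediaPipe, quantizes the fingertip trajectory into four compass directions to form a contour-segment string, and matches that string against toy contour templates by edit distance. MediaPipe and OpenCV are assumed tooling; the template strings, the jitter threshold, and the helper names (quantize, contour_string, classify) are illustrative assumptions, and the speech-synchronization and conceptual-dependency stages of the paper's technique are omitted.

```python
# Hypothetical sketch of the contour-derivation step only; not the
# authors' implementation. Assumes MediaPipe for hand tracking and
# OpenCV for video I/O.
import math

import cv2              # pip install opencv-python
import mediapipe as mp  # pip install mediapipe

mp_hands = mp.solutions.hands

DIRS = "ENWS"  # four coarse directions: east, north, west, south

# Toy contour templates as direction strings; these are placeholders
# for exposition, not tuned patterns from the paper.
TEMPLATES = {
    "line":   "E",      # single horizontal sweep
    "circle": "ENWS",   # one counterclockwise revolution (placeholder)
    "zigzag": "ENEN",   # placeholder
}

def quantize(dx: float, dy: float) -> str:
    """Map a displacement to E/N/W/S (image y grows downward)."""
    angle = math.degrees(math.atan2(-dy, dx)) % 360.0
    return DIRS[int((angle + 45.0) // 90.0) % 4]

def contour_string(points, min_step: float = 0.02) -> str:
    """Collapse a fingertip trajectory into a contour-segment string."""
    segments = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if math.hypot(x1 - x0, y1 - y0) < min_step:
            continue  # drop sub-threshold jitter
        d = quantize(x1 - x0, y1 - y0)
        if not segments or segments[-1] != d:
            segments.append(d)  # merge consecutive equal directions
    return "".join(segments)

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance for approximate pattern matching."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def classify(points) -> str:
    """Pick the template closest to the observed contour string."""
    s = contour_string(points)
    return min(TEMPLATES, key=lambda name: edit_distance(s, TEMPLATES[name]))

def track_fingertip(video_path: str):
    """Yield normalized (x, y) index-fingertip positions per frame."""
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(max_num_hands=1) as hands:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                tip = result.multi_hand_landmarks[0].landmark[
                    mp_hands.HandLandmark.INDEX_FINGER_TIP]
                yield (tip.x, tip.y)
    cap.release()

if __name__ == "__main__":
    # "gesture.mp4" is a hypothetical input clip.
    print(classify(list(track_fingertip("gesture.mp4"))))
```

A full system along the lines of the abstract would additionally time-align these contour segments with the speech stream and gate recognition on the conceptual-dependency class of the co-occurring action word; this sketch deliberately isolates the vision component.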

Author information

Corresponding author

Correspondence to Aditi Singh.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, A., Bansal, A.K. (2024). An Integrated Analysis for Identifying Iconic Gestures in Human-Robot Interactions. In: Arai, K. (ed.) Intelligent Systems and Applications. IntelliSys 2023. Lecture Notes in Networks and Systems, vol 825. Springer, Cham. https://doi.org/10.1007/978-3-031-47718-8_18
