Synchronized Colored Petri Net Based Multimodal Modeling and Real-Time Recognition of Conversational Spatial Deictic Gestures

Conference paper in: Intelligent Computing (SAI 2023)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 739)

Abstract

Gestures are an important part of intelligent human-robot interaction. Co-speech gestures are a subclass of gestures that integrate speech and dialog with synchronous combinations of postures, haptics (touch), and motions of the head, hand, index finger or palm, and gaze. Deictic gestures are a subclass of co-speech gestures that provide spatio-temporal references to entities in the field of vision by pointing at an individual entity or a collection of entities and referring to them with pronouns in spoken phrases. Deictic gestures are important for human-robot interaction because they seek attention and establish a common frame of reference through object localization. In this research, we identify different subclasses of deictic gestures and extend the Synchronized Colored Petri net (SCP) model to recognize them. The proposed extension integrates the synchronized motions of the head, hand, index finger, palm, and gaze (eye-motion tracking and focus) with pronoun references in speech. We describe an implementation based on video-frame analysis and gesture-signatures that represent meta-level attributes of SCP, and present a recognition algorithm. Performance analysis shows that recall is approximately 85 percent for deictic gestures, and that conversational head-gestures are separated from deictic gestures 95 percent of the time. Mislabeling of deictic gestures occurs due to missing frames, missing feature points, undetectable motions, and the choice of thresholds during video analysis.
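
The recognition idea in the abstract is multimodal synchronization: a deictic gesture is labeled only when a pointing motion and a pronoun reference in speech co-occur. The sketch below is a minimal illustration of that idea, not the authors' implementation: it assumes MediaPipe Hands and OpenCV as stand-ins for the paper's video-frame analysis, uses a simple fingertip heuristic instead of the paper's gesture-signatures, and replaces SCP transition firing with a fixed co-occurrence window; `transcript_words`, the pronoun set, and all thresholds are hypothetical.

```python
# Illustrative sketch only: approximates pointing/pronoun synchronization.
# MediaPipe Hands, the fingertip heuristic, and the 0.5 s window are
# assumptions, not the paper's SCP model or gesture-signatures.
import cv2
import mediapipe as mp

PRONOUNS = {"this", "that", "these", "those", "it", "here", "there"}  # assumed set
mp_hands = mp.solutions.hands


def index_finger_extended(hand_landmarks) -> bool:
    """Heuristic pointing posture: index fingertip lies farther from the
    wrist than the index PIP joint (MediaPipe landmarks 0, 8, 6)."""
    lm = hand_landmarks.landmark
    wrist, tip, pip = lm[0], lm[8], lm[6]
    dist = lambda a, b: ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5
    return dist(wrist, tip) > dist(wrist, pip)


def pointing_frames(video_path):
    """Yield (frame_index, fingertip_xy) for frames with a pointing posture."""
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(static_image_mode=False, max_num_hands=1) as hands:
        index = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks:
                hand = result.multi_hand_landmarks[0]
                if index_finger_extended(hand):
                    tip = hand.landmark[8]
                    yield index, (tip.x, tip.y)
            index += 1
    cap.release()


def label_deictic_events(video_path, transcript_words, fps=30.0, window=0.5):
    """transcript_words: hypothetical pre-aligned (word, time_sec) pairs.
    Emit a deictic event when a pointing frame and a spoken pronoun fall
    within `window` seconds of each other (stand-in for SCP firing)."""
    events = []
    for frame_index, fingertip in pointing_frames(video_path):
        t = frame_index / fps
        for word, t_word in transcript_words:
            if word.lower() in PRONOUNS and abs(t - t_word) <= window:
                events.append({"time": t, "pronoun": word, "target": fingertip})
    return events
```

In the paper's model, the fixed window would presumably be replaced by SCP places and transitions that synchronize head, hand, index-finger, palm, and gaze tokens with the speech stream; the sketch only shows the co-occurrence check that such a net ultimately enforces.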


Author information

Corresponding author

Correspondence to Aditi Singh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Singh, A., Bansal, A.K. (2023). Synchronized Colored Petri Net Based Multimodal Modeling and Real-Time Recognition of Conversational Spatial Deictic Gestures. In: Arai, K. (eds) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol 739. Springer, Cham. https://doi.org/10.1007/978-3-031-37963-5_85
