Abstract
Gestures are an important part of intelligent human-robot interaction. Co-speech gestures are a subclass of gestures that integrate speech and dialog with synchronous combinations of postures, haptics (touch), and motions of the head, hand, index finger or palm, and gaze. Deictic gestures are a subclass of co-speech gestures that provide spatio-temporal reference to entities in the field of vision by pointing at an individual entity or a collection of entities and referring to them using pronouns in spoken phrases. Deictic gestures are important for human-robot interaction because they seek attention and establish a common frame of reference through object localization. In this research, we identify different subclasses of deictic gestures and extend the Synchronized Colored Petri net (SCP) model to recognize them. The proposed extension integrates synchronized motions of the head, hand, index finger, palm, and gaze (eye-motion tracking and focus) with pronoun references in speech. We describe an implementation based on video-frame analysis and gesture signatures representing meta-level attributes of SCP, and present a recognition algorithm. Performance analysis shows a recall of approximately 85 percent for deictic gestures, and conversational head gestures are separated from deictic gestures 95 percent of the time. Results show that mislabeling of deictic gestures occurs due to missing frames, missing feature points, undetectable motions, and the choice of thresholds during video analysis.
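To make the multimodal pipeline sketched in the abstract concrete, the following is a minimal illustrative sketch (not the authors' SCP implementation) of one ingredient of deictic-gesture recognition: flagging a pointing hand posture in video frames with MediaPipe hand landmarks and pairing it with deictic pronouns from a separately obtained speech transcript. The pronoun set, the finger-extension heuristic, the confidence threshold, and all function names are assumptions for illustration; the paper's actual model additionally synchronizes head, palm, and gaze motion with speech timing via SCP gesture signatures.

```python
# Hypothetical sketch of pointing-posture detection plus pronoun lookup.
# NOT the authors' SCP implementation; names and thresholds are assumptions.
import cv2
import mediapipe as mp

# Illustrative set of deictic pronouns/adverbs to look for in the transcript.
DEICTIC_PRONOUNS = {"this", "that", "these", "those", "here", "there", "it"}

mp_hands = mp.solutions.hands


def is_pointing(hand_landmarks) -> bool:
    """Heuristic: index finger extended while the other fingers are curled."""
    lm = hand_landmarks.landmark
    wrist = lm[0]  # MediaPipe landmark 0 = wrist

    def dist(a, b):
        return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5

    # Index finger extended: tip (8) farther from the wrist than PIP joint (6).
    index_extended = dist(lm[8], wrist) > dist(lm[6], wrist)
    # Middle/ring/pinky curled: tip closer to the wrist than its PIP joint.
    others_curled = all(
        dist(lm[tip], wrist) < dist(lm[pip], wrist)
        for tip, pip in ((12, 10), (16, 14), (20, 18))
    )
    return index_extended and others_curled


def detect_deictic_frames(video_path: str, transcript: str):
    """Return frame indices showing a pointing posture and any deictic
    pronouns found in the transcript (speech-to-text is out of scope here)."""
    pronouns = DEICTIC_PRONOUNS & set(transcript.lower().split())
    pointing_frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(max_num_hands=2,
                        min_detection_confidence=0.6) as hands:
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks and any(
                    is_pointing(h) for h in result.multi_hand_landmarks):
                pointing_frames.append(idx)
            idx += 1
    cap.release()
    return pointing_frames, pronouns
```

A full deictic-gesture recognizer would then align the detected pointing frames with the utterance times of the pronouns (the synchronization that the SCP model captures), rather than merely checking that both modalities occur.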
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, A., Bansal, A.K. (2023). Synchronized Colored Petri Net Based Multimodal Modeling and Real-Time Recognition of Conversational Spatial Deictic Gestures. In: Arai, K. (ed.) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol. 739. Springer, Cham. https://doi.org/10.1007/978-3-031-37963-5_85
DOI: https://doi.org/10.1007/978-3-031-37963-5_85
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37962-8
Online ISBN: 978-3-031-37963-5
eBook Packages: Intelligent Technologies and Robotics (R0)