Abstract
Gestures are an important part of intelligent human-robot interaction. Co-speech gestures are a subclass of gestures that integrate speech and dialog with synchronous combinations of postures, haptics (touch), and motions of the head, hand, index finger or palm, and gaze. Deictic gestures are a subclass of co-speech gestures that provide spatio-temporal reference to entities in the field of vision by pointing at an individual entity or a collection of entities and referring to them using pronouns in spoken phrases. Deictic gestures are important for human-robot interaction because they seek attention and establish a common frame of reference through object localization. In this research, we identify different subclasses of deictic gestures and extend the Synchronized Colored Petri net (SCP) model to recognize them. The proposed extension integrates synchronized motions of the head, hand, index finger, palm, and gaze (eye-motion tracking and focus) with pronoun references in speech. We describe an implementation based on video-frame analysis and gesture signatures representing meta-level attributes of SCP, and present a recognition algorithm. Performance analysis shows a recall of approximately 85 percent for deictic gestures, and conversational head gestures are separated from deictic gestures 95 percent of the time. Results show that mislabeling of deictic gestures occurs due to missing frames, missing feature points, undetectable motions, and the choice of thresholds during video analysis.
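To make the multimodal pipeline sketched in the abstract concrete, the following is a minimal illustrative sketch (not the authors' SCP implementation) of one ingredient of deictic-gesture recognition: flagging a pointing hand posture in video frames with MediaPipe hand landmarks and pairing it with deictic pronouns from a separately obtained speech transcript. The pronoun set, the finger-extension heuristic, the confidence threshold, and all function names are assumptions for illustration; the paper's actual model additionally synchronizes head, palm, and gaze motion with speech timing via SCP gesture signatures.

```python
# Hypothetical sketch of pointing-posture detection plus pronoun lookup.
# NOT the authors' SCP implementation; names and thresholds are assumptions.
import cv2
import mediapipe as mp

# Illustrative set of deictic pronouns/adverbs to look for in the transcript.
DEICTIC_PRONOUNS = {"this", "that", "these", "those", "here", "there", "it"}

mp_hands = mp.solutions.hands


def is_pointing(hand_landmarks) -> bool:
    """Heuristic: index finger extended while the other fingers are curled."""
    lm = hand_landmarks.landmark
    wrist = lm[0]  # MediaPipe landmark 0 = wrist

    def dist(a, b):
        return ((a.x - b.x) ** 2 + (a.y - b.y) ** 2) ** 0.5

    # Index finger extended: tip (8) farther from the wrist than PIP joint (6).
    index_extended = dist(lm[8], wrist) > dist(lm[6], wrist)
    # Middle/ring/pinky curled: tip closer to the wrist than its PIP joint.
    others_curled = all(
        dist(lm[tip], wrist) < dist(lm[pip], wrist)
        for tip, pip in ((12, 10), (16, 14), (20, 18))
    )
    return index_extended and others_curled


def detect_deictic_frames(video_path: str, transcript: str):
    """Return frame indices showing a pointing posture and any deictic
    pronouns found in the transcript (speech-to-text is out of scope here)."""
    pronouns = DEICTIC_PRONOUNS & set(transcript.lower().split())
    pointing_frames = []
    cap = cv2.VideoCapture(video_path)
    with mp_hands.Hands(max_num_hands=2,
                        min_detection_confidence=0.6) as hands:
        idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV decodes frames as BGR.
            result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if result.multi_hand_landmarks and any(
                    is_pointing(h) for h in result.multi_hand_landmarks):
                pointing_frames.append(idx)
            idx += 1
    cap.release()
    return pointing_frames, pronouns
```

A full deictic-gesture recognizer would then align the detected pointing frames with the utterance times of the pronouns (the synchronization that the SCP model captures), rather than merely checking that both modalities occur.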
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, A., Bansal, A.K. (2023). Synchronized Colored Petri Net Based Multimodal Modeling and Real-Time Recognition of Conversational Spatial Deictic Gestures. In: Arai, K. (ed.) Intelligent Computing. SAI 2023. Lecture Notes in Networks and Systems, vol. 739. Springer, Cham. https://doi.org/10.1007/978-3-031-37963-5_85
DOI: https://doi.org/10.1007/978-3-031-37963-5_85
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-37962-8
Online ISBN: 978-3-031-37963-5
eBook Packages: Intelligent Technologies and Robotics (R0)