Summary
Motivated by the fundamental role that rhythm apparently plays in human speech and gestural communication, this study substantiates a biologically motivated model for synchronizing speech and gesture input in human-computer interaction. Our approach presents a novel method that conceptualizes a multimodal user interface on the basis of timed agent systems: multiple agents poll presemantic information from different sensory channels (speech and hand gestures) and integrate it into multimodal data structures that can be processed by an application system, which is itself agent-based. This article motivates and presents technical work that exploits rhythmic patterns in the development of biologically and cognitively motivated mediator systems between humans and machines.
Zusammenfassung
As cornerstones of natural communication between humans, gesture and speech are of great interest in human-machine communication. So far, however, there are hardly any proposed solutions for how the multimodal utterances of a system user, registered as temporally dispersed percepts on separate channels, are to be reconstructed in their temporal relationship. In this contribution, building on the observation that human communicative behavior is of a significantly rhythmic nature, a novel method for the design of a multimodal input system is developed. It is based on a time-clocked multi-agent system that performs a presemantic integration of the sensor data from speech and gesture input into a multimodal input data structure. This describes first technical work that exploits rhythmic patterns for biologically and cognitively motivated mediator systems between human and machine.
Literatur
Bolt, R.A. (1980) "Put-That-There": Voice and gesture at the graphics interface. Computer Graphics, 14(3): 262–270
Bos, E., Huls, C. & Claasen, W. (1994) EDWARD: Full integration of language and action in a multimodal user interface. Int. Journal Human-Computer Studies, 40: 473–495
Broendsted, T. & Madsen, J.P. (1997) Analysis of speaking rate variations in stress-timed languages. Proceedings 5th European Conference on Speech Communication and Technology (EuroSpeech), pp. 481–484, Rhodes
Condon, W.S. (1986) Communication: Rhythm and Structure. In J. Evans and M. Clynes (Eds.): Rhythm in Psychological, Linguistic and Musical Processes. Springfield, Ill.: Thomas, pp. 55–77
Coutaz, J., Nigay, L. & Salber, D. (1995) Multimodality from the user and systems perspectives. Proceedings of the ERCIM-95 Workshop on Multimedia Multimodal User Interfaces
Cummins, F. & Port, R.F. (1998) Rhythmic constraints on stress timing in English. Journal of Phonetics 26: 145–171
Fant, G. & Kruckenberg, A. (1996) On the Quantal Nature of Speech Timing. Proc. ICSLP-96, pp. 2044–2047
Kendon, A. (1972) Some relationships between body motion and speech — An analysis of an example. In: Siegman, A.W. & Pope, B. (eds) Studies in Dyadic Communication. New York: Pergamon Press
Kien, J. & Kemp, A. (1994) Is speech temporally segmented? Comparison with temporal segmentation in behavior. Brain and Language 46: 662–682
Koons, D.B., Sparrell, C.J. & Thórisson, K.R. (1993) Integrating simultaneous input from speech, gaze, and hand gestures. In: Maybury, M.T. (ed.) Intelligent Multimedia Interfaces. AAAI Press/The MIT Press, Menlo Park, pp 257–276
Kopp, S. & Wachsmuth, I. (1999) Natural timing in coverbal gesture of an articulated figure, Working notes, Workshop “Communicative Agents” at Autonomous Agents 1999, Seattle
Lenzmann, B. (1998) Benutzeradaptive und multimodale Interface-Agenten. Dissertationen der Künstlichen Intelligenz, Bd. 184, Sankt Augustin: Infix
Martin, J.G. (1972) Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review 79(6): 487–509
Martin, J.G. (1979) Rhythmic and segmental perception. J. Acoust. Soc. Am. 65(5): 1286–1297
Maybury, M.T. (1995) Research in multimedia and multimodal parsing and generation. Artificial Intelligence Review 9(2–3): 103–127
McAuley, D. (1994) Time as phase: A dynamical model of time perception. In: Proceedings of the Sixteenth Annual Meeting of the Cognitive Science Society. Hillsdale NJ: Lawrence Erlbaum Associates, pp 607–612
McClave, E. (1994) Gestural Beats: The Rhythm Hypothesis. Journal of Psycholinguistic Research 23(1), 45–66
McNeill, D. (1992) Hand and Mind: What Gestures Reveal About Thought. Chicago: University of Chicago Press
Neal, J.G. & Shapiro, S.C. (1991) Intelligent multi-media interface technology. In: Sullivan, J.W. & Tyler, S.W. (eds). Intelligent User Interfaces. ACM Press, New York, pp 11–43
Nigay, L. & Coutaz, J. (1995) A generic platform for addressing the multimodal challenge. In: Proceedings of the Conference on Human Factors in Computing Systems (CHI-95). Reading: Addison-Wesley, pp 98–105
Pöppel, E. (1997) A hierarchical model of temporal perception. Trends in Cognitive Science 1(2), 56–61
Schöner, G. & Kelso, J.A.S. (1988) Dynamic pattern generation in behavioral and neural systems. Science, 239: 1513–1520
Sowa, T., Fröhlich, M. & Latoschik, M. (1999) Temporal symbolic integration applied to a multimodal system using gestures and speech. In: Braffort, A. et al. (eds). Toward a Gesture-based Communication in Human-Computer Interaction (Proceedings Internat. Gesture Workshop, Gif-sur-Yvette, France, March 1999). Berlin: Springer (LNAI 1739), pp 291–302
Srihari, R.K. (1995) Computational models for integrating linguistic and visual information: a survey. Artificial Intelligence Review 8: 349–369
Wachsmuth, I. (1999) Communicative rhythm in gesture and speech. In: Braffort, A. et al. (eds). Toward a Gesture-based Communication in Human-Computer Interaction (Proceedings Internat. Gesture Workshop, Gif-sur-Yvette, France, March 1999). Berlin: Springer (LNAI 1739), pp 277–290
Wachsmuth, I. & Cao, Y. (1995) Interactive graphics design with situated agents. In: Strasser, W. & Wahl, F. (eds). Graphics and Robotics. Berlin: Springer, pp 73–85
Wachsmuth, I. & Fröhlich, M. (eds) (1998) Gesture and Sign Language in Human-Computer Interaction (Proceedings International Gesture Workshop, Bielefeld, Germany, September 17–19, 1997). Berlin: Springer (LNAI 1371)
Wooldridge, M. & Jennings, N.R. (1995) Intelligent agents: Theory and practice. Knowledge Engineering Review, 10(2): 115–152
This contribution is a slightly revised and extended German version of the article published under the above-mentioned English title (Wachsmuth, 1999), with kind permission of Springer.
Cite this article
Wachsmuth, I. Kommunikative Rhythmen in Gestik und Sprache. Kognit. Wiss. 8, 151–159 (1999). https://doi.org/10.1007/BF03354937