Abstract
Behavior models implemented in Embodied Conversational Agents (ECAs) require nonverbal communication to be tightly coordinated with speech. In this paper we present an empirical study exploring how the temporal coordination between speech and facial expressions of emotion influences users' perception of these emotions, measuring recognition performance, the perceived realism of the behavior, and user preferences. We generated five conditions of temporal coordination between facial expression and speech: the facial expression displayed before the speech utterance, at its beginning, throughout it, at its end, or after it. Twenty-three subjects participated in the experiment and saw these five conditions applied to the display of six emotions (fear, joy, anger, disgust, surprise, and sadness). Subjects recognized emotions most efficiently when the facial expression was displayed at the end of the spoken sentence. However, the combination users judged most realistic, and preferred over the others, was the facial expression displayed throughout the speech utterance. We review existing literature to position our work and discuss the relationship between realism and communication performance. We also provide animation guidelines and outline avenues for future work.
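The five timing conditions can be made concrete as offsets of the expression window relative to the utterance. The sketch below is illustrative only: the function name, the `Interval` type, and the fixed expression duration are assumptions, not part of the study's implementation.

```python
from dataclasses import dataclass

EXPR_DURATION = 1.0  # seconds an expression is held (assumed value)

@dataclass
class Interval:
    start: float
    end: float

def expression_interval(condition: str, utt: Interval) -> Interval:
    """Return the time window of the facial expression relative to the utterance."""
    if condition == "before":        # expression ends as speech begins
        return Interval(utt.start - EXPR_DURATION, utt.start)
    if condition == "beginning":     # expression starts with speech
        return Interval(utt.start, utt.start + EXPR_DURATION)
    if condition == "throughout":    # expression spans the whole utterance
        return Interval(utt.start, utt.end)
    if condition == "end":           # expression ends with speech
        return Interval(utt.end - EXPR_DURATION, utt.end)
    if condition == "after":         # expression starts as speech ends
        return Interval(utt.end, utt.end + EXPR_DURATION)
    raise ValueError(f"unknown condition: {condition}")

utt = Interval(0.0, 3.0)
print(expression_interval("end", utt))  # Interval(start=2.0, end=3.0)
```

Under this framing, the study's result is that the "end" window maximized recognition efficiency while the "throughout" window maximized perceived realism and preference.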
Buisine, S., Wang, Y. & Grynszpan, O. Empirical investigation of the temporal relations between speech and facial expressions of emotion. J Multimodal User Interfaces 3, 263–270 (2009). https://doi.org/10.1007/s12193-010-0050-4