Towards Facial Gestures Generation by Speech Signal Analysis Using HUGE Architecture

Zoric, Goranka; Smid, Karlo; Pandzic, Igor S.

doi:10.1007/978-3-642-00525-1_11

Goranka Zoric²³,
Karlo Smid²⁴ &
Igor S. Pandzic²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 5398))

1182 Accesses
5 Citations

Abstract

In our current work we concentrate on finding correlation between speech signal and occurrence of facial gestures. Motivation behind this work is computer-generated human correspondent, ECA. In order to have a believable human representative it is important for an ECA to implement facial gestures in addition to verbal and emotional displays. Information needed for generation of facial gestures is extracted from speech prosody by analyzing natural speech in real-time. This work is based on the previously developed HUGE architecture for statistically-based facial gesturing and extends our previous work on automatic real-time lip sync.

Access provided by Autonomous University of Puebla. Download to read the full chapter text

Chapter PDF

Emotion Recognition Model Based on Facial Gestures and Speech Signals Using Skew Gaussian Mixture Model

MAGEFACE: Performative Conversion of Facial Characteristics into Speech Synthesis Parameters

Audio-Driven Lips and Expression on 3D Human Face

Keywords

References

Cassell, J., Sullivan, J., Prevost, S., Churchill, E. (eds.): Embodied Conversational Agents, p. 430. MIT press, Cambridge (2000)
Google Scholar
Chovil, N.: Discourse-oriented facial displays in conversation, Research on Language and Social Interaction (1991)
Google Scholar
Fridlund, A., Ekman, P., Oster, H.: Facial expressions of emotion. In: Siegman, A., Feldstein, S. (eds.) Nonverbal Behavior and Communication. Lawrence Erlbaum, Hillsdale (1987)
Google Scholar
Zoric, G., Smid, K., Pandzic, I.: Facial Gestures: Taxonomy and Application of Nonverbal, Nonemotional Facial Displays for Emodied Conversational Agents. In: Nishida, T. (ed.) Conversational Informatics - An Engineering Approach, pp. 161–182. John Wiley & Sons, Chichester (2007)
Chapter Google Scholar
Ekman, P., Friesen, W.V.: The repertoire of nonverbal behavior: Categories, origins, usage, and coding, Semiotica (1969)
Google Scholar
Pelachaud, C., Badler, N., Steedman, M.: Generating Facial Expressions for Speech. Cognitive Science 20(1), 1–46 (1996)
Article Google Scholar
Ekman, P.: About brows: Emotional and conversational signals. In: von Cranach, M., Foppa, K., Lepenies, W., Ploog, D. (eds.) Human ethology: Claims and limits of a new discipline (1979)
Google Scholar
Cavé, C., Guaïtella, I., Bertrand, R., Santi, S., Harlay, F., Espesser, R.: About the relationship between eyebrow movements and F0 variations. In: Proceedings of Int’l Conf. Spoken Language Processing (1996)
Google Scholar
Honda, K.: Interactions between vowel articulation and F0 control. In: Fujimura, B.D.J.O., Palek, B. (eds.) Proceedings of Linguistics and Phonetics: Item Order in Language and Speech (LP 1998) (2000)
Google Scholar
Yehia, H., Kuratate, T., Vatikiotis-Bateson, E.: Facial animation and head motion driven by speech acoustics. In: Hoole, P. (ed.) 5th Seminar on Speech Production: Models and Data, Kloster Seeon (2000)
Google Scholar
Granström, B., House, D., Lundeberg, M.: Eyebrow movements as a cue to prominence. In: The Third Swedish Symposium on Multimodal Communication (1999)
Google Scholar
House, D., Beskow, J., Granström, B.: Timing and interaction of visual cues for prominence in audiovisual speech perception. In: Proceedings of Eurospeech 2001 (2001)
Google Scholar
Graf, H.P., Cosatto, E., Strom, V., Huang, F.J.: Visual Prosody: Facial Movements Accompanying Speech. In: Proceedings of AFGR 2002, pp. 381–386 (2002)
Google Scholar
Granström, B., House, D.: Audiovisual representation of prosody in expressive speech communication. Speech Communication 46, 473–484 (2005)
Article Google Scholar
Cassell, J.: Embodied Conversation: Integrating Face and Gesture into Automatic Spoken Dialogue Systems. In: Luperfoy, S. (ed.) Spoken Dialogue Systems. MIT Press, Cambridge (1989)
Google Scholar
Bui, T.D., Heylen, D., Nijholt, A.: Combination of facial movements on a 3D talking head. In: Proceedings of Computer Graphics International (2004)
Google Scholar
Smid, K., Pandzic, I.S., Radman, V.: Autonomous Speaker Agent. In: Computer Animation and Social Agents Conference CASA 2004, Geneva, Switzerland (2004)
Google Scholar
Zoric, G.: Automatic Lip Synchronization by Speech Signal Analysis, Master Thesis (03-Ac-17/2002-z) on Faculty of Electrical Engineering and Computing, University of Zagreb (2005)
Google Scholar
Kshirsagar, S., Magnenat-Thalmann, N.: Lip synchronization using linear predictive analysis. In: Proceedings of IEE International Conference on Multimedia and Expo., New York (2000)
Google Scholar
Lewis, J.: Automated Lip-Sync: Background and Techniques. Proceedings of J. Visualization and Computer Animation 2 (1991)
Google Scholar
Huang, F.J., Chen, T.: Real-time lip-synch face animation driven by human voice. In: IEEE Workshop on Multimedia Signal Processing, Los Angeles, California (December 1998)
Google Scholar
McAllister, D.F., Rodman, R.D., Bitzer, D.L., Freeman, A.S.: Lip synchronization of speech. In: Proceedings of AVSP 1997 (1997)
Google Scholar
Kuratate, T., Munhall, K.G., Rubin, P.E., Vatikiotis-Bateson, E., Yehia, H.: Audio-visual synthesis of talking faces from speech production correlates. In: Proceedings of EuroSpeech 1999 (1999)
Google Scholar
Yehia, H.C., Kuratate, T., Vatikiotis-Bateson, E.: Linking facial animation, head motion and speech acoustics. Journal of Phonetics (2002)
Google Scholar
Munhall, K.G., Jones, J., Callan, D., Kuratate, T., Vatikiotis-Bateson, E.: Visual Prosody and Speech Intelligibility. Psychological Science 15(2), 133–137 (2003)
Article Google Scholar
Deng, Z., Busso, C., Narayanan, S., Neumann, U.: Audio-based Head Motion Synthesis for Avatar-based Telepresence Systems. In: Proc. of ACM SIGMM Workshop on Effective Telepresence (ETP), NY, pp. 24–30 (October 2004)
Google Scholar
Chuang, E., Bregler, C.: Mood swings: expressive speech animation. ACM Transactions on Graphics (TOG) 24(2), 331–347 (2005)
Article Google Scholar
Sargin, M.E., Erzin, E., Yemez, Y., Tekalp, A.M., Erdem, A.T., Erdem, C., Ozkan, M.: Prosody-Driven Head-Gesture Animation. In: ICASSP 2007, Honolulu, USA (2007)
Google Scholar
Hofer, G., Shimodaira, H.: Automatic Head Motion Prediction from Speech Data. In: Proceedings Interspeech 2007 (2007)
Google Scholar
Brand, M.: Voice Puppetry. In: Proceedings of Siggraph 1999 (1999)
Google Scholar
Gutierrez-Osuna, R., Kakumanu, P.K., Esposito, A., Garcia, O.N., Bojorquez, A., Castillo, J.L., Rudomin, I.: Speech-driven facial animation with realistic dynamics. IEEE Transactions on Multimedia (2005)
Google Scholar
Costa, M., Lavagetto, F., Chen, T.: Visual Prosody Analysis for Realistic Motion Synthesis of 3D Head Models. In: Proceedings of International Conference on Augmented, Virtual Environments and 3D Imaging (2001)
Google Scholar
Albrecht, I., Haber, J., Seidel, H.: Automatic Generation of Non-Verbal Facial Expressions from Speech. In: Proceedings of Computer Graphics International 2002 (CGI 2002), pp. 283–293 (2002)
Google Scholar
Malcangi, M., de Tintis, R.: Audio Based Real-Time Speech Animation of Embodied Conversational Agents. LNCS (2004)
Google Scholar
Cassell, J., Pelachaud, C., Badler, N., Steedman, M., Achorn, B., Becket, T., Douville, B., Prevost, S., Stone, M.: Animated Conversation: Rule-based Generation of Facial Expressions, Jesture & Spoken Intonation for Multiple Conversational Agents. In: Proceedings of SIGGAPH 1994 (1994)
Google Scholar
Lee, S.P., Badler, J.B., Badler, N.I.: Eyes Alive. In: Proceedings of the 29th annual conference on Computer graphics and interactive techniques 2002, San Antonio, Texas, USA, pp. 637–644. ACM Press, New York (2002)
Google Scholar
Smid, K., Zoric, G., Pandzic, I.P.: [HUGE]: Universal Architecture for Statistically Based HUman GEsturing. In: Gratch, J., Young, M., Aylett, R.S., Ballin, D., Olivier, P. (eds.) IVA 2006. LNCS (LNAI), vol. 4133, pp. 256–269. Springer, Heidelberg (2006)
Chapter Google Scholar
Rabiner, L.R., Schafer, R.W.: Digital Processing of Speech signals. Prentice-Hall Inc., Englewood Cliffs (1978)
Google Scholar
http://www.visagetechnologies.com/

Download references

Author information

Authors and Affiliations

Department of Telecommunications, Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, HR-10 000, Zagreb, Croatia
Goranka Zoric & Igor S. Pandzic
Ericsson Nikola Tesla, Krapinska 45, p.p. 93, HR-10 002, Zagreb, Croatia
Karlo Smid

Authors

Goranka Zoric
View author publications
You can also search for this author in PubMed Google Scholar
Karlo Smid
View author publications
You can also search for this author in PubMed Google Scholar
Igor S. Pandzic
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Department of Psychology, Second University of Naples, and IIASS, Via Pellegrino 19, 84019, Vietri sul Mare (SA), Italy
Anna Esposito
Department of Computing Science & Mathematics, University of Stirling, FK9 4LA, Stirling, Scotland, UK
Amir Hussain
Dipartimento di Fisica “E.R. Caianiello”, Università degli Studi di Salerno, Italy and IIASS, Via S. Allende, 84081, Baronissi (SA), Italy
Maria Marinaro
Dip. di Ingegneria dell’ Informazione, Seconda Università di Napoli, Via Roma 29, 81031, Aversa (CE), Italy
Raffaele Martone

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zoric, G., Smid, K., Pandzic, I.S. (2009). Towards Facial Gestures Generation by Speech Signal Analysis Using HUGE Architecture. In: Esposito, A., Hussain, A., Marinaro, M., Martone, R. (eds) Multimodal Signals: Cognitive and Algorithmic Issues. Lecture Notes in Computer Science(), vol 5398. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00525-1_11

Download citation

DOI: https://doi.org/10.1007/978-3-642-00525-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00524-4
Online ISBN: 978-3-642-00525-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Towards Facial Gestures Generation by Speech Signal Analysis Using HUGE Architecture

Abstract

Chapter PDF

Similar content being viewed by others

Emotion Recognition Model Based on Facial Gestures and Speech Signals Using Skew Gaussian Mixture Model

MAGEFACE: Performative Conversion of Facial Characteristics into Speech Synthesis Parameters

Audio-Driven Lips and Expression on 3D Human Face

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Towards Facial Gestures Generation by Speech Signal Analysis Using HUGE Architecture

Abstract

Chapter PDF

Similar content being viewed by others

Emotion Recognition Model Based on Facial Gestures and Speech Signals Using Skew Gaussian Mixture Model

MAGEFACE: Performative Conversion of Facial Characteristics into Speech Synthesis Parameters

Audio-Driven Lips and Expression on 3D Human Face

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation