Multimodal emotion recognition from expressive faces, body gestures and speech

Caridakis, George; Castellano, Ginevra; Kessous, Loic; Raouzaiou, Amaryllis; Malatesta, Lori; Asteriadis, Stelios; Karpouzis, Kostas

doi:10.1007/978-0-387-74161-1_41

George Caridakis²,
Ginevra Castellano³,
Loic Kessous⁴,
Amaryllis Raouzaiou²,
Lori Malatesta²,
Stelios Asteriadis² &
…
Kostas Karpouzis²

Part of the book series: IFIP The International Federation for Information Processing ((IFIPAICT,volume 247))

Included in the following conference series:

IFIP International Conference on Artificial Intelligence Applications and Innovations

4423 Accesses
59 Citations
6 Altmetric

Abstract

In this paper we present a multimodal approach for the recognition of eight emotions that integrates information from facial expressions, body movement and gestures and speech. We trained and tested a model with a Bayesian classifier, using a multimodal corpus with eight emotions and ten subjects. First individual classifiers were trained for each modality. Then data were fused at the feature level and the decision level. Fusing multimodal data increased very much the recognition rates in comparison with the unimodal systems: the multimodal approach gave an improvement of more than 10% with respect to the most successful unimodal system. Further, the fusion performed at the feature level showed better results than the one performed at the decision level.

Download to read the full chapter text

Chapter PDF

Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Article 05 September 2023

Fusing facial and speech cues for enhanced multimodal emotion recognition

Article 24 January 2024

Multimodal Recognition of Emotions Using Physiological Signals with the Method of Decision-Level Fusion for Healthcare Applications

Keywords

References

Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., Taylor, J.G.: Emotion recognition in human-computer interaction, IEEE Signal Processing Magazine, January 2001.
Google Scholar
Picard, R.: Affective computing, Boston, MA: MIT Press (1997).
Google Scholar
Sebe, N., Cohen, I., Huang, T.S.: Multimodal Emotion Recognition, Handbook of Pattern Recognition and Computer Vision, World Scientific, ISBN 981-256-105-6, January 2005.
Google Scholar
Pantic, M., Sebe, N., Cohn, J., Huang, T.S.: Affective Multimodal Human-Computer Interaction, ACM Multimedia, pp. 669–676, Singapore, November 2005.
Google Scholar
Scherer, K. R., Wallbott, H. G.: Analysis of Nonverbal Behavior. HANDBOOK OF DISCOURSE: ANALYSIS, Vol. 2, Cap. 11, Academic Press London (1985).
Google Scholar
Banse, R., Scherer, K.R.: Acoustic Profiles in Vocal Emotion Expression. Journal of Personality and Social Psychology. 614–636 (1996).
Google Scholar
Vogt, T., André, E.: Comparing feature sets for acted and spontaneous speech in view of automatic emotion recognition. IEEE International Conference on Multimedia & Expo (ICME 2005).
Google Scholar
Gunes H., Piccardi M.: A Bimodal Face and Body Gesture Database for Automatic Analysis of Human Nonverbal Affective Behavior, Proc. of ICPR 2006 the 18th International Conference on Pattern Recognition, 20–24 Aug. 2006, Hong Kong, China.
Google Scholar
Banziger, T., Pirker, H., Scherer, K.: Gemep — geneva multimodal emotion portrayals: a corpus for the study of multimodal emotional expressions. In L. Deviller et al. (Ed.), Proceedings of LREC’06 Workshop on Corpora for Research on Emotion and Affect (pp. 15–019). Genoa. Italy (2006).
Google Scholar
Douglas-Cowie, E., Campbell, N., Cowie, R., Roach, P.: Emotional speech: towards a new generation of databases. Speech Communication, 40, 33–60 (2003).
Article MATH Google Scholar
Pantic, M., Rothkrantz, L.J.M.: Automatic analysis of facial expressions: The state of the art. IEEE Trans, on Pattern Analysis and Machine Intelligence, 22(12):1424–1445 (2000).
Article Google Scholar
Ioannou, S., Raouzaiou, A., Tzouvaras, V., Mailis, T., Karpouzis, K., Kollias, S.: Emotion recognition through facial expression analysis based on a neurofuzzy network, Neural Networks, Elsevier, Vol. 18, Issue 4, May 2005, pp. 423–435.
Google Scholar
Cowie, R., Douglas-Cowie, E.: Automatic statistical analysis of the signal and prosodic signs of emotion in speech. In Proc. International Conf. on Spoken Language Processing, pp. 1989–1992 (1996).
Google Scholar
K.R. Scherer: Adding the affective dimension: A new look in speech analysis and synthesis, In Proc. International Conf. on Spoken Language Processing, pp. 1808–1811, (1996).
Google Scholar
Camurri, A., Lagerlof, I, Volpe, G.: Recognizing Emotion from Dance Movement: Comparison of Spectator Recognition and Automated Techniques, International Journal of Human-Computer Studies, 59(1∓), pp. 213–225, Elsevier Science, July 2003.
Article Google Scholar
Bianchi-Berthouze, N., Kleinsmith, A. A categorical approach to affective gesture recognition, Connection Science, V. 15, N. 4, pp. 259–269. (2003).
Google Scholar
Picard, R.W., Vyzas, E., Healey, J.: Toward machine emotional intelligence: Analysis of affective physiological state, IEEE Trans, on Pattern Analysis and Machine Intelligence, 23(10):1175–1191 (2001).
Article Google Scholar
Pantic M., Rothkrantz, L.J.M.: Towards an Affect-sensitive Multimodal Human-Computer Interaction, Proceedings of the IEEE, vol. 91, no. 9, pp. 1370–1390 (2003).
Article Google Scholar
Busso, C., Deng, Z., Yildirim, S., Bulut, M., Lee, C.M., Kazemzaeh, A., Lee, S., Neumann, U., Narayanan, S.: “Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal information,” Proc. of ACM 6th int’l Conf. on Multimodal Interfaces (ICMI 2004), State College, PA, Oct. 2004. pp205–211.
Google Scholar
Kim, J., André, E., Rehm, M., Vogt, T., Wagner, J.: Integrating information from speech and physiological signals to achieve emotional sensitivity. Proc. of the 9th European Conference on Speech Communication and Technology (2005).
Google Scholar
Gunes H, Piccardi M.: Bi-modal emotion recognition from expressive face and body gestures. Journal of Network and Computer Applications (2006), doi:10.1016/j.jnca.2006.09.007
Google Scholar
el Kaliouby, R., Robinson, P.: Generalization of a Vision-Based Computational Model of Mind-Reading. In Proceedings of First International Conference on Affective Computing and Intelligent Interfaces, pp 582–589 (2005).
Google Scholar
Engelbrecht, A.P., Fletcher, L., Cloete, I.: Variance analysis of sensitivity information for pruning multilayer feedforward neural networks, Neural Networks, 1999. IJCNN’ 99. International Joint Conference on, Vol.3, Iss., 1999, pp:1829–1833 vol.3.
Google Scholar
Young, J. W.: Head and Face Anthropometry of Adult U.S. Civilians, FAA Civil Aeromedical Institute, 1963–1993 (final report 1993)
Google Scholar
Raouzaiou, A., Tsapatsoulis, N., Karpouzis, K., Kollias, S.: Parameterized facial expression synthesis based on MPEG-4, EURASIP Journal on Applied Signal Processing, Vol. 2002, No 10, 2002, pp. 1021–1038.
Article MATH Google Scholar
Camurri, A., Coletta, P., Massari, A., Mazzarino, B., Peri, M., Ricchetti, M., Ricci, A. and Volpe, G.:Toward real-time multimodal processing: EyesWeb 4.0, in Proc. AISB 2004 Convention: Motion, Emotion and Cognition, Leeds, UK, March 2004.
Google Scholar
Camurri, A., Mazzarino, B., and Volpe, G.: Analysis of Expressive Gesture: The Eyesweb Expressive Gesture Processing Library, in A. Camurri, G. Volpe (Eds.), Gesture-based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag (2004).
Google Scholar
Castellano, G., Camurri, A., Mazzarino, B., Volpe, G.: A mathematical model to analyse the dynamics of gesture expressivity, in Proc. of AISB 2007 Convention: Artificial and Ambient Intelligence, Newcastle upon Tyne, UK, April 2007.
Google Scholar
Kessous, L., Amir, N.: Comparison of feature extraction approaches based on the Bark time/frequency representation for classification of expressive speechpaper submitted to Interspeech 2007.
Google Scholar
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd Edition, Morgan Kaufmann, San Francisco (2005).
MATH Google Scholar
Kononenko, I.: On Biases in Estimating Multi-Valued Attributes. In: 14th International Joint Conference on Articial Intelligence, 1034–1040 (1995).
Google Scholar

Download references

Author information

Authors and Affiliations

Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 9, Heroon Politechniou str., 15780, Athens, Greece
George Caridakis, Amaryllis Raouzaiou, Lori Malatesta, Stelios Asteriadis & Kostas Karpouzis
InfoMus Lab, DIST - University of Genova, Viale Causa 13, I-16145, Genova, Italy
Ginevra Castellano
Department of Speech, Language and Hearing, University of Tel Aviv, Sheba Center, 52621, Tel Aviv, Israel
Loic Kessous

Authors

George Caridakis
View author publications
You can also search for this author in PubMed Google Scholar
Ginevra Castellano
View author publications
You can also search for this author in PubMed Google Scholar
Loic Kessous
View author publications
You can also search for this author in PubMed Google Scholar
Amaryllis Raouzaiou
View author publications
You can also search for this author in PubMed Google Scholar
Lori Malatesta
View author publications
You can also search for this author in PubMed Google Scholar
Stelios Asteriadis
View author publications
You can also search for this author in PubMed Google Scholar
Kostas Karpouzis
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Athens Information Technology, Greece
Christos Boukis , Aristodemos Pnevmatikakis & Lazaros Polymenakos , &

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Caridakis, G. et al. (2007). Multimodal emotion recognition from expressive faces, body gestures and speech. In: Boukis, C., Pnevmatikakis, A., Polymenakos, L. (eds) Artificial Intelligence and Innovations 2007: from Theory to Applications. AIAI 2007. IFIP The International Federation for Information Processing, vol 247. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-74161-1_41

Download citation

DOI: https://doi.org/10.1007/978-0-387-74161-1_41
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-74160-4
Online ISBN: 978-0-387-74161-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Multimodal emotion recognition from expressive faces, body gestures and speech

Abstract

Chapter PDF

Similar content being viewed by others

Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Fusing facial and speech cues for enhanced multimodal emotion recognition

Multimodal Recognition of Emotions Using Physiological Signals with the Method of Decision-Level Fusion for Healthcare Applications

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Multimodal emotion recognition from expressive faces, body gestures and speech

Abstract

Chapter PDF

Similar content being viewed by others

Interpretable multimodal emotion recognition using hybrid fusion of speech and image data

Fusing facial and speech cues for enhanced multimodal emotion recognition

Multimodal Recognition of Emotions Using Physiological Signals with the Method of Decision-Level Fusion for Healthcare Applications

Keywords

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation