Emotion Detection from Speech to Enrich Multimedia Content

Yu, Feng; Chang, Eric; Xu, Ying-Qing; Shum, Heung-Yeung

doi:10.1007/3-540-45453-5_71

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2195)

Included in the following conference series:

Pacific-Rim Conference on Multimedia

Abstract

This paper describes an experimental study on the detection of emotion from speech. As computer-based characters such as avatars and virtual chat faces become more common, the use of emotion to drive the expression of the virtual characters becomes more important. This study utilizes a corpus containing emotional speech with 721 short utterances expressing four emotions: anger, happiness, sadness, and the neutral (unemotional) state, which were captured manually from movies and teleplays. We introduce a new concept to evaluate emotions in speech. Emotions are so complex that most speech sentences cannot be precisely assigned to a particular emotion category; however, most emotional states nevertheless can be described as a mixture of multiple emotions. Based on this concept we have trained SVMs (support vector machines) to recognize utterances within these four categories and developed an agent that can recognize and express emotions.
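The idea of describing an utterance as a mixture of emotions, scored by SVMs trained on the four categories, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 2-D feature vectors are synthetic and carry no acoustic meaning, the one-vs-rest arrangement and the plain subgradient-descent trainer are one common way to set up linear SVMs, and the softmax step that turns the four scores into mixture weights is a hypothetical choice.

```python
import math
import random

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

# Synthetic 2-D feature vectors with no acoustic meaning (illustration only).
TOY_DATA = {
    "anger": [(1.0, 1.0), (0.8, 0.9)],
    "happiness": [(1.0, -1.0), (0.9, -0.8)],
    "sadness": [(-1.0, -1.0), (-0.9, -0.8)],
    "neutral": [(-1.0, 1.0), (-0.8, 0.9)],
}


def train_binary_svm(xs, ys, epochs=200, lr=0.1, lam=0.01, seed=0):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    w = [0.0] * len(xs[0])
    b = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, xs[i])) + b
            # Regularization shrinks the weights on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            # Hinge-loss step only when the margin is violated.
            if ys[i] * score < 1.0:
                w = [wj + lr * ys[i] * xj for wj, xj in zip(w, xs[i])]
                b += lr * ys[i]
    return w, b


def train_one_vs_rest(data):
    """Train one binary SVM per emotion: that emotion (+1) against the rest (-1)."""
    pairs = [(x, name) for name, pts in data.items() for x in pts]
    xs = [x for x, _ in pairs]
    return {
        name: train_binary_svm(xs, [1.0 if lbl == name else -1.0 for _, lbl in pairs])
        for name in data
    }


def emotion_mixture(models, x):
    """Turn the per-emotion SVM scores into weights that sum to 1 (softmax)."""
    scores = {
        name: sum(wj * xj for wj, xj in zip(w, x)) + b
        for name, (w, b) in models.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


models = train_one_vs_rest(TOY_DATA)
mixture = emotion_mixture(models, (0.9, 1.0))
print(max(mixture, key=mixture.get), mixture)
```

Calling `emotion_mixture` on a new feature vector returns one weight per emotion rather than a single hard label, which is the sense in which an utterance is treated here as a blend of categories.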

Feng Yu was visiting Microsoft Research China from the Department of Computer Science and Technology, Tsinghua University, Beijing, China.

Author information

Authors and Affiliations

Dept. of Computer Science and Technology, Tsinghua Univ., Beijing, 100084, P.R.C
Feng Yu
Microsoft Research China, 3/F Beijing Sigma Center, Beijing, 100080, P.R.C.
Eric Chang, Ying-Qing Xu & Heung-Yeung Shum

Editor information

Editors and Affiliations

Microsoft Research China, 5/F Beijing Sigma Center 49 Zhichung Road, Haidian District, Beijing, 100080, China
Heung-Yeung Shum
Institute of Information Science, Academia Sinica, Taiwan
Mark Liao
Department of Electrical Engineering, Columbia University, New York, NY, 10027, USA
Shih-Fu Chang

About this paper

Cite this paper

Yu, F., Chang, E., Xu, YQ., Shum, HY. (2001). Emotion Detection from Speech to Enrich Multimedia Content. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_71

DOI: https://doi.org/10.1007/3-540-45453-5_71
Published: 20 November 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42680-6
Online ISBN: 978-3-540-45453-3
eBook Packages: Springer Book Archive
