Emotion Detection from Speech to Enrich Multimedia Content

  • Conference paper
Advances in Multimedia Information Processing — PCM 2001 (PCM 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2195)

Abstract

This paper describes an experimental study on the detection of emotion from speech. As computer-based characters such as avatars and virtual chat faces become more common, the use of emotion to drive the expression of the virtual characters becomes more important. This study utilizes a corpus containing emotional speech with 721 short utterances expressing four emotions: anger, happiness, sadness, and the neutral (unemotional) state, which were captured manually from movies and teleplays. We introduce a new concept to evaluate emotions in speech. Emotions are so complex that most speech sentences cannot be precisely assigned to a particular emotion category; however, most emotional states nevertheless can be described as a mixture of multiple emotions. Based on this concept we have trained SVMs (support vector machines) to recognize utterances within these four categories and developed an agent that can recognize and express emotions.
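The abstract does not say which acoustic features or SVM configuration were used, nor how a mixture of emotions is derived from the classifiers. The Python sketch below is therefore only an illustration of the general idea, under stated assumptions: the two-dimensional feature space, the cluster centres, and the synthetic data are made up, and the helper names (`make_toy_corpus`, `train_linear_svm`, `train_one_vs_rest`, `emotion_mixture`) are hypothetical. It trains one linear SVM per emotion (one-vs-rest, by stochastic subgradient descent on the hinge loss) and, as an assumed mapping, turns the four decision values into mixture weights with a softmax.

```python
import math
import random

EMOTIONS = ["anger", "happiness", "sadness", "neutral"]

# Made-up cluster centres in a hypothetical 2-D feature space
# (not the paper's features or data).
CENTRES = {
    "anger": (2.0, 2.0),
    "happiness": (2.0, -2.0),
    "sadness": (-2.0, -2.0),
    "neutral": (-2.0, 2.0),
}


def make_toy_corpus(per_class=40, sigma=0.3, seed=0):
    """Draw synthetic feature vectors around each emotion's centre."""
    rng = random.Random(seed)
    features, labels = [], []
    for emotion, (cx, cy) in CENTRES.items():
        for _ in range(per_class):
            features.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(emotion)
    return features, labels


def decision_value(model, x):
    """Signed score w.x + b of a linear SVM; positive means 'this emotion'."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(features, targets, lr=0.01, lam=0.01, epochs=100, seed=0):
    """Fit a linear SVM by stochastic subgradient descent on the hinge loss.

    targets are +1 / -1; lam is the L2 regularisation strength.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    order = list(range(len(features)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = features[i], targets[i]
            violated = y * decision_value((w, b), x) < 1.0
            # Regularisation shrinks w on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if violated:
                # Hinge-loss subgradient step for a margin violation.
                w = [wj + lr * y * xj for wj, xj in zip(w, x)]
                b += lr * y
    return w, b


def train_one_vs_rest(features, labels):
    """Train one binary SVM per emotion (that emotion vs. the other three)."""
    models = {}
    for emotion in EMOTIONS:
        targets = [1 if label == emotion else -1 for label in labels]
        models[emotion] = train_linear_svm(features, targets)
    return models


def emotion_mixture(models, x):
    """Softmax over the four decision values (an assumed mapping)."""
    scores = {e: decision_value(models[e], x) for e in EMOTIONS}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {e: math.exp(s - top) for e, s in scores.items()}
    total = sum(exps.values())
    return {e: v / total for e, v in exps.items()}


features, labels = make_toy_corpus()
models = train_one_vs_rest(features, labels)

# A point midway between the "anger" and "happiness" centres.
mixture = emotion_mixture(models, (2.0, 0.0))
for emotion in EMOTIONS:
    print(f"{emotion:>9}: {mixture[emotion]:.2f}")
```

In this toy setting, a point lying between two clusters receives most of its weight from those two emotions instead of being forced into a single category, which is the kind of graded description the abstract argues for. A real system would replace the synthetic vectors with acoustic features extracted from the utterances and would likely use an established SVM library.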

Visiting Microsoft Research China from Department of Computer Science and Technology, Tsinghua University, Beijing, China

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Yu, F., Chang, E., Xu, YQ., Shum, HY. (2001). Emotion Detection from Speech to Enrich Multimedia Content. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_71

  • DOI: https://doi.org/10.1007/3-540-45453-5_71

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42680-6

  • Online ISBN: 978-3-540-45453-3

  • eBook Packages: Springer Book Archive
