
Speech-Based Emotion Classification for Human by Introducing Upgraded Long Short-Term Memory (ULSTM)

  • Conference paper
  • In: Computational Intelligence in Pattern Recognition

Abstract

Humans acquire emotional intelligence through social behaviour, by interacting with and imitating other people, and we continually sharpen our ability to analyse different emotions through experience of our surroundings. What if a machine could learn in a comparable way through artificial intelligence? Ongoing exploration of this question relies on deep learning models, which enhance a machine's learning capacity. That capacity is crucial for human emotion recognition, because one emotion can shade into another and is therefore difficult to analyse. This theme motivated the present work. The proposed method categorizes human emotions using four deep learning models: a convolutional neural network (CNN), a recurrent neural network (RNN), long short-term memory (LSTM) and a gated recurrent unit (GRU). These models are trained on well-known physical and perceptual features and tested on the benchmark Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The models are further compared on this dataset with respect to the vanishing gradient problem. In addition, an upgraded LSTM (ULSTM) model is proposed to obtain better accuracy and is evaluated against the existing LSTM model.
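The abstract describes the pipeline only at a high level: neither the exact physical and perceptual features nor the ULSTM modifications are specified here. As a point of reference, below is a minimal baseline sketch, assuming MFCC features extracted with librosa and a plain two-layer Keras LSTM over the eight RAVDESS emotion classes; the layer sizes, feature shape, and training call are illustrative assumptions, not the authors' configuration.

    # Baseline sketch (not the paper's ULSTM): MFCC features + a stacked LSTM
    # classifier for the eight RAVDESS emotion classes. Layer sizes, the
    # 40-coefficient/200-frame feature shape, and the training call below are
    # assumptions for illustration only.
    import numpy as np
    import librosa
    from tensorflow.keras import layers, models

    EMOTIONS = ["neutral", "calm", "happy", "sad",
                "angry", "fearful", "disgust", "surprised"]  # RAVDESS labels

    def extract_mfcc(path, n_mfcc=40, max_frames=200):
        # Load one clip and return a fixed-size (max_frames, n_mfcc) MFCC matrix.
        y, sr = librosa.load(path, sr=22050)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
        if mfcc.shape[0] < max_frames:  # pad short clips, truncate long ones
            mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
        return mfcc[:max_frames]

    def build_lstm(input_shape=(200, 40), n_classes=len(EMOTIONS)):
        # Two stacked LSTM layers followed by a softmax over the emotion classes.
        model = models.Sequential([
            layers.Input(shape=input_shape),
            layers.LSTM(128, return_sequences=True),
            layers.LSTM(64),
            layers.Dense(n_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

    # Usage (wav_paths and labels are hypothetical):
    # X = np.stack([extract_mfcc(p) for p in wav_paths])  # (n_clips, 200, 40)
    # y = np.array(labels)                                # ints in [0, 7]
    # build_lstm().fit(X, y, epochs=30, validation_split=0.2)

The same feature pipeline could feed CNN, vanilla RNN, or GRU variants by swapping the recurrent layers, which is one plausible way the four models in the abstract could be compared under identical inputs.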


References

  1. Han, K., Yu, D., Tashev, I.: Speech emotion recognition using deep neural network and extreme learning machine. In: Fifteenth Annual Conference of the International Speech Communication Association (2014)


  2. Kim, Y., Lee, H., Provost, E.M.: Deep learning for robust feature generation in audiovisual emotion recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2013)


  3. Zheng, W.Q., Yu, J.S., Zou, Y.X.: An experimental study of speech emotion recognition based on deep convolutional neural networks. In: 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE (2015)


  4. Satt, A., Rozenberg, S., Hoory, R.: Efficient emotion recognition from speech using deep learning on spectrograms. In: INTERSPEECH (2017)


  5. Lee, J., Tashev, I.: High-level feature representation using recurrent neural network for speech emotion recognition. In: Sixteenth Annual Conference of the International Speech Communication Association (2015)


  6. Mao, Q., et al.: Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia 16(8), 2203–2213 (2014)


  7. Mirsamadi, S., Barsoum, E., Zhang, C.: Automatic speech emotion recognition using recurrent neural networks with local attention. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2017)


  8. Trigeorgis, G., et al.: Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2016)


  9. Kahou, S.E., et al.: Combining modality specific deep neural networks for emotion recognition in video. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction. ACM (2013)


  10. Aldeneh, Z., Provost, E.M.: Using regional saliency for speech emotion recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2017)


  11. Chernykh, V., Prikhodko, P.: Emotion recognition from speech with recurrent neural networks. arXiv preprint arXiv:1701.08071 (2017)


  12. The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). https://zenodo.org/record/1188976#.XU739B0zbIV


Acknowledgements

This chapter does not contain any studies with human participants or animals performed by any of the authors.



Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bhowmik, S., Chatterjee, A., Biswas, S., Farhin, R., Yasmin, G. (2020). Speech-Based Emotion Classification for Human by Introducing Upgraded Long Short-Term Memory (ULSTM). In: Das, A., Nayak, J., Naik, B., Dutta, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 1120. Springer, Singapore. https://doi.org/10.1007/978-981-15-2449-3_8
