Abstract
Humans acquire emotional intelligence through social behaviour, by interacting with and imitating others, and we continually sharpen our ability to distinguish emotions through experience of our surroundings. What if a machine could learn the same skill? This question is being explored with deep learning models, which enhance a machine's learning capacity; such capacity matters in human emotion recognition because one emotion can shade into another, making the two difficult to separate. This challenge motivated the present work. The proposed method classifies human emotions using four deep learning models: convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU). The models are trained on well-known physical and perceptual speech features and evaluated on the benchmark Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). The four models are also compared on this dataset with respect to the vanishing gradient problem. In addition, an upgraded LSTM (ULSTM) model is proposed to improve accuracy and is evaluated against the standard LSTM.
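The vanishing-gradient comparison in the abstract hinges on the LSTM's additive cell-state update, which lets gradients flow through time more easily than in a plain RNN. As a rough illustration only (not the chapter's ULSTM), a single LSTM time step can be sketched in NumPy; the feature dimension, hidden size, and weight initialisation below are all illustrative assumptions:

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step, showing the gating mechanism.

    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,).
    Gate order in the stacked weights: input, forget, output, candidate.
    """
    hid = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = 1.0 / (1.0 + np.exp(-z[:hid]))           # input gate
    f = 1.0 / (1.0 + np.exp(-z[hid:2 * hid]))    # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2 * hid:3 * hid]))  # output gate
    g = np.tanh(z[3 * hid:])                     # candidate cell state
    c = f * c_prev + i * g                       # additive update: eases gradient flow
    h = o * np.tanh(c)                           # hidden state exposed to the next layer
    return h, c

# Hypothetical dimensions: 40 frame-level speech features, 8 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 40, 8
W = rng.standard_normal((4 * n_hid, n_in)) * 0.1
U = rng.standard_normal((4 * n_hid, n_hid)) * 0.1
b = np.zeros(4 * n_hid)

h = np.zeros(n_hid)
c = np.zeros(n_hid)
for _ in range(5):                               # run a short feature sequence
    x = rng.standard_normal(n_in)
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)  # (8,)
```

In a full classifier, the final hidden state `h` (or the sequence of hidden states) would feed a softmax layer over the emotion classes; the chapter's actual models and the ULSTM modification are described in the text, not here.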
References
Han, K., Yu, D., Tashev, I.: Speech emotion recognition using deep neural network and extreme learning machine. In: Fifteenth Annual Conference of the International Speech Communication Association (2014)
Kim, Y., Lee, H., Provost, E.M.: Deep learning for robust feature generation in audiovisual emotion recognition. In: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE (2013)
Zheng, W.Q., Yu, J.S., Zou, Y.X.: An experimental study of speech emotion recognition based on deep convolutional neural networks. In: 2015 International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE (2015)
Satt, A., Rozenberg, S., Hoory, R.: Efficient emotion recognition from speech using deep learning on spectrograms. In: INTERSPEECH (2017)
Lee, J., Tashev, I.: High-level feature representation using recurrent neural network for speech emotion recognition. In: Sixteenth Annual Conference of the International Speech Communication Association (2015)
Mao, Q., et al.: Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia 16(8), 2203–2213 (2014)
Mirsamadi, S., Barsoum, E., Zhang, C.: Automatic speech emotion recognition using recurrent neural networks with local attention. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2017)
Trigeorgis, G., et al.: Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2016)
Kahou, S.E., et al.: Combining modality specific deep neural networks for emotion recognition in video. In: Proceedings of the 15th ACM on International Conference on Multimodal Interaction. ACM (2013)
Aldeneh, Z., Provost, E.M.: Using regional saliency for speech emotion recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE (2017)
Chernykh, V., Prikhodko, P.: Emotion recognition from speech with recurrent neural networks. arXiv preprint arXiv:1701.08071 (2017)
The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS). https://zenodo.org/record/1188976#.XU739B0zbIV
Acknowledgements
This chapter does not contain any studies with human participants or animals performed by any of the authors.
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Bhowmik, S., Chatterjee, A., Biswas, S., Farhin, R., Yasmin, G. (2020). Speech-Based Emotion Classification for Human by Introducing Upgraded Long Short-Term Memory (ULSTM). In: Das, A., Nayak, J., Naik, B., Dutta, S., Pelusi, D. (eds) Computational Intelligence in Pattern Recognition. Advances in Intelligent Systems and Computing, vol 1120. Springer, Singapore. https://doi.org/10.1007/978-981-15-2449-3_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2448-6
Online ISBN: 978-981-15-2449-3
eBook Packages: Intelligent Technologies and Robotics (R0)