Abstract
The detection of emotions from speech is a key aspect of human behavior, and Speech Emotion Recognition (SER) plays an important role in a diverse range of applications, especially in human-computer communication. The main aim of this study is to build two Machine Learning (ML) models able to classify input speech into several emotion classes. To this end, we extract a set of prosodic and spectral features from sound files and apply a feature selection method to improve the recognition rate of the proposed system. Experiments were conducted on the RAVDESS database to evaluate the accuracy of the emotional speech system. We evaluated the performance of our models and compared them with existing SER approaches in the literature. Our results indicate that the proposed system, based on Support Vector Machine (SVM) and K-Nearest Neighbors (KNN) classifiers, achieves test accuracies of \(69.67\%\) and \(65.04\%\), respectively, on 8 emotional states.
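The abstract does not list the exact features or the selection method, so the following is only a rough, hypothetical illustration of the general pipeline shape (extract features per utterance, normalize, classify). It is a pure-Python sketch, assuming three simple frame-level cues chosen for illustration: RMS energy (a prosodic-type cue), zero-crossing rate, and spectral centroid of a Hamming-windowed frame (spectral-type cues). It averages them per utterance, z-score normalizes them, and applies k-NN. The frame sizes, `k=3`, the synthetic tones standing in for speech, and the "calm"/"angry" labels are all assumptions; this is not the authors' implementation, it omits the feature-selection step and the SVM, and it does not use RAVDESS.

```python
import math
from collections import Counter


def hamming(n):
    """Hamming window: w[i] = 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_features(frame, sr):
    """[RMS energy, zero-crossing rate, spectral centroid in Hz] for one frame."""
    n = len(frame)
    rms = math.sqrt(sum(x * x for x in frame) / n)  # energy (prosodic-type cue)
    zcr = sum((a >= 0) != (b >= 0) for a, b in zip(frame, frame[1:])) / (n - 1)
    windowed = [x * w for x, w in zip(frame, hamming(n))]
    # Naive O(n^2) DFT magnitudes for bins 0..n/2; a real system would use an FFT.
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(windowed))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(windowed))
        mags.append(math.hypot(re, im))
    total = sum(mags)
    # Spectral centroid: magnitude-weighted mean frequency (spectral-type cue).
    centroid = sum(k * sr / n * m for k, m in enumerate(mags)) / total if total else 0.0
    return [rms, zcr, centroid]


def utterance_features(signal, sr, frame_len=128, hop=128):
    """Average frame-level features into one fixed-length vector per utterance."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    feats = [frame_features(f, sr) for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]


def fit_standardizer(X):
    """Per-feature mean and std from training data (a zero std is replaced by 1)."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def standardize(x, means, stds):
    """Z-score each feature using training-set statistics."""
    return [(v - m) / s for v, m, s in zip(x, means, stds)]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    nearest = sorted(zip((math.dist(a, x) for a in train_X), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def tone(freq, amp, sr=8000, n=512):
    """Synthetic stand-in for an utterance (NOT real speech or RAVDESS data)."""
    return [amp * math.sin(2 * math.pi * freq * t / sr) for t in range(n)]


if __name__ == "__main__":
    sr = 8000
    # Hypothetical labels: quiet low-frequency tones vs loud high-frequency tones.
    train = [(tone(f, 0.2), "calm") for f in (200, 250, 300)] + \
            [(tone(f, 0.9), "angry") for f in (1500, 1800, 2100)]
    X = [utterance_features(s, sr) for s, _ in train]
    y = [label for _, label in train]
    means, stds = fit_standardizer(X)
    Xs = [standardize(x, means, stds) for x in X]
    query = standardize(utterance_features(tone(1700, 0.8), sr), means, stds)
    print(knn_predict(Xs, y, query, k=3))
```

Normalizing before k-NN matters here because the raw features live on very different scales (the centroid is in Hz, RMS is in amplitude units), so without it the Euclidean distance would be dominated by the largest-scale feature.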
Acknowledgements
This work was supported by the Ministry of Higher Education, Scientific Research and Innovation, the Digital Development Agency (DDA) and the CNRST of Morocco (Alkhawarizmi/2020/01).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chakhtouna, A., Sekkate, S., Adib, A. (2022). Improving Speech Emotion Recognition System Using Spectral and Prosodic Features. In: Abraham, A., Gandhi, N., Hanne, T., Hong, TP., Nogueira Rios, T., Ding, W. (eds) Intelligent Systems Design and Applications. ISDA 2021. Lecture Notes in Networks and Systems, vol 418. Springer, Cham. https://doi.org/10.1007/978-3-030-96308-8_37
DOI: https://doi.org/10.1007/978-3-030-96308-8_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-96307-1
Online ISBN: 978-3-030-96308-8
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)