Abstract
In this paper we study the performance of emotion recognition from cochlear implant-like spectrally reduced speech (SRS) using conventional Mel-frequency cepstral coefficient (MFCC) features and a Gaussian mixture model (GMM)-based classifier. The cochlear implant-like SRS of each utterance in the emotional speech corpus is synthesized solely from the low-bandwidth subband temporal envelopes of the corresponding original utterance. The resulting utterances carry less spectral information than the originals but retain the information most relevant for emotion recognition. Emotion-class models are trained on MFCC features extracted from the SRS signals, and classification is performed on MFCC features computed from the test SRS signals. To evaluate the performance of the SRS-MFCC features, emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. Conventional MFCC, Mel-warped DFT (discrete Fourier transform) spectrum-based cepstral coefficient (MWDCC), perceptual linear prediction (PLP), and amplitude modulation cepstral coefficient (AMCC) features extracted from the original signals are used for comparison. Experimental results show that the SRS-MFCC features outperform all other features in terms of emotion recognition accuracy, with average relative improvements over all baseline systems of 1.5% and 11.6% in unweighted average recall and weighted average recall, respectively.
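For readers who want a concrete picture of the pipeline the abstract describes, below is a minimal Python sketch: a noise-excited envelope vocoder that synthesizes cochlear implant-like SRS from low-bandwidth subband temporal envelopes, MFCC extraction from the SRS, and one diagonal-covariance GMM per emotion class trained with EM. The channel count (16), band edges (100–7000 Hz, assuming 16 kHz audio), 50 Hz envelope cutoff, 32 mixture components, and the function names (srs_synthesize, mfcc_feats, train_gmms, classify) are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the SRS -> MFCC -> GMM pipeline; parameter values are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
import librosa
from sklearn.mixture import GaussianMixture

def srs_synthesize(x, fs, n_bands=16, f_lo=100.0, f_hi=7000.0, env_cut=50.0):
    """Noise-excited envelope vocoder: keep only the low-bandwidth subband
    temporal envelopes of x and resynthesize speech from them."""
    x = np.asarray(x, dtype=float)
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    b_env, a_env = butter(2, env_cut, btype='low', fs=fs)   # envelope low-pass
    rng = np.random.default_rng(0)
    y = np.zeros_like(x)
    for k in range(n_bands):
        b, a = butter(4, [edges[k], edges[k + 1]], btype='band', fs=fs)
        sub = filtfilt(b, a, x)                    # analysis subband
        env = filtfilt(b_env, a_env, np.abs(sub))  # rectify + low-pass envelope
        carrier = filtfilt(b, a, rng.standard_normal(len(x)))  # band noise
        y += np.clip(env, 0.0, None) * carrier     # re-impose envelope
    return y / (np.max(np.abs(y)) + 1e-12)

def mfcc_feats(x, fs, n_mfcc=13):
    # Frame-level MFCCs, one row per frame, for the GMM.
    return librosa.feature.mfcc(y=x, sr=fs, n_mfcc=n_mfcc).T

def train_gmms(utts_by_class, fs):
    # One GMM per emotion class, fit with EM on pooled SRS-MFCC frames.
    gmms = {}
    for label, utts in utts_by_class.items():
        feats = np.vstack([mfcc_feats(srs_synthesize(u, fs), fs) for u in utts])
        gmms[label] = GaussianMixture(n_components=32,
                                      covariance_type='diag').fit(feats)
    return gmms

def classify(gmms, x, fs):
    f = mfcc_feats(srs_synthesize(x, fs), fs)
    # score() is the average per-frame log-likelihood; pick the best class.
    return max(gmms, key=lambda label: gmms[label].score(f))
```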
About this paper
Cite this paper
Alam, M.J., Attabi, Y., Kenny, P., Dumouchel, P., O’Shaughnessy, D. (2014). Automatic Emotion Recognition from Cochlear Implant-Like Spectrally Reduced Speech. In: Pecchia, L., Chen, L.L., Nugent, C., Bravo, J. (eds) Ambient Assisted Living and Daily Activities. IWAAL 2014. Lecture Notes in Computer Science, vol 8868. Springer, Cham. https://doi.org/10.1007/978-3-319-13105-4_48
DOI: https://doi.org/10.1007/978-3-319-13105-4_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13104-7
Online ISBN: 978-3-319-13105-4