Abstract
In this paper we study the performance of emotion recognition from cochlear implant-like spectrally reduced speech (SRS) using conventional Mel-frequency cepstral coefficient (MFCC) features and a Gaussian mixture model (GMM)-based classifier. The cochlear implant-like SRS of each utterance in the emotional speech corpus is synthesized solely from the low-bandwidth subband temporal envelopes of the corresponding original utterance. The resulting utterances carry less spectral information than the originals but retain the information most relevant for emotion recognition. Emotion-class models are trained on MFCC features extracted from the SRS signals, and classification is performed on MFCC features computed from the test SRS signals. To evaluate the performance of the SRS-MFCC features, emotion recognition experiments are conducted on the FAU AIBO spontaneous emotion corpus. Conventional MFCC, Mel-warped DFT (discrete Fourier transform) spectrum-based cepstral coefficient (MWDCC), perceptual linear prediction (PLP), and amplitude modulation cepstral coefficient (AMCC) features extracted from the original signals are used for comparison. Experimental results show that the SRS-MFCC features outperform all other features in terms of emotion recognition accuracy, with average relative improvements over all baseline systems of 1.5% and 11.6% in unweighted average recall and weighted average recall, respectively.
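For readers who want a concrete picture of the pipeline the abstract describes, below is a minimal Python sketch: a noise-excited envelope vocoder that synthesizes cochlear implant-like SRS from low-bandwidth subband temporal envelopes, MFCC extraction from the SRS, and one diagonal-covariance GMM per emotion class trained with EM. The channel count (16), band edges (100–7000 Hz, assuming 16 kHz audio), 50 Hz envelope cutoff, 32 mixture components, and the function names (srs_synthesize, mfcc_feats, train_gmms, classify) are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the SRS -> MFCC -> GMM pipeline; parameter values are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt
import librosa
from sklearn.mixture import GaussianMixture

def srs_synthesize(x, fs, n_bands=16, f_lo=100.0, f_hi=7000.0, env_cut=50.0):
    """Noise-excited envelope vocoder: keep only the low-bandwidth subband
    temporal envelopes of x and resynthesize speech from them."""
    x = np.asarray(x, dtype=float)
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_bands + 1)
    b_env, a_env = butter(2, env_cut, btype='low', fs=fs)   # envelope low-pass
    rng = np.random.default_rng(0)
    y = np.zeros_like(x)
    for k in range(n_bands):
        b, a = butter(4, [edges[k], edges[k + 1]], btype='band', fs=fs)
        sub = filtfilt(b, a, x)                    # analysis subband
        env = filtfilt(b_env, a_env, np.abs(sub))  # rectify + low-pass envelope
        carrier = filtfilt(b, a, rng.standard_normal(len(x)))  # band noise
        y += np.clip(env, 0.0, None) * carrier     # re-impose envelope
    return y / (np.max(np.abs(y)) + 1e-12)

def mfcc_feats(x, fs, n_mfcc=13):
    # Frame-level MFCCs, one row per frame, for the GMM.
    return librosa.feature.mfcc(y=x, sr=fs, n_mfcc=n_mfcc).T

def train_gmms(utts_by_class, fs):
    # One GMM per emotion class, fit with EM on pooled SRS-MFCC frames.
    gmms = {}
    for label, utts in utts_by_class.items():
        feats = np.vstack([mfcc_feats(srs_synthesize(u, fs), fs) for u in utts])
        gmms[label] = GaussianMixture(n_components=32,
                                      covariance_type='diag').fit(feats)
    return gmms

def classify(gmms, x, fs):
    f = mfcc_feats(srs_synthesize(x, fs), fs)
    # score() is the average per-frame log-likelihood; pick the best class.
    return max(gmms, key=lambda label: gmms[label].score(f))
```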
About this paper
Cite this paper
Alam, M.J., Attabi, Y., Kenny, P., Dumouchel, P., O’Shaughnessy, D. (2014). Automatic Emotion Recognition from Cochlear Implant-Like Spectrally Reduced Speech. In: Pecchia, L., Chen, L.L., Nugent, C., Bravo, J. (eds) Ambient Assisted Living and Daily Activities. IWAAL 2014. Lecture Notes in Computer Science, vol 8868. Springer, Cham. https://doi.org/10.1007/978-3-319-13105-4_48
DOI: https://doi.org/10.1007/978-3-319-13105-4_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13104-7
Online ISBN: 978-3-319-13105-4