Abstract
In this paper we present a robust feature extractor that uses smoothed nonlinear energy operator (SNEO)-based amplitude modulation (AM) features for a large vocabulary continuous speech recognition (LVCSR) task. The SNEO estimates the energy required to produce the AM-FM signal, and an energy separation algorithm (ESA) then separates the estimated energy into its amplitude and frequency components. As in the power normalized cepstral coefficients (PNCC) front-end, medium duration power bias subtraction (MDPBS) is used to enhance the AM power spectrum. The proposed feature extractor is evaluated on speech recognition with the AURORA-4 corpus, which covers additive noise and channel mismatch conditions. The ETSI advanced front-end (ETSI-AFE), PNCC, cochlear filterbank cepstral coefficients (CFCC), and conventional MFCC and PLP features are used for comparison. Recognition results on the AURORA-4 task show that the proposed method is robust to both additive noise and differing microphone channels.
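The energy estimation and separation steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the smoothing window, the filterbank, or the ESA variant, so everything here is an assumption: the energy operator is the standard discrete Teager-Kaiser form, the smoothing uses a short Bartlett window, and the separation follows the textbook DESA-2 formulas. The function names `teager_energy`, `smoothed_energy`, and `desa2` and the parameter `win_len` are hypothetical and chosen only for illustration.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi


def smoothed_energy(x, win_len=5):
    """Teager energy smoothed by a unit-sum Bartlett window.

    The window type and length are assumptions, not values from the paper.
    """
    w = np.bartlett(win_len + 2)[1:-1]  # drop the two zero-valued endpoints
    w = w / w.sum()
    return np.convolve(teager_energy(x), w, mode="same")


def desa2(x, win_len=5, eps=1e-12):
    """DESA-2 energy separation applied to smoothed energies.

    Returns (amplitude envelope, frequency in radians per sample).
    The frequency estimate is only valid below pi/2 rad/sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    y[1:-1] = x[2:] - x[:-2]  # symmetric difference x[n+1] - x[n-1]

    psi_x = smoothed_energy(x, win_len)
    psi_y = smoothed_energy(y, win_len)

    # For x[n] = A cos(W n + phi): psi_x = A^2 sin^2(W), psi_y = 4 A^2 sin^4(W)
    cos_arg = np.clip(1.0 - psi_y / (2.0 * psi_x + eps), -1.0, 1.0)
    omega = 0.5 * np.arccos(cos_arg)
    amp = 2.0 * psi_x / (np.sqrt(np.maximum(psi_y, 0.0)) + eps)
    return amp, omega


if __name__ == "__main__":
    n = np.arange(400)
    x = 0.7 * np.cos(0.3 * n + 0.1)  # synthetic tone: A = 0.7, W = 0.3 rad/sample
    amp, omega = desa2(x)
    # Edge samples are unreliable because the operators need neighbours.
    print(amp[10:-10].mean(), omega[10:-10].mean())
```

In AM-FM front-ends this kind of demodulation is commonly applied to each band-pass channel of a filterbank rather than to the full-band signal; whether and how the paper does so is not stated in the abstract. On noisy input the smoothed energy can become small or negative, so a practical implementation would need flooring beyond the `eps` guard used here.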
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alam, M.J., Kenny, P., O’Shaughnessy, D. (2013). Smoothed Nonlinear Energy Operator-Based Amplitude Modulation Features for Robust Speech Recognition. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_22
DOI: https://doi.org/10.1007/978-3-642-38847-7_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science (R0)