Abstract
This paper investigates the potential of the phase spectrum in automatic speech recognition (ASR). We show that the speech phase spectrum can potentially provide features with high discriminability and robustness. Motivated by this, and to exploit a larger share of that potential, we propose two simple amendments to two common blocks in feature extraction, namely pre-emphasis and windowing, without changing the workflow of the algorithms. Recognition tests on Aurora 2 indicate average performance improvements of up to 11.2% and 14.7% in the presence of both additive and convolutional noise for the phase-based MODGDF and CGDF features, respectively. These results demonstrate the high potential of the phase spectrum in robust ASR.
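For orientation, the two front-end blocks named above and the raw group delay function that phase-based features such as MODGDF build on can be sketched as follows. This is a minimal illustration of the conventional pipeline only, not the amendments proposed in the paper: the function names, the pre-emphasis coefficient 0.97, and the Hamming window are assumptions chosen because they are common defaults, and the naive DFT is used purely to keep the sketch dependency-free.

```python
import cmath
import math


def pre_emphasize(signal, alpha=0.97):
    """Conventional first-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is an assumed, commonly used default.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(length):
    """Standard symmetric Hamming window (an assumed conventional choice)."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def dft(frame):
    """Naive O(N^2) DFT, adequate for one short illustrative frame."""
    size = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
        for k in range(size)
    ]


def group_delay(frame, floor=1e-12):
    """Group delay in samples, computed without phase unwrapping.

    tau[k] = (X_R * Y_R + X_I * Y_I) / |X|^2, where X is the DFT of x[n]
    and Y is the DFT of n * x[n]. The floor only guards against division
    by zero; it is not a smoothing step.
    """
    spectrum = dft(frame)
    ramped = dft([n * sample for n, sample in enumerate(frame)])
    return [
        (x.real * y.real + x.imag * y.imag) / max(abs(x) ** 2, floor)
        for x, y in zip(spectrum, ramped)
    ]


def frame_group_delay(frame, alpha=0.97):
    """Pre-emphasize, window, then take the group delay of one frame."""
    emphasized = pre_emphasize(frame, alpha)
    window = hamming(len(emphasized))
    return group_delay([s * w for s, w in zip(emphasized, window)])
```

Because the denominator |X|^2 becomes very small near spectral zeros, the raw group delay is spiky for real speech. Both pre-emphasis and window shape change where those zeros fall, which is one plausible reading of why the paper revisits these two blocks for phase-based features.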
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Loweimi, E., Ahadi, S.M., Drugman, T., Loveymi, S. (2013). On the Importance of Pre-emphasis and Window Shape in Phase-Based Speech Recognition. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science, vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_21
DOI: https://doi.org/10.1007/978-3-642-38847-7_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science, Computer Science (R0)