Abstract
The performance of Mel-frequency cepstrum-based automatic speech recognition systems degrades significantly in noisy environments. This article investigates the feasibility of using bio-inspired auditory features to improve noise robustness. The features are based on auditory characteristics and combine gammatone filtering with modulation spectral processing, emulating the mechanisms of the cochlea and middle ear that contribute to the robustness of human hearing. Noise-resistant features that emulate the frequency resolution of the cochlea are first extracted by gammatone filtering; long-term modulation spectral processing, which preserves speech intelligibility in the signal, is then applied. The features are compared and discussed on the basis of their performance on the Aurora-5 database, which comprises the meeting recorder digit task, recorded with four different microphones in hands-free mode in a real meeting room, and simulated living-room and office data corrupted with different levels of additive noise. The performance of these features is also investigated on the CHiME challenge, which targets speech separation and recognition against background noise collected in a real family room using binaural microphones. The experimental results show that the proposed features provide considerable improvement over standard feature extraction techniques on both versions of the database.
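For illustration only, the two stages named in the abstract (gammatone filtering followed by modulation spectral analysis of the band envelopes) might be sketched in Python as below. This is not the authors' implementation. The fourth-order filter, the ERB bandwidth approximation, the band count, the 10 ms frame length, the plain FFT used for the modulation spectrum, and all function names are assumptions chosen for the example.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (a common approximation; assumed here)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def erb_space(f_lo, f_hi, n):
    """n centre frequencies equally spaced on the ERB-rate scale."""
    to_erb = lambda f: 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)
    from_erb = lambda e: (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37
    return from_erb(np.linspace(to_erb(f_lo), to_erb(f_hi), n))

def gammatone_ir(fc_hz, fs, duration=0.05, order=4):
    """Unit-energy impulse response t^(n-1) exp(-2*pi*b*t) cos(2*pi*fc*t)."""
    t = np.arange(int(round(duration * fs))) / fs
    b = 1.019 * erb(fc_hz)  # bandwidth scaling assumed for a 4th-order filter
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc_hz * t)
    return ir / np.sqrt(np.sum(ir ** 2))

def gammatone_envelopes(x, fs, n_bands=16, f_lo=100.0, f_hi=3800.0, frame_s=0.010):
    """RMS envelope of each gammatone band, shape (n_bands, n_frames)."""
    hop = int(round(frame_s * fs))
    n_frames = len(x) // hop
    env = np.empty((n_bands, n_frames))
    for k, fc in enumerate(erb_space(f_lo, f_hi, n_bands)):
        y = np.convolve(x, gammatone_ir(fc, fs), mode="same")
        frames = y[: n_frames * hop].reshape(n_frames, hop)
        env[k] = np.sqrt(np.mean(frames ** 2, axis=1))
    return env

def modulation_spectrum(env, frame_rate):
    """Magnitude spectrum of each band envelope along the time axis."""
    env = env - env.mean(axis=1, keepdims=True)  # remove the DC component
    window = np.hanning(env.shape[1])
    spec = np.abs(np.fft.rfft(env * window, axis=1))
    freqs = np.fft.rfftfreq(env.shape[1], d=1.0 / frame_rate)
    return freqs, spec

# Synthetic check: a 1 kHz carrier amplitude-modulated at 4 Hz.
fs = 8000
t = np.arange(2 * fs) / fs
x = (1.0 + 0.8 * np.sin(2 * np.pi * 4.0 * t)) * np.sin(2 * np.pi * 1000.0 * t)

env = gammatone_envelopes(x, fs)
freqs, spec = modulation_spectrum(env, frame_rate=100.0)  # 10 ms frames
best_band = int(np.argmax(env.mean(axis=1)))
print("strongest band centre (Hz):", erb_space(100.0, 3800.0, 16)[best_band])
print("dominant modulation frequency (Hz):", freqs[int(np.argmax(spec[best_band]))])
```

On the synthetic signal, the strongest band should be the one nearest the carrier, and its modulation spectrum should peak near the 4 Hz modulation rate. A real front end would typically add compression, filtering of the modulation spectrum, and a decorrelating transform, none of which are shown here.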
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maganti, H.K., Matassoni, M. (2013). Auditory Processing Inspired Robust Feature Enhancement for Speech Recognition. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2011. Communications in Computer and Information Science, vol 273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29752-6_15
DOI: https://doi.org/10.1007/978-3-642-29752-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29751-9
Online ISBN: 978-3-642-29752-6
eBook Packages: Computer Science, Computer Science (R0)