Auditory Processing Inspired Robust Feature Enhancement for Speech Recognition

Maganti, Hari Krishna; Matassoni, Marco

doi:10.1007/978-3-642-29752-6_15

Part of the book series: Communications in Computer and Information Science (CCIS, volume 273)

Included in the following conference series:

International Joint Conference on Biomedical Engineering Systems and Technologies

Abstract

The performance of Mel-frequency cepstrum based automatic speech recognition systems degrades significantly in noisy environments. This article investigates the feasibility of using bio-inspired auditory features to improve noise robustness. The features are based on auditory characteristics, namely gammatone filtering and modulation spectral processing, which emulate mechanisms performed in the cochlea and middle ear that contribute to the robustness of human hearing. Noise-resistant features that emulate cochlear frequency resolution are first extracted by gammatone filtering. Long-term modulation spectral processing, which preserves speech intelligibility in the signal, is then applied. The features are compared and discussed based on their performance on the Aurora-5 database, which comprises the meeting recorder digit task, recorded with four different microphones in hands-free mode in a real meeting room, and simulated living room and office room data corrupted with different levels of additive noise. The performance of these features is also investigated on the CHiME challenge, which targets speech separation and recognition in background noise collected in a real family room using binaural microphones. The experimental results show that the proposed features provide considerable improvement over standard feature extraction techniques for both versions of the database.
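
The two stages named in the abstract, gammatone filtering followed by modulation spectral processing, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no parameter values, so the channel count, frequency range, filter order, frame sizes, and all function names below are assumptions chosen for illustration. The ERB formula and the bandwidth factor 1.019 are the commonly used values from the auditory filterbank literature, and the modulation spectrum is approximated here by a plain DFT of a subband energy trajectory.

```python
import math


def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def inv_erb_rate(e):
    """Map an ERB-rate value back to frequency in Hz."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37


def center_frequencies(n_channels, f_lo_hz, f_hi_hz):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = erb_rate(f_lo_hz), erb_rate(f_hi_hz)
    step = (hi - lo) / (n_channels - 1)
    return [inv_erb_rate(lo + i * step) for i in range(n_channels)]


def gammatone_ir(fc_hz, fs_hz, duration_s=0.025, order=4, b_factor=1.019):
    """Sampled gammatone impulse response, normalized to unit peak.

    g(t) = t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), b = b_factor * ERB(fc)
    """
    b = b_factor * erb(fc_hz)
    n_taps = int(duration_s * fs_hz)
    ir = []
    for k in range(n_taps):
        t = k / fs_hz
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]


def filter_fir(signal, ir):
    """Causal FIR filtering by direct convolution (slow, but dependency-free)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(ir))):
            acc += ir[k] * signal[n - k]
        out.append(acc)
    return out


def frame_energies(x, frame_len, hop):
    """Short-time energy trajectory of one subband signal."""
    return [sum(v * v for v in x[i:i + frame_len])
            for i in range(0, len(x) - frame_len + 1, hop)]


def modulation_spectrum(trajectory, frame_rate_hz):
    """DFT magnitude of a mean-removed energy trajectory.

    Returns (modulation_frequency_hz, magnitude) pairs up to the Nyquist bin.
    """
    n = len(trajectory)
    mean = sum(trajectory) / n
    centered = [v - mean for v in trajectory]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(c * math.cos(2.0 * math.pi * k * m / n)
                 for m, c in enumerate(centered))
        im = -sum(c * math.sin(2.0 * math.pi * k * m / n)
                  for m, c in enumerate(centered))
        spectrum.append((k * frame_rate_hz / n, math.hypot(re, im)))
    return spectrum
```

In such a sketch, each channel of the filterbank would be applied to the input with `filter_fir`, the output framed with `frame_energies`, and the resulting trajectory analyzed with `modulation_spectrum`. A practical front end would more likely use an efficient recursive gammatone implementation and band-limit the modulation spectrum to the low modulation frequencies that carry most of the speech intelligibility information.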

Author information

Authors and Affiliations

Fondazione Bruno Kessler, Center for Information Technology - IRST, via Sommarive 18, 38123, Povo, Trento, Italy
Hari Krishna Maganti & Marco Matassoni

Editor information

Editors and Affiliations

IST - Technical University of Lisbon, Av. Rovisco Pais, 1, 1049-001, Lisbon, Portugal
Ana Fred
Department of Systems and Informatics, Polytechnic Institute of Setúbal – INSTICC, Rua do Vale de Chaves - Estefanilha, 2910-761, Setúbal, Portugal
Joaquim Filipe
Institute of Telecommunications, Av. Rovisco Pais, 1, 1049-001, Lisboa, Portugal
Hugo Gamboa

About this paper

Cite this paper

Maganti, H.K., Matassoni, M. (2013). Auditory Processing Inspired Robust Feature Enhancement for Speech Recognition. In: Fred, A., Filipe, J., Gamboa, H. (eds) Biomedical Engineering Systems and Technologies. BIOSTEC 2011. Communications in Computer and Information Science, vol 273. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29752-6_15

DOI: https://doi.org/10.1007/978-3-642-29752-6_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29751-9
Online ISBN: 978-3-642-29752-6
eBook Packages: Computer Science, Computer Science (R0)
