Abstract
Many speaker recognition applications must handle speech corrupted by additive environmental noise without a priori knowledge of the noise characteristics. Some previous work in speaker recognition has used the missing feature (MF) approach to compensate for noise. In most of those applications, the spectral reliability decision is made using the signal-to-noise ratio (SNR) criterion, which attempts to measure directly the relative signal and noise energy at each frequency. An alternative approach to spectral data reliability has been used with some success in the MF approach to speech recognition. Here, we compare this alternative criterion with the SNR criterion for MF mask estimation in speaker recognition. The new reliability decision is based on the extraction and analysis of several spectro-temporal features computed across the entire speech frame, but not across time, which highlight the differences between spectral regions dominated by speech and those dominated by noise. We call it the feature classification (FC) criterion. It uses several spectral features to establish spectrogram reliability, unlike the SNR criterion, which relies on only one feature: the SNR. We evaluated our proposal through speaker verification experiments on the Ahumada speech database corrupted by different types of noise at various SNR levels. The experiments showed that the FC criterion achieves considerably better recognition accuracy than the SNR criterion in the speaker verification tasks tested.
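As a rough illustration of the SNR criterion described in the abstract, the following minimal sketch marks each time-frequency cell of a power spectrogram as reliable or unreliable. It is not the authors' implementation: the function name `snr_mask`, the noise estimate taken from the leading frames, the spectral-subtraction step, and the 0 dB threshold are all assumptions chosen for illustration.

```python
import math


def snr_mask(power_spec, noise_frames=10, threshold_db=0.0):
    """Return a binary reliability mask (list of lists of bool).

    power_spec: list of frames, each a list of noisy power values per bin.
    noise_frames: number of leading frames ASSUMED to contain noise only.
    threshold_db: local SNR (dB) above which a cell is called reliable
                  (0 dB is an illustrative choice, not taken from the paper).
    """
    eps = 1e-12  # floor that keeps the logarithm finite
    n_bins = len(power_spec[0])
    lead = power_spec[:noise_frames]
    # Stationary noise estimate: per-bin mean over the leading frames.
    noise = [sum(frame[k] for frame in lead) / len(lead) for k in range(n_bins)]

    mask = []
    for frame in power_spec:
        row = []
        for k in range(n_bins):
            # Spectral subtraction gives a floored clean-speech power estimate.
            speech = max(frame[k] - noise[k], eps)
            snr_db = 10.0 * math.log10(speech / max(noise[k], eps))
            row.append(snr_db > threshold_db)
        mask.append(row)
    return mask
```

Each cell is judged from a single quantity, the estimated local SNR, which is the limitation the FC criterion addresses by classifying cells from several features instead.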
Cite this article
Ribas González, D., Calvo de Lara, J.R. Feature classification criterion for missing features mask estimation in robust speaker recognition. SIViP 8, 365–375 (2014). https://doi.org/10.1007/s11760-012-0299-z
DOI: https://doi.org/10.1007/s11760-012-0299-z