Speaker verification under degraded condition: a perceptual study

Pradhan, Gayadhar; Prasanna, S. R. Mahadeva

doi:10.1007/s10772-011-9120-6

Published: 13 October 2011

Volume 14, pages 405–417, (2011)

International Journal of Speech Technology

Gayadhar Pradhan¹ & S. R. Mahadeva Prasanna¹

Abstract

This study analyzes the effect of degradation on human and automatic speaker verification (SV) tasks. The perceptual test is conducted with subjects who have knowledge of speaker verification. An automatic SV system is developed using Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Human and automatic SV performance is compared for clean training data and different degraded test conditions. The perceptual cues that the human subjects used as speaker-specific information are investigated, and their importance under degraded conditions is highlighted. The difference in the nature of the human and automatic SV tasks is investigated in terms of falsely accepted and falsely rejected speech pairs. Speech signals are reconstructed in clean and degraded conditions by highlighting different speaker-specific information, and the reconstructions are compared through a perceptual test. Human and automatic speaker verification are then discussed, and possible ways to improve automatic speaker verification under degraded conditions are suggested.
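The comparison in terms of falsely accepted and falsely rejected speech pairs can be illustrated with a small sketch. It is not the authors' evaluation code: the `Trial` record, the pair labels, the scores, and the threshold are all hypothetical values invented for illustration. It only assumes that each trial has a similarity score (from a listener or from an automatic system) and a ground-truth label.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Trial:
    """One verification trial: a train/test speech pair with a decision score."""
    pair_id: str        # hypothetical label for the speech pair
    score: float        # higher means "more likely the same speaker"
    same_speaker: bool  # ground truth for the pair


def split_errors(trials: List[Trial], threshold: float) -> Tuple[List[str], List[str]]:
    """Return (falsely accepted pair ids, falsely rejected pair ids)."""
    # A pair is accepted when its score reaches the threshold.
    false_accepts = [t.pair_id for t in trials
                     if t.score >= threshold and not t.same_speaker]
    false_rejects = [t.pair_id for t in trials
                     if t.score < threshold and t.same_speaker]
    return false_accepts, false_rejects


def error_rates(trials: List[Trial], threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) at the threshold."""
    false_accepts, false_rejects = split_errors(trials, threshold)
    n_impostor = sum(1 for t in trials if not t.same_speaker)
    n_target = sum(1 for t in trials if t.same_speaker)
    # Guard against empty trial classes to avoid division by zero.
    far = len(false_accepts) / n_impostor if n_impostor else 0.0
    frr = len(false_rejects) / n_target if n_target else 0.0
    return far, frr


# Invented example trials (illustrative values only, not data from the study).
example_trials = [
    Trial("p1", 0.9, True),
    Trial("p2", 0.2, True),
    Trial("p3", 0.7, False),
    Trial("p4", 0.1, False),
]
```

Running `split_errors` on the human decisions and on the automatic system's decisions for the same pairs, then intersecting the resulting lists, would show which errors the two share and which are specific to one of them.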

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
Gayadhar Pradhan & S. R. Mahadeva Prasanna

Corresponding author

Correspondence to S. R. Mahadeva Prasanna.

About this article

Cite this article

Pradhan, G., Prasanna, S.R.M. Speaker verification under degraded condition: a perceptual study. Int J Speech Technol 14, 405–417 (2011). https://doi.org/10.1007/s10772-011-9120-6

Received: 04 July 2011
Accepted: 28 September 2011
Published: 13 October 2011
Issue Date: December 2011
DOI: https://doi.org/10.1007/s10772-011-9120-6
