Abstract
This study analyzes the effect of degradation on human and automatic speaker verification (SV) tasks. The perceptual test is conducted with subjects who are familiar with speaker verification. An automatic SV system is developed using Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Human and automatic SV performance is compared for clean training data and several degraded test conditions. Speech signals are reconstructed in clean and degraded conditions so as to highlight different speaker-specific information, and the reconstructions are compared through a perceptual test. The perceptual cues that the human subjects used as speaker-specific information are investigated, and their importance under degraded conditions is highlighted. The difference in the nature of the human and automatic SV tasks is investigated in terms of falsely accepted and falsely rejected speech pairs. Human and automatic speaker verification are then discussed, and possible ways to improve automatic speaker verification under degraded conditions are suggested.
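The comparison in terms of falsely accepted and falsely rejected speech pairs can be illustrated with a minimal sketch. It assumes each speech pair has a verification score (for example a log-likelihood ratio from a GMM-based system, or a listener's rating) and a ground-truth label. The function name `error_rates`, the threshold, and the scores are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def error_rates(scores: Sequence[float],
                same_speaker: Sequence[bool],
                threshold: float) -> Tuple[float, float]:
    """Return (false acceptance rate, false rejection rate) at a threshold.

    A pair is accepted as "same speaker" when its score is >= threshold.
    """
    if len(scores) != len(same_speaker):
        raise ValueError("scores and same_speaker must have equal length")

    # Split the scores by ground truth: genuine pairs vs. impostor pairs.
    genuine = [s for s, same in zip(scores, same_speaker) if same]
    impostor = [s for s, same in zip(scores, same_speaker) if not same]
    if not genuine or not impostor:
        raise ValueError("need at least one genuine and one impostor pair")

    # False acceptance: an impostor pair scored at or above the threshold.
    false_accepts = sum(1 for s in impostor if s >= threshold)
    # False rejection: a genuine pair scored below the threshold.
    false_rejects = sum(1 for s in genuine if s < threshold)

    return false_accepts / len(impostor), false_rejects / len(genuine)


# Made-up scores for six speech pairs (illustrative only, not from the paper).
scores = [2.1, 0.4, -0.3, 1.5, -1.2, 0.9]
same_speaker = [True, True, True, False, False, False]
far, frr = error_rates(scores, same_speaker, threshold=0.5)
print(f"FAR={far:.3f}, FRR={frr:.3f}")
```

Applying the same function to human decisions and to automatic scores on the same pairs would show which pairs each one gets wrong, which is the kind of comparison the abstract describes.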
Cite this article
Pradhan, G., Prasanna, S.R.M. Speaker verification under degraded condition: a perceptual study. Int J Speech Technol 14, 405–417 (2011). https://doi.org/10.1007/s10772-011-9120-6
DOI: https://doi.org/10.1007/s10772-011-9120-6