Abstract
This work processes the linear prediction (LP) residual in the time domain at three levels of temporal resolution, extracts speaker information at each level, and demonstrates both its significance and its distinct nature for text-independent speaker recognition. The subsegmental analysis processes the LP residual in blocks of 5 ms with a shift of 2.5 ms. The segmental analysis processes it in blocks of 20 ms with a shift of 2.5 ms. The suprasegmental analysis processes it in blocks of 250 ms with a shift of 6.25 ms. Speaker identification and verification experiments on the NIST-99 and NIST-03 databases show that the segmental analysis gives the best performance, followed by the subsegmental analysis; the suprasegmental analysis performs worst. However, the evidence from the three levels of processing is complementary and combines well to yield improved performance, indicating that different speaker information is captured at each level. Finally, combining the evidence from all three levels with vocal tract information further improves speaker recognition performance.
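The three-level framing described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the block and shift durations come from the abstract, while the LP order (10), the sampling rate (8 kHz), and the autocorrelation-method inverse filtering are assumptions chosen for the sketch.

```python
import numpy as np

def lp_residual(signal, order=10):
    """Inverse-filter the signal with LP coefficients obtained by the
    autocorrelation method (Yule-Walker equations) to get the LP residual.
    The LP order of 10 is an assumption, not from the paper's abstract."""
    n = len(signal)
    r = np.correlate(signal, signal, mode="full")[n - 1 : n + order]
    # Toeplitz autocorrelation matrix R[i, j] = r[|i - j|]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1 : order + 1])   # predictor coefficients
    inv = np.concatenate(([1.0], -a))          # inverse (analysis) filter
    return np.convolve(signal, inv)[:n]        # residual, same length as input

def frame(signal, fs, block_ms, shift_ms):
    """Slice a signal into overlapping blocks of block_ms with a shift_ms hop."""
    blk = int(round(fs * block_ms / 1000.0))
    hop = int(round(fs * shift_ms / 1000.0))
    count = 1 + (len(signal) - blk) // hop
    return np.stack([signal[i * hop : i * hop + blk] for i in range(count)])

fs = 8000                                # assumed sampling rate
x = np.random.randn(fs)                  # 1 s of stand-in "speech"
res = lp_residual(x, order=10)

sub   = frame(res, fs, 5,   2.5)   # subsegmental:   5 ms blocks, 2.5 ms shift
seg   = frame(res, fs, 20,  2.5)   # segmental:     20 ms blocks, 2.5 ms shift
supra = frame(res, fs, 250, 6.25)  # suprasegmental: 250 ms blocks, 6.25 ms shift
```

At 8 kHz the three analyses yield blocks of 40, 160, and 2000 samples respectively; in the paper each stream would then be modeled separately and the resulting scores fused.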
Pati, D., Prasanna, S.R.M. Subsegmental, segmental and suprasegmental processing of linear prediction residual for speaker information. Int J Speech Technol 14, 49–64 (2011). https://doi.org/10.1007/s10772-010-9087-8