Abstract
In this paper we investigate the automatic segmentation of recorded telephone conversations, using models for speech and non-speech to find sentence-like chunks for use in speech recognition systems. We present two approaches, based on Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs), respectively. The proposed methods yield segmentations whose speech recognition performance, measured by word error rate (WER), is competitive with that of manual segmentation.
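For readers unfamiliar with the metric, the following is a minimal sketch of how WER is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not specify which scoring tool or text normalization was used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokens are split on whitespace, with no
    normalization of case or punctuation.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.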
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Heck, M. et al. (2013). Segmentation of Telephone Speech Based on Speech and Non-speech Models. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_38
DOI: https://doi.org/10.1007/978-3-319-01931-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)