Segmentation of Telephone Speech Based on Speech and Non-speech Models

Heck, Michael; Mohr, Christian; Stüker, Sebastian; Müller, Markus; Kilgour, Kevin; Gehring, Jonas; Nguyen, Quoc Bao; Nguyen, Van Huy; Waibel, Alex

doi:10.1007/978-3-319-01931-4_38

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8113)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper investigates the automatic segmentation of recorded telephone conversations into sentence-like chunks for use in speech recognition systems, based on models for speech and non-speech. We present two approaches, one based on Gaussian Mixture Models (GMMs) and one based on Support Vector Machines (SVMs). Both provide segmentations that allow for speech recognition performance, measured by word error rate (WER), that is competitive with manual segmentation.
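
The general idea of model-based speech/non-speech segmentation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single diagonal Gaussian per class in place of a full GMM, synthetic two-dimensional features in place of real acoustic features, and a hypothetical `min_gap` parameter for bridging short pauses. All function names are illustrative.

```python
import numpy as np

def fit_diag_gaussian(frames):
    """Estimate a diagonal Gaussian (per-dimension mean and variance)."""
    mean = frames.mean(axis=0)
    var = frames.var(axis=0) + 1e-6  # small floor avoids division by zero
    return mean, var

def log_likelihood(frames, model):
    """Per-frame log-likelihood under a diagonal Gaussian."""
    mean, var = model
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (frames - mean) ** 2 / var, axis=1)

def frames_to_segments(is_speech, min_gap=3):
    """Convert per-frame decisions into (start, end) frame ranges, end exclusive.

    Non-speech runs shorter than min_gap frames are bridged, so a brief
    pause does not split a segment. min_gap is an assumed tuning parameter.
    """
    segments = []
    start = None
    gap = 0
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        segments.append((start, len(is_speech) - gap))
    return segments

# Synthetic stand-ins for labeled training features (assumption: well-separated classes).
rng = np.random.default_rng(0)
speech_model = fit_diag_gaussian(rng.normal(5.0, 1.0, size=(200, 2)))
nonspeech_model = fit_diag_gaussian(rng.normal(-5.0, 1.0, size=(200, 2)))

# A toy "recording": 10 non-speech frames, 20 speech frames, 10 non-speech frames.
recording = np.vstack([
    rng.normal(-5.0, 1.0, size=(10, 2)),
    rng.normal(5.0, 1.0, size=(20, 2)),
    rng.normal(-5.0, 1.0, size=(10, 2)),
])

# Label each frame with the class whose model gives the higher likelihood.
is_speech = log_likelihood(recording, speech_model) > log_likelihood(recording, nonspeech_model)
segments = frames_to_segments(is_speech)
print(segments)
```

An SVM-based variant would replace the likelihood comparison with a trained classifier's per-frame decision; the step that turns frame labels into segments could stay the same.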

Author information

Authors and Affiliations

Institute for Anthropomatics, Karlsruhe Institute of Technology, Germany
Michael Heck, Christian Mohr, Sebastian Stüker, Markus Müller, Kevin Kilgour, Jonas Gehring, Quoc Bao Nguyen, Van Huy Nguyen & Alex Waibel

Editor information

Editors and Affiliations

Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia, Univerzitní 8, 306 14, Plzeň, Czech Republic
Miloš Železný
University of West Bohemia, 306 14, Pilsen, Czech Republic
Ivan Habernal
Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation for the Russian Academy of Sciences, 14-th line, 39, 199178, St. Petersburg, Russia
Andrey Ronzhin

Cite this paper

Heck, M. et al. (2013). Segmentation of Telephone Speech Based on Speech and Non-speech Models. In: Železný, M., Habernal, I., Ronzhin, A. (eds) Speech and Computer. SPECOM 2013. Lecture Notes in Computer Science, vol 8113. Springer, Cham. https://doi.org/10.1007/978-3-319-01931-4_38

DOI: https://doi.org/10.1007/978-3-319-01931-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-01930-7
Online ISBN: 978-3-319-01931-4
eBook Packages: Computer Science, Computer Science (R0)
