Abstract
Retrieving the pertinent parts of a meeting or conversation recording can help with automatic summarization or indexing of the document. In this paper, we deal with an original task, almost never studied in the literature: automatically extracting question utterances from a recording. As a first step, we set out to develop and evaluate a question extraction system that uses only acoustic parameters and does not need any textual output from an automatic speech recognition (ASR) system. The parameters are extracted from the intonation curve of the speech utterance, and the classifier is a decision tree. Our first experiments on French meeting recordings yield a classification rate of approximately 75%. This paper also presents an experiment to find the best set of acoustic parameters for this task. Finally, data analysis and experiments on another French dialog database show the need to use other cues, such as lexical information from an ASR output, to improve question detection performance on spontaneous speech.
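The abstract describes the pipeline only at a high level: parameters taken from the intonation (F0) curve are fed to a decision tree. The following is a minimal sketch of that kind of pipeline under stated assumptions, not the authors' system. The three features (least-squares slope over the last voiced frames, F0 range, final F0 relative to the utterance mean), the helper names `f0_features`, `build` and `predict`, and the toy contours are all hypothetical. The tree is a small CART-style learner using Gini impurity, written in plain Python, which may differ from the decision-tree algorithm actually used in the paper. Pitch tracking from audio is out of scope: contours are given as lists of Hz values, with 0 marking unvoiced frames.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


# Hypothetical feature set: the paper's exact intonation parameters are not listed here.
def f0_features(f0: Sequence[float], tail: int = 10) -> List[float]:
    """Return [final slope, F0 range, final F0 minus mean] for an F0 contour in Hz.

    Frames with F0 <= 0 are treated as unvoiced and skipped.
    """
    voiced = [v for v in f0 if v > 0]
    if len(voiced) < 2:
        return [0.0, 0.0, 0.0]
    end = voiced[-tail:]  # last `tail` voiced frames (the utterance ending)
    n = len(end)
    mx = (n - 1) / 2
    my = sum(end) / n
    den = sum((x - mx) ** 2 for x in range(n))
    # Least-squares slope in Hz per voiced frame over the ending.
    slope = sum((x - mx) * (v - my) for x, v in enumerate(end)) / den if den else 0.0
    mean = sum(voiced) / len(voiced)
    return [slope, max(voiced) - min(voiced), end[-1] - mean]


@dataclass
class Node:
    label: Optional[int] = None    # set on leaves: 1 = question, 0 = non-question
    feature: int = 0
    threshold: float = 0.0
    left: Optional["Node"] = None  # samples with x[feature] <= threshold
    right: Optional["Node"] = None


def gini(labels: List[int]) -> float:
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def majority(labels: List[int]) -> int:
    return int(2 * sum(labels) >= len(labels))  # ties go to "question"


def build(X, y, depth=0, max_depth=3, min_size=2) -> Node:
    """Grow a binary tree by exhaustive search for the lowest weighted-Gini split."""
    if depth >= max_depth or len(set(y)) == 1 or len(y) < min_size:
        return Node(label=majority(y))
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):  # midpoints keep both sides non-empty
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # all feature vectors identical: no split possible
        return Node(label=majority(y))
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(feature=f, threshold=t,
                left=build([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size),
                right=build([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size))


def predict(node: Node, x: List[float]) -> int:
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


# Synthetic toy contours (Hz, 0 = unvoiced), for illustration only.
TRAIN = [
    ([120, 125, 130, 0, 140, 150, 165, 180], 1),  # rising ending -> question
    ([110, 112, 115, 118, 0, 130, 145, 160], 1),
    ([180, 175, 170, 0, 160, 150, 140, 130], 0),  # falling ending -> statement
    ([200, 195, 185, 175, 0, 165, 150, 140], 0),
]
X = [f0_features(c) for c, _ in TRAIN]
y = [lab for _, lab in TRAIN]
tree = build(X, y)

if __name__ == "__main__":
    print(predict(tree, f0_features([100, 105, 115, 130, 150])))  # -> 1
    print(predict(tree, f0_features([150, 140, 135, 125, 110])))  # -> 0
```

On these toy contours the root split falls on the final-slope feature, so a rising ending is classified as a question and a falling one as a statement. On real spontaneous speech the feature set, tree depth, and pruning would have to be tuned on labeled data, which is the kind of parameter-selection experiment the abstract mentions.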
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Quang, V.M., Castelli, E., Yên, P.N. (2006). A Decision Tree-Based Method for Speech Processing: Question Sentence Detection. In: Wang, L., Jiao, L., Shi, G., Li, X., Liu, J. (eds) Fuzzy Systems and Knowledge Discovery. FSKD 2006. Lecture Notes in Computer Science, vol 4223. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11881599_150
DOI: https://doi.org/10.1007/11881599_150
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45916-3
Online ISBN: 978-3-540-45917-0
eBook Packages: Computer Science, Computer Science (R0)