Abstract
Spontaneous speech processing poses a number of problems, among them speech disfluencies. Although speakers handle most disfluencies easily and they rarely hinder understanding, their appearance leads to many recognition errors in Automatic Speech Recognition (ASR) systems. This paper deals with the most frequent of them, filled pauses and sound lengthenings, based on an analysis of their acoustic parameters. A method based on the autocorrelation function was used to detect voiced hesitation phenomena, and a band-filtering method was used to detect unvoiced hesitation phenomena. The experiments on detecting filled pauses and lengthenings used a specially collected corpus of spontaneous Russian map-task and appointment-task dialogs. The accuracy of detecting voiced filled pauses and lengthenings was 80%, and the accuracy of detecting unvoiced fricative lengthenings was 66%.
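The abstract does not spell out the autocorrelation-based detector, so the following is only an illustrative sketch of the general idea, not the authors' algorithm: a frame is treated as voiced when its normalized autocorrelation has a strong peak at a lag within a plausible pitch range. The function names, the 8 kHz sample rate, the 80–400 Hz pitch range, and the 0.5 threshold are all assumptions chosen for the example.

```python
import math


def autocorr_peak(frame, min_lag, max_lag):
    """Highest normalized autocorrelation of `frame` over lags min_lag..max_lag."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame: no periodicity to measure
    best = 0.0
    last_lag = min(max_lag, len(frame) - 1)
    for lag in range(min_lag, last_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def is_voiced(frame, sample_rate=8000, f0_min=80.0, f0_max=400.0, threshold=0.5):
    """Hypothetical voicing test: strong periodicity inside the assumed pitch range."""
    min_lag = int(sample_rate / f0_max)  # shortest pitch period, in samples
    max_lag = int(sample_rate / f0_min)  # longest pitch period, in samples
    return autocorr_peak(frame, min_lag, max_lag) >= threshold


# Usage: a 50 ms frame of a 100 Hz tone stands in for a sustained voiced sound.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(400)]
silence = [0.0] * 400
print(is_voiced(tone), is_voiced(silence))
```

A real detector for filled pauses and lengthenings would also have to check that the voiced stretch stays acoustically stable for long enough; that step is outside this sketch.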
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Verkhodanova, V., Shapranov, V. (2014). Filled Pauses and Lengthenings Detection Based on the Acoustic Features for the Spontaneous Russian Speech. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_28
DOI: https://doi.org/10.1007/978-3-319-11581-8_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)