Filled Pauses and Lengthenings Detection Based on the Acoustic Features for the Spontaneous Russian Speech

Verkhodanova, Vasilisa; Shapranov, Vladimir

doi:10.1007/978-3-319-11581-8_28

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8773)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

Spontaneous speech processing poses a number of problems, among them speech disfluencies. Although speakers handle most disfluencies easily and they usually cause no difficulty for understanding, in Automatic Speech Recognition (ASR) systems their occurrence leads to many recognition errors. Our paper deals with the most frequent of them, filled pauses and sound lengthenings, based on an analysis of their acoustic parameters. A method based on the autocorrelation function was used to detect voiced hesitation phenomena, and a band-filtering method was used to detect unvoiced hesitation phenomena. For the experiments on detecting filled pauses and lengthenings, a specially collected corpus of spontaneous Russian map-task and appointment-task dialogs was used. The accuracy of detecting voiced filled pauses and lengthenings was 80%, and the accuracy of detecting unvoiced fricative lengthenings was 66%.
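
The abstract names the autocorrelation function as the basis for detecting voiced hesitation phenomena but gives no implementation details. As a rough illustration only, the Python sketch below shows one common way such a detector could be built: a frame is treated as voiced when its normalized autocorrelation has a strong peak at a lag within a plausible pitch range, and long runs of voiced frames are kept as candidate filled pauses or lengthenings. The function names, the 80-400 Hz pitch range, the 0.5 threshold and the minimum-run rule are all assumptions made for this example, not values or procedures taken from the paper. The band-filtering step for unvoiced fricatives is not covered.

```python
def normalized_autocorr_peak(frame, sample_rate, f0_min=80.0, f0_max=400.0):
    """Return (peak, lag): the largest normalized autocorrelation value
    found among lags that correspond to pitches between f0_min and f0_max.
    The pitch range is an illustrative assumption, not a value from the paper."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove the DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:                        # silent frame: no periodicity
        return 0.0, 0
    lag_min = int(sample_rate / f0_max)
    lag_max = min(int(sample_rate / f0_min), n - 1)
    best_peak, best_lag = 0.0, 0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_peak:
            best_peak, best_lag = r, lag
    return best_peak, best_lag


def is_voiced(frame, sample_rate, threshold=0.5):
    """Treat a frame as voiced when the autocorrelation peak reaches the
    threshold. The threshold of 0.5 is a hypothetical choice."""
    peak, _ = normalized_autocorr_peak(frame, sample_rate)
    return peak >= threshold


def long_voiced_runs(flags, min_frames):
    """Return (start, end) frame index pairs, end exclusive, for every run of
    consecutive voiced frames that lasts at least min_frames frames."""
    runs, start = [], None
    for i, voiced in enumerate(list(flags) + [False]):   # sentinel closes the last run
        if voiced and start is None:
            start = i
        elif not voiced and start is not None:
            if i - start >= min_frames:
                runs.append((start, i))
            start = None
    return runs
```

In use, a recording would be cut into short frames (for example 40 ms each), `is_voiced` would be applied to every frame, and `long_voiced_runs` would keep only stretches long enough to be a hesitation rather than an ordinary vowel. Any real system would need thresholds tuned on annotated data.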

Author information

Authors and Affiliations

SPIIRAS, 39, 14th line, 199178, St. Petersburg, Russia
Vasilisa Verkhodanova
Betria Systems, Inc, 50, Building 11, Ligovskii Prospekt, St. Petersburg, Russia
Vladimir Shapranov

Editor information

Editors and Affiliations

Speech and Multimodal Interfaces Laboratory, St. Petersburg Institute of Informatics and Automation of the Russian Academy of Sciences, 39, 14th line, 199178, St. Petersburg, Russia
Andrey Ronzhin
Institute of Applied and Mathematical Linguistics, Moscow State Linguistic University, 38, Ostozhenka, 119034, Moscow, Russia
Rodmonga Potapova
Faculty of Technical Sciences, University of Novi Sad, 6, Trg Dositeja Obradovića, 21000, Novi Sad, Serbia
Vlado Delic

About this paper

Cite this paper

Verkhodanova, V., Shapranov, V. (2014). Filled Pauses and Lengthenings Detection Based on the Acoustic Features for the Spontaneous Russian Speech. In: Ronzhin, A., Potapova, R., Delic, V. (eds) Speech and Computer. SPECOM 2014. Lecture Notes in Computer Science, vol 8773. Springer, Cham. https://doi.org/10.1007/978-3-319-11581-8_28

DOI: https://doi.org/10.1007/978-3-319-11581-8_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11580-1
Online ISBN: 978-3-319-11581-8
eBook Packages: Computer Science, Computer Science (R0)
