Toward Exploring the Role of Disfluencies from an Acoustic Point of View: A New Aspect of (Dis)continuous Speech Prosody Modelling

Szaszák, György; Beke, András

doi:10.1007/978-3-319-24033-6_42

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series: International Conference on Text, Speech, and Dialogue

Abstract

Many studies of spoken language understanding rely on idealized, fluent utterances, and disfluencies are often regarded as mere noise in the speech flow. Other works argue that fragmented structures (disfluencies, silent and filled pauses) are important and can aid understanding. Extending the original concept of speech disfluency, this paper brings in the acoustic level and treats the discontinuity of F0 as a parallel to speech disfluencies. It presents an exhaustive analysis of the advantages and disadvantages of using a continuous F0 estimate in prosodic event detection tasks, for both formal and informal speaking styles. Results suggest that, unlike in read (formal) speech, using a continuous, fully interpolated F0 curve is counterproductive in spontaneous (informal) speech. Comparing the behaviour of speech disfluencies with the effect of F0 contour discontinuity raises more general considerations about modelling philosophy: the results suggest that disfluencies in informal speech may themselves be informative entities, reflected also in the acoustic-level organization of speech. This in turn suggests that disfluencies in general are an important perceptual cue in human speech understanding.
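
To make the contrast between a discontinuous and a continuous F0 estimate concrete, the following is a minimal sketch and not the authors' method. It assumes an F0 track stored as one value per frame, with unvoiced frames marked as NaN, and uses plain linear interpolation as a stand-in for whatever interpolation a given pitch tracker applies. The function name `interpolate_f0` and the sample values are hypothetical.

```python
import math


def interpolate_f0(f0):
    """Linearly interpolate across unvoiced frames (NaN) in an F0 track.

    Leading and trailing unvoiced frames are filled with the nearest
    voiced value. Returns a new list; the input is left unchanged.
    """
    voiced = [i for i, v in enumerate(f0) if not math.isnan(v)]
    if not voiced:
        return list(f0)  # no voiced frame to anchor the interpolation on
    out = list(f0)
    # Fill the edges with the nearest voiced value.
    for i in range(voiced[0]):
        out[i] = f0[voiced[0]]
    for i in range(voiced[-1] + 1, len(f0)):
        out[i] = f0[voiced[-1]]
    # Fill interior gaps on a straight line between neighbouring voiced frames.
    for left, right in zip(voiced, voiced[1:]):
        span = right - left
        for i in range(left + 1, right):
            w = (i - left) / span
            out[i] = (1 - w) * f0[left] + w * f0[right]
    return out


nan = math.nan
# Hypothetical F0 track in Hz; NaN marks unvoiced frames (assumption).
discontinuous = [nan, 120.0, 124.0, nan, nan, nan, 100.0, 98.0, nan]
continuous = interpolate_f0(discontinuous)
print(continuous)
```

In the discontinuous track the gaps themselves carry information about where voicing stops and resumes. The interpolated track hides those gaps behind a smooth line, which is the property the abstract reports as counterproductive for spontaneous speech.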

Author information

Authors and Affiliations

Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
György Szaszák
Research Institute for Linguistics, Hungarian Academy of Sciences, Budapest, Hungary
András Beke

Corresponding author

Correspondence to György Szaszák.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

Cite this paper

Szaszák, G., Beke, A. (2015). Toward Exploring the Role of Disfluencies from an Acoustic Point of View: A New Aspect of (Dis)continuous Speech Prosody Modelling. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_42

DOI: https://doi.org/10.1007/978-3-319-24033-6_42
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)
