Initial Experiments on Automatic Correction of Prosodic Annotation of Large Speech Corpora

Hanzlíček, Zdeněk; Grůber, Martin

doi:10.1007/978-3-319-10816-2_58

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)
Included in the following conference series: International Conference on Text, Speech, and Dialogue

Abstract

Most modern speech synthesis systems use large speech corpora to learn new voices. These corpora usually contain several hours of speech recorded by skilled speakers who can produce that amount of speech data in sufficient quality. Appropriate phonetic and prosodic annotation of the recorded utterances is necessary for high-quality synthesized speech. In many languages, the pitch shape within the last prosodic word of a phrase is characteristic of particular sentence types and of the phrase structure of compound and complex sentences. In real data, however, this formal convention can be violated, and a pitch shape different from the expected one can be present. This can cause prosodic inconsistency in synthesized speech. This article presents experiments on the automatic detection of prosodic mismatch in recorded utterances. A simple classifier based on Gaussian mixture models (GMM) is proposed for this task. Experiments were performed on 5 large speech corpora. The classification results were successfully verified by listening tests.
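The abstract does not specify the features or the model configuration, so the following is only a minimal sketch of how a GMM-based mismatch detector of this general kind could look, not the authors' implementation. It assumes one hypothetical scalar feature per phrase (an F0 slope over the last prosodic word), two hypothetical annotation labels ("falling" and "rising"), and invented toy values. One small GMM is fitted per label with EM, and an utterance is flagged when the most likely model disagrees with its annotated label.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_gmm_1d(data, n_components=2, n_iter=50, var_floor=1e-2):
    """Fit a 1-D GMM with EM; returns a list of (weight, mean, variance)."""
    xs = sorted(data)
    n = len(xs)
    g_mean = sum(xs) / n
    g_var = max(sum((x - g_mean) ** 2 for x in xs) / n, var_floor)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    params = [(1.0 / n_components, xs[int((k + 0.5) * n / n_components)], g_var)
              for k in range(n_components)]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            likes = [w * gaussian_pdf(x, m, v) for w, m, v in params]
            total = sum(likes) or 1e-300
            resp.append([l / total for l in likes])
        # M-step: re-estimate weight, mean and variance of each component.
        updated = []
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-8:  # component lost all its data; keep old values
                updated.append(params[k])
                continue
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
            updated.append((nk / n, mean, max(var, var_floor)))
        params = updated
    return params

def log_likelihood(x, params):
    """Log-likelihood of one sample under a fitted mixture."""
    return math.log(sum(w * gaussian_pdf(x, m, v) for w, m, v in params) + 1e-300)

def classify(x, models):
    """Return the label whose GMM assigns x the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(x, models[label]))

def find_mismatches(items, models):
    """items: (utterance_id, annotated_label, feature). Returns flagged ids."""
    return [uid for uid, label, x in items if classify(x, models) != label]

# Toy training data (invented): F0 slopes for each annotated pitch shape.
training = {
    "falling": [-5.0, -4.5, -4.2, -3.8, -3.5, -4.9, -3.1, -4.0],
    "rising": [3.0, 3.6, 4.1, 4.4, 5.0, 3.9, 4.7, 3.3],
}
models = {label: fit_gmm_1d(values) for label, values in training.items()}

# Toy corpus (invented): utt2 is annotated "falling" but has a rising slope.
corpus = [("utt1", "falling", -4.2), ("utt2", "falling", 4.0), ("utt3", "rising", 3.8)]
flagged = find_mismatches(corpus, models)
print(flagged)
```

In practice the feature would be a vector describing the pitch contour rather than a single slope, and flagged utterances would be candidates for re-annotation or manual checking rather than automatic relabeling.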

This work was supported by the Technology Agency of the Czech Republic, project No. TA01011264 and by the European Regional Development Fund (ERDF), project “New Technologies for Information Society” (NTIS), European Centre of Excellence, ED1.1.00/02.0090.

Author information

Authors and Affiliations

NTIS - New Technology for the Information Society, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 22, 306 14, Plzeň, Czech Republic
Zdeněk Hanzlíček & Martin Grůber

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 6a, 60200, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

Cite this paper

Hanzlíček, Z., Grůber, M. (2014). Initial Experiments on Automatic Correction of Prosodic Annotation of Large Speech Corpora. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_58

DOI: https://doi.org/10.1007/978-3-319-10816-2_58
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)
