Defining a Global Adaptive Duration Target Cost for Unit Selection Speech Synthesis

Guennec, David; Chevelu, Jonathan; Lolive, Damien

doi:10.1007/978-3-319-24033-6_17

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Unit selection speech synthesis systems generally rely on target and concatenation costs to select the best unit sequence. Although these costs often take contextual features into account, they mainly consist of local distances that are accumulated afterwards. In this paper, we describe a new duration target cost that takes the whole sequence into account. It aims to select a sequence that is good globally, rather than one that is very good almost everywhere but contains a few local leaps in duration cost that are counterbalanced by other units. The problem of weighting this new duration cost against the other sub-costs is also investigated. Experiments show that this new measure performs well on sentences featuring duration artefacts, while not degrading the others.
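
The contrast drawn in the abstract, between accumulating local distances and judging the sequence as a whole, can be illustrated with a small sketch. This is not the cost defined in the paper, whose formulation is not given here. The per-unit error values, the function names, and the choice of a sum of squares as a leap-sensitive sequence-level score are all assumptions made purely for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical per-unit duration errors for two candidate unit sequences
# (e.g. relative deviation from a predicted duration). Values are invented.
ERRORS: Dict[str, List[float]] = {
    # Very good almost everywhere, with one large duration leap.
    "one_leap": [0.01, 0.01, 0.01, 0.90, 0.01],
    # Moderately good everywhere, with no leap.
    "even": [0.20, 0.20, 0.20, 0.20, 0.20],
}


def accumulated_local_cost(errors: List[float]) -> float:
    """Plain sum of local distances: a leap can be offset by good units."""
    return sum(abs(e) for e in errors)


def leap_sensitive_cost(errors: List[float]) -> float:
    """Assumed stand-in for a sequence-level score: squaring the errors
    makes a single large leap dominate the total."""
    return sum(e * e for e in errors)


def pick_best(candidates: Dict[str, List[float]],
              cost: Callable[[List[float]], float]) -> str:
    """Return the name of the candidate sequence with the lowest cost."""
    return min(candidates, key=lambda name: cost(candidates[name]))


if __name__ == "__main__":
    print("accumulated local cost picks:", pick_best(ERRORS, accumulated_local_cost))
    print("leap-sensitive cost picks:", pick_best(ERRORS, leap_sensitive_cost))
```

With these invented values, the accumulated cost prefers the sequence containing the leap, because its other units are nearly perfect, while the leap-sensitive score prefers the evenly good sequence. That is the kind of behaviour the abstract describes wanting from a global duration cost.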

Author information

Authors and Affiliations

IRISA - University of Rennes 1, Lannion, France
David Guennec, Jonathan Chevelu & Damien Lolive

Corresponding author

Correspondence to David Guennec.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Guennec, D., Chevelu, J., Lolive, D. (2015). Defining a Global Adaptive Duration Target Cost for Unit Selection Speech Synthesis. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_17

DOI: https://doi.org/10.1007/978-3-319-24033-6_17
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science (R0)
