Abstract
This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian that generates weather forecasts. A total of 5260 sentences were recorded, yielding a speech corpus of 11 hours of continuous speech. A Hungarian speech recognizer was used to label speech sound boundaries, and word boundaries were also marked automatically. Unit selection follows a top-down hierarchical scheme that uses words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared with that of two earlier Hungarian TTS systems in a subjective listening test with 221 listeners. On a five-point mean opinion score (MOS) scale, the experimental system scored 3.92, the earlier unit concatenation TTS system scored 2.63, the formant synthesizer scored 1.24, and natural speech scored 4.86.
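The abstract does not spell out the selection procedure, so the following is only a minimal sketch of what a top-down hierarchical scheme with a position-based prosody label might look like. It assumes a three-way position class (initial, medial, final), a word inventory keyed by position class, and a pronunciation lexicon for the fallback to speech sounds. All names, the position classes, and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical unit record: (unit_type, label, position_class).
Unit = Tuple[str, str, str]


def position_class(index: int, phrase_len: int) -> str:
    """Coarse relative position of a word within a prosodic phrase (assumed classes)."""
    if index == 0:
        return "initial"
    if index == phrase_len - 1:
        return "final"
    return "medial"


def select_units(
    phrase: List[str],
    word_inventory: Dict[str, Set[str]],
    lexicon: Dict[str, List[str]],
) -> List[Unit]:
    """Top-down selection: use a recorded word if one exists for the
    required position class, otherwise fall back to its speech sounds."""
    selected: List[Unit] = []
    for i, word in enumerate(phrase):
        pos = position_class(i, len(phrase))
        if pos in word_inventory.get(word, set()):
            # Word level: the corpus holds this word in a matching position.
            selected.append(("word", word, pos))
        else:
            # Speech sound level: raises KeyError if the word has no transcription.
            for phone in lexicon[word]:
                selected.append(("phone", phone, pos))
    return selected


# Toy data (hypothetical): position classes in which each word was recorded.
word_inventory = {"holnap": {"initial"}, "eső": {"medial", "final"}}
# Toy transcriptions with simplified ASCII phone labels (hypothetical).
lexicon = {
    "holnap": ["h", "o", "l", "n", "a", "p"],
    "eső": ["e", "sh", "oe:"],
    "várható": ["v", "a:", "r", "h", "a", "t", "o:"],
}

units = select_units(["holnap", "eső", "várható"], word_inventory, lexicon)
```

In this toy run the first two words are covered at the word level, while the last word is not in the inventory and is assembled from speech sounds. A real unit selection system would typically also rank competing candidates, which this sketch omits.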
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fék, M., Pesti, P., Németh, G., Zainkó, C., Olaszy, G. (2006). Corpus-Based Unit Selection TTS for Hungarian. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2006. Lecture Notes in Computer Science, vol 4188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11846406_46
DOI: https://doi.org/10.1007/11846406_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39090-9
Online ISBN: 978-3-540-39091-6
eBook Packages: Computer Science (R0)