Abstract
In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that symbolically represents the peaks and valleys, as well as the spatial structure, of a waveform in the form of a tree. An initial approximation to the RT, called the Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives an RMSE of 0.56 and 0.71 for peaks and valleys, respectively. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1–10, were obtained for intelligibility and naturalness, respectively.
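To make the idea of a tree over peaks and valleys concrete, the following is a minimal sketch, not the construction used in the paper. It assumes a sampled pitch contour and splits it recursively at its lowest interior valley, so that valleys become internal nodes and peaks become leaves. The names `Node`, `build_tree`, `peak_values` and `rmse` are hypothetical, the contour values are invented for illustration, and the fuzzy-logic step that assigns numerical values to the nodes is not modelled. The `rmse` helper only shows how a root-mean-square error of the kind reported above is conventionally computed.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Sequence


@dataclass
class Node:
    """One node of the illustrative tree (hypothetical type, not from the paper)."""
    kind: str      # "valley" for internal nodes, "peak" for leaves
    index: int     # sample index in the contour
    value: float   # contour value at that index
    children: List["Node"] = field(default_factory=list)


def build_tree(contour: Sequence[float], lo: int, hi: int) -> Node:
    """Recursively split contour[lo..hi] at its lowest interior valley.

    Assumption: a valley is a strict local minimum; plateaus are ignored.
    A segment without an interior valley becomes a single peak leaf, which
    may sit on a segment endpoint.
    """
    valleys = [i for i in range(lo + 1, hi)
               if contour[i] < contour[i - 1] and contour[i] < contour[i + 1]]
    if not valleys:
        p = max(range(lo, hi + 1), key=lambda i: contour[i])
        return Node("peak", p, contour[p])
    v = min(valleys, key=lambda i: contour[i])
    return Node("valley", v, contour[v],
                [build_tree(contour, lo, v), build_tree(contour, v, hi)])


def peak_values(node: Node) -> List[float]:
    """Collect the leaf (peak) values from left to right."""
    if not node.children:
        return [node.value]
    return [v for child in node.children for v in peak_values(child)]


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))


# Invented contour (arbitrary units), used only to exercise the sketch.
contour = [100.0, 140.0, 110.0, 160.0, 90.0, 130.0, 100.0]
tree = build_tree(contour, 0, len(contour) - 1)
print(tree.kind, tree.index)   # valley 4
print(peak_values(tree))       # [140.0, 160.0, 130.0]
```

In this toy contour the deepest valley (index 4) becomes the root, the shallower valley (index 2) becomes its left child, and the three peaks appear as leaves. This is one plausible reading of how a tree can encode both the extrema and their spatial arrangement.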
© 2004 Springer-Verlag Berlin Heidelberg
Odéjobí, O.A., Beaumont, A.J., Wong, S.H.S. (2004). A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_52
DOI: https://doi.org/10.1007/978-3-540-30120-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive