A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis

Odéjobí, Odétúnjí A.; Beaumont, Anthony J.; Wong, Shun Ha Sylvia

doi:10.1007/978-3-540-30120-2_52

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3206)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

In this paper we present the design and analysis of an intonation model for text-to-speech (TTS) synthesis applications using a combination of Relational Tree (RT) and Fuzzy Logic (FL) technologies. The model is demonstrated using the Standard Yorùbá (SY) language. In the proposed intonation model, phonological information extracted from text is converted into an RT. An RT is a sophisticated data structure that symbolically represents the peaks and valleys, as well as the spatial structure, of a waveform in the form of trees. An initial approximation to the RT, called a Skeletal Tree (ST), is first generated algorithmically. The exact numerical values of the peaks and valleys on the ST are then computed using FL. Quantitative analysis of the results gives an RMSE of 0.56 for peaks and 0.71 for valleys. Mean Opinion Scores (MOS) of 9.5 and 6.8, on a scale of 1–10, were obtained for intelligibility and naturalness respectively.
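
To make the quantitative evaluation concrete, the following minimal sketch shows one way to locate the peaks and valleys of a pitch contour and to compute an RMSE between predicted and observed values. It is an illustration only, not the authors' RT construction or fuzzy rule base: the function names `peaks_and_valleys` and `rmse`, the strict local-extremum rule, and the sample contour values are all assumptions made for this example.

```python
import math


def peaks_and_valleys(contour):
    """Return (peaks, valleys) as lists of (index, value) pairs.

    Assumption: a peak (valley) is a sample strictly greater (smaller)
    than both of its neighbours. Endpoints are never reported.
    """
    peaks, valleys = [], []
    for i in range(1, len(contour) - 1):
        prev, cur, nxt = contour[i - 1], contour[i], contour[i + 1]
        if cur > prev and cur > nxt:
            peaks.append((i, cur))
        elif cur < prev and cur < nxt:
            valleys.append((i, cur))
    return peaks, valleys


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical contour values, chosen only to exercise the functions.
contour = [100, 120, 110, 90, 105, 130, 95]
peaks, valleys = peaks_and_valleys(contour)
print(peaks)    # [(1, 120), (5, 130)]
print(valleys)  # [(3, 90)]
```

In an evaluation of the kind the abstract reports, `rmse` would be applied separately to the predicted and measured peak values and to the predicted and measured valley values, giving one error figure for each.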

Author information

Authors and Affiliations

Computer Science, Aston University, Aston Triangle, Birmingham, B4 7ET, United Kingdom
Odétúnjí A. Odéjobí, Anthony J. Beaumont & Shun Ha Sylvia Wong

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Masaryk University, Botanická 68a, CZ-602 00, Brno, Czech Republic
Ivan Kopeček
Faculty of Informatics, Department of Computer Graphics and Design, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala

About this paper

Cite this paper

Odéjobí, O.A., Beaumont, A.J., Wong, S.H.S. (2004). A Computational Model of Intonation for Yorùbá Text-to-Speech Synthesis: Design and Analysis. In: Sojka, P., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2004. Lecture Notes in Computer Science, vol 3206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30120-2_52

DOI: https://doi.org/10.1007/978-3-540-30120-2_52
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23049-6
Online ISBN: 978-3-540-30120-2
eBook Packages: Springer Book Archive
