Abstract
In this paper we describe how computational models of F0 were derived from four different speech corpora, and how their control characteristics were compared to assess the feasibility of prosody conversion for speech synthesis. A superpositional F0 control model was employed to reduce computational complexity, and a statistical optimization method was used to efficiently determine the dominant factors for F0 control in each corpus. The analyses showed that some dominant control parameters were invariant across corpora, and revealed differences attributable to speaking style. These preliminary results also confirmed the usefulness of superpositional F0 control for prosody conversion.
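The abstract does not spell out the superpositional model. One widely used formulation is the Fujisaki command-response model. Treat it here as an assumption, not necessarily the chapter's exact variant. In that model, ln F0 is the sum of a baseline, phrase components and accent components. The sketch below makes that superposition concrete. The constants `ALPHA`, `BETA`, `GAMMA` are commonly quoted default values, and all function names are hypothetical.

```python
import math

# Assumed typical constants for the Fujisaki model (not taken from the chapter).
ALPHA = 3.0   # natural angular frequency of the phrase-control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent-control mechanism (1/s)
GAMMA = 0.9   # ceiling level of the accent-control response


def phrase_response(t, alpha=ALPHA):
    """Impulse response Gp(t) of the phrase-control mechanism (zero before onset)."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response Ga(t) of the accent-control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(t, fb, phrase_cmds, accent_cmds):
    """Return F0 in Hz at time t (s) by superposing components on ln F0.

    fb          -- baseline frequency Fb in Hz
    phrase_cmds -- list of (onset_s, Ap) phrase commands
    accent_cmds -- list of (onset_s, offset_s, Aa) accent commands
    """
    ln_f0 = math.log(fb)
    for t0, ap in phrase_cmds:
        ln_f0 += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        # An accent command is a pedestal: step up at t1, step down at t2.
        ln_f0 += aa * (accent_response(t - t1) - accent_response(t - t2))
    return math.exp(ln_f0)


# Illustrative values only: one phrase command and one accent command.
phrases = [(0.0, 0.5)]
accents = [(0.2, 0.6, 0.4)]
contour = [f0_contour(i * 0.01, 100.0, phrases, accents) for i in range(150)]
```

The components add in the log-frequency domain, so each command can be changed on its own. For example, one could scale the phrase amplitudes `Ap` while leaving the accent commands untouched. This separability is what makes a superpositional model a natural fit for comparing control parameters across corpora and for prosody conversion.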
Copyright information
© 1997 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
Hirai, T., Higuchi, N., Sagisaka, Y. (1997). Comparison of F0 Control Rules Derived from Multiple Speech Databases. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_14
DOI: https://doi.org/10.1007/978-1-4612-2258-3_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive