Comparison of F0 Control Rules Derived from Multiple Speech Databases

Hirai, Toshio; Higuchi, Norio; Sagisaka, Yoshinori

doi:10.1007/978-1-4612-2258-3_14

Abstract

In this paper we describe how computational models of F0 were derived from four different speech corpora and how their control characteristics were compared to explore the possibilities of prosody conversion for speech synthesis. A superpositional F0 control model was employed to reduce computational complexity, and a statistical optimization method was used to determine efficiently the dominant factors for F0 control in each speech corpus. The analyses showed that some dominant control parameters are invariant across corpora, while others differ with speaking style. These preliminary results also confirmed the usefulness of superpositional F0 control for prosody conversion.
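
The superpositional idea can be illustrated with a minimal sketch. It assumes a Fujisaki-style formulation, in which the logarithm of F0 is a constant baseline plus the sum of phrase components and accent components. The abstract does not state the exact formulation or parameter values used in the chapter, so the function names, the constants `alpha`, `beta`, and `gamma`, and the command timings below are illustrative assumptions only.

```python
import math


def phrase_response(t, alpha=3.0):
    """Impulse response of the phrase component; zero before the command onset."""
    if t < 0:
        return 0.0
    return alpha * alpha * t * math.exp(-alpha * t)


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent component, capped at the ceiling gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def f0_contour(times, base_hz, phrases, accents):
    """Superpose components in the log-F0 domain and return F0 in Hz.

    phrases: list of (onset_sec, amplitude)
    accents: list of (onset_sec, offset_sec, amplitude)
    """
    contour = []
    for t in times:
        log_f0 = math.log(base_hz)
        # Each phrase command adds a slowly decaying component.
        log_f0 += sum(a * phrase_response(t - t0) for t0, a in phrases)
        # Each accent command adds a local rise between its onset and offset.
        log_f0 += sum(
            a * (accent_response(t - t1) - accent_response(t - t2))
            for t1, t2, a in accents
        )
        contour.append(math.exp(log_f0))
    return contour


# Hypothetical commands: one phrase at 0.0 s, one accent from 0.3 s to 0.6 s.
times = [i * 0.01 for i in range(200)]
contour = f0_contour(times, 100.0, [(0.0, 0.5)], [(0.3, 0.6, 0.4)])
```

Because the components simply add in the log domain, a conversion between speaking styles could in principle adjust one component (for example, the accent amplitudes) while leaving the others unchanged, which is the kind of separability the abstract points to.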

Editor information

Editors and Affiliations

ATR Interpreting Telecommunications Research Labs, 2-2, Hikaridai, Seika-cho, Soraku-gun, 619-02, Kyoto, Japan
Yoshinori Sagisaka, Nick Campbell & Norio Higuchi

About this chapter

Cite this chapter

Hirai, T., Higuchi, N., Sagisaka, Y. (1997). Comparison of F0 Control Rules Derived from Multiple Speech Databases. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_14

DOI: https://doi.org/10.1007/978-1-4612-2258-3_14
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive
