Segmental Duration and Speech Timing

van Santen, Jan P. H.

doi:10.1007/978-1-4612-2258-3_15

Abstract

Many speech technologies assume that speech can be approximated by time warping and concatenating appropriately selected “speech units”, such as diphones. This paper first discusses evidence for the validity of this assumption and then shows how the modelling of speech timing can be cast in terms of time warp functions; conventional segmental-duration-based modelling is a special case of time-warp-function-based modelling. Next, the paper addresses a challenge to time warp based modelling: the possible existence of long-range temporal constraints on timing, as proposed by the concepts of isochrony and syllabic timing. Evidence is presented that such constraints simply do not exist in American English and Mandarin Chinese. The paper concludes with a time warp based approach to pitch modelling.
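The statement that segmental duration modelling is a special case of time warp modelling can be made concrete with a small sketch. It assumes piecewise-linear warps that map a unit's template time axis to output time, in milliseconds. The function names `piecewise_linear_warp` and `duration_model_warp` are hypothetical illustrations, not the chapter's own formulation.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, Sequence

# Hypothetical helpers for illustration only; not an API from the chapter.
Warp = Callable[[float], float]


def piecewise_linear_warp(knots_in: Sequence[float], knots_out: Sequence[float]) -> Warp:
    """Build a monotone time warp that maps knots_in[i] to knots_out[i]
    and interpolates linearly between consecutive knots."""
    if len(knots_in) != len(knots_out) or len(knots_in) < 2:
        raise ValueError("need two equal-length knot lists with at least 2 knots")
    for knots in (knots_in, knots_out):
        if any(b <= a for a, b in zip(knots, knots[1:])):
            raise ValueError("knots must be strictly increasing")

    def warp(t: float) -> float:
        if not knots_in[0] <= t <= knots_in[-1]:
            raise ValueError("t lies outside the template time range")
        # Index of the interval [knots_in[i], knots_in[i + 1]] containing t;
        # clamped so that t == knots_in[-1] uses the last interval.
        i = min(bisect_right(knots_in, t) - 1, len(knots_in) - 2)
        t0, t1 = knots_in[i], knots_in[i + 1]
        u0, u1 = knots_out[i], knots_out[i + 1]
        return u0 + (t - t0) * (u1 - u0) / (t1 - t0)

    return warp


def duration_model_warp(template_ms: Sequence[float], target_ms: Sequence[float]) -> Warp:
    """Segmental-duration special case: knots sit only at segment boundaries,
    so each segment is stretched or compressed uniformly to its target duration."""
    if len(template_ms) != len(target_ms):
        raise ValueError("need one target duration per template segment")
    return piecewise_linear_warp(
        list(accumulate(template_ms, initial=0.0)),
        list(accumulate(target_ms, initial=0.0)),
    )
```

For example, with template segments of 50, 100 and 80 ms stretched to 60, 150 and 80 ms, template time 100 ms (the midpoint of the second segment) maps to 135 ms. A general warp can add knots inside a segment, for instance to lengthen only the second half of a vowel, which a model that assigns one duration per segment cannot express.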

Editors and Affiliations

Yoshinori Sagisaka, Nick Campbell & Norio Higuchi
ATR Interpreting Telecommunications Research Labs, 2-2, Hikaridai, Seika-cho, Soraku-gun, 619-02, Kyoto, Japan

About this chapter

Cite this chapter

van Santen, J.P.H. (1997). Segmental Duration and Speech Timing. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_15

DOI: https://doi.org/10.1007/978-1-4612-2258-3_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive
