Abstract
Many speech technologies assume that speech can be approximated by time warping and concatenating appropriately selected “speech units”, such as diphones. This paper first discusses evidence for the validity of this assumption and then shows how the modelling of speech timing can be cast in terms of time warp functions; conventional segmental-duration-based modelling is a special case of time-warp-function-based modelling. Next, the paper addresses a challenge to time-warp-based modelling: the possible existence of long-range temporal constraints on timing, as proposed by the concepts of isochrony and syllabic timing. Evidence is provided, however, that such constraints simply do not exist in American English or Mandarin Chinese. The paper concludes by presenting a time-warp-based approach to pitch modelling.
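One way to make the "special case" claim concrete: a time warp function maps time in a stored unit onto time in the output. A general warp may stretch different parts of a segment by different amounts. A segmental duration model only specifies one target duration per segment, which amounts to a warp that is linear within each segment and has breakpoints at segment boundaries. The sketch below builds that piecewise-linear warp. It is an illustration under these assumptions, not the chapter's own formulation, and the function names (`boundaries`, `piecewise_linear_warp`) are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Callable, Sequence


def boundaries(durations: Sequence[float]) -> list[float]:
    """Cumulative segment boundaries, starting at time 0."""
    return [0.0, *accumulate(durations)]


def piecewise_linear_warp(
    source_durs: Sequence[float], target_durs: Sequence[float]
) -> Callable[[float], float]:
    """Hypothetical helper: the time warp implied by a segmental duration model.

    Each source segment is stretched uniformly to its target duration, so the
    warp is linear inside a segment and changes slope only at boundaries.
    """
    if len(source_durs) != len(target_durs):
        raise ValueError("need one target duration per source segment")
    if any(d <= 0 for d in source_durs) or any(d <= 0 for d in target_durs):
        raise ValueError("durations must be positive")
    src = boundaries(source_durs)
    tgt = boundaries(target_durs)
    last = len(source_durs) - 1

    def warp(t: float) -> float:
        if not 0.0 <= t <= src[-1]:
            raise ValueError("t lies outside the unit")
        # Segment containing t; the unit's end point belongs to the last segment.
        i = min(bisect_right(src, t) - 1, last)
        frac = (t - src[i]) / (src[i + 1] - src[i])
        return tgt[i] + frac * (tgt[i + 1] - tgt[i])

    return warp
```

For a two-segment unit stored with durations of 80 ms and 120 ms and target durations of 100 ms and 90 ms, `piecewise_linear_warp([80, 120], [100, 90])` maps 40 ms to 50.0, the internal boundary at 80 ms to 100.0, and the unit's end at 200 ms to 190.0. Replacing the linear interpolation inside a segment with any other monotone curve yields a more general time warp, of which this function is the constrained case.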
Copyright information
© 1997 Springer-Verlag New York, Inc.
About this chapter
Cite this chapter
van Santen, J.P.H. (1997). Segmental Duration and Speech Timing. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_15
DOI: https://doi.org/10.1007/978-1-4612-2258-3_15
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7476-6
Online ISBN: 978-1-4612-2258-3
eBook Packages: Springer Book Archive