Segmental Duration and Speech Timing

  • Chapter
Computing Prosody

Abstract

Many speech technologies assume that speech can be approximated by time-warping and concatenating appropriately selected “speech units”, such as diphones. This paper first discusses evidence for the validity of this assumption, and then shows how the modelling of speech timing can be cast in terms of time warp functions; conventional modelling based on segmental duration is a special case of modelling based on time warp functions. Next, the paper addresses a challenge to time-warp-based modelling: the possible existence of long-range temporal constraints on timing, as proposed by the concepts of isochrony and syllabic timing. Evidence is provided, however, that such constraints do not exist in American English or Mandarin Chinese. The paper concludes by presenting a time-warp-based approach to pitch modelling.
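
The relation between segmental durations and time warp functions can be illustrated with a minimal sketch. It is not taken from the chapter and rests on one plausible reading of the abstract: a duration-based model corresponds to a warp that is piecewise linear and changes slope only at segment boundaries, whereas a general time warp may also vary within a segment. The function names (`boundaries`, `duration_warp`) and the example durations are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def boundaries(durations):
    """Cumulative segment boundaries, starting at 0.0."""
    return [0.0] + list(accumulate(durations))


def duration_warp(ref_durations, target_durations):
    """Return a piecewise-linear warp from reference time to target time.

    Assumption for illustration: each segment is stretched or compressed
    uniformly, so its duration changes from ref_durations[i] to
    target_durations[i] and nothing else about its time course changes.
    """
    if len(ref_durations) != len(target_durations):
        raise ValueError("need one target duration per reference segment")
    if any(d <= 0 for d in ref_durations):
        raise ValueError("reference durations must be positive")
    ref_b = boundaries(ref_durations)
    tgt_b = boundaries(target_durations)

    def warp(t):
        # Clamp t to the time span of the reference unit.
        t = min(max(t, ref_b[0]), ref_b[-1])
        # Index of the segment containing t (the last segment when t is the end).
        i = min(bisect_right(ref_b, t) - 1, len(ref_durations) - 1)
        # Relative position inside the segment is preserved by the warp.
        frac = (t - ref_b[i]) / ref_durations[i]
        return tgt_b[i] + frac * target_durations[i]

    return warp


# Hypothetical three-segment unit; all durations in milliseconds.
warp = duration_warp([50.0, 100.0, 50.0], [50.0, 150.0, 40.0])
print(warp(100.0))  # midpoint of the second reference segment
```

In this sketch the only free parameters are the target durations, one per segment. A more general warp function could, for example, stretch the steady-state part of a vowel more than its transitions, which a single duration per segment cannot express.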

Copyright information

© 1997 Springer-Verlag New York, Inc.

About this chapter

Cite this chapter

van Santen, J.P.H. (1997). Segmental Duration and Speech Timing. In: Sagisaka, Y., Campbell, N., Higuchi, N. (eds) Computing Prosody. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-2258-3_15

  • DOI: https://doi.org/10.1007/978-1-4612-2258-3_15

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4612-7476-6

  • Online ISBN: 978-1-4612-2258-3

  • eBook Packages: Springer Book Archive
