Abstract
This article describes a neural network model that addresses the acquisition of speaking skills by infants and subsequent motor equivalent production of speech sounds. The model learns two mappings during a babbling phase. A phonetic-to-orosensory mapping specifies a vocal tract target for each speech sound; these targets take the form of convex regions in orosensory coordinates defining the shape of the vocal tract. The babbling process wherein these convex region targets are formed explains how an infant can learn phoneme-specific and language-specific limits on acceptable variability of articulator movements. The model also learns an orosensory-to-articulatory mapping wherein cells coding desired movement directions in orosensory space learn articulator movements that achieve these orosensory movement directions. The resulting mapping provides a natural explanation for the formation of coordinative structures. This mapping also makes efficient use of redundancy in the articulator system, thereby providing the model with motor equivalent capabilities. Simulations verify the model's ability to compensate for constraints or perturbations applied to the articulators automatically and without new learning and to explain contextual variability seen in human speech production.
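The motor-equivalence idea described above can be illustrated with a minimal sketch. It is not the article's network: the three articulators, the two orosensory variables, the linear matrix `J`, the box-shaped target, and the use of a pseudoinverse as a stand-in for the learned direction-to-articulator mapping are all assumptions made for illustration. The sketch only shows how a fixed mapping from desired orosensory directions to articulator movements, driven by feedback toward a convex target region, can reach the same target when one articulator is held fixed.

```python
import numpy as np

# Hypothetical toy vocal tract: 3 articulators (jaw, lip, tongue) and
# 2 orosensory dimensions. Row 0 is a lip-constriction variable driven by
# jaw + lip (redundant); row 1 is a tongue-height variable.
J = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

# Fixed direction-to-articulator map. The pseudoinverse is an assumed
# stand-in for the mapping the model is said to learn during babbling.
W = np.linalg.pinv(J)


def direction_to_target(o, lo, hi):
    """Vector from o to the nearest point of the box [lo, hi]; zero inside."""
    return np.clip(o, lo, hi) - o


def produce(lo, hi, blocked=(), steps=60, rate=0.5):
    """Step the articulators toward the convex (box) target region."""
    a = np.zeros(3)
    free = np.ones(3)
    for i in blocked:
        free[i] = 0.0                  # a blocked articulator cannot move
    for _ in range(steps):
        d = direction_to_target(J @ a, lo, hi)
        a += rate * free * (W @ d)     # same W whether or not blocked
    return a


# Assumed target region in orosensory coordinates (a range, not a point).
lo, hi = np.array([0.8, 0.4]), np.array([1.0, 0.6])
a_free = produce(lo, hi)               # jaw and lip share the movement
a_block = produce(lo, hi, blocked=[0]) # jaw held fixed, as with a bite block
```

In this toy setup the map `W` is never retrained: with the jaw blocked, the feedback loop keeps issuing the same desired direction, and the lip travels farther until the orosensory state reaches the target region.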
Additional information
Supported in part by AFOSR F49620-92-J-0499
Cite this article
Guenther, F.H. A neural network model of speech acquisition and motor equivalent speech production. Biol. Cybern. 72, 43–53 (1994). https://doi.org/10.1007/BF00206237
DOI: https://doi.org/10.1007/BF00206237