Consideration of Infants’ Vocal Imitation Through Modeling Speech as Timbre-Based Melody

Minematsu, Nobuaki; Nishimura, Tazuko

doi:10.1007/978-3-540-78197-4_4


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4914)

Included in the following conference series:

Annual Conference of the Japanese Society for Artificial Intelligence


Abstract

Infants acquire spoken language by hearing and imitating utterances, mainly from their parents [1,2,3], but they never imitate their parents’ voices as they are. What in the voices do the infants imitate? Because of their poor phonological awareness, it is difficult for them to decode an input utterance into a string of small linguistic units such as phonemes [3,4,5,6], and so it is also difficult for them to convert the individual units into sounds with their mouths. What, then, do infants acoustically imitate? Developmental psychology claims that they extract the holistic sound pattern of an input word, called the word Gestalt [3,4,5], and reproduce it with their mouths. We address the question “What is the acoustic definition of word Gestalt?” [7]. The definition has to be speaker-invariant, because infants extract the same word Gestalt for a particular input word irrespective of the person speaking that word to them. Here, we aim to answer this question by regarding speech as timbre-based melody, a view that focuses on the holistic and speaker-invariant contrastive features embedded in an utterance.


References

1. Kuhl, P.K., Meltzoff, A.N.: Infant vocalizations in response to speech: Vocal imitation and developmental change. J. Acoust. Soc. Am. 100(4), 2425–2438 (1996)
2. Gruhn, W.: The audio-vocal system in sound perception and learning of language and music. In: Proc. Int. Conf. on language and music as cognitive systems (2006)
3. Hayakawa, M.: Language acquisition and motherese. In: Language, Taishukan pub. vol. 35(9), pp. 62–67 (2006)
4. Shaywitz, S.E.: Overcoming dyslexia, Random House (2005)
5. Kato, M.: Phonological development and its disorders. J. Communication Disorders 20(2), 84–85 (2003)
6. Hara, K.: Phonological disorders and phonological awareness in children. J. Communication Disorders 20(2), 98–102 (2003)
7. Minematsu, N., Nishimura, T.: Universal and invariant representation of speech, CD-ROM of Int. Conf. Infant Study (2006), http://www.gavo.t.u-tokyo.ac.jp/~mine/paper/PDF/2006/ICIS_t2006-6_OnlinePDF.pdf
8. Johnson, K., Mullennix, J.W.: Talker variability in speech processing. Academic Press, London (1997)
9. http://tepia.or.jp/archive/12th/pdf/viavoice_OnlinePDF.pdf
10. Miyamoto, K.: Making voices and watching voices. Morikawa Pub. (1995)
11. Minematsu, N., et al.: Theorem of the invariant structure and its derivation of speech Gestalt. In: Proc. ISCA Int. Workshop on Speech Recognition and Intrinsic Variation, pp. 47–52 (2006)
12. Minematsu, N.: Are learners myna birds to the averaged distributions of native speaker? – a note of warning from a serious speech engineer –, CD-ROM of ISCA Int. Workshop on Speech and Language Technology in Education (2007)
13. Asakawa, S., Minematsu, N., Hirose, K.: Automatic recognition of connected vowels only using speaker-invariant representation of speech dynamics. In: Proc. InterSpeech, pp. 890–893 (2007)
14. Qiao, Y., Asakawa, S., Minematsu, N.: Random discriminant structure analysis for continuous Japanese vowel recognition. In: Proc. Int. Workshop on Automatic Speech Recognition and Understanding, December 2007 (to appear)
15. Taniguchi, T.: Sounds become music in mind – Introduction to music psychology –. Kitaoji Pub. (2000)
16. Titze, I.R.: Principles of voice production. Prentice-Hall Inc., Englewood Cliffs (1994)
17. Miyazaki, K.: How well do we understand absolute pitch? J. Acoust. Soc. Jpn. 60(11), 682–688 (2004)
18. Minematsu, N., Asakawa, S., Hirose, K.: Linear and non-linear transformation invariant representation of information and its use for acoustic modeling of speech. In: Proc. Spring Meeting Acoust. Soc. Jpn., pp. 147–148 (2007)
19. Jakobson, R., Lotz, J.: Notes on the French phonemic pattern, Hunter (1949)
20. Saussure, F.: Cours de linguistique générale. In: Publié par Charles Bally et Albert Sechehaye avec la collaboration de Albert Riedlinger, Lausanne et Paris, Payot (1916)
21. Labov, W., Ash, S., Boberg, C.: Atlas of North American English. Walter de Gruyter, Berlin (2001)
22. Saito, D., et al.: Directional dependency of cepstrum on vocal tract length. In: Proc. Int. Conf. Acoustics, Speech, and Signal Processing (2008, submitted)
23. Minematsu, N.: Yet another acoustic representation of speech. In: Proc. Int. Conf. Acoustics, Speech, and Signal Processing, pp. 585–588 (2004)
24. Kawahara, T., et al.: Recent progress of open-source LVCSR engine Julius and Japanese model repository. In: Proc. Int. Conf. Spoken Language Processing, pp. 3069–3072 (2004)
25. Asakawa, S., Minematsu, N., Hirose, K.: Multi-stream parameterization for structural speech recognition. In: Proc. Int. Conf. Acoustics, Speech, and Signal Processing (2008, submitted)
26. Takeshima, C., Tsuzaki, M., Irino, T.: Identification of size-modulated vowel sequences and temporal characteristics of the size extraction process, IEICE Technical Report, SP2006-29, pp. 13–17 (2006)
27. Smith, D.R., et al.: The processing and perception of size information in speech sounds. J. Acoust. Soc. Am. 117(1), 305–318 (2005)
28. Hayashi, Y., et al.: Comparison of perceptual characteristics of scaled vowels and words. In: Proc. Spring Meeting Acoust. Soc. Jpn., pp. 473–474 (2007)
29. Davis, R.D., Braun, E.M.: The gift of dyslexia, Perigee Trade (1997)
30. Frith, U.: Autism: Explaining the enigma. Blackwell Pub., Malden (1992)
31. Happé, F.: Autism: An introduction to psychological theory. UCL Press (1994)
32. Higashida, N., Higashida, M.: Messages to all my colleagues living on the planet. Escor Pub. (2005)
33. Nadel, J.: The developing child with autism: evidences, speculations and vexed questions. In: Tutorial Session of IEEE Int. Conf. Development and Learning (2005)
34. Asami, T.: A book on my son, Hiroshi, Nakagawa Pub., vol. 5 (2006)
35. Trehub, S.E.: The developmental origins of musicality. Nature Neuroscience 6, 669–673 (2003)
36. Hauser, M.D., McDermott, J.: The evolution of the music faculty: A comparative perspective. Nature Neuroscience 6, 663–668 (2003)
37. Levitin, D.J., Rogers, S.E.: Absolute pitch: perception, coding, and controversies. Trends in Cognitive Sciences 9(1), 26–33 (2005)
38. Kojima, S.: A search for the origins of human speech: Auditory and vocal functions of the chimpanzee. Trans Pacific Press (2003)


Author information

Authors and Affiliations

Graduate School of Engineering, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
Nobuaki Minematsu
Graduate School of Medicine, The University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
Tazuko Nishimura


Editor information

Ken Satoh, Akihiro Inokuchi, Katashi Nagao, Takahiro Kawamura


Cite this paper

Minematsu, N., Nishimura, T. (2008). Consideration of Infants’ Vocal Imitation Through Modeling Speech as Timbre-Based Melody. In: Satoh, K., Inokuchi, A., Nagao, K., Kawamura, T. (eds) New Frontiers in Artificial Intelligence. JSAI 2007. Lecture Notes in Computer Science, vol 4914. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78197-4_4


DOI: https://doi.org/10.1007/978-3-540-78197-4_4
Published: 27 July 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78196-7
Online ISBN: 978-3-540-78197-4
eBook Packages: Computer Science, Computer Science (R0)
