When Elizabeth heard Mary’s greeting, the baby leaped in her womb, and Elizabeth was filled with the Holy Spirit. (Luke 1:41, Bible New International Version)

The scientific method has confirmed what has been passed down for at least two millennia. Research results converge to show that the fetus begins reacting to maternal and other acoustic signals before birth. Resourceful twentieth-century investigators devised procedures to record which sounds are generated within the womb or transmitted from outside. In addition, they learned how to measure both fetal and neonatal responses to particular sounds. Audio recordings in the womb show that a strong maternal voice signal is available to the fetus. Fetal cardiac and motor activity show a reaction to the mother’s voice, and there is good reason to believe that the sound of her voice is retained across the transition into postnatal life. For the fetus, this early experience is potentially a rich source of learning about the mother, language, and their contexts. The necessity of such prenatal auditory experience has not been established. Special cases such as children in the deaf community show us that there is considerable plasticity in early postnatal development.

Maternal Voice Is Available in the Womb

Recordings from inside the womb during labor show that the maternal voice is prominent among other sounds. Such recordings exist because researchers have threaded a tiny microphone or hydrophone through the cervix after rupture of the amniotic sac and placed it next to the fetal head. In the most extensive study to date, five laboring women in a hospital in Roubaix, France, spoke both spontaneously and under instruction (“live”) while recordings were made both inside the womb and outside in the room (Querleu, Renard, Versyp, Paris-Delrue, & Crepin, 1988; Querleu, Renard, Versyp, Paris-Delrue, & Vervoort, 1988). Besides the live recordings, a loudspeaker in the room played comparable prerecorded male, female, and maternal voices, and these stimuli were recorded both near the mother’s belly in air and in utero. Analysis of the intrauterine recordings showed that the maternal live voice was more intense than other voices. Moreover, as the researchers pointed out, in everyday life a mother’s voice is not only louder but is also normally more abundantly available than other voices. With every maternal utterance, sound waves from her vocal tract are both airborne and borne through her abdomen to the womb where they are likely to be delivered by soft tissue and liquid conduction to the cochlea (Adelman, Chordekar, Perez, & Sohmer, 2014; Perez, Adelman, & Sohmer, 2016). One would expect transmission through tissue and liquid to be more effective for lower frequencies, and the early recordings generally confirmed this. Frequencies above about 500 Hz showed overall attenuation. Interestingly, conduction transmission of sound waves to the fetal cochlea means that it probably isn’t possible for the fetus to determine the location of sound sources. Sound localization requires time or intensity differences in the arrival of sound at the two ears. Nevertheless, neonates less than 2 h old have been reported to turn their heads in the direction of the mother’s voice (Querleu et al., 1984). Thus, it appears that an ability that is not present in the womb is available immediately upon birth.
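To give a rough sense of scale for the timing cue mentioned above, the following sketch compares the largest possible between-ear arrival-time difference in air and in a water-like medium. It is an illustration under stated assumptions, not a model taken from the studies cited: the function name, the 10 cm ear separation, and the use of fresh water as a stand-in for amniotic fluid and soft tissue are all assumed here.

```python
# Illustrative sketch only. The ear separation and the choice of water as a
# stand-in for amniotic fluid are assumptions, not values from the cited studies.

SPEED_OF_SOUND_AIR_M_S = 343.0     # approximate, dry air near 20 degrees C
SPEED_OF_SOUND_WATER_M_S = 1480.0  # approximate, fresh water near 20 degrees C


def max_interaural_time_difference(ear_separation_m: float, speed_m_s: float) -> float:
    """Return the largest arrival-time difference (in seconds) between two ears.

    Simplest straight-line model: a sound wave travelling along the axis
    between the ears covers ear_separation_m at speed_m_s.
    """
    if ear_separation_m <= 0 or speed_m_s <= 0:
        raise ValueError("ear separation and speed must be positive")
    return ear_separation_m / speed_m_s


if __name__ == "__main__":
    ear_separation_m = 0.10  # assumed round number, for illustration only
    itd_air = max_interaural_time_difference(ear_separation_m, SPEED_OF_SOUND_AIR_M_S)
    itd_water = max_interaural_time_difference(ear_separation_m, SPEED_OF_SOUND_WATER_M_S)
    print(f"air:   {itd_air * 1e6:.0f} microseconds")
    print(f"water: {itd_water * 1e6:.0f} microseconds")
    print(f"ratio: {itd_air / itd_water:.1f}")
```

Under these assumptions the timing cue in the fluid medium is a little under one quarter of its size in air, simply because sound travels more than four times faster. This is separate from, and in addition to, the point made in the text that conducted sound may reach the cochlea without passing through the two ears at all.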

On the internal recordings, the Roubaix team discovered that the loudspeaker version of a male voice emerged and was more intense than a female non-maternal voice but was masked by internal low frequency sounds such as maternal vascular and digestive noises that are relatively intense. The non-maternal female voice, with its higher frequencies, although less intense, had less competition with other sounds. In comparison, the live maternal voice had both the advantage of greater intensity and less masking, and it was therefore the most prominent of the recorded voices. The intrauterine recordings also revealed the presence of sound emanating from the maternal cardiac and digestive systems.

In the Roubaix study, the research protocol included an intelligibility test of speech in the womb by using standard clinical speech-language lists of words and nonsense syllables. Blinded listeners then heard the intrauterine recordings and were asked to identify the vowels, consonants, and nonsense syllables or words. The frequency components in speech range from the fundamental frequency of the voice at around 200 Hz to about 8000 Hz. If the higher frequencies in speech are attenuated, one would expect difficulty for listeners identifying individual consonants and vowels that are information-rich in frequencies greater than about 150 Hz for men or 250 Hz for women. Identification of consonants was 14% for the recorded maternal voice but 30% for her live voice. For all voices together, intelligibility of the recorded speech hovered around 30%, individual phonemes, nonsense syllables, and words included. Similar intelligibility results have been reported in research on cochlear microphonics in fetal sheep (Smith, Gerhardt, Griffiths, Huang, & Abrams, 2003).

In contrast to the data suggesting that fetuses receive little exposure to frequencies greater than 1000 Hz, there are results suggesting that higher frequencies are available to the fetus. For the Roubaix study, the investigators used two microphones, one for lower and one for higher frequencies. The microphone in utero that was especially chosen for fidelity in recording frequencies greater than 1000 Hz showed that frequencies in a synthetic vowel emerged up to 10,000 Hz. In addition, the vowel /i/ (as in “feet”), requiring a second formant component of 2500 Hz for recognition, was one of the best recognized in intelligibility tests. Researchers have speculated that resonance patterns form among the sound waves inside the uterus, that attenuation of frequencies above 1000 Hz is not linear, and that, thus, higher frequencies are present. For a review and model, see Lecanuet et al. (1998).

If there is uncertainty about the extent of prenatal exposure to higher frequency sounds, there is no uncertainty about the lower frequency prosodic features of speech: rhythm, stress (relative loudness), and intonation (pitch). Prosody is well preserved on intrauterine recordings (Querleu, Renard, Boutteville, & Crepin, 1989). Prosody communicates emotion (Wiethoff et al., 2008) and intention (Hellbernd & Sammler, 2016) as well as grammatical function (e.g., for tone languages, or for English differentiation of questions and statements, and parts of speech like nouns and verbs).

Experiments show that the transmission of sound frequencies from outside into a specific location in the amniotic cavity depends on many factors: location of the sound source, how much tissue or fluid is present between the source and measurement site, whether the uterus is larger or smaller with thicker or thinner walls, and the acoustic structure of the outside environment (Gerhardt, 1989; Lecanuet et al., 1998; Turkewitz, 1988). The experience of the maternal voice is undoubtedly dynamic depending on the development of the fetal auditory system plus the external factors listed above. One conclusion has emerged from research over time: contrary to some earlier hypotheses (Fifer & Moon, 1988; Spence & DeCasper, 1987) the maternal voice can deliver more acoustic information to the developing fetus than lower frequency prosody. Individual sounds of speech (phonemes) seem to be available prenatally in addition to the prosody and unique characteristics of voices (Huotilainen, 2013; Moon, Lagercrantz, & Kuhl, 2013).

In addition to being richer in language information than previously thought, the maternal voice potentially delivers sensory information to other systems besides the auditory system. Sound transmission into the uterus is best described as multimodal (Moon & Fifer, 2000). For one thing, sound is transmitted in waves of compressed molecules that at high intensities and low frequencies can be felt as vibrations in the body. How the fetus experiences the non-cochlear components of sound delivery, particularly the mother’s voice, has not been investigated. One question is the following: Are vibrations from the mother’s voice available through the cutaneous senses when the fetus’s skin is in direct contact with the uterus? This may be possible when a pregnant woman’s voice is in the lower range as in the vocalist’s “chest voice.” If so, the experience of the maternal voice may be available earlier than the onset of hearing because the cutaneous senses are among the earliest sensory systems to develop (Hogg, 1941).

In addition to the cutaneous route, another nonauditory mode of transmission of the maternal voice is through the vestibular sense. For one thing, speaking requires movement of the mother’s diaphragm and intercostal muscles in the chest in controlled expiration of air through the larynx. Respiratory inspiration motions are also synchronized with speech. These maternal motions are likely to cause at least small displacements of the fetus’s body in space, depending on maternal effort and fetal size. Moreover, talking is accompanied by the speaker’s motion of hands, arms, torso, and even legs, and this motion is very likely to affect the position of the fetus in space and be detected by the vestibular sense. Furthermore, as the fetus grows heavier over time, the nature of maternal body motion changes. For this type of motion, there is some research. In the late term fetus, passive motion of mother being propelled in a swing resulted in heart rate acceleration in the fetus when passive swinging was in an anterior-posterior direction but not side-to-side (Lecanuet & Jacquet, 2002). The study showed that fetuses are sensitive to motion of mother’s body, but it doesn’t address the conjugate prenatal stimulation by the maternal voice and motion that comes from her gestures and other body movements.

As a final point in the discussion of what the maternal voice may bring to the development of the fetus, there are circadian rhythms that bring together maternal voice and motion. When it is night and mother is lying in bed, for the most part she is not talking nor is anyone else. During the day when she is awake, upright, and actively moving, she is more likely to be speaking, and other voices are present.

Thus, there are at least three ways the maternal voice may be synchronized with nonauditory fetal sensation: vocal vibration, vestibular motion, and circadian cycles. Virtually nothing is known about how the relationship between maternal voice and motion affects prenatal and early postnatal development.

The Fetus Responds to Maternal Voice

The fetus is, of course, not readily accessible for observation of a response to a sound, and this poses a big research challenge. One potential source of information about fetal reaction to sound is the reports of pregnant women who feel startle-type responses to loud sounds. But maternal report of fetal movement in general is not a sensitive way to determine if a fetus has detected a particular sound. In one study, only 16% of fetal movement episodes detected by Doppler ultrasound were reported as such by mothers (Johnson, Jordan, & Paine, 1990). Researchers therefore use other methods. There are three targets of measurement for fetal detection of a particular sound: fetal brain activity, movement, and cardiac changes. The most frequently used methods have been (1) fetal motion detection through ultrasound visualization of the limbs or face, or detection of nonspecific motion through the use of Doppler ultrasound, and (2) measurement of fetal cardiac response to a sound through Doppler ultrasound. The method of Doppler ultrasound uses high frequency sound waves that penetrate through to the fetus and are affected by fetal motion, either of the heart or other parts of the body. The Doppler device receives the echo of the altered sound waves and provides data about them.
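The physical principle behind Doppler detection of motion can be sketched with the standard pulse-echo Doppler relation, in which the frequency shift of the echo is proportional to the velocity of the reflector along the beam. The sketch below is illustrative only: the function name, the 2 MHz transmit frequency, the 1540 m/s sound speed (a conventional assumed value for soft tissue), and the 5 cm/s reflector velocity are assumptions chosen for the example, not parameters of the devices used in the studies cited.

```python
import math


def doppler_shift_hz(
    transmit_hz: float,
    velocity_m_s: float,
    angle_deg: float = 0.0,
    sound_speed_m_s: float = 1540.0,  # assumed conventional value for soft tissue
) -> float:
    """Return the echo frequency shift for a moving reflector.

    Standard pulse-echo approximation, valid when the reflector is much
    slower than sound: shift = 2 * f0 * v * cos(theta) / c.
    A positive velocity means motion toward the transducer.
    """
    if transmit_hz <= 0 or sound_speed_m_s <= 0:
        raise ValueError("transmit frequency and sound speed must be positive")
    cos_theta = math.cos(math.radians(angle_deg))
    return 2.0 * transmit_hz * velocity_m_s * cos_theta / sound_speed_m_s


if __name__ == "__main__":
    # Hypothetical values: 2 MHz beam, reflector moving at 5 cm/s along the beam.
    shift = doppler_shift_hz(transmit_hz=2.0e6, velocity_m_s=0.05)
    print(f"Doppler shift: {shift:.1f} Hz")
```

The factor of 2 reflects the round trip of the echo. The cosine term shows why motion perpendicular to the beam produces almost no shift, one reason such devices register nonspecific motion rather than a precise movement trajectory.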

Doppler data on fetal motor activity has provided the most solid evidence thus far of the existence of fetal hearing at 24 weeks gestational age (GA). Forty-seven fetuses were tested with a rattle-like device (peak intensity 89 dB) 4 inches away from the maternal abdomen near the fetal head. Comparison of stimulation trials with sham trials showed a significant fetal motor, but not cardiac, response at 24 weeks. Fetuses at 36 weeks responded to the sound with both motor and cardiac changes (DiPietro et al., 2013). In a study conducted 20 years earlier, fetal movement to pulsed sound was reported for 20-week-olds (Shahidullah & Hepper, 1993). However, because 110 dB broadband sound (80–2000 Hz) was delivered through direct coupling with the maternal abdomen, it is possible that the fetal response was due to vibratory sensory characteristics of the sound and not cochlear reception. Acoustically complex, percussive sounds may be more effective at eliciting a response than pure tones; at least that seems to be the case for newborns (Clarkson & Berg, 1983). These experiments suggest that at 24 weeks, around the age of viability, the fetus is capable of detecting and responding to the acoustic features of the mother’s voice. The conjoint features that accompany the maternal voice and that do not depend on a functioning fetal auditory system may be detected well before 24 weeks.
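Because decibels are logarithmic, the gap between the 89 dB rattle and the 110 dB broadband stimulus is larger than the numbers may suggest. The arithmetic can be sketched as follows. The helper names are hypothetical, and the comparison assumes that both levels are expressed against the same reference, which the text does not state. The two stimuli were also delivered differently (through air vs. direct coupling with the abdomen), so the ratio is only a rough guide.

```python
def intensity_ratio(level_a_db: float, level_b_db: float) -> float:
    """Ratio of sound intensities (power) for two levels in decibels."""
    return 10.0 ** ((level_a_db - level_b_db) / 10.0)


def pressure_ratio(level_a_db: float, level_b_db: float) -> float:
    """Ratio of sound pressures (amplitude) for two levels in decibels."""
    return 10.0 ** ((level_a_db - level_b_db) / 20.0)


if __name__ == "__main__":
    # Levels quoted in the text; a shared reference level is an assumption.
    broadband_db = 110.0
    rattle_db = 89.0
    print(f"intensity ratio: {intensity_ratio(broadband_db, rattle_db):.0f}")
    print(f"pressure ratio:  {pressure_ratio(broadband_db, rattle_db):.1f}")
```

A 21 dB difference corresponds to roughly a 126-fold difference in intensity and an 11-fold difference in sound pressure, which is consistent with the concern that the louder, directly coupled stimulus may have been felt as vibration.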

Very few studies have been published on fetal response to maternal voice per se. The studies that have been reported have used two types of stimulation, live speaking and recorded speech delivered by loudspeaker. These modes of delivery have different characteristics for the fetus. Live voice is the more familiar mode, is accompanied by synchronous auditory cardiovascular changes, and is multimodal. A recorded version of the mother’s voice delivered from a loudspeaker is likely to differ from the live voice in intensity and spectral properties. The fetus has also experienced the same speech once before, as a live voice, while the recording was being made, and on playback the voice is auditory only, not multimodal.

An older study with ten fetuses at 36 weeks GA directly compared fetal motor responding to maternal live speech vs. recorded speech played through a loudspeaker in direct contact with the mother’s skin. Results showed differences in motor response to the two modes of stimulation (Hepper, Scott, & Shahidullah, 1993).

In a more recent Doppler ultrasound study of 74 fetuses at 36 weeks GA, only live speech was used in an effort to characterize the more typical circumstances of mother talking (Voegtline, Costigan, Pater, & DiPietro, 2013). In order to have some control over samples of talking, mothers read a passage aloud. This type of talking is arguably rare for fetuses. Fetuses showed a decrease in motor activity when the mother was speaking. Compared to a baseline period, fetal response to the voice was mediated by both the fetus’s and the mother’s prior state of arousal. The largest fetal motor and cardiac responses came from those who had been in a more quiescent state and whose mothers had previously been resting. But even the fetuses in a more active state of arousal responded with a heart rate-orienting response when their mother switched from conversation to reading aloud. In an even more recent study using ultrasound visual images, researchers coded five specific fetal movements of head, arms, and face in “awake” fetuses during live reading of one of two children’s stories (Marx & Nagy, 2015). The fetuses were either in the second trimester (N = 10) or third (N = 13). Results showed no difference in the movements to voice compared to a control period, with the exception of one of the coded movements: yawning in third trimester fetuses (less frequent during speaking). The results should be considered preliminary due to the small N and low inter-rater reliability.

Taken together, these three studies, especially Voegtline et al. (2013), confirm that by the end of the third trimester, fetuses detect when their mother is talking, and they respond with changes in movement and heart rate. Fetal responses vary depending upon their own state of arousal and that of their mother when she begins to talk.

In an attempt to discern whether recognition of the mother’s voice begins before birth, researchers have studied fetal response to audio recordings of mother vs. an unfamiliar female. In the studies, the two voices have been rendered equivalent by presenting them over a loudspeaker with the same words, equal duration, and equal intensity. One such study with ten fetuses showed no difference in fetal movement to the two voices (Hepper et al., 1993), whereas two studies from a different lab did find differences in heart rate response. In the first of these, at 38 weeks, fetuses responded with a sustained heart rate acceleration to a recording of a poem read by the mother, whereas the response to an unfamiliar female voice was a sustained deceleration (Kisilevsky et al., 2003). An important control for the within-subjects experimental design was that the maternal voice for each fetus served as the control voice for the subsequent one. In a later study, 104 fetuses of 33–41 weeks GA again showed a heart rate acceleration during a maternal voice recording compared to an unfamiliar female voice (Kisilevsky et al., 2009). The finding of heart rate acceleration to a recorded version of the mother’s voice may reflect an increase in arousal because the recorded sound of her voice is unusual. This is supported by the 2003 study’s deceleratory response to the novel voice. In neither the 2003 nor the 2009 research articles do the authors report fetal state of arousal prior to sound delivery. In other fetal perception studies, the stimuli are presented only when the fetus is in a state of low heart rate variability so that a small (usually deceleratory) orienting response to a moderately novel stimulus can be detected against background fetal heart rate fluctuations (Lecanuet, Granier-Deferre, & Jacquet, 1992).

To sum up the results from the few extant prenatal maternal voice studies, they provide a positive answer to the question of whether there is prenatal perception of the mother’s voice and whether her voice is perceived as distinct from others. Studies of fetal response to other types of sound stimuli are consistent with prenatal learning about the mother’s voice, particularly exposure studies in which sounds, such as stories, are presented daily at home over time, and then the fetus is tested for evidence of having learned about them (DeCasper, Lecanuet, Busnel, Granier-Deferre, et al., 1994; Krueger, Holditch-Davis, Quint, & DeCasper, 2004).

Postnatal Retention of Prenatal Experience

Newborn infants are much more accessible for observation than fetuses. Consequently there have been many more prenatal learning studies of babies than fetuses. The research strategy is based on the assumption that during the neonatal period, there has been so little opportunity to learn that infant responses reflect retained prenatal experience. Particularly informative have been studies that measure newborn “preference” for one sound compared to another. Preference has been inferred by exposing the infant in the laboratory to two alternatives, one presumably familiar and the other unfamiliar. Researchers measure the relative behavioral attraction to the two stimuli. For example, neonates make head turns in the delivery room toward live maternal voice more than toward an unfamiliar female voice (Querleu et al., 1984) or later after birth toward the maternal voice compared to turns toward the father’s voice (Lee & Kisilevsky, 2014).

Most neonatal preference experiments do not rule out rapid postnatal learning, but there are a few that provide strong evidence of prenatal learning. One such experiment used infant-controlled presentation of sounds in headphones. Sound presentation was contingent on pacifier sucking, and the frequency and duration of production of one sound vs. another were assumed to reveal infant preference for the sound. Using this method, Spence and colleagues obtained results consistent with a preference for a synthesized prenatal version of the mother’s voice, a sound not experienced postnatally (Spence & DeCasper, 1987; Spence & Freeman, 1996). In another study, DeCasper and Spence asked pregnant women to frequently read a story aloud, and then the neonates were tested for preference for the prenatal story vs. a novel one. The infants preferred the familiar prenatal story that they had not heard since being born (DeCasper & Spence, 1986).

Infant-controlled voice preference studies show that neonates alter sucking patterns to activate the sound of the mother’s voice whether she was recorded reading a nursery story (DeCasper & Fifer, 1980) or conversing with another adult (Moon & Fifer, 1990). And they prefer a recording of the maternal language vs. a foreign one (Moon, Panneton Cooper, & Fifer, 1993) but show no preference for either of a bilingual mother’s two languages (Byers-Heinlein, Burns, & Werker). The language recognition results were extended to show that neonates respond to a vowel from the maternal language as if it is familiar compared to a vowel from a foreign language. The number of hours of postnatal experience with the native language did not affect responding to the two languages, consistent with an effect of prenatal, and not postnatal, learning (Moon et al., 2013). The experiments comparing native and foreign languages show that frequencies above 1000 Hz are available in the womb because the vowels cannot be learned on the basis of lower frequencies alone (Huotilainen, 2013) and neither can most consonants. In a neonate brain imaging study on fetal exposure to a recorded pseudoword (not the mother’s voice), response patterns to the familiar word were different compared to other stimuli, and the amount of prenatal exposure was related to postnatal brain response strength (Partanen et al., 2013). In another neonatal brain imaging study, infants showed different brain evoked response patterns to a recording of their mother saying the word “baby” vs. an unfamiliar female. Remarkably, they seemed to recognize the voice on the basis of less than a second of stimulus (Deregnier, Nelson, Thomas, Wewerka, & Georgieff, 2000). In contrast, results of a sucking behavior study using the single word “baby” showed no preference for mother vs. a stranger (Moon, Zernzach, & Kuhl, 2015). This seems to indicate that although voice recognition is present despite minimal information, it takes a richer sample of the maternal voice for newborns to mount a behavioral response and show a preference. Perhaps prosody is required. A study of 1-month-olds using the contingent sucking procedure showed no difference in responding to the voice of mother vs. a stranger when the recordings were monotone (Mehler, Bertoncini, Barriere, & Jassik-Gerschenfeld, 1978), and rhythm is an important feature of newborn language discrimination (Nazzi, Bertoncini, & Mehler, 1998).

It is apparent from decades of research on prenatal experience with the maternal voice that the voice is a prominent feature of the intrauterine environment, that fetuses perceive it, and that they learn as a result of exposure. At the very least, they learn to recognize their mother’s voice, and they learn about her language(s).

Prenatal Auditory Experience and Later Spoken Language Acquisition

Although experimental evidence converges on the existence of prenatal learning about voices and speech, hearing the mother’s voice during the fetal period may not be necessary for acquisition of spoken language (Moon, 2011). Especially informative are cases in which fetuses are exposed to little or no sound of maternal speech. One such example is the language acquisition of hearing children of deaf mothers who used little speech during both the pre- and postnatal periods. Such children, KODAs (kids of deaf adults), are studied especially for their bilingual language acquisition. They learn both spoken and sign languages. Despite a relative paucity of experience with spoken language compared to hearing children of hearing parents, KODAs meet acquisition milestones in their two languages at the same rate as hearing children of hearing parents (Brackenbury, Ryan, & Messenheimer, 2006; Petitto et al., 2001).

In contrast to the relatively rare research studies of KODAs, there is a large and growing research literature on language learning by children who were born deaf but were later able to perceive speech because of cochlear implants at the end of their first year at the earliest. Prior to being implanted, these children have had minimal experience with the sounds of speech, depending on their degree of deafness. Even after implantation, they do not have the full range of sound available because cochlear implants deliver only a portion of the audible frequencies to the auditory nerve. Nonetheless, under optimal conditions, children with cochlear implants (CI children) who had little to no prenatal auditory experience have been shown to score within the range of hearing children on standardized measures of language acquisition. In a recent prospective longitudinal sample of 60 CI children, predictors for language outcome at 10.5 years included implantation before 3 years of age, a high level of parent education and income, hearing aids prior to implantation, early professional speech and language intervention, and early mainstream classroom education. The participants’ preschool speech/language ability predicted their ability at 10.5 years of age (Geers & Nicholas, 2013). This study is particularly informative because it shows that early intervention and nurturing conditions can alter a pre- and early postnatal developmental path that may seem fixed at birth.

This example speaks to early deprivation of acoustic stimulation and suggests plasticity in brain organization for speech perception and production at least up to the end of the second year of postnatal life (Kral & Sharma, 2012; Markman et al., 2011). In fact, it appears that there are multiple open windows for brain organization for different aspects of spoken language that close at different times in development (Markman et al., 2011). These results open many questions for both basic and applied research on aberrant early experience relevant to language acquisition. For basic research, what is the best way to describe the aspects of language that are under development at different times? What are the mechanisms that allow the development of one aspect of language before another? For applied research, an important question is how to support language development both before and after intervening to make relatively typical auditory experience possible.

Preterm Infants

Deaf infants and preterm infants both experience unusual exposure to sound during very early development compared to hearing fetuses born at term. Without supportive intervention, both populations are at risk for delayed language development. An exception is deaf children of deaf mothers who use fluent sign language with them (Bornstein, Selmi, Haynes, Painter, & Marx, 1999). In contrast to hospitalized preterms at the same gestational ages, deaf fetuses experience the other forms of sensory stimulation that accompany maternal use of language, whether or not the mother uses spoken or signed language (Petitto, Holowka, Sergio, & Ostry, 2001). From CI infants, we can learn about the consequences of early auditory deprivation followed by intervention, and we can be optimistic about the plasticity of the auditory system and language acquisition in the first 2 years after birth if the right kind of support is provided. Hospitalized preterm infants are not comparable to deaf infants in that they do experience sounds during what would normally have been their fetal period, including speech sounds, but the acoustic characteristics are not the same as the ones they would have experienced at the same stage of development in utero. Research has yet to clearly specify the characteristics of those intrauterine sounds as perceived by the fetus at different points in development. If only we had a psychophysics of the fetus!

For the hospitalized preterm infant, the environment may be rich in sound, perhaps even much richer than the womb, with a broader range and different distribution of the sound frequency spectrum, machine sounds, and different voices but less maternal voice and no maternal digestive and cardiac sounds. This will vary, depending on the infant’s circumstances. We have had research results for decades demonstrating that loud and unabated sound in the NICU is detrimental to hospitalized preterm infants (Graven, 2000). Compared to the womb, in the NICU there is most likely a relative paucity of sustained exposure to a particular voice, that of the mother. Furthermore, the impoverished experience with this one particular voice includes the lack of conjoint nonauditory sensory stimulation. The exception is when the mother is holding the baby, particularly with skin-to-skin contact as in kangaroo care (Feldman, Rosenthal, & Eidelman, 2014). We know next to nothing about the developmental role played by the absence of experience with one particular voice and with its conjoint multisensory stimulation.

In conclusion, many studies have been conducted on prenatal exposure and learning about the mother’s voice. The studies confirm what many mothers and others have believed for millennia – that the developing child in the womb perceives the maternal voice, learns from it, and retains what was learned into early postnatal life. Much remains unknown about the prenatal effect of hearing the mother’s voice, including even very basic information about the dynamic unfolding of the experience, what sensations are involved, and even whether the experience is necessary for favorable development.

Key Messages

  1. Fetal auditory experience with the mother’s voice begins around 24 weeks gestational age, and learning that occurred during prenatal life persists into early postnatal life.

  2. The maternal voice is potentially a rich source of multimodal stimulation that is comparatively absent for hospitalized preterm infants.

  3. Deaf infants with early cochlear implants appear to surmount early auditory deprivation and can be equivalent to hearing children in language acquisition. This suggests early human brain plasticity for language acquisition.

  4. The sound of the maternal voice and the complex of sensory experiences that accompany it are present for typically developing fetuses, but the necessity of the sound per se for favorable long term development has not been established, for either the fetus or the hospitalized preterm.