1 Introduction

A fundamental objective of child language research is to describe universal pathways to native language proficiency and, in service of this objective, to develop models of language development that characterize language acquisition in all children. However, in large part, research (and consequently theory) has been disproportionately guided by evidence drawn from the English monolingual child. Most language learners do not acquire English natively; indeed, more learners acquire Mandarin Chinese than English as a native language (Dryer & Haspelmath, 2013). An empirical skew toward languages like English can limit the generalizability of theories of early language development to even larger populations who speak a different language. Non-probability sampling and convenience sampling are not unique to the study of language development and are common in psychological research. However, this practice does raise questions about whether accounts of developmental change apply universally or only to specific groups of learners. As early as infancy, children acquiring a tone system demonstrate distinct developmental trajectories in tone discrimination that are not easily accounted for by theories developed for infants learning non-tone systems (e.g., perceptual narrowing). Research findings on infant tone sensitivity in perceptual discrimination tasks are covered in Chap. 10 of this volume.

In this chapter, I will focus on three core developments in lexical processing in Mandarin Chinese learners (word segmentation, novel word learning, and familiar word recognition) and discuss how research findings from each area contribute to an evolving narrative on child language acquisition. Where available, comparisons of monolingual and bilingual learners of Mandarin Chinese are interwoven into each section.

2 Word Segmentation in Mandarin Chinese Learners

A seminal study by Jusczyk & Aslin (1995) revealed that infants between 7 and 8 months track familiar words in speech (for a review of this literature, see Bergmann & Cristia, 2016). Although infant word segmentation engages cognitive universals, such as phonological short-term memory (Minagawa, Hakuno, Kobayashi, Naoi, & Kojima, 2017), the ability to track repetitions of words is specific to the infants’ native language (Newman, Tsay, & Jusczyk, 2003; Polka & Sundara, 2012) and foreshadows later vocabulary growth in the toddler years (Newman, Ratner, Jusczyk, Jusczyk, & Dow, 2006; Singh, Reznick, & Liang, 2012). However, infants are constrained in early word segmentation in ways that bear on our understanding of the salience of tone and pitch: When words change in vocal emotion, talker gender, or vocal pitch, infants at 7–8 months incorrectly interpret these changes as signifying different words (Houston & Jusczyk, 2000; Singh, Morgan, & White, 2004; Singh, White, & Morgan, 2008). This suggests that infants over-represent, and assign relevance to, surface variation in speech, such as pitch movements, which limits their ability to equate repetitions of the same word.

This raises questions about tone languages, which vary pitch both lexically and non-lexically. Singh & Foong (2012) investigated word segmentation abilities of bilingual infants acquiring both Mandarin and English when tested in each of their languages, with specific attention to the influence of pitch variation on word recognition. Infants were tested at 7.5, 9, and 11 months of age. All infants were familiarized with individual monosyllabic words and then tested on their ability to recognize these familiarized words in Mandarin Chinese and in English. Importantly, during the English session, one familiarized word was matched in vocal pitch between familiarization and test phases of the experiment and one familiarized word changed in pitch between the familiarization phase and the test phase. Pitch variation was lexical (i.e., a tone shift) in Mandarin and non-lexical (i.e., a pitch transposition) in English. Results revealed an ability across all age groups to consistently recognize pitch-matched (English) and tone-matched (Mandarin) words. However, age-related differences emerged for words that changed in pitch (English) or in tone (Mandarin). At 7.5 months, when tested on recognition of a pitch-mismatched word in English, infants did not recognize this word, replicating previous findings with English monolingual infants (Singh et al., 2008). At this age, in Mandarin, infants also did not recognize words that changed in tone. At 9 months, infants’ interpretations of pitch and tone reversed: When tested in English, infants recognized pitch-mismatched words; however, they also incorrectly recognized tone-mismatched words as instances of familiarized words. At 11 months, infants recognized pitch-mismatched words when tested in English, but not tonal mismatches in Mandarin, demonstrating a language-selective interpretation of pitch variation.

An important question is whether the developmental trajectory observed in this study is specific to bilingual infants. Prior research suggests both similarities (Polka & Sundara, 2003; Singh, 2017) and differences (Byers-Heinlein & Werker, 2013; Polka, Orena, Sundara, & Worrall, 2017) in early word knowledge in monolingual and bilingual infants. It is possible that Mandarin monolingual infants would demonstrate a different trajectory with respect to tone interpretation, a possibility that awaits investigation. The following section moves from word segmentation to novel word learning in Mandarin Chinese.

3 Novel Word Learning in Mandarin Chinese

The majority of studies on novel word learning have focused on the acquisition of words in English and other European languages. These studies have yielded important discoveries about infants’ sensitivity to vowel and consonant variation, demonstrating that infants as young as 14 months are sensitive to vowel changes in newly learned words (e.g., Mani & Plunkett, 2008, but see Curtin, Fennell, & Escudero, 2009) as well as to consonant changes (e.g., Ballem & Plunkett, 2005). By comparison, investigations of word learning in Chinese are scarce. One question of particular relevance to Mandarin populations is how infants represent not just vowels and consonants, but also the lexical tones native to Mandarin Chinese.

The study of Mandarin lexical tones contributes in specific ways to our understanding of the architecture of the developing lexicon. Studies with learners of English, French, and Italian have suggested that the lexicon of infants and children does not afford equal priority to different units of phonology. Specifically, infants appear to place greater weight on consonants than on vowels as determinants of lexical identity, leading to a hypothesized consonant bias in word learning (e.g., Havy & Nazzi, 2009; Nazzi & Bertoncini, 2009; Nazzi, Floccia, Moquet, & Butler, 2009; but see also Floccia, Nazzi, Delle Luche, Poltrock, & Goslin, 2014; Højen & Nazzi, 2016). Such biases have been observed in adults within these language communities (e.g., Havy, Serres, & Nazzi, 2014; Lee, Rayner, & Pollatsek, 2002) and have been attributed to (i) intrinsic properties of consonants versus vowels (Cutler & Mehler, 1993; Floccia et al., 2014), (ii) differences in how infants perceive consonants and vowels in the initial state (Bonatti, Pena, Nespor, & Mehler, 2005), and (iii) the role of language experience in guiding infants’ abstractions about the role of consonants versus vowels in word learning (e.g., Højen & Nazzi, 2016; Keidel, Jenison, Kluender, & Seidenberg, 2007). The first two accounts would predict similarity across populations in how infants weigh consonants and vowels in lexical processes, while the last account invokes language-specific experience as a driver of phonological bias. An additional consideration is that, unlike in more commonly studied languages, the co-existence of segmental (vowels/consonants) and suprasegmental (tones) units in Mandarin Chinese may change mutual dependencies between units of phonology for Mandarin learners.
Supportive evidence for this comes from a corpus analysis by Tong, Francis, & Gandour (2008) demonstrating that the information value—at a lexical level—contributed by vowels exceeds that contributed by consonants and also by tones in Mandarin Chinese. It is therefore possible that a consonant bias is not a universal feature of development. Investigations into the acquisition of Mandarin Chinese therefore provide a valuable lens through which to investigate universal versus language-dependent constraints on language acquisition.

Studies investigating sensitivity to Mandarin phonological contrasts when learning words have employed looking time measures to determine the fidelity with which infants bind Mandarin tones to newly learned words (Graf Estes & Hay, 2015; Hay, Graf Estes, Wang, & Saffran, 2015; Ma, Zhou, Singh, & Gao, 2017; Singh, Tam, Chan, & Golinkoff, 2014; Singh, Poh, & Fu, 2016; Singh & Quam, 2016). These studies have involved training infants and children on new labels for novel objects. Crucially, the labels introduced are tone-bearing syllables. During a test phase, children’s memories for the trained words are assessed during two types of test trials: one where the test word matches the trained word both segmentally and tonally and another where it matches segmentally but contrasts in tone.

Investigating whether infants are sensitive to tone when learning new words, Singh et al. (2014) taught infants novel words in Mandarin Chinese via a preferential looking paradigm. After a training phase, infants were tested on their recognition of those words when correctly produced as well as on their recognition of those words when mispronounced in one of two ways. Words were mispronounced either via a vowel substitution (a two-feature height and backness change) or via a tone substitution (between Mandarin Tones 2 and 4). Infants were tested at 18 and 24 months and were bilingual learners of Mandarin and English, bilingual learners of English and another non-tone language, or monolingual learners of English. Results demonstrated that all groups were sensitive to lexical tones in Mandarin at 18 months, treating tone substitutions as mispronunciations of newly learned words. This is surprising in view of the fact that tones only distinguished word meanings for the Mandarin/English bilingual learners. All groups were also sensitive to the vowel changes, which distinguished words in each of the participants’ languages. Tone sensitivity and vowel sensitivity were comparable in magnitude for each group. However, at 24 months, only Mandarin learning infants remained sensitive to both tone changes and vowel changes. In contrast, 24-month-old non-tone language learners (English monolinguals and English/non-tone language bilinguals) were sensitive only to vowel changes and not to tone changes, demonstrating a language-dependent sensitivity to lexical tones.

Asking a similar question via a different paradigm, Hay et al. (2015) investigated monolingual English learning infants’ sensitivity to lexical tones when learning novel words. Using a habituation-based paradigm, Hay et al. tested 14-, 17-, and 19-month-old infants on sensitivity to the same tone mispronunciations used by Singh et al. (2014), Mandarin Tones 2 and 4. Results revealed that while 14-month-old infants were sensitive to lexical tone changes, 17- and 19-month-old infants were not. In comparing tone sensitivity in non-tone language learners, Hay et al. (2015) reported that infants learning English disregarded tone changes as determinants of meaning by 17 months, whereas Singh et al. (2014) reported that non-tone language learners continued to bind tone to newly learned words at 18 months. These findings can perhaps be reconciled by the fact that the preferential looking paradigm employed by Singh et al. (2014) provides comparatively rich referential support in contrast to the habituation-based approach to novel word learning used by Hay et al. (2015). Nevertheless, both studies point to an early de facto sensitivity to Mandarin lexical tones whether or not infants are learning Mandarin, later followed by a selective attenuation in sensitivity to lexical tones in infants who are not learning a tone language. Graf Estes and Hay (2015) also investigated tone sensitivity in bilingual infants learning two non-tone languages. They found that bilingual infants integrated tone for longer than monolingual infants.

It should be noted that both Hay et al. (2015) and Singh et al. (2014) employed the same tone contrasts—tones 2 versus 4. This is potentially methodologically significant as these tones correspond to rising and falling pitch contours, respectively. In addition to serving as contrastive tones in Mandarin, rising/falling pitch contours are pragmatically contrastive in many languages drawing an important prosodic distinction between questions and statements (Bolinger, 1978). It is therefore possible that sensitivity to these particular contrasts is influenced by their pragmatic significance. Subsequent research suggests that rising/falling pitch contours may be interpreted in a different way to other pitch contrasts without clear pragmatic significance (e.g., high-rising contrasts) (Burnham, Singh, Mattock, Woo, & Kalashnikova, 2018). It is possible that other tone pairs would not elicit the same sensitivity and that tone sensitivity develops asynchronously for different tone pairs, a pattern mirrored in production (Wong, Schwartz, & Jenkins, 2005) and in infant tone discrimination (Shi, Gao, Achim, & Li, 2017; Tsao, 2017).

In a study designed specifically to investigate the effects of different types of tone and intonational contrasts on tone sensitivity in novel word learning, Burnham et al. (2018) compared 18-month-old infants’ sensitivity to two types of Mandarin contrasts (Tones 1 vs. 2 and Tones 2 vs. 4) with their sensitivity to two types of closely corresponding Thai tone contrasts (high-rising and rising–falling). Monolingual Mandarin learning infants were tested as well as Mandarin–English bilingual learners. An additional group of monolingual English learners was tested on their sensitivity to Thai or Mandarin contrasts as well as on their sensitivity to English intonational contrasts (statement vs. question and statement vs. order). All infants were tested via the switch paradigm. Results revealed that monolingual English learners were not sensitive to Thai or Mandarin tone contrasts, nor were they sensitive to intonational contrasts as indicative of word meaning. In contrast, Mandarin monolingual infants were sensitive to Mandarin tones, but only to the Tone 1–Tone 2 contrast and not to the Tone 2–Tone 4 contrast. Recall that Tones 2 and 4 represent question and statement contrasts in Mandarin as well as tone contrasts, which may account for why these tones were not associated with word meanings (Yuan, 2004). Furthermore, Mandarin monolingual infants were not sensitive to either Thai contrast, suggesting phonological precision in Mandarin learners’ tone representations. Bilingual Mandarin–English infants demonstrated an interesting and distinctive pattern of results: Like monolingual Mandarin infants, they were only sensitive to Mandarin Tones 1 and 2 and not to Mandarin Tones 2 and 4. However, unlike monolingual Mandarin infants, they were sensitive to the Thai contrast that corresponds closely to Mandarin Tones 1 and 2, although not to the Thai correspondents of Tones 2 and 4.
This suggests that bilingual learners of English and Mandarin demonstrate greater flexibility in their tone representation, accepting non-native analogues of native tones as lexically relevant distinctions. This finding converges with results of prior investigations of phonetic sensitivities in bilingual infants suggesting greater flexibility in bilingual learners’ phonetic category boundaries for segments (Ferjan-Ramirez, Ramirez, Clarke, Taulu, & Kuhl, 2016; Petitto et al., 2012; Singh, 2018). The findings of Burnham et al. (2018) suggest that bilinguals may also maintain greater flexibility in their sensitivity to suprasegmental sources of lexical contrast (i.e., tones).

The study by Burnham et al. (2018) suggests that tone interpretation in a novel word learning paradigm may be constrained by tone-intonation relationships at 18 months. Although further experimentation is needed to confirm this possibility, it may be that lexical tone contrasts that overlap with non-lexical intonation contrasts (e.g., questions/statements) are more challenging for infants to negotiate. Tone-intonation relations are one of several factors that could constrain the acquisition of lexical tones. Another factor that may determine infants’ sensitivity to lexical tones is perceptual salience. In a study designed to investigate whether infants’ sensitivity to tone in a word learning paradigm is dependent on tone salience, Singh et al. (2016) compared 12–13-month-old Mandarin monolingual and English–Mandarin bilingual infants on their sensitivity to a set of Mandarin tone contrasts. Infants were familiarized with words labeled by a syllable produced in Tone 3, a complex tone produced with a falling–rising contour. Infants were then exposed to the word-object pair to which they were familiarized in Tone 3 as well as to the familiarized object labeled by the same word in Tone 2 and to the familiarized object labeled by the same word in Tone 1. Tones 2 and 3 are reportedly the most confusable tone pair in the Mandarin tone inventory even for native learners (Shen & Lin, 1991). In contrast, Tones 1 and 3 are relatively easy to discriminate and have been shown to be the least confusable pair of Mandarin tones (Wang, Spence, Jongman, & Sereno, 1999). In an initial discrimination task, Singh et al. (2016) found that 12–13-month-old Mandarin monolingual infants could discriminate Tones 1 and 3 as well as Tones 2 and 3 in an auditory discrimination paradigm that did not require infants to map tones to meanings.
However, when tones were associated with word meanings, it was only at 18 months that Mandarin monolingual infants demonstrated sensitivity to a change from Tone 3 to Tone 2 and from Tone 3 to Tone 1. In contrast, English–Mandarin bilingual infants demonstrated a 6-month lead in tone sensitivity. Bilingual infants were tested on their sensitivity to tones in a word learning task both when introduced to a word embedded in a set of English carrier phrases and when introduced to the same word embedded in Mandarin carrier phrases. Bilingual infants were sensitive to lexical tones, both salient (Tone 3 vs. Tone 1) and subtle (Tone 3 vs. Tone 2), when learning a new word in a Mandarin context. However, when learning a word in an English context, they were not sensitive to either tone contrast. Bilingual infants therefore demonstrated a precocious and language-sensitive interpretation of tones as a source of lexical contrast relative to their monolingual peers. Moreover, this precocity applied to salient tone contrasts (i.e., Tones 1 and 3) as well as to comparatively subtle tone contrasts (i.e., Tones 2 and 3). Nevertheless, by 18 months of age, Mandarin learning infants, monolingual and bilingual, appear to bind both salient and subtle tone contrasts to newly learned words.

The program of studies described above focuses on novel word learning between 12 and 18 months, when infants exhibit the beginnings of a productive vocabulary. However, children continue to add to their vocabularies at an aggressive rate in the following months, demonstrating a rapid rise in their vocabulary size after 18 months (Mayor & Plunkett, 2011). One might predict greater tone sensitivity over this period on account of an expanded lexical inventory, given positive relationships between vocabulary size and mispronunciation effects for segmental substitutions (e.g., Law & Edwards, 2015). In an investigation of tone sensitivity and vowel sensitivity in Mandarin monolingual toddlers, Ma et al. (2017) used a preferential looking paradigm to compare 2- and 3-year-old children’s responses to newly learned words as well as to variants of those words (tone and vowel substitutions). Tone substitutions encompassed a shift between rising and falling tones (Tones 2 vs. 4). Vowel substitutions incorporated a three-feature change in backness, height, and roundedness. Results collapsed across age groups revealed that children were slower in general to orient toward target images when words were mispronounced via tone or vowel substitutions. Accuracy analyses, focusing on the proportionate amount of time spent fixating the target object when it was correctly pronounced and mispronounced, revealed more nuanced, age-dependent effects of mispronunciations. Specifically, 2-year-old children were sensitive to tone mispronunciations and vowel mispronunciations in equal measure, rejecting both types of mispronunciations as acceptable target labels. Upon testing older children, the authors discovered that 3-year-old children, like 2-year-old children, interpreted vowel substitutions as mispronunciations.
However, contrary to expectations, 3-year-old children did not interpret tone shifts as mispronunciations, preferentially fixating the target object when labeled by a tonal alternation.

In a second study designed to investigate limits on Mandarin monolingual 3-year-old children’s apparent insensitivity to lexical tones, Ma et al. (2017) tested 3-year-old Mandarin monolingual children on their sensitivity to tones when additional cues were provided. Specifically, during familiarization, children were trained on tone minimal pairs with the expectation that this would draw their attention to tone as a source of lexical contrast. Additionally, Ma et al. expanded the tone pairs in this experiment, incorporating distinct rising and falling tones (Tones 2 vs. 4) but also more confusable rising and dipping tones (Tones 2 vs. 3). When provided with supportive cues during familiarization, participants were able to map rising and falling tones (Tones 2 and 4) onto contrastive objects. However, they were not able to recognize words with which they were familiarized in the dipping tone (Tone 3). This pattern is mirrored in production, where Tone 3 is considerably more difficult to master and is only reliably produced after Tones 1, 2, and 4 (Wong et al., 2005). These findings add to a groundswell of evidence that tone sensitivity may decline with maturation, a possibility discussed further in the next section.

In the previous set of studies, tone sensitivity was investigated by familiarizing participants with word-object pairings in a highly structured fashion (i.e., infants simply viewed visual objects in conjunction with repeated presentation of an auditory label). The task demands of such paradigms deviate in potentially significant ways from learning words in social contexts where word meaning links are often inferred from interactions. Sensitivity in novel word learning in conversational contexts was investigated in bilingual English–Mandarin preschool children to determine whether bilingual children demonstrated a language-specific interpretation of pitch variation. When learning English and Mandarin simultaneously, learners must integrate tones selectively in Mandarin during novel word learning and disregard pitch as lexically relevant in English. In a study designed to investigate whether preschool children were able to interpret tones in a language-selective manner, Singh and Quam (2016) tested 3- to 4- and 4- to 5-year-old Mandarin–English bilingual children on their sensitivity to tone shifts when learning words in English and in Mandarin conversational contexts. Words to be learned were manipulated such that within-word (phonotactic) cues were specific to the target language or common to both languages. Therefore, target words either lent themselves to one language or another or were ambiguous in terms of the language to which they belonged. Children were taught words in English and Mandarin in a conversational context and then tested on their recognition of the same words and tone variants of taught words in each language. Results demonstrated that 3–4-year-old children recognized words that they were taught when the words matched in tone both in English and Mandarin. However, when the words did not match in pitch (English) or tone (Mandarin), children did not demonstrate recognition of these words. 
Moreover, this pattern of results was observed whether children received leading phonotactic cues to the target language or not.

In contrast to this uniform pattern, results for 4–5-year-old children depended on the available cues. When presented with words with no leading phonotactic cues, they rejected pitch variants as lexical equivalents in English as well as tone variants in Mandarin Chinese, similar to 3–4-year-old children. It was only when 4–5-year-old children were presented with leading phonotactic cues to the target language that they were able to demonstrate a language-dependent sensitivity to tones, integrating tone changes in Mandarin but disregarding the same changes in English. Although Mandarin learners bind tones to newly learned words as infants, some aspects of tone interpretation, such as learning words via conversation, take time to mature.

Studies investigating children’s sensitivity to lexical tones when learning new words reveal four important findings. First, infants demonstrate an early sensitivity to lexical tones whether they are learning a tone language or not (Hay et al., 2015; Singh et al., 2014). It is only at 17 months (in habituation-based tasks without referential support) and 24 months (in preferential looking tasks with referential support) that Mandarin learning infants demonstrate a language-specific sensitivity to tones that is not shared by their non-tone learning peers. Second, as with production, sensitivity to lexical tones varies depending on the tone used. Different tone pairings elicit different sensitivities in novel word learning (Burnham et al., 2018; Ma et al., 2017; Singh et al., 2016). Third, bilingual tone processing may present language learning opportunities such as precocious integration of tones relative to monolingual Mandarin learning infants (Singh et al., 2016). Fourth, some of the challenges of learning two languages where the functions of pitch differ may be more evident in the preschool years. At this stage, a language-selective integration of tones may be challenging for children (Singh & Quam, 2016), although this is a tentative claim given that there is no monolingual backdrop against which to evaluate the bilingual data obtained by Singh & Quam (2016).

In addition to mapping novel words to meaning, language learners must rapidly recognize words that they already know in sentential contexts. The ability to do so is a strong predictor of concurrent and later language abilities (Marchman & Fernald, 2008). The following section discusses children’s abilities to understand familiar words in Mandarin with specific attention to differences between monolingual and bilingual learners of Mandarin.

4 Familiar Word Recognition in Mandarin Chinese

In a first attempt to determine the accuracy with which Mandarin learners recognize known words, Singh, Goh, and Wewalaarachchi (2015) tested Mandarin learning toddlers and preschoolers on spoken word recognition via a preferential looking paradigm. Similar to prior instantiations of this paradigm (e.g., Mani & Plunkett, 2007, 2011; Swingley & Aslin, 2002; White & Morgan, 2008), participants were presented with pairs of visual objects. Upon viewing the pair of objects for some time, one of the objects—presumed to be familiar to infants—was labeled in Mandarin. Proportionate fixation time to the target object versus the unlabeled object (distractor object) was tracked before and after hearing the label. A statistically significant increase in fixation to the target object upon hearing its label serves as evidence of word recognition. No significant increase in fixation to the target upon hearing its label serves as evidence of rejecting the label as a name for the target object. Occasionally, a third pattern of results surfaces: Participants preferentially fixate the distractor object upon hearing an auditory label. When distractor objects are unfamiliar to participants, a distractor preference is interpreted as evidence that the participant may have formed a new association between the auditory label and the distractor object. This pattern of results is often interpreted as more convincing evidence that the mispronunciation has been definitively rejected and mapped to a different object.
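The three response patterns described above can be made concrete with a minimal Python sketch. This is an illustration only and not the analysis reported by Singh et al. (2015): the function names, the use of looking times in milliseconds, and the fixed 0.05 cutoff are all hypothetical assumptions, and the studies reviewed here rely on statistical tests over participants rather than a fixed cutoff.

```python
from statistics import mean


def proportion_target(target_ms, distractor_ms):
    """Share of total looking time spent on the target object."""
    total = target_ms + distractor_ms
    if total == 0:
        raise ValueError("no looking time recorded for this window")
    return target_ms / total


def naming_effect(pre, post):
    """Change in proportion of target looking from before to after the label.

    `pre` and `post` are (target_ms, distractor_ms) pairs for the windows
    before and after label onset (hypothetical data layout).
    """
    return proportion_target(*post) - proportion_target(*pre)


def classify(effects, cutoff=0.05):
    """Summarize a set of naming effects as one of three response patterns.

    ASSUMPTION: a fixed cutoff stands in for the significance test that a
    real study would run; it is used here only to keep the sketch short.
    """
    average = mean(effects)
    if average > cutoff:
        return "target preference"      # consistent with word recognition
    if average < -cutoff:
        return "distractor preference"  # label mapped to the other object
    return "no preference"              # label not accepted for the target
```

For example, a trial with equal looking to both objects before naming (1000 ms each) and 1500 ms versus 500 ms after naming yields a naming effect of 0.25, which this sketch would summarize as a target preference.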

Participants were presented with several trials. On half of the trials, the target was correctly labeled, while on the other half, the target was labeled by a mispronunciation caused by a consonant, vowel, or tone substitution. Participants’ abilities to accurately recognize correctly produced words and to reject incorrect pronunciations were investigated. Participants were tested in two age groups: 3 years of age and 4.5 years of age. Analyses revealed that all participants recognized correctly produced Mandarin words, preferentially fixating visual targets upon hearing them accurately labeled. Participants did not preferentially fixate visual targets upon hearing their labels mispronounced. However, responses varied markedly for mispronunciations due to vowel, consonant, and tone substitutions within each age group. In the younger age group (3 years), children demonstrated similar responses to vowel and consonant substitutions. Specifically, they did not preferentially fixate target or distractor. In contrast, when hearing tone mispronunciations, participants preferentially fixated the distractor object, suggesting that sensitivity to tones relative to vowels and consonants was comparatively high. However, at 4.5 years of age, children demonstrated very different results, expressing distractor preferences when hearing vowel and consonant substitutions. In contrast, participants demonstrated no preference for target or distractor objects when presented with tone mispronunciations. In combination, these findings suggest that responses to vowel and consonant variation were similar to one another in both age groups and, furthermore, were dissociable from tone sensitivity in both age groups.
While vowel and consonant sensitivity appeared to increase in older versus younger children, as reflected by a movement from no target/distractor preference to a distractor preference when hearing a mispronounced label, tone sensitivity appeared to attenuate over the same period.

Wewalaarachchi and Singh (submitted) have pursued this line of inquiry in older children, ranging from 5 to 6 years of age, demonstrating that tone sensitivity continues to decrease with age such that 6-year-old children were found to treat tone substitutions equivalently to correct pronunciations. This parallels an age-related decline in tone sensitivity reported by Ma et al. (2017) in novel word learning. Such a decline has not been observed for vowels and consonants either in the present study or in Ma et al. (2017) who manipulated vowels as well as tones.

It should be noted that the study reported by Singh et al. (2015) incorporated three Mandarin tones: 1, 2, and 4. Tone 3 (falling–rising tone) was not incorporated as it maintains a more variable form in Mandarin than Tones 1, 2, and 4 due to tone sandhi rules. Singh, Tan, and Wewalaarachchi (2017) investigated effects of salient tone mispronunciations (substitutions between Tones 1 and 4) as well as effects of subtle tone mispronunciations (substitutions between Tones 2 and 3) on word recognition in 3-year-old participants. Results revealed that children were highly sensitive to salient mispronunciations as indicated by Singh et al. (2015). However, they were insensitive to subtle mispronunciations, responding similarly to substitutions of Tones 2 and 3 as they did to correctly produced words. Similar difficulties with Tones 2 and 3 were reported in toddlers tested in familiar word recognition by Shi et al. (2017). This finding informs conclusions drawn from previous studies with younger participants, suggesting that high tone sensitivity is not observed for the entire Mandarin tone inventory, and in fact, toddlers can be largely insensitive to subtle tone changes such as Tone 2 to Tone 3.

It should be noted that participants tested in Singh et al. (2015) were bilingual learners of English and Mandarin. It remains unclear whether the observed age-based decline in tone sensitivity could be attributable to learning a non-tone language concurrently with a tone language. This question is informed by the results of a similar study by Ma et al. (2017), which revealed that Mandarin monolingual children were not sensitive to a range of tone substitutions at 3 years of age, responding to tone mispronunciations as if they were correct pronunciations. In contrast, the same children were sensitive to vowel substitutions. This finding suggests that sensitivity to tone variation may attenuate with age even in the absence of concurrent exposure to a non-tone language, whereas sensitivity to vowels and consonants may be more stable over development.

In a systematic comparison of monolingual (Mandarin) and bilingual (English–Mandarin) learners, Wewalaarachchi, Wong, and Singh (2017) compared 2-year-old children on their sensitivity to tone, vowel, and consonant variation using a preferential looking paradigm. Wewalaarachchi et al. (2017) reported both similarities and differences between monolingual and bilingual learners. Both groups were similar in correctly accepting accurate labels as referring to familiar targets; however, monolingual infants demonstrated more rapid recognition of correctly produced words than bilingual infants. Both groups were also similar in rejecting consonant, vowel, and tone mispronunciations as incorrect labels. However, the relative priority assigned to each type of mispronunciation in terms of processing efficiency (speed of recognition or mis-recognition of the target) differed by group: monolingual Mandarin learning infants demonstrated the least sensitivity to consonants, followed by vowels and tones. In contrast, bilingual infants demonstrated the least sensitivity to tones, followed by consonants and then by vowels. This pattern of results suggests that while both groups are similarly accurate in recognizing correct pronunciations and rejecting incorrect productions of familiar words, they vary in the efficiency with which they do so. Moreover, the relative processing constraints associated with vowel, consonant, and tone variation differed between monolingual and bilingual learners, with each group demonstrating a different ordering in the processing costs arising from variation in vowels, consonants, and tones.

Thus far, studies on spoken word recognition have focused on words presented in citation form. However, in natural speech, words occur predominantly in the context of clauses, phrases, and sentences. The context within which words occur can alter their physical form, leaving it incumbent on listeners to recover the underlying phonological structure. Research with Mandarin speaking children shows that tone sensitivity is quite heavily influenced by the word context within which tones occur (e.g., Wong & Strange, 2017). Instability in the form that words assume can pose a challenge to listeners. This challenge, often termed the ‘variability problem,’ has been reasonably well studied in learners of English and other Western non-tone languages (e.g., Houston & Jusczyk, 2000; Schmale, Cristia, Seidl, & Johnson, 2010; Singh, 2008; Skoruppa, Mani, & Peperkamp, 2013) although less so in learners of Mandarin Chinese. Mandarin Chinese presents with some unique sources of variability, two of which will be discussed here: tone sandhi and tone-intonation relationships.

First, Mandarin, like English, is associated with morphophonemic changes where words change their surface form in response to phonological context. A prime example of this in Mandarin Chinese is tone sandhi. According to tone sandhi rules, whole-tone substitutions can occur in a context-conditioned manner. According to the Tone 3 sandhi rule, when two Tone 3 syllables co-occur, the first syllable alternates to Tone 2, resulting in a Tone 2-Tone 3 disyllabic sequence in place of a Tone 3-Tone 3 disyllabic sequence. Learners therefore have to appreciate that the first syllable in such a sequence underlyingly bears Tone 3 and has undergone a phonological alternation. In addition, learners have to distinguish the alternating (post-sandhi) form from a disyllable where the base form is a Tone 2-Tone 3 sequence (non-sandhi form). For example, Tone 3 sandhi rules prescribe that the phrase /fən(214) tʂʰɑŋ(214)/ (flour mill) is obligatorily modified such that the first syllable alternates, yielding [fən(35) tʂʰɑŋ(214)], while preserving the original meaning of the word. However, given that /fən(35) tʂʰɑŋ(214)/ means ‘graveyard’ (坟场), this tonal alternation creates a potential lexical ambiguity (Chen, 2000). Studies investigating children’s production of sandhi forms suggest that children do not systematically produce sandhi forms over the first 5 years of life (Chen, Wang, Shu, Wu, & Li, 2010; Wang, 2011). Children demonstrate evidence of reliably producing sandhi forms at 6 years of age (Wang, 2011). In a word recognition study, Wewalaarachchi and Singh (2016) investigated whether children demonstrate receptive knowledge of sandhi forms. In this study, 3- to 5-year-old Mandarin learning children were presented with familiar words in a paradigm similar to Singh et al. (2015) described earlier in this section. Children were presented with 24 trials belonging to four possible trial types: correctly produced disyllables that were non-sandhi forms, all of which were Tone 2-Tone 1 disyllables; ‘garden-variety’ mispronounced forms that were not sandhi forms, all of which were Tone 2-Tone 1 sequences mispronounced as Tone 3-Tone 1 sequences; sandhi forms that had undergone Tone 3 sandhi alternation (Tone 2-Tone 3 sequences); and pre-sandhi forms that had not undergone the prescribed alternation (i.e., Tone 3-Tone 3 sequences).
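The Tone 3 sandhi alternation described above, and the ambiguity it creates for the listener, can be illustrated with a minimal sketch. The sketch is restricted to disyllables, as in the study design; the function names are hypothetical, and the treatment of longer Tone 3 sequences, which depends on prosodic structure, is deliberately left out.

```python
def apply_tone3_sandhi(underlying):
    """Map an underlying disyllabic tone pair to its surface form.

    Illustrative assumption: only the disyllabic case is modeled.
    A Tone 3-Tone 3 pair surfaces as Tone 2-Tone 3; all other
    pairs surface unchanged.
    """
    first, second = underlying
    if first == 3 and second == 3:
        return (2, 3)
    return (first, second)


def underlying_candidates(surface):
    """List the underlying tone pairs compatible with a surface pair.

    A surface Tone 2-Tone 3 pair is ambiguous: it may be a base
    (non-sandhi) Tone 2-Tone 3 form or a post-sandhi Tone 3-Tone 3 form.
    """
    candidates = [surface]
    if surface == (2, 3):
        candidates.append((3, 3))
    return candidates


# 'flour mill' is underlyingly Tone 3-Tone 3 and surfaces as Tone 2-Tone 3.
flour_mill_surface = apply_tone3_sandhi((3, 3))
# A listener hearing Tone 2-Tone 3 must consider both underlying forms.
ambiguous = underlying_candidates(flour_mill_surface)
```

Under these assumptions, a pre-sandhi Tone 3-Tone 3 production is never a legal surface form, which is why it can be treated as a mispronunciation in the trial types above.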

Children were presented with familiar objects labeled in the four ways articulated above. As before, word recognition was measured via accuracy of fixation to visual targets. In addition, the time course of word recognition was charted. Results revealed that children reliably recognized correctly produced forms, preferentially fixating visual targets when they were correctly labeled. Children also reliably recognized post-sandhi forms that had undergone the correct alternation. Children did not fixate visual targets upon hearing ‘garden-variety’ (non-sandhi) mispronunciations. They also did not fixate visual targets upon hearing pre-sandhi forms. These findings suggest that although children’s productive mastery of sandhi forms may remain fragile through the preschool years, their comprehension processes reflect an ability to distinguish words based on sandhi alternations.

The analyses above focused on accuracy of spoken word recognition, drawing from analyses of target preferences upon hearing sandhi and non-sandhi forms. However, a more detailed analysis of the time course of target selection revealed a more nuanced picture. Charting the proportion of fixations to the target relative to the distractor object over time after hearing auditory labels affords insight into temporal constraints on children’s lexical selections. These analyses revealed that children’s eye movements to the target were slightly weaker for post-sandhi forms versus correct pronunciations late in the processing window (1400–2400 ms after the onset of the target word), revealing processing costs linked to sandhi forms relative to non-sandhi forms. A similar analysis was performed for mispronunciations. It should be noted that upon hearing mispronounced forms, children should not fixate the target object, and doing so reflects erroneous mappings between the auditory label and the visual target. Late in the processing window (1600–2200 ms after the onset of the target word), children demonstrated slightly reduced target fixations (i.e., fewer false alarms to the mispronunciations) when hearing generic mispronunciations relative to pre-sandhi forms. This suggests that generic mispronunciations were more robustly rejected as labels for the target word than pre-sandhi forms. These findings add to accuracy analyses by suggesting that there may be temporal processing costs to sandhi forms relative to non-sandhi forms. Nevertheless, in the aggregate, children appear to demonstrate faithful comprehension of sandhi forms by 5 years of age, although it must be acknowledged that this does not demonstrate knowledge of sandhi rules.
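To make the time-course measure concrete, the sketch below computes the proportion of target fixations within a processing window from a list of gaze samples. It is an illustration under stated assumptions rather than the analysis pipeline of the study: the sample format, the labels "target", "distractor", and "away", and the function name are all hypothetical, and samples coded as looks away from both objects are simply excluded from the denominator.

```python
def target_proportion(samples, start_ms, end_ms):
    """Proportion of target looks among target and distractor looks.

    samples: iterable of (time_ms, look) pairs, where time_ms is measured
    from target word onset and look is "target", "distractor", or "away"
    (hypothetical coding scheme). The window is [start_ms, end_ms).
    Returns None when the window has no target or distractor looks.
    """
    target = 0
    distractor = 0
    for time_ms, look in samples:
        if start_ms <= time_ms < end_ms:
            if look == "target":
                target += 1
            elif look == "distractor":
                distractor += 1
    total = target + distractor
    if total == 0:
        return None
    return target / total


# Toy data for one trial; values are invented for illustration only.
trial = [
    (1400, "target"),
    (1500, "target"),
    (1600, "distractor"),
    (1700, "away"),
    (2500, "target"),  # falls outside the 1400-2400 ms window
]
late_window = target_proportion(trial, 1400, 2400)
```

On a measure of this kind, a value near 0.5 indicates no preference between the two objects, so comparisons across trial types within the same window index the relative strength of target selection.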

Another factor that could conceivably complicate word recognition of Mandarin tones is tone-intonation correspondences. Every language uses pitch variation toward a variety of non-lexical ends, such as the communication of vocal affect (Lieberman, 1967), the placement of stress (Fernald & Mazzie, 1991), and the marking of communicative intent, such as questions versus statements (van Heuven & Haan, 2002). Mandarin Chinese is no exception: Communicative intent, such as question versus statement forms, is reliably distinguished by pitch variation (Ho, 1977; Yuan, 2004, 2006; Zeng, Martin, & Boulakia, 2004). It is therefore incumbent upon learners to control for intonational variation to arrive at lexical tones and vice versa. Studies with Mandarin speaking adults have demonstrated that adult judgments of communicative intent are indeed taxed by co-occurring tone cues (Yuan, 2004). Specifically, adults encountered particular difficulty identifying question forms when sentences contained rising tones (i.e., Tone 2), a result interpreted as prioritization of lexical functions over intonational functions of pitch in language processing.

In a similar investigation with children, 3–4- and 4–5-year-old Mandarin learning children were tested on their recognition of familiar words in a preferential looking paradigm similar to that used in Singh et al. (2015). In this study, Singh and Chee (2016) presented children with familiar words marked by rising tones, in rising intonation (i.e., question forms) as well as in falling intonation (i.e., statement forms). They were also presented with words marked by falling tones in rising intonation (question forms) as well as in falling intonation (statement forms). Acoustic profiles of each tone are described in Singh & Chee (2016). However, it should be noted that intonational variation did not result in speakers crossing a tone boundary; rather, intonational variation simply altered properties of the pitch contour of the target words in a more subtle fashion that did not cause adult listeners to mis-identify the tone.

Results demonstrated that younger children at 3–4 years of age only recognized familiar words when pitch cues to tone and intonation converged. In other words, they only recognized Tone 2 (rising) words in question forms and Tone 4 (falling) words in statement forms. They did not recognize Tone 2 words in statement forms, nor did they recognize Tone 4 words in question forms. In contrast, by 4–5 years of age, children recognized familiar words in Tone 2 and Tone 4 in both rising and falling intonation, suggesting that by this point, their interpretation of lexical tones was not contingent upon convergent intonational cues. This study suggests that while tone interpretation on the part of Mandarin learning children is faithful and accurate in infancy (e.g., Singh et al., 2015, 2016), it appears to be more limited when intonational variation changes the realization of specific tones.

5 Models of Early Language Development: Where Does Tone Fit?

As it stands, prevailing models of early speech perception and language development such as PRIMIR (Werker & Curtin, 2005) and PAM (Best, 1994) do not readily account for lexical tones. In some sense, these models may be challenged by some research findings on tone acquisition, such as reports of perceptual facilitation for tone in non-tone language learners or findings that toddlers integrate into newly learned words tones that they failed to discriminate several months earlier. There have been recent efforts to explore the extent to which current developmental models of speech perception account for tones (see Curtin & Werker, 2018; Reid et al., 2015), as well as efforts to extend adult models of speech perception to tones [see T-TRACE by Tong, McBride and Burnham (2014) or COHORT on lexical tones by Zhou and Marslen-Wilson (1994)]. However, these models await empirical evidence to fully determine whether they capture processing of lexical tones as well as of vowels and consonants. Models must also consider the role of tone in atypical populations given that tone production and perception can be impacted by language disorders that affect prosodic sensitivity (see Chap. 13).

6 Conclusions

To summarize, infants and children appear to make gradual and incremental progress in their understanding and interpretation of Mandarin phonology. As early as 11 months, infants demonstrate a language-specific interpretation of Mandarin tones, even when learning a non-tone language concurrently. Just 1–2 months later, bilingual infants learning English and Mandarin correctly and selectively bind lexical tones to meaning when learning new words in Mandarin. Later, at 24 months of age, when word learning is well underway, infants demonstrate a clear appreciation of the consequences of tone, vowel, and consonant mispronunciations when recognizing familiar words, whether they are learning Mandarin monolingually or in conjunction with a non-tone language such as English. The time course of spoken word recognition differs in subtle ways between monolingual and bilingual learners, but both groups demonstrate a robust sensitivity to mispronounced forms. At 3–5 years of age, children demonstrate an awareness of context-driven changes, specifically of tone sandhi rules, evidenced by correct recognition of legally alternating forms and correct rejection of pre-sandhi forms as acceptable labels for known objects. Finally, between 4 and 5 years of age, children demonstrate a robust ability to recognize tone-bearing words, corresponding to words they know, regardless of their intonational context. This brief chronicle suggests that while tones appear early in children’s production, leading to the argument that they are the first phonological constituent to which infants are sensitive in perception (Yeung, Chen, & Werker, 2013) and production (Clumeck, 1980), the refinement and maturation of tone categories, at least in Mandarin, appear to take several additional years. Somewhat paradoxically, studies of tone interpretation reveal a decline in sensitivity to lexical tones over time (Ma et al., 2017; Singh et al., 2015; Wewalaarachchi & Singh, submitted). This pattern of results has not been observed with vowels and consonants. The scope and longevity of this decline in sensitivity, as well as possible means by which older children compensate for reduced tone sensitivity to arrive at correct semantic interpretations, remain to be determined.

In addition to charting the development of Mandarin in early lexical processing, an additional goal of this review was to provide a comparison of monolingual and bilingual learners of tone languages. Although systematic comparisons are quite rare, bilingual learners of Mandarin appear to develop in their knowledge of Mandarin phonology at a similar pace to monolingual peers, with some evidence of bilingual facilitation in tone interpretation (Singh et al., 2016). There is also evidence of bilingual processing costs in familiar word recognition (Wewalaarachchi et al., 2017) in the form of reduced processing efficiency, consistent with a larger body of studies with bilingual learners demonstrating reduced efficiency in lexical access (Gollan, Fennema-Notestine, Montoya, & Jernigan, 2007; Kaushanskaya & Marian, 2007; Roberts, Garcia, Desrochers, & Hernandez, 2002). In large part, however, bilingual and monolingual learners of Mandarin Chinese appear to demonstrate comparable abilities in building and accessing a Mandarin lexicon.

To conclude, there are distinctive elements of Mandarin that warrant systematic investigation of Mandarin acquisition as a complement to the vast body of research conducted on the acquisition of Indo-European languages such as English, French, and Spanish. The presence of a tone system and different links between vowels/consonants and the lexicon (see Wiener & Turnbull, 2016) serve as distinguishing properties of Mandarin compared with English that may lead us to hypothesize a course of language acquisition distinct from that charted for English. Empirical research conducted in each of these areas suggests that language acquisition may operate under different constraints for Mandarin as compared to English. Continued efforts to understand language-specific pathways to proficiency are integral to the development and refinement of models and theories of early language development. Such models and theories often promise to describe universals in development and do not limit their putative scope to the specific language communities from which they draw participants. Further research aimed at expanding the evidence base on the early acquisition of Mandarin and of languages from families beyond Romance and Germanic could potentially expand existing and future models of early language acquisition in significant ways.