Introduction

Human languages tend to converge on their design. For example, across languages, syllables like blif are more frequent than lbif (Berent et al. 2007; Greenberg 1978). Such observations suggest that all speakers might share common restrictions on language structure. The nature of those constraints, however, is controversial.

One explanation attributes the cross-linguistic regularities to the language faculty itself. In this view, all languages share a set of universal constraints on syllable structure (e.g., Prince and Smolensky 1993/2004). Structures that violate these constraints (e.g., lbif) are ill-formed; hence, they are dispreferred by individual speakers and are underrepresented across languages.

On an alternative account, the restrictions on syllable structure originate solely from nonlinguistic sources. Indeed, syllables like blif might be more familiar, and/or easier to produce and perceive (Blevins 2004; Bybee 2008). Speakers’ preferences, then, might reflect not universal linguistic restrictions, but rather their shared experience, acoustic and articulatory pressures (e.g., Liberman et al. 1967; Evans and Levinson 2009; Pulvermüller and Fadiga 2010).

Our present research seeks to adjudicate between these possibilities by investigating the putative sonority restrictions on onset clusters (e.g., bl in block). Linguistic accounts define sonority \((s)\) as an abstract phonological property of segments that correlates with acoustic intensity (Clements 2005; Parker 2008). Least sonorous are stops \((\hbox {e.g}., b,p; s=1)\), followed by fricatives \((\hbox {e.g}., f,v; s=2)\), nasals \((\hbox {e.g}., m,n; s=3)\), liquids \((\hbox {e.g}., l,r; s=4)\), and glides \((\hbox {e.g}., w,y; s=5)\). Accordingly, onsets such as bl manifest a large rise in sonority \((\Delta s=s(l)-s(b)=3)\), bn exhibits a small rise \((\Delta s=2)\), bd has a sonority plateau \((\Delta s=0)\), and lb falls in sonority \((\Delta s=-3)\). The cross-linguistic preferences for syllables like blif over lbif could thus emanate from universal linguistic restrictions on sonority distance \((\Delta s)\) (Berent et al. 2007; Greenberg 1978; for additional typological evidence and additional constraints on onset structure, see Diver 1979 and Tobin 2002). In this view, large sonority rises are preferred to small rises, which, in turn, are favored over plateaus; least preferred are onsets of falling sonority \((\hbox {e.g}., bl\succ bn \succ bd \succ lb)\). By hypothesis, this restriction should be active in all speakers, irrespective of whether such syllables are present or absent in their language (Prince and Smolensky 1993/2004).
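The sonority arithmetic above can be made concrete with a short sketch. The sonority levels are those listed in the text; the value for d follows from the description of bd as a plateau. The function names and the cut-off separating large from small rises are assumptions introduced here for illustration only.

```python
# Sonority levels as listed in the text (stops lowest, glides highest).
# "d" is included as a stop because the text treats bd as a plateau.
SONORITY = {
    "b": 1, "p": 1, "d": 1,  # stops
    "f": 2, "v": 2,          # fricatives
    "m": 3, "n": 3,          # nasals
    "l": 4, "r": 4,          # liquids
    "w": 5, "y": 5,          # glides
}


def sonority_distance(onset):
    """Return delta s = s(second consonant) - s(first consonant)."""
    first, second = onset
    return SONORITY[second] - SONORITY[first]


def onset_type(onset):
    """Classify a two-consonant onset by its sonority distance.

    Assumption: a rise of 3 or more counts as "large", matching the
    bl (delta s = 3) versus bn (delta s = 2) examples in the text.
    """
    delta = sonority_distance(onset)
    if delta >= 3:
        return "large rise"
    if delta > 0:
        return "small rise"
    if delta == 0:
        return "plateau"
    return "fall"


# The four onsets of the hierarchy, ordered from best to worst formed.
hierarchy = sorted(["lb", "bd", "bl", "bn"], key=sonority_distance, reverse=True)
```

Under these assumptions, sorting the four example onsets by their distance reproduces the ordering bl, bn, bd, lb given in the text.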

In line with this possibility, our past research has shown that speakers of different languages are sensitive to onset structures they have never heard before (e.g., Berent et al. 2007, 2008, 2012a, b). We inferred the well-formedness of onsets from their tendency to undergo repair. Specifically, we reasoned that if onsets with small sonority distances (e.g., lbif) violate structural linguistic restrictions, then such onsets will not be encoded faithfully by the language faculty. Instead, ill-formed syllables will be repaired as better-formed ones \((\hbox {e.g}., { lbif} \rightarrow { lebif})\), and consequently, their onsets will be systematically misidentified.

Results indeed show that identification of unattested onsets is monotonically related to their sonority distance: the smaller the distance, the more likely its repair and misidentification (e.g., Berent et al. 2007, 2008, 2012a, b). Additional analyses suggest that misidentification is not solely due to failures to encode acoustic input, as the misidentification of ill-formed onsets persists even for printed materials (Berent et al. 2009; Berent and Lennertz 2010; Tamasi and Berent 2014). Other results speak against an articulatory motor explanation, namely the possibility that people misidentify ill-formed syllables because they have difficulties in tacitly generating their motor plan. Contrary to the motor account, ill-formed syllables disengage the articulatory system (Berent et al. 2014), and their misidentification persists even when the articulatory motor system is suppressed by Transcranial Magnetic Stimulation (Berent et al. 2015). Moreover, the results obtain with speakers of various languages (English: Berent et al. 2007; French: Maïonchi-Pino et al. 2012; Korean: Berent et al. 2008; Spanish: Berent et al. 2012a, b) and despite minimal linguistic and articulatory experience, in the brains of neonates (Gómez et al. 2014). These results are in line with the possibility that speakers share universal restrictions on syllable structure.

These conclusions, however, remain controversial. One line of criticism questions the utility of sonority as a linguistic construct (Parker 2012). Some researchers have argued that the phonological concept of sonority is either grounded (Henke et al. 2012) or even subsumed by the phonetic system (Davidson 2010, 2011; Davidson and Shaw 2012). In this latter view, speakers do not systematically favor onsets with large sonority distances; rather, their behavioral preferences reflect difficulties in the phonetic encoding of the acoustic input. In line with this possibility, Davidson (2010) showed that adult English and Catalan speakers both fail to exhibit the expected preference for large sonority distances, and instead, their responses were highly sensitive to the phonetic cues of burst release (see also Wilson and Davidson 2013; Wilson et al. 2014). Other researchers assert that syllable preferences are induced from experience, as phonotactic learning models can exhibit the onset hierarchy despite having no innate constraints on syllable structure (Daland et al. 2011; Hayes 2011).

This controversy raises two questions. First, do adult speakers systematically favor onsets with large sonority distances despite no experience with such structures? Second, are such putative preferences due to phonological or phonetic factors?

To address these questions, one might turn to cluster-poor languages. Korean presents one such example. Although Korean arguably lacks onset clusters altogether, its speakers are demonstrably sensitive to the onset hierarchy (Berent et al. 2008). Korean, however, manifests many clusters across syllables (kp/kt/lb/lg/lk/lp/lt/md/mg/mk/mt/pk/pt/tk/tp; Kabak and Idsardi 2007), so it is conceivable that these clusters could have informed participants’ preferences for large sonority clines.

Mandarin Chinese provides an even stronger test case. Mandarin tolerates only consonant-glide onsets (e.g., obstruent-glide: byan, “change”; nasal-glide: nwan, “warm”), and across syllables, it exhibits only clusters beginning with a nasal consonant (e.g., mande, “slow”; maŋlu, “busy”; Duanmu 1990, 2007, 2011; Wang and Chang 2001). Accordingly, the cluster inventory of Mandarin is even smaller than that of Korean. While there is some evidence that Mandarin speakers obey a portion of the onset hierarchy (e.g., \(bl \succ lb\), Ren et al. 2010), this investigation of the onset hierarchy is incomplete, and the findings do not distinguish between phonetic and phonological reasons for this preference.

Our current study examines these issues. We proceed in two steps. First, we ask whether Mandarin speakers are sensitive to the full onset hierarchy \((bl \succ bn \succ bd \succ lb)\). We reason that if onsets with small sonority distances are dispreferred, then such onsets should be repaired as better-formed structures \((\hbox {e.g}., { lbif} \rightarrow { lebif})\): the smaller the distance, the more likely the repair. Consequently, as sonority distance decreases, identification of such monosyllables will be slower and more error-prone.

Insofar as sensitivity to the onset hierarchy is found, we can next examine whether this effect reflects shared grammatical restrictions or alternative nongrammatical factors—either the phonetic properties of our acoustic stimuli, or the (limited) familiarity of our participants with their second language (English). For comparison with our past findings, we also included English-speaking controls.

Experiment 1: Syllable Judgment

Experiment 1 examined the linguistic preferences of Mandarin speakers using a syllable judgment task. In each trial, participants heard a single nonword (either a monosyllable or a disyllable), and they were asked to classify it as “short” or “long”—a proxy for the syllable-count procedure used in past research in English (e.g., does lbif have one syllable or two; Berent et al. 2007).

The change in procedure was introduced in light of the syllable structure of Mandarin. Unlike English, Mandarin bans not only complex onsets but also obstruent codas. Since our monosyllables violate both constraints, they might be repaired at two sites, yielding outputs with three syllables \((\hbox {e.g}., { lbif \rightarrow l}{{\mathbf {e}}.}{ bi.f}{{\mathbf {e}}})\), rather than two \((\hbox {e.g}., { lbif} \rightarrow l{{\mathbf {e}}}{ .bif})\). Although it is unknown whether Mandarin speakers represent our monosyllables (e.g., blif) as having two or three syllables, clearly, such items should be shorter than their disyllabic counterparts (e.g., lebif). Accordingly, we asked participants to judge each stimulus as “short” or “long”, rather than as having one or two syllables. Our main interest is whether onset structure affects the classification of these monosyllables. If Mandarin speakers encode the onset hierarchy \(({ blif} \succ { bnif} \succ { bdif} \succ { lbif})\), then as sonority distance decreases, the likelihood of repair should increase, and consequently, monosyllables will be more likely to elicit “long” responses.

Methods

Participants

Sixteen college students, all native Mandarin speakers, participated in this experiment. Another group of 16 native English-speaking students of Northeastern University served as controls. Participants received either $10 or course credit for their participation.

Materials

The materials consisted of pairs of monosyllables and their matched disyllables described in Berent et al. (2007). Briefly, monosyllables were arranged in quartets, whose onsets exhibited either large rises, small rises, plateaus or falls in sonority (e.g., blif, bnif, bdif, lbif, respectively, see Appendix). Disyllables differed from monosyllables by a schwa \((\hbox {e.g.}, b{{{\underline{\varvec{e}}}}} { lif}, b {{{\underline{\varvec{e}}}}} { nif}, b {{{\underline{\varvec{e}}}}} { dif}, l {{{\underline{\varvec{e}}}}}{ bif})\). In total, 240 items (2 syllable \(\times \) 4 type \(\times \) 30 quartets) were included, and they were all recorded by a native Russian speaker (since Russian allows all four syllable types, these items can be produced naturally by the Russian speaker).
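The factorial structure of the materials can be sketched as follows. This is only an illustration of the design counts and of the orthographic relationship between a monosyllable and its matched disyllable; the actual disyllables were recorded by the talker, not derived by string manipulation, and all names below are hypothetical.

```python
from itertools import product

SYLLABLE_LEVELS = ("monosyllable", "disyllable")
ONSET_TYPES = ("large rise", "small rise", "plateau", "fall")
N_QUARTETS = 30


def insert_schwa(monosyllable):
    """Spell the matched disyllable: a schwa (written "e") after the first consonant."""
    return monosyllable[0] + "e" + monosyllable[1:]


# One cell per quartet x onset type x syllable level.
design_cells = list(product(range(N_QUARTETS), ONSET_TYPES, SYLLABLE_LEVELS))

# The example quartet from the text and its matched disyllables.
example_quartet = dict(zip(ONSET_TYPES, ("blif", "bnif", "bdif", "lbif")))
example_disyllables = {
    onset: insert_schwa(item) for onset, item in example_quartet.items()
}
```

Enumerating the cells this way recovers the 240 items reported above (2 × 4 × 30).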

Procedure

After pressing the spacebar, participants heard a single item. They were instructed to quickly indicate whether it was short or long by pressing the appropriate key \((1=\hbox {short}; 2=\hbox {long})\). Slow responses (response time over 2500 ms) triggered a computerized warning message (“Too Slow!”). Trial order was randomized. All interactions with Mandarin participants in Experiments 1–2 were conducted in Mandarin.

Results and Discussion

Figure 1 plots the sensitivity scores (d-prime) of Mandarin speakers, along with English-speaking controls. An inspection of the means suggested that English speakers were sensitive to the onset hierarchy. In contrast, no such effects were evident for Mandarin participants. Consequently, the 2 group \(\times \) 4 type ANOVAs by participants \((\hbox {R}^{2}= 0.473)\) and items \((\hbox {R}^{2}=0.434)\) yielded reliable interactions \((F1(3,90)=13.61, p<0.0001,\eta ^{2}=0.114; F2(3,174)=14.44,p<0.0001,\eta ^{2}=0.108)\). The simple main effect of onset type was significant only for English speakers \((F1(3,45)=24.87, p<0.0001,\eta ^{2}=0.457; F2(3,87)=26.94,p<0.0001,\eta ^{2}=0.416\); for Mandarin speakers, \(F1(3,45)=1.50,p=0.23,\eta ^{2} =0.040; F2(3,87)=1.95,p=0.13,\eta ^{2}=0.050)\).

Planned comparisons revealed that onsets with large sonority rises (e.g., blif) elicited greater sensitivity than small rises \((\hbox {e.g.}, { bnif}, t1(45)=4.01,p<0.0003,d=1.73; t2(87)=4.40,p<0.0001,d=1.35)\), which, in turn, elicited greater sensitivity than plateaus \((\hbox {e.g}., { bdif}, t1(45)=3.10,p<0.004,d=1.34; t2(87)=2.79,p<0.007,d=0.85)\). Sensitivity to onsets of level and falling sonority (e.g., bdif vs. lbif) did not differ significantly \(( t1(45)=0.59,p=0.56,d=0.25; t2(87)=1.03,p=0.31,d=0.31)\).

Fig. 1 The sensitivity (d-prime) of Mandarin and English speakers to sonority distance in Experiment 1. Note: error bars indicate 95% confidence intervals for the difference between the means
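For readers less familiar with signal detection measures, the following is a minimal sketch of how a d-prime score can be computed from response counts. Two points are assumptions made for illustration, since the text does not specify them: a "short" response to a monosyllable is treated as a hit and a "short" response to a disyllable as a false alarm, and a log-linear correction (adding 0.5 to each count) is applied to avoid infinite scores at rates of 0 or 1.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate).

    Assumption: the log-linear correction below is one common choice,
    not necessarily the one used in the study.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A score of zero indicates that monosyllables and disyllables are not distinguished at all; larger positive scores indicate better discrimination.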

Similar 2 syllable \(\times \) 4 type ANOVAs of correct response time did not yield significant interactions for either Mandarin \((\hbox {both}\, p>0.19)\) or English \((\hbox {both}\, p>0.27)\) participants.

Taken at face value, these results indicate that Mandarin speakers do not represent the onset hierarchy. But on an alternative account, this hierarchy might be represented, but its effect is masked by phonetic factors. For example, Mandarin speakers might fail to encode the onset because they confuse the burst release of stop consonants with an epenthetic vowel (Kang 2003; Wilson and Davidson 2013; Wilson et al. 2014). Such phonetic ambiguities could have been exacerbated in the syllable judgment task because monosyllables were presented in isolation. If this explanation is correct, then these difficulties might be alleviated in a discrimination task that contrasts monosyllables with disyllables. Since monosyllables and disyllables are matched for the initial consonant (e.g., blif vs. belif), their phonetic properties (e.g., the presence of a burst) are similar. The explicit comparison of such matched pairs might help participants ignore those irrelevant nondistinctive phonetic cues, and focus on their contrastive phonological structure.

Experiment 2: AX Identity Judgment

In Experiment 2, participants heard two items, either identical tokens (e.g., blif-blif; belif-belif) or epenthetically related items (e.g., blif-belif), and were asked to determine whether the two items were identical. Since pair members share the same initial consonant, and they are presented in close proximity, their contrastive phonological structure might now become more salient to participants. If speakers are sensitive to onset structure, then worse formed monosyllables should be more likely to be recoded as their disyllabic counterparts \((\hbox {e.g.}, { lbif} \rightarrow { lebif})\). Consequently, as sonority distance decreases, misidentification rate should increase.

Methods

Participants

Two additional groups of Mandarin- (N = 16) and English-speaking (N = 16) participants took part in the experiment. All were college students, and received either $10 or course credit.

Materials

The same materials from Experiment 1 were used, except that they were presented in pairs. Half of the pairs were physically identical (e.g., monosyllabic: blif-blif; disyllabic: belif-belif), whereas the other half were nonidentical (e.g., blif-belif; belif-blif, with order counterbalanced).
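The pairing scheme can be sketched as below. The function name is hypothetical, and the sketch simply lists all four pair types for one matched item; how pairs were distributed across lists or participants is not specified in the text and is not modeled here.

```python
def ax_pairs(monosyllable, disyllable):
    """Return (first item, second item, is_identical) tuples for one matched pair."""
    return [
        (monosyllable, monosyllable, True),   # identical, monosyllabic
        (disyllable, disyllable, True),       # identical, disyllabic
        (monosyllable, disyllable, False),    # nonidentical, monosyllable first
        (disyllable, monosyllable, False),    # nonidentical, disyllable first
    ]


trials = ax_pairs("blif", "belif")
```

By construction, half of the resulting pairs are identical and half are nonidentical, with both presentation orders represented among the nonidentical pairs.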

Procedure

After pressing the spacebar, participants heard a pair of nonwords (ISI = 1500 ms). They were instructed to quickly indicate whether those nonwords were identical by pressing a computer key (1 = identical, 2 = nonidentical). Slow responses (response time over 2500 ms) triggered a computerized warning message (“Too Slow!”).

Results and Discussion

Figure 2a plots the sensitivity (d’) of Mandarin and English participants to syllable structure; the effect of syllable structure on correct response time (RT) to nonidentical trials (e.g., lbif-lebif) is provided in Fig. 2b. An inspection of the means suggests that both groups were now sensitive to the onset hierarchy: as the onset became worse formed, sensitivity declined, and response time to nonidentical trials increased.

Fig. 2 Sensitivity (d-prime, panel a) and correct response time to nonidentical trials (RT, panel b) as a function of sonority distance in Experiment 2. Note: error bars indicate 95% confidence intervals for the difference between the means

The 2 group \(\times \) 4 type ANOVAs on response time yielded a significant main effect of onset type \((F1(3,87)=17.36,p<0.001, \eta ^{2}=0.089; F2(3,171)=13.89,p<0.001, \eta ^{2}=0.128)\), which was not further modulated by the group factor \((\hbox {both}\, p>0.35)\). Similar ANOVAs (2 group \(\times \) 4 type) conducted on sensitivity yielded significant interactions \(( F1(3,90)=9.44,p<0.0001,\eta ^{2}=0.038,\hbox {R}^{2 }=0.386; F2(3,174)=2.88,p<0.04,\eta ^{2}=0.022,\hbox {R}^{2}=0.331)\). However, the simple main effect of onset type was significant for both English \((F1(3,45)=52.41,p<0.0001,\eta ^{2}=0.448; F2(3,87)=35.90,p<0.0001,\eta ^{2}=0.436; \hbox {RT}: F1(3,42)=7.25,p<0.0006,\eta ^{2}=0.132; F2(3,84)=8.77,p<0.0001,\eta ^{2}=0.182)\) and Mandarin participants \((F1(3,45)=40.16,p<0.0001,\eta ^{2} =0.301; F2(3,87)=13.41,p<0.0001,\eta ^{2}=0.238; \hbox {RT}: F1(3,45)=10.58,p<0.0001,\eta ^{2} =0.084; F2(3,87)=7.18,p<0.0003,\eta ^{2}=0.148)\).

We next examined the effect of sonority distance on the performance of English and Mandarin speakers separately, using planned contrasts. Considering first the English participants, we found that onsets with large sonority rises elicited significantly greater sensitivity than small rises \((t1(45)=4.69,p<0.0001,d=2.02; t2(87)=3.37,p<0.002,d=1.04; \hbox {RT: both}\, p>0.47)\), which, in turn, yielded significantly greater sensitivity than plateaus \(( t1(45)=4.99,p<0.0001,d=2.15; t2(87)=4.65,p<0.0001,d=1.43; \hbox {RT: both}\, p>0.28)\). The sensitivity to sonority plateaus and falls did not differ reliably \((t1(45)=1.61, p=0.11, d=0.69; t2(87)=1.11, p=0.27, d=0.34)\), but plateaus did produce significantly faster responses than falls \((\hbox {RT}: t1(42)=3.33,p<0.002,d=1.49; t2(84)=3.14,p<0.003,d=0.98)\). Thus, as the sonority cline of the monosyllable decreased, English speakers experienced greater difficulty in its discrimination from its disyllabic counterpart.

Crucially, our Mandarin participants were likewise sensitive to onset structure. Onsets with small rises elicited significantly higher sensitivity \((t1(45)=6.89,p<0.0001,d=2.97; t2(87)=4.01,p<0.0002,d=1.23)\) and faster responses \(( t1(45)=2.25,p<0.03,d=0.97; t2(87)=2.70,p<0.009,d=0.83)\) compared to plateaus. Plateaus, in turn, elicited greater sensitivity \(( t1(45)=2.76,p<0.009,d=1.19; t2(87)=1.43,p=0.16,d=0.44)\) and faster responses \((t1(45)=3.00,p<0.005,d=1.29; t2(87)=1.47,p=0.146,d=0.45)\) than sonority falls, a trend significant across participants only. Responses to onsets with large and small rises did not differ reliably (sensitivity: both \(p>0.10\); RT: both \(p>0.38\)).

Together, these results suggest that as sonority distance decreased, Mandarin and English participants tended to misidentify monosyllables as their disyllabic counterparts. Unlike English participants, however, Mandarin speakers were insensitive to the contrast between the large and small sonority rises \((\hbox {e.g}., bl \succ bn)\), possibly because they interpret \(l\) and \(n\) as interchangeable (due to the wide productivity of the nasalization and de-nasalization processes in Mandarin; Chen 1972). For most of the onset hierarchy, however, Mandarin and English speakers showed similar sensitivity to the structure of the onsets that they have never heard before, and their behavior mirrored the onset typology.

General Discussion

This study investigated whether Mandarin speakers are sensitive to the putatively universal hierarchy of onset clusters. The results of Experiments 1–2 provided conflicting answers to this question. While Experiment 1 found no effects of onset structure among Mandarin speakers (using the syllable judgment task), onset structure did modulate responses in Experiment 2 (AX discrimination).

These results raise two fundamental questions. First, are Mandarin speakers sensitive to the structure of complex onsets? To the extent that they are, we can next ask what the source of their sensitivity is: whether it reflects putatively universal phonological restrictions, or whether these findings can be captured by alternative explanations, either the phonetic properties of our materials or the linguistic experience of our participants with both their native language (Mandarin) and their second language (English). Our discussion considers each of these questions in turn.

Are Mandarin Speakers Sensitive to the Onset Hierarchy?

The results of our two experiments yielded different outcomes with respect to Mandarin speakers’ sensitivity to the onset hierarchy. While Experiment 2 showed that ill-formed syllables were generally harder for Mandarin speakers to identify, Experiment 1 found no effect of syllable structure.

Why do the results of the two experiments diverge? One possibility is that the divergence reflects inherent limitations of the syllable judgment task (in Experiment 1) relative to the AX discrimination task (in Experiment 2). We believe this explanation is unlikely, as past research provides ample evidence that the syllable judgment task is highly sensitive to the onset hierarchy (Berent et al. 2007, 2008, 2012a, b, 2015), and the present results from English speakers further bolster this claim. We thus suspect that the divergent outcomes of Experiments 1 and 2 reflect not inherent task artifacts, but rather the interaction between properties of the task and systematic characteristics of Mandarin.

The syllable judgment task presents a special challenge to Mandarin participants because it elicits judgment of unfamiliar stimuli presented in isolation. While the interpretation of unfamiliar isolated words, uttered by a speaker of a foreign language, is always difficult, the linguistic properties of Mandarin might render this task especially challenging. Mandarin exhibits vowel devoicing, a process that renders the vowel (specifically, non-low vowels with low tones) inaudible after an aspirated consonant (Duanmu 2007). Applying this knowledge to our experimental materials, Mandarin speakers might conclude that onsets beginning with stop consonants (whose burst release resembles the aperiodic energy characteristic of aspiration) are followed by an inaudible vowel. Accordingly, stop-consonant onsets (e.g., blif) might be misinterpreted as ones including an intermediate schwa (e.g., belif). And since this analysis will apply to all stops, regardless of onset structure, the effect of sonority will be greatly attenuated.

The AX task might allow participants to overcome this phonetic challenge. Because this task pairs monosyllables with their disyllabic counterparts (e.g., blif-belif), Mandarin participants could now disregard the irrelevant phonetic cues shared by the two stops, and focus on the relevant phonological distinction.

To determine whether the burst release caused misidentification, we submitted the results of both experiments to several step-wise regression analyses, using sonority distance and burst properties (intensity and duration) as two predictors. Our first set of analyses examined whether participants were, in fact, sensitive to phonetic properties of the burst, and whether its salience was greater in Experiment 1. Next, we asked whether participants remained sensitive to the onset hierarchy once the properties of the burst were statistically controlled. Results are presented in Table 1; the proportion of unique variance associated with the burst and sonority distance is plotted in Fig. 3.

Fig. 3 The proportion of unique variance \((\hbox {R}^{2}_\mathrm{change})\) associated with (a) phonetic factors (burst intensity and duration) and (b) sonority distance in stepwise regression analyses of responses to monosyllables in Experiment 1 and responses to nonidentical trials in Experiment 2

Table 1 The unique effects of (a) the phonetic properties of the burst (intensity and duration); and (b) sonority distance in stepwise regression analyses of response accuracy in Experiment 1 and response accuracy to nonidentical items in Experiment 2

To test speakers’ sensitivity to the burst, we first forced its duration and intensity (together) as the last predictor; the effect of sonority was entered in the first step. Results revealed that Mandarin speakers were highly sensitive to the salience of the burst, and the unique effect of the burst \((\hbox {R}^{2}_\mathrm{change}=.17)\) was far larger than that for English speakers \((\hbox {R}^{2}_\mathrm{change}=.03)\). Moreover, while English speakers showed comparable sensitivity to the burst across our two experiments, for Mandarin participants, the size of this effect in Experiment 1 \((\hbox {R}^{2}_\mathrm{change}=.17)\) was roughly twice its size in Experiment 2 \((\hbox {R}^{2}_\mathrm{change}=.08)\). This result is in line with our assertion that the syllable judgment task might have presented Mandarin speakers with greater phonetic difficulties.

Given that Mandarin speakers are especially sensitive to the phonetic properties of the burst, we next asked whether they are sensitive to the phonological structure of the onset.

To address this question, we repeated the regression analyses while reversing the order of predictors—the phonetic properties of the burst were entered first, whereas the effect of onset structure was entered last.

The analysis of Experiment 1 yielded no effect of onset structure (see Table 1; Fig. 3), but results from Experiment 2 showed the effect of onset structure remained significant, even after the phonetic properties of the burst were controlled. Moreover, this effect was found for both Mandarin and English speakers.
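The logic of forcing a predictor into the last step can be illustrated with a minimal sketch of the change in R² between two nested ordinary least squares fits. This is an assumption-laden illustration, not the analysis code of the study: it uses a plain linear fit solved through the normal equations, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, y):
    """R^2 of an OLS fit of y on the predictor columns plus an intercept."""
    rows = [[1.0, *values] for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, fitted))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def r2_change(first_step, last_step, y):
    """Unique variance of last_step: R^2(full model) - R^2(first_step only)."""
    return r_squared(first_step + last_step, y) - r_squared(first_step, y)
```

With hypothetical per-item columns, `r2_change([sonority], [burst_intensity, burst_duration], accuracy)` would estimate the unique contribution of the burst, and `r2_change([burst_intensity, burst_duration], [sonority], accuracy)` the unique contribution of sonority distance once the burst is controlled.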

These findings confirm that Mandarin participants were sensitive to the structure of the onset in the AX discrimination task. Nonetheless, speakers of Mandarin were also acutely sensitive to the burst release, and this effect was especially notable in Experiment 1, when the stimuli were presented in isolation (for syllable judgment). The difficulty of Mandarin speakers in the phonetic parsing of isolated stimuli explains the null effects of onset structure in Experiment 1, and their emergence in Experiment 2. Together, these results suggest that Mandarin speakers are in fact sensitive to the phonological structure of unattested onsets, but, when unfamiliar syllables are judged in isolation, this effect can be masked by their heightened sensitivity to phonetic properties.

Why are Mandarin Speakers Sensitive to the Onset Hierarchy?

Why are Mandarin speakers sensitive to onset structure? Earlier, we considered the hypothesis that speakers of all languages might share universal linguistic constraints that disfavor onsets with small sonority distances. The sensitivity of Mandarin speakers to the onset hierarchy is consistent with this possibility. But on an alternative account, the behavior of Mandarin speakers could be guided by their prior linguistic experience—either their experience with their native language, Mandarin, or their experience of English as a second language. We examined these two possibilities in turn.

The Role of Linguistic Experience with Mandarin

Although the complex onsets presented in our experiments are all unattested in Mandarin, it is conceivable that Mandarin speakers might rely on knowledge of their native language. One concern is that ill-formed onsets are misidentified because they include phonemes that are unattested in Mandarin, and these sounds are thus misidentified. Although our experiments do not allow us to determine how Mandarin participants interpreted our stimuli, we can nonetheless ask whether responses to items whose phonemes (as intended by the Russian talker) are unattested in Mandarin differ from those whose phonemes exist in Mandarin.

Six of our item quartets included at least one member with a nonnative Mandarin phoneme (either the fricative ʃ or the vowel ʊ). To examine the effect of these nonnative phonemes, we compared responses to items whose phonemes are all native to Mandarin with those with nonnative phonemes. If the difficulty with ill-formed onsets reflects unfamiliarity with nonnative phonemes, then the effect of onset structure of monosyllables should be modulated by the status of the phoneme in Mandarin (i.e., native vs. nonnative). However, the 2 phoneme type \(\times \) 2 syllable \(\times \) 4 onset type ANOVAs conducted on the accuracy data in Experiment 1 yielded no three-way interactions \((\hbox {both}\, p>0.85)\). Similar analyses on the response accuracy to nonidentical items, presented in Experiment 2, likewise found no hint of such interactions \((\hbox {both}\, p>0.74)\). In addition, we noted that response accuracy to identical items with nonnative phonemes in Experiment 2 was nearly perfect (Mean = 0.97). These results suggest that the sensitivity of Mandarin participants to syllable structure is unlikely to be due to the misinterpretation of phonemes that are nonnative to their native language.

Another concern might attribute our results to familiarity with the consonant clusters occurring in Mandarin. In this view, onsets with small sonority distances are disliked because they are underrepresented in Mandarin. As noted earlier, however, the inventory of consonant clusters in Mandarin is highly impoverished. The only consonant sequences attested in Mandarin onsets consist of consonant-glide (CG) combinations (Duanmu 2007, 2008, 2011), and there is evidence that this sequence forms a complex segment, rather than an onset cluster (Duanmu 2008). The only other source of evidence concerning consonant clusters obtains from hetero-syllabic clusters. But since Mandarin codas are confined to nasals (i.e., [n] and [ŋ]), those hetero-syllabic clusters are restricted to nasal-consonant combinations.

Clearly, then, the relevant consonant clusters attested in Mandarin account for a very small subset of the onsets presented in our experiments. While we cannot rule out the possibility that experience with those instances might inform the linguistic preferences of Mandarin participants in our experiments, the relevant linguistic evidence available to them is extremely limited, and it is clearly far more restricted than in any other adult population examined in previous studies of the onset hierarchy (e.g., English, Berent et al. 2007; French, Maïonchi-Pino et al. 2012; Spanish, Berent et al. 2012a, b).

The Role of Familiarity with English

A far more pressing explanation for the sensitivity of our Mandarin participants to the onset hierarchy concerns their second language—English. While our design sought to minimize such effects by selecting participants from English-remediation classes and conducting the experiments in a Mandarin linguistic environment, it is still possible that the familiarity of these participants with English could account for the results.

To evaluate this possibility, we first assessed the familiarity of our participants with English by means of a survey, administered to all Mandarin participants. According to their reports, participants were all born in Mainland China, and they identified Mandarin Chinese as their native and dominant language. Regarding their English proficiency (see Table 2), they began learning English in mid childhood, arrived at an English-speaking country in adulthood and resided there for a brief time (less than 2 years). Their low performance on the TOEFL-iBT test (Test of English as a Foreign Language-internet Based Test, a standardized English language proficiency test developed and administered by Educational Testing Service) further suggests that these participants had only weak English proficiency. Of all 32 Mandarin participants, only 2 reported speaking another language fluently (reported by themselves as “I am fully comfortable with comprehending, speaking and writing the language”) other than English, and the two languages mentioned (Cantonese, Japanese) have a cluster inventory that is more restricted than English.

Table 2 English proficiency and exposure measures of the Mandarin participants in Experiments 1–2

We next asked whether English proficiency and exposure modulated participants’ sensitivity to onset structure. To quantify the sensitivity of individual participants to the onset hierarchy, we submitted their performance to a Mixed Effects logistic regression model, and obtained the slope associated with the simple effect of onset type for each participant in Experiments 1–2. The slope provides an index of individual participants’ sensitivity to sonority: a negative slope indicates that discriminability decreases as onsets become worse formed, and the steeper the slope, the more sensitive the participant is to the entire onset hierarchy.
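As a rough illustration of what such a slope index captures, the sketch below fits an ordinary least-squares line to one participant's scores across the four onset types. This is a deliberate simplification and not the Mixed Effects logistic regression reported here; the numeric coding of onset types (`ONSET_CODES`), the function name `onset_slope`, and the scores are all hypothetical.

```python
from statistics import mean

# Hypothetical numeric coding of the onset hierarchy, from best to worst formed.
ONSET_CODES = {"large_rise": 0, "small_rise": 1, "plateau": 2, "fall": 3}


def onset_slope(scores_by_onset):
    """Least-squares slope of a score (e.g., d' or accuracy) over onset type."""
    xs = [ONSET_CODES[onset] for onset in scores_by_onset]
    ys = list(scores_by_onset.values())
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Invented scores for a single participant (not data from the study).
participant = {"large_rise": 0.90, "small_rise": 0.80, "plateau": 0.70, "fall": 0.60}
slope = onset_slope(participant)
print(round(slope, 3))  # a negative value: scores fall as onsets worsen
```

Under this coding, a more negative slope corresponds to a sharper drop in performance from large rises to falls.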

We obtained two estimates of this slope: one based on the d’ measure, and another based on the accuracy scores to either monosyllables (in Experiment 1) or nonidentical trials (in Experiment 2). For Experiment 2, we calculated the slope separately for the two presentation orders (e.g., blif-belif; belif-blif) because it is conceivable that holding ill-formed onsets in memory (when presented first) might exert greater demands, and consequently, enhance the sensitivity to onset structure. Of interest is whether participants’ sensitivity to the onset hierarchy correlates with their English exposure or proficiency. To address this question, we correlated each participant’s slope with the various measurements of English proficiency and exposure.
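The correlation step can be sketched as a Pearson correlation between per-participant slopes and one exposure measure. The source does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the values below are invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented values: one slope and one exposure measure per participant.
slopes = [-0.02, -0.05, -0.08, -0.12]
months_in_us = [2.0, 6.0, 10.0, 18.0]

r = pearson_r(slopes, months_in_us)
print(r < 0)  # longer stays pair with steeper (more negative) slopes here
```

Because greater sensitivity is indexed by a more negative slope, a negative correlation with an exposure measure would indicate that sensitivity increases with exposure.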

Results (see Table 3) showed that Time in US was associated with greater sensitivity to onset structure (i.e., negative slope). In Experiment 1, this association obtained for both the d’ and accuracy scores; in Experiment 2, the correlation with exposure was only significant when monosyllables had to be maintained in memory (e.g., blif-belif). Thus, the longer participants had been in the US, the more sensitive they were to onset structure.

Table 3 Correlations between English familiarity and exposure and the slope of the effect of onset structure for Mandarin participants

These findings are open to two distinct interpretations. One is that experience with English allows participants to induce the onset hierarchy (i.e., phonological learning). Alternatively, experience does not result in phonological learning of the onset hierarchy itself. Rather, experience boosts sensitivity to syllable structure because it helps participants extract the surface phonetic form of the acoustic input (i.e., phonetic learning). Indeed, the onset hierarchy is only relevant if the surface phonological form extracted by participants specifies a complex onset. But as noted earlier, Mandarin speakers tend to systematically misinterpret stop-consonant onsets for phonetic reasons (i.e., the salience of the burst release), informed by their knowledge of Mandarin (specifically, the process of vowel-devoicing). Experience with English might help participants overcome this bias by informing the phonetic interpretation of the acoustic input. Note that, while the first account (phonological learning) challenges the possibility that the onset hierarchy is a universal phonological constraint, the second does not; experience, in this view, is only necessary for rendering such universal constraints applicable.

To adjudicate between these explanations, we asked whether the familiarity of our participants with English (as determined by their length of stay in the US) affects their sensitivity to the phonological structure of the onset, or the phonetic properties of stop consonants. We approached this question in two steps. First, we performed a median split on our participants’ Time in US, dividing them into those with a short stay (Recent Arrivals) and those with a long stay (Earlier Arrivals). The characteristics of the two groups are provided in Table 4.
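A median split of this kind can be sketched as follows. The participant identifiers and durations are invented, and the handling of values that fall exactly at the median (assigned here to the Recent group) is an assumption, since the text does not specify it.

```python
from statistics import median


def median_split(time_in_us):
    """Label each participant 'Recent' or 'Earlier' by Time in US (months).

    Assumption: participants exactly at the median are labeled 'Recent'.
    """
    cutoff = median(time_in_us.values())
    return {
        pid: ("Earlier" if months > cutoff else "Recent")
        for pid, months in time_in_us.items()
    }


# Invented durations of stay, in months (not data from the study).
stay = {"p1": 3.0, "p2": 5.0, "p3": 14.0, "p4": 20.0}
groups = median_split(stay)
print(groups)
```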

Table 4 English familiarity and exposure for the Mandarin participants in Experiments 1–2 as a function of the date of their arrival in the US (Recent vs. Earlier)

We next gauged the sensitivity of each group to phonetic factors and sonority distance via step-wise regression analyses. As in previous analyses (“Are Mandarin Speakers Sensitive to the Onset Hierarchy?” section), we first examined participants’ sensitivity to phonetic cues (i.e., burst intensity and duration) by forcing this factor last into the regression model; to examine the unique effect of onset structure, we next reversed the order of the predictors. In all analyses, the dependent measure was response accuracy, either response to monosyllables (in Experiment 1) or to monosyllable-disyllable sequences (e.g., blif-belif, in Experiment 2). If familiarity with English promotes the induction of the onset hierarchy from experience (i.e., phonological learning), then sensitivity to the hierarchy should be confined to Earlier-arrivals. The findings are provided in Table 5; the proportion of the unique variance associated with the burst and onset structure is graphically depicted in Fig. 4.
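The logic of forcing a predictor last can be illustrated with a minimal sketch: the unique contribution of a predictor set is the gain in R² when it is added to a model that already contains the other predictors. The code below is an illustrative reconstruction under that assumption, not the analysis script used in the study; the function names and the item-level values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def r_squared(predictors, y):
    """R^2 of a least-squares fit of y on the predictor columns plus intercept."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in rows]
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def r2_change(first, last, y):
    """Unique variance of the predictors in `last`, entered after `first`."""
    return r_squared(first + last, y) - r_squared(first, y)


# Invented item-level values (not data from the study).
burst_intensity = [60.0, 62.0, 58.0, 65.0, 61.0, 59.0, 63.0, 64.0]
burst_duration = [20.0, 25.0, 18.0, 30.0, 22.0, 19.0, 27.0, 24.0]
sonority_distance = [3.0, 2.0, 0.0, -3.0, 3.0, 2.0, 0.0, -3.0]
accuracy = [0.90, 0.82, 0.70, 0.45, 0.88, 0.80, 0.66, 0.50]

phonetic = [burst_intensity, burst_duration]
onset = [sonority_distance]

# Phonetic cues forced last, then the order reversed for onset structure.
unique_phonetic = r2_change(onset, phonetic, accuracy)
unique_onset = r2_change(phonetic, onset, accuracy)
print(unique_phonetic, unique_onset)
```

Running the same two calls separately on the data of each arrival group would yield one pair of R² change values per group, which is the kind of comparison summarized in Table 5.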

Fig. 4 The proportion of unique variance \((\hbox {R}^{2}_\mathrm{change})\) associated with (a) phonetic factors (burst intensity and duration) and (b) sonority distance in the performance of Recent vs. Earlier arrival groups in stepwise regression analyses of Experiment 1 and Experiment 2. Note: Data from Experiment 1 captures response accuracy to monosyllables; data from Experiment 2 captures responses to nonidentical monosyllable-disyllable trials (e.g., blif-belif)

Table 5 Stepwise regression analyses of the performance of the recent- and earlier-arrival groups, using forced entries of predictors

Considering first the sensitivity to phonetic factors, we found that, regardless of the length of stay, phonetic factors uniquely captured the behavior of Mandarin speakers in both experiments. While in Experiment 2, the unique effect of phonetic cues was comparable in magnitude for Earlier \((\hbox {R}^{2}_\mathrm{change}=0.089)\) and Recent-arrivals \((\hbox {R}^{2}_\mathrm{change}=0.072)\), Experiment 1 showed a numerically larger effect of phonetic cues for Earlier-arrivals \((\hbox {R}^{2}_\mathrm{change}=0.137)\), compared to Recent-arrivals \((\hbox {R}^{2}_\mathrm{change}=0.088)\). Crucially, once the effect of phonetic factors was statistically controlled, both groups showed significant unique effects of onset structure in each of the two experiments. In fact, the unique effect of onset structure tended to be larger for Recent- relative to Earlier-arrivals in both Experiment 1 (0.09 vs. 0.062) and Experiment 2 (0.141 vs. 0.015).

Summarizing, our analyses confirm that Mandarin speakers are acutely sensitive to the phonetic properties of the acoustic materials, and that this sensitivity correlates with their exposure to English: the longer their exposure, the stronger their phonetic sensitivity. Whether the enhanced sensitivity of Earlier-arrivals to phonetic cues reflects gains in extracting the phonetic representation of stop consonants, specifically, or in other aspects of phonetic processing that happen to correlate with the sensitivity to stops (e.g., learning the phonetic form of English schwas) is not entirely clear from these results. Nonetheless, exposure to English is clearly associated with enhanced sensitivity to phonetic cues. By contrast, we found no evidence for phonological learning of the onset hierarchy, as the unique effect of onset structure was present even among participants who had minimal experience with English (as little as 3.25 months, for the Recent-arrivals in Experiment 2), and its size was, in fact, numerically larger than in the Earlier-arriving group. These results ought to be interpreted with caution, as our adult Mandarin participants had significant linguistic experience, and they all had some familiarity with English. As such, these findings cannot rule out the role of phonological induction. Nonetheless, the present analyses demonstrate that sensitivity to onset structure is present for speakers of Mandarin, and they provide no evidence that their performance was informed by their experience with either Mandarin or English.

Conclusion

Our findings outline an intriguing convergence between onset structural preferences across languages and the behavior of Mandarin speakers. Across languages, onsets with small sonority distances are systematically dispreferred; our results demonstrate that Mandarin speakers exhibit similar preferences despite minimal experience with consonant clusters of any kind. While the emergence of this sensitivity was modulated by task demands and the phonetic properties of our stimuli, auxiliary analyses suggest that phonetic difficulties are unlikely to account for the effect of the onset hierarchy. It is also unlikely that the effects reported here are solely due to participants’ phonological experience with Mandarin or with English, their second language. While our present results from adult Mandarin speakers cannot rule out the contribution of those factors, it is nonetheless interesting to note that similar sensitivity to the onset hierarchy obtains even in neonates (Gómez et al. 2014). Whether the onset hierarchy is due to linguistic constraints, and whether those constraints are truly universal, are questions awaiting further research.