Introduction

Autism spectrum disorder (ASD) is defined as a group of neurodevelopmental disorders with core deficits in social communication and interaction, combined with restricted and repetitive behaviors and interests (American Psychiatric Association 2013). Despite having impairments in language and social communication, children with ASD may demonstrate extraordinary cognitive abilities in auditory and visual domains. For instance, the lack of social attention to faces in children with autism is accompanied by a general preference for nonsocial objects (Kikuchi et al. 2009; Klin et al. 2009). In the auditory domain, there is a relatively high occurrence of absolute pitch (AP) ability in the ASD population (Heaton et al. 1998). Individuals with high-functioning autism (HFA) also performed better than controls in discriminating and categorizing pure tones (Bonnel et al. 2003); children with HFA and Asperger syndrome (AS) outperformed their control counterparts in memorizing and labeling musical tones (Heaton 2003), detecting pitch intervals (Heaton 2005), and discriminating the pitch patterns of meaningful spoken sentences as well as those of linguistically meaningless vocal sounds (Järvinen-Pasley et al. 2008).

Additional evidence for pitch superiority in autism comes from an electrophysiological measure known as the mismatch negativity (MMN) response. The MMN reflects pre-attentive automatic detection of acoustic stimulus change, which is correlated with perceptual auditory discrimination (Näätänen et al. 2011). When compared with age-matched controls, individuals with autism aged 6–19 years demonstrated enlarged amplitudes (Ferri et al. 2003) and children with autism displayed shortened MMN latencies (Gomot et al. 2002) for pitch contrasts in pure tones; children with autism also showed larger MMN amplitudes to pitch changes embedded in speech and nonspeech stimuli (Lepistö et al. 2005, 2008). A more in-depth summary of the evidence can be found in a series of comprehensive reviews (Haesen et al. 2011; Hitoglou et al. 2010; Kujala et al. 2013; O’Connor 2012; Ouimet et al. 2012).

The enhanced pitch processing phenomenon has drawn researchers’ attention to the issue of atypical perception in autism. The Weak Central Coherence (WCC) theory attempts to explain this phenomenon as a form of global processing deficit along with a “detail” bias presented in the auditory mode (Happé and Frith 2006). The neural complexity hypothesis ascribes it to a disrupted neural hierarchy in which spectro-temporally simple but not complex stimuli yield superior performance (Samson et al. 2006). The Enhanced Perceptual Functioning (EPF) theory (Mottron et al. 2006) and its updated version, the Veridical Mapping theory (Mottron et al. 2013), attribute the heightened pitch sensitivity to an over-developed neural network for low-level perceptual processing.

One fundamental question of theoretical and clinical interest is whether the perceptual enhancement or bias toward pitch in autism has an impact on higher-level speech processing. The existing literature does not provide a definitive answer. Some researchers did not find a direct relationship between vocal pitch processing and receptive language skills (Heaton et al. 2008a, b; Mayer et al. 2014), whereas others showed that pitch discrimination was associated with early language delay as well as the developmental outcome of autism (Bonnel et al. 2010; Eigsti and Fein 2013; Jones et al. 2009). One study reported that, unlike children with learning disabilities and typically developing children, children with autism were not susceptible to higher-level semantic capture when performing a pitch judgment task (Järvinen-Pasley et al. 2008). In another study, however, MMN enhancement in children with autism for a vowel contrast was found to be disturbed by pitch variation (Lepistö et al. 2008). Thus, enhanced pitch processing in autism may diminish the capacity to extract invariant phonemic categories from the highly variable speech input for language acquisition (Kujala et al. 2013; Lepistö et al. 2008).

As previous work has exclusively relied on data from native speakers of nontonal languages such as English, it remains unclear whether the autistic enhancement in pitch perception operates universally regardless of the language background of the participants. Pitch changes play a special role in a tonal language as they form phonemic contrasts at the syllabic level that signal differences in word meaning (Fromkin et al. 2000). For example, /bai/ in Mandarin Chinese means “white” when spoken with a rising tone (Tone 2), and the same syllable means “defeat” or “worship” when it is pronounced with a falling tone (Tone 4). It has been argued that the unique phonemic role of lexical tones requires the development of language-specific neural representations for the categorical pitch patterns in a tonal language (Chandrasekaran et al. 2007; Xi et al. 2010; Xu et al. 2006; Zhang et al. 2012).

The current investigation was initiated to address whether heightened auditory discrimination or neural sensitivity to pitch differences in autism would be applicable to lexical tone processing in speakers of a tonal language. There were two basic theoretical concerns. First, over-processing of lower-level perceptual features such as pitch variation may lead to deficient category learning of higher-level phonemic units of the lexical tone categories. Assuming that Mandarin Chinese speakers with autism also demonstrate enhanced pitch perception, this ability could potentially fine-tune their processing towards detecting within-category pitch variations, which might hinder the proper acquisition and perception of lexical tones. Second, speech acquisition is known to be mediated by social factors including social preference and joint attention (Kuhl 2010), which have been shown to be impaired in autism (Dawson et al. 2004; Mundy and Neal 2000). Accordingly, phonological development of the lexical tone system could be deficient in Mandarin-speaking children with autism. Specifically, we hypothesized that Mandarin-speaking children with autism and age-matched typically developing (TD) controls would show distinct MMN patterns for pitch perception depending on whether the pitch information is phonetically meaningful. Enhanced neural sensitivity to pitch for the Chinese children with autism may be confined to the nonspeech stimuli, and this auditory hypersensitivity could potentially be problematic for processing the phonetic cues of pitch patterns for lexical tones. As previous neurophysiological studies have also shown atypical attention to pure tones and speech sounds in children with autism (Čeponienė et al. 2003; Ferri et al. 2003; Gomot et al. 2011; Whitehouse and Bishop 2008), the P3a component indexing involuntary attention switch to novelty detection (Escera et al. 1998) was also taken into account in our analysis. In particular, we assessed whether and how the P3a responses in the two groups of children would differ depending on factors including acoustic salience, linguistic/semantic significance, and social relevance of the stimuli.

To test our hypothesis, we employed a passive listening oddball paradigm to measure the MMN and the P3a responses in school-aged children with autism. This paradigm, which is independent of behavioral measurement, has been widely used in developmental research on auditory and linguistic processing in the ASD population (e.g., Ferri et al. 2003; Gage et al. 2003; Gomot et al. 2002; Jansson-Verkasalo et al. 2003; Kuhl et al. 2005; Kujala et al. 2007; Lepistö et al. 2005, 2007, 2008), as it does not require focused attention or any overt responses. Previous behavioral studies demonstrating superior pitch perception in autism were mostly conducted with individuals without intellectual impairment. The behavioral discrimination task that requires pressing a button to indicate same or different responses has been noted to be unsuitable for children under the age of 8, as a large proportion of them had difficulty performing the task (Heaton et al. 2008a, b). In the case of children with autism and intellectual impairment, pitch superiority failed to be replicated even with the visuo-spatial paradigm specifically developed for these children (Heaton et al. 2008a, b). Thus, in testing young children with autism and limited communication ability, the neurophysiological approach can serve as an objective tool to measure auditory discrimination (MMN) and involuntary attention switch (P3a) (Kujala et al. 2013; Näätänen et al. 2011). Abnormalities in the MMN component may reflect pre-attentive neural sensitivity problems in speech discrimination/categorization, and abnormalities in P3a may indicate deficits in the control of attentional resources in the context of novelty detection.

There were two experiments in our study. In Experiment 1, the stimuli included a pure tone condition and two conditions of Chinese lexical tones spoken either in real words or in nonwords. The purpose was to examine the domain specificity of enhanced discriminative sensitivity to pitch in autism and to test whether word status (including semantic information) would affect neural sensitivity to the lexical tone contrast. In Experiment 2 (see Table 5 for a summary of stimulus features in our experimental design), to further test domain specificity, we created a hummed version of the real words as a refined acoustic control, which removed the linguistic aspects of the stimuli but preserved the contrastive pattern of pitch contours of the lexical tones.

Experiment 1

Methods

Participants

Participants with autism were recruited from a local rehabilitation school specially designated for children with autism. The diagnoses were established by pediatricians and child psychiatrists with extensive experience in diagnosing autism. All the children at the school met the DSM-IV criteria for Autistic Disorder (American Psychiatric Association 1994). As Chinese versions of the standardized diagnostic instruments, i.e., the Autism Diagnostic Interview–Revised (ADI-R; Lord et al. 1994) and the Autism Diagnostic Observation Schedule (ADOS; Lord et al. 2000), have not been officially validated and widely adopted in China (Huang et al. 2013; Sun et al. 2013), we confirmed diagnoses using the Chinese version of the Gilliam Autism Rating Scale–Second Edition (GARS-2; Gilliam 2006). The GARS-2 has previously been used for this purpose in published autism studies conducted in China (Yang and Lee 2014; Yi et al. 2014). It is a norm-referenced assessment tool for differentiating individuals with autism from typically developing individuals and those with behavioral disorders. The three subtests of the GARS-2 (Stereotyped Behaviors, Communication, and Social Interaction) are based on the DSM-IV-TR (American Psychiatric Association 2000) and Autism Society of America (1994) criteria for autism. The normative sample for the GARS-2 included 1,107 children and young adults between the ages of 3 and 22 with a diagnosis of autism. The Autism Index (AI) score assessed by the GARS-2 can range from 40 to 165. An AI of 69 or less indicates that an individual is unlikely to have autism; an AI between 70 and 84 indicates that an individual may have autism; and an AI of 85 or higher indicates that an individual is very likely to have autism. Each child’s assessment was obtained from the special education teachers who had daily contact with the child for at least 6 h. The assessment showed an overall Autism Index of 137 (SD 35, range 67–165), indicating a strong probability of autism in this sample. Although there were three children with scores (67, 68, 79) below the “very likely to have autism” cut-off (85), we were able to solicit independent secondary confirmation of their autism diagnoses from two additional experienced pediatricians who were unrelated to the present study. The typically developing (TD) controls were recruited from a local elementary school. All participants had been screened for hearing either before entering school or during diagnosis. Pure tone audiometry was administered by otolaryngologists, and all participants met the criteria for normal hearing. All children in the autism group were verbal but with limited communication ability. Twelve of them had delayed onset of speech as measured by the use of two-word sentences. All participants were native Mandarin speakers. Children who had a known or diagnosed genetic, mental, or additional neurological condition were excluded.
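The Autism Index bands described above can be written out as a small lookup. This is only an illustrative sketch of the cut-offs as stated in the text; the function name and the band labels are hypothetical and are not part of the GARS-2 materials.

```python
def classify_autism_index(ai: int) -> str:
    """Map a GARS-2 Autism Index (AI) to the probability band given in the text.

    The band labels are illustrative wording, not official GARS-2 terms.
    """
    if not 40 <= ai <= 165:
        raise ValueError("the AI is described as ranging from 40 to 165")
    if ai <= 69:
        return "unlikely"      # AI of 69 or less
    if ai <= 84:
        return "possibly"      # AI between 70 and 84
    return "very likely"       # AI of 85 or higher


# The three scores that fell below the 85 cut-off in Experiment 1
for score in (67, 68, 79):
    print(score, classify_autism_index(score))
```

Under these bands, the scores 67 and 68 fall in the lowest band and 79 in the middle band, which is why secondary diagnostic confirmation was sought for these children.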

Informed consent was obtained from each child’s parent following a protocol approved by the local institutional review board. Originally, twenty children with autism and eighteen controls participated in the experiment. After artifact inspection and EEG data preprocessing, only those who had sufficient numbers of ERP trials entered final analyses, including 17 children with autism (15 boys, M age 9.3 years, SD 1.8 years, age range 6.9–12.4 years) and 15 controls (12 boys, M age 9.5 years, SD 1.2 years, age range 7.7–11.8 years) for the pure tone condition, and 18 children with autism (16 boys, M age 9.3 years, SD 1.8 years, age range 6.9–12.4 years) and 16 controls (13 boys, M age 9.6 years, SD 1.2 years, age range 7.7–11.8 years) for the two lexical tone conditions. Nonverbal IQ scores were collected using the Raven’s Standard Progressive Matrices Test (Raven and Court 1998). The autism group scored lower with a mean score of 88 (SD 14) compared to that of 107 (SD 14) in the control group (t(32) = 3.55, p = .001) (see Table 1 for sample characteristics). The lower nonverbal scores in the children with autism in our study were expected and consistent with reported IQ profiles in the literature (Dawson et al. 2007).

Table 1 Descriptive characteristics of the sample

Stimuli and Procedure

The experiment consisted of three stimulus conditions: pure tones, lexical tones in real monosyllabic words, and lexical tones in nonwords. Two simple tones (216 Hz for the standard, 299 Hz for the deviant) were created using the Praat software (Boersma and Weenink 2014). The two frequencies were chosen to be within the range of fundamental frequencies of the lexical tones in the study. The lexical tones were uttered by a female talker and recorded using Nuendo 4 software (Steinberg Media Technologies, Germany). In the real word condition, the standard stimuli were /bai2/ with a rising tone and the deviant stimuli were /bai4/ with a falling tone. The nonword condition used a nonsensical syllable /rai/, and the lexical tone contrast and other aspects of stimulus setup in the nonword condition were kept the same as in the real word condition. Each stimulus was 350 ms long including 5 ms fade in/out.
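For illustration, the pure tone parameters above (216 Hz and 299 Hz, 350 ms, 5 ms fade in/out) can be reproduced with a few lines of code. This is a minimal sketch rather than the Praat procedure actually used: the 44.1 kHz sampling rate, the linear ramp shape, and the function name `pure_tone` are assumptions not stated in the text.

```python
import math


def pure_tone(freq_hz, duration_ms=350, fade_ms=5, sample_rate=44100):
    """Return a sine tone as a list of floats in [-1, 1] with linear on/off ramps.

    sample_rate and the linear ramp are assumptions; the text gives only the
    frequencies, the 350 ms duration, and the 5 ms fade in/out.
    """
    n = round(sample_rate * duration_ms / 1000)
    n_fade = round(sample_rate * fade_ms / 1000)
    samples = []
    for i in range(n):
        # Gain rises from 0 to 1 over the first ramp and falls back to 0 over the last
        gain = min(1.0, i / n_fade, (n - 1 - i) / n_fade)
        samples.append(gain * math.sin(2 * math.pi * freq_hz * i / sample_rate))
    return samples


standard = pure_tone(216)  # standard stimulus
deviant = pure_tone(299)   # deviant stimulus
```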

The three stimulus conditions were presented in separate blocks. Each block started with ten trials of standard stimuli. The standard and the deviant stimuli made up 84% and 16% of the total trials, respectively. The inter-stimulus interval (ISI) was 500 ms. The stimuli were presented pseudo-randomly with at least two consecutive standards before each deviant. Stimuli were presented via AKG K518 earphones at approximately 60 dB SL (sensation level). Participants were asked to watch a muted self-chosen cartoon movie and ignore the presented sounds. The presentation order of the blocks was counterbalanced across participants.
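The sequencing constraints above (ten leading standards, 16% deviants, at least two standards before each deviant) can be sketched as follows. The total of 500 trials per block, the function name, and the particular randomization scheme are assumptions made for illustration; the text does not report the block length or how the pseudo-random order was generated.

```python
import random


def oddball_sequence(n_trials=500, p_deviant=0.16, n_lead=10, min_gap=2, seed=None):
    """Build a list of 'S' (standard) and 'D' (deviant) labels.

    n_trials and the scatter-based randomization are hypothetical choices.
    """
    rng = random.Random(seed)
    n_dev = round(n_trials * p_deviant)
    # Standards left over after the lead-in and the mandatory gap before each deviant
    spare = n_trials - n_lead - n_dev * (min_gap + 1)
    if spare < 0:
        raise ValueError("not enough trials to satisfy the constraints")
    # Scatter the spare standards over the slots before each deviant and after the last one
    extra = [0] * (n_dev + 1)
    for _ in range(spare):
        extra[rng.randrange(n_dev + 1)] += 1
    seq = ["S"] * n_lead
    for k in range(n_dev):
        seq += ["S"] * (min_gap + extra[k]) + ["D"]
    seq += ["S"] * extra[n_dev]
    return seq


block = oddball_sequence(seed=1)
print(len(block), block.count("D"))
```

With these assumed settings each block contains 80 deviants, which leaves some margin above the minimum of 70 accepted deviant trials required for inclusion.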

EEG Recording and Data Analysis

EEG data was recorded with a 32-channel BrainAmp DC amplifier system at a sampling rate of 500 Hz (Brain Products, Germany). The left mastoid was the reference electrode, and the AFz served as the ground electrode. Eye blinks and movements were monitored with electrodes placed below the right eye and at the outer corner of the left eye. Electrode impedances were kept below 10 kΩ.

ERP data analysis was performed with BrainVision Analyzer. The data was offline re-referenced to the average of left and right mastoid recordings. Epochs of 800 ms (including a 100 ms pre-stimulus time) were averaged separately for the standards and deviants. The epochs were digitally filtered with a 1–30 Hz band-pass and baseline-corrected. Trials with instantaneous amplitude exceeding ±150 μV were rejected. Standard trials that immediately followed the deviant were also excluded. Subjects needed to have a minimum of 70 deviant trials accepted in each condition to be included in the final analyses.
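The trial selection rules in this paragraph can be sketched in a few lines. The sketch is a simplification for illustration only: it treats each epoch as a single-channel list of amplitudes in μV that has already been filtered and baseline-corrected, and the function names are hypothetical. The actual processing was done in BrainVision Analyzer.

```python
def accepted_trials(epochs, labels, threshold_uv=150.0):
    """Sort epochs into kept standards ('S') and deviants ('D').

    epochs: list of per-trial amplitude lists in microvolts (single channel assumed).
    labels: 'S' or 'D' for each trial, in presentation order.
    """
    kept = {"S": [], "D": []}
    for i, (epoch, label) in enumerate(zip(epochs, labels)):
        # Reject trials whose instantaneous amplitude exceeds +/-150 microvolts
        if max(abs(v) for v in epoch) > threshold_uv:
            continue
        # Exclude standards that immediately follow a deviant
        if label == "S" and i > 0 and labels[i - 1] == "D":
            continue
        kept[label].append(epoch)
    return kept


def has_enough_deviants(kept, minimum=70):
    """Inclusion rule: at least 70 accepted deviant trials in the condition."""
    return len(kept["D"]) >= minimum
```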

The MMN and P3a responses were derived from the deviant-minus-standard difference ERP waveforms in each condition. Based on the grand mean data, the MMN was defined as the largest negative deflection within 100–250 ms post stimulus, and the P3a was defined as the largest positive deflection within 250–500 ms. The MMN and P3a mean amplitudes were calculated with a 60 ms time window around the peak for each individual subject. Based on the grand mean waveform data for MMN and P3a, the Fz and Cz electrode sites were used for statistical analysis. The presence of MMN and P3a components in each group was examined using one-sample t tests (making comparison relative to the zero baseline). As long as one group showed presence of the component, two-way ANOVA would be conducted for that stimulus condition to analyze the group effect as well as a possible site by group interaction. Planned independent sample t tests for the Fz and Cz sites were also performed to locate group differences. Wherever appropriate, p-values after Greenhouse-Geisser correction were reported. Partial eta squared for ANOVA and Cohen’s d for two-sample t tests were calculated to evaluate effect sizes.
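The peak and mean amplitude measures described above can be sketched as follows. This is a hedged illustration, not the authors' BrainVision Analyzer procedure: it assumes a single-channel difference wave sampled at 500 Hz with a 100 ms pre-stimulus interval, and it assumes the 60 ms window is centered on the peak found in the supplied waveform. The function name and arguments are hypothetical.

```python
def peak_and_mean(diff_wave, window_ms, polarity,
                  sample_rate=500, prestim_ms=100, mean_win_ms=60):
    """Return (peak latency in ms, mean amplitude around the peak).

    diff_wave: deviant-minus-standard amplitudes, first sample at -prestim_ms.
    window_ms: (start, end) search window in ms post stimulus.
    polarity:  -1 for the MMN (most negative point), +1 for the P3a (most positive).
    """
    step = 1000 / sample_rate  # ms per sample (2 ms at 500 Hz)

    def to_index(t_ms):
        return round((t_ms + prestim_ms) / step)

    lo, hi = to_index(window_ms[0]), to_index(window_ms[1])
    pick = min if polarity < 0 else max
    peak_i = pick(range(lo, hi + 1), key=lambda i: diff_wave[i])
    # Average over a window of mean_win_ms centered on the peak sample
    half = round(mean_win_ms / 2 / step)
    a = max(0, peak_i - half)
    b = min(len(diff_wave) - 1, peak_i + half)
    segment = diff_wave[a:b + 1]
    return peak_i * step - prestim_ms, sum(segment) / len(segment)


# Search windows from the text: MMN 100-250 ms, P3a 250-500 ms
# mmn_latency, mmn_amplitude = peak_and_mean(wave, (100, 250), polarity=-1)
# p3a_latency, p3a_amplitude = peak_and_mean(wave, (250, 500), polarity=+1)
```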

Results

MMN Data

In the pure tone condition, significant MMNs were elicited in both groups at the Fz site (TD: t(14) = −3.52, p = .003; autism: t(16) = −4.61, p < .001) and the Cz site (TD: t(14) = −3.63, p = .003; autism: t(16) = −6.08, p < .001). Two-way ANOVA for MMN showed that group effects approached significance for amplitude (F(1, 30) = 3.35, p = .077, partial η2 = 0.100) and latency (F(1, 30) = 3.03, p = .092, partial η2 = 0.092). Further independent sample t tests revealed that the autism group had larger MMN amplitudes than the control group at Cz (t(30) = 2.239, p = .033, Cohen’s d = 0.798) but not at Fz (t(30) = 1.27, p = .215, Cohen’s d = 0.453). Latency measurements showed a similar pattern, with the autism group showing a tendency toward shortened latencies at Cz (t(30) = 2.016, p = .053, Cohen’s d = 0.740) but not at Fz (t(30) = 1.15, p = .261, Cohen’s d = 0.412) (Table 2; Fig. 1). There were no group × site interactions on amplitude (F(1, 30) = 0.62, p = .439, partial η2 = 0.020) or latency (F(1, 30) = 1.55, p = .223, partial η2 = 0.049).
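As a reading aid, the partial η2 values reported with each F statistic can be recovered from the F value and its degrees of freedom through the standard identity partial η2 = (F × df_effect) / (F × df_effect + df_error). The helper below is an illustrative sketch; the function name is hypothetical and this is not a description of how the authors' statistics software computed the values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return f_value * df_effect / (f_value * df_effect + df_error)


# Group effects on MMN amplitude and latency in the pure tone condition
print(round(partial_eta_squared(3.35, 1, 30), 3))
print(round(partial_eta_squared(3.03, 1, 30), 3))
```

Applied to F(1, 30) = 3.35 and F(1, 30) = 3.03, the identity gives values that round to the reported 0.100 and 0.092.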

Table 2 MMN mean amplitude and latency data in children with Autism and TD controls at Fz and Cz (Experiment 1)
Fig. 1 The deviant-minus-standard difference waves for the pure tone, the real word, and the nonword conditions (Experiment 1)

There was a reversal of the MMN pattern in the speech conditions. The MMN responses in the autism group appeared to be diminished (Table 2; Fig. 1). For lexical tones in real words, both groups displayed significant MMN activities at Fz (TD: t(15) = −5.44, p < .001; autism: t(17) = −3.01, p = .008), but only the TD group did so at Cz (TD: t(15) = −5.88, p < .001; autism: t(17) = −0.61, p = .491). Two-way ANOVA revealed a significant group effect on amplitude (F(1, 32) = 4.45, p = .043, partial η2 = 0.122). Children with autism had smaller MMNs than the TD children at Fz (t(32) = −2.45, p = .032, Cohen’s d = −0.768) but not at Cz (t(32) = −1.67, p = .104, Cohen’s d = −0.588). There was no group effect on latency (F(1, 32) = 0.68, p = .417, partial η2 = 0.021), and there were no group × site interactions on amplitude (F(1, 32) = 0.005, p = .944, partial η2 < 0.001) or latency (F(1, 32) = 0.211, p = .649, partial η2 = 0.007). For the nonword condition, only the TD group displayed typical MMN activities at Fz (TD: t(15) = −3.52, p = .003; autism: t(17) = −1.76, p = .101) and Cz (TD: t(15) = −3.80, p = .002; autism: t(17) = −0.34, p = .741). Two-way ANOVA showed no significant group effect on amplitude (F(1, 32) = 2.72, p = .109, partial η2 = 0.078) and a tendency toward shorter latencies in the TD group (F(1, 32) = 3.59, p = .067, partial η2 = 0.101). Further t tests revealed that the TD group had shorter latencies than the autism group at Fz (t(32) = −2.12, p = .042, Cohen’s d = −0.731) but not at Cz (t(32) = −1.06, p = .296, Cohen’s d = −0.367). There were no group × site interactions on amplitude (F(1, 32) = 0.64, p = .429, partial η2 = 0.020) or latency (F(1, 32) = 0.93, p = .342, partial η2 = 0.028).

As the autism group scored lower on their NVIQ than the control group, in order to see if there was any effect of NVIQ on MMN responses in both amplitude and latency measures, Pearson’s correlation analysis was performed for MMN responses to pure tones and the NVIQ scores in the autism group. No significant correlation between these brain and behavioral variables was found (amplitude: r = .30, p = .243; latency: r = −.22, p = .405).

P3a Data

In the pure tone condition, typical P3a activities were elicited in both groups at Fz (TD: t(14) = 2.16, p = .049; autism: t(16) = 4.23, p = .001), but only in the autism group at Cz (TD: t(14) = 1.53, p = .148; autism: t(16) = 3.95, p = .001). Two-way ANOVA revealed a significant group effect on amplitude (F(1, 30) = 6.285, p = .018, partial η2 = 0.173). Children with autism had enhanced P3a responses for pure tone change at both Fz (t(30) = −2.28, p = .030, Cohen’s d = −0.819) and Cz (t(30) = −2.49, p = .019, Cohen’s d = −0.895) (Table 3; Fig. 1). There was no group effect on latency (F(1, 30) = 1.44, p = .240, partial η2 = 0.046), and there were no group × site interactions on amplitude (F(1, 30) = 0.24, p = .627, partial η2 = 0.008) or latency (F(1, 30) = 0.34, p = .565, partial η2 = 0.011).

Table 3 P3a mean amplitude and latency data in children with autism and TD controls at Fz and Cz (Experiment 1)

In the real word condition, significant P3a activities were elicited in both groups at Fz (TD: t(15) = 2.77, p = .014; autism: t(17) = 6.07, p < .001) and Cz (TD: t(15) = 2.45, p = .027; autism: t(17) = 4.80, p < .001). Two-way ANOVA showed that the autism group had prolonged latencies (F(1, 32) = 4.35, p = .045, partial η2 = 0.120) and a tendency of larger amplitudes (F(1, 32) = 3.15, p = .085, partial η2 = 0.090) compared to the TD group. Further independent sample t test revealed a tendency of larger amplitude in the autism group at Cz (t(32) = −1.98, p = .056, Cohen’s d = −0.687) which was obvious in the waveforms (Fig. 1), but not at Fz (t(32) = −1.30, p = .203, Cohen’s d = −0.444). There were no group × site interactions on amplitude (F(1, 32) = 1.13, p = .297, partial η2 = 0.034) or latency (F(1, 32) = 0.14, p = .710, partial η2 = 0.004). For the nonword condition, significant P3a activities were elicited in both groups at Fz (TD: t(15) = 2.53, p = .023; autism: t(17) = 5.20, p < .001) and Cz (TD: t(15) = 2.57, p = .021; autism: t(17) = 4.14, p = .001). There were no group effects on amplitude (F(1, 32) = 2.80, p = .104, partial η2 = 0.081) or latency (F(1, 32) = 0.51, p = .482, partial η2 = 0.016), no group × site interactions on amplitude (F(1, 32) = 0.47, p = .499, partial η2 = 0.014) or latency (F(1, 32) = 0.02, p = .892, partial η2 = 0.001) (Table 3; Fig. 1).

No significant correlation was found between the nonverbal IQ measures and the P3a amplitude (r = .16, p = .532) or latency (r = −.02, p = .947) for pure tones in the autism group.

Experiment 2

Methods

A hummed version of the real word stimuli in Experiment 1 was generated using Praat. The hummed stimuli preserved the prosodic variations, intensity characteristics, and duration but were phonetically and semantically unintelligible. Subject recruitment and the experimental procedure for data collection and analysis followed Experiment 1. The GARS-2 measure for the autism group showed an overall Autism Index of 140 (SD 35, range 68–165). Three children with scores (72, 79, 68; two of them also participated in Experiment 1) below 85 underwent secondary diagnostic confirmation as described earlier. All children were verbal but with limited communication ability. Nine of them had delayed onset of speech as measured by the use of two-word sentences. Originally, 18 children with autism and 19 controls participated in this experiment. After artifact inspection, 16 children with autism (15 boys, M age 9.6 years, SD 1.3 years, age range 7.9–12 years) and 18 controls (12 boys, M age 9.3 years, SD 1.8 years, age range 6.9–12.4 years) entered the final analyses. Thirteen children in the autism group had also participated in Experiment 1, and the controls were new except for one child who had also participated in Experiment 1. The autism group had lower nonverbal IQ with a mean of 94 (SD 16) compared to that of 106 (SD 11) in the control group (t(32) = 2.48, p = .019) (see Table 1 for sample characteristics).

Results

For the MMN, one-sample t tests showed that neither group elicited a significant response at the Fz site (TD: t(17) = −0.70, p = .492; autism: t(15) = −2.00, p = .064), and only the autism group did so at Cz (TD: t(17) = −0.53, p = .606; autism: t(15) = −4.81, p < .001). Two-way ANOVA revealed a trend toward a group effect on amplitude (F(1, 32) = 3.00, p = .093, partial η2 = 0.086) and a trend toward a group × site interaction on amplitude (F(1, 32) = 3.3108, p = .078, partial η2 = 0.094). Independent sample t tests showed that the autism group had larger MMNs than their controls at Cz (t(32) = 2.63, p = .013, Cohen’s d = 0.913) but not at Fz (t(32) = 0.76, p = .454, Cohen’s d = 0.263) (Table 4; Fig. 2). There was no group effect on latency (F(1, 32) = 0.01, p = .937, partial η2 < 0.001) or group × site interaction on latency (F(1, 32) = 0.43, p = .517, partial η2 = 0.013).

Table 4 MMN and P3a mean amplitude and latency data in children with autism and TD controls for the hum stimuli (Experiment 2)
Fig. 2 The deviant-minus-standard difference waves for the hum stimuli (Experiment 2)

For the P3a, both groups displayed the component at Fz (TD: t(17) = 3.26, p = .005; autism: t(15) = 5.05, p < .001), but only the TD group did so at Cz (TD: t(17) = 4.82, p < .001; autism: t(15) = 1.47, p = .162). Two-way ANOVA revealed a trend toward a group effect on amplitude (F(1, 32) = 2.92, p = .097, partial η2 = 0.084) as well as on latency (F(1, 32) = 3.13, p = .086, partial η2 = 0.089). There was a significant group × site interaction on amplitude (F(1, 32) = 21.59, p < .001, partial η2 = 0.403) but not on latency (F(1, 32) = 0.53, p = .470, partial η2 = 0.016). Further independent sample t tests revealed greater responses in the TD group at Cz (t(32) = 3.60, p = .005, Cohen’s d = 1.681) but not at Fz (t(32) = 0.002, p = .998, Cohen’s d = 0.003). There was a tendency toward shorter latencies in the TD group at Fz (t(32) = −1.99, p = .056, Cohen’s d = −0.702) but not at Cz (t(32) = −1.16, p = .254, Cohen’s d = −0.377).

No significant correlations were found between the nonverbal IQ scores and the MMN or P3a amplitude or latency measures in the autism group (MMN amplitude: r = −.22, p = .379; MMN latency: r = .35, p = .156; P3a amplitude: r = −.26, p = .144; P3a latency: r = −.16, p = .371).

Discussion

MMN Findings

The current investigation employed an auditory passive oddball paradigm to examine pitch processing in pure tones, lexical tones in word and nonword conditions, and nonspeech pitch contour contrasts in school-aged children with autism and TD controls who spoke a tonal language (Mandarin Chinese). The major finding was that enhanced auditory discrimination or neural sensitivity to pitch in autism as indicated by the MMN measure was limited to the nonspeech domain (see Table 5 for a summary of findings). Furthermore, there was diminished brain sensitivity for lexical tone processing at the pre-attentive level in the Mandarin-speaking children with autism.

Table 5 Summary of stimulus features in the experimental design and the corresponding MMN and P3a results

Our pure tone data were consistent with previous behavioral and MMN findings (Bonnel et al. 2003, 2010; Ferri et al. 2003; Gomot et al. 2002). According to the “neural complexity hypothesis” (Samson et al. 2006), individuals with autism may show enhancement because pure tone processing predominantly engages local neural activity in the primary auditory cortex, and they may display difficulty when a higher level of neural complexity involving the association cortex is required. However, pitch perception studies do not fully support this account. For instance, Bonnel et al. (2010) found enhancement in individuals with autism for pure tones, but the clinical and control groups did not differ in thresholds for discriminating complex nonspeech and speech sounds. This finding is supported by the similar MMN results for the nonspeech conditions (i.e., pure tones and hummed sounds) even though the hummed sounds in Experiment 2 are spectro-temporally more complex than pure tones.

Pure tone superiority in autism has also been challenged by a recent behavioral study in which the frequency discrimination threshold was examined (Boets et al. 2014). The authors found inferior pure tone discrimination in adolescents with ASD compared to TD controls when the reference tone was varied across trials. This deficit was interpreted as impaired auditory working memory (Boets et al. 2014). However, this interpretation should not be viewed as contradictory to our current finding of greater neural sensitivity (i.e., larger MMNs) to pure tone change, as auditory working memory or general cognitive factors such as central coherence may involve different mechanisms from the pre-attentive discrimination measure of MMN. Thus, it would be interesting to test in future studies how the cognitive factors and perceptual sensitivity to pitch interact in attentive listening paradigms.

As the hummed stimuli preserved the pitch contour patterns of the lexical tones and the intensity variations over time while removing lexical semantics from the stimuli, the MMN results in Experiment 2 lend additional support to our domain specificity interpretation in that the MMN enhancement effect for the nonspeech stimuli failed to extend to phonological/semantic processing of pitch information in lexical tones. Our findings are also in line with the neural complexity hypothesis that nonspeech rather than speech conditions yield enhanced perception, as neither the pure tones nor the hummed stimuli were comparable to the lexical tone speech stimuli in terms of spectro-temporal complexity.

Our domain specificity interpretation may seem to contrast with findings from other studies using Finnish syllables, in which the MMN for the fixed-pitch condition was enhanced in autism (Lepistö et al. 2005, 2008). We do not consider the results to be inconsistent with each other. The difference here can be explained in terms of the phonological status of the pitch information in the stimuli. The pitch information in the speech stimuli for the Finnish studies was phonologically irrelevant. In our study with tonal language users, however, the pitch difference constitutes a phonemic contrast, and the reduced MMN for lexical tones in the Mandarin-speaking children with autism is likely attributable to problems in the proper development of neural representations (Zhang et al. 2005) and acquisition of lexical tones for tonal language users. Mandarin-speaking children are exposed to language input delivered by numerous speakers in their immediate environment, which provides highly variable pitch information for each of the four lexical tone categories. In order to establish stable mental representations of the phonological categories for the lexical tones, the listener needs to develop enhanced sensitivity for between-category contrasts and ignore subtle within-category variations, which is known as categorical perception for speech sounds (Liberman et al. 1957).

Recent MMN studies on typically developing Chinese adults and children have shown clear evidence for categorical perception of lexical tones (Chandrasekaran et al. 2009; Xi et al. 2010). If the lexical tone categories are acquired based on statistical learning of tonal variations in the speech input, the enhanced pitch discrimination ability in autism could have adverse effects by biasing the listener’s processing toward within-category variations (Kujala et al. 2013). In the current study, the autism group’s reduced neural sensitivity in lexical tone processing could indicate deficiencies in categorical perception. We speculate that one possible cause of such deficiencies is the lack of an inhibitory mechanism for suppressing the detection of irrelevant within-category pitch differences. On this point, our data are consistent with the Finnish findings that when irrelevant pitch variations were introduced in the speech stimuli, the children with autism no longer showed the enhanced MMN phenomenon for detecting the phonemic categories (Lepistö et al. 2008).

Another question pertinent to the current study is the role of social factors in phonological development. In addition to statistical learning, social abilities such as joint attention and social orienting have been shown to be critical in early speech acquisition (Kuhl 2010) but are profoundly affected in children with autism (Dawson et al. 2004; Mundy and Neal 2000). Direct evidence of social factors came from a study in which children with autism who did not show a social preference for “motherese” but showed a preference for nonspeech material had no significant phonetic learning effect as reflected by MMN responses (Kuhl et al. 2005). Our MMN data are consistent with this interpretation in that the autism group’s pitch enhancement in nonspeech did not extend to the domain of speech perception. However, our study did not examine specific measures of the participants’ social abilities or their in-depth language profiles. Thus, more rigorous investigations with direct measures are required to establish whether there is a link between social deficits and impaired lexical tone perception.

The MMN measure has been highlighted as a potential clinical biomarker (Näätänen et al. 2012). To probe this issue, we further conducted an individual-level analysis by comparing the MMN amplitudes of the subjects from Experiment 1 who completed both the speech (real word) condition and the nonspeech (pure tone) condition. Specifically, 13 out of 17 children (76.5%) in the autism group reached an enhancement of 1 μV or more, whereas the control group showed such enhancement in only 6 out of 15 subjects (40%). None of the children in the control group reached 2 μV enhancement, but 11 children (64.7%) in the autism group did. We further evaluated the results with a binomial test under the null hypothesis that the pure tone enhancement is equally likely to occur in the autism group and in the control group. The null hypothesis was rejected because the binomial test produced a p value of .002, showing that the autism group had a significantly higher occurrence of pure tone enhancement than the control group. Although typical ERP studies do not report this type of individual analysis, we believe that this approach of combining speech and nonspeech evaluations at the individual level would allow us to appreciate the utility as well as the potential limitations of the MMN measure as a biomarker for early diagnosis.
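The individual-level comparison can be illustrated with a short calculation. The text does not state how the binomial test was configured, so the following is only a minimal sketch under one plausible reading: the control group's observed rate (6 of 15) is taken as the null probability, and an exact one-sided upper tail is computed for the autism group's count (13 of 17). The helper name `binomial_upper_tail` and the choice of a one-sided test are assumptions made for illustration, and the value obtained this way need not match the reported p value exactly.

```python
from math import comb


def binomial_upper_tail(successes: int, n: int, p_null: float) -> float:
    """Exact one-sided P(X >= successes) for X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(successes, n + 1)
    )


# Counts reported in the text for the 1 μV enhancement criterion.
autism_enhanced, autism_n = 13, 17
control_enhanced, control_n = 6, 15

# Assumption: the control group's observed rate serves as the null probability.
p_null = control_enhanced / control_n
p_value = binomial_upper_tail(autism_enhanced, autism_n, p_null)
print(f"one-sided exact binomial p = {p_value:.4f}")
```

Under these assumptions the tail probability falls well below the conventional .05 threshold, in line with the conclusion drawn above. A two-group procedure such as Fisher's exact test would be an alternative that also accounts for sampling uncertainty in the control rate.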

P3a Findings

The P3a results revealed a different aspect of pitch processing at the cortical level (Table 5). In comparison with TDs, the children with autism showed greater P3a responses, which index the attention switch triggered by change detection, for pure tones; at least equal amplitudes with delayed responses for lexical tones in words; and smaller P3a for the hummed sounds. Our data suggest that attentional orienting to novel sounds in the oddball paradigm may depend on a number of parameters, including the acoustic saliency, linguistic significance, and social significance of the stimuli.

Our P3a results for the pure tones are consistent with previous reports (Ferri et al. 2003; Gomot et al. 2011), indicating greater susceptibility to salient environmental changes in children with autism. This interpretation also accords with the notion that individuals with autism lack the ability to automatically suppress peripheral stimulation, which may be attributed to an over-performing prefrontal cortex with local hyper-connectivity (Markram and Markram 2010; Rinaldi et al. 2008).

Our P3a data for the lexical tones further suggest that the semantic relevance of the stimuli is an important factor, as the autism group had delayed responses and a tendency toward larger P3a amplitudes for lexical tones in the word condition but not in the nonword condition. Previous findings with speech stimuli are mixed. Some researchers suggested a speech-specific attention deficiency in children with autism, as indicated by a smaller or absent P3a following the MMN activity for a vowel change (Čeponienė et al. 2003; Lepistö et al. 2005). However, children with autism also showed larger P3a amplitudes to deviant speech sounds presented among sequences of nonspeech standard stimuli. This pattern was reversed in the test condition with the nonspeech stimuli as the deviant, demonstrating that children with autism may attend to novel speech sounds in a hyper-orienting manner for the detection of linguistic relevance of the target stimuli (Whitehouse and Bishop 2008). Along the same line, our P3a data for the real word condition could indicate a difference between the two subject groups in the level of arousal/attentiveness as a function of the semantic significance of the speech stimuli. When the semantic relevance was removed, as in the nonword condition, the tendency toward enhanced P3a in children with autism also disappeared. However, these children’s orientation to real words appeared to be delayed compared to TDs, as indicated by the latency result, suggesting a slower evaluation process for such meaningful speech sounds (Polich 2007). This was not observed for the nonword condition, which may imply that semantic relevance rather than phonetic information in the stimuli could be a factor mediating processing speed in autism. As none of the previous studies used semantically meaningful sound contrasts or lexical tone contrasts, it merits further investigation whether there is semantically driven P3a modulation for speech stimuli in autism.

An additional factor that may influence the strength of P3a is the social significance of the stimuli. In contrast to the pure tone results in Experiment 1, the control group showed larger P3a amplitudes and a tendency toward shorter latencies in response to the hummed stimuli. We suspect that the TD children might find the prosodic variations in the hummed sounds more socially relevant or interesting, which may lead to a more prominent involuntary attention switch to novelty detection in the sequence of perceptually demanding stimuli.

Taking the MMN and P3a results together, given their temporal contiguity, one may suspect that the smaller MMN responses displayed in the autism group for the real word condition could be contingent upon the relatively larger P3a responses that follow the MMN. In other words, the P3a component may have pulled down the preceding MMN component. However, this interpretation does not stand up to the fact that the pattern was not seen in the other conditions, where the autism group had either smaller or larger MMNs. For instance, in the pure tone condition, the MMN and P3a responses were both larger in the autism group than in the control group. Moreover, correlation analysis of the MMN and P3a amplitudes showed that the two variables were not correlated in either subject group for either the speech or the nonspeech conditions. Thus, the smaller MMNs for speech stimuli but not for nonspeech stimuli in the autism group are more likely a reflection of domain specificity than a by-product of the temporally contiguous MMN and P3a components.

To our knowledge, this is the first study to have demonstrated domain specificity of enhanced neural sensitivity in autism to nonspeech pitch information in speakers of a tonal language. Our study is not without its limitations, as this domain specificity interpretation rests exclusively on the MMN data for acoustic/phonological processing. Nevertheless, such an interpretation allows integration with previous MMN findings on pitch processing in autism, which typically tested nontonal language speakers with nonsense syllables without involving the semantic contrast factor. Our results support the notion of impaired change detection for the linguistic elements of speech in autism (for a review, see O’Connor 2012). When P3a is taken into consideration, semantic significance and social relevance appear to be contributing factors, in addition to acoustic salience, to atypical attentional orienting to change detection in listeners with autism. This data-driven interpretation of P3a points to future research directions that seek further clarification of how the attentional switch is modulated by acoustic saliency, semantic significance, and social significance in speech stimuli. As our conclusions in this report are based solely on neurophysiological data, caution should be taken when generalizing the results to the behavioral domain. In this regard, it would be important to develop behavioral paradigms that are suitable for testing the perception of speech and nonspeech stimuli in children with autism and intellectual impairment.

Of particular relevance here are the potentially different developmental trajectories of pitch processing in ASD and TD. Recently, a study by Mayer et al. (2014) revealed enhanced complex tone and speech pitch perception in children with autism compared to TDs, but not in adolescent or adult groups. Thus, it would be of great value to extend this line of study to Chinese-speaking adolescent and adult cohorts.

As phonological learning involves a largely implicit neural commitment process by perceptually “tuning out” irrelevant acoustic information for early language acquisition, neural and behavioral measurements of speech perception could potentially serve as biomarkers for early diagnosis of autism (Kuhl 2010). Despite the inherent limitations of our approach, the present findings indicate that neural sensitivity as well as attentional switch in autism may be differentiated between nonspeech and speech processing in terms of the acoustic, linguistic, and social aspects of the stimuli. Thus, it will be of great value to incorporate evaluations from both speech and nonspeech domains in autism research as well as in practice.