Autism spectrum disorder (ASD) is a developmental condition characterized by troubles with social interaction and communication (DSM-5). It has long been claimed that individuals with autism face deficits in comprehending verbal materials, while comprehension of visual materials remains intact, a perspective dubbed the “Visual Ease Assumption” (for review, see Coderre 2019). One manifestation of this Visual Ease Assumption is that researchers frequently use experimental tasks with visual narratives like comics (Baron‐Cohen et al. 1986; Johnels et al. 2013). These stimuli are chosen because researchers believe that visual narratives are more transparent to children, that they involve theory of mind to understand, and that this non-verbal information is easier for children with ASD to process than verbal stimuli (Coderre 2019). Like verbal utterances which appear in grammatical sequences, visual narratives also use a structured sequential organization (Cohn 2019). This structure provides a cross-modal comparison between verbal and visual information when both use a coherent sequence, beyond asymmetrical comparisons of sequential verbal utterances and single unit visual materials. Nevertheless, only one study has examined the neural differences in processing visual narratives and verbal stimuli between neurotypical individuals and those with ASD (Coderre et al. 2017), and none have extended such comparisons to children. This study thus is the first to examine the neurocognitive processing of visual narratives, and their comparison with spoken stimuli, in both neurotypical children and those with ASD.

A common measure using visual narratives is the Picture Arrangement Task (PAT), which gives participants several unordered images and asks them to arrange them into a coherent sequence. This measure is used frequently in various experimental paradigms, particularly because of its inclusion in clinical assessments and general intelligence (IQ) tests (WAIS-IQ, WISC) (Kaufman and Lichtenberger 2006; Wechsler 1981a, b). Scoring on this task is based on how closely participants recreate a specific “correct” sequence. Previous behavioral studies (Baron‐Cohen et al. 1986; Johnels et al. 2013) have shown that individuals with ASD are worse than controls in constructing the expected sequences in the PAT. In addition, lower PAT scores have been observed for children with ASD than children with other language deficits (Allen et al. 1991; Goldstein et al. 2001). Similarly, in tasks asking participants to choose an image to end a visual narrative sequence, children with ASD produce fewer correct responses for predicting the final action of a visual event sequence than typically-developing controls (Zalla et al. 2010).

Deficits in visual narrative understanding by individuals with ASD are usually attributed to issues with theory of mind rather than basic semantic processing (Baron‐Cohen et al. 1986; Johnels et al. 2013). Indeed, in line with the Visual Ease Assumption, the processing of visual information is widely assumed to remain intact in ASD, and visual stimuli are thus looked to as an alternative to verbal stimuli, the processing of which has widely been shown to be impaired in ASD. Moreover, individuals with autism have scored higher on Raven's Standard Progressive Matrices than would be predicted by their Wechsler intelligence scores (Dawson et al. 2007; Soulieres et al. 2009).

Studies of semantic processing have implied that there is at least some cortical overlap across the processing of meaning in different modalities. Studies of event-related brain potentials (ERPs) have identified the N400 response as an index of semantic processing (Kutas and Federmeier 2011). This ERP response is a negative-polarity component that peaks roughly 400 ms after the onset of a stimulus and is thought to index the spreading activation in the access of semantic information by a stimulus in relation to its preceding context (Kutas and Federmeier 2011). The N400 has been observed in a wide array of meaningful contexts to various levels of linguistic structure (Kutas and Hillyard 1980, 1984; Camblin et al. 2007; Bentin et al. 1985). However, it has also been shown to be at least somewhat modality-independent: it arises in response to individual visual images (Van Berkum et al. 1999, 2003; Ganis et al. 1996; Olivares et al. 1999; Proverbio and Riva 2009; Bach et al. 2009; Manfredi et al. 2017), to sequences of events and visual narratives (Sitnikova et al. 2008; Sitnikova et al. 2003; West and Holcomb 2002; Cohn et al. 2012), and to multimodal interactions between visual narratives and both auditory and written language (Manfredi et al. 2017, 2018). Several previous findings showed that pictures elicited a more frontally distributed N400 compared to linguistic information (Kutas and Federmeier 2011).

In addition to the N400, several studies of language have observed a late positive deflection in a posterior region of the scalp, the P600 (Kuperberg 2007; Coulson et al. 1998; Gunter et al. 1997; Hahne and Friederici 1999; Van Petten and Luka 2006). Theories of the neurocognition of language have viewed the N400 and P600 as tightly coupled, subserving complementary processes of semantic access or retrieval (N400) with a subsequent process of reanalysis, integration, or updating (Brouwer et al. 2012; Kuperberg et al. 2016). Previous studies observed the P600 as elicited by syntactic violations (Hagoort et al. 1993; Osterhout and Holcomb 1992) but later research observed it with semantic content, including visual events (Cohn and Paczynski 2019; Kuperberg et al. 2003). For example, violations of the internal structure of visual events evoke a positivity similar to the P600, often preceded by an N400 (Cohn and Paczynski 2019; Cohn and Kutas 2015; Amoruso et al. 2013; Cohn and Maher 2015). Such work implies that the P600 reflects the integration of information with the current semantic representation into an updated representation (Brouwer et al. 2012). This link between the N400 and P600 is further reinforced by findings that participants often fall on a distribution for which ERP component is elicited for a given linguistic manipulation (Tanner and Van Hell 2014).

Previous studies of the N400 in the ASD population have revealed deficits in neural processing of verbal semantic information (Dunn and Bates 2005; Dunn and Vaughn 1999; Pijnacker et al. 2010; Kujala et al. 2013; Braeutigam et al. 2008; Lepistö et al. 2005; Fishman et al. 2011; see O’Connor 2012). For example, Pijnacker et al. (2010) studied the processing of sentence context and reasoning context in adults with high-functioning autism (HFA), Asperger syndrome, and in a matched control group. They found that adults with HFA require more elaborate processing for sentence interpretation. Similar results were observed in a MEG study performed by Braeutigam et al. (2008), who found a weaker N400-type effect in response to incongruous words for individuals with autism compared with a control group.

Studies of the N400 response for children with ASD have often compared verbal and non-verbal information (Ribeiro et al. 2013; McCleery et al. 2010). McCleery et al. (2010) recorded EEG while children observed semantically congruent and incongruent picture–word and picture–environmental sound pairs. ASD children showed an N400 effect only for environmental sounds, but not words. This was taken to suggest that the semantic deficit in children with ASD is more selective to the verbal than the nonverbal domain.

These latter studies examined the nature of semantic processing in autism by using multimodal paradigms. Therefore, even though their results shed light on the semantic processing of children with ASD, the paradigms that they used do not isolate and compare the cognitive mechanisms that could be involved in the two different sensory modalities (i.e., verbal vs. non-verbal). In addition, these studies present visual and verbal information in asymmetric ways. While sentence processing typically unfolds sequentially, pictures in these studies are presented without any sort of binding sequence. This is different from the sequential, visual narratives often used in studies of ASD. Indeed, a growing literature has shown that visual narratives are governed by structural constraints analogous to those found in written sentences (Cohn 2019). Previous studies have revealed that the N400 appears to be evoked in similar ways to semantic processing in visual narratives as in sentences, such as an attenuation of the N400 across ordinal sequence position of coherent visual narratives (Cohn et al. 2012). Furthermore, this processing of visual sequential information interacts with the plausibility of verbal/non-verbal information tied to the images, suggesting connections between modalities within semantic memory (Manfredi et al. 2017, 2018).

Like in sentences, the processing of meaning in visual narratives appears to involve a cyclic relationship of semantic access and subsequent integration or updating, as indexed by the N400 and P600 ERPs respectively (Cohn 2019). That is, when a comprehender views a panel within a sequence of images, they extract the relevant visual cues for the sequential meaning and access this corresponding information in semantic memory, as indexed by the N400 (Cohn 2019; West and Holcomb 2002). This information is subsequently incorporated into a mental model which is updated given the changes that occur across the narrative sequence (Loschky et al. 2017). Such updating processes, as indexed by the P600, have been shown in visual narratives for discontinuity between panels in terms of situational information, like changes between characters or expected events (Cohn and Kutas 2015), or to images where lines depicting motion are omitted or reversed to be anomalous (Cohn and Maher 2015). Studies of visual events have similarly shown P600s to incongruous event structures (Sitnikova et al. 2003; Amoruso et al. 2013). Thus, visual narrative sequences seem to involve similar processing mechanisms as sentences, only operating on different representations in the visual modality.

Only one prior ERP study has compared the N400 response in both verbal and visual narratives for both neurotypical and ASD populations but did so examining adults of a wide age range. Coderre et al. (2018) compared semantic comprehension of short verbal narratives where the final word was either congruous or incongruous, and also compared short 6-panel visual narratives (i.e., comic strips) with congruous or incongruous final images. Like previous findings, attenuated N400 effects were evoked by linguistic narratives for the ASD population compared to the controls, but so were the N400 effects to the visual narratives. Also, unlike prior studies of typically developing adults (Cohn et al. 2012), the N400s to coherent narratives for individuals with ASD showed no attenuation across the ordinal position of these sequences. These findings suggested impairments for both modalities in individuals with ASD, thereby implicating a more domain-general impairment.

These findings imply that difficulties shown by individuals with ASD may not be due to a modality-specific verbal impairment. However, it remains uncertain whether these conclusions extend to ASD children. Indeed, despite a robust literature of using visual narratives in experimental tasks with children (Coderre 2019), there appear to be no prior studies which have investigated the neurocognition of visual narrative comprehension in children. Thus, similar to Coderre et al. (2018), here we investigated the semantic processing of sentences and visual narratives (comics) in children with ASD and typically-developing (TD) children. Unlike that previous study, sentences were presented auditorily here, rather than via text, and both sentences and visual narratives were kept to short lengths. We made these changes to make the task more feasible for children: the auditory presentation of the sentences allowed for easier semantic processing of the content compared to written sentences that required reading, and the short length of the stimuli helped the children to maintain a high level of attention without overloading working memory.

Such investigation has ramifications for developing specific learning strategies and intervention programs, in addition to providing neurocognitive evidence to bring to bear on visual narratives used as experimental stimuli.

Our study had two aims: First, we sought further evidence for whether semantic processing is impaired during the presentation of both verbal and nonverbal information for children with ASD compared to typically developing children. Second, and more broadly, we sought to observe the ERPs evoked by visual narratives in children, given that no apparent prior studies have yet examined the neurocognitive processing of such ubiquitous materials.

We therefore recorded ERPs to auditory sentences and comic strips that were either semantically congruent or incongruent with the previous context. We will refer to the auditory sentence paradigm as Experiment 1 and to the visual narrative paradigm as Experiment 2. The paradigm with auditory sentences had a final word which was congruent or incongruent with the previous context, while wordless visual narratives had final critical panels that were congruent or incongruent with the previous visual context, as in Fig. 1. Based on Coderre et al. (2018), we expected that if semantic processing in children with ASD involved a modality-independent deficit, we should observe reduced N400 effects in both the verbal and visual stimuli compared to TD children. In contrast, if semantic processing in ASD is impaired in ways specific to the verbal domain, we expected a reduction of the N400 effect to incongruent verbal stimuli but a robust N400 to those in visual narratives. In addition, in order to verify whether ASD and TD children were equally able to integrate meaning across visual units, we analyzed the processing of the narrative sequence across the panels. This analysis was also conducted by Coderre et al. (2018), and we thus followed the same methodology.

Fig. 1

a Example of sentences used as experimental stimuli. The final word of the sentence could be semantically congruent or incongruent with the previous context. b Example of strips used as experimental stimuli. The final panel of the strip could be semantically congruent or incongruent with the previous context

Methods

Participants

The experimental group involved twenty-four school-aged children and adolescents with ASD (range 9–15 years; mean age = 11.4, SE = 1.9) and sixteen age-matched typically developing children and adolescents (range 9–16 years; mean age = 12.6, SE = 2.1). All participants were right-handed male Portuguese speakers and had normal or corrected-to-normal vision. All of the participants’ parents provided written, informed consent. The study was approved by the Institutional Ethics Committees of the Mackenzie Presbyterian University and by the National Ethics Committee of Brazil.

All of the children with ASD met the DSM-IV criteria for autistic disorder (American Psychiatric Association 1987) and were diagnosed through structured assessment by a developmental psychiatrist and further confirmed using the Autism Diagnostic Interview-Revised (Lord et al. 1994). The children exhibited mild/moderate autistic symptoms.

Autistic children were recruited from public schools and social groups located in Sao Paulo: Apace, APRAESPI, Santa Casa. Controls were recruited from the immediate community members of children with ASD (family, schoolmates) to minimize differences in socioeconomic status. All of the parents in the two groups reported having medium or low income (i.e. people from C, D, E social classes according to the Brazilian salary classification scale).

An abbreviated version (two subtests: Cube and Vocabulary) of the Brazilian WISC-III (WISC-III; Wechsler 1992) was also administered to calculate an estimated IQ (De Mello et al. 2011). No significant differences were found in full scale IQ between children with ASD (M = 100, SE = 11.39) and TD children (M = 107, SE = 18.88). However, children with ASD exhibited a lower verbal IQ (M = 7.8, SE = 4.8) compared to the TD children (M = 12, SE = 3.7). Individuals with ASD often experience language deficits (Coderre et al. 2017). Thus, to account for this group difference, verbal IQ scores were included as a covariate in all analyses.

Exclusionary criteria for all the groups included a nonverbal IQ below 80 as assessed by the WISC-III (Wechsler 1999) and a history of significant psychiatric or neurological comorbidities (Table 1).

Table 1 Participant characteristics for the TD and ASD groups

Eight ASD participants were excluded from the ERP statistical analyses because of EEG artifacts, including excessive blinking, eye movements, and muscular movements. In addition, two control subjects were excluded from the analyses of auditory sentences, but no control subjects were excluded from the analysis of visual narratives. This left 16 final participants in the ASD group, with 14 final participants in the neurotypical control group for Experiment 1 (i.e., auditory sentences) and 16 participants in the control group for Experiment 2 (i.e., visual narratives).

Stimuli

The auditory stimulus set included 152 sentences (76 congruent, 76 incongruent) (Fig. 1). A female native Portuguese speaker produced the sentence stimuli, which were recorded in a single session in a sound-attenuating booth. All the sentences were standardized for sound quality (44.1 kHz, 16 bit, stereo). Auditory sentences consisted of a subject, a verb, and a direct object. The final word of each sentence (the object) was a concrete common noun (mean duration = 887 ms, SD = 138 ms) that could be congruent or incongruent with the previous context. The EEG was synchronized to the onset of final word presentation.

The stimuli were preassessed by a group of 8 judges of similar age and educational level as the control subjects; these judges did not take part in the EEG study. Sentences were discarded if at least 80–99% of judges rated a congruent sentence as incoherent or an incongruent sentence as congruent. A total of two lists (each consisting of 152 sentences in random order) were created with the two conditions counterbalanced, such that participants heard each sentence only once in a list.

For Experiment 2 (i.e., visual narratives), we used black and white panels from the Complete Peanuts volumes 1 through 6 (1950–1962) by Charles Schulz (Fantagraphics Books 2004–2006) to design 100 novel three-panel-long visual narrative sequences (Cohn et al. 2012; Cohn and Wittenberg 2015; Cohn and Kutas 2015; Manfredi et al. 2017). No panels contained text, and all were adjusted to a uniform size. All sequences had a coherent narrative structure as defined by prior theoretical and empirical research (Cohn 2013; Cohn et al. 2012). The final panel of each strip could be congruent or incongruent with the previous context. EEG data were synchronized to the onset of final panel presentation (Fig. 1). The stimulus set included 136 strips (68 congruent, 68 incongruent).

As in Experiment 1, the stimuli were evaluated by a group of 8 judges of similar age and educational level as the control subjects who did not participate in the experiment. Strips were discarded if at least 80–99% of judges rated a congruent strip as incoherent or an incongruent strip as congruent. A total of two lists (each consisting of 136 strips in random order) were created with the two conditions counterbalanced, such that participants viewed each strip only once in a list.
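The counterbalancing described for both experiments can be sketched as follows. This is a minimal illustration rather than the script used in the study: the function name `make_counterbalanced_lists`, the integer item identifiers, the fixed random seed, and the specific scheme (each item congruent in one list and incongruent in the other) are all assumptions made for the example.

```python
import random


def make_counterbalanced_lists(n_items, seed=0):
    """Return two lists of (item_id, condition) pairs.

    Assumed scheme: half of the items are congruent in list A and
    incongruent in list B; the other half are the reverse. Each list
    therefore contains every item exactly once, with an equal number
    of congruent and incongruent trials.
    """
    if n_items % 2 != 0:
        raise ValueError("n_items must be even to split conditions equally")

    rng = random.Random(seed)
    items = list(range(n_items))
    rng.shuffle(items)  # random assignment of items to the two halves
    congruent_in_a = set(items[: n_items // 2])

    list_a, list_b = [], []
    for item in range(n_items):
        if item in congruent_in_a:
            list_a.append((item, "congruent"))
            list_b.append((item, "incongruent"))
        else:
            list_a.append((item, "incongruent"))
            list_b.append((item, "congruent"))

    # Each list is presented in its own random order.
    rng.shuffle(list_a)
    rng.shuffle(list_b)
    return list_a, list_b


sentence_lists = make_counterbalanced_lists(152)  # Experiment 1 set size
strip_lists = make_counterbalanced_lists(136)     # Experiment 2 set size
```

Under this assumed scheme, a participant assigned to one list encounters every item once, in only one of its two versions.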

Procedure

Participants sat in front of a monitor in a sound-proof, electrically-shielded recording chamber. During the auditory presentation of sentence stimuli, participants were instructed to maintain gaze on a fixation cross that appeared in the center of the screen and not to blink or move during the experimental session. When the sentence concluded, the question “was it understandable?” appeared on the screen and participants responded by pressing one of the two hand-held buttons. Response hand was counterbalanced across participants and lists.

During the visual presentation of stimuli, each panel stayed on screen for 1350 ms, separated by an ISI of 500 ms. As in Experiment 1, when the strip concluded, the question “was it understandable?” (in Portuguese) appeared on the screen, and participants responded by pressing one of the two hand-held buttons. Response hand was counterbalanced across participants and lists. Although behavioral performance was not our main outcome measure, we analyzed comprehensibility judgement rates in the comprehension task to ensure that participants were attending to the stimuli, as in prior studies on visual narrative (e.g., Cohn and Kutas 2015; Cohn and Maher 2015). However, because several children with ASD failed to respond on several trials, we decided to analyze ERP responses to all stimuli presented.
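As a worked illustration of the presentation timing, the sketch below computes panel onsets relative to the start of a strip from the reported panel duration (1350 ms) and ISI (500 ms). The function name `panel_onsets` and the default of three panels per strip (taken from the stimulus description) are assumptions of the example, as is the premise that the first panel appears at time zero.

```python
PANEL_DURATION_MS = 1350  # time each panel stays on screen
ISI_MS = 500              # blank interval between consecutive panels


def panel_onsets(n_panels=3):
    """Return the onset time (ms) of each panel relative to strip onset."""
    soa = PANEL_DURATION_MS + ISI_MS  # stimulus onset asynchrony
    return [i * soa for i in range(n_panels)]


onsets = panel_onsets()
print(onsets)  # [0, 1850, 3700]
```

On these assumptions, the critical final panel, to which the EEG was time-locked, would begin 3700 ms after the onset of the first panel.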

Both experiments had four blocks: in Experiment 1 each block contained 38 trials, and in Experiment 2 each block contained 34 trials. The blocks were separated by short breaks that usually lasted a few seconds (max 60 s). However, after the first two blocks the participants were invited to take a longer break that could last 2–3 min. In each experiment, the experimental trials were preceded by a short practice of 10 trials to familiarize participants with the procedures.

Electroencephalographic Recording Parameters

The electroencephalogram (EEG) signal was recorded from 128 electrodes at a sampling rate of 250 Hz (bandpass 0.01–100 Hz). The EEG was recorded and analyzed using Net Station software (Geodesic EEG Net Station, EGI, Eugene, OR) (Fig. 2). The impedance of all electrodes was kept below 50 kΩ during the experiment. All recordings were referenced to the Cz electrode during data acquisition. This solution allowed us to analyze the mastoid-temporal lobe activity in addition to all other sites important for linguistic processing.

Fig. 2

Schematic showing the 128 channel array of scalp electrodes from which the EEG was recorded

Statistical Analysis of ERPs

Trials contaminated by blinks, muscle tension (EMG), channel drift, and/or amplifier blocking were discarded before averaging. Approximately 9% of critical panel and word epochs were rejected due to such artifacts, with losses distributed approximately evenly across the conditions. Participants with more than 20% of all trials rejected were removed from analysis. Each participant’s EEG was time-locked to critical words and panels, and ERPs were computed for epochs extending from 100 ms before stimulus onset to 1500 ms after stimulus onset. Finally, a 30-Hz lowpass filter was applied and data were re-referenced to an average reference.
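The epoching, rejection criterion, and re-referencing steps described above can be illustrated with a short sketch. It is not the Net Station pipeline used in the study: the helper names (`ms_to_samples`, `epoch_bounds`, `average_reference`, `exceeds_rejection_limit`) and the list-of-lists data layout are assumptions, and only the numeric parameters (250 Hz sampling rate, −100 to 1500 ms epoch, 20% rejection limit) come from the text.

```python
SAMPLING_RATE_HZ = 250  # from the recording parameters


def ms_to_samples(ms, sfreq=SAMPLING_RATE_HZ):
    """Convert a duration in milliseconds to a whole number of samples."""
    return round(ms * sfreq / 1000)


def epoch_bounds(onset_sample, pre_ms=100, post_ms=1500):
    """Return (start, stop) sample indices around a stimulus onset.

    `stop` is exclusive, so the epoch covers -pre_ms up to post_ms.
    """
    start = onset_sample - ms_to_samples(pre_ms)
    stop = onset_sample + ms_to_samples(post_ms)
    return start, stop


def average_reference(channels):
    """Subtract the across-channel mean from every channel at each sample.

    `channels` is a list of equal-length voltage lists, one per electrode.
    """
    n_channels = len(channels)
    n_samples = len(channels[0])
    means = [sum(ch[t] for ch in channels) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in channels]


def exceeds_rejection_limit(n_rejected, n_total, limit=0.20):
    """True if the proportion of rejected trials is above the limit."""
    return n_rejected / n_total > limit


start, stop = epoch_bounds(onset_sample=1000)
print(stop - start)  # 400 samples per epoch at 250 Hz
```

At 250 Hz each sample spans 4 ms, so the 1600 ms epoch corresponds to 400 samples, 25 of which precede stimulus onset.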

Our analysis focused on two epochs of interest. We investigated the mean amplitude voltage of the N400 in the 350–550 ms epoch and a subsequent time window of 550–750 ms to investigate any later or sustained effects (i.e., P600 and sustained negativity). These responses were measured at 72 electrode sites (8 in each of 9 regions): fronto-left (23, 24, 26, 32, 33, 27, 34, 28), fronto-central (18, 16, 10, 22, 9, 15, 21, 14), fronto-right (3, 124, 2, 123, 122, 1, 117, 116), centro-left (39, 40, 45, 46, 41, 47, 50, 51), central (7, 106, 31, 80, 55, 30, 105, 79), centro-right (104, 103, 102, 98, 101, 97, 115, 109), parieto-left (58, 59, 60, 70, 66, 65, 69, 64), parietal (61, 62, 78, 67, 72, 77, 71, 76), and parieto-right (96, 85, 91, 83, 84, 90, 95, 89). Because we expected a different scalp distribution of the ERP responses for each modality (Coderre et al. 2018; Manfredi et al. 2018; Cohn et al. 2012; Kutas and Federmeier 2011; West and Holcomb 2002) and because this is the first study to use EEG to investigate narrative comprehension in children, we selected these sites to avoid missing any possible unexpected effects and to provide a broad scalp representation of the ERP responses.
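The mean-amplitude measure can be sketched as follows, as an illustration only. The function names, the dictionary layout of the averaged waveform, and the choice to include every sample whose timestamp falls inside the window are assumptions; the electrode numbers for the two example regions, the 250 Hz sampling rate, and the −100 ms epoch start are taken from the text.

```python
SAMPLING_RATE_HZ = 250
EPOCH_START_MS = -100  # epochs begin 100 ms before stimulus onset

# Electrode numbers for two of the nine regions, copied from the text.
REGIONS = {
    "central": [7, 106, 31, 80, 55, 30, 105, 79],
    "parietal": [61, 62, 78, 67, 72, 77, 71, 76],
}


def window_indices(start_ms, end_ms, n_samples):
    """Indices of samples whose timestamps lie within [start_ms, end_ms]."""
    step_ms = 1000 / SAMPLING_RATE_HZ
    return [
        k for k in range(n_samples)
        if start_ms <= EPOCH_START_MS + k * step_ms <= end_ms
    ]


def mean_amplitude(erp, electrodes, start_ms, end_ms):
    """Mean voltage over the given electrodes and time window.

    `erp` maps an electrode number to its averaged waveform (list of
    voltages, one per sample of the epoch).
    """
    n_samples = len(next(iter(erp.values())))
    idx = window_indices(start_ms, end_ms, n_samples)
    values = [erp[e][k] for e in electrodes for k in idx]
    return sum(values) / len(values)


# Synthetic waveform for demonstration: the voltage equals the sample index.
demo_erp = {e: [float(k) for k in range(400)] for e in REGIONS["central"]}
n400 = mean_amplitude(demo_erp, REGIONS["central"], 350, 550)
late = mean_amplitude(demo_erp, REGIONS["central"], 550, 750)
```

Because the window edges (350, 550, and 750 ms) fall between samples at this sampling rate and epoch start, the two windows share no samples under this inclusion rule.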

For both stimulus categories, the mean amplitudes of the N400 and later components (sustained negativity and/or P600) were analyzed using repeated measures ANOVAs with Group (2 levels: ASD children, Controls), Congruency (2 levels: Congruent, Incongruent), and Region (9 levels).

Separate ANOVAs for each macro region (Frontal, Central, Parietal) were performed to identify differences between the three regions. For both stimulus categories, the mean amplitudes of the N400 and frontal negativity components were analyzed using repeated measures ANOVAs with Group (2 levels: ASD children, Controls), Congruency (2 levels: Congruent, Incongruent) and Region (3 levels). Verbal IQ was included as a covariate in all ANOVAs. All statistical analyses were performed using Statistica software (Statsoft version 7.0, USA). Multiple comparisons of means were performed using post-hoc Bonferroni least-significant difference tests.

Results

Behavioral Results

Sentences

The ANOVA computed on ratings revealed a significant main effect of Congruency (F (1, 22) = 71.79, p < 0.01), arising because congruent sentences were rated as more coherent than the incongruent sentences. In addition, a Group × Congruency interaction (F (1, 22) = 4.83, p < 0.05) showed that congruent sentences were rated as more coherent in the Control group (M = 84.2%, SD = 17.7) than in the ASD group (M = 54.7%, SD = 28.6), whereas incongruent sentences were rated as less coherent in the Control group (M = 14.6%, SD = 15.1) than in the ASD group (M = 25.2%, SD = 25.4).

Visual Narratives

The ANOVA performed on ratings revealed a significant main effect of Congruency (F (1, 14) = 5.07, p < 0.05), showing that congruent strips (Control group: M = 66%, SD = 20; Children with ASD: M = 43.1%, SD = 25) were rated as more coherent than the incongruent ones (Control group: M = 29.1%, SD = 12; Children with ASD: M = 31.1%, SD = 14.7). No differences were found between groups (p > 0.05).

Electrophysiological Results

Sentences

N400 (350–550 ms)

The omnibus ANOVA across regions for the mean amplitude of the N400 component showed a main effect of Region [F (8, 232) = 2.66; p < 0.05]. The post-hoc analysis revealed a greater negativity in the frontal sites compared to the central and the parietal ones. However, the omnibus ANOVA revealed no differences between sentence types or between TD and ASD participants (Figs. 3, 4). Finally, there was no significant effect of the covariate (p = 0.23).

Fig. 3

Grand-average ERP waveforms recorded at central sites in response to Congruent word (blue) and Incongruent word (red) sentences in the control group

Fig. 4

Grand-average ERP waveforms recorded at central sites in response to Congruent word (blue) and Incongruent word (red) sentences in the ASD group

Because of the effect of Region, additional follow-up ANOVAs were performed on each of the three macro-regions (Frontal, Central and Parietal), which revealed that the difference between Congruent and Incongruent words was significant only at the central areas [F (1, 29) = 6.01, p < 0.05]. Further analysis of the Central region revealed a main effect of Group [F (1, 29) = 8.58, p < 0.05], showing a significantly greater negativity in the control group compared to the ASD group. In addition, there was no significant effect of the covariate (p = 0.21).

Later Effects (550–750 ms)

The ANOVA performed in the 550–750 ms epoch revealed a significant Congruency × Region interaction [F (8, 232) = 3.05, p < 0.05], suggesting a greater sustained negativity to incongruent final words than congruent final words in the right parietal sites. However, no differences were found between Groups of ASD children and controls (Figs. 3, 4). There was no significant effect of the covariate (p = 0.36).

Individual ANOVAs performed on the three macro-regions (Frontal, Central and Parietal) revealed results similar to those observed in the earlier time window: a significantly greater negativity to Incongruent words compared to Congruent ones was observed at the central areas [F (1, 29) = 4.38, p < 0.05]. The ANOVA also revealed a main effect of Group [F (1, 29) = 6.92, p < 0.05], showing a greater negativity in the control group compared to the ASD group (Fig. 5).

Fig. 5

Voltage of scalp distribution of the N400 in response to incongruent sentences in control group (first row) and ASD group (second row)

In addition, the analysis of the Central sites revealed a trend toward a Congruency × Region × Group interaction [F (2, 58) = 6.92, p = 0.08], suggesting a greater negativity to Incongruent compared to Congruent words at centro-parietal and right parietal sites, and only in the control group. No P600 effects were elicited during this epoch. Finally, the ANOVA revealed no significant effect of the covariate (p = 0.20).

Visual Narratives

N400 (350–550 ms)

For critical panels in visual narratives, the ANOVA for the mean amplitude of the N400 component showed a main effect of Region [F (8, 192) = 96.3; p < 0.01], revealing a greater negativity in the frontal sites compared to central and parietal ones. In addition, a Congruency × Region interaction [F (8, 192) = 5.16, p < 0.01] revealed a greater N400 amplitude to incongruent strips than congruent strips in the fronto-central, fronto-right and central sites, as in Figs. 6 and 7. Moreover, this interaction showed a reverse effect—i.e. a greater positivity to incongruent strips than the congruent ones—in the parieto-left and parieto-right sites. Finally, there was no significant effect of the covariate (p = 0.74).

Fig. 6

Grand-average ERP waveforms recorded at frontal and central sites in response to Congruent panel (blue) and Incongruent panel (red) strips in the control group

Fig. 7

Grand-average ERP waveforms recorded at frontal and central sites in response to Congruent panel (blue) and Incongruent panel (red) strips in the ASD group

Individual ANOVAs performed on the three macro-regions (Frontal, Central and Parietal) showed a main effect of Congruency at the Frontal areas [F (1, 23) = 7.95, p < 0.05]. There was no significant effect of the covariate (p = 0.69). The ANOVA performed on the Central sites revealed a significant Congruency × Region interaction [F (2, 46) = 10.70, p < 0.01], revealing a greater N400 effect to incongruent strips than congruent ones only in the central sites. There was no significant effect of the covariate (p = 0.84) (Fig. 8).

Fig. 8

Voltage of scalp distribution of the N400 in response to incongruent panels in control group (first row) and ASD group (second row)

Finally, the ANOVA performed on the Parietal sites revealed a significant Congruency × Group interaction [F (1, 23) = 4.38, p < 0.05], revealing a greater positivity to Incongruent strips than Congruent ones in the Control group. No differences were found between the Incongruent and the Congruent strips in the ASD group, and there was no significant effect of the covariate (p = 0.43).

Later Effects (550–750 ms)

The ANOVA of the 550–750 ms epoch for visual narratives showed a main effect of Region [F (8, 200) = 80.1; p < 0.01], revealing a greater sustained negativity in the frontal regions compared to central and parietal regions. In addition, a Congruency × Region interaction [F (8, 200) = 9.91, p < 0.01] revealed a greater negativity to incongruent strips than the congruent ones in the fronto-central, fronto-right and centro-right sites. In addition, the analysis revealed a greater P600 to incongruent strips than the congruent ones in the centro-parietal and parieto-right sites. There was no significant effect of the covariate (p = 0.14).

The individual ANOVA performed at the frontal sites revealed a main effect of congruency, revealing a greater negativity to Incongruent than Congruent strips [F (1, 25) = 6.06, p < 0.05]. There was no significant effect of the covariate (p = 0.14). Similar to the earlier time window, the analysis on the central sites revealed a significant Congruency × Region interaction [F (2, 50) = 6.30, p < 0.01], revealing a greater sustained negativity to incongruent strips than congruent ones only in the central sites. There was no significant effect of the covariate (p = 0.71).

As in the previous time window, the ANOVA performed on the parietal sites revealed a significant Congruency × Group interaction [F (1, 25) = 4.97, p < 0.05], reflecting a greater P600 to incongruent strips than congruent strips in the control group. There was no significant effect of the covariate (p = 0.84).
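
As an illustration of the window-based measures reported above, the following minimal sketch computes the mean amplitude of an averaged waveform in the 350–550 ms and 550–750 ms epochs and takes the incongruent-minus-congruent difference over a set of channels. It is not the analysis pipeline used in this study: the function names, the channel labels (F3, F4), the 4 ms sampling step and the synthetic waveforms are all assumptions made for the example, and the statistical testing (ANOVA with a covariate) is not reproduced.

```python
from statistics import mean


def window_mean(waveform, times_ms, start_ms, end_ms):
    """Mean amplitude of one averaged waveform within [start_ms, end_ms)."""
    values = [v for v, t in zip(waveform, times_ms) if start_ms <= t < end_ms]
    if not values:
        raise ValueError("time window contains no samples")
    return mean(values)


def congruency_effect(incongruent, congruent, times_ms, window, channels):
    """Incongruent-minus-congruent mean amplitude, averaged over channels.

    incongruent / congruent: dict mapping a channel name to its averaged
    waveform in microvolts (a hypothetical data layout).
    """
    start_ms, end_ms = window
    diffs = [
        window_mean(incongruent[ch], times_ms, start_ms, end_ms)
        - window_mean(congruent[ch], times_ms, start_ms, end_ms)
        for ch in channels
    ]
    return mean(diffs)


# Synthetic data (assumed): samples every 4 ms from -100 ms to 796 ms.
times_ms = list(range(-100, 800, 4))
frontal = ["F3", "F4"]  # hypothetical channel labels

# Flat congruent waveforms; incongruent waveforms carry a 2 microvolt
# negativity in the 350-550 ms window only.
congruent = {ch: [0.0 for _ in times_ms] for ch in frontal}
incongruent = {
    ch: [-2.0 if 350 <= t < 550 else 0.0 for t in times_ms] for ch in frontal
}

n400_effect = congruency_effect(incongruent, congruent, times_ms, (350, 550), frontal)
late_effect = congruency_effect(incongruent, congruent, times_ms, (550, 750), frontal)
print(n400_effect, late_effect)
```

A negative difference indicates a larger negativity to incongruent panels in that window. In practice, one such value per participant and region would be entered into the statistical model.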

Ordinal Sequence Position Analyses

Similar to previous studies (Van Petten and Kutas 1991; Cohn et al. 2012; Coderre et al. 2018), we analyzed the processing of the narrative sequence across the panels to investigate whether the position of each panel in the ordinal sequence of visual narratives affected N400 amplitudes. ERP responses to all the panels were collapsed over congruent and incongruent sequences. An omnibus 2 (group) × 3 (panel position) repeated-measures ANOVA on the mean amplitude of the N400 component showed an interaction of group × panel position [F (2, 50) = 3.73, p < 0.05]. Post-hoc analyses revealed a greater negativity in response to the first panel compared to the other two, and to the second panel compared to the third, in both groups. Follow-up analyses found no group differences at any individual panel position (Fig. 9).

Fig. 9

Average N400 amplitude for each panel in the visual narrative sequence averaged across all electrodes. Error bars show the standard error of the mean. Negativity is plotted upwards
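
The descriptive step behind the ordinal position analysis can be sketched as follows: amplitudes are collapsed over congruency, averaged within each group and panel position, and then compared across successive positions. This is a minimal sketch under stated assumptions, not the study's code. The long-format record layout, the function names and the amplitude values are invented for illustration, and the repeated-measures ANOVA and post-hoc tests are not reproduced.

```python
from collections import defaultdict
from statistics import mean


def position_means(records):
    """Average N400 amplitude per (group, panel position).

    records: iterable of (group, congruency, position, amplitude_uv) tuples
    (a hypothetical long-format table). Congruency is deliberately ignored,
    which collapses over congruent and incongruent sequences.
    """
    cells = defaultdict(list)
    for group, _congruency, position, amplitude in records:
        cells[(group, position)].append(amplitude)
    return {cell: mean(values) for cell, values in cells.items()}


def attenuation(means, group, positions=(1, 2, 3)):
    """Change in amplitude between successive panel positions for one group.

    Positive values mean the response was less negative at the later panel.
    """
    amps = [means[(group, p)] for p in positions]
    return [later - earlier for earlier, later in zip(amps, amps[1:])]


# Made-up amplitudes in microvolts, for illustration only.
records = [
    ("TD", "congruent", 1, -4.0), ("TD", "incongruent", 1, -5.0),
    ("TD", "congruent", 2, -3.0), ("TD", "incongruent", 2, -3.0),
    ("TD", "congruent", 3, -1.0), ("TD", "incongruent", 3, -2.0),
    ("ASD", "congruent", 1, -4.0), ("ASD", "incongruent", 1, -4.0),
    ("ASD", "congruent", 2, -2.0), ("ASD", "incongruent", 2, -3.0),
    ("ASD", "congruent", 3, -1.0), ("ASD", "incongruent", 3, -1.0),
]

means = position_means(records)
for group in ("TD", "ASD"):
    print(group, attenuation(means, group))
```

With these invented values, both groups show a less negative response at each later panel, which is the pattern of attenuation across sequence position described above.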

Discussion

In this study, we investigated whether comprehension deficits in individuals with ASD are language-specific or concern a more global comprehension impairment. To achieve this aim, we recorded ERPs to critical auditory words and comic panels that were either semantically congruent or incongruent with the previous context. In the 350–550 ms epoch, we found that incongruent sentence-final words elicited a weak and focally distributed N400 effect only in the control group, whereas incongruent sequence-final images in visual narratives evoked an increased N400 effect in both groups. These results suggest that children with ASD had only minimal differences from TD children for verbal stimuli and did not have difficulty accessing semantic memory (N400) for the non-verbal visual narratives. However, ASD children did face challenges interpreting or integrating this information, as is evident in a larger late negativity to critical words and a larger P600 to critical panels in the TD group but not the ASD group. Given the close relationship between the N400 and P600 in research on language and visual narrative (Cohn 2019; Loschky et al. 2017; Sitnikova et al. 2003; Amoruso et al. 2013), these findings overall suggest that deficits in processing meaning across modalities may be more complex than comparisons between modalities or may be isolated to specific ERP components.

For critical words in sentences, we observed a measured and centrally-distributed N400 effect to incongruent words as compared to congruent words. Further analysis indicated a slightly greater focal negativity in typically-developing children as compared to the ASD group. However, this difference was constrained to a fairly small central region on the scalp. This observation of a difference in the N400 in sentences is in line with previous findings of restricted semantic processing of verbal information in the ASD population (Dunn and Bates 2005; Dunn and Vaughn 1999; Pijnacker et al. 2010; Kujala et al. 2013; Braeutigam et al. 2008; Lepistö et al. 2005; Fishman et al. 2011; Coderre et al. 2018). In particular, a reduced N400 effect was observed in response to auditory sentences in children and adults with ASD, whereas significantly larger N400 effects were observed in neurotypical children (O’Connor 2012; Dunn and Vaughn 1999; Dunn and Bates 2005; Fishman et al. 2011). These results have been interpreted as suggesting that individuals with ASD make less use of contextual information, which could be due to a less elaborate or less connected semantic network. In our study, the N400 effect to auditory words was focal and measured for both groups; however, the greater negativity observed in typically-developing children compared to children with ASD hints that, as in prior studies, individuals with ASD have more difficulty than controls in accessing the meaning of incongruent verbal semantic information.

Following the N400, we observed a sustained negativity in the 550–750 ms time window that was larger to incongruous than congruous stimuli in the centro-parietal areas, similar to the preceding N400. In addition, the analysis revealed a statistical trend toward a greater negativity to incongruent compared to congruent words at centro-parietal sites only in the control group. Similar sustained negativities have been observed following N400s in response to both linguistic and visual anomalies (West and Holcomb 2002; Cohn et al. 2012; Lee and Federmeier 2008; Manfredi et al. 2018). In our study, this sustained negativity could reflect that the typically-developing children worked to further process the inconsistent auditory information whereas the ASD group did not.

The analyses of critical panels in visual narratives revealed a greater N400 amplitude to incongruent than congruent strips in the frontal and central areas, in line with previous findings that images elicit a frontal distribution of the N400 (Barrett and Rugg 1990; Ganis et al. 1996; McPherson and Holcomb 1999; Manfredi et al. 2018). In addition, this N400 effect (incongruent versus congruent stimuli) did not differ between the ASD and control groups.

In addition, our ordinal sequence position analyses suggest that our ASD participants did not have difficulty accessing semantic memory (N400) for the non-verbal visual narrative and that they were able to integrate the meaning across units. This result is in line with previous observations of an attenuation of the N400 across ordinal sequence position of coherent visual narratives (Cohn et al. 2012), but it contrasts with Coderre et al. (2018), who found no attenuation across the ordinal position of these sequences in individuals with ASD.

In general, these findings are in line with studies that suggest that semantic processing of nonverbal stimuli is not impaired in individuals with ASD (Kamio and Toichi 2000; McCleery et al. 2010; Sahyoun et al. 2009; Brandwein et al. 2011; Coderre et al. 2017). However, these results are not in line with findings by Coderre et al. (2018) showing attenuation of the N400 to both verbal and visual narrative stimuli in ASD participants compared to TD participants.

Following the N400, a greater negativity was elicited by incongruent strips than congruent strips, similar to our finding that sustained negativities also followed the N400s to words in sentences. However, unlike with those sentences, no group differences were observed in this case. As for sentences, this response might reflect additional processing sustained from the preceding N400 (West and Holcomb 2002; Cohn et al. 2012; Lee and Federmeier 2008; Manfredi et al. 2018), and the relative scalp distribution of the effects appears to support this. Thus, the N400 in the 350–550 ms epoch may have reflected a “lead-in” for the later effects, which were more widespread.

Nevertheless, in addition to this negativity, critical panels in incongruent strips elicited a greater P600 with a centro-parietal distribution than those in congruent strips. This effect was reliable only in the control group and was attenuated in the ASD group. Previous work has associated the posterior P600 with updating or reanalysis processes (Van Petten and Luka 2012), particularly when the incoming word disconfirms predictions created by a preceding context (Quante et al. 2018; Kuperberg 2013). This late response could indicate that only control children were able to recognize the discontinuity of incoming visual information relative to its prior context. Our electrophysiological findings are in line with behavioral studies using visual narrative sequences with individuals with ASD (Baron‐Cohen et al. 1986; Johnels et al. 2013), which found that individuals with ASD had difficulty comprehending sequential images (i.e., constructing the expected sequences in the PAT).

In addition, as this is the first study to examine visual narrative processing in younger participants, it is worth comparing it to previous work with adults that used a similar paradigm (Manfredi et al. 2014; Manfredi et al. 2018; Cohn 2013; West and Holcomb 2002). In these studies, simple comparisons of incongruent and congruent stimuli in visual narratives showed no such P600 along with the N400. As described above, the N400 and P600 have been suggested to reflect different mechanisms within a broader processing cycle for both language and visual narratives (Kuperberg 2016; Cohn 2019) and can vary in emphasis across individuals (Tanner and Van Hell 2014). Since this late effect was not usually observed in adults during visual narrative processing (Manfredi et al. 2017; Cohn 2013), this would suggest that children may use a processing strategy more oriented towards updating to comprehend these sequences than adults do, implying a greater need for backward-looking reanalysis. One possibility is that this greater updating in younger comprehenders is due to aspects of fluency with comprehending comics, as prior work has shown greater posterior positivities for low-frequency readers of particular comics compared to high-frequency readers (Cohn and Kutas 2017). This result raises interesting questions about typical development that need to be tested further.

Such later effects are also interesting because we observed differences between children with ASD and TD children in these later epochs. Unlike prior work, which found an attenuated N400 to both incongruent words in sentences and panels in visual narratives in individuals with ASD compared to neurotypical controls (Coderre et al. 2018), our study found only a focal central N400 effect to incongruent words as compared to congruent ones that was slightly attenuated for the children with ASD compared to TD children. Rather, our results showed greater group differences in the late time windows (sustained negativities, P600) than in the early stages (N400) for both modalities. Compared to adult participants, these effects in the later epochs may suggest that neurotypical children face a processing delay whereas children with ASD show attenuation in these later stages (i.e., a lack of the late effects).

However, these later effects differed between modalities. In the sentences, we observed differences in a centro-parietal negativity sustained from the N400 between children with ASD and TD children. Though incongruent panels in visual narratives also evoked sustained negativities following the N400 with a more frontal distribution, we did not find group differences in this effect. Rather, group differences were observed in the more posterior P600. This group difference in the P600 between the two modalities might suggest that children with ASD show a deficit in the relational semantic processing of verbal information specifically, but a deficit in updating during visual narrative processing. It is worth noting that these later effects are actually the opposite of what was observed in Coderre et al. (2017), where a P600 was attenuated for adults with ASD only in the verbal narratives, not in the visual narratives as observed here. Meanwhile, a sustained negativity following the N400 was observed for visual narratives in Coderre et al. (2017) for neurotypical adults only, with no P600, which was again attenuated for adults with ASD. Given that our findings for ostensibly the same ERP components show the opposite patterns despite using roughly the same manipulation, they on the one hand show the coupled nature of these effects (Kuperberg 2016; Cohn 2019). On the other hand, they also raise questions about why such opposite findings were observed, in addition to the minimal attenuation of the N400 observed here across groups compared to other studies.

One possibility is that the discrepancy is due to age differences between the two studies: the age range in Coderre et al.’s study varied from 18 to 68, while this study included ages 9–16. It is possible that our younger participants use a different processing strategy from the wide span of older adults (which showed no internal variation across ages), and within that, group differences may show impairment on different aspects of processing throughout the lifespan. While no extant work has examined the neurocognition of visual narratives across the lifespan, previous studies have observed the electrophysiological responses during natural language comprehension in typically developing school‐aged and adolescent children (Friedrich and Friederici 2004; Hahne et al. 2004; Männel and Friederici 2011; Schneider et al. 2016). These studies suggest that, similarly to our findings, children show a later and larger N400 effect for semantic errors and a smaller and later P600 effect for grammatical errors compared to adults. When language capabilities are taxed, children demonstrate subtle processing differences compared to adults, suggesting that typically developing children may engage different skills or strategies than adults during language comprehension (Holland et al. 2007).

While the processing of visual narratives appears to involve neurocognitive mechanisms similar to those of language (Cohn 2019), to our knowledge, no research has explicitly examined these mechanisms in visual narrative comprehension across the lifespan. Nakazawa (2005; 2016) has reported that younger and older adults have better comprehension and performance on visual narrative diagnostics than children. However, these effects may be largely driven by expertise, as college-aged adults, who were observed to read comics the most, outperformed both children and older adults. As discussed above, this difference in familiarity with the conventions and processing of visual narratives may be a factor driving differences between adult and child populations. That is, they might differ in their visual narrative fluency (Cohn 2014): the young ASD participants of our study might have less familiarity with visual narrative material than the ASD participants in Coderre et al.’s study. Such results suggest an interaction between comprehension across ages and expertise that can be explored in further studies.

Overall, our findings raise questions about how semantic memory is organized in the ASD brain across modalities and how, and if, this organization varies across development. One way of addressing these questions about modality differences is to contrast modalities directly in concurrent presentation. For example, it would be worth investigating brain responses of individuals with ASD when a visual narrative is presented together with different types of information such as sounds and auditory words (Manfredi et al. 2018). Such a design could probe how the semantic processing of verbal information might be facilitated when it is linked to meaningful visual information directly.

However, diversity in our participants could have led to variability in our findings. One limitation of our study was the significantly lower verbal IQ of the ASD group compared to the TD group. Future research should include ASD children matched to controls on verbal IQ. This would help to clarify whether impaired verbal comprehension observed in ASD children may be the result of low verbal cognitive abilities rather than ASD. In addition, since the age range for our participants was quite broad (8–16), it would be worthwhile for future research to compare the brain responses to verbal and visual narrative in TD and ASD children at more constrained ages. Nevertheless, our findings might have implications for clinical and educational outcomes. For example, future research in this field might integrate these findings when investigating inference skills and reading comprehension deficits.

In conclusion, our findings suggest that children with ASD have difficulty in interpreting and integrating both visual and verbal information and that these deficits may be more complex than simple comparisons between modalities or may be isolated to specific ERP components. In addition, since this is the first ERP study to investigate visual narrative processing in children, it raises questions related to issues of latency and componentry that can fruitfully be examined in future research.