Introduction

Research examining the genetic basis of ASD (e.g., Muhle et al. 2004; Szatmari 1999), developmental trajectories (Waterhouse 2013), response to intervention (e.g., Vivanti et al. 2013, 2014), and personal experiences of living with ASD (e.g., Hines et al. 2012; Trembath et al. 2013) consistently points to the need to acknowledge differences and avoid assumptions when studying and supporting development in children with ASD. Yet, frequent reference to minimally verbal children with ASD (those who use fewer than ten spontaneous, functional, and communicative words; Goods et al. 2013) in the literature as being visual learners fails to account for potential differences amongst children in their ability to learn from visually presented stimuli. Accordingly, these children are routinely prescribed picture-based augmentative and alternative communication (AAC) systems, in part on the premise that these systems will complement their visual learning style (e.g., Rao and Gagie 2006; Tissot and Evans 2003). Our aim was to test this assumption, which partly underpins the use of picture-based AAC methods, with a view to promoting evidence-based intervention approaches for minimally verbal children with ASD.

Evidence for a Uniform Visual Learning Style

Proposed evidence for a visual learning style in children with ASD comes from three main sources. First, some adults with ASD have provided qualitative accounts of their experiences living and learning with ASD. Temple Grandin (2013), for example, noted that “words are like a second language to me… When somebody speaks to me, his words are instantly translated into pictures. Language-based thinkers often find this phenomenon difficult to understand…” Similarly, those who support people with ASD have described what they perceive to be a visual learning style. To illustrate, Trembath et al. (2013) interviewed family members and support staff of adults with ASD who were provided AAC in an attempt to support their communication needs. Carol, mother of Mary, a middle-aged woman with ASD who spoke using a small number of words, was quoted as saying “people with autism don’t understand spoken language and that’s why we’ve got the little picture there” (p. 6). However, the extent to which Mary made use of the pictures is unknown. A further unknown is the extent to which the experiences of some adults with ASD reflect those of the broader community of children and adults with ASD, in particular, those who do not demonstrate underlying linguistic ability.

The second source of proposed evidence comes from the outcomes of studies involving the use of picture-based AAC systems to support the communication and learning of children with ASD. These systems have the potential to provide an efficient and recognisable communication mode (e.g., Functional Communication Training) and may support the development of symbolic communication (e.g., System for Augmenting Language; Romski and Sevcik 1996). The use of AAC for children with ASD has been endorsed by peak bodies including the American Speech and Hearing Association (2006), Speech Pathology Australia (2010), and the United States National Academy of Sciences (2001). Indeed, Ganz et al. (2012) recently conducted a systematic review of AAC strategies for children with ASD, and reported medium to large effects for interventions targeting social skills, communication development, academic skills, and challenging behaviour. However, evidence that these approaches may be helpful to some children with ASD does not automatically imply that these children have a visual learning style. Furthermore, close examination of the overall positive group effects reveals considerable individual variability in response to treatment. Therefore, if children with ASD share a visual learning style, it is either not uniform in strength across individuals or its action is mediated by other factors, such as auditory comprehension skills.

The third source of proposed evidence for a visual learning style in individuals with ASD comes from studies showing improved performance on tasks requiring the processing of visual, as opposed to auditory, information. Quill (1997) cited evidence that children with ASD tend to perform better on tasks requiring visual processing (e.g., matching, copying, puzzle assembly) than they do on language-related tasks, in presenting a rationale for the use of visually-cued instruction with children with ASD in educational settings. Individuals with ASD have been shown to demonstrate superior performance on a range of tasks requiring visual processing, including the embedded figures test, copying impossible figures, and the Block Design subtest of the Wechsler intelligence test, compared to typically developing (TD) controls (see Dakin and Frith 2005 for review). In addition, Samson et al. (2012), based on their meta-analysis of functional imaging studies, reported that individuals with ASD demonstrated enhanced task-related neural activity in regions associated with visual processing compared to TD controls. However, it is not clear if or how these findings translate to learning in real life contexts (e.g., shared book reading at preschool, visually-cued instruction), nor whether visual processing bias and superior performance on visually focused tasks represent a learning strength or a lesser degree of impairment.

Quill and other researchers (e.g., Foley and Staples 2003; Preis 2006) cited the results of experiments by Hermelin and O’Connor (1970) as evidence for a visual learning style in children with ASD. Specifically, in one experiment a group of children with ASD completed a puzzle task with an additional visual cue (in the form of a continuous line drawn across pieces) faster than a puzzle task in which the line was not provided. Hermelin and O’Connor noted that the children with ASD completed the task faster, and with fewer errors, than TD controls. However, closer reading reveals that while ‘more advanced’ children with ASD benefited from the additional visual cues, this was not true for those with cognitive impairment. In fact, across the broader assessment battery, children with ASD demonstrated a preference for motoric cues over visual and auditory stimuli (Hermelin and O’Connor 1970).

Preis (2006) attempted to teach children with ASD (aged 5;3–6;7) to follow a set of commands (e.g., stand up, clap hands) under two experimental conditions: speech alone and speech + pictures. The children had non-verbal IQ in the average range but receptive vocabulary in the range of 1st–7th percentile. Each child attended between 15 and 28 sessions, during which he or she was taught up to six new commands with up to 15 trials for each command. A three-step hierarchy of most-to-least prompts was used, ranging from hand-over-hand guidance, to gestures (point or tap), to no additional prompts. Using an alternating treatments design, Preis found no difference in the children’s performance under the two conditions during the treatment phase.

Preis (2006) did report (a) a small but significant improvement in performance in the speech + pictures condition during generalisation tasks in which an unfamiliar clinician asked the children to complete previously learned directions, and (b) a moderate positive effect for the use of speech + pictures when an unfamiliar clinician asked the children to complete previously learned directions at approximately 10 and 20 weeks post-treatment. Preis argued that the lack of treatment effect during the teaching phase was due to the presence of the physical and gestural prompts, but that when these were removed (in maintenance and generalisation phases), the pictures helped the children perform previously learned commands. It is possible that the delayed improvement in the children’s performance mirrors that reported in studies of working memory training programs (e.g., Cogmed), whereby children and adults have further improved on their post-test scores 12 months after treatment (Ralph 2012). Alternatively, the fact that a difference in performance was evident only beyond the treatment condition can be considered a threat to internal validity. In addition, given that there were only five participants, all with non-verbal IQ in the average range, the relevance of findings to the broader spectrum of children presenting with ASD is questionable.

Erdődi et al. (2013) cast further doubt on the notion that all children with ASD are visual learners. Based on a medical record review, they examined learning amongst children with ASD (n = 42), attention-deficit hyperactivity disorder (n = 83), velocardiofacial syndrome (VCFS; n = 17) and TD children (n = 38) over repeated trials of two subtests of the Test of Memory and Learning. The first subtest (Word Selective Reminding) required children to repeat a string of unrelated words. The second subtest (Visual Selective Reminding) required the children to point to dots on a card in the same sequence as the examiner. Rather than demonstrate a propensity for visual learning, the children with ASD (and children with VCFS) appeared to benefit little from repeated exposure to the visual stimuli across trials. In contrast, all four groups improved their performance in the auditory learning condition across trials. Erdődi et al. noted that despite children with ASD appearing to benefit little from learning in the visual condition, they were the only group to maintain their performance in the visual condition (albeit not as improved as performance in the auditory condition) during a delayed recall task. They suggested that although children with ASD may learn more quickly during the auditory condition, they may be able to better preserve visually learned information over time. These findings could be viewed as being consistent with those of Preis (2006), suggesting that although visually presented information may not help children perform at the point of teaching, it may facilitate their recall of whatever information they learned. Alternatively, given the limitations identified, including that there was no benefit of pictures at the point of teaching, caution is required in using these results to support the prescription of picture-based AAC systems on the basis that they will complement a preference or improved capacity for visual learning.

Visual Attention and Visual Learning

Caution is also needed in prescribing AAC on the basis of a visual learning style, given that the most common mode of teaching the use of picture-based AAC systems not only purports to utilise, but relies on, visual attention. Aided Language Stimulation involves communication partners pointing to picture symbols (e.g., photographs, line drawings) representing words and messages (e.g., ‘car’, ‘more’, ‘help’) while producing the corresponding words during interactions. This process is implemented on the assumption that children will watch, learn, and then use the AAC systems being presented (Goosens’ et al. 1995). The core assumption underpinning Aided Language Stimulation as a teaching method—that children with ASD will look at the picture symbols the teacher is pointing to—has not been validated. This lack of empirical evidence to support the underlying premise of AAC intervention for people with ASD is of concern in the face of evidence contra-indicating the use of visual systems, at least for some individuals.

There is an abundant research literature demonstrating that children with ASD have reduced attention to social stimuli (Dalton et al. 2005, 2007; Grelotti et al. 2002) and difficulties with face processing (Scherf et al. 2008). Two studies have demonstrated that abnormal visual scanning extends to non-social scenes and objects (Anderson et al. 2006; Sasson et al. 2008), raising the prospect that children with ASD also may not attend to the presentation of picture symbols commonly used in AAC systems aimed at supporting their communication development. Furthermore, recent findings (e.g., Vivanti et al. 2011, 2014a, b; Vivanti and Dissanayake 2014) indicate that atypical gaze patterns among children with ASD affect their ability to understand the actions they observe. On the basis of these recent findings, it is reasonable to hypothesise that abnormalities in visual attention might disrupt the learning process of children with ASD during teaching situations involving the use of AAC, in which the primary modality is visual. Specifically, reduced attention to the visual stimuli used may reduce the efficiency with which children with ASD learn to use, or benefit from, picture-based AAC systems. This hypothesis remains to be tested.

The Need for Further Research

Our overall objective was to test the assumption that children with ASD visually attend to, and benefit from, picture-based AAC systems due to an increased ability to process visually presented information. Our first aim was to assess whether or not children with ASD visually attend to a picture-based AAC system used by a teacher to convey commands during simulated educational activities in the same way as children presenting with global developmental delay (GDD) and TD children. In this study, GDD was defined as “significant delay in two or more of the following developmental domains: gross/fine motor, speech/language, cognition, social/personal, and activities of daily living” (Shevell et al. 2003). We included the GDD group to study the possible influence of receptive language on visual attention and task performance. Our second aim was to assess whether the children performed differently when the commands were presented using speech-alone versus speech + pictures. On the basis of findings that children with ASD have shown superior auditory learning over visual learning (Erdődi et al. 2013; Preis 2006), our hypotheses were (a) that compared to TD and GDD children, the children with ASD would show reduced visual attention to the picture-based AAC system in the trials where it was used to supplement the spoken instructions, and (b) as a consequence of reduced visual attention, the children with ASD, unlike those in the comparison groups, would show no difference in performance under the two conditions. In addition, the relationship between visual attention and performance in each condition was examined in each group.

Method

Participants

The participants were 25 children with ASD, 19 TD children, and 17 children with GDD but no history of ASD (see Table 1). Participants with ASD were recruited through the Victorian Autism Specific Early Learning and Care Centre, an ASD specific program located at La Trobe University, Australia. Participants in the GDD group were recruited through a community-based early intervention centre. TD children were recruited from the La Trobe University Children’s Centre and through advertisements in the community.

Table 1 Participants’ characteristics

The diagnoses of ASD were previously made by community-based health care professionals using DSM-IV-TR criteria (APA 2000). The community-based professionals did not consistently document a sub-group diagnosis for the children (i.e., Autistic disorder, Asperger’s disorder, Pervasive Developmental Disorder—Not Otherwise Specified) as part of the diagnostic process, with most simply referring to ‘Autism Spectrum Disorder.’ Therefore, we confirmed eligibility for the study using the Social Communication Questionnaire (SCQ; Rutter et al. 2003) as completed by parents and the Autism Diagnostic Observation Schedule (ADOS; Lord et al. 2000) conducted by a clinician with demonstrated reliability in the use of this measure. Four children met ADOS criteria for Autism Spectrum Disorder and 21 met criteria for Autistic Disorder. None of the children had been diagnosed with Rett’s disorder or Childhood Disintegrative Disorder.

Exclusionary criteria for the ASD group included the presence of a genetic or metabolic disorder known to cause autism-like features (e.g., fragile X syndrome or tuberous sclerosis), uncorrected hearing or vision impairment, and the presence of a major medical problem. Participants in the GDD group had all been assessed by community professionals. Exclusionary criteria for the GDD group included the presence of features of autism as assessed through the SCQ. Participants’ cognitive level was measured with the Mullen Scales of Early Learning (MSEL; Mullen 1995). Sample characteristics are presented in Table 1 with significant post hoc contrasts between groups indicated. The ASD and TD groups were not different from each other on chronological age (CA). The ASD and GDD groups were not different from each other on CA or the four subscales of the Mullen Scales of Early Learning (visual reception, fine motor, receptive language, expressive language).

Apparatus and Stimuli

We tested our hypotheses via an experiment in which the children were shown a series of eight video stimuli (5 s each) on a Tobii T120 binocular eye-tracker monitor with an embedded camera (120 Hz, 1280 × 1024 pixel resolution, average precision of 0.5° of visual angle). The videos were presented in two different fixed random orders. During each video, a female actor commanded the child to complete one of eight brief tasks (e.g., “Pick up the BLOCK and put it in the BASKET”) using objects that were placed on a table in front of the child (see Table 2). Our independent variable was the presence/absence of pictures during each video. Specifically, in the speech-only condition, the actor made the command using her speech, while at the same time making a neutral hand gesture in the physical space where the pictures were placed in the speech + pictures condition. In the speech + pictures condition, the actor conveyed the command using her speech while pointing to two colour photos representing the object the child needed to pick up and the container in which the object was to be placed. The pictures were colour photographs of the objects used in the study, displayed on the video via a piece of A3 cardboard presented in a fixed position facing the child. Thus, the only difference between the two conditions was the presence or absence of the pictures.

Table 2 Commands given to the children under each condition

Procedure

The study was approved by the La Trobe University Human Ethics Committee and informed consent was obtained from the children’s parents. The children were tested in a quiet room at their respective Centres. The length of experimental testing was approximately 5 min; the current experiment was part of a longer session of experimental testing. Participants were seated in a comfortable chair 60 cm from the monitor. The session began with a 5-point calibration procedure that was saved and used for the entire protocol. The examiner then placed the objects in a fixed order in front of the child, labelling each object once as it was placed on the table. After each stimulus, the researchers observed the child’s response with the objects provided for each command. If he or she did not respond, reassurance in the form of a statement “you can play” was provided. Objects the child manipulated were then returned to their original position on the table prior to the next stimulus being presented. Sessions were video recorded to allow for behavioural coding of the children’s responses to the commands.
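To give a sense of scale for the eye-tracker's stated precision at this viewing distance, the following sketch converts a visual angle into an on-screen extent. It assumes the precision figure refers to 0.5 degrees of visual angle and uses the standard relation size = 2d·tan(θ/2); the function name is illustrative and is not part of any Tobii software.

```python
import math


def visual_angle_to_cm(angle_deg, distance_cm):
    """On-screen extent (cm) subtended by a visual angle at a viewing distance."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Assumed values: 0.5 degrees of precision, viewed from 60 cm.
extent_cm = visual_angle_to_cm(0.5, 60)
print(f"{extent_cm:.2f} cm")  # roughly half a centimetre on the screen
```

Under these assumptions, the stated precision corresponds to roughly half a centimetre on the monitor, which is small relative to a face or a photograph shown on screen.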

Data Coding

During observation of the video stimuli, children’s eye movements were recorded to determine where they were gazing on the screen. Data were analysed frame-by-frame using Tobii Studio with regard to one predefined area of interest in the speech-only condition (the actor’s face) and three predefined areas of interest in the AAC condition (actor’s face, picture of target object, picture of target container). See Fig. 1 for areas of interest. Fixation criteria were set to Tobii Studio defaults of a 30-pixel dispersion threshold for 100 ms. The average proportion of fixations to the areas of interest across the four trials in each condition was calculated for each child and exported to SPSS for analysis.
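The per-child measure described above can be sketched as follows. This is a minimal illustration, not the Tobii Studio procedure: the rectangular area-of-interest coordinates, the names `AOI`, `fixation_proportions`, and `mean_across_trials`, and the choice of all fixations in a trial as the denominator are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Hypothetical rectangular area of interest, in screen pixels."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def fixation_proportions(fixations, aois):
    """Proportion of a trial's fixations (x, y centres) landing in each AOI."""
    counts = {aoi.name: 0 for aoi in aois}
    for x, y in fixations:
        for aoi in aois:
            if aoi.contains(x, y):
                counts[aoi.name] += 1
                break  # AOIs are assumed not to overlap
    total = len(fixations)
    return {name: (n / total if total else 0.0) for name, n in counts.items()}


def mean_across_trials(per_trial):
    """Average each AOI's proportion over a child's trials in one condition."""
    names = per_trial[0].keys()
    return {n: sum(t[n] for t in per_trial) / len(per_trial) for n in names}


# Illustrative layout for the speech + pictures condition (made-up coordinates).
aois = [
    AOI("face", 500, 100, 780, 400),
    AOI("object", 100, 600, 400, 900),
    AOI("container", 880, 600, 1180, 900),
]
trial = fixation_proportions([(600, 200), (200, 700), (900, 650), (50, 50)], aois)
```

Fixations falling outside every area of interest still count toward the denominator here, so the proportions need not sum to one.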

Fig. 1 Screen shots of video stimuli showing speech-only and speech + pictures condition

The children’s responses to the commands (behaviour performance) were coded by a naïve coder blind to group membership and study hypotheses. For each trial, each child received 1 point for picking up the correct object and attempting to insert it in the correct container, thus allowing each child a maximum of 4 points under each condition. Coding reliability was assessed for a randomly selected 50 % of trials across participants and the two conditions, based on point-by-point comparison between independent ratings by the naïve coder and the first author, yielding 100 % agreement.
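The scoring rule and the point-by-point agreement check can be expressed compactly. The sketch below assumes each trial is coded as 0 or 1; the function names and the example ratings are hypothetical and are not the study's data.

```python
def score_trial(picked_correct_object, attempted_correct_container):
    """1 point only when both parts of the command are carried out."""
    return 1 if picked_correct_object and attempted_correct_container else 0


def percent_agreement(coder_a, coder_b):
    """Point-by-point agreement (%) between two coders' trial scores."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of trials.")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical ratings for four trials in one condition (maximum score 4).
naive_coder = [score_trial(True, True), score_trial(True, False), 1, 0]
second_coder = [1, 0, 1, 0]
print(sum(naive_coder), percent_agreement(naive_coder, second_coder))
```

Point-by-point agreement compares the two coders trial by trial, so identical totals reached through different trials would not count as agreement.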

Analysis

We used analysis of variance to test for differences in visual attention (average number of fixations across trials) between the three groups to the four predefined areas of interest: the actor’s face in the speech-only and speech + pictures conditions (DV1), and the picture of the object (DV2) and the picture of the container (DV3) in the speech + pictures condition. We also used analysis of variance to test for differences in performance (DV4) between the groups in the two conditions. We examined the relationship between visual attention and performance in each condition in each group using correlational analysis.
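For readers less familiar with the between-group comparison, a one-way analysis of variance can be sketched in a few lines. This is a generic textbook computation rather than the SPSS procedure used in the study, it does not cover the mixed Group × Condition design, and the example scores are invented.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    all_scores = [s for g in groups for s in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((s - m) ** 2 for g, m in zip(groups, means) for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented scores for three small groups, for illustration only.
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

With three groups of 25, 19, and 17 children, the same formulas give 2 and 58 degrees of freedom for a between-group effect.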

Results

Deviations in kurtosis and skewness from the normal distribution curve were tested for all eye tracking and behavioural coding variables following guidelines set by Tabachnick and Fidell (2007). All eye tracking variables were normally distributed (Shapiro–Wilk test results; attention to face = 0.31 in speech-only condition, attention to face = 0.36 in speech + pictures condition, attention to picture of object = 0.19, attention to picture of container = 0.08), and so were analysed using parametric statistics. The behavioural performance scores for ASD and GDD groups violated assumptions of normality (Shapiro–Wilk test = 0.01 for performance under both conditions). Accordingly, ANOVAs—which are robust to violations of normality (Glass et al. 1972)—were used to test for group differences in performance as noted above; whereas within group associations between visual attention, receptive language, and performance were assessed using non-parametric correlations (Spearman’s Rho).

Visual Attention

We first looked at whether there were group differences in the amount of visual attention directed to the actor’s face. The results of a 3 (Group) × 2 (Condition) ANOVA examining the children’s attention to the actor’s face under each condition showed a main effect for Condition, F(1, 58) = 34.70, p < .01, partial eta squared = .37, but no effect for Group, F(2, 58) = .28, p = .76, or Group × Condition interaction, F(2, 58) = .42, p = .65. As apparent in Fig. 2, although there were no differences in attention to the actor’s face between groups overall, children in each group spent more time looking at the actor’s face in the speech-only condition than they did in the speech + pictures condition.
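The effect size for Condition can be cross-checked against its F statistic. The sketch below assumes the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error) and one numerator degree of freedom for the two-level Condition factor; the function name is illustrative.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# Main effect of Condition on attention to the face: F = 34.70, error df = 58.
print(round(partial_eta_squared(34.70, 1, 58), 2))  # 0.37
```

Under these assumptions the identity reproduces the reported value of .37, whereas two numerator degrees of freedom would imply a value of about .54.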

Fig. 2 Mean fixations to areas of interest in each condition

We next compared attention to the picture-based AAC system (pictures of target object and pictures of target container) using one-way ANOVAs. As illustrated in Fig. 2, no group differences were found in the amount of attention to the target object, F(2, 58) = 2.75, p = .07, or the target container, F(2, 58) = 1.39, p = .26. We also looked at the proportion of total fixations on the pictures (versus the face in the AAC condition), with the results again showing no group differences, F(2, 58) = .70, p = .50. As presented in Table 3, all groups looked at the pictures for approximately 70 % of the time, and at the face for approximately 30 % of the time, indicating that children with ASD were not drawn to the pictures more than the other two groups. The results for all tests of visual attention were unchanged when we co-varied for CA and receptive language (MSEL).

Table 3 Visual attention to areas of interest (face, object, and container) and performance scores in the speech-only and speech + pictures conditions

Behavioural Performance

Results of a 3 (Group) × 2 (Condition) ANOVA examining participants’ performance in the two conditions showed a main effect for Condition, F(1, 58) = 24.57, p < .01, partial eta squared = 0.30, a main effect for Group, F(2, 58) = 9.32, p < .01, partial eta squared = 0.24, and a Group × Condition interaction, F(2, 58) = 4.90, p = .01, partial eta squared = 0.15. Pairwise comparisons using Wilcoxon Signed Rank Tests with Bonferroni correction revealed significantly better performance under the AAC condition for the TD group (z = −2.45, d = .67, p = .01) and the GDD group (z = −2.81, d = .91, p = .01), whereas the children with ASD performed equally poorly under each condition (z = −1.41, d = .16, p = .16; see Fig. 3). These findings indicate that the children with ASD were the only group that did not benefit from the pictures.

Fig. 3 Mean performance scores for each group in each condition

With regard to individual differences in performance, in descriptive terms, five children (20 %) with ASD and seven children (41 %) with GDD followed instructions in the speech-only condition. As indicated in Table 3, there was no significant difference in task performance between the children with ASD and children with GDD in the speech-only condition. When the pictures were available (speech + pictures condition), the number of children with ASD who completed instructions increased by only one (total 24 % of sample completed some instructions), whereas a further three children with GDD completed at least part of the instruction (total 59 % of sample). Additionally, six of the seven children (85.7 %) with GDD who completed some instructions in the speech-only condition ‘improved’ their performance scores in the speech + pictures condition. This compares with just one of the five children (20 %) with ASD who ‘improved’ his performance score when the pictures were available.

In order to test the hypothesis of a relationship between visual attention and task performance, we conducted correlational analyses (Spearman’s rho) and found a correlation between visual attention to the pictures (fixations to pictures) and performance in the ASD group, rs(23) = .35, p = .04 and the GDD group, rs(15) = .45, p = .03, whilst no correlation was found in the TD group, rs(17) = .09, p = .36 (see Fig. 4 for scatterplots). Similarly, we found a correlation between performance and Receptive Language (Mullen T-scores) in the ASD group rs(23) = .40, p = .02 and in the GDD group, rs(15) = .50, p = .02, but not in the TD group, rs(17) = .16, p = .26. Therefore, both attention to the pictures and receptive language were associated with performance in each of the clinical groups in the speech + picture condition. In the speech-only condition, performance was again correlated with Receptive Language for the ASD group, rs(23) = .49, p < .01 and the GDD group rs(15) = .50, p = .02 but not for the TD group, rs(17) = .08, p = .38. Attention to the face was not correlated with performance in any of the groups or in any condition.
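Because performance scores range only from 0 to 4, tied values are unavoidable, and a rank-based correlation has to handle them. The following is a minimal sketch of Spearman's rho computed as the Pearson correlation of average ranks; it is a generic implementation rather than the SPSS routine used in the study, and the example values are invented.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of tie-corrected ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented data: fixation proportions and 0-4 performance scores for five children.
rho = spearman_rho([0.2, 0.5, 0.6, 0.7, 0.9], [0, 0, 1, 3, 4])
```

The value in parentheses after each reported coefficient is the degrees of freedom, n − 2, which is why the ASD group of 25 children appears as rs(23).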

Fig. 4 Scatterplots illustrating the relationship between visual attention, receptive language, and task performance for children during the speech + pictures condition

Discussion

Our overall objective was to test the assumption that children with ASD visually attend to, and benefit from, picture-based AAC systems due to an increased ability to process visually presented information. We found no difference in the way children with ASD, children with GDD, and TD children visually attended to pictures in a simulated AAC teaching scenario. However, when it came to performance, the TD children and the children with GDD benefited from supplementing spoken commands with pictures, but the children with ASD did not. As hypothesised, these findings contradict the notion that children with ASD are all visual learners. The implications for clinical practice and future research are discussed.

Are Children with ASD Looking?

Despite frequent reference to children with ASD being visual learners (e.g., Quill 1997; Rao and Gagie 2006; Tissot and Evans 2003), there has been little research evidence to support this assertion. Indeed, there is some evidence that children with ASD learn more quickly in auditory learning tasks compared to visual learning tasks (e.g., Erdődi et al. 2013), and evidence for atypical patterns of visual attention amongst children with ASD to social and non-social stimuli (e.g., Dalton et al. 2005, 2007; Grelotti et al. 2002). Accordingly, we hypothesised that children with ASD would show reduced visual attention to the picture-based communication system during the simulated teaching scenarios. Our results, however, indicate there were no such differences in visual attention when compared to TD children of similar age, and children with GDD of similar age and receptive language ability. From a clinical perspective, our findings provide the first empirical evidence to support the notion that children with ASD do look when shown picture-based AAC systems, as per the Aided Language Stimulation teaching approach.

Do Pictures Improve Performance?

In an attempt to measure the impact of the pictures on children’s learning, we examined their responses to commands under speech-only and speech + pictures conditions. We hypothesised that as a consequence of reduced visual attention, the children with ASD would show no difference in performance under the two conditions. Consistent with the findings of Preis (2006), we found no difference in the performance of children with ASD as a group under the two conditions, and thus no evidence to suggest they benefited from the addition of visual support in this experiment. In contrast, the children in the GDD and TD groups performed better when the pictures were available.

Our finding resembles that of Pierce et al. (1997) who assessed the ability of children with ASD, children with ‘mental handicap,’ and ‘typically functioning’ children to interpret social situations presented in video vignettes. By manipulating the number of verbal, tonal, non-verbal, and object cues featured in the videos, the authors demonstrated that while the ‘mentally handicapped’ and ‘typically functioning’ children benefited from additional cues, the children with ASD did not. Pierce et al. suggested that stimulus overselectivity and difficulties modulating arousal amongst the children with ASD may have negatively impacted their performance in the conditions featuring multiple cues. While not the focus of the present study, it is possible that stimulus overselectivity and poor arousal modulation, as well as other well-documented difficulties with joint attention and weak central coherence, may have affected the ability of children with ASD in this study to benefit from the additional picture cues, a possibility that warrants further investigation.

Individual Differences

At the outset, we questioned the notion that children with ASD share a common visual learning style, which is incompatible with the substantial body of research pointing to the presence and importance of individual differences in this population. Presumably, the picture is more complex, with a confluence of factors affecting each child’s learning. In our experimental task, we did not find evidence for a common visual learning style amongst children with ASD in our group level analyses, nor a common benefit of supplementing spoken words with pictures in the simulated learning scenario. However, our within group analyses revealed a significant correlation between visual attention to the pictures and performance for both ASD and GDD groups in the speech + pictures condition. It may be that for some children with ASD (and GDD), augmenting speech with pictures has the potential to improve performance, but not for all. Furthermore, we found a correlation between receptive language and task performance, indicating that, unsurprisingly, those children who do understand the verbal instructions have an advantage compared to those who have to rely on the pictures because of their low language understanding. This result points to the importance of targeting receptive language in young children with ASD. The findings may also help explain the variability in treatment response amongst children with ASD reported in the AAC literature (Ganz et al. 2012), particularly in cases where children are taught using methods that rely on them looking as a communication partner models the use of an AAC system (e.g., Aided Language Stimulation). Clearly, further research examining visual versus auditory learning in children with ASD must examine the relative benefits both at the group and the individual level.

Implications

In considering the clinical implications of our findings, it is important to emphasise that the purpose of this study was not to evaluate the clinical utility of picture-based AAC systems for children with ASD. To do so requires a treatment study, and clearly there is evidence to suggest that AAC can be helpful to some children (Ganz et al. 2012). Accordingly, the performance data do not indicate whether or not some of the children in our study may ultimately benefit from AAC systems. Instead, the data show that on first exposure to a new AAC system in a simulated learning scenario, as a group, the children with ASD looked at, but did not benefit from, the pictures. This finding was in contrast to children with GDD with similar language skills and TD children of similar age, who looked at the pictures and performed better in the speech + pictures condition. Our within-group finding of a correlation between visual attention and performance amongst children in the ASD group and the GDD group points to the need for further research examining individual differences. On the basis of these findings, we argue that our results should not be used in support of, or as evidence against, the use of AAC for children with ASD (or children with GDD, who appeared to benefit from the pictures). Instead, we hope our results will encourage researchers and clinicians to pursue a more critical, sophisticated, and, at all times, theoretically driven approach to the prescription of AAC devices for minimally verbal children, irrespective of their diagnoses.

Limitations

When testing a hypothesis that hinges on the discovery, or otherwise, of differences between groups or conditions, a key concern is sample size. Given that our sample necessarily included children with a spectrum of individual skills and needs, as evidenced by variability in test scores and in performance scores in the speech-only condition, there is a risk that these individual differences masked a main effect for learning. However, the literature that supports the visual learning style hypothesis implies that the preference for visual learning is observable, if not obvious, in children with ASD, irrespective of their individual learning needs and profiles. Presumably, this would constitute a medium to large effect that we had sufficient power to detect. Furthermore, if having a visual learning style is common to all children with ASD, then the heterogeneity of strengths and needs in our sample of children with ASD should not have been a problem. Finally, and perhaps most compelling, is the fact that we detected a significant improvement in performance associated with the speech + pictures condition in the GDD group, despite the fact that (a) these children numbered only 17, (b) none were identified as having a propensity for visual learning, and (c) there was no significant difference between these children and the children with ASD in either receptive language or task performance in the speech-only condition.

With regard to our experimental task, our aim was to create an ecologically valid simulated teaching scenario, in which children are required to respond to video footage of a teacher giving instructions. The task was effective in eliciting a range of performance, reflecting the deliberately targeted, clinically relevant, heterogeneous sample of children with ASD in this study, including those with significant learning needs, for whom picture-based AAC systems are routinely prescribed. We have labelled the conditions speech-only and speech + pictures to capture the fundamental difference between them. However, we acknowledge that the videos contained other visual stimuli, including the teacher’s face and the plain background. Similarly, we acknowledge that by having the teacher invite responses from the children, the task included a social element, which may have impacted the responses of the children with ASD differently to those of the children with GDD. That said, the stimuli were kept constant across the conditions and we found no differences in the pattern of visual attention across the three groups, suggesting that children with ASD attended to the social stimuli in the same way as children in the other groups. Nevertheless, future studies examining learning in children with ASD could include purely visual versus purely auditory learning tasks in order to further elucidate the relative benefits of the two learning modalities, as well as tasks to examine the possible impact of stimulus overselectivity, poor arousal modulation, joint attention difficulties, and weak central coherence on children’s responses to picture-based AAC systems.

It is possible that the children would have responded differently to the commands had they been delivered in a real-life scenario rather than in videos of an actor. Similarly, it is possible that the children would have responded differently had we included more trials under each condition. To this end, clinicians may argue that just as TD children are exposed to hundreds of thousands of words before they learn to talk, children with ASD presenting with significant learning disability may need more than four trials before any influence of pictures on their learning can be detected. The results of Preis (2006) and Erdődi et al. (2013) suggest that there may be a visual learning effect that is not obvious at the point of teaching, but that becomes evident when children are challenged to recall previously learned information. Therefore, further research is warranted to examine the possible impact of picture-based AAC strategies on learning over clinically relevant periods of time.

Conclusion

The findings from this study build on those of Erdődi et al. (2013), suggesting a clinical and research need for a more sophisticated understanding of the comparative benefits of visual and auditory learning in children with ASD. Our findings call into question the assertion that children with ASD have a propensity for visual learning over auditory learning, and point to the need for caution in the prescription of picture-based AAC systems on the basis of this assertion until further research is conducted. Clearly, some children with ASD benefit from the use of AAC, but we suggest that the benefits are more likely due to the capacity of these systems to provide an efficient and recognisable communication mode and to support the development of symbolic communication than to their ability to bootstrap a visual learning style. Advancing evidence-based practice for minimally verbal children with ASD requires that we identify the active ingredients in these and other communication interventions, understand individual differences in treatment outcomes, and, in doing so, address untested assumptions about how these children learn.