Autism spectrum conditions (ASC) are characterised by persistent deficits in social communication and social interaction, as well as restricted, repetitive patterns of behaviour, interests, or activities. Impairments in emotion comprehension, defined as the knowledge needed to identify and understand others’ emotions from facial or bodily cues and within a specific social context (Harris et al. 2016), are closely linked to the social communication deficits in ASC (APA 2013). Individuals with ASC show a general impairment in emotion recognition from facial expressions as well as in the identification of situation-based emotions (Uljarevic and Hamilton 2013; Lozier et al. 2014; Fridenson-Hayo et al. 2016). Research has suggested that these challenges may stem from altered attention to faces, specific processing deficits and styles, and abnormal neural circuits that mediate face perception (Dawson et al. 2005; Schultz 2005). Previous research has emphasized the role of experience with faces, i.e. exposure to faces as visual stimuli, in the development of expertise in face processing (Gauthier et al. 2000). In contrast to the neurotypical population, children with ASC who avoid looking at people show deficiencies in, or slower development of, these abilities.

As emotion recognition is associated with social competence in ASC (Williams and Gray 2013; Baron-Cohen et al. 2009), interventions that target emotion comprehension may be crucial to addressing the deficits in social communication and interaction (Kouo and Egel 2016; Fridenson-Hayo et al. 2016). It has been suggested that improving emotion understanding (EU) may bring wider positive qualitative changes in the socio-communication skills and overall development of the child with ASC (Rice et al. 2015). Thus EU, as a crucial construct for social understanding, should be an integral part of educational interventions and programs for students with ASC (Howlin et al. 1999), as well as for those with the often comorbid intellectual impairments (Adibsereshki et al. 2014), an area where technology-based interventions have shown great potential.

Computer-based interventions (CBI) provide a controlled and structured interactive environment with few social demands, which has been found to be appealing to children with ASC. A number of empirical studies in the literature demonstrate positive effects of technology-based interventions on aspects of emotion comprehension in children with ASD (Bölte et al. 2006; Lacava et al. 2007; Tanaka et al. 2010; McHugh et al. 2010; Hopkins et al. 2011; Rice et al. 2015). A recent systematic review of evaluation studies notes several ways to overcome current limitations: including individuals with low-functioning ASC, controlling for confounding variables, and enhancing generalization and maintenance of skills (Kouo and Egel 2016).

Several previous studies of emotion recognition (ER) training have reported difficulties with skill generalization (Golan and Baron-Cohen 2006; Bölte et al. 2002; Silver and Oakes 2001). Despite improvements on the teaching materials, participants often did not show better ER ability on faces not previously presented. Golan and Baron-Cohen (2006) described three levels of generalization of emotion understanding skills: close generalization (using previously familiar stimuli), feature-based distant generalization (using novel stimuli), and holistic distant generalization (using complex dynamic stimuli, e.g. films). We focused on feature-based distant generalization of learning as a test of the transfer of EU abilities across situations. As Ryan and Charragáin (2010) noted, this is the minimal level of generalization required to conclude that children are not simply associating the correct response with some irrelevant features of the stimuli used in training.

The Intervention Program

Ucime Emocii (Learning Emotions, LE; see Footnote 1) is a cross-platform web application designed for teaching and practicing emotion comprehension skills for children with ASC in a repetitive, predictable and consistent system, which appeals to the autistic child. Given the multi-componential nature of EU, as reflected in the definition of EU concepts (Pons et al. 2004; Howlin et al. 1999), it is necessary to point out that the program targets basic-level EU competences: recognition of external non-verbal cues (recognition of facial expressions and understanding of the impact of situational factors on emotions). The program includes four basic emotions (sadness, happiness, fear, and anger) explored through photographs, pictograms and illustrations of social context. Generalization of facial emotional cues is fostered by matching facial expressions across identities. In addition, the program uses an integrative perceptual approach to the recognition of eye and mouth features (Tanaka et al. 2012), which addresses the bias towards analytical face processing in ASC (Dawson et al. 2005; Behrmann et al. 2006).

The LE program comprises two parts: a test module and a game module. The game module provides opportunities for learning and practicing EU through six interactive games.

The first game targets the ability to match a verbal label to facial expression. At the beginning of each trial, four emotional faces appear on the screen (happy, angry, sad and afraid). A verbal instruction is given, along with a visual prompt, prompting the child to tap the face depicting the correct emotion label.

The second game addresses the visuo-spatial processing differences i.e. the bias towards analytical face processing. Two pairs of facial features (eyes and mouths) are presented on the screen while the child is verbally and visually instructed to construct a face that matches the study face by dragging corresponding facial parts to a designated area on the screen. This task requires the user to be attentive to the individual facial components, as well as the configural aspect of the expression.

The third game is a variation of the well-known memory game. The child is presented with a set of eight cards containing four pairs of different identities producing the same emotional expression. This game targets generalization of the expression across different identities. The child is expected to match emotional expressions across changes in facial identity without explicit verbal labels.

The fourth game targets the ability to match a verbal label to a pictogram (emoticon). At the beginning of each trial, four pictograms representing basic facial expressions appear on the screen (happy, angry, sad and afraid). A verbal instruction is given, along with a visual prompt, instructing the child to tap the pictogram depicting the correct emotion label.

The fifth game allows for generalization of the facial emotional cues across different modalities (drawings and photos). One emotional face is presented centrally on the screen, while four emoticons are below. The user is verbally instructed to match a pictogram with the presented emotional face, without using verbal labels.

The sixth game addresses understanding of situation-based emotions. An illustration depicting an emotional situation is presented on the screen, along with a short verbal explanation. The facial expressions of the protagonists are not visible or are blurred. Four pictograms are presented below. The child is instructed to choose the pictogram that best depicts how the protagonist in the scene is feeling. Screenshots of game items from each task are shown in Fig. 1.

Fig. 1 Screenshots of items from each task

Each game comprises eight trials with two items per emotional expression, which are randomly chosen from a larger set of items (25 per emotion), providing diversity and novelty of the teaching material across repeated practice. In each trial, the probes remain on the screen until a choice is made. After each trial and on completion of a game, positive reinforcement (praise) in audio format and computer-animated graphics are presented as an incentive to engage the participants in the intervention. We used an errorless teaching strategy, which has been suggested to maximize acquisition speed and retention (Mueller et al. 2007; Howlin et al. 1999). In case of an error, the user is prompted to choose the correct answer (the correct probe blinks until tapped or clicked, with the rest of the probes dimmed), while the error is not labelled in any way. The user must select the highlighted correct answer in order to proceed to the next trial.
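
The trial structure and the errorless flow described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program's actual implementation: the item identifiers, the function names `build_game` and `run_trial`, and the shuffling of trial order are assumptions; the text only states that items are drawn at random from 25 per emotion and that a trial ends once the correct probe has been selected.

```python
import random

EMOTIONS = ["happy", "sad", "afraid", "angry"]
POOL_SIZE_PER_EMOTION = 25   # "larger set (25 per emotion)"
ITEMS_PER_EMOTION = 2        # two items per emotion, giving eight trials


def build_game(rng):
    """Draw two distinct items per emotion and return eight trials."""
    trials = []
    for emotion in EMOTIONS:
        # Hypothetical item IDs; the real item naming is not given in the text.
        pool = [f"{emotion}_{i:02d}" for i in range(1, POOL_SIZE_PER_EMOTION + 1)]
        for item in rng.sample(pool, ITEMS_PER_EMOTION):
            trials.append({"target": emotion, "item": item})
    rng.shuffle(trials)  # assumption: trial order is randomised
    return trials


def run_trial(target, responses):
    """Errorless flow: after a wrong choice the correct probe is highlighted
    and the trial only ends when the child selects it. Errors are recorded
    here for illustration but are never shown to the child."""
    prompted = False
    for choice in responses:
        if choice == target:
            return {"first_try_correct": not prompted, "prompted": prompted}
        prompted = True  # correct probe now blinks, other probes are dimmed
    raise ValueError("trial cannot end before the correct answer is selected")
```

For example, `run_trial("sad", ["happy", "sad"])` records a prompted trial, while `run_trial("sad", ["sad"])` records a correct first response.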

Many children with ASC appear to have a natural affinity for computers and the controlled environment they provide (Moore et al. 2005), which makes computers intrinsically motivating for them. Nevertheless, external incentives and reinforcement were provided in the form of a token economy, individualised in accordance with each child’s wishes and interests, with input from the parents.

The visual material comes from multiple sources. The photographs of emotional faces are taken from the Warsaw Set of Emotional Facial Expression Pictures (Olszanowski et al. 2015). A visual artist designed the pictograms and modified contextual illustrations from Google’s Public Domain, based on materials in “Teaching Children with Autism to Mind-Read: A Practical Guide for Teachers and Parents” (Howlin et al. 1999).

This study assessed the effects of using the LE program with professional supervision on the improvement of emotion understanding in children with ASC aged 7 to 15 years. The intervention took place over a 6-week period. Emotion understanding was tested at two time points: before (Time 1) and after (Time 2) the intervention. A control group with ASD was matched to the intervention group for age and symptom severity. We hypothesized that the intervention group would perform better than the control group on all post-intervention emotion understanding measures, when controlling for pre-intervention scores and symptom severity.

Method

Design and Instruments

We used a pre-post study design with a control group matched for age and symptom severity. The two groups were assessed twice over an 8-week period. In the initial assessment (Time 1), the participants’ diagnosis was confirmed and emotion understanding was tested. At post-intervention (Time 2), only emotion understanding was tested.

The Childhood Autism Rating Scale

The Childhood Autism Rating Scale (CARS) successfully differentiates between ASC, other developmental disabilities and typical functioning in children (Schopler et al. 1988). Its use is supported by findings of good internal consistency (.896, 95% CI .877–.913) and good inter-rater reliability (.796, 95% CI .736–.844) (Breidbord and Croudace 2013). The child’s behaviour is rated from 1 to 4 on each item: 1 for normal for the child’s age, 2 for mildly abnormal, 3 for moderately abnormal and 4 for severely abnormal. Total scores range from 15 to 60, with 30 being the cut-off score for a diagnosis of mild autism. Scores of 30–37 indicate mild to moderate autism, while scores between 38 and 60 indicate severe autism. A Macedonian translation of the CARS instrument was used.
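
The scoring rule described above can be written out as a small Python sketch. The function names are hypothetical, and the count of 15 items is inferred from the stated 15 to 60 score range with ratings of 1 to 4. The text does not say how totals strictly between 37 and 38 are classified; the sketch assigns them to the lower band, which is an assumption.

```python
N_ITEMS = 15        # inferred from the 15 to 60 total range with ratings of 1 to 4
CUTOFF = 30         # cut-off score stated in the text
SEVERE_FROM = 38    # "scores between 38 and 60 indicate severe autism"


def cars_total(item_ratings):
    """Sum the item ratings after checking count and range."""
    if len(item_ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings")
    if any(not 1 <= r <= 4 for r in item_ratings):
        raise ValueError("each rating must lie between 1 and 4")
    return sum(item_ratings)


def cars_band(total):
    """Map a total score to the bands given in the text."""
    if not 15 <= total <= 60:
        raise ValueError("total must lie between 15 and 60")
    if total < CUTOFF:
        return "below cut-off"
    if total < SEVERE_FROM:   # assumption: totals between 37 and 38 fall here
        return "mild to moderate"
    return "severe"
```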

Emotion Comprehension Test (ECT)

This assessment, a separate mode of the LE program, examines the child’s EU regarding external emotional cues: emotion recognition from facial expressions (photographs), emotion recognition from graphical representations of facial expressions, i.e. pictograms, and recognition of situation-based emotions (Howlin et al. 1999), in three respective tasks. It assesses the EU of four basic emotions: happy, sad, angry, and afraid.

The Face Task examined the child’s ability to match a verbal label to a facial expression. At the beginning of each trial, four emotional faces appear on the screen (happy, angry, sad and afraid). The child is given a verbal label and is instructed to tap the face matching the emotion label. The material comprises validated photographs of facial expressions from the Warsaw Set of Emotional Facial Expression Pictures (Olszanowski et al. 2015), drawn from a subset different from the one used in the intervention. In addition, a unique subset of photographs was used for testing at each time point.

The Picto Task examined the child’s ability to match a verbal label to a graphic representation of a facial expression. At the beginning of each trial, four pictograms appear on the screen (happy, angry, sad and afraid). The child is given a verbal label and is instructed to tap the pictogram matching the emotion label. The material for the Picto task was created by a visual artist. We used a unique subset of emoticons at each assessment time point, different from the subset used in the intervention.

The Situation Task examined the child’s ability to match an emotional situation with the facial expression that it evokes. An illustration depicting an emotional scene is presented on the screen, along with a short verbal description. The facial expressions of the protagonist are not visible or are blurred. Four pictograms are presented below. The child is instructed to choose the pictogram that best depicts how the character in the scene is feeling.

The visual material consisted of modified contextual illustrations from Google’s Public Domain, based on materials in “Teaching Children with Autism to Mind-Read: A Practical Guide for Teachers and Parents” (Howlin et al. 1999). Each emotion was represented by 5 items, for a total of 20 items per task. A point was given for each correct answer, with a maximum of 20 points per task and 60 points in total. The Situation task consists of entirely novel visual material, different from that used in the intervention, as it was intended to test for feature-based distant generalization. Additionally, a different subset of materials (situations) was employed at each assessment time point. Screenshots of ECT tasks are shown in Fig. 2.

Fig. 2 Screenshots of ECT subtests

Participants

Participants were recruited via the Macedonian Scientific Society for Autism and Open the Windows. The principal investigator contacted the families, and written informed consent for participation in the study was obtained from the parents of all children. Initially 34 participants were recruited, 11 female and 23 male. The children’s chronological age was between 7 and 15 years (M = 10.91, SD = 2.57). They were recruited from self-contained special education classrooms as well as inclusive classrooms in the urban area of Skopje municipality. All had been diagnosed with Autism disorder (ICD-10 criteria) by the regional commissions for disability assessment and did not have additional syndromes. In the sample, 59% (n = 20) had been diagnosed as having intellectual disability (ID): 41% of the sample (n = 14) had mild ID (IQ 50 to 70) and 18% (n = 6) had moderate ID (IQ 35 to 49). The information on intellectual functioning was recorded in the children’s disability assessment reports, provided by the parents; however, information on the specific instrument or IQ score was not available. Two inclusion criteria were applied: (1) confirmation of ASD diagnosis with CARS, and (2) presence of EU deficits, demonstrated by ECT scores below 80% (48 points). All children exceeded the CARS cut-off; however, one child (female) did not meet the second inclusion criterion and was excluded from the sample after scoring 85% (51 points) on the ECT.
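
The two inclusion criteria reduce to a simple check, sketched below in Python. The function names are hypothetical. The CARS cut-off of 30 and the ECT maximum of 60 points are taken from the instrument descriptions; reading "below 80%" as strictly below 48 points is an interpretation of the wording.

```python
ECT_MAX = 60                 # 3 tasks x 20 items
ECT_INCLUSION_PERCENT = 80   # EU deficit = score below this percentage
CARS_CUTOFF = 30             # CARS cut-off score


def ect_percent(ect_total):
    """ECT total expressed as a percentage of the maximum score."""
    return 100 * ect_total / ECT_MAX


def is_eligible(cars_total, ect_total):
    """Both criteria must hold: CARS at or above cut-off and ECT below 80%."""
    ect_threshold = ECT_MAX * ECT_INCLUSION_PERCENT // 100   # 48 points
    return cars_total >= CARS_CUTOFF and ect_total < ect_threshold
```

With these definitions, a child scoring 51 points (85%) on the ECT is not eligible regardless of the CARS score, which matches the one exclusion reported above.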

The final sample consisted of 33 children, 10 female and 23 male. Participants were randomly assigned to either an intervention or a control group. Randomization was stratified by age (7–10 and 11–15 years) and symptom severity level, i.e. CARS score (mild to moderate, 30–36.5; severe, 37–60).
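
One common way to implement stratified randomization with these two factors is sketched below in Python. The text does not describe the allocation procedure that was actually used, so the shuffle-and-alternate scheme, the function names, and the participant tuples are all assumptions made for illustration.

```python
import random
from collections import defaultdict

ARMS = ("intervention", "control")


def stratum(age, cars):
    """Return the (age band, severity band) stratum given in the text."""
    age_band = "7-10" if age < 11 else "11-15"
    severity = "mild-moderate" if cars < 37 else "severe"
    return (age_band, severity)


def stratified_assign(participants, rng):
    """participants: iterable of (participant_id, age, cars_score).

    Within each stratum, participants are shuffled and then assigned to the
    two arms alternately, starting from a randomly chosen arm, so that arm
    sizes differ by at most one per stratum.
    """
    by_stratum = defaultdict(list)
    for pid, age, cars in participants:
        by_stratum[stratum(age, cars)].append(pid)

    assignment = {}
    for key in sorted(by_stratum):
        ids = by_stratum[key]
        rng.shuffle(ids)
        offset = rng.randrange(2)   # random starting arm for this stratum
        for i, pid in enumerate(ids):
            assignment[pid] = ARMS[(i + offset) % 2]
    return assignment
```

Alternating within shuffled strata keeps the arms balanced on age and severity even in a small sample, which is the usual motivation for stratifying.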

Seventeen participants (13 male and 4 female) in the intervention group used the programme for a total of 720 min of screen time. The sessions took place at the children’s homes, where they engaged with the programme independently. All children had previously used a computer and/or tablet and were encouraged to use the device they normally used and were most comfortable with. Some children used a tablet, some a PC with a mouse, and one child used a tablet and a PC interchangeably. The parents received specific instructions to reinforce the child after playing the games, to ensure motivation for completing the intervention, and not to interfere with the child’s performance in the games. The researcher explained how teaching was structured in the programme and that any interference was unnecessary and would impede learning. They were instructed to play the games for at least 90 min per week. The researchers monitored the participants’ compliance with the intervention twice a week by accessing online log files generated by the software. They also provided parents with feedback about the child’s progress and suggested which games the child should play.

The number of individual sessions per child varied between 24 and 48. The duration of each session varied from 15 to 30 min, depending on the individual child, and the intervention continued for approximately 8 weeks, until the total screen time reached 720 min (12 h). During the intervention, one participant (male) failed to complete the program. The 16 participants in the control group (10 male and 6 female) were not involved in any type of intervention or training other than the standard school curriculum. During the intervention, the participants did not receive instructions or training on emotion recognition or socio-emotional skills, in school or elsewhere. See Fig. 3 for the progress of participants through the study.
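
The dosage figures above are mutually consistent, as the short Python check below shows. It uses only numbers stated in the text (720 min total, sessions of 15 to 30 min, at least 90 min per week); the function names are hypothetical.

```python
TOTAL_MINUTES = 720          # total screen time (12 h)
MIN_MINUTES_PER_WEEK = 90    # minimum weekly play time


def sessions_needed(session_minutes):
    """Number of equal-length sessions needed to reach the total, rounded up."""
    return -(-TOTAL_MINUTES // session_minutes)


def weeks_at_minimum_pace():
    """Weeks needed when playing exactly the weekly minimum."""
    return TOTAL_MINUTES / MIN_MINUTES_PER_WEEK


print(sessions_needed(15), sessions_needed(30))  # 48 24
print(weeks_at_minimum_pace())                   # 8.0
```

Sessions of 15 min give the upper bound of 48 sessions, sessions of 30 min give the lower bound of 24, and the weekly minimum of 90 min yields the stated duration of about 8 weeks.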

Fig. 3 Participant flow chart following Consolidated Standards of Reporting Trials guidelines

The final sample included 32 children, 16 in each condition. The frequency distribution of intellectual functioning categories across groups within the final sample is given in Table 1. As shown in Table 2, no significant between-group differences were found in age, CARS scores or ECT scores.

Table 1 Frequency distribution for intellectual functioning
Table 2 Means, variability (SD’s) and ranges of background variables for the control and intervention groups

Testing Procedure

For the initial assessment, the principal investigator met with the participants and their parents at the premises of the school or organization the child attended. The parents were informed about the purpose of the study, and written informed consent for participation in the study was obtained from the parents of all children. Each child’s behaviour was rated by the parent using CARS in the presence of the principal investigator, who clarified the rating criteria and assisted in the process. The ECT was administered one-on-one by the principal investigator. The tasks were presented via a tablet computer in the following order: Face task, Picto task and Situation task. Recorded verbal instructions were played automatically at the beginning of each trial. For each question in the ECT, the child had to make a choice by independently tapping on the screen. When the child did not respond in a timely manner, the investigator repeated the question until a choice was made. No other kind of assistance was provided to the child during testing. Although not all children had used a tablet before, all of them had used smartphones, so they were familiar with touch-screen technology. Answers were automatically collected by the software. The children were allowed to take breaks at any time during or between tasks.

The post-intervention assessment for the clinical group took place 1 week after completing the intervention. Post-intervention assessment for the control group took place 8 weeks after the initial testing. The ECT was administered with novel testing material, following the procedure described for the initial assessment.

Results

Assumption evaluations indicated that the assumptions of normality, homogeneity of variance, linearity and homogeneity of regression slopes were all satisfied. We tested for differences between groups on the four pre-intervention dependent variables. Four one-way analyses of variance (ANOVA) found that the groups did not differ significantly on the Face task (F (1, 30) = .558, p = .46), Picto task (F (1, 30) = .002, n.s.), Situation task (F (1, 30) = .525, n.s.) or Total ECT score (F (1, 30) = .004, n.s.). Pre-intervention means and standard deviations for each dependent variable are shown in Table 3.
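
For readers who want to see where an F (1, 30) statistic comes from, the following Python sketch computes a one-way ANOVA F ratio and its degrees of freedom from raw scores using only the standard library. It is a generic textbook computation, not the authors' analysis script, and the function name `one_way_anova` is hypothetical. With two groups of 16 children the degrees of freedom are 2 − 1 = 1 and 32 − 2 = 30, as reported.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more groups of scores."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```

The p value then follows from the F distribution with these degrees of freedom, which a statistics package would normally supply.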

Table 3 Unadjusted means and variability (SD’s) across groups on all tasks at pre and post-intervention

Overall Analysis

After confirming that the assumptions for ANCOVA were met, two-way analyses of covariance (2 × 2 ANCOVA) were computed to determine the effects of group (intervention, control) and intellectual functioning (typical, ID) on post-intervention ECT scores after controlling for pre-intervention scores. Because the intervention effect on ECT performance may depend, to some degree, on the severity of autistic traits, we included CARS score as a second covariate in the model. Effects of the overall analysis are detailed in Table 4.
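
Written out, a model of this kind has the general form below. This is the standard textbook parametrization of a 2 × 2 ANCOVA with two covariates, given as a sketch; the symbols are introduced here for illustration and the exact specification used in the authors' software is not stated in the text. Here Y is the post-intervention ECT score, α_i the group effect, β_j the intellectual functioning effect, X the pre-intervention ECT score and C the CARS score.

```latex
\[
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}
        + \gamma_1 \,(X_{ijk} - \bar{X}) + \gamma_2 \,(C_{ijk} - \bar{C})
        + \varepsilon_{ijk}
\]
\[
df_{\text{error}} = N - (1 + 1 + 1 + 1 + 2) = 32 - 6 = 26
\]
```

The error degrees of freedom count the final sample of 32 children minus six estimated parameters (intercept, two main effects, one interaction, two covariate slopes), which agrees with the F (1, 26) tests reported for this analysis.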

Table 4 Main effect, interaction and covariates effects from overall analysis detailed by ECT task

After adjustment for pre-intervention ECT scores and symptom severity, there was a statistically significant overall main effect of group (F (1, 26) = 50.26, p < .001, ηp² = .65) and of intellectual functioning (F (1, 26) = 7.24, p < .05, ηp² = .21) on post-intervention ECT scores. Both symptom severity (F (1, 26) = 6.58, p < .05, ηp² = .20) and pre-intervention ECT (F (1, 26) = 12.96, p < .005, ηp² = .33) had significant overall effects as covariates. The group by intellectual functioning interaction was also significant (F (1, 26) = 11.39, p < .005, ηp² = .30). Therefore, an analysis of simple main effects and pairwise comparisons for group and intellectual functioning was performed.
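
The partial eta squared effect sizes reported with each F test can be recovered from the F value and its degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies it; the function name is hypothetical, and because the published F values are themselves rounded, a recomputed value may differ from the reported one in the last digit.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Covariate and interaction effects from the overall analysis, F (1, 26):
print(round(partial_eta_squared(12.96, 1, 26), 2))  # 0.33
print(round(partial_eta_squared(11.39, 1, 26), 2))  # 0.3
print(round(partial_eta_squared(6.58, 1, 26), 2))   # 0.2
```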

The simple main effect of group was statistically significant at both levels of intellectual functioning, with a difference in adjusted mean post-intervention ECT scores of 20.31, 95% CI [13.89, 26.72], p < .001 for typical and 7.07, 95% CI [2.29, 11.85], p < .01 for ID. The adjusted mean post-intervention ECT score was significantly higher in the intervention group than in the control group at both levels.

The simple main effect of intellectual functioning was statistically significant for the intervention group but not for the control group, with a difference in adjusted mean post-intervention ECT scores of 12.64, 95% CI [6.24, 19.04], p < .001, and −.58, 95% CI [−6.40, 5.23], n.s., respectively. The adjusted mean post-intervention ECT score was significantly higher in the typical intervention group than in the ID intervention group.

Means, adjusted means, standard deviations and standard errors for post-intervention ECT scores are presented in Table 5.

Table 5 Means, adjusted means, standard deviations and standard errors for post-intervention ECT scores

Analysis Per EU Task

Effects of the consecutive two-by-two ANCOVAs per ECT task are detailed in Table 4. Examination of each distinct level of EU measured with the three tasks also revealed a significant main effect of group on all three tasks: Face task (F (1, 26) = 17.52, p < .001, ηp² = .40), Picto task (F (1, 26) = 52.60, p < .001, ηp² = .66), and Situation task (F (1, 26) = 6.13, p < .05, ηp² = .19).

Face task scores were statistically significantly higher in the intervention group than in the control group, with an adjusted mean difference of 3.78, 95% CI [1.92, 5.63], p < .001. Picto task post-intervention scores were statistically significantly higher in the intervention group than in the control group, with an adjusted mean difference of 7.02, 95% CI [5.03, 9.02], p < .001. Situation task scores were statistically significantly higher in the intervention group than in the control group, with an adjusted mean difference of 3.05, 95% CI [.52, 5.59], p < .05.

The main effect of intellectual functioning was significant on the Picto task (F (1, 26) = 5.16, p < .05, ηp² = .16), borderline significant on the Face task (F (1, 26) = 4.23, p = .05, ηp² = .16) and not significant on the Situation task (F (1, 26) = .01, n.s.). Picto task post-intervention scores were statistically significantly higher in the intellectually typical group (M_adj = 11.99, SE = .81) than in the ID group (M_adj = 9.52, SE = .61), with an adjusted mean difference of 2.47, 95% CI [.23, 4.71], p < .05. The intellectually typical group scored higher on the Face task (M_adj = 13.87, SE = .73) than the ID group (M_adj = 11.88, SE = .55), with a borderline significant adjusted mean difference of 1.99, 95% CI [.001, 3.98], p = .05. No significant difference was found for the Situation task (adjusted mean difference .11, 95% CI [−2.71, 2.94], n.s.). Means, adjusted means, standard deviations and standard errors per ECT task are presented in Table 6.

Table 6 Means, adjusted means, standard deviations and standard errors detailed by task

A significant group by intellectual functioning interaction was found only on the Picto task (F (1, 26) = 11.93, p = .002, ηp² = .315), not on the Face task (F (1, 26) = 4.16, n.s.) or the Situation task (F (1, 26) = .75, n.s.). An analysis of simple main effects and pairwise comparisons for group and intellectual functioning was therefore performed on the Picto task.

The simple main effect of group was statistically significant at both levels of intellectual functioning, with a difference in adjusted mean post-intervention Picto task scores of 10.33, 95% CI [7.17, 13.49], p < .001 for typical and 3.72, 95% CI [1.33, 6.11], p < .005 for ID. The adjusted mean scores were significantly higher in the intervention groups than in the control groups at both levels (Table 6).

The simple main effect of intellectual functioning was statistically significant for the intervention group but not for the control group, with a difference in adjusted mean Picto task scores of 5.78, 95% CI [2.78, 8.78], p < .005, and −.834, 95% CI [−3.79, 2.12], n.s., respectively. The adjusted mean post-intervention Picto task scores were significantly higher in the typical intervention group than in the ID intervention group. Means, adjusted means, standard deviations and standard errors for post-intervention Picto task scores are presented in Table 6.

Symptom severity had a significant effect as a covariate on the Picto task (F (1, 26) = 5.82, p = .02, ηp² = .18) and the Situation task (F (1, 26) = 6.42, p < .05, ηp² = .19), but not on the Face task (F (1, 26) = .63, n.s.). Pre-test scores had a significant effect on the Face task only (F (1, 26) = 24.35, p < .001, ηp² = .484), not on the Picto task (F (1, 26) = 1.47, n.s.) or the Situation task (F (1, 26) = .22, n.s.).

Discussion

The goal of this study was to evaluate a CBI designed to enhance EU skills among elementary school students with autism spectrum conditions, with and without intellectual disability. We studied the effects of 720 min (8 weeks) of individual use of the intervention program on three emotion understanding measures. We found that the program is a useful tool for improving the EU skills of both children with typical IQ and children with mild to moderate ID. A significant difference (p < .001) in overall achievement between the intervention and control groups at post-intervention was observed after adjusting for the effect of pre-intervention scores and autistic symptom severity. A significant difference in achievement was also observed when the typical and ID intervention groups were each compared with the corresponding typical and ID control groups.

The pre-intervention ECT results support previous findings that children with ASC show marked deficits in EU (Fridenson-Hayo et al. 2016). After the intervention, participants in the intervention group answered the ECT questions correctly approximately 60% of the time, compared with 37.5% for the control group. Given that there were no differences between the groups on the pre-intervention ECT, we can conclude that the improvements in the abilities studied are due to the use of the program. Individuals with ASC use a different process in perceiving faces, one that is more time-consuming and perhaps more explicit and rule-based than that used by others. When performing emotion perception tasks, people with ASC focus on individual facial features rather than configurations (Rutherford and McIntosh 2007; Tanaka et al. 2003). It appears that the methodology used in this intervention builds on the perceptual strengths of children with ASC and supports learning to identify regularities and correspondences in emotional expressions. This process may depend on intellectual capacity, as demonstrated by the significant difference in performance between the typical and ID intervention groups. However, both groups scored significantly higher than their control equivalents. Although emotion comprehension impairments are present regardless of intellectual functioning, intellectual ability may act as a moderator in the learning process. To draw this conclusion from our findings, similar results would be needed across all ECT tasks. However, a difference within the intervention group based on intellectual functioning was found in one task (Picto) only, and thus the results regarding this aspect are inconclusive.

Overall, our results are in line with previous findings from evaluative studies that emotion understanding in children with ASC can be improved in laboratory settings using CBI.

Moreover, the effects of the intervention are evident at the level of feature based distant generalization, which demonstrates generalization of acquired skills applied to stimuli (faces and scenarios) previously unknown to users.

A contributing factor for generalization of EU abilities might be the characteristics of the used stimuli. The task incorporated pictograms as well as real face photographs of different ages (child and adult), sexes and ethnicities which might have enhanced generalization. An additional factor that might have fostered generalization is how games are structured. Part/whole training that addresses the featural processing bias (Tanaka et al. 2012) as well as the opportunity to practice matching facial expressions across identities might have further supported generalization of EU skills.

In children with ASC, insufficient experience with faces can inhibit the development of mechanisms for face processing and emotion recognition (Gauthier et al. 2000). Although the explicit exposure to face stimuli in the intervention (12 h) is incomparable with the experience that neurotypical children have with faces and facial expressions, this relatively short intervention shows that with increased stimulation and training for recognizing key elements of faces, facial expressions, and the situational factors that evoke them, the recognition of emotions among students with ASD can be significantly improved.

Another factor worth considering is the unique combination and content of the training activities of the intervention. Examining whether specific activities, or combinations of activities or games, are more effective than others in improving EU would have implications for future intervention design. The scope of the intervention was teaching children to recognize external indicators (as opposed to internal, cognitive causes) of the emotional states of other people: recognition of facial expressions, explored through photographs (Face task) and pictograms (Picto task), and understanding of the impact of situational factors on emotions, explored through illustrations of social context (Situation task). The results of the corresponding tasks that measured each EU component, regarding four basic emotions (sadness, happiness, fear, and anger), demonstrate statistically significantly better results for the intervention group than for the control group on all tasks.

The intervention effect is highest in the Picto task, which refers to the recognition of emotions from simple schematic drawings. Here we find the largest effect (ηp² = 0.55), which means that about 55% of the variance in task achievement is attributable to the independent variable, that is, the intervention itself. The intervention group showed significantly better achievement than the control group on emotion recognition from pictograms at the significance level α < 0.01, meaning that a difference this large would be expected less than 1% of the time if the intervention had no effect. For comparison, for the first component of emotion recognition, photographs of real faces, the effect of the training accounts for 17.5% of the variance. This effect, although interpreted as a large effect under the accepted conventions (see Footnote 2), is considerably lower than that for pictograms. This is consistent with the findings of Silver and Oakes’s training study (2001) that greater improvements in emotion recognition are shown when programs include cartoons rather than photographs of real faces. It is likely that in children with ASC the recognition of emotional expressions is more successful when they are represented by pictograms than by photographs of real faces. This may be due to the simplified, clip-art, cartoon-like style of expression representation in pictograms, where details are minimized, only the information necessary to discriminate between expressions is presented and all irrelevant features are omitted. Moreover, there is evidence that cartoons are perceived differently from human faces in ASC (Grelotti et al. 2005; Lindner and Rosen 2006; Rosset et al. 2008, 2010).

For example, children with ASC use typical configural processing strategies for cartoons, strategies they do not use for real faces. In a study examining facial-expression processing strategies in typically developing controls and children with ASC, the latter showed a typical inversion effect for human and non-human cartoon faces. This might be due to the perceptual characteristics of the stimulus (cartoon vs. human). However, the authors note that both the control and ASC groups showed greater overall performance for cartoon than for human faces (Rosset et al. 2008).

It has also been suggested that cartoons hold a special status for children with ASC (the same can be said for children in general), and that this intrinsic interest and social motivation might facilitate improvement in emotional understanding, which is an important implication for intervention. A significant interaction effect was found only for the Picto task. Although this task showed the strongest effect size, the ID intervention group performed significantly worse on it than the typical intervention group. This indicates that EU performance with cartoons may be moderated by intellectual ability.

For the third EU component, understanding situation-based emotions (contextual illustrations), we also found a statistically significant difference in favour of the intervention group. After adjusting for the effects of covariates, the groups differ at p < 0.05. The intervention group was significantly better than the control group at predicting emotional expressions in a social context. The effect size is similar to that of the Face task and accounts for 17% of the variance (partial η² = 0.17), a strong effect but much smaller than that of the Picto task. The Situation task has the lowest means of all the tasks (see results section for adjusted means). This is not surprising given that inferring emotion from situational factors is a relatively more complex cognitive task (Mitchell and Phillips 2015). Despite this, users of the program were still significantly better able to identify emotions based on illustrated context.

This study used a randomised controlled experimental design, which offers high internal validity. Including symptom severity in the model, in addition to pre-intervention scores, increases the sensitivity and power of the analysis and the rigour of the evaluation. A recent systematic review of evaluation studies notes several ways to overcome current limitations: including individuals with low-functioning ASC, controlling for confounding variables, and enhancing the generalization and maintenance of skills (Kouo and Egel 2016). Our sample included elementary school children from regular and special education settings with heterogeneous clinical presentations, ranging from mild to moderate to severe autism symptoms, which enhances the generalization of results to the broader ASC population. However, we agree with Hopkins et al. (2011) that, ideally, the findings need to be replicated with a population-based sample, where more heterogeneity is likely to be found.

The generalization and maintenance of these abilities were tested only at the level of feature-based distant generalization. Generalization and maintenance (follow-up) with complex social stimuli should be further explored. It would also be interesting to explore the effect of intervention duration, since longer training (more than 12 h) may lead to greater improvement in EU skills. In addition, future studies should incorporate more comprehensive standardized measures of emotion understanding and a broader social competence profile.

The EU components related to external socio-emotional indicators, on which this study focuses, are only part of the wider repertoire of socio-emotional skills. Although encouraging, our results represent a small contribution to the development of more comprehensive educational support for students with ASC.

Conclusion

The results of our research demonstrate that a relatively short educational intervention can bring about measurable gains in emotion understanding in children with ASC, both those with associated ID and those with typical intellectual abilities. Strong positive effects were observed in emotion recognition from real face photographs and pictograms, as well as in understanding situation-based emotions. The effects generalized to novel stimuli not previously presented to users. In contrast to the neurotypical population, children with ASC need explicit teaching of emotion understanding, which could possibly enhance their social-emotional functioning. Further research is needed to explore the possible long-term benefits and the extent to which the effects of educational interventions are reflected in everyday social life.