Introduction

Children with autism spectrum disorder (ASD) exhibit two primary symptom dimensions including social impairments (social-interaction and social-communication) and circumscribed and repetitive behaviors and interests (American Psychiatric Association [APA], 2013). The diagnostic framework recognizes significant heterogeneity in symptoms and functional levels (documented by specifiers indicating the presence or absence of co-occurring intellectual and language impairment). Despite relative strengths in cognitive and language abilities for high-functioning children with ASD (HFASD), social interaction and social communication deficits significantly interfere with daily functioning. Featured prominently in the social impairment dimension are problems involving diminished or lack of social responsiveness and non-verbal communication (i.e., processing, understanding, and responding to social cues; APA 2013).

Significant research attention has been directed toward understanding the non-verbal communication abilities of individuals with HFASD/ASD, particularly the ability to decode emotions (emotion recognition; ER) in facial expressions and prosody. Studies of ER are important as nonverbal behaviors such as facial expressions and prosody provide critical information about the internal/emotional state of others (Doi et al. 2013) and better child ER skills have been associated with higher parent and teacher ratings of social competence in children (Nowicki 1997). In addition, improved ER skills have been associated with improved ASD symptoms of children with HFASD (Thomeer et al. 2011).

Many studies have identified face and prosody ER deficits in individuals with HFASD/ASD (e.g., Doi et al. 2013; Lindner and Rosen 2006); however, some findings have been contradictory. A comprehensive review of studies yielded some conclusions about the ER abilities of these individuals. Harms et al. (2010) reviewed behavioral, eye-tracking, and brain-based studies of facial ER for individuals with ASD. Evidence from behavioral studies was mixed regarding a general deficit for basic emotions but supported a deficit for complex emotions and for stimuli that have been manipulated. Abnormalities were consistently found in eye-tracking studies and neuroimaging studies suggesting less automatic and more attention-demanding processing. The authors suggested that individuals with ASD may be utilizing feature-based (local) processing, instead of configural (global/holistic) processing, as a compensatory strategy and that this may require more attention and effort compared to typically-developing (TD) individuals. Harms et al. concluded that the evidence supported a face ER deficit in ASD and that contradictory evidence in some behavioral studies may have been due to compensatory strategies, as well as cross-study differences (e.g., age and functional level of participants, type of stimuli). Studies of ER in prosody are far less prevalent; however, available findings have indicated a deficit for prosody in individuals with HFASD (see Doi et al. 2013; Lindner and Rosen 2006; Mazefsky and Oswald 2007).

Another facet of nonverbal communication involves encoding (display) of emotions. Encoding problems for individuals with HFASD can involve diminished or atypical facial expressions (APA 2013). Given its prominence in the diagnostic criteria, it is noteworthy how little research has examined encoding in this population. Available findings have, however, indicated significant problems encoding specific emotions, as well as more oddness of facial emotion expressions for both children and adults with HFASD compared to TD individuals (Macdonald et al. 1989; Volker et al. 2009). Taken together, studies have documented a number of decoding (recognition) and encoding (display) abnormalities for children with HFASD which contribute to their social challenges.

The significant deficits and abnormalities clearly establish the need for treatments that enhance the nonverbal skills and social performance of children with HFASD. One of the most common approaches for increasing the social competence of children with HFASD is social skills interventions. Recent comprehensive reviews by Reichow and Volkmar (2010) and Reichow et al. (2012) reported that social skills interventions were among the most common approaches for fostering social competence of individuals with HFASD and that evidence from existing studies has generally found social skills groups and video modeling to be promising treatment approaches for increasing social skills/competence, particularly for school-age youth with HFASD. Reichow et al. (2012) also assessed the impact of social skills groups on ER skills (among others) and found that social skills groups had no significant effect on ER skills of children with HFASD. Although findings suggested that social skills groups/interventions are a common and promising approach for enhancing social competence of youth with HFASD, their impact on other important areas of social performance including ER and social communication may be limited (Reichow et al. 2012).

Another approach to increasing the skills of individuals with ASD/HFASD is computer-based intervention (CBI). Although CBIs have received less empirical study, over the past two decades there has been an increase in CBIs to promote skills in individuals with ASD including ER (Ploog et al. 2013). Two recent comprehensive reviews yielded similar conclusions about the efficacy and methodological rigor of studies of CBI for this population. Ramdoss et al. (2012) reviewed seven studies and Ploog et al. (2013) reviewed six studies of CBIs that targeted ER for individuals with ASD. Both determined that CBI for ER was promising and often associated with improvements; however, the studies were characterized by significant methodological limitations (e.g., lack of randomized designs and control groups) which precluded conclusions regarding efficacy. The authors also noted the need for increased targeting and assessment of generalization and maintenance of skills. Further, Ramdoss et al. noted that few studies used standardized measures to assess the impact of CBI on social outcomes. According to Ploog et al., most CBI studies have been pilot investigations and the lack of controlled designs is typical for an emerging area of study. As a result, randomized controlled trials (RCTs) constitute an important next step in the assessment of CBIs. RCTs allow for the minimization/elimination of threats to internal validity (e.g., history, maturation), more precise determinations of causal links, and stronger conclusions regarding efficacy (Trochim et al. 2014).

CBI may be especially useful for children with HFASD who are hyperattentive to details and exhibit a tendency to “analyze or build systems, to understand and predict the behavior of nonagentive events in terms of underlying rules and regularities” (i.e., systemizing; Golan and Baron-Cohen 2006, p. 593). Systemizing may allow these individuals to compensate for ER deficits by learning and establishing predictable links between facial expressions and prosody and underlying emotions (Golan and Baron-Cohen 2006). Other advantages of CBI include reduced social demands and distractions during instruction (Hopkins et al. 2011), repetition of lessons and practice exercises, a predictable and consistent routine and environment (LaCava et al. 2007), greater instructional precision, and increased fidelity of implementation (Ploog et al. 2013). Repeated practice may also enhance ER automaticity (Thomeer et al. 2011).

Mind Reading (i.e., MR; Baron-Cohen et al. 2004) is a CBI designed to increase decoding skills and exploit the systemizing strengths of individuals with HFASD. The interactive software teaches facial expression and prosody decoding using visual and auditory lessons and stimuli, practice trials, and computer-delivered reinforcement (see Procedures for a description of MR). The current review yielded only three studies of MR as the primary treatment for children with HFASD. LaCava et al. (2007) assessed the effect of MR on decoding skills of eight children with HFASD. Following 10 weeks of independent use of MR (M = 10.5 h total per child), pre-post analyses revealed significant improvements in face and prosody ER; however face ER improvements were limited to tasks taken from the MR program. In another study, LaCava et al. (2010) evaluated MR for four boys with HFASD. Treatment involved 7–10 weeks of MR instruction (M = 12.3 h total per child) that also included adult tutors. Results indicated pre-post improvements for face and prosody ER on tasks taken from the MR program, as well as photographs and cartoons. Changes in social interactions (based on observations) were unreliable, leading the authors to conclude that the improved ER skills did not yield associated improvements in social performance. Limitations across both studies included the lack of control groups, small samples, no reporting of IQ or language data, lack of diagnostic confirmation, and no reporting of effect sizes.

Thomeer et al. (2011) conducted a pilot study of MR for 11 children, ages 7–12 years, with HFASD. The manualized protocol was comprised of 12 sessions (90-min each) administered over 6 weeks (M = 15.9 h total per child). Each session included staff-supervised MR instruction (prescribed emotion groups and time using components of the program), in vivo rehearsal trials, and a behavioral reinforcement system. In vivo rehearsal and reinforcement were included to ensure repeated practice and foster generalization. Parents also provided a reinforcer at home if their child earned a predetermined percentage of points during each session. Feasibility was supported by high levels of fidelity and parent and child satisfaction. Pre-post comparisons indicated significant increases in parent-rated emotion decoding and encoding and a significant decrease in ASD symptoms. Results provided support for the effect of MR on nonverbal skills and also suggested potential positive effects on ASD symptoms including social impairments. Despite support for feasibility and initial indications of positive outcomes, Thomeer et al. (2011) noted the need for examination of the protocol in a randomized controlled trial (RCT) that includes a control group and a combination of measures (child testing and rating scales) assessing targeted skills, ASD symptoms, and broader social skills.

This study was conducted to examine the protocol developed by Thomeer et al. (2011) in an RCT. Methodological improvements included a larger sample and randomized design, outcome assessment using direct child testing and rating scales of proximal and distal outcomes, and the inclusion of a follow-up assessment of maintenance. Given that research on the MR software is still in the early stages and assessment of efficacy requires monitoring of treatment exposure (lesson completion and time using program areas) and active engagement, there is also a continued need for supervised administration of the program and evaluation in a controlled environment. It was hypothesized that children in treatment would exhibit significantly better ER skills and receive significantly higher ratings of ER, encoding, and social skills and lower ratings of ASD symptoms at posttest compared to controls, and maintain the gains at follow-up relative to controls. It was also hypothesized that children and parents would express high levels of satisfaction with the program.

Method

Participants

A total of 43 children, ages 7–12 years with HFASD completed the study and were included in the analyses. The sample was recruited using public announcements and achieved in three sampling waves over an 18-month period. Inclusion criteria were a prior clinical diagnosis of autism, Asperger’s, or Pervasive Developmental Disorder-Not Otherwise Specified, Wechsler Intelligence Scale for Children-4th Edition (WISC-IV; Wechsler 2003) short-form IQ > 70 (and Verbal Comprehension Index [VCI] or Perceptual Reasoning Index [PRI] score ≥ 80), and Comprehensive Assessment of Spoken Language (CASL; Carrow-Woolfolk 1999) short-form expressive or receptive language score ≥ 80. In addition, all met criteria on the Autism Diagnostic Interview-Revised (ADI-R; Rutter et al. 2003) which was completed to verify diagnosis. A total of 49 cases were screened for inclusion and 3 were rejected due to an IQ or language score below the minimum required. Two children dropped out of the study before pretesting and were replaced in a subsequent sampling wave. The 44 eligible children were randomly assigned to the treatment or waitlist control condition using an online random number generator. One control case was excluded from the outcome analyses due to the emergence of significant psychiatric symptoms during the study. This resulted in a total of 43 children in the final analyses (see Fig. 1 for the progress of participants through the study). A detailed description of the sample is presented in Table 1.
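The participant flow described above reduces to simple arithmetic, and a brief sketch can make the bookkeeping explicit. The variable names are illustrative only and do not come from the study materials. The sketch also assumes that the two children who dropped out before pretesting are counted among the 49 screened cases; the text does not state this explicitly, although the reported totals are consistent with it.

```python
# Participant counts as reported in the text; variable names are illustrative.
screened = 49
below_iq_or_language_cutoff = 3   # rejected at screening
dropped_before_pretest = 2        # assumed to be counted among the 49 screened
excluded_from_analyses = 1        # control case with emergent psychiatric symptoms

# Children randomly assigned to the treatment or waitlist control condition.
randomized = screened - below_iq_or_language_cutoff - dropped_before_pretest

# Children included in the outcome analyses.
analyzed = randomized - excluded_from_analyses

print(f"randomized = {randomized}, analyzed = {analyzed}")
```

Under these assumptions the sketch reproduces the 44 randomized and 43 analyzed children reported in the text.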

Fig. 1 CONSORT flow diagram of the progression of phases for the 2 groups (screening, intervention allocation, follow-up, and data analyses)

Table 1 Demographic characteristics

Analyses of demographic data supported cross-condition comparability. Results indicated no significant between-groups differences on average child age, t(41) = 0.745, p = 0.460, parent education, t(41) = 0.475, p = 0.637, short-form IQ, t(41) = −0.223, p = 0.825, short-form VCI, t(41) = −0.213, p = 0.832, short-form PRI, t(41) = −0.078, p = 0.939, expressive language, t(41) = 0.212, p = 0.833, receptive language, t(41) = −0.015, p = 0.988, ADI-R Social, t(41) = −0.592, p = 0.557, ADI-R Communication, t(41) = −1.133, p = 0.264, or ADI-R Restricted and Repetitive Behavior, t(41) = −0.173, p = 0.863. Similarly, Exact Test two-tailed p values for gender (1.00) and ethnicity (.488) were non-significant. This degree of cross-condition comparability is critical, particularly in the areas of cognitive functioning (VCI and PRI) in ER studies for children with HFASD (Harms et al. 2010).

Measures

Consistent with recommendations to improve ASD outcome assessments, the current measures assessed skills directly targeted by the treatment, as well as ASD features and broader social skills (Lord et al. 2005). The measures were selected based on their use in prior MR treatment trials and other psychosocial treatment trials for children with HFASD. The following is a description of the screening measures and outcome measures.

Screening Measures

Wechsler Intelligence Scale for Children-4th Edition (WISC-IV)

IQ was evaluated using a 4-subtest short-form of the WISC-IV (Wechsler 2003) consisting of Block Design, Similarities, Vocabulary, and Matrix Reasoning subtests. Methods provided by Tellegen and Briggs (1967) were used to calculate short-form reliability and validity coefficients based on information in the technical manual. The short-form composite yielded an internal consistency estimate of .95 and correlated .92 with the Full Scale IQ.

Comprehensive Assessment of Spoken Language (CASL)

A 4-subtest short form of the CASL (Carrow-Woolfolk 1999) was used as a screening measure for receptive and expressive language abilities including the Antonyms, Synonyms, Syntax Construction, and Paragraph Comprehension subtests. For the ages under consideration, subtest internal consistency reliabilities ranged from .76 to .90 and the short-form composite reliability was .94. Composite reliability was calculated using the formula provided by Tellegen and Briggs (1967).

Autism Diagnostic Interview-Revised (ADI-R)

The ADI-R (Rutter et al. 2003) is a 93-item standardized diagnostic interview administered to a caregiver familiar with the developmental history and current behavior of the person being evaluated. The interview focuses on three domains (i.e., Reciprocal Social Interactions, Language/Communication, and Restricted, Repetitive, and Stereotyped Behaviors and Interests). Validity evidence indicates that the ADI-R accurately discriminates between ASD and non-ASD samples (Rutter et al. 2003).

Outcome Measures

Cambridge Mindreading Face-Voice Battery for Children (CAM-C)

The CAM-C measures emotion recognition for 15 emotion concepts using facial expression video clips and speech audio clips. Children view or listen to a clip and select one of four emotion words that reflect the emotion of the person in the clip. The measure assesses recognition of six basic and nine complex emotions taught in the Mind Reading program, with higher scores indicating greater accuracy. The CAM-C consists of two subtests which yield scores for recognition of emotions in facial expressions (Faces total score) and speech segments (Voices total score). Test–retest reliabilities (10–15 week interval) were reportedly .79 and .75 for the Faces and Voices scales, respectively. The CAM-C effectively discriminates between children with HFASD and typical children, especially the complex emotions (O. Golan, personal communication, June 10, 2009). The CAM-C has been recommended as a standardized measure for use in ER studies for children with HFASD (Ramdoss et al. 2012) and found to be treatment sensitive in prior ER studies with this population (e.g., LaCava et al. 2007, 2010).

Emotion Recognition and Display Survey (ERDS)

The ERDS (Thomeer et al. 2011) is a rating scale designed to evaluate the ability of children, age 7–12 years, to recognize (decode) and display (encode) emotions from the “Top 100” emotions of Mind Reading. The 35 emotions assessed are comprised of both basic (e.g., happy, sad) and complex (e.g., silly, upset, tired) emotions and they constitute a random sampling of the emotions taught in the treatment protocol. Assessment of a representative subset met the need for sufficient sampling of treatment content and for brevity. Each emotion on the ERDS is paired with a definition and the parent rates how well the child can (a) recognize the emotion (yielding a Receptive total score) and (b) display the emotion (yielding an Expressive total score). Items are rated on a 5-point scale ranging from 1 (almost never) to 5 (almost always). Higher total scores indicate more accuracy in decoding and encoding. In an ER study involving children with HFASD, Thomeer et al. (2011) found the ERDS was treatment sensitive and reported an internal consistency reliability of .90 for the Receptive score and .92 for the Expressive score, with the correlation between the decoding and encoding subscales being .86.

Social Responsiveness Scale (SRS)

The SRS (Constantino and Gruber 2005) is a rating scale assessing the severity of ASD features and it generates a total score and five subscale scores. On the SRS, respondents rate the intensity of behaviors on a scale of 1 (not true) to 4 (almost always true), with higher scores reflecting more ASD-related symptoms/problems. Psychometric studies of the SRS have consistently documented a single-factor structure representing a unitary construct underlying ASD symptom severity. The continuous scaling of the total score, however, makes it useful as a measure of overall symptom severity including symptom severity in response to intervention. In this study, only the total score was used. The total score has an internal consistency reliability of .93 to .97 and it accurately discriminates between ASD and non-ASD behavioral disorders. The SRS was included as it assesses the severity of ASD symptoms on a continuous scale, it was treatment sensitive in the pilot MR study by Thomeer et al. (2011), and it has been used and found to be treatment sensitive in other psychosocial treatment trials for youth with HFASD (e.g., White et al. 2010).

Behavior Assessment System for Children, Second Edition-Parent Rating Scales (BASC-2-PRS)

The BASC-2-PRS (Reynolds and Kamphaus 2004) assesses behaviors across a wide variety of domains. Items are rated on a scale from 0 (never) to 3 (almost always). This study used the Social Skills subscale which measures interpersonal aspects of social adaptation and skills needed for successful interaction. Internal consistency reliability was reportedly .84 to .88 for the Social Skills subscale. Moderate correlations have been established with comparable scales on other well-known behavior rating scales (Reynolds and Kamphaus 2004). The Social Skills subscale was selected as an indicator of broad social performance due to its use and treatment sensitivity as a measure of broader social skills (social skills not directly targeted) in other psychosocial treatment trials for children with HFASD (e.g., Lopata et al. 2008; Thomeer et al. 2012).

Satisfaction Surveys

Researcher-developed satisfaction surveys were administered to the treatment group following completion of treatment. Parents provided ratings for 10 items that assessed their satisfaction with their child’s progress in identifying emotions, displaying emotions, and social interactions, as well as staff interactions with their child, staff communication, staff responsiveness to questions or requests, the program’s schedule, the quality of treatment, their child’s enjoyment of the program, and overall program effectiveness. Children provided ratings for 7 items that assessed their satisfaction with how well the program taught them to understand emotions in people’s faces and voices, display emotions in their own faces and voices, how helpful staff clinicians were, enjoyment of using computers to learn emotions, and overall enjoyment with the program. Items on both surveys were on a scale of 1 (completely dissatisfied) to 7 (completely satisfied). (Satisfaction surveys provided in Appendix).

Procedures

This study was approved by the Institutional Review Board and conducted according to the approved protocol including attainment of written parental consent and child assent prior to data collection. A randomized waitlist control design was used. After being screened and determined to meet inclusion criteria, participants were randomly assigned to the treatment or control condition. Child testing and parent ratings were conducted for both groups; pretest data were collected during the week preceding treatment initiation, posttest data were collected within 1 week following treatment, and follow-up data were collected 5 weeks following posttest. Satisfaction ratings were collected from the treatment group only at posttest.

Treatment Protocol

The treatment was administered in a computer lab on a college campus and followed the Thomeer et al. (2011) protocol. One modification to the original protocol (12 sessions over 6 weeks) was that the number of sessions was increased to 24 (two 90-min sessions per week over 12 weeks). In this trial the original 12-session protocol was administered once and then repeated a second time. This was done to increase the number of exposures to lessons and practice trials in order to promote generalization and maintenance. Prior evidence has suggested that increased time using MR was associated with better ER (Golan and Baron-Cohen 2006) and that the effects of CBI may require adequate time for consolidation of learning (Faja et al. 2012). The 90-min manualized and supervised sessions followed the same schedule (five 15–20 min intervals) and included MR instruction, in vivo rehearsal trials, and a behavioral reinforcement system. Each staff clinician was responsible for supervising the MR lessons, conducting in vivo trials, and providing reinforcement for two to three children per session. The following is a description of the treatment components.

MR is an interactive software program designed to teach recognition of simple and complex emotions to children with ASD via facial-video and vocal-audio stimuli (Baron-Cohen et al. 2004). The program consists of 412 emotions, organized into 24 emotion groups and by 6 emotion levels. The protocol in this study targeted 98 of the “Top 100” emotions as they were identified as appropriate for the age range of the sample (Thomeer et al. 2011).

The software delivers instruction and reinforcement across multiple program areas including the Emotions Library, Learning Center, Games Zone, and Rewards Zone. In the Emotions Library children observe/listen as emotions are defined in text vignettes, and facial-video and vocal-audio examples. The Learning Center consists of structured lessons that teach ER using a combination of audio and visual examples. It also includes quizzes that assess ER skills prior to and upon completion of lessons. The Games Zone is comprised of games and activities (e.g., Space Faces) designed to provide additional practice of ER skills. Lastly, the Rewards Zone provides contingent access to pictures and video clips of interest to many children with HFASD (e.g., trains, spinning objects). The software has an internal reinforcement system that allows children to accrue “rewards” (i.e., tokens) for successfully and accurately completing tasks and quizzes within the MR program. These MR generated “rewards” are used to access (i.e., unlock) pictures and video clips in the Rewards Zone. In this treatment trial, the duration of time spent in each area was dictated by the schedule and staff clinicians observed and prompted the children to ensure they adhered to time parameters and lesson requirements.

In vivo rehearsal trials were originally included in the protocol by Thomeer et al. (2011) as a way to improve the limited generalization reported in previous MR studies for children with HFASD (e.g., LaCava et al. 2007). They provide repeated practice of the newly learned emotions, and opportunities to foster skills generalization during more authentic interactions (Thomeer et al. 2011). Two in vivo trials were conducted during each of the five intervals in each session. Specifically, one time per interval a staff clinician displayed an emotion and asked the child to identify the emotion (decoding) and one time per interval the staff member asked the child to display the emotion (encoding). The emotions practiced during each session were from a list of emotions targeted during that session’s MR lesson. All in vivo rehearsal trials were conducted in a one-to-one format between the clinician and child. The clinician provided reinforcement for an accurate decoding and/or encoding response (see behavioral reinforcement system).

A behavioral reinforcement system was implemented to increase on-task and social behaviors, as well as reinforce decoding and encoding skills. Program and social rules were operationally defined, reviewed by staff clinicians at the start of each session, and posted on a large display. During each of the five intervals per 90-min session, each child had the opportunity to earn one point for adhering to program rules (e.g., watching all target video and listening to all audio clips), one point for refraining from negative social behavior (e.g., poor eye contact), one point for accurately decoding the emotion during the in vivo trial, and one point for accurately encoding the emotion during the in vivo trial. At the end of every interval, each child received point-based feedback on her/his performance during that interval. In all, each child had the opportunity to earn 20 points per session. To further reinforce the children’s performance, parents provided a home reward if the child received ≥80 % of her/his points for the given session (≥16 points). (No points-based rewards were provided at the treatment sessions).
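The point arithmetic in the reinforcement system can be illustrated with a short sketch. It is a minimal illustration of the rules as described above, not the scoring procedure actually used by staff; the constant and function names are hypothetical.

```python
# Illustrative sketch of the session point system; all names are hypothetical.
INTERVALS_PER_SESSION = 5
# One point per category can be earned in each interval.
POINT_CATEGORIES = (
    "adhered_to_program_rules",
    "refrained_from_negative_social_behavior",
    "accurate_in_vivo_decoding",
    "accurate_in_vivo_encoding",
)
HOME_REWARD_PERCENT = 80  # parents provide a reward at or above this percentage


def max_points_per_session() -> int:
    """Maximum points a child can earn in one 90-min session."""
    return INTERVALS_PER_SESSION * len(POINT_CATEGORIES)


def earns_home_reward(points_earned: int) -> bool:
    """True if the child reached the 80 % threshold for the session."""
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return points_earned * 100 >= HOME_REWARD_PERCENT * max_points_per_session()


print(max_points_per_session())                       # 20
print(earns_home_reward(16), earns_home_reward(15))   # True False
```

With five intervals and four point categories, the maximum is 20 points, and the 80 % criterion corresponds to the 16-point threshold stated in the text.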

Control Protocol

Children in the control condition were monitored for any external clinical intervention they may have received while children in the treatment condition received the intervention. Based on parent reports, no control children participated in external clinical treatment (i.e., psychosocial or emotion-recognition treatment) during the study. Following completion of the treatment protocol and follow-up testing, the same manualized treatment was provided to children in the control condition.

Staff Training and Treatment Integrity

Graduate and undergraduate psychology and education students served as staff clinicians and treatment implementation was supervised by a doctoral student and doctoral-level psychologist. Prior to initiation of treatment, staff completed 8 h of training which included mandatory passing of an exam assessing mastery of the treatment manual (score of 100 % required) and applied practice exercises implementing MR, in vivo trials, and the behavioral reinforcement system. Standardized fidelity checklists (assessing staff clinician adherence to the protocol) were completed throughout treatment by research assistants not involved in treatment delivery, and one of the clinical supervisors. Staff clinicians and sessions were randomly selected and observed. Fidelity was assessed in 21 % of the sessions and averaged 98 %. The highly manualized and simple nature of the protocol contributed to the high fidelity. Each child’s use of MR was also tracked by an internal chronometer within the software. The protocol targeted approximately 31.2 h total of MR use (12 min of each session were used for review of rules, lesson set-up, in vivo rehearsal trials, and interval and performance feedback). The average total time using MR was 29.3 h.
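The targeted MR exposure follows from the session schedule, and the calculation can be sketched as below. The variable names are illustrative, and the final ratio of average to targeted use is a derived figure that is not reported in the text.

```python
# Session parameters as described in the protocol; names are illustrative.
sessions = 24
session_minutes = 90
non_mr_minutes_per_session = 12  # rules review, set-up, in vivo trials, feedback

# Targeted MR use across the full protocol, in hours.
targeted_hours = sessions * (session_minutes - non_mr_minutes_per_session) / 60

# Average total MR use recorded by the software's internal chronometer.
observed_hours = 29.3

# Derived here for illustration only; this proportion is not reported in the text.
proportion_of_target = observed_hours / targeted_hours

print(f"targeted = {targeted_hours:.1f} h, observed/targeted = {proportion_of_target:.3f}")
```

The sketch recovers the 31.2 h target (24 sessions of 78 min each) and suggests that average use was roughly 94 % of that target.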

Results

Data Analysis Plan

Consistent with NIMH working group recommendations for assessing outcomes in ASD treatment trials (Smith et al. 2007), a designated set of primary measures and small number of secondary measures were used. Primary measures consisted of four ER indicators that assessed skills targeted by the intervention including a child test assessing ER in faces and voices (CAM-C Faces and CAM-C Voices) and a parent rating scale assessing the children’s emotion decoding and encoding skills (ERDS Receptive and ERDS Expressive). Secondary measures were comprised of two indicators assessing potential associated effects including parent ratings of ASD symptoms (SRS) and broad social skills (BASC-2 Social Skills). ANCOVA (controlling for pretest) was used to assess between-condition differences for each measure at both posttest and follow-up. Significant between-condition omnibus F results were further assessed using Sidak-corrected post hoc comparisons to examine between-condition differences separately at posttest and follow-up. Family-wise alpha was maintained at .05 for the four primary measures (.0125 per comparison) and at .05 for the two secondary measures (.025 per comparison). Post hoc comparisons within each ANCOVA model were also protected by using the family-wise alpha level (.0125 for the primary measures and .025 for the secondary measures). Effect sizes for the ANCOVA omnibus F tests were calculated using omega squared (ω2). Effect size d was calculated for between-group differences at posttest and follow-up (0.2 = small effect, 0.5 = medium effect, 0.8 = large effect; Cohen 1988). Descriptive data on satisfaction ratings are provided for the treatment group at posttest.
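The per-comparison alpha levels and the effect-size labels described above can be illustrated with a brief sketch. It is offered as an aid to the reader rather than as the authors' analysis code. The function names are hypothetical, and the "negligible" label for d below 0.2 is an assumption added for completeness. The reported per-comparison values (.0125 and .025) equal the family-wise alpha divided by the number of measures; the Sidak formula is shown alongside for comparison and yields slightly larger values.

```python
def divided_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Family-wise alpha divided equally across tests (Bonferroni-style)."""
    return familywise_alpha / n_tests


def sidak_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Per-comparison alpha from the Sidak formula, shown for comparison."""
    return 1 - (1 - familywise_alpha) ** (1 / n_tests)


def cohen_label(d: float) -> str:
    """Label an effect size using the Cohen (1988) benchmarks cited in the text."""
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"  # assumed label; the text defines no category below 0.2


for label, n_measures in (("primary", 4), ("secondary", 2)):
    print(
        label,
        round(divided_alpha(0.05, n_measures), 4),
        round(sidak_alpha(0.05, n_measures), 4),
    )
```

For example, `cohen_label(0.66)` returns "medium" and `cohen_label(0.46)` returns "small", which matches how effects of those sizes are described in the outcome analyses.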

Primary Outcome Analyses

Cambridge Mindreading Face-Voice Battery for Children (CAM-C Faces and CAM-C Voices)

Descriptive statistics and results of the statistical analyses for the CAM-C are presented in Table 2 and in the following. Results of the ANCOVA for CAM-C Faces yielded a significant between-groups effect (p < .001; ω2 = .23) in the expected direction. Post hoc comparisons between the two conditions indicated that the treatment group achieved a significantly higher CAM-C Faces score than the control group at both posttest (t[40] = 5.79, p < .001 [one-tail], d = 1.34) and follow-up (t[40] = 3.45, p = .001 [one-tail], d = 0.86). Between-groups effect size estimates at posttest and follow-up were large. For the CAM-C Voices, ANCOVA results indicated a significant between-groups difference (p < .001, ω2 = .14) in the expected direction. Results of the post hoc comparisons between the conditions revealed significantly higher scores for the treatment group on the CAM-C Voices at both posttest (t[40] = 4.36, p < .001 [one-tail], d = .99) and follow-up (t[40] = 2.87, p = .006 [one-tail], d = 0.66) than the control group. Effect sizes between-groups were large at posttest and medium at follow-up.

Table 2 Primary outcome measures, pretest, posttest, and follow-up scores, tests of significance, and effect sizes

Emotion Recognition and Display Survey (ERDS Receptive and ERDS Expressive)

Descriptive statistics and results of the statistical analyses for the ERDS are presented in Table 2 and described below. ANCOVA results for the ERDS Receptive (decoding) ratings indicated a significant between-groups difference (p = .006, ω² = .08) in the expected direction. Although the post hoc comparison between the groups was not significant at posttest (t[40] = 1.82, p = .038 [one-tail], d = .46), the between-groups difference was significant at follow-up (t[40] = 2.76, p = .0045 [one-tail], d = .73) and favored the treatment group. The effect size was medium at follow-up. For the ERDS Expressive (encoding) ratings, ANCOVA results were significant (p = .0025, ω² = .11) in the hypothesized direction. Post hoc between-groups differences were significant and favored the treatment group at posttest (t[40] = 2.33, p = .0125 [one-tail], d = .61) and at follow-up (t[40] = 2.93, p = .003 [one-tail], d = .85). Between-groups effect size estimates were medium at posttest and large at follow-up.
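
The significance decisions for the ERDS post hoc comparisons can be traced by comparing each reported one-tailed p value with the .0125 per-comparison threshold for primary measures. The Python sketch below assumes a comparison counts as significant when p is at or below the threshold. That decision rule is an assumption, and the reported p values are rounded, so the borderline posttest value for ERDS Expressive should be read with caution.

```python
PRIMARY_ALPHA = 0.0125  # per-comparison threshold reported for primary measures

# One-tailed p values for the ERDS post hoc comparisons, as reported in the text.
erds_post_hoc_p = {
    ("ERDS Receptive", "posttest"): 0.038,
    ("ERDS Receptive", "follow-up"): 0.0045,
    ("ERDS Expressive", "posttest"): 0.0125,
    ("ERDS Expressive", "follow-up"): 0.003,
}


def is_significant(p_value: float, alpha: float = PRIMARY_ALPHA) -> bool:
    """Assumed decision rule: significant when p is at or below alpha."""
    return p_value <= alpha


for (measure, occasion), p in erds_post_hoc_p.items():
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"{measure}, {occasion}: p = {p} -> {verdict}")
```

Passing a different threshold through the `alpha` argument (for example, .025) applies the same check to the secondary measures.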

Secondary Outcome Analyses

Social Responsiveness Scale (SRS)

Descriptive statistics and results of the statistical analyses for the SRS are presented in Table 3 and described below. Results of the ANCOVA for SRS scores yielded a significant between-groups difference (p = .0135, ω² = .04) in the anticipated direction. Results of the post hoc comparisons indicated significantly lower scores (fewer ASD symptoms) for the treatment group than the control group at both posttest (t[40] = 2.19, p = .0175 [one-tail], d = .46) and follow-up (t[40] = 2.06, p = .023 [one-tail], d = .45). Between-groups effect size estimates at posttest and follow-up were in the small range.

Table 3 Secondary outcome measures, pretest, posttest, and follow-up scores, tests of significance, and effect sizes

BASC-2 Social Skills

Descriptive statistics and results of the statistical analyses for the BASC-2 Social Skills scale are presented in Table 3 and described below. Results of the ANCOVA indicated that the between-groups difference was not statistically significant (p = .168, ω² = .00); thus, no follow-up post hoc comparisons were conducted.

Parent and Child Satisfaction

Parent and child ratings reflected high levels of satisfaction. Out of 70 possible points, the average parent rating on the satisfaction surveys was 66.92 (item M = 6.69 of a maximum = 7). Out of 49 possible points, the average child rating on the satisfaction surveys was 44.60 (item M = 6.37 of a maximum = 7).
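
The item means above follow from dividing each total by the number of survey items. The Python sketch below infers the item count from the maximum total, assuming every item shares the 7-point maximum (10 parent items and 7 child items under that assumption); the function name is illustrative.

```python
def item_mean(total_score: float, max_total: int, max_per_item: int = 7) -> float:
    """Mean rating per item, inferring the item count from the maximum total.

    Assumes every item has the same maximum rating (an assumption).
    """
    n_items = max_total // max_per_item
    return total_score / n_items


parent_item_mean = item_mean(66.92, max_total=70)  # 10 items inferred
child_item_mean = item_mean(44.60, max_total=49)   # 7 items inferred

print(f"Parent item mean: {parent_item_mean:.2f}")
print(f"Child item mean: {child_item_mean:.2f}")
```

Rounded to two decimals, these values match the reported item means of 6.69 and 6.37.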

Discussion

A pilot study by Thomeer et al. (2011) suggested that a manualized protocol including MR instruction, in vivo rehearsal, and behavioral reinforcement resulted in significant increases in decoding and encoding skills and a significant reduction in ASD symptoms for children with HFASD; however, that study lacked direct child testing, a control group, and assessment of maintenance. This study evaluated the Thomeer et al. (2011) protocol in an RCT that included a control group, direct child testing, and evaluation of skill maintenance. Results indicated that children who completed the treatment performed significantly better than children in the control group on a test of ER skills for both facial and vocal expressions immediately following treatment, with the gains maintained at 5-week follow-up. Parent ratings were largely consistent with the child testing and indicated significantly better decoding and encoding of emotions following treatment compared to controls. The only difference was that the improvement in parent-rated decoding skills at posttest, despite being higher for the treatment group (relative to controls), was non-significant following the alpha correction (p = .038; d = .46); the difference was significant at 5-week follow-up. Taken together, the child testing and parent ratings indicated significant improvements in decoding skills that were evident in the child testing, perceived by parents outside the treatment setting, and maintained at follow-up. The improvement in encoding (display) also suggested that the protocol was perceived by parents as positively affecting the children’s facial expressions of emotions. Secondary measures were used to assess potential associated (distal) effects of the treatment on ASD symptoms and broader social skills. Because these potential associated effects were evaluated using only parent ratings (which may have been susceptible to rater bias), they should be viewed as preliminary.
Examination of these effects suggested that the treatment was associated with significantly lower levels of parent-rated ASD symptoms for the treatment group immediately following treatment and at 5-week follow-up. Although preliminary, this was considered promising given the long-term stability of ASD symptoms that characterize HFASD. Interestingly, the significant reduction in ASD symptom ratings for the treatment group was not accompanied by significantly higher ratings of broad social skills (although they were higher at posttest and follow-up for the treatment group compared to controls). Lastly, parents and children expressed high levels of satisfaction with the program. The high levels of satisfaction were also viewed as promising given that increased motivation and use of CBI have been associated with improved learning (Golan and Baron-Cohen 2006; Ramdoss et al. 2012) and the current protocol was 12 weeks in duration. There has been a need to formally assess satisfaction in CBI trials that target ER and social performance for this population (Hopkins et al. 2011). Again, although these symptom, skills, and satisfaction findings are interesting, they were based on parent ratings, which may have been susceptible to bias (due to parental awareness of the treatment condition of the children). They do suggest, however, that parents and children were satisfied with the protocol and perceived some benefit associated with the treatment.

Findings of this study lend further support to the efficacy of MR as a promising CBI for children with HFASD. Results indicating improved ER skills for children with HFASD are consistent with prior MR investigations (e.g., LaCava et al. 2010), and support MR’s additional potential for positive effects on encoding and ASD symptoms (Thomeer et al. 2011). The findings of maintenance extend the research and suggest some stability of skill improvements and symptom reductions. The current protocol contained a number of treatment elements that have been recommended for teaching ER skills to children with HFASD, including a focus on direction and allocation of attention toward core facial features and tone of voice, explicit rule-based instruction, behavioral reinforcement, in vivo rehearsal, and repeated practice (Faja et al. 2012; Harms et al. 2010; LaCava et al. 2010; Lindner and Rosen 2006). Results of this study appear to support the use of these techniques for teaching decoding and encoding and reducing ASD symptoms for children with HFASD; however, additional elements appear warranted for enhancement of broader social performance (e.g., explicit instruction and role-play for learning to respond to emotional expressions by others, how to encode emotions during specific social scenarios, etc.). Consistent with this assertion, a number of authors have proposed that CBIs targeting ER and social performance will likely need structured and planned instruction and practice in real-life settings and social interactions to yield greater improvements in broad social skills (Hopkins et al. 2011; Ploog et al. 2013; Ramdoss et al. 2012).

The current study had a number of strengths that addressed limitations in prior studies, including a relatively large and well-characterized sample, supervised and manualized protocol, structured fidelity monitoring system, randomized design, and battery of measures that assessed targeted outcomes and broader skills, maintenance, and satisfaction. Despite these strengths, several limitations warrant mention. Although the sample was relatively large compared to other studies, it was nonetheless limited and may have yielded insufficient power to detect smaller effects. The sample was also mainly male and Caucasian. Future studies would benefit from larger and more diverse samples.

Additional limitations involved the use of raters (parents) who were aware of the treatment condition of the children and lack of direct behavioral observations by naïve raters. Although use of direct behavioral observations is rare and often not feasible in larger-scale RCTs for children with HFASD (White et al. 2007), behavioral observations conducted by naïve raters (blinded to treatment condition) using carefully selected and operationally-defined skills/behaviors would minimize potential rater bias, increase reliability (inter-rater), and provide an objective assessment of ASD symptoms and skills. The use of operationally-defined social behaviors may also be more sensitive to treatment gains than broad social skills measures that contain unrelated items. Given these benefits, future studies should include behavioral observations conducted by naïve raters as part of the outcome assessments. Another limitation involved the use of the SRS as a measure of ASD symptom severity. As previously described, validity studies have only supported the use of the SRS total score as an indicator of overall symptom severity. This inhibits the ability to determine the specific ASD symptom dimension(s) affected by the treatment. Because the SRS total score is limited in this way, future studies may benefit from the use of an ASD measure that yields information on specific symptom dimensions.

Given that this is the first RCT of MR and this protocol, replication studies are needed. Ongoing research should continue to study CBI in narrowly defined groups, as efficacy may differ based on functional level (Hopkins et al. 2011; Ploog et al. 2013). Future studies may want to compare supervised MR administration versus independent use, evaluate the efficacy of MR as a component in a comprehensive psychosocial treatment, or assess characteristics of treatment responders. Dismantling studies of the current protocol would also be informative. As noted, in vivo rehearsal and reinforcement were added to increase treatment effects and foster improvements in other skills/symptoms; however, these features may make the protocol more difficult to implement. Future studies should examine the various components and possible combinations to determine if similar outcomes can be achieved using a simplified protocol. Studies that compare the treatment protocol to a non-therapeutic program would also control for potential placebo effects.

Overall, results of the current study suggested that the manualized protocol was effective in improving decoding and encoding skills and reducing ASD symptoms for children with HFASD. Prior assertions about the affinity of children with HFASD toward working on computers and CBI also appeared to be supported by the high satisfaction ratings. The significant increases in decoding and encoding and reduction in parent-rated ASD features are noteworthy on their own; however, identifying ways to enhance the current protocol to yield stronger effects on social-communicative performance and social competence appears worthy of ongoing study.