Introduction

Individuals with autism spectrum disorder (ASD) constitute a growing proportion of students receiving special education services, and they experience unique challenges (e.g., social-communication deficits) that impede the acquisition and generalization of skills (American Psychiatric Association 2013; Office of Special Education Programs 2017). Although individuals with ASD present with diverse skill profiles, they often exhibit poor performance on academic skills relative to their cognitive abilities, suggesting that they require individualized instruction and supports (Keen et al. 2016; King et al. 2016). As emphasis on access to the general education curriculum and settings has increased, so have the academic expectations of students with ASD, further increasing the need to identify effective practices for teaching academic content to this population (e.g., the Common Core State Standards 2010; Fleury et al. 2014; King et al. 2016).

In addition to difficulties in traditional academics, many students with ASD display minimal appropriate engagement during classroom activities (Fleury et al. 2014). Engagement behaviors (e.g., participation in group activities, appropriate use of materials, on-task behavior) are often considered fundamental by teachers and have been linked to academic outcomes (Fleury et al. 2014; Koegel et al. 2010). For example, academic participation and success during grade school positively predict participation in postsecondary education and competitive employment, domains in which individuals with ASD are greatly underrepresented (Migliore et al. 2012).

One increasingly popular option for presenting academic content and engaging students with ASD is the use of touch-screen device technology (Kagohara et al. 2013). Portable touch-screen devices such as iPads and Android tablets are widely available and have a number of features that make them potentially desirable for use in educational contexts with individuals with ASD. Researchers have found that some individuals with ASD prefer technology-based instruction and perform better during interventions that include electronic devices (Kagohara et al. 2013; Shane and Albert 2008). Previous literature also suggests that these devices may reduce the frequency of adult-delivered prompts during instruction, which can decrease the likelihood of prompt dependency (Mechling 2011; Smith et al. 2015). Additionally, these mainstream devices may be less stigmatizing, more affordable, and offer additional functions compared to many devices specifically designed to serve as assistive technology (e.g., highly specialized speech-generating devices).

Parents and teachers report that they find portable touch-screen devices appealing and use them frequently with individuals with ASD (Clark et al. 2015; King et al. 2017). These devices have also received attention and widespread endorsement in popular media (Knight et al. 2013). For example, a recent article in Parenting advertises 11 “expert-recommended apps” for individuals with autism and describes how their use may improve skills across a variety of domains, without reference to any supporting research (Willets 2017). Given the apparent enthusiasm and purported adoption of these devices in academic programs for individuals with ASD, it is important for systematic reviews to illuminate how their use is supported by empirical research.

Previous reviews of interventions incorporating touch-screen devices with individuals with ASD have focused more broadly across skill domains (Hong et al. 2017; Kagohara et al. 2013). Kagohara et al.’s (2013) systematic review included 15 intervention studies, but only one of the included studies targeted academic skills. Specifically, researchers used an iPad to present instructional videos (video modeling) to successfully teach two students with ASD to check the spelling of words (Kagohara et al. 2012). A more recent meta-analysis by Hong et al. (2017) examined 36 studies that used touch-screen devices in treatment programs for individuals with ASD. Nine of those studies directly targeted academic skills (e.g., reading, writing, arithmetic) or an increase in engagement in academic tasks. Although the nine studies produced large effect size estimates, no moderating variables related to intervention characteristics were identified. The broad focus across skill domains (e.g., academics, communication, vocational skills) precluded more nuanced conclusions regarding academic outcomes, limiting the ability to offer guidance for practitioners. In terms of academic engagement, a number of peer-reviewed studies have successfully increased engagement in academic settings by teaching self-monitoring, providing choices, and using peer supports (e.g., Goodman and Williams 2007; Koegel et al. 2010; McCurdy and Cole 2014). However, despite the recognized importance of engagement in the classroom, we were unable to locate any systematic reviews focused on the academic engagement of students with ASD.

The current meta-analysis builds on previous reviews by focusing directly on studies involving students with ASD that used touch-screen devices to target academic skills and increase engagement in academic tasks (on-task behavior). Further, variables that were not coded and summarized in previous reviews (e.g., intervention dosage, participant functioning level) were extracted from included studies and analyzed. Specifically, this review (a) describes the characteristics, features, and functions of touch-screen devices and applications that have been used in previous research; (b) identifies specific target skills and teaching procedures; and (c) calculates effect size estimates in order to analyze potential moderating variables. Overall, a review of this nature is intended to inform evidence-based practice and offer suggestions for future research.

Method

Protocol Registration and PRISMA Guidelines

The procedures for this meta-analysis were registered with the PROSPERO International prospective register of systematic reviews (Ledbetter-Cho et al. 2017a), a database which publishes protocols from systematic reviews prior to the initiation of data extraction in an effort to reduce reporting bias (Moher et al. 2015). The meta-analysis procedures were conducted in accordance with PRISMA guidelines (Moher et al. 2009), a set of evidence-based reporting procedures designed to increase the quality of systematic reviews.

Search Strategy

A systematic search was conducted in the following four electronic databases: Educational Resources Information Center (ERIC), Medline, Psychology and Behavioral Sciences Collection, and PsycINFO. Search terms were designed to identify studies that included participants with an autism diagnosis (i.e., autis*, ASD, Asperger*, or pervasive developmental disord*) and the use of a touch-screen device (i.e., mobile technolog*, pocket PC, phone, portable media, Mp3, palmtop comp*, handheld comp*, PDA, personal digital assis*, multimedia device, iPhone, iPod, iPad, portable electronic devi*, or tablet). The search was limited to peer-reviewed articles published in English from 2000 to 2017. Consistent with other reviews examining comparable technology, the year 2000 was chosen because touch-screen mobile devices became widely available after this point (Mechling 2011; Nashville 2009). The first author subsequently conducted ancestry searches of included articles identified through the electronic database search.

The initial database search yielded a total of 427 records. Following the removal of duplicates and non-intervention articles (e.g., systematic reviews, commentaries), the first author screened the full text of 136 articles for inclusion. Nineteen met our predetermined inclusion criteria, 17 from database searches and two from ancestry searches. Figure 1 outlines the search and screening process.

Fig. 1 Flowchart of included studies

Study Selection

Studies were required to meet multiple inclusion criteria that were determined prior to the literature searches. First, studies must have provided intervention to a minimum of one individual diagnosed with an autism spectrum disorder (i.e., Asperger’s, ASD, autism, Autistic Disorder, or Pervasive Developmental Disorder-Not Otherwise Specified [PDD-NOS]) per author report, diagnosis by a medical professional, school diagnostic criteria, or alignment with criteria from the Diagnostic and Statistical Manual of Mental Disorders (DSM). If a study included participants who were not diagnosed with an ASD, only the data from participants with ASD were analyzed. Second, only studies that used experimental designs with the potential to demonstrate a functional relation between the intervention and dependent variable (e.g., multiple baseline design, reversal design, group design with appropriate randomization and controls) were considered. Additionally, studies must have utilized touch-screen mobile devices (e.g., iPods, iPads, personal digital assistants) in intervention delivery.

Finally, studies were required to target specific academic skills or academic engagement behaviors. Specific academic skills were defined as students’ accuracy during activities in the content areas of language arts, science, social studies, writing, or mathematics (Knight et al. 2013; Machalicek et al. 2008; Root et al. 2017). Academic engagement behaviors consisted of on-task behaviors that took place within the context of an academic task and were necessary for accurate performance (e.g., engagement with academic materials, on-task behavior; Koegel et al. 2010; McCurdy and Cole 2014). Interrater agreement on the application of the inclusion criteria was assessed for 20% of the articles identified in the database and ancestry searches and reached 100%.

Data Extraction and Coding

Data extracted from each study are reported in Table 1 and are summarized in terms of: (a) participant characteristics, (b) intervention materials and procedures, (c) dependent variables, (d) outcomes, and (e) research design and rigor. The costs of the applications used in the studies are displayed in Table 2. The first author coded and summarized variables from all included studies. Co-authors independently verified the accuracy of the summaries for 30% of the studies (Watkins et al. 2014). Interrater agreement was calculated on all coded variables by dividing the number of agreements by the total number of items and multiplying by 100. Interrater agreement was scored across 142 items (e.g., setting, implementer, effect size estimates) and reached 96%. Disagreements were resolved through discussion among co-authors.
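For reference, the percentage-agreement calculation described above can be written as

\[ \text{IOA} = \frac{\text{number of agreements}}{\text{total number of items}} \times 100 \]

With 142 coded items, the reported 96% corresponds to agreement on approximately 136 items (136/142 ≈ 95.8%); the exact agreement count is inferred here rather than reported in the text.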

Table 1 Summary of included studies
Table 2 Summary of software applications in included studies

Each participant’s functioning level was coded as lower, medium, or higher based upon the framework outlined by Reichow and Volkmar (2010). Specifically, individuals with limited vocal communication and/or an IQ below 55 were categorized as lower functioning. Participants were classified as medium functioning when they presented with emerging vocal communication and/or an IQ between 55 and 85. Individuals with well-developed vocal communication and/or an IQ above 85 were categorized as higher functioning.
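As a minimal sketch of these coding rules (not the authors’ coding instrument; because the text uses “and/or,” either criterion alone is treated as sufficient here, and inclusive boundaries for the 55–85 IQ band are our assumption), the classification logic can be expressed as:

```python
def functioning_level(iq=None, vocal_communication=None):
    """Sketch of the Reichow and Volkmar (2010)-based coding described above.

    `vocal_communication` is one of "limited", "emerging", or "well-developed".
    Either criterion may be missing, in which case the other is used alone.
    """
    if iq is not None:
        if iq < 55:
            return "lower"
        if iq <= 85:  # inclusive boundaries are an assumption
            return "medium"
        return "higher"
    levels = {"limited": "lower", "emerging": "medium", "well-developed": "higher"}
    # Participants with insufficient information could not be classified.
    return levels.get(vocal_communication, "undetermined")

print(functioning_level(iq=72))                          # medium
print(functioning_level(vocal_communication="limited"))  # lower
```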

In order to summarize outcomes using visual analysis, the authors examined the data from included studies to code a success estimate for each intervention (Reichow and Volkmar 2010; Watkins et al. 2017). The success estimate provides the ratio of the number of implementations of the intervention in which an effect was observed to the total number of implementations (Reichow and Volkmar 2010). Success was determined through visual analysis of level, trend, stability, immediacy of effect, non-overlap, and consistency of data, as described by the What Works Clearinghouse (Kratochwill et al. 2010).
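In compact form,

\[ \text{success estimate} = \frac{\text{implementations with an observed effect}}{\text{total implementations}} \]

For example, in a hypothetical multiple baseline design across four participants in which visual analysis shows an effect for three of them, the success estimate would be 3/4.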

The Evaluative Method for Determining Evidence-Based Practices in Autism was applied to the included studies to determine the quality of research (Reichow et al. 2008). This method has precedent in systematic reviews of applied intervention research and has demonstrated validity and reliability (Wendt and Miller 2012; Whalon et al. 2015). Studies were coded as having strong, adequate, or weak methodological strength based upon the number of primary and secondary quality indicators that they displayed. Primary quality indicators consist of descriptions of participants, independent and dependent variables, baseline conditions, visual analysis of data, and evaluation of experimental control. Secondary quality indicators consist of interobserver agreement (IOA), kappa, treatment fidelity, the use of blind raters, the evaluation of maintenance and generalization of behavior change, and social validity.

Studies coded as having strong methodological rigor received high ratings on all primary quality indicators and displayed a minimum of three secondary quality indicators. Studies classified as adequate received high ratings on a minimum of four primary quality indicators and included at least two secondary quality indicators. Studies with weak methodological rigor received high ratings on fewer than four primary quality indicators and/or included fewer than two secondary quality indicators.
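These decision rules can be summarized in a short sketch (our reading of the thresholds, not the authors’ instrument; treating the secondary-indicator counts as minimums is an assumption):

```python
def rigor_rating(primary_high, total_primary, secondary_shown):
    """Sketch of the Evaluative Method decision rules (Reichow et al. 2008)
    as described above."""
    if primary_high == total_primary and secondary_shown >= 3:
        return "strong"
    if primary_high >= 4 and secondary_shown >= 2:
        return "adequate"
    return "weak"

# e.g., high ratings on five of six primary indicators plus two secondary indicators:
print(rigor_rating(primary_high=5, total_primary=6, secondary_shown=2))  # adequate
```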

Meta-analysis

In addition to using visual analysis to report outcomes, we calculated nonparametric effect size estimates to enable broader comparisons across studies. Given that there is no consensus regarding the most appropriate effect size metric for single-case research designs, we adhered to current recommendations and utilized multiple approaches to estimating effect size. We calculated the improvement rate difference (IRD) and nonoverlap of all pairs (NAP; Kratochwill et al. 2013; Pustejovsky and Ferron 2017).

IRD is the difference between the rates of improvement in the baseline and treatment phases and has been widely applied in medical research (Parker et al. 2009). Advantages of IRD include its alignment with the Phi coefficient and its compatibility with visual analysis (Parker et al. 2009). IRD scores above .70 indicate a large treatment effect, scores from .50 to .70 a moderate effect, and scores below .50 small or questionable effects (Parker et al. 2009). NAP represents the proportion of data that are improved across contrasting phases following pairwise comparisons and is mathematically equivalent to the area under the curve (AUC; Parker and Vannest 2009). Advantages of NAP include its ability to produce valid confidence intervals and its alignment with visual analysis. NAP scores at or above .93 indicate a large treatment effect, scores from .66 to .92 a moderate effect, and scores at or below .65 a small effect (Parker and Vannest 2009).
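In compact notation (following the definitions in Parker et al. 2009 and Parker and Vannest 2009), with m baseline and n treatment data points, W pairwise comparisons showing improvement, and T ties:

\[ \mathrm{IRD} = \mathrm{IR}_{\mathrm{T}} - \mathrm{IR}_{\mathrm{B}}, \qquad \mathrm{NAP} = \frac{W + 0.5\,T}{m \times n} \]

where each improvement rate (IR) is the proportion of improved data points within the treatment (T) or baseline (B) phase.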

In order to prepare data for effect size calculations, graphs from each study were saved as images and imported into the WebPlotDigitizer data extraction software (Rohatgi 2017). WebPlotDigitizer has demonstrated validity and reliability for the extraction of data from single-case design graphs (Moeyaert et al. 2016). Graphed data were converted into numerical data and exported into an Excel spreadsheet that organized the raw data from each phase of the individual studies. IRD and NAP were calculated using online software (Pustejovsky 2017).
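To make the pairwise logic concrete, a minimal Python sketch of the NAP computation is shown below (an illustration of the formula, not the online calculator’s code; the data values are hypothetical and assume that higher values indicate improvement):

```python
import itertools

def nap(baseline, treatment, increase=True):
    """Nonoverlap of all pairs (Parker and Vannest 2009).

    Compares every baseline data point with every treatment data point;
    `increase=True` assumes higher values reflect improvement.
    """
    pairs = list(itertools.product(baseline, treatment))
    improved = sum((t > b) if increase else (t < b) for b, t in pairs)
    ties = sum(t == b for b, t in pairs)
    return (improved + 0.5 * ties) / len(pairs)

# Hypothetical data digitized from one baseline-treatment contrast:
print(nap([20, 25, 15, 30], [60, 75, 80, 70]))  # 1.0 (complete nonoverlap)
```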

Effect sizes were calculated for individual participants as well as at the study level. For studies employing multiple baseline, multiple probe, reversal, or combined designs, data from all adjacent AB phases were contrasted (Chen et al. 2016; Pustejovsky and Ferron 2017). For multielement designs, effect sizes were calculated by conducting between-condition comparisons (i.e., contrasting the data from the two intervention conditions). Two separate IRD and NAP scores were reported for studies using alternating treatment designs. Specifically, effect sizes were calculated by contrasting baseline phases with best treatment phases and by conducting between-condition comparisons (Chen et al. 2016; Pustejovsky and Ferron 2017).

In an effort to identify potential moderating variables, average IRD and NAP scores were calculated for different study and participant variables (e.g., participant functioning level, research rigor) and are reported in Table 3. We used the Statistical Package for the Social Sciences (SPSS) to conduct Mann–Whitney U tests to determine whether differences between effect size estimates in the different groups were statistically significant (i.e., yielded a p value of less than .05; Mann and Whitney 1947). The Mann–Whitney U test is appropriate for data with a non-normal distribution, such as the effect sizes calculated for the current meta-analysis, and can be regarded as a non-parametric analogue of the t test (McKnight and Najab 2010).
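The same test can be reproduced outside SPSS; a brief sketch using SciPy (with hypothetical NAP scores, not the study data) is:

```python
from scipy.stats import mannwhitneyu

# Hypothetical NAP scores for two participant groups (illustrative values only):
children = [0.62, 0.71, 0.85, 0.78, 0.66]
adolescents = [0.88, 0.95, 0.91, 1.00, 0.83]

u_stat, p_value = mannwhitneyu(children, adolescents, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.3f}")  # significant if p < .05
```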

Table 3 Effect sizes for study variables

Results

The procedures and outcomes of the 19 studies included in this meta-analysis are categorized by the domain of the targeted skills (i.e., academic skills or engagement behaviors) and presented in Table 1. All studies utilized single-case research designs and were published across six different peer-reviewed journals. Table 2 summarizes the variety and cost of the software applications utilized in the studies, and Table 3 reports the average effect sizes and standard deviations and indicates statistically significant differences between groups for specific study variables.

Participant, Setting, and Implementer Characteristics

A total of 53 individuals (including six females) diagnosed with an ASD participated in the included studies; they ranged in age from 2 to 19 years (M = 10 years and 5 months). Participants included 32 children (coded for individuals from birth through age 11) and 21 adolescents (ages 12–21). Participants were classified as lower functioning (n = 17), medium functioning (n = 16), or higher functioning (n = 14) according to the criteria outlined by Reichow and Volkmar (2010). For six participants, the level of functioning could not be determined due to limited information in the studies. Interventions were most often conducted in classrooms (n = 16), followed by homes (n = 2) and clinics (n = 2). One study was conducted across two locations (Neely et al. 2013). Interventions were implemented by researchers (n = 13) and teachers (n = 6).

Devices and Software Applications

Devices priced below $600 USD were used in the majority of studies and consisted of iPads (n = 15; $329.00), iPods (n = 1; $199.00), and a Samsung tablet (n = 1; $599.00). One study utilized a smartphone that retails for $724.00, and the remaining study included an HP iPAQ mobile device for which pricing data were not available.

Table 2 displays the variety and current cost in USD of the software applications used in the included studies and reveals that the applications utilized by most researchers were cost-free (n = 13). Eight studies used applications that ranged in cost from $1.99 to $13.99 (M = $5.99). The applications described in two studies were neither available for commercial purchase nor reported with a cost (e.g., I-Connect; Clemons et al. 2016). Two studies each used two applications or device features in their investigation (Lee et al. 2015; Neely et al. 2013).

Pre-training on Devices

In 14 of the included studies, participants operated the touch-screen device during intervention. Of these, six did not provide participants with pre-training on the device (e.g., Burton et al. 2013). The use of prompting (e.g., verbal prompts, gestural prompts) to pre-train device use was described in seven studies, including three that reported teaching participants to use the device within the context of a mastered skill (e.g., Spriggs et al. 2015). In the remaining five of the included studies, the instructor presented and manipulated the touch-screen device during intervention.

Intervention Procedures and Dosage

In addition to the use of a touch-screen device, operated either by participants (n = 14 studies) or instructors (n = 5 studies), intervention packages included a variety of evidence-based procedures to teach the targeted skills. Studies often described a form of prompting (n = 9) to evoke the targeted skill, including least-to-most hierarchies (n = 3), priming (n = 2), verbal prompts (n = 2), time delay (n = 1), and a system of most-to-least prompts (n = 1). Three studies utilized error correction procedures (e.g., replaying the video model and instructing the participant to perform the skill a second time; Cihak et al. 2010a). The use of reinforcement (e.g., delivery of a preferred item) was described in seven studies (e.g., Clemons et al. 2016). Five studies merely provided participants with the touch-screen device and did not describe the use of any prompts, programmed reinforcement, or supplemental instructional procedures (e.g., Spriggs et al. 2015; Van der Meer et al. 2015).

Session length was not reported in seven studies, precluding calculation of the total dosage of intervention. For the remaining studies, session length ranged from 5 to 30 min (M = 15 min) and sessions were implemented one to four times per week (M = 3). The total length of interventions ranged from 1.5 to 12 h (M = 5 h and 10 min), with the majority of interventions lasting no more than 5 h.
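For a study reporting these figures, total dosage follows directly as

\[ \text{total dosage} = \text{session length} \times \text{sessions per week} \times \text{number of weeks} \]

For instance, hypothetical 15-min sessions delivered three times per week over 7 weeks would total 315 min, or approximately 5.25 h, close to the average of 5 h and 10 min reported above (the 7-week duration is illustrative, not drawn from a particular study).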

Target Behaviors

Specific academic skills were targeted in eight studies. Five studies taught participants to complete mathematics skills (e.g., comparing prices, double-digit subtraction; Weng and Bouck 2014; Yakubova et al. 2016) and two studies targeted reading comprehension (e.g., Zein et al. 2016). One study taught both paragraph-writing and mathematics (Spriggs et al. 2015).

Researchers targeted academic engagement in the 11 remaining studies. Seven studies targeted on-task behavior during academic work, including five that taught participants to self-monitor their behavior (e.g., Clemons et al. 2016; Crutchfield et al. 2015). Independent transitions between activities were targeted in three studies (e.g., Cihak et al. 2010a), and two studies compared participants’ engagement in academic tasks during teacher-led and iPad-assisted instruction (Lee et al. 2015; Neely et al. 2013).

Four studies evaluated collateral behaviors that were not directly targeted by intervention components. Specifically, three studies targeting on-task behavior during academic work also measured participants’ challenging behavior (Lee et al. 2015; Neely et al. 2013; Zein et al. 2016). Following an intervention that taught four participants to self-monitor their on-task behavior during class, researchers measured participants’ scores on a vocabulary assessment that was not utilized during intervention (Xin et al. 2017).

Intervention Effectiveness

Intervention outcomes, success estimates, and effect sizes of individual studies are reported in Table 1. Given that some studies reported multiple dependent variables (e.g., Lee et al. 2015) or utilized designs that necessitated the calculation of two effect sizes (e.g., Weng and Bouck 2014), IRD and NAP were calculated for a total of 28 variables. Effect sizes for dependent variables ranged from small to large, with most variables producing large effect sizes (n = 17; 61%), followed by moderate (n = 6; 21%) and small (n = 5; 18%) effects. These effect size estimates were consistently aligned with the success estimates determined for each study using visual analysis (see Table 1).
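A small sketch of the benchmark binning implied above (thresholds taken from Parker et al. 2009 and Parker and Vannest 2009 as stated in the Method; the boundary handling simply follows those published cut points):

```python
def nap_benchmark(nap_score):
    """Bin a NAP score using the Parker and Vannest (2009) benchmarks."""
    if nap_score >= 0.93:
        return "large"
    if nap_score >= 0.66:
        return "moderate"
    return "small"

def ird_benchmark(ird_score):
    """Bin an IRD score using the Parker et al. (2009) benchmarks."""
    if ird_score > 0.70:
        return "large"
    if ird_score >= 0.50:
        return "moderate"
    return "small or questionable"

print(nap_benchmark(0.95), ird_benchmark(0.62))  # large moderate
```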

Effect size estimates and their statistical significance were also examined across different variables of the included studies and are reported in Table 3. Participant functioning level did not significantly influence treatment effectiveness, with IRD and NAP scores indicating large effects across functioning levels. Participant age did impact treatment outcomes, with adolescent participants producing significantly higher effect size estimates than children (IRD: U = 377.5, p = .037; NAP: U = 376.5, p = .036). Effect sizes increased with ratings of methodological rigor. Studies with weak research rigor produced moderate effect size estimates but did not contain enough cases for statistical analysis. NAP scores indicated a significant difference between studies with adequate and strong methodological rigor (U = 324, p = .038), while IRD scores did not reach statistical significance (U = 331.5, p = .051).

With regard to intervention characteristics, interventions in which the participant operated the device (i.e., physically manipulated the device during intervention) resulted in significantly higher effect sizes in comparison to studies in which the instructor manipulated the device (IRD: U = 218, p = .007; NAP: U = 208, p = .004). Additionally, interventions that provided the participant with pre-training on the device prior to intervention produced significantly better treatment outcomes than those without pre-training (IRD: U = 254.5, p < .001; NAP: U = 250, p < .001). Interventions consisting of video modeling and self-monitoring produced large effect size estimates. Self-monitoring interventions resulted in significantly better treatment outcomes in comparison to explicit instruction interventions (IRD: U = 99.5, p = .001; NAP: U = 100.5, p = .002). Studies using visual supports and social stories did not contain enough cases for statistical analysis but produced moderate to large treatment effects.

Examination of the targeted skills revealed that studies teaching specific academic skills produced the largest effect size estimates, followed by interventions targeting engagement and challenging behavior. However, no statistically significant differences were found. Finally, intervention dosage did not significantly influence outcomes. Effect size estimates ranged from moderate to large and studies with the largest dosage produced the largest effects.

Research Strength

All included studies used single-case research designs to evaluate intervention effects on participants’ academic skills and engagement. No group designs met inclusion criteria. Studies were most commonly awarded ratings of strong methodological rigor (n = 9). Eight studies met criteria for adequate methodological rigor, with the remaining two studies receiving ratings of weak rigor. Adequate and weak ratings were due to overlap and instability in the data (n = 7), a lack of secondary quality indicators, or a lack of detailed participant description (n = 1 each).

Discussion

This meta-analysis identified 19 studies that incorporated touch-screen devices into interventions targeting the academic skills (n = 8) or academic engagement behaviors (n = 11) of 53 students with ASD. The majority of studies produced moderate to large treatment effects across participant functioning levels and received methodological ratings of adequate or strong. These findings support the conclusions of previous reviews suggesting that interventions using touch-screen devices are generally effective and that research in this area is increasing (Hong et al. 2017; Kagohara et al. 2013). In conjunction with the touch-screen device, most studies used teaching procedures with robust support in the research base (e.g., prompting hierarchies, systematic reinforcement), which likely contributed to the positive outcomes reported.

Most studies utilized widely available devices (e.g., iPods) and cost-free software applications. It is somewhat surprising, however, that so few commercially designed educational applications were investigated (see Table 2). Rather than using pre-configured applications designed for intervention, researchers often used the device’s inherent video or photograph functions to create individualized teaching materials (e.g., video-enhanced activity schedules). Future research should examine the effectiveness of additional commercially designed educational applications on the market, such as those targeting reading or mathematics (e.g., Starfall®, Show Me Math®). In addition to the relative effectiveness of the various applications, usability and other social validity variables should be considered in future comparisons of software and device options.

Only eight studies targeted performance on specific academic skills such as writing, math, and reading comprehension, indicating a clear need for future research on the utility of touch-screen devices for teaching these skills (Kagohara et al. 2013). Six of these studies utilized video modeling or prompting, supporting previous research that has found video modeling effective for teaching a variety of skills to individuals with ASD (Bellini and Akullian 2007). Five studies taught students to utilize the touch-screen device to monitor their on-task behavior during academic work, including one in which participants monitored their own stereotypy (Crutchfield et al. 2015). Although students with ASD have been taught to use these devices, some tasks may be more complicated to perform on the device than others (i.e., they require additional steps). For example, students may acquire the skills necessary to play a video model more efficiently than they acquire the skills necessary to use the same device for self-management. Because all but two self-monitoring interventions were implemented within the context of independent work, future research should evaluate the efficacy and social validity of technology-based self-monitoring during teacher-led instruction or group work.

The examination of unintended adverse effects of interventions that use touch-screen devices may have important implications for applied practice. Researchers have suggested that the use of electronic devices in teaching programs for individuals with ASD may lead to increases in untargeted stereotypy or challenging behavior (King et al. 2017; Ramdoss et al. 2011). Alternatively, interventions may produce desirable collateral effects across different skill domains, potentially increasing intervention efficiency (Ledbetter-Cho et al. 2017b; McConnell 2002). The results of the current review are promising, with three studies reporting collateral improvements in challenging behavior during interventions incorporating touch-screen devices (Lee et al. 2015; Neely et al. 2013; Zein et al. 2016) and one finding untargeted academic improvements (Xin et al. 2017). However, these findings must be interpreted with caution given the small number of studies that investigated the impact of the interventions on untargeted dependent variables.

Variations in the technology features utilized in intervention packages did not appear to influence treatment outcomes. Comparisons of components such as voice-over narration and video modeling versus video prompting did not contain enough cases for statistical analysis, but the available data indicated similar outcomes. These findings are consistent with previous studies that have reported success using various formats and approaches to video modeling (Bellini and Akullian 2007). Based on these results, practitioners should consider individualizing technology features and teaching procedures based upon the learner’s preferences (e.g., conducting a preference assessment on device features prior to intervention).

Studies were primarily conducted in applied settings, such as schools, supporting claims that individuals with ASD can benefit from using touch-screen devices in natural contexts. However, interventions were overwhelmingly implemented by researchers. This is concerning given that some adult instruction appears potentially necessary for learners to acquire targeted skills. Specifically, with the exception of three studies (Burton et al. 2013; Hart and Whalon 2012; Van der Meer et al. 2015), interventionists used instructional procedures (e.g., prompts, reinforcement) in addition to providing participants with the touch-screen device. Future research that utilizes natural intervention agents and describes the process for training them in replicable detail would be beneficial in determining the feasibility of such interventions. Indeed, classroom teachers have indicated that they feel underprepared to implement interventions involving technology and desire training in this area (Clark et al. 2015).

Regarding moderating variables, interventions in which the participant operated the device produced significantly larger effect size estimates compared to interventions in which the adult manipulated the device (see Table 3). It is possible that requiring students to operate the application increases attending to relevant stimuli, decreasing the need for adult-delivered prompts and increasing independence (Kimball et al. 2004). Additionally, some individuals may enjoy interacting with technology and be more likely to correctly perform the targeted academic skills. Providing participants with pre-training on the device prior to introducing intervention also produced significantly improved outcomes. Participants who did not receive pre-training may have experienced difficulty during intervention due to the necessity of acquiring two skills simultaneously (i.e., navigating the software and learning the targeted skill).

Interventions with adolescent participants produced significantly higher effect size estimates than those with children. This finding could be due to the targeted academic skills and engagement/self-monitoring behaviors being more developmentally appropriate for older participants (Lifter et al. 2005). Alternatively, the finding that adolescents benefited more may be due to some characteristic of the interventions more likely to be used with adolescent participants (e.g., self-monitoring, video modeling). Finally, the methodological rigor of the included studies was also found to moderate intervention effectiveness, with studies that received higher quality ratings producing significantly higher effect size estimates. This is most likely due to the method used to appraise research quality: studies with little overlap in data across adjacent phases received higher marks for methodological rigor, and that same non-overlap yields larger nonoverlap-based effect size estimates (Reichow et al. 2008).

Limitations

Because all of the options for estimating effect sizes from single-case design studies have limitations, we followed current recommendations to employ multiple measures (IRD and NAP) that estimate the degree of improvement following intervention (Maggin and Odom 2014). Although alternative effect size measures which could potentially provide a more fine-grained analysis through regression models are beginning to appear in the literature (e.g., standardized mean difference statistics), these measures cannot currently be applied to many of the designs utilized by the included studies (e.g., multielement designs; Pustejovsky and Ferron 2017; Shadish et al. 2014).

To ensure a minimum level of study quality, we restricted our search to peer-reviewed publications that used an experimental design with the potential to demonstrate a functional relation. Studies that met these criteria were included in the analysis, even if they had ratings of weak methodological rigor, in an effort to provide a comprehensive review of a small research base. Although there are concerns with including less methodologically rigorous studies in meta-analyses, further restricting the inclusion criteria may have inflated positive outcomes (Sham and Smith 2014).

Because the included studies differed across a number of variables (e.g., intervention components, dosage, participant age), moderator analyses should be interpreted cautiously. For example, interventions in which the participant operated the device included many studies with video modeling, self-monitoring, and explicit instruction. These intervention components, rather than who operated the device, may have contributed to the positive outcomes observed. Finally, interrater agreement at the level of entering search terms during the database search was not collected.

Implications for Practice

Despite these limitations, results from the current meta-analysis provide evidence that intervention packages incorporating touch-screen devices may be effective in improving the academic skills and related engagement behaviors of students with ASD in applied settings. Only eight of the included studies targeted specific academic skills, indicating that empirical support for the use of touch-screen devices in teaching academic content remains limited. The majority of included studies utilized instructor-created teaching materials. Touch-screen devices are only as effective as the underlying instructional procedures, and ineffective teaching procedures are unlikely to become effective merely because they are delivered via a touch-screen device. Practitioners are encouraged to individualize touch-screen-presented lessons based on the needs of the student and to ensure that the instruction provided by the device is aligned with the evidence base.

This meta-analysis suggests that touch-screen devices can be useful in improving the academic skills and academic engagement of students with ASD. However, these devices should be viewed as a supplement to carefully planned instruction involving evidence-based teaching practices. Finally, given the promising outcomes of interventions in which pre-training was conducted, educators should consider training the student to use the device and its software prior to introducing the targeted skill.