Safety skills are required to maintain personal welfare throughout the lifespan and begin to be acquired at an early age. The development of these skills is important given that unintentional injury is the leading cause of death among children (Heron 2016). Further, children with an autism spectrum disorder (ASD) are at a greater risk of sustaining an injury compared to typically developing children (Lee et al. 2008). Children with ASD may be at increased risk for harm and injury given difficulty or delay in social communication skills as well as difficulties with attention, motor control, and cognitive delays (Lee et al. 2008; Thomson et al. 2011). In addition, safety skills have been noted to be a primary concern of parents and educators who work with children with ASD and associated disabilities (Collins et al. 1991; Ivey 2004). Perceived deficits in these skills may also result in parental over-protection, which can impact the development of independent skills. Despite the importance of developing safety skills, there is considerably less research investigating skills in this domain compared to communication or daily living skills for individuals with ASD. Taken together, research on effective methods for teaching safety skills to children and adolescents with ASD that will generalize to and can be maintained in real world settings should be a priority.

Recent qualitative reviews of safety skill instruction methods for individuals with developmental disorders have been conducted examining individuals across age groups and with a broad range of disorders including the following: ASD, intellectual disability (ID), and developmental delay (Dixon et al. 2010; Mechling 2008; Wright and Wolery 2011). Notably, components of behavioral skills training (BST) for safety skill instruction were utilized in the majority of the studies included in these reviews. BST is a behavioral intervention package that includes providing the participant with a description of the target behavior and the context within which it should occur, modeling, role-play, positive and corrective feedback, and repetition until mastery is achieved. BST is often utilized in a naturalistic context wherein the behavior is expected to occur, or is taught with multiple exemplars and common stimuli to promote generalization within an analogue setting (Miltenberger 2008). BST has been demonstrated to be effective for teaching a variety of safety skills for typically developing children (e.g., Jones et al. 1989; Miltenberger 2008; Yeaton and Bailey 1978). It has also shown superiority over informational intervention approaches for this population (Poche et al. 1988).

Dixon et al. (2010) reviewed 27 studies on procedures for teaching safety skills to individuals with developmental disorders. Safety skills were separated into three categories: emergency situations, accident prevention, and pedestrian skills. Emergency situations typically focused on fire safety, whereas pedestrian skills included skills such as street crossing. Accident prevention included target skills related to home safety, child health, safe eating, and safe disposal of broken materials. Dixon et al. (2010) concluded that many components of BST and other similar procedures demonstrated effectiveness for safety skill instruction, specifically, prompting with prompt fading, reinforcement, and role-playing.

Mechling’s (2008) review of safety skill interventions examined individuals with ID rather than developmental disabilities more broadly. Thirty-six studies, conducted over a 30-year period, were reviewed. Six areas of safety skill instruction were examined: street crossing, home accidents, first aid skills, response to lures from strangers, fire safety, and emergency telephone use. Mechling (2008) concluded that interventions employing components of BST were effective in teaching safety skills, and role-playing was observed to be a more effective component than instruction and modeling.

Further, Wright and Wolery (2011) reviewed eight studies focusing exclusively on street crossing skills in individuals with disabilities including developmental disorders, ASD, and ID. Skills were taught either within the classroom, on the roadside, using virtual reality instruction, or a combination of intervention settings utilizing components of BST. Participants across studies performed street crossing skills more consistently following instruction in the roadside setting, which consisted of active rehearsal of the skill. Overall, these reviews suggest that a role-play or active rehearsal component is a key intervention piece in both analogue and natural settings.

Dixon et al. (2010), Mechling (2008), and Wright and Wolery (2011) highlight the need for safety skills to generalize to naturalistic settings in order to be effective. Although many of the interventions that occurred in analogue or simulated settings were reported to be effective, certain safety skills appeared to require naturalistic instruction to improve effectiveness of the intervention. Each of the above-mentioned reviews noted that interventions for pedestrian skills, including street crossing, were most effective when conducted in a naturalistic setting. Additionally, Dixon et al. (2010) reported naturalistic training to be superior to computer-based virtual reality for teaching fire safety skills. Thus, certain safety skills, most notably pedestrian skills, may require in vivo practice for the skill to be displayed on an actual road. Since most other safety skills included in these reviews were taught effectively in either a naturalistic or analogue setting, it is important to determine the success of analogue instruction for these skills with an ASD population. Individuals with ASD often display difficulty generalizing the skills they learn from one setting to another and thus in vivo practice may be particularly important for this population. The above-mentioned reviews included a subset of studies investigating children with ASD specifically; however, only two studies utilized in the current review overlap with the studies included in these reviews.

The current study is a meta-analysis examining the effectiveness of safety skill interventions for children, adolescents, and young adults with ASD. Although qualitative reviews have demonstrated the success of BST for safety skill instruction, the current meta-analysis expands this work in several ways. First, this meta-analysis focuses specifically on safety skill interventions for individuals with ASD, as compared to a broader range of developmental disorders, allowing for a more direct examination of these interventions designed for a specific clinical population. Additionally, previous reviews were qualitative in nature and did not include a quantitative examination of intervention effects. The current study addressed this limitation by evaluating the degree of methodological rigor and establishment of experimental control in each of the single-case designs, as well as by calculating effect sizes to facilitate comparison of intervention effectiveness across studies. The present meta-analysis thus has two aims: (1) to provide an objective measure of the effect of safety skill interventions for individuals with ASD and (2) to build upon past work examining generalization of safety skills to the natural environment by comparing the effectiveness of generalization efforts and the acquisition of safety skills in analogue versus naturalistic settings.

Method

Literature Search

A literature search on interventions targeting safety skills for individuals with ASD was conducted. PsycINFO, ERIC, and Google Scholar online databases were searched for peer-reviewed publications published between 1990 and 2016. The following keywords were included in the search for articles: safety skills, safety, crossing street, first aid, lures of strangers, teaching community skills, and telephone skills in combination with the keywords: autism, autism spectrum disorder, pervasive developmental disorder, and Asperger’s disorder. Reference lists of published work were also manually searched for additional relevant articles.

Procedure

Inclusion criteria for the identified articles were as follows: (a) written in English; (b) published in a peer-reviewed journal; (c) participant(s) diagnosed with autism spectrum disorder, autistic disorder, pervasive developmental disorder, or Asperger’s disorder; (d) used a single-case design; (e) included more than one data point for baseline measures; (f) evaluated an intervention targeting safety skills; (g) included an outcome measure of safety skills; and (h) displayed data in a line graph. Two independent raters reevaluated each article against all inclusion criteria, with 100 % agreement. A total of 11 articles were included in the present meta-analysis.

Variables Coded

Each article was summarized on the following variables: (a) number of participants, (b) age of participants, (c) target skills, (d) intervention method, (e) dependent variable, (f) intervention setting (i.e., naturalistic setting vs. analogue), (g) brief overall findings, (h) whether generalization and/or follow-up data were collected, and (i) an assessment of social validity. This information is presented in Table 1.

Table 1 Summary of included articles

Data Extraction

Relevant data were extracted from each of the studies using Ungraph™ and then exported into a Microsoft Excel file for further analysis. Ungraph™ is an empirically supported software program that allows users to manually extract data point values from digital copies of graphs (see Shadish et al. 2009). The approximate X- and Y-values for each data point on a graph are calculated from the extracted data.

Two authors independently extracted data from the 11 articles using Ungraph™. Inter-observer agreement (IOA) was calculated for 45 % of the articles by dividing the number of agreements by the total number of data points and multiplying by 100, yielding an average IOA of 94 % (range 70–100). The criterion of 45 % utilized in the present study exceeds the percentage typically recommended by behavioral researchers (Cooper et al. 2007). Due to human error associated with Ungraph™ (i.e., clicking the mouse on slightly different points on each data point), an agreement was defined as the two extracted values for a data point falling within one point value of each other (e.g., if one value was 10 and the other was 11). Both authors recalculated disagreements until agreement was reached. One author extracted the remaining Ungraph™ data.
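The IOA calculation described above can be sketched as follows. This is a minimal illustration, not the authors' procedure as implemented: the function name, the `tolerance` parameter, and the sample values are hypothetical, and the one-point tolerance is taken from the agreement definition in the text.

```python
def interobserver_agreement(rater_a, rater_b, tolerance=1.0):
    """Percentage of paired data points on which two raters agree.

    Two extracted values count as an agreement when they differ by no more
    than `tolerance` (one point value, following the definition in the text).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must extract the same number of data points")
    if not rater_a:
        raise ValueError("At least one data point is required")
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if abs(a - b) <= tolerance)
    return 100.0 * agreements / len(rater_a)


# Hypothetical extracted Y-values; these are NOT data from the reviewed studies.
rater_a = [10, 20, 35, 50, 80]
rater_b = [11, 20, 38, 50, 79]

# Four of the five pairs fall within one point of each other.
print(interobserver_agreement(rater_a, rater_b))  # 80.0
```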

Evaluation of Methodological Rigor and Experimental Control

Before determining effect sizes, the single-case designs presented in the selected articles were evaluated for the establishment of experimental control and methodological rigor. The goal of utilizing a single-case design methodology is to demonstrate the presence of a functional relation between the introduction of an independent variable (in this case a safety skill intervention) and the dependent variable, or outcome that the intervention is thought to influence. Each study was evaluated based on the certainty of evidence system utilized by previous single-case research meta-analyses (e.g., Lang et al. 2011; Ramdoss et al. 2011a, b; Roth et al. 2014). This system categorizes studies into suggestive, preponderant, or conclusive based on evidence of experimental control.

Consistent with the certainty of evidence system, studies were placed into the lowest category, suggestive, if there were methodological issues raising concerns about the presence of experimental control. These issues included use of a non-experimental design (e.g., AB design), insufficient information presented for replication, as well as absent or unacceptable levels of treatment fidelity and/or inter-observer agreement, with unacceptable defined for both variables as assessment on less than 20 % of observations and/or less than 80 % agreement. The middle category, preponderant, was used if the study included an experimental design demonstrating a functional relation (i.e., based on visual analysis), acceptable levels of inter-observer agreement and treatment fidelity, operationally defined dependent variables, and sufficient information for replication. However, these studies also included methodological weaknesses that impacted the level of confidence in the presence of experimental control. These included an unstable or variable baseline phase, minimal baseline points, or other issues related to mean level change, trend, or immediacy of effect between baseline and intervention phases. Studies placed in the highest category, conclusive, met the criteria in the preponderant category but without the methodological weaknesses previously mentioned.
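The decision rules above can be summarized in a short sketch. The class, field, and function names are hypothetical, and one point is an assumption rather than something stated in the text: a study failing any of the preponderant criteria (for example, no demonstrated functional relation) is treated here as suggestive.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class StudyRating:
    experimental_design: bool        # False for non-experimental designs such as AB
    functional_relation: bool        # judged by visual analysis
    operational_dv: bool             # dependent variables operationally defined
    replicable: bool                 # sufficient information for replication
    ioa_coverage_pct: Optional[float]       # % of observations with IOA; None if absent
    ioa_agreement_pct: Optional[float]
    fidelity_coverage_pct: Optional[float]  # % of observations with fidelity data
    fidelity_agreement_pct: Optional[float]
    baseline_or_effect_weakness: bool  # unstable or minimal baseline, level/trend/immediacy issues


def acceptable(coverage_pct, agreement_pct):
    """Acceptable means at least 20 % of observations and at least 80 % agreement."""
    if coverage_pct is None or agreement_pct is None:
        return False
    return coverage_pct >= 20 and agreement_pct >= 80


def certainty_of_evidence(study):
    meets_core_criteria = (
        study.experimental_design
        and study.functional_relation
        and study.operational_dv
        and study.replicable
        and acceptable(study.ioa_coverage_pct, study.ioa_agreement_pct)
        and acceptable(study.fidelity_coverage_pct, study.fidelity_agreement_pct)
    )
    if not meets_core_criteria:
        return "suggestive"  # assumption: any unmet core criterion is a concern
    return "preponderant" if study.baseline_or_effect_weakness else "conclusive"


# Hypothetical study record, for illustration only.
strong = StudyRating(True, True, True, True, 30, 95, 25, 90, False)
print(certainty_of_evidence(strong))                                       # conclusive
print(certainty_of_evidence(replace(strong, fidelity_coverage_pct=None)))  # suggestive
```

Modeling missing treatment fidelity as `None` makes it explicit that absent data and unacceptable data lead to the same category.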

Certainty of evidence classifications and information used to determine these classifications are presented in Table 2, while criteria used for visual analysis and demonstrating a functional relation are presented in Table 3. To ensure adequate levels of agreement, two authors independently rated each criterion for each study. IOA was calculated for 100 % of criteria by dividing the number of agreements by the total number of criteria and multiplying by 100. IOA was calculated to be 95 % for the certainty of evidence criteria presented in Table 2 and 91 % for the visual analysis criteria presented in Table 3.

Table 2 Evaluation of certainty of evidence
Table 3 Criteria for demonstration of a functional relation

Measurement of Effect Sizes and Planned Analyses

Tau-U was used as a measure of effect size for the present meta-analysis. Tau-U is a non-overlap index of effect size based on pairwise data comparisons across phases (Parker et al. 2011; Rakap 2015). Tau-U scores range from 0 to 100 % and can indicate either a small (65 % or lower), medium-to-large (66–92 %), or large (93–100 %) effect. This effect size measure is advantageous over others (e.g., non-overlap of all pairs, NAP; percentage of all non-overlapping data, PAND) due to its ability to control for monotonic trend (i.e., linear and nonlinear positive baseline trend). Similar to other non-overlap methods, it can be used in conjunction with visual analysis and does not require assumptions as in parametric statistics. Tau-U is calculated by the mathematical expression (S/Number of pairs) where S is the value obtained from a Kendall’s rank correlation (KRC) following the coding of phase values. The number of pairs in the denominator refers to the product of the phases’ N’s (i.e., the number of data points in each phase). An online Tau-U calculator (http://www.singlecaseresearch.org/calculators/tau-u) was used for the present analysis (for a description of this calculator, see Vannest et al. 2011). Additionally, Kruskal-Wallis tests were used to determine any significant differences between the distribution of Tau-U scores across intervention methods and settings using SPSS statistical software. The Kruskal-Wallis test, a non-parametric test, was selected due to the small number of scores requiring comparison.
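As a minimal sketch of the S/Number of pairs expression, the following code counts improving and deteriorating pairs across phases. It is not the online calculator's implementation. All function names and sample values are hypothetical, and the sketch assumes that higher scores indicate improvement. The optional baseline-trend adjustment (subtracting the baseline's own S from the cross-phase S) follows the form commonly attributed to Parker et al. (2011) and should be read as an assumption. The category cut-offs come from the text; how values falling between the stated bands (e.g., 65.5 %) are labeled is also an assumption.

```python
def s_between(baseline, treatment):
    """Cross-phase S: improving pairs minus deteriorating pairs (ties count as 0)."""
    s = 0
    for a in baseline:
        for b in treatment:
            if b > a:
                s += 1
            elif b < a:
                s -= 1
    return s


def s_within(phase):
    """Within-phase S, used here to index monotonic trend in the baseline."""
    s = 0
    for i in range(len(phase)):
        for j in range(i + 1, len(phase)):
            if phase[j] > phase[i]:
                s += 1
            elif phase[j] < phase[i]:
                s -= 1
    return s


def tau_u(baseline, treatment, correct_baseline_trend=False):
    """S divided by the number of pairs (product of the two phase lengths)."""
    if not baseline or not treatment:
        raise ValueError("Both phases need at least one data point")
    s = s_between(baseline, treatment)
    if correct_baseline_trend:
        s -= s_within(baseline)  # assumed form of the trend adjustment
    return s / (len(baseline) * len(treatment))


def effect_label(tau_percent):
    """Map a Tau-U percentage onto the bands given in the text."""
    if tau_percent <= 65:
        return "small"
    if tau_percent < 93:
        return "medium-to-large"
    return "large"


# Hypothetical phase data; NOT taken from any reviewed study.
baseline = [10, 20, 30]
treatment = [20, 40, 50, 60]

uncorrected = tau_u(baseline, treatment)                             # 9 / 12 = 0.75
corrected = tau_u(baseline, treatment, correct_baseline_trend=True)  # (9 - 3) / 12 = 0.5
print(effect_label(100 * uncorrected), effect_label(100 * corrected))
```

In this example the rising baseline accounts for part of the apparent effect, so the adjusted value drops from the medium-to-large band to the small band.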

Results

Participant Characteristics

A total of 11 studies were included in the final analyses, yielding data for 34 participants (25 males, 9 females). Four participants (11.8 %) were preschool age (2–5 years old), 23 (67.6 %) were school age (6–15 years old), and 7 (20.6 %) were adolescents or young adults (16–24 years old). As for diagnoses, 27 (79.4 %) were described as having an ASD and 7 (20.6 %) as having both ASD and ID. The majority of studies provided only descriptive prose regarding participants’ abilities (e.g., “able to follow directions”), with only two of the 11 studies providing standard scores of any type (e.g., IQ).

Certainty of Evidence

To evaluate presence and level of experimental control, studies were classified as having suggestive, preponderant, or conclusive certainty of evidence. Seven studies were classified as suggestive (low evidence), and four were classified as conclusive (strong evidence). None were classified as preponderant (moderate evidence). The majority of studies classified as suggestive (n = 6) received this classification because treatment fidelity data were missing. If the studies were to be re-categorized excluding the treatment fidelity criteria, a total of seven studies would be classified as conclusive, three would be classified as preponderant due to minimal baseline points or issues with baseline stability, and one would remain in the suggestive category due to use of a non-experimental design. Although two of these studies included one participant with only one data point (Bergstrom et al. 2012, 2014), these studies were retained in the review due to methodological strengths for all other criteria (e.g., use of an experimental design, baseline stability and more than one data point for other participants, change in level, trend, and immediacy of effects).

Overall Effects of Safety Skill Interventions

The 11 articles presenting 45 single-case design interventions yielded an average Tau-U score of 91 %, indicating medium-to-large intervention effects. Tau-U scores for the reviewed studies ranged from 72 to 100 %. Four studies yielded medium-to-large effects, whereas seven studies yielded large effects. Tau-U scores and associated categorical descriptions are displayed in Table 4.

Table 4 Effect size measures of included articles

Targeted Behavioral Outcomes and Intervention Method

A total of six behaviors were targeted across studies. The most frequently targeted behavior was abduction prevention (n = 4), followed by seeking help when lost (n = 3). The remainder of behaviors were taught in only a single study: fire safety, reading product warning labels, household safety, and disposal of broken materials.

There were a variety of behavioral intervention procedures utilized across studies that were divided into four main categories: video modeling with or without rehearsal, live modeling with or without rehearsal, role-play, and single error correction procedure (e.g., constant time delay, most to least prompting). While all studies used error correction procedures, four studies used only a single error correction procedure. The remaining studies were divided as follows: three used video modeling either with or without rehearsal, two used a live model, and two used a role-play component.

Effect sizes were compared across intervention methods. Studies using a role-play component were found to have the highest average Tau-U score (98 %, range 95–100), followed by video modeling (Tau-U = 91 %, range = 78–100), single error correction procedure (Tau-U = 90 %, range = 81–100), and live modeling (Tau-U = 86 %, range = 72–100). Average Tau-U scores across intervention methods are displayed in Fig. 1. A Kruskal-Wallis test revealed that effect sizes did not significantly differ across intervention methods (H(3) = .969, p = .809).
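For readers unfamiliar with the Kruskal-Wallis statistic reported above, the following sketch computes H with the standard tie correction. The analyses in this study were run in SPSS; this code is only an illustration, the function names are hypothetical, and the sample scores are invented rather than drawn from the reviewed studies. The p value, which comes from a chi-square distribution with (number of groups − 1) degrees of freedom, is left to a statistics package.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("At least two non-empty groups are required")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("All values are identical; H is undefined")
    return h / correction


# Invented Tau-U percentages for three hypothetical groups (illustration only).
groups = [[70, 75, 80], [82, 85, 88], [90, 95, 99]]
print(round(kruskal_wallis_h(groups), 3))  # about 7.2 for fully separated groups
```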

Fig. 1 Average effect sizes across video model, live model, role-play, and error-correction intervention methods. The Tau-U score is reflected as a percentage ranging from 0 to 100 %

Intervention Setting

Intervention settings utilized across studies were divided into three categories: naturalistic, analogue, or a combination. For a setting to be considered naturalistic, the targeted behavior must have been taught in an environment in which it naturally occurs (e.g., reading warning labels in a supermarket). To be considered an analogue setting, the targeted behavior must have been taught in an environment in which it does not naturally occur (e.g., reading warning labels at a desk in school). Studies that taught the targeted behavior in both naturalistic and analogue settings were labeled as “combination.” The setting types were roughly equally distributed across categories with three studies using a naturalistic setting, three using an analogue setting, and five using a combination of naturalistic and analogue.

Effect sizes were compared across the intervention setting types. Studies using a combination of naturalistic and analogue settings were found to have the highest average Tau-U score (92 %, range = 78–100), closely followed by those using an analogue (Tau-U = 91 %, range = 72–100), and naturalistic only setting (Tau-U = 89 %, range = 81–100). Average Tau-U scores across intervention settings are displayed in Fig. 2. A Kruskal-Wallis test revealed that Tau-U scores did not significantly differ across setting type (H(2) = .204, p = .903).

Fig. 2 Average effect sizes across naturalistic, analogue, and combination intervention settings. The Tau-U score is reflected as a percentage ranging from 0 to 100 %

Generalization Effects

A total of six studies conducted in an analogue or a combination naturalistic and analogue setting included a generalization component. Generalization settings included novel school settings and community locations not previously employed during the training or maintenance phases of the interventions. For those studies that provided sufficient generalization data (i.e., more than one or two generalization probes), the average Tau-U score for generalization was 92 % (range = 75–100). Three studies that were conducted in settings including a naturalistic component did not include sufficient generalization data for other settings, but did include follow-up data. While two of the studies with follow-up data demonstrated Tau-U scores of 100 %, the remaining study including follow-up data was found to have a follow-up effect size in the low range (Tau-U = 42 %). An additional three studies, conducted in either naturalistic or combination settings, did not include sufficient generalization or follow-up data.

Social Validity

An examination of the social validity of the target behaviors, procedures, and results was also included to provide support for the clinical significance or meaningfulness of the data reported across studies. Three studies formally included social validity data (Akmanoglu and Tekin-Iftar 2011; Bigelow et al. 1993; Hoch et al. 2009). For example, Akmanoglu and Tekin-Iftar (2011) provided parents of participants with a social validity questionnaire to complete, which asked parents about their attitudes towards the aims and procedure of the study, as well as the importance of changes in the target behavior. The authors reported a 67 % return rate for this questionnaire. Results indicated that parents agreed that self-protection skills are an important target to teach children with ASD, and the procedure of using typically developing peer models was well accepted; as was the use of multiple sites and unfamiliar adults for generalization purposes.

Studies that did not include social validation data (n = 8) were evaluated by the authors for social validity based on criteria outlined by Horner et al. (2005). The criteria were as follows: inclusion of socially important dependent variables, demonstration that the independent variable can be easily and correctly administered by caregivers across time, and demonstration that the intervention produced an effect that met the defined, clinical need. Studies were classified as having high social validity if all three components were considered to demonstrate social validity based on these criteria. Studies were classified as having moderate social validity if two of the above components demonstrated social validity, or were classified as low if only one or none of the components demonstrated social validity. Based on this classification, two studies demonstrated high social validity and six demonstrated moderate social validity. None of the included studies demonstrated low levels of social validity. Studies were most commonly classified as having moderate levels of social validity based on the presented (or lack of) generalization and/or follow-up data, which made it difficult to conclude that the intervention could be successfully implemented by caregivers or whether it produced an effect meeting a clinical need. IOA was conducted between two raters using the above criteria for all rated studies and was calculated to be 88 %.
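The three-level social validity classification described above reduces to counting how many of the three components were judged to be met. A minimal sketch, with hypothetical function and parameter names:

```python
def social_validity_level(important_dv, caregiver_feasible, meets_clinical_need):
    """Classify social validity from the three components described in the text.

    Each argument is a rater's yes/no judgment (True/False) for one component.
    """
    components_met = sum([bool(important_dv), bool(caregiver_feasible), bool(meets_clinical_need)])
    if components_met == 3:
        return "high"
    if components_met == 2:
        return "moderate"
    return "low"  # one or none of the components met


# Hypothetical ratings, for illustration only.
print(social_validity_level(True, True, True))   # high
print(social_validity_level(True, False, True))  # moderate
```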

Discussion

The present meta-analysis examined effectiveness of safety skills interventions for individuals with ASD. Intervention methods, setting, and generalization efforts were also examined to determine what components of intervention led to the most advantageous outcomes. Analyses revealed that safety skill interventions employing components of BST demonstrated medium-to-large effect sizes across a range of targeted safety skills. Certainty of evidence criteria revealed 36 % of included studies demonstrated conclusive or strong intervention effects, while 64 % demonstrated suggestive or low evidence of effectiveness. For the studies demonstrating low evidence, lack of data on treatment fidelity was a concern. If this variable were to be removed from the criteria, the majority of studies evaluated in this meta-analysis (64 %) would fall in the conclusive category, with the remaining few in the preponderant category, and one in the suggestive category.

The large number of studies falling in the conclusive category and a minority falling in the preponderant category (excluding treatment fidelity) is a positive finding of this review and is consistent with the effect size results. Given that lack of data on treatment fidelity was the main limitation for categorization as conclusive experimental control, future single-case designs evaluating safety skills for children with ASD should strongly consider including information on this variable. Data on treatment fidelity are necessary to ensure that evidence-based practices are implemented as intended.

When examining specific components or procedures included in the intervention package that led to the most advantageous outcomes, role-play was found to have an effect size in the large range, while video modeling, single error correction procedures, and live modeling demonstrated effectiveness in the medium-to-large range. Although the difference was not statistically significant, role-play demonstrated the largest effect size, which was not surprising given it was considered an essential component of BST by the aforementioned reviews. Taken together, these findings suggest that a role-play or active rehearsal component is recommended when designing safety skill interventions for individuals with ASD. Findings from the current study are consistent with findings from recent reviews of safety skill behavioral intervention packages for individuals with other developmental disorders (Dixon et al. 2010; Mechling 2008; Wright and Wolery 2011).

The acquisition of safety skills in analogue versus naturalistic settings and the effectiveness of generalization and maintenance efforts were also examined in this study. Analyses of various intervention settings (analogue only, combination, naturalistic only) revealed no statistically significant differences. While this finding is consistent with past work examining the type of safety skills reviewed in the present study, other reviews have found naturalistic instruction to be more effective than simulation scenarios and classroom instruction for certain other safety skills, particularly pedestrian skills (Dixon et al. 2010; Mechling 2008; Wright and Wolery 2011). Thus, the superiority of naturalistic settings over analogue settings may depend on the type of safety skill that is targeted and the specific population.

Unfortunately, not all studies included in this meta-analysis provided generalization and maintenance data despite the importance of considering such information when designing an intervention. Social validity data were consistent with this notion and further highlight the need for future studies to consistently include information on the generalization and maintenance of targeted behaviors to help support the clinical meaningfulness of the intervention and outcome. For studies that did report such data, rates of generalization and maintenance were generally high (with the exception of one study; Winterling et al. 1992) across intervention setting types. The participants in the Winterling et al. (1992) study demonstrated lower performance at 1-month follow-up; however, performance at 1-week follow-up was similar to performance during the intervention phase. Notably, this study also had the lowest effect size measure, which may have contributed to less success at a longer follow-up. Overall, these findings suggest that safety skill interventions incorporating components of BST demonstrate high generalization and maintenance regardless of whether the skill was taught primarily in a naturalistic or analogue setting.

These findings have clinical implications when considering the design of an intervention program targeting safety skills. Given equivalence of intervention effects as well as generalization and maintenance across intervention setting types, intervention decisions can be made based on available resources. For example, if a setting providing intervention services for individuals with ASD has limited staff resources to transport students to naturalistic settings for practice, these skills can be sufficiently instructed in a classroom setting. Additionally, when planning an intervention, client and caregiver preferences may be accounted for without sacrificing successful intervention outcomes. It is reasonable that caregivers might display concerns regarding practicing safety skills in naturalistic settings when teaching responses to dangerous scenarios. The results of this meta-analysis suggest that these skills may be taught successfully in an analogue or simulated setting. This finding indicates that the use of resource intensive paradigms might not be necessary to teach certain skills given equivalence of analogue and naturalistic training.

Given the successful use of virtual reality paradigms with children with ASD (e.g., Josman et al. 2008; Self et al. 2007; Strickland et al. 2007), it is important to determine whether the resources and cost associated with these methods are warranted based on empirical evidence. Use of virtual reality paradigms might be most useful in cases in which it is difficult to simulate an experience in an analogue setting (e.g., natural disaster) or when training in an analogue setting does not generalize and there are issues related to feasibility or ethical concerns limiting the use of naturalistic instruction.

Limitations and Future Directions

There are limitations to this meta-analysis that must be addressed. First, the overall sample size of studies investigating safety skills with individuals with ASD was small and included studies that investigated a variety of safety skills. This resulted in a small number of studies per specific safety skill. Thus, comparison of the relative effectiveness of different types of safety skill instruction was not included. Second, since effect size measures for single-case designs are relatively new, the method in which Tau-U was calculated for certain study designs (e.g., changing criterion design) was decided based on conceptual rather than empirical information. This procedure involved utilizing the previous criterion as a baseline for each subsequent criterion. Finally, this meta-analysis should not be considered a comprehensive review of all safety skill studies conducted with young individuals with ASD, as it does not include studies utilizing group designs, studies that did not provide a graphical representation of the data, or non-peer-reviewed studies.
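The handling of changing criterion designs noted above, in which each criterion phase serves as the baseline for the next, can be sketched as a simple pairing of successive phases. The function name and phase data are hypothetical. How the resulting phase-pair effect sizes were combined into a study-level value is not described in the text and is not assumed here.

```python
def successive_phase_pairs(phases):
    """Pair each phase with the phase that follows it.

    `phases` is an ordered list of (label, data_points) tuples. The first
    element of each returned pair plays the role of the baseline.
    """
    if len(phases) < 2:
        raise ValueError("At least two phases are required")
    return [(phases[i], phases[i + 1]) for i in range(len(phases) - 1)]


# Hypothetical changing criterion data; NOT taken from any reviewed study.
phases = [
    ("baseline", [0, 1, 0]),
    ("criterion 1", [2, 3, 3]),
    ("criterion 2", [4, 5, 5]),
]

for (reference_label, _), (next_label, _) in successive_phase_pairs(phases):
    print(f"{reference_label} serves as baseline for {next_label}")
```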

The present meta-analysis suggested equivalence of interventions across methods and settings. Future research should focus on determining efficiency of safety skill acquisition and generalization to naturalistic settings when training occurs in an analogue setting. Additionally, some individual variability in acquisition of safety skills and generalization across studies suggests moderating factors, such as verbal and cognitive ability. The results of this review suggest that individual difference factors may be more important indicators for treatment selection and effectiveness than diagnostic standing. These should be investigated in future research, as there may be clinical implications for the design of individualized interventions based on participant characteristics.

In the current study, Tau-U was used to determine the effect size of 11 studies focusing on safety skill instruction for 34 children, adolescents, and young adults with ASD. Like many interventions for individuals with ASD, the interventions examined in this study were single-case in nature. Single-case design is advantageous for establishing the effect of an intervention on a given target behavior. Many studies utilizing single-case designs rely on visual analysis to determine intervention effects (Rakap 2015). Although visual analysis is the only available method for indicating presence of a functional relation, objections to the sole use of visual analysis stem from the concern that visual analysis can be subjective, and this method lacks standards for making decisions regarding intervention effects (Kazdin 2011). Visual analysis also does not take into account the statistical magnitude of the effect, which is useful to compare the effects of different treatment types across studies. For this reason, consolidating the findings of single-case design studies and using a quantitative measure of intervention effects, in addition to an examination of experimental control, contributes to the evidence base of available treatment options for individuals with ASD in a unique way.