There are many different safety skills that address threats we experience every day (e.g., pedestrian safety, household safety), as well as other dangerous situations (e.g., unattended firearm, fire safety, abduction prevention). Safety threats cause millions of injuries and deaths in the United States every year. For example, for 2019 the Centers for Disease Control and Prevention (CDC, 2019) estimated 1.75 million accidental poisoning injuries, 320,844 nonfatal fire-related accidents, 20,814 nonfatal firearm-related accidents, 172,814 nonfatal pedestrian-related accidents, and 98,786 sexual assault injuries. To prevent these injuries or accidents and other negative consequences associated with poor safety skills, a body of recent research has focused on teaching safety skills to children and vulnerable populations (Giannakakos et al., 2020). All safety skills training interventions share the goal of teaching the essential skills to prevent harm during safety threats (Wiseman et al., 2017). Common procedures used in safety skills training include informational approaches, prompting, video modeling, behavioral skills training (BST), and in situ training (IST).

Informational approaches typically use written materials that can easily be disseminated to inform people of the importance and steps of a safety skill (Miltenberger, 2008). However, the informational approach has not led to successful demonstration of the skills, even when learners are capable of repeating back the information (Himle et al., 2004; Miltenberger, 2008). Prompting is a strategy that adds a stimulus to increase the likelihood of a response being evoked (Van Laarhoven et al., 2009). In general, prompts are arranged in a hierarchy based on level of intrusiveness, from most intrusive to least (e.g., from full physical prompt to verbal prompt), and are implemented in skill acquisition utilizing the hierarchy in some way (Doyle et al., 1988). For example, Bigelow et al. (1993) used a verbal prompt, “fire,” to teach a 9-year-old child with autism spectrum disorder (ASD) fire safety skills. The target skill was walking to the front door either independently or with physical prompts, with the long-term goal of exiting the home from any location in the house with the verbal prompt of “fire.” Although prompting procedures have been used in isolation when teaching safety skills (e.g., Summers et al., 2011), they have also been used in combination with other training methods (e.g., Godish et al., 2017). One commonly used prompting strategy is time delay. This strategy involves starting with an immediate prompt (i.e., 0-s delay) and slowly increasing the delay between the presentation of the discriminative stimulus and the prompt (Haden & Zane, 1987). There are some variations of this time delay prompting strategy (e.g., progressive time delay and constant time delay) that change how the time delay is managed.

Video modeling involves a learner watching a video of a model demonstrating the target skills, and then the learner would imitate the behavior of the model in the video during assessment (Alberto et al., 2005). Video modeling has been used in a variety of safety skills, such as abduction prevention, what to do when lost, fire safety, and simple first aid (e.g., Akmanoglu & Tekin-Iftar, 2011; Bassette et al., 2018; Mechling et al., 2009; Ozkan, 2013). Researchers have also reported limited efficacy and generalization of video modeling, and several variants of video modeling have been used to improve the training outcomes for safety skills for individuals with and without disabilities (e.g., Gast et al., 1993; Himle et al., 2004; Miltenberger et al., 1999). These variations include adding simulation or prompting to video modeling (Akmanoglu & Tekin-Iftar, 2011; Bassette et al., 2018; Spivey & Mechling, 2016). For example, Akmanoglu and Tekin-Iftar (2011) used video modeling in combination with prompting (graduated guidance) throughout the intervention phase to teach children with ASD to respond to the lures of strangers.

BST has been widely used in the safety skills intervention literature. BST involves providing trainees with information, instruction, modeling, an opportunity to rehearse the safety skill, and feedback on their performance (Miltenberger, 2008). The rehearsal and feedback components are repeated until mastery of the safety skill is demonstrated. Although BST alone has been effective in teaching safety skills to individuals with disabilities, results have been mixed regarding generalization of the safety skill to the natural environment. For this reason, as in video modeling research, some researchers have added components to BST in an attempt to enhance its efficacy (Miltenberger et al., 1999). One of the most common variations adds prompting strategies to BST (e.g., Knudson et al., 2009; Lumley et al., 1998). The literature also indicates that incorporating in-situ assessments into BST may be a critical component of safety skills interventions (Miltenberger, 2008). Using an in-situ assessment approach, in which trainees are unaware that they are being observed, researchers have found that safety skills do not consistently generalize to new situations or environments (Himle et al., 2004; Miltenberger et al., 1999). In general, when a trainee fails an in-situ assessment, in-situ training begins immediately. In-situ training (IST) uses the same training structure as BST; the key distinction is that IST occurs in the situation in which a trainee, unaware of being assessed, has just failed to perform the targeted skills during an in-situ assessment. Studies often include a BST condition prior to using IST; however, it is possible to skip BST and use IST in isolation. The literature indicates that IST may effectively generalize trained safety skills to the natural environment because it follows a failed response to a real-life safety threat (e.g., Egemo-Helm et al., 2007; Fisher et al., 2013; Sanchez & Miltenberger, 2015).

Although the safety skills interventions for individuals with disabilities have resulted in some success, the dose of intervention required for acquisition of the skills appears to vary widely across safety skill types, intervention methods, functioning levels and diagnoses of the trainees, and skill levels of the implementors (e.g., Mechling et al., 2009; Ozkan, 2013; Spivey & Mechling, 2016). Therefore, an investigation into dose of intervention (i.e., training frequency, number of training sessions) required to achieve the success criterion is needed to provide practitioners with better information concerning the selection of an intervention used with their trainees with disabilities (Eldevik et al., 2012). Furthermore, determining an optimal dose of intervention can inform the type of adaptation that can be made during intervention when the learners do not adequately respond to the intervention (Virués-Ortega, 2010). For example, decisions could be made whether it is necessary to increase frequency of intervention (e.g., from one session per week to two), length of intervention sessions (e.g., from 15 min to 30 min), or duration of intervention (e.g., from 20 sessions to 30 sessions). The adaptation of the intervention based on dose may help determine whether it is necessary to add an additional intervention component (Virués-Ortega, 2010).

Several reviews of the literature on safety skills for individuals with disabilities have been conducted to date. In a review of eight studies on pedestrian skills training for individuals with disabilities, Wright and Wolery (2011) found that classroom-based instruction, in-vivo training, and virtual simulated training were all effective in teaching pedestrian safety skills for this population. However, the authors did not examine whether any one intervention strategy was more effective than another. Mechling (2008) reviewed safety skills intervention studies conducted over a 30-year period on individuals with intellectual disabilities (ID). Mechling discussed some variability in the effectiveness of training methods across safety skills (e.g., emergency telephone use, fire safety, first aid skills, street crossing). In general, BST and simulated training yielded positive results across skills; however, specific study variables that might have affected the results were not examined in this review. Dixon et al. (2010) examined studies that used BST and in-vivo training to teach a variety of safety skills to individuals with disabilities. They found that these procedures were effective in training safety skills; however, they did not examine other training methods that are commonly used in safety skills training literature (e.g., video modeling).

The reviews of safety skills discussed above rely on visual analysis (e.g., level, trend, variability, overlap of data points, immediacy of effect, replication of effect) to determine treatment effectiveness, an approach with both advantages and limitations, many of which stem from smaller sample sizes. Smaller sample sizes increase the feasibility of a researcher monitoring the ongoing data collection and allow for more information on individual participants, but also make it difficult to use statistical analysis (Nugent, 1996). Moreover, when visual analysis is not adequately used, researchers may fail to objectively evaluate the impact of the intervention. Meta-analyses can help address the limitation of visual analysis as the sole determinant of intervention effectiveness by using quantitative metrics to draw conclusions about the effectiveness of different aspects of an intervention (Maggin et al., 2011). Researchers have supported the movement toward combining visual analysis with statistics. Each has advantages and limitations, but some of those limitations can be ameliorated by integrating the two (Kratochwill & Levin, 2014; Nugent, 1996).

Wiseman et al. (2017) completed a meta-analytic review of 11 single-case design (SCD) studies on safety skills of individuals with ASD. Their analyses, which used Tau-U indices, revealed a moderate-to-large effect size for interventions that used a BST component, live modeling, error correction procedures, and video modeling. They also found that there were no statistically significant differences in effect sizes across intervention types and training settings (natural, contrived, combined). Although the authors indicated that the treatment effect was dependent on the type of safety skill and whether the setting was contrived or naturalistic, they did not complete statistical analyses on each safety skill and setting. The authors did not examine additional variables that might moderate the effects of safety skills interventions, such as type of implementer and comorbid disabilities. This may have been due to the scarcity of studies included in their analyses. In addition, Wiseman et al.’s study exclusively examined studies on individuals with ASD.

Therefore, the current meta-analytic review study aimed at addressing the gaps in the literature on safety skills interventions for individuals with disabilities by examining studies that involved the individuals whose primary diagnosis was ID. In particular, the study analyzed SCD studies to determine: (1) the overall quality of the studies, (2) the characteristics of the studies on safety skills interventions for individuals with ID, (3) the magnitudes of effects of the varying interventions across studies, and (4) the moderating variables (e.g., intervention type, implementer, setting, and grade) that influence the overall effectiveness of interventions. The findings were used to provide recommendations for practice and future research.

Method

Article Search Procedures

A comprehensive search for SCD studies was conducted using the Web of Science and PsycINFO electronic databases to identify studies meeting inclusion criteria. The search was completed using keywords within the database fields of population and dependent measure. The keyword searches were limited to articles published between 1998 and 2021. In a narrative review, Mechling (2008) provided a comprehensive summary of studies on safety skills intervention for individuals with ID that were reported during a 30-year period. In the current study, the literature search was limited to the most recent 22-year period because we found only one study that corresponded to the period of Mechling’s review (1976–2006). The Boolean operator “or” was used to search keywords within each field. Within population, the keywords of intellectual disability and mental retardation were searched, and within dependent variable, the following keywords were searched: safety, pedestrian skill, street crossing, telephone skill, first aid, accident prevention, lures of strangers, crime prevention, child abduction, and molestation.

Article Selection Criteria

To be eligible for inclusion in the meta-analytic review, each study was required to meet the following criteria: (1) published in a peer-reviewed journal; (2) written in English; (3) published from 1998 to February 2021; (4) included participants whose primary diagnosis was reported to be ID with or without provision of a standardized IQ test score (if 50% or more of the participants had an ID and the remaining participants had another type of developmental disability, the study was included); (5) included safety skills as a dependent variable; (6) employed an SCD. We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al., 2009) in searching for and selecting studies. The initial search resulted in selection of 2,292 articles for screening. Of these, 1,121 articles were excluded due to duplication or irrelevance. Following the removal of the duplicated and irrelevant articles, we screened all abstracts of the remaining 1,171 articles, resulting in the elimination of an additional 876 articles. The next screening involved reviewing the full text of the remaining 295 articles during which an additional 260 articles were excluded. Through this screening process, a total of 35 articles were selected for final screening for inclusion or exclusion. Finally, through reviews of previous systematic or meta-analytic reviews of safety skills intervention studies and Google Scholar search, 7 additional studies meeting the initial inclusion criteria were identified, resulting in 42 articles that were selected to undergo final screening.

We further reviewed the 42 articles in their entirety and discussed whether these articles met more specific inclusion criteria. Exclusion criteria included: (1) did not provide graphical data (Watson et al., 1992); (2) focused only on functional living skills, such as street navigation (Kelley et al., 2013; McMahon, Cihak et al., 2015a; McMahon, Smith et al., 2015b; Smith et al., 2017), pushing the “next stop” button on the bus (Mechling & O'Brien, 2010), using community resources (Çattık & Ergenekon, 2018), and community skills (Taras et al., 1993); (3) focused on safety of others (Feldman & Case, 1999); (4) did not clearly describe the independent variables (Dukes & McGuire, 2009); or (5) fewer than 50% of participants were diagnosed with ID as their primary diagnosis (Bıçakcı & Seray, 2019). At the conclusion of the selection process, a total of 31 articles that used a multiple probe design (n = 19), multiple baseline design (n = 11), or alternating treatments design (n = 1) remained for in-depth analysis. Of these, one article (Collins et al., 1993) included two experimental studies, and we provided the general characteristics of the two studies separately in the results section. Figure 1 depicts the flow of the study selection process.
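The screening counts reported above can be checked with a short tally. The sketch below only restates the numbers given in the text; the variable names are illustrative, and the per-criterion exclusion counts are obtained by counting the articles cited under each exclusion criterion.

```python
# Tally of the study selection flow, using only counts stated in the text.
# Variable names are illustrative and not taken from the original study.
identified = 2292
after_dedup = identified - 1121           # duplicates and irrelevant records removed
after_abstracts = after_dedup - 876       # excluded at abstract screening
after_full_text = after_abstracts - 260   # excluded at full-text review
final_screening = after_full_text + 7     # added from prior reviews and Google Scholar

# Articles cited under exclusion criteria (1) through (5), counted from the text.
excluded_by_criterion = {1: 1, 2: 7, 3: 1, 4: 1, 5: 1}
included = final_screening - sum(excluded_by_criterion.values())

# Design breakdown reported for the included articles.
by_design = {"multiple probe": 19, "multiple baseline": 11, "alternating treatments": 1}

print(after_dedup, after_abstracts, after_full_text, final_screening, included)
```

Each intermediate value matches the figure reported in the text, and the design breakdown sums to the number of included articles.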

Fig. 1 Flow Chart of Study Selection Process

Variables Coding

The articles that met the inclusion criteria were coded using coding spreadsheets based on previous review articles (Dixon et al., 2010; Mechling, 2008; Wiseman et al., 2017). The coding variables included: (1) participant demographics (number of participants, gender, age, grade level, comorbidity, cognitive level), (2) setting (for intervention and for generalization), (3) type of intervention, (4) dose of intervention (session length, mean number of sessions needed to reach training criterion), (5) intervention implementer, (6) reporting of fidelity, (7) reporting of social validity, (8) evaluation of maintenance and generalization, and (9) dependent variables.

For setting, “residential facility” (Miltenberger et al., 1999) and “training home” (Mechling et al., 2009) were coded as “group home.” “Campus” (Akmanoglu & Tekin-Iftar, 2011) was coded as “community” because it included parks, streets, and faculty backyards. Type of intervention was coded into the following categories: (1) BST, (2) BST plus other instructional procedures (IST, time delay, prompting), (3) video modeling, (4) video modeling plus other instructional procedures (prompting, reinforcement, simulation, community-based instruction, time delay), (5) community-based instruction (CBI), (6) simulation (e.g., crossing a simulated street in a gym), and (7) time delay (e.g., constant time delay). Three studies (Christensen et al., 1993; Marchand-Martella et al., 1992b; Spooner et al., 1989) that employed social modeling were coded as BST, even though the researchers did not use the term “BST,” because the social modeling consisted of modeling, rehearsal, and feedback. If a study did not report the mean number of sessions, the mean number of sessions to criterion (i.e., criteria to consider a participant trained) was calculated by averaging the number of intervention sessions across participants. If a participant's age was reported in months, the age was converted to years. The intervention implementer was coded into the following categories: teacher, researcher, instructor, staff (e.g., instructional aide, nurse), and peer. Graduate or undergraduate students, trainers, and instructors were coded as researcher when they were listed as authors.

Approximately 35.5% of the included studies (n = 11) were randomly selected and independently coded by the first two authors to assess intercoder agreement on coding variables. Before coding each study, the authors discussed the definitions of coding variables and practiced coding data using the coding spreadsheets. After discussion, the two coders independently coded one article to assess agreement, and then coded the randomly selected articles. The initial average intercoder agreement on coding variables was 94.9% (81.8%–100%). For each variable, intercoder agreement was calculated by dividing the number of agreements by the number of possible agreements. In the event of disagreements, the third author reviewed the disagreements to make a final decision.
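The agreement calculation described above (agreements divided by possible agreements) can be sketched as follows. The function name and the example codes are hypothetical and serve only to illustrate the arithmetic; they are not data from the reviewed studies.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of coding decisions on which two coders agree."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must rate the same non-empty set of items")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * agreements / len(coder_a)


# Hypothetical codes for one variable (intervention setting) across 11 studies.
coder_1 = ["school", "community", "home", "school", "school", "community",
           "home", "school", "community", "school", "home"]
coder_2 = ["school", "community", "home", "school", "community", "community",
           "home", "school", "community", "school", "school"]

# The coders disagree on 2 of 11 items, so agreement is 9 / 11.
print(round(percent_agreement(coder_1, coder_2), 1))
```

With 11 double-coded studies, nine agreements on a variable correspond to 81.8% agreement.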

Quality Assessment of Selected Articles

We used the What Works Clearinghouse (WWC) standards for quality assessment of selected studies (What Works Clearinghouse, 2017), which provide guidelines for evidence-based decisions. The WWC Standards Handbook includes the standards for SCD research and consists of two parts: five criteria (quality indicators) and design standards (DS) used to determine the scientific evidence. The five quality indicators are: (1) systematic manipulation of the independent variable, (2) graphical illustration of evidence, (3) at least three attempts with sufficient data points to evaluate the demonstration of an intervention, (4) eligible outcomes that meet WWC requirements, and (5) measures of effectiveness that can be attributed solely to the intervention. After assessing studies against the five quality indicators, we rated each study to determine whether it met the design standards without or with reservations. Studies were rated as meeting the WWC design standards (WWC DS) without reservations if they met all five criteria and had a minimum of 5 data points per phase. Studies that met all five criteria but had fewer than 5 data points per phase were rated as meeting the WWC DS with reservations. Studies that did not meet all criteria were rated as not meeting the WWC DS. Each study was assigned a numerical value identifying the level of the WWC DS it met: 2 for studies that met the design standards without reservations, 1 for studies that met them with reservations, and 0 for studies that did not meet the WWC DS. Table 2 lists the WWC DS score for each study.
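The 0/1/2 scoring rule can be written as a small function. This is a minimal sketch under one assumed reading of the rating rules: a study that meets all five indicators is scored 2 when every phase has at least 5 data points and 1 otherwise. The function name and arguments are hypothetical, and no lower bound on data points is applied because the text does not state one.

```python
def wwc_ds_score(criteria_met, min_points_per_phase):
    """Return 2 (without reservations), 1 (with reservations), or 0 (does not meet).

    criteria_met: five booleans, one per quality indicator.
    min_points_per_phase: smallest number of data points in any phase.
    """
    if len(criteria_met) != 5:
        raise ValueError("expected exactly five quality indicators")
    if not all(criteria_met):
        return 0
    # Assumed reading: the 5-point threshold separates the two passing ratings.
    return 2 if min_points_per_phase >= 5 else 1


# Hypothetical studies, for illustration only.
print(wwc_ds_score([True] * 5, 6))                            # all indicators, 6 points
print(wwc_ds_score([True] * 5, 3))                            # all indicators, 3 points
print(wwc_ds_score([True, True, True, True, False], 8))       # one indicator not met
```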

Effect Size Calculation

The effect size calculation involved a three-step procedure: (1) data extraction, (2) Tau-U effect size calculation, and (3) aggregation of effect sizes. The authors used the DigitizeIt version 2.2 digitizer software (Bormann, 2012) to obtain data from line graphs included in individual SCD studies. The software allows users to digitize data from graphs and export the data to Excel spreadsheets for further analysis. DigitizeIt has been identified as a reliable and valid data extraction application for digitizing graphical data when evaluating SCD studies (Rakap et al., 2016). To calculate the effect size of each intervention, we used Tau-U indices, which range from -1.00 to +1.00 (Parker et al., 2011). Tau-U indices not only indicate the size of the effect through the nonoverlapping ratio between baseline and intervention phases, but also control for unstable baseline trends. The following formula was used to calculate Tau-U indices:

$$ \text{Tau-}U = \frac{S_p - S_A}{mn} $$

m = number of baseline phase observations

n = number of treatment phase observations

Sp = Kendall’s S statistic calculated for the comparison between phases

SA = Kendall’s S statistic calculated on the baseline trend
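To make the formula concrete, the sketch below computes Tau-U from two short data series using the definitions above. It assumes that higher scores indicate improvement and counts tied pairs as zero; the function names and the example data are hypothetical and are not taken from any reviewed study or from the software the authors used.

```python
def kendall_s_between(baseline, treatment):
    """S_p: improving minus deteriorating pairs across phases."""
    s = 0
    for a in baseline:
        for b in treatment:
            s += (b > a) - (b < a)   # +1 improvement, -1 deterioration, 0 tie
    return s


def kendall_s_trend(series):
    """S_A: increases minus decreases over all ordered pairs within one phase."""
    s = 0
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            s += (series[j] > series[i]) - (series[j] < series[i])
    return s


def tau_u(baseline, treatment):
    """Tau-U = (S_p - S_A) / (m * n), with m baseline and n treatment observations."""
    m, n = len(baseline), len(treatment)
    if m == 0 or n == 0:
        raise ValueError("both phases need at least one observation")
    return (kendall_s_between(baseline, treatment) - kendall_s_trend(baseline)) / (m * n)


# Hypothetical percent-correct data.
treatment = [60, 80, 100, 100]
print(tau_u([20, 20, 20], treatment))   # flat baseline: S_p = 12, S_A = 0
print(tau_u([10, 20, 30], treatment))   # rising baseline: S_p = 12, S_A = 3
```

In the second call every treatment point still exceeds every baseline point, but the rising baseline trend is subtracted, so the resulting index is smaller than in the first call.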

In interpreting Tau-U values, effect sizes lower than .20 are considered small, .20–.60 moderate, .60–.80 large, and above .80 large to very large (Vannest & Ninci, 2015). Compared to percentage of nonoverlapping data (PND; Scruggs et al., 1987), in which the values indicate the percentage of intervention phase data points that exceed the single highest baseline data point, Tau-U values indicate the percentage of intervention phase data points that exceed all baseline phase data points. We calculated the aggregated effect size and confidence interval for each study and then combined the aggregated effect sizes to examine differences in the magnitude of treatment effects across subgroups. These analyses were performed using COMPARE2, one of the statistical programs in the WinPepi freeware package. To handle the dependence of multiple effect size estimates, we used the shifting-unit-of-analysis approach, in which effect sizes within studies are combined based on the variables of interest in the meta-analysis so that violations of the assumption of independence of the effect sizes are minimized (Cohen, 1988).
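The interpretation bands can be expressed as a simple lookup. The text does not say which band the boundary values .20, .60, and .80 belong to, so the handling of exact boundaries below is an assumption, as is the use of the absolute value for negative indices; the function name is hypothetical.

```python
def interpret_tau_u(value):
    """Map a Tau-U value to the verbal bands described in the text."""
    magnitude = abs(value)      # assumption: bands apply to the size of the effect
    if magnitude < 0.20:
        return "small"
    if magnitude < 0.60:        # assumption: .20 itself is treated as moderate
        return "moderate"
    if magnitude <= 0.80:       # assumption: .60 and .80 themselves are treated as large
        return "large"
    return "large to very large"


# Values mentioned in the text: the smallest study effect and the overall effect.
print(interpret_tau_u(0.12))
print(interpret_tau_u(0.89))
```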

Subgroup Analysis

Subgroup analysis was completed in this meta-analysis according to the following variables: (1) type of intervention (BST, BST plus other instructional procedure, video modeling, video modeling plus other instructional procedure, other), (2) outcome (using telephone, pedestrian skills, abduction prevention skills, fire safety skills, first aid skills, daily living safety skills), (3) implementer (teacher, researcher, other), (4) setting (community, classroom or school, home or group home), (5) grade level (preschool, elementary, secondary, adult), (6) comorbidity in addition to ID, (7) fidelity assessment, and (8) quality level of methodology, i.e., meets WWC DS without or with reservation, or does not meet WWC DS. Subgroup analysis was divided into two steps. The first analysis was conducted across all included studies. The second analysis was performed on studies divided into two groups: studies that met WWC DS with or without reservation, and studies that did not meet WWC DS.

If grade level was reported as middle (Bassette et al., 2018; Taber et al., 2002) or high school (Spooner et al., 1989; Winterling et al., 1992), it was coded as secondary for subgroup analysis. The outcomes (i.e., discarding broken glass and plates, social safety skills) of two studies (Spivey & Mechling, 2016; Winterling et al., 1992) were categorized as daily living safety skills. Across subgroups, differences in Tau-U effect sizes and mean number of sessions were analyzed using the Mann-Whitney U test or the Kruskal-Wallis one-way analysis of variance test. If the Kruskal-Wallis test found significant differences, the Bonferroni post-hoc method was employed to control the family-wise error rate. In a Bonferroni adjustment, the significance level (alpha) is lowered by dividing it by the number of tests. In the current analysis, the adjusted significance level was .05/n, where n is the number of comparisons.
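As a brief illustration of the .05/n adjustment, the sketch below computes a Bonferroni-adjusted significance level. The group labels are hypothetical, and taking n as the number of all pairwise comparisons among subgroups is an assumption; the text states only that alpha was divided by the number of comparisons.

```python
from itertools import combinations


def bonferroni_alpha(alpha, n_comparisons):
    """Divide the desired alpha level by the number of comparisons."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / n_comparisons


# Hypothetical subgroups; assumed here that every pair is compared post hoc.
groups = ["A", "B", "C", "D"]
pairs = list(combinations(groups, 2))      # 4 groups give 6 pairwise comparisons
adjusted = bonferroni_alpha(0.05, len(pairs))

print(len(pairs), round(adjusted, 4))
```

Under this assumption, a pairwise difference would be treated as significant only if its unadjusted p value fell below the adjusted level.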

Results

Quality of Studies

The WWC DS evaluation results showed that only 3 of the 31 studies (9.7%; Collins et al., 1993; Gast et al., 1993; Kübra & Batu, 2020) met the standards without reservations. Six studies (19.4%; Christensen et al., 1993; Egemo-Helm et al., 2007; Kearny et al., 2018; Lumley et al., 1998; Marchand-Martella et al., 1992a; Ozkan, 2013) met the standards with reservations. Twenty-two studies (71.0%) did not meet the standards. Although the specific methodological quality assessment results are not provided in Table 2, the primary reason for studies failing to meet the WWC DS was insufficient measurement of interobserver agreement (IOA). WWC suggests that IOA should be measured for 20% or more of the data points overall and in each phase, and that agreement must meet minimum thresholds of 80% agreement or a kappa of .60. It was found that 17 studies (54.8%) did not assess IOA for at least 20% of data points in each phase, and 14 studies (45.2%) did not measure IOA. The second reason for failing to meet the WWC DS was an insufficient number of data points (i.e., fewer than 5 data points in each phase). Among the 14 studies that did not assess IOA, 8 studies did not collect sufficient data within phases.

Characteristics of Studies

Table 1 and Table 2 present the general characteristics of each of the 31 articles. One article (Collins et al., 1993) included two experimental studies with elementary and secondary students, and characteristics were coded for each study.

Table 1 Characteristics of Studies: Participants and Setting
Table 2 Characteristics of Studies (Intervention Types and Doses, Interventionists, Measurement, and Effect Sizes)

Participant and Setting Characteristics

A total of 137 individuals with ID were included in this meta-analysis, 51 of whom were male, 48 of whom were female, and 38 were not specified. Six of the 31 articles (19.4%) did not report information on the gender of the participants. The sample size varied across studies, from one to seven. With regard to grade, 3 studies (9.7%) were conducted at the preschool level, 8 (25.8%) at the elementary school level, 10 (32.3%) at the secondary school level, and 10 (32.3%) with adults. Collins et al. (1993) conducted the first study in a secondary school, and the second study in an elementary school. The participants of 17 articles (54.8%) were diagnosed with other disabilities in addition to ID (e.g., Bannerman et al., 1991; Collins et al., 1993; Egemo-Helm et al., 2007), and the participants in the other 14 articles (45.2%) were reported as not having other comorbid disabilities. The number and types of comorbid disabilities varied from one to four (Purrazzella & Mechling, 2013).

The majority of the studies (n = 22, 71.0%) provided information on participants’ cognitive levels (i.e., IQ score), with six of these studies providing the information for some participants. Four studies (12.9%) reported the participants’ cognitive levels without information on the diagnostic instruments used to assess their levels of ID (Egemo-Helm et al., 2007; Fisher et al., 2013; Sanchez & Miltenberger, 2015; Taber et al., 2002). The studies with the diagnostic assessment information reported that the participants’ full scale IQ scores ranged from 38 to 68. One study (Marchand-Martella et al., 1992b) included three children whose IQ scores ranged from 72 to 90. Kübra and Batu (2020) included one child with ASD whose IQ score was 80. With regard to setting, six studies (19.4%) completed the intervention in the community setting, and seven (22.6%) in the home or group home setting. Classroom or school was the most common intervention setting for safety skills training in children with ID (n = 18, 58.1%). Community settings were used frequently for examining the intervention generalization effects (n = 13, 41.9%).

Intervention Characteristics

Types of intervention

The number of studies using each coded intervention type was as follows. BST alone was implemented in seven studies (22.6%), and BST with other instructional procedures (e.g., IST, prompting, time delay) in seven studies (22.6%). Two studies (6.5%) used video modeling, and five (16.1%) used video modeling with other instructional procedures (e.g., prompting, reinforcement, simulation, time delay). Two studies (6.5%) used peer tutoring with prompting (Kearny et al., 2018; Marchand-Martella et al., 1992a). The other seven studies (22.6%) implemented prompting with modeling and reinforcement (Bannerman et al., 1991), in vivo with prompting (Collins et al., 1993), community-based instruction (Ozkan et al., 2013), progressive prompt delay (Eldeniz Certing & Bozak, 2020), least-to-most prompting in a total-task presentation (Kearny, Brady et al., 2019), simulation (Batu et al., 2004), or constant-time delay (Collins & Griffen, 1996).

Intervention dose and implementer

The number of studies that reported the intervention dose (intensity and duration) in terms of intervention session length was low (n = 12, 38.7%); among those studies, session length ranged from 3.1 min to 90–180 min. The intervention with the smallest mean number of sessions was BST with IST (3.2 sessions), and the intervention with the largest mean number of sessions was a modeling with least-to-most prompting procedure (59.7 sessions). Teacher was the implementer in five studies (16.1%), researcher in 22 studies (71.0%), and other (e.g., instructor, peer, staff or nurse) in four studies (12.9%). In one study both teacher and staff (instructional aide) implemented the intervention.

Dependent variables

With regard to dependent variables, the most common was first aid skills (n = 10, 32.3%). The next most common dependent variable (n = 7, 22.6%) was abduction prevention skills, including sexual abuse prevention and response to lures. Fire safety skills were reported in six studies (19.4%). Four studies (12.9%) focused on using the telephone (e.g., using a public phone, dialing emergency numbers, and using a cell phone to find location when lost). Daily living safety skills (e.g., discarding glass and plate shards) were targeted in three studies (9.7%), and pedestrian skills in three studies (9.7%). The Tau-U effect size ranged from .12 to 1. The effect size was only provided in studies that were reporting on safety skill dependent variables despite the presence of other dependent variables such as mailing a letter or cashing a check (Branham et al., 1999).

Other Study Characteristics

Treatment integrity and social validity

We examined how frequently the researchers assessed treatment integrity and social validity. Of the 31 studies that were included, 22 (71.0%) reported treatment fidelity. Social validity was reported less often; about half of the studies (n = 16, 51.6%) assessed and reported it. Higher reporting of social validity was found in studies on BST combined with other instructional procedures (n = 8, 25.8%), followed by studies on video modeling plus other instructional procedures (n = 5, 16.1%).

Maintenance and generalization

We also examined how frequently researchers evaluated intervention maintenance and generalization effects. Most studies reported maintenance and generalization of safety skill training: 26 studies (83.9%) reported maintenance and 26 (83.9%) reported generalization. The maintenance period varied from 1 week to 16 months. Of the 26 studies that reported maintenance, all but one (Winterling et al., 1992) reported maintenance effects based on criteria set by the individual study authors. The extent of generalization assessment also varied, from one probe to probes throughout the experiment. Of the studies that reported generalization data, only one study (Lumley et al., 1998) reported that the skills did not generalize. Four studies (Collins et al., 1993; Egemo-Helm et al., 2007; Gast et al., 1993; Miltenberger et al., 1999) found mixed generalization effects. Generalization was evaluated in a community setting (n = 13, 41.9%), a combination of school and classroom (n = 4, 12.9%), group home and home (n = 6, 19.4%), school and home (n = 2, 6.5%), and classroom and community (n = 1, 3.2%).

Overall Effect Size

Table 2 details the specific Tau-U effect size information for each study. Figure 2 includes the aggregated Tau-U effect size and confidence interval for each study. The smallest effect size was .12 (Knudson et al., 2009) and the largest effect size was 1.00, which was shown in seven studies (e.g., Bannerman et al., 1991). Except for a few studies, the Tau-U effect sizes indicated a large to very large magnitude of treatment effects across studies.
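For readers less familiar with the metric, the nonoverlap idea behind Tau-U can be illustrated with a minimal sketch. The Python function below computes a basic A-versus-B Tau without baseline trend correction. The function name `tau_nonoverlap` and the session data are hypothetical and are not taken from any reviewed study, and the sketch is not the Tau-U calculator used in this review.

```python
from itertools import product


def tau_nonoverlap(baseline, intervention):
    """Basic A-versus-B Tau: (improving pairs - deteriorating pairs) / all pairs.

    Every baseline point is compared with every intervention point.
    Ties count as neither improving nor deteriorating.
    No baseline trend correction is applied in this sketch.
    """
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    improving = 0
    deteriorating = 0
    for a, b in product(baseline, intervention):
        if b > a:
            improving += 1
        elif b < a:
            deteriorating += 1
    return (improving - deteriorating) / (len(baseline) * len(intervention))


# Hypothetical data: percentage of safety steps performed correctly per session.
no_overlap = tau_nonoverlap([0, 10, 0, 10], [40, 80, 100, 100, 100])
some_overlap = tau_nonoverlap([20, 30, 20], [30, 20, 50, 60])
print(no_overlap, round(some_overlap, 2))
```

When every intervention point exceeds every baseline point, the value is 1.00, which corresponds to the largest effect size reported above. Overlap between phases pulls the value toward zero.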

Fig. 2

Forest Plot Showing Tau-U Effect Sizes and 95% CIs for Individual Studies. Note. The effect sizes are denoted by the squares and CIs by the horizontal lines. The single vertical line denotes no effect. The diamond shape at the bottom of the forest plot is the overall effect size (.89) for all comparisons

Subgroup Analysis

Table 3 presents the subgroup analysis results for all studies based on different study characteristics, and Figure 3 shows a forest plot of effect sizes according to type of intervention. We found statistically significant differences across intervention types, outcomes, and implementers at the p < .05 level. The differences among Tau-U values across settings, grades, other disabilities, treatment fidelity, and WWC DS were not significant. In post-hoc comparisons, adjusted p values indicated that the mean rank scores of BST (Tau-U = .95, 95% CI = .80–1.00, p = .001), video modeling (Tau-U = .97, 95% CI = .85–1.00, p = .005), and other (Tau-U = .94, 95% CI = .89–.98, p = .008) were significantly larger than those of BST plus other instructional procedures (Tau-U = .74, 95% CI = .66–.82). Tau-U values of outcomes for first aid skills (Tau-U = .97, 95% CI = .93–1.00) were significantly larger than those for fire safety skills (Tau-U = .82, 95% CI = .74–.92, p = .001) and daily living safety skills (Tau-U = .82, 95% CI = .72–.92, p = .007). The Tau-U effect size for “other” implementer (Tau-U = .98, 95% CI = .90–1.00) was significantly larger than those for teacher (Tau-U = .73, 95% CI = .58–.88, p = .003) and researcher or instructor implementer (Tau-U = .88, 95% CI = .85–.942, p = .025).

Table 3 Summary Effect for Subgroup Analysis
Fig. 3

Forest Plot Showing Tau-U Effect Sizes and 95% CIs for Type of Intervention. Note. The effect sizes are denoted by the squares and CIs by the horizontal lines. The single vertical line denotes no effect

In terms of the mean number of sessions conducted to achieve the mastery criterion, statistically significant differences were found across intervention types, settings, and grade levels at the p < .05 level. The differences in the mean number of sessions conducted to achieve the criterion were not significant across outcomes, implementers, other comorbid disabilities, fidelity, and WWC DS. The adjusted p values in post-hoc comparisons indicated that the mean number of sessions for BST (M = 9.15, SD = 2.79, p = .013), BST plus other instructional procedures (M = 7.79, SD = 4.30, p = .001), and video modeling plus other instructional procedures (M = 7.76, SD = 2.62, p = .006) was significantly lower than for other types of intervention (M = 19.31, SD = 12.89). Training in the classroom or school setting (M = 14.79, SD = 6.69, p = .040) required more sessions than training in the home or group home (M = 10.91, SD = 12.76). Training for elementary students (M = 19.91, SD = 11.05) required more sessions than for secondary students (M = 9.32, SD = 2.95, p = .012) and adults (M = 10.33, SD = 11.97, p = .001).

Table 4 presents the subgroup analysis results in studies meeting WWC DS. For studies meeting the WWC DS with or without reservations, the differences in the magnitude of effects across study variables were not statistically significant, indicating that high-quality studies equally resulted in large magnitude effects regardless of the intervention types, target skills, implementers, settings, grade levels, existence of other disabilities, or assessment of fidelity. In terms of mean number of sessions, there were no statistically significant differences across subgroup variables. However, for studies not meeting the WWC DS, data indicated statistically significant differences across intervention types, outcomes, and implementers at the p < .05 level. In post-hoc comparisons, adjusted p values indicated that the mean rank scores of BST (Tau-U = .95, 95% CI = .89–1.00, p = .010), video modeling (Tau-U = 1.00, 95% CI = .85–1.00, p = .004), and other (Tau-U = .94, 95% CI = .89–.99, p = .025) were significantly larger than those of BST plus other instructional procedures (Tau-U = .67, 95% CI = .55–.79). Tau-U values of outcomes for first aid skills (Tau-U = .98, 95% CI = .92–1.00) were significantly larger than those for abduction prevention skills (Tau-U = .81, 95% CI = .68–.94, p = .015) and daily living safety skills (Tau-U = .82, 95% CI = .72–.92, p = .004). The Tau-U effect size for “other” implementer (Tau-U = .99, 95% CI = .90–1.00, p = .015) was significantly larger than for teacher (Tau-U = .76, 95% CI = .57–.95).

Table 4 Summary Effect for Subgroup Analysis of Meet WWC DS studies

Discussion

This meta-analytic review examined 31 SCD studies on safety skills interventions published between 1998 and February 2021 that targeted individuals with the diagnosis of ID. This review aimed to analyze varying study characteristics, magnitudes of effects of safety skills interventions, and differences in subgroup variables (e.g., intervention type, outcome measure, implementer, setting) of the studies, and to provide recommendations for practice and future research.

Major Findings and Implications

Quality of Evidence

In analyzing the 31 studies, we first evaluated the quality of evidence of the studies. We found that only 9 (29%) of the 31 studies met the WWC DS with or without reservations. Failing to meet the design standards for IOA was the primary reason that studies did not meet the WWC DS. Failure to meet the IOA quality standard has also been noted in the literature on social skills interventions for students with challenging behavior (Hutchins et al., 2017), suggesting that the integrity of data needs to be monitored more carefully in future studies. The second reason for failing to meet the WWC DS was an insufficient number of data points in each phase to demonstrate experimental control (i.e., fewer than 3 data points in a phase). This weakness of SCD studies on individuals with disabilities has also been noted in the behavioral intervention literature (Mason et al., 2016; McKenna et al., 2015). Studies that did not meet the WWC DS still provide useful information such as study characteristics, although their effect sizes should be interpreted more cautiously than those of studies that did meet the WWC DS. We conducted subgroup analyses for the studies that did meet the design standards set by the WWC to help account for inflated results due to poor experimental design.

Participant and Intervention Characteristics

We examined the study characteristics before analyzing the magnitude of effects of the safety skills interventions. Results revealed that school-age children have been the primary population of interest for the safety skills research on individuals with ID. This may indicate a lack of knowledge about the efficacy of safety skills interventions for young children and adults with ID. More research in safety skills is needed for these populations to address the disparity in research population. In examining the types of safety skills interventions, we found that BST, or variations thereof, was the most commonly used intervention for individuals with ID, followed by video modeling with and without additional components. Thirteen studies (42%) used BST or BST with additional components (e.g., IST). In most studies, BST was used in combination with other procedures as an intervention package, rather than having the additional components added only when BST alone did not result in desired levels of outcomes. Likewise, Wiseman et al. (2017) found that BST with additional components was widely used in the safety skills intervention literature for individuals with ASD.

In general, intervention dose is characterized as intervention intensity and duration, based on attributes of the session length, frequency of sessions, and duration of intervention (number of sessions). Few studies provided information on the session length and intervention duration; therefore, we analyzed the total number of intervention sessions conducted and the length of each intervention session to estimate each study’s intervention dose. Even with this alternative method of calculating intervention dose, only about half of the studies (n = 16, 51.6%) provided intervention dose information. Of the studies that reported dose, the mean number of sessions ranged from 4.6 to 12.5 sessions, except for the Ozkan et al. (2013) study, which had 32.4 sessions. The average duration of training within the studies that reported dose varied widely from 3.1 to 180 min. The variation in training durations could be attributed to complexity of skill, skill level of the participants, or differences in how the authors reported the duration of their sessions (e.g., total time at a location rather than solely the duration of the training). Future research should analyze this more closely when investigating intervention dose. In general, elementary-aged children required more sessions to learn skills than secondary students and adults, suggesting that young children need more training sessions than adolescents and adults (e.g., Akmanoglu & Tekin-Iftar, 2011; Collins & Griffen, 1996; Ozkan et al., 2013). In addition, analyses of the mean number of sessions suggest that BST and video modeling, both alone and with additional components, provided more expedient training of skills in terms of number of sessions than other intervention types in this analysis. This may help practitioners in choosing an intervention type to use if time is an important factor for training.

We found that assessment of social validity was infrequently reported (42.3%), which reflects a limitation of the current body of the literature on safety skills intervention for individuals with ID. This finding is consistent with previous findings in that, despite the known importance of social validity for intervention, the number of studies reporting social validity in the behavioral intervention literature continues to be limited (Park & Blair, 2019; Ledford et al., 2016; Snodgrass et al., 2018). Although we did not provide the specifics on the types and areas of social validity assessment in any of the tables due to space constraints, of the studies that assessed social validity, interview was the most commonly used method to assess social validity in safety skill training studies (e.g., Spivey & Mechling, 2016; Taber et al., 2002). Questionnaires were also used in some studies (e.g., Akmanoglu & Tekin-Iftar, 2011; Egemo-Helm et al., 2007). Only one study used normative comparison to measure social validity (Marchand-Martella et al., 1992b). The results also indicated that social validity assessment in safety skill training studies mainly focused on assessing satisfaction with or acceptability of the intervention.

A strength of the body of literature on safety skills interventions for individuals with ID was found to be the evaluations of maintenance and generalization effects. Most of the studies (n = 26, 83.9%) reported maintenance and generalization data. The length of time from the conclusion of the intervention to the maintenance probe varied from 1 week to 3 months, but was not analyzed in this study. For the studies that did have maintenance data, only one study did not observe maintenance effects, which suggests that trained safety skills do maintain over time. However, this strength is tempered by the limited number of studies that involved natural change agents as implementers. Previous reviews found that around 60% of studies report generalization data (Dixon et al., 2010; Wiseman et al., 2017; Wright & Wolery, 2011). The number of studies reporting generalization data included in this current review suggests that there has been a positive trend in evaluating generalization effects in the literature.

Magnitude of Safety Skills Intervention Effects

Overall magnitude of effects and by intervention type

Findings of the current study indicate that overall, safety skills interventions have demonstrated small-to-large effect sizes (.12–1.0) across studies. Similar to the results of Wiseman et al.’s (2017) meta-analysis of safety skills for individuals with ASD, the results of this review showed that studies implementing BST procedures with or without other components demonstrated a medium to large effect size with the exception of the Knudson et al. (2009) study, which showed a small effect size (.12). Further, post-hoc comparisons indicated that BST alone yielded larger effects than BST with additional components; however, this may be due in part to the Knudson et al. (2009) study, which targeted individuals with severe and profound intellectual disabilities. Knudson et al. used BST combined with in situ training and prompting when BST alone was not successful for teaching fire safety skills to seven individuals with severe and profound intellectual disabilities; however, only one participant demonstrated improved skills, contributing to the small magnitude of treatment effect. It is likely that, with careful consideration, adding components to the BST procedures could yield better results than BST alone. Clinicians should not be dissuaded by this finding from modifying BST with other research-supported procedures, such as in situ training and prompting, particularly when BST alone does not result in desirable behavioral outcomes, as shown in several studies.

However, an argument could be made that training on abduction and sexual abuse prevention skills, which most of the BST plus additional components studies targeted, is more difficult than training first aid skills, which most of the BST alone studies targeted. This variation in target skills might have contributed to differences in the magnitude of effect sizes. Another possibility is that there might be prerequisite skills, which have not been empirically identified, that make safety skills training more likely to be effective. This may also explain the mixed results within individual studies where BST was effective for some participants and not for others. A possible prerequisite could be related to a stimulus class of “dangerous objects” that evoke avoidance responses, whereby the initial goal of an intervention would be training individuals to respond to the “dangerous objects” stimulus class, instead of teaching them the appropriate response to a safety threat. Therefore, a suggestion for practice is to be cognizant of the learner’s overall current skill levels (e.g., expressive, receptive, imitative) and understand that the learner may require more or less training, depending on their skill levels.

The results of the study also indicate that BST (.95) and video modeling (VM; .97) had the largest effect sizes of all the interventions. The result suggests that both BST and VM can be equally effective for training safety skills, although this does not necessarily indicate that other training methods are less effective, as other training procedures, such as prompting, peer tutoring, and simulation, also demonstrated a large magnitude of effects. Dixon et al. (2010) reported similar findings that BST was effective in teaching safety skills to individuals with disabilities. Wiseman et al. (2017) reported similar levels of treatment effects for BST and video modeling for the ASD population. Future research is needed to clarify the nature of these findings, such as conducting direct comparisons of training types or using a group design to evaluate the specific effects of different types of safety skills training and the interaction effects among variables.

The results suggest that BST alone and video modeling may be effective interventions for teaching safety skills to individuals with ID. However, this suggestion comes with a word of caution. Clinicians must be cautious about when and where safety skills training is implemented. Generalization (i.e., use of the skill in the natural environment) is of paramount importance when it comes to safety skills. Therefore, it is highly recommended that an in situ assessment be conducted in some manner for all safety skill training, as that is the only method of assessing whether the safety threat evokes the safety response in natural settings. In situ assessments have been utilized in numerous safety skills studies to determine generalization effects (e.g., Gast et al., 1993; Himle et al., 2004).

Differences based on study qualifications

We conducted a separate subgroup analysis to examine the differences in the magnitude of effects between studies that met the WWC DS with or without reservations and studies that did not meet WWC DS. It was found that BST (.95) and VM (1.00) remained the interventions with the largest effect sizes regardless of whether the studies met or did not meet WWC DS. However, the results should still be interpreted with caution due to the decreased sample size of the studies meeting WWC DS with or without reservations (n = 9). Given that high-quality studies equally resulted in large magnitude effects regardless of the intervention types, target skills, implementers, settings, grade levels, existence of other disabilities, or assessment of fidelity, there is a need for more high-quality SCD studies on safety skills interventions for individuals with ID.

Effect sizes by outcome

Outcomes for abduction prevention skills reported in the studies in this review were significantly larger than the outcomes for other safety skills. One reason for the large outcome for abduction prevention may be that all but one of the abduction prevention studies used BST plus IST. Although BST alone was found to be more effective than BST plus other components for teaching safety skills in this analysis, BST plus IST may be an exceptionally effective method of training for the specific set of skills required for abduction prevention. This may be due in part to the fact that IST incorporates all relevant stimuli (e.g., a potential abductor, common lure, a natural setting) and that it occurs following an in situ assessment in which the trainee is unaware that they are being assessed (e.g., Miltenberger, 2008). To the trainee, it is a very real situation. It may be prudent to suggest that practitioners be prepared, and prepare families, to use IST when teaching individuals with disabilities abduction prevention or other safety skills in the clinical setting.

Effect sizes by implementer

The effect size for “other implementer” was larger than those for teacher and researcher implementers. This finding may support the importance of involving natural change agents in the implementation process because natural change agents such as staff, peers, and instructors were implementers or a part of the implementation of the interventions. For example, Kearny, Dukes et al. (2019) had a CPR instructor involved in the intervention implementation, which is exactly the person who would typically train that skill. Although the teacher subcategory would be considered a natural change agent, it is possible that the small number of studies included in the “other implementer” subcategory skewed the result. Future research should use more sophisticated statistical analyses to specifically investigate implications according to implementer.

Effect sizes by other characteristics

No statistical significance was found for setting, grade, other disability, or treatment fidelity in relation to outcomes, suggesting that treatment efficacy may not be heavily reliant on these factors. It is also interesting that comorbid disabilities do not appear to alter the effectiveness of treatment. This finding should be considered carefully because some disabilities such as visual or hearing impairment may alter how individuals interact with the environment, which could affect the effectiveness of treatment (Jones et al., 1984; Thorslund et al., 2013). Although the majority of the studies were conducted in natural environments, involvement of natural change agents (e.g., caregivers and teachers) was not demonstrated in many studies. Considering that interventions result in more sustainable outcomes when the natural change agents can implement the interventions with fidelity, future studies should strive to involve those who interact with the individuals with intellectual disabilities most often. This would also potentially improve the generalizability of the intervention by incorporating common stimuli (i.e., the natural change agent) at the onset of training. In addition, it is the goal of every applied researcher that evidence-based interventions be widely disseminated; therefore, interventions designed to be used by natural change agents would be a step closer to that goal and would benefit more individuals with disabilities.

Limitations and Future Directions

There are some limitations of this meta-analysis. The first limitation concerns the number and scope of the studies included in the meta-analysis. The number of effect sizes derived from the 31 studies included in this study was insufficient to perform a more in-depth analysis. Although we attempted to review all relevant peer-reviewed studies by searching several electronic databases for related studies, it is possible that additional relevant studies were overlooked. Because unpublished sources such as dissertations were not included in the analysis, there is the possibility of publication bias. Given that dissertations can be a useful proxy measure of publication bias, future researchers may consider evaluating whether the dissertations on the interventions evaluated in the published studies showed similar positive outcomes of the interventions.

We identified a few study characteristics that demonstrated differential magnitude of effects for safety skills outcomes. Nevertheless, there is room for further clarification of the differences. One of the findings was related to intervention dose, which indicated that children in elementary grade levels required longer training to master skills; but again, this could vary depending on the type of intervention. Because many studies did not provide information on the intervention frequency, session length, and intervention duration, we examined the magnitude of effects based on the number of sessions, a proxy measure of intervention duration. This did not allow us to examine the magnitude of effects based on the intensity of the intervention, thus preventing a comprehensive analysis. Although efficacious interventions should be used in teaching safety skills to individuals with disabilities, studies still lack much information on the dose of an intervention, which makes it difficult for practitioners to select an evidence-based intervention that is efficient to implement in supporting individuals with disabilities.

In future studies, full and accurate information should be provided on the intervention intensity and duration to help identify variables affecting the efficacy of intervention. In examining differential magnitude of treatment effects for safety skills outcomes, we did not examine the variations of BST plus other components, which we consider a limitation of the study. As discussed earlier, we found that there were variations within BST plus other components. In some studies, BST with other components (e.g., prompting) was evaluated as a packaged intervention, whereas in other studies, additional components (e.g., IST, prompting) were added to BST when BST alone was not effective. In future research, these variations should be examined separately, instead of simply comparing BST to BST plus other, to better inform safety skills prevention and intervention efforts for individuals with disabilities.

Another potential limitation of this meta-analysis relates to the use of the Tau-U effect size metric in examining the magnitude of effects of safety skills interventions, which should be acknowledged in interpreting the data. It has been argued that Tau-U effect size values are inflated, are not bound between -1 and +1, and cannot be visually graphed, and thus that Tau-U is a weak method of trend control in data, leading to Type I error (Brossart et al., 2018; Tarlow, 2017). However, in calculating the effect sizes from each reviewed study, we combined baseline and intervention phase contrasts within and across individuals by weighting Tau-U effect sizes with their standard errors as suggested by Parker et al. (2011). In addition, we did not control baseline trend in calculating Tau-U values when using the Tau-U calculator (Vannest et al., 2011) given that the baseline performance of safety skills is typically stable. Yet, in future research it might be valuable to use the log response ratio to examine ratio-scaled behavioral outcomes (Pustejovsky, 2018).
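As a rough illustration of what combining phase contrasts by weighting with standard errors can look like, the sketch below assumes inverse-variance weights (1/SE²), a common fixed-effect choice. The function name `weighted_mean_effect`, the truncation of the confidence interval at ±1, and the input values are assumptions made for illustration; the sketch is not the exact procedure of Parker et al. (2011) or of the Tau-U calculator.

```python
import math


def weighted_mean_effect(effects, standard_errors, z=1.96):
    """Combine effect sizes with inverse-variance weights (assumed scheme).

    Returns (weighted mean, lower CI bound, upper CI bound).
    The interval is truncated to [-1, 1] as an illustrative assumption.
    """
    if not effects or len(effects) != len(standard_errors):
        raise ValueError("need one standard error per effect size")
    if any(se <= 0 for se in standard_errors):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total_weight
    se_mean = math.sqrt(1.0 / total_weight)
    lower = max(-1.0, mean - z * se_mean)
    upper = min(1.0, mean + z * se_mean)
    return mean, lower, upper


# Hypothetical Tau-U values and standard errors for three phase contrasts.
mean, lower, upper = weighted_mean_effect([0.80, 1.00, 0.60], [0.10, 0.10, 0.20])
print(round(mean, 2), round(lower, 2), round(upper, 2))
```

Under this scheme, contrasts with smaller standard errors contribute more to the combined estimate, so a single imprecise contrast has limited influence on the aggregate.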

Conclusion

Despite these limitations, this is the first meta-analysis that synthesized studies on safety skills interventions for individuals with ID. This is also the first meta-analytic review that examined treatment dose to determine the optimal treatment duration required for safety skills interventions for individuals with ID and that examined the implementers, comorbid disabilities, and grade levels as potential factors that may be associated with treatment effects. For a future meta-analysis, it would be beneficial to examine the impact of specific comorbid diagnoses as it could provide practical information on which intervention to choose for specific individuals.

Practitioners, educators in particular, commonly attempt to provide some safety skills training to students, such as fire, drug, firearm, and sexual education safety training, by having people of authority (e.g., fireman, nurse, police officer) provide relevant lectures. However, research has shown that information alone is an ineffective training method, particularly when targeting individuals with disabilities (Miltenberger, 2008). No study included in this analysis utilized an informational approach alone. Therefore, there is a need for school personnel or authority figures to utilize more effective, yet efficient, intervention approaches to teaching safety skills, as demonstrated in the reviewed studies. Ideally, following a more active training approach, teachers would conduct a few in situ assessments during the school year, followed by IST if necessary, to promote long-term maintenance effects. This would ensure that the students are exposed to effective training and maintain those trained skills long-term.