Introduction

Almost 146,600 people died because of unintentional injuries in 2015 (Centers for Disease Control; CDC 2015). Specific unintentional injury causes include suffocation, drowning, firearm discharge, pedestrian accidents, accidental stabbings, and motor vehicle traffic accidents. The CDC reported that 489 accidental deaths occurred in 2015 because of firearms, 8313 as a result of motor vehicle accidents, and 6914 as a result of suffocation (CDC 2015). It is estimated that the annual medical cost of unintentional injuries occurring at home is $387,000,000,000 (Runyan and Castell 2004). Some of these injuries and deaths may be preventable if appropriate safety response training is provided.

Interventions based on the principles of behavior analysis have been effective at teaching a variety of potentially lifesaving safety responses. The extant literature is diverse and covers several different types of safety responses including abduction prevention (e.g., Bergstrom et al. 2014; Gunby and Rapp 2014; Sanchez and Miltenberger 2015), fire safety (e.g., Houvouras and Harvey 2014; Knudson et al. 2009), gun safety (e.g., Hanratty et al. 2016; Himle et al. 2004), poison prevention (Dancho et al. 2008; Summers et al. 2011), pedestrian skills (e.g., Horner et al. 1985; Page et al. 1976), help-seeking responses (e.g., Bergstrom et al. 2012), motor vehicle safety (i.e., Himle and Wright 2014), sharp object safety (Winterling et al. 1992), and abuse prevention responses (e.g., Boyle and Lutzker 2005; Egemo-Helm et al. 2007; Haseltine and Miltenberger 1990).

Several authors have reviewed the safety literature; however, these reviews focused on only one or two areas of safety instruction. The areas addressed included abduction prevention (Bevill and Gast 1998; Mechling et al. 2009), personal safety (Bevill and Gast 1998; Lumley et al. 1998), firearm safety (Jostad and Miltenberger 2004), pedestrian safety (Dixon et al. 2010), and fire safety (Mechling et al. 2009). Of these reviews, four focused exclusively on safety instruction with individuals with disabilities (Bevill and Gast 1998; Dixon et al. 2010; Doughty and Kane 2010; Mechling et al. 2009). Most of the reviews focused on dangerous situations (e.g., abduction, abuse, fire); only one review evaluated the effectiveness of procedures to teach responses to a dangerous stimulus (i.e., a firearm; Jostad and Miltenberger 2004).

Several commonalities exist among these reviews. Previous reviews of the safety literature all noted behavioral skills training (BST) with and without in situ training (IST) as an effective strategy to teach safety responses. In addition, previous reviews all noted the effectiveness of safety instruction delivered to young children and individuals with disabilities, indicating safety responses can and should be taught to these populations.

These reviews reveal that the safety literature contains a large variation of features related to safety training. Several examples of these variations include the type of training procedure used, whether some supplemental additions were made to the intervention (e.g., incentives, in situ training, peer models), and how experimenters selected training stimuli and target responses.

Several reviews (Bevill and Gast 1998; Dixon et al. 2010; Doughty and Kane 2010; Mechling et al. 2009) also identified several shortcomings of the safety literature. These limitations include (a) the need for a more systematic description of procedures to allow for replication and (b) the need for increased assessment of generalization and maintenance of treatment gains. These reviews identified persistent shortcomings in the safety literature, which suggests that although the number of safety articles has increased in the last 40 years, previous reviews have not led to a change in research practices. For example, Bevill and Gast (1998) identified a lack of research on dangerous objects and the need for research on generalization and maintenance. Despite the addition of almost a decade's worth of research, subsequent reviews identified the same deficits in the extant literature (Dixon et al. 2010; Doughty and Kane 2010; Mechling et al. 2009).

As much as these reviews reveal, there are still several areas that may require attention. For example, in recent years several studies have evaluated procedures to teach poison prevention (Dancho et al. 2008; Summers et al. 2011; Rossi et al. 2017), motor vehicle safety (Himle and Wright 2014), and what to do when lost in public (e.g., Bergstrom et al. 2012; Carlile et al. 2018). There is a need for a comprehensive review of the literature on safety response training that includes areas of safety instruction that have yet to be evaluated. To date, no review of the literature has been conducted that encompasses the entirety of the safety literature. It is possible that the limited scope of previous reviews contributed to the deficits they identified not being addressed by subsequent empirical studies. A more thorough review of research on safety skills training is necessary to guide research.

It seems appropriate therefore to conduct a systematic review of the safety literature that is not restricted to a type of safety response or specific participant characteristic. The purpose of the present review is to (a) evaluate the extant literature on procedures for teaching safety responses to individuals with and without disabilities, (b) identify gaps in the current research, and (c) propose suggestions for future research.

Method

Search Strategy

We identified empirical evaluations of safety response training through a search of the PsycINFO, ERIC, and PsycARTICLES databases using the keywords safety education, safety skills, safety, and safety instruction. We set modifiers in our search criteria to include only articles that were peer-reviewed, published in English, and of a research article type (i.e., we excluded meta-analyses and literature reviews). No parameters were set for publication date. We also performed a search of the reference sections of each of the obtained articles to identify additional articles that were not identified during our database search. To assess the potential relevance of articles, we looked for key words in the titles and reviewed the abstracts for the relevance of the content.

Articles that met all the following criteria were included: (a) published in a peer-reviewed academic journal; (b) included human consumers as participants; (c) trained a safety response (i.e., some response whose acquisition allows the consumer to avoid or prevent injury); (d) data were collected using direct observation of the primary dependent variable; and (e) reported individual participant behavior-change data. Articles that did not meet one or more of the above criteria were excluded.

The Appendix shows an overview of the literature search process. A total of 2125 original articles were identified through the database searches. A review of the abstracts and titles of each article resulted in 83 articles being retained. The reference sections of these articles were searched, resulting in an additional 28 articles. A cite forward search of each of the original 83 retained articles was conducted using Google Scholar. The cite forward search resulted in 4686 articles; an initial review of the abstracts and titles of each article reduced these to 521 articles, and 25 articles were retained after duplicates were removed. The inclusion and exclusion criteria were applied to a total of 136 articles (83 + 28 + 25), and 82 articles (containing 87 experiments) were retained and included in the final review.

Two independent raters examined each article obtained from the electronic and hand searches using the initial search criteria to determine whether it met the inclusion or exclusion criteria. Interrater agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100. Agreements were defined as both raters indicating an article met all the criteria for inclusion or exclusion. A disagreement was scored if one rater indicated that the inclusion criteria or exclusion criteria had not been met. The raters reviewed the initial 2125 articles and 25 articles retained from the forward citation search and agreed that 82 articles (containing 87 experiments) should be included. Raters agreed on 100% of studies for inclusion and exclusion.

Data Collection

If two or more experiments were included in a single article, each experiment was evaluated separately for the purposes of summarizing our findings. We evaluated each experiment along several dimensions.

Consumer and Setting Characteristics

Data were collected on the age, sex, and clinical diagnosis (if applicable) of the participants, as well as the settings in which the safety responses were taught. Participants were grouped according to the age groups used by the CDC Web-based Injury Statistics Query and Reporting System (2016). The ages of the participants were coded as infants (younger than 1), toddlers (1–4), early adolescents (5–9), late adolescents (10–14), teenagers (15–19), or adults (20 and older). The training setting was coded as either natural (i.e., setting in which the safety response should occur, such as a home, motor vehicle, or community location) or clinical (i.e., some artificially constructed setting, such as creating a simulated supermarket or kitchen area).
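The age coding described above amounts to a simple lookup on age in years. The following is a minimal sketch of that scheme, not the procedure the raters used; the function name `code_age_group` and the handling of fractional ages are assumptions made for illustration.

```python
def code_age_group(age_years: float) -> str:
    """Return the age-group label for a participant's age in years.

    Boundaries follow the coding scheme described in the text. The function
    name and the treatment of fractional ages (e.g., 4.5 counts as a toddler)
    are illustrative assumptions.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years < 1:
        return "infant"            # younger than 1
    if age_years < 5:
        return "toddler"           # 1-4
    if age_years < 10:
        return "early adolescent"  # 5-9
    if age_years < 15:
        return "late adolescent"   # 10-14
    if age_years < 20:
        return "teenager"          # 15-19
    return "adult"                 # 20 and older
```

Under this sketch, a 7-year-old participant would be coded as an early adolescent and a 20-year-old as an adult.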

Dependent Variables

Data were collected on the dependent variables in each experiment along several dimensions, including the type of safety response taught (i.e., abduction prevention, abuse prevention, emergency response, fire prevention, firearm safety, help-seeking responses in response to becoming lost in public, pedestrian safety, poison prevention, motor vehicle safety, sharp object safety, and suffocation), and whether the experiment evaluated the degree to which participants demonstrated differentiated responding across safety categories (i.e., responding differently and appropriately to non-dangerous and dangerous stimuli). We evaluated whether differential responding was assessed to determine the extent to which studies ensured that safety responses were under appropriate stimulus control.

Independent Variables

Data were collected on the type of training procedure used in each experiment. The training methods were coded as: behavioral skills training (BST, i.e., treatment package of instruction, modeling, role play, and feedback), in situ training (IST, i.e., in vivo role play with positive and corrective feedback), prompting procedure (i.e., some supplemental stimulus provided to increase the likelihood of a correct response), video modeling (VM, i.e., audio-visual demonstration of a response the consumer should complete), manualized treatment (i.e., a commercially available manualized treatment), discrete trial training (DTT), virtual reality (VR, i.e., computer generated simulation of a three dimensional situation), and putative reinforcer (PR, i.e., supplemental preferred stimulus was needed to generate correct responding). Data were also collected on the training agents in each experiment (i.e., clinician, caregiver, peer, or trained specialist). A clinician was defined as an individual, not a caregiver, working with the participant in an instructional or research role. A caregiver was defined as an individual responsible for the participant’s daily care in the home. A peer was defined as an individual around the participant’s age. A trained specialist was defined as an individual who required a specific certification to provide training. It is important to know how often parents and educators are serving as the training agents because the ultimate goal should be for children to receive safety education at school or in the home.

Outcomes

Data were collected on the outcomes of each experiment in terms of participant results. Outcomes were categorized as positive (i.e., all participants demonstrated the safety response at the mastery criterion), mixed (i.e., a subset of participants demonstrated safety responses at the mastery criterion), or negative (i.e., no participants demonstrated the safety response at the mastery criterion). Our categorization was based on the mastery criterion specified by each experiment or a mastery criterion of 100% if no criterion was explicitly stated. The 100% mastery criterion was chosen because incorrect completion of a safety response could lead to injury or death. Data were also collected on whether training duration data were reported and the reported times.
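The outcome categories can be illustrated with a short sketch. It assumes, purely for illustration, that each participant's performance is summarized as a single percentage compared against the mastery criterion; the experiments reviewed defined mastery in various ways, so the function name `categorize_outcome` and this input format are hypothetical.

```python
def categorize_outcome(scores, criterion=100.0):
    """Classify an experiment's outcome as positive, mixed, or negative.

    scores: one summary percentage per participant (hypothetical input format).
    criterion: mastery criterion in percent; defaults to 100, the value
        applied in the review when a study stated no criterion.
    """
    if not scores:
        raise ValueError("at least one participant score is required")
    # Count participants who met or exceeded the mastery criterion.
    met = sum(1 for s in scores if s >= criterion)
    if met == len(scores):
        return "positive"  # all participants reached mastery
    if met == 0:
        return "negative"  # no participant reached mastery
    return "mixed"         # only a subset reached mastery
```

For example, `categorize_outcome([100, 100, 75])` returns `"mixed"` under the default criterion, whereas the same scores with a study-specified criterion of 75 would be coded as `"positive"`.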

Generalization and maintenance

Previous reviews (Bevill and Gast 1998; Dixon et al. 2010; Doughty and Kane 2010; Mechling et al. 2009) called for an increase in generalization and maintenance data. Data were therefore collected on the use of the nine types of generalization technologies outlined by Stokes and Baer (1977). Studies were coded for which technology of generalization was used, whether a subsequent assessment of generalization followed, and the outcome (outcomes of generalization tests were categorized as positive, mixed, or negative). Data were also collected on whether studies reported information on the number of exemplars of dangerous stimuli used during training. Studies were coded on the type of procedure used to select their training stimuli and/or responses. Studies were coded as using stakeholder opinion (i.e., opinion of the clinicians, caregivers, or participant), expert opinion (i.e., experimenter identified experts on the safety response), or a general case analysis (i.e., see procedures outlined in Horner et al. 1984). Data were also collected on whether a measure of maintenance of treatment gains was reported and the length of the maintenance follow-up period. The length of the maintenance follow-up period was coded as 1–4 weeks, 5–48 weeks, or over one year. Experiments were also coded for whether they included a single follow-up probe or multiple follow-up probes for each participant. The number of probes was evaluated to determine whether studies were increasing the amount of maintenance data they collected as well as the length of the maintenance period.

Social validity

Data were collected on the type of social validity assessed (i.e., goals, procedures, and outcomes). Social validity results were defined as positive (i.e., all respondents indicated satisfaction with the goals, procedures, and/or outcomes of the experiment), mixed (i.e., only a portion of the respondents indicated satisfaction with the goals, procedures, and/or outcomes of the studies), and negative (i.e., none of the respondents indicated satisfaction with the goals, outcomes, and/or procedures of the experiment).

Interobserver Agreement

An independent rater examined 54.5% (range 42.9–100% within category) of the studies that met the inclusion criteria. Studies were chosen at random from each category of safety response (e.g., abduction prevention, firearm safety, sharp object safety). To train raters for data collection, we provided them with written instructions that described each dimension and provided a definition. The authors reviewed these definitions with the raters and then practiced coding two articles from different safety categories. After the practice, the raters were given an opportunity to code an article independently. Training was complete when the experimenter and rater obtained 100% agreement on two consecutive studies. Each rater independently completed data tables by evaluating each experiment along the dimensions outlined above (e.g., age, independent variable, outcome). The raters assessed item-by-item agreement by comparing each item in the data tables. An agreement was defined as both raters coding a specific dimension of an article with identical information. For example, both raters coded an article as including three toddlers as participants. A disagreement was defined as one rater’s coding for a specific dimension of an article differing from that of the other rater. For example, one rater coded an article as including BST as the independent variable and the second rater coded the independent variable as including BST and IST. Interobserver agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and converting the resulting quotient into a percentage. Mean IOA was 94.3% (range 88–100%) across studies.
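The item-by-item agreement calculation can be sketched as follows. This is an illustrative implementation of the formula stated above (agreements divided by agreements plus disagreements, expressed as a percentage); the function name `interobserver_agreement` and the list-based representation of a rater's data table are assumptions, and the example codes are invented rather than taken from the review.

```python
def interobserver_agreement(rater_a, rater_b):
    """Item-by-item agreement between two raters, as a percentage.

    rater_a, rater_b: equal-length sequences holding each rater's code for
    the same dimensions of the same article (hypothetical representation).
    """
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must code the same number of items")
    if not rater_a:
        raise ValueError("at least one coded item is required")
    # An agreement is an item both raters coded with identical information.
    agreements = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    disagreements = len(rater_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Invented example: the raters differ only on the independent variable.
rater_1 = ["toddler", "BST", "positive", "natural"]
rater_2 = ["toddler", "BST+IST", "positive", "natural"]
print(interobserver_agreement(rater_1, rater_2))  # 75.0
```

In this invented example the raters agree on three of four items, giving 75% agreement for the article; averaging such values across articles would yield a mean IOA of the kind reported above.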

Results and Discussion

The purpose of the present review was to evaluate the literature on teaching safety responses to individuals with and without disabilities. In reviewing the extant literature on safety responses, several trends emerged. An overall summary of the dimensions across all safety categories is given in Table 1.

Table 1 Percentages for dimensions across all safety response categories

Consumer and Setting Characteristics

In the current review, participants were most often early adolescents (25.9%) and adults (20.9%). Toddlers (16.3%), late adolescents (10.4%), and teenagers (8.0%) were represented, though not as often. Males (43.7%) served as participants more often than females (34.5%), and some studies did not report participant sex (21.8%). The majority of the research on safety response training has been conducted with individuals with some type of diagnosis (neurotypical = 19.7%; disability = 53.2%). Participant diagnoses were highly varied across categories, although more consistent within each category. The two populations most often included in research were consumers diagnosed with autism spectrum disorder (ASD; 15.7%) and developmental disability (DD; 25.3%). Investigations were also conducted with consumers diagnosed with intellectual disability (6.1%), attention-deficit hyperactivity disorder (ADHD; 0.9%), fetal alcohol syndrome (FAS; 1.2%), Down syndrome (2.3%), and cerebral palsy (0.5%), as well as consumers described as severely handicapped (1.2%) or speech and language delayed (1.2%). Several investigations were conducted with individuals who were blind (1.2%).

Dependent Variables

The percentage of studies that evaluated each safety response category is given in Table 2. In the current review, experiments most frequently addressed safety responses related to fire safety (21.8%) and abduction prevention (19.5%). A moderate number of experiments taught responses related to abuse prevention (10.3%), firearm safety (10.3%), seeking help when lost (11.5%), and pedestrian safety (13.8%). Only a few experiments addressed safety related to poison prevention (5.7%), motor vehicle safety (1.1%), sharp object safety (1.1%), emergency responses (3.4%), and suffocation prevention (1.1%).

Table 2 Percentage of studies that evaluated each safety response category

Independent Variables and Outcomes

There are a variety of effective interventions for teaching safety responses. Although some of these interventions have several studies speaking about their effectiveness, others require additional research. A broad discussion of each intervention with regard to its effectiveness is provided below.

BST and IST

BST in isolation or in combination with other training methods continues to be the most frequently used method to teach safety responses (52.9%, e.g., Dancho et al. 2008; Egemo-Helm et al. 2007; Himle and Wright 2014; Miltenberger and Thiesse-Duffy 1988). When used as the sole training method to teach safety responses, BST is not consistently effective for all participants. Of the 19 experiments that used BST in isolation, just under half obtained positive results (47.4%, i.e., intervention was effective for all participants in the experiment); the remainder obtained mixed results (52.6%, i.e., intervention was effective for only a subset of participants). The inclusion of IST appears to increase the effectiveness of BST. An evaluation of the 24 experiments that used a combination of BST plus IST indicates the majority (83.3%) obtained positive results and only a subset obtained mixed results (16.7%). There are several elements of these interventions practitioners should take into consideration when teaching safety responses. First, results of the current review suggest that the inclusion of IST may increase the likelihood that participants will acquire the target safety responses. Practitioners, therefore, should consider including IST as a component of safety training from the onset. Second, although an effective training strategy, individual BST may have limited value when instructional time is limited and large groups of consumers need to receive the training at the same time. Although several studies have evaluated BST in a group format (Gatheridge et al. 2004; Hardy 2002; Himle et al. 2004; Miltenberger et al. 2009), they did not report individual participant data and were outside the scope of this analysis.

Prompting Procedures

In the current review, 27 experiments (e.g., Batu et al. 2004; Brown and Gillard 2009; Hoch et al. 2009; Summers et al. 2011) used prompting procedures either in isolation (74.1%) or in combination with other training methods (25.9%, i.e., VM, IST, and DTT). The majority of experiments that used prompting procedures obtained positive results (74.1%), while a smaller number of studies obtained mixed (25.9%) or negative results (3.7%). These results suggest that prompting procedures are an effective training method for teaching a variety of safety responses. Prompting procedures have several benefits as a method of safety response training. First, a practitioner whose client has a limited imitative repertoire may use a series of physical, verbal, or gestural prompts to teach a safety response when other training methods that involve modeling (e.g., BST, IST, and VM) are not an option. Second, when working in settings with limited monetary resources, prompting procedures are a low-tech option that does not require the use of video recording technology or the purchase of expensive equipment.

Video Modeling

Several studies used video modeling (VM; 16.1%) to teach a variety of safety responses including abduction prevention (Akmanoglu and Tekin-Iftar 2011; Beck and Miltenberger 2009), fire safety (Barone et al. 1986; Mechling et al. 2009), crossing the street (Spivey and Mechling 2016; Steinborn and Knapp 1982), and seeking help when lost (Carlile et al. 2018; Purrazzella and Mechling 2013). The majority of these studies obtained positive results (80%), while the remainder obtained mixed results (20%). Purrazzella and Mechling (2013) evaluated the use of VM to teach three adults with DD to text message pictures of their location to a caregiver when lost. VM was effective in teaching all three participants the safety response, and probes in the natural environment indicated generalized responding. Although VM may require some initial time and monetary resources, once created a video model may be an attractive option for group training as the video can be reused across consumers or groups. Additional research is needed to determine what components (e.g., voice-over narration, on-screen text, examples and non-examples) should be incorporated into a video model to ensure consumers acquire the target responses without additional interventions. For example, future studies could evaluate whether the inclusion of non-examples of target behaviors in VM alters its effectiveness.

Supplemental Modifications

As with other educational practices, not all consumers respond the same way to interventions. Studies have found that supplemental modifications to the primary intervention are sometimes needed to establish appropriate stimulus control. Several studies (6.9%; Hanratty et al. 2016; Miltenberger et al. 2004; Pan-Skadden et al. 2009) found that some consumers required a contrived putative reinforcer (i.e., an artificial reinforcer delivered after the correct emission of the response) to demonstrate the safety response at mastery levels. This is of concern because this type of contrived reinforcement most likely will not be present in the natural environment. We recommend that a safety response not be considered mastered until it is demonstrated in the absence of a contrived putative reinforcer. To achieve this, we recommend identifying natural reinforcing contingencies that reliably occur in the target environment. If the contingency identified does not serve as a reinforcer, we suggest a pairing procedure be used in an attempt to establish that natural contingency as a reinforcer (Dozier et al. 2012). Future research might evaluate procedures for including these pairing procedures into the training packages of consumers who do not demonstrate the safety response under natural contingencies. Additionally, future research should evaluate modifications to current training procedures. For example, some consumers may not have the vocal verbal behavior to engage in a “tell” response. Some supplementary response may need to be incorporated to give these consumers a means of notifying a caregiver of a potential threat or violation of their rights. For some consumers, additional modifications may need to be made to individual prompt, prompt-fading, and reinforcement procedures to make broad interventions such as BST effective.

Interventions with Limited Research

Virtual reality. Our findings identified several training methods that could benefit from additional evaluation. One technology that has the potential for teaching safety responses is virtual reality (VR) technology. Only one experiment in the current review used VR technology (1.1%; Padgett et al. 2006). With VR technology, consumers can be taught responses in a safe clinical environment prior to instruction or assessment in the natural environment. For example, if teaching a response related to fire safety, VR can be used to simulate some of the stimuli associated with a fire, including olfactory, auditory, and visual stimuli (e.g., flames, sound of fire alarm, obstacles, smoky view) without putting the consumer in a dangerous situation. Padgett et al. (2006) used VR technology to teach participants to complete the responses needed to exit a home during a fire. All participants reached 100% mastery of the safety responses during VR training and subsequently the responses generalized to an in vivo simulation. Future research should extend the use of VR technology to teaching other safety responses such as crossing the street and poison prevention. VR technology may provide a more realistic simulation of potential dangers than could otherwise be created by the experimenter. VR has the potential to be effective at creating an analog environment whose realistic stimuli can establish strong stimulus control that is more likely to produce responses with generality.

Manualized treatment. A second technology that has been underevaluated in the current literature is manualized treatment. Despite Lumley and Miltenberger (1997) identifying the need for empirical evaluation of widely marketed and manualized programs, only four experiments (4.5%) have used a commercially available manual to teach a safety response (Barone et al. 1986; Kim 2016; Miltenberger and Thiesse-Duffy 1988; Miltenberger et al. 1990) and only one experiment (Miltenberger et al. 1990) evaluated the effect of manualized instruction independently from other interventions (e.g., VM, IST). When used in combination with IST and VM to teach an abuse prevention response (Kim 2016; Miltenberger and Thiesse-Duffy 1988) or a fire safety response (Barone et al. 1986), all experiments obtained positive results. When used in isolation to teach an abuse prevention response, results were mixed when the intervention was implemented by a specialist and negative when it was implemented by a parent (Miltenberger et al. 1990). Although outside the scope of our review, several group studies have evaluated the effectiveness of the Eddie Eagle Gunsafe® Program (2015). The National Rifle Association (2017) reports that the Eddie Eagle Gunsafe® Program has been used to teach 29 million children to stay safe should they find a gun. Despite this claim, studies evaluating this program (Gatheridge et al. 2004; Himle et al. 2004) have found it largely ineffective at preventing actual gunplay. Despite its lack of attention in the safety literature and the ineffective results obtained by previous studies (Gatheridge et al. 2004; Himle et al. 2004), manualized treatment has been found to be effective in decreasing behaviors associated with several disorders including obsessive compulsive disorder (Stimpfel et al. 2016), depression (Pasterfield et al. 2014), and anxiety (Shorey and Stuart 2012).
Manualized treatment may be an attractive option as the treatment is standardized and the potential for instructor error is minimized (Eifert et al. 1997). Additionally, the standardized nature of manualized treatments provides researchers with the opportunity to conduct independent replications and subsequently to explore the external validity of the procedure. Manualized treatment, however, cannot plan for potential consumer-specific needs; research is needed to determine the necessary components that should be included in manualized approaches to safety response training and the merits of this approach over others.

Equivalence-based instruction. While there are several interventions that have not been evaluated in the safety training literature (e.g., shaping, video feedback, video prompting), there is one technology in particular that requires some discussion because of its potential efficiency. Equivalence-based instruction (EBI) has been used to teach a variety of responses, including neuroanatomy concepts (Fienup et al. 2016), portion-size estimation (Trucil et al. 2015), and statistics (e.g., Albright et al. 2015). During EBI, consumers are taught to respond to physically dissimilar stimuli as if they were the same (Fields and Reeve 2001). This results in the development of an equivalence class. Although the members of the equivalence class do not share physical similarity, they can occasion the same response (Fields and Reeve 2001). Related to safety response instruction, EBI could be used to form a dangerous class, which would contain members whose presence should evoke the same safety response. After training with one member occurs, stimulus function transfers to the other class members; this class expansion in turn makes it possible to create larger classes. EBI may allow the establishment of a safety response to physically different dangers, eliminating the need to conduct training specific to each danger. This means that training would not need to be conducted for each new dangerous stimulus; new stimuli could simply be added as class members. Research is needed to determine the feasibility of using this procedure with young consumers and its effectiveness in teaching responses to dangerous stimuli.

Generalization and Maintenance

Teaching responses so that they occur in untrained stimulus conditions is a key component of instruction based on the principles of ABA (Baer et al. 1968). A response with generality is more likely to maintain in the natural environment and to occur under appropriate stimulus control. In the case of safety responses, this should be an essential component, as a correctly demonstrated response may prevent injury or death. Stokes and Baer (1977) outline procedures for increasing the likelihood of generalized responding. These procedures may prove ineffective if the target stimuli and responses are not selected systematically. While several studies in the current review included procedures to ensure appropriate stimulus control (e.g., discrimination training) was established over the safety response, only one experiment (Lee et al. 2019) directly assessed whether appropriate control was established. Future studies should include discrimination training as a component of their intervention. For example, if teaching a consumer to report potential poison hazards, it would be beneficial to teach both a response to emit in the presence of a danger and what response should be emitted if the consumer comes across a similar innocuous container.

Of those studies that reported their selection method, most relied on expert (6.8%) or stakeholder (11.4%) opinion when selecting target responses and stimuli. While we are not discounting the social validity of these selection methods, they lack empirical support. There are areas of the generalization literature, specifically general case programming (Horner et al. 1982), that are underused and could be useful to practitioners and researchers alike as empirical methods for selecting stimuli and responses. For example, in programming for generalization, many studies in this review used multiple exemplar training. Multiple exemplar training on its own may not result in consumers attending to the relevant features of dangerous stimuli. We therefore recommend that general case programming or, at the least, careful consideration be used to ensure irrelevant or non-critical features of stimuli do not come to control the safety response. Only one experiment (2.3%) in the present review used general case programming to select stimuli and target responses.

The general case model, developed by Horner et al. (1982), provides guidelines for careful selection of stimuli and responses for inclusion during training. In general case programming, stimuli are selected to represent all the relevant stimulus features and irrelevant features that may be shared by one or more stimuli. These relevant and irrelevant stimuli are incorporated into training to teach the consumer to respond only in the presence of the relevant stimuli and to disregard irrelevant stimuli. Many interventions fail to produce generalized responding because their training stimuli are not representative of the wide range of stimulus conditions in the natural environment (Horner et al. 1982). General case programming provides a technology for selecting representative training stimuli and bringing target responses under appropriate stimulus control.

While we generally advocate for the use of the general case analysis, the nature of safety responses may make it even more essential. Take, for example, the potential range of stimuli present when a child encounters a poison (e.g., prescription drugs). These drugs come in a variety of forms (e.g., liquid vs. pill), shapes, and colors, and they may be left out on a table, in a pill case, or in a plastic bag. A situation may arise where a parent asks the child to get something from a cabinet where the pills are located, or the child may find them on his or her own. A parent may take a pill in front of their child, and this model could later evoke a similar response from the child. These natural variations in the dangerous stimuli and the environmental conditions make it essential that the safety response be under the control of all relevant stimulus conditions. While the general case analysis is a useful empirical method for selecting stimuli and responses, researchers should also evaluate how to arrange control over different environmental conditions that will influence behavior in the desired direction (Johnston 1979).

Furthermore, while the majority (69.7%) of studies did include some measure of generalization, there is a need for additional research evaluating long-term maintenance of treatment outcomes. Sixty-three studies (82.9%) included a measure of maintenance. The length of the maintenance period was predominantly 1 to 4 weeks (60.4%). Only one experiment (Bannerman et al. 1991; 2.2%) included a measure of maintenance longer than a year. Future studies should include probes several years after treatment implementation to determine whether additional procedural modifications are needed to ensure participants maintain safety responses long term. Identifying effective procedures to teach safety responses is of limited value if the likelihood that those procedures will produce long-term maintenance is unknown.

Individual Safety Categories

The following is a discussion of each safety category in terms of important considerations and gaps that exist in the extant literature.

Abduction Prevention

We identified 17 studies (19.5%) that described interventions for teaching abduction prevention responses (Akmanoglu and Tenkin-Iftar 2011; Beck and Miltenberger 2009; Bergstrom et al. 2014; Collins et al. 1992; Fisher et al. 2013; Gast et al. 1993; Godish et al. 2017; Gunby and Rapp 2014; Gunby et al. 2010; Holcombe et al. 1995; Johnson et al. 2005; Ledbetter-Cho et al. 2016; Sanchez and Miltenberger 2015; Summers et al. 2011; Tarasenko et al. 2010; Vanslow and Hanley 2014; Watson et al. 1992). These studies are summarized in Tables 3 and 4. Abduction prevention training teaches an individual what to do if approached by a stranger. The same basic three-part response was taught in all studies in this category: Say “no,” leave the area, and tell a familiar adult. All studies taught participants to emit the safety behaviors in response to four basic lure types: assistance, authority, incentive, and simple (for detailed descriptions of each, see Gunby et al. 2010).

Table 3 An overview of safety studies that address abduction prevention responses
Table 4 An overview of generalization, maintenance, and social validity in studies that addressed an abduction prevention response

The extant literature in this area focused predominantly on toddlers (40.0%) and early adolescents (33.6%). Although abduction prevention should be taught to consumers of all ages, most abduction victims are over the age of 12 (NISMART 2002). In the abduction literature, participants aged 12–19 were underrepresented (8.0%). It follows that additional research with consumers in this age range is needed. Although intervention with younger children may function as a preventative measure, future research should evaluate whether demonstration of an appropriate safety response is affected by the learning histories possessed by older children. The majority of studies were conducted with participants who were neurotypical; this is not surprising, as the majority of abducted individuals are neurotypical. However, over 7000 individuals under the age of 21 who were abducted in 2014 were diagnosed with some type of disability (National Crime Information Center 2014). Further research with individuals with disabilities is warranted to determine whether current strategies are effective or whether supplementary training may be needed (Dixon et al. 2010). Furthermore, all of the studies in this category taught participants to emit the safety response in the presence of unknown adults; however, the majority of abductions are perpetrated by individuals known to the abductee (National Crime Information Center 2014). In terms of stimulus control, it is not strangers that are dangerous; the danger is the behavior of anyone, stranger or known adult, who attempts to take the consumer without permission. Future research should consider teaching consumers to engage in the safety response when a known adult who has not been authorized to take them attempts to lure them away.

In terms of generalization, none of the studies in this category used a general case analysis when selecting their training stimuli and responses. A primary consideration when teaching an abduction prevention response should be the selection of teaching stimuli. The stimuli used during teaching should include both relevant and irrelevant stimuli. For example, researchers could consider incorporating situations in which a confederate stranger asks the child for directions or some other type of innocuous question. We identified only one experiment (Bergstrom et al. 2014) that ensured appropriate stimulus control over the abduction prevention response by teaching participants to differentiate between strangers, friends, and family.

Abuse Prevention

We identified nine studies (10.3%) that targeted a safety response related to abuse prevention (Boyle and Lutzker 2005; Egemo-Helm et al. 2007; Haseltine and Miltenberger 1990; Kim 2016; Lumley et al. 1998; Miltenberger and Thiesse-Duffy 1988; Miltenberger et al. 1999; Poche et al. 1981). A summary of results can be found in Tables 5 and 6. Abuse prevention responses are similar to the three-step response taught in the abduction literature with the exception that participants are taught to emit the safety response when someone attempts to touch them inappropriately. The majority of the studies in this category were conducted with early adolescents (28.8%) and adult consumers (36.4%); however, because consumers of any age may be at risk of sexual or physical abuse (Black et al. 2011; Finkelhor et al. 1990), research with consumers younger than five should be conducted. Furthermore, of those studies that reported participant diagnosis, the majority (36.4%) were conducted with participants with developmental disabilities; participants reported as neurotypical were underrepresented (4.5%). More importantly, we did not identify any studies that taught a prevention response related to online threats. While many Internet responsibility organizations suggest the use of parental control software, such software addresses the behavior of a caregiver, not that of a potential victim (National Center for Missing and Exploited Children 2017). As of 2012, about 95% of children 12–17 years old were online, and one in five teenagers who regularly log onto the Internet reported receiving an unwanted sexual solicitation, but only 25% of those notified a caregiver (Pew Research Center 2016). Wolak et al. (2008) reported that 1 in 25 individuals aged 10–17 years received an online sexual solicitation in which the solicitor tried to make contact offline. Given the research supporting the susceptibility of the teenage population to sexual abuse in an online format, research into safety responses related to this format is warranted.

Table 5 An overview of studies that address abuse prevention responses
Table 6 An overview of generalization, maintenance, and social validity in studies that addressed an abuse prevention response

Fire Safety

We identified 19 studies (21.8%) that taught a safety response related to fire prevention (Bannerman et al. 1991; Barone et al. 1986; Bigelow et al. 1993; Cohen 1984; Garcia et al. 2016; Haney and Jones 1982; Houvouras and Harvey 2014; Jones et al. 1981a, b; Katz and Singh 1986; Knudson et al. 2009; Luiselli 1984; Mechling et al. 2009; Padgett et al. 2006; Rossi et al. 2017; Tiong et al. 1992; Vanslow and Hanley 2014). A summary of these results can be found in Tables 7 and 8. The topography of the safety response varied more widely in this category than in the others. Safety responses taught in this category included responding to a fire alarm or vocal alert to a fire (63.2%), responding to a fire-related stimulus (21.1%), reporting a fire (5.2%), and extinguishing a fire (10.5%). Given the variety in the types of responses taught, a brief description of each is warranted. When responding to an alarm was taught as a fire safety response, the response typically consisted of attending to the fire-related stimulus, leaving the area, and going to a designated meeting area (e.g., Bannerman et al. 1991; Bigelow et al. 1993; Jones et al. 1981a, b; Knudson et al. 2009). When participants were taught to respond to the presence of a fire-related stimulus such as a lighter or fire-starting device, the target response consisted of not touching the item, leaving the area, and reporting the item to an adult (Houvouras and Harvey 2014; Rossi et al. 2017; Vanslow and Hanley 2014). Only two studies taught a response related to extinguishing a fire. Mechling et al. (2009) taught three extinguishing responses (i.e., extinguish with a lid, extinguish with flour, and extinguish with fire extinguisher) that could be used to put out 16 different possible fires (e.g., fire in a pan, in an oven, and in a metal fire pit). Katz and Singh (1986) taught participants to respond to a fire alarm, extinguish a fire, and report a fire via 911.

Table 7 An overview of studies that address fire safety responses
Table 8 An overview of generalization, maintenance, and social validity in studies that addressed a fire safety response

The use of general case programming may aid researchers in selecting target responses that are both functional and appropriate for consumers’ skill repertoires. For example, Bigelow et al. (1993) taught participants to leave the area and go outside when the word “fire” was presented. This may lead to generalization errors in the natural environment, as the vocal stimulus “fire” may not be present in all fire situations, nor does it always signal danger. Teaching a consumer to respond to the sound of a fire alarm may prove more functional. If the skills of the consumer are unknown, we recommend conducting some type of functional living skills assessment such as the Assessment of Functional Living Skills (AFLS; Partington and Mueller 2012). This should provide information helpful in selecting target responses.

Clinicians planning to teach responses related to fire injury prevention should consult professional guidelines prior to selecting a safety response to ensure the response is appropriate. Safety responses taught in earlier published articles may no longer align with safe practices. For example, one of the extinguishing techniques Mechling et al. (2009) taught participants to use was flour. Current recommendations advise against the use of flour as it is flammable, and if applied to an open flame in insufficient quantities, it could cause an explosion (National Fire Protection Association 2018). It is therefore essential to consult expert opinion when selecting safety responses.

Firearm Safety

We identified nine studies (10.3%) that taught a firearm safety response (Gross et al. 2007; Himle et al. 2004; Jostad et al. 2008; Lee et al. 2019; Morgan and Miltenberger 2017; Miltenberger et al. 2004, 2005; Rossi et al. 2017). A summary of these results can be found in Tables 9 and 10. The safety response taught in this category was the same across all studies (100%). The response consisted of the consumer not touching the firearm, leaving the area in which it was encountered, and telling an adult. In evaluating this body of literature, we identified that participants targeted for intervention were all (100%) under the age of 15. Early adolescents (37.5%) made up the largest group of participants in the investigations. However, the majority of deaths that occurred as a result of unintentional discharge of a firearm occurred in individuals ages 15–24 (CDC 2014). It should be noted that exact ages were not provided for many participants (41.7%), making evaluation of this dimension difficult. The disparity between the focus of the extant literature and national statistics indicates a need that is twofold. First, additional research specific to the circumstances surrounding deaths of individuals over the age of 15 is warranted. It is possible that the standard three-step response is not sufficient for consumers of this age, or that other environmental factors (e.g., social influence) may need to be addressed as well. The second need is for more widespread dissemination of effective interventions in the under-15 population, as this may reduce the number of deaths later in life. If effective firearm safety interventions are implemented in school, children may acquire the necessary safety responses at a young age and avoid unintentional firearm deaths later in life.

Table 9 An overview of studies that address firearm safety responses
Table 10 An overview of generalization, maintenance, and social validity in studies that addressed a firearm prevention response

Help-Seeking Responses

We identified 10 studies (11.5%) that evaluated procedures for teaching a help-seeking behavior when lost in public (Bassette et al. 2018; Bergstrom et al. 2012; Carlile et al. 2018; Hoch et al. 2009; McDowell et al. 2017; Pan-Skadden et al. 2009; Purrazzella and Mechling 2013; Taber et al. 2002; Taber et al. 2003; Taylor et al. 2004). A summary of these results can be found in Tables 11 and 12. Our evaluation of this safety category determined that toddlers and early adolescents are underrepresented in the extant literature (17.4%). Investigating effective strategies with these age groups may provide an avenue for future research. The majority of studies were conducted with consumers with ASD, a warranted focus as some consumers with this diagnosis demonstrate elopement behavior and are at risk of becoming lost (Lehardy et al. 2013). All studies in this category taught a help-seeking response, but the topography of that response varied. Several studies taught participants to seek help using a device (e.g., pager, iPhone 4; Purrazzella and Mechling 2013; Taber et al. 2003; Taylor et al. 2004), while other studies taught participants to exchange an identification card to ask for help (Bergstrom et al. 2012; Hoch et al. 2009; Pan-Skadden et al. 2009). We also suggest additional research evaluating procedures for teaching a consumer to discriminate when he or she is lost. In the extant literature, when teaching a help-seeking response, most researchers (e.g., Taber et al. 2003) included procedures to assess whether each consumer could identify when they were lost. For those consumers who demonstrated this skill, a consumer-initiated response, such as approaching a store clerk for assistance or texting a picture of their location, was taught. For consumers who did not demonstrate this skill, a response initiated by a caregiver was used, whereby the consumer received a page signaling them to seek assistance or a call was placed to their mobile device and they were given instructions. These caregiver-initiated responses, while effective, may not always be feasible: a mobile device might lose power or break, or the consumer’s absence may not be immediately noticed. Consumers who have a repertoire of both caregiver-initiated and self-initiated responses may be able to seek help under a wider variety of environmental arrangements.

Table 11 An overview of studies that addressed help-seeking responses
Table 12 An overview of generalization, maintenance, and social validity in studies that addressed a help-seeking response

Pedestrian Safety

Twelve studies (13.8%) taught a safety response related to pedestrian safety (Batu et al. 2004; Branham et al. 1999; Blew et al. 1985; Brown and Gillard 2009; Collins et al. 1993; Harriage et al. 2016; Horner et al. 1985; Michie et al. 2009; Page et al. 1976; Spivey and Mechling 2016; Stienborn and Knapp 1982; Wright and Wolery 2014). A summary of these results can be found in Tables 13 and 14. In almost all studies (91.6%) in this category, the safety response taught was crossing the street. Although the topography of the streets and crossings differed widely across studies, in general, participants were taught to look for oncoming traffic, attend to relevant stimuli (e.g., traffic lights, crossing signals), and make their way to the other side of the roadway. Only one experiment addressed a pedestrian safety response other than street crossing. Spivey and Mechling (2016) taught three social safety skills in their evaluation. Participants were taught to respond to a stranger asking for personal information, asking for money, or invading their personal space. There is a need for additional research on safe community behavior such as handling of money in public, responding to invasions of personal space, and avoiding individuals displaying dangerous or suspicious behavior. For individuals with disabilities, it is especially important that safety repertoires include responses to situations that may pose a threat to their personal rights.

Table 13 An overview of studies that taught pedestrian safety responses
Table 14 An overview of generalization, maintenance, and social validity in studies that addressed a pedestrian safety response

Areas of Limited Research

Poison Prevention

We identified five studies (5.7%) that taught a safety response related to poison prevention (Collins and Stinson 1994; Dancho et al. 2008; King and Miltenberger 2017; Rossi et al. 2017; Summers et al. 2011). A summary of these results can be found in Tables 15 and 16. Although the literature in this area only made up a small percentage of the studies we evaluated, accidental poisoning is one of the top 10 leading causes of death in individuals under the age of 24. The CDC (2014) data suggest these deaths are caused most frequently by the ingestion of narcotics or exposure to gas (e.g., carbon monoxide). While only preventative measures can be used to reduce instances of the latter, we have identified procedures that may be effective in teaching responses that could prevent the former. Such procedures include BST both in isolation (40%; Rossi et al. 2017) and in combination with IST (20%; Dancho et al. 2008), prompting procedures (40%; Collins and Stinson 1994; Summers et al. 2011) and discrete trial instruction (20%; Collins and Stinson 1994).

Table 15 An overview of studies that address poison prevention responses
Table 16 An overview of generalization, maintenance, and social validity in studies that addressed a poison prevention response

Motor Vehicle Safety

We identified one experiment (1.1%) that taught a safety response related to motor vehicle safety (Himle and Wright 2014). A summary of these results can be found in Tables 17 and 18. Himle and Wright (2014) evaluated the use of BST to teach two teenagers (20%) and eight adults (80%) to correctly install car seats. The diagnoses of the participants were not specified. A trained specialist, certified to provide training on the installation and use of car seats, provided BST. The results of the experiment were positive (i.e., all participants learned to install the car seat in the rear-facing position).

Table 17 An overview of studies that address a motor vehicle safety response
Table 18 An overview of generalization, maintenance, and social validity in studies that addressed a motor vehicle safety response

To increase the likelihood that participant skills would generalize to untrained positions and installation methods, the experimenters programmed for common stimuli by using the same car seat during training and generalization probes. They did not provide information regarding how their training stimuli were selected. The authors evaluated whether participants’ responses generalized to installing the car seat in the forward-facing position; although some generalized responding was observed, none of the participants demonstrated responding at mastery levels. The authors did not provide information on maintenance or social validity.

Training safety responses related to motor vehicle safety provides a unique challenge, as there are no responses a consumer can engage in to prevent themselves from being injured in a crash. Future research might employ general case programming in developing a VM to instruct caregivers to correctly install their child’s car seat.

Sharp Object Safety

We identified one experiment (1.1%) that evaluated procedures for teaching a response related to prevention of injury from sharp objects (Winterling et al. 1992). A summary of these results can be found in Tables 19 and 20. Winterling et al. (1992) taught four adults diagnosed with MR the correct method for throwing away broken items. In their evaluation, participants were taught to remove broken glass items from a filled sink, a counter top, and the floor. The authors evaluated the use of a prompting procedure to teach these safety responses and obtained positive results. Although a measure of generalization was not included, the authors used a general case analysis to select the types of broken items.

Table 19 An overview of studies that addressed a sharp object safety response
Table 20 An overview of generalization, maintenance, and social validity in studies that addressed a sharp object response

Along with additional research in this area, there is a call for research targeting safety responses involving knives. The CDC (2014) reports that all unintentional cut/piercing-related deaths involved a knife. Interventions targeted at this type of unintentional injury could consider training either correct handling of knives or a do not touch, leave, and tell response similar to that taught in the literature on firearm safety.

Emergency Responses

We identified three studies (3.4%) that taught an emergency response (Desrosiers 1987; Spooner et al. 1989; Risley and Cuvo 1980). A summary of these results can be found in Tables 21 and 22. All three of the studies (100%) used prompting procedures to teach individuals with developmental disability and Down syndrome to dial 911 and provide information to the dispatcher. Prompting was effective in teaching all participants the target responses.

Table 21 An overview of studies that addressed an emergency safety response
Table 22 An overview of generalization, maintenance, and social validity in studies that addressed an emergency response

It is of note that the studies in this category were conducted prior to 1990. It is important for future research to continue in this area, as technological advancements may present new challenges for consumers. For example, consumers may need to know how to reach 911 services across a variety of devices and contingencies. Consumers should be taught to reach 911 using a landline or a cell phone. When using a cell phone, they should be taught to dial using the number pad and the emergency dial function available for when a phone is locked. Performing a general case analysis may guide researchers and practitioners in ensuring consumers are taught all relevant dimensions of the safety response.

Suffocation Prevention

We identified one experiment (1.1%; Barone et al. 1986) that taught a suffocation prevention response. A summary of these results can be found in Tables 23 and 24. The participants were three families whose children were at risk of being removed or would be returned shortly from foster care. The intervention consisted of video modeling and manualized instruction (i.e., Project 12-ways; Terringer et al. 1984). The participating families were shown a VM on stimuli that presented a suffocation risk and how to place these stimuli out of reach. Accidental suffocation continues to be a leading cause of death (CDC 2015), and continued research in this area is therefore warranted.

Table 23 An overview of studies that addressed a suffocation safety response
Table 24 An overview of generalization, maintenance, and social validity in studies that addressed a suffocation response

Unaddressed Safety Categories

Several important safety responses have gone unaddressed in the literature reviewed. First, helmet use should be taught to consumers of all ages. Nicaj et al. (2009) found that the majority of fatal crashes (74%) in New York City involved a head injury and almost all bicyclists who died (97%) were not wearing a helmet. VR technology may have some useful applications in this area. For example, it could be used to teach safe riding practices such as avoiding opening doors on parked cars, avoiding vehicle blind spots, and not passing on the right. It should be noted that several group studies have evaluated bike safety responses, but as they did not report individual participant data, they were not included in the current review (Hooshmand et al. 2014; Van Houten et al. 2007).

Second, no experiment that fit our inclusion criteria taught a response related to water safety. Despite numerous national and private initiatives such as National Water Safety Month and the Stew Leonard III Water Safety Foundation, one of the top two leading causes of death for children under the age of 2 is drowning (CDC 2014). While teaching consumers to swim may reduce some of these deaths, they should also be taught how to respond should they become an active drowning victim. Future studies may also teach children to avoid water in the absence of adult supervision by engaging in a response similar to the “don’t touch, leave, tell an adult” used in other studies (e.g., Gunby et al. 2010).

General Summary

The current review extends the findings of previous reviews by broadening the analysis to a wider range of safety categories. Previous reviews focused on a single safety category (Jostad and Miltenberger 2004; Lumley et al. 1998) or a narrow range of related safety categories (i.e., social safety responses; Dixon et al. 2010; Doughty and Kane 2010; Mechling et al. 2009). To date, the current review is the first to attempt to synthesize the information from a wide variety of safety categories into recommendations for researchers and practitioners.

Previous reviews noted the need for more systematic description of participant and procedural details (Bevill and Gast 1998; Dixon et al. 2010). The current review suggests that there continues to be an overall lack of systematic description of certain participant and procedural details. Many studies reported using multiple exemplars of stimuli or settings, but did not report the number or provide a description of the exemplars used (32.9%). This information is important to both researchers and practitioners, as such details can speak to the efficiency of different interventions. Furthermore, many studies failed to provide specific enough details on participant ages (16.0%) and diagnoses (24.5%) for us to evaluate them along those dimensions. This lack of systematic description prevents future researchers from completing direct and systematic replications of the procedures used in previous studies. To truly evaluate the effectiveness of certain procedures, descriptions must be detailed enough for those procedures to be replicable.

A primary limitation of the current review is that it included only studies that reported individual participant data. The authors acknowledge that by only evaluating within-subject design studies it is possible to overestimate the effectiveness of certain interventions. However, while group design studies may demonstrate the widespread effectiveness of a particular intervention, group means may mask important variation in individual performance. We advocate for future studies to report participant behavior-change data when group interventions are employed to determine what adaptations, if any, may be necessary to ensure all consumers acquire the appropriate responses. Inclusion of individual subject data in group design formats may provide an avenue for evaluating the effectiveness of interventions across a larger number of consumers without missing important variation in individual participant performance. We refer researchers to the format and design used in Silverman et al. (2007). These authors utilized a group design to evaluate the overall effectiveness of employment-based reinforcement on cocaine abstinence while still reporting individual participant data to show important within group variations.

Second, the article identification process used in this review may appear limited in that a hand search of the reference sections of identified articles yielded 28 potential articles that were not identified in the initial search. Our search terms were selected by combining those used in previous reviews and eliminating any that were specific to a single safety category. Our rationale for doing this was that inclusion of all search terms from previous safety response studies across all safety categories would have returned a prohibitive number of initial search results. Although this may suggest a potential limitation with our initial search, we are confident that the hand search of the reference sections and the cite-forward search identified all relevant evaluations.

In summary, the current literature on safety response training comprises several strong methodologies for teaching safety responses, including BST, IST, prompting procedures, and VM. Taken together, the results of this review suggest that BST plus IST is still the most well-researched and effective training method for teaching safety responses. That said, additional research is still needed to determine the full effectiveness and application of interventions such as VR, manualized instruction, and EBI. Results of the current review indicate that those safety responses (i.e., abduction prevention, abuse prevention, firearm safety) covered in previous reviews (e.g., Doughty and Kane 2010; Jostad and Miltenberger 2004) have continued to garner research in the last decade. However, there are many other safety categories that warrant additional research, such as poison prevention, suffocation prevention, water safety, and bike safety.