1 Introduction

In this chapter we describe extinction and differential reinforcement, primarily as elements of behavioral interventions for behavior disorders. In the first section, we describe extinction as an isolated procedure. However, a central theme of this chapter is that extinction should rarely be used in isolation. Among the many reasons for this theme are the following: (a) extinction in isolation has potential side effects, (b) pure extinction is often difficult or even impossible to implement with fidelity, and (c) sometimes extinction does not address a primary variable associated with the occurrence of the behavior (such as when a medical or physical problem increases the likelihood of escape behavior). When extinction is not implemented with fidelity, problem behavior is intermittently reinforced, making it even more resistant to change. That leads to the second main section of the chapter, on differential reinforcement. Another central theme of this chapter is that differential reinforcement is a more logical behavioral intervention than extinction in isolation. We describe variants of differential reinforcement in the latter section of the chapter. A general premise, based on empirical evidence, is to maximize reinforcement in such a way as to favor appropriate alternative behavior while minimizing reinforcement for dangerous or destructive behavior, even when it is not technically placed on extinction (Vollmer et al., 2020).

2 Extinction

2.1 Overview

For the purposes of this chapter, extinction is defined as the withholding of a reinforcer that was previously presented contingent on a response, such that there is a decreased probability of that response (Catania, 2013; Cooper et al., 2020). In the context of behavioral interventions, extinction usually involves withholding the reinforcer(s) for problem behavior, as identified via functional analysis (e.g., Iwata, Dorsey, et al., 1994), which subsequently results in a decrease and (ideally) elimination of the problem behavior. Contrary to common usage, extinction is not just “ignoring” problem behavior; it is the withholding of the maintaining reinforcer for problem behavior. Because the maintaining reinforcer can take many forms (Kuhn et al., 1999; Richman et al., 1998), simply “ignoring” could be incidental to the functional properties of behavior.

The two main components of the extinction definition highlight that extinction is both a treatment procedure and a behavioral process (Iwata, Pace, et al., 1994). Procedural extinction is the withholding of reinforcement previously presented contingent on a response. An example of procedural extinction is continuing to present an instruction (i.e., not allowing escape) when escape-maintained problem behavior occurs. However, if problem behavior does not decrease (and is not eventually eliminated), then the behavioral process of extinction did not occur. An outcome is required to meet the full definition of extinction.

2.2 Functional Variations of Extinction

One important feature of extinction as a treatment for problem behavior is that it requires knowledge of the reinforcer maintaining the problem behavior. As decades of functional analysis research have shown, the same topography of problem behavior could be reinforced by (say) attention for one individual but by (say) escape for another individual. The implications for interventions are significant (Iwata, Pace, et al., 1994), because interventions based on extinction cannot be developed by merely observing the behavioral topography.

2.2.1 Socially Mediated Positive Reinforcement

Extinction of behavior maintained by socially mediated positive reinforcement involves withholding a positive reinforcer that was previously presented contingent on a response, such that withholding it decreases the probability of the response. One example of this is extinction of behavior maintained by attention (e.g., Fisher et al., 2004). Attention that reinforces problem behavior can take many forms, such as soothing statements from a caregiver (Iwata, Dorsey, et al., 1994), reprimands from a teacher (Iwata, Dorsey, et al., 1994), peer attention (Northup et al., 1995), or even eye contact from a therapist (Kodak et al., 2007). Extinction, in such cases, would involve withholding the particular form of attention. For example, if problem behavior is maintained by reprimands, the behavior change agent would withhold reprimands if problem behavior occurred.

As implied in the definition section, we recommend that the term “ignore” should not be used in the context of assessment and treatment of problem behavior, as it may imply to a lay audience that the behavior analyst is suggesting that the behavior should not be monitored. To the contrary, all individuals responsible for implementing an intervention involving extinction should carefully monitor a client who is engaging in problem behavior, to ensure that everyone in the environment is safe (including the person engaging in problem behavior). A more appropriate characterization would be to provide minimal differential consequences for the individual’s behavior. Minimal differential consequences means that the problem behavior produces no (or as little as possible) change in the therapist’s behavior while maintaining safety. For example, if a care provider is attending to a household task when a child throws a toy (suppose that toy throwing is maintained by reprimands), the care provider would continue to engage in the household task and would not provide any differential consequences (i.e., reprimands) for the disruption. There may be times when the behavior requires some sort of physical intervention to ensure the safety of the individual or others in the environment. However, the reaction to attention-maintained behavior should be minimized as much as possible.

Another variant of problem behavior maintained by socially mediated positive reinforcement is behavior reinforced by access to tangibles such as toys, snacks, or activities (Beavers et al., 2013). In these cases, extinction involves withholding the tangible item(s) previously delivered contingent on problem behavior. For example, if a child displays problem behavior maintained by access to an electronic tablet, one would withhold access to the tablet that was previously given contingent upon problem behavior, which should result in a decrease in that response (note that emotional side effects can be expected to occur, as discussed shortly).

2.2.2 Socially Mediated Negative Reinforcement

Extinction for negatively reinforced problem behavior, often called escape extinction, involves continuing to present the activity or requirement from which escape was previously delivered contingent on problem behavior (Cooper et al., 2020). For example, if a student displays problem behavior maintained by escape from math instructions, math instructions would continue when instances of the problem behavior occur. This can be applied to a variety of contexts that may be functionally aversive, such as academic (instructional) demands, loud noises, or even the physical presence of certain individuals. However, usage of escape extinction requires very careful ethical consideration. For example, if a student is engaging in escape behavior in the presence of instructional demands, a behavior analyst should evaluate possible reasons that the instructional demands are aversive (Carr & Smith, 1995; Kennedy & Meyer, 1996; Smith et al., 1995). It is possible the individual does not have the skill in their repertoire, in which case continued presentation of the demand does not make sense from an ethical or clinical standpoint, unless the skill is being taught in some other way. Similarly, a loud ambient noise may be distracting or even painful to a particular individual. Keeping the person in the environment for the purposes of extinction, then, may not address the ultimate cause of the behavior (such as an auditory sensitivity).

A notable example of the effectiveness of escape extinction is in the treatment of pediatric feeding disorders, specifically food refusal. Escape extinction, or non-removal of the spoon, has been shown to produce increases in bite acceptance (e.g., Ahearn et al., 1996; Peterson et al., 2016; Piazza et al., 2003). It is noteworthy that successful escape extinction procedures for pediatric feeding disorders are an outcome of an exploratory process wherein potential physical or medical impediments are addressed first or in conjunction with extinction (Ibañez et al., 2020).

At times, instructional activity, self-care activity, medical activity, and so on are aversive when they are presented too frequently, when they are presented for too long a duration, or when the individual does not have a proper skill set for compliance (Smith et al., 1995). One approach to address this phenomenon is to implement escape extinction along with instructional/demand fading (e.g., Zarcone et al., 1993), wherein the aversive event is presented gradually while extinction of escape behavior is in place. Often, a gradual presentation of the functionally aversive stimulation is combined with positive reinforcement, such as in the case of necessary medical procedures that cannot be avoided (e.g., Shabani & Fisher, 2006).

2.2.3 Automatic Reinforcement

Extinction can also be used in the treatment of automatically reinforced problem behavior. Automatically reinforced behavior produces its own source of reinforcement, independent of the social environment (Vollmer, 1994). Extinction in this case involves either altering the properties of the response so that it no longer produces the reinforcer or blocking the stimulation produced by the behavior (e.g., Rincover et al., 1979). For example, if disruption in the form of toy throwing is maintained by the sound that the toys make when they hit the wall, one could alter the wall by covering it with a pad so that the toys no longer make the noise when thrown. Extinction of automatically reinforced behavior is sometimes more difficult to implement than extinction of socially reinforced behavior. This difficulty arises from the fact that, by definition, automatically reinforced behavior produces its own source of reinforcement. Thus, the specific stimulus features of the reinforcer(s) may not be detectable or otherwise controlled.

2.3 Limitations and Special Considerations

Extinction should rarely if ever be presented in isolation, without the use of differential reinforcement, environmental enrichment, or noncontingent reinforcement. Extinction is limited as an isolated procedure because it can produce side effects that are attenuated when combined with these other (reinforcement-based) procedures (Lerman & Iwata, 1995). Further, in some circumstances, extinction is difficult if not impossible to implement with fidelity, which creates a host of problems, not the least of which is continued (possibly intermittent) reinforcement of the problem behavior (Vollmer et al., 2020). Also, if extinction is implemented without consideration of other contributing variables, the procedure can be unethical. For example, if someone is required to take a bite of food, but they do not have the skill to swallow the food, procedural extinction would be ineffective at the least and harmful in many cases (Ibañez et al., 2020). We describe these general limitations and considerations next.

2.3.1 Side Effects

One of the common side effects of extinction has been referred to as an extinction burst (Lerman et al., 1999; Lerman & Iwata, 1995). An extinction burst is an increase in the frequency, duration, or intensity of behavior that has been placed on extinction (Lerman & Iwata, 1995). In some cases, the burst can be relatively minor, but in other cases, problem behavior can rise to dangerously high levels. Although the extinction burst is usually temporary and decreases over time as the behavior continues to encounter extinction, the initially increased frequency or intensity can put the client or therapist at significant risk. Extinction bursts can be difficult to manage or even unacceptable depending on the resources of the environment, especially if the burst is prolonged.

Related to extinction bursts, extinction can also induce other types of responses. These other responses can be desirable (e.g., novel communication responses) or undesirable (e.g., other topographies of problem behavior). One can use the desirable effects to advantage when shaping new responses and extinguishing previously reinforced approximations. However, in cases when undesirable responses are induced, problems can arise. It might be the case that new problem behavior occurs that is more intense than the behavior placed on extinction and, thus, ends up being reinforced because it is too dangerous to allow to continue. This is an example of inadvertent shaping of problem behavior intensity; the intense problem behavior that contacted reinforcement will be more likely to occur in the future (Fahmie et al., 2017). A common example of extinction-induced problem behavior is aggression, often toward the person implementing extinction. Withholding reinforcement can be an aversive event, so it is not surprising that aggression occurs toward the individual who withheld the reinforcement (Lerman et al., 1999). In fact, basic research on aggression has shown that both presentation of aversive stimulation (as seen in escape extinction) and reinforcer loss/withholding (as seen in extinction of positively reinforced behavior) can induce aggressive behavior, including but not limited to biting of the self or others (Hutchinson, 1977).

Extinction can also produce emotional responding (Lerman & Iwata, 1996b). Individuals may cry, scream, or say unkind things to the therapist or caregiver. Induced emotional responding poses additional challenges, and collectively the potential side effects of extinction make extinction difficult to implement without other treatment components in place. Further, some potential implementers of extinction may find it unacceptable (Ducharme & Van Houten, 1994), and such unacceptability equates to poor social validity (Wolf, 1978). The side effects of extinction can be attenuated by combining the procedure with reinforcement-based procedures (Lerman & Iwata, 1996b), which will be discussed shortly.

2.3.2 Feasibility

There are several reasons that pure extinction in isolation may not be practical or feasible, and when the procedure is not practical or feasible, implementers make mistakes and sometimes continue to reinforce problem behavior. As a result, a schedule intended as extinction may actually be an intermittent schedule of reinforcement for the problem behavior. Some of the reasons that pure extinction may not be feasible include (but are not limited to) the following:

  1. The client may be too large and strong, such that physical guidance is not possible or potentially dangerous.

  2. The client may be elusive, such that physical guidance is not possible.

  3. It may be too dangerous to withhold response blocking (such as for some SIB, elopement, or aggression) even when it is known that physical contact is a reinforcer for a given client’s SIB or aggression.

  4. There may be laws or guidelines against the use of physical guidance during escape extinction.

  5. There may be laws or guidelines requiring response blocking for SIB.

  6. Even if there are no specific laws, ethical guidelines or personal ethics may lead practitioners to opt against physical guidance or to protect the individual through response blocking.

  7. The outcome of some behavior (such as observable injury or even unobservable injury) may require medical consultation, which is not available in all settings.

  8. If the behavior is automatically reinforced, the specific form of the reinforcer may not be known or, if known, may not be easily controlled.

  9. Even if primary care providers are expertly coached to implement extinction, the individual is likely to encounter many other people who are not, and therefore the behavior is accidentally reinforced (e.g., by grandparents, siblings, family friends, school personnel). Further, even expertly coached care providers will make at least some integrity errors (Marcus et al., 2001).

Combining extinction with differential reinforcement can reduce some of the above problems, because research on concurrent schedules shows that weighting reinforcement to favor appropriate behavior over problem behavior will shift allocation toward appropriate behavior, even if problem behavior is still reinforced (Athens & Vollmer, 2010; Vollmer et al., 2020). The key strategy is to ensure that appropriate behavior produces greater reinforcement along at least one dimension, such as a higher rate of reinforcement, a shorter delay to reinforcement, a higher quality of reinforcement, or a greater magnitude of reinforcement (e.g., Athens & Vollmer, 2010).
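
To make this concurrent-schedules logic concrete, the following sketch is our own illustration (not a published protocol); the dimension names and numeric values are hypothetical. It simply checks whether a planned arrangement favors appropriate behavior along at least one reinforcement dimension.

```python
# Hypothetical sketch: comparing reinforcement arranged for appropriate behavior
# versus problem behavior along several dimensions. Values are illustrative only.
from dataclasses import dataclass

@dataclass
class ReinforcementPlan:
    rate_per_hour: float   # how often the behavior can earn reinforcement
    delay_s: float         # delay from response to reinforcer (shorter is better)
    magnitude_s: float     # duration of reinforcer access (longer is better)
    quality_rank: int      # ranked reinforcer quality (higher is better)

def favors_appropriate(appropriate: ReinforcementPlan, problem: ReinforcementPlan) -> bool:
    """True if appropriate behavior is favored on at least one dimension."""
    return any([
        appropriate.rate_per_hour > problem.rate_per_hour,
        appropriate.delay_s < problem.delay_s,
        appropriate.magnitude_s > problem.magnitude_s,
        appropriate.quality_rank > problem.quality_rank,
    ])

# Example: communication earns immediate, longer, higher-quality attention than
# problem behavior, even though problem behavior is not perfectly placed on extinction.
print(favors_appropriate(
    ReinforcementPlan(rate_per_hour=12, delay_s=2, magnitude_s=30, quality_rank=3),
    ReinforcementPlan(rate_per_hour=4, delay_s=20, magnitude_s=5, quality_rank=1),
))  # True
```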

2.3.3 Root Cause

It is critical to identify the operant contingencies of reinforcement maintaining problem behavior in order to implement extinction. However, it is not enough to simply identify those contingencies when addressing severe behavior disorders. Medical and physical variables can interact with operant contingencies in such a way that problem behavior is exacerbated. For example, if a child has difficulty swallowing, they may develop escape behavior in the context of mealtime or food presentation (Ibañez et al., 2020). Similarly, there is some evidence that physiological factors such as allergies (Kennedy & Meyer, 1996), fatigue (Smith et al., 2016), menstrual cycle (Carr et al., 2003), and illness (Carr & Owen-DeSchryver, 2007) could exacerbate dangerous behavior of the sort that is commonly maintained by operant contingencies. Withholding the source of reinforcement without addressing the medical or physical problem has serious ethical implications (Behavior Analyst Certification Board, 2014). Consider an extreme example: a client displays escape behavior in the context of instructions to put on their socks and shoes and to begin walking. Suppose that, unbeknownst to the therapist, the client has a badly bruised (or possibly broken) toe. If a therapist moves directly to extinction and persists with extinction, the actual reason for escape behavior is not addressed: putting on footwear and walking are aversive because there is an underlying medical problem.

At times this “root cause” issue is more subtle. For example, a student may find reading aloud in a classroom to be aversive (Hofstadter-Duke & Daly, 2011) and therefore displays severe escape behavior, reinforced by being sent out of the classroom. If a functional analysis shows that the severe behavior is maintained by escape, a literal interpretation of extinction would involve continuing to require the student to read aloud in the classroom. However, it is possible that the student does not know how to read or is several grade levels behind other students. The persistent requirement to read aloud does not address the core of the problem, which would require individualized instruction on reading, and probably reflects an ethical shortcoming in application (consider, for example, the humiliation the student might experience).

2.4 Using Extinction in Practice

Extinction can be an effective and useful tool to decrease problem behavior. Extinction can make some treatments more effective and can also decrease problem behavior when other treatments have not worked (Rooker et al., 2013). Extinction has also produced impactful effects on the field in some critical areas such as pediatric feeding disorders (e.g., Peterson et al., 2016). Despite its apparent effectiveness as a treatment for problem behavior, as we have discussed, extinction is very rarely used in isolation. Extinction has been used in the context of noncontingent reinforcement (Fisher et al., 2004; Reed et al., 2004; Saini et al., 2017), instructional fading (Zarcone et al., 1993), and differential reinforcement of alternative behavior (DRA; Piazza et al., 2003), among other procedures. Extinction is almost always used in combination with another procedure because of the importance of skill acquisition in the context of behavior reduction, the side effects associated with extinction, and the practical limitations of applying extinction procedures (and resulting problems associated with extinction failures).

We have suggested that when treating problem behavior for individuals diagnosed with autism spectrum disorder (ASD) or intellectual and developmental disabilities (IDD), it may be useful to provide minimized differential consequences for problem behavior. As previously described, minimized differential consequences means that the problem behavior produces no change in the therapist’s behavior (other than what is necessary to protect the individual, others in the environment, or property). To the best of the therapist’s ability (and safety permitting), the therapist should minimize environmental changes when problem behavior occurs. However, this intervention alone is unlikely to produce a complete reduction in problem behavior, especially in the absence of skill acquisition procedures designed to increase the client’s adaptive repertoire.

Relating to escape extinction, we have emphasized the importance of careful exploration of why a certain event or set of events functions as an aversive stimulus. Blanket usage of escape extinction without detailed exploration and analysis at multiple levels has serious ethical implications. It is critical to understand why the event or events are aversive. Some examples of such considerations are listed here, but this list is by no means exhaustive: (a) the activity produces some sort of pain state for the individual, (b) the individual does not have the necessary skills in their repertoire, or (c) the individual is experiencing physical limitations (such as difficulty swallowing or grasping).

In short, due to potential side effects, extinction should be combined with procedures involving reinforcement. Relatedly, due to feasibility concerns wherein it is difficult and sometimes impossible to implement extinction perfectly (and, hence, an intermittent schedule of reinforcement for problem behavior is in place), it is important to minimize reinforcement for problem behavior while maximizing reinforcement for alternative behavior along as many dimensions of reinforcement as possible (e.g., rate, duration, immediacy, quality). Further, extinction should only be considered after other contributing variables have been identified, not only the maintaining reinforcers. A functional analysis is a first step, but an evaluation of medical variables, instructional context, and skill level is equally critical.

3 Differential Reinforcement

3.1 Overview and Forms of Differential Reinforcement

Differential reinforcement is one of the most commonly used behavior change procedures (MacNaul & Neely, 2018; Petscher et al., 2009; Vollmer et al., 1999; Weston et al., 2018). Differential reinforcement is typically defined as reinforcing some response(s) and not reinforcing other responses (Catania, 2013; Cooper et al., 2020; DeLeon et al., 2013; Vollmer & Iwata, 1992). When defined in this way, however, differential reinforcement is procedurally constrained to the use of reinforcement and extinction. Although extinction is a common component when implementing differential reinforcement procedures, many successful applications have occurred without pure extinction (see Trump et al., 2019, for a review). Implementing differential reinforcement can be viewed as a concurrent-operant arrangement that involves applying different schedules of reinforcement to two or more responses (Fisher & Mazur, 1997). In other words, it is accurate to view differential reinforcement as any procedure that involves two or more schedules of reinforcement that vary along some dimension (e.g., reinforcer duration, reinforcer quality, delay to reinforcement) across different responses, whereby response allocation shifts toward the response associated with the more favorable schedule of reinforcement (Athens & Vollmer, 2010). Several procedural variations of differential reinforcement exist; however, the most common differential reinforcement procedures are differential reinforcement of alternative behavior, differential reinforcement of other behavior, and differential reinforcement of low rate behavior (Cooper et al., 2020; Vollmer & Iwata, 1992).

3.1.1 Differential Reinforcement of Alternative Behavior

DRA is the most commonly used differential reinforcement procedure (Petscher et al., 2009). Traditionally, as it relates to treating problem behavior, DRA has been described as reinforcing some specific alternative behavior, while placing problem behavior on extinction (Vollmer & Iwata, 1992). A more recent definition, which takes into account the problems associated with implementing pure extinction, describes DRA as “providing greater reinforcement, along at least one dimension, contingent on the occurrence of one form or type of behavior, while minimizing reinforcement for another form or type of behavior” (Vollmer et al., 2020, p. 1300). Thus, DRA involves modifying parameters of reinforcement such that the alternative response receives greater reinforcement than another response (for the purposes of this discussion, problem behavior). In other words, DRA need not be constrained to explicit reinforcement of a target response and extinction for the problem behavior (as previously defined by Vollmer & Iwata, 1992). When DRA is implemented, even without perfect extinction, robust effects can still be obtained when treatment integrity failures occur because the schedule of reinforcement favors appropriate behavior (e.g., Athens & Vollmer, 2010; Brand et al., 2019).

Sometimes DRA procedures are labeled based on the type of alternative behavior that is reinforced. One such example is differential reinforcement of incompatible behavior (DRI). DRI involves selecting an alternative response that is physically incompatible with the target behavior selected for decrease (e.g., Young & Wincze, 1974). Another procedural variant is functional communication training (FCT), wherein the alternative response is always some form of communication (e.g., Carr & Durand, 1985).

The DRA approach is also a key component for establishing new skills in an individual’s repertoire. DRA plays an essential role in shaping new responses, that is, differentially reinforcing successive approximations to a terminal response. For example, a therapist might reinforce successive approximations to the word “tunes” as a mand for music (e.g., Bourret et al., 2004). The vocal utterance “t-” is followed by a positive reinforcer but is then placed on extinction once a closer approximation, “tu-,” contacts reinforcement. This process would continue until the terminal goal of “tunes” is achieved.
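
As a rough illustration of how the reinforcement criterion shifts across approximations, consider the following hypothetical sketch; the approximation list and the rule for advancing the criterion are simplified assumptions, not the procedure used by Bourret et al. (2004).

```python
# Hypothetical sketch of shifting a shaping criterion across successive approximations to "tunes".
approximations = ["t", "tu", "tune", "tunes"]  # ordered from first approximation to terminal form
criterion = 0  # index of the approximation currently required for reinforcement

def consequence(utterance: str) -> str:
    """Reinforce the current (or a closer) approximation; earlier forms go on extinction."""
    global criterion
    matched = None
    for i, form in enumerate(approximations):
        if utterance.startswith(form):
            matched = i  # closest approximation the utterance satisfies
    if matched is None:
        return "withhold music (no approximation)"
    if matched >= criterion:
        criterion = matched  # once a closer approximation contacts reinforcement, raise the bar
        return "deliver music (reinforce)"
    return "withhold music (earlier approximation now on extinction)"

print(consequence("t"))    # deliver music (reinforce)
print(consequence("tu"))   # deliver music (reinforce); criterion advances to "tu"
print(consequence("t"))    # withhold music (earlier approximation now on extinction)
```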

3.1.2 Differential Reinforcement of Other Behavior

Differential reinforcement of other behavior (DRO) involves delivering a reinforcer when a target response does not occur during a specified observation period (Catania, 2013; Reynolds, 1961). DRO is sometimes referred to as omission training (Uhl & Garcia, 1969) or differential reinforcement of no responding (e.g., Poling & Ryan, 1982). The contingencies of a DRO may involve a reset (i.e., timer restart) or no reset (i.e., no timer restart) of the interval when the target response occurs. If a reinforcer unrelated to the function of behavior is used, the implementation of DRO involves a procedural extinction component (i.e., withholding a positive reinforcer unrelated to the function of problem behavior). If the reinforcer maintaining problem behavior is used, the implementation of DRO involves functional extinction (i.e., withholding the reinforcer identified to maintain problem behavior).

At least four potential underlying mechanisms for the effectiveness of DRO have been proposed: (1) repeated delivery of the reinforcer may serve as an abolishing operation that momentarily suppresses the target response, (2) extinction, (3) negative punishment (because scheduled reinforcers are, in a sense, “lost” contingent on the occurrence of behavior), and (4) the strengthening of alternative responses due to adventitious reinforcement (Jessel & Ingvarsson, 2016; Poling & Ryan, 1982). In effect, a DRO contingency arranges reinforcement for interresponse times (IRTs) that are equal to or greater than the specified interval length (as described by Lindberg, Iwata, & Kahng, 1999; Lindberg, Iwata, Kahng, & DeLeon, 1999). Commonly, DRO interval lengths are determined by calculating the mean IRT for a specified number of sessions (Poling & Ryan, 1982) to systematically establish and then thin the reinforcement schedule. There are two primary procedural variations of DRO: interval DRO and momentary DRO.
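
For illustration, here is a minimal sketch of that interval calculation; the baseline timestamps and the 25% thinning step are assumed values, not drawn from the cited studies.

```python
# Hypothetical sketch: set an initial DRO interval from the mean IRT observed across
# baseline sessions, then thin the schedule in graduated steps.

def mean_irt(response_times_by_session):
    """response_times_by_session: list of lists of response timestamps (s) within each session."""
    irts = []
    for times in response_times_by_session:
        irts.extend(t2 - t1 for t1, t2 in zip(times, times[1:]))
    return sum(irts) / len(irts)

baseline = [
    [5, 17, 26, 40],   # session 1 response timestamps (s)
    [8, 19, 35],       # session 2 response timestamps (s)
]
initial_interval = mean_irt(baseline)  # starting DRO interval (s); 12.4 s for these data
thinned = [round(initial_interval * (1.25 ** step), 1) for step in range(4)]  # assumed 25% increases
print(initial_interval, thinned)
```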

For both procedural variations of DRO, there is a specified interval that requires either continuous (interval) or discontinuous (momentary) observation; a reinforcer is delivered contingent on the absence of the target response. Interval DRO involves continuous observation of the target response during a specified interval (which can remain constant, vary, or progressively increase) and then delivering the reinforcer if the target response does not occur at any point during the interval. Momentary DRO involves delivering a reinforcer if the target response does not occur at the end of the interval (or the exact “moment” of observation). Lindberg et al. (Lindberg, Iwata, & Kahng, 1999; Lindberg, Iwata, Kahng, & DeLeon, 1999) described and compared the effects of fixed interval, variable interval, and variable-momentary DRO on rates of self-injury. Fixed interval DRO involves a constant interval duration. Lindberg et al. withheld functional reinforcers when self-injury occurred and provided functional reinforcers when self-injury did not occur during a constant time interval specified for each session. For example, if self-injury (the target response) did not occur during a 10 s interval, then an edible (positive reinforcer) was delivered; if self-injury occurred during the interval, then no reinforcer was delivered. Variable interval DRO has varied interval durations, based on an average value. Lindberg et al. administered the same procedures as described in the fixed interval DRO condition, except the interval lengths varied. For the variable-momentary DRO condition, Lindberg et al. withheld the functional reinforcer only if self-injury occurred at the end of a specified interval. Thus, the functional reinforcer was delivered if self-injury was not occurring at the end of the interval (i.e., self-injury could occur at other times during the interval). All three variations of DRO (fixed interval, variable interval, and variable-momentary) were equally effective in reducing self-injury maintained by social-positive reinforcement.
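
The sketch below contrasts the interval and momentary DRO reinforcement rules described above; it is our own illustration, and the 10-s interval, 1-s observation window for the momentary variant, and edible reinforcer are assumptions.

```python
# Hypothetical sketch contrasting interval DRO and momentary DRO decision rules.

def interval_dro(response_times, interval_end, interval_length=10.0):
    """Interval DRO: reinforce only if NO target response occurred anywhere in the interval."""
    start = interval_end - interval_length
    occurred = any(start < t <= interval_end for t in response_times)
    return "withhold edible" if occurred else "deliver edible"

def momentary_dro(response_times, interval_end, window=1.0):
    """Momentary DRO: reinforce if the target response is not occurring at the end of the interval."""
    occurring = any(interval_end - window < t <= interval_end for t in response_times)
    return "withhold edible" if occurring else "deliver edible"

sib_times = [3.2]  # one response early in the 10-s interval
print(interval_dro(sib_times, interval_end=10.0))   # withhold edible
print(momentary_dro(sib_times, interval_end=10.0))  # deliver edible (not occurring at that moment)
```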

3.1.3 Differential Reinforcement of Low Rate Responding

Differential reinforcement of low rate responding (DRL) involves delivering a reinforcer for low rates of behavior, rather than total response suppression (Ferster & Skinner, 1957). Sometimes the goal is to maintain the behavior at low rates or slowly decrease the response criterion, rather than the total elimination of the response. Thus, this procedure is particularly useful when targeting behavior that should be maintained but is perhaps occurring too frequently or rapidly. Similar to DRO, IRT is a relevant measure when implementing DRL (as described below). The three primary procedural variations of DRL include full session, interval, and spaced responding (Becraft et al., 2017; Deitz, 1977).

For all the procedural variations of DRL, there is a specified observation period during which a predetermined criterion of (low) responding must be met for a reinforcer to be delivered. Full-session DRL involves delivering a reinforcer following a full session (e.g., treatment session, appointment, observation window) during which the target response occurs at or below a predetermined criterion. Austin and Bevan (2011) observed elementary school-aged children during 20-min classroom sessions and differentially reinforced low rates of requests for attention from the teacher (e.g., hand raising, calling out for the teacher). For example, on average, one student requested her teacher’s attention nine times during baseline sessions; however, during the DRL condition, the teacher only delivered a reinforcer if the student requested the teacher’s attention three or fewer times. Interval DRL involves delivering a reinforcer when the target response occurs at or below a predetermined criterion during a specified interval. For example, Deitz et al. (1977) observed disruptive behavior during a 30-min session and divided the session into 2-min intervals. If disruptive behavior occurred one or zero times during a 2-min interval, the student received a star. If the disruptive behavior occurred more than once during the interval, the interval was reset. The stars were exchangeable for playtime at the end of the session. Spaced responding DRL involves delivering a reinforcer based on a predetermined IRT (i.e., a predetermined amount of time must pass between a response and a subsequent response; Deitz, 1977). For example, Lennox et al. (1987) combined response interruption and spaced responding DRL to increase the time between bites of food (i.e., to reduce rapid eating). Any attempt to have a bite of food before 15 s elapsed was interrupted by blocking; therefore, 15 consecutive seconds were required to occur between bites of food. Additionally, Becraft et al. (2017) combined schedule-correlated stimuli and spaced responding DRL (and compared this condition to full-session DRL and DRO), which reduced bids for attention in a simulated classroom. In these examples, it is clear that the target responses must occur at some level. For example, complete extinction of self-feeding or classroom participation is not the goal. Thus, this procedure’s utility mainly relies on selecting responses that should persist at a socially valid or a medically safe level.
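
As a simple illustration of these reinforcement rules, the sketch below borrows parameter values from the examples above (three or fewer requests per session; 15 s between bites); the function names and structure are our own simplification, not the cited authors’ procedures.

```python
# Hypothetical sketch of full-session DRL and spaced-responding DRL decision rules.

def full_session_drl(response_count, criterion=3):
    """Full-session rule: reinforce if responses in the session occur at or below the criterion."""
    return "deliver reinforcer" if response_count <= criterion else "withhold reinforcer"

def spaced_responding_drl(last_response_time, current_time, min_irt=15.0):
    """Spaced-responding rule: allow the response only if at least min_irt seconds have elapsed."""
    return "allow bite" if (current_time - last_response_time) >= min_irt else "block attempt"

print(full_session_drl(response_count=2))                             # deliver reinforcer
print(spaced_responding_drl(last_response_time=0, current_time=9))    # block attempt
print(spaced_responding_drl(last_response_time=0, current_time=16))   # allow bite
```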

The DRL approach is considered a time-intensive procedure (Cooper et al., 2020). Practitioners can choose this procedure when the response does not require immediate response suppression and can withstand incremental changes. It is appropriate for responses that do not require complete elimination (e.g., reducing rapid eating; Wright & Vollmer, 2002). Practitioners should aim to prevent (i.e., implement safety procedures) or eliminate the occurrence of dangerous behavior that places the individual or others at risk. The procedure is not designed to gradually wean an individual off of problem behavior when the aim is complete reduction. Although incremental change when treating severe behavior disorders is a possible outcome of behavioral treatment, practitioners should not deliberately plan for gradual progress in these cases. The type of DRL selected for the response depends on the terminal goal and schedule of reinforcement required to produce an effect. For example, full-session DRL seems most useful when individuals can follow instructions (e.g., “if you only raise your hand three times, you can earn playtime.”), and the delivery of the preferred stimulus can be delayed. Interval and spaced responding DRL might be useful when the response necessitates a denser schedule of reinforcement. Spaced responding DRL, specifically, seems more useful when IRT is particularly important (e.g., seconds between bites).

3.2 Functional Variations of Differential Reinforcement

Functional variations of differential reinforcement include differential positive reinforcement, differential negative reinforcement, and differential automatic reinforcement (Cooper et al., 2020; Vollmer & Iwata, 1992). Although these procedures can be applied as either DRA or DRO, we will primarily use examples of DRA in our discussion (for reasons relating to the practical implementation of differential reinforcement that we clarify subsequently).

3.2.1 Differential Positive Reinforcement

Most commonly, differential positive reinforcement is used to treat behavior maintained by positive reinforcement, such as attention or tangibles (e.g., Pizarro et al., 2021). The logic behind this approach is that if the alternative behavior produces the reinforcer previously maintaining problem behavior, the alternative behavior functionally replaces the problem behavior, which either is placed on extinction or otherwise produces a minimal outcome. An example of this approach involves varying the duration, quality, or delay when accessing positive reinforcers contingent on problem behavior or alternative behavior (Athens & Vollmer, 2010). For example, Athens and Vollmer (2010) provided qualitatively different forms of attention contingent on aggression (reprimands) and exchanging a picture card to obtain an adult’s attention (praise and physical interaction).

Another application of differential positive reinforcement is to use positive reinforcement even when behavior is maintained by negative reinforcement (Lalli et al., 1999). The logic behind this approach is that the use of positive reinforcement may reduce the aversiveness of the instructional context, and the positive reinforcement for behavior such as compliance with instructional activity might compete with the negative reinforcement in the form of escape (i.e., if it is a higher-quality reinforcer). For example, Slocum and Vollmer (2015) compared the effects of providing escape (the reinforcer maintaining problem behavior) and edibles (reinforcers previously unrelated to problem behavior) contingent on compliance. During both treatments, the problem behavior continued to produce escape. The results demonstrated that problem behavior decreased more substantially, and compliance increased more substantially in the condition where compliance was followed by positive reinforcement (edible delivery).

3.2.2 Differential Negative Reinforcement

Differential negative reinforcement is used to treat behavior maintained by negative reinforcement. The logic behind this approach is that by providing escape or avoidance contingent on alternative behavior (such as compliance or functional communication), the alternative behavior functionally replaces the problem behavior, which would be placed on extinction or otherwise produce minimal escape. An example of this approach involves providing a 60 s break from instructions contingent on compliance and delivery of another directive contingent on problem behavior (e.g., Ringdahl et al., 2002). Alternatively, differential escape intervals (a 240 s break following compliance, a 10 s break following problem behavior) can increase compliance and reduce problem behavior (Rogalski et al., 2020).

3.2.3 Differential Automatic Reinforcement

Differential automatic reinforcement is most commonly used to treat behavior maintained by automatic reinforcement (reinforcement not delivered via social mediation). The logic of this approach is that by bringing alternative behavior into contact with alternative sources of reinforcement (e.g., toy play, music, activity), it will functionally replace at least some amount of problem behavior. Because the problem behavior produces its own source of reinforcement, it is sometimes difficult to minimize that source of reinforcement. As a result, researchers have examined an approach known as a competing stimulus assessment (see Haddock & Hagopian, 2020). In a competing stimulus assessment, one can evaluate (a) whether a stimulus is highly preferred, as indicated by high levels of engagement, and (b) whether engagement with a stimulus suppresses instances of the problem behavior, as indicated by low levels of problem behavior when the item is available (Haddock & Hagopian, 2020).

Differential automatic reinforcement, even when based on a competing stimulus assessment, may require some additional components. One is that some individuals with automatically reinforced problem behavior do not have repertoires that bring them into contact with appropriate sources of automatic reinforcement, such as play skills. As a result, it is sometimes critical to explicitly teach a skill or set of skills that ultimately produces automatic reinforcement (e.g., Britton et al., 2002; Leif et al., 2020). Another is that, for some individuals, engagement with highly preferred items does not necessarily suppress the occurrence of problem behavior (e.g., review Gover et al., 2019; Lindberg, Iwata, Kahng, 1999; Lindberg, Iwata, Kahng, & DeLeon, 1999; Piazza et al., 1998; Ringdahl et al., 1997). As a result, differential automatic reinforcement is sometimes combined with response blocking (e.g., Lerman & Iwata, 1996a; Lindberg, Iwata, Kahng, & DeLeon, 1999; Lindberg, Iwata, Kahng, 1999; Roscoe et al., 2013) or response interruption (e.g., Gibbs et al., 2018; Shawler et al., 2020). Examples of these problems and potential solutions can be seen in Vollmer et al. (1994). Three children participated in the study. One child displayed SIB that was entirely replaced by toy play with a preferred toy. A second child required explicit reinforcement of toy contact to learn play skills that subsequently competed with SIB. A third child also required explicit reinforcement for toy contact but further required a response blocking procedure to reduce SIB to acceptable levels.

3.3 Limitations and Special Considerations

Differential reinforcement procedures can be limited in ways similar to how extinction procedures are limited (see the list above under limitations and special considerations for extinction). However, these limitations brought about by side effects, feasibility, and consideration of root causes are less pronounced when using differential reinforcement because alternative means of obtaining reinforcement are explicitly arranged and taught.

It is important to note that DRL is limited to use with a relatively restricted range of behavior disorders: those that are problematic only because the behavior occurs too frequently. Thus, DRL is most commonly used for behavior such as rapid eating, talking out in class, and other topographies that should not be extinguished entirely. As a result, most general types of behavior disorders are not treated using DRL. DRO is limited because it is highly sensitive to treatment integrity failures in the form of errors of commission (e.g., Mazaleski et al., 1993). For example, even if someone refrains from reinforcing problem behavior 95% of the time it occurs (which sounds on the surface like good integrity), the problem behavior is still reinforced on a variable ratio (VR) 20 schedule. A VR 20 schedule of reinforcement could easily sustain high levels of behavior for some individuals. Further, DRO does not explicitly arrange for reinforcement of alternative behavior, so it is not always clear which behavior is being reinforced. As a result of these limitations and special considerations, implementation in practice would focus largely on DRL in restricted circumstances, DRO probably only in conjunction with reinforcement of new or alternative skills, and nearly continuous application of DRA-like contingencies throughout an individual’s daily routine (Vollmer et al., 2020).
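
The VR 20 example reflects simple arithmetic: if a proportion p of problem-behavior occurrences is reinforced through commission errors, the effective schedule is roughly a VR 1/p. A minimal sketch of that calculation:

```python
# Sketch of the arithmetic behind the VR 20 example above.

def effective_vr(commission_error_rate):
    """commission_error_rate: proportion of problem-behavior occurrences that are reinforced."""
    if commission_error_rate <= 0:
        return float("inf")  # true extinction: never reinforced
    return 1.0 / commission_error_rate

print(effective_vr(0.05))  # 20.0 -> a VR 20 schedule despite "95% integrity"
print(effective_vr(0.20))  # 5.0  -> a VR 5 schedule at 80% integrity
```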

3.4 Using Differential Reinforcement in Practice

Our conclusion, based on the literature summarized above, is that DRL and DRO are valuable procedures but are best used in special circumstances and as adjuncts to DRA. By contrast, DRA is a general “lifestyle” of interactions between a care provider and an individual. When the interpretation of DRA expressed by Vollmer et al. (2020) is translated into practice, DRA circumvents many of the limitations of extinction and differential reinforcement described previously. DRA is not restricted to placing one response on extinction and reinforcing another response. It is possible to present greater reinforcement for alternative behavior even when problem behavior continues to be reinforced (e.g., greater magnitude, higher quality, longer duration, more immediate). Also, practitioners need not select one and only one topography of alternative behavior to reinforce. Appropriate behavior of all sorts (e.g., communication, play skills, self-care skills, academic skills) can and should be richly reinforced to compete with the reinforcement schedules maintaining problem behavior. DRA is not only well supported for the treatment of problem behavior but is also essential for establishing skills (Grow & LeBlanc, 2013; Vladescu & Kodak, 2010).

Because the DRA procedure does not necessitate perfect execution to maintain treatment effects (e.g., Brand et al., 2019), treatment integrity errors become less detrimental as long as DRA is implemented with high levels of integrity at the onset of treatment (St. Peter Pipkin et al., 2010; Vollmer et al., 1999). More specifically, errors of omission (i.e., withholding reinforcement for an alternative response) are less problematic than errors of commission (i.e., reinforcing problem behavior) or both errors in combination (St. Peter Pipkin et al., 2010). To this end, it is clear that DRA produces robust effects that maintain even in the face of at least some treatment integrity failures. Thus, DRA is flexible enough to operate throughout the day as a lifestyle, where differential schedules can be moderated loosely (so long as the schedules generally favor appropriate or target responses).

Establishing an alternative response may require a dense schedule of reinforcement at the outset of treatment (e.g., Greer et al., 2016). Thus, practitioners should plan for systematic schedule thinning to ensure that the alternative response is occurring at a rate that is feasible to reinforce and to avoid the resurgence of problem behavior (e.g., see Hagopian et al., 2011). Finally, DRA does not require additional time expenditure (because it occurs in naturally occurring situations) or the use of gadgets (such as resetting timers). Practitioners or caregivers might equate decreased time expenditure and decreased “setup” with decreased response effort. Response effort is a factor that practitioners often consider, as it might impact caregivers’ adherence to treatment recommendations (Allen & Warzak, 2000). Presenting DRA as a lifestyle that caregivers can integrate into their daily interactions with their child, family member, student, or client might increase acceptability and, therefore, adherence to DRA as a treatment recommendation.

It is also important to consider the use of differential positive reinforcement when treating escape-maintained behavior. This approach is notable because a more commonly discussed route to treating escape-maintained problem behavior involves the use of differential negative reinforcement. Although differential escape intervals (e.g., Rogalski et al., 2020) or teaching an individual to ask for a “break” can reduce problem behavior, there are some less favorable implications of adhering strictly to this “functional match” treatment approach. When the demand context remains aversive, it precludes individuals from learning in more favorable conditions and can potentially limit the rate of engagement in learning activities. Further, the arrangement essentially requires an acceptance that instructional activity should be aversive, which seems counterintuitive to good instructional practices (e.g., review possible implications for practice proposed by Haq & Aranki, 2019). Thus, as a comprehensive treatment for escape-maintained behavior, (a) features of the instructional context must be carefully examined to determine why instructional activity is aversive, (b) the instructional context should then be modified or arranged such that it is less aversive, and (c) the use of differential positive reinforcement is useful as it has been shown to engender less escape behavior even when problem behavior is not fully placed on extinction (e.g., Lalli et al., 1999; Slocum & Vollmer, 2015).

Using DRA in combination with other procedures has also produced favorable results when targeting problem behavior maintained by automatic reinforcement. Leif et al. (2020) identified stimuli that could potentially compete with automatically reinforced problem behavior (e.g., hand mouthing). However, item engagement was relatively low when participants were provided with noncontingent access to different leisure items and, therefore, problem behavior persisted. Including prompting (i.e., vocal and physical support to interact with items) in conjunction with DRA significantly increased item engagement, which permitted the identification of multiple competing stimuli. In this case, simply providing positive reinforcers (edibles) contingent on 10 s of manipulating an item established a sustained item engagement repertoire, which permitted identifying stimuli that successfully suppressed problem behavior.

4 Conclusions

Extinction and differential reinforcement are central procedures and processes that have been tested and used effectively for many years. Extinction presented in isolation can create a range of practical and even ethical problems. By combining extinction and reinforcement (i.e., differential reinforcement), many of these problems and limitations associated with extinction can be circumvented. We have ultimately concluded that a general differential reinforcement approach, in which reinforcement for appropriate behavior is presented richly and reinforcement for problem behavior is minimized, is best practice.