Introduction

Resurgence is the return of a previously reinforced operant behavior with the worsening of more recently available alternative reinforcement and associated response conditions (see Lattal et al., 2017; Shahan & Craig, 2017). The worsening of alternative conditions has been demonstrated when a previously reinforced and extinguished target response increases as an alternative response encounters reduced reinforcer availability through extinction or decreases in alternative-reinforcer rate or magnitude (e.g., Craig et al., 2017a; Winterbauer & Bouton, 2012). Furthermore, resurgence can also result from introducing punishment contingencies (Fontes et al., 2018) or greater response effort (Wilson et al., 2016) for engaging in the alternative response. Because operant responding returns after elimination with extinction, resurgence indicates that extinction does not erase or destroy original learning. Instead, the elimination of responding during extinction reflects a change in performance due to new learning. As a result, the worsening of alternative conditions producing resurgence is related to a class of other extinction phenomena, including changes to contextual stimuli producing renewal, re-presenting reinforcing events or related stimuli producing reinstatement, and time away from experimental conditions producing spontaneous recovery (see Bouton et al., 2021, for a review). Identifying variables contributing to resurgence is scientifically important not only because of its relevance to understanding fundamental behavioral processes contributing to choices in changing environments but also to understanding factors contributing to relapse of clinically relevant behavior (see Wathen & Podlesnik, 2018).

The phenomenon of resurgence is clinically relevant because behavioral interventions designed to eliminate problem behavior typically arrange reinforcement for more appropriate behavior (e.g., Higgins et al., 2013; Tiger et al., 2008). During treatments of challenging problem behavior for individuals diagnosed with intellectual and developmental disabilities (e.g., autism spectrum disorder [ASD]), differential reinforcement of alternative behavior (DRA) procedures provide reinforcement for appropriate behavior (e.g., polite requests, card exchange) while typically also arranging extinction of problem behavior (e.g., tantrums, self-injury). Resurgence of the target problem behavior can occur when alternative-reinforcement conditions worsen (1) by programming reinforcement thinning to make DRA treatments more manageable or (2) inadvertently from treatment-integrity errors resulting in the omission of alternative-reinforcer presentations following appropriate behavior (see Briggs et al., 2018; Muething et al., 2020; Volkert et al., 2009). Likewise, any reductions in delivery of alternative reinforcement during behavioral interventions for substance abuse disorders, such as phasing out contingency management (e.g., Silverman et al., 1999), would result in reduced incentives for appropriate behavior and increased likelihood of resurgence. Identifying variables contributing to resurgence can lead to the development of approaches to enhance the long-term effectiveness of behavioral interventions arranging alternative reinforcers.

Basic and preclinical research can serve to identify variables influencing resurgence that are of foundational and clinical interest. Some of this research arranged preclinical models to develop and assess procedures designed to produce more durable behavioral interventions by mitigating relapse (e.g., Shvarts et al., 2020; Trask, 2019) and other research has evaluated conceptual and quantitative frameworks from which to assess theories and fundamental behavioral processes potentially underlying resurgence (e.g., Bai et al., 2017; Bouton, 2019; Nevin et al., 2017; Podlesnik et al., 2022; Shahan & Craig, 2017). The first apparent laboratory report of the resurgence of operant behavior was presented by Carey (1951) but the procedures arranged in Carey’s report are somewhat atypical (cf. Reed & Morgan, 2007). For two groups of rats, Carey arranged responding on the same bar across three phases, with reinforcement across the first two phases and extinction in the third phase. The rats were required to make a single response per reinforcer or two temporally spaced responses per reinforcer, with these response requirements counterbalanced between the first two phases across groups. During extinction testing, Carey reported a decrease in the response pattern from the more recent phase and, more important, a return in the originally reinforced response pattern—the effect now termed resurgence.

As in Carey (1951), modern laboratory models of resurgence similarly take the form of three phases (e.g., Epstein, 1983; Leitenberg et al., 1970; Winterbauer & Bouton, 2010). Figure 1 shows a basic and typical set of procedures arranged to examine resurgence, along with hypothetical data. During Phase 1, Training typically includes a baseline in which a target response is acquired from contingent reinforcer deliveries. In laboratory models assessing relapse, Training models the baseline levels of problem behavior established under natural conditions. During Phase 2, Elimination typically includes extinction of the target response and initiating reinforcement of an alternative response. Elimination simulates a behavioral treatment, such as DRA, often resulting in the decrease or elimination of target responding and acquisition of alternative responding. Finally, Testing in Phase 3 models the worsening environmental conditions that challenge the long-term maintenance of behavioral treatments (Nevin & Wacker, 2013). As shown in the figure, the most basic form of Testing for resurgence involves worsening alternative conditions by arranging extinction of an alternative response (e.g., Epstein, 1983). There is typically a transient increase in target responding, the resurgence effect, with both responses progressively decreasing with additional exposure to Testing across time/sessions (see Podlesnik & Kelley, 2014, 2015, for a discussion of other response patterns during Testing).

Fig. 1

Hypothetical data across phases of a resurgence procedure. Target and alternative (Alt) responses produce either reinforcement (RFT) or extinction (EXT) across Training, Elimination, and Testing

Since Carey's (1951) initial report, there have been hundreds of experimental studies demonstrating the generality of conditions in which resurgence occurs (see Kestner et al., 2018a; Shahan & Craig, 2017; Wathen & Podlesnik, 2018, for reviews). Participants include multiple species of nonhumans and different populations of humans with and without clinical diagnoses. Designs have included a variety of between- and within-subjects comparisons using a range of control conditions. The influence of a wide variety of conditions on the size, pattern, and reliability of resurgence effects has been examined, including differences in learning history, reinforcer types, response types, contingencies for delivering alternative reinforcers, and methods for worsening alternative-reinforcement conditions. Finally, there have been numerous approaches to analyzing resurgence data, both to define resurgence and to evaluate whether a resurgence effect occurred relative to control conditions, groups, and responses. Existing reviews conducted on basic research in the resurgence literature convey the generality of resurgence. However, these reviews have been narrative and either conceptual (e.g., Kestner & Peterson, 2017; Lattal et al., 2017; Pritchard et al., 2014) or theoretical (e.g., Shahan & Craig, 2017; Shahan & Sweeney, 2011). As a result, these reviews have not (1) attempted to characterize comprehensively the procedural and analytic methods used in the basic and preclinical experimental literature on resurgence or (2) taken steps to counter potential biases (e.g., selection biases) and threats to generality (e.g., inability to replicate findings revealed in the review). Systematic review, in contrast, features exhaustive search procedures (e.g., ancestral, individual hand searches) and communicates the procedures necessary to replicate the findings of the search (e.g., specific search criteria, Boolean operators). This approach provides the opportunity to exhaustively review the range of procedural and analytic methods used in the resurgence literature in a transparent, standardized, and replicable manner (e.g., Gilroy et al., 2017, 2018; Perrin et al., 2021). The outcomes generated from this synthesis of basic and preclinical research on resurgence can be used by basic researchers to develop novel questions and approaches to study resurgence, whereas preclinical and clinical researchers could identify potential approaches to combating relapse during behavioral interventions. Finally, we point out issues to consider and make some recommendations for future basic, preclinical, and clinical research based on our findings.

Research Questions

The purpose of the current review was to comprehensively and systematically report the procedural and analytic methods used to study resurgence in basic/preclinical research. We characterized participants, designs, procedures, and analyses used in basic and preclinical experimental research on resurgence. Based on the findings from this review, we characterize patterns and trends in procedural and analytic methods used in the extant literature and discuss areas for further research. Therefore, we address the following research questions in this review:

  • RQ1) What species and populations have been examined in studies of resurgence?

  • RQ2) What experimental designs have been used to examine resurgence?

  • RQ3) What procedural manipulations have been arranged when examining resurgence?

  • RQ4) How has resurgence been defined empirically during data analysis?

  • RQ5) What approaches have been used to analyze resurgence data?

Methods

Literature Search Methods

We conducted a systematic search of the available literature to evaluate research examining resurgence phenomena, as shown in Fig. 2. The search methods used in this study were consistent with the guidelines presented in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach. The databases included in the original search consisted of Education Resources Information Center (ERIC), PSYCInfo, Medline, and ScienceDirect. The specific keywords and Boolean operators provided to each of the databases consisted of the following: resurgence AND relapse OR operant OR extinction OR reinforcement, restricted to works dated 2020 or earlier. Following the identification of suitable articles, ancestral searches were performed to examine references for other potentially relevant works. Upon completion of the ancestral searches, all journals that featured resurgence-related content were searched by hand to identify relevant works that had been published but not yet indexed in databases.

Fig. 2

Flow chart of review

Study Selection

Keyword searches for all databases were performed independently by two of the study authors. This initial phase of the search consisted of screening titles and abstracts to determine whether they were potentially eligible for inclusion in the study. For potential studies which had full-text resources available, two authors independently reviewed the methods and results to confirm that the work was suitable for inclusion in the review. Across each phase of the review, disagreements between raters were resolved via discussion until a consensus was reached. The initial searches of the ERIC, PSYCInfo, Medline, and ScienceDirect databases between April and May 2021 resulted in 27, 343, 150, and 3,060 articles, respectively.

Criteria for Study Inclusion

Studies eligible for inclusion in the review met several criteria. In particular, eligible articles presented novel empirical research, examined operant behavior, were available electronically in full text, were written in English, were peer-reviewed, did not examine clinically relevant problem behavior, and included all elements of a resurgence procedure. We excluded experiments that examined clinically relevant behavior because a recent systematic review examined research relevant to the populations serving in these experiments (see Perrin et al., 2021).

To include all elements of a resurgence procedure, at least one assessment (e.g., group, condition) within the experiment must include the following elements:

  1. During Training, a target behavior was reinforced.

  2. During Elimination, contingencies were designed to decrease target behavior by removing the reinforcer maintaining target behavior and arranging alternative reinforcer deliveries differently from Training.

  3. During Testing, the contingency governing alternative reinforcer deliveries changed from Elimination in a way designed to assess potential increases in target responding.
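
The three procedural criteria above can be read as a simple screening rule. The following is only an illustrative sketch in Python, not the coding instrument used in this review; the `Assessment` record, its field names, and both function names are hypothetical, and the sketch covers only the resurgence-procedure criterion, not the other inclusion criteria (e.g., peer review, language).

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """One group or condition within an experiment (hypothetical coding record)."""
    target_reinforced_in_training: bool
    target_reinforcer_removed_in_elimination: bool
    alt_reinforcement_differs_from_training: bool
    alt_contingency_changed_in_testing: bool


def includes_resurgence_procedure(assessment: Assessment) -> bool:
    """Return True when one assessment satisfies all three phase criteria."""
    training_ok = assessment.target_reinforced_in_training
    elimination_ok = (
        assessment.target_reinforcer_removed_in_elimination
        and assessment.alt_reinforcement_differs_from_training
    )
    testing_ok = assessment.alt_contingency_changed_in_testing
    return training_ok and elimination_ok and testing_ok


def experiment_is_eligible(assessments: list[Assessment]) -> bool:
    """An experiment qualifies if at least one assessment meets every criterion."""
    return any(includes_resurgence_procedure(a) for a in assessments)
```

Under this sketch, an experiment with one group that omits extinction of the target response and a second group that meets all three criteria would still qualify, because only one qualifying assessment is required.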

Given that published review articles did not constitute empirical research, these were not featured as elements summarized in the final results (e.g., Lattal et al., 2017; Shahan & Craig, 2017). However, the reference lists of these works were reviewed to determine whether supporting works were eligible for inclusion in the study. Other examples of articles that did not meet our inclusion criteria included those that did not reinforce an explicit target response during Training (Cançado et al., 2017), did not remove the reinforcer maintaining target responding during Elimination (Bouton et al., 2017; Houmanfar et al., 2005; Nall et al., 2019; Nall & Shahan, 2020), or only arranged Training followed by Testing with extinction, thereby omitting Elimination (Herrick, 1965; Mechner et al., 1997; Mechner & Jones, 2011).

Coding Strategy

All studies deemed eligible for inclusion in the research synthesis were coded along several dimensions to address the research questions, as described in the sections below. Note that definitions and additional outcomes of the review appear in the Supplemental Materials. Variables coded during this study can be searched using the spreadsheet generated during this research and using an interactive table available online (https://smallnstats.com/resurgence).

Participant Characterization

We identified the population of participants within each experiment. We recorded different species and different populations of participants within species in terms of age/development (e.g., children vs. adults) and source of recruitment (e.g., university vs. crowdsourcing). We also coded humans based on whether a specific diagnosis was included when describing the participants in the articles.

Experimental Methodologies

Fixed-procedural characteristics referred to aspects of methods that were not manipulated during experiments; that is, they were not independent variables. These characteristics included the number of experiments, groups, and group sizes. We determined the number of experiments per article based on how the data were analyzed. In some cases, experiments were designated as “subexperiments” (e.g., 1a, 1b) and considered separate experiments because the data from the experiments were primarily analyzed separately (e.g., Nighbor et al., 2018; Trask, 2019). In other cases, subexperiments were considered a single experiment because data from those experiments were analyzed together (e.g., da Silva et al., 2008). Also, if experiments reported in one article (Podlesnik & Shahan, 2010) had already appeared in an earlier article (i.e., Podlesnik & Shahan, 2009), we included those experiments only when coding the earlier article. When experiments included different numbers of participants across groups, we recorded the highest and lowest sample sizes. Other characteristics included the method of arranging experimental sessions across time (e.g., single session, multiple sessions); the number of Testing data points used when determining resurgence (single or multiple); the method of changing phases (fixed or performance-based); and the types of experimental designs used to examine independent variables, including specific within-subject manipulations and whether two or more phases were replicated directly within subjects.

Procedural Manipulations

The section on procedural manipulations included the greatest number of dimensions to code, including antecedent-stimulus manipulations, approaches to mitigate resurgence, control conditions or groups, and aspects of responses and consequences (e.g., reinforcement, punishment). We primarily included variables that were manipulated in more than one experiment within the literature. For example, we report below that most experiments that included an inactive control response did not manipulate the number of control responses available but a small minority of experiments did (e.g., Cox et al., 2019; Diaz-Salvat et al., 2020). However, for simplicity of organization, we also coded for characteristics of reinforcement schedules and reinforcer types in this section, which included variables that were not manipulated but relevant to those dimensions. When coding for characteristics of reinforcement schedules, we coded whether reinforcers were delivered via free-operant behavior or within discrete trials, the presence of a changeover requirement, and the use of response sequences. Likewise, the presence or absence of using backup reinforcers for points or tokens (e.g., money, course credit) was not manipulated within experiments but was relevant to characterizing the types of reinforcers arranged across resurgence experiments.

Definitions of Resurgence

All experiments included at least one criterion to define whether resurgence occurred. Examples include comparisons of target responding between Elimination and Testing, with control groups or assessments, relative to control responses, unspecified increases in target responding, and others. Articles typically identified relevant criteria when presenting results (e.g., Hernandez et al., 2020) but some articles identified the criteria defining resurgence as part of the analytic strategy before presenting results (e.g., Kuroda et al., 2020).

Analyses of Resurgence

All experiments analyzed data by examining response patterns from individual participants and/or employed statistical analyses. These analyses were conducted on one or more direct measures (e.g., response rate) or derived measures (e.g., proportions, differences), which we also coded. Finally, we reported whether theoretical frameworks were used to simulate effects or were fit to data, including behavioral momentum theory (Nevin et al., 2017), resurgence as choice (Shahan & Craig, 2017), and a theory of choice and generalization (Bai et al., 2017).
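
To make the distinction between direct and derived measures concrete, the following minimal Python sketch computes two derived measures of the kind coded here (a difference and a proportion) from target response rates. It is an illustration only: the function name, the choice of comparing a single Elimination value with a single Testing value, and the use of Training as the denominator are assumptions rather than criteria drawn from any particular experiment.

```python
def resurgence_measures(training_rate, elimination_rate, testing_rate):
    """Derive simple measures from target response rates (e.g., responses/min).

    Assumed inputs (hypothetical): one summary rate per phase, such as the
    final Training session, final Elimination session, and first Testing session.
    """
    # Difference measure: change in target responding from Elimination to Testing.
    difference = testing_rate - elimination_rate
    # Proportion measure: Testing responding relative to the Training baseline.
    if training_rate > 0:
        proportion_of_training = testing_rate / training_rate
    else:
        proportion_of_training = None  # undefined without a baseline
    return {
        "difference": difference,
        "proportion_of_training": proportion_of_training,
        "increased": difference > 0,
    }
```

For example, `resurgence_measures(40.0, 2.0, 10.0)` yields a difference of 8.0 and a proportion of Training of 0.25; whether such an increase counts as resurgence depends on the criterion an experiment adopts.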

Interrater Reliability

All prospective studies were independently screened, inspected, and scored by the first and second authors. Studies screened, inspected, and confirmed to be eligible for inclusion were scored based on the coding strategy listed in the previous section. The data extraction procedures were performed with the aid of a specialized spreadsheet instrument. This tool supported raters in examining the relevant features of included works as well as in assessing agreements and disagreements. After each rater independently scored respective works, the spreadsheet detected disagreements, and these disagreements were resolved via discussion until consensus was reached and 100% agreement was demonstrated across raters.
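
The disagreement check described above can be sketched as follows. This Python example is hypothetical and is not the spreadsheet instrument used in the review; the function name and the representation of each rater's codes as a list are assumptions made for illustration.

```python
def percent_agreement(rater_a, rater_b):
    """Return (percent agreement, indices of disagreements) for two raters' codes."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same number of items.")
    if not rater_a:
        raise ValueError("At least one coded item is required.")
    # Flag every item on which the two raters assigned different codes.
    disagreements = [
        i for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b
    ]
    agreements = len(rater_a) - len(disagreements)
    return 100.0 * agreements / len(rater_a), disagreements
```

With four coded items and one mismatch, the function returns 75.0 and the index of the mismatched item; once that item is resolved through discussion and recoded identically, it returns 100.0 with no flagged items.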

Results

As shown in Fig. 2, an initial search of the included databases returned a total of 3,580 results. Results of initial reviews revealed that 809 (22.6%) articles were relevant to the research questions. From this quantity, 608 of the search results were found to be duplicated entries (75.15%; n = 809 - 608 = 201 unique entries). Full-text resources were then inspected and 131 articles (65.17%) were found to be relevant to the research questions. Ancestral searches were performed for each of these studies and yielded a total of 26 additional articles. Hand searches for all relevant venues yielded an additional 16 articles. Among the total 173 works determined relevant to resurgence, 120 were empirical studies and 53 were review articles.
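
The counts in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below uses only the numbers reported in the text (the per-database counts from the Study Selection section and the screening counts above); the variable names and the `pct` helper are illustrative.

```python
# Records returned by each database, as reported under Study Selection.
database_hits = {"ERIC": 27, "PSYCInfo": 343, "Medline": 150, "ScienceDirect": 3060}
total_hits = sum(database_hits.values())

screened_in = 809          # relevant after title/abstract review
duplicates = 608           # duplicated entries among the screened-in records
unique_entries = screened_in - duplicates
full_text_relevant = 131   # relevant after full-text inspection
ancestral_additions = 26
hand_search_additions = 16
total_relevant = full_text_relevant + ancestral_additions + hand_search_additions


def pct(part, whole, digits):
    """Percentage of `whole` represented by `part`, rounded to `digits` places."""
    return round(100 * part / whole, digits)


print(total_hits, unique_entries, total_relevant)
print(pct(screened_in, total_hits, 1))
print(pct(duplicates, screened_in, 2))
print(pct(full_text_relevant, unique_entries, 2))
```

The totals reproduce the reported 3,580 initial results, 201 unique entries, and 173 relevant works (120 empirical studies plus 53 reviews), and the three percentages match the reported 22.6%, 75.15%, and 65.17%.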

Overview

The 120 empirical articles included in this review represent 200 distinct experiments across 20 different journals spanning the years 1970 to 2020. As a result of so many experiments meeting our inclusion criteria, this review primarily presents the prevalence of the different participants, methods, and analyses used in these experiments.

The outcomes of the review can be examined in several ways. First, tables present counts and percentages out of the 200 experiments, and representative example experiment(s) of the categories described in the text. Second, the online interactive table can be used to organize and identify all experiments meeting user-specified criteria from categories examined in this review. Finally, the Supplemental Materials present additional data and figures. Tables appearing in this main document have corresponding figures with the same counts and percentages as in the text or tables. In addition, the Supplemental Materials provide more detailed narrative with additional published examples of experiments meeting the different criteria when relevant. The Supplemental Materials also include definitions of the variations of criteria included within the different sections of this review.

Figure 3 shows cumulative articles across years and the number of articles per year. The rate of publication remained steady and relatively infrequent with at most 1–2 articles published per year for the first 35 years. Over three fourths (76.7%) of all articles on resurgence were published in the last 10 years of this review, from 2011 to 2020.

Fig. 3

Counts of articles shown cumulatively and per year

RQ1: Participant Characterization

We characterized participants to convey the breadth and limitations of populations examined in the resurgence literature, which could indicate novel populations to examine. Table 1 shows that 13 different nonhuman and human populations comprise participants in resurgence experiments. Therefore, resurgence as a behavioral phenomenon is general across species and populations, but rats, pigeons, and university students together make up the vast majority (88.5%) of participants out of the 200 experiments included in this review. Overall, 149 experiments (74.5%) included nonhumans as participants. Of the 51 experiments (25.5%) including human participants, 10 experiments (5.0%) included individuals diagnosed with a developmental disability, with two of those experiments (1.0%) employing a combination of children with and without diagnoses—the remaining 41 experiments (20.5%) employed typically developing adult humans (e.g., university students).

Table 1 Population counts, percentages, and examples of participants

RQ2: Experimental Methodologies

We report the experimental research designs and other fixed procedural features arranged within experiments when examining resurgence. In other words, we examined the prevalence of fixed design and procedural features that were not arranged as independent variables manipulated across basic and preclinical evaluations of resurgence (e.g., groups, replications). Table 2 shows experiments organized based on design type and whether participants were humans or nonhumans.

Table 2 Search results by design and participant

Fixed Procedural Characteristics

Refer to Supplemental Materials and the online interactive table for the number of experiments per article and groups per experiment.

Session Arrangement

Table 3 shows that experimental sessions were arranged either within or between days. Including one or more sessions per day across multiple days was by far the most common approach, followed by the entire experiment arranged during a single visit, and then multiple individual sessions within a single day.

Table 3 Session and testing arrangement during experiments

Testing Arrangement

Single data points during Testing allow for the comparison of initial differences in resurgence effects, whereas multiple data points allow response patterns to be evaluated over time. Table 3 also shows that multiple data points were more commonly arranged during Testing than single data points.

Phase Changes

Rules for transitioning from Training to Elimination, Elimination to Testing, and to end Testing were based either on performance factors (e.g., stability criteria) or were fixed, independent of performance, and based on the amount of time, number of sessions, number of trials, or number of reinforcers earned. Table 4 shows that (1) more experiments arranged fixed criteria than performance-based criteria across conditions and (2) the prevalence of using fixed criteria increased across conditions and use of performance-based criteria decreased across conditions. Two additional experiments (1.0%) not included in Table 4 were categorized as “other” for Training because they arranged both performance-based and fixed criteria across multiple successive assessments of Training, Elimination, and Testing.
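
The contrast between fixed and performance-based criteria can be illustrated with a short Python sketch. Both functions are hypothetical: the fixed rule simply counts sessions, and the stability rule shown here (each of the last three session rates falling within 10% of their mean) is an assumed example of a stability criterion, not one taken from the reviewed experiments.

```python
def fixed_criterion_met(sessions_completed, required_sessions):
    """Fixed rule: change phases after a set number of sessions, regardless of performance."""
    return sessions_completed >= required_sessions


def stability_criterion_met(session_rates, window=3, tolerance=0.10):
    """Performance-based rule (assumed): the last `window` session rates each
    fall within `tolerance` (as a proportion) of their own mean."""
    if len(session_rates) < window:
        return False  # not enough sessions to judge stability
    recent = session_rates[-window:]
    mean_rate = sum(recent) / window
    if mean_rate == 0:
        # With no responding, treat the phase as stable only if every rate is zero.
        return all(rate == 0 for rate in recent)
    return all(abs(rate - mean_rate) <= tolerance * mean_rate for rate in recent)
```

Under these assumptions, response rates of 10, 30, 50, 51, and 49 across five sessions would satisfy the stability rule at the fifth session, whereas a fixed five-session rule would end the phase at the same point without reference to the rates at all.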

Table 4 Criteria to change phases

In addition, 14 experiments (7.0%) also not presented in Table 4 included an additional phase during Elimination that arranged extinction of target responding before introducing and reinforcing an alternative response (e.g., Winterbauer & Bouton, 2011, Expt 1–3). This manipulation has been used to examine processes involved in resurgence (see Shahan & Sweeney, 2011, for a review).

Experimental Designs

Table 5 shows the types of experimental designs used to examine resurgence within basic and preclinical experiments. Within-subjects designs were most common. Less prevalent were between-subjects (group) designs, combinations of within- and between-subjects designs, and an inductive design.

Table 5 Experimental designs

Within-Subjects Manipulations

Within-subjects designs were defined as experiments examining both individual-subject data (e.g., single-subject designs) and data from multiple participants summarized as a single group with no other comparison groups. Thus, other than Multiple Approaches, the present section identifies within-subject manipulations arranged in isolation within an experiment in the absence of other within- or between-subject manipulations. Table 5 shows, in part, the types and prevalence of within-subjects manipulations arranged across all 200 experiments.

  • None. These experiments did not include any within-subject manipulations within or across phases when assessing resurgence, other than arranging the relevant within-subject contingency changes to assess resurgence across Training, Elimination, and Testing. Some of these experiments provided demonstrations of resurgence with novel variables (e.g., participant population, procedural feature) or systematically replicated a previously published report. The remainder of experiments identified in Table 5 arranged either at least one within-subject manipulation of an independent variable, as described next, or employed between-subjects or inductive designs.

  • Multiple Assessments. These experiments arranged multiple assessments, either by directly replicating two or more phases (see below) or by arranging different conditions within one or more phases across successive assessments. To elaborate upon the latter, these included Training, Elimination, and/or Testing with variables changed (1) across successive exposures to one or more of those phases or (2) across time within a phase.

    • Phase Replications. Direct replications are a specific form of Multiple Assessment repeating any two or more phases of Training, Elimination, and/or Testing across successive presentations of phases. Table 6 shows the prevalence of experiments arranging phase replications. Note that some experiments including phase replications also were categorized as Multiple Approaches because they included other within-subjects manipulations (see below).

  • Multiple Schedule. These experiments presented within-session alternations of two or more discriminative stimuli. Multiple schedules facilitated the examination of the influence of separate contingencies or other events presented within the component stimuli.

  • Multiple Responses. These experiments arranged the successive differential reinforcement of more than one response or response sequence within or between phases. These manipulations facilitated the examination of primacy and recency effects in the resurgence of operant behavior.

  • Concurrent Schedule. These experiments arranged for two target responses to be available simultaneously throughout Training, Elimination, and Testing. Concurrent schedules facilitated the examination of the influence of separate contingencies or other events simultaneously across phases.

  • Multiple Approaches. Finally, these experiments examined the effects of more than one independent variable on resurgence using combinations of two or more of the within-subjects designs described above.

Table 6 Direct replications of phases

Between-Subjects Designs

Between-subjects designs arrange for different groups of participants to receive different manipulations or levels of an independent variable within one or more of the Training, Elimination, and Testing phases.

Combinations of Designs

These designs examined manipulations of one or more independent variables between subjects while also manipulating one or more other independent variables using one of the within-subjects designs described above.

Inductive Design

We defined an “inductive” design as one in which the prevalence of resurgence effects for a group exposed to one set of conditions determined changes to the independent variables of interest arranged for a subsequent group. This process was repeated until either a certain predetermined prevalence of resurgence was met within a given group or the experiment was terminated without meeting the prevalence criterion.

RQ3: Procedural Manipulations

Basic and preclinical experiments evaluating resurgence have examined a wide range of stimulus-, response-, and reinforcer-based independent variables across Training, Elimination, and Testing. This section characterizes these variations across these resurgence experiments.

Antecedent-Stimulus Conditions

The influence of some form of antecedent-stimulus change was assessed in 24 experiments (12.0%; e.g., Podlesnik et al., 2019, Expt 1–3). These experiments arranged at least one within- or between-subject change in the contextual or discriminative stimuli across one or more phases of Training, Elimination, and Testing.

Resurgence Testing

Testing conditions are the conditions arranged to assess resurgence. Of the 200 experiments, 165 (82.5%) assessed resurgence exclusively by arranging alternative reinforcement during Elimination and then completely removing alternative reinforcement during Testing (e.g., Hernandez et al., 2020, Expt 1–2). Twenty-nine experiments (14.5%) examined other conditions and have broadened or refined the range of variables influencing resurgence (e.g., Nighbor et al., 2020, Expt 1–2). Of these, most arranged within- or between-subject comparisons of different reinforcement or stimulus conditions that might contribute to resurgence. Lastly, six experiments (3.0%) examined only a single Testing condition other than simply eliminating alternative-reinforcer deliveries; nevertheless, this line of research demonstrated that resurgence occurs under nonextinction conditions (Bachá-Méndez et al., 2007, Expt 1–2). The Supplemental Materials provide detailed descriptions and examples of approaches used to examine resurgence other than with extinction in isolation, both through the worsening of alternative conditions and through changes to the alternative conditions.

Mitigation Techniques

Experiments examining methods to decrease resurgence relative to typical resurgence tests with extinction both facilitate the understanding of behavioral processes underlying resurgence and, in some cases, model potential components of interventions under conditions that are better controlled than clinical settings (see Wathen & Podlesnik, 2018). Table 7 shows the prevalence and examples of procedures that could be considered approaches designed to mitigate resurgence in basic and preclinical research.

Table 7 Types of mitigation techniques

Thinning/Decreased Reinforcement

These experiments examined the effects of gradual reductions in the rate, magnitude, or immediacy of alternative reinforcement on the resurgence of target responding, in contrast to abruptly eliminating alternative reinforcement with extinction.

Context/Stimulus Changes as Treatment Cues

These experiments arranged antecedent or consequence stimuli during Elimination and examined whether also presenting those stimuli during Testing influenced resurgence.

Response-Independent Reinforcer Deliveries

These experiments examined whether the reinforcer delivered during Elimination would decrease resurgence if presented response independently during Testing.

Punishment

These experiments examined whether punishment contingencies arranged during Elimination relative to no punishment contingency influenced resurgence, including the use of shock with nonhumans and point loss or timeout presentations with humans.

Extended Elimination

These experiments modeled various differential-reinforcement treatment durations to examine whether longer durations of Elimination could mitigate resurgence relative to shorter durations.

Multiple Alternatives

These experiments examined whether reinforcing multiple alternative responses during Elimination could mitigate resurgence relative to the more typical approach of arranging only a single alternative response.

Drug Effects

These experiments, exclusively with rats, examined whether presession injections of drugs could decrease resurgence of target responses either to test potential pharmacotherapies in a resurgence model of relapse of drug use or to examine the neuropharmacology contributing to resurgence and relapse in general.

On/Off Contingencies

These experiments arranged repeated alternations between reinforcement and extinction of alternative responding during Elimination and examined resurgence during extinction during Testing.

Abstinence Contingency

These experiments arranged, during Elimination, a contingency in which engaging in the target response delayed the availability of response-contingent alternative reinforcers. These methods were designed to model interventions for drug abuse based on contingency management-based interventions (e.g., Higgins et al., 2013).

Multiple Approaches

These experiments examined more than one of the resurgence-mitigation strategies described above, in particular the presence versus absence of on/off contingencies and different durations of Elimination, or extended elimination.

Control Conditions

A critical component for evaluating whether, and to what extent, experimental manipulations contribute to resurgence effects is the inclusion of appropriate control conditions. Table 8 shows the prevalence of five different manipulations observed in the basic and preclinical literature that provide control conditions during resurgence procedures.

Table 8 Control conditions

Inactive Control Responses

One procedural control arranges opportunities to engage in one or more inactive responses throughout Training, Elimination, and Testing, but no reinforcement is available for that responding. Any increases in control responses during Testing typically have been interpreted as induced variability, rather than a resurgence effect (see Lattal & Oliver, 2020, for a critical review). As shown in Table 8, the number and form of control responses varied across experiments.

Typical Resurgence Procedure

Some experiments examined a “typical” resurgence effect to compare with the effects of novel manipulations of independent variables on resurgence. These controls included conditions or groups omitting any additional experimental manipulations other than target reinforcement during Training, a consistent source of alternative reinforcement and removal of the reinforcer maintaining target responding during Elimination, and Testing with an extinction contingency.

No Alternative Reinforcement during Elimination

These experiments arranged a simple extinction contingency for target responding without alternative reinforcement during Elimination. This control can identify (1) how effectively an alternative source of reinforcement contributes to decreasing target responding during Elimination and (2) the smallest degree of change in target responding that could accompany the transition from Elimination to Testing.

Omission of Training

These experiments omit Training and present only Elimination and Testing. Omission of Training identifies whether a history of reinforcement for target responding versus other processes influences increases in target responding during Testing. If target responding increases with the omission of training, then increases in target responding with the inclusion of Training might only reflect induced variability or other processes (see Lattal & Oliver, 2020).

Presenting Alternative Reinforcement during Testing

Finally, presenting alternative reinforcement throughout Testing identifies the smallest degree of increase in target responding during the transition from Elimination to Testing.

Response Characteristics

We characterized the type of target, alternative, and inactive control responses arranged in basic and preclinical resurgence experiments across several dimensions. This section first focuses on the topography of responses and then reports whether experiments included an alternative response during Training.

Target- and Alternative-Response Topographies

Table 9 shows that experiments typically arranged the same target- and alternative-response topography but others arranged different topographies or compared different types of alternative responses. Table 9 also shows that most experiments arranged one alternative response but others arranged no alternative response, such as in conjunction with response-independent or DRO contingencies. Some experiments arranged multiple alternative responses and others compared different numbers of alternative responses. Finally, Table 10 shows the different response topographies arranged for target and alternative responses. Examples can be identified in the online interactive table and Supplemental Materials.

Table 9 Comparison of target- and alternative (Alt)-response topography and number of alternative responses
Table 10 Types of responses

Control-Response Topography

Table 10 also shows the different control-response topographies, with examples available from the online interactive table and Supplemental Materials. Most experiments did not arrange any kind of inactive control response. The most common control responses involved manipulanda, followed by responses on a computer screen, a computer keyboard, an activity, and breaking a photosensor. Finally, experiments with children examined control-response topographies in participants’ repertoires that were never reinforced during experimental sessions, including emotional or other responses likely functionally equivalent to target responding.

Presence or Absence of Alternative Response during Training

Most experiments did not include an alternative response during Training. Only 75 experiments (37.5%) included an alternative response during Training (e.g., Podlesnik et al., 2019, Expt 2). In these experiments, no rationale typically was provided for including or excluding the alternative response during Training. An additional three experiments (1.5%) arranged a comparison of the presence versus absence of an alternative response during Training (e.g., Rawson et al., 1977, Expt 2). In the three miscellaneous experiments (1.5%) that did not fit the above categories, the alternative responses included skills that might or might not have been in participants’ repertoires and a comparison between different Elimination procedures (e.g., math, Williams & St. Peter, 2020, Expt 1–2).

Reinforcement Schedules

We examined whether the reinforcement schedules arranged between Training and Elimination were identical, different, or arranged a comparison. A total of 117 experiments (58.5%) arranged a different reinforcement schedule between Training and Elimination (e.g., Galizio et al., 2020) and 61 experiments (30.5%) arranged the same schedule (e.g., Craig et al., 2020, Expt 1–2). The remaining 22 experiments (11.0%) arranged a comparison of reinforcement schedules between Training and Elimination (e.g., Sweeney et al., 2014, Expt 1–2), with Training and Elimination schedules being the same in at least one assessment and different in at least one assessment.

Target and Alternative Reinforcement Schedules

Table 11 shows the reinforcement schedules arranged during Training and Elimination. Most experiments arranged a type of partial-reinforcement schedule, a comparison of different reinforcement schedules, or continuous reinforcement within the Training and Elimination phases. Other experiments arranged, during Training and Elimination, for reinforcers to be presented contingent upon combinations of contingencies, duration of responses, and the relative frequency of a response. The experiment labeled “Other” arranged reinforcement across participants according to a range of continuous- and partial-reinforcement schedules during Training and Elimination, but the schedules were not examined as an independent variable. Progressive ratios and response-dependent plus response-independent reinforcer deliveries were unique to Training. In contrast, omission schedules, response-independent schedules, lag schedules, and engaging in an activity were unique to Elimination.

Table 11 Types of target and alternative reinforcement schedules

Other Reinforcement-Schedule Characteristics

Table 12 shows procedural features related to the relations between responding and reinforcer deliveries that were not formally a component of the reinforcement schedules. These included the use of free-operant versus discrete-trial procedures, the use of changeover requirements between responses, and requiring response sequences to obtain reinforcement.

Table 12 Other characteristics of response–reinforcer contingencies

Deceleration Procedures during Elimination

Table 13 presents the type of procedure arranged to decrease target responding during Elimination. Most basic and preclinical experiments arranged extinction of target responding while reinforcing an alternative response (DRA), whereas others arranged a comparison of different procedures. Other experiments exclusively arranged omission (DRO) contingencies, reinforced a different response sequence during Elimination than during Training, presented alternative reinforcers response independently (i.e., noncontingent reinforcement [NCR]), or arranged extinction of target responding in isolation before reinforcing an alternative response in isolation. Less commonly, experiments reinforced target responding in one component of a multiple schedule and an alternative response in another component during Training or reinforced different response durations within chain schedules.

Table 13 Deceleration procedures arranged during elimination

Reinforcer Types

We report the prevalence of experiments arranging identical, different, or a comparison of reinforcer types between Training and Elimination. Most experiments (170; 85.0%) arranged the same type of event as the target and alternative reinforcer (e.g., Diaz-Salvat et al., 2020, Expt 1–3). In contrast, 26 experiments (13.0%) arranged different events as target and alternative reinforcers (e.g., Cook et al., 2020, Expt 1–2). One other experiment (0.5%) arranged combinations of reinforcers during Training but only edible reinforcers during Elimination (Craig et al., 2018).

Target- vs. Alternative-Reinforcer Type

Table 14 presents the prevalence of target and alternative reinforcer types; see Supplemental Materials or the interactive online table for specific examples of experiments arranging the different reinforcer types. Edible/food reinforcers were considerably more common than other reinforcer types, with point deliveries with human participants being the next most prevalent. The prevalence of edible reinforcers increased from Training to Elimination, largely driven by experiments with nonhumans arranging drug self-administration during Training and nondrug food reinforcers during Elimination. The other reinforcer types are described in greater detail with examples in Supplemental Materials.

Table 14 Reinforcer types

Backup Reinforcers

Most experiments did not arrange backup reinforcers during Training (179 experiments, 89.5%) and Elimination (181 experiments, 90.5%) but arranged positive or negative reinforcers previously demonstrating effectiveness as a consequence of operant behavior (see above). In contrast, 40 experiments (20.0%) with human participants arranged within-session earnings of points or stimulus presentations on a computer screen with no demonstrated functional relevance. Therefore, some of these experiments arranged backup reinforcers delivered sometime following sessions or participation with the purpose of enhancing control by within-session events (see Hackenberg, 2009, 2018). Table 14 shows the prevalence of arranging backup reinforcers contingent upon within-session performance through providing postsession access to money, edibles, the opportunity to earn a gift card or money through lotteries, or access to unspecified but empirically demonstrated preferred items. Of the experiments with humans arranging in-session earnings of points or stimulus presentations, 19 experiments (9.5%) during Training and 21 experiments (10.5%) during Elimination arranged no backup reinforcers (see online interactive table).

Of the 38 experiments (19.0%) employing university students as participants, 26 experiments (13.0%) provided course credit for participating in research. Of those 26 experiments, 21 experiments (10.5%) provided only course credit, which was contingent on the duration of participation with no other performance-contingent backup reinforcers (e.g., Bolívar & Dallery, 2020). In contrast, five of the experiments arranging course credit also arranged backup reinforcers contingent upon in-session performance (e.g., Podlesnik et al., 2020).

Punishment Types

A total of nine experiments (4.5%) examined the effects of punishment on resurgence within basic and preclinical research. Of those nine experiments, four experiments (2.0%) examined shock presentations with nonhumans (e.g., Rawson & Leitenberg, 1973). With humans, three experiments (1.5%) examined the effects of response cost (e.g., Okouchi, 2015), one experiment (0.5%) examined negative performance feedback (Wilson & Hayes, 1996), and one experiment (0.5%) examined timeout (Houchins et al., 2022).

RQ4: Definitions of Resurgence

Conclusions about whether resurgence occurred can depend on how researchers define a resurgence effect. In fact, 60 experiments (30.0%) used more than one approach to defining resurgence. We reported 12 different approaches used to define whether resurgence occurred during Testing, as shown in Table 15.

Table 15 Criteria used to define a resurgence effect

The most common approach in the basic and preclinical research referred only to unspecified increases in target responding with no reference to, for example, target responding in other phases or relevant control responses. Other approaches defined resurgence as increases in target responding during the first assessment during Testing (e.g., session, time period) relative to the last assessment during Elimination, target responding being greater than levels of inactive-control responding during Testing, greater levels of target responding across multiple assessments during Testing relative to the last assessment during Elimination, greater levels of target responding across multiple assessments during Testing relative to levels of target responding across multiple assessments during Elimination, and greater levels of target responding relative to target response rates occurring in a control group or control assessment. Less common approaches to defining resurgence included target responding being greater than alternative responding during Testing, greater levels of responding during an isolated Testing session relative to levels of responding during an isolated Elimination session, target responding being statistically greater than chance levels, target responding occurring at least one time during Testing, or the highest rate of target responding during any session of Testing exceeding the rate of target responding during the last session of Elimination.
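Because these criteria can disagree for the same data set, it may help to see several of them applied side by side. The sketch below is illustrative only: the function name, the dictionary keys, and the use of the mean to summarize "multiple assessments" are assumptions, not definitions taken from any reviewed experiment.

```python
from statistics import mean


def resurgence_criteria(elimination, testing, inactive_testing=None):
    """Apply several resurgence criteria to per-session target response rates.

    elimination, testing: lists of target response rates, one per session.
    inactive_testing: optional inactive-control response rates during Testing.
    Summarizing multiple sessions with the mean is an assumption of this sketch.
    """
    last_elim = elimination[-1]
    results = {
        # First Testing assessment vs. last Elimination assessment.
        "first_test_gt_last_elim": testing[0] > last_elim,
        # Multiple Testing assessments vs. last Elimination assessment.
        "mean_test_gt_last_elim": mean(testing) > last_elim,
        # Multiple Testing assessments vs. multiple Elimination assessments.
        "mean_test_gt_mean_elim": mean(testing) > mean(elimination),
        # Highest Testing session vs. last Elimination session.
        "max_test_gt_last_elim": max(testing) > last_elim,
        # Target responding occurred at least once during Testing.
        "any_target_in_test": any(rate > 0 for rate in testing),
    }
    if inactive_testing is not None:
        # Target responding vs. inactive-control responding during Testing.
        results["target_gt_inactive"] = mean(testing) > mean(inactive_testing)
    return results


# Hypothetical data (responses per minute); not taken from any experiment.
example = resurgence_criteria(
    elimination=[12.0, 5.0, 2.0, 1.0],
    testing=[0.5, 4.0, 3.0],
    inactive_testing=[0.2, 0.3, 0.1],
)
```

With these hypothetical data, some criteria are met and others are not, which illustrates why conclusions about whether resurgence occurred can depend on the definition chosen.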

RQ5: Analyses of Resurgence

An additional important component of defining what patterns of data constitute resurgence is the set of analyses used to verify those effects. Analyses comprised statistical methods or visual inspection, direct recording of events versus those calculated or normalized relative to other anchoring data, and the use of quantitative theoretical frameworks to simulate or analyze model fits.

General Analytic Strategy

Researchers have used different approaches to analyzing within- and/or between-subject data to determine whether independent variables investigated in this basic and preclinical research contributed to the occurrence, reliability, and size of resurgence effects. Table 16 shows the different general analytic approaches used across experiments. Most experiments either analyzed data using visual inspection to examine data from individual subjects within an experiment or used a traditional frequentist approach to inferential statistics that tests null and alternative hypotheses. Other experiments used both visual inspection of individual-subject data and a frequentist approach to statistical inference. Finally, one experiment used mixed-effects modeling.

Table 16 General analytic strategy, direct measures, and derived measures used to evaluate resurgence effects

Specific Measures of Resurgence

As was the case with definitions of resurgence, different measures can provide insight into different aspects of resurgence data. As such, Cançado et al. (2016) provided an in-depth description of uses for different types of measures of resurgence. Direct measures present resurgence data as direct reporting of behavioral events, in contrast to derived measures that report resurgence data through comparison with other events.

Direct Measures

Table 16 presents the prevalence of the use of direct measures. A vast majority of experiments reported one or more direct measures, with 53 experiments (26.5%) using multiple direct measures to analyze resurgence data. Most experiments reported response rate or count followed by responses emitted across time with cumulative records. Direct measures used relatively infrequently recorded whether intervals included a target or other response during Testing, latency to engage in a target response during Testing, the number of changeovers between response options, response duration, and the prevalence in counts of participants engaging in particular response patterns across Training, Elimination, and Testing.

Derived Measures

Table 16 shows measures that present resurgence data derived through comparison with other events, typically responding under different contingencies. Sixty-five experiments (32.5%) reported one or more derived measures of resurgence. The most common derived measure presented target responding during Testing as a proportion/percentage of target responding during Training. The next most common approaches were to examine target responding (1) by subtracting responding during Elimination from responding during Testing, (2) as a proportion/percentage of all other response options during Testing, (3) during Testing as a proportion/percentage of responding during Elimination, and (4) as a function of the range of reinforcer rates arranged across assessments.
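The four numbered derived measures can be illustrated with a short sketch. The function and key names are hypothetical, the inputs are assumed to be response rates summarized per phase, and measure (2) is interpreted here as target responding divided by total responding during Testing, which is one possible reading of the description above.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_measures(training, elimination, testing, other_testing):
    """Compute common derived measures of resurgence from target response rates.

    training, elimination, testing: target response rate in each phase.
    other_testing: combined rate of all non-target responses during Testing.
    """
    return {
        # Testing as a proportion of Training (baseline) responding.
        "prop_of_training": _ratio(testing, training),
        # (1) Testing minus Elimination.
        "test_minus_elim": testing - elimination,
        # (2) Target responding as a proportion of all Testing responding.
        "prop_of_all_testing": _ratio(testing, testing + other_testing),
        # (3) Testing as a proportion of Elimination responding.
        "prop_of_elimination": _ratio(testing, elimination),
    }


# Hypothetical rates in responses per minute.
measures = derived_measures(
    training=40.0, elimination=2.0, testing=8.0, other_testing=12.0
)
```

Proportional measures are undefined when the comparison phase has zero responding, so the sketch returns `None` in that case rather than raising an error; how to handle such cases in practice is left to the analyst.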

The remaining derived measures were used relatively infrequently, including examining the difference in target responding between two multiple-schedule components during Testing as a proportion/percentage of target responding during Training. Another calculated the correlation between target responding during Testing and other measures. Other experiments reported levels of variability in target responding during Testing using a U-value statistic, the number of instances in which target responding occurred during Testing as a proportion of total opportunities to engage in target responding during Testing, response force during Training, Elimination, and Testing as a percentage of each participant’s maximum force recorded during a pretraining assessment, and the proportion of response sequences meeting the lag contingencies arranged to obtain reinforcer deliveries.
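For readers unfamiliar with the U-value statistic, the sketch below shows the commonly used relative-entropy formulation, U = -Σ p_i log2(p_i) / log2(n), where p_i is the relative frequency of response option i and n is the number of available options. Whether the reviewed experiments computed U over single responses or over response sequences is not specified here, so the unit of analysis in this example is an assumption.

```python
from collections import Counter
from math import log2


def u_value(responses, n_options):
    """Return the U-value for a list of emitted responses.

    responses: observed response labels (or sequence labels).
    n_options: number of possible options; must be at least 2.
    U near 0 indicates repetition of one option; U near 1 indicates
    roughly equal use of all options.
    """
    if n_options < 2:
        raise ValueError("n_options must be at least 2")
    if not responses:
        raise ValueError("responses must not be empty")
    counts = Counter(responses)
    total = sum(counts.values())
    # Options never emitted contribute nothing to the entropy sum.
    entropy = -sum((c / total) * log2(c / total) for c in counts.values())
    return entropy / log2(n_options)


# Hypothetical response records across four available options.
repetitive = u_value(["A", "A", "A", "A"], n_options=4)
variable = u_value(["A", "B", "C", "D"], n_options=4)
```

Here `repetitive` falls at the low end of the scale and `variable` at the high end, which corresponds to minimal and maximal variability across the four options.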

Quantitative Analyses

Use of theoretical models is standard across sciences. Theories allow researchers to precisely quantify and directly compare the effects of variables on underlying behavioral processes (Mazur, 2006; Nevin, 1984; Shull, 1991). They also allow researchers to summarize existing findings and make predictions about how and why variables should affect measures based on model assumptions. Models used in research on resurgence include behavioral momentum theory (see Nevin et al., 2017), a generalization model of choice performance (see Bai et al., 2017), and resurgence as choice (RaC; Shahan & Craig, 2017). A detailed evaluation of the models used in research on resurgence is beyond the scope of this review and these details have been presented in the citations above and elsewhere. It is also worth noting that a conceptual model based on contextual changes and the renewal effect also underlies research on resurgence (see Podlesnik & Kelley, 2015; Shahan & Craig, 2017; Trask et al., 2015, for reviews). This contextual model is not incorporated into this systematic review of basic and preclinical research employing theory because no formal quantitative analyses specifically identify its use.

Table 17 shows the prevalence of using three different models in basic and preclinical research. These models were used to make predictions about resurgence based on model simulations and/or to identify the influence of specific behavioral processes on resurgence through fits to data. Behavioral momentum theory was used most frequently in simulations and model fits.

Table 17 Quantitative theoretical frameworks employed in resurgence experiments

Discussion

We conducted the first systematic review of the basic/preclinical laboratory resurgence literature from 1970 to 2020. From the 200 experiments spanning 120 empirical articles, we reported the participants, research-design elements, procedural manipulations, definitions of resurgence, and the types of analyses used to characterize resurgence. This area of research is growing, with rates of publication generally increasing across years, particularly since approximately the year 2010. This literature demonstrated broad generality of resurgence across populations and experimental designs, which underlies our understanding of factors likely influencing aspects of dynamic behavior outside the laboratory, both in clinically relevant populations (e.g., Briggs et al., 2018; Muething et al., 2020) and in situations requiring problem-solving (Shahan & Chase, 2002; Williams & St. Peter, 2020). The present review can serve as a starting point in organizing this literature across a wide range of categories. Therefore, it can serve as a stimulus for conducting further research, including additional empirical research, more focused and detailed reviews and meta-analyses, and theoretical development. In addition, this review could serve as a template for systematic reviews of other literature examining relapse phenomena, including studies of renewal, reinstatement, and others.

RQ1: Participant Characterization

We showed that resurgence has been examined in basic and preclinical experiments across a wide range of species and therefore appears to be a general phenomenon. However, nearly 9 out of 10 experiments have been conducted with the same three populations (i.e., rats, pigeons, and university students). There have been no examinations of resurgence, for example, in species of invertebrates, amphibians, or reptiles. Thus, further research within these other populations across the animal kingdom is needed to provide a comparative analysis of the conservation of processes underlying resurgence effects. Nevertheless, it is important to note that, by excluding clinical research in the context of behavioral interventions such as DRA (see Kestner & Peterson, 2017; Perrin et al., 2021; Radhakrishnan et al., 2020, for reviews), the present review underrepresents the generality of resurgence in humans, particularly with individuals diagnosed with developmental disabilities.

RQ2: Experimental Methodologies

The breadth of within- and between-subject experimental designs used to examine a multitude of variables in basic and preclinical research suggests that researchers have a powerful set of methods to understand variables and processes involved in resurgence. We identified that a majority of experiments employed within-subjects designs, which generally are effective and efficient for examining functional relations between independent and dependent variables because each participant serves as its own control (see Iversen, 2013; Sidman, 1960). Within-subject designs can eliminate the variability in data that comes with assessing the effects of variables between groups of participants.

There are some limitations, however, to using within-subjects designs when attempting to examine multiple variables contributing to resurgence. Repeated exposure to stimuli and/or contingencies comprising these procedures can confound resurgence findings and, as a result, make interpreting effects difficult. In particular, repeated presentation of one or more of the phases comprising resurgence procedures can decrease (e.g., Kestner et al., 2018a; Podlesnik et al., 2020) or, less commonly, increase (e.g., Cleland et al., 2000; Redner et al., 2022) resurgence across subsequent tests. As a result, examining the effects of novel procedures hypothesized to reduce resurgence, such as mitigation techniques, relative to a standard set of resurgence phases could produce different outcomes depending on whether the novel procedure was arranged first or second. Some within-subject designs provide appropriate alternatives that could mitigate some of these concerns. For example, researchers could arrange a single exposure to Training and Elimination followed by relatively rapid alternation between Testing conditions (e.g., Kimball et al., 2018; Shvarts et al., 2020, Expt 1–2), or each component of a multiple schedule could present different variables during relevant phases (e.g., Kuroda et al., 2016; Lambert et al., 2015). The benefits of within-subjects designs must be weighed against the cost that learning acquired during any of the Training, Elimination, and Testing phases could affect resurgence during subsequent exposures to those phases.

RQ3: Procedural Manipulations

Resurgence of target responding fundamentally is a result of worsening of alternative conditions, as initially demonstrated by decreases in alternative-reinforcer rate/magnitude and increased delays (see Lattal et al., 2017, for a review). An important contribution to the line of research supporting this conclusion is the worsening of alternative conditions by arranging punishment of alternative responses during Testing (Fontes et al., 2018; see also Wilson & Hayes, 1996). However, Fontes et al. found that modest decreases in alternative-reinforcer rate accompanied shock deliveries contingent upon alternative lever pressing in rats. Because changes in variables other than decreasing reinforcement rate can enhance resurgence effects (e.g., Kincaid et al., 2015), the punishment contingency might have served both to contribute to resurgence on its own and to decrease reinforcer rates when response rates decreased. Therefore, eliminating decreases in alternative-reinforcement rates during punisher deliveries (e.g., response-independent reinforcer deliveries) and arranging equivalent decreases in reinforcement rates in the absence of punisher deliveries would be important control tests to strengthen these conclusions.

Control conditions isolate the degree to which the worsening of alternative conditions influences resurgence. The most frequent control condition was arranging inactive responses to address whether increases in target responding during Testing are a result of the history of reinforcement during Training (i.e., resurgence) versus general increases in behavioral variability. In research with nonhumans, resurgence effects generally have been unambiguous because changes in levels of inactive responding from Elimination to Testing tended to be minimal relative to increases in target responding (e.g., Kuroda et al., 2017a, 2017b). In contrast, worsening alternative-reinforcement conditions in experiments with adult-human participants frequently resulted in increases in both target and inactive responses during Testing (e.g., Cox et al., 2019), which can be sustained (see Saini et al., 2021). One conclusion is that it is unclear whether the same behavioral processes underlying resurgence in nonhumans also underlie resurgence in humans, at least under these types of basic and preclinical laboratory conditions.

There could be multiple reasons for nonhumans typically engaging in few inactive responses during Testing beyond the selective effects of reinforcement history on the resurgence of target responding (see Lattal & Oliver, 2020). For instance, inactive responses might not be sufficiently salient to be discriminated as an available option; if so, the inactive responses might as well not be present. However, most experiments with nonhumans record some low but non-zero level of inactive responding (e.g., Craig et al., 2020, Expt 1–2), suggesting discrimination of the presence of inactive options. Therefore, one or more of the many procedural or organismic differences between humans and nonhumans might underlie these differences in inactive responding. Examples include differences in response effort required among available responses, the relevance and motivation for reinforcers (e.g., use of points vs. food), duration of exposure to experimental conditions, and the potential influence of higher-order processes on resurgence (e.g., counting, rule-following). Regarding the latter possibility, an interesting and potentially relevant case is Thrailkill et al. (2019, Expt 1–2). They reported minimal levels of inactive responding during Testing with humans pressing keys on a computer keyboard for target, alternative, and four inactive options. They provided an instruction specifying that participants could access the reinforcer by pressing the “yellow buttons,” which were the target and alternatives, whereas the inactive buttons were colored with black marker. Furthermore, they specified that, “. . . you will know which is the right button because it will make something happen. . . .” Although there were other features to Thrailkill et al. that were unique among experiments with humans (e.g., participant-exclusion rates, reinforcer types), the low levels of inactive responding likely could have been maintained by instructional control. If so, developing procedures to examine common behavioral processes between humans and nonhumans will require creative approaches, including potentially requiring human participants to engage in distracting activities (e.g., counting backward by sevens; see Barnes & Keenan, 1993; Reed, 2020). Finally, the use of specific keys on a keyboard could differ from using other types of buttons, as humans might engage with computer keyboards differently from the types of buttons typically encountered in these experimental environments. Identifying variables influencing responding under control conditions with both human and nonhuman populations is critical for understanding the behavioral processes involved in resurgence and related phenomena, which is important for both foundational and translational research.

The present review identified a wealth of basic and preclinical research examining qualitative and quantitative antecedent-, behavior-, and consequence-based variables influencing resurgence. One area for further research is in understanding neurobiological processes, which are examined extensively in research on other relapse-like phenomena, including reinstatement (e.g., Werner et al., 2021) and renewal (e.g., Bouton et al., 2021). However, only three experiments have examined pharmacological effects in the context of resurgence (e.g., Cook et al., 2020, Expt 2; Quick et al., 2011; Pyszczynski & Shahan, 2014), and only the latter two experiments attempted to examine whether pretreatments of selective receptor agonists and antagonists influenced resurgence generally (rather than as a potential strategy to mitigate drug self-administration). Research examining neurobiological processes would contribute to identifying (1) variables and processes influencing resurgence and (2) pharmacological targets that could mitigate clinical relapse.

RQ4: Definitions of Resurgence

All but 1 of the 12 different approaches to defining resurgence referred to one or more specific criteria. The exception was the most prevalent approach, which did not specify a resurgence effect beyond referring to increases in target responding. Defining clear and specific criteria is useful to other researchers, especially if there is considerable variability in the data during Elimination or Testing. In contrast, 117 experiments (58.5%) with 128 instances overall defined resurgence by comparing responding during Testing to one or more data points during Elimination. There are a couple of important considerations when using the specific approaches employing this general strategy of comparing Testing and Elimination.

First, comparing a single data point versus multiple data points during Elimination could have different implications for defining resurgence. Including multiple data points during Elimination is more stringent in accounting for levels of variability during Elimination. If there was variability in target responding during Elimination, it could indicate that any increase during Testing might not be due to the worsening of conditions (i.e., resurgence) but instead to a continuation of the levels of variability observed during Elimination (see Jarmolowicz & Lattal, 2014, pigeon 449). If target responding was instead on a consistent downward trend when initiating Testing, this approach would potentially rule out a likely resurgence effect (i.e., Type II error). Extending Elimination until stability is reached would resolve this concern but might not always be practical or desirable, such as when fitting quantitative models to all participants from between-subjects data. Therefore, criteria for defining resurgence can have broad impacts on the general analytic strategy.

Second, comparing Elimination and multiple data points during Testing allows for the assessment of resurgence patterns, unlike when assessing a single data point. For example, some Testing conditions produce inverted U-shaped patterns of target responding whereas others produce a monotonic decrease in responding (e.g., Podlesnik & Kelley, 2014), a distinction that has informed the development of quantitative analyses and the understanding of behavioral processes (see Shahan & Craig, 2017).

RQ5: Analyses of Resurgence

Direct measures of behavior suffice under relatively simple conditions, but derived measures can be used to control for more complex patterns of responding when Training or Elimination responding differs across comparisons. Examining response measures during Testing relative to Training (e.g., Podlesnik & Shahan, 2009, Expt 1) or Elimination (e.g., Fontes et al., 2018) can control for prior differences in response levels. Careful selection of measures is important because different measures could result in different conclusions (see Cançado et al., 2016).
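As a minimal sketch of how direct and derived measures can lead to different conclusions, the following expresses Testing response rates as a proportion of a Training baseline. The condition labels and all values are hypothetical, and proportion of baseline is only one of several derived measures that could be used.

```python
def proportion_of_baseline(testing_rates, baseline_rate):
    """Express each Testing response rate as a proportion of a baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return [rate / baseline_rate for rate in testing_rates]


# Hypothetical data: condition A had a much higher Training rate than condition B.
training_rate = {"A": 60.0, "B": 20.0}        # responses per min during Training
testing_rates = {"A": [6.0, 12.0], "B": [6.0, 8.0]}  # responses per min during Testing

for condition in ("A", "B"):
    direct = testing_rates[condition]
    derived = proportion_of_baseline(direct, training_rate[condition])
    print(condition, "direct:", direct, "proportion of Training:", derived)
```

In this hypothetical case, the peak direct measure is higher for condition A, whereas the peak proportion-of-Training measure is higher for condition B.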

Although most basic and preclinical experiments (60.5%) used visual inspection to analyze data, over half incorporated frequentist statistical tests such as t-tests and ANOVAs. Aggregating variability within groups or across time periods loses data of interest to behavior analysts (e.g., individual-subject variability). These tests also assume that observations are independent from one time point to the next. More advanced statistical methods relax these assumptions and are better suited to analyze these arrangements of data. Bayesian frameworks (see Young, 2019) and multilevel (i.e., mixed effects) modeling account for individual-level variability within population-level estimates and preserve individual-subject variability for later inspection and analysis (see DeHart & Kaplan, 2019; Kaplan et al., 2021). Nevertheless, multilevel modeling was used only in a single experiment in the present review (Frye et al., 2018) but has been used in more recent experiments (e.g., Ritchey et al., 2021, 2022). We recommend that, wherever appropriate, researchers take steps to integrate more robust modeling approaches into analyses of resurgence.
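The loss of individual-subject information through aggregation can be illustrated with a small sketch. It does not implement a multilevel model; it only contrasts a group-level summary with subject-level changes, using hypothetical subject labels and response rates.

```python
from statistics import mean

# Hypothetical target response rates: (last Elimination session, first Testing session).
subjects = {
    "S1": (2.0, 12.0),
    "S2": (3.0, 11.0),
    "S3": (12.0, 3.0),
    "S4": (11.0, 2.0),
}

# Group-level summary: mean change from Elimination to Testing.
group_change = mean(test - elim for elim, test in subjects.values())

# Subject-level summary: each subject's own change.
individual_changes = {name: test - elim for name, (elim, test) in subjects.items()}

print("group mean change:", group_change)
print("individual changes:", individual_changes)
```

Here the group mean change is zero even though two hypothetical subjects show large increases and two show large decreases, which is the kind of individual-level variability that subject-level analyses retain.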

Theories of resurgence assume specific roles for behavioral processes potentially underlying variables that influence resurgence (e.g., Bai et al., 2017; Bouton et al., 2021; Nevin et al., 2017; Shahan & Craig, 2017). Despite initiating the quantitative theoretical analysis of resurgence (Podlesnik & Shahan, 2009, 2010), behavioral momentum has been shown to be an inadequate account of the patterns of responding during Elimination (e.g., Craig & Shahan, 2016) and during Testing (Podlesnik & Kelley, 2014). Regarding RaC, Shahan et al. (2020a) modified the model from its initial form after the first evaluation of its fits to data in order to account for sustained biasing effects of reinforcement (see also Shahan et al., 2020b). When compared with behavioral momentum theory, this modified version (i.e., RaC2) provided superior fits to parametric manipulations of different Elimination durations and an on/off alternative-reinforcement contingency.

More recent evaluations of RaC2 fell outside the time range of this review. They evaluated fits to data resulting from changes in alternative reinforcer rates and magnitudes during Elimination (Podlesnik et al., 2022, Expt 1–4) and different durations of Training and Elimination (Smith & Greer, 2022). Although Smith and Greer found good fits of RaC2 to their data (see Footnote 4), Podlesnik et al. required adding a free parameter to RaC2 that assumes reinforcer misallocation to provide an adequate fit (see Cowie et al., 2021; Davison & Nevin, 1999). The flexibility to modify the assumptions of RaC2 to account for novel findings and underlying processes is a strength of this framework. Further evaluations of quantitative manipulations, especially those with little empirical research (e.g., response effort), are needed to test the fundamental assumptions of RaC2.

In contrast to quantitative theories, Context Theory is a conceptual theory that accounts for resurgence effects as an instance of a more general phenomenon known as renewal (see Trask et al., 2015). In particular, the contingency changes arranged across phases serve as different stimulus contexts; that is, consequences produce resurgence through antecedent discriminative control. According to this framework, resurgence is the return of extinguished operant responding during Testing in the presence of novel contextual stimuli because excitatory conditioning from Training generalizes across conditions more than the inhibitory conditioning subsequently established during Elimination. A good deal of careful experimental research on resurgence has produced findings consistent with Context Theory (see Bouton et al., 2021), but the theory has been criticized for a lack of predictive precision and falsifiability relative to quantitative frameworks (see Shahan & Craig, 2017). Despite these concerns, ample research shows that reinforcers have discriminative effects (e.g., Cowie et al., 2021; Davison & Nevin, 1999; Franks & Lattal, 1976), which have been incorporated into quantitative models of resurgence to account for contextual effects of reinforcers (see Bai et al., 2017; Shahan et al., 2020a, 2020b). Additional research is needed to develop quantitative approaches to account for the discriminative and contextual influence of antecedent contextual stimuli (Kincaid et al., 2015; Trask & Bouton, 2016), effects of punishers (Kestner et al., 2015; Kuroda et al., 2020), and qualitatively different consequences (e.g., Craig et al., 2017b; Trask et al., 2018, Expt 2). Such theoretical development would greatly benefit our understanding of the processes underlying resurgence and other relapse-like processes.

Clinical Relevance

Preclinical research provides a platform from which researchers can identify methods for improving the durability of clinical interventions (see Wathen & Podlesnik, 2018). Purpose-driven translational research that integrates research from basic, preclinical, and clinical investigations is common in biomedical research (e.g., Edgeworth et al., 2020), but it can be argued that it is underutilized in developing behavior-analytic interventions (see Mace & Critchfield, 2010). The study of resurgence is one area of research that has reflected a convergence of basic and clinical investigation, with the generality of resurgence demonstrated through clinically relevant behavior during DRA interventions (e.g., Lieving et al., 2004; Volkert et al., 2009). Synthesizing the methods and frameworks identified in this review could help take this combined effort further. As one example, we identified numerous methods designed to mitigate resurgence, which could be further developed, refined, and combined to identify effective candidates to translate. In addition, most studies in the present review (> 65%) arranged the same response topography between target and alternative responses. By contrast, treatments often involve reinforcing an alternative response topographically different from the target behavior, such as a functional communication response. Thus, variables identified in this review could facilitate increasing the validity of preclinical models to better simulate clinical interventions.

Conclusion

This systematic review showed that the number of studies examining resurgence and our understanding of the conditions in which resurgence occurs have expanded greatly, especially in the last 10–15 years. Examining relapse of any form of problematic behavior through the perspective of resurgence follows a tradition consistent with behavior analysis and learning theory, with the goal of identifying relevant antecedent, behavior, and consequence variables. The present review described these events within the basic/preclinical literature in a comprehensive way. In contrast, relapse has also been characterized by more extended environmental and biological risk factors, such as psychiatric comorbidities (e.g., Sliedrecht et al., 2019) or discounting of reinforcers (e.g., Yeh et al., 2020). Research examining how such extended risk factors and local events interact (e.g., Reed, 2019) can begin to provide a more comprehensive picture of the events contributing to resurgence. Use of this systematic review could provide an important step in organizing basic/preclinical research to advance our foundational understanding of resurgence generally and in the context of clinical intervention.