Introduction

Individuals with intellectual and developmental disabilities often have communication support needs and are at an increased risk for engaging in challenging behavior. Communication needs are common among individuals with intellectual disability, and approximately one in four children with autism spectrum disorder (ASD) experiences severe language deficits (Anderson et al. 2007; Hronis et al. 2017). Communication deficits can greatly affect an individual’s quality of life and are associated with lower social participation and more restrictive academic placements (Liptak et al. 2011; White et al. 2007). Individuals with communication support needs may be more likely to engage in challenging behavior as a means of communication (Kaiser et al. 2002; Park et al. 2012). Challenging behavior, such as self-injury, aggression, property destruction, and stereotypy, is common among individuals with intellectual and developmental disabilities (Alimovic 2013; Dworschak et al. 2016). These behaviors often impede students’ academic and social progress (Murphy 2009; Westling 2010). Challenging behavior also affects teachers, peers, and caregivers and can cause increases in stress and decreases in learning (Baker et al. 2003; Bromley et al. 2004; Westling 2010). However, there is a wealth of literature supporting the use of reinforcement-based interventions to increase appropriate communication and decrease challenging behavior (Wong et al. 2015).

Reinforcement-Based Interventions

Reinforcement-based interventions are often used to increase appropriate communication and decrease challenging behavior. Examples of appropriate communication include the use of spoken words and phrases (e.g., “Can I have [item] please?”), manual sign, picture exchange, or speech-generating devices. Reinforcement-based interventions involve providing reinforcement contingent upon a specific response or set of responses. Interventions to increase appropriate communication typically involve structured prompting (including prompt fading) and providing reinforcement contingent upon the communicative response (Goldstein 2002). For example, at least 91 studies have targeted increasing manding in children with ASD (DeSouza et al. 2017). These studies typically use an intervention that includes differential reinforcement to increase mands (DeSouza et al. 2017). The communication literature provides strong evidence for the efficacy of differential reinforcement in increasing appropriate communication for individuals with developmental disabilities (Goldstein 2002).

Differential reinforcement is also typically recommended to reduce challenging behavior and increase appropriate behavior (Wong et al. 2015). Interventions to reduce challenging behavior can include reinforcement of an alternative behavior, incompatible behavior, or the absence of challenging behavior (e.g., DiGennaro Reed et al. 2012). For example, functional communication training (FCT) involves teaching the student an appropriate communicative response and providing function-based reinforcement contingent upon that response (Carr and Durand 1985; Durand and Carr 1991; Tiger et al. 2008). At least 135 high-quality experiments have demonstrated the efficacy of FCT in reducing challenging behavior (Gerow et al. 2018a, b). This body of work indicates that FCT is effective in decreasing challenging behavior for individuals with developmental disabilities and that those reductions generalize to new situations and maintain over time (e.g., Carr et al. 1999; Durand and Carr 1991; Falcomata and Wacker 2013). Together, this body of research indicates that reinforcement-based interventions are often effective in reducing challenging behavior and increasing appropriate behavior for individuals with developmental disabilities.

Multiple Schedules

Much of the previous literature has included evaluations of reinforcement-based interventions with the implementer providing reinforcement following each instance of communication (Hagopian et al. 2011). However, it is unlikely that natural behavior change agents (e.g., parents, teachers) will be able to provide reinforcement for every communicative response and in all circumstances. For example, a mother might not give attention to a child while she is on the phone with a friend. In this situation, the child may stop engaging in the communicative response and/or may return to engaging in challenging behavior (i.e., resurgence). Similarly, a child who was taught to request a break from work (i.e., escape from demands) may request breaks every 30 s, resulting in little or no task completion. Therefore, following a successful increase in appropriate behavior with a reinforcement-based intervention, it is often important to teach the individual to engage in the appropriate behavior at a manageable frequency and in appropriate contexts (Hagopian et al. 2011).

One method for thinning the schedule of reinforcement, multiple schedules, involves alternating between two or more contingencies (e.g., continuous schedule of reinforcement and extinction). For example, a discriminative stimulus (SD) signals when socially-mediated (e.g., access to attention, access to toy) or automatic reinforcement (e.g., self-stimulation) is available, whereas S-Delta signals when socially-mediated or automatic reinforcement is unavailable. Multiple schedules also involve presenting a stimulus, or cue, to indicate the contingency in effect (Hagopian et al. 2011). Cues, also known as schedule-correlated stimuli, can be contrived (e.g., colored cards) or natural (e.g., being on the phone versus watching TV; Muharib and Pennington 2019). This type of arrangement can be used to decrease the overall occurrence of the appropriate response to a more manageable level for the implementer and to teach the child the circumstances in which the appropriate response will result in reinforcement (Hagopian et al. 2011). For example, Akers et al. (2019) conducted a study in which the implementer taught a 7-year-old boy with ASD to request edible items and drinks. Then, the implementer applied a multiple schedules of reinforcement intervention, with specific colors of communication boards indicating the available form of reinforcement (edible, drink, edible and drink, or neither). The child learned to communicate for the edible or drink while the corresponding communication board was present. These data suggested that multiple schedules can be used to teach children to request specific items in the presence of a specific stimulus.

Similarly, Fisher et al. (2015) evaluated the use of multiple schedules following FCT to teach three children to request the function-based reinforcement when the implementer wore a wristband. For each of the participants, the multiple schedules resulted in higher rates of manding in the presence of the wristband. Furthermore, the discriminated responding occurred across three settings in which multiple schedules were implemented. In another study, two individuals with ASD who displayed high rates of motor stereotypy were exposed to two contingencies within a multiple schedules of reinforcement arrangement, with two colored cards representing the two contingencies: one as an SD indicating the availability of engagement in stereotypy (i.e., automatic reinforcement), and one as an S-Delta indicating the contingency of response blocking upon stereotypy (i.e., no automatic reinforcement). The results showed differentiated responding during SD and S-Delta, as both participants engaged in a substantially lower level of motor stereotypy during S-Delta compared to SD (Slaton and Hanley 2016). The use of multiple schedules can increase the feasibility of the intervention for the implementer because the target communication responses come under the stimulus control of an SD, thereby decreasing the need to reinforce high-frequency communicative responses. Additionally, it can promote a higher success rate for appropriate behavior as S-Delta schedules are gradually increased to be practical in natural settings. For this reason, it is important to provide practitioners and researchers with useful information related to the current literature evaluating multiple schedules.

Quality Standards

Researchers should evaluate the quality of the published literature and the results of the studies to develop recommendations for practitioners (Council for Exceptional Children 2015; Wong et al. 2015). The published literature includes several rubrics that researchers can use to evaluate the quality of a study (e.g., Council for Exceptional Children 2014; National Autism Center 2015; What Works Clearinghouse™ [WWC] 2017; Wong et al. 2015). One of the most commonly used rubrics is the WWC Standards Handbook (WWC 2017). This rubric is relatively simple and involves a rigorous evaluation of the quality of the design with relation to internal validity. Specifically, for single-case design experiments, the rubric includes specific criteria related to the number of attempts to demonstrate effect, number of phases, number of data points per phase, and quality of inter-observer agreement methodology (WWC 2017). Due to the strengths of the rubric, researchers often use this rubric to evaluate the quality of studies (e.g., Fallon et al. 2015; Gerow et al. 2018a, b). Studies meeting the WWC standards have sufficient methodological rigor for the reader to have confidence in the results, if the results also demonstrate a functional relation between the independent and dependent variable (i.e., if the results indicate the intervention is effective; WWC 2017).

Analysis of Results

For studies that meet the methodological standards, the results are evaluated to determine if the intervention was effective. Researchers typically use visual analysis and effect size calculations to evaluate the results. Visual analysis involves reviewing within-phase (i.e., level, trend, and variability) and between-phase patterns (i.e., immediacy of effect, magnitude of change, consistency across similar phases, overlap) to determine the number of demonstrations of effect and non-demonstrations of effect (Horner et al. 2005; WWC 2017). Experiments including at least three demonstrations of effect and no demonstrations of non-effect provide convincing evidence that the intervention was effective. Leaders in the field often recommend visual analysis to evaluate the presence of a functional relation with single-case research design studies (Horner et al. 2005; Kratochwill et al. 2013; WWC 2017). Researchers can also use effect sizes to evaluate the results of single-case studies. One benefit of effect sizes is that researchers can aggregate and compare effect sizes across studies using meta-analysis. Tau-U is a metric often recommended for estimating intervention effect with single-case research data, due to the effect size requiring few assumptions to be met, allowing for adjustment based on baseline trend, and using nonoverlap to measure efficacy (Parker et al. 2011). Meta-analyses of single-case studies have provided valuable information to practitioners and researchers in the field of applied behavior analysis (e.g., Hutchins et al. 2019; Tincani and De Mers 2016).

Purpose of the Present Study

There are currently a few reviews on schedule thinning available to practitioners. Hagopian et al. (2011) provided a comprehensive review and guide regarding common types of schedule thinning procedures paired with FCT. Similarly, Muharib and Pennington (2019) provided a guide to help practitioners implement various schedule thinning procedures following FCT. However, neither article provided a systematic review of research on multiple schedules of reinforcement or described other uses of multiple schedules of reinforcement beyond FCT. Saini, Miller, and Fisher (2016) conducted a review of the multiple schedules literature. The authors identified 31 articles published between 1957 and 2014 and provided valuable descriptive information about the articles (e.g., participant characteristics, topography and function of target behavior, and characteristics of the multiple schedules procedure). However, the review did not include an evaluation of the quality of the literature or effect size calculations. The purpose of the present review was to extend Saini et al.’s (2016) review and provide additional information regarding the multiple schedules literature by conducting a quality review and meta-analysis of the available literature. Specific research questions included:

  (a) What are the characteristics of studies involving multiple schedules of reinforcement?

  (b) What are the overall effects of multiple schedules of reinforcement for appropriate communicative responses and challenging behaviors?

  (c) Do age, disability, communication levels, function of behavior, targeted dependent variables as well as characteristics of the intervention moderate the effects of multiple schedules of reinforcement for appropriate communicative responses and challenging behavior?

Method

Search Procedure

We searched Google Scholar, ERIC, PsycINFO, and ProQuest Dissertations and Theses Global to locate studies that incorporated a multiple schedules of reinforcement procedure. We conducted multiple searches across the online databases by applying one keyword from one category or combining two keywords (two categories at a time) from the following four categories: (a) autism (search terms: ‘autism,’ ‘autism spectrum disorder,’ ‘disability’), (b) functional communication training (search terms: ‘communication training,’ ‘functional communication’), (c) reinforcement (search terms: ‘multiple schedule,’ ‘schedule thinning,’ ‘schedule of reinforcement,’ ‘reinforcement’), and (d) discrimination training (search term: ‘discrimination training’). The searches were restricted to studies published in English. We searched published and unpublished studies (e.g., dissertations) to reduce the threat of publication bias. We completed additional searches by (a) reviewing the reference lists of seven published literature reviews on multiple schedules of reinforcement and FCT (i.e., Andzik et al. 2016; Chezan et al. 2017; Falcomata and Wacker 2013; Gerow et al. 2018a, b; Heath et al. 2015; Saini et al. 2016; Walker et al. 2018); (b) reviewing the reference lists of all included studies identified via the online database search; and (c) reviewing the reference lists of the studies that did not meet our age or disability inclusion criteria (e.g., Anderson et al. 2010; Doughty et al. 2007; McKenzie et al. 2008; Nava et al. 2016). Searches concluded in July of 2019 and resulted in a total of 1,030 articles (979 from online database searches and 51 from ancillary searches) after removing duplicates.
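The keyword scheme described above (one keyword alone, or two keywords drawn from two different categories) can be enumerated with a short sketch. The category names and search terms are taken from the text; the quoting and the `AND` operator are assumptions, since the source does not report the exact query syntax used in each database, and the sketch lists the possible queries rather than claiming every one was run.

```python
from itertools import combinations, product

# Keyword categories and search terms as listed in the search procedure.
CATEGORIES = {
    "autism": ["autism", "autism spectrum disorder", "disability"],
    "functional communication training": [
        "communication training",
        "functional communication",
    ],
    "reinforcement": [
        "multiple schedule",
        "schedule thinning",
        "schedule of reinforcement",
        "reinforcement",
    ],
    "discrimination training": ["discrimination training"],
}


def build_queries(categories):
    """Return single-keyword queries plus all two-keyword, two-category pairings."""
    singles = [f'"{kw}"' for kws in categories.values() for kw in kws]
    pairs = [
        f'"{a}" AND "{b}"'  # the AND syntax is an assumption, not from the source
        for (_, kws_a), (_, kws_b) in combinations(categories.items(), 2)
        for a, b in product(kws_a, kws_b)
    ]
    return singles + pairs


queries = build_queries(CATEGORIES)
print(len(queries), "candidate queries")
```

Under these assumptions the scheme yields 10 single-keyword queries and 35 two-keyword queries.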

Inclusion and Exclusion Criteria

We evaluated each study against the following inclusion criteria: the study (a) included a multiple schedules of reinforcement arrangement with a minimum of two schedule components (e.g., reinforcement and extinction, or reinforcement and punishment) presented sequentially and each component signaled by a schedule-correlated stimulus, (b) included school-aged participants 22 years old or younger with one or more developmental disabilities (e.g., ASD, intellectual disability), (c) used an experimental single case design with a line graph displaying participant outcomes, or a control group for group experimental studies, and (d) successfully passed the examination against WWC standards by obtaining a rating of Meets Standards or Meets Standards with Reservations. We excluded studies from the review when they met at least one of the following exclusion criteria: (a) all participants were older than 22 years old (e.g., McKenzie et al. 2008), (b) none of the participants had a developmental disability (e.g., Nava et al. 2016; Tiger and Hanley 2005), (c) did not use multiple schedules of reinforcement as an intervention procedure (e.g., Muharib et al. 2019), (d) was not experimental (e.g., Saini et al. 2016), or (e) failed to meet WWC standards.

We reviewed the abstracts of the 1,030 studies to identify studies that were not intervention-based (e.g., literature reviews) or not relevant to the current study (e.g., studies on different topics). This led to the exclusion of 853 studies. We then accessed the full text of the remaining 177 studies to apply the inclusion criteria, leading to a total of 43 included studies (35 from database searches and eight from ancillary searches) before further evaluations using the WWC standards (see Fig. 1).

Fig. 1 Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) summary of article extraction process

WWC Design Standards

We used the design standards suggested by WWC (Kratochwill et al. 2013) to determine the quality of the included studies and further determine which studies would be used for effect size estimate calculations and moderator analyses. Because all 43 studies used a single-case research design, we used the WWC standards for single-case research only. The WWC standards for single-case research included the following indicators: (a) the independent variable must be systematically manipulated, (b) each dependent variable must be measured over time by more than one assessor for at least 20% of each condition, with an average agreement of at least 80% for each dependent variable, (c) the design must include at least three attempts to demonstrate an intervention effect at three different points in time, and (d) each condition must include at least three data points (at least five data points in each condition to Meet Standards without reservations). For alternating treatment designs, a study had to have at least five data points in each condition. It is worth noting that, for studies in which aggressive, self-injurious, or destructive behavior was a dependent measure, the five data point indicator was not required. That is, a study with such dependent measures did not need to meet the requirement of five data points in each condition to receive a Meets Standards rating due to the danger that could be imposed by those behaviors. We evaluated each of the 43 eligible studies against those indicators and awarded a rating of (a) Meets Standards, (b) Meets Standards with Reservations, or (c) Does Not Meet Standards. A study received the Meets Standards rating when it met all the WWC standards including the minimum of five data points in each condition, with the exception of studies that targeted aggression and/or self-injury. A study received a Meets Standards with Reservations rating when it met all the WWC standards but only had three to four data points in one or more conditions. Finally, a study did not meet the WWC standards when it failed to meet at least one indicator. After inspecting all 43 studies, we excluded eight studies because they failed to meet at least one indicator, as follows: interobserver agreement (IOA) missing for some outcome measures (Fuhrman et al. 2016; Heald et al. 2013; Kaminski et al. 2018; Torres-Visco et al. 2018), fewer than three data points in each condition (Lanovaz et al. 2009), fewer than three demonstrations of an intervention effect (Hagopian et al. 2007), and IOA data missing together with a failure to show three demonstrations of an effect (Alvarez et al. 2014; Scully 2016). We included the remaining 35 studies for further analyses.
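The rating rules described above can be summarized as a small decision function. This is only a sketch of the rules as stated in this section, not the WWC's own tooling: the field names are hypothetical, and two points the text leaves open are treated as assumptions (that the three-data-point minimum still applies to studies of dangerous behavior, and that the dangerous-behavior exemption takes precedence over the five-point rule for alternating treatment designs).

```python
from dataclasses import dataclass

MEETS = "Meets Standards"
RESERVATIONS = "Meets Standards with Reservations"
DOES_NOT_MEET = "Does Not Meet Standards"


@dataclass
class StudyDesign:
    # All field names are hypothetical labels for the indicators in the text.
    iv_manipulated: bool            # independent variable systematically manipulated
    ioa_coverage_pct: float         # lowest % of any condition with IOA collected
    ioa_mean_agreement_pct: float   # lowest mean agreement across dependent variables
    effect_attempts: int            # attempts to demonstrate an effect
    min_points_per_condition: int   # fewest data points in any condition
    alternating_treatments: bool = False
    dangerous_behavior: bool = False  # aggression, self-injury, or destruction targeted


def wwc_rating(s: StudyDesign) -> str:
    """Apply the indicators in the order they are listed in the text."""
    if not s.iv_manipulated:
        return DOES_NOT_MEET
    if s.ioa_coverage_pct < 20 or s.ioa_mean_agreement_pct < 80:
        return DOES_NOT_MEET
    if s.effect_attempts < 3:
        return DOES_NOT_MEET
    if s.min_points_per_condition < 3:  # assumed to apply to every study
        return DOES_NOT_MEET
    if s.dangerous_behavior:            # five-point indicator not required
        return MEETS
    if s.min_points_per_condition >= 5:
        return MEETS
    if s.alternating_treatments:        # these designs need five points per condition
        return DOES_NOT_MEET
    return RESERVATIONS


example = StudyDesign(True, 30, 90, 3, 4)
print(wwc_rating(example))
```

With four data points in the shortest condition and all other indicators met, the example study would receive a Meets Standards with Reservations rating under these assumptions.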

Data Extraction and Coding

We extracted descriptive information across each of the 78 participants represented in the 35 included studies in terms of (a) participant characteristics (age, diagnosis, and communication level), (b) dependent measures (targeted behaviors and functions of behavior), (c) functional behavior assessments (reports, observations, and functional analysis), (d) settings (clinic, home, school) and interventionists (teacher/paraprofessional, parent, researcher), (e) schedule-correlated stimuli (contrived or natural), and (f) characteristics of the intervention (use of prompts, punishment, and/or discrimination pre-training, and terminal schedule of S-Delta). We coded data using “1” to indicate the variable was relevant to the participant or “0” to indicate the variable was not relevant to the participant. When a study did not clearly provide specific information regarding those variables, we coded the variable as “cannot determine.”

Participant Characteristics

For the age group variable, we coded each participant as early childhood (younger than 5 years old), elementary (5–11 years old), or middle and high school (12–22 years old). We combined middle and high school-aged participants due to the small number of participants in each age group. For the diagnosis variable, we coded each participant as having ASD (i.e., autism, autistic disorder, pervasive developmental disorder not otherwise specified, pervasive developmental disorder, or Asperger’s), intellectual disability, behavior disorder (i.e., adjustment disorder, attention deficit hyperactive disorder [ADHD], bipolar disorder, disruptive behavior disorder, intermittent explosive disorder, impulse-control, conduct disorder, obsessive compulsive disorder [OCD], or oppositional defiant disorder [ODD]), or cooccurring (i.e., having at least two diagnoses from the previous categories). If a participant had a diagnosis of ASD and a behavior disorder, for example, we coded “1” for ASD, behavior disorder, and cooccurring. We coded behavior disorders due to the severity of the challenging behaviors exhibited by those participants and due to the fact that behavior disorder mostly cooccurred with ASD or intellectual disability in the included participants. For the communication level variable, we coded each participant as communicating using prelinguistic behaviors (e.g., pointing, leading an adult), one-word utterances (vocally or using augmentative and alternative communication [AAC]), or full sentences (vocally or using AAC). If a participant communicated using two of the previous modes, we coded “1” in both categories.
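The age and diagnosis coding rules above lend themselves to a brief illustration. The sketch below is hypothetical: the function and key names are invented for the example, ages are assumed to be reported in whole years, and the treatment of an unreported diagnosis follows the “cannot determine” convention described under Data Extraction and Coding.

```python
DIAGNOSIS_CATEGORIES = ("asd", "intellectual_disability", "behavior_disorder")


def code_age_group(age_years):
    """Map an age in whole years to the three age groups used in the review."""
    if age_years is None:
        return "cannot determine"
    if age_years < 5:
        return "early childhood"
    if age_years <= 11:
        return "elementary"
    if age_years <= 22:
        return "middle and high school"
    raise ValueError("participants older than 22 were outside the inclusion criteria")


def code_diagnosis(reported):
    """Return 1/0 codes per category; 'reported' is a set of category names or None."""
    keys = (*DIAGNOSIS_CATEGORIES, "co_occurring")
    if reported is None:
        return {key: "cannot determine" for key in keys}
    codes = {cat: int(cat in reported) for cat in DIAGNOSIS_CATEGORIES}
    # Co-occurring is coded 1 when at least two of the categories apply.
    codes["co_occurring"] = int(sum(codes.values()) >= 2)
    return codes


print(code_age_group(7), code_diagnosis({"asd", "behavior_disorder"}))
```

For a 7-year-old with ASD and a behavior disorder, the sketch codes the elementary age group and “1” for ASD, behavior disorder, and co-occurring, matching the example in the text.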

Dependent Measures

We coded the dependent measures for each participant, which included challenging behavior and/or appropriate communicative behaviors. Challenging behavior was coded as aggression (e.g., hitting, kicking, pinching others, throwing objects at others), self-injury (e.g., head-banging, hair pulling, pica), property destruction (e.g., throwing or breaking objects), disruption (e.g., crying, screaming), elopement (i.e., leaving a designated area), and stereotypy (i.e., vocal including pervasive speech, and/or motor including toe walking). Appropriate communicative behaviors were coded as either mands (when FCT was not delivered) or functional communication responses (FCR; when FCT was delivered). Examples of mands and FCR include requesting to access tangibles including edibles, attention, or breaks. We also coded the functions of the targeted behaviors for each participant as positive, negative, or automatic reinforcement. When problem behavior was multiply controlled (e.g., access to attention and escape from demands), we coded “1” for each function (e.g., positive and negative).

Functional Behavior Assessments

For each participant, we coded whether a functional behavior assessment had been conducted to inform the intervention. In cases in which a functional behavior assessment was reported, we coded whether the assessment included reports (e.g., teacher reports, parent reports), direct observations, and/or a functional analysis (FA; Iwata et al. 1982/1994). For those participants whose intervention was not based on the results of a functional behavior assessment, we coded that as “none.”

Settings and Interventionists

We coded three variables related to settings and three related to interventionists. For settings, we coded whether a participant received the intervention in a clinic, school, or home setting. For interventionists, we coded whether the intervention was delivered by a researcher, parent, or teacher/paraprofessional.

Schedule-Correlated Stimuli

We coded two variables for schedule-correlated stimuli. For each participant, we coded whether the signaling stimuli were contrived (e.g., hats, flashlights, cards, necklaces) or natural (e.g., being busy versus being non-busy).

Characteristics of the Intervention

We coded four variables related to the characteristics of the intervention. These were the use of punishment, prompts, and pre-intervention discrimination training, as well as the terminal schedule of the S-Delta component. For each participant, we coded whether a punishment procedure was used (i.e., response blocking or response cost) during the implementation of the intervention. We also coded whether a prompting procedure (e.g., least-to-most, most-to-least) was used during the intervention to prompt the participant to use an appropriate communicative behavior during a discriminative stimulus (SD) interval. Third, we coded whether the participant received discrimination training (i.e., instruction about the difference between the signals, the rules, and when to use an appropriate communicative behavior) before the implementation of the intervention. Finally, we coded the terminal schedule of the S-Delta component (i.e., < 1 min, 1–2 min, 3–4 min, 5–6 min, > 8 min).

Intervention Effect and Moderator Analyses

We used Tau-U (Parker et al. 2011) to estimate intervention effect across all participants for both dependent measure categories (challenging behavior and appropriate communicative behavior). Tau-U is a robust nonoverlap index appropriate for single-case design that accounts for undesirable trends in baseline (Parker et al. 2011). To interpret Tau-U, we used the following guidelines from Vannest and Ninci (2015): < 0.20 = small change, 0.20–0.60 = moderate change, 0.60–0.80 = large change, and > 0.80 = large to very large change. We extracted all data point values from participant graphs using WebPlotDigitizer (Rohatgi 2018) and entered these values into an online Tau-U calculator (Vannest et al. 2016), where we conducted phase contrasts (e.g., baseline condition contrasted with multiple schedules of reinforcement condition, or one condition in an alternating treatment design contrasted with another condition, such as SD versus S-Delta) to produce Tau-U for each participant and dependent measure. To account for baseline trend, we corrected baseline when a significant trend in baseline data was detected within the calculator (i.e., p ≤ 0.05). We combined the phase contrasts into a weighted average within the calculator to produce an aggregated Tau-U for each participant and dependent measure. Finally, we conducted moderator analyses using the Kruskal–Wallis one-way ANOVA test (e.g., Wiseman et al. 2017) to determine whether there were differences in Tau-U across study variables.
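To make the phase-contrast computation concrete, the following is a minimal sketch of one common formulation of Tau-U for a single baseline-versus-treatment contrast. It is not the online calculator's code; the data are invented, and the baseline-trend correction is exposed as a flag rather than triggered by a significance test. For behaviors targeted for reduction, the sign would need to be reversed so that improvement is positive, which is an assumption about how the reported values were oriented.

```python
def s_between(phase_a, phase_b):
    """Pairwise comparisons across phases: improvements minus deteriorations."""
    return sum((b > a) - (b < a) for a in phase_a for b in phase_b)


def s_within(phase):
    """Pairwise comparisons inside one phase (later vs. earlier), i.e., its trend."""
    n = len(phase)
    return sum(
        (phase[j] > phase[i]) - (phase[j] < phase[i])
        for i in range(n)
        for j in range(i + 1, n)
    )


def tau_u(baseline, treatment, correct_baseline_trend=False):
    """Nonoverlap between phases, optionally adjusted for baseline trend."""
    if not baseline or not treatment:
        raise ValueError("both phases need at least one data point")
    s = s_between(baseline, treatment)
    if correct_baseline_trend:
        s -= s_within(baseline)  # remove the portion attributable to baseline trend
    return s / (len(baseline) * len(treatment))


# Hypothetical session data (e.g., mands per minute); not taken from any study.
baseline = [2, 3, 2, 4]
treatment = [5, 6, 7, 6, 8]
print(tau_u(baseline, treatment))
print(tau_u(baseline, treatment, correct_baseline_trend=True))
```

In this invented data set every treatment point exceeds every baseline point, so the uncorrected value is 1.0; the slight upward drift in baseline lowers the corrected value.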

Interrater Reliability

The third author served as a secondary coder for interrater reliability (IRR) purposes during the inclusion criteria application, coding, and data analysis phases. Training the third author entailed oral and written explicit operational definitions of the intervention (multiple schedules of reinforcement), inclusion criteria, WWC standards, and all the variables for coding, as well as examples and non-examples for each of the aforementioned items. The third author also had access to a copy of the WWC handbook for reference. We calculated IRR item-by-item by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100 to obtain a percentage of agreement.
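The item-by-item agreement formula can be written out in a few lines. The sketch below assumes each coder's decisions are stored as equal-length lists; the function name and the example codes are hypothetical.

```python
def percent_agreement(primary, secondary):
    """Item-by-item IRR: agreements / (agreements + disagreements) * 100."""
    if len(primary) != len(secondary):
        raise ValueError("both coders must rate the same items")
    if not primary:
        raise ValueError("at least one item is required")
    agreements = sum(p == s for p, s in zip(primary, secondary))
    disagreements = len(primary) - agreements
    return agreements / (agreements + disagreements) * 100


# Hypothetical 1/0 codes from two coders for four items.
print(percent_agreement([1, 0, 1, 1], [1, 0, 0, 1]))
```

Here the coders agree on three of four items, giving 75% agreement.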

Inclusion of the Studies

The first author assigned 12 randomly-selected studies (27.9%) to the third author. The IRR result for the inclusion of the studies was 100%.

WWC Standards

The first author assigned another 12 randomly-selected studies (27.9%) to the third author. The IRR result for the WWC standards was 96.6%. The disagreements (n = 2) occurred for one study (Vladescu and Kodak 2016) on (a) whether IOA was collected for at least 20% on each dependent variable and (b) the final decision. The two authors met to discuss the disagreements by reading the article together and reached consensus.

Descriptive Coding

The first author assigned 20 randomly-selected participants (25.6%) for descriptive data coding to the third author. The IRR result for data coding was 99.3%. The disagreements (n = 3) occurred on the communication level for two participants. The two authors met to discuss the disagreements by reading the articles in which they had disagreements together and reached consensus.

Tau-U Calculations

The first author assigned the extracted data of 23 randomly-selected participants (29.4%) to the third author for Tau-U calculation. The IRR was 100%.

Results

Descriptive Findings

Descriptive findings of study characteristics are presented in Tables 1 and 2. In this analysis, 78 participants received a multiple schedules of reinforcement arrangement to increase appropriate communicative behavior and/or decrease challenging behavior. A majority of the participants (67.9%) were in the elementary age group. Over half of the participants (76.9%) had a diagnosis of ASD. In terms of targeted behaviors, appropriate communicative behavior (i.e., mands, FCRs) was the most common (76.9%), followed by aggression (41%). By far, access to positive reinforcement (e.g., tangible, attention) was the most common (69.2%) function of the targeted behaviors. For a few participants (8.9%), the target behaviors served multiple functions, such as access to tangibles and escape from demands. For participants who engaged in challenging behaviors (76.9%), a functional analysis was conducted for all but three participants. For the three participants, parent reports and direct observations were conducted to identify the functions of challenging behaviors. For most participants (96%), a researcher served as an interventionist. A clinic was the most common setting, where 65.3% of participants received the intervention sessions. Naturally-occurring stimuli were used with only 0.9% of the participants to indicate the contingencies associated with each schedule. The most common terminal schedule of the S-Delta component was 1–2 min, which was implemented for 30.7% of the participants, whereas the longest terminal schedule was 30 min, which was implemented for 1.3% of the participants. Of the included studies, 30 studies (85.7%) met the WWC standards. Only five studies (14.2%) met the WWC standards with reservations due to collecting fewer than five data points per phase.

Table 1 Summaries of the included studies
Table 2 Main characteristics of the 78 participants

Overall Effect

There was a total of 82 phase contrasts for challenging behaviors and 109 phase contrasts for appropriate communicative behaviors to estimate the overall effect across participants. The aggregated Tau-U for challenging behavior was 0.54, 95% CI = [0.47, 0.60], p < 0.001, with Tau-U ranging from 0.01 to 1.00. This reflected an overall moderate change in challenging behavior (Vannest and Ninci 2015) during conditions in which multiple schedules of reinforcement were present. The aggregated Tau-U for appropriate communicative behaviors was 0.64, 95% CI = [0.58, 0.70], p < 0.001, with Tau-U ranging from 0.12 to 1.00. This reflected a large change in behavior (Vannest and Ninci 2015) during conditions in which multiple schedules of reinforcement were present.
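The interpretation of the aggregated values can be traced with a small lookup based on the Vannest and Ninci (2015) guidelines quoted in the Method. Because the published ranges share their endpoints, the handling of exactly 0.20, 0.60, and 0.80 below is an assumption, as is the use of the absolute value.

```python
def interpret_tau_u(tau_u_value):
    """Label a Tau-U value using the guideline ranges quoted in the Method."""
    magnitude = abs(tau_u_value)  # assumption: the label depends on magnitude only
    if magnitude < 0.20:
        return "small"
    if magnitude < 0.60:   # assumption: exactly 0.60 is labeled large
        return "moderate"
    if magnitude <= 0.80:  # assumption: exactly 0.80 is labeled large
        return "large"
    return "large to very large"


for label, value in [("challenging behavior", 0.54), ("communication", 0.64)]:
    print(label, value, interpret_tau_u(value))
```

Under these boundary assumptions, 0.54 falls in the moderate range and 0.64 in the large range, in line with the labels reported above.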

Moderator Findings

Tables 3 and 4 show the results from the moderator analyses. The results indicate that significant differences in Tau-U for both challenging behavior and appropriate communicative behavior were not present among a majority of coding variables. However, we found a significant difference in Tau-U for appropriate communicative behaviors for the prompting variable, χ²(1, N = 47) = 6.95, p < 0.01. In particular, Tau-U for appropriate communicative behavior was significantly greater when prompting was delivered to teach the communicative behavior during intervention (M = 0.90) as compared to when prompting was not provided to participants (M = 0.62).
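For readers unfamiliar with the test, the sketch below shows how a Kruskal–Wallis H statistic can be computed from per-participant Tau-U values grouped by a moderator, and how a statistic with one degree of freedom converts to a p value. It is a plain-Python illustration, not the software the authors used; the two groups of values are invented, and the p-value helper is valid only for df = 1 (two groups).

```python
import math
from collections import Counter


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H for two or more groups of values."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)
    rank_of = {}
    i = 0
    while i < n:  # assign the mean rank to each run of tied values
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


def p_value_df1(chi_square):
    """Upper-tail chi-square probability for exactly one degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical Tau-U values for participants with and without prompting.
prompted = [0.95, 0.90, 1.00]
unprompted = [0.55, 0.60, 0.70]
h = kruskal_wallis_h(prompted, unprompted)
print(round(h, 3), round(p_value_df1(h), 3))
print(p_value_df1(6.95) < 0.01)  # the reported statistic with df = 1
```

Applying the df = 1 helper to the reported statistic of 6.95 gives a p value below 0.01, consistent with the result stated above.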

Table 3 Moderator analysis findings for challenging behavior
Table 4 Moderator analysis findings for appropriate communicative behavior

Discussion

In this review, we summarized and meta-analyzed data for 78 participants with developmental disabilities across 35 studies that included a multiple schedules of reinforcement arrangement to increase discriminated appropriate communicative behavior and decrease challenging behavior. Overall, multiple schedules of reinforcement produced a large effect for appropriate communicative behavior and a moderate effect for challenging behavior as estimated by Tau-U. Our review is consistent with Saini et al.’s (2016) review in that we found a majority of multiple schedules of reinforcement applications (for 91% of participants) involved the use of contrived schedule-correlated stimuli (e.g., colored cards) as opposed to natural stimuli (e.g., a caregiver being busy versus not busy). Although it may be necessary to initially use contrived schedule-correlated stimuli because they may be more salient than natural signaling stimuli, more research is needed to examine the effects of multiple schedules of reinforcement using natural schedule-correlated stimuli. Natural schedule-correlated stimuli may be more feasible to use in natural environments such as school, home, or community settings (e.g., supermarket). It may be unfeasible, for instance, for a parent to flip a colored card in a supermarket to indicate a particular schedule (e.g., extinction). Such contrived stimuli may also get lost or damaged. On the other hand, a natural schedule-correlated stimulus in the form of a parent being busy talking to the cashier may be more practical. Another benefit of natural stimuli is that they may approximate what occurs with same-aged individuals without disabilities within authentic settings. Further, the use of natural schedule-correlated stimuli may facilitate generalization, such as from a parent being busy talking on the phone at home (intervention setting) to a parent being busy talking on the phone in the supermarket (generalization setting).
Although generalization data were available in some studies included in the review, we did not explore whether certain variables, such as the type of schedule-correlated stimuli, resulted in positive outcomes under conditions different from intervention. Researchers should examine the extent to which natural schedule-correlated stimuli may facilitate generalization of discriminated appropriate communicative behavior and whether there are differences in generalized outcomes based on type of stimuli (natural vs. contrived).

Of particular interest, we found that for most participants (96%), multiple schedules of reinforcement was implemented by a researcher as opposed to a natural change agent such as a parent or teacher. This raises the question of whether multiple schedules of reinforcement can be implemented by natural change agents with a high degree of fidelity. Because multiple schedules of reinforcement may involve extended periods of extinction, natural change agents may find it challenging to implement the procedures with fidelity. For example, a parent whose child is engaging in an escalated level of challenging behavior may be more inclined than a researcher working with the same child to reinforce that behavior (Allen and Warzak 2000). It is critical that natural change agents develop the knowledge and skills to use multiple schedules of reinforcement, as researcher-delivered interventions may not be sustainable when researchers are not responsible for supporting individuals with developmental disabilities in authentic settings. Therefore, future researchers should devote more effort to examining the use of multiple schedules of reinforcement implemented by natural change agents, with a particular emphasis on effective methods for training those agents. Andzik et al. (2016) found that practitioners can implement FCT when provided with proper training, which may involve instruction, modeling, and feedback. Thus, future researchers may examine the effects of practitioner training and coaching on the fidelity with which practitioners implement multiple schedules of reinforcement.

Furthermore, the terminal S-Delta schedule for over half the participants (58.8%) was relatively short (ranging from under 1 min to 6 min). This is concerning because reinforcement may be unavailable for longer periods of time in natural settings (e.g., classrooms, community settings). For example, a student may be required to engage in academic instruction for 10–15 min during which access to the functional reinforcer (e.g., an iPad) may not be available. Without gradually increasing the S-Delta schedule to align with environmental demands and expectations, a student may return to engaging in challenging behavior. Therefore, future researchers should examine longer terminal S-Delta schedules that are more practical in natural settings.

Our meta-analysis extends Saini et al.’s review by providing an estimate of the overall effect size of multiple schedules of reinforcement as well as examining potential moderators of multiple schedules of reinforcement. In terms of the intervention characteristics, we found that response prompts moderated the effects of multiple schedules of reinforcement for appropriate communicative behaviors. That is, when response prompts were used during intervention, effects on appropriate communicative behaviors were more pronounced than when response prompts were not implemented. This is not surprising, as response prompts have been established as an evidence-based practice for students with intellectual and developmental disabilities (Browder et al. 2014; Wong et al. 2015) and found to be effective across various skill areas (Knight and Sartini 2015). When response prompts are used, the interventionist aims to facilitate a transfer of stimulus control from the prompt to the naturally occurring stimulus (Wolery et al. 1992). This finding is promising given that prompts may be preferable to punishment-based procedures such as response cost, which should only be used as a last resort (Behavior Analyst Certification Board 2014). However, due to the very limited number of participants who received punishment or pre-discrimination training, we were unable to test the effects of these potential moderators for appropriate communicative behavior.

Limitations and Future Research

Our meta-analysis has a few limitations that are worth discussing and addressing in future research. First, we only conducted electronic and reference list searches to locate potential studies. We may have overlooked eligible studies that would have been captured by a hand search of behavioral journals. Future reviews should exhaust all methods to locate articles to ensure the inclusion of all potential studies. Second, due to the limited number of cases (n < 8) for some variables, we excluded a few variables from our moderator analyses, such as the settings, interventionists, singular versus multiple functions of the target behaviors, terminal schedules, and schedule-correlated stimuli (contrived vs. natural) associated with each study participant. In other words, the included studies provided too few cases for those variables. For example, naturally occurring stimuli were used for only seven study participants. Future researchers should update our review and test those moderators once additional research has been conducted. Third, due to the limited number of cases in which response cost or response blocking was used, we combined both response cost and response blocking under punishment, although response blocking may function as either punishment or extinction (Lerman and Iwata 1996; Smith et al. 1999). Future researchers should test the moderating effect of response blocking without combining it with other intervention components. Finally, we included studies in which participants had received FCT before a multiple schedules of reinforcement intervention and studies in which participants had not. Because prior FCT experience may have influenced responding under the multiple schedule, it is important to acknowledge this limitation when interpreting our findings.

Implications for Practice

In this meta-analysis, we found overall positive effects for multiple schedules of reinforcement in increasing discriminated appropriate communicative behavior and decreasing challenging behavior. Based on these findings, we provide two implications for practice. First, practitioners can use response prompts during the initial implementation of multiple schedules of reinforcement and gradually fade out the prompts to facilitate the transfer of stimulus control from the prompts to the schedule-correlated stimuli. As we found, response prompts moderated the effects of multiple schedules of reinforcement for appropriate communicative behaviors. That is, individuals who were prompted to engage in the appropriate response during SD significantly outperformed individuals who were not prompted. Therefore, practitioners should consider the use of response prompts and prompt fading procedures. As found in a previous review, practitioners such as teachers and paraprofessionals can implement response prompts with fidelity (Walker et al., in press).

Second, practitioners should seek input from other stakeholders (e.g., parents) on the terminal schedules (e.g., 10 min of extinction) of multiple schedules of reinforcement. The terminal schedules should also be decided with consideration of the requirements of natural environments (Stromer et al. 2000). For example, terminal schedules for a student who is expected to be seated in a chair and not ask the teacher for an iPad throughout a whole class (e.g., 15 min) could be 15 min of extinction and 5 min of reinforcement. In addition, practitioners can expect to observe some extinction bursts before they observe a reduction in challenging behavior, as extinction bursts occurred for some participants in our dataset.
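The terminal schedule in the example above can be laid out as a simple session timeline. The Python sketch below alternates an S-Delta (extinction) component and an SD (reinforcement) component until the session ends. It is a planning illustration only: the function name `build_components`, the 40-min session length, and the choice to begin with the S-Delta component are assumptions made for this sketch, not recommendations drawn from the reviewed studies.

```python
def build_components(s_delta_min, sd_min, session_min):
    """Alternate S-Delta and SD components until the session ends.

    Returns a list of (label, start_minute, end_minute) tuples. Starting
    with S-Delta is an assumption of this sketch.
    """
    if s_delta_min <= 0 or sd_min <= 0 or session_min <= 0:
        raise ValueError("all durations must be positive")
    timeline = []
    start = 0
    label, length = "S-Delta", s_delta_min
    while start < session_min:
        # The final component is cut short if the session ends first.
        end = min(start + length, session_min)
        timeline.append((label, start, end))
        start = end
        if label == "S-Delta":
            label, length = "SD", sd_min
        else:
            label, length = "S-Delta", s_delta_min
    return timeline


if __name__ == "__main__":
    # 15 min of extinction and 5 min of reinforcement, as in the example
    # above; the 40-min session length is hypothetical.
    for label, start, end in build_components(15, 5, 40):
        print(f"{label}: minute {start} to {end}")
```

Laying the components out this way makes it easy to check a proposed terminal schedule against the length of the activity it is meant to fit, such as a class period.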

Conclusion

In this review, we summarized and meta-analyzed studies that involved the use of multiple schedules of reinforcement with individuals with developmental disabilities. Overall, multiple schedules of reinforcement produced a large effect size for appropriate communicative behavior and a moderate effect size for challenging behavior as estimated using Tau-U. We conducted moderator analyses using the Kruskal–Wallis one-way ANOVA test and found no significant differences for age, communication level, disability, function of the target behavior, dependent measures, the use of punishment, or pre-discrimination training, suggesting that multiple schedules of reinforcement is effective across a wide range of participant and intervention characteristics. However, the use of response prompts significantly moderated the effects of multiple schedules of reinforcement for appropriate communicative behaviors, pointing to the potential utility of prompting to facilitate the acquisition and emission of appropriate communicative responses under SD conditions. To develop guidelines for interventionists who are responsible for developing and implementing behavioral supports, more research is needed to explore the effects of multiple schedules of reinforcement across a wide range of variables such as settings, intervention components, and schedule-correlated stimuli. Future research examining the role of coaching on the implementation fidelity of multiple schedules of reinforcement by natural behavior change agents in natural settings is also warranted.