Giving students the opportunity to choose a putative reinforcer following correct responses may increase the efficiency of skill acquisition relative to conditions in which the teacher selects the putative reinforcer (Toussaint et al. 2016). This may be the case because not only can the student choose the most currently preferred item, but choice itself may also serve as a reinforcer. If choice is a reinforcer, students are getting access to a higher quality reinforcer in this scenario relative to the one in which the teacher selects the putative reinforcer. These conditions may not be the same when a teacher chooses for the student (Toussaint et al. 2016). In a recent evaluation, Toussaint et al. (2016) compared the efficiency of acquiring academic responses for three participants with autism spectrum disorder (ASD) in a condition in which participants selected a putative reinforcer following correct responses relative to a condition in which the experimenter selected the putative reinforcer. Two of the three participants demonstrated more efficient acquisition during the choice condition, and these findings were replicated during a second comparison.

Considering students may not always prefer choice (e.g., Brandt et al. 2015) and this preference may be influenced by an individual’s learning history (i.e., exposure to specific reinforcement or punishment contingencies), it may not be accurate to assume choice will confer advantage during clinical procedures aimed at changing behavior. Replications are important to evaluate the generality of the findings of Toussaint et al. (2016). Therefore, the purpose of the current study was to complete a systematic replication of Toussaint et al.

Method

Participants

Two children diagnosed with ASD participated in this study. Both participants had been receiving intervention based on the principles of applied behavior analysis for a minimum of 3 years. Parents of the participants completed the Gilliam Autism Rating Scale—Third Edition (GARS-3; Gilliam 2013) to document behaviors characteristic of an ASD. Ratings for each participant indicated a very likely probability of ASD.

Christine, a 9-year-old girl, obtained a standard score of 55 (qualitative description: deficient to low) on the communication domain of the Battelle Developmental Inventory—Second Edition (BDI-2; Newborg 2004). Christine performed all or nearly all skills through level 2 on the mand, tact, listener, and intraverbal portions and all skills on the echoic subtest from the Verbal Behavior-Milestones Assessment and Placement Program (VB-MAPP; Sundberg 2008). Christine scored a 53 (extremely low) on the Peabody Picture Vocabulary Test—Fourth Edition (PPVT-4; Dunn & Dunn 2007) and a 53 (extremely low) on the Expressive Vocabulary Test—Second Edition (EVT-2; Williams 2007).

Zayne, a 5-year-old boy, obtained a standard score of 58 (deficient to low) on the communication domain of the BDI-2. Zayne performed all or nearly all skills through level 2 on the mand, tact, listener, and intraverbal portions and all skills on the echoic subtest from the VB-MAPP. Zayne scored a 107 (high average) on the PPVT-4 and an 86 (low average) on the EVT-2.

Setting and Materials

All sessions were conducted in a room in the participant’s house that contained a table, chairs, session materials, a video camera on a tripod, and putative reinforcers. Materials for all sessions included a pen, data sheets, a digital timer, paper plates of various colors, binders, trial sheets, flash cards (relevant to the conditions), and edibles.

Design, Measurement, and Interobserver Agreement

The experimenter conducted the treatment comparison using an adapted alternating treatments design (Sindelar et al. 1985) to determine the relative efficiency of child and experimenter choice of putative reinforcer during training of tacts (Christine) or auditory-visual conditional discriminations (Zayne). During each session, we scored unprompted and prompted correct and incorrect responses. Only unprompted correct responses, defined as the participant emitting the target response prior to the delivery of the prompt, are depicted in the figures. The experimenter also recorded session duration using a digital timer. The experimenter started the timer immediately before providing the antecedent stimulus on the first trial of a session and stopped the timer immediately following the completion of the last trial of a session. Total training sessions were calculated by counting the training sessions each participant required to demonstrate mastery in each condition. Total training trials were calculated by multiplying the number of training sessions by the number of trials per session. Total training time was calculated by summing the session durations across the training sessions in each condition.
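The three efficiency measures described above reduce to simple arithmetic over the session records for one condition. The following is a minimal sketch only: the function name, the data layout, and the example durations are hypothetical and are not the study's data.

```python
from typing import List, Tuple


def efficiency_totals(session_durations_s: List[int],
                      trials_per_session: int) -> Tuple[int, int, int]:
    """Return (total sessions, total trials, total time in seconds)
    for one condition, given one duration per training session."""
    total_sessions = len(session_durations_s)            # sessions to mastery
    total_trials = total_sessions * trials_per_session   # sessions x trials per session
    total_time_s = sum(session_durations_s)              # summed session durations
    return total_sessions, total_trials, total_time_s


# Hypothetical durations (in seconds) for three 10-trial sessions.
sessions, trials, time_s = efficiency_totals([85, 80, 78], trials_per_session=10)
minutes, seconds = divmod(time_s, 60)
print(f"{sessions} sessions, {trials} trials, {minutes} min {seconds} s")
```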

An independent observer scored a minimum of 34% of sessions across conditions for both participants for interobserver agreement (IOA) purposes. Trial-by-trial IOA data were calculated. For Christine, the mean agreement was 99% (range, 90 to 100%) and 100% for her first and second treatment comparisons, respectively. For Zayne, the mean agreement was 98% (range, 70 to 100%) during his treatment comparison.
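Trial-by-trial IOA is conventionally computed as the number of trials on which both observers recorded the same response, divided by the total number of trials, multiplied by 100. The sketch below illustrates that convention; the response codes and the example records are hypothetical, and the exact scoring rules used in the study are not reproduced here.

```python
from typing import Sequence


def trial_by_trial_ioa(primary: Sequence[str], secondary: Sequence[str]) -> float:
    """Percentage of trials on which two observers recorded the same code."""
    if len(primary) != len(secondary) or len(primary) == 0:
        raise ValueError("Both records must cover the same, non-empty set of trials.")
    agreements = sum(1 for p, s in zip(primary, secondary) if p == s)
    return 100.0 * agreements / len(primary)


# Hypothetical codes: UC = unprompted correct, PC = prompted correct,
# UI = unprompted incorrect, PI = prompted incorrect.
observer_1 = ["UC", "UC", "PC", "UC", "UI", "UC", "UC", "PC", "UC", "UC"]
observer_2 = ["UC", "UC", "PC", "UC", "PC", "UC", "UC", "PC", "UC", "UC"]
print(trial_by_trial_ioa(observer_1, observer_2))  # one disagreement in ten trials
```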

Preference Assessments

The experimenter conducted a paired-stimulus (PS) preference assessment using ten caregiver-nominated edibles. The experimenter standardized the size of the edibles included in the PS preference assessment such that each edible was a consistent size across presentations. The five highest-ranked edibles were used during subsequent sessions. A color preference assessment (Heal et al. 2009) was conducted using colored pieces of paper and items to determine participant preference for ten colors. Three colors that were approached during an approximately equal percentage of trials were assigned as condition-correlated stimuli.
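Paired-stimulus assessments are commonly summarized by ranking each item on the percentage of presentations in which it was selected. The sketch below assumes that convention; the trial format, the alphabetical tie-breaking rule, and the item names are hypothetical, and the source does not state how items were ranked.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Trial = Tuple[str, str, Optional[str]]  # (item A, item B, item chosen or None)


def selection_percentages(trials: List[Trial]) -> Dict[str, float]:
    """Percentage of presentations on which each item was selected."""
    presented: Counter = Counter()
    selected: Counter = Counter()
    for item_a, item_b, chosen in trials:
        presented[item_a] += 1
        presented[item_b] += 1
        if chosen is not None:          # None marks a trial with no selection
            selected[chosen] += 1
    return {item: 100.0 * selected[item] / presented[item] for item in presented}


def top_items(trials: List[Trial], n: int = 5) -> List[str]:
    """Return the n items with the highest selection percentages.
    Ties are broken alphabetically (an arbitrary choice for this sketch)."""
    pct = selection_percentages(trials)
    return sorted(pct, key=lambda item: (-pct[item], item))[:n]


# Hypothetical three-item assessment in which every pair is presented once.
example = [("chip", "cracker", "chip"),
           ("chip", "raisin", "chip"),
           ("cracker", "raisin", "raisin")]
print(top_items(example, n=2))
```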

Pretests and Assignment of Stimuli

We taught auditory-visual conditional discriminations (AVCDs) to Zayne and tacts of common objects to Christine. We selected targets based on individual education goals. We conducted a vocal imitation assessment with Christine to ensure she could repeat the names of unknown targets, in the absence of the corresponding picture, when modeled by the experimenter. We assigned one exemplar of five targets to each condition for Zayne and Christine’s initial treatment comparison; each exemplar was presented twice in this comparison for both participants. Three exemplars of five targets were assigned to each condition during Christine’s replication; each exemplar was presented only once. We assigned targets using a logical analysis (Wolery et al. 2014). That is, we attempted to equate targets across conditions by ensuring no two targets contained overlapping phonemes or shared physical similarity, and targets across conditions contained an equal number of syllables. A list of targets is available from the second author.

General Procedure

We used a constant prompt delay to teach participants unknown responses to experimental targets. Participants were required to engage in a trial-initiation response at the beginning of each trial (touching a colored card or removing the colored paper covering the trial sheet). The initial training sessions included trials conducted at a 0-s prompt delay. During the 0-s trials, the experimenter delivered the antecedent stimuli and then immediately provided the prompt (vocal model for tact task and visual model for AVCD). For example, for the visual model provided in the AVCD trials, after flipping the colored card (associated with the condition in the stimulus binder), the experimenter would say “touch Illinois” (a state within an array of five) with an immediate gestural prompt to the state on a trial sheet within a sheet protector. The experimenter continued to implement the 0-s prompt delay until the participant responded for two consecutive sessions with at least 89% prompted correct responses. Then, the prompt delay was increased to 5 s. Unprompted and prompted correct responses resulted in praise and the condition-specific consequences as outlined below. The experimenter presented a vocal (Christine) or visual (Zayne) model prompt following unprompted incorrect responses and allowed the participant 5 s to respond. Prompted incorrect responses resulted in the presentation of the next trial. Training continued until participants demonstrated 100% unprompted correct responses for two consecutive sessions. Sessions consisted of 10 (Zayne and Christine’s first treatment comparison) or 15 (Christine’s second treatment comparison) trials. A minimum of 5 min elapsed between sessions. We conducted three to nine sessions per day, 1 to 4 days a week.
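The criteria for increasing the prompt delay and for ending training can be written as two small decision rules. This is a minimal sketch under stated assumptions: the `Session` record and function names are hypothetical, and the rule that the delay stays at 5 s once increased is an assumption, since the source does not describe any return to the 0-s delay.

```python
from dataclasses import dataclass
from typing import List

PROMPTED_CRITERION = 89.0   # percent prompted correct required at the 0-s delay
MASTERY_CRITERION = 100.0   # percent unprompted correct required for mastery
CONSECUTIVE_SESSIONS = 2    # both criteria apply to two consecutive sessions


@dataclass
class Session:
    prompt_delay_s: int             # 0 or 5
    pct_prompted_correct: float
    pct_unprompted_correct: float


def next_prompt_delay(history: List[Session]) -> int:
    """Return the prompt delay (in seconds) to use in the next session."""
    if any(s.prompt_delay_s == 5 for s in history):
        return 5  # assumption: the delay is never reduced after it is increased
    recent = history[-CONSECUTIVE_SESSIONS:]
    criterion_met = len(recent) == CONSECUTIVE_SESSIONS and all(
        s.pct_prompted_correct >= PROMPTED_CRITERION for s in recent
    )
    return 5 if criterion_met else 0


def is_mastered(history: List[Session]) -> bool:
    """True when the last two sessions each had 100% unprompted correct responses."""
    recent = history[-CONSECUTIVE_SESSIONS:]
    return len(recent) == CONSECUTIVE_SESSIONS and all(
        s.pct_unprompted_correct >= MASTERY_CRITERION for s in recent
    )


# Hypothetical session history for one condition.
history = [Session(0, 80, 0), Session(0, 90, 0), Session(0, 100, 0)]
print(next_prompt_delay(history), is_mastered(history))
```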

Baseline and Control

During tact trials, the experimenter held up a picture card and delivered the antecedent vocal stimulus, “What is it?” During AVCD trials, the experimenter opened the stimulus binder, the participant flipped the blank page, and the experimenter delivered the antecedent verbal stimulus. The experimenter provided a brief verbal statement (e.g., “okay”) following unprompted correct or incorrect responses. No other consequences were provided. The experimenter delivered a putative reinforcer and praise for appropriate collateral behavior (e.g., sitting appropriately at the table) approximately every three trials during the intertrial interval.

Child Choice

During tact trials, the experimenter provided the instruction, “What is it?” after flipping over a flash card from a pile. Each flash card was a specific color correlated with its condition. During AVCD trials, the experimenter provided the instruction (the state name, e.g., “Illinois”), after flipping over the colored page in the stimulus binder. Following unprompted or prompted correct responses (i.e., identifying the correct uncommon item in the tact task condition and pointing to the correct state in the AVCD task), the experimenter provided praise and presented a plate with five non-identical edibles. The participant was provided the opportunity to choose one edible. Any attempt to take more than one edible was blocked by the experimenter.

Experimenter Choice

The procedures were identical to those in the child-choice condition, except for the delivery of the reinforcer. Following unprompted or prompted correct responses, the experimenter provided praise, presented a plate with the five non-identical edibles, and delivered an edible from the plate. Similar to Toussaint et al. (2016), the experimenter’s selection was yoked to the participant’s selections from the previous child-choice condition session.

Procedural Integrity and Procedural Integrity IOA

An observer collected procedural integrity data using a checklist for a minimum of 34% of sessions for Christine and Zayne. The observer collected data on whether the experimenter correctly or incorrectly presented antecedent stimuli, followed the condition-specific reinforcer delivery procedure, and implemented other procedures (e.g., prompts) during each trial. For each session, the total number of trials implemented correctly was divided by the total number of trials and multiplied by 100 to get a percentage. For Christine, procedural integrity was 99.5% (range, 91 to 100%) during the first treatment comparison and 100% during the second treatment comparison. For Zayne, procedural integrity was 99.8% (range, 95 to 100%) during his treatment comparison. A second observer collected procedural integrity data for 33% of sessions for IOA purposes. Mean total count IOA was 100% across participants.
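The per-session integrity percentage follows directly from the description above. Total count IOA is conventionally the smaller of two observers' counts divided by the larger, multiplied by 100. The sketch below assumes that convention; the function names and example counts are hypothetical.

```python
def integrity_percentage(correct_trials: int, total_trials: int) -> float:
    """Percentage of trials in a session that were implemented correctly."""
    if total_trials <= 0:
        raise ValueError("A session must contain at least one trial.")
    return 100.0 * correct_trials / total_trials


def total_count_ioa(count_a: int, count_b: int) -> float:
    """Smaller count divided by larger count, as a percentage."""
    if count_a == count_b:
        return 100.0  # also covers the case in which both counts are zero
    return 100.0 * min(count_a, count_b) / max(count_a, count_b)


# Hypothetical 15-trial session: the first observer scored 14 trials as correct,
# and the second observer scored all 15 as correct.
print(round(integrity_percentage(14, 15), 1))
print(round(total_count_ioa(14, 15), 1))
```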

Results and Discussion

During his treatment comparison, Zayne (Fig. 1, top graph) demonstrated mastery responding in 13 training sessions (130 training trials, 17 min and 30 s of training time) and 15 training sessions (150 training trials, 22 min and 40 s) in the experimenter- and child-choice conditions, respectively. During her first treatment comparison, Christine (Fig. 1, bottom graph) demonstrated mastery responding in the experimenter-choice condition in 19 training sessions (190 training trials, 21 min and 42 s of training time), whereas responding reached the mastery criterion in 25 training sessions in the child-choice condition (250 training trials, 29 min and 1 s of training time). We observed similar results during Christine’s treatment comparison replication. She demonstrated mastery-level responding in the experimenter- and child-choice conditions in six training sessions (90 training trials, 9 min and 17 s of training time) and 15 training sessions (225 training trials, 21 min and 1 s of training time), respectively. The results of the current evaluation suggest that, under certain circumstances, providing a choice of putative reinforcer may slow skill acquisition relative to experimenter choice. This is supported by the additional total training time required to achieve mastery responding in the child-choice condition. More specifically, the child-choice condition was associated with a total training time that was approximately 30% greater during Zayne’s treatment comparison and 34 and 126% greater during Christine’s first and second treatment comparisons, respectively.
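The percentage differences in total training time can be checked against the reported durations. The sketch below uses the training times stated above; the function names and the data layout are illustrative only.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration reported in minutes and seconds to seconds."""
    return minutes * 60 + seconds


def percent_greater(child_s: int, experimenter_s: int) -> float:
    """How much longer child choice took, as a percentage of experimenter choice."""
    return 100.0 * (child_s - experimenter_s) / experimenter_s


# (experimenter-choice time, child-choice time) from the reported results.
comparisons = {
    "Zayne": (to_seconds(17, 30), to_seconds(22, 40)),
    "Christine, first comparison": (to_seconds(21, 42), to_seconds(29, 1)),
    "Christine, second comparison": (to_seconds(9, 17), to_seconds(21, 1)),
}

for label, (experimenter_s, child_s) in comparisons.items():
    print(f"{label}: {percent_greater(child_s, experimenter_s):.0f}% greater")
```

Rounded to the nearest whole number, these calculations give 30%, 34%, and 126%, matching the values reported in the text.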

Fig. 1

The percentage of correct responses across child-choice and experimenter-choice conditions for Zayne (top graph) and Christine (bottom graph)

The finding that the experimenter-choice condition was more efficient for both participants is in contrast to previous research that has generally demonstrated the superiority of child choice (Toussaint et al. 2016). Toussaint et al. (2016) found child choice of putative reinforcers conferred advantage during skill acquisition for two participants, and the performance of the third participant was equal across the conditions. Thus, the current study is the first to demonstrate that an experimenter-choice condition resulted in more efficient acquisition than a child-choice condition.

Because our findings did not replicate those of Toussaint et al. (2016), an examination of differences between the studies is important. A first difference between the current study and Toussaint et al. is that we taught a listener response for one participant (Zayne), whereas Toussaint et al. taught speaker responses to all participants. A second potentially important variation is that different prompt-fading procedures were used. In the present study, we employed a constant prompt delay, whereas Toussaint et al. employed a progressive prompt delay. A third variation is that a different number of putative reinforcers were simultaneously presented to participants. Toussaint et al. presented a plate containing three non-identical edibles, whereas we presented a plate containing five non-identical edibles. The increased number of edibles may have influenced participant responding (e.g., increased response latencies) in a way that would not be observed when fewer edibles are simultaneously presented. A final variation between the present and Toussaint et al. studies is that different reinforcement arrangements were utilized. In Toussaint et al., the experimenters initially arranged nondifferential reinforcement and then transitioned to differential reinforcement when participants demonstrated two consecutive sessions with 50% unprompted correct responses. In the present study, we arranged nondifferential reinforcement throughout the entirety of the evaluation.

It is not clear what role the differences between the present and Toussaint et al. (2016) studies may have played in the different outcomes. Future research is needed to determine the impact of these differences. Given that our systematic replication did not reproduce the findings of Toussaint et al., future research should also attempt direct replications of that study. Research is also needed to delineate the conditions under which choice confers advantage during the acquisition of novel skills. For example, it may be worthwhile to explore the value of choice when stimuli of varying levels of preference (low, medium, high) are arranged as reinforcers under choice and no-choice conditions. In the current evaluation, only high-preference edibles were arranged during the treatment evaluation. It may be the case that high-preference stimuli decrease the value of choice, whereas different patterns of responding may be observed if stimuli of lower preference are presented in choice and no-choice arrangements following responding.

The consumer’s instructional history may also have played a role in the current findings. Previous research (e.g., Coon & Miguel 2012) has demonstrated that proximal history can influence subsequent responding. In light of Coon and Miguel’s findings, experimenter choice may have led to more efficient instruction due to a history of its use. No specific information was available regarding the degree to which the current participants were provided choice-making opportunities during the course of their educational history. Future studies could evaluate the extent to which history influences the usefulness of choice during educational programming.

Only two participants were included in the current study, limiting external validity. Future studies should be conducted with a larger pool of participants. Such additional studies should elucidate the conditions under which students or experimenters should choose putative reinforcers. These outcomes would be helpful in guiding practitioners in how and when to provide choice-making opportunities in a way that maximizes instructional efficiency and the personal liberties of students with ASD.