A token economy can be broadly defined as a system in which individuals accumulate tokens and exchange them for backup reinforcers (Ayllon and Azrin 1965), which are stimuli already established as reinforcers, such as food, activities, and preferred items or toys. Tokens are often neutral stimuli (e.g., checkmarks, stickers, poker chips, or laminated paper squares) that become generalized conditioned reinforcers through pairing with a variety of valuable reinforcers. Token economies can be implemented on a large scale. For example, society participates in a large-scale token economy wherein individuals accumulate money by engaging in a variety of behaviors (e.g., working, selling items, providing services) and exchange it for a variety of backup reinforcers (e.g., food, clothing, shelter, services). Token economies can also be implemented on a smaller scale and created for a specific individual. For example, a child can earn checkmarks for doing chores at home and exchange those checkmarks for different items such as access to television shows, candy, or play time.

Token economies have been extensively studied, and there has been an overwhelming amount of research that validates their use as a behavior management and motivational tool across: (1) settings, such as in residential facilities (Barkley et al. 1976), psychiatric wards (Lloyd and Abel 1970), prisons (Milan and McKee 1976), classrooms (Filcheck et al. 2004), and autism treatment clinics (Donaldson et al. 2014); (2) populations, for instance with nonhuman animals (e.g., rats, Malagodi 1967; chimpanzees, Wolfe 1936; and pigeons, Raiff et al. 2008), juvenile delinquents (Barkley et al. 1976), typically developing children (Filcheck et al. 2004), prisoners (Milan and McKee 1976), volunteers (Kent et al. 1977), and individuals diagnosed with autism (Tarbox et al. 2006); and (3) target behaviors, both in decreasing unwanted behaviors (e.g., disruptive behaviors, Filcheck et al. 2004; stuttering, Ingham and Andrews 1973) and in increasing desired behaviors (e.g., safety, Fox et al. 1987; social skills, Giley and Ringdahl 2014; and classroom participation, Boniecki and Moore 2003).

Given the success in decreasing unwanted behaviors and increasing desired behaviors, clinicians and teachers often implement this system when working with individuals diagnosed with autism spectrum disorders (ASD). More specifically, individuals diagnosed with ASD who receive behavioral therapy can benefit from a token economy, as it may allow for more teaching trials and fewer disruptions. Because teachers and therapists deliver the backup reinforcer less often, less session time is likely to be spent on reinforcer consumption and the learning activity is interrupted less often, which can ultimately increase learning opportunities. For example, when a client is asked to engage in a match-to-sample task, if after each correct response the therapist or teacher delivers a small edible reinforcer (e.g., a Skittle), the client will have to stop engaging with the task to consume the reinforcer and, following consumption, must be redirected to the task at hand, which lengthens the inter-response time (IRT). In contrast, if the client receives a token after each correct response and can exchange tokens for the backup reinforcer only after a predetermined number have been earned, the number of learning opportunities or trials may increase because the IRT may be drastically reduced.

When one designs a token economy for a client, there are several features that should be considered. These considerations include: (1) the type of tokens used (e.g., checkmarks, stickers, etc.), (2) the target behaviors that will produce tokens, (3) the number of tokens required for exchange, and (4) the backup reinforcers included in the economy (Kazdin 1977). Although token economies have been shown to be effective, there are several variations to each component that can alter its effectiveness. One parameter of the token economy that warrants further research is the effect that allowing an individual to physically manipulate a token has on the rate of task completion.

Within a token economy, a teacher may place a token on the token board, or the token can be given to the learner to place on the board. It is possible that having the ability to manipulate a tangible token could lead to a more effective token economy. Leaf et al. (2012) suggest that manipulation of the token will increase the saliency that a token was earned, which could strengthen the detection of the response-stimulus contingency. Additionally, the authors suggest that the learner hand in the completed token board to the clinician or teacher, as this may help to facilitate the student’s understanding that a full token board means reinforcement exchange is available (Leaf et al. 2012). On the other hand, manipulating tokens requires extra attention to the token itself and may disrupt learning.

As mentioned earlier, one purpose of implementing a token economy is to decrease the frequency of delivering the backup reinforcer, as the reinforcer consumption time and the reinforcer itself can interrupt the learning activity, which ultimately decreases learning opportunities. Similarly, manipulating the token may interrupt or delay the ongoing activity, as the learner must switch attention from the task to the token and back to the task. Further, the learner might choose to manipulate the token instead of completing the task (e.g., engage in stimulatory behavior or play with the token). Therefore, having the learner place the token on the board might lead to lower rates of the target response, which could result in fewer learning opportunities. Given this potential limitation and the lack of research supporting current recommended practices, it is important to understand the extent to which token manipulation impacts response rates and the overall effectiveness of a token economy for children with ASD.

The purpose of the present study was to evaluate the effects of token manipulation on response rates within a token economy with three young children with ASD. Further, we sought to evaluate participant preference between token manipulation and no token manipulation within a token economy.

Method

Participants

Participants for this study were recruited via flyers posted in a university-based behavioral treatment center for children with ASD. Flyers were posted in the general lobby area and on the center’s website. Information about possible research opportunities was also provided to caregivers via telephone when they called the center inquiring about research projects. Caregivers who were interested in having their child participate in the study were asked to contact the first or second author directly via email or telephone. If the potential participant met the inclusion criteria, the researchers then met with the participant’s legal guardian to obtain informed consent. To be included in the study, participants had to meet the following criteria: (1) a diagnosis of ASD, (2) no known history with token economies or token training, (3) no severe challenging behavior, and (4) the ability to follow 1-step instructions such as “Pick one.”

Three children diagnosed with ASD, with no known prior exposure to a token economy, participated in this study. Jack was an 8-year-old boy with ASD who participated in a mainstream general education classroom and did not receive behavioral services at the center in which the study was conducted, or at another facility, for the duration of the study. He was recruited via the research recruitment flyer that his mother read online. Peter, a 6-year-old boy with ASD, and Wendy, an 8-year-old girl with ASD, both received 15 h per week of behavioral services at the treatment center at which the study was conducted. Peter and Wendy were receiving behavioral intervention services targeting the core deficits of autism, which focused on increasing language and communication, social skills, and adaptive living skills. Both participants were recruited for the study after their case manager (i.e., a Board Certified Behavior Analyst) contacted the first author. Jack and Peter communicated vocally using full sentences, whereas Wendy communicated using a modified picture exchange system. Using pictures, Wendy was able to request preferred items, answer questions, and label some items. Wendy engaged in minimal vocalizations, mainly approximations for highly preferred items, and was also learning how to use a tablet-based voice output communication system. All participants could follow multistep instructions, had the motor ability to pick up and manipulate tokens, could tact multiple common objects, and demonstrated auditory–visual conditional discrimination skills.

Setting and Materials

All sessions occurred at a university-based behavioral treatment center for autism, the same center that Peter and Wendy attended. It should be noted, however, that research sessions were conducted outside of their typically scheduled therapeutic sessions. All sessions were conducted in a 3 m × 3 m treatment room with a one-way mirror. The room was equipped with one table, two chairs, a video camera, and relevant session materials. Additional materials included token boards, tokens, the task materials, backup reinforcers, a timer, and data sheets.

Token Board and Tokens

The token board consisted of a 25 cm × 25 cm laminated piece of paper with three strips of Velcro® attached. A different colored token board was used for each condition (the color of the token boards was determined by analyzing the results of a stimulus preference assessment with colors). For example, a brown token board may have been present during baseline, a blue board during token manipulation sessions, and a red board for the no token manipulation sessions. All tokens were 2.5 cm × 2.5 cm white laminated squares with a small piece of Velcro® adhered to one side. White laminated neutral tokens were chosen rather than preferred pictures or other items, such as poker chips, to minimize any confounding variables, such as visual or tactile stimulation, that may function as an additional reinforcer.

Target Response and Backup Reinforcers

For each participant, the target response was individually selected and consisted of previously mastered tasks presented on a worksheet. Jack’s target response was copying sentences on a worksheet. All sentences were between three and four words. Two sentences were available on each worksheet. One correct response was defined as copying one sentence (e.g., “The frog can jump.”). Peter’s target response was non-identical matching (e.g., matching the color red to the written word “red”). One correct response was defined as drawing a line from one stimulus from the left column to its corresponding stimulus in the right column. Peter could match the stimuli in any order. For both Jack and Peter, 10 different types of worksheets were presented in no particular order. Wendy’s target response was tracing letters. One response was defined as tracing one letter within a quarter of an inch of the line. One worksheet contained all letters of the alphabet and she could trace the letters in any order. All participants always had a stack (approximately 50–60 sheets) of their identified worksheets available. Peter and Jack had several of each type of worksheet available, while Wendy had a stack of her alphabet worksheets available. At the onset of each session, all participants had a stack of new worksheets available. Participants could complete the target response on the worksheet and select the type of worksheet in any order that they chose. They did not have to complete the entire worksheet before choosing another (they could have completed only one target response per worksheet if they chose to).

We interviewed caregivers (for all participants) and behavior therapists (for Peter and Wendy only) to identify several preferred items that could be used as backup reinforcers. Further, for Peter and Wendy, we reviewed previous stimulus preference assessment data provided by their case managers. Based on the information we received, we chose a variety of preferred items from which the participants could select. Before each session began, the participants were shown these preferred items and were asked to choose which item they wanted to earn. We chose to have the participants select the backup reinforcer prior to each session to decrease the likelihood of satiation effects. Jack and Peter worked for tangible backup reinforcers, such as iPad, blocks, action figures, and cars, whereas Wendy worked for edible backup reinforcers, such as Starburst, Laffy Taffy, and sprinkles.

Pre-Experimental Procedures

Color Preference Assessment

To increase the saliency of the different conditions, a different colored token board was assigned to each condition. To minimize any bias in responding toward a color, we conducted a color preference assessment to identify colors with equal preference. Thus, we minimized the possibility that the participant was engaging in more responses in one condition versus another due to the color of the token board rather than the contingency itself. The color preference assessment was conducted using paired stimulus preference assessment procedures, similar to those described by Fisher et al. (1992). Eight colored cards were included in the assessment. Each color was paired once with each of the others. After the assessment was completed, we arranged colors according to preference by calculating the percentage of opportunities each color was selected (i.e., number of times the participant chose the colored card divided by the total number of opportunities to choose the colored card, multiplied by 100). Two colors with equal preference were identified and each was paired with an experimental condition (i.e., manipulation or no manipulation). All participants displayed equal preference for at least two colors within the 57–72% range. Two other colors with similar preference were identified and paired with the training and baseline conditions (preference assessment data available upon request).
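The percentage calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' scoring procedure: the color names, the function name `selection_percentages`, and the choice rule used to generate example data are all hypothetical. With eight colors each paired once with every other color, there are 28 trials and seven opportunities per color, so four or five selections correspond to roughly 57% and 71%, which is in line with the range reported above.

```python
from collections import Counter
from itertools import combinations


def selection_percentages(trials):
    """Return the percentage of opportunities on which each option was chosen.

    trials: iterable of (option_a, option_b, chosen) tuples.
    """
    opportunities = Counter()
    selections = Counter()
    for option_a, option_b, chosen in trials:
        opportunities[option_a] += 1
        opportunities[option_b] += 1
        selections[chosen] += 1
    return {
        option: 100.0 * selections[option] / opportunities[option]
        for option in opportunities
    }


# Hypothetical color set (the study does not list the eight colors).
COLORS = ["red", "blue", "green", "yellow", "orange", "purple", "brown", "pink"]

# Each color is paired once with every other color: 28 trials in total.
PAIRS = list(combinations(COLORS, 2))

# Hypothetical choice rule for demonstration only: the participant always
# picks the color that appears earlier in COLORS.
EXAMPLE_TRIALS = [(a, b, a) for a, b in PAIRS]

percentages = selection_percentages(EXAMPLE_TRIALS)
```

Sorting `percentages` by value would give the preference ranking from which equally preferred colors could be picked.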

Token Training

We conducted token training prior to the experimental conditions to ensure that all participants were properly trained on exchanging tokens and to pair tokens with the backup reinforcers. To avoid establishing a long history of token manipulation, we conducted brief training sessions. We first taught the participants to exchange the tokens. To do this, the researcher first placed all five tokens on the board (with no response requirement) and stated, “It is time to exchange your tokens.” If the participant did not independently exchange the tokens, we used least-to-most prompting to prompt the participant to emit the exchange response. Tokens were exchanged one at a time until all five tokens were exchanged. When all five tokens had been exchanged, the participant earned access to the backup reinforcer. The value of each token was 30-s access to a preferred tangible or one small piece of a preferred edible. The participants exchanged all tokens prior to receiving the backup reinforcer. In other words, the participants first handed the experimenter the five tokens and then they received either 150-s access to a preferred tangible or five small pieces of preferred edibles. These sessions continued until the participant independently exchanged all five tokens at least five times. After the participant had demonstrated mastery of exchanging tokens, we taught them how to place the tokens on the board prior to exchanging. We noncontingently gave the participant a token, one at a time, and prompted the participant to place it on the board. When all five tokens had been placed on the board, we stated, “It is time to exchange your tokens.” After all the tokens were exchanged, the participant was provided with the same access to their backup reinforcer as in the first part of training. Token training was completed when the participant was able to independently place all five tokens on the board and then exchange them at least five times (detailed token training procedures and data are available upon request from the first author).

Response Measures and Data Collection

The primary dependent variable for this study was the rate of the target response (responses per minute), which was calculated by dividing the total frequency of the target response by the total session duration. All data were collected using paper and pencil. Frequency of the target response and any incorrect response (i.e., a response that did not meet the operational definition of a target response such as scribbling on the paper, errors in tracing, etc.) were recorded by making a tally mark on the data sheet after each response occurred. Total session duration was calculated by starting a timer after the experimenter stated the initial rule for the session and by stopping the timer after the last token was placed on the board (time to exchange the token and consumption of backup reinforcers were not included) or after 5 min elapsed, whichever came first.
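As a worked illustration of the rate calculation described above, the following minimal sketch converts a response count and a session duration into responses per minute. The function name and the example values are hypothetical and are not taken from the study's data.

```python
def responses_per_minute(frequency, duration_seconds):
    """Rate of the target response: total frequency divided by session duration."""
    if duration_seconds <= 0:
        raise ValueError("session duration must be positive")
    return frequency / (duration_seconds / 60.0)


# Hypothetical sessions: 15 responses over the full 5 min (300 s),
# and 15 responses completed in 90 s.
slow_session = responses_per_minute(15, 300)
fast_session = responses_per_minute(15, 90)
```

Because the timer stops when the last token is placed or at 5 min, whichever comes first, two sessions with the same response count can yield very different rates.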

Interobserver Agreement (IOA)

A second trained observer independently collected data across all conditions. IOA data were collected either live or from a video recording of the session. To calculate total agreement, we divided the lower recorded frequency of the target response by the higher frequency and multiplied by 100 to yield a percentage score. IOA was collected for 72.7% of the sessions for Jack across all conditions, resulting in a mean agreement of 99.8% (range: 92–100%). We collected IOA for 74.7% of the sessions for Peter across all conditions, resulting in a mean agreement of 99.4% (range: 88–100%). Finally, IOA was collected for 86.7% of the sessions for Wendy across all conditions, with a mean agreement of 99% (range: 83.3–100%).
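The total agreement formula above can be sketched as a short function. The function name and example counts are hypothetical. Treating two observers who both record zero responses as 100% agreement is an assumption made here to avoid division by zero; the text does not state how such sessions were scored.

```python
def total_agreement(count_observer_1, count_observer_2):
    """Total agreement IOA: lower frequency / higher frequency x 100."""
    lower = min(count_observer_1, count_observer_2)
    higher = max(count_observer_1, count_observer_2)
    if higher == 0:
        # Assumption: both observers recorded zero responses, scored as agreement.
        return 100.0
    return 100.0 * lower / higher


# Hypothetical session in which the observers recorded 14 and 15 responses.
example_ioa = total_agreement(14, 15)
```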

Treatment Integrity

Treatment integrity data were collected for all participants by an independent observer. Data were either collected live or via a video recording of the session. The percentage of correct session implementation was calculated by dividing the total number of steps implemented correctly by the total number of steps and multiplying by 100 to yield a percentage score. We measured the following steps for integrity purposes: (1) using the correct token board (i.e., correct color), (2) delivering the token on the appropriate schedule [fixed-ratio 3 (FR 3)], (3) following token manipulation procedures according to the condition in place, and (4) providing the appropriate duration or quantity of the backup reinforcer. Treatment integrity was collected for 36.4% of the sessions for Jack across all conditions, and averaged 97% (range: 77–100%). Treatment integrity was collected for 33.3% and 36.7% of the sessions for Peter and Wendy, respectively, across all conditions, and averaged 100%.
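The integrity percentage described above can be illustrated with a minimal sketch. The function name is hypothetical, and the example assumes that each scored step is recorded as implemented correctly or not; how many scoring opportunities a session contained is not stated in the text, so the four-item example below is illustrative only.

```python
def integrity_percentage(step_results):
    """Percentage of scored steps implemented correctly.

    step_results: list of booleans, True when a step was implemented correctly.
    """
    if not step_results:
        raise ValueError("at least one scored step is required")
    return 100.0 * sum(step_results) / len(step_results)


# Hypothetical session: correct board, correct FR 3 delivery, correct
# manipulation procedure, but an incorrect backup reinforcer duration.
example_integrity = integrity_percentage([True, True, True, False])
```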

Experimental Design and General Procedures

We used a multi-element design for Wendy, and an ABAB design with an embedded multi-element design for Jack and Peter, to evaluate the effects of token manipulation on response rate. To minimize the influence of extraneous variables on differences between the two conditions, we conducted the same number of sessions of each condition per day. The order of conditions was determined by assigning each condition a number and then using a random number generator to select which condition would be conducted first in each series. Thus, no condition was conducted for more than two sessions in a row. We conducted 2–4 sessions per day, 2–3 days per week. All sessions ended after five tokens were earned or after 5 min elapsed. We chose these termination criteria by assessing approximately how long it took the participants to complete 15 responses, which was chosen as the response requirement to allow repeated exposure to the conditions within each session.
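One way to read the ordering procedure above is sketched below. It assumes that a "series" consists of one session of each comparison condition and that only the first condition of each series is drawn at random; this interpretation, the function names, and the use of Python's `random` module are assumptions for illustration, not a description of the tool the researchers used. Under this reading, each condition is run equally often and the same condition can occur at most twice in a row (at the boundary between two series).

```python
import random

CONDITIONS = ("token manipulation", "no token manipulation")


def daily_order(n_series, rng):
    """Build a session order from n_series series, each with both conditions.

    The first condition of every series is chosen at random; the other
    condition follows it.
    """
    order = []
    for _ in range(n_series):
        first = rng.choice(CONDITIONS)
        second = CONDITIONS[1] if first == CONDITIONS[0] else CONDITIONS[0]
        order.extend([first, second])
    return order


def longest_run(order):
    """Length of the longest streak of identical consecutive conditions."""
    longest = 0
    run = 0
    previous = None
    for condition in order:
        run = run + 1 if condition == previous else 1
        longest = max(longest, run)
        previous = condition
    return longest


# Example: a four-session day (two series) with a fixed seed for repeatability.
example_day = daily_order(2, random.Random(0))
```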

At the beginning of each session, the experimenter adhered (via Velcro®) the relevant colored token board to the middle of the table. Prior to the start of the session, the experimenter delivered an instruction to inform the participant about the condition in effect. For example, prior to the token manipulation condition the experimenter said, “You are going to work on the (color) token board. If you want, you can work. You are going to receive tokens. I will give you the tokens that you can place on the board. Ready, go!” The experimenter remained in the room, sitting across from the participant at the table for all sessions across all conditions. At the end of the session, the experimenter delivered a vocal prompt that the session was completed and, if applicable, that it was time to exchange the tokens. Each token was exchanged for either 30 s of access to a preferred tangible item (e.g., video) or one edible (e.g., jelly bean). Participants exchanged one token at a time, and, after all tokens had been exchanged, the experimenter delivered the backup reinforcer. For example, if the participant had earned three tokens, he/she would give one token at a time to the experimenter and then receive either 90-s access to the tangible reinforcer or three small pieces of edibles. If the participant earned all five tokens, he/she would exchange them for either 150-s access to the tangible reinforcer or five small pieces of the edible.
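The exchange values described above amount to simple arithmetic, sketched here for illustration. The function and constant names are hypothetical; the values (30 s of tangible access or one edible piece per token) come from the procedure as described.

```python
SECONDS_PER_TOKEN = 30  # tangible access earned per token
PIECES_PER_TOKEN = 1    # edible pieces earned per token


def backup_reinforcer_amount(tokens_earned, reinforcer_type):
    """Return seconds of access (tangible) or number of pieces (edible)."""
    if tokens_earned < 0:
        raise ValueError("tokens_earned cannot be negative")
    if reinforcer_type == "tangible":
        return tokens_earned * SECONDS_PER_TOKEN
    if reinforcer_type == "edible":
        return tokens_earned * PIECES_PER_TOKEN
    raise ValueError("reinforcer_type must be 'tangible' or 'edible'")


# Three tokens and five tokens exchanged for a tangible item.
three_token_access = backup_reinforcer_amount(3, "tangible")
five_token_access = backup_reinforcer_amount(5, "tangible")
```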

Baseline

Materials to complete the target response and a colored token board that corresponded with baseline were present during baseline sessions. Although the participants were not earning tokens, we chose to have a board present during baseline to ensure the presence of the board alone did not influence responding (i.e., to eliminate the presence of the board as a confounding variable). At the start of the session, the experimenter delivered the instruction, “You are going to work on the (color) token board. If you want, you can work, but you will not receive any tokens. Ready, go!” There were no programmed consequences for task completion, meaning there were no tokens, backup reinforcers, or praise delivered contingent on task completion. The session was terminated after the participant completed 15 correct responses, or after 5 min elapsed.

Token Manipulation

During the token manipulation condition, a token board, the task materials, tokens, and backup reinforcers were present. At the start of the session, the experimenter delivered the instruction, “You are going to work on the (color) token board. If you want, you can work. You are going to receive tokens. I will give you the tokens that you can place on the board. Ready, go!” Incorrect responses did not produce any programmed consequences. Contingent on every correct response, the experimenter delivered a neutral statement (e.g., “That’s matching.”). Tokens were delivered on an FR 3 schedule by placing a token next to the participant along with saying, “Token.” We did not prompt the participant to place the tokens on the board during the session. Moreover, no prompts were provided to interact or stop interacting with the tokens. The session was terminated after the participant earned five tokens (15 correct responses), or after 5 min elapsed. At the end of the session, the experimenter delivered a vocal prompt to the participant to exchange their tokens, “It is time to exchange your tokens.” If at this time the participant had not placed the tokens on the board, the experimenter prompted them to place them on the board before exchanging them. The participant then exchanged any tokens earned by taking them off the token board and handing them over to the experimenter.
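The FR 3 token schedule and the session termination rule described above can be sketched as follows. This is a simplified illustration with hypothetical names; it tracks only counts of correct responses and elapsed time, not the placement or manipulation of tokens.

```python
FIXED_RATIO = 3          # correct responses required per token (FR 3)
MAX_TOKENS = 5           # session ends once five tokens are earned
MAX_SESSION_SECONDS = 300  # or once 5 min have elapsed


def tokens_earned(correct_responses):
    """Tokens delivered so far under the FR 3 schedule, capped at five."""
    return min(correct_responses // FIXED_RATIO, MAX_TOKENS)


def session_is_over(correct_responses, elapsed_seconds):
    """True when five tokens are earned or 5 min have elapsed, whichever is first."""
    return (
        tokens_earned(correct_responses) >= MAX_TOKENS
        or elapsed_seconds >= MAX_SESSION_SECONDS
    )
```

For example, eight correct responses yield two tokens, and the fifteenth correct response delivers the fifth token and ends the session.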

No Token Manipulation

This condition was conducted in the same manner as the token manipulation condition with the following exceptions: (1) the experimenter delivered the instruction, “You are going to work on the (color) token board. If you want, you can work. You are going to receive tokens. I will be the one to put the tokens on the board. Ready, go!”; (2) the experimenter, rather than the participant, placed each token on the board; and (3) during token exchange, the experimenter removed any tokens that the participant earned. In other words, during this condition, the participant did not have the ability to physically interact with the tokens at any point in the session.

Concurrent-Chains Schedule (Preference Assessment)

Following the comparison of the token manipulation and no manipulation conditions, we conducted an assessment to determine whether participants showed a preference for one condition over another. We used procedures similar to those described in Experiment 2 of DeLeon et al. (2014). Specifically, in a concurrent-chains schedule arrangement, the relative response allocation in the initial link (i.e., selecting a token board) was used to determine relative preference for baseline (no tokens), token manipulation, or no token manipulation.

At the start of each session, the three token boards (those used during baseline, token manipulation, and no token manipulation conditions) were placed side by side on the table in a random order, equidistant from the participant. The experimenter stated the rule for each condition (while pointing to the board) and asked the participant to choose the board they wanted to use. Choice was defined as pointing to, picking up, or touching a token board with any part of the hand. It should be noted that we chose to provide a contingency-specifying stimulus (i.e., a rule) rather than implementing forced exposure as DeLeon et al. (2014) did. After the participant selected a token board, the corresponding session was conducted as described previously. We analyzed preference by graphing the choice response during the initial link of the concurrent-chains arrangement using a cumulative graph. Preference was determined when there was a clear and consistent separation in the data paths. The choice assessment continued until preference was identified or until 15 sessions were conducted with no differentiation between the data paths.
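The data paths of a cumulative choice graph like the one described above could be computed as in the following sketch. The board labels, function name, and choice sequence are hypothetical examples and are not the participants' data; judging "clear and consistent separation" remains a visual decision that this sketch does not attempt to automate.

```python
from itertools import accumulate

BOARDS = ("baseline", "token manipulation", "no token manipulation")


def cumulative_choices(choices, boards=BOARDS):
    """Return, for each board, the running total of selections across sessions."""
    return {
        board: list(accumulate(1 if choice == board else 0 for choice in choices))
        for board in boards
    }


# Hypothetical initial-link selections across five choice sessions.
EXAMPLE_CHOICES = [
    "baseline",
    "token manipulation",
    "no token manipulation",
    "token manipulation",
    "token manipulation",
]

example_paths = cumulative_choices(EXAMPLE_CHOICES)
```

Each list in `example_paths` is one data path; a path that keeps rising while the others stay flat corresponds to a preference for that board.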

Results

Figure 1 shows responses per minute of task completion (left panels) and choice assessment results (right panels) for all participants. All the participants engaged in higher rates of responding during the token manipulation and no token manipulation conditions relative to baseline. Jack (top left panel) engaged in low levels of task completion during baseline (M = 0.3; range: 0–2.4). During the treatment comparison, Jack responded at slightly higher rates in the no token manipulation condition (M = 2.80; range: 2.2–4.5) than in the token manipulation condition (M = 2.6; range: 2–4.4); however, differences were negligible. Jack’s responding during the initial baseline and comparison conditions was similar to his responding in the replication phases. During the choice assessment (top right panel), Jack chose the token manipulation board on most opportunities and showed a clear preference for the token manipulation condition.

Fig. 1 Left panels: the rate of task completion; right panels: the cumulative choice to evaluate preference. Closed circles: the token manipulation condition; open circles: the no manipulation condition; closed squares: baseline

Similarly, Peter (middle left panel) engaged in low levels of task completion during baseline (M = 2.42; range: 0–6.5) and higher rates of responding during the no token manipulation condition (M = 8.6; range: 0–27.3) than in the token manipulation condition (M = 8; range: 1.8–14.6). During session 30, Peter showed extremely high rates of responding during a no token manipulation session. If one were to exclude this outlier, differences between the two conditions would be negligible, suggesting token manipulation did not influence rates of responding. Peter’s responding during the initial baseline and comparison conditions was similar to his responding in the replication phases. Peter showed no preference during the choice assessment (middle right panel) until the last four sessions, during which he consistently chose the token manipulation condition.

Wendy (bottom left panel) engaged in low levels of task completion during baseline (M = 1.7; range: 0–9.3). During the treatment comparison condition, however, Wendy showed increased rates of responding in both conditions relative to baseline. Although, initially, only a slight difference was observed, Wendy consistently responded at higher rates in the no token manipulation condition throughout the comparison. During the last four series of the comparison, however, differences in response rates between the two conditions increased, with responding maintaining around 20 responses per minute in the no token manipulation condition and about half of that rate in the token manipulation condition. Overall, when evaluating mean levels of responding across the two conditions, response rates occurred at nearly twice the rate in the no token manipulation condition (M = 16.58; range: 0–25.7) compared to the token manipulation condition (M = 9.7; range: 0–20.5). During the choice assessment, Wendy did not show a preference for either condition for the first nine sessions, after which she showed a consistent preference for the token manipulation condition.

Discussion

The purpose of this study was to evaluate whether physically manipulating tokens within a token economy influenced rates of responding. Further, we aimed to determine whether the participants showed a preference for one condition over another. Overall, the results of our study showed that a token economy, regardless of whether the participant physically manipulated the tokens, successfully increased rates of responding relative to a no-reinforcement baseline, providing further support for the use of token economies to increase response rates with children with ASD. When analyzing the data to determine whether allowing participants to manipulate tokens affected overall response rates, the results were mixed. Two participants (Jack and Peter) showed no difference in response rates across the token manipulation and no token manipulation conditions, while higher response rates were observed during the no token manipulation condition for one participant (Wendy). Finally, when evaluating the preferences for either condition, all participants eventually showed a preference for the token manipulation condition over the no token manipulation and no token baseline conditions.

Assessing the extent to which physically manipulating tokens within the token economy influences responding may have several important implications. First, by physically manipulating tokens, an individual is likely to engage in a response that is incompatible with the target behavior that produced the token, thus increasing the likelihood of lower response rates. Given that reinforcer effectiveness is often evaluated by analyzing rates of responding, it is necessary to determine whether potentially lower response rates are due to variables other than the reinforcing value of the token or backup reinforcers. For example, if an individual shows low response rates during a token evaluation, it may be a result of competing behaviors such as playing with or manipulating the token or even engaging in stereotypy when manipulating the token. Further, other behaviors that occur within the IRT may be attributed to token manipulation, such as the individual having to orient back to the task after receiving the token. In combination, these behaviors are often referred to as handling costs (DeLeon et al. 2014) and may impact overall response rates for some individuals, as we may have seen with Wendy.

Because the purpose of this study was simply to determine whether token manipulation affected responding, we did not measure behaviors that occurred between responses that may have influenced these response rates. In our study, we saw minimal differences in rates of responding, especially with Jack and Peter, suggesting that other behaviors occurring within the IRT of the target response did not drastically impact overall response rates. Wendy, on the other hand, did show increased responding in the no token manipulation condition. One explanation for this pattern of responding could be that she was engaging in behaviors that competed with the target response, which may have included token manipulation or other behaviors related to handling costs.

Although physically manipulating tokens may increase behaviors associated with handling costs and therefore produce lower response rates, there may also be benefits to allowing manipulation. As Hackenberg (2018) discusses, individuals can demonstrate token-directed behaviors during token economies, typically occurring when the token itself generates behavior directed toward the token. For example, in basic research with nonhuman animals, organisms may demonstrate consummatory responses such as mouthing the tokens as they would food (Kelleher 1958; Malagodi 1967). For individuals with ASD or other developmental disabilities, token-directed behavior may occur in the form of stimulatory behaviors. Charlop-Christy and Haymes (1998) showed that idiosyncratic tokens developed specifically for each participant produced higher levels of the target behavior when compared to arbitrary tokens. Further, the authors reported less off-task behavior when using the idiosyncratic tokens. The results of the Charlop-Christy and Haymes (1998) study may have been a result of the idiosyncratic token functioning as a discriminative stimulus for stereotypy, signaling that stimulatory behavior could occur. Hackenberg (2018) states that, although it is likely that the tokens used in their study established their reinforcing value through pairing with other reinforcers, the reinforcing value may have increased, as the tokens were also serving as a discriminative stimulus for stimulatory behavior. Thus, higher levels of responding may be observed when an individual is allowed to physically manipulate the token, as it serves as a more potent reinforcer.

In the current study, it may be the case that Wendy engaged in token-directed behaviors, such as those described by Hackenberg (2018), stereotypy with the tokens, or other behaviors associated with handling costs that resulted in lower response rates when given the opportunity to manipulate tokens. Unfortunately, because we did not measure behaviors that occurred during the IRT, we cannot make statements about what variables may have contributed to the differences in response rates. Given Wendy’s data and those described in the Charlop-Christy and Haymes (1998) study, this may be an important variable to consider in future research, specifically when individuals are emitting lower than expected response rates. Therefore, if researchers attempt to replicate this study, they may consider measuring behaviors that occur within the IRT, such as token manipulation, stereotypy with tokens, and those behaviors associated with handling costs, to better understand patterns of responding and the variables that contribute to potential differences among the conditions.

Another factor that may affect the extent to which token manipulation influences response rates is the physical characteristics of the token itself. In the current study, we chose to use neutral tokens to reduce any visual or tactile stimulation that may have served as an automatic reinforcer. For example, if the tokens had included a picture of a preferred stimulus (e.g., favorite cartoon character) or were stimulating to touch (e.g., ridges of a poker chip), we may have seen differences in response rates favoring the no token manipulation condition, making it difficult to isolate the variable of manipulation alone. It is quite possible that results may have differed, particularly for Jack and Peter, if the tokens used in this study were more visually appealing or more tactilely stimulating. Thus, future research should replicate this study using various types of tokens that may be more reinforcing to interact with or manipulate. Future researchers also may want to include measures of stimulatory behaviors with the token to determine if physical manipulation of tokens evokes or abates stimulatory behaviors. Further, researchers may consider measuring the handling costs associated with physical manipulation to ensure any lower response levels are not a product of motivation.

One major contribution of this study was the evaluation of preference for physical manipulation. As we did not see large differences in response rates across the two token conditions during the comparison, assessing preference for token manipulation allowed us to develop clinical recommendations. For example, one may consider weighing student preference more heavily when there is little to no difference in response rates. Given similar or equal response rates, one may argue that allowing the individual to help design the intervention by choosing variables such as token manipulation would enhance social validity, while providing individuals with disabilities with more input in designing their treatment plans.

Take, for example, the differing results in this study and their clinical implications. Specifically, Jack showed no difference in response rates across the two conditions but showed a preference for the token manipulation condition. Given that allowing Jack to manipulate tokens did not impede responding, it may be more socially valid to recommend that Jack be allowed to manipulate his own tokens. In contrast, Wendy showed higher rates of responding in the no token manipulation condition but preferred the token manipulation condition. In a case like this, the clinician or teacher may have to weigh two options: implementing the more preferred intervention, potentially resulting in lower response rates, or achieving higher response rates even though it may mean using the less preferred intervention.

Despite the promising results of this study, there are aspects that we should note and that may warrant additional investigation. First, only Wendy showed differentiated responding between the two token conditions. It may be the case that Jack and Peter were responding at a ceiling level in both token conditions, or that both conditions were equally effective; however, there may also be additional explanations for the lack of differentiation. For instance, the experimenters delivered an auditory stimulus, “token,” simultaneously with the tangible token. We did this to increase the likelihood that the participant attended to the token delivery, especially in the no token manipulation condition. However, the word “token” may have functioned as the conditioned reinforcer itself. In other words, the auditory stimulus may have served as the “token” and been more salient than the tangible token. If this were the case, the auditory stimulus itself may have been sufficient to maintain high levels of responding, regardless of the tangible token delivery. Anecdotally, every time that Peter and Jack received a token, they would look at the token board, but it is still unknown whether we would have observed the same patterns of responding if we had not included the auditory stimulus. Additionally, for all participants, the contingent delivery of a social statement following each response may be considered a confound. Because we did not include social statements during baseline, it is unclear whether the statement alone was reinforcing enough to produce the high rates of responding that we saw in the token conditions. Future research may consider replicating these procedures without the use of a social statement or auditory stimulus.

Second, we did not formally test whether participants were able to discriminate between the token boards representing each condition. Thus, the results of the choice assessment may be due to a lack of discrimination. However, this is likely not the case, as all participants chose the no token manipulation condition minimally during the concurrent-chains schedule assessment, suggesting discrimination occurred. Further, we chose not to complete forced-choice trials prior to assessing preference with the concurrent-chains arrangement, as the participants were repeatedly exposed to each condition in the previous phase of the study.

Third, we did not replicate Wendy’s results using a reversal design, potentially weakening the experimental control. Lastly, this study only evaluated the rate of task completion using mastered tasks. However, it is possible that token manipulation influences responding differently during acquisition tasks. Therefore, future research should evaluate how token manipulation influences responding during acquisition tasks.

In conclusion, this study replicates previous research by demonstrating the effectiveness of a token economy when used with children with ASD. Further, it provides preliminary results suggesting that, at least in some cases, token manipulation may impact response rates and be more preferred when compared to no manipulation.