Most behavior–analytic interventions rely on the careful, systematic arrangement of reinforcement contingencies—a given consequence is delivered contingent upon a desirable response under conditions in which that response would be appropriate and the consequence would be valuable to the individual. That response is “strengthened” insofar as it becomes more probable under future similar circumstances. A token economy is a specific kind of flexible arrangement of reinforcement contingencies, used frequently in therapeutic and instructional contexts, in which the delivered consequence is a conditioned reinforcer, a token, that is later exchangeable for other reinforcers. The reference to “economies” derives from its resemblance to how we all learn to exchange earned arbitrary symbolic units (e.g., coins and paper money) for goods and services. That is, earned tokens are exchanged for backup reinforcers in much the same way that money is exchanged for goods and services in conventional economic systems.

Token economies offer several advantages over other reinforcement systems that rely on the direct delivery of the backup reinforcers (discussed below). For example, tokens are easy to administer and provide learners with a salient marker that represents their progress. As such, they have become common across many settings. In a survey of 406 professionals who serve people with developmental disabilities, Graff and Karsten (2012) found that tokens were the second most delivered programmed reinforcer, following only verbal praise. Token economies can be implemented in the same structured format for multiple individuals in a given setting. They can also be individualized for each client, as there are essential components that can be adjusted to suit each individual’s needs and circumstances. In what follows, we provide a detailed description of these components of a token economy. We follow with an elaborated description of the advantages afforded by these systems over other reinforcement arrangements, advantages that resulted in their widespread adoption. We further describe the history and use of token economies, touching upon the multiple contexts in which token economies have been used. We end with a variety of additional considerations, including embedding punitive outcomes within a token system and concerns that have been expressed about the use of token systems.

Components of a Token Economy

Although descriptions vary, most agree that token economies consist of seven essential components (e.g., Miltenberger, 2012; Hine et al., 2017; Ivy et al., 2017) that can be individualized for each participant or setting and can be adjusted based on performance. These components include (1) identification of the target behaviors, (2) identification of the stimuli used as tokens, (3) identification of backup reinforcers, (4) arranging the token-production schedule, (5) arranging the exchange-production schedule, (6) arranging the token-exchange schedule, and (7) token-training procedures. Below, we address each component by providing a description of the process and product, supplemented by research on best practices when available.

Identifying Target Behaviors

Prior to implementing a token economy, one must identify and operationally define the target behavior(s) that will result in token delivery. Target behaviors should be socially significant (Miltenberger, 2012) and appropriate for the client’s repertoire and treatment goals. Examples of potential target behaviors include communication responses (e.g., Mason et al., 2015), daily living activities (e.g., Paul & Lentz, 1977), alternatives to problem behavior (e.g., Christensen et al., 2004), and health-related behaviors (e.g., DeLuca & Holborn, 1992; Patel et al., 2019). Tokens have also been delivered contingent on the absence of problem behavior, for example, by arranging token delivery on a differential reinforcement of other behavior (DRO) schedule (e.g., Didden et al., 1997; Donaldson et al., 2014). Problem behaviors may also be directly targeted when response cost is incorporated into a token system, as discussed in greater detail below.

As in all behavioral interventions, a critical issue is careful specification of the operational definitions of the target behavior(s). Loose descriptions of the behavioral criteria that result in token delivery (or token removal) can result in suboptimal performance within the token economy. For example, Moore et al. (2001) conducted an informal component analysis of an ineffective token economy in an inpatient psychiatric facility and identified two critical problems. One involved the delay in exchanging the tokens. Another was that the target behaviors had not been operationally defined. The latter was addressed by more clearly specifying the behavior(s) that would result in token delivery. Prior to the modification, the criteria were loosely defined as “follow directions,” “be nice,” and “be where you are supposed to be.” These definitions were clarified for both staff and children. For example, “be nice” was transformed into “remaining at least two feet away from another child” because the key issue seemed to be participants invading each other’s personal space. Although the other modification (altering the delay to token exchange) ultimately had a greater impact, simply modifying the operational definitions had a clear and consistent influence on the number of tokens earned collectively by the participants.

Identifying Stimuli Used as Tokens

In most cases, the tokens themselves are initially behaviorally neutral stimuli. That is, they have little pre-experimental or pre-therapeutic value or stimulus function but acquire value or stimulus functions through token training. The tokens themselves can vary along several dimensions, and common examples include poker chips, laminated images, plastic coins, and check marks. Token system developers must consider several factors when selecting stimuli to use as tokens. First, tokens should be safe to manipulate and should not pose a choking hazard, particularly when working with children and individuals with neurodevelopmental disorders. In addition, tokens should be items that clinicians can easily transport and deliver quickly, such that they immediately follow the occurrence or nonoccurrence of the target behavior. Relatedly, tokens should be stimuli that clients can easily carry, accumulate, and exchange. Tokens should also be durable because they will likely be used throughout the course of an intervention. Clinicians should also avoid using items that are readily available in the environment to prevent learners from bootlegging or counterfeiting tokens. For example, star stickers may be placed on a chore chart in a school-based token economy, but the stickers should not be available elsewhere in the classroom and some marking might be added to differentiate them from stickers that are available in stores or at home.

Clinicians should also consider whether tokens should be manipulable or nonmanipulable. A manipulable token is an item that is physically handled by the learner during token production and exchange (e.g., a poker chip), whereas a nonmanipulable token is a stimulus whose delivery and exchange are mediated by the practitioner (e.g., a check mark on a board or virtual tokens on an iPad). Physical token manipulation may increase the saliency of the response–reinforcer contingency (Leaf et al., 2012). On the other hand, manipulable tokens may occasion token-directed behavior (e.g., playing with the tokens, tapping tokens on a table) that can interfere with learning by increasing the time between learning trials and decreasing overall instructional time. Sleiman et al. (2020) compared relative rates of responding when manipulable or nonmanipulable tokens were provided for academic task completion in three children with autism spectrum disorder (ASD). One participant engaged in higher rates of responding in the nonmanipulable token condition, while the remaining participants engaged in similar rates across both conditions. All three participants demonstrated a preference for manipulable tokens in a preference assessment. In the absence of functional differences, clinicians should consider the client’s preference. However, if the learner engages in token-directed behavior that interferes with learning, nonmanipulable tokens should be considered. Nonmanipulable tokens may also be beneficial when an exchange response cannot be trained (Hine et al., 2017) or if the client is likely to engage in problem behavior related to token exchange. This may be particularly relevant in a response-cost system if learners are asked to relinquish tokens contingent upon undesirable behavior.

When selecting stimuli to use as tokens, clinicians must also decide whether to use novel or preferred stimuli (i.e., interest-based tokens). An example of an interest-based token is using laminated images of Mickey Mouse as tokens for a client known to be particularly fond of Mickey Mouse. The advantage of novel stimuli is that the clinician controls the learner’s history with tokens via token training. However, token training can be time-consuming. Alternatively, stimuli that are already preferred by the learner might already function as conditioned reinforcers and, thus, might require less training time. Fernandez (2021) recently completed an internet survey of token economy practices among clinicians involved in early intervention for learners with autism spectrum disorder and found that roughly 70% of clinicians reported using interest-based tokens in clinical practice. Charlop-Christy and Haymes (1998) found that using stimuli with which children were often preoccupied (i.e., an object of obsession) as tokens resulted in more correct responding and less problem behavior than novel tokens. However, it is unknown whether participants engaged in token-directed behavior outside the definition of problem behavior that might have produced longer intertrial intervals. Nonetheless, when using interest-based tokens over extended periods of time, preference for the stimulus being used as a token may diminish and reduce the tokens’ effectiveness. However, this is unlikely to occur if the interest-based token is exchangeable for valuable backup reinforcers. Thus, clinicians should regularly evaluate the effectiveness of backup reinforcers and not rely solely on the previously existing conditioned reinforcing properties of an interest-based token.

Identifying Backup Reinforcers

Backup reinforcers are the stimuli or activities for which clients exchange their tokens (Hackenberg, 2018). The nature of viable backup reinforcers will vary, of course, depending on the population and what is readily available in the setting. Examples of backup reinforcers often used in token economies for children with neurodevelopmental disorders include edible reinforcers, leisure items (e.g., access to a tablet, toy cars), outdoor playtime, and escape (i.e., break from demands). Notably, one can also include reinforcers that are “free” and readily available, such as opportunities for social interactions with caregivers. By contrast, several studies have used token economies to support desirable behavior in workplace settings with typically developing adults (e.g., Camden et al., 2011; Vergason & Gravina, 2020), where the sorts of reinforcers listed above would clearly be less relevant. Simonian et al. (2020) conducted a systematic review of methods used to identify effective reinforcers for employees in organizational settings. Common candidate backup reinforcers included items such as gift cards (which can themselves be characterized as tokens), coupons, opportunities to leave work early, opportunities to choose work assignments, and preferred parking.

Independent of the population or setting, when selecting backup reinforcers, clinicians should select preferred items, activities, or privileges identified via a preference assessment and demonstrated to support appropriate behavior. Ideally, one should conduct direct preference assessments, which involve the systematic presentation of stimuli and observation of the learners’ approach, selection, and/or consumption responses. Researchers have developed and evaluated several methods of conducting systematic preference assessments. Options include single-stimulus presentation methods (DeLeon et al., 1999; Pace et al., 1985), paired-stimulus presentation methods (Fisher et al., 1992), and multiple-stimulus presentation methods (DeLeon & Iwata, 1996; Hanley et al., 2003; Roane et al., 1998). A detailed description of these methods is beyond the scope of the current chapter, but the reader is directed to Virues-Ortega et al. (2014), who describe each of these methods and provide guidance on selecting the most appropriate preference assessment method under varying circumstances.

For time-based backup reinforcers (e.g., playtime, tablet, and escape), the duration of access should be directly related to the number of tokens exchanged for that activity. For example, if each token is exchangeable for 30 s of tablet access, the client should receive 5 min of tablet time in exchange for 10 tokens. Therefore, clinicians should consider whether they can easily control access to backup reinforcers (i.e., remove when access time expires). Clinicians should also consider selecting backup reinforcers that can be restricted to the token economy, such that the learner can only access the item by exchanging tokens (i.e., a closed economy). Several authors have observed that free access to reinforcers outside of the context in which they must be earned (i.e., an “open economy”) can reduce levels of responding within the earning context (e.g., Kodak et al., 2007; Roane et al., 2005). Thus, access to backup reinforcers outside of the context of the token economy might suppress motivation and responding, limiting the system’s effectiveness.
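The proportional rule described above is simple arithmetic. The following is a minimal sketch in Python; the function name and the 30-s rate are illustrative assumptions, not a prescribed procedure.

```python
def access_duration_seconds(tokens_exchanged: int, seconds_per_token: float) -> float:
    """Return the access time earned when each token buys a fixed amount of time."""
    if tokens_exchanged < 0 or seconds_per_token < 0:
        raise ValueError("tokens and rate must be non-negative")
    return tokens_exchanged * seconds_per_token


# Example from the text: 10 tokens at 30 s per token.
duration = access_duration_seconds(10, 30)
print(duration / 60)  # minutes of tablet access
```

Keeping the rate explicit makes it easy to check that the time delivered matches the number of tokens exchanged.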

Setting the Token-Production Schedule

The token-production schedule specifies how and when target behaviors will produce tokens, or in other words, “the rule that describes the specific response requirements and environmental conditions that must be satisfied for token delivery” (Ivy et al., 2017, p. 723). Token-production schedules can theoretically mirror any arrangement that has been studied in behavior–analytic research (see DeLeon et al., 2013, for a description of schedule variations in applied settings) and necessarily vary depending on the clinical target. However, in practice, most clinical researchers adopt a fixed-ratio (FR) or variable-ratio (VR) schedule of reinforcer delivery (Fernandez, 2021). In an FR schedule arrangement, the token is delivered following the emission of a fixed (unvarying) number of target responses, whereas a VR schedule implies that the number of required responses can vary but is anchored to a mean (e.g., a VR 3 schedule implies that a token would be delivered after a mean of 3 responses, but delivery of any single token might occur following a number of responses that ranges between 1 and 5). That said, Ivy et al. (2017) reported finding examples of FR schedules, VR schedules, differential reinforcement of alternative behavior (DRA) schedules, differential reinforcement of incompatible behavior (DRI) schedules, and DRO schedules as token-production schedules in their review of token research.
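The difference between FR and VR token-production schedules can be illustrated by generating the response requirement for each successive token. The sketch below is one possible way to do so in Python. The function names, the `spread` parameter, and the block-shuffling approach are assumptions made for illustration; they are not drawn from the studies cited above.

```python
import random


def fr_requirements(ratio: int, n_tokens: int) -> list[int]:
    """FR schedule: every token requires the same number of responses."""
    return [ratio] * n_tokens


def vr_requirements(mean_ratio: int, spread: int, n_tokens: int, seed: int = 0) -> list[int]:
    """VR schedule: each requirement falls within mean_ratio +/- spread.

    Cycling through shuffled copies of the full range keeps the obtained mean
    equal to mean_ratio whenever n_tokens is a multiple of the range size.
    """
    if spread >= mean_ratio:
        raise ValueError("spread must be smaller than mean_ratio")
    rng = random.Random(seed)
    values = list(range(mean_ratio - spread, mean_ratio + spread + 1))
    requirements: list[int] = []
    while len(requirements) < n_tokens:
        block = values[:]
        rng.shuffle(block)
        requirements.extend(block)
    return requirements[:n_tokens]


fr3 = fr_requirements(3, 10)
vr3 = vr_requirements(3, 2, 10)  # VR 3 with requirements ranging from 1 to 5
print(fr3)
print(vr3, sum(vr3) / len(vr3))
```

Pre-generating the list of requirements, rather than drawing each value independently, is a design choice that guarantees the programmed mean is actually obtained over a session.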

Token-production schedules seem to result in response rates and patterns typical of schedules of direct reinforcement (Hackenberg, 2018). For example, DeLuca and Holborn (1992) found that successively increasing a variable-ratio (VR) token-production schedule produced response rate increases as would be expected based on what is known about the relation between schedule values and performance on VR schedules in basic research. When setting token-production schedules, one should therefore consider the goals of the intervention. Still, during initial training, clinicians and researchers generally begin by delivering tokens on a very dense schedule (e.g., FR 1) to establish a consistent relation between the response and delivery of the token. Once the target response meets a mastery criterion, the schedule for that response may be thinned to an intermittent schedule for practical purposes. When selecting an intermittent schedule, clinicians should consider the natural schedule under which the target response will be maintained as well as optimal response rates. In some instances, different intermittent schedules may be similarly effective (e.g., Repp & Dietz, 1975). In such cases, one should consider the client’s preferences when selecting the token-production schedule. One should directly train, or verbally describe, the response requirements to produce tokens (Ivy et al., 2017).

However, one should approach increasing the token-production schedule with caution. In token systems in which the amount or duration of backup reinforcers is directly tied to the number of tokens earned, increasing the token-production schedule necessarily increases the ratio of responses to reinforcers (i.e., the unit price), such that more responses are required for each reinforcer delivery. For example, Hackenberg (2018) notes that doubling the token-production schedule doubles the number of responses required to produce one reinforcer. As such, clients might demonstrate decreased responding indicative of ratio strain, a situation in which the targeted performance ceases to occur because the behavioral cost of each token has become too high. Thus, Hackenberg (2018) recommends holding the token-production schedule constant and increasing the exchange-production schedule instead. However, if the token-production schedule will be changed, one should consider gradually thinning the schedule to prevent ratio strain (e.g., Ackerman et al., 2020).
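The relation between schedule changes and unit price can be made concrete with a short calculation. The sketch below assumes FR schedules throughout and a hypothetical rate of 30 s of access per token; the function names are illustrative only.

```python
def unit_price(token_fr: int, units_per_token: float) -> float:
    """Responses required per unit of backup reinforcer (e.g., per second of access)."""
    if units_per_token <= 0:
        raise ValueError("units_per_token must be positive")
    return token_fr / units_per_token


def responses_per_exchange(token_fr: int, exchange_fr: int) -> int:
    """Total responses needed to reach one exchange opportunity."""
    return token_fr * exchange_fr


# Doubling the token-production schedule (FR 2 -> FR 4) doubles the unit price.
base = unit_price(2, 30)
doubled = unit_price(4, 30)
print(doubled / base)

# Doubling the exchange-production schedule (FR 5 -> FR 10) delays the exchange,
# but each token still costs 2 responses, so the unit price is unchanged.
print(responses_per_exchange(2, 5), responses_per_exchange(2, 10))
```

Under these assumptions, thinning the exchange-production schedule increases the work required before an exchange while leaving the cost of each unit of reinforcement where it was.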

A further important consideration about token-production schedules is the immediacy of token delivery. Tokens should be delivered immediately after the target behavior, as delays in token delivery tend to decrease task compliance (Boerke & Reitman, 2011) and response rates and increase latency to responding (Leon et al., 2016). Leon et al. (2016) found that delays to token delivery as brief as 3 to 6 s produced decrements in responding relative to immediate delivery. Moreover, clinicians should consider and evaluate unprogrammed delays in token delivery (i.e., failures in treatment integrity) as a possible explanation for decrements in responding that may emerge throughout the course of the intervention and for initially low levels of responding that cannot be explained otherwise.

Setting the Exchange-Production Schedule

The exchange-production schedule specifies how and when the client will exchange earned tokens for backup reinforcers. Calling this a “schedule” in the same sense as the token-production schedule seemingly implies that a specific number of tokens must be earned before an exchange opportunity is arranged. This is accurate in some cases, but in actual practice, exchange-production schedules can take numerous other forms. For example, in an instructional context for learners with ASD, the clinician might arrange an exchange opportunity (1) after a certain number of tokens have been earned, (2) at the end of the instructional session independent of how many tokens have been earned, or (3) at the end of the day (or even at the end of the week) during a convenient time for exchange. The last exemplifies a time-based exchange-production schedule rather than a response-based exchange-production schedule. Ivy et al. (2017) reported that 60% of the studies they reviewed employed time-based exchange-production schedules; they seemingly have become the norm. The timing of opportunities to exchange can have a significant impact on the effectiveness of a token economy. Field et al. (2004) reported meaningful improvements in token economy effects in initially “nonresponsive youth” by changing the frequency and immediacy of access to backup reinforcers. By increasing exchange opportunities from once a day to twice a day and by halving the number of tokens (or points in their case) that a child needed to earn access to preferred consequences, the authors observed clear increases in the frequency of earning backup reinforcers and corresponding decreases in “intensive behavioral episodes.”
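The distinction between response-based and time-based exchange-production schedules can be expressed as a simple decision rule. The sketch below is a hypothetical illustration in Python; the class, field, and function names are assumptions, and only the first two arrangements listed above are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExchangeProductionRule:
    """Hypothetical rule for when an exchange opportunity is arranged."""
    tokens_required: Optional[int] = None  # response-based: after N tokens
    at_session_end: bool = False           # time-based: when the session ends


def exchange_opportunity(rule: ExchangeProductionRule, tokens_earned: int, session_over: bool) -> bool:
    """Return True if the learner may exchange tokens now."""
    if rule.tokens_required is not None and tokens_earned >= rule.tokens_required:
        return True
    return rule.at_session_end and session_over


response_based = ExchangeProductionRule(tokens_required=5)
time_based = ExchangeProductionRule(at_session_end=True)

print(exchange_opportunity(response_based, tokens_earned=5, session_over=False))  # True
print(exchange_opportunity(time_based, tokens_earned=5, session_over=False))      # False
print(exchange_opportunity(time_based, tokens_earned=0, session_over=True))       # True
```

Note that under the time-based rule the opportunity arises regardless of how many tokens have been earned, which mirrors the second arrangement described in the text.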

Backup reinforcers may be displayed in a “token store,” for example, either in a menu format or in a room in which individuals can make their purchases. In time-based schedules, individuals should only have access to the token store at designated times, which may include predetermined store hours (e.g., from 1:00 pm to 4:00 pm). Of course, the time and place where token exchange will occur must be decided in advance. When first establishing a token economy, the token store should be available frequently, but exchange-production schedules could be thinned across time so that the store becomes available more intermittently for practical purposes (Cooper et al., 2020). McLaughlin and Malaby (1976) assessed the effects of fixed-time (FT) and variable-time (VT) exchange-production schedules on assignment completion of a fifth- and sixth-grade class. Both schedules were equal to 5 days, with token exchange occurring between 3, 5, 7, and 9 days under the VT schedule. Although the FT schedule produced between 88% and 100% assignment completion, the VT schedule produced less variable responding (i.e., 100% assignment completion). Thus, variable exchange-production schedules may produce more consistent responding during token production.

Arranging response-based exchange-production schedules may require careful consideration owing to second-order effects. That is, responding in the token-production schedule may be affected by the exchange-production schedule. Nonhuman research has shown that FR exchange-production schedules can produce decreased response rates and longer postreinforcement pauses (PRPs) on the token-production schedule than equivalent VR exchange-production schedules (Bullock & Hackenberg, 2006; Foster et al., 2001). However, Argueta et al. (2019) found that FR and VR exchange-production schedules did not significantly affect responding on an arbitrary task in a child with ASD. Both schedules also produced similar PRP durations, with the exception of the VR 2 schedule, which produced slightly longer PRPs and decreased responding relative to FR 2. The differences observed in PRPs may be an artifact of the backup reinforcer used in the study (i.e., videogames on an iPad). Given that each token was exchangeable for 15 s of access to the iPad, exchanges occurring following the accumulation of one token may have produced an aversive context in which access to the backup reinforcer was brief and distributed. Distributed reinforcement arrangements have been shown to produce decreased levels of responding and to be less preferred (see DeLeon et al., 2014). Therefore, it is also important to consider whether the potency of the backup reinforcer is enhanced by accumulated access when determining the exchange-production schedule.

Regardless of whether the exchange-production schedule is time- or response-based, the schedule should be dense initially to maximize contact with backup reinforcers. Over time, the schedule can be thinned. If the client stops responding during schedule thinning, the clinician may consider returning to a previous, denser schedule, as decreased responding may be indicative of a different kind of ratio strain: a situation in which the targeted performance ceases to occur because the opportunities to exchange have become too few and far between.

Setting the Token-Exchange Schedule

The third schedule to consider is the token-exchange schedule, which specifies how many tokens must be exchanged for a given backup reinforcer or the “price” of each backup reinforcer. For example, each token could be exchanged on a one-to-one ratio for a preferred edible reinforcer or 30-s access to an iPad. There are no explicit rules on how to set the prices of backup reinforcers, and researchers have adopted several strategies. One method is to set the price of all backup reinforcers at the same number of tokens (Akin-Little & Little, 2004). For example, all high, moderate, and low preferred backup reinforcers cost five tokens. Alternatively, the price of each backup reinforcer may vary based on the learner’s preferences or reinforcer availability. For example, the learner’s highest preferred item is set at 10 tokens, while moderately preferred and low-preferred items are set at 5 and 1 tokens, respectively. Learners may then accumulate tokens to access higher priced backup reinforcers (Ackerman et al., 2020). Fernandez (2021) reported that 48% of clinicians determined the price of backup reinforcers based on learners’ preferences. In this case, presumably, more preferred items are set at higher prices as a means of promoting motivation to earn tokens, but no research is available to our knowledge to endorse this practice.
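The two pricing strategies described above can be compared by asking which backup reinforcers a learner can afford with a given number of tokens. The following Python sketch uses the prices from the examples in the text; the item labels and function name are placeholders chosen for illustration.

```python
def affordable_items(prices: dict[str, int], tokens_held: int) -> list[str]:
    """Return the backup reinforcers whose price does not exceed the tokens held."""
    return sorted(item for item, price in prices.items() if price <= tokens_held)


# Uniform pricing: every item costs five tokens.
uniform_store = {"high": 5, "moderate": 5, "low": 5}

# Preference-based pricing: 10, 5, and 1 tokens by preference level.
preference_store = {"high": 10, "moderate": 5, "low": 1}

print(affordable_items(uniform_store, 5))     # ['high', 'low', 'moderate']
print(affordable_items(preference_store, 5))  # ['low', 'moderate']
```

With five tokens, the learner can reach the highest preferred item only under uniform pricing; under preference-based pricing, further accumulation is required.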

The learner’s level of functioning may also be an important consideration when selecting a strategy to set the price of backup reinforcers. While higher functioning learners may be able to effectively accumulate and distribute tokens among a variety of differently priced backup reinforcers, lower functioning learners may benefit from a token store in which all the backup reinforcers have the same price. Clinicians must also consider how many tokens the learner can produce and accumulate before exchange, or lose if a response cost is implemented, to ensure that the learner can access reinforcers. As with the other schedules, the price of backup reinforcers should initially be low and should then be raised systematically. For example, initially, each token could be exchanged for one unit of the selected backup reinforcer, after which the token-exchange schedule could be adjusted for practical purposes. Once again, the token system developer may need to be cautious about setting the prices too high. Backup reinforcers that are functionally unobtainable because of a high token-exchange value may cease to motivate targeted responding.

Token Training: Common and Best Practices

Token training refers to procedures used to establish tokens as conditioned reinforcers. Although this step necessarily precedes the execution of other token economy components, sequentially, we describe it last because the description requires an understanding of the other components. Recommendations for token training vary depending on the learner’s repertoire. For clients with intact verbal abilities, it may suffice to provide verbal instructions (e.g., vocal, written) that explain the token-production and exchange contingencies (Cooper et al., 2020; Kazdin, 1977). Ivy et al. (2017) reported that when token conditioning procedures were reported at all, 76% of studies provided a verbal description of the token economy contingencies. For clients who are less responsive to instructions, recommendations typically suggest some sort of pairing procedure to establish a relationship between tokens and backup reinforcers, thus “imparting value” upon the tokens (Doll et al., 2013; Hackenberg, 2018; Hine et al., 2017).

There are multiple pairing procedures from which to choose, the simplest of which is stimulus–stimulus (S–S) or direct pairing, in which tokens are delivered noncontingently and are immediately followed by a backup reinforcer (Doll et al., 2013). The most common pairing procedure reported by practitioners is response–stimulus (R–S) pairing (Fernandez, 2021), in which, contingent on a target response, a token is delivered and immediately followed by an established reinforcer. Kazdin (1977) suggested another procedure, hereinafter referred to as stimulus–exchange–stimulus (S–E–S) pairing, which involves noncontingently delivering tokens and then prompting an exchange response, contingent on which backup reinforcers are delivered. Last, another alternative is to combine R–S and S–E–S pairings by delivering tokens contingent on responding and delivering backup reinforcers contingent on exchanging delivered tokens (e.g., Argueta et al., 2019; DeLeon et al., 2014). However, applied researchers have not directly evaluated the general and relative effectiveness of these procedures for pairing tokens. To the extent that findings with other stimuli and species generalize to tokens, clinicians should use pairing procedures that require an exchange response (e.g., S–E–S training) and, more generally, those that require response–contingent pairings (e.g., R–S pairing; Hackenberg, 2018).

In addition to pairing, token system managers must often also teach token exchange and production responses, the topography of which should be carefully considered. Exchange responses refer to handing in tokens for backup reinforcers. Examples of exchange responses include handing a completed token board (e.g., Leaf et al., 2012) or individual tokens (e.g., Argueta et al., 2019) to a therapist, depositing tokens into a slot (e.g., Smith, 1972), or verbally indicating which backup reinforcer is desired. When selecting an exchange response, one should be mindful that some exchange responses might not be physically possible or might be too effortful for some clients (Hine et al., 2017). If an appropriate exchange response is not identified, clinicians should consider exchanging tokens for the client (Hine et al., 2017) until a suitable response is available. However, clinicians should be mindful that exchange responses are indispensable if one plans to teach learners to accumulate and exchange tokens at their discretion (Hine et al., 2017).

Production responses are those which result in token delivery (i.e., the target behaviors). The topography of production responses can vary widely and can include acquisition or mastered targets. However, to minimize demand- and delay-related problem behavior, mastered and low-effort responses are preferable during initial training. To the extent that unmastered tasks are aversive, their use during training might compromise the reinforcing value of tokens. Production and exchange responses may be trained using a variety of procedures, including but not limited to verbal instructions (Doll et al., 2013; Kazdin, 1977), errorless learning (e.g., Leaf et al., 2012), prompting (e.g., Argueta et al., 2019; DeLeon et al., 2014), and chaining (Hackenberg, 2018). Practitioners’ most common default strategy for training production responses is akin to forward chaining, in which learners earn the first token in the terminal token-production schedule and subsequent tokens are added gradually (Fernandez, 2021). The second most common default strategy for practitioners training production responses is akin to backward chaining (Fernandez, 2021). Note that prior to training production responses, exchange responses should be trained to establish the relationship between tokens and backup reinforcers (Hackenberg, 2018).

After exchange and production response training, clinicians may include accumulation training to teach clients to compile and save tokens. Accumulation allows individuals to access more (Hine et al., 2017) or higher cost and more preferred backup reinforcers at each exchange. However, the increased availability of reinforcers via accumulated tokens can weaken clients’ motivation to engage in token-earning behavior (Hackenberg, 2018). Thus, the decision to allow for token accumulation must be carefully weighed. If clients can accumulate tokens, clinicians should consider restricting accumulation, such as by limiting how many tokens clients can accumulate (e.g., Yankelevitz et al., 2008) or setting expiration dates for tokens. Clinicians can also restrict the quantity of tokens that learners can or must exchange at each exchange opportunity. Accumulation training can employ procedures like those for training exchange and production responses. However, additional training might be required to establish discriminative stimuli that signal when accumulation is available. Prior to training, one must also decide where and how clients will store accumulated tokens.
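One of the restrictions mentioned above, a limit on how many tokens can be accumulated, can be sketched as a small ledger. The class below is a hypothetical illustration in Python; the names and the cap of 10 tokens are assumptions, and expiration dates are not modeled.

```python
class TokenBank:
    """Hypothetical token ledger with an accumulation cap."""

    def __init__(self, cap: int) -> None:
        if cap < 0:
            raise ValueError("cap must be non-negative")
        self.cap = cap
        self.balance = 0

    def earn(self, n: int = 1) -> int:
        """Add tokens up to the cap and return how many were actually stored."""
        if n < 0:
            raise ValueError("n must be non-negative")
        stored = min(n, self.cap - self.balance)
        self.balance += stored
        return stored

    def exchange(self, price: int) -> bool:
        """Spend tokens on a backup reinforcer if the balance covers the price."""
        if price > self.balance:
            return False
        self.balance -= price
        return True


bank = TokenBank(cap=10)
bank.earn(8)
stored = bank.earn(5)  # only 2 more tokens fit under the cap
print(stored, bank.balance)             # 2 10
print(bank.exchange(10), bank.balance)  # True 0
```

In practice, the cap would be chosen so that the learner can still afford the highest priced backup reinforcer in the token store.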

History and Use of Token Systems

Hackenberg (2009, 2018) provided a detailed history of the development of token economies, beginning with their roots in nonhuman experimentation. In what follows, we cover some of the highlights of that history. Wolfe (1936) and Cowles (1937) were among the first to systematically evaluate tokens in the laboratory. Both researchers used poker chips as tokens and chimpanzees as subjects to investigate response patterns across various token arrangements and to compare the reinforcing effectiveness of tokens to primary reinforcers. In a series of experiments, both Wolfe and Cowles demonstrated that establishing a token–reinforcer relationship (i.e., pairing) is central to tokens’ effectiveness. Specifically, Wolfe (1936) found that chimpanzees preferred tokens that were paired with food over those that were not paired with food. Cowles (1937) found similar results, including that paired tokens supported the acquisition of matching-to-sample discriminations with levels of accuracy comparable to but lower than those supported by food. Following Wolfe’s and Cowles’s experiments, token research lagged until a series of experiments conducted by Kelleher (1956, 1957, 1958). Investigating reinforcement schedules, Kelleher (1956) found that responding for tokens under various simple schedules of reinforcement generally conformed to the typical patterns of responding observed when the reinforcers delivered were food. However, Kelleher (1958) found that higher FR token-production schedules produced longer pauses in responding early in sessions than those typical of food production schedules.

A spike in applied token research was spurred by Ayllon and Azrin’s (1965) seminal study evaluating the effects of a token economy on the self-help and vocational behaviors of adults with psychosis hospitalized in an inpatient facility. Across six experiments, Ayllon and Azrin demonstrated that participants’ performance improved as a function of the token economy, an effect that was lost when the token system was disrupted. The value of Ayllon and Azrin’s study lay in the social significance of the behaviors and population included (Hackenberg, 2018). Kazdin and Bootzin (1972) and Kazdin (1982) addressed the increase in applied token research and identified practical areas (e.g., staff training, generalization, procedural fidelity) warranting further analysis. In the ensuing years, researchers also began applying token economies across a broad range of settings and circumstances. Although Matson and Boisjoli (2009) suggested that token systems have been used most often in inpatient psychiatric settings and school-based programs, the descriptions below exemplify the variety of contexts in which token economies have been successfully implemented.

Preschools

Token economy research has been conducted in preschool settings to proactively address behavior management in young children. Filcheck et al. (2004) implemented a class-wide levels system managed by the teacher, in which, in lieu of dispensing tokens, children’s names were moved up and down seven levels contingent on meeting the specified criteria associated with each level. This type of economy spared the teacher the effort of dispensing tokens and did not require the students to count, track, or exchange tokens. It also allowed all students to participate and earn reinforcers without “singling out” the children with behavior management issues.

Elementary Schools

In elementary schools, token economies may be used to identify and diagnose children with behavior problems and learning disabilities while cultivating peer relationships and social skills. Anhalt et al. (1998) evaluated the ADHD Classroom Kit to increase prosocial behavior and decrease disruptive behavior in classrooms. The Kit was designed for kindergarten through sixth-grade classrooms and involved splitting the class into groups so as not to single out children with ADHD or other behavior management issues. Groups and individuals earned tokens for desirable behavior and received an opportunity to correct disruptive behavior following verbal warnings. The Kit required students to rely on each other, increased accountability within groups, and allowed for peer modeling of prosocial and on-task behaviors. Additionally, the Kit has been shown to increase appropriate behavior in children with problematic behavior or learning disabilities and has been used in numerous case studies (Anhalt et al., 1998).

Middle and High School

Token economies have been evaluated with adolescents in middle schools to improve the accuracy and variety of academic skills. For example, Swain and McLaughlin (1998) used a point system in a classroom of adolescents diagnosed with behavioral disorders to improve their math accuracy. Truchlicka et al. (1998) successfully used a token economy with a response cost component to improve the performance of adolescents in a special education classroom on spelling tests. Token research in high schools is sparse and often takes place in special education classrooms. One possible explanation for the lack of use in high schools is that teachers do not find token economies socially valid at this level. Token economies may also not be feasible in classrooms where students attend one class a day before moving to the next.

University/College

Token research in university and college settings may be more common than in high schools. Boniecki and Moore (2003) successfully used tokens to increase college students’ participation during class by distributing tokens, exchangeable for extra credit points, for answering questions correctly during lecture. Nelson (2010) conducted a similar study in which students instead earned tokens by asking questions in class. These studies used tokens to increase participation during class toward the aim of improving students’ performances during evaluations. This work expands the literature on token use to large, diverse groups and beyond applications that address disordered behavior and psychiatric conditions.

Residential/Community Facilities

Token economies have been successfully implemented in residential and community facilities. For example, Phillips (1968) implemented a token economy at a residential rehabilitation center for predelinquent boys. The adolescents earned points (i.e., tokens) for appropriate behaviors (e.g., self-care, prosocial behavior, academic achievement) and lost points for inappropriate behaviors (e.g., aggressive speech, failing schoolwork, arguing). Phillips reported significant improvement with all the participants. Similarly, Adams et al. (2002) described the implementation of a camp-wide token economy to increase prosocial behaviors at a pediatric burn summer camp. Nastasi et al. (2020) used a token economy to increase physical activity in a residential home for adults with IDD.

Organizational Settings

There have also been applications of token economies within a variety of organizations to reinforce employees’ desired behaviors. Fox et al. (1987) used stamps as tokens to reinforce safe behavior by miners in an open-pit mine. The intervention reduced the time and money lost by the company due to injury. Camden et al. (2011) decreased employee absenteeism and rescheduling by almost half through a credit reward system. Vergason and Gravina (2020) successfully had guests and confederates provide tokens to employees at a zoo for appropriate greeting behavior. These applications show the versatility of token economies across several contexts.

Therapy Settings

Token economies have also been implemented in therapy settings. For example, Ingham (1982) evaluated the effects of a token system for reducing the stuttering of adults. Although the results were inconclusive, they provided initial evidence that token programs might be an effective intervention for this behavior and population. In general, token economies implemented with patients with schizophrenia have been effective (Dickerson et al., 2005). However, research with this population published since 1994 has not been reviewed and evaluated. Thus, the effectiveness of token economies in recent studies is unknown and cannot be compared to previous research. After all, it is possible that the level of care provided in older research differs from that provided in contemporary studies. Although behavior analysts report using token economies often and although tokens are the second-most common consequence delivered by staff working with people with IDDs, research elucidating the efficacy of tokens with other populations is lacking. As such, studies with populations and in therapy settings in which token economies are not typically evaluated are needed to better understand the efficacy of token economies in differing therapies.

Several authors have noted a marked decline in the quantity and change in the nature of token economy research. Hackenberg (2018) observed that most recent applied research has generally focused on practical and clinical concerns rather than elucidating the processes underlying token systems’ effectiveness. Thus, recent applied token research has generally not been informed by basic research and has declined significantly since the 1970s. In fact, in applied publications including token economies, token systems are usually a component of treatment packages rather than the focus of the interventions or the research themselves.

Matson and Boisjoli (2009) discuss several reasons why applied token economy research might have declined. At the height of applied token research in the 1970s and 1980s, much of the research involved psychiatric inpatients. However, once the deinstitutionalization movement began, the demand for token economies in institutions declined, and thus related research also declined (Liberman, 2000). Additionally, Hackenberg (2018) posits that the successful widespread application of token economies might have contributed to their decline in applied research. Both as a primary intervention and as a component of treatment packages, token economies have been successfully implemented to change behavior across a variety of settings, subjects, responses, and procedural modifications. Consequently, researchers may have had little motivation and reason to evaluate token economies in their own right or to investigate variables that impact their effectiveness.

Mechanisms Underlying Token Effects

Tokens increase responses upon which they are contingent, but the mechanisms by which they do so are not well understood. One account is that tokens function as conditioned reinforcers that strengthen responses due to their relationship to backup reinforcers (Hackenberg, 2009, 2018). For example, Smith (1972) and Moher et al. (2008) demonstrated that contingent tokens differentially increase children’s responding when the tokens are paired with backup reinforcers compared to when they are not. Moher et al. (2008) further observed that paired tokens maintained levels of responding similar to those maintained by the backup reinforcers themselves, suggesting that the acquired value of tokens is commensurate to that of the backup reinforcers for which they are exchangeable. Wolfe’s (1936) and Cowles’s (1937) findings that tokens maintained similar levels of responding to food also supported the conceptualization of tokens as conditioned reinforcers. Relatedly, tokens are typically conceptualized as generalized conditioned reinforcers when they are paired with more than one backup reinforcer. Supporting the conceptualization of such tokens as generalized conditioned reinforcers, Moher et al. (2008) found that levels of responding are less susceptible to fluctuations due to changes in motivating operations (MOs) when tokens are paired with multiple backup reinforcers rather than one.

Another account posits that tokens increase responding not because they function as conditioned reinforcers that directly strengthen the responses they follow, but because they function much like discriminative stimuli: they signal that reinforcement is forthcoming and, thus, guide responding much like signs on roadways direct drivers to their destination (Shahan, 2010). For example, Bullock and Hackenberg (2015) compared pigeons’ rates of responding under identical tandem and token schedules for food (i.e., FR 200), in which the token schedule also produced tokens on an FR 50. If tokens were conditioned reinforcers, rates of responding should have been greater during the token schedule because it produced more reinforcers. However, rates of responding were lower during the token schedule, suggesting that tokens have a discriminative or signaling function and, thus, resulted in more efficient responding. In addition, rates of responding toward the end of the token requirement approximated rates at the end of the tandem requirement, indicating that early token delivery signaled a delay to the terminal reinforcer (i.e., food) and, thus, suppressed responding. In sum, Bullock and Hackenberg’s results are consistent with a discriminative function of tokens: the token schedule produced lower rates of responding than the identical tandem schedule for food, even though the token schedule delivered four tokens for every food delivery.
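The schedule arithmetic in this comparison can be checked with a short sketch. It only counts programmed events across one FR 200 food requirement, with and without an FR 50 token-production schedule; it does not model response rates or any other aspect of the pigeons’ behavior. The function name count_events and its return format are illustrative assumptions.

```python
def count_events(food_ratio, token_ratio=None):
    """Count programmed events across one food requirement.

    food_ratio: responses required per food delivery (e.g., FR 200).
    token_ratio: responses required per token (e.g., FR 50), or None
    for a tandem schedule that delivers no tokens.
    """
    tokens = 0
    for response in range(1, food_ratio + 1):
        # A token is delivered each time the token-production ratio is completed.
        if token_ratio is not None and response % token_ratio == 0:
            tokens += 1
    return {"responses": food_ratio, "tokens": tokens, "food": 1}


token_schedule = count_events(food_ratio=200, token_ratio=50)
tandem_schedule = count_events(food_ratio=200)
```

Both schedules require the same 200 responses per food delivery; they differ only in the four tokens delivered along the way under the token schedule.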

To our knowledge, researchers have not conclusively evaluated the mechanisms responsible for token effects with applied populations, and therefore, it is unclear which mechanisms account for their effects on human behavior. Regardless, tokens’ exchangeability for backup reinforcers appears to be critical to their effects (Hackenberg, 2009, 2018; Shahan, 2010).

Benefits/Advantages of Token Reinforcement

Token economies have many advantages relative to other reinforcement systems. First, tokens are typically discrete and easy to store (Ivy et al., 2017) and transport across environments (Ayllon & Azrin, 1968). Additionally, token systems allow for immediate reinforcement without interrupting ongoing responses or activities in the way that directly delivering other reinforcers (e.g., toys) can (Kazdin & Bootzin, 1972). The ability to reinforce responding immediately is an especially important benefit given findings that delays to reinforcers as brief as 6 to 10 s can negatively impact skill acquisition by reducing instructional efficiency and effectiveness (Carroll et al., 2016; Majdalany et al., 2016). Relatedly, token accumulation facilitates continuous, uninterrupted access to backup reinforcers (e.g., 10 min of access after 10 tokens are exchanged), which researchers have found supports more responding and is preferred by learners over distributed access (e.g., 1 min of access after one token is exchanged; DeLeon et al., 2014).

Another benefit is that tokens can be established as generalized conditioned reinforcers and, thus, can be less susceptible to satiation effects. When tokens are paired with multiple backup reinforcers, they can continue to support similar levels of responding even when the client is satiated on one of the backup reinforcers (Moher et al., 2008). By contrast, satiation inevitably renders actual reinforcers delivered as direct consequences less effective. To maximize resistance to satiation, tokens should be paired with at least two to three reinforcers (Moher et al., 2008), and the classes (e.g., edible vs. leisure items) of those reinforcers should vary (Becraft & Rolider, 2015).

Further, each component of token systems can be individualized and tailored to a range of circumstances and treatment objectives (Ivy et al., 2017). For example, differential reinforcement can be easily embedded into a token system by arranging for different target responses to produce different quantities of tokens (Miltenberger, 2012). Similarly, response cost can be incorporated into a token system. Contingent removal of earned tokens can reduce problem behavior when appropriate (see later section on “Response Cost”). Moreover, each schedule in a token system can be readily adjusted to promote optimal responding for each client as the environment and their repertoire change.

Additionally, token economies have large-scale applicability; they can be used to change the behavior of group members (e.g., Fox et al., 1987). A good example is money, which functions as a token reinforcer because it can be exchanged for goods and services and is typically earned (and lost) by engaging in specific behaviors. Money functions as a token reinforcer for most individuals in a society and, therefore, influences the behaviors of many. Relatedly, because token economies mirror societies’ monetary systems, they can be used to teach saving and spending behaviors.

Additional Considerations

Response Cost in Token Economies

Response cost is a negative punishment procedure whereby the tokens one already possesses are removed contingent upon undesirable behavior. Response cost is essentially a modification of the token-production schedule (Hine et al., 2017). It is incorporated into token economies when one of the aims of the token economy is to decrease undesirable behavior, and simply reinforcing appropriate behavior has not achieved this aim (Miltenberger, 2012). Therefore, in practice, token economies do not typically begin with response cost, but it may be added when other attempts to decrease undesirable behavior through positive reinforcement have been thoroughly exhausted. Response cost can be implemented in varying ways. For example, individuals might be given some number of tokens at the beginning of an intervention period (noncontingent token delivery), which are then removed contingent upon inappropriate behavior. Alternatively, the individual may have to earn the tokens that are later removed contingent upon inappropriate behavior (Conyers et al., 2004).

Studies that compared token economies with and without response cost in decreasing problem behavior have produced mixed results (Conyers et al., 2004; DeJaeger et al., 2020; Phillips et al., 1971), although some evidence indicates that response cost is just as effective as, if not slightly more effective than, symmetrical reinforcement-based procedures for reducing undesirable behavior. Interestingly, when given a choice between the procedures, many study participants have expressed a preference for response cost over the reinforcement-based alternative (Donaldson et al., 2014; Jowett Hirst et al., 2016).

Insofar as response cost is a punitive procedure, several considerations are important in deciding whether to incorporate it into a token economy. Punishment can be associated with undesirable side effects (e.g., emotional responding, punishment-induced aggression), so care must be taken to ensure that implementing response cost does not, in fact, occasion more undesirable behavior than it decreases. Although some studies suggest that token response cost may be relatively benign in this respect compared with other punitive procedures (see Iwata & Bailey, 1974; McGoey & DuPaul, 2000), we know of no direct comparisons between response cost and other kinds of positive or negative punishment procedures.

Another important consideration in the use of response cost is whether it would be difficult to remove tokens from an individual who does not particularly want to relinquish them. Removing tokens under some circumstances may result in a struggle, and under other circumstances may result in a sort of delay to the punitive operation. Delayed punishment has been shown in some cases to have diminished effects relative to more immediate punishment (e.g., Abramowitz & O’Leary, 1990).

Other important considerations involve questions regarding how many tokens should be removed contingent on an undesirable response. On the one hand, the amount lost must be sufficient to offset what is gained from engaging in the undesirable response. That is, the relative value of the loss incurred through response cost must outweigh the gain achieved by engaging in the undesirable behavior. On the other hand, it is important that the individual not lose access to all tokens through response cost, thereby establishing a condition in which no further penalty could be imposed for additional instances of undesirable behavior. In other words, losing all of one’s tokens “might produce a segment of time in which contingencies for appropriate behavior are vague, perhaps creating an establishing operation for problem behavior” (Hine et al., 2017; see also Miltenberger, 2012).
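The second constraint, that response cost should not strip an individual of every token, can be expressed as a simple rule. The sketch below is one hypothetical way to do so: the function name apply_response_cost and the floor parameter are assumptions introduced for illustration, and the value of the floor is not a recommendation drawn from the sources cited. The first constraint, that the loss must outweigh the gain from the undesirable behavior, depends on the value of that behavior’s consequences to the individual and is not modeled here.

```python
def apply_response_cost(balance, cost, floor=1):
    """Remove up to `cost` tokens without dropping below `floor`.

    Returns a (new_balance, tokens_removed) pair. `floor` is an assumed
    minimum holding that keeps some tokens at stake for later penalties.
    """
    if balance < 0 or cost < 0 or floor < 0:
        raise ValueError("balance, cost, and floor must be non-negative")
    removable = max(balance - floor, 0)  # tokens that may be taken
    removed = min(cost, removable)       # never take more than allowed
    return balance - removed, removed


full_penalty = apply_response_cost(balance=10, cost=3)
capped_penalty = apply_response_cost(balance=2, cost=3)
no_penalty = apply_response_cost(balance=1, cost=3)
```

When the floor blocks part or all of a penalty, as in the last two calls, the programmed cost is not fully delivered, which is itself a reason to choose the cost and the floor together rather than separately.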

Fading a Token Economy

Fading a token economy refers to the methods employed to transfer control of target behaviors from the token system to natural contingencies in a manner that promotes response maintenance. In most cases, fading token systems will be necessary to facilitate clients’ transitions from treatment to natural environments. As such, prior to implementing a token economy, one should ensure that fading the token program will be feasible. Otherwise, one should consider alternative interventions to prevent possible decrements in responding resulting from removing the token economy without fading. Additionally, one should establish criteria for initiating fading while developing the token system, and one should also develop fading procedures well before the client meets said criteria.

There are two general approaches to fading a token system: (a) changing the contingencies and schedules of the token system while it remains in effect and (b) gradually eliminating the token economy in its entirety. Paul and Lentz (1977) utilized the former method to fade a type of token economy known as a level system that targeted psychiatric patients’ daily living skills (e.g., bed making). Participants began at Level 1 and moved on to other levels by meeting predetermined criteria. As participants accessed higher levels, the schedules were faded such that the contingencies more closely resembled those in the natural environment outside of the hospital. For example, when participants moved to Level 2, token delivery was delayed such that they received large quantities of earned tokens at once, much like a “pay day” (Boerke & Reitman, 2011). Participants at Level 4 could purchase backup reinforcers without restrictions if they continued to meet applicable response requirements and purchased a card that unlocked this privilege. In this manner, Paul and Lentz faded the token economy and promoted self-management skills (e.g., planning, self-monitoring) required in the natural environment.

The other method of fading involves gradually eliminating the token program and transferring control solely to the natural environment. For example, Petursdottir and Ragnarsdottir (2019) completely faded a token reinforcement system that had successfully changed the disruptive behavior and academic engagement of elementary school students. The researchers faded the token program by systematically (a) pairing token delivery with social reinforcement, (b) increasing delays to token and backup reinforcer delivery (i.e., token-production and exchange-production schedules), (c) raising performance criteria (i.e., increasing the token-production schedule), and (d) increasing the token-exchange schedule (i.e., higher prices for backup reinforcers). Ultimately, participants had to engage in target behaviors for progressively longer intervals to earn tokens, earn a greater proportion of all possible tokens to access the same backup reinforcers, and wait longer periods for opportunities to exchange tokens. Eventually, the researchers thinned these schedules to the point that they could eliminate the token system entirely while maintaining responding at desirable levels.

Evidence suggests that the treatment effects of a token economy may persist for several years after the system is removed (Kazdin, 1982; e.g., Paul & Lentz, 1977). Variables associated with such maintenance include individualized instruction, smaller classroom sizes, parental involvement, and home-based reinforcement (Kazdin, 1982). However, sometimes the treatment effects of faded token systems are not maintained, and this loss may be the result of individuals operating in environments that do not support the behaviors targeted in the token economy. Alternatively, the environment might support the behaviors, but if the token system was not appropriately faded, individuals might experience ratio strain resulting in response decrements. To promote response maintenance once a token program is removed, Kazdin (1982) recommends incorporating the procedures described by Stokes and Baer (1977) for facilitating generalization. More specifically, Kazdin (1982) suggests fading in reinforcers (e.g., praise) that occur naturally in the environment, increasing delays to and schedules of reinforcement, involving peers and caregivers in delivering reinforcers, and conducting training across environments and stimulus conditions to encourage generalization.

Potential Limitations of Token Economies

Token economies harness much of what we know about arranging effective instructional and therapeutic contingencies, as verified by hundreds of studies. Still, they do incur some costs, which warrants weighing their benefits and costs against those of other contingency arrangements.

Unlike the simple provision of immediate tangible reinforcers, token economies require one to train the recipients of intervention to use the token system. This added time detracts from instructional time during which the client might otherwise be acquiring skills via direct tangible reinforcement. Thus, training might result in a delay in the onset of intervention for some behaviors, especially those the token economy will target. However, time spent in training might reduce time that could potentially be spent managing problem behavior related to the use of nontoken reinforcers (e.g., unprogrammed delays to reinforcement, immediate unavailability of reinforcers).

Also, when used to their greatest potential, token systems require continuous monitoring and frequent adjustment of many moving parts (e.g., schedules, backup reinforcers). One generally starts a token system with a dense schedule of contrived reinforcement that bears little resemblance to the circumstances under which that performance is expected to persist in the future. To eventually approximate the target natural contingencies, one is required to change the system based on performance and the changing needs of the client’s repertoire and environment. Additionally, if responding in a token economy begins to degrade, there are many potential components to evaluate and manipulate to restore responding. Among the possible issues are whether one or more of the backup reinforcers are no longer potent, whether one of the three schedules has been thinned too rapidly, and whether treatment integrity has been compromised.

Matson and Boisjoli (2009) outlined several criticisms leveled against the use of token economies. One involves potential ethical concerns surrounding the use of response cost within a token system, which carries the same risks as any other punishment procedure. For example, the individuals managing the token economy (e.g., teachers, therapists) might find implementing response cost negatively reinforcing, which might result in an overuse of response cost and similar punishment procedures. In addition, a response cost embedded within a token economy can result in the same negative side effects as any other punishment procedure, including aggression, emotional responding, and discriminated avoidance of individuals and stimulus conditions associated with the procedure. However, Matson and Boisjoli point out that few token economies include a response cost component.

Another criticism is that ethical and clinical standards of care for psychiatric patients have changed, and tokens may no longer be considered appropriate for this population. Token economies may also be difficult for staff to maintain and thus may not be feasible. Accordingly, Glynn (1990) proposed that the efficacy of token economies in these settings was not properly disseminated, but also that the ethical and feasibility barriers to implementation in these settings may have been too great.

Others have expressed concerns that token systems, like all other contrived reinforcement systems, might reduce intrinsic motivation to engage in the targeted activity (i.e., overjustification effect; Deci, 1971; Kohn, 1993). However, researchers using single-subject designs have repeatedly found little evidence that external reinforcement systems like token economies, as used in applied behavior analytic research, produce a systematic decline in targeted behavior (Levy et al., 2017; Peters & Vollmer, 2014). Even so, teachers and parents who are unaware of such research may be unlikely to collaborate on or consent to token research, respectively. As such, these and other such reservations about token economies might have further contributed to the decline in applied token research.

Such concerns and limitations are genuine and require attention. Despite the above-noted decline in token-oriented research, a variety of questions clearly remain to be addressed. Nonetheless, token economies have been found to be effective across numerous settings and circumstances. The rich literature on their use has shown that they can be successful in diverse applications that can be tailored to suit individual therapeutic and educational needs. As summarized by Matson and Boisjoli (2009), “the technology is powerful, efficient, and largely has been able to deal with critical comments. Thus, we see no substantial clinical justification for the decreased use of token economies.”