Individuals diagnosed with Autism Spectrum Disorder (ASD) demonstrate atypical language development. For example, their speech may be delayed, they may have difficulty with language comprehension, and/or they may display repetitive or rigid language (e.g., echolalia). These difficulties can extend to non-vocal communication (e.g., use of gestures, body language, facial expressions), and often lead to difficulties in forming and maintaining friendships and social interactions (NIDCD, 2020). These challenges can also lead to a wide range of aberrant behavior (e.g., tantrums, aggression, self-injury, property destruction). The relationship between aberrant behavior and deficits in language and communication was first empirically demonstrated by a series of studies with children with developmental disabilities (Carr & Durand, 1985). The researchers designed an assessment to identify situations in which the challenging behavior reliably occurred, then taught a socially acceptable form of communication (functional communication) to replace the problematic behavior. Beyond reducing challenging behaviors, functional communicative repertoires have been linked to overall optimal outcomes for individuals with ASD (Szatmari et al., 2003).

For individuals who struggle to effectively communicate their wants and needs using vocal verbal behavior, there exist a variety of augmentative and alternative communication (AAC) supports such as sign language, picture-based communication systems, and speech-generating devices. Chapter 49 of this handbook will discuss AAC in detail. The focus of the present chapter is on B.F. Skinner’s Verbal Behavior (1957) and its evidence base for teaching language to individuals with ASD. Importantly, all forms of augmentative communication are considered verbal behavior from this perspective. We will first provide definitions for important terms that inform a behavioral interpretation of language. Then, we will review common strategies supported by the literature for teaching mands, tacts, echoics, and intraverbals. Finally, we will provide a brief summary of emerging research on complex verbal repertoires in learners with ASD.

Skinner’s Analysis of Verbal Behavior

As mentioned above, from a behavioral perspective language is a special form of behavior, influenced by environmental variables. Skinner (1957) defined verbal behavior as behavior of an individual (the speaker), reinforced through the mediation of another person’s behavior (the listener). Moreover, for the behavior to be considered verbal, the listener must respond in ways which have been conditioned “precisely in order to reinforce the behavior of the speaker” (p. 225). He contrasted this with non-verbal behavior, which is reinforced via direct manipulation of the environment. According to this definition, verbal behavior does not need to be vocal (speech). Non-vocal communication can also be reinforced through the mediation of another person (e.g., pointing, gesturing, writing, a picture exchange, or activating a speech-generating device). Although the focus of Skinner’s analysis was on the behavior of the speaker, he considered the listener repertoire to also be under the control of environmental variables. He defined several elementary verbal operants in terms of their specific antecedents and consequences with a focus on the outcome of each response.

First, a mand is a verbal operant under the functional control of relevant conditions of deprivation or aversive stimulation and reinforced by characteristic consequences. These specific conditions of deprivation or aversive stimulation were later termed motivating operations (MOs; Laraway et al., 2003). Common examples of basic mands include requesting items (e.g., foods, toys, access to people), asking for help, rejecting offered items, and requesting escape from or avoidance of aversive stimuli (e.g., turning off loud sounds, removing bright lights). More advanced mand repertoires may include using prepositions, pronouns, and adjectives to describe items requested, and manding for information (i.e., asking “what?” “where?” “how?” “when?” and “why?” questions) to gain access to reinforcement.

A tact is a response evoked by a particular nonverbal discriminative stimulus and maintained by generalized conditioned reinforcement (e.g., praise or acknowledgement). Common examples of tacts include labeling objects, events, and activities. Importantly, tacts include labels of stimuli related to all senses (e.g., “I smell cookies,” “I see the pool,” “I hear the birds,” “I feel the cool water”; although the descriptors need not be included in the tact itself). More advanced tacts can include labeling function, feature, and class of particular items; any use of descriptors for stimuli in the environment; and tacts for covert behavior such as feelings and emotions.

An echoic is under the functional control of a verbal stimulus with point-to-point correspondence and formal similarity to the response, and is maintained by generalized conditioned reinforcement (e.g., “Nice job saying what I said!”). Examples of echoics include imitation of sounds (“eeee,” “ooooo,” etc.), words (“bye-bye,” “mommy,” etc.), and phrases (“The car goes fast!”, “I like it!” etc.).

Finally, an intraverbal is a response under the control of a verbal stimulus with no point-to-point correspondence or formal similarity to the response, also maintained by generalized conditioned reinforcement. Some common examples of simple intraverbals include filling in phrases such as “Ready, set, …” or “One, two, …”; answering questions such as “Where do you live?” or “What’s your favorite food?”; and completing statements about animal sounds such as “A dog says…”. Complex intraverbals are fundamental to back-and-forth conversation. That is, a large portion of our day-to-day interactions consists of intraverbal exchanges, such as phone and email communication, storytelling, and description of past and present events (Sundberg & Sundberg, 2011).
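The antecedent and consequence relations that distinguish the four elementary operants defined above can be summarized in a short illustrative sketch. The Python below is not part of Skinner’s analysis or of any published assessment; the names `Episode` and `classify`, the string labels, and the decision to collapse point-to-point correspondence and formal similarity into a single flag are simplifying assumptions made only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    """One observed response, described by its controlling variables (hypothetical labels)."""
    antecedent: str        # "mo", "nonverbal_sd", or "verbal_sd"
    point_to_point: bool   # response matches the verbal stimulus part for part
    consequence: str       # "specific" or "generalized"


def classify(ep: Episode) -> str:
    """Map an episode onto an elementary verbal operant using the chapter's definitions."""
    # Mand: controlled by a motivating operation, reinforced by a characteristic consequence.
    if ep.antecedent == "mo" and ep.consequence == "specific":
        return "mand"
    # Tact, echoic, and intraverbal are all maintained by generalized conditioned reinforcement.
    if ep.consequence == "generalized":
        if ep.antecedent == "nonverbal_sd":
            return "tact"
        if ep.antecedent == "verbal_sd":
            return "echoic" if ep.point_to_point else "intraverbal"
    # Anything else does not fit a single elementary definition in this simplified scheme.
    return "unclassified"


if __name__ == "__main__":
    # Saying "go" after hearing "Ready, set, ..." and receiving praise.
    print(classify(Episode("verbal_sd", False, "generalized")))
```

The sketch deliberately treats each response as having a single controlling antecedent, which is a simplification of how verbal behavior is observed in practice.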

Multiple Control

In his analysis, Skinner (1957) discussed the issue of multiple control in great detail. It is common to observe impure forms of the verbal operants defined above. In fact, pure forms are rare outside of experimental and instructional settings. Michael et al. (2011) described two types of multiple control: convergent and divergent. These authors describe convergent multiple control as the “control of a single response by more than one variable” and divergent multiple control as “the strengthening of more than one response by a single variable” (p. 3). There are many cases of multiple control that contribute to initial failures of teaching methods and procedures. These cases may be easily missed when practitioners are unfamiliar with common issues that lead to faulty stimulus control. For example, if a child is taught to tact an object in response to a verbal discriminative stimulus “What’s this?” it is possible they will not later spontaneously tact the same objects directly taught during the training session (Marchese et al., 2012). A similar situation may occur when mands are under both MO and discriminative control. For example, when the learner is asked “Do you want milk?” while the milk is present, the resulting response is a mand-tact. If the item is not gradually faded from sight during the training, the learner may emit mands only when items are present, and not under more naturalistic conditions. Familiarity with these sorts of issues related to multiple control of verbal operants is essential when designing language intervention programs for learners with ASD.

It is also important to note that while specific training may be required to establish direct mands, tacts, echoics, and intraverbals in children with ASD, some teaching procedures can help to promote the emergence of untaught skills. A complete review of procedures to establish emergent verbal operants is outside the scope of the present chapter (see Chap. 50 and Rehfeldt & Barnes-Holmes, 2009). However, we will reference some of these procedures in the review of common strategies to establish elementary verbal operants.

Behavioral Language Assessment

Prior to developing a plan for teaching language skills to individuals with ASD, it is vital to assess the learner’s current repertoire. The assessment process is critical not only for identifying skill deficits, but also for identifying established skills that can help build more complex verbal repertoires. There are several criterion-referenced behavioral language assessments on the market. Unlike standardized language assessments, these assessments track a learner’s progress over time through repeated and frequent assessment that compares the individual only to their own prior performance. For example, The Assessment of Basic Language and Learning Skills—Revised (ABLLS-R; Partington, 2006), The Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP; Sundberg, 2008), and Promoting the Emergence of Advanced Knowledge (PEAK; Dixon, 2014a, 2014b) are three behavioral language assessments used by applied behavior analysts working with individuals with ASD (for information on psychometric properties, see Dixon et al., 2014, 2015, 2016; May & Flake, 2019; and Sundberg & Sundberg, 2011).

Scores for behavioral language assessments are largely based on direct observation and interviews with caregivers. These scores are then used to develop individualized targets for language intervention. The training methodologies commonly used to teach verbal behavior are based on principles of behavior analysis that have been demonstrated in a plethora of basic and applied research studies to date (Cooper et al., 2020). Behavior analysts design the necessary environmental conditions for individuals with ASD to learn to communicate effectively. Therefore, a verbal behavior approach to language intervention is an applied behavior analysis (ABA) approach, but there are distinct differences in the analysis of language that set the foundation for the assessment and intervention curriculum. These differences have been described elsewhere in detail (see Petursdottir & Carr, 2011; Sundberg & Michael, 2001).

Teaching Strategies for Elementary Verbal Operants

Mands

The mand is often targeted first because it results in direct benefit for the speaker (i.e., access to reinforcement), and can be used to establish a positive learning environment. As mentioned above, teaching mands can also lead to decreases in aberrant or challenging behavior. When the teacher delivers preferred items to the learner, the teacher is paired with those items, which increases the likelihood that the teacher will become a conditioned reinforcer. While mands can be taught using multiple procedures, early training programs often focus on teaching single-word mands (e.g., “juice” “milk” “car” “book” “ball”) before proceeding to target complex mands including mand frames (e.g., “I want juice please”; Shillingsburg et al., 2020). Single-word mands require less effort and are more efficient to teach because reinforcement is delivered immediately following emission of a response, in this case a single utterance. However, progressing to more complex mands becomes increasingly difficult if the focus is restricted to single-word mands during early instruction. When a learner is not able to emit more than one word at a time, the instructor can still model the mand frame upon delivery of the reinforcer (e.g., “I want ____”). In fact, this is part of the recommended teaching practice for learners who use the Picture Exchange Communication System (PECS; Bondy & Frost, 1994) as their primary mode of communication. As the learner progresses toward more advanced mands requiring use of phrases and sentences, additional mand frames can be introduced (e.g., “May I have ____”, “Please give me ____”). There are multiple benefits to teaching mand frames, including increased probability of generalization due to interchangeable combinations of items and frames (Shea et al., 2019).

A second general recommendation for early mand training is to avoid teaching generic single word mands such as “more” and “please” (Sundberg & Partington, 1998). Although these mands also tend to require less effort from the learner and may be acquired quickly, the learner may resort to their use in lieu of mands for specific items. Moreover, when a learner uses “more” or “please” to mand for items that are not in sight, the caregiver is left guessing what the learner wants or needs. This can lead to frustration from both the caregiver and the learner, and eventually result in aberrant behavior. For this reason, caregivers should introduce specific mands for items early in training.

Natural Environment Training

There are two behavior analytic teaching strategies that are commonly used to deliver instruction to learners with ASD: natural environment training (NET) and discrete trial training (DTT). This section focuses on NET. As the name implies, NET emphasizes teaching that takes place in naturally occurring language contexts. Using this general training model, the teacher follows the learner’s interest and uses stimuli that are readily available to create a variety of learning opportunities. Mands, tacts, echoics, and intraverbals can all be targeted during a NET session; here we discuss teaching mands in a naturalistic setting.

During NET mand training sessions, the instructor will follow the child’s lead while manipulating the environment to increase motivation (i.e., capture or contrive MOs; Coleman et al., 2020). For example, if a child shows interest in a book, the instructor may hold the book just slightly out of reach, and require a verbal response (e.g., vocalization, sign, picture exchange) before giving access to the book. The caregiver can also briefly interrupt a fun activity (e.g., stop blowing bubbles or pushing a swing) to momentarily increase the value of the reinforcing activity and the likelihood that the learner will emit a mand to regain access to it (e.g., says “bubbles” or “push me!”).

A common strategy that takes advantage of contriving MOs to teach mands is the behavior chain interruption strategy (BCIS; Hall & Sundberg, 1987). This strategy requires interruption of a predetermined step in an established behavior chain, to create an increased likelihood and opportunity for the learner to mand, and in turn complete the behavior chain with access to the terminal reinforcer. For example, teaching a learner to make a sandwich requires multiple steps. First, get the bread. Second, add condiments to the bread with a knife. Third, select meat or cheese to place on top of the bread. Fourth, add toppings such as tomatoes, pickles, and/or lettuce. Lastly, form the sandwich by placing a second piece of bread on top of the other ingredients. In this example, the interruption can be done at any step of the chain from the first step to forming the sandwich. Interrupting a step in the behavior chain contrives a motivating operation that will momentarily alter the effectiveness of the reinforcer and make it more likely for the learner to emit a mand. In this example, perhaps the best part of the sandwich for this particular learner is the cheese. Once the learner reliably completes the behavior chain independently, the teacher hides the cheese from sight in order to evoke a mand such as “cheese,” “where is the cheese?” or “can I have the cheese, please?” depending on the sophistication of the learner’s mand repertoire. Access to the cheese allows the learner to continue with the behavior chain and access the terminal reinforcer of eating the sandwich.

Recently, Carnett et al. (2017) reviewed empirical evidence of BCIS for individuals with ASD. Some findings are worth highlighting here for clinical applications. First, intervention procedures more often included verbal cues, modeling to prompt correct mands, time delay, physical and/or partial physical prompting procedures, combinations of prompting strategies (e.g., verbal and physical prompts), and natural reinforcers associated with the terminal reinforcer. Verbal cues require the instructor to present part of the instruction and wait for the learner to complete the phrase with the missing part (e.g., “Let’s make a sandwich. Get [item name]”). Modeling procedures involve strategies such as stating the word or phrase required for access to the reinforcer (e.g., Say, “where is the [item]?” then the individual repeats the phrase). Additional strategies to teach mands through BCIS include using either single or combined prompting procedures (e.g., progressive time delay and verbal prompts). When combining prompt procedures, the instructor may give a vocal verbal prompt “What do you need?” at the point of interruption during the task, and then wait a few seconds for the learner to emit the expected target response. In subsequent teaching opportunities, the instructor waits a few seconds before stating the instruction again. If the learner emits the correct mand after the delay, then the requested item is provided.

Second, using BCIS effectively requires practitioners to identify preferred activities and assess the value of the terminal reinforcer prior to implementing this teaching strategy in natural settings. Formal and informal preference assessments can be used to identify items and activities that can be later broken down into multiple steps (see Chaps. 42 and 43 for a review of preference assessments). Third, some learners may find the BCIS aversive and present with aberrant behavior during the interruption. This can be mitigated by selecting an alternative interruption point that will still create an MO for manding. Using the above example, if the learner finds interruption for their most preferred item (cheese) too aversive, the teacher should try a later step in the behavior chain (e.g., asking for toppings). Finally, practice sessions should be implemented on a regular schedule to promote generalization and maintenance of newly acquired skills.

Transfer of Stimulus Control

Transfer of stimulus control (ToSC) occurs when a behavior initially evoked by a response or stimulus prompt is emitted under naturally occurring conditions (i.e., in the presence of the natural discriminative stimulus or SD). For example, a prompt delay procedure (Touchette & Howard, 1984) can be used to transfer control from a verbal stimulus (e.g., the echoic prompt “milk”) to the relevant MO when the milk is out of sight and the instructor can reliably predict an MO is in place for milk (i.e., the learner is thirsty). The instructor initially delivers the echoic prompt, followed by immediate access to the reinforcer if the learner emits the targeted response (e.g., echoes “milk”). Over time, the instructor waits progressively longer before delivering the echoic prompt (see Fig. 1). Transfer of stimulus control is achieved when the learner responds in the presence of the target stimulus before any prompts are delivered. This echoic-to-mand transfer procedure can be effective for learners with a strong echoic repertoire. Overall, transferring stimulus control across operants can facilitate the acquisition of novel mands, tacts, and intraverbals (Barbera, 2007).

Fig. 1 Left side of the graph shows delivery of an echoic prompt “Milk, please” in the presence of an MO. The right side of the graph shows emission of the mand “milk, please” after fading the echoic prompt
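The progressive prompt delay described above can be sketched as a simple schedule in which the wait before the echoic prompt grows across sessions. The Python below is only an illustration: the function names, the 1-second step, and the 5-second ceiling are hypothetical values chosen for the example and are not taken from Touchette and Howard (1984) or from any clinical protocol.

```python
from typing import List, Optional


def prompt_delay_schedule(n_sessions: int,
                          step_s: float = 1.0,
                          max_delay_s: float = 5.0) -> List[float]:
    """Return the delay (in seconds) before the echoic prompt for each session.

    Session 0 uses a 0-second delay (immediate prompt); the delay then grows by
    `step_s` per session until it reaches `max_delay_s` (both values are assumptions).
    """
    return [min(i * step_s, max_delay_s) for i in range(n_sessions)]


def score_trial(response_latency_s: Optional[float], delay_s: float) -> str:
    """Score one trial relative to the scheduled prompt delay.

    A response emitted before the prompt would have been delivered counts as
    independent, which is the evidence that stimulus control has transferred.
    """
    if response_latency_s is None:
        return "no_response"
    return "independent" if response_latency_s < delay_s else "prompted"


if __name__ == "__main__":
    for session, delay in enumerate(prompt_delay_schedule(8)):
        print(f"session {session}: wait {delay:.0f} s before the echoic prompt")
```

In practice the step size and ceiling would be individualized to the learner; the sketch only makes the structure of the fading schedule explicit.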

Multiple Exemplar Training

Multiple exemplar training (MET) is an intervention procedure in which the therapist teaches a variety of stimulus and response topographies to promote generalization to untaught topographies (Holth, 2017). To illustrate, when teaching a learner to mand for “water” we can present multiple items (e.g., water bottle, fountain) that evoke a vocal response “water”. The next time under proper MO control (i.e., when the learner is thirsty) the teacher can present a different item (cup) that contains water to evoke the respective mand. In this example, however, the generalization of the mand “water” is under similar MO, but different antecedent control (bottle, fountain, and cup). In this case, transfer of control from one stimulus topography (e.g., bottle) to a different antecedent (e.g., cup) may occur due to derived relational responding processes. In other words, a learner may respond to all stimulus topographies as similar in the context of water deprivation. For the interested reader, further descriptions of the synthesis between verbal behavior and derived relational responding are presented elsewhere (Barnes-Holmes et al., 2000; Rehfeldt & Barnes-Holmes, 2009). Similar procedures can be used to teach tacts and intraverbals. Examples will be provided in corresponding sections of this chapter.

Combining mand frames and BCIS with multiple exemplar training (MET) increases the likelihood for the emergence of new mands. For example, when mand frames are taught in combination with multiple effective reinforcers (e.g., “May I have watermelon?”, “May I have blueberries?”, “May I have strawberries?”), mands for new items may emerge without further training. Also, teaching mand frames can increase resistance to extinction of socially appropriate behaviors and mitigate resurgence of challenging behavior that has been associated with obtaining functionally related reinforcers in the past (Shea et al., 2019). Using the previous example, if the learner does not obtain the edible reinforcer for emitting an appropriate mand, she can use an alternative vocal mand frame to obtain functionally similar reinforcers (e.g., blueberries or strawberries). In sum, teaching multiple mand frames increases the opportunity for the learner to obtain the desired reinforcer and decreases the likelihood of engaging in inappropriate behaviors to obtain similar reinforcers.

Tacts

Teaching tacts not only helps to increase the vocabulary of learners with ASD; a strong tacting repertoire also sets the foundation for complex language repertoires. Several procedures have been employed to teach basic tact responses. These procedures include use of various teaching formats (massed vs. interspersed-trial instruction), prompting strategies (see Chaps. 42 and 43 for a review of prompts), error correction procedures (e.g., repetition/rehearsal, delivering verbal feedback/error statements), and varying reinforcement parameters (quality and magnitude; DeSouza et al., 2017).

In addition, as mentioned above, MET can be employed to increase tacting responses in learners with ASD. For example, teachers may present multiple items that belong to the same stimulus class, then probe for new tacts. If a learner tacts “dog” when presented with a picture of a Border Collie, the teacher may incorporate pictures of multiple dogs into subsequent training sessions (e.g., bulldog, poodle, beagle) until the individual reliably responds to all trained pictures. Then, without explicit reinforcement, a learner may emit the tact “dog” when presented with a novel picture of a breed that has not been directly trained (e.g., Labrador).

Massed- and Interspersed-Trials

Both massed-trial and interspersed-trial instruction typically occur in a discrete trial training (DTT) format. During massed-trial instruction, a learner is presented with the same unmastered target for multiple trials throughout a session. In contrast, during interspersed trials the instructor presents mastered targets within an instructional sequence that includes more challenging (acquisition) tasks. Majdalany et al. (2014) compared these two training formats for children with developmental delays. Results of this study showed the majority of participants met the predetermined mastery criterion in fewer trials when targets were presented in a massed-trial format. Although evidence supports both procedures for tact instruction, other research has demonstrated lower levels of aberrant behavior and higher levels of response maintenance when trials are interspersed (Rapp & Gunby, 2016). These combined results indicate that it may be beneficial to evaluate individual characteristics and learning histories in order to select an optimal teaching approach. For example, recent studies have evaluated the use of assessment procedures to determine prompting sequences (Seaver & Bourret, 2014) and mand modalities (Valentino et al., 2019) that are best suited for individual learners. These mini assessments can help maximize the instructional time available to learners with ASD.
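The structural difference between the two formats can be made concrete with a small sketch that builds a trial sequence for each. This Python is illustrative only: the function names, the default of one mastered target before each acquisition trial, and the random selection of mastered targets are assumptions made for the example, not parameters reported in the studies cited above.

```python
import random
from typing import List, Optional


def massed_trials(acquisition_target: str, n_trials: int) -> List[str]:
    """Massed format: the same unmastered target is presented on every trial."""
    return [acquisition_target] * n_trials


def interspersed_trials(acquisition_target: str,
                        mastered_targets: List[str],
                        n_trials: int,
                        mastered_per_acquisition: int = 1,
                        seed: Optional[int] = None) -> List[str]:
    """Interspersed format: mastered targets are mixed in with the acquisition target.

    Each acquisition trial is preceded by `mastered_per_acquisition` trials drawn
    at random from `mastered_targets` (the ratio is a hypothetical choice).
    """
    rng = random.Random(seed)
    sequence: List[str] = []
    while len(sequence) < n_trials:
        for _ in range(mastered_per_acquisition):
            sequence.append(rng.choice(mastered_targets))
        sequence.append(acquisition_target)
    return sequence[:n_trials]


if __name__ == "__main__":
    print(massed_trials("dog", 6))
    print(interspersed_trials("dog", ["cat", "ball"], 6, seed=0))
```

Writing the two formats this way also makes it easy to vary the ratio for an individual learner when comparing them in a brief assessment.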

Error Correction and Reinforcement Parameters

The effectiveness of error correction procedures and parameters of reinforcement also tends to be idiosyncratic. Take for example a learner with a limited tact and listener repertoire, and poor attending skills. This individual may not benefit from corrective feedback delivered without a model for the correct tact response (Cariveau et al., 2019). In this scenario the instructor must first verify the learner is attending to the nonverbal stimulus (e.g., picture of a dog) before delivering a general instruction (e.g., “I’m going to show you some pictures, tell me what you see”). Then, if an incorrect tact response is emitted the teacher requires an observing response to ensure the learner is attending to the target stimulus (i.e., touches the picture), followed by an echoic verbal prompt (e.g., “dog”). The learner is required to repeat the correct response before the next target is presented.

Prior research has also shown that reinforcement procedures used for skill acquisition programs will vary across learners (Boudreau et al., 2015). That is, some learners will benefit more from manipulating reinforcement quality (e.g., using different types of preferred items that may function as reinforcers), rather than varying the magnitude of a specific reinforcer (e.g., giving more of a particular type of snack or added playtime with a particular toy). Given these individual differences in error correction procedures and reinforcement arrangement, teachers should plan to include assessment procedures that will help them identify efficient arrangements for skill acquisition programs (see Carroll et al., 2015).

Tacts for all Sensory Modalities

Although research on teaching tacts is largely focused on visual modalities (presenting a picture or three-dimensional item to evoke a response; Dass et al., 2018; Hanney et al., 2019), tacts are under the control of all sensory modalities (i.e., auditory, tactile, gustatory, and olfactory). A child learning to cook may reap greater benefit from combined visual and gustatory and/or olfactory tact training. Imagine an individual learning to discriminate between sweet and salty flavors. The teacher presents honey and pretzels or juice and mixed nuts as visual stimuli. Multiple errors may later occur if the individual is asked to also visually tact “bacon” or “caramel corn” following this training. Instead, cooking skills may be acquired more rapidly if gustatory tact skills are taught either directly or in combination with visual tacts. For example, a tact for bitter food (e.g., cranberries) may be paired with tasting a cranberry. Multiple exemplar training can also be used to teach other bitter items along with their corresponding pictures (e.g., coffee, endive, brussels sprouts, or citrus fruits). These items may be presented one after the other to promote generalization to new bitter items. In sum, initial tact training should focus on teaching familiar three-dimensional objects (e.g., animals, fruits, objects, toys) followed by tacting two-dimensional representations of those objects. Subsequent targets may include other tact modalities such as auditory, tactile, or gustatory. In addition, tacts related to size, color, shape, positions, and actions can be introduced to the learner’s curriculum once these more rudimentary tacts are established.

Echoics

If a learner has a strong echoic repertoire, their verbal behavior can be greatly expanded. Echoic responses can range from simple vocalizations such as “bah” or “ma” to more sophisticated verbal behavior such as “I want water” or “Jack and Jill went up the hill to fetch a pail of water.” A common goal for learners with ASD is to develop a generalized repertoire of echoic responding, meaning no direct reinforcement is required for the emission of new echoics. A procedure that is commonly used to increase or establish early vocalizations is shaping (Cividini-Motta et al., 2017). In this procedure, successively closer approximations to the target vocal sound are differentially reinforced with highly preferred stimuli. Although this procedure can be effective in shaping complex vocalizations, it can also be a slow process. Other procedures with some demonstrated efficacy to establish echoic responding include stimulus-stimulus pairing (SSP; Shillingsburg et al., 2015), response-contingent pairing (RSP; Lepper & Petursdottir, 2017), operant discrimination training (ODT; Lepper et al., 2013), and the mand-model procedure (MM; Cividini-Motta et al., 2017).

Stimulus Pairing

The SSP procedure involves pairing vocalizations with presentation of a reinforcer. This procedure employs the principles of classical or respondent conditioning (see Chap. 42) to increase the conditioned reinforcing value of speech sounds. For example, a parent pairs a neutral vocal stimulus “ma” with a highly preferred item (e.g., food, tickling) without requiring vocalizations from the child. Following multiple repetitions, the parent’s vocalization may acquire reinforcing properties and become a conditioned stimulus. The goal of SSP is to expand the individual’s verbal repertoire so that it can be brought under appropriate stimulus control (Shillingsburg et al., 2015). The empirical support for this procedure remains mixed to date, and therefore recommendations for its implementation are difficult to make at this time. However, a review of the literature on SSP indicates the procedure can be most effective for children 5 years or younger, when the delivery of a preferred item overlaps with presentation of the target sound, and when delivery of items is withheld if the learner emits a sound immediately following presentation of the vocal model (Shillingsburg et al., 2015).

Stimulus-stimulus pairing has most often been implemented using a response-independent procedure, meaning the caregiver’s vocalization and the reinforcer are presented in close temporal proximity in the absence of an overt response from the learner. In contrast, response-contingent pairing (RSP) requires the caregiver to present known reinforcers contingent on the learner’s speech sounds (Lepper & Petursdottir, 2017). RSP, however, is more effective when vocalizations are initiated by the learner. In other words, the response-independent procedure is recommended for individuals with a limited or absent verbal repertoire, whereas RSP may be the primary option for individuals who have at least a minimal vocal repertoire.

ODT and NET

In contrast to stimulus pairing, operant discrimination training consists of delivery of a reinforcer contingent on a response from the learner. This procedure may be further enhanced by including an S-delta (SΔ), in the presence of which extinction is in effect (Esch et al., 2009). For example, the child’s echoic response “milk” is reinforced, whereas an echoic response of “water” is no longer reinforced. As described above, NET procedures may also be used to expand echoic repertoires. An example of this might be a teacher emitting a sound that corresponds to play items (e.g., saying “weee” as a figurine goes down a play slide) and then prompting the learner to do the same, or asking the learner to repeat relevant sounds and phrases that correspond to a book they are looking at together.

A teaching strategy known as the mand-model procedure is similar to NET in that vocal prompts are introduced to learners when they fail to produce a correct mand response to access preferred items. Importantly, the term mand as it is used here does not refer to the verbal operant defined above. Instead, a mand in this context refers to the verbal behavior of the teacher, as in a request for the child to emit a vocalization. Nevertheless, the procedure can be similar to mand training, with the arrangement or presentation of preferred toys to entice the learner. Once the learner shows interest, the teacher requests a specific response from the learner (e.g., “Tell me what you want” or “Tell me what this is”). If the response is below the expected criterion level, the teacher provides a model (e.g., “It’s a lightsaber”) and/or requests the expected response (e.g., “Tell me using a sentence”), with the goal of shaping more complex responses.

Empirical evidence across the procedures used to establish echoics is mixed, indicating that their effectiveness will vary across learners. Additional research is needed to determine which procedure(s) will be most effective depending on the learner’s established repertoire. A recent study designed an assessment procedure to determine the ideal teaching conditions to teach echoic responding to learners with ASD (Cividini-Motta et al., 2017). Results of this study are promising, as the assessment was successful in identifying a teaching procedure for five of the six participants. However, additional research is needed to determine optimal teaching conditions that will lead to a strong echoic repertoire when it is not already present for learners with ASD.

Intraverbals

As stated earlier, the intraverbal represents a wide range of verbal responses. Intraverbals range from simple chains of verbal stimuli, such as filling in the words to a song, to more complex responses such as answering questions and stating members of categories. Stated differently, elementary intraverbal responses require a repertoire in simple discrimination skills, whereas complex intraverbals are under multiple and conditional discrimination control (Stauch et al., 2017; Sundberg, 2016). Take for example a child saying, “You’re welcome” after somebody says, “Thank you” or saying “Go” in the presence of “Ready, set…”. In these examples each verbal response is under the control of a single verbal stimulus. In contrast, responding to more elaborate questions such as “What are some green animals?” or “What color is an apple?” requires the learner to have an advanced repertoire in conditional discriminations. In the former case, the speaker is under the control of two parts of the utterance, “green” and “animal.” A correct response, such as saying “lizard” or “parrot,” would indicate a strong repertoire in conditional discrimination, and an incorrect response would indicate a lack thereof. In conditional discriminations, a correct response is dependent on knowing that “green” is the conditional stimulus and “animal” is the SD.
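The difference between control by one part of an utterance and joint control by two parts can be sketched in a few lines of Python. This is only an illustration of the logic, not a model of how learners acquire the skill. The item list, the feature labels, and the function name `responses_under_control` are hypothetical and were chosen for this example; only “lizard,” “parrot,” “green,” and “animal” come from the text above.

```python
# Hypothetical items, each tagged with the features a question might name.
ITEMS = {
    "lizard": {"green", "animal"},
    "parrot": {"green", "animal"},
    "lettuce": {"green", "vegetable"},
    "zebra": {"striped", "animal"},
}


def responses_under_control(*stimuli: str) -> list[str]:
    """Return the items that match every stimulus named in the question."""
    required = set(stimuli)
    # An item is a correct response only if it has all of the required features.
    return sorted(name for name, features in ITEMS.items() if required <= features)


# Control by "animal" alone admits responses that are not green.
print(responses_under_control("animal"))
# Joint control by "green" and "animal" narrows the correct responses.
print(responses_under_control("green", "animal"))
```

A learner responding only to “animal” might say “zebra,” and a learner responding only to “green” might say “lettuce.” Only responding to both parts of the question yields the answers the text describes as correct.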

Multiple intervention procedures have been used to teach intraverbal responses to individuals with ASD. These include some of the procedures that have been described above for teaching mands and tacts (e.g., NET, ToSC, MET), as well as peer-mediated strategies (McClannahan & Krantz, 2005), instructive feedback (IF; Albarran & Sandbank, 2019), and derived relational responding (Rehfeldt & Barnes-Holmes, 2009). In this section we will focus exclusively on ToSC, IF and MET.

Transfer of Stimulus Control

As described earlier, ToSC can be used to transfer control between verbal operants. In this case, one verbal operant can effectively serve as a prompt (e.g., tact, echoic, textual) that is systematically faded until the controlling verbal stimulus alone results in the targeted response. For example, a therapist may add an echoic prompt, “say, bed” when asking the learner “where do you sleep?” if the learner has a strong echoic repertoire. Over subsequent training trials the echoic “bed” is eventually removed by adding a time delay to the presentation of the vocal model (Finkel & Williams, 2001). A similar procedure can be used with textual or pictorial prompts. When a textual prompt is delivered, the therapist provides a written script as a model and subsequently requires the participant to respond to the written text. The script can then be faded using a backward chaining procedure (McClannahan & Krantz, 2005). To illustrate, a therapist first presents the full textual prompt “I am fine” as an answer to the question “How are you?” The therapist then presents the script without the last word (“I am”), then without the last two words (“I”), and continues until the response occurs independently under the control of the verbal stimulus. Pictorial prompts follow a similar teaching strategy.
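The sequence of textual prompts in the script-fading illustration can be generated mechanically. The following is a minimal sketch in Python that assumes one word is removed from the end of the script at each step. The function name `fade_script` is hypothetical, and in practice the pace of fading would depend on the learner’s performance rather than on a fixed schedule.

```python
def fade_script(script: str) -> list[str]:
    """Return the textual prompt shown at each fading step.

    The first entry is the full script and the last entry is an empty
    string, meaning no textual prompt is shown at all.
    """
    words = script.split()
    # Drop one more word from the end of the script at each step.
    return [" ".join(words[:n]) for n in range(len(words), -1, -1)]


for step, prompt in enumerate(fade_script("I am fine")):
    shown = prompt if prompt else "(no textual prompt)"
    print(f"Step {step}: {shown}")
```

For the script “I am fine,” the steps are the full script, “I am,” “I,” and finally no textual prompt, which matches the order described above.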

Empirical evidence provides support for the notion that prompt effectiveness will depend on each individual’s learning history (Coon & Miguel, 2012). Briefly, Finkel and Williams (2001) reported that textual prompts were more effective than echoic prompts in establishing intraverbals in an individual with ASD. A follow-up study showed that both prompt procedures were effective, though textual prompts resulted in quicker acquisition than echoic prompts (Vedora et al., 2009). Similarly, Vedora and Conant (2015) found tact, textual, and echoic prompts were equally effective in establishing intraverbal responding in young adults with ASD. Finally, Coon and Miguel (2012) systematically tested the role of previous history with various prompt types. All participants in this study required fewer trials to learn novel intraverbals with the prompt type they had the most recent exposure to. Collectively, results of these studies indicate that language intervention programs should take the learner’s history with prompt types into consideration in order to be effective and efficient.

MET and Instructive Feedback

Two additional teaching procedures with empirical support to teach intraverbal responding are MET and instructive feedback (IF). Multiple-exemplar training has been used as a strategy for teaching both direct and derived intraverbals (Raaymakers et al., 2019). Direct intraverbals can be established through ToSC (as described above) as well as MET procedures. For example, when teaching responses to wh-questions to an individual with ASD, the teacher may present multiple questions (e.g., “What’s your name?” “What is something you eat?” “What is something you wear?”). Upon multiple exposures to the same type of wh-questions, the learner may be able to answer similar questions in the absence of direct reinforcement.

Multiple-exemplar training can also be incorporated to establish derived intraverbals. Prior research has shown that emergence of novel intraverbals is more likely once the learner has an established tact and listener repertoire (e.g., Shillingsburg & Frampton, 2019). Multiple receptive language skills such as identifying objects by name, feature, function, or following simple instructions should be targeted before probing for derived intraverbals. For example, a learner may first be taught to visually discriminate between their favorite food items (e.g., pizza, chicken, apple) using MET. Once this skill is well established, the teacher probes for derived intraverbals (e.g., “What is your favorite food?”). Similarly, tacts and mands can serve to establish derived intraverbals. Suppose we teach a learner to tact multiple food items by responding to the question “What are some fruits?” Following this training, the teacher can probe for related intraverbal questions such as “What is your favorite fruit?” or “What is your favorite fast food?” Although empirical evidence is still mixed, research supports teaching speaker repertoires such as tacts, or a combination of listener and tact repertoires, to help establish derived intraverbals (Raaymakers et al., 2019).

Instructive feedback refers to extra, non-target information that is presented during the consequence of a learning trial (Albarran & Sandbank, 2019). A response to the IF stimulus is not required, and a consequence is not provided regardless of how the learner responds to the additional information presented. For example, a teacher presents the antecedent verbal stimulus “Tell me three fruits” and waits for a response. If the learner says, “Apple, banana, pear” the teacher provides a generalized reinforcer (“That is correct!”) followed by the instructive feedback “Mango, orange, and plum are also fruits.” In this example, “Apple, banana, pear” are the primary targets, whereas “Mango, orange, and plum” are the secondary targets that are presented immediately following the vocal reinforcement. In this case, presenting untargeted items immediately after delivery of reinforcement for the targeted items may increase the future occurrence of responses involving the non-targeted stimuli. Evidence for the effectiveness of IF has been demonstrated in several studies to date (Delmolino et al., 2013; Vladescu & Kodak, 2013). Importantly, Haq et al. (2017) also examined learner characteristics that may impact the effectiveness of IF procedures. The researchers identified echoic behavior and attending skills as potential prerequisite skills needed for the acquisition of untargeted stimuli presented during IF. That is, similar to research on assessments for mands, tacts, and echoics, results of this study provide additional support for the use of individualized assessments to help develop effective programs for teaching verbal behavior.

Summary

The goal of this chapter was to introduce a behavioral account of language and interventions informed by Skinner’s analysis of verbal behavior. Although seminal work in applied behavior analysis did not explicitly incorporate this analysis into treatment programming for learners with ASD, the field has moved largely in this direction over the last two decades (DeSouza et al., 2017; Sundberg & Michael, 2001). Based on the research and clinical applications conducted to date in this area, a few general recommendations are provided here.

First, goals for language intervention should be informed by the results of a behavioral assessment specifically designed to highlight each individual learner’s skills and deficits (e.g., ABLLS-R, VB-MAPP, PEAK). The teacher or practitioner should work in collaboration with caregivers to select appropriate goals based on the learner’s profile. It is important to take into consideration the family’s culture and values during the goal selection process, including whether multiple languages are spoken in the home. Although little research specifically related to teaching verbal operants in a bilingual or multilingual context has been conducted to date, some recent studies have demonstrated such instruction is viable and can be effective (Leon & Rosales, 2018; Thordardottir, 2010).

Second, given the strong correlation between aberrant behavior and lack of functional communication, it is imperative to establish a mand repertoire early in training. Learners can be taught to mand in various modalities (e.g., sign, picture exchange, speech-generating device) in addition to vocalizations. If a modality other than vocalization is initially selected, parents may express concern that their child may never learn to speak. Practitioners should acknowledge this concern, but also share evidence with parents that the use of AACs can lead to increased vocalization in learners with ASD (e.g., Carbone et al., 2010; Gevarter & Horan, 2019; Greenberg et al., 2014). In addition, a disadvantage of focusing exclusively on vocalization is that the learner will likely show slow progress, especially if the delay is significant (i.e., no sounds or words are initially present in the repertoire).

Third, there is a large body of evidence to support the use of a variety of teaching procedures to establish rudimentary mand, tact, echoic, and intraverbal repertoires in learners with ASD. Overall, and perhaps not surprisingly given the variability in how language deficits are manifested in this population, these studies have yielded largely idiosyncratic results in terms of the effectiveness of specific procedures. To address this possible barrier to successful programming, several recent studies have evaluated the use of mini-assessments to determine the ideal mand modality (Valentino et al., 2019), echoic teaching strategy (Cividini-Motta et al., 2017), and error correction procedures (Carroll et al., 2015) for learners with ASD. Although these additional assessments will take time to complete, they will also likely help to maximize the instructional time available in various settings. Future research is warranted in this area.

In closing, it is important to note that although this chapter focused primarily on establishing elementary verbal operants, this in no way represents the wide range of applications that have been informed by Skinner’s analysis of verbal behavior. Recent examples of studies that have targeted advanced language skills include teaching children to respond to disguised mands (mands where neither the reinforcer nor the MO is evident; Najdowski et al., 2017) and to emit extended tact responses (e.g., correctly tacting emotions when given metaphors; Dixon et al., 2017). There are also recent extensions of Skinner’s work (Stewart et al., 2013) that go beyond the focus of this chapter. Interventions designed to promote generativity of verbal behavior, including emergent verbal operants, will maximize the impact of a verbal behavior approach to language interventions for learners with ASD.