Children with autism and other developmental disabilities often have limited functional language to request access to preferred items (Filipek et al., 1999). Skinner (1957) used the term “mand” to refer to this type of verbal behavior because the response is evoked by an establishing operation and maintained by access to the specified reinforcer. In the absence of functional mands, behaviors such as tantrums, aggression, or self-injury may be more likely to develop because these behaviors might produce access to the preferred items or events (Hagopian, Wilson, & Wilder, 2001). Thus, it has been recommended that mands be targeted early in behavioral treatment (Sundberg & Michael, 2001).

Although vocal language is the mand form most likely to remain socially acceptable over the long term, training initial mands using alternative modalities (e.g., pictures, sign) may be necessary. Although a variety of high-tech augmentative communication systems are available (e.g., speech-generating devices), clinicians may choose low-tech modalities such as sign language and picture systems before accessing more expensive options. Mands have been successfully taught to children with autism spectrum disorders (ASD) and intellectual disabilities (ID) using different response modalities, including vocalizations, manual sign, and exchange-based communication systems (Carbone, Sweeney-Kerwin, Attanasio, & Kasper, 2010; Tincani, 2004). The decision about mand modality is often made by individuals involved in treatment planning; however, there is limited research guiding this decision and the decision-making process is often “more of an art than a science” (Mirenda, 2003, p. 212).

There are theoretical and pragmatic arguments for the advantages and disadvantages of each modality (Sundberg, 1993; Tincani, 2004; Wraikat, Sundberg, & Michael, 1991). Some authors (Michael, 1985; Sigafoos, 1998; Sundberg & Michael, 2001; Sundberg & Partington, 1998) suggest sign language is easier to acquire than selection-based systems because signing involves a one-to-one correspondence between a single stimulus (i.e., motivation for an item) and a single response (i.e., unique sign for the item), whereas selection-based systems require multiple stimuli (i.e., motivation for the item, picture of the item) and a response that has the same topography (i.e., hand a picture card) for every item. In sign language, the stimulus and the response sometimes resemble each other, providing a built-in prompt (Sundberg & Partington, 1998). Sundberg (1993) notes that one of the advantages of sign language is that it is free from environmental support, making it portable like spoken language. In addition, there is a natural verbal community (i.e., hearing impaired community) that uses sign language, so materials and trainers are available (Sundberg, 1993). Sundberg cautions, however, that listeners must have special training in order to respond to signed mands. In addition, the trainer must shape individual responses, whereas picture-based systems can result in generalized card exchanges once the identity matching repertoire is intact (Sundberg, 1993).

Advantages to selection-based systems include minimal physical effort required (Mirenda, 2003), no requirement of the speaker to imitate different motor responses, and speed with which the system can be taught (Charlop-Christy, Carpenter, Le, LeBlanc, & Kellet, 2002). Another advantage is that selection systems do not require the listener to be familiar with an additional language such as sign (Bondy, 2001; Charlop-Christy, Carpenter, Le, LeBlanc, & Kellet, 2002), thus expanding the number of people with whom the child can speak. Disadvantages to selection-based systems include the environmental support requirement (i.e., the picture book with pictures), the potential negative impact of the additional time required to emit the response on motivation (Bristow & Fristoe, 1984; Sundberg, 1993), and the fact that as words become more abstract (e.g., adjectives, verbs, carrier phrases, ideas), it becomes more difficult to depict the words in picture form. Despite these theoretical and pragmatic arguments for each, there is limited empirical evidence to guide clinical practice. Ideally, research would guide clinicians to make decisions based on learner characteristics known to be associated with success or failure with any given modality.

A few studies have directly compared the effectiveness of different modalities such as the Picture Exchange Communication System (PECS) and sign language (Adkins & Axelrod, 2001; Chambers & Rehfeldt, 2003; Tincani, 2004), with varying results. Adkins and Axelrod (2001) and Chambers and Rehfeldt (2003) found selection-based communication to be more effective. However, Tincani (2004) found the usefulness of either modality varied across two participants without a clear indication of the relevant characteristics of each participant that might predict effectiveness. Tincani hypothesized that acquisition of the systems may be related to prerequisite skills, such as motor imitation. Thus, one option is unlikely to prove optimal for all children. Practitioners would benefit from a quick and effective means to predict which modality will produce the most rapid acquisition for a given child rather than using one option for all children.

Researchers have begun to examine whether the presence of certain prerequisite skills applicable to the response modality would predict subsequent rates of acquisition or independence of mands (Bourret, Vollmer, & Rapp, 2004; Gregory, DeLeon, & Richman, 2009). Of relevance to the current study, Gregory, DeLeon, and Richman (2009) assessed matching skills as a prerequisite to picture exchange and motor imitation as a prerequisite to sign language. They used the assessment with six children with intellectual disability (ID) and trained mands using a selection-based communication response (picture exchange) and a manual sign response. When both repertoires were intact on the assessment, either response modality was mastered and the picture exchange response was mastered more quickly. When neither repertoire was strong, sign was never mastered and only one of three participants mastered the selection-based system. However, Gregory et al. did not include an assessment of vocal responding or attempt to train a mand using the vocal modality.

The current study developed a brief assessment of participants’ prerequisite skills for three common response modalities (i.e., vocal, sign language, picture exchange) to determine if performance on the skills assessment predicts the rate of acquisition during mand training in each response modality. This study extends the Gregory et al. study by including assessment of echoic skills and subsequently testing the acquisition in the vocal mand modality.

Method

Participants

Thirteen children who did not yet engage in one-word functional mands participated in the study. Participants ranged in age from 2 years, 2 months to 8 years, 3 months (mean = 3 years, 4 months) and all had a diagnosis of either developmental disability (DD) or autism spectrum disorder (ASD). Participants consisted of one female and twelve males. All participants received Applied Behavior Analysis (ABA) services in a center-based program or in a home program. Ten of the thirteen participants were new to ABA services. No mand training had been attempted with any modality with these ten participants. The remaining three participants (Earl, Kameron, and Victor) were older than the other ten (average age 6.5 years) and had received ABA services prior to the start of the study. During Earl, Kameron, and Victor’s prior treatment, vocal mand training had been attempted without success. Participants qualified to participate in the study if acquisition of one-word mands was a primary goal in their language programming and they had not yet mastered mands in any mand modality. For complete participant demographics, see Table 1.

Table 1 Participant demographics

Setting and Materials

The centers contained child-sized chairs, tables, preschool aged toys, and other age appropriate stimuli typically found in educational environments. Sessions in the home were held in the location of the house where ABA sessions were conducted (e.g., living room, bedroom) and contained furniture and standard home décor. All study procedures were integrated into the participant’s existing clinical programming. Materials consisted of eight food or leisure items per participant as identified by a paired stimulus preference assessment, ten sets of identical pictures of plastic fruits and vegetables, ten 3D plastic fruits and vegetables, a picture of each of the three preferred items assigned to mand modalities, data sheets, timers, and pens. All pictures were laminated and 3 in. × 3 in. in size.

Measurement

The primary dependent variable was correct responses, summarized as a percentage. In the prerequisite assessments, correct responses were defined as the participant emitting the specified response per condition (i.e., motor movement, match to sample, vocal sound/word) within 3 s of presentation of the discriminative stimulus (SD). In the mand training sessions, correct responses were defined as the participant emitting the specified response (i.e., a sign, picture exchange, vocal word) within 3 s of presentation of the item. For all modalities, acceptable responses were defined prior to the start of the study (e.g., the correct sign for “candy” consisted of one pointer finger touching any part of the cheek, below the eye and above the chin). For the picture exchange, the participant was required to extend his hand and release the picture. For vocal responses, the participant was required to emit the full vocal word.

Interobserver agreement (IOA) on response accuracy was assessed by having a second observer collect data on all participant responses for an average of 47% of sessions across participants. IOA data were collected for each participant across all conditions. Agreement was calculated by dividing the number of agreements by the number of agreements plus disagreements and multiplying by 100. Mean IOA for all participants was 94% (range 84–100%) (see Table 2).
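The agreement formula described above can be illustrated with a minimal sketch. It assumes a trial-by-trial comparison in which each observer scores every trial as correct or incorrect; the function name and the sample records are hypothetical and are not study data.

```python
def interobserver_agreement(primary, secondary):
    """Percentage agreement: agreements / (agreements + disagreements) * 100."""
    if len(primary) != len(secondary):
        raise ValueError("Both observers must score the same trials")
    if not primary:
        raise ValueError("At least one trial is required")
    # A trial counts as an agreement when both observers recorded the same score.
    agreements = sum(1 for a, b in zip(primary, secondary) if a == b)
    disagreements = len(primary) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical five-trial session (True = scored as a correct response).
observer_1 = [True, False, True, True, False]
observer_2 = [True, False, True, False, False]
ioa = interobserver_agreement(observer_1, observer_2)
```

In this illustration the observers agree on four of five trials, so the session IOA is 80%.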

Table 2 Indices of the quality of measurement and procedural implementation for each participant

During all conditions, an observer scored implementation of the procedure against a procedural integrity (PI) checklist for some sessions. Each trial was scored for core components of correct implementation of the corresponding condition. The score was calculated by dividing the number of components implemented correctly by the total number of components in a session and multiplying by 100. Items on the procedural integrity checklist included, for example, accurate use of the prompt hierarchy and correct number of seconds between prompts. Complete procedural integrity checklists for each phase are available from the authors upon request. Data were summarized as percent correct implementation, with each component as a unique contributor to that percent, and averaged for each participant. Mean procedural integrity for all participants was 99% (range 95–100%) and was collected during an average of 47% of sessions for all participants across all conditions (see Table 2).
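The procedural integrity calculation can be sketched in the same way. The example below assumes each checklist component is recorded as implemented correctly or not, and that a participant's value is the unweighted mean of session percentages; the function names and checklist records are hypothetical.

```python
def procedural_integrity(component_scores):
    """Percentage of checklist components implemented correctly in one session."""
    total = len(component_scores)
    if total == 0:
        raise ValueError("A session must contain at least one scored component")
    # sum() counts the True entries, i.e., correctly implemented components.
    return 100 * sum(component_scores) / total


def participant_mean(session_percentages):
    """Unweighted mean of session-level integrity percentages."""
    if not session_percentages:
        raise ValueError("At least one session is required")
    return sum(session_percentages) / len(session_percentages)


# Hypothetical checklists for two sessions with 20 scored components each.
session_a = [True] * 19 + [False]
session_b = [True] * 20
pi_a = procedural_integrity(session_a)
pi_b = procedural_integrity(session_b)
pi_mean = participant_mean([pi_a, pi_b])
```

Here one missed component out of 20 yields 95% for the first session, and the two sessions average 97.5%.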

Design

An alternating treatments design was used to assess mand acquisition. For each participant, one preferred item (e.g., cookie, specific toy car) was assigned to each modality (i.e., sign language, picture exchange, or vocal) and remained in that modality until either the mastery or termination criterion was met. In addition to visual inspection, Pearson product moment correlation coefficients (i.e., Pearson r) were calculated between the accuracy score on each prerequisite assessment and the subsequent accuracy during acquisition in the corresponding modality.

Procedure

General Procedures

Sessions were conducted between 2 and 5 days per week and the number of sessions varied from 1 to 3 depending upon the child’s schedule and motivation for preferred items. The order of procedures was paired stimulus preference assessment, 1-min timed observation with each of three preferred items, prerequisite assessments, and mand training. Most participants also completed a “best alone” phase, wherein items that had been assigned to ineffective mand training conditions were re-assigned to the most effective condition, to replicate the effects of the effective condition and rule out stimulus-specific confounds.

Preference Assessment

A paired stimulus preference assessment (SPA; Fisher et al., 1992) was conducted for each participant at the start of the study. The eight stimuli included in the preference assessment were identified via parent and team member report. An overall category of stimuli (i.e., food, leisure items) to be targeted in the study was selected for each child based on caregiver report. Several items from that category were included, every item was presented with each other item once, and side placement was counterbalanced. The purpose of this preference assessment was to identify (1) moderately preferred items from the same stimulus category to be delivered contingent on compliance and responding during the prerequisite assessments and (2) three relatively equally highly preferred items to be used during mand training (i.e., one item per modality).
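The pairing logic of a paired stimulus assessment can be sketched as follows. This is an illustration only: the item labels are placeholders, and the greedy left/right assignment is one assumed way to approximate counterbalanced side placement, not necessarily the procedure the experimenters used.

```python
import random
from itertools import combinations


def paired_stimulus_trials(items, seed=0):
    """Return (left, right) trials in which every item meets every other item once."""
    rng = random.Random(seed)
    pairs = list(combinations(items, 2))  # each unordered pair appears exactly once
    rng.shuffle(pairs)                    # randomize presentation order
    left_counts = {item: 0 for item in items}
    trials = []
    for a, b in pairs:
        # Place on the left whichever item has appeared there less often so far.
        if left_counts[a] > left_counts[b]:
            a, b = b, a
        left_counts[a] += 1
        trials.append((a, b))
    return trials


# Eight placeholder stimuli, matching the number of items per participant.
items = [f"item_{i}" for i in range(1, 9)]
trials = paired_stimulus_trials(items)
```

With eight stimuli this produces 28 trials, and each stimulus appears in seven of them.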

Timed Observation

Pictures of the three relatively equally highly preferred items were printed and laminated and a single 1-min timed observation was conducted for each item with the item and the picture present. To begin, the preferred item was shown to the participant and the picture was available on the table or on the floor in front of him. The experimenter did not present instructions and recorded any responses that were emitted. The purpose of this observation was to ensure there were no pre-existing mand approximations in any of the modalities (i.e., vocal, sign, picture exchange) prior to assigning the preferred items to a mand training modality. Mand approximations included any part of the response in any modality (e.g., half of the sign, picking up the picture, emitting any part of the word). If participants emitted a mand approximation of the item presented, that item was removed from the study, the next preferred item was included instead, and a new timed observation was conducted with the new item.

Prerequisite Assessments

The order of presentation of each prerequisite assessment was determined by a random draw for each participant. Each assessment contained 20 targets. Low, moderate, and high preferred items were identified based on selections during the SPA. Praise and moderately preferred items were delivered every three to five trials for compliance. Correct independent responses were reinforced with moderately preferred items (i.e., not the items targeted for mand training). The motor imitation and matching assessments used a least-to-most prompt hierarchy, and the vocal assessment consisted of presentation of the vocal model three times to equate the maximum number of potential prompts. Two to four seconds elapsed between the initial instruction and each subsequent prompt to allow an opportunity to respond. Prerequisite assessments lasted approximately 10 min each.

Motor Imitation Assessment

One trial each of ten gross motor movements and ten fine motor movements was conducted. Gross motor trials always preceded fine motor trials. The experimenter presented the instruction “do this” while modeling the motor movement. If a correct response occurred within the allotted time, a moderately preferred item was provided. If the participant emitted an incorrect or no response, the instruction “do this” was presented again and an immediate partial physical prompt was provided. Correct responding resulted in a neutral statement (e.g., “you’re trying hard”) and presentation of the next trial. Incorrect or no responses resulted in a final presentation of the instruction “do this” with a full physical prompt and presentation of the next trial.

Identity Matching Assessment

The matching assessment targets consisted of ten 3D-2D matching trials and ten 2D-2D matching trials; one trial was conducted for each of these 20 targets. The 3D-2D matching trials always preceded 2D-2D matching trials. A trial consisted of the following steps. First, the experimenter presented the three stimuli on the table or on the floor directly in front of the participant. Two of the stimuli in each array were non-matched items from other trials in the matching prerequisite assessment and one stimulus matched the sample stimulus. Every three trials, the location of all stimuli was changed, either by moving them to a new location in the array or by replacing them with an entirely new picture and moving the new picture to a new location in the array. Next, the experimenter presented the instruction “match” while giving the stimulus to the participant. If a correct response occurred within the allotted time, a moderately preferred item was provided. If the participant emitted an incorrect or no response, the instruction “match” was presented again and an immediate gestural prompt was provided. Correct responding resulted in a neutral statement and presentation of the next trial. Incorrect or no responses resulted in a final presentation of the instruction “match” with a full physical prompt and presentation of the next trial.

Vocal Imitation Assessment

The vocal imitation targets consisted of ten one-syllable sounds and ten two-syllable words. One-syllable sounds always preceded two-syllable words. These sounds and words were obtained from the Early Echoic Skills Assessment (EESA), a part of the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP™; Sundberg, 2008), and one trial was conducted for each of these 20 targets. The experimenter presented the targeted sound (e.g., “ahh”). If a correct response occurred within the allotted time, a moderately preferred item was provided. If the participant emitted an incorrect or no response, the targeted sound was presented a second time. Correct responding resulted only in a neutral statement (e.g., “you’re trying hard”) and presentation of the next trial. Incorrect or no responses resulted in a final presentation of the targeted sound and presentation of the next trial. Only exact matches were accepted as correct responses.

Mand Training

The three previously identified highly preferred items were randomly assigned to one of three mand training conditions: sign language, picture exchange, or vocal response. Once an item was assigned to a mand training condition, it remained in that condition until the mastery or termination criteria were met. The mastery criterion was three consecutive sessions with at least 80% correct independent responding. The termination criterion (i.e., failure criterion) was six sessions past the point of mastery of the first modality with no increasing trend in acquisition. Each mand training session consisted of five trials and one session for each modality was conducted in each training block (e.g., five trials of the sign language condition, five trials of the vocal condition, five trials of the picture condition). The order of the five-trial mand training sessions was randomized after each block of sessions. Each five-trial block began with a rule that described the contingency (i.e., “If you want the ___ you have to make the sign,” “if you want the ___ you have to give me the picture,” “if you want the ___, you have to say the word.”). The rule was provided only once at the beginning of each five-trial block. A least-to-most prompting hierarchy was used in the sign and picture conditions and the item was provided contingent upon the target response at any prompt level (i.e., the child received the preferred item during each trial). In the vocal condition, the prompts were provided and the participant received the item if he emitted a correct response (i.e., the child may not have received the item during each trial because the vocal response could not be manually prompted). Prompts for each condition are described below. In each mand condition, motivation was assessed by holding the preferred item in sight but out of reach. If the participant approached the item (gestured toward, grabbed for, or attempted to consume it), the experimenter proceeded with the trial. If the participant actively pushed the item away, or turned away from it, the experimenter briefly held the item down and attempted another trial later in the session.
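The mastery criterion lends itself to a simple check. The sketch below assumes session accuracy is expressed as the percentage of independent correct responses in a five-trial session; the function name, parameter names, and session values are hypothetical. The termination criterion is not encoded because it depends on visual judgment of trend.

```python
def sessions_to_mastery(session_percentages, threshold=80.0, run_length=3):
    """Return the 1-based session at which mastery is first met, or None."""
    streak = 0
    for index, pct in enumerate(session_percentages, start=1):
        # A session below the threshold resets the count of consecutive sessions.
        streak = streak + 1 if pct >= threshold else 0
        if streak == run_length:
            return index
    return None


# Hypothetical percent-correct values for consecutive five-trial sessions.
sign_sessions = [20, 60, 80, 100, 60, 80, 80, 100]
vocal_sessions = [0, 20, 20, 40]
sign_mastery = sessions_to_mastery(sign_sessions)
vocal_mastery = sessions_to_mastery(vocal_sessions)
```

In this illustration the sign series meets criterion at session 8, because the drop to 60% in session 5 resets the run, and the vocal series never meets criterion.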

Sign Language

After the rule statement, the preferred item was presented to the participant in sightline, but out of reach. The participant was allowed up to 4 s to respond. Correct responses resulted in access to the item and initiation of the next trial. Incorrect or no responses were followed by a model prompt. The model prompt consisted of the experimenter demonstrating the movement one time. If the participant responded correctly after the model prompt, the item was provided and the next trial initiated. If the participant responded incorrectly or did not emit a response after the model, a full physical prompt was provided. The full physical prompt consisted of the experimenter taking the participant’s hand and forming it into the sign. After the full physical prompt was provided, the item was also provided and the next trial was initiated.

Picture Exchange

After the rule statement, the preferred item was displayed to the participant in sightline, but out of reach. The participant was allowed up to 4 s to respond. Correct responses resulted in access to the item and initiation of the next trial. Incorrect or no responses were followed by a gestural prompt. The gestural prompt consisted of the experimenter moving her full hand toward the picture. If the participant responded correctly after the gestural prompt, the item was provided and the next trial initiated. If the participant responded incorrectly or did not emit a response after the gestural prompt, a full physical prompt was provided. The full physical prompt consisted of the experimenter taking the participant’s hand and moving it to fully pick up the picture and place it into the experimenter’s hand. After the full physical prompt was provided, the item was also provided and the next trial was initiated. This condition was not designed to replicate the popular PECS™ protocol (e.g., we did not introduce a second distracter card). The protocol was designed to replicate the Gregory et al. picture condition and to equate the sessions across the modalities. Participants were not required to emit discriminated mands in the other modalities, so introducing a distracter card in the picture modality would have resulted in different requirements for correct responding across conditions.

Vocal

After the rule statement, the preferred item was displayed to the participant in sightline but out of reach. The participant was allowed up to 4 s to respond. Correct responses resulted in access to the item and initiation of the next trial. Incorrect or no responses were followed by a partial vocal prompt. The partial vocal prompt consisted of emitting half of the word (e.g., “can” for the word “candy”). If the participant responded correctly after the partial vocal prompt, the item was provided and the next trial initiated. If the participant responded incorrectly or did not emit a response after the partial vocal prompt, a full vocal prompt was provided. After the full vocal prompt was provided, access to the item was given contingent upon correct responding. If the participant responded incorrectly or did not emit a response, access was not given and the next trial was initiated.

Best Alone

Once either the mastery criterion or the termination criterion was met for each condition, item(s) that were not mastered in their initially assigned condition were re-assigned to the mastered mand modality condition, one at a time. The same procedures and mastery criterion described above were followed.

Results

Table 3 provides an overview of the effectiveness (i.e., whether mand training was successful in that modality) and the relative efficiency (i.e., the number of sessions to mastery, if mastered) of each modality for each participant. Table 3 also shows the number of sessions required for mastery once a modality was deemed most effective and applied to the remaining unmastered mand(s). Strong scores are defined as 60% or above, moderate scores as 40 to 59%, and low scores as under 40%. Most participants (10) demonstrated strong matching skills in the prerequisite assessment. The remaining three participants demonstrated strong motor imitation skills. However, 11 of 13 participants displayed moderate to strong skills in both matching and motor imitation, with a slight advantage of one over the other. All participants acquired a mand repertoire in at least one modality. Mastery of the first modality required an average of 8.5 sessions (range, 3–25 sessions). Seven participants acquired mands in only one modality, five acquired mands in two modalities (sign and pictures), and one (Gabriel) acquired mands in all three modalities. When only one modality was acquired, the modality was pictures for six of the seven participants, while the other (Allie) acquired mands only in the sign modality. For the 12 participants for whom one or two modalities were effective, the most efficient modality (i.e., fewest sessions to criterion) was chosen, and the remaining item-specific mand(s) were trained in that modality. In every instance, participants mastered all mands in the best alone phase. The range of sessions for mastery of mands in the best modality was 3 to 19, with 10 participants mastering remaining mands in 4 or fewer sessions.
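The score bands defined above (strong, moderate, low) can be expressed as a small helper. This is a sketch only; the function name and the sample profile are hypothetical and do not correspond to any participant. Because each assessment contained 20 targets, scores fall on multiples of 5%, so no value lands between 59% and 60%.

```python
def classify_score(percent_correct):
    """Map a prerequisite assessment score (0-100) to the bands used in Table 3."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("Score must be a percentage between 0 and 100")
    if percent_correct >= 60:
        return "strong"
    if percent_correct >= 40:
        return "moderate"
    return "low"


# Hypothetical profile of prerequisite scores for one learner.
profile = {"motor imitation": 55, "matching": 70, "vocal imitation": 10}
bands = {skill: classify_score(score) for skill, score in profile.items()}
```

For this made-up profile, matching is classified as strong, motor imitation as moderate, and vocal imitation as low.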

Table 3 Prerequisite scores and sessions to criterion (STC). The highest prerequisite score and the predicted successful modality given that score are both printed in italics

Figure 1 shows three representative examples of results. The bar graph depicts the prerequisite assessment scores (i.e., percentage independent correct responses for motor imitation, echoic and matching skills). The line graph depicts the percentage correct responding in each of the three modalities during five-trial mand training sessions.

Fig. 1 Sample single participant data for Kirk, Gabriel, and Brad. Prerequisite assessment scores are shown in the first bar graph and five-trial mand training sessions in each of the three modalities are depicted in the second line graph. The star symbol indicates mastery of the first modality

Kirk’s scores (top panel) were moderate on the motor imitation and matching assessments and low on the echoic assessment. Kirk acquired the signed mand the fastest (i.e., four sessions), followed by the picture mand (i.e., nine sessions), with no acquisition in the vocal modality. The item previously assigned to the vocal modality was then taught in the sign modality and was mastered in four sessions. Jacob, Herman, Sam, Earl, and Eugene followed this same pattern of moderate to strong scores in matching and motor imitation, with subsequent mastery of mands in both modalities, low scores on the vocal assessment, and no mastery in the vocal modality. Moderate to high scores in only two of the assessments (i.e., matching, motor imitation) were predictive of acquisition in only those two modalities, but the assessment with the higher accuracy did not always correspond to the modality with the fastest acquisition (e.g., Jacob, Earl).

Gabriel (middle panel) displayed strong skills across all three assessments and mastered all three modalities quickly. He was the only participant with strong scores in the vocal assessment and the only one to acquire a mand in the vocal modality. He also acquired the mands more rapidly than any other participant across all modalities.

Brad (bottom panel) demonstrated strong scores in one assessment (i.e., matching) and subsequently acquired the mand only in one modality (i.e., pictures). When the preferred items previously assigned to the other modalities were taught as pictures, they were quickly acquired. In this instance, the prerequisite assessment scores were highly predictive of the modalities that would succeed versus those that would fail. Similarly, Victor and Axel had highly predictive assessment results with the strong assessment score predictive of a singularly effective modality. Kameron, however, had a much higher score in motor imitation than the other assessments and yet only acquired mands in the picture modality (i.e., not predictive).

Table 4 presents the Pearson product moment correlation coefficients (Pearson r) calculated for each assessment with the subsequent mand training results. The correlations examined the relation between the following variables: (1) percentage correct on the relevant prerequisite assessment (i.e., total score) and the overall percentage accurate and independent responding in mand sessions for the matched modality and (2) percentage correct on each ten-trial sub portion of the prerequisite assessment and overall percentage accurate and independent responding in mand sessions in the matched modality. The total score and each ten-trial component of the matching prerequisite assessment were weakly correlated with the overall accuracy in mand training sessions in the picture modality. The total score and each ten-trial component of the motor imitation prerequisite assessment were weakly correlated with the overall accuracy in mand training sessions in the sign modality, with the gross motor component approaching a moderate correlation. The total score and each ten-trial component of the vocal imitation prerequisite assessment were moderately to very strongly correlated with the overall accuracy in mand training sessions in the vocal modality. Excellent performance in the two-syllable vocal imitation assessment accurately predicted establishment of vocal mands.
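For readers who wish to reproduce this type of analysis, a minimal sketch of the Pearson r computation follows. It pairs each participant's prerequisite score with the same participant's overall mand training accuracy in the matched modality. The function name and the five data points are hypothetical and are not the values reported in Table 4.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("Need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when either variable is constant")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical data: one pair of values per participant.
assessment_scores = [10, 35, 50, 70, 90]   # percent correct on a prerequisite assessment
training_accuracy = [0, 40, 30, 80, 100]   # overall percent correct in mand training
r = pearson_r(assessment_scores, training_accuracy)
```

The guard against constant input matters with small samples: if every participant scored the same on an assessment, the coefficient cannot be computed. In Python 3.10 or later, `statistics.correlation` in the standard library computes the same coefficient.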

Table 4 Pearson product moment correlation coefficients (Pearson r), showing the relation between (1) the percentage correct on the relevant prerequisite assessment (i.e., total score) and the overall percentage accurate and independent responding in mand sessions for the matched modality and (2) percentage correct on each 10-trial sub portion of the prerequisite assessment and overall percentage accurate and independent responding in mand sessions in the relevant modality

Discussion

Mands have been successfully taught to children with autism spectrum disorders (ASD) and intellectual disabilities (ID) using each of the response modalities included in this study (Carbone et al., 2010; Tincani, 2004). Our findings replicate prior findings of general mand training effectiveness; each of the modalities proved successful for at least one child. The current study advances the literature on a data-based strategy for selection of response modality for mand training, increasing the empirical support for practitioners making this decision.

Gregory et al. (2009) assessed matching skills and motor imitation as prerequisites for mand training with children with ID; however, they did not include an assessment of vocal responding or attempt to train a mand using the vocal modality. They found that with both repertoires intact in the prerequisite assessment, either response modality was mastered and the picture exchange response was mastered more quickly. When neither repertoire was strong, sign was never mastered and only one of three mastered the selection-based system. The current study included an assessment of vocal imitation and vocal mand training. Our findings partially replicate the Gregory et al. finding of the picture exchange response being mastered more rapidly when both repertoires were intact (i.e., moderate to high accuracy in the prerequisite assessment) with nine participants showing this pattern. However, we did not replicate the Gregory et al. findings with three participants (Allie, Kirk, and Sam) because pictures were either not mastered as a modality or sign was mastered more quickly. We did not replicate the second finding because none of our participants had weak repertoires in both the matching and motor imitation assessment. Each child mastered mands in at least one modality and had at least one prerequisite assessment score that was moderate to high. This discrepancy in findings may suggest differences in our participants’ learning histories (i.e., the majority of participants in the current study were younger and did not have ABA prior to participation in the study).

The primary purpose of this study was to determine if a brief assessment of relevant prerequisite skills for three common response modalities (i.e., vocal, sign language, picture exchange) would effectively predict the rate of acquisition during mand training in each response modality. Only the ten-trial two-syllable vocal assessment was strongly predictive of the subsequent success of mand training, and the clearest conclusion can be drawn from it. If a child scores low on the vocal imitation assessment, vocal mand training is not likely to prove effective quickly, so one of the other modalities should be chosen. Our study included only one participant with strong scores on the echoic assessment. Thus, it is not possible to determine if it is valid to use the vocal assessment to predict success; that is, if a child earns strong scores on the vocal assessment, it is not certain he or she will perform well with the vocal modality. It is, however, possible to use the assessment to rule out the vocal modality, which can have important implications for intervention. We hypothesize that many clinicians may be inclined to choose the vocal modality for children who do not demonstrate the appropriate prerequisite skills to be successful with vocal mands. Choosing the vocal modality with this profile of learner may lead to problem behavior and a lack of success with mand training.

Matching and motor imitation scores were only weakly to moderately predictive of accuracy during mand training. That is, moderate to strong scores on the prerequisite assessment did not guarantee the success or failure of that modality. One participant (i.e., Kameron) had strong scores on one of the three assessments but then mastered mands in the non-predicted modality. All other participants with a single score that was much higher than the other scores acquired the mands in the predicted modality rapidly. Many children scored moderately in two of three assessments and then acquired mands in both of those modalities, but the most rapid acquisition (i.e., the efficient modality) was not always predicted by the assessment. Thus, these findings suggest that the prerequisite assessment is generally predictive of effectiveness but not necessarily predictive of efficiency of a mand modality. A child with moderate to strong scores on the matching and motor imitation assessments is likely to acquire mands in one or both of those two relevant modalities. However, it is not guaranteed that the higher of the two scores identifies the more probable modality for rapid success.

Since the prerequisite assessments were not equally predictive of success, they should not be used as the sole means of selecting the modality for mand training. The prerequisite assessment, which takes only 10 min on average to conduct, can confidently be used to rule out the vocal modality if the scores are low. Greater confidence can be gained about the choice between pictures and signs by implementing mand training with a single item for each condition as was done during the mand training phase of this study (i.e., the right panels in Fig. 1). This portion of the assessment was conducted relatively quickly with the average five-trial mand training sessions lasting 4.5–5 min in duration. Since the average child was able to meet the mastery criterion in about eight to nine sessions (range, 3–25 sessions), the direct comparison could allow a practitioner to choose the modality for teaching subsequent mands within approximately 15–125 min.
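The time estimate above follows from multiplying the reported session counts by the reported session durations. The short sketch below reproduces that arithmetic; the function and variable names are illustrative assumptions, not part of the study procedures, and the calculation assumes that sessions run back to back with no time between them.

```python
# Illustrative arithmetic only: names are hypothetical, values are those
# reported in the text (4.5-5 min per session; 3-25 sessions to mastery;
# about 8-9 sessions for the average child).

def comparison_minutes(sessions: int, minutes_per_session: float) -> float:
    """Total training time if sessions are run back to back."""
    return sessions * minutes_per_session

SESSION_MINUTES = (4.5, 5.0)   # reported range of session duration
SESSION_RANGE = (3, 25)        # reported range of sessions to mastery
TYPICAL_SESSIONS = (8, 9)      # reported sessions for the average child

# Shortest and longest plausible totals across the reported ranges.
fastest = comparison_minutes(SESSION_RANGE[0], SESSION_MINUTES[0])
slowest = comparison_minutes(SESSION_RANGE[1], SESSION_MINUTES[1])

# Totals for the average child.
typical = (
    comparison_minutes(TYPICAL_SESSIONS[0], SESSION_MINUTES[0]),
    comparison_minutes(TYPICAL_SESSIONS[1], SESSION_MINUTES[1]),
)

print(f"range: {fastest}-{slowest} min; typical: {typical[0]}-{typical[1]} min")
```

Under these assumptions the upper bound matches the reported 125 min (25 sessions at 5 min each). The reported lower bound of 15 min corresponds to three sessions at 5 min each, whereas three sessions at 4.5 min each would give 13.5 min. For the average child, the same arithmetic gives roughly 36–45 min.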

If multiple modalities are likely to prove equally effective (i.e., strong scores on multiple prerequisite assessments; rapid acquisition of the initial mands in all conditions), other important variables should impact the choice of modality (e.g., social validity, response effort for change agents, parental preferences, and willingness to implement procedures). Some research has begun to investigate these variables. For example, Torelli et al. (2015) conducted a concurrent operant mand preference assessment with one participant in three main mand modalities (picture exchange, iPad®, GoTalk®) and found that the participant’s highest preferred modality was the iPad®, which was also most preferred by his mother. This type of methodology could be used in conjunction with a parent-completed treatment acceptability assessment to select among relatively equally effective modalities. It is also possible that the quick prerequisite assessment could be repeated multiple times to identify when a new repertoire is intact and might render vocal mand training a viable option. However, this would need to be evaluated in future studies over a longer period of participation.

One limitation of the current study is the lack of ability to ensure a reinforceable response in each condition. The sign or picture exchange response could be prompted in those respective conditions, but the experimenter could not physically guide the vocal response. This means that the probability of emitting a reinforceable response and contacting reinforcement was lower in the vocal mand training condition than the other two conditions, which likely accounts for the less robust effects in that modality. However, this limitation of the procedure is the specific limitation of vocal mand training that leads to the need for other modalities in the first place. A possible solution to this issue would be to accept vocal approximations. However, we specifically chose to only accept full vocal responses as correct, because we wanted to identify a modality that would be effective for the child. We hypothesize that some practitioners may choose the vocal modality because a child can echo certain sounds within the word, but those approximations may not be easily interpreted by listeners. By only accepting full vocal responses as correct, we ensured the vocal modality was only demonstrated to be the modality of choice if it was a response that did not require additional shaping. A second limitation of the current study is that the item-specific differences in preference for the items or in the establishing operation during training could have impacted the comparison of the different mand training conditions (i.e., differential preference rather than the modalities account for differential effectiveness). However, the items were initially chosen to be relatively equally preferred and in the same stimulus class to minimize this likelihood. 
In addition, each time an item-specific mand was not acquired in the assigned modality, it was reassigned to the successful modality condition and was rapidly acquired, suggesting that the modality was the critical variable in differential effectiveness of mand acquisition. A third limitation is that the picture exchange condition did not mimic some popular picture exchange systems such as PECS™. While this could be seen as a limitation because participants were never required to emit mands with a distracter card present as is prescribed in these protocols, it is also a strength in that it ensured the conditions were equal. That is, participants in the sign and vocal conditions never had to emit mands under conditional stimulus control of different preferred items, so introducing a distracter card would have put the picture condition at a disadvantage relative to the other two. Future researchers may wish to include a conditional discrimination requirement to more closely reflect the ultimate mand repertoire that is commonly established via selection-based systems. Finally, the goal of mand training is for the learner to emit mands for a variety of preferred items. Future researchers should identify skills hypothesized to be necessary for success with discriminated manding and include them in a prerequisite assessment. Additionally, it is difficult to directly compare the matching prerequisite scores to the other two assessment scores because chance responding could have occurred. However, this limitation reflects the difference in the way the behaviors would occur in the natural environment (e.g., matching in an array vs. imitating).

Future researchers may wish to investigate the prerequisites that might predict success with other AAC systems such as speech-generating devices. We chose not to investigate those here because we sought to replicate and extend Gregory et al. (2009), which did not include them. Additionally, the majority of our participants were early learners and did not have access to these types of systems, so including basic modalities was consistent with what likely would have been done in early clinical practice (i.e., choosing a low-cost, low-tech alternative system first). There are most certainly interesting prerequisites that might predict success with other AAC systems, such as pointing, scanning, and orienting to the device.

The prerequisite assessments examined in this study provide some predictive utility for selecting a mand modality for training. This approach to using preliminary assessments to select intervention procedures is showing promise as a strategy to enhance the efficiency of behavioral treatment for children with autism (e.g., Kodak et al., 2015). However, the quick assessments are not perfectly predictive. It is preferable to consider the initial assessment as a tool to determine whether the vocal modality should be ruled out of the subsequent quick evaluation of viable modalities. Other variables such as caregiver preference can then be incorporated to select a modality if multiple modalities prove equally effective (i.e., a mand is mastered in each). As the literature on tailoring behavioral interventions for specific individuals evolves, the consideration of effectiveness, efficiency, and preference among equally effective strategies can guide clinical decision-making.