According to 2016 data, approximately 1 in 54 children in the United States are diagnosed with autism spectrum disorder (ASD; Centers for Disease Control, 2018). Individuals diagnosed with ASD are characterized by deficits in social communication and restricted and repetitive behaviors (American Psychological Association, 2015). Although researchers have identified numerous interventions that improve outcomes for children with ASD, early and intensive ABA programming has been the most advantageous, leading to positive effects in intellectual functioning, language development, acquisition of daily living skills, and social skills (Virués-Ortega, 2010).

Despite insurance mandates in almost all 50 states, ABA treatment and associated caregiver training remain inaccessible to many families due to financial barriers, limited qualified professionals in rural areas, and waiting lists across treatment providers (Siller et al., 2014; Irvin et al., 2012), thereby creating a service-need discrepancy (Nefdt et al., 2010; Wainer & Ingersoll, 2013). Researchers have emphasized that caregiver involvement is critical to language development and the long-term success of children with ASD. Therefore, identifying an effective method for training caregivers to implement interventions may circumvent the service-need discrepancy as well as promote generalization and maintenance for children (Barton & Fettig, 2013; Lang et al., 2009; McConachie & Diggle, 2007).

Communication skills may be an important child outcome to target via parent-implemented interventions given delays in communication among children with ASD and the cascading effects of this delay on other areas of functioning. The mand is a verbal operant under control of a motivating operation that allows individuals to communicate their wants and needs (Sundberg, 2007). Delays in the development of a vocal mand repertoire may lead to a myriad of behavioral deficits and excesses that impede successful communication and social interaction (Plavnick & Vitale, 2016). The benefits of mand training for children with ASD include a reduction in maladaptive behavior, an increase in social initiations, and an increase in spontaneous language (Carr & Durand, 1985; Charlop-Christy et al., 2002). The mand, therefore, is the most advantageous verbal operant for the speaker and should be prioritized in treatment (Sundberg & Michael, 2001).

In general, mand training based on Skinner’s analysis of verbal behavior comprises four critical components: a motivating operation (MO), a prompt if necessary, the mand, and contingent reinforcement. MOs temporarily increase the value of a reinforcer and the probability of behaviors that have been previously reinforced and are thus a critical component of mand training (Sundberg & Partington, 1998). To increase the likelihood that mands are controlled by an MO, researchers have examined the effectiveness of capturing and contriving MOs (e.g., Howlett et al., 2011; Sundberg et al., 2002; Taylor et al., 2005). Capturing naturally occurring MOs involves observing the individual for behavioral indication, such as approaching, reaching, pointing, or using nontargeted language in reference to a stimulus, prior to providing any prompts to mand for that stimulus (Albert et al., 2012; Drasgow et al., 1996; Jennett et al., 2008; Sundberg, 1993). Contriving MOs involves withholding preferred items, positioning items in view but out of reach, and interrupting behavior chains to increase the value of items needed to complete the chain (Bourret et al., 2004; Drash et al., 1999; Simic & Bucher, 1980; Sweeney-Kerwin et al., 2007). Although contriving MOs is beneficial for increasing opportunities to mand, it is still critical that the interventionist wait for behavioral indication to ensure that there is an MO present for a given stimulus (Drasgow et al., 1996).

After the interventionist has observed that an MO is present, they wait to allow the individual to independently mand. If the mand is not emitted, the interventionist can prompt or model (i.e., an echoic prompt during vocal mand training) the target response and deliver the reinforcer contingent upon the emission of the mand (Greer & Ross, 2008; Kodak & Clements, 2009; O’Reilly et al., 2012; Sundberg & Michael, 2001). Finally, the mand, whether independent or prompted, is reinforced by delivering the stimulus. These components of mand training have been effective in teaching individuals to request items across modalities (Charlop-Christy et al., 2002; Hall & Sundberg, 1987; Jennett et al., 2008) for a variety of reinforcers including edible items (Kodak & Clements, 2009; Sweeney-Kerwin et al., 2007), help (Rodriguez et al., 2017), information (Landa et al., 2020; Sundberg et al., 2002), and the removal of a stimulus that prevents access to a preferred activity (Shillingsburg et al., 2013).

Caregivers are well-positioned to provide intervention on this deficient repertoire with young children given the proportion of time they spend in the natural environment. Caregiver training is commonly provided in the home and community settings using behavioral skills training (BST). BST consists of the trainer defining the target behavior and providing trainees with a written description of the procedures to be learned. The trainer models the procedures being implemented correctly, then subsequently requires the trainee to practice or rehearse implementing the procedures. The trainer provides feedback and additional opportunities for the trainee to rehearse and receive feedback until mastery is demonstrated (Parsons et al., 2012). Researchers have used these procedures to teach caregivers to implement a variety of interventions including three-step prompting (Tarbox et al., 2007), imitation (Ingersoll & Gergans, 2007), and communication training (Hsieh et al., 2011; Suberman & Cividini-Motta, 2020). Despite being the most widely used training procedure for teaching change agents to implement interventions, BST may not be the most cost-effective or efficient method for training caregivers (Maffei-Almodovar & Sturmey, 2018), particularly those without access to a trainer. Although there is recent research suggesting that front-line staff with little training experience can learn to train others using BST (Erath et al., 2020), which can reduce the need for expert practitioners to conduct BST, families without any access to a service provider would not be able to contact even front-line staff for training. Therefore, one potential alternative for training caregivers is video modeling.

Video modeling is a teaching procedure that involves an individual viewing a videotaped sample of a model performing a specific, scripted activity or task. Immediately after viewing the video-based model, the trainees are directed to perform the activity or task they observed in the video. Like BST, video modeling allows the trainee to observe the correct implementation of the target procedures. However, once a video is created it can be reused and adapted as necessary with the same trainee and other trainees (Ayres & Langone, 2005). Video models can be easily disseminated and can serve as feedback in instances when trainees need continued support (Brock et al., 2018), which can save time and may reduce costs. Video models may also include voice-over narration (VMVO) and on-screen text highlighting salient features (VMVOT).

Researchers are increasingly demonstrating the effectiveness of video modeling and its derivatives for teaching a variety of skills. Gerencser et al. (2020) conducted a review of asynchronous training methods for teaching implementers to conduct interventions with children with ASD. They concluded that video modeling was a critical component across the asynchronous methods and increases in all implementers’ fidelity were observed across studies when training procedures included a video model (Gerencser et al., 2020). Researchers have successfully utilized VMVO and VMVOT to teach staff to implement a variety of assessment and intervention procedures, including discrete trial instruction (VMVO; Vladescu et al., 2012), preference assessments (VMVO; Weldy et al., 2014), generalized imitation assessment and intervention (VMVO; Du et al., 2016), and behavior intervention plans (VMVOT; DiGennaro-Reed et al., 2010). The extant literature suggests that video modeling may be an efficient and effective option for training practitioners, educators, and caregivers to implement a variety of interventions with a high degree of fidelity.

Notwithstanding the growing literature base on the utility of video modeling, few studies have examined its effectiveness related to training caregivers to influence communication outcomes such as manding. Loughrey et al. (2014) used BST combined with traditional video models to sequentially train caregivers to implement eight skills associated with mand training (e.g., capturing and contriving motivation, incidental teaching, differential reinforcement). Instructions alone were insufficient in increasing participants’ fidelity to criterion levels, but when participants received all the components of BST, they each increased fidelity above 80%. Lane et al. (2016) used video models and coaching to teach two caregivers to increase environmental arrangement and responding to promote vocal communicative responses. Both caregivers reached criterion with minimal coaching, but maintenance was only assessed for one participant, and her fidelity was poorer than during intervention.

To investigate training methods that circumvent resource intensity, scheduling demands, and accessibility, Douglas et al. (2018) used an online course management system to train parents to increase opportunities and to respond to their child’s communication. The training consisted of written slides with narration, video models, and quizzes. The mand training intervention required that the caregiver prepare the activity, offer opportunities for the child to communicate, wait for the child to communicate, and respond to the child’s communication. Caregivers spent an average of 2 hours completing the online training, after which they each increased communicative opportunities and responsiveness. However, their performance was variable during posttraining sessions, and below criterion in the maintenance phase. The effect on the children’s communication was variable and slightly above baseline levels. One critical limitation in this study was that the mand training intervention did not include instructions for prompting communication, a critical element of mand training (Hart & Risley, 1975; Rogers-Warren & Warren, 1980).

Although the effectiveness of video modeling has been evident in the results of these studies, some important gaps in the literature need to be addressed. The present study evaluated whether a brief video model containing the critical elements of mand training, described using the mnemonic POWER, was effective in teaching parents of children with autism to conduct mand training during 10-min play sessions and increase single-word vocal mands. Our specific research questions were as follows:

  1. Is VMVOT an effective method for teaching caregivers to conduct mand training?
  2. To what extent do caregivers implement mand training with fidelity without first viewing the VMVOT and do these effects maintain over time?
  3. To what extent do caregivers find the goals, procedures, and outcomes to be socially valid?
  4. What effect does caregiver fidelity have on the percent of children’s independent vocal mands?

Method

Participants and Setting

We recruited parent–child dyads who were on waiting lists to receive ABA therapy through contacts with early intervention providers and diagnostic clinics, and by advertising through social media in an urban city in the southeast United States. Two dyads responded to social media posts and the other was referred through participation in an unrelated parent training center at a nearby university. Adult participants were included if they reported they had not received any previous training on conducting mand training with their child, agreed to meet one to two times per week for up to 1 hr, agreed to provide an appropriate area in the home for sessions to occur, and consented to audio and video recording for themselves and their child.

To be included in the study, child participants had to (1) be 2–5 years of age, (2) have a medical or educational diagnosis of ASD that could be confirmed via parent-supplied reports, (3) be on a waiting list to receive ABA therapy, (4) show an interest in manipulative activities, and (5) have an echoic repertoire but little to no functional mands for preferred items. We conducted a preassessment of the echoic repertoire to determine participant eligibility. The researcher administered Groups 1 and 2 of the Early Echoic Skills Assessment (EESA; Esch, 2008) according to the instructions. The researcher only assessed these groups because the child could mand for the available toys during intervention using one- or two-syllable words (e.g., ring, link, Lego®). Children were included in the study if they scored a minimum of 25 points with at least 20 points from Group 1. This criterion corresponds to the upper bound of Level 1 to a mid-range of Level 2 on the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP; Sundberg, 2008), which indicates the vocal skills typically acquired between birth and 30 months. The specific skills represented in Groups 1 and 2 of the assessment correspond to the presence of the following speech skills: vowels, diphthongs, early consonants, and two-syllable combinations. Two children were assessed but failed to meet the minimum criteria; therefore, they were excluded from participating in the study. No parents demonstrated mand training with fidelity above criterion levels during baseline; therefore, none were excluded from participation.

Three mother–child dyads participated in the study. All participants identified themselves and their children as Black (of African American descent) and were each assigned a pseudonym to maintain their privacy. Carol, Alex’s mother, was 38 years old. She was married and had two other children. Carol had an associate degree, worked as a court reporter, and had a household income over $100,000 per year. Alex was 4 years and 6 months old at the start of the study. He was diagnosed with ASD, attended a half-day preschool inclusion classroom in a public school, and achieved a score of 52 on the EESA (Esch, 2008), indicating vocal abilities within the 18–30 months range (Level 2).

Melissa was Jackson’s mother. She was 37 years old, had a bachelor’s degree, and worked as a business owner. Melissa was married, and Jackson was their only child. Their annual household income was greater than $100,000. Jackson was 2 years and 6 months and had a diagnosis of ASD and developmental delay. Jackson participated in a full-day inclusion Montessori preschool. His score on the EESA (27) was consistent with entering into Level 2 (0–18 months).

Annette was Daniel’s mother. She was 38 years old. Annette held a high school diploma and was a stay-at-home parent. She was married and had one other child. Her family’s annual income was between $25,000 and $50,000. Daniel was 5 years and 1 month. He had a diagnosis of ASD and scored within the Level 2 range on the EESA (45.5). His mother provided him a home-school education.

Mothers identified their living room as an area where their child frequently played. The room included at least a 1.2 m x 1.2 m area for the mother and child to engage in toy play, was free of competing activities, and included a place for the researcher to sit and record the session. Participants conducted mand training with their children seated 1–2 ft in front of them on the floor or at a child-sized table with two chairs (Annette and Daniel). Target activities were individually stored in clear plastic bins with lids and kept in the child’s view but out of reach. The researcher either sat on the couch or the floor within 5 ft of the participants and only engaged with them according to the written procedures described below.

Materials

The researcher used a digital timer to keep track of the session duration and used pencil and paper to collect data on both caregiver and child dependent variables. The researcher brought the target activities to the participants’ home each session and showed the caregivers the training video during intervention sessions on a Surface Pro laptop computer. Caregivers conducted sessions using five activities suitable for engaging in reciprocal play for which there was only one salient feature to mand such as blocks or trains, as opposed to a dollhouse that may have several items. The five activities for each dyad were identified through a single stimulus preference assessment.

We created a 10-min VMVOT containing five clips to depict different child responses (i.e., independent mand, prompted mand, error, loss of motivation, and no motivation). The video model depicted the researcher conducting mand training (see Table 1) with a 3-year-old neurotypically developing girl. We used a mnemonic, POWER, to highlight the steps of mand training using an echoic prompt and aid acquisition. The procedure consisted of facilitating mands while taking turns playing with the child using their preferred items. The researcher used the child’s preferred activities to demonstrate how to contrive motivation, use an echoic prompt to evoke a vocal mand, correct an error, respond if the child was not motivated, and reinforce mands. In particular, the researcher contrived motivation by first manipulating one of the target toys and offering a piece to the child (holding the item toward the child slightly outside of reach). The researcher waited 3 s for behavioral indication (e.g., reaching for the item, looking at the item and back at the researcher, vocalization). If the child emitted a vocal mand (the noun matching the item for which there was motivation) the researcher immediately delivered it and said the name of the item to pair the spoken word with its delivery. If the child indicated motivation but did not mand within 3 s, the researcher maintained control of the item and provided an echoic prompt (i.e., said the noun that corresponded to the item). Then the researcher waited 3 s for the child to imitate. If the child imitated, the researcher immediately delivered the item and said the corresponding noun. If at any point the child exhibited behavioral indication but incorrectly manded, the researcher maintained control of the item and provided an echoic prompt. If at any point the child lost motivation the researcher removed the activity and presented another. 
In addition to these modeled procedures, the first author described each step individually as it was demonstrated (via voice-over) and simultaneously displayed salient words on the screen. For example, the narration stated, “The first step is to play. Select an activity and position it between yourself and the child, and then add the first piece.” This step was demonstrated while the on-screen text displayed “Play: put toy between you and your child.”

Table 1 POWER Mand Training Procedures

Recording and editing took approximately 2.5 hr. Two board certified behavior analysts and one graduate student lead registered behavior technician viewed the training video for clarity before beginning the study. All three reviewers had experience in conducting mand training with echoic prompts and indicated the procedures were clear and succinct. None of the reviewers recommended that we make any revisions.

Dependent Measures and Reliability

We had two dependent variables in our study: (1) the percent of correctly implemented intervention steps and (2) the percent of independent mands. We developed a 10-step mand training task analysis (see Table 1). We measured caregiver fidelity for each mand opportunity and defined an opportunity as any instance in which the mother offered the child a toy (e.g., holding a block out toward the child). Only certain steps of the task analysis were applicable for data collection depending on the child’s response. For example, if the child independently manded, then step 6 was coded as NA because the mother did not have to prompt the mand. If the child manded (either independently or following a prompt) and continued to show motivation, steps 9 and 10 were coded as NA. Steps 9 and 10 required that the caregiver terminate the activity and present an alternative one; therefore, if at any point the child lost motivation (e.g., attempted to reach for another activity, or left the play area), the remaining steps were coded as NA. We omitted all steps scored as NA from the total number when calculating the fidelity for each session.

We calculated caregiver fidelity as the percent of steps completed correctly across trials during the 10-min mand training session by dividing the number of steps correct by the total possible steps and multiplying by 100. The total number of trials per session varied based on the child’s motivation. Sessions were terminated if the child did not show motivation (e.g., reach for toys, watch the parent manipulate the toy, vocalize) for 2 consecutive minutes and were excluded from analysis if the caregiver provided fewer than five opportunities. We chose these criteria because mand training is controlled by the motivation of the learner; if there was no motivation for at least 2 consecutive minutes, then few teaching opportunities would be available. In addition, too few opportunities may overestimate the mother’s fidelity. We only discarded one session (dyad 2).
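The fidelity calculation described above can be illustrated with a minimal sketch. The data encoding (True for a correct step, False for an incorrect step, None for a step coded NA) and the function name are assumptions made for illustration; they are not taken from the study's datasheet.

```python
from typing import Optional, Sequence

# Hypothetical encoding: each trial (mand opportunity) is a sequence of step
# scores, where True = correct, False = incorrect, None = not applicable (NA).
Trial = Sequence[Optional[bool]]

MIN_OPPORTUNITIES = 5  # sessions with fewer opportunities were excluded


def session_fidelity(trials: Sequence[Trial]) -> Optional[float]:
    """Percent of applicable steps completed correctly across a session.

    Returns None when the session has fewer than five opportunities (and so
    would be excluded from analysis) or when no step was applicable.
    """
    if len(trials) < MIN_OPPORTUNITIES:
        return None
    # Steps coded NA are omitted from both the numerator and the denominator.
    scored = [step for trial in trials for step in trial if step is not None]
    if not scored:
        return None
    return 100.0 * sum(scored) / len(scored)


# Example: five trials, each with three correct steps, one error, and one NA.
example_trials = [[True, True, True, False, None] for _ in range(5)]
print(session_fidelity(example_trials))      # 15 of 20 applicable steps
print(session_fidelity(example_trials[:4]))  # too few opportunities
```

Pooling steps across trials, rather than averaging per-trial percentages, follows the description of dividing the number of correct steps by the total possible steps for the session.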

We measured mands by tallying prompted and independent mands on a direct observation datasheet and converting the tally into a percentage. The percent of independent mands was derived by dividing total independent mands by total opportunities to mand and multiplying by 100. Prompted mands were defined as articulate vocal utterances of a noun or adjective–noun phrase (e.g., ring or blue ring) that specified the stimulus for which there was motivation within 3 s of the caregiver’s echoic prompt. Independent mands were defined as articulate vocal utterances of a noun or adjective–noun phrase within 3 s, under the control of the MO and/or tact (i.e., following the caregiver holding the stimulus out toward him, reaching for the item, or looking at the item) without vocal prompts from the caregiver.
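As a worked illustration of this measure, the sketch below tallies one outcome per mand opportunity and converts the tally to a percentage. The outcome labels ("independent", "prompted", "none") and the function name are hypothetical and serve only to make the arithmetic concrete.

```python
from collections import Counter
from typing import Iterable

# Hypothetical outcome labels, one per mand opportunity.
VALID_OUTCOMES = {"independent", "prompted", "none"}


def percent_independent_mands(outcomes: Iterable[str]) -> float:
    """Independent mands divided by total opportunities, multiplied by 100."""
    counts = Counter(outcomes)
    unknown = set(counts) - VALID_OUTCOMES
    if unknown:
        raise ValueError(f"unrecognized outcome labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no mand opportunities recorded")
    return 100.0 * counts["independent"] / total


# Example: 12 opportunities, of which 3 ended in an independent mand.
session = ["independent"] * 3 + ["prompted"] * 5 + ["none"] * 4
print(percent_independent_mands(session))
```

Note that the denominator is every opportunity to mand, so opportunities with a prompted mand or no mand lower the percentage.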

Experimental Design

We used a nonconcurrent multiple baseline across dyads design (Carr, 2005; Coon & Rapp, 2018) to evaluate the effects of VMVOT on the fidelity of mand training. The nonconcurrent multiple baseline design is a variation of the multiple baseline design, which does not require concurrent observations and was selected because the participants of this study were not all recruited at the same time. The first dyad began baseline in May 2019. Dyads 2 and 3 entered the study 5 months later and were separated by 2 weeks. We made phase change decisions based on the stability of caregiver fidelity.

Procedures

Preference Assessment

We conducted a single stimulus preference assessment to identify potential activities to target for each participant. A single stimulus preference assessment is a brief assessment in which each stimulus is singly and successively presented, and approach behaviors are measured to differentiate preferred from nonpreferred stimuli (Pace et al., 1985). We selected this type of preference assessment for several reasons. First, our purpose was not to identify a preference hierarchy, but to identify several stimuli that the child might enjoy with their mother. Second, identification of multiple stimuli for this purpose would require several iterations of other types of preference assessments (e.g., paired stimulus, multiple stimulus without replacement; DeLeon & Iwata, 1996; Fisher et al., 1992). Third, the procedures included steps for making prompts contingent upon the child displaying behavioral indication, thereby giving an in-the-moment assessment of preference. Fourth, the single stimulus preference assessment controlled for the possibility of position bias and for any difficulty children might have making selections from multiple stimulus arrays.

The single stimulus preference assessment consisted of 10 preselected activities with one salient feature appropriate for dyad play (such as blocks or trains, as opposed to a dollhouse that may have several components). The researcher conducted three 10-trial sessions in a counterbalanced order; each stimulus was presented three times. A trial began with the researcher modeling the use of the toy/activity for 10 s and refraining from any vocalizations. The researcher then held the toy out toward the child for up to 5 s. If the child approached the stimulus within 5 s, the researcher provided access for 30 s. If the child did not approach the item within 5 s, the researcher removed the item and presented the next toy (Pace et al., 1985). An approach was defined as reaching or moving toward the toy (Hagopian et al., 2001). The researcher recorded a (+) to indicate that the child approached the toy and a (-) to indicate that he did not approach. We gave the child a 5-min break after each 10-trial session in which we restricted access to the assessment items to control for satiation. Preference for each toy was determined by dividing the total number of approaches by the total number of presentations (three) and multiplying by 100. The five activities with the highest percent of approaches were selected for the intervention. There was a tie for the fifth rank in Jackson’s preference assessment; thus, his mother selected the toy to be included.
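The scoring and selection rule can be sketched as follows. The toy names, the record format (one True/False approach record per presentation), and the function names are assumptions for illustration. When a tie spans the final rank, the sketch returns the tied candidates rather than choosing among them, mirroring the case in which a caregiver made the final selection.

```python
from typing import Dict, List, Tuple


def approach_percentages(records: Dict[str, List[bool]]) -> Dict[str, float]:
    """Percent of presentations on which the child approached each toy."""
    return {toy: 100.0 * sum(r) / len(r) for toy, r in records.items()}


def select_activities(
    records: Dict[str, List[bool]], n: int = 5
) -> Tuple[List[str], List[str]]:
    """Return (clear selections, candidates tied for the remaining slots)."""
    if len(records) < n:
        raise ValueError("fewer assessed toys than activities to select")
    counts = {toy: sum(r) for toy, r in records.items()}
    ranked = sorted(counts, key=lambda toy: -counts[toy])  # stable sort
    cutoff = counts[ranked[n - 1]]  # approach count at the nth rank
    above = [toy for toy in ranked if counts[toy] > cutoff]
    at_cutoff = [toy for toy in ranked if counts[toy] == cutoff]
    if len(above) + len(at_cutoff) == n:
        return above + at_cutoff, []  # no tie across the cutoff
    return above, at_cutoff  # a person chooses among the tied toys


# Hypothetical records: three presentations per toy.
records = {
    "blocks": [True, True, True],
    "trains": [True, True, True],
    "rings": [True, True, True],
    "pegs": [True, True, False],
    "links": [True, False, False],
    "beads": [False, True, False],
    "gears": [False, False, False],
}
print(approach_percentages(records)["blocks"])
print(select_activities(records))
```

Ranking on integer approach counts avoids comparing floating-point percentages, which is equivalent here because every toy is presented the same number of times.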

Alex’s highest-ranked activities identified in the single stimulus preference assessment were shape sorter, pop-up pirate, links, pegs, and ring stacker. Alex previously used colors as a primary method for requesting items. His mother did not want to discourage the use of adjectives; therefore, acceptable mand forms included nouns and adjective–noun phrases. In addition, he had previously acquired the tact for most shapes. Acceptable mands for the shape sorter included “shape,” “specific shape (e.g., triangle),” or “color + specific shape (e.g., red crescent).” For the pop-up pirate, acceptable mands included “sword,” or “color + sword.” For the remaining activities, acceptable mands included the specific nouns (link, peg, and ring) and the desired color (e.g., yellow link, red peg, blue ring). Daniel and Jackson only used nouns. Daniel’s targets included peg, link, sword, shape, and gear. Jackson’s targets included Lego, puzzle, ring, bead, and sword.

Baseline

Baseline and intervention sessions took place in the residence of each mother–child dyad. We recorded all sessions using a Samsung Galaxy S10 Plus cell phone. Each visit consisted of one to two 10-min sessions and occurred one to two times per week. To control for possible satiation, the researcher maintained control of the target activities between sessions and scheduled sessions after participants arrived home from school. Alex and Carol’s sessions always took place within an hour of his arrival home from school, except for three mid-morning weekend sessions due to rescheduling. Melissa and Jackson’s sessions always took place in the early evening, immediately after Jackson was picked up from the home of his grandmother, who picked him up from school and cared for him until his parents left work. Annette and Daniel’s sessions occurred in either the early morning or the early afternoon, before or after his home-school lesson. His mother restricted access to similar activities for at least 30 min before scheduled sessions.

During each session, we gave the caregivers the five target activities in clear bins then instructed them to “Play with your child and try to get him to ask for the specific items.” No systematic consequences were provided for the participants’ correct or incorrect implementation of the training procedures. We did not provide instructions, answer questions, or provide feedback. After 10 min elapsed the researcher instructed the mothers to terminate the session and clean up. During visits in which two training sessions occurred the mother and child took a 10-min break between training sessions, during which access to the target toys was restricted.

Intervention

Intervention sessions were identical to baseline except that each mother viewed the video model before she conducted the mand training session. The researcher gave the participant the laptop, set a timer for 10 min, and said, “You have 10 min to watch this video. You can rewind, fast forward, or replay as much as you want.” At the end of the 10 min, the researcher took the laptop from the participant, gave her the bins of toys, and said, “Do what you saw in the video and try to get your child to ask for the specific items.”

Just as in baseline, the researcher did not answer any questions or provide feedback. During visits in which two training sessions occurred, the participants took a 10-min break during which the mothers watched the video model a second time. Children spent their breaks engaging with nontarget activities or eating a snack. Participants did not have access to the video model outside of sessions.

Motivation Probe

At different points in the study, Jackson and Daniel engaged in behaviors that suggested that they may have been satiated with the target activities used in the study. We utilized this as an opportunity to reevaluate motivation for the target items. We conducted one motivation probe with Jackson and Melissa during the intervention phase, and one during the baseline phase with Daniel and Annette.

One of Jackson and Melissa’s sessions was discarded due to too few mand opportunities. When Melissa initiated play, Jackson left the play area and ran around the house laughing. Melissa responded by chasing after and redirecting him back to the play area. This led the researcher to hypothesize that Jackson was not motivated to play with any of the target activities. The subsequent session included the motivation probe. Melissa viewed the video, then we instructed her to follow Jackson’s lead and conduct mand training with toys and activities of his choosing. Jackson continued to indicate motivation for the target items; therefore, we did not conduct another preference assessment or change the target activities.

In the session prior to the motivation probe, Daniel engaged in vocal refusal (e.g., saying no), crying, and falling to the floor when his mother touched the box containing the target activities suggesting that he was not motivated to play with those specific items. Daniel and Annette were still in baseline, so during the motivation probe we asked Annette to follow Daniel’s lead and conduct mand training with toys and activities of his choosing. In addition, we conducted a second preference assessment to select new targets to ensure that infrequent mands in the intervention phase were not due to changes in preference or satiation from target activities. The preference assessment took place following a 10-min break after the motivation probe and based on the results we selected four new target activities for Daniel. In all sessions following the motivation probe, we conducted only one 10-min session per scheduled visit to reduce satiation.

Posttraining

After the participants achieved 80% fidelity across three consecutive sessions, we conducted a posttraining probe to evaluate whether they could conduct mand training with fidelity without first viewing the video model. Posttraining probes were identical to baseline procedures and occurred during the next scheduled session after the criterion was met.
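The mastery criterion can be expressed as a simple run check over session-by-session fidelity. This is a sketch under one stated assumption: "achieved 80% fidelity" is read as a score of at least 80%. The function name and example values are hypothetical.

```python
from typing import Optional, Sequence


def session_meeting_criterion(
    fidelity: Sequence[float], criterion: float = 80.0, run: int = 3
) -> Optional[int]:
    """Return the 1-based session at which the criterion is first met.

    The criterion is met once `run` consecutive sessions score at or above
    `criterion` (assumed here to mean >= 80%). Returns None if never met.
    """
    streak = 0
    for session_number, score in enumerate(fidelity, start=1):
        streak = streak + 1 if score >= criterion else 0
        if streak == run:
            return session_number
    return None


# Hypothetical intervention-phase fidelity scores.
print(session_meeting_criterion([60, 85, 70, 82, 90, 88]))
```

Under this reading, the posttraining probe would be scheduled for the next session after the returned session number.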

Maintenance

Four to 6 weeks after mastery, we returned to each participant’s home for one visit. One to two mand training sessions occurred in which conditions were arranged identically to baseline; that is, the caregiver did not view the video model. Participants did not have access to the video model during the time between posttraining and maintenance.

After the maintenance check, the researcher reviewed the participants’ overall performance and provided recommendations for the continued use of the mand training procedures to further facilitate their child’s mand repertoire.

Interobserver Agreement

The first author trained the third author to collect data as a secondary observer on all dependent variables for the purpose of evaluating interobserver agreement (IOA). Training consisted of the researcher explaining each step of a trial, reviewing both the datasheet and video model, and achieving a minimum of 90% agreement on all dependent variables across two randomly selected baseline sessions (these were not included in the agreement data). The secondary observer was a graduate student with 3 years of experience working in a center-based program for children with ASD. She had experience conducting mand training using an echoic prompt. IOA was assessed from video recordings of both caregiver and child participant dependent variables for at least 30% of sessions across all conditions and dyads. We calculated point-by-point agreement by comparing the primary and secondary observers’ data, dividing the number of agreements by the number of agreements plus disagreements, and multiplying by 100. Agreement for participants’ integrity was 89.5% (range: 80%–96%) in baseline, 93.5% (range: 87%–100%) in intervention, 99% (range: 98%–100%) in posttraining, and 97% (range: 96%–99%) in maintenance. IOA for children’s independent mands was 94.8% (range: 82%–100%) in baseline, 92.6% (range: 85%–100%) in intervention, 93.6% (range: 93%–94%) in posttraining, and 95.3% (range: 91%–100%) in maintenance.
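The point-by-point agreement formula described above can be illustrated with a brief sketch. The function name, the "+"/"-" scoring codes, and the example records are assumptions made for illustration; they are not the study’s datasheet or data.

```python
from typing import Sequence


def point_by_point_ioa(primary: Sequence[str], secondary: Sequence[str]) -> float:
    """Percentage agreement: agreements / (agreements + disagreements) * 100."""
    if len(primary) != len(secondary):
        raise ValueError("Both observers must score the same number of items.")
    if not primary:
        raise ValueError("At least one scored item is required.")
    # An agreement is an item that both observers scored identically.
    agreements = sum(p == s for p, s in zip(primary, secondary))
    disagreements = len(primary) - agreements
    return 100 * agreements / (agreements + disagreements)


# Hypothetical records for ten scored steps ("+" correct, "-" incorrect).
primary_record = ["+", "+", "-", "+", "-", "+", "+", "-", "+", "+"]
secondary_record = ["+", "+", "-", "-", "-", "+", "+", "-", "+", "+"]
print(point_by_point_ioa(primary_record, secondary_record))  # 90.0
```

In this example the observers disagree on one of ten items, which yields 90% agreement.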

Procedural Fidelity

Procedural fidelity was measured during the same sessions as IOA across phases for all dyads. In baseline, intervention, posttraining, and maintenance phases, the secondary observer viewed the recorded sessions and measured whether the researcher provided the target toys, delivered the correct instruction, refrained from answering questions or giving feedback, and terminated the session after 10 min. Procedural fidelity for the intervention phase was identical to the baseline phase, with an additional step of ensuring that the researcher gave the participant 10 min to watch the video model. Procedural fidelity was 100% across all baseline, intervention, posttraining, and maintenance sessions.

Social Validity

We created an eight-item, 5-point Likert-type scale questionnaire evaluating the social validity of the goals, procedures, and outcomes of the study. Each item was rated on a scale from strongly disagree (1) to strongly agree (5). Following the maintenance probe, we sent participants a web link via text message to access the anonymous questionnaire. Participants were asked questions related to their child’s need for the intervention, the effectiveness of the video model, and ease of implementing the intervention. They were also asked questions about whether the intervention produced an increase in their child’s requesting and whether they would continue to use the intervention in the future. The questionnaire included one additional open-ended question at the end asking for feedback about the study and/or video model. Two of the participants completed the questionnaire.

Results

Figure 1 depicts the effect of VMVOT on caregiver fidelity across baseline, intervention, posttraining, and maintenance phases. Carol’s baseline performance had a stable level and a flat trend (average steps implemented correctly = 9%). There was an immediate change in level and trend from baseline to intervention, and no overlap between intervention and baseline data points, but fidelity in session 12 fell below criterion. Mean fidelity in the intervention phase was 75%, suggesting a basic effect. When Carol was asked to conduct mand training in the posttraining phase, without first viewing the video model, her fidelity averaged 97% across the two mand training sessions. The effect of VMVOT was maintained for 6 weeks at 95% fidelity.

Fig. 1 Participant Fidelity per Session

Visual inspection of Melissa’s implementation of mand training in baseline showed little variability with a flat, relatively stable trend, and a mean of 47% fidelity. A basic effect was observed, indicated by an increase in fidelity to an average of 82% in the intervention phase. Melissa completed 86% of steps correctly during the posttraining session and maintained fidelity for 4 weeks at a level (92%) greater than that of the intervention phase.

Annette’s data also depict a basic effect. In baseline, there was an initial increasing trend across the first five sessions; however, the remaining sessions were stable around 48%. Fidelity immediately increased in level following the introduction of VMVOT and remained stable throughout the intervention phase. Annette’s average fidelity in baseline was 42% compared to 73% during the intervention phase. Annette maintained fidelity (80%) of mand training in the posttraining session at levels greater than the intervention average. She subsequently maintained fidelity at a 4-week follow-up (79%).

Figure 2 shows the effect of caregivers’ fidelity of implementing mand training on their child’s percent of independent mands. In baseline, Alex independently manded in an average of 16% of opportunities. When the video model was introduced to his mother, we observed a delayed but gradual increase in the trend of independent manding with slight variability and 20% overlap. Consistent with his mother’s fidelity, there was an increase in the level of independent manding in the intervention phase. Alex manded independently in 100% of opportunities in the posttraining sessions and in 76% of opportunities across the 6-week maintenance probes.

Fig. 2 Percent of Independent Mands

During baseline, Jackson’s level of independent manding was characterized by an initial increase with slight variability. During the intervention phase, when mand training was preceded by his mother viewing the video model, the percent of independent manding increased above baseline levels. We observed a basic effect, although there was a slight decreasing trend in the last three sessions that was inconsistent with his mother’s pattern. Jackson manded independently in 55% of opportunities in the posttraining probe and about 48% across both maintenance probes, in which mands were 33% and 64% independent, respectively.

Daniel’s results were the least remarkable. In baseline, independent manding was variable and at a level of about 12%. When he had access to other play items in addition to the target activities, he independently manded in 63% of opportunities. In the first intervention session, independent manding increased slightly above baseline levels, but within the overall range of baseline performance. Manding varied across the intervention phase, with independence falling below baseline levels in three sessions. Daniel manded independently in 43% of opportunities during the posttraining session and 33% in maintenance.

After the study, we measured mothers’ perceptions of the importance of mand training, acceptability of the procedures, and the significance of outcomes (see Table 2). Two mothers completed the social validity questionnaire. The mean for questions related to the significance of the intervention goals was 5; the mean for questions related to the feasibility of the procedures was 4.5, and the mean for the importance of the outcomes was 4.88. Neither respondent provided additional comments related to the procedures or outcomes.

Table 2 Results of the Social Validity Questionnaire

Discussion

We conducted this study to test the effectiveness of VMVOT to train caregivers to implement mand training with their young children with ASD. We trained three Black mothers to conduct mand training during 10-min play sessions in their homes using a 10-min VMVOT. We visually analyzed data within a nonconcurrent multiple baseline across dyads design and concluded that there was a functional relation between VMVOT and mothers’ overall treatment fidelity. The effects of mand training on child mands were not robust; however, these results must be interpreted in light of the fact that caregivers only implemented mand training with the target activities during the research sessions. In other words, children had no opportunities to practice manding for the target activities outside of sessions because the researcher had the materials. In addition, sessions occurred one to two times per week, resulting in child participants’ contacting only 10–40 min of mand training per week.

In baseline, caregivers demonstrated some components of mand training on their own but were far below the mastery criterion. This was unremarkable given that some of the steps might be common practice in any interaction between a parent and a child during play. Likewise, all the child participants manded in less than 50% of opportunities during baseline. Fidelity for all participants was characterized by an increase in level and trend when the VMVOT was introduced, and zero overlap was observed. It is worth noting that participants spent a relatively brief amount of time viewing the video model before reaching 80% fidelity (20–40 min) as compared to other parent training studies in which participants spent an average of 17 hr (range: 1.5–30 hr) with a trainer (Lang et al., 2009). Although we did not evaluate generalization to novel contexts (e.g., snack time), all fidelity data represent generalization across stimuli, because the stimuli depicted in the video model were different from the preferred stimuli they used with their children during training.

One participant, Annette, never met the criterion; however, her fidelity immediately increased by 30% from the final baseline probe to the first intervention probe and remained relatively stable at a level of 73% throughout the intervention phase in which she viewed the video five times (50 min). Her performance indicates an increasing trend; thus, it is possible that more exposures to the video may have produced mastery. However, Annette’s fidelity was stable across five consecutive sessions and she consistently made the same errors related to maintaining control over the pieces and providing a prompt. It is possible that participants were unaware of which, if any, steps they were implementing incorrectly because we did not provide feedback. Although viewing the video multiple times could serve as a source of feedback for some individuals, Annette may have been unable to identify inconsistencies between her own behavior and that of the model, thereby explaining the repeated errors.

Annette’s consistent error patterns suggest that improvements were unlikely without feedback or coaching on these specific steps. Although previous researchers have concluded that a third of participants need additional coaching to achieve mastery (Erath & DiGennaro-Reed, 2020; Fettig et al., 2015; Martocchio & Rosales, 2017), we elected not to provide coaching or feedback. Our aim was to evaluate the extent to which caregivers could learn to implement mand training without researcher or practitioner mediation, for the purpose of improving outcomes for families without access to a service provider. However, future research could add a feedback component to the VMVOT for caregivers who do not meet mastery or could include a video self-feedback component, which may reduce the need for a trainer to be present.

All children engaged in some independent manding up to 4 weeks after the intervention ended, suggesting that they may have been under states of deprivation from the target stimuli. Another possible explanation for independent manding during the maintenance probes is that participants may have continued implementing mand training with other stimuli in the absence of the video. The effects on manding were minimal for Daniel. Mand training requires that access to the reinforcer be made contingent upon the mand (whether prompted or independent). Daniel’s mother (Annette) frequently provided toys independent of the target response (i.e., when Daniel reached and whined), thus intermittently reinforcing some members of the response class and preventing an increase in vocal manding. Previous research suggests that when members of a response class are not reinforced, emission of other members may increase (Carr & Durand, 1985; Drasgow et al., 2016). Further, Pence and Peter (2015) evaluated the effects of integrity errors during mand training and demonstrated that treatment integrity errors consisting of delivering the target item independent of responding can be detrimental to mand acquisition for children with few to no independent mands.

Another possible explanation for Daniel’s limited manding may be related to satiation. The proportion of independent mands was highest in the MO evaluation, in which novel stimuli were available, and in the maintenance probe, which occurred after 4 weeks without access to the training stimuli, thus further supporting this explanation. He may have been more likely to request under extended periods of deprivation; that is, 1 to 2 days between sessions may not have been long enough to produce deprivation. Daniel’s results should be interpreted with caution, but they also raise several questions for future research in terms of the effect of variable schedules of reinforcement for members of a response class, the influence of integrity errors, and the magnitude or length of states of deprivation.

Although a functional relation was demonstrated between the video model and parent fidelity, it should be noted that there were several manding trials in which the participants omitted critical steps. For example, Annette consistently had difficulty implementing the following steps: restricting access to the reinforcer, encouraging (prompting) the mand, and simultaneously labeling the item while delivering it. Restricting access and prompting the mand are critical in establishing a contingency between the child’s motivation and access to the desired item. This may have hindered the child’s mands, as the mother intermittently provided access to preferred items without requiring a mand and inconsistently provided an echoic model for the mand. These inconsistencies suggest that merely providing mand opportunities is not sufficient to evoke a verbal response in learners with emerging mand repertoires (Douglas et al., 2018). Anecdotal observation showed that this led to vocal scrolling and grabbing. It is possible that the omission of these critical steps was reinforced by the experimenter not providing feedback. In addition, participants may not have recognized their errors and subsequently did not allocate more attention to those steps when watching the video model.

In addition to our primary findings, we measured maintenance of fidelity 1 month after the posttraining probe. Maintenance data indicate that participants’ fidelity remained above intervention levels when they were asked to conduct mand training without first viewing the video 4 weeks after the intervention ended. Maintenance data of children’s manding indicated some durability. These findings suggest that the VMVOT resulted in long-lasting improvements in participants’ fidelity of mand training with their child. It is important to note that participants may have received similar recommendations for teaching mands from their child’s teachers or other service providers during the break between intervention and maintenance probes; however, we ensured that they did not have access to our specific video model, and none received ABA therapy provided by a qualified practitioner during that time.

Finally, we measured the social significance of our goals, procedures, and results with posttraining surveys. Two participants responded favorably (ratings of “agree” or “strongly agree”) to all social validity items. These results suggest that, overall, the participants perceived that the components of the intervention were socially significant. One participant did not complete the survey. Readers should note that these findings may be limited to caregivers whose children are just beginning to acquire a manding repertoire, for whom teaching single-word mands during play is more appropriate. Because the VMVOT was designed with feasibility in mind, we hypothesize that caregivers of newly diagnosed children or those on treatment waiting lists who are not yet manding and exhibit little to no challenging behavior could view a brief video model (10 min) at their convenience and conduct mand training in their natural environment.

Limitations and Implications for Future Research

There are limitations to our findings. First, although a second observer indicated the researcher implemented the procedures as designed, we did not collect interobserver agreement on procedural fidelity. Such a measure would increase the trustworthiness of our procedures and contribute to advances in determining which interventions are effective. Second, all children were receiving preschool educational services and/or speech therapy during the study; thus, it is possible that children and mothers had previous exposure to some or all of the components of mand training, as evidenced by some fidelity in baseline. Melissa reported that she had been following the speech therapist’s recommendations to provide a model before delivering preferred items, although she was not instructed to make delivery of the reinforcer contingent upon her child’s response, nor had she been trained to conduct mand training with her son.

An additional limitation is that we did not conduct a component analysis of VMVOT; thus, we cannot draw conclusions about the separate effects of the video model, voice-over, or on-screen text. Future research might evaluate which component contributes to participants’ acquisition and fidelity of mand training. A component analysis would provide evidence of which components are necessary to demonstrate a basic effect, and therefore could guide the development of similar interventions to teach other skills.

A fourth limitation is that we did not teach caregivers how to respond to challenging behavior. Consistent with Douglas et al. (2018) and Suberman and Cividini-Motta (2020), who anecdotally reported that some participants engaged in minor forms of challenging behavior during mand training, all our participants engaged in some degree of challenging behavior during at least one session that may have interfered with caregiver fidelity and the percent of independent manding. Without training to address this inevitable side effect of increasing response effort and making access to reinforcers contingent upon manding, caregivers and children may find communication training aversive and discontinue the intervention. However, it is important to consider whether the risk of minor challenging behavior outweighs no intervention at all.

We recommend several directions for future research on teaching caregivers to implement mand training interventions without researcher or practitioner mediation. As seen with previous research across interventions (e.g., Douglas et al., 2018; Penney & Schwartz, 2019), despite a functional relation being demonstrated with caregivers, not all children displayed immediate or large increases in the dependent variable. Our effects may be viewed as socially significant when weighed against no progress due to inaccessible caregiver training or mand intervention, particularly given that the total mand instruction time ranged from 1 to 2 hr (i.e., 6 sessions for Daniel and 12 sessions for Alex). Perhaps future research can evaluate caregiver fidelity and child mands by making mastery contingent upon increases in the child’s behavior or extend the intervention phase, considering that mand acquisition may require thousands of opportunities (Sundberg & Partington, 1998). Participants in our study had an average of 15 mand opportunities per 10-min session (1.5 per min) during the intervention phase (an increase from an average of seven per session in baseline), and sessions only occurred one to two times per week. We might expect findings similar to Drasgow et al. (1996), whose participant required up to 173 mand opportunities across 34 days before she spontaneously manded for a target item. Additional consideration should be given to the possibility that teaching mands concurrently may also influence acquisition latency.
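The arithmetic behind this comparison can be sketched as follows. This is a rough illustration only: it assumes that every session yields the reported mean of 15 opportunities and that the 173-opportunity figure from Drasgow et al. (1996) would transfer to our participants, neither of which the data establish. The function names are hypothetical.

```python
import math

OPPORTUNITIES_PER_SESSION = 15  # reported mean during the intervention phase
SESSION_MINUTES = 10            # session length reported in the study
TARGET_OPPORTUNITIES = 173      # upper figure reported by Drasgow et al. (1996)


def sessions_needed(target: int, per_session: int) -> int:
    """Whole sessions required to reach `target` opportunities."""
    return math.ceil(target / per_session)


def weeks_needed(sessions: int, sessions_per_week: int) -> int:
    """Whole weeks required at a fixed number of sessions per week."""
    return math.ceil(sessions / sessions_per_week)


rate_per_min = OPPORTUNITIES_PER_SESSION / SESSION_MINUTES
sessions = sessions_needed(TARGET_OPPORTUNITIES, OPPORTUNITIES_PER_SESSION)

print(rate_per_min)               # 1.5
print(sessions)                   # 12
print(weeks_needed(sessions, 1))  # 12 weeks at one session per week
print(weeks_needed(sessions, 2))  # 6 weeks at two sessions per week
```

Under these assumptions, roughly 12 sessions would be needed to reach 173 opportunities, which is at the upper end of the number of intervention sessions reported here.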

Second, when minor challenging behaviors occurred, the researcher instructed the participants to do their best to conduct mand training. Future research should consider teaching caregivers specific strategies for addressing challenging behavior because attempts to prompt mands in this context may result in mands that are emitted under faulty stimulus control. In other words, although the child might emit the vocal response, it may be maintained by negative reinforcement rather than function as a mand for the present item. Procedures that include steps for teaching caregivers to manage challenging behavior are particularly relevant because making preferred items contingent upon mands is likely to evoke problem behavior with individuals with an unsophisticated communicative repertoire. We would be remiss if we did not acknowledge the potential for the parent or play routine being established as a reflexive conditioned motivating operation (Michael, 2000); however, this is the risk in any mand training procedure for which the reinforcer is contingent upon the mand. On the other hand, it is possible that for some individuals, access to highly preferred activities with the parent may establish the play routines as conditioned reinforcement. These antithetical explanations are understudied, and future research might investigate how problem behavior influences acquisition of mands.

Third, the researcher was present during all sessions. Although she did not provide instructions, prompts, or feedback, it is possible that the presence of the researcher served as a discriminative stimulus for conducting mand training. Future research might examine the effects of VMVOT in the absence of a trained professional to truly examine whether it is feasible for families to use independently.

Fourth, we taught participants to rely on behavioral indication to determine whether a mand should be prompted, but the presence of the item may have functioned as a discriminative stimulus for a tact; therefore, future research should evaluate the effect of caregiver fidelity on children’s manding when items are not present to ensure that the responses are not multiply controlled. An alternative might be to conduct a functional analysis of the response to determine if the child is manding (Lerman et al., 2005). Another point related to motivation is that we did not teach caregivers how to identify potential target activities through preference assessments. Future research might consider this when evaluating all the necessary skills a caregiver must have to effectively conduct mand training with little practitioner mediation.

One important area for future investigation should be how caregivers who are not receiving consultative services can gain feedback on their implementation. Perhaps embedding knowledge checks into the video model can promote acquisition; however, this may extend the length of the video, thus contradicting the intended purpose of making training efficient and feasible for families without access to ABA services. Another consideration might be to encourage caregivers to record themselves implementing the procedures and monitor their behavior rather than use a checklist during implementation, as in McCulloch and Noonan (2013). The use of a self-monitoring checklist or referencing training notes (Martocchio & Rosales, 2017) while conducting mand training may remedy the problem with receiving feedback but may also interfere with the natural flow of instruction (i.e., the child may lose motivation while the caregiver is collecting data or referencing the checklist). In addition, it was not clear whether the self-monitoring checklist or online training videos were responsible for the change in implementers’ fidelity (Martocchio & Rosales, 2017; McCulloch & Noonan, 2013). On the other hand, video self-monitoring may allow caregivers to identify steps for which they need additional training and subsequently fast-forward the video to the respective model. Again, this may reduce the feasibility of using video modeling to train caregivers, but the benefit of improved fidelity and increases in child manding may counteract the additional response effort imposed on the caregiver.

We did not gather data on how the participants engaged with the video across sessions (e.g., how many times they rewatched specific clips). It may be noteworthy for future research to evaluate whether there is correspondence between engagement and fidelity. In addition, our video model did not include nonexemplars because we believed it may have been detrimental to acquisition given that sessions were brief and occurred one to two times per week. Future research might evaluate whether the inclusion of nonexemplars enhances discrimination. Finally, all families identified as Black and varied in socioeconomic backgrounds and education. The range of education and socioeconomic status may increase the study’s external validity and suggest that families of diverse income and education can learn to conduct mand training; however, these effects should be evaluated across families of other ethnicities and of other socioeconomic and educational backgrounds.

Conclusion

This study provides evidence that a brief VMVOT intervention, implemented without trainer involvement, enabled mothers of three boys with ASD to implement a mand training intervention with relatively high fidelity, corresponding to small gains in some children, while receiving less than 1 hr of instruction per week. Although continued research on the efficacy of VMVOT as a standalone training for caregivers is warranted, these findings suggest that VMVOT is a viable option for feasibly disseminating interventions to families who are not receiving services, and some therapeutic benefits can be achieved for children. Future research should evaluate methods for training caregivers to manage challenging behavior, assess the feasibility of caregivers viewing the model and implementing the intervention in the absence of a trainer, teach caregivers to identify and remediate their errors through repeated access to the video, and evaluate whether the results generalize to other caregivers (e.g., fathers).