Augmentative and alternative communication (AAC) systems have been commonly used to enable functional communication in children with developmental disabilities (DD) who have limited communication skills (Ganz et al. 2012, 2013). Indeed, there have been successful demonstrations of using various types of AAC systems to enable functional communication in such children via AAC modes, such as manual signing (MS; e.g., Layton and Watson 1995), picture-communication systems (PCSs; e.g., Charlop-Christy et al. 2002), and speech-generating devices (SGDs; e.g., Millar et al. 2006). SGDs are electronic devices that produce a synthetic or digitized speech output of words, phrases, or sentences upon the activation of graphic symbols. One of the potential advantages of SGDs, relative to other AAC methods, is that these devices produce speech output that is akin to natural speech production and which thus might be easier for listeners to interpret.

Schlosser and Wendt (2008) conducted a systematic review of the literature and found that aided AAC interventions were unlikely to impede natural speech development and were effective at increasing vocal speech in some individuals with disabilities. Several factors might play a role in the effectiveness of SGDs in speech development in some individuals with DD who use AAC methods. From a behavioral perspective, automatic reinforcement could be a contributing factor (Mirenda 2003). The simultaneous presentation of graphic symbols with the immediate and consistent model of speech output, followed by positive socially mediated reinforcement, could increase the use of the AAC mode and also induce corresponding natural speech production via a process such as modeling/imitation. For example, in the Kasari et al. (2014) study, children with autism spectrum disorder (ASD) and a minimal vocal repertoire showed significant improvement in their spoken language after an SGD was introduced with a naturalistic behavior intervention, which entailed providing language models and encouraging vocal imitation through play contexts. Of course, any such facilitative mechanism would seem to depend on the child possessing prerequisite vocal imitation skills. In line with this hypothesis, several studies have demonstrated that participants who could not vocally imitate sounds or speech did not exhibit improvement in their speech development compared to others with such skills (Ganz et al. 2008; Gevarter et al. 2016).

Only a few studies have investigated the impact of synthetic speech output on vocal speech in individuals with DD. Romski et al. (2010) used random assignment to compare augmented and natural speech interventions on the development of language skills in young children with DD. These authors found that communication interventions that included SGDs were superior to the natural speech-only interventions in the enhancement of language performance. Similarly, Kasari et al. (2014) conducted a longitudinal study that aimed to determine the effectiveness of adding SGDs to joint attention and milieu teaching procedures in developing natural speech production in young children with ASD. In this study, the participants exhibited significant gains in their spontaneous vocal language.

Moreover, other researchers have manipulated the availability of SGDs or speech output to determine the effects of the auditory stimuli from the SGD on the production of natural speech (Roche et al. 2014; Schlosser et al. 2007). Roche et al. (2014), for example, examined the effects of SGDs on natural speech production in children with DD with limited social-communication skills. The investigators used systematic instruction (e.g., least-to-most prompting, constant-time delay, differential reinforcement) to teach participants to use an iPad to mand for preferred items and also examined whether the participants produced any natural speech. The findings indicated that the participants’ vocal rates were higher after learning to use the SGD to mand for preferred items. In a similar study by Schlosser et al. (2007), the investigators determined the effectiveness of teaching augmented mands using SGDs with and without speech output in children with ASD. The instructional strategies that were used in both conditions were least-to-most prompting, delayed prompting, and constant-time delay. These authors reported that some participants exhibited improvements in their augmented mands and vocalizations when the speech output was available. Overall, the results of these studies suggest that speech output could facilitate natural speech production, depending on how SGD use is taught.

In terms of teaching approaches, several studies have evaluated the effects of behavior intervention packages for teaching both augmented and non-augmented language skills to children with DD (Cagliani et al. 2017; Carbone et al. 2010; Gevarter and Horan 2018; Gevarter et al. 2016; Sigafoos et al. 2011; Tincani 2004; Tincani et al. 2006). For example, results of a study by Carbone et al. (2010) support the use of prompt delay and echoic prompting along with MS during mand training for the development of vocalizations in children with DD. Furthermore, incorporating delayed reinforcement and prompt fading during training with the Picture Exchange Communication System (PECS) has been reported to increase vocalizations (Cagliani et al. 2017; Tincani 2004; Tincani et al. 2006). Tincani (2004) and Tincani et al. (2006) suggested that a 3–4-s delay to access a preferred item contingent on a picture exchange may lead to the development of intelligible word and approximate vocalizations. Similar results have also been reported by Cagliani et al. (2017). Cagliani et al. suggested that the use of delayed reinforcement during PECS training increased vocalizations and picture exchange to mand for preferred items in three participants with DD.

In addition to the empirical results indicating the effectiveness of combining a behavior intervention package with no-to-low-tech AAC, a few studies have replicated those procedures during an SGD-based intervention. Sigafoos et al. (2011) manipulated the presence and the length of speech output to determine effects on augmented mands and on natural speech production by children with DD. The study was implemented in two phases. In Phase I, the participant’s response was reinforced by giving access to requested items when augmented mands occurred under the three different output conditions (i.e., short message, long message, and no speech output). In Phase II, extinction was implemented in a randomly selected condition to determine whether extinction would increase vocalizations via the effect of an extinction burst. The results revealed that placing augmented mands on extinction (i.e., withholding reinforcement) increased the participants’ vocalizations. However, these preliminary results need to be further investigated to see if the effects of synthetic speech output on the development of speech production by children with DD can be replicated.

Along these lines, Gevarter et al. (2016) explored the extent to which children with autism and limited vocal speech would emit spontaneous vocalizations while using an iPad loaded with the Go Talk Now™ application for manding. The researchers implemented a reinforcement delay and differential reinforcement of vocalizations with four children with autism between the ages of 4 and 7 years. The children had mastered the ability to use the iPad to mand before the study. The children were taught to activate a single symbol to access a preferred item. Two of the four children needed additional echoic prompts to increase their target vocalizations. The results indicated that applying echoic prompts in combination with SGDs increased vocalizations in some children with autism and vocal imitation skills.

In another relevant study, Gevarter and Horan (2018) evaluated the effects of applying delayed and differential reinforcement (Phase I) and echoic prompts (Phase II) during the implementation of SGD-based instruction in the acquisition of vocalizations in children with ASD. A total of six participants with varying degrees of vocal imitation skills and AAC experience with picture-based communication systems and SGDs learned to emit both vocalizations and SGD-based responses during mand training. The results of Phase I indicated that delaying access to the desired item after activating the corresponding symbol and differentially reinforcing independent responses increased the vocalizations of three participants. For Phase II, the addition of echoic prompting with the previous behavior procedures improved vocalization rates and close approximations in five out of the six participants. Furthermore, vocalizations continued to occur when the SGD was removed for five participants during generalization.

The results of the Gevarter et al. (2016) and Gevarter and Horan (2018) studies provide further support for the use of behavior intervention packages and SGD-based interventions for increasing natural speech production. Even though these results are encouraging, there is still a need to determine the effects of combining both interventions for children with no to minimal prior exposure to SGDs. Because prior studies focused on single-step manding (activating a single symbol to request a single item), it is unknown whether increasing the response effort when using the SGD for manding influences the rate of vocalizations. In addition, some participants in both studies (Gevarter et al. 2016; Gevarter and Horan 2018) were introduced to echoic prompting within a behavior package; the current study, in contrast, aimed to investigate the effects of applying echoic prompting to all participants from the onset of the intervention package. Another important research question to consider is whether early exposure to both synthetic speech output and echoic prompting from the onset of the intervention would lead to an immediate change in the level of vocalizations. Hence, the purpose of the current study was to combine a behavior intervention package (Gevarter et al. 2016; Gevarter and Horan 2018) with SGD removal (Roche et al. 2014) during aided AAC instruction to teach augmented and vocal mands to three children with DD, limited vocal imitation skills, and no prior exposure to SGDs. This study aimed to address the following questions:

  1. What are the effects of a behavioral intervention package (least-to-most prompting, echoic prompting, progressive time delay, and differential reinforcement) on augmented and vocal mands?

  2. If the behavioral intervention package did not lead to mastery of vocal manding, would subsequent removal of the SGD increase vocal manding?

  3. To what extent do vocal mands maintain in the absence of synthetic speech output?

  4. To what extent do both augmented and vocal mands generalize across teachers in the absence of synthetic speech output?

Method

Participants

Three children participated in this study. Table 1 presents information regarding the participants’ communication characteristics. The participants met the following inclusion criteria: (a) an age between 4 and 8 years old, (b) a diagnosis of ASD and/or DD, (c) an absent or weak mand repertoire based on the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP; Sundberg 2008) Barriers Assessment, and (d) a score between 1 and 20 on Group 1 (simple reduplicated syllables) on the Early Echoic Skills Assessment (EESA; Esch 2008). The second author followed the EESA protocol to assess participants’ skills 1 week prior to baseline.

Table 1 Participants’ communication characteristics

Jackson was an 8-year-old African-American male with a developmental disability whose first language was English. Based on EESA and classroom observations, Jackson was able to imitate one-syllable words (e.g., bye). Because his speech was unintelligible, Jackson would point to things or engage in challenging behavior (i.e., cry, scream) as a means of communication. Jackson received speech therapy 30 min a week. He had no history of using any form of communication device. Jackson used an iPad to play educational games in the classroom.

Carlos was a 5-year-old Latino-American male with a developmental disability whose first language was Spanish. Carlos was observed to be non-vocal. When asked to echo simple and reduplicated syllables (e.g., wow, oo) from the EESA, he did not emit any vocalizations. Carlos communicated through pointing to things or using two manual signs (i.e., please, eat). Carlos received speech therapy 30 min a week that consisted of teaching him manual signs. Carlos had no history of using any form of communication device. Carlos would sometimes use an iPad to play educational games in the classroom.

Sarah was a 6-year-old Caucasian-American female with a medical diagnosis of ASD whose first language was English. She was also diagnosed with DD and sensory integration disorder. At birth, she was diagnosed with fetal alcohol syndrome. Sarah was able to imitate one-syllable words. She was also observed saying two-syllable words or phrases; however, these were inconsistent and irrelevant to the context (e.g., it’s mine, snowman, at you). As shown in Table 1, Sarah’s score on the EESA was 3, which may not represent her actual vocal abilities. Sarah often engaged in noncompliance, and because emitting echoics during the EESA was not reinforced (see EESA protocol), Sarah’s performance on the instrument was low. Overall, Sarah rarely communicated her needs and wants. She would scream “no” when she refused to do what she was asked (e.g., checking her visual schedule). Sarah received speech therapy 30 min a week. She had no history of using any form of communication device. Sarah was allowed to use an iPad to play educational games in the classroom; however, she did not like to play.

Setting

The study took place in a self-contained classroom for students with ASD/DD in a rural public school located in the southeastern region of the United States of America. Baseline, intervention, and follow-up sessions took place in the classroom’s independent work area. The interventionist set up the area to have two desks adjacent to each other (to have more room to display the items) and two chairs. The participant sat to the interventionist’s left side. The independent work area was screened by a partition. Other students were not allowed to be in the area during the sessions to avoid potential interruptions. The generalization sessions took place in the small group area of the classroom. The table was U-shaped, and the participant sat to either the teacher’s left or right side. One to two students were allowed to be at the same table at the same time (working on his/her literacy or math tasks). Other students were instructed to be in their areas in accordance with their individualized daily schedules. The sessions were 5 to 10 min long, conducted once or twice per day, three to four times per week, over the course of 17 weeks. The sessions were videotaped to aid recording of interobserver agreement and procedural fidelity. The second author, a third-year doctoral student in Special Education at a local university, collected data during preference assessments, baseline, intervention, and follow-up sessions.

Materials

An iPad (version 11.2.2), used as a dedicated SGD, was loaded with the GoTalk Now application (The Attainment Company 2017). GoTalk Now is an AAC application that allows customization based on the user’s communication level and interests. The application generates a synthetic speech output (e.g., “popcorn”) upon touching a symbol that corresponds to a preferred item. The GoTalk Now application was placed on the iPad’s homepage. The homepage displayed no other applications. When a participant selected the GoTalk Now application, it led to the display screen that contained symbols of the participant’s preferred items. The display screen on the GoTalk Now application was customized for each participant to display nine symbols, at least three of which represented highly preferred items; the rest represented neutral items. Each symbol contained a real photograph of the item and a text caption that labeled the item. Pressing any of the nine symbols activated the speech output. The locations of the symbols were randomly rotated prior to each session to prevent position bias. In addition, a stylus was provided only for Sarah because of her fine motor difficulties.

Experimental Design

A multiple-probe design across participants (Horner and Baer 1978) was used to examine the effects of using the GoTalk Now application and a behavior intervention package on augmented and vocal manding. All three participants began baseline at the same time. After a stable baseline data path was achieved, the intervention was introduced to Jackson first due to his challenging behavior. After Jackson’s intervention data showed a stable path and an increasing trend, the intervention was introduced to Carlos. The same procedure was followed for Sarah. The participants had to achieve the mastery criterion of 80% or more across three consecutive sessions on augmented and vocal manding to transition to generalization.
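The mastery criterion lends itself to a small worked check. The following is a minimal sketch, assuming 10 trials per session (as described under Baseline) and assuming that both response types must reach the criterion within the same three consecutive sessions; the function name and the session counts are hypothetical and are not the study's data.

```python
def first_mastery_session(augmented, vocal, trials_per_session=10,
                          criterion_percent=80, run_length=3):
    """Return the 1-based session number at which mastery is first met.

    Mastery here means `run_length` consecutive sessions in which BOTH
    augmented and vocal mands are at or above `criterion_percent`.
    Returns None if the criterion is never met.
    """
    run = 0  # length of the current streak of qualifying sessions
    for session, (a, v) in enumerate(zip(augmented, vocal), start=1):
        aug_pct = 100 * a / trials_per_session
        voc_pct = 100 * v / trials_per_session
        if aug_pct >= criterion_percent and voc_pct >= criterion_percent:
            run += 1
        else:
            run = 0  # streak broken, start over
        if run == run_length:
            return session
    return None


# Hypothetical counts of independent, accurate mands per session.
augmented_counts = [3, 8, 9, 10, 10]
vocal_counts = [0, 5, 8, 9, 8]
print(first_mastery_session(augmented_counts, vocal_counts))
```

Under these assumptions, a single session below 80% on either response type resets the count of consecutive sessions.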

Dependent Variables and Data Collection

The number of independent (i.e., without prompts) and accurate (i.e., symbol selected and vocalizations produced matched the item selected) augmented and vocal mands was recorded in each session. Augmented manding was defined as activating (a) the GoTalk Now icon on the iPad’s home page, and (b) the symbol that corresponded to the selected preferred item within 5 s of the verbal cue (i.e., pick one). Two criteria had to be met for a response to be recorded as correct: (a) pressing the icon or the symbol one time with enough pressure to produce the synthetic speech output, and (b) not touching any other part of the iPad within 3 s of producing the synthetic speech output. Vocal manding for Jackson and Sarah was defined as vocalizing the same sounds and number of syllables as the word of the selected item (e.g., saying “cookies” for cookie) either before or within 5 s after the synthetic speech output. For Carlos, vocal manding was defined as vocalizing an approximation of at least one sound or syllable of the selected item (e.g., “buh” for balloon) either before or within 5 s after the synthetic speech output.

Procedures

Stimulus Preference Assessments

Preference assessments were conducted in two stages: indirect and direct. First, the classroom teacher was asked to identify the top five preferred items for each participant. Five preferred toys that were reported by the teacher were then assessed using a multiple stimulus without replacement procedure (MSWO; DeLeon and Iwata 1996). All five items were presented on a table in front of the participant. Then, the participant was instructed to pick one item. When the participant pointed to, reached for, or picked an item, the remaining items were removed and the selected item was recorded. The participant was allowed to play with the toy, or watch a video, for 20 s. The aforementioned procedure was continued until all items were selected or not selected. The order in which the items were chosen was recorded. This assessment was conducted four times across four non-consecutive days. Furthermore, through naturalistic observations, the interventionist observed the participants consume edibles provided by the teacher several times. The preferred items that were included in the sessions consisted of music videos, toys, and edibles. Based on the results of preference assessments, three items were selected for each participant. Items selected for each participant did not have to be from each of the aforementioned categories. In other words, a participant could have three preferred items from a single category (e.g., edibles). For Carlos, the labels of selected items were either one or two syllables (i.e., popcorn, candy, and ball). For Jackson and Sarah, the labels were either two or three syllables (e.g., Look at Me, Baby Shark). Labels of the selected items included some sounds in children’s repertoire except for Carlos who had not been observed to produce any vocal sounds.

Baseline

The iPad was turned on with the GoTalk Now icon shown on the display screen and placed within the participant’s reach. Three to four preferred items were placed on a table but out of the participant’s reach. Each session consisted of 10 trials, and each trial commenced by instructing the participant to pick one preferred item. When the participant attempted to reach for an item, the interventionist held the item in hand and waited 5 s for an augmented and/or vocal mand to occur. When augmented and/or vocal mands occurred independently and accurately within the 5 s interval, the participant was given the requested item for 30 s. In cases where augmented and/or vocal mands occurred independently before the participant tried to reach for an item, the interventionist instructed the participant to take it (e.g., “OK, you can have it”). When the participant did not mand using the iPad or vocalizations, the participant was given the item s/he was trying to reach for 30 s. If the participant did not play with the item or consume the edible within 5 s, the trial was redone. The session was terminated if the participant refused to participate by not reaching for an item or by getting up off the seat and leaving/trying to leave the designated area. No prompts were provided in this phase.

A pre-generalization session was conducted by the classroom lead teacher, in a different setting in the same classroom, to determine if the participant would use the iPad to mand or vocalize to access preferred items with a different communication partner. Prior to the session, the interventionist trained the teacher to follow the procedural steps by providing correct and incorrect examples of target responses as well as role-playing the procedural steps. The training mastery criterion was 100%; that is, the teacher had to demonstrate all steps with 100% fidelity in training.

Intervention

The procedure was similar to the baseline, except that systematic instruction (i.e., least-to-most prompting, progressive-time delay, differential reinforcement) was used to teach augmented and vocal mands. Progressive time delay was implemented by delaying the delivery of the prompts by 1 s in the first two trials and adding an additional 1 s in the subsequent set of trials until reaching a terminal delay of 5 s. Further, least-to-most prompting (i.e., verbal, gesture, physical) was implemented by providing verbal prompts (i.e., instructions to activate the symbols) when no augmented mands occurred during the 1–5 s interval. When the participant did not respond to the verbal prompt within the timeframe, a gestural prompt was used by pointing to certain symbols to mand for preferred items. If no response occurred, the intrusiveness of the prompt increased by providing physical prompting (e.g., gently guiding the participant’s hand to activate the symbols).
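The progressive time delay schedule can be written out explicitly. The sketch below assumes that each "set of trials" is two trials long, matching the first two trials at 1 s; the text does not state the set size for later trials, so that parameter and the function name are hypothetical.

```python
def prompt_delay_seconds(trial_number, trials_per_set=2, start_delay=1,
                         increment=1, terminal_delay=5):
    """Delay (in seconds) before a prompt is delivered on a given trial.

    Assumption: the delay grows by `increment` after every
    `trials_per_set` trials and is capped at `terminal_delay`.
    """
    if trial_number < 1:
        raise ValueError("trial_number must be 1 or greater")
    sets_completed = (trial_number - 1) // trials_per_set
    return min(start_delay + sets_completed * increment, terminal_delay)


# Delay for each of the 10 trials in a session under these assumptions.
print([prompt_delay_seconds(t) for t in range(1, 11)])
```

With a two-trial set, the terminal delay of 5 s would be reached on the ninth trial of a 10-trial session; a different set size would shift that point.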

When incorrect augmented mands occurred, the interventionist implemented an error correction procedure by (a) blocking access to the tablet and the preferred items, (b) pointing to the correct symbol and verbally labeling it, (c) providing least-to-most prompting, and (d) allowing access to the preferred item after the participant selected the correct symbol.

For vocal mands, the interventionist provided an echoic prompt (e.g., “candy”) when no vocalizations occurred either before or within 5 s of the synthetic speech output. If the participant did not vocally mand within the 1–5 s interval, the interventionist repeated the prompt one more time. If the participant still did not respond, the interventionist provided a distractor trial, which consisted of a single-step motor imitation direction (e.g., “touch your shoulders”), and then delivered the desired item.

Differential reinforcement was implemented by giving the participant 30 s access to the requested toy/video, or two small pieces of the requested edible when both independent and accurate augmented and vocal mands occurred. If either or both responses were prompted or incorrect, the child was given 15 s access to the requested toy/video, or one small piece of the requested edible. Further, verbal praise and social reinforcement were also given to the participant for independent and accurate responses.
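The differential reinforcement rule amounts to a two-level decision. The following is a minimal sketch of that rule as described above; the function name, the item-type labels, and the return format are hypothetical, and praise is left out because the text does not specify how it was handled when only one response was independent.

```python
def reinforcer_magnitude(augmented_ok, vocal_ok, item_type):
    """Return (amount, unit) of reinforcement for one trial.

    `augmented_ok` and `vocal_ok` are True only when the response was
    both independent and accurate. `item_type` is "toy_video" or
    "edible" (hypothetical labels).
    """
    if item_type not in ("toy_video", "edible"):
        raise ValueError("item_type must be 'toy_video' or 'edible'")
    both_ok = augmented_ok and vocal_ok
    if item_type == "toy_video":
        # 30 s access when both mands qualify, otherwise 15 s.
        return (30 if both_ok else 15, "seconds")
    # Two small pieces when both mands qualify, otherwise one.
    return (2 if both_ok else 1, "pieces")


print(reinforcer_magnitude(True, True, "toy_video"))
print(reinforcer_magnitude(True, False, "edible"))
```

The larger magnitude is reserved for trials in which both the augmented and the vocal mand were independent and accurate; any prompted or incorrect response yields the smaller magnitude.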

Participants transitioned to the next phase when they (a) did not make progress in vocal mands after criteria for augmented mands had been reached, and (b) had an average of 60% or less in independent and accurate vocal mands across 10 sessions.

iPad Removal

This phase was identical to intervention except that the iPad was made unavailable. For correct and independent vocalizations, the participant was given 30 s access to the requested toy/video, or two small pieces of the requested edible. Praise was also given to the participant for independent vocalizations (e.g., “Good job, you said ball.”). For prompted vocalizations, the participant was given 15 s access to the requested toy/video, or one small piece of the requested edible. This phase was added to promote the participants’ vocal manding.

iPad Reintroduction

This phase was added to determine if the vocal manding would maintain after the reintroduction of the iPad. Procedures for this phase were identical to the intervention.

Stylus (Sarah)

Because Sarah had engaged in noncompliance by refusing to place her index finger on an icon for two consecutive intervention sessions, a stylus was added to support her fine motor need. The stylus remained in use throughout the rest of the phases.

Response Blocking (Sarah)

Response blocking was introduced for Sarah because her fine motor issues caused her to activate symbols inaccurately (i.e., touching a symbol multiple times). Response blocking was implemented by placing a hand on Sarah’s hand and gently holding it off the screen to prevent repeated tapping. The response blocking was gradually faded to increase independent and accurate augmented mands.

Generalization

Generalization sessions were conducted by the classroom teacher who ran the pre-generalization session during baseline. The sessions took place in the small group area in which the teacher conducted small-group instruction. Generalization started at least 1 week after the last intervention session. The procedures were identical to baseline except that the speech output was off. This was done to determine if the vocal mands could occur without the control of the synthetic speech output. The same preferred items used in baseline and intervention were used in generalization. The generalization phase started 1 week after the termination of intervention for Jackson, and 3 weeks after the termination of intervention for both Carlos and Sarah (due to a school break).

Follow-Up

Follow-up sessions took place at least 2 weeks after the last generalization session. The sessions were conducted by the interventionist in the same area as baseline and intervention sessions. The procedures were identical to baseline except that the speech output was off. The same preferred items used in baseline and intervention were used in follow-up.

Interobserver Agreement

Interobserver agreement was determined by comparing the interventionist’s data to the data extracted from the videotapes by an independent observer (first author). Thirty percent of randomly selected videotaped baseline and intervention sessions were coded. Interobserver agreement was calculated using a trial-by-trial formula: the number of agreements was divided by the total number of agreements and disagreements, and the result was multiplied by 100 (Kazdin 1982). A mean of 100% agreement was recorded for both augmented and vocal mands during baseline. During intervention, the reliability averaged 93% (SD = 1.5; range, 70–100%) for augmented mands, and 96% (SD = 4.2; range, 70–100%) for vocal mands.
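The trial-by-trial agreement formula can be illustrated with a short calculation. This is a minimal sketch with hypothetical trial records (1 = mand scored as independent and accurate, 0 = not); the function name and the data are illustrative only.

```python
def trial_by_trial_ioa(primary, secondary):
    """Percentage agreement between two observers, scored trial by trial.

    agreements / (agreements + disagreements) * 100
    """
    if len(primary) != len(secondary):
        raise ValueError("both observers must score the same trials")
    if not primary:
        raise ValueError("at least one trial is required")
    agreements = sum(p == s for p, s in zip(primary, secondary))
    # Every scored trial is either an agreement or a disagreement.
    return 100 * agreements / len(primary)


# Hypothetical 10-trial session scored by two observers.
interventionist = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
independent_observer = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(trial_by_trial_ioa(interventionist, independent_observer))
```

In this hypothetical session the observers disagree on one of the 10 trials, which corresponds to 90% agreement.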

Procedural Fidelity

Procedural fidelity was assessed by the first author, who reviewed 30% of the videotaped sessions for each of the baseline and intervention phases. The first author recorded whether the interventionist implemented the procedural steps accurately by using two separate checklists for baseline and intervention. Procedural fidelity was calculated by using the following formula: the number of steps implemented correctly divided by the total number of steps, multiplied by 100. Overall, the average procedural fidelity was 100% for baseline and 93% (SD = 4.6; range, 88–100%) for intervention. The fidelity checklist can be requested from the authors.

Social Validity

The interventionist provided the teacher and teacher assistants with the Behavioral Intervention Rating Scale (BIRS; Elliott and Von Brock Trueting 1991) to evaluate their perspectives of the acceptability and the effectiveness of the intervention. The scale contains 24 items that were rated on a 6-point Likert scale (6 = strongly agree, 1 = strongly disagree). The scale assesses the social significance of the target behaviors as well as the acceptability, suitability, generalizability, and effectiveness of the intervention, and the sustainability of the outcomes.

Results

Data were examined using a combination of visual analysis and calculation of effect size between baseline and intervention. Visual analysis included an inspection of the graphed data to identify changes in level, trend, and variability (Gast and Spriggs 2010). In addition, Tau-U (Parker et al. 2011), a nonparametric method for analyzing single-case data, was calculated for each dependent variable to measure the effect of the intervention. Tau-U scores ranged from 0 to 1 and can be interpreted per the following ranges: scores greater than 0.93 indicate a strong or large effect, scores of 0.66–0.92 indicate a medium to high effect, and scores below 0.65 indicate a weak or small effect (Parker and Vannest 2009). The analysis of Tau-U scores suggests that the behavior intervention package and the iPad-based SGD were moderately effective in developing augmented and vocal mands in children with ASD and other developmental disabilities. The weighted average Tau-U for augmented mands was 0.88, with a 95% confidence interval (CI) of 0.66 to 1, which indicates that 88% of the data demonstrated improvement after the intervention. The weighted average Tau-U for vocal mands was 0.83, with a 95% CI of 0.62 to 1, indicating that 83% of the data showed improvement.
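To make the effect-size logic concrete, the sketch below computes the basic nonoverlap component that underlies Tau-U: every baseline point is compared with every intervention point. It is a simplified illustration, not the full Tau-U procedure, because it omits the optional correction for baseline trend and the weighting across participants; the function name and the data are hypothetical.

```python
def tau_nonoverlap(baseline, intervention):
    """Basic A-versus-B nonoverlap Tau (no baseline trend correction).

    Counts pairs in which the intervention point is higher (improving)
    or lower (deteriorating) than the baseline point; ties count as
    neither. Returns (improving - deteriorating) / total pairs.
    """
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    improving = 0
    deteriorating = 0
    for a in baseline:
        for b in intervention:
            if b > a:
                improving += 1
            elif b < a:
                deteriorating += 1
    return (improving - deteriorating) / (len(baseline) * len(intervention))


# Hypothetical counts of independent, accurate mands per session.
baseline_counts = [0, 0, 0]
intervention_counts = [0, 3, 8, 9]
print(tau_nonoverlap(baseline_counts, intervention_counts))
```

In this hypothetical series, 9 of the 12 baseline-intervention pairs show improvement and 3 are ties, so the index is 0.75.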

Augmented Mand

Figure 1 shows the number of independent and accurate augmented mands for all participants. None of the participants used the iPad to mand for preferred items during baseline. During the intervention, some participants met the acquisition criteria rapidly. For example, Jackson independently and accurately manded using the iPad across all preferred items on an average of nine times (SD = 2.10; range 3–10) and met the criteria in four sessions. Others required more trials to acquire the use of the iPad to mand for preferred items. Carlos manded for his preferred items on an average of six times (SD = 3.6; range 0–10) and met the acquisition criteria in eight sessions, while Sarah manded an average of five times (SD = 3.2; range 0–9), and met the criteria in 16 sessions. Visual inspection of the top panel of Fig. 1 indicates an immediate change in level compared to baseline, a steep upward trend, low variability, and no overlap. In contrast, intervention data points in the middle and bottom panels show a gradual change in level and an upward trend with high variability and overlap. When the iPad was reintroduced for all participants, the data indicate a high level, a positive trend with low to moderate variability, and no overlap.

Fig. 1 Number of independent and accurate augmented and vocal mands across participants. Note. BIP = behavior intervention package. Circle data points represent the sessions conducted by the researcher. Triangle data points represent sessions conducted by the classroom teacher.

During generalization and follow-up probes, some participants continued to perform at a high level, and the data showed low variability. Jackson and Carlos were successful in using the iPad to mand for preferred items across people and during the follow-up probes. They manded an average of nine times (SD = 0.5; range 9–10) and 10 times (SD = 0; range 10), respectively. Sarah’s performance, on the other hand, started at a high level during generalization probes, and the data then showed a decreasing trend and high variability. She used the iPad to mand an average of seven times (SD = 2.5; range 4–10). However, Sarah’s data during follow-up probes indicated a high level and an increasing trend with low variability. Her augmented mands averaged seven times (SD = 2; range 5–9).

Vocal Mand

Figure 1 illustrates that participants had no independent and accurate vocal mands during baseline. Across the intervention, participants required more trials to meet the acquisition criteria for vocal mands than for augmented mands. Jackson’s vocal mands averaged five times (SD = 3.8; range 0–10), and he met the acquisition criteria in 11 sessions. Carlos’ and Sarah’s vocal mands averaged four times (SD = 3.8; range 0–10) and five times (SD = 2.8; range 0–10), and they met the criteria in the 23rd and 28th sessions, respectively. It is worth noting that Carlos’ criteria for vocal mands entailed producing an approximation of a sound of the item’s label. He produced “ba” for ball and popcorn, and “ca” for candy. Visual analysis of Jackson’s and Carlos’ data during the intervention revealed a gradual change in level, a positive trend with high variability, and overlapping data points between baseline and intervention. For Sarah, intervention data indicate a clear change in level compared to baseline, an upward trend with high variability, and overlapping data. When the iPad was removed, the data points showed a high level, a positive trend, low to moderate variability, and little overlap across participants. When the iPad was reintroduced, all participants’ vocal mands remained at similar levels.

Visual inspection of the generalization probes in Fig. 1 revealed that the participants differed in their level of vocal manding. For Carlos and Sarah, for example, the data were stable and clustered around the low values on the y-axis; vocal mands averaged 0 (SD = 0; range 0) and one time (SD = 0.58; range 1–2), respectively. Jackson, however, was able to generalize vocal mands across different people and to maintain the acquired skill during follow-up probes; his performance averaged nine times (SD = 0.5; range 9–10). During follow-up probes, Sarah vocally manded for preferred items in a few trials; her performance averaged one time (SD = 0.5; range 1–2). Carlos’ vocal mands during follow-up probes contrasted with his performance during generalization; his vocal mands averaged four times (SD = 3; range 1–8). The data showed a high level, increased trend with no variability.

Social Validity Results

Teachers rated the intervention as acceptable and effective on the BIRS. The overall rating across all items averaged 4 (agree), with a range from 2 to 6. Teacher A’s ratings averaged 5 (agree) and ranged between 4 and 6. For teachers B and C, the ratings averaged 4 (agree), with ranges from 2 to 6 and 2 to 5, respectively.

Discussion

The results of the study suggest that the behavior intervention package, when combined with SGD-based intervention, resulted in increases in augmented and vocal manding. Specifically, all participants exhibited an increase in their use of both SGD and vocal-based manding during the intervention phase. However, the magnitude of their responses to the intervention varied. Still, these results are consistent with those of previous studies (e.g., Gevarter et al. 2016; Gevarter and Horan 2018), which indicate that combining an SGD-based intervention with behavior approach techniques may improve augmented and non-augmented communication skills in children with DD.

With regard to the augmented mands, Jackson and Carlos mastered the use of the iPad to mand more rapidly than they mastered vocal manding. A plausible explanation for this difference in outcomes is that activating symbols on the iPad required less effort than vocalizing sounds/words to access preferred items for these two participants. According to Johnston et al. (2004), individuals tend to engage in a communication form that requires less effort but still results in access to the reinforcers, a claim supported by the results of previous studies (e.g., Torelli et al. 2016) that compared the acquisition of and preference for two different aided AAC modalities. Indeed, manipulating the response effort has been shown to influence response allocation toward one communication modality over another. For example, in the Cagliani et al. (2017) study, the participants increased their use of vocal responses when they had to perform multiple steps to mand using PECS.

The synthetic speech output and the echoic prompts demonstrated a moderate effect in terms of improving vocalizations for some participants. Jackson’s and Carlos’ vocal mands were independent and accurate less than 40% of the time prior to the removal of the iPad. Sarah’s vocal mands, on the other hand, were independent and accurate roughly half of the time. A possible explanation for Sarah’s increased use of vocal mands compared to the other participants was her difficulty activating the symbols on the iPad due to her fine-motor skill deficit. After removing the iPad, all of the participants’ vocal mands increased substantially, indicating that vocalizations may in fact increase after the removal of an SGD (Roche et al. 2014; Sigafoos et al. 2011). When the iPad was reintroduced, the participants were able to maintain both vocal and augmented mands. The differential results of the occurrence of vocal mands across participants might be due to the varying levels of the participants’ vocal imitation skills. For example, even though Jackson’s ability to vocally imitate single-syllable words was stronger than that of Carlos and Sarah, he required additional trials to reach mastery. Similar results have also been found in other studies (Cagliani et al. 2017; Gevarter et al. 2016; Gevarter and Horan 2018), which suggest that strong vocal imitation skills may not be a reliable predictor of immediate changes in vocal behavior.

The findings of the current study also suggested that augmented manding generalized across the classroom teacher for Jackson and Carlos. These two participants maintained augmented manding in follow-up sessions, suggesting that the removal of the synthetic speech output did not affect their SGD-based responding. Although Sarah exhibited variability in augmented manding, her augmented manding was higher in generalization and follow-up than at baseline and slightly lower than in the intervention. One possible reason for Sarah’s lower performance in augmented manding is the absence of the auditory stimuli that would otherwise follow symbol activation, which has been reported to affect the frequency of SGD-based responding in some cases (Schlosser et al. 1995, 2007). Jackson was able to generalize and maintain his vocal responses in contexts where synthetic speech output was unavailable. However, vocal manding was lower for Sarah and substantially lower for Carlos in generalization compared to intervention, suggesting that these participants’ vocal responses may still have depended on the synthetic speech output; thus, their vocal responses in the intervention phase may be interpreted as echoics rather than mands. Nevertheless, Carlos exhibited a substantial increase in vocal manding in follow-up sessions compared to the generalization sessions.

The combination of a behavior intervention package and SGD-based intervention was effective in improving functional communication skills for all participants. Similar results were reported in studies that employed non-augmented interventions (e.g., spoken language) as well as low-tech (e.g., Charlop-Christy et al. 2002; Ganz and Simpson 2004; Greenberg et al. 2014) and high-tech AAC interventions (Kasari et al. 2014; Romski et al. 2010; Schepis et al. 1998), which suggests that AAC may facilitate the development of speech production in children who have limited-to-no vocal skills.

Overall, the results from this study extend the literature by demonstrating that adding procedures (e.g., least-to-most prompting, time delay, differential reinforcement, echoic prompts) to SGD-based interventions may enhance augmented and non-augmented functional communication skills in children with DD who have minimal vocal imitation skills. Furthermore, the participants were able to maintain their vocalizations after the removal of the iPad, suggesting that SGD-based intervention can be faded out without negatively affecting speech production.

Limitations and Directions for Future Research

The current study has a few limitations that can be addressed in future research. First, while all participants scored at level 1 on the EESA, Carlos appeared to use the least functional speech (he only said “blue, please, right here” during the study, one time each, with different adults) and primarily depended on prelinguistic communication (e.g., pointing to items). Carlos successfully made vocalizations during intervention phases that consisted of the first sound of the desired item (e.g., “ca” for candy, “ba” for ball). However, unlike the other two participants, he never said the full word for the desired item. While this was an improvement for Carlos compared to baseline, when he did not make any vocalizations, future research should include participants who have similar vocal imitation repertoires. Another important consideration would be to incorporate other tools besides the EESA to assess the vocal imitation skills of participants. As previously pointed out, emitting echoics during the EESA is not reinforced (see EESA protocol). Thus, participants may be less likely to imitate echoics during the assessment due to a lack of reinforcement and not necessarily a lack of capability.

Second, vocal manding did not maintain in generalization for Carlos and did not maintain in generalization and follow-up for Sarah. This situation may be attributed to the long time period that elapsed between the last intervention session and the first generalization session for those two participants (i.e., over 3 weeks). While this gap in time was inevitable, one option for maintaining vocal manding would have been to conduct a few generalization probes during intervention as well as after the termination of intervention. Therefore, future research should probe generalization across all phases of a study and gradually fade out speech output in the intervention phase.

In addition, generalization was probed across the classroom lead teacher only. While this study is one of the very few mand investigations to measure generalization across a communicative partner (e.g., Genc-Tosun and Kurt 2017), generalization could have also been measured across teaching assistants in the classroom or other students and staff at the school. Therefore, future research should examine the extent to which augmented manding and vocal manding of children with DD can generalize across various novel people, settings, and items.

Lastly, the ease with which the participants operated the iPad was far from uniform. Sarah demonstrated difficulty activating the symbols accurately and independently due to her deficits in fine motor skills. As a consequence, her vocal responses increased during the early intervention sessions. After modifying the access method to accommodate her motor ability, Sarah’s accuracy in selecting symbols improved substantially. Future replications should consider the participants’ motor abilities prior to introducing a high-tech SGD intervention.

Implications for Practice

The use of an iPad as an SGD may be a viable alternative for children with DD who have limited speech skills. In addition, the intervention consisting of progressive time delay, least-to-most prompting, and differential reinforcement with SGDs was effective at increasing the participants’ vocal manding. Therefore, practitioners working on promoting manding skills of children with DD who have limited speech skills may consider using an iPad as an SGD. It is worth noting that an intervention package may be necessary for a successful implementation. However, there is a need to conduct component analyses to determine which behavior-based strategy is most effective in supporting functional communication skills across different modalities. Further, practitioners should fade out the use of SGDs. The process can start with gradually limiting the availability of such devices or removing the speech output/SGDs to help promote the development of speech production. Equally important is determining a child’s preferred stimuli that can serve as reinforcers. According to Skinner (1957), a mand is a verbal operant under the control of either a deprivation condition or an aversive stimulus, which creates the establishing operation to ask for the reinforcer. Therefore, it is critically important to create an environment that contains children’s preferred stimuli that can evoke manding for reinforcers.

Conclusion

The current study aimed to investigate the effects of a behavior intervention package and an SGD on the acquisition of vocal and augmented mands for three children with DD. The results suggested that all children acquired augmented mands as well as vocal mands. However, vocal mands did not appear to generalize across the classroom teacher for two children and did not maintain for one child. Future research is critically needed to examine ways to support the generalization and maintenance of vocal manding in children with DD who use SGDs.