Introduction

Discrete-trial Teaching (DTT) is one of the main teaching procedures used in Early Intensive Behavioral Intervention (EIBI) programs for children with autism. It is employed to teach language, social, and academic skills (Leaf and McEachin 1999; Lovaas and Smith 2003; Sturmey and Fitzer 2007). In DTT, the teaching process is broken down into three-term contingency units including a clearly defined instruction, response, and consequence, followed by a pause of no more than 5 s (inter-trial interval). If a learner responds incorrectly, prompts are given. To monitor children’s progress, staff typically record all trials presented as correct, incorrect, or prompted. Although DTT is highly effective when delivered correctly, a number of studies have shown that its utility may be dramatically reduced when staff who deliver it are not adequately trained (Allen and Warzak 2000; Smith and Lovaas 1998; Symes et al. 2006).

These data indicate the importance of developing efficient training procedures to ensure that DTT can be delivered with high levels of fidelity. Such training requires considerable time and resources, especially within larger scale organizations (e.g., Perry et al. 2008). For example, to provide the recommended standard EIBI program of 30 h per week, one child typically requires an intervention team of between two and five therapists (Green 1996). To provide enough staff, many service providers enroll graduate or undergraduate students as therapists for brief periods (i.e., between 2 and 6 months) in a training practicum. Moreover, most providers experience high levels of staff turnover, not least because the work is challenging, and, in many cases, levels of remuneration are relatively low.

High turnover necessarily increases both the demand for, and importance of, efficient and effective staff training. Although some elements of such training, including grounding in the principles of Applied Behavior Analysis (ABA), can be delivered in a classroom setting, training that relates to actual implementation of teaching procedures must, at some point, be undertaken through real-life interaction with actual children with autism (i.e., in vivo; Joyce and Showers 2002). Such training may, however, be problematic, because children with autism can react negatively to new staff, especially if the latter are inexperienced and lack basic teaching and behavior management skills.

A number of researchers have focused on designing and evaluating methods of training staff to improve their implementation of DTT. For example, Koegel et al. (1977) showed that teachers can acquire DTT skills through training procedures that include the use of training manuals, direct modeling of DTT procedures, and feedback on performance. Such training, however, was reported to have taken up to 25 h to complete. In a similar study, Ryan and Hemmes (2005) used a training package consisting of structured teaching sessions, videotaped instruction, role-playing, and in vivo training, supplemented by printed materials on DTT and other autism-related issues. Because of the importance placed by these authors on the acquisition of high levels of declarative knowledge (i.e., knowledge of the principles of behavior, developmental disabilities, and professional behavior), training continued until participants achieved 100 % accuracy in written and oral quizzes. This process required between 25 and 35 sessions, each of between 1 and 2 h duration. The findings of both Koegel et al. (1977) and Ryan and Hemmes (2005) therefore suggest that achievement of high standards in staff training can potentially raise direct resource costs and lead to short-term service disruption.

Improvement in DTT implementation has, however, also been achieved through other modes of training. Sarokoff and Sturmey (2004), for example, evaluated a training package consisting of instructions, feedback, rehearsal, and modeling. Correct implementation of DTT by all participants increased from an average of 45 % at baseline to 98 %, post training. The number of sessions required to achieve this outcome was not, however, reported and only three participants were trained. It should be noted, nevertheless, that similar training procedures can produce generalization of teaching skills across teaching programs and children (Sarokoff and Sturmey 2008). Furthermore, Arnal et al. (2007) reported use of a self-instructional manual to teach four staff to apply DTT across three tasks. Subsequent to a mean of 2.2 h exposure to the training manual, participants’ mean accuracy in using DTT increased from 44 % at baseline to 67 %, but this improvement was measured in a teaching test with a role-playing confederate, rather than with a child with autism.

Although research has shown that traditional forms of training can increase the accuracy and consistency of DTT, it has also shown that such training is typically both costly and time consuming to implement. Moreover, familiarizing staff with the actual practice of DTT is difficult, not least because it requires time to supervise direct practice with children with autism and/or to role-play them. Additionally, the majority of the studies that have evaluated such procedures have not used direct measures of in vivo DTT implementation. These considerations, coupled with the high rates of staff turnover, underline the pressing need for a way of delivering practical DTT skills training without the costly involvement of trained clinicians or children with autism.

Randell et al. (2007) reported a potential solution to this problem through the use of DTkid, an interactive computer simulation developed as a training tool for tutors and carers of children with autism. Within DTkid, users interact with “SIMon”, a virtual child with autism, and can either receive real-time onscreen feedback on their actions (i.e., “teaching mode”) or simply have the accuracy with which they present discrete-trials evaluated within the program (i.e., “evaluation mode”). At present, both “object matching” and “receptive labeling” can be taught and evaluated within DTkid. Using undergraduate participants with no prior experience of behavioral interventions for autism, Randell et al. (2007) showed that DTkid training produced significant improvements in both declarative and procedural knowledge of DTT, in comparison with a control group who engaged in an unrelated interactive computer game for the same amount of time prior to assessment. Declarative knowledge was measured by the accuracy and confidence with which participants categorized video clips of correctly and incorrectly presented individual real-life DTT trials. Procedural knowledge (i.e., participants’ ability actually to perform DTT) was assessed using DTkid in evaluation mode.

Although Randell et al.’s (2007) study provided some grounds for confidence that DTkid training increased procedural knowledge, it offered no evidence that the skills learned would transfer to in vivo teaching sessions with children with autism. Thus, the present study’s primary research question was: Does DTkid training produce stimulus generalization of DTT skills? We hypothesized that simulation training would lead to an improvement in the capacity of staff to implement DTT in vivo. A second research question was: Does DTkid training also produce response generalization? We hypothesized that participants who had learned how to teach specific skills (e.g., object matching) using DTkid would show subsequent improvements in their ability to teach other skills (e.g., imitation). The final research question was: Does participants’ declarative knowledge of DTT increase as a result of DTkid training? Based on Randell et al.’s (2007) findings, we hypothesized that this would be the case.

Method

Participants

A convenience sample of 12 novice tutors (“staff participants”; 11 women, 1 man, M age = 23 years, age range: 19–28 years) who were to be employed full time at a center providing behavioral interventions in the UK took part in the present research. None had previous knowledge of behavior analysis or experience in implementing DTT or any similar techniques. One held a Master’s-level degree in psychology and six had Bachelor’s-level degrees in psychology or a related discipline. The remainder had high-school degrees and had taken courses in general child care. All were informed of the nature of the study and gave informed consent prior to participation.

The novice tutors taught three boys with autism, aged 3, 5, and 9 years during the in vivo pre- and post-tests. The boys’ diagnoses were made by a clinician independent of the present study. All children had been enrolled in intensive behavioral intervention programs at the center for between 1 and 3 years. None displayed challenging behavior that required any special procedures to be implemented. They were recruited to the study via a letter to their parents, who in turn signed an informed consent form allowing participation and videotaping for scoring purposes.

Setting

The center provided attending children with intensive behavioral intervention for 40 h per week (i.e., weekdays from 9 a.m. to 5 p.m.). The center was organized as a charity and placements were funded by local authorities on an individual basis. As part of this provision, the two oldest children participating in the study also attended a local mainstream school for approximately 10 h a week, accompanied by center staff. All training sessions took place at the center.

DTkid training sessions were conducted in an office equipped with a desk, a chair, and a laptop on which DTkid software was installed. The in vivo implementation of DTT was carried out in a therapy room at the center. The room was equipped with desks, chairs and teaching materials required for the various teaching programs.

Apparatus

The DTkid simulation was developed by Randell et al. (2007) with the support of a research grant from the Economic and Social Research Council of Great Britain. Its design was based on consultation with experienced ABA service providers who were regularly responsible for new staff training. Copyright of the application has been asserted, but it has not been marketed commercially. Those wishing to use DTkid for research purposes should contact the 5th or final author of the present study.

The DTkid software was installed on an Acer Aspire laptop with a 17 inch screen, running Windows XP Professional with peripheral mouse and headphones. A Sony DCR HC36 digital video camera positioned on a tripod was used to record all teaching sessions.

Measures

DTkid software in evaluation mode (see Appendix 1 of Randell et al. 2007) was used to assess participants’ DTT competence in both object matching and receptive labeling training within the simulated teaching environment. The Evaluation of Therapeutic Effectiveness (ETE) Scoring Sheet (adapted from Koegel et al. 1977; see Appendix 2) was used to measure participants’ competence in implementing DTT in vivo, across three teaching programs, with a child with autism. A measure of declarative knowledge of DTT developed by Randell et al. (2007) was also obtained. This Video Observation Test required participants to view standardized clips of a teacher implementing DTT, categorize her performance in each as correct or incorrect, and report on their confidence in their judgments. Implementation details of all three measures appear in the Procedure section below.

Design and Analysis

The study employed a within-subjects pre-/post-test (AB) design for all 12 participants. For five participants, however, a second pre-test was also carried out prior to DTkid simulation training (i.e., AAB design) to control for the possible impact of repeated testing and exposure to testing materials. Thus, in the control condition one-way within-subjects analyses of variance could be conducted to check for changes between the three testing periods (n = 5), followed by within-subjects contrasts to compare both pre-test one with pre-test two and pre-test two with post-test. Paired t tests were used to check for changes between pre-test one and the post-test (N = 12; the training condition). These comparisons were made for all dependent variables including percentage correct scores on (a) measures of receptive labeling, expressive labeling, and verbal imitation obtained during an in vivo DTT test; (b) measures of object matching and receptive labeling obtained with DTkid in evaluation mode; and (c) declarative knowledge and confidence in scoring on the Video Observation Test. For both pre-tests and the post-test, order of testing was randomized.
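The pre-test one versus post-test comparison described above relies on the paired-samples t statistic, that is, the mean of the difference scores divided by its standard error, with N - 1 degrees of freedom. The following is a minimal sketch using only the Python standard library. The function name `paired_t` and the example scores are hypothetical and are not data from the study; the text does not state which software was used for the reported analyses.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for a paired-samples t test on post minus pre."""
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(diffs)  # sample SD of the difference scores
    if sd == 0:
        raise ValueError("difference scores have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical percentage-correct scores for five participants.
pre = [60.0, 55.0, 62.0, 58.0, 61.0]
post = [70.0, 68.0, 66.0, 72.0, 69.0]
t, df = paired_t(pre, post)
print(f"t({df}) = {t:.2f}")
```

A two-tailed p value would then be read from the t distribution with `df` degrees of freedom, for example with a statistics package.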

Procedure

All staff participants completed all the pre-tests and post-tests and training across 1 or 2 days (i.e., between 5 and 9 h) when they first joined the center. All pre-tests and the post-test were identical. The tests were all administered at the center and included the following measures. One set of testing took approximately 1 h to complete (i.e., in vivo DTT test: 15 min; DTkid: 25 min; Video Observation Test: 15 min).

In Vivo DTT Test

Participants completed three 2.5 min sessions with a student with autism, during each of which they were instructed by the experimenter to teach the child “to the best of their ability”. Three types of teaching program (receptive labeling, expressive labeling, and verbal imitation) were selected, one for each of the sessions. Center staff selected specific content within each program that the child had not yet mastered (i.e., which objects to use and which sounds or words to present for imitation). The participants were given standardized (scripted) instructions for each program. For the receptive and expressive language programs the following instruction was given: “The child is currently working on improving his vocabulary. Could you, to the best of your ability, teach him to point to (for receptive programs)/name (for expressive programs) this new item (usually a picture of a novel item).” For the verbal imitation program the following instruction was given: “The child is currently working on imitating sounds/words. Could you to the best of your ability teach him to imitate this sound/word.” Although all programs included were considered “basic”, only the receptive labeling program was part of the DTkid simulation training. Thus, the in vivo test provided the opportunity to assess both stimulus generalization (i.e., from DTkid to in vivo teaching) and response generalization (i.e., from a familiar to a novel DTT program) in relation to teaching skills acquired using DTkid. All sessions were videotaped for scoring using the ETE scoring sheet (see Appendix 2), which evaluates five different components of DTT: instructions (5 items), prompting (3 items), shaping (1 item), consequences (8 items), and discrete-trial structure (5 items). Participants’ performance in each of the three 2.5 min programs was scored in 30 s intervals from the video recordings. In each interval, an item was scored as correct only if all trials in that interval met the relevant ETE criteria. Items not observed in the interval were categorized as not applicable. On the basis of these data, a percentage score reflecting the correct application of DTT components within each program was calculated, along with a mean score for each test.
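The interval-based scoring just described can be illustrated with a short sketch. The exact formula behind the percentage score is not given in the text, so the code assumes that items marked not applicable are excluded from the denominator and that the test score is the unweighted mean of the program percentages. The item names, score codes, and function names are hypothetical and do not reproduce the ETE sheet.

```python
from statistics import mean

# Hypothetical score codes for one item in one 30 s interval.
CORRECT, INCORRECT, NOT_APPLICABLE = "C", "I", "NA"


def program_percent_correct(intervals):
    """Percentage of applicable item scores marked correct in one program.

    `intervals` is a list of dicts mapping item name -> score code.
    Assumption: not-applicable scores are left out of the denominator.
    """
    scores = [s for interval in intervals for s in interval.values()]
    applicable = [s for s in scores if s != NOT_APPLICABLE]
    if not applicable:
        return None  # nothing scorable in this program
    return 100.0 * applicable.count(CORRECT) / len(applicable)


def mean_test_score(program_scores):
    """Unweighted mean of the program percentages, skipping empty programs."""
    valid = [p for p in program_scores if p is not None]
    return mean(valid) if valid else None


# Two hypothetical intervals from one program, three illustrative items each.
example = [
    {"clear_instruction": CORRECT, "prompt_timing": NOT_APPLICABLE,
     "immediate_consequence": INCORRECT},
    {"clear_instruction": CORRECT, "prompt_timing": CORRECT,
     "immediate_consequence": CORRECT},
]
print(program_percent_correct(example))  # 4 of 5 applicable scores correct
```

Excluding not-applicable items keeps a program from being penalized for components, such as prompting, that simply did not arise in a given interval.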

The second author scored the videotapes. Two Master’s-level students in behavior analysis from the University of Bangor were recruited to check reliability of scoring. Before the reliability testing was conducted, the students were trained to an 85 % inter-observer agreement criterion with one of the first two authors. Approximately 20 % of the sessions were randomly chosen and checked for reliability: All were above the 85 % criterion using the formula: agreements/(agreements + disagreements) × 100. The students were blind to whether they scored a pre-test or a post-test session.
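The agreement formula above can be written as a brief sketch. It assumes that agreement is counted score by score across the two observers’ records for the same session; the text does not specify the unit of comparison, and the function name and example records are hypothetical.

```python
def percent_agreement(observer_a, observer_b):
    """Inter-observer agreement: agreements / (agreements + disagreements) x 100."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same set of items")
    if not observer_a:
        raise ValueError("no scores to compare")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    disagreements = len(observer_a) - agreements
    return 100.0 * agreements / (agreements + disagreements)


# Hypothetical records from two observers scoring the same ten items.
primary = ["C", "C", "I", "NA", "C", "I", "C", "C", "C", "I"]
checker = ["C", "C", "I", "NA", "C", "C", "C", "C", "C", "I"]
ioa = percent_agreement(primary, checker)
print(ioa, ioa >= 85.0)  # compare against the 85 % criterion
```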

DTkid Evaluation Mode

DTkid did not provide any onscreen feedback to users relating to their performance, but simply recorded trials as either correct or incorrect, based on set criteria (see Appendix 1 of Randell et al. 2007). These data were used to calculate the percentage of discrete-trials performed correctly. During DTkid pre-tests, participants first received brief verbal instructions and a demonstration of the use of DTkid from the experimenter (first or second author), including how to use the mouse to select the various training stimuli, and how to select teaching instructions and feedback from the set menus (see Randell et al. 2007, for further details). No verbal instructions concerning the implementation of DTT were provided. Following familiarization with the simulation software, participants were given the opportunity to complete three practice trials with the experimenter present to help resolve any practical issues. Next, each participant worked with DTkid in both object matching and receptive labeling modes, each for 10 min.

Video Observation Test

As in Randell et al.’s (2007) measure of participants’ declarative knowledge of DTT, three practice and 24 test video clips (between 11 and 18 s long) were presented, each of a teacher implementing DTT in a program designed to teach object matching to a child with autism (role-played by a typically developing 8-year-old boy). Eleven clips showed the teacher presenting a discrete-trial correctly, and 16 clips showed the teacher presenting a discrete-trial incorrectly (see Randell et al. 2007, for a full description of error-types illustrated). Onscreen instructions informed participants to use answer booklets provided to judge each discrete-trial presented as either correct or incorrect, and, as appropriate, to add a brief description of the error-type. Participants were additionally asked to indicate their confidence in each judgment made on a scale ranging from 0 to 10 (i.e., from “not at all confident” to “very confident”). Progress through the Video Observation Test was self-paced. Participants first observed and rated all practice clips, immediately followed by the 24 test clips. Participants indicated to the experimenter when they had completed the test.

Training

Following pre-test(s), participants learned to implement DTT through interaction with DTkid simulation in teaching mode. This involved working through two teaching programs (receptive labeling and object matching), each for 10 min. Participants were verbally instructed that the procedure for using DTkid in teaching mode was the same as in evaluation mode, except that, if they made a procedural error during training, they would receive corrective onscreen feedback (see Randell et al. 2007, for full details) that explained how to correct any mistake made and which would remain on the screen until the correction was made. Only when a trial had been performed correctly, or errors made in a trial had been corrected, would the next DTkid training trial be presented.

Post-test

The post-test was conducted directly following the training, normally on the same day or the next morning.

Results

Mean scores on all study measures for all 12 staff participants, together with test statistics for pre-post comparisons, are presented in Table 1. Four paired-samples t tests were conducted on the data from all participants to establish whether, as hypothesized, there were significant improvements in in vivo DTT implementation (as measured by the ETE) between pre-test one and the post-test. Average scores pooled across the three teaching programs improved significantly, t(11) = 5.9, p < .001, reflecting improvements in DTT performance on each of the programs considered individually: receptive labeling, t(11) = 2.6, p = .023; expressive labeling, t(11) = 4.5, p < .001; and verbal imitation, t(11) = 2.8, p = .017. Likewise, there was a significant improvement between pre-test one and the post-test in participants’ DTkid evaluation mode performance (i.e., teaching within the simulation), t(11) = 6.9, p < .001, mirroring improved performance on both receptive labeling, t(11) = 6.7, p < .001, and object matching, t(11) = 6.3, p < .001. Finally, significant improvements were also seen on the two Video Observation Test measures: declarative knowledge, t(11) = 2.8, p = .016, and confidence in scoring, t(11) = 2.8, p = .019.

Table 1 Mean pre-test and post-test scores for the full sample (n = 12)

To evaluate whether the changes reported above were the result of DTkid training, or repeated evaluation, the data from the five individuals who participated in the control condition were analyzed separately. Four one-way within-subjects analyses of variance were conducted to assess changes in in vivo teaching between pre-test one, pre-test two, and the post-test. As shown in Table 2, there was a main effect of time on group average scores pooled across teaching programs for the in vivo teaching, and for each of the three individual programs (receptive labeling, expressive labeling, and verbal imitation). Within-subjects contrasts on the pooled scores showed, however, that there was no significant change between pre-tests one and two, F(1, 4) = 1.55, p = .281, ηp2 = .281, but marked improvement occurred between pre-test two and the post-test, F(1, 4) = 8.67, p = .042, ηp2 = .684. This pattern of results is reflected in the analyses of the individual measures, although some variability was observed.

Table 2 Mean scores and comparisons at each time point for the control participants (n = 5)

Results of the five control participants’ interactions with DTkid in evaluation mode showed the same pattern as for the in vivo teaching (i.e., no changes between pre-test one and pre-test two, but a significant improvement in the post-test). Again, one-way within-subjects analyses of variance were conducted to assess changes in performance with DTkid in evaluation mode between pre-test one, pre-test two, and post-test. There was a main effect of time on the pooled scores and on each of the two teaching programs tested (receptive labeling and object matching). Within-subjects contrasts on the pooled scores showed that there was no significant change between pre-tests one and two, F(1, 4) = 0.359, p = .581, ηp2 = .082, but marked improvement occurred between pre-test two and the post-test, F(1, 4) = 9.38, p = .038, ηp2 = .701. This pattern of results was also seen in each of the two DTkid teaching programs analyzed separately.

Because the overall effect of time was not significant for either declarative knowledge, F(1, 4) = 2.72, p = .126, ηp2 = .405, or confidence in scoring, F(1, 4) = 2.80, p = .120, ηp2 = .411, on the Video Observation Test for the five control participants, within-subjects contrasts on these measures were not conducted.
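The effect sizes reported with these analyses (ηp2, partial eta squared) can be recovered from an F ratio and its degrees of freedom through the standard identity ηp2 = (F × df effect)/(F × df effect + df error). The sketch below applies that identity to three of the contrasts reported above. The function name is hypothetical, and the text does not say how the published values were obtained, so small rounding differences from the reported figures are possible.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios taken from the within-subjects contrasts reported in the text.
contrasts = {
    "in vivo, pre-test two vs post-test": (8.67, 1, 4),
    "DTkid, pre-test one vs pre-test two": (0.359, 1, 4),
    "DTkid, pre-test two vs post-test": (9.38, 1, 4),
}
for label, (f_value, df1, df2) in contrasts.items():
    print(f"{label}: {partial_eta_squared(f_value, df1, df2):.3f}")
```

With only four error degrees of freedom, even a large ηp2 corresponds to a modest F ratio, which is one reason the subgroup contrasts call for cautious interpretation.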

To establish that the improvements seen for the whole group were not the result of repeated testing, the same analyses were performed for the subgroup of seven participants who did not receive a second pre-test (see Table 3). For these participants, pooled scores across the three in vivo teaching programs improved significantly consequent to DTkid training, t(6) = 3.6, p = .011, as did pooled scores on the two DTkid evaluation mode tests, t(6) = 6.6, p = .001. Enhancements in performance on each of the DTkid programs were also significant, but, despite improvements on each of the in vivo programs (generally with medium to large effect sizes), only improvement on the expressive labeling program, which had the lowest baseline score, reached significance, t(6) = 3.4, p = .015. Improvements in the Video Observation Test measures also approached significance, t(6) = 2.3, p = .061 for declarative knowledge and t(6) = 2.1, p = .084 for confidence in scoring.

Table 3 Mean pre-test and post-test scores for the participants that only had one pre-test (n = 7)

Discussion

The present study was designed to evaluate the utility of the DTkid simulation program to train staff to implement DTT in vivo with children with autism. Results indicated that simulation training on two teaching programs resulted in significant improvement in the accuracy with which participants implemented DTT within DTkid in evaluation mode, and, crucially, when working directly with a child with autism in a real-world teaching setting. These latter, in vivo, improvements were observed across three programs, even though only one of these (receptive labeling) had been specifically practiced using the DTkid simulation. In keeping with previous findings (Randell et al. 2007), results also indicated significant increases in participants’ declarative knowledge of DTT and their confidence in that knowledge consequent to DTkid training. The subgroup of participants that received only one pre-test showed similar post-test improvements to those seen in the overall group, whereas the subgroup of participants who took part in a second pre-test conducted as a control condition showed no such improvements in either DTkid evaluation mode or in vivo performance between pre-test one and pre-test two (i.e., prior to DTkid training), but did show an overall improvement at post-test. This pattern of results suggests that the improvements observed following DTkid training were not simply the result of repeated testing.

The results observed therefore indicated that, as hypothesized, DTkid training produced both stimulus and response generalization: Participants learned the teaching skills involved in object matching and receptive labeling using DTkid, and improvements were seen when these skills were tested within the simulation. It is also of considerable interest that improvements in receptive labeling observed using DTkid in evaluation mode were mirrored by improvements in participants’ ability to teach receptive labeling in vivo. This can be regarded as indicating the kind of stimulus generalization expected from any effective simulation training (e.g., pilot instruction in a flight simulator). Additionally, however, participants showed improvements in delivering two kinds of DTT programs that had not been specifically trained in the simulation—expressive labeling and verbal imitation. This type of response generalization indicates that, subsequent to DTkid training, participants were able to use the basic elements of DTT across novel situations. It thus appears that participants had acquired a broader response class of general teaching skills, learning, for example, how to present material, when to prompt, and how to administer contingent consequences for the child’s responding. Although no formal assessment of social validity was attempted, responses to the training program at debriefing were uniformly positive.

These findings therefore also extend those of previous research that used only DTkid in evaluation mode to measure the impact of simulation training (Randell et al. 2007) by showing that DTkid training can also result in the emergence of in vivo procedural knowledge (i.e., teaching skills in real-life settings).

As hypothesized, and in keeping with the findings of Randell et al. (2007), significant increases in participants’ declarative knowledge of DTT, and in their confidence in that knowledge, were observed. The magnitude of such increases was smaller in the present research, however, perhaps owing to the substantially higher pre-test scores obtained: 71.2 % correct, compared with 59.3 % for the control group participants in the Randell et al. (2007) study. This suggestion is also supported by the fact that post-test scores for the DTkid group reported in the Randell et al. (2007) study were virtually identical to those observed in the present study (80 and 79.5 % correct, respectively). The discrepancy in scores between those obtained by the control group in Randell et al. (2007) and the pre-test scores in the present study requires further explanation. It is possible that the latter scores were higher because participants recruited by the center where the work was based had chosen to work with children with autism. In other words, our sample was characteristic of individuals who would typically receive training in the use of instructional methods with children with autism.

In the light of the baseline differences in participant knowledge and confidence between Randell et al. (2007) and the present study, it is worth noting that, although our participants were not complete novices, their skills were nevertheless enhanced by DTkid training. The mean pooled in vivo score for participants prior to any training was 61 % correct (range: 56.9–63.6 %). Therefore, the significant changes seen in real-life teaching reflected improvements from a relatively high baseline, and required only very brief DTkid training (20 min in total on two basic teaching programs).

Some limitations of the present study should be noted. Firstly, use of more comprehensive methods for evaluating the quality of participants’ DTT performance would have been desirable. Although a revised version of scoring procedures developed by Koegel et al. (1977) was employed, this measure has not been standardized or validated. No alternative was available at the time that the research was carried out, however, and the measure does have the advantage of being in widespread use among behavioral clinicians working in the field of autism. Secondly, control procedures for repeated testing were limited by practical considerations. Because of this, although the subgroup of participants that received repeated pre-tests served the same purpose as a waiting list control group, the times of the first and second tests were not aligned between groups in the manner typical of larger scale studies and the data obtained could not be analyzed as a between-group design. Thus, although changes in performance following interaction with DTkid were significant, inferences of a causal role for the simulation must be made with caution until replication using standard randomized control procedures has been achieved using a larger number of participants. Despite these limitations, the present results would nevertheless appear sufficient to justify investment in the further development of DTkid as a simulation methodology and its further evaluation in controlled trials.

In summary, a simple simulation program, DTkid, has been shown to be an effective way of teaching inexperienced staff to conduct relatively simple programs with children who are familiar with DTT. DTkid’s efficacy in teaching more advanced programs and procedures is unknown because the capabilities of the software are not yet sufficiently developed. In principle, however, it should be possible to use simulations to teach a novice therapist how to conduct a teaching situation with a child who is just starting on an early intervention program, how to establish a token economy system, how to pace teaching sessions, and how to handle challenging behaviors. As yet, however, DTkid’s ability to teach these important skills awaits development.

In conclusion, the results of the present research indicate that DTkid allows some of the key elements of DTT to be taught and evaluated effectively and efficiently via short sessions of computer simulation. This offers the potential to free up resources for service providers and to eliminate, or markedly reduce, the problems that arise when children with autism are taught by novice tutors. Because simulation training is unlikely ever to cover all aspects of staff education, however, it remains essential to identify which skills can best be taught by such means, and to consider how simulation can most effectively be used alongside other evidence-based techniques such as modeling, group training, lectures, and direct feedback. In this way the amount of staff training involving real children may be reduced, and available resources targeted towards the use of other methods to establish skills that cannot be taught using simulation techniques.