Introduction

Language delays are frequently associated with autism spectrum disorders (ASD), including in individuals with average intelligence (American Psychiatric Association (APA) 2000; Nation and Norbury 2005). No more than a quarter of children with autism have language skills in the normal range (Kjelgaard and Tager-Flusberg 2001). Language development varies widely among children with ASD (Lord et al. 2004), and often includes deficits in spontaneous language, difficulty with conversational skills, delayed grammar usage, frequent use of echolalia, difficulty with social use of communication skills, or lack of spoken language (APA 2000). This variation is also found in rates of vocabulary development (Smith et al. 2007). Further, young children with ASD often score higher on non-verbal IQ tests than on verbal IQ tests, and those with average or better IQs perform poorly on auditory and short-term memory subtests (Mayes and Calhoun 2003). Children with both higher and lower IQs perform poorly on verbal comprehension assessments (Mayes and Calhoun 2003). Such language deficits point to the need for explicit instruction, such as Direct Instruction (DI), to address language skills.

Direct Instruction shares characteristics with other behavioral approaches in the following ways: (a) through task analysis, program skills and tasks are broken into component parts and taught to mastery; (b) there are sets of teacher behaviors and procedures such as the provision of instruction (model, lead, test), and immediate corrective feedback (model correct response, lead, test); (c) students engage in repeated practice with the correct response; and (d) program procedures are designed so that the learning environment and teacher behaviors set the stage for effective and efficient learning. According to Carnine et al. (2004) and Stein et al. (1997), DI comprises a set of directions for implementing instruction so that students acquire, maintain, and generalize skills, ideas, and concepts in an efficient and effective manner. There are three essential components of DI: (a) instructional design; (b) presentation techniques; and (c) organization of instruction. Instruction is designed so that the curriculum is divided into strands and students engage in learning tasks from several different strands within the same lesson. There are prescriptive presentation techniques for the teacher which vary based on the learning objective. Instruction is organized so procedures for scheduling and arrangement of materials allow students to maintain engagement throughout lessons (Carnine et al. 2004).

Little research has been conducted reporting the effects of DI on language development. Waldron-Soler et al. (2002) implemented DI Language for Learning (Engelmann and Osborn 1999) with 16 preschool children, including four with developmental delays (DD), and compared them to a control group of 20 preschoolers, including four with DD. They found that the children with DD in the Language for Learning group had more growth in receptive and expressive language skills and greater reduction in behavior problems than children in the control group. Humphries et al. (2005) investigated the use of several DI programs, including Language for Learning, with 55 children with epilepsy and significant academic deficits. The participants demonstrated significant improvement in most areas of academic instruction, including language. Language for Learning also resulted in significant gains in receptive language when implemented with typically-developing kindergarteners in comparison with a control group (Benner et al. 2002). This literature, though sparse, suggests DI is a promising practice, particularly for students who do not easily learn language skills incidentally, such as those with ASD. Direct Instruction, as noted above, is particularly suited for use with individuals with ASD who lack a considerable amount of common language concepts and who require intensive, explicit instruction to learn such skills.

DI interventions have resulted in improvements in reading skills in children with such deficits. Specifically, DI has positively affected reading decoding (Fredrick et al. 2002; Shippen et al. 2005) and reading comprehension (Carlson and Francis 2002; Flores and Ganz 2007). DI has improved reading skills in children from elementary (Carlson and Francis 2002; Humphries et al. 2005) to middle school (Grossen 2004; Shippen et al. 2005). Additionally, DI has been used successfully with children with a variety of abilities, including autism (Flores and Ganz 2007), epilepsy (Humphries et al. 2005), learning disabilities and cognitive impairments (Carlson and Francis 2002), those with limited English proficiency (Carlson and Francis 2002), and students at risk (Carlson and Francis 2002; Grossen 2004).

Only one published study has investigated the impact of DI on children with ASD. Flores and Ganz (2007) reported the results of DI to improve reading comprehension in children with developmental delays and autism. Specifically, they investigated the impact of a DI reading comprehension program on the reading skills of four children, two of whom had autism and reading comprehension delays, using a single-subject multiple probe design. Results indicated that there was a functional relation between the DI intervention and three reading comprehension skills, specifically using analogies, using facts, and statement inference. These results were replicated for each of the four participants.

Although the majority of studies on the effects of DI interventions have reported positive results, some have reported mixed or inconclusive results. MacIver and Kemper (2002) reported the results of a large-scale comparison of DI with other reading interventions implemented in general education elementary schools. Results indicated that DI did not result in significantly better reading achievement than other reading programs. Similarly, Ryder et al. (2006) reported the results of a study comparing DI intervention only, a combination of DI and other reading instruction, and the other reading intervention only at three general education elementary schools. Results indicated that, while DI was effective, it was not more effective than a combination approach.

The purpose of this study was to extend the research on the use of DI to the remediation of oral language skills in elementary children with ASD. The research study investigated the effectiveness of a DI program with regard to a specific oral language skill in three children with ASD, specifically investigating their performance on identifying materials of which objects are made. Though this is a specific, seemingly inconsequential skill, it was chosen because all of the participants lacked this ability prior to intervention, and it is just one of many aspects of conversation and learning that children with ASD often lack when compared to their peers, for whom similar skills come easily and incidentally.

Methods

Participants

Three elementary participants were chosen from the participating class based on their performances on the placement test for DI Language for Learning (Engelmann and Osborn 1999). Each participant made some errors on the placement test that qualified them to begin Language for Learning at Lesson 41. Students in the class who placed out of Language for Learning began Corrective Reading Thinking Basics: Comprehension level A (Engelmann et al. 2002) and did not participate in this study. Other participants were excluded from this study because they placed significantly lower on the placement test than the three who participated. Direct Instruction requires that group members be at approximately the same level.

The participants previously attended public schools and were eligible for and received special education at their public schools. Each participant was diagnosed with an autism spectrum disorder (ASD) by a medical or educational professional independently of this research. The researchers confirmed the participants’ diagnoses by administering the Childhood Autism Rating Scale (CARS; Schopler et al. 1988). Because the focus of this study was on language skills, the researchers also administered the Test of Nonverbal Intelligence-3 (TONI-3; Brown et al. 1997) and the Test of Language Development-Intermediate: 3 (TOLD-I:3; Hammill and Newcomer 1997). Background information on the participants is summarized in Table 1.

Table 1 Participant information

Kyle was a 10-year-old boy who was diagnosed with autism at age 2 years by a developmental pediatrician. This diagnosis was confirmed by a current CARS score in the severe autism range (Total score = 38.5). His score on the TONI-3 indicated “average” intellectual ability (Q = 95) and his scores on the TOLD-I:3 were all in the “very poor” range. Kyle often spoke spontaneously and initiated conversations, though he did so in a rote manner. That is, upon greeting someone, he frequently asked the same questions each time, including delayed echolalia, though he used it in a way that fit the circumstances. This was evident as the study progressed, as Kyle frequently initiated conversations with us by asking of what materials different items were made. Kyle’s use of speech was typically concrete and rote and he had difficulty with abstract concepts.

Aidan was a 10-year-old boy who was diagnosed with autism by a developmental pediatric group at age 2½ years. His diagnosis was confirmed by a current rating in the moderate autism range on the CARS (Total score = 36.5). Aidan’s intellectual ability was “below average” according to the TONI-3 (Q = 85) and his scores on the TOLD-I:3 were in the “very poor” range. Though Aidan could speak spontaneously, he did not frequently initiate conversations. He was able to answer basic conversational questions, though he had difficulty answering questions that were novel or abstract. Aidan occasionally engaged in delayed echolalia, repeating lines from books or movies.

Nico was an 11-year-old boy who was diagnosed by a medical professional with pervasive developmental disorder-not otherwise specified (PDD-NOS) and dyspraxia at 6½ years of age. His diagnosis was confirmed by a score in the “mild autism” range on the CARS (Total score = 33). He scored in the “poor” range on the TONI-3 (Q = 76) and in the “very poor” range on all categories of the TOLD-I:3. Nico enjoyed initiating conversations, though he often did so in a rote manner. That is, most of the time, when one of the researchers approached Nico, he would ask the same question each time, such as, “wanna go to [the school supply store]?” Nico had some difficulties with articulation and was often difficult to understand.

Setting

This investigation took place in a southern urban setting within a K-12 private school for children with disabilities, specifically within a classroom for children with ASD and developmental delays. The classroom was staffed by two state-certified teachers, one of whom was a Board Certified Associate Behavior Analyst (BCABA). The class included ten students, five of whom had ASD, four of whom had mental retardation, and one who had attention deficit hyperactivity disorder.

Materials

Language instruction was provided using a DI program, Language for Learning (Engelmann and Osborn 1999). Materials included a teacher presentation book with scripts. Lessons in the presentation book consist of instruction in several strands. For the purposes of this study, one strand was chosen: identification of common materials. This strand was chosen because none of the participants had mastered it when baseline data were collected and because it appears early in the teacher presentation book. Other strands were excluded because one or more of the participants demonstrated mastery during baseline data collection. The lessons consisted of instructor scripts and drawings of items to show the participants.

Items made of the first four common materials were incorporated as well, including a shirt, pants, a robe, a paperback book, a paper napkin, a tissue, a pen, a CD jewel case, a plastic bag, a wallet, a belt, and a leather shoe.

Response Definition and Measurement

The researchers designed language probes modeled after the tasks included in the DI program (Engelmann and Osborn 1999). These probes were used to measure the dependent variable, identification of items made from different materials. Probes were given during baseline and occurred on instruction days prior to instruction, approximately three days per week. The probe consisted of eight statements asking the participants to name two items made of each of the following materials: cloth, paper, plastic, leather, glass, wood, metal, and concrete. The statements were read orally to the participants individually and each participant was required to respond orally while the researcher recorded his response. Statements were given in random order for each probe. Correct responses required the participant to name two items made from the material, beginning within three seconds of the statement being read. Possible answers were open-ended and not limited to the items taught during the lessons. For example, for the statement “Tell me two things that are made out of wood,” a correct response was “a chair and a table.” Naming the same item twice, giving another name for the material, and listing items not made of the specified material were not correct responses. Credit was not given for partial responses. Each probe was scored according to the total correct out of eight possible responses (i.e., correctly naming two items for each of the eight materials would result in a score of eight out of eight).

Throughout instruction, probes were given approximately two to three times a week, depending on student attendance and school holidays. Probes were not given on Mondays and were given prior to daily instruction. All eight requests to name items made of the materials were given at each probe session. That is, there were eight possible correct responses for each probe session.

Treatment Integrity and Reliability

The researchers provided instruction according to a checklist of teacher behaviors prescribed in the Direct Instruction program (Engelmann and Osborn 1999). Once each week the researchers observed each other providing instruction. Each of the treatment integrity observations was performed with 100% accuracy. Inter-observer reliability was calculated as the total number of agreements divided by the total number of agreements plus disagreements, then multiplied by 100. Inter-observer agreement was assessed throughout baseline, treatment, and maintenance for 38% of Kyle’s probes with reliability averaging 98% (range = 88%–100%), assessed for 42% of Aidan’s probes with reliability averaging 100%, and assessed for 44% of Nico’s probes with reliability averaging 100%.
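The agreement formula described above can be sketched in a few lines of Python. This is an illustration only, not part of the study's procedures; the function name and the example counts are hypothetical.

```python
def interobserver_agreement(agreements: int, disagreements: int) -> float:
    """Percentage agreement: agreements / (agreements + disagreements) * 100."""
    total = agreements + disagreements
    if total == 0:
        raise ValueError("at least one scored item is required")
    return agreements / total * 100


# Hypothetical probe: two observers agree on 7 of the 8 scored items.
print(interobserver_agreement(7, 1))  # 87.5
```

With eight items per probe, a single disagreement yields 87.5% agreement under this formula.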

Procedures

Pre-Experimental Skill Assessment

Prior to any instruction, the researchers administered the placement test. Instruction took place during regularly scheduled instructional time, for approximately 20 min a day. One of the two researchers conducted instruction in a group format. One day each week, both researchers were present to assess treatment integrity and the instructor role switched from week to week. The first author provided instruction 3 or 4 days per week and the second author provided instruction 1 or 2 days per week. Instruction occurred for approximately 3 months.

Baseline

Baseline data were collected until each participant demonstrated a consistent level of performance. Instruction began with cloth, paper, and plastic because these three materials appeared in the teacher presentation book in the first materials lesson. Instruction continued individually with leather, then glass, then wood. Instruction for metal and concrete did not occur due to the school year nearing an end.

Intervention

The researchers implemented instructional procedures and behaviors as specified in the teacher’s guide (Engelmann and Osborn 1999), including: (a) following the script for the strand with minor modifications; (b) requiring choral student responses; (c) using an explicit signal to cue student responses; (d) using correction procedures for incorrect or non-responses; and (e) modeling correct responses, chorally responding with the students, then asking the students to respond independently. At times, the scripts required students to be asked questions individually and these procedures were followed.

While instruction followed the scripts provided with the teacher presentation book, two modifications were made. First, because the students required more concrete representations than those provided in the presentation book alone, the scripts were supplemented with a collection of three or four actual items made from each material. For example, in the first materials lesson, circles of plastic, paper, and cloth are to be presented. Instead, the researcher referred to a pen, book, and pants. Other lessons were followed as written, with the addition of a variety of items made of plastic, paper, cloth, and leather. When instruction on glass and wood began, the items placed on the table were faded, and the researchers referred instead to items throughout the room and pictures in the presentation book. The second modification was the repetition of each lesson until all of the participants had demonstrated mastery of the specified materials on the probes. Instruction progressed to the next lesson once all the participants had mastered the current material.

Experimental Design

A single-subject changing criterion design was employed in this study. Changing criterion designs allow for gradual, systematic manipulation of a target behavior and do not require a withdrawal or return-to-baseline phase (Richards et al. 1999). Changing criterion designs are particularly useful when the target behavior is initially performed at low rates, or is not yet displayed by the participant, and is a behavior that would lend itself to improvement in increments (Kazdin 1982). Effects of the intervention are observed when the target behavior increases in a stepwise trend to criteria predetermined by the researcher. These changes take place within subphases in which criteria are increased or decreased incrementally and the target behavior rapidly improves within each subphase. Two or more shifts in behavior in the expected direction across the subphases are required to demonstrate a functional relation between the intervention and the behavior. Data are examined via visual inspection to determine if the target behavior changed as predetermined, in the predicted direction, and this change is replicated. Instruction during this study continued at each criterion level until all of the participants met or exceeded the set criterion for a minimum of three consecutive probes. Though no established guidelines are outlined in the literature (e.g., Alberto and Troutman 2006; Kazdin 1982) for determining when to make criterion shifts, we chose a minimum of three consecutive probes at or above the set criterion to ensure that the participants had sufficiently mastered each material prior to adding new instruction. During criterion changes, additional materials were added to instruction.
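The criterion-shift rule described above (all participants at or above their criterion for three consecutive probes) can be expressed as a short Python sketch. The function names and the probe scores below are hypothetical, and the sketch assumes that the three consecutive probes are the most recent ones in the current subphase; the text does not state this explicitly.

```python
def meets_shift_rule(scores, criterion, run_length=3):
    """True if the last `run_length` probe scores are all at or above criterion."""
    if len(scores) < run_length:
        return False
    return all(score >= criterion for score in scores[-run_length:])


def ready_to_shift(scores_by_participant, criteria, run_length=3):
    """True only when every participant satisfies the rule for their own criterion."""
    return all(
        meets_shift_rule(scores_by_participant[name], criteria[name], run_length)
        for name in criteria
    )


# Hypothetical subphase data (scores out of eight), not taken from the study.
criteria = {"A": 5, "B": 3}
print(ready_to_shift({"A": [3, 5, 5, 6], "B": [0, 2, 3, 3, 3]}, criteria))  # True
print(ready_to_shift({"A": [3, 5, 5, 6], "B": [0, 2, 3, 3]}, criteria))     # False
```

In the second call, participant B has only two consecutive probes at criterion, so instruction would continue at the current level for the whole group.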

The researchers also calculated the percentage of non-overlapping data (PND) for each participant to supplement visual analysis of graphed data to determine the effects of the intervention (Scruggs and Mastropieri 1998). PND is calculated by dividing the number of intervention data points that exceed the highest baseline data point by the total number of intervention-phase data points, then multiplying by 100 (Scruggs et al. 1987). Scruggs and Mastropieri recommend the following guidelines to evaluate PND scores: scores higher than 90% suggest highly effective treatments, scores from 70% to 90% indicate effective treatments, scores between 50% and 70% indicate questionable treatments, and scores below 50% suggest ineffective treatments.
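As an illustration of the PND calculation and the interpretation guidelines, a minimal Python sketch follows. The function names and data are hypothetical. The guidelines as stated leave the boundary values of exactly 90%, 70%, and 50% ambiguous; assigning 70% and 90% to "effective" and 50% to "questionable" is an assumption of this sketch.

```python
def pnd(baseline, intervention):
    """Percentage of intervention points that exceed the highest baseline point."""
    if not baseline or not intervention:
        raise ValueError("both phases need at least one data point")
    ceiling = max(baseline)
    exceeding = sum(1 for score in intervention if score > ceiling)
    return exceeding / len(intervention) * 100


def interpret_pnd(score):
    """Map a PND score to a descriptive band (boundary handling is assumed)."""
    if score > 90:
        return "highly effective"
    if score >= 70:
        return "effective"
    if score >= 50:
        return "questionable"
    return "ineffective"


# Hypothetical data: three baseline probes at zero, four intervention probes.
score = pnd([0, 0, 0], [0, 2, 3, 3])
print(score, interpret_pnd(score))  # 75.0 effective
```

Here three of the four intervention points exceed the highest baseline point, giving a PND of 75%.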

Results

Figures 1–3 present the number of correct responses for language probes for Kyle, Aidan, and Nico. The x-axis represents language probes and the y-axis represents the number of correct responses for each language probe (i.e., number of materials for which the participant correctly names two items). All three participants rapidly responded to treatment in an upward trend, rapidly met criteria in each phase, and had high PNDs. Table 2 provides a summary of each participant’s total number of probes administered and total correct responses per material during baseline and intervention probes.

Fig. 1 Number of correct responses: Kyle

Fig. 2 Number of correct responses: Aidan

Fig. 3 Number of correct responses: Nico

Table 2 Total probes and total correct responses per material during baseline and intervention

Kyle

During baseline, Kyle’s average performance was 1.75 (range = 1–3) correct responses. Data trends in baseline were somewhat variable though low. Because he knew approximately two of the materials prior to instruction, the researchers set his first criterion at 5 (CR = 5). During criterion (CR) 5, when instruction for cloth, paper, and plastic took place, he rapidly met and exceeded criterion within four probes (mean = 5, range = 3–6). The data trend in this phase was increasing and he met or exceeded the criterion in all but the first data point during this phase. During the next criterion change (CR 6), during instruction for leather, Kyle exceeded criterion for all four probes (mean = 7, range = 7), so his next criterion change was set at eight correct. During CR 6, his data trend was stable. During the final criterion change (CR 8), during instruction for glass, he met criterion in three probes (mean = 8, range = 8). His data trend during this phase was stable. Kyle maintained his performance of eight correct responses when a maintenance probe was given three weeks after instruction ended (M). Across phases, Kyle’s data were increasing, though variable, particularly during the first two phases (baseline and CR5). To supplement the visual analysis, PND calculated for Kyle resulted in a score of 92%, which suggests a highly effective treatment.

Aidan

During baseline, Aidan’s average performance was 0 correct responses (range = 0) and his data trend was low and stable. The researchers set his first criterion at 3 (CR = 3). During CR 3, when instruction for cloth, paper, and plastic took place, Aidan met criterion within seven probes (mean = 2.3, range = 0–3) and his data were increasing, though stable for all but the first data point of this phase. During the next criterion change (CR 4), when instruction for leather took place, he met criterion in four probes (mean = 3.8, range = 3–4) and the data trend was increasing and fairly stable. During CR 5, when instruction for glass took place, he also met criterion in four probes (mean = 5, range = 5) and the data trend was stable. During the final criterion change (CR 6), when instruction for wood took place, he met or exceeded criterion within six probes (mean = 5.7, range = 5–7), and his data trend was increasing in this phase. Aidan maintained his performance of six correct responses when a maintenance probe was given 3 weeks after instruction ended (M). Across phases, Aidan’s data were increasing and generally stable. To supplement the visual analysis, PND calculated for Aidan indicated a score of 95%, which suggests a highly effective intervention.

Nico

During baseline, Nico’s average performance was 0 correct responses (range = 0) and was low and stable. The researchers set his first criterion at 3 (CR = 3). During CR 3, when instruction for cloth, paper, and plastic took place, he met criterion within nine probes (mean = 1.8, range = 0–3) and his data trend gradually increased. During CR 4, when instruction for leather took place, he met criterion in four probes (mean = 3.8, range = 3–4) and his data trend in this phase was increasing. During CR 5, when instruction for glass took place, Nico met criterion in four probes (mean = 4.8, range = 4–5) and his data trend was increasing. During the final criterion change (CR 6), when instruction for wood took place, he met criterion within three probes (mean = 6, range = 6) and his data were stable. Nico maintained his performance of six correct responses when a maintenance probe was given 3 weeks after instruction ended (M). Across all phases of data collection, Nico’s data were increasing in trend and stable. PND calculated for Nico resulted in a score of 90%, which suggests an effective to highly effective treatment.

Discussion

The purpose of this study was to extend the research on the use of DI to the remediation of oral language skills in elementary children with ASD. The research study demonstrated the effectiveness of a DI program with regard to the oral language skill of identifying the materials of which objects are made. The percentage of non-overlapping data points was at least 90% for each of the students, indicating that DI was a highly effective intervention (Scruggs and Mastropieri 1998). A functional relation was demonstrated between material identification and DI through replication of skill increases over at least three criterion changes across three students. The students’ increases in expressive language skills are consistent with previous research regarding students with developmental delays (Waldron-Soler et al. 2002) and students with epilepsy and academic deficits (Humphries et al. 2005). However, this study extends the line of research to include students with ASD. Furthermore, the students in the current study maintained their performance after instruction ceased.

In addition to maintaining material identification skills within the classroom, one of the students generalized these skills across settings and people. Though pivotal response intervention (PRI) was not included as an intervention in this study, Kyle demonstrated a skill often targeted by PRI, learning via initiating to others (Koegel et al. 2001), or self-initiation. Kyle’s teacher reported anecdotally that following initial DI instruction, he frequently and spontaneously, at home and school, asked adults and older siblings to tell him items that were made of different materials and asked of what materials different items were made. PRI recommends explicitly teaching children to ask questions; however, Kyle began doing so without instruction. Though this could be interpreted as delayed echolalia, Kyle used it in a functional manner, incorporating others’ answers into his own verbal repertoire. Future research could investigate the extension of DI by teaching students to initiate or extend their own learning by asking questions of others related to current language lessons.

The instructional procedures were modified based on the students’ individual needs. Initially, the students had difficulty when instruction involved few pictures or words only; this difficulty with abstract language is consistent with the characteristics of individuals with ASD (APA 2000). The researchers provided scaffolding to bridge the gap from concrete to abstract. The concrete-representational-abstract (CRA) sequence has been demonstrated as effective for students with learning disabilities in the area of mathematics (Harris et al. 1995; Mercer and Miller 1992a). CRA instruction begins with concrete objects, which in this study were actual objects made of each material. The researchers used multiple and varied objects made from each material until the students demonstrated mastery during daily instruction. The students touched and manipulated each of the concrete objects during instruction. The representational stage of instruction involves the use of pictures only; in the current study, the researchers used pictures from the program presentation book and pointed to objects within the classroom. Finally, instruction moved to the abstract stage in which the researchers presented instruction using words only. The use of the CRA sequence was successful with these students, consistent with this strategy’s success in the area of mathematics instruction (Butler et al. 2003; Miller and Mercer 1993a; Miller and Mercer 1993b; Mercer and Miller 1992b; Harris et al. 1995; Witzel et al. 2003).

Limitations

This was a small study and although it demonstrated a functional relation, further replication under varied conditions (Kazdin 1982) is needed in order to draw conclusions about DI as a language intervention for students with ASD. The efficacy of DI demonstrated in this study does not preclude the effects of other interventions since there was no comparison between DI and other methods. The authors’ role as instructor is another limitation (Kazdin 1982). First, as outsiders to the classroom environment, the authors may have influenced the students’ motivation and desire to please the instructors. Second, the authors’ expertise in DI methodology and procedures may have produced results that are unrealistic when compared to a typical implementation of the program by the classroom teacher. Finally, though the participants demonstrated skill generalization to both of the researchers implementing the study, no further measures of generalization to additional communicative partners or new skills were collected due to the school year ending.

Implications and Future Research

Students with ASD often demonstrate language delays, including deficits in vocabulary development and verbal comprehension, which interfere with their academic performance (APA 2000; Kjelgaard and Tager-Flusberg 2001; Lord et al. 2004; Nation and Norbury 2005; Mayes and Calhoun 2003; Smith et al. 2007). The Language for Learning program (Engelmann and Osborn 1999) provides explicit instruction in language concepts, knowledge, and information needed for learning within the typical classroom. The current study provides initial evidence of DI’s efficacy in increasing language skills, which may provide students with ASD greater access to learning and the general education curriculum.

To make this a reality, more research is needed to investigate the efficacy of DI with regard to language instruction; this includes more comprehensive implementations of the program and the inclusion of more students with varied characteristics. Finally, to bridge the gap between research and practice, future implementations of DI language instruction should involve typical classroom teachers as instructors.