Learning to read requires an understanding of the alphabetic principle, that is, the principle that the individual sounds within spoken words correspond to the letters and letter clusters in written language (Liberman, Shankweiler, & Liberman, 1989). Perhaps the most important finding in the last 30 years of reading research is the importance of phonological abilities, specifically phonological awareness, in reading development (Bradley & Bryant, 1983; Wagner & Torgesen, 1987). The research on reading disabilities further finds that children with dyslexia typically present with significant deficits in phonological awareness and related skills (e.g., Melby-Lervag, Lyster, & Hulme, 2012; Morris et al., 1998; Rack, Snowling, & Olson, 1992). The strongest evidence of the causal relationship between fundamental phonological abilities and reading development, however, comes from intervention studies which demonstrate that instruction in both phonological awareness and systematic phonics (i.e., letter-sound correspondences) improves phonological awareness, phonological decoding, and reading skills (e.g., Blachman et al., 2004; Hatcher, Hulme, & Ellis, 1994; Lovett et al., 2000; Torgesen et al., 2001; Wise & Olson, 1995).

One commonly used remedial teaching method for treating the reading deficits of children with dyslexia is the Orton Gillingham multisensory language approach (Gillingham & Stillman, 1956). The curriculum originated in the work of Dr. Samuel T. Orton and colleagues, who recognized that children with dyslexia would benefit from individualized tutoring with systematic phonics-based reading instruction (Orton, 1925). Their approach was unique at the time for (a) its emphasis on individually introducing each phonogram and all the rules for blending them into larger units (e.g., syllables), (b) its use of visual, auditory, and kinesthetic information to teach every unit and establish representations of print-sound correspondences, and (c) its introduction of language units in a systematic sequence of increasing complexity, from simple vowels and consonants through multiple-syllable words (Uhry & Clark, 2005). The approach has been adapted by teachers trained in the method to accommodate specific educational needs and, as a result, Orton Gillingham provides the basis of many current published reading programs (Uhry & Clark, 2005).

The individual therapy model of the original Orton Gillingham approach, however, proved a limiting factor as increasing demand for specialized reading teachers exceeded the numbers provided by qualified teacher training programs (Cox, 1985). This constraint on the delivery of instruction motivated the development of Alphabetic Phonics, a direct extension of the Orton Gillingham Manual that restructured and expanded the approach with a structured lesson sequence to accommodate teaching small groups of students (Cox, 1992). The curriculum was further developed with planned daily lesson outlines and standardized lesson content in the Dyslexia Training Program (DTP; Beckham & Biddle, 1987). The structure provided by the DTP curriculum and supplemental teacher and student workbooks allowed the entire lesson sequence to be video recorded to support delivery of expert dyslexia instruction in schools that did not have adequately trained teachers (Oakland, Black, Stanford, Nussbaum, & Balise, 1998). The Dyslexia Training Program also proved a useful resource for fully trained dyslexia specialists who did not use the accompanying videotapes.

The two curricula shared an emphasis on teaching the structure of written English (i.e., phonics) and vocabulary, two important components of effective reading instruction (National Reading Panel, 2000). However, the programs did not include specific instruction in phonological awareness, reading fluency, and reading comprehension, the remaining components of a comprehensive reading program. Given limited class time in the typical school environment, the curriculum needed to be restructured again to include the missing components in the lesson plan while preserving the strength of its phonics instruction. The outcome of that effort, Take Flight: A Comprehensive Intervention for Students with Dyslexia (Avrit et al., 2006), contained the systematic instruction on the structure of written English found in its predecessors, but it also included integrated instruction in phonological awareness, reading and spelling, and added instruction in reading rate or fluency and reading comprehension.

Although Orton Gillingham instruction is widely adopted in clinics and schools, there is limited research on its efficacy in general and on the Alphabetic Phonics-based curricula in particular. A few studies suggest that the programs improve the word reading skills of children with reading disabilities. For example, one study found improvements in reading, but no comparative benefit of Alphabetic Phonics instruction on word reading or decoding relative to synthetic phonics after one academic year of instruction (Foorman et al., 1997). A second study with younger first-grade students, however, found significant improvements in phonological awareness and decoding relative to a classroom basal reading control group (Joshi, Dahlgren, & Boulware-Gooden, 2002). Similarly, a study with older children that compared the Dyslexia Training Program, provided by a trained teacher or video-taped lessons, with a matched classroom control sample showed significant gains in phonological decoding, word reading, and reading comprehension (Oakland et al., 1998). The objective of this paper, therefore, is to add to the literature on Orton Gillingham programs and report data from a hospital-based learning disabilities clinic that provides qualified support for effectiveness of the Take Flight curriculum.

Research questions

The evaluation of the curriculum’s efficacy addressed two research questions: (a) Does the curriculum remediate the basic reading skill deficits associated with reading disability? and (b) What is the comparative effectiveness of the Take Flight curriculum’s reading rate and reading comprehension strategies and exercises? The first question was addressed by examining longitudinal data from initial diagnosis and referral for services through the conclusion of the intervention. The question of the added benefit of reading rate and reading comprehension instruction was addressed by comparing the effects of the Take Flight intervention with a historical control sample that received evidence-based intervention with the Dyslexia Training Program. Significant improvements in reading skills were expected given the research consensus on the effectiveness of systematic phonics instruction for teaching reading to children with dyslexia (Vellutino, Fletcher, Snowling, & Scanlon, 2004). Moreover, previous studies report that specific exercises for reading rate and direct instruction in reading comprehension further improve those reading outcomes (e.g., Berends & Reitsma, 2007; Klingner & Vaughn, 1998).

Method

Participants

Participants were recruited from 12 cohorts of patients who were treated for a diagnosed reading disability at a hospital-based learning disabilities clinic. The students’ ages ranged from 7 to 14 years at the start of intervention; the majority (62%) were in Grades 3 to 5. Participants received one of two Orton Gillingham-based treatment programs. A sample of 87 patients who received Take Flight (TF) instruction were included to address the question of treatment efficacy. A historical control sample (n = 37) treated with the Dyslexia Training Program (DTP) was included to assess the added value of the reading rate and comprehension components of the Take Flight curriculum. The academic and demographic characteristics of the two intervention groups are shown in Table 1.

Table 1 Pre-intervention participant characteristics by curriculum

Procedure

Diagnostic evaluation

All participants were evaluated in a single session by a licensed psychologist, educational diagnostician, or speech-language pathologist. Final diagnoses of dyslexia were decided by consensus of the testing clinician and attending developmental-behavioral pediatrician. Patients were referred for educational services at the learning disabilities clinic if they did not have access to adequate treatment at home or in school. The interval between diagnosis and the onset of intervention varied for each participant and ranged from zero to 41 months (M = 9.0, SD = 6.2). An examination of patient records indicated that 47% of the sample received reading tutoring either in special education or with private tutors. An additional 40% of the sample were homeschooled. In all cases, the students received some form of phonics instruction that proved ineffective and prompted enrollment in the clinic’s dyslexia treatment program. There was no charge for any services provided by the clinic.

Study recruitment

All students who met a 90% attendance criterion and completed their respective two-year interventions were eligible for this study. Patients from nine consecutive cohorts in the learning disabilities clinic were eligible for recruitment for the Take Flight sample. Inclusion required complete phonological awareness, word reading, and decoding data from diagnosis through post-intervention evaluation. Forty-nine patients from the nine cohorts were missing one or more data points at diagnosis and were therefore excluded from participation. The enrolled Take Flight sample (n = 87) thus represented 64% of the 136 patients available at the time of recruitment. The inclusion criteria for students enrolled in the Dyslexia Training Program historical control group required complete word reading, decoding, and reading comprehension data collected during their intervention. The enrolled historical control group represented 100% of the 37 patients who received treatment at the time of recruitment.

Intervention programs

Two Orton Gillingham-based curricula, Take Flight and the Dyslexia Training Program, were used in this study (Avrit et al., 2006; Beckham & Biddle, 1987). The latter program was the dyslexia curriculum used in the learning disabilities clinic before Take Flight was developed. The two curricula were designed with similar implementation conditions. First, the programs were taught by experienced Certified Academic Language Therapists to small groups of two to six students identified with a phonologically based reading disability. Second, the hospital’s dyslexia intervention services using either curriculum were delivered in either one-hour or 90-minute sessions for an average of five hours per week, for a minimum contact time of 280 hours over two academic years.

Take Flight

The instruction of the Take Flight curriculum was presented in two alternating daily lesson plans. The first lesson plan (A) included introductions to phonemic awareness and phonics concepts (i.e., grapheme-phoneme correspondences), syllable division rules, morphology, vocabulary, comprehension strategies, and spelling rules. The daily lesson also included practice to automaticity of previously learned phonics concepts. The alternate lesson plan (B) primarily focused on applying skills and strategies learned on A-days in timed repeated reading exercises, vocabulary development and comprehension strategy use with authentic text reading, and spelling to dictation.

The phonemic awareness component of the lesson sequence focused attention on how multiple levels of language are represented in words and then combined visual and verbal introduction with kinesthetic exercises to explore the production of phonemes (e.g., Lindamood & Lindamood, 1975). The lessons incorporated phonemic identification and phonemic manipulation using labels and mouth picture manipulatives to represent articulator actions for each phoneme (e.g., voiced bilabial plosive -> “lip puffer”), making obvious the distinctions between consonant sounds and more complex vowel sounds. Phonemic awareness instruction was then integrated with spelling instruction in oral and written sound-symbol exercises for each phoneme-grapheme pairing in the curriculum lesson sequence. The exercises progressed to abstract colored squares to represent individual phonemes in the latter half of the curriculum lesson sequence.

The core phonics instruction of the curriculum introduced the 26 graphemes and 96 combinations of graphemes in different situations of written English (e.g., in a one-syllable base word with a short vowel, a final ck is pronounced /k/). The lesson sequence was organized by place of articulation and frequency. The introductions proceeded in the same manner for all phonics concepts in the lesson’s “new learning.” The concepts were first presented auditorily for phonemic discovery (e.g., what sound do you hear in it, dip, bid?) and then visually (grapheme discovery). Socratic questioning with the students then determined whether the phonics concept was a consonant or vowel, its manner of articulation (open or blocked), whether its sound was voiced or unvoiced, and the sound and symbol associated with a mnemonic keyword. The sound-symbol correspondences of the new learning were then established with four activities (i.e., ‘Linkages’) that included hand writing the grapheme, repeating the reading correspondence (i.e., print to sound), repeating the spelling correspondence (i.e., sound to print), and a final activity that combined the phonemic discovery and the other three linkages. More advanced units such as derivatives and syllable division rules were introduced using the questioning method and reading and spelling linkages. Two grapheme-phoneme situations or phonics concepts (i.e., “new learnings”) were introduced each day and first practiced in words and then in sentences that contained the lesson’s new learning and previously learned phonics concepts. Students were also taught a strategic approach to decoding novel words by noting (a) the location of the accent in the word, (b) the number of syllables in the word, (c) the adjacent letters (i.e., the situation), and (d) the position of the graphemes in the syllable or word (i.e., initial, medial, or final).

Reading rate practice was informed by research that supported the efficacy of repeated oral readings as a means of improving reading fluency (Chard, Vaughn, & Tyler, 2002; NICHD, 2000). However, rather than having continuous text as its focus, the curriculum’s repeated reading exercises were designed to promote the recognition of phonics concepts (i.e., vowel and syllable patterns) within words (e.g., Berends & Reitsma, 2007; Conrad, 2008). Students began working on reading rate exercises only after achieving accurate recognition of each orthographic unit. The curriculum included 23 rate exercise packages. The rate packages started with single-syllable short vowel situations (i.e., CVC, CVCC) and proceeded with more complex vowel situations (e.g., accented and unaccented r-controlled) and syllable division patterns (e.g., VCCV) through Latin (e.g., “tract”) and Greek (e.g., “geo”) combining forms. The exercises trained rapid recognition of the target phonics concepts by repeated reading of an array of words that shared the central concept (e.g., CVC: hid, set, bat). Additional lists of “instant” words were included to develop sight word vocabulary (Fry & Kress, 2006). As word reading speed stabilized, the exercises introduced repeated reading of phrases and sentences containing the central phonics concepts. Students progressed to the next set of exercises after achieving stable reading rate improvement on phrases and sentences. Progress was measured by comparing a cold reading of a passage containing the central phonics concept, which established the baseline rate, with a reading of the same passage after the rate package was completed.

Comprehension instruction included introductions of grammatical concepts (e.g., sentence diagramming), vocabulary concepts (e.g., word relationships), specific vocabulary-building strategies (e.g., dictionary and thesaurus skills), organizational patterns of text structure (e.g., cause-effect), and metacognitive strategies that teach students to actively question the text as they read. The metacognitive strategy strand of the curriculum was organized around a central investigative theme (i.e., “Comprehension Mystery”) and based in part on the collaborative strategic reading model (e.g., Klingner & Vaughn, 1998; Vaughn, Klingner, & Bryant, 2001). The first step was preview and prediction of the text to determine the central theme. Students also learned to actively identify salient vocabulary items and develop background knowledge (Ogle, 1986). The next strategy was an investigative phase in which students were taught to identify the key story elements (i.e., who, what, etc.) as they read the text and to manually flag each element for future review. Students were also taught strategies to address questions that arose as they read the text, how to identify missing clues (i.e., inferencing), and how to further develop background knowledge to aid ongoing comprehension of the text. The final strategies centered on producing a quality summary that concisely captured the key story elements and how they fit in the narrative structure (Somebody Wanted But So; Macon, Bewell, & Vogt, 1991).

The comprehension lessons employed a scaffolded teaching model, with each step starting with teacher-directed instruction and gradually passing control to the students as they were ready to assume that responsibility (Palincsar & Brown, 1984). The vocabulary/comprehension strand proceeded through the lesson sequence with progressively more sophisticated comprehension concepts and strategies taught with increasingly complex authentic narrative texts. At the midpoint of the second year of the intervention, instruction shifted from narrative to expository text. The investigative focus of the narrative strategies already learned adapted readily for use with expository text. However, the strategies now focused on content vocabulary development and expository text structure, such as main ideas, to aid understanding of non-fiction texts and textbooks.

In total, approximately 35% of the lesson time was devoted to direct instruction of phonological skills (25% phonics, 10% phonemic awareness). Spelling instruction accounted for an additional 17% of instruction time. The reading rate exercises accounted for approximately 18% of instruction time, and reading comprehension and vocabulary lessons comprised the remaining 30% of class time (20% and 10%, respectively).

Dyslexia Training Program

The DTP taught the same phonics concepts as the Take Flight curriculum, although the DTP lesson sequence was based primarily on the frequency of grapheme units in the language (Hanna, Hanna, Hodges, & Rudorf, 1966). The only substantive differences in the phonics instruction were the use of additional linkage activities (e.g., “skywriting,” or tracing the letter form using gross motor movements) and the introduction of one new learning per lesson rather than the two per lesson in Take Flight. Spelling instruction for each phoneme-grapheme unit was not taught at the same time as reading instruction for the same unit but followed it by several weeks. Each lesson included instruction in sound/symbol associations, syllable types and patterns, morphology, and spelling rules.

Phonemic awareness, while not a part of the original DTP curriculum, was taught to the participants in this study using a separate phonemic awareness program (Avrit & Rumsey, 1997). The method of teaching phonemic awareness was the same as in Take Flight; however, the sequence of instruction was not integrated with the reading and spelling instruction of the daily lesson. The DTP lesson plan did not contain specific exercises for reading rate. Comprehension instruction was primarily focused on vocabulary and syntax presented in the context of listening activities. There was no specific instruction in metacognitive reading comprehension strategies or their application when reading continuous text. Additionally, students did not begin applying their skills with continuous texts until the second year of intervention, in contrast with Take Flight, which began using trade books for applied practice after the first nine weeks of instruction.

Approximately two thirds of the instructional time was devoted to phonology-dependent skills such as decoding and spelling. Fifteen percent of the lesson was allocated for additional spelling exercises to provide visual reinforcement of spelling rules and handwriting instruction to provide kinesthetic reinforcement. Phonemic awareness instruction comprised approximately 10% of the lesson. Listening comprehension exercises were appended at the end of each lesson and accounted for 10% of instruction time.

Therapist training

The curricula used in this study were designed to be taught by Certified Academic Language Therapists (CALTs), that is, teachers who have completed an accredited two-year training program. The therapist training curriculum focused on teaching the structure of written language, multisensory structured language-based teaching methods, instructional strategies, reading development, and assessment of reading disability. The standards for certification included a minimum of 200 instructional hours from qualified instructors in an Orton Gillingham-based training program, including observations of experienced therapists demonstrating the curriculum with children in a small-group classroom environment. The therapists-in-training also had to complete a minimum of 700 supervised clinical hours working with children with dyslexia, and ten lessons over the two years of training were video recorded for review and feedback from a qualified instructor.

The hospital has served as an accredited training center for more than 30 years, training new therapists using the Dyslexia Training Program, and subsequently, the Take Flight curriculum. As a result, all the CALTs on the education staff that participated in this study had the dual responsibilities of teaching their students and also providing model intervention practices for classroom observations by teachers in the therapist training program. Demonstrating the components of the curricula and effective Orton Gillingham intervention methods for therapists-in-training required rigorous adherence to the respective curriculum lesson sequences and consistent implementation of the daily lesson plans, procedures, and therapeutic model of remediation. There were no substantive differences in the teacher training programs using the Dyslexia Training Program or Take Flight other than the instructional details of specific curriculum content. All CALTs on the education staff had a minimum of two years’ experience working with their respective curriculum before data collection. Regardless of previous experience, all therapists new to the hospital’s education staff were mentored by senior instructors during their first year of practice to ensure adherence to each curriculum’s protocol.

Fidelity of implementation

Research suggests fidelity of implementation can be assessed in at least five dimensions (O’Donnell, 2008). Although treatment fidelity was not formally measured in this study, the quality of instruction provided in this study was a necessary condition of the hospital’s accredited therapist training program. First, the equivalence of instructor experience for the two curricula was established by the role each education staff member performed as a trainer of teachers. Adherence, referring to the delivery of curriculum components as designed, and consistent teaching methods were assured in this study as a result of the requirements of modeling expert instruction in the classroom. Treatment duration, defined as the number, length, and frequency of classroom contact, was similar for the two curricula in this study and documented in progress reports that indicated compliance with the respective curriculum lesson sequences. Student engagement was supported by interactive small group teaching methods and an enforced attendance policy so that all students were involved in at least 90% of available instruction. Treatment differentiation was evident in the contrast of reading rate and comprehension gains in the Take Flight curriculum with the historical control that did not receive such instruction.

Measures

Norm-referenced standardized measures were selected for the majority of reading skills so that treatment outcomes could be compared with expected development. All evaluations were completed by the diagnostic staff of the hospital’s learning disabilities clinic.

Intervention outcomes

Word recognition and reading comprehension were measured with the Wechsler Individual Achievement Test (WIAT; Psychological Corporation, 2002). The word identification subtest required reading isolated real words. Reading comprehension required participants to silently read passages and answer verbally presented questions. Split-half reliabilities were .92 for the word reading subtest and .88 for the reading comprehension subtest. The Decoding Skills Test (DST; Richardson & DiBenedetto, 1985) assessed pseudoword reading. The pseudowords were derived from a list of single- and multi-syllabic real words. The unit of analysis (Phonological Transfer Index; PTI) reflected accurate decoding of the pseudowords as a proportion of the correctly read real word list. The DST is a criterion-referenced instrument (i.e., adequate decoding is .70) with a reported internal reliability of .90. Oral reading fluency was measured with the Gray Oral Reading Test (GORT; Wiederholt & Bryant, 2001). Participants read text passages and responded to verbally presented multiple-choice questions. Reading accuracy, rate, and comprehension performance were combined in a summary oral reading quotient with a reported internal consistency of .96. Participants in the Take Flight group were also given a measure of phonological awareness from the Comprehensive Test of Phonological Processing (CTOPP; Wagner, Torgesen, & Rashotte, 1999). The phonological awareness subtest required participants to elide and blend phonemes in verbally presented words. The measure reports an internal reliability of .90.
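The Phonological Transfer Index described above expresses pseudoword decoding accuracy as a proportion of the correctly read real word list. A minimal sketch of that computation follows; the function name and the example counts are hypothetical illustrations, not part of the DST materials:

```python
def phonological_transfer_index(pseudowords_correct, real_words_correct):
    """Pseudoword decoding accuracy expressed as a proportion of the
    correctly read real word list (DST criterion for adequate decoding: .70)."""
    if real_words_correct == 0:
        raise ValueError("PTI is undefined when no real words are read correctly")
    return pseudowords_correct / real_words_correct

# Hypothetical example: 28 pseudowords decoded vs. 40 real words read correctly
pti = phonological_transfer_index(28, 40)
print(pti, pti >= 0.70)
```

Because the index is a ratio of the two accuracy counts, it tracks how well decoding skill transfers from familiar real words to novel letter strings rather than raw pseudoword accuracy alone.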

Diagnostic and demographic instruments

Intellectual aptitude was assessed with the Wechsler Abbreviated Scale of Intelligence (WASI; Psychological Corp., 1999) or the Wechsler Intelligence Scales for Children (Wechsler, 1991, 2003). The measures were highly correlated (r > .83) and both instruments reported internal reliabilities greater than .90. Attention problems were assessed from parent and teacher ratings on the SNAP-IV Rating Scale (Swanson, 1992). The SNAP-IV scale has reported coefficients alpha of .94 and .96 for parent and teacher ratings, respectively (Bussing et al., 2008). A diagnosis of attention-deficit hyperactivity disorder (ADHD) was made when functional impairments met DSM-IV (American Psychiatric Association, 1994) criteria. Socioeconomic status was assessed with the Hollingshead Four-Factor Index of Social Status (Hollingshead, 1975). The scale estimated socioeconomic status as a weighted sum of both parents’ education levels and types of occupation.

Data analysis

Data screening found no extreme univariate outliers (i.e., ≥ 3 IQR; Tukey, 1977). The distribution of wait-list intervals was positively skewed and therefore log-transformed prior to analyses. Continuous covariates were mean-centered prior to analyses. All data were analyzed with SPSS v.24. An alpha level of .05 was used for all analyses.
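The screening steps above (flagging extreme outliers beyond 3 IQR from the quartiles, log-transforming the positively skewed wait-list intervals, and mean-centering continuous covariates) can be sketched as follows. The data values and the function name are hypothetical illustrations, not the study's actual code:

```python
import numpy as np

def screen_and_prepare(x, skewed=False, k=3.0):
    """Flag extreme outliers (Tukey: beyond k*IQR from the quartiles),
    optionally log-transform a positively skewed variable, then mean-center."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    # Outliers are flagged on the raw scale, before any transformation
    outliers = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    if skewed:
        x = np.log1p(x)  # log(1 + x) tolerates zero-month wait-list intervals
    centered = x - x.mean()
    return centered, outliers

# Hypothetical wait-list intervals in months (observed range was 0 to 41)
waits = [0, 3, 5, 6, 8, 9, 9, 10, 12, 14, 20, 41]
centered, flagged = screen_and_prepare(waits, skewed=True)
print(flagged.any(), centered.mean())
```

Note the use of log(1 + x) rather than a plain log transform, an assumption made here because the reported wait-list range includes zero months.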

Baseline descriptive measures of the two enrolled samples were compared with independent-sample t tests or Pearson’s chi-square tests of independence (see Table 1). The Take Flight sample summarized in Table 1 represented a subset of patients available for recruitment. Analyses of the Take Flight sample data therefore first compared the enrolled sample with the remaining unselected patients to ensure representativeness of the sample. Multivariate methods (MANOVA) were used in the analyses of baseline and intervention gains to reduce family-wise error rate (Tabachnick & Fidell, 1996). Follow-up univariate ANOVAs were then used to identify any significant group differences.

The intervention data presented two mixed quasi-experimental designs for analyses of the study’s two research questions. First, the data relevant to the efficacy of the Take Flight curriculum (Research Question 1) were from an interrupted time-series design with a variable wait-list period prior to intervention. Repeated measures analyses of covariance (ANCOVA) compared development during the wait-list period with observed outcomes after the onset of treatment. Single degree-of-freedom profile contrasts compared skill status at each time point with the subsequent time point to model incremental change over time. Wait-list interval was used as a continuous covariate in the analyses. Participant cohort was included as a nominal blocking variable to control for year-to-year variance in the sampling of students for treatment. Intervention gain scores were then computed and used as outcome measures in univariate ANCOVAs to aid interpretation of any significant interactions with covariates.
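The single degree-of-freedom profile contrasts described above compare each time point with the one before it. A toy illustration with invented standard scores (the values are hypothetical, chosen only to show the shape of the contrasts, not the study's data):

```python
import numpy as np

# Hypothetical standard scores at the four test points
# (Clinic, Pretest, Midtest, Posttest), one row per participant.
scores = np.array([
    [84, 82, 90, 95],
    [79, 78, 85, 92],
    [88, 86, 91, 97],
])

# Profile contrasts: each time point minus the previous one,
# modeling incremental change (wait-list, pre->mid, mid->post).
contrasts = np.diff(scores, axis=1)   # shape (n_participants, 3)
mean_change = contrasts.mean(axis=0)

print(mean_change)
```

Each column of `contrasts` corresponds to one single-df comparison; in the repeated measures ANCOVA each such contrast is tested against its own error term rather than the omnibus Time effect.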

The second research question regarding the added value of Take Flight reading rate and comprehension instruction was assessed by comparing treatment effects with the historical control sample that received the DTP for their intervention. Participant cohort was also included in the statistical model; however, because of the historical comparison design, the cohorts were nested within curricula. For this reason, two-factor mixed effects nested ANOVAs were used for the analyses with Curriculum as a fixed effect and Cohort as a random effect to control for any variability in sampling.
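Because cohorts are nested within curricula and treated as a random effect, the curriculum contrast is evaluated against cohort-to-cohort variability rather than student-to-student variability. A minimal sketch of that logic with hypothetical cohort mean gains (the numbers are invented for illustration):

```python
import numpy as np

# Hypothetical mean gain scores per cohort, cohorts nested within curriculum
tf_cohorts = np.array([8.1, 9.4, 7.7, 8.8])  # Take Flight cohorts
dtp_cohorts = np.array([5.2, 6.0, 4.8])      # Dyslexia Training Program cohorts

# With Cohort as a random effect nested in Curriculum, the fixed Curriculum
# effect is tested against cohort-to-cohort variance, so cohort means are
# the effective unit of analysis for the between-curriculum comparison.
effect = tf_cohorts.mean() - dtp_cohorts.mean()
print(effect > 0)
```

This is why the nested mixed-effects ANOVA, rather than a simple two-group comparison over students, is the appropriate model for the historical control design.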

Results

The first set of analyses compared the baseline status of the enrolled Take Flight sample to patients that were excluded due to missing diagnostic evaluation data. A 2 × 9 [Enrollment Status (Enrolled, Not Enrolled)] × [Cohort (Year 1, …, Year 9)] MANOVA found no multivariate effect of Enrollment Status, Wilks’ Λ = .63, F(6, 100) = 1.55, p = .17, η² = .09, in baseline measures of age, phonological awareness, decoding, word reading, reading comprehension, or oral reading ability. The effect of Cohort was significant in the analysis, Wilks’ Λ = .49, F(48, 496.1) = 1.6, p = .01, η² = .11, suggesting year-to-year variability in baseline status. There was no interaction of Enrollment Status with Cohort, Wilks’ Λ = .92, F(48, 496.1) = 1.03, p = .42, η² = .08. Follow-up univariate analyses, however, indicated the enrolled sample was older than the unselected students (enrolled: M = 10.1, SD = 1.7; not enrolled: M = 9.2, SD = 1.7), F(1, 118) = 7.7, p = .01, η² = .06.

A 2 × 9 [Enrollment Status (Enrolled, Not Enrolled)] × [Cohort (Year 1, …, Year 9)] MANOVA of intervention gains in phonological awareness, phonological decoding, word reading, and comprehension showed no multivariate effect of Enrollment Status, Wilks’ Λ = .97, F(4, 107) < 1, p = .51, η² = .03. The effect of Cohort, Wilks’ Λ = .72, F(32, 396.2) = 1.18, p = .24, η² = .08, and the interaction of Cohort with Enrollment Status, Wilks’ Λ = .71, F(32, 396.2) = 1.23, p = .19, η² = .08, were also not significant in the analysis.

Effects of intervention

The previous analyses found no reliable evidence of selection biases in the enrolled Take Flight sample. The following set of analyses evaluated the efficacy of the Take Flight curriculum by comparing skill development during the wait-list period between diagnosis and the onset of treatment with observed outcomes after intervention. The analyses focused on constructs central to the diagnosis of dyslexia, that is, phonological awareness, phonological decoding, and word recognition. Data from the diagnostic clinic and intervention evaluations of the Take Flight sample are shown in Table 2.

Table 2 Phonological awareness and reading scores by test point

A 4 × 9 [Time (Clinic, Baseline, Midtest, Posttest)] × [Cohort (Year 1, …, Year 9)] repeated measures ANCOVA with wait-list interval as a covariate indicated a significant multivariate effect of Time on phonological awareness standard scores, Wilks’ Λ = .39, F(3, 74) = 38.69, p < .0001, η² = .61. The interaction effects of Time with the wait-list interval covariate, Wilks’ Λ = .94, F(3, 74) = 1.57, p = .20, η² = .06, and Cohort, Wilks’ Λ = .84, F(24, 215.2) < 1, p = .96, η² = .05, were not statistically reliable. The profile contrasts found no reliable difference in phonological awareness between initial diagnosis evaluation and pretest, F(1, 76) = 2.05, p = .16, η² = .03. The contrasts indicated significant gains from pretest to midtest, F(1, 76) = 47.02, p < .0001, η² = .38, and midtest to posttest, F(1, 76) = 16.32, p < .0001, η² = .18.
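The effect sizes reported here can be recovered directly from the test statistics: when the multivariate test has a single degree-of-freedom dimension (s = 1), multivariate η² reduces to 1 − Λ, and the η² for a univariate contrast follows from F and its degrees of freedom. A minimal pure-Python sketch, checked against the phonological awareness analysis above (the function names are ours, for illustration only, not part of the study's analysis code):

```python
def eta_sq_from_wilks(wilks_lambda, s=1):
    """Multivariate eta-squared from Wilks' lambda; for s = 1 this is 1 - lambda."""
    return 1 - wilks_lambda ** (1 / s)

def contrast_eta_sq(F, df1, df2):
    """Eta-squared recovered from an F statistic and its degrees of freedom."""
    return (F * df1) / (F * df1 + df2)

# Reported Time effect: Wilks' lambda = .39 corresponds to eta-squared = .61
print(round(eta_sq_from_wilks(0.39), 2))        # 0.61
# Pretest-to-midtest contrast: F(1, 76) = 47.02 gives eta-squared = .38
print(round(contrast_eta_sq(47.02, 1, 76), 2))  # 0.38
```

The same conversion reproduces the remaining contrast effect sizes, e.g., F(1, 76) = 16.32 yields η² = .18 for the midtest-to-posttest gain.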

The same statistical model applied to word reading standard scores also indicated a significant multivariate effect of Time, Wilks’ Λ = .59, F(3, 74) = 16.76, p < .0001, η² = .41. The interaction of wait-list interval with the Time effect was statistically reliable in that analysis, Λ = .86, F(3, 74) = 3.9, p = .01, η² = .14. The effect of Cohort was not reliable, Wilks’ Λ = .67, F(24, 215.2) = 1.35, p = .13, η² = .13. The profile contrasts indicated a statistically significant decrease in word reading scores during the wait-list interval, F(1, 76) = 8.88, p = .004, η² = .11. In contrast, significant gains were found from pretest to midtest, F(1, 76) = 10.21, p = .002, η² = .12, and at posttest, F(1, 76) = 36.25, p < .0001, η² = .32. A post hoc analysis of the covariate interaction term indicated that the length of the wait-list interval was related to a decrease in standard scores prior to intervention, b = −1.5, t(76) = −2.54, p = .01, sr² = .08. The effect reversed in the first year of instruction, b = 1.5, t(76) = 2.19, p = .03, sr² = .06.

Similar analyses of monosyllabic pseudoword decoding found a significant multivariate effect of Time, Wilks’ Λ = .20, F(3, 73) = 99.73, p < .0001, η² = .80. The multivariate interaction of wait-list interval with Time was significant in these analyses, Λ = .87, F(3, 73) = 3.69, p = .02, η² = .13. The interaction of Cohort with Time was also reliable in the analysis, Wilks’ Λ = .52, F(24, 212.3) = 2.21, p = .002, η² = .19. The profile contrasts indicated no significant differences in pseudoword decoding between diagnosis and pretest, F(1, 75) < 1, p = .71, η² = .00, but significant gains from pretest to midtest, F(1, 75) = 88.54, p < .0001, η² = .54, and posttest, F(1, 75) = 76.71, p < .0001, η² = .51. Profile contrasts of the interaction of wait-list interval and Time found significant effects of the interval length during both the baseline period and the first year of intervention, F(1, 75) = 8.76, p = .004, η² = .11, and F(1, 75) = 7.53, p = .008, η² = .09, respectively. Post hoc ANCOVAs indicated that the wait-list interval was positively related to gains from diagnosis to pretest, b = .07, t(75) = 2.96, p = .004, sr² = .11, and negatively related to gains during the first year of instruction, b = −.06, t(75) = −2.74, p = .008, sr² = .09.

The analyses of polysyllabic pseudoword decoding found similar outcomes for the multivariate effect of Time, Wilks’ Λ = .24, F(3, 52) = 55.91, p < .0001, η² = .76. The interaction of the wait-list interval with the Time effect was also significant in the analysis, Λ = .83, F(3, 52) = 3.65, p = .02, η² = .17. The interaction of Time and Cohort was reliable in this analysis, Wilks’ Λ = .49, F(24, 151.4) = 1.75, p = .02, η² = .21. The profile contrasts indicated no significant gains during the wait-list interval, F(1, 54) < 1, p = .57, η² = .01; however, the contrasts did show significant gains from pretest to midtest, F(1, 54) = 57.06, p < .0001, η² = .51, and posttest, F(1, 54) = 44.47, p < .0001, η² = .45. The post hoc ANCOVA showed that longer wait-list intervals were related to larger gains from diagnosis to pretest, b = .08, t(54) = 2.91, p = .005, sr² = .14.

Curriculum differences

The goal of the curriculum’s development was to preserve the effectiveness of word reading and phonics instruction while adding reading rate and comprehension components to the lesson plan. Separate 2 × 12 (Curriculum [Take Flight, DTP]) × (Cohort [Year 1, …, Year 12]) mixed effects nested ANOVA models were used to assess the comparative treatment effects of curricula on decoding accuracy, oral reading, and reading comprehension gain scores. Cohort was nested within curriculum for each analysis. Group means for each outcome measure are shown in Table 3.

Table 3 Comparative treatment effects

The analyses found no significant curriculum differences in word reading, F(1, 112) < 1, p = .64, η² = .03, or phonological decoding skill gains (monosyllabic: F(1, 111) < 1, p = .81, η² = .01; polysyllabic: F(1, 97) < 1, p = .47, η² = .05). The Cohort effect was not reliable in either the word reading, F(10, 112) < 1, p = .86, η² = .04, or decoding analyses, F(10, 111) < 1, p = .69, η² = .06 and F(10, 97) = 1.40, p = .19, η² = .12, respectively. The analysis of oral reading ability also did not indicate a curriculum difference, F(1, 86) < 1, p = .82, η² = .01, or an effect of Cohort, F(8, 86) = 1.17, p = .33, η² = .09. The analyses of the curriculum effects on the oral reading subscales were not significant for either the reading rate, F(1, 88) < 1, p = .74, η² = .02, or reading accuracy subscales, F(1, 88) = 3.8, p = .09, η² = .03. Cohort was also not reliable in these analyses, F(8, 88) < 1, p = .65, η² = .06 and F(8, 88) < 1, p = .72, η² = .05, respectively. The ANOVA model for reading comprehension gains, however, did indicate significantly different effects of curriculum, F(1, 98) = 5.84, p = .03, η² = .26. Examination of means in Table 3 shows that reading comprehension treatment gains were larger for the Take Flight sample relative to the historical control DTP group. The Cohort effect was not reliable in the analysis of comprehension gains, F(10, 98) < 1, p = .76, η² = .06.
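The gain-score comparison underlying these models can be sketched in a few lines. The data below are hypothetical, not the study's raw scores, and the one-way F computed here ignores the nested Cohort factor for simplicity:

```python
def one_way_anova(groups):
    """Return (F, df1, df2) for a one-way ANOVA over lists of gain scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-groups and within-groups sums of squares
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df1, df2 = k - 1, n - k
    F = (ss_between / df1) / (ss_within / df2)
    return F, df1, df2

# Hypothetical comprehension gain scores (posttest minus pretest standard scores)
take_flight = [8, 10, 7, 9, 11]
dtp_control = [4, 6, 5, 3, 7]
F, df1, df2 = one_way_anova([take_flight, dtp_control])
print(F, df1, df2)  # 16.0 1 8
```

With these toy values the between-groups sum of squares is 40 and the within-groups sum of squares is 20, so η² = 40 / 60 ≈ .67; the actual study models additionally partitioned the within-curriculum variance by Cohort.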

Discussion

The purpose of this report was to document recent developments for treating children with significant phonologically based reading difficulties. The curriculum development goal was to maintain the strengths of previous generations of Orton Gillingham-based intervention while incorporating phonological awareness, reading rate/fluency, and reading comprehension instruction in the lesson cycle. Based on previous research, it was expected that the Take Flight curriculum would result in significant growth in reading and decoding accuracy (e.g., Ritchey & Goeke, 2006; Oakland et al., 1998). Moreover, because research-supported reading rate and comprehension activities were included in the lesson cycle, it was expected that students would show stronger gains in those skills. In general, observed outcomes after intervention provided qualified support for the efficacy of the curriculum for improving reading ability.

The admittance procedure of the hospital treatment program resulted in a wait-list design that allowed the participants in this study to serve as their own control group. The outcome measures in Table 2 showed significant gains on important criterion skills during treatment when compared to the wait-list interval prior to intervention. More specifically, the two norm-referenced measures of phonological awareness and word reading did not show reliable growth in those skills during the pre-intervention control period, but analyses indicated significant improvement in both skills after the onset of treatment. The two measures of phonological decoding, in contrast, showed modest but statistically significant gains during the wait-list period. Those measures, however, were not norm-referenced but simply reflected change in the proportion of correct responses at each time point. Thus, the small growth on the pseudoword reading measures prior to intervention might be interpreted as a result of general skill maturation during the school year. More importantly, however, the contrast of decoding gains prior to intervention with growth after treatment onset indicated the significant effects of the curriculum’s phonics instruction on observed decoding ability.

These treatment effects, however, were moderated by two important factors. First, the wait-list period was not uniform for all participants, and longer wait-list intervals were associated with some differences in pre-intervention skill development. Word identification standard scores showed a significant decrease prior to intervention, and that decrease was more pronounced with longer wait-list intervals. This result suggests that absolute word reading accuracy growth (i.e., raw scores) was stable at best before treatment, and the longer intervals revealed that weakness as declining relative reading status (i.e., standard scores). The analyses of decoding measures showed the opposite effect, with longer intervals related to better growth in decoding before treatment, an effect that may be attributed to slow maturation of those skills as a result of exposure to literacy. Class cohort was a second moderating variable in the analyses of word reading and decoding development, suggesting year-to-year variation in treatment responses. However, since treatment delivery was standardized as a result of teacher training and structured lesson plans, the observed effect was likely due to variability in the skill profiles of students referred for services each year.

The outcomes of the historical control comparisons were more equivocal with respect to curriculum development goals. Specifically, the analyses of data summarized in Table 3 showed no reliable group differences in word reading or decoding skill gains after treatment. This was an expected outcome and suggests that despite the accelerated phonics instruction, the students were able to learn and apply those important reading skills. The results in Table 3 also support the efficacy of integrating specific reading comprehension instruction in the curriculum. Comprehension skills were below average and not significantly different for the two groups of participants at baseline; however, the difference in treatment gains was significant with the result that scores for Take Flight students were well within the average range after the intervention.

The word-level repeated reading exercises, however, did not show the expected differential effects on the oral reading performance of Take Flight students even though the exercises were derived from research-based principles (e.g., Levy, 2001). Previous studies of this type of fluency training showed transfer to untrained items; however, the latter items all shared orthographic structure with the trained items (e.g., Berends & Reitsma, 2007). The reading rate in this study was measured with a standardized test of general oral reading, and therefore poor transfer may be related to greater variability in the orthographic structure of the items in each passage. A more focused list of untrained items that shared word structure with the trained phonic concepts may provide a better test of the transfer of training in this study. More generally, however, these results are unfortunately consistent with data reported on differences in fluency outcomes between early and later intervention efforts (e.g., Torgesen, Rashotte, & Alexander, 2001). The majority of students in this study were in the 3rd or 4th grade at the onset of treatment, and the amount of repeated reading practice during the intervention may have been insufficient to overcome the deficiencies in sight-word vocabulary development often observed in older students with reading disabilities (e.g., Cunningham & Stanovich, 1998).

In summary, the efficacy of Take Flight is largely a result of integrating research-supported best practices into its constituent components. Converging research has shown that teaching both phonological awareness and letter-sound correspondences (i.e., phonics) improves phonological awareness, phonological decoding, and reading skills (e.g., Blachman et al., 2004; Hatcher et al., 1994; Torgesen et al., 2001). These results are consistent with those studies and add to research on the Orton Gillingham approach to phonics instruction (Ritchey & Goeke, 2006; Oakland et al., 1998). Research has also shown the importance of reading comprehension and oral reading fluency instruction for struggling readers (NICHD, 2000). The group comparison data in this study supported the effectiveness of the added reading comprehension instruction and replicated previous studies of the efficacy of specific components that were integrated in the comprehension strand (e.g., Vaughn, Klingner, & Bryant, 2001). The reading rate exercises, however, did not have the expected effects on reading fluency and may reflect the difficulty of remediating fluency issues in this population. Further research is needed to identify efficient means to provide the accurate print exposure that these students need to develop the automatic word recognition of skilled reading.

Limitations

These results, while of interest for theory and practice, must be interpreted within the context of important methodological limitations. First, the study comparisons were quasi-experimental designs and thus subject to important constraints on valid inference (e.g., Cook & Campbell, 1979). For example, the longitudinal nature of a time-series design confounded general cognitive maturation with the timing of the control and intervention periods and may have affected observed outcomes. The historical control comparison did not permit random assignment or matching of the samples and also confounded historical events in the background education environment with treatment effects (e.g., No Child Left Behind [NCLB], 2002). Second, fidelity of implementation in this study was not formally measured but was maintained as a result of the study being conducted by the education staff of an accredited academic language therapist training center. Nevertheless, quantitative assessments of adherence, duration, and engagement are needed to help practitioners implement treatment with fidelity. The validity of any intervention depends on whether it was delivered as designed, and variable treatment fidelity may moderate observed effects (O’Donnell, 2008). Third, the students in this study were selected from a convenience sample at a hospital-based clinic and as a result may not be representative of all children receiving dyslexia treatment in other education environments. A fourth issue concerns the lack of norm-referenced decoding measures for comparisons of relative status with other reading skills. Finally, the reading comprehension and reading rate exercises of the curriculum could not be adequately assessed in this study because the specific instruments used to measure those constructs were not consistently included in the patients’ diagnostic test batteries.
A prospective, randomized control trial under routine practice conditions (e.g., public school) with a standardized assessment battery and quantitative fidelity monitoring would adequately address these limitations and provide a much stronger test of the efficacy of the Take Flight curriculum.

Summary

Dyslexia intervention has come a long way since the original guide for teaching children with severe reading difficulties was published (Gillingham & Stillman, 1956). That teaching has been appropriately described as an approach rather than a method because the latter implies more rigidity in practice than was intended. The flexibility to meet the needs of their students has since inspired practitioners to modify and adapt the instruction and is one reason why Orton Gillingham is the basis of many current published curricula (Uhry & Clark, 2005). Although Take Flight may look different, the curriculum retains the central features of the Orton Gillingham approach and in many respects feels much the same as originally outlined. The intensive teacher preparation replicates the training provided by Anna Gillingham, producing not merely sophisticated reading teachers but academic language therapists. The curriculum itself is language-based in that it teaches the structure of the written English language, and it is multisensory in that it introduces concepts visually, auditorily, and kinesthetically. The instruction is also direct and systematic, with sequential and cumulative lesson plans. Moreover, true to its immediate predecessors, the current curriculum is designed to deliver instruction in small groups of students.

The primary differences between the curriculum and the legacy curricula lie in the integration of curriculum components within the lesson scope and sequence. For example, each grapheme-phoneme pairing is first taught for reading through accuracy practice with words and sentences. In the same new learning lesson, the concept is then taught for spelling in phonemic awareness exercises and spelling of individual words and sentences. In subsequent lessons, the practice of establishing accuracy then transitions to automaticity practice with repeated reading rate exercises. Another example is the cumulative addition of individual comprehension strategies, after their introduction, into extended comprehension activities, first with narrative and then expository texts. In general, the organizing principle was close integration of all curriculum components from initial introduction and, when appropriate, transition from accurate response to automaticity and application in connected text reading.

The scope of the Take Flight curriculum is also comparable to other structured and comprehensive reading programs designed for small-group, Tier-III intervention. For example, the curriculum shares the same systematic phonics content as the Wilson Reading System (Wilson, 2002), another Orton Gillingham-based curriculum. The programs also offer systematic practice for reading fluency and a strategic approach to reading comprehension. One significant difference between the two programs is the integration of articulatory phonetics in the Take Flight phonemic awareness, reading, and spelling instruction. Articulator placement and frequency of use also organize the sequence of phonics instruction, rather than the syllable types used in the Wilson program. A second important difference is that students do not need to establish mastery of a particular phonics concept before proceeding to the next lesson. The Take Flight lesson sequence has sufficient repetition of phonics concepts in subsequent lessons that the concepts will be encountered multiple times and mastery acquired through distributed practice.

A second comparable published curriculum is Empower Reading, the result of a program of research at the Toronto Hospital for Sick Children (e.g., Lovett, Lacerenza, & Borden, 2000; Lovett et al., 2000). The program’s approach to phonics is based on the Direct Instruction method (Engelmann & Bruner, 1988) and is therefore similar to Take Flight in that Empower shares a systematic, direct, and structured model of phonics instruction with distributed practice. The lesson sequence and specific methods differ, however; for example, the Empower program also teaches a metacognitive approach to selecting different word identification strategies for different words and for monitoring outcomes. The Take Flight approach, in contrast, is more systematic and is applied to all novel words. A second difference again lies in the emphasis on articulatory phonetics in Take Flight’s integrated phoneme awareness, reading, and spelling lessons and the organization of the lesson sequence. The fluency exercises of Empower Reading have a more text-level focus than those of Take Flight, but the reading comprehension approaches of the two programs are not substantively different. Finally, the lesson sequence of Empower Reading is much shorter than both Take Flight and Wilson Reading and is designed to be completed in one academic year.

Current and future development

In addition to addressing unresolved instructional needs of children with dyslexia (e.g., reading fluency), future curriculum development will focus on improving both efficiency and accessibility of instruction. Educational technology will certainly play an increasing role in supporting that accessibility for all types of students (Every Student Succeeds Act (ESSA), 2015). One current approach adopts technology as supplemental support for the existing curriculum, that is, applying individual and classroom technologies to efficiently organize and present elements of the daily lesson. This approach however represents a surface level of support and would require a fully trained therapist for implementation. An alternative approach might further expand the levels of interactive support provided for the intervention environment.

In an article describing Alphabetic Phonics, Aylett Cox referred to the development of multimedia tools that could expand the availability of Orton Gillingham treatments (Cox, 1985). That prediction was soon realized in the creation of the video-taped lessons of the Dyslexia Training Program. More recently, computer-assisted learning has become a more common method of providing intervention for struggling readers, including those with dyslexia. A distinct feature of many of these programs is interactive drill-and-practice with differentiated feedback and instruction. Despite the theoretical advantages based on behavioral learning theory, evidence of the benefits of computer-directed reading instruction has been mixed. Student progress has often been poor because of a lack of integration with teacher-led instruction (Dynarski et al., 2007). Another factor impeding reading growth is off-task behavior of students left unsupervised during computer learning activities (Underwood, 2000).

A method of delivering dyslexia intervention that has several advantages combines teacher-initiated, computer-based, and teacher-led instruction during each session. Computer technology could be used to teach the phonemic awareness and decoding elements that most educators are least trained to deliver (Torgesen, Wagner, Rashotte, Herron, & Lindamood, 2010). Lesson scripts could guide teachers in comprehension instruction and text reading practice. Seminars that demonstrate how these components are integrated could be offered at considerable savings of time and cost compared with what would be needed to develop teachers’ proficiency to provide all components of successful intervention. Allowing the teacher to remotely control when the computer-based instruction is provided and repeated would help maximize student engagement and meet individual student needs.

The number of states passing legislation mandating the identification and treatment of dyslexia is escalating nationwide (Youman & Mather, 2013). Public schools are increasingly being asked to provide evidence-based intervention when less than 20% of colleges of education adequately prepare teachers to provide the five components of effective reading instruction described in the National Reading Panel report (Greenberg, Walsh, & McKee, 2014). Take Flight and similar interventions can be made more widely available using a combination of technology and non-technology components. The fact that interactive white boards, laptops, and other electronic devices are now commonplace in elementary schools helps make this a viable solution. However, research is still needed to determine what factors influence the effectiveness of technology applications for children with dyslexia (Cheung & Slavin, 2013).