Introduction

Not all children enter school equally prepared to learn. In spite of significant federal and state investment in compensatory programming, stubborn achievement gaps continue to plague large numbers of children from various ethnic, socioeconomic, and linguistic backgrounds (Moats & Foorman, 2008). For many, early language and literacy experiences remain widely disparate, with poor children least likely to receive quality language and cognitive input during early childhood (Hart & Risley, 1995; Wasik & Hendrickson, 2004). In their homes, these children receive less exposure to print and lower quality language input, including less varied and sophisticated vocabulary (Bus, van Ijzendoorn, & Pellegrini, 1995; Neuman, 2006). Because of this variation in the home environment, many children need high quality preschool and school environments with excellent instruction to be assured of reading success. Research shows that high quality intervention during preschool can mitigate the effects of disadvantage, with lasting benefits for language and literacy outcomes (Ackerman & Barnett, 2006).

In the 2006–2007 year, state-funded preschool education halted a disturbing trend in per-child funding, achieved key milestones in accessibility, and continued efforts towards higher quality standards. Unfortunately, behind the national averages still lies a troubling and persistent trend. An extensive literature shows that disadvantaged children still attend schools with fewer resources (Aikens & Barbarin, 2008; Al Otaiba, Kosanovich-Grek, Torgesen, Hassler, & Wahl, 2005), experience school routines that differ substantially from their home environments, are placed in classes with less experienced teachers, and are subjected to poor instruction (RAND, 2002). For these children, the common denominator is “less”—less opportunity, less experience, less interaction, less coherence in school curriculums, less knowledge of content, and less prior knowledge, all of which impair the acquisition of more complex skills and concepts (Simmons et al., 2008, p. 189). Thus, much of the difference in reading preparedness appears to be a question of opportunity rather than ability (Biemiller, 2006).

Responding to the challenges: The case for quality preschool instruction

As children get older, meaningfully altering long-term chances of academic success becomes increasingly difficult (Dickinson, McCabe, & Essex, 2006). One approach to forestalling or reducing the achievement gap is to expand access to high quality preschool services (Parkinson & Rowan, 2008). Preschool is a time when development of literacy skills is most responsive to intervention (Dickinson et al., 2006). Long-term studies of well-designed and well-executed early childhood development programs have established that participating children have higher scores on reading and math achievement tests, are less frequently identified for special education or remedial needs, evidence lower grade retention and dropout rates, and are more likely to pursue secondary education than peers from traditional programs (Lynch, 2006).

What do quality preschool programs look like? Effective preschool environments feature well-balanced materials, resources, and instruction that accelerate vocabulary and conceptual knowledge; provide access to a variety of books and writing materials; target learning domains supported by research; and have a specific focus on language development (Frede & Ackerman, 2007). Structurally, quality preschool programs have a variety of grouping patterns with multiple levels of guidance, orchestration of pacing and behavioral management, qualified staff, class sizes no larger than 20, and staff-child ratios of 1:10 (Ackerman & Barnett, 2006; Biemiller, 2006; Bowman, Donovan, & Burns, 2001; Neuman, 2006; Roskos, 2007; Xue & Meisels, 2004).

Despite the potential of preschool, in 2005, only two-thirds of 4-year-olds and 40% of 3-year-olds were enrolled in some type of preschool education program. Further, most of those programs were of poor quality (Ackerman & Barnett, 2006; Barnett & Yarosz, 2007; Children’s Defense Fund, 2005; Dickinson et al., 2006). Surveys indicate that fewer than half the state-funded pre-K programs meet minimum quality standards (Committee for Economic Development, 2006). As a result, far too many children fail to receive the type of instruction necessary for optimal language and literacy development (Dickinson, McCabe, & Clark-Chiarelli, 2004). In addition, Hispanics, the fastest-growing and least academically prepared group in this country (Laosa & Ainsworth, 2007), were found to be least likely to receive any preschool education. When Hispanic children did enroll in pre-kindergarten, they were more likely to attend low-quality preschool programs with less-prepared teachers, fewer resources, larger classes, and higher child-to-teacher ratios (Laosa & Ainsworth, 2007).

In response to troubling trends and disappointing outcomes, educators, advocacy groups, policy makers, and researchers have called for changes in the early-literacy policy landscape. The urgent call is driven by the finding that expanding numbers of children are failing at one of development’s most important tasks, learning to read (Dickinson et al., 2004), and is fueled by studies showing that quality early childhood programs (e.g., Perry Preschool Project, Abecedarian Early Childhood Intervention, Chicago Child-Parent Center Program) can mitigate the onset of learning disabilities (Lynch, 2006). As evidence accumulated that at least half the educational achievement gap between poor and non-poor children is evident prior to entry into kindergarten, school districts, states, and the federal government moved to improve the quality of preschool instruction.

Early Reading First (ERF) was one of the most aggressive initiatives to advance the high quality preschool agenda. Early Reading First was created to address the concern that many of the nation’s children begin kindergarten without the necessary foundations for success in reading. ERF provides competitive, multi-year grants to transform existing early education programs, most notably those serving low-income children and/or English language learners, into high quality centers that provide language and literacy rich environments. The overarching aim of ERF is to prepare preschool-aged children to enter kindergarten with the language, cognitive, and reading skills necessary for success in reading (United States Department of Education, 2008a).

The specific purposes of ERF were to: (a) encourage and support efforts to enhance early language, cognitive, and reading development of children from low-income families and/or English language learners (ELL) through strategies and professional development based on scientifically-based reading research (SBRR); (b) provide cognitively stimulating opportunities using high quality language and print-rich environments to foster knowledge and skills required for optimal learning; (c) incorporate language and literacy activities grounded by SBRR to support development of phonological awareness, oral language, print awareness, and alphabet knowledge; (d) use screening assessments to identify and monitor progress of preschool children at risk for reading failure; and (e) integrate SBRR materials and programs into existing preschool programs (United States Department of Education, 2008b). In this study, which is part of a larger research program assessing the impact of ERF preschool enrichment, we report on the effects of an ERF project in its third and final year of implementation.

Evaluating the impact of ERF-enriched schools

Despite the potential of the ERF program, there is little information on its effectiveness. The most ambitious effort in this area is the national evaluation of ERF mandated by the No Child Left Behind (NCLB) Act of 2001 (2002) (Jackson et al., 2007). Two primary questions addressed by the evaluation were (a) What is the impact of ERF on the language and literacy skills of children in schools receiving ERF support? and (b) What is the impact of ERF on the quality of language and literacy instruction and classroom environments that ERF grantees provide?

Jackson et al.’s (2007) evaluation showed that ERF-funded programs (n = 28) had a statistically significant positive effect on children’s print and letter knowledge compared to non-funded (n = 37) programs, but no discernable effect on phonological awareness, expressive vocabulary, or auditory comprehension. ERF also improved the general quality of the classroom environment and increased the number of hours of professional development received by teachers and the number of teachers receiving professional development (Jackson et al., 2007).

The comprehensive national evaluation by Jackson et al. (2007) provides a wealth of summary information on the characteristics and outcomes of ERF projects, but some details of the individual treatment (i.e., ERF projects) and comparison groups were lost in the process. Further, the regression discontinuity analyses that formed the foundation of the evaluation were based on funded and unfunded proposal application scores (Jackson et al., p. 7), which measured grant reviewers’ evaluations of factors such as the quality of the proposed activities and services and of the proposed project’s management and evaluation plans (Jackson et al., p. 3). It might be argued that the use of scores based on the quality of the application rather than on actual instructional conditions or practices calls the results of the evaluation into question.

To our knowledge, only three studies of the impact of ERF have been published in peer-reviewed journals: quasi-experimental studies by Martin et al. (2007) and Gray (2007); and a randomized clinical trial by Gettinger and Stoiber (2007). Martin et al. evaluated the effectiveness of instruction in an ERF project in its second year of implementation. Like other ERF projects, including those summarized below, it included professional development and instructional practices, materials, and environment grounded in SBRR. Martin et al. found that the difference between ERF and comparison children on the Peabody Picture Vocabulary Test-III (PPVT-III; Dunn & Dunn, 1997) oral language normal curve equivalent (NCE) gain scores was not statistically significant. Differences between the two groups’ alphabet knowledge at the beginning of the year precluded meaningful comparisons on that measure.

Gray (2007) examined the benefits of ERF instruction in seven classrooms selected from public school, Head Start, and for-profit day-care classrooms. Using ANCOVA to control for pretest scores, Gray found that by the end of preschool, ERF children had significantly higher total scores on the Phonological Awareness Literacy Screening-PreKindergarten (PALS-PreK; Invernizzi, Sullivan, Meier, & Swank, 2004). However, ANCOVAs revealed no statistically significant differences between the two groups on the Preschool Word and Print Awareness Assessment (PWPA; Justice & Ezell, 2002) Words in Print and Print Concepts raw scores, PPVT-III (Dunn & Dunn, 1997) standard scores, or the expressive vocabulary test (EVT; Williams, 1997).

In the final study, Gettinger and Stoiber (2007) reported on the effectiveness of an ERF project that shared many of the characteristics of the one being evaluated here, including the implementation of a three-tiered response-to-intervention model. They randomly assigned 30 Head Start classrooms with a total of 188 children to the ERF group and 20 classrooms with 154 children to the control group. ANCOVAs on posttest scores revealed significant differences on eight key components of early literacy acquisition consisting of PPVT-III (Dunn & Dunn, 1997), three individual growth and development indicators (IGDIs; Early Childhood Research Institute on Measuring Growth and Development, 1998), three PALS-PreK (Invernizzi et al., 2004) subtests, and a custom story retelling measure developed by the project staff. All tests proved to be statistically significant, with effect sizes ranging from .13 to .45.

In addition to the articles summarized above, there was an unpublished study of the impact of the ERF project evaluated in this report in its second year of implementation (Gonzalez, Hall, Goetz, & Payne, 2009). The setting, design, and characteristics of the children in that study were virtually identical to those of the present study; therefore, we will briefly summarize the findings. Analyses of variance and covariance revealed highly significant effects of instruction on Alphabet Knowledge and Name Writing subtests of the PALS-PreK (Invernizzi et al., 2004) compared to a practice-as-usual contrast group, but failed to provide any evidence of differences between the two groups on PPVT-III (Dunn & Dunn, 1997) scores.

Thus, the few available published studies of the impact of ERF-enriched instruction provide some support for the effectiveness of the programs being evaluated, but the results are mixed. Further, there are a number of limitations that temper conclusions drawn from these studies. Gray (2007) and Gettinger and Stoiber (2007) provide detailed accounts of the professional development, curricula, instructional activities, and classroom environments in the ERF program, but none of the studies provide much information about activities in the contrast classrooms. Gray is most forthcoming on this point, indicating that the comparison classrooms did not have an early literacy program. She also reported pre and post early language and literacy classroom observation (ELLCO; Smith, Dickinson, Sangeorge, & Anastasopoulos, 2002) scores for both ERF and comparison classrooms. Further, the length of the preschool day and the number of days per week for the non-ERF children either varied or were not explicitly stated in any of the studies.

Small sample sizes, particularly for the children in the comparison group, limited the power of the statistical tests reported by Martin et al. (2007) and Gray (2007). Depending on the measure, Martin et al.’s analyses included data from about 110–120 ERF children, with about 30 in the comparison condition; for Gray, the total number of children with pre and posttest scores for any of the measures was 82 for ERF versus 38 for the comparison group. Gettinger and Stoiber’s samples were much larger, but the exact number of children whose data were included in specific analyses was not reported in any of the studies. In addition, Martin et al.’s (2007) ERF and comparison groups were not comparable. First, nearly all of the ERF children attended Head Start, Even Start, and state-funded public school programs, whereas all of the comparison children attended a single church-supported child care program. Second, children who attended ERF-enriched preschools came from low-income families who qualified for free preschool services, whereas the families of children in the comparison group had to pay for their enrollment. Therefore, the contrast children were assumed to come from families with higher incomes, but demographic data, including SES and ethnicity, were not reported. Third, the entering skills (i.e., mean pretest scores) of ERF children were considerably lower on all measures than those of the comparison group (e.g., upper case letters recognized on the locally developed measure: ERF = 7, comparison = 18.5).

Statistical analyses also were problematic, especially in Martin et al. (2007). For example, the impact of ERF instruction on the PPVT-III (Dunn & Dunn, 1997) was evaluated using normal curve equivalent (NCE) gain scores and pre to post shifts in the distributions of students across stanines rather than standardized scores. Standard deviations were not reported for any measure, making direct tests of the statistical significance of pretest and posttest differences impossible, and analysis of covariance was not used to correct for apparent differences in entering knowledge. Gray concluded that on the PWPA (Justice & Ezell, 2002), the ERF children had “closed the gap” (p. 27) with the contrast group on the basis of (a) a statistically significant difference favoring the contrast group in an ANOVA for pretest scores and (b) a nonsignificant ANCOVA on adjusted posttest scores. None of the studies used multilevel modeling to control for intra-class correlations; only Gettinger and Stoiber (2007) had a sufficient number of classrooms to make this feasible.
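To illustrate why ignoring intra-class correlation matters, the following minimal sketch applies the standard design-effect approximation for equal-sized clusters, 1 + (m − 1) × ICC, to the classroom and child counts reported for Gettinger and Stoiber's ERF group. The ICC value of .10 is purely hypothetical; none of the studies reviewed here reported one, and the function names are ours.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_children: int, n_classrooms: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    avg_cluster_size = n_children / n_classrooms
    return n_children / design_effect(avg_cluster_size, icc)


# Counts taken from the summary of Gettinger and Stoiber (2007) above.
N_CHILDREN = 188
N_CLASSROOMS = 30
# Hypothetical ICC chosen only for illustration; not reported in the studies.
ASSUMED_ICC = 0.10

if __name__ == "__main__":
    n_eff = effective_sample_size(N_CHILDREN, N_CLASSROOMS, ASSUMED_ICC)
    print(f"Effective sample size under the assumed ICC: {n_eff:.1f}")
```

Under this assumed ICC, 188 children nested in 30 classrooms carry roughly the information of 123 independent observations, which is why single-level ANOVA or ANCOVA can overstate statistical significance for classroom-delivered interventions.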

Study purpose

Given the limited number of studies on the effectiveness of ERF-enriched instruction in the literature, the purpose of the present study was to provide additional evidence of its impact on preschool language and literacy development. Unlike previous studies, multilevel modeling was used to account for inter-correlations among the scores of students in the same classroom. The study compared the language and literacy development of children who were enrolled in ERF-enriched preschool programming to that of a demographically comparable contrast group from the same school district. The ethnic composition of the children in this study differed from that of previous studies. Martin et al. (2007) state that their program served primarily African American children, but as noted above, demographic statistics were not reported. In Gettinger and Stoiber’s (2007) study, more than 90% of the students were African American. More than half of the children in Gray’s (2007) study were Hispanic, but there were almost no African American students. In the present study, 70–80% of the students were Hispanic and 20% were African American, making this the most diverse sample of minority children studied so far. The primary research questions addressed were as follows:

  1. Does ERF enrichment enhance preschoolers’ acquisition of cognitive (e.g., alphabet knowledge, print concepts, oral language) skills essential to successful language and literacy development?

  2. Does the effectiveness of ERF enrichment in promoting literacy related cognitive skills vary for different groups of children and in different contexts?

Method

Design

The study used a quasi-experimental pretest–posttest design to compare the performance of preschool children participating in the ERF project with that of a contrast group of children from the same school district receiving instruction typical of the school district during the third year of the ERF project implementation (2007–2008). The evaluation plan for the project specified in the grant application included comparison of the literacy learning of children who received the ERF enriched instruction to children who received the usual preschool instruction currently being offered in the school district. Whereas the preschool instruction offered at that time consisted of only a half day of instruction, ERF mandates a full day of instruction. Therefore, it was not possible to control for the length of instruction. Although this design made separation of the effectiveness of ERF-enriched instructional methods and classroom environment from the amount of instruction unattainable, it does provide a valid test of the impact of this ERF project, taken as a whole, against “practice-as-usual” instruction in the same school district. In order to address this constraint, comparison of the effectiveness of instruction in this ERF project to that of full-day preschool programs without ERF enrichment in other studies is presented in the discussion. It should be noted, however, that the half-day contrast school is not atypical of US schools: In 2004, nearly half of all preschools in this country offered half-day instruction (National Center for Education Statistics, 2008).

Participants

This report is based on data used in the evaluation of the project; therefore, following ERF guidelines, only data from “age-eligible children,” defined as 4 years of age as of September 1 in the year they entered preschool, were analyzed. Participants were preschool children from two multi-ethnic public schools in the same school district located in a Southwestern state. Both schools were also located in a county experiencing large-scale growth in low-income Hispanic families. The contrast school was nominated by school district administration based on matching the ERF school on percentage of children qualifying for free or reduced-price lunch. Further, although the ERF assessment battery was administered to all qualifying children participating in the ERF program, parental consent was required for children in the contrast school. Table 1 summarizes the demographic statistics of age-eligible children for whom both pretest and posttest data were available for at least one of the assessments at the ERF and contrast campuses. Table 2 summarizes ERF and contrast teacher group characteristics. Both ERF and contrast teachers had participated in project data collection for two years. As shown, the two teacher groups were relatively comparable, although a lower proportion of ERF teachers had early childhood certification.

Table 1 Demographic characteristics of contrast and ERF children included and not included (Not Incl) in the multilevel analyses (%)
Table 2 Teacher characteristics

Due to the relatively lower participation rate among children attending the contrast campus, chi-square analyses were conducted to determine if the students for whom data were available were representative of all children attending the school. Analyses comparing children included in at least one analysis versus those missing from all analyses did not reveal any significant differences on demographic variables. More relevant to the current discussion, however, is the comparison between ERF and contrast children whose data were included in the analyses. In this instance, differences in representation across ethnicity were statistically significant, χ² = 12.47, p = .002, Φ = 0.22, with proportionately higher representation of Hispanic students and lower representation of Caucasian students in the ERF group. This might be interpreted as suggesting higher English language skills among the contrast students, providing a conservative test of ERF.
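The relationship among the reported chi-square statistic, its p value, and the Φ effect size can be checked with a short sketch. The degrees of freedom are not reported in the text; the code assumes df = 2 (two groups by three ethnicity categories), for which the upper-tail probability has the closed form exp(−χ²/2). The implied sample size is derived from Φ = √(χ²/N) and is only approximate, because Φ is reported to two decimals.

```python
import math

# Values reported in the text.
CHI_SQUARE = 12.47
PHI = 0.22
# Assumption: df is not reported; 2 groups x 3 ethnicity categories gives df = 2.
ASSUMED_DF = 2


def chi2_upper_tail_df2(x: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 df."""
    return math.exp(-x / 2.0)


def implied_sample_size(chi_square: float, phi: float) -> float:
    """Invert phi = sqrt(chi_square / N) to recover the approximate N."""
    return chi_square / phi ** 2


if __name__ == "__main__":
    p_value = chi2_upper_tail_df2(CHI_SQUARE)
    n_implied = implied_sample_size(CHI_SQUARE, PHI)
    print(f"p under the assumed df = {ASSUMED_DF}: {p_value:.4f}")
    print(f"Approximate N implied by phi: {n_implied:.0f}")
```

Under the df = 2 assumption the computed p value rounds to the reported .002, and the implied N is on the order of 250 to 270 children once rounding of Φ is taken into account.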

Classroom placement and instructional format for ELL and ELP students

In this report, the terms English language learner (ELL) and English language proficient (ELP) are used to designate children whose level of English proficiency dictated the language(s) in which they received instruction following school district policies and procedures. Placement in bilingual classrooms was based on a multi-staged process that included family surveys and testing with the Pre-LAS 2000 English and Spanish versions (Duncan & De Avila, 1998). For students in bilingual classrooms (i.e., ELL), 80% of the instructional time was devoted to instruction in Spanish.

Early Reading First classrooms

Classroom settings

Of the eight ERF teachers, five offered bilingual education and three used English only. Class sizes ranged from 14 to 22 for the bilingual classes and 15–21 for the English-only classes. Each class had one language technician. Language technicians were paraprofessionals assigned to work with teachers who had completed professional development. The language technicians, along with the teachers, participated in the beginning-of-the-year staff development. Language technicians were observed once a month by the teacher and provided with corrective feedback.

Teacher professional development

The ERF teachers and language technicians participated in seven days of staff development throughout the 2007 academic year: six half-days prior to the start of the academic year, and one full day at the end. The staff development included topics that focused on classroom management, instructional approaches, and instructional goal setting and planning for the upcoming academic year. The professional development (PD) was organized around SBRR principles on early language, literacy, reading development, and classroom practices. Goals included increasing teachers’ conceptual knowledge and understanding of SBRR, enhancing understanding and implementation of language and literacy curricula and strategies, improving instructional practices, and accelerating the vocabulary, background knowledge, and literacy of preschoolers.

University personnel worked with teachers and paraprofessionals to specify the knowledge to be acquired through PD, taking care to clarify why application of the knowledge was important. ERF and university personnel identified the most appropriate formats for presentation of knowledge and contracted with the Center for Improving the Readiness of Children for Learning and Education (CIRCLE; Landry, Anthony, Swank, & Monseque-Bailey, 2009; Landry, Swank, Smith, Assel, & Gunnewig, 2006) to provide PD. The PD provided by CIRCLE is based on the most recent scientific research and meets the needs of adult learners. Training focused on classroom arrangement, language, emotional growth, maintaining children’s interest, read-alouds, and using themes. Teachers came to interactive training sessions that provided hands-on activities and ideas for teachers to immediately take back and implement in their classrooms. Upon returning to classrooms, teachers had a deeper understanding of early literacy development and ideas for implementation.

Classroom instruction

The ERF school used an RtI model of general education service delivery (Fuchs & Fuchs, 2006; Fuchs, Mock, Morgan, & Young, 2003; Gresham, 2007) focusing on matching instruction to student needs and progress monitoring with ongoing data-based decision making (Vaughn, Linan-Thompson, & Hickman, 2003). Three successively more intensive tiers of instruction were used to identify students with early academic readiness skill problems and to provide a safety net for struggling students (Vaughn, Wanzek, Woodruff, & Linan-Thompson, 2007). Through its use of a series of successively more intense SBRR curricula, the RtI model placed greater emphasis on more intense practice with early literacy skills for successively smaller groups of children who performed below peers on progress-monitoring data. In each tier, curricula were structured around a common core set of skills identified by ERF and for which there was scientific consensus on their importance to later reading (Gettinger & Stoiber, 2007).

Tier I was general education readiness skill instruction provided to all preschoolers. Tier II instruction was provided only to those students who showed problems based on universal screening measures or demonstrated weak or no progress in response to Tier I core curriculum instruction. In Tier II, students received targeted supplemental instruction in groups of three to five students. Tier III instruction was more concentrated and provided individualized and small group instruction (i.e., one to three students) for those who did not show progress after two six-week cycles in Tier II.
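The movement of a student through the three tiers can be sketched as a simple decision rule. This is an illustrative simplification rather than the project's actual procedure: it assumes that a student moves up one tier after two consecutive six-week cycles without adequate progress in any tier, and that adequate progress always returns the student to Tier I. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Two six-week cycles without progress trigger a move to the next tier.
# Applying the same rule to Tier I is a simplifying assumption.
CYCLES_BEFORE_ESCALATION = 2
HIGHEST_TIER = 3


@dataclass
class StudentStatus:
    tier: int                     # 1, 2, or 3
    cycles_without_progress: int  # consecutive six-week cycles in this tier


def next_status(status: StudentStatus, adequate_progress: bool) -> StudentStatus:
    """Return the student's placement after one six-week monitoring cycle."""
    if adequate_progress:
        # Assumed rule: adequate progress returns the student to core instruction.
        return StudentStatus(tier=1, cycles_without_progress=0)
    cycles = status.cycles_without_progress + 1
    if cycles >= CYCLES_BEFORE_ESCALATION and status.tier < HIGHEST_TIER:
        # Move to the next, more intensive tier and restart the cycle count.
        return StudentStatus(tier=status.tier + 1, cycles_without_progress=0)
    # Otherwise remain in the current tier for another cycle.
    return StudentStatus(tier=status.tier, cycles_without_progress=cycles)
```

For example, a student who shows no progress for four consecutive cycles starting in Tier I would reach Tier III under these assumed rules, whereas a single cycle of adequate progress at any point would return the student to Tier I.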

Universal screening is the crucial first step in identifying students who are at high risk of skill deficits and may be in need of more targeted instruction. Universal screening was conducted September 15 through October 15, 2007 using the Alphabet Knowledge and Name Writing subscales of the PALS-PreK and the PPVT-III. Where available, ERF benchmarks were used to distinguish between preschoolers likely to make satisfactory or unsatisfactory progress towards year-end goals without further assistance. Subsequently, students’ response to instruction on phonological awareness, alphabet knowledge, oral language, and print concepts was monitored using teacher-administered, district-developed progress monitoring probes to determine if they had made adequate progress and either: (a) no longer needed intervention, (b) continued to need some intervention, or (c) needed even more concentrated intervention.

Curriculum

The Scholastic Early Childhood Program (SECP), the Tier I curriculum, addressed children’s school readiness skills through its focus on language and literacy, integration with mathematics, social studies, arts, physical development, and personal and social development (Block, Canizares, Church, & Lobo, 2008). The SECP curriculum was supplemented with Let’s Begin with the Letter People ® (Abrams & Company, 2000). Let’s Begin with the Letter People ® is an early education curriculum organized around thematic units to develop children’s language and literacy skills. Both curriculums were teacher-delivered.

In the fall of 2007, after six weeks of SECP instruction, teachers compared students in their classrooms to local school district developed goals for alphabet knowledge, phonological awareness, oral language, and print concepts. Students who were unresponsive to Tier I instruction received differentiated instruction (e.g., more intense practice opportunities on the deficit skill) before proceeding to Tier II. After a second six weeks, 45% (n = 59) of the students remained unresponsive to Tier I, proceeded to Tier II, and were provided with an additional 30 minutes of daily intensive, explicit instruction delivered by teachers or trained language technicians in groups of three to five (or fewer) students.

Building Language for Literacy (Neuman, Snow, & Canizares, 2008), the Tier II curriculum, is a research-based intensive program emphasizing systematic letter/sound instruction, writing, and high-frequency word use. Oral language activities included extended vocabulary, contextual speech and syntax use, and oral comprehension abilities (Beck, McKeown, & Kucan, 2002; Biemiller, 2003). Tier II also included Pebble Soup (Harcourt Supplemental Publishers, 2008) for English Language Learners (ELL) who needed extended opportunities to practice English and Spanish oral language, listening comprehension, vocabulary, and verbal expression. Pebble Soup comes in English and Spanish and focuses on real-world themes interesting to young children.

Students in Tier II were grouped homogeneously on the deficit skill (e.g., alphabet knowledge, oral language, print concepts), and their progress was monitored on a weekly basis. Following a six-week period of teaching to mastery on core skills, progress monitoring data were used to regroup students to maintain group homogeneity. None of the preschoolers in the first round of Tier II (n = 59) made adequate progress, so they were exposed to a second cycle of Tier II intervention. After a second cycle of Tier II instruction, 58% of the students (n = 34) progressed to the point of moving back to Tier I. Forty-two percent of the students in the second cycle of Tier II (n = 25) did not make sufficient progress, so they were automatically provided one-to-one or small group instruction in Tier III.
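The counts and percentages reported for the Tier II cycles can be cross-checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the total number of ERF students is not given at this point, so the value derived from "45% (n = 59)" is an approximation and is labeled as such.

```python
# Figures reported in the text.
TIER2_ENTRANTS = 59      # students who proceeded to Tier II
RETURNED_TO_TIER1 = 34   # made adequate progress after the second cycle
MOVED_TO_TIER3 = 25      # did not make sufficient progress
TIER2_SHARE = 0.45       # Tier II entrants as a share of all ERF students


def percent(part: int, whole: int) -> int:
    """Share of `whole` represented by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


# Derived estimate, not a reported figure: total ERF students implied by 45%.
implied_total_students = TIER2_ENTRANTS / TIER2_SHARE

if __name__ == "__main__":
    print(f"Returned to Tier I: {percent(RETURNED_TO_TIER1, TIER2_ENTRANTS)}%")
    print(f"Moved to Tier III: {percent(MOVED_TO_TIER3, TIER2_ENTRANTS)}%")
    print(f"Implied total ERF students: about {implied_total_students:.0f}")
```

The two outcome groups sum to the 59 Tier II entrants, the rounded shares match the reported 58% and 42%, and the implied ERF enrollment is roughly 130 children.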

In Tier III, students received an additional 30 minutes of one-to-one or small group (fewer than three students) concentrated instruction provided by the language technicians. Tier III instruction was a more intensive combination of the instruction provided in Tiers I and II. Instruction was intensified by focusing on one or two high priority core skills (e.g., alphabet knowledge) during lessons, extensive practice, and immediate quality feedback (Gersten et al., 2008). Because of the individualized nature of Tier III instruction, university school psychology personnel participated in problem-solving consultation with school-district personnel to identify ways of intensifying instruction (e.g., material supports). Throughout the day, students received ~90 min of language arts instruction. Twenty-five students (19%) received Tier I and Tier II instruction for the entire year. Three students (2%) received Tier III instruction through the remainder of the year.

Classroom environment observation

The quality of treatment in ERF classrooms was assessed with the teacher behavior rating scale (TBRS; Landry, Crawford, Gunnewig, & Swank, 2002), a 63-item measure that codes the frequency and quality of SBRR instructional practices that promote general language and literacy development, such as the enrichment activities included in the ERF teachers’ training. Items with 3- and 4-point Likert scales measure quantity and quality, respectively, of each characteristic. The TBRS was used at pre- and posttest by two trained observers. It was completed in a three-hour classroom observation during the third year of the project’s implementation. The TBRS consists of 11 subscales and a total score, with alpha reliability coefficients ranging from .63 to .97 (Landry et al., 2002).

Because of the high correlations between the quality and quantity dimensions in the TBRS (from .74 for Lesson Plans/Dynamic Assessment/Portfolios to .99 for Language Use) (Jackson et al., 2007), items were averaged to create single-item and subscale scores for this study. Mean subscale scores averaged across all ERF enriched classrooms were as follows: Classroom Community, 94%; Sensitivity, 87%; Lesson Plans/Dynamic Assessment/Portfolios, 87%; Centers, 93%; Book Reading Behaviors, 77%; Print and Letter Knowledge, 86%; Math Concepts, 66%; Phonological Awareness, 41%; Written Expression, 59%; Oral Language Use with Students, 84%; Team Teaching, 68%; and Total, 84%, indicating a moderate to high level of fidelity of treatment for most ERF program components.

In summary, ERF instruction occurred in the context of a three-tiered general education service delivery model. The three-tiered model is an intervention process designed to detect struggling preschoolers early, before they fall behind. Each tier provided successively more intense curricula, beginning with whole class Tier I instruction (Scholastic Early Childhood Program), followed by more intense supplemental Tier II instruction (Building Language for Literacy; Pebble Soup) in groups of three to five children, and Tier III (combination Tier I and Tier II) concentrated instruction for one to three children. Movement through the tiers was designed as a dynamic process, with students entering and exiting tiers as needed. Progress monitoring was used both to track student progress and to inform instruction. Next we discuss the contrast condition.

Contrast classrooms

Classroom settings

Contrast classrooms were located at a separate school in the same school district. Each of the 11 contrast teachers taught two classes (morning and afternoon), for a total of 22 classes. Of the 22 contrast classes, 10 offered bilingual education and 12 used English only. Class sizes ranged from 15 to 21 for the bilingual classes and 15–22 for the English-only classes. In the bilingual classes, there were one or two teacher’s aides; English-only classes had one teacher’s aide.

Teacher training

The teachers participated in five half-days of staff development before the start of the 2007 academic year, with one additional full day of staff development at the end of the academic year. The staff development consisted of five six-hour workshops covering a continuum of meaning-based and skill-based instructional approaches aimed at enhancing children’s cognitive, language, and early reading development, particularly for low-income and ELL children. The teachers also received approximately 36 hours of additional staff development throughout the year that focused on maintaining accountability and rigor with the instructional content aligned with the state Pre-K guidelines.

Contrast teachers also met weekly for one-hour instructional planning sessions led by the school principal. The aim of each session was to assist teachers in making informed decisions about (a) curriculum content, (b) accommodations and modifications for children with disabilities, (c) strategies for ELL children, (d) design of print-rich classrooms, and (e) thematic units that link language, reading, and writing.

Instruction

Instruction in the contrast classrooms addressed the same general skills as that of the ERF classrooms, but the teachers approached them in individual ways. Like the ERF-enriched classrooms, the contrast classrooms used the SECP program. While no other curricular programs were identified, teachers used a variety of supplementary approaches (e.g., repeated readings) and strategies to provide extra instruction to struggling preschoolers. No specialized additional curricula were used, nor was there any progress monitoring.

Measures

Peabody Picture Vocabulary Test, third edition

The Peabody Picture Vocabulary Test (PPVT-III; Dunn & Dunn, 1997) is recommended for use in educational and clinical settings to measure receptive vocabulary and to screen for English language ability and general language development. On the PPVT-III, the child is asked to point to one of four pictures on a panel that represents an object or action named by the examiner. The test consists of 204 progressively more difficult items recommended for ages 2 through 99. It generally takes 10–15 min to administer. Scores are age-based standard scores (M = 100, SD = 15). Reported alpha and split-half reliability coefficients are in the range of 0.86–0.98 for both forms A and B (Dunn & Dunn, 1997).

Phonological awareness literacy screening—prekindergarten

The PALS-PreK (Invernizzi et al., 2004) is a scientifically based phonological awareness and literacy screening that measures developing knowledge of important literacy fundamentals. In the evaluation of this ERF program, the PALS-PreK Alphabet Knowledge and Name Writing subtests were used.

In the Alphabet Knowledge subtest, the teacher asks the child to name the 26 upper-case letters of the alphabet presented in random order. Children who know 16 or more upper-case letters also take the lower-case alphabet recognition task. Children who know nine or more lower-case letters are also asked to produce the sounds associated with the 23 letters and three consonant digraphs. Combined, both tests take ~10–15 min to administer.
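The branching rule described above can be sketched in a few lines of Python. This is only an illustration of the thresholds stated in the text (16 or more upper-case letters, nine or more lower-case letters); the function name `alphabet_tasks` and the task labels are hypothetical and are not part of the PALS-PreK materials.

```python
def alphabet_tasks(upper_known, lower_known=None):
    """Return the Alphabet Knowledge tasks a child would be given.

    upper_known: number of upper-case letters named correctly (0-26).
    lower_known: number of lower-case letters named correctly (0-26),
                 or None if the lower-case task has not been scored.
    Thresholds follow the description in the text; names are illustrative.
    """
    if not 0 <= upper_known <= 26:
        raise ValueError("upper_known must be between 0 and 26")
    if lower_known is not None and not 0 <= lower_known <= 26:
        raise ValueError("lower_known must be between 0 and 26")

    tasks = ["upper-case naming"]          # every child takes this task
    if upper_known >= 16:                  # 16 or more upper-case letters
        tasks.append("lower-case naming")
        if lower_known is not None and lower_known >= 9:
            tasks.append("letter sounds")  # nine or more lower-case letters
    return tasks


print(alphabet_tasks(12))       # upper-case task only
print(alphabet_tasks(18, 5))    # upper- and lower-case tasks
print(alphabet_tasks(24, 15))   # all three tasks
```

Because most children in this sample knew few letters at pretest, the rule implies that many of them would have completed only the upper-case task.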

The interrater reliability of both scales was .99. Concurrent validity ranged from .41 to .71 with other existing measures of phonological awareness. Predictive validity estimates ranged from .53 to .56 (Invernizzi et al., 2004).

Print concepts: name writing

In Name Writing, the teacher asks the child to draw a self-portrait and to write his/her name. Responses are rated on an 8-point scale: 0 = “Name is a scribble and the picture represents both child’s picture and written name,” 1 = “Name is a scribble intertwined with picture. The child identifies the picture or part of the picture as his/her name;” 2 = “Name is an unrecognizable scribble but name is separate from picture,” 3 = “Name consists of random letters and symbols. Name is separate from picture;” 4 = “Name consists of some correct letter and possibly some filler letters or symbols. The name is separate from picture;” 5 = “Name consists of many correct letters with no filler letter or symbols. The name is separate from picture,” 6 = “Name is generally correct and is separate from picture. Some letter may be written backwards or name may be completely written in mirror image;” and 7 = “Name is correct with no backward letter or mirror image writing. The name is separate from picture.”

Procedure

Students at both sites were tested in the fall of 2007 and spring of 2008 using the PPVT-III (Dunn & Dunn, 1997) and the Alphabet Knowledge and Name Writing subtests of the PALS-PreK (Invernizzi et al., 2004). All measures were administered by graduate students, teachers, or aides who had been trained to proficiency on the standardized testing procedures. Administration took approximately 35 min. Each test protocol was scored twice, once by the individual who conducted the initial assessment and once by another examiner. Any discrepancies in scoring were resolved through a third examiner or a senior project investigator.

Results

Because the study was a quasi-experiment, contrast and ERF students were compared at pretest to see whether they were different. Students were compared on pretest measures of the three outcomes of interest (PPVT-III, PALS-PreK Alphabet Knowledge and Name Writing subtests), as well as on demographic variables. Significant differences were found for name writing (students assigned to ERF scored 0.6 points higher than comparison students) and for ethnicity (students assigned to ERF were more likely to be Hispanic and less likely to be Caucasian). To control for these differences, pretest scores for each of the three measures were entered as covariates in models of the corresponding measures at posttest, and demographic variables were entered as covariates in all models.

The two research questions were investigated using multilevel modeling (Hox, 2002) analyses. Multilevel models were chosen to take into account the non-independence of students in the same classroom in testing the effects of ERF. The SPSS mixed procedure was used to estimate the multilevel models, and Hedges’ δT was used to measure effect size (Hedges, 2007).

The first research question dealt with the overall effectiveness of ERF-enriched instruction in enhancing student learning. The following equations describe the main-effects models that were used to investigate this question.

Level 1 (student level) main-effects model

$$ \begin{aligned} {\text{Posttest}}_{\text{ij}} = & \beta_{{0{\text{j}}}} + \beta_{{1{\text{j}}}} \;{\text{Pretest}}_{\text{ij}} + \beta_{{2{\text{j}}}} \;{\text{Gender}}_{\text{ij}} + \beta_{{3{\text{j}}}} \;{\text{English}}\;{\text{proficiency}}_{\text{ij}} \\ & + \beta_{{4{\text{j}}}} \;{\text{African-American}}_{\text{ij}} + \beta_{{5{\text{j}}}} \;{\text{Hispanic}}_{\text{ij}} + \beta_{{6{\text{j}}}} \;{\text{Absences}}_{\text{ij}} + {\text{e}}_{\text{ij}} \\ \end{aligned} $$

where i = 1, …, 249th student and j = 1, …, 19th group, with teacher as the grouping variable. The model was estimated separately for each of the three measures of interest (PPVT-III, PALS-PreK Alphabet Knowledge and Name Writing subtests), with the posttest measure as the outcome variable and the corresponding pretest measure plus other demographic variables, including gender, English proficiency, and ethnicity entered as covariates. Number of absences was also included to control for the dosage effect.

Level-2 (group level) main-effects model

$$ \begin{aligned} \beta_{{0{\text{j}}}} = & \gamma_{00} + \gamma_{01} \;{\text{Instruction}}_{\text{j}} + \gamma_{02} \;{\text{Class}}\;{\text{pretest}}\;{\text{mean}}_{\text{j}} + \gamma_{03} \;{\text{Teacher}}\;{\text{years}}\;{\text{experience}}_{\text{j}} \\ & + \gamma_{04} \;{\text{Teacher}}\;{\text{number}}\;{\text{of}}\;{\text{certifications}}_{\text{j}} + {\text{u}}_{{0{\text{j}}}} \\ \end{aligned} $$
$$ \begin{gathered} \beta_{{ 1 {\text{j}}}} = \gamma_{ 10} \hfill \\ \beta_{{ 2 {\text{j}}}} = \gamma_{20} \hfill \\ \beta_{{ 3 {\text{j}}}} = \gamma_{30} \hfill \\ \beta_{{ 4 {\text{j}}}} = \gamma_{40} \hfill \\ \beta_{{ 5 {\text{j}}}} = \gamma_{50} \hfill \\ \beta_{{ 6 {\text{j}}}} = \gamma_{60} \hfill \\ \end{gathered} $$

The instruction effect (Contrast = 0, ERF = 1) was included in the level-2 model because participation in ERF varied between teachers rather than between students taught by the same teacher. Classroom means were calculated on the pretest scores of each measure of interest to determine whether classes differed on average before the study. None of the differences were significant (all p > .10), but given that these were relatively low power tests (at the classroom level, so n = 19), mean pretest scores were entered as covariates in the level-2 models to control for any differences. Similarly, teachers’ years of teaching experience and number of certifications held did not differ significantly between conditions (all p > .10), but these were also entered as covariates in the level-2 models. The overall main-effects model was constructed by substituting the level-2 model equations into the level-1 model equation, as shown below.

Overall main-effects model

$$ \begin{aligned} {\text{Posttest}}_{\text{ij}} = & \gamma_{00} + \gamma_{ 10} \;{\text{Pretest}}_{\text{ij}} + \gamma_{ 20} \;{\text{Gender}}_{\text{ij}} + \gamma_{ 30} \;{\text{English}}\;{\text{proficiency}}_{\text{ij}} + \gamma_{ 40} \;{\text{African-American}}_{\text{ij}} \\ & + \gamma_{ 50} \;{\text{Hispanic}}_{\text{ij}} + \gamma_{ 60} \;{\text{Absences}}_{\text{ij}} + \gamma_{0 1} \;{\text{Instruction}}_{\text{j}} + \gamma_{0 2} \;{\text{Class}}\;{\text{pretest}}\;{\text{mean}}_{\text{j}} \\ & + \gamma_{0 3} \;{\text{Teacher}}\;{\text{years}}\;{\text{experience}}_{\text{j}} + \gamma_{0 4} \;{\text{Teacher}}\;{\text{number}}\;{\text{of}}\;{\text{certifications}}_{\text{j}} + {\text{e}}_{\text{ij}} + {\text{u}}_{{0{\text{j}}}} \\ \end{aligned} $$

In this model, γ01 describes the instruction effect controlling for all the covariates. The test of this main effect was used to answer the first research question concerning the overall effect of ERF-enriched instruction.
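To make the interpretation of γ01 concrete, the short Python sketch below evaluates the fixed part of the overall main-effects model for one child under both conditions. All coefficient and covariate values are hypothetical placeholders chosen for illustration; they are not the study's estimates (those appear in Table 3), and the random terms e_ij and u_0j are omitted.

```python
# Hypothetical coefficients for illustration only; NOT the study's estimates.
GAMMA = {
    "intercept": 20.0,                 # gamma_00
    "pretest": 0.6,                    # gamma_10
    "gender": 0.5,                     # gamma_20
    "english_proficiency": 1.0,        # gamma_30
    "african_american": -0.5,          # gamma_40
    "hispanic": -0.3,                  # gamma_50
    "absences": -0.1,                  # gamma_60
    "instruction": 5.0,                # gamma_01 (Contrast = 0, ERF = 1)
    "class_pretest_mean": 0.4,         # gamma_02
    "teacher_years_experience": 0.05,  # gamma_03
    "teacher_certifications": 0.2,     # gamma_04
}


def predicted_posttest(covariates, gamma=GAMMA):
    """Fixed part of the overall main-effects model (e_ij and u_0j omitted)."""
    return gamma["intercept"] + sum(
        gamma[name] * value for name, value in covariates.items()
    )


# A hypothetical child and classroom; only the instruction code differs below.
child = {
    "pretest": 75, "gender": 1, "english_proficiency": 0,
    "african_american": 0, "hispanic": 1, "absences": 4,
    "class_pretest_mean": 78, "teacher_years_experience": 10,
    "teacher_certifications": 2,
}

contrast = predicted_posttest({**child, "instruction": 0})
erf = predicted_posttest({**child, "instruction": 1})
difference = erf - contrast
print(round(difference, 2))  # equals the instruction coefficient
```

Whatever covariate values are chosen, the difference between the two predictions equals the instruction coefficient, which is what "controlling for all the covariates" means in a model without interaction terms.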

The second research question, which concerned the generalizability of benefits from ERF-enriched instruction across student and contextual characteristics, was tested by adding interaction terms between instruction and student variables (including demographic variables and the appropriate pretest score) and between instruction and the other classroom level variables (mean pretest score and teacher experience and certifications). Each of the variables that was entered into an interaction with instruction (except absences, which has a meaningful zero) was centered at its sample mean so that the main effect of the ERF versus Contrast predictor in these models would describe a student of average age and ethnicity, who was never absent, in a classroom of average ability, with a teacher having average experience and certifications. Because the effect of ERF-enriched instruction on the most disadvantaged children was of particular interest, pretest scores on each of the measures were centered at their minimum possible values so that the main effect of instruction in these models would describe the effect of instruction for a student scoring at the minimum at pretest. These minimums were 40 for the PPVT-III and zero for the Alphabet Knowledge and Print Concepts measures. One other change made in the interaction models was that ethnicity was reduced to a single dummy code contrasting Hispanic children with African American and Caucasian children. This was done because the small number of Caucasian children would have resulted in very small numbers of children being used to estimate interactions when it was crossed with instruction. The following equation describes the overall model for the second research question (note that letters are used in place of numbers in some of the coefficient subscripts to avoid making the model appear to have three levels).

Overall interactions model

$$ \begin{aligned} {\text{Posttest}}_{\text{ij}} = & \gamma_{00} + \gamma_{ 10} \;{\text{Pretest}}_{\text{ij}} + \gamma_{ 20} \;{\text{Gender}}_{\text{ij}} + \gamma_{ 30} \;{\text{English}}\;{\text{proficiency}}_{\text{ij}} + \gamma_{ 50} \;{\text{Hispanic}}_{\text{ij}} \\ & + \gamma_{ 60} \;{\text{Absences}}_{\text{ij}} + \gamma_{0 1} \;{\text{Instruction}}_{\text{j}} + \gamma_{0 2} \;{\text{Class}}\;{\text{pretest}}\;{\text{mean}}_{\text{j}} + \gamma_{0 3} \;{\text{Teacher}}\;{\text{years}}\;{\text{experience}}_{\text{j}} \\ & + \gamma_{0 4} \;{\text{Teacher}}\;{\text{certifications}}_{\text{j}} + \gamma_{0 5} \;{\text{Instruction}}_{\text{j}} \times {\text{Pretest}}_{\text{ij}} + \gamma_{0 6} \;{\text{Instruction}}_{\text{j}} \times {\text{Gender}}_{\text{ij}} \\ & + \gamma_{0 7} \;{\text{Instruction}}_{\text{j}} \times {\text{English}}\;{\text{proficiency}}_{\text{ij}} + \gamma_{0 8} \;{\text{Instruction}}_{\text{j}} \times {\text{Hispanic}}_{\text{ij}} \\ & + \gamma_{0 9} \;{\text{Instruction}}_{\text{j}} \times {\text{Absences}}_{\text{ij}} + \gamma_{{0{\text{A}}}} \;{\text{Instruction}}_{\text{j}} \times {\text{Class}}\;{\text{pretest}}\;{\text{mean}}_{\text{j}} \\ & + \gamma_{{0{\text{B}}}} \;{\text{Instruction}}_{\text{j}} \times {\text{Teacher}}\;{\text{years}}\;{\text{experience}}_{\text{j}} + \gamma_{{0{\text{C}}}} \;{\text{Instruction}}_{\text{j}} \times {\text{Teacher}}\;{\text{certifications}}_{\text{j}} + {\text{e}}_{\text{ij}} + {\text{u}}_{{0{\text{j}}}} \\ \end{aligned} $$

Parameter estimates for the main-effects models are presented in Table 3, and those for the main-effects plus interactions models are presented in Table 4. Multilevel models require complete data on all student variables for inclusion in the analyses. A total of 249 students—133 in the contrast condition and 116 in the ERF condition—were tested on at least one measure, but not all of these students were available for all the models because of missing data (e.g., a few students were classified as untestable by the PPVT-III). Each model was estimated using all available complete cases. The sample size of contrast and ERF children is given with the results of each model below.

Table 3 Parameter estimates for main-effects multilevel models for PPVT-III and PALS-PreK upper case alphabet knowledge and name writing subtests
Table 4 Parameter estimates for interaction-effects multilevel models for PPVT-III and PALS-PreK upper case alphabet knowledge and name writing subtests

Receptive vocabulary

The children’s receptive vocabulary was evaluated using the PPVT-III. The means and standard deviations for all children who had pre- and posttest data are presented in Table 5. Data from 96 contrast and 90 ERF children were available for the analyses. As noted above, missing data, which caused students to be excluded from the analysis, occurred because students (a) were not enrolled or were absent during at least one of the testing periods, (b) were untestable given PPVT-III administration procedures, (c) received special education services, or (d) had missing data (e.g., number of absences from school).

Table 5 Means and standard deviations for PPVT-III and PALS-PreK upper case alphabet knowledge and name writing subtest scores for contrast and ERF children

In the main-effects model, the effect of ERF-enriched instruction on the PPVT-III posttest was positive and statistically significant, γ01 = 5.11, p = .044, δT = 0.27, after controlling for all covariates. Of the student variables, only pretest score made a statistically significant contribution to the prediction of posttest scores. Of the classroom variables, class pretest mean also significantly predicted students’ posttest scores, over and above the effects of their individual pretest scores, γ02 = 0.44, p < .001.

In the interaction effects model, no interactions were statistically significant. The main effect of instruction (which in the interaction model represents the effect of instruction for students at the minimum at pretest) was statistically significant, γ01 = 11.54, p = .028, δT = 0.61, indicating that for students scoring at the minimum possible on the PPVT-III at pretest, a year of ERF enriched instruction increased PPVT-III scores by over two-thirds of a standard deviation compared with those of the Contrast group. To summarize, the main-effects analysis found a significant effect of the instruction provided in this ERF project on receptive vocabulary as measured by the PPVT-III.

These results are consistent with the findings of exploratory analyses. The distributions of the PPVT-III scores are depicted in the boxplots presented in Fig. 1. It is worth noting that for the Contrast group, the score corresponding to the first quartile did not change appreciably from pretest to posttest. On the other hand, the posttest score of the bottom quartile of the ERF group evidenced dramatic improvement, roughly equaling that of the overall pretest median and approaching the median posttest score of the Contrast children. Similarly, the pretest–posttest increase in the median for the ERF children, although not as dramatic as that for the lowest quartile, exceeded that of the Contrast group, with the median posttest score being equivalent to the third quartile for the pretest. Consequently, while the spread of typical scores (i.e., middle 50%) for the Contrast group was slightly larger at posttest than at pretest, it shrank markedly for the ERF group, primarily as a result of increases at the lower end of the range.

Fig. 1

Distributions of PPVT-III pretest and posttest standard scores for Contrast (n = 96) and ERF (n = 90) children included in the multilevel analyses

Alphabet knowledge

The children’s alphabet knowledge was assessed using the PALS-PreK Alphabet Knowledge subtest (Invernizzi et al., 2004). Due to the low performance of the children, only upper-case letter recognition is reported here. For this measure (and for the analysis of the PALS-PreK Name Writing subtest reported below), complete data for the analyses were available for 132 contrast children and 116 ERF children. Means and standard deviations for pretest and posttest scores are shown in Table 5.

In the main-effects model, a statistically significant difference was found between ERF students and those in the Contrast group, γ01 = 11.77, p < .001, δT = 1.19. Thus, there was a strong positive effect of ERF-enriched instruction on children’s alphabet knowledge, as ERF children learned an average of nearly 12 more letters than did those receiving preschool instruction typical of the school district.

Among the student variables, the only statistically significant predictor other than pretest was English proficiency, γ30 = 2.77, p = .015, indicating that students identified as English proficient knew more alphabet letters at posttest than did students who were identified as English language learners.

This difference between the two preschool programs is evident in the frequency distributions for ERF and Contrast children presented in Fig. 2a and b. As shown in Fig. 2a, most Contrast children correctly recognized fewer than three letters at pretest; by posttest, fewer students performed at such a low level, but few exhibited high levels of upper case letter recognition. At pretest, the performance of ERF children (Fig. 2b) was slightly worse than that of the contrast children. By posttest, however, most of the ERF children had essentially mastered the alphabet.

Fig. 2

Number of pretest and posttest upper-case letters correctly recognized on the PALS-PreK Alphabet Knowledge subtest by Contrast (n = 132) and ERF (n = 116) children included in the multilevel analyses. a Contrast, b ERF

In the interaction effects model, the interaction between instruction and the Alphabet Knowledge pretest was significant, γ05 = −0.61, p < .001, δT = −.06, indicating that the effect of ERF-enriched instruction was greatest for the students who knew the fewest letters at the beginning of preschool (see Fig. 3). Further, the main effect of instruction (which in the interaction model represents the effect of instruction for students at the minimum at pretest) was significant, γ01 = 12.97, p < .001, δT = 1.31, indicating that ERF children who started the year unable to correctly identify a single letter learned an average of 13 more letters than did contrast children starting in the same position. The interaction of instruction and number of teacher certifications was significant and positive, γ0C = 6.55, p < .001, but the main effect of number of teacher certifications (which in the interaction model represents the effect of number of teacher certifications for students in the contrast group) was nearly the same magnitude but negative, γ04 = −5.33, p = .001. Taken together, these results suggest that the effect of number of teacher certifications was near zero for ERF students, but was significantly negative for contrast students. Note, however, that as this is an interaction between two level-2 variables, it is based on a sample size of only the 19 teachers, so it may be an artifact of the small sample. None of the other interactions between instruction and student variables were statistically significant, suggesting that the instruction was equally effective for all student groups. As in the main-effects model, English proficiency (which in the interaction effects model represents the effect of English proficiency for students in the contrast group), γ30 = 3.61, p = .007, was the only statistically significant predictor among the student variables other than pretest score.
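The arithmetic behind these interpretations can be traced with a short Python sketch. It uses only the point estimates reported in this paragraph (12.97, −0.61, −5.33, and 6.55) and ignores their standard errors; the function names are illustrative, and all other covariates are assumed to be held at the values at which they were centered.

```python
# Point estimates reported in the text for the Alphabet Knowledge interaction model.
EFFECT_AT_ZERO_PRETEST = 12.97    # instruction effect for a pretest score of 0
INSTRUCTION_X_PRETEST = -0.61     # change in that effect per letter known at pretest
CERT_SLOPE_CONTRAST = -5.33       # certification effect in the contrast group
INSTRUCTION_X_CERT = 6.55         # instruction-by-certifications interaction


def erf_effect_at(pretest_letters):
    """Model-implied ERF advantage (in letters) at a given pretest score."""
    return EFFECT_AT_ZERO_PRETEST + INSTRUCTION_X_PRETEST * pretest_letters


def certification_slope(instruction):
    """Model-implied effect of one additional certification (0 = contrast, 1 = ERF)."""
    return CERT_SLOPE_CONTRAST + INSTRUCTION_X_CERT * instruction


for letters in (0, 5, 10, 20):
    print(letters, round(erf_effect_at(letters), 2))

print(round(certification_slope(0), 2), round(certification_slope(1), 2))
```

Under these estimates the implied advantage shrinks from about 13 letters for a child who knew none at pretest to under one letter for a child who already knew 20, which is what would be expected given the ceiling of 26 letters. The implied certification slope for ERF classrooms is the sum of the two certification coefficients, which is why the text describes it as near zero.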

Fig. 3

Model-estimated mean posttest PALS-PreK Alphabet Knowledge subtest scores as a function of pretest score and ERF versus contrast group

Thus, the results of these analyses provide evidence of the effectiveness of the ERF-enriched instruction provided by this project when compared to usual practice for increasing preschool children’s knowledge of the alphabet. The turnaround of the ERF children, from knowing few or no letters at pretest to knowing most or all of them by the end of the year, was dramatic.

Print concepts

The children’s print concepts were tested with the PALS-PreK Name Writing subtest (Invernizzi et al., 2004). Means and standard deviations for this subtest are shown in Table 5. In the main-effects model for this subtest, the overall effect of ERF-enriched instruction was found to be statistically significant, γ01 = 0.84, p = .034, δT = 0.46, indicating that children receiving ERF-enriched instruction scored higher than did children in the contrast condition. Among the covariates, in addition to pretest score, gender was a statistically significant predictor of posttest performance, γ20 = .44, p = .044, indicating that girls were better at writing their names than were boys.

In the interactions model, none of the interactions of instruction with any of the covariates were significant, but the main effect of instruction (which in the interaction model represents the effect of instruction for students at the minimum at pretest) was significant, γ01 = 1.18, p = .016, δT = 0.36, indicating that for children who scored zero on the PALS-PreK print concepts pretest, the program’s effect was significant and positive. The absence of significant interactions between instruction and any of the demographic variables indicates that the ERF instruction tended to benefit all groups of children equally. Thus, ERF-enriched instruction produced dramatic gains in children’s knowledge of print concepts, as illustrated in Fig. 4.

Fig. 4

Distribution of pretest and posttest PALS-PreK Name Writing Scores for Contrast (n = 132) and ERF (n = 116) children included in the multilevel analysis. a Contrast, b ERF

Discussion

Findings and implications

This study evaluated the impact of a project providing ERF enrichment, in its third year of implementation, on preschoolers’ early language and literacy skills. ERF preschool students’ progress was compared to that of a demographically similar contrast group of children enrolled in a “practice-as-usual” school. As reported above, the analyses of key literacy skills produced clear and positive results, which are summarized below along with their implications.

Receptive vocabulary

In the present study, ERF oral language enrichment activities were purposeful, with vocabulary taught through successively more intense and extended encounters, as needed, using curricula targeting oral language development. Children’s oral language needs were addressed through multiple opportunities to develop rapid access to representations of word meanings using curricular activities presented in everyday language. The finding that ERF improved receptive vocabulary most for those with the lowest initial vocabularies has educationally meaningful implications. Research shows that vocabulary has significant effects on reading comprehension (Stahl & Fairbanks, 2006), accounting for as much as 66% of the variance (Biemiller, 2003). Well-conducted studies also show that oral language at age four plays a key role in first grade reading competence both directly and indirectly (NICHD Early Child Care Research Network, 2005).

Despite the success of ERF in fostering language acquisition, ERF preschoolers still performed on average more than one standard deviation below national norms for the PPVT-III. Our findings are consistent with prior research showing that children brought up in economically disadvantaged homes (as in the present study), tend to score ≥1 SD below the mean on receptive vocabulary measures (Whitehurst, 1997). It is possible that limited experiences with language and literacy resulted in preschoolers having difficulty in responding to decontextualized tasks and demands such as those presented in the PPVT-III (Restrepo et al., 2006).

Alphabet knowledge

The impact of this ERF project was particularly notable for alphabet knowledge: by the end of preschool, ERF children recognized an average of 22 upper-case letters, compared to only 9 for the contrast children. Alphabet knowledge is considered the “anchor” for the entire reading “system” (van Kleeck, 2003, p. 301), and knowledge of letters, their functions, shapes, sounds, and names plays an important and predictive role in reading and spelling development (Treiman, Pennington, Shriberg, & Boada, 2008).

Given that alphabet knowledge appears to be the strongest and most robust predictor of later reading achievement (Craig & Washington, 2004), it is noteworthy that ERF children’s letter-naming fluency grew dramatically. Children generally learn the names of letters prior to their sounds. Learning the letter names enables children to learn the sounds that relate to the written symbols and enhances levels of phonological sensitivity (Foorman, Anthony, Seals, & Mouzaki, 2002). Knowing the letters enables children to detect and manipulate phonemes, which plays an influential role in phonological awareness (Whitehurst & Lonigan, 1998). Finally, a better grasp of alphabet knowledge enables children to translate the written word into spoken language, an essential prerequisite for reading (Gunn, Simmons, & Kameenui, 1998).

Print concepts

Children’s knowledge of print concepts was assessed using the PALS-PreK Name Writing subtest (Invernizzi et al., 2004). As with the other crucial precursors of future reading acquisition, multilevel analyses confirmed the effectiveness of ERF-enriched instruction and revealed that this advantage was equally strong for ELL and ELP children.

This is an important finding, since emergent writing is an important route to print concepts (Whitehurst & Lonigan, 1998). Studies show that children’s representations of their own names may reflect and draw on more global aspects of emergent literacy knowledge, serving as important predictors of later literacy performance by connecting letters to letter sounds (Bloodgood, 1999; Scarborough, 1998; Welsch, Sullivan, & Justice, 2003). There is also evidence that invented spellings, such as those seen in the Name Writing subtest of PALS-PreK, may be a vehicle for fostering phonological sensitivity and letter-sound knowledge, both shown to be directly related to decoding (Foorman et al., 2002).

Comparison to previous ERF studies

The finding that this ERF project improved children’s alphabet and print knowledge replicated the findings of the national evaluation (Jackson et al., 2007) and the studies by Gettinger and Stoiber (2007) and Gonzalez et al. (2009), and is consistent with Gray’s (2007) finding of an advantage of ERF instruction on PALS-PreK total scores. Thus, the initial evidence suggests that the ability of ERF enrichment to enhance these crucial building blocks of the development of literacy may be robust across different instantiations of the ERF program. In addition, unlike the studies by Martin et al., Gray, and Gonzalez et al., as well as the national evaluation by Jackson et al., this study documented a statistically significant positive effect of ERF-enriched instruction on preschool vocabulary learning. Thus, it is among the first to replicate the finding of Gettinger and Stoiber (2007). Given the crucial importance of vocabulary knowledge for the development of literacy, and the crucial role of literacy in setting the developmental trajectory of children’s success in school and beyond, this finding is particularly noteworthy. Several factors may account for the difference in findings, including differences in measures, curricula, instructional models and methods, student characteristics, and analytical techniques used.

The national evaluation (Jackson et al., 2007) assessed vocabulary using the Expressive One-Word Picture Vocabulary Test (EOWPVT; Brownell, 2000), whereas the PPVT-III (Dunn & Dunn, 1997) was used in the other previous studies and in the present study. Jackson et al. did not consider curricula used in the various projects or instructional models and methods, and although ERF mandates that curricula and instruction be grounded in SBRR, this leaves considerable room for variation between projects, as demonstrated by the three published studies on individual projects. For example, neither Gray (2007) nor Martin et al. (2007) used a multi-tiered RtI instructional model. Gettinger and Stoiber (2007), on the other hand, used a three-tiered RtI framework similar to that of the present study. The curriculum used in the project evaluated by Gray was different from that used in the present study, but it appears that many of the instructional strategies and activities were similar. Martin et al. did not report the curriculum used; further, their description of instructional methods is embedded in the literature review, making it difficult to follow.

All or most ERF students are economically disadvantaged by definition, and many come from homes in which Spanish is the primary language (Jackson et al., 2007). The majority of the children included in Gray’s (2007) report were Hispanic, but information about English proficiency is not included in the study, and it appears that all instruction was in English. The preschoolers in Martin et al.’s (2007) study were primarily African American. The PPVT-III pretest scores of the children in the present study (mean = 77) were lower than in the other published studies (82, based on a conversion of NCE scores, for Martin et al.; 93 for Gray; 84 for Gettinger and Stoiber, 2007). Supplemental multilevel analyses of the effectiveness of the ERF-enriched instruction in this study, in which PPVT-III scores were centered one and two standard deviations above the minimum score, revealed that the children with the very lowest scores benefited the most relative to their counterparts who did not receive such instruction.

The students, instructional model, and curricula were virtually identical in this study and the one by Gonzalez et al. (2009), so a different type of explanation of differences in findings regarding the PPVT-III must be sought. One possible explanation is that second-year ERF teachers remained in the project, providing continuity, collective participation, and pedagogical coherence, and ultimately assimilating SBRR language and literacy skills; by contrast, there was considerable turnover between years one and two. Third-year training also included follow-up pedagogical content knowledge aligned to preschool guidelines and assessments. This combination of training likely solidified teachers’ content knowledge in meaningful and important ways that increased the effectiveness of ERF enrichment in its third year of implementation.

Finally, the present study was the first to use multilevel modeling in the evaluation of ERF projects. This analytical technique may prove illuminating here as it has in other domains.

Limitations

The major limitation of this study is the confounding of ERF instructional enhancement and length of the preschool day. Hence, there is no way to disentangle the impact of the instructional practices employed in the ERF classrooms from the amount of time spent in them. However, prior to and in the absence of ERF funding, the participating school district provided only half-day preschool. This is not atypical of national practice, although providing a full day of instruction is one of the requirements for ERF funding (United States Department of Education, 2008a). Thus, if viewed as an evaluation of the impact of ERF-enriched instruction in a school district offering a half-day of instruction, the lack of control of instructional time does not diminish the value of the study.

However, if viewed as a test of the independent effect of the enrichment in teacher training, classroom environment, and instructional practices provided by this ERF project, the study lacks internal validity due to the inability to equate the amount of instructional time. Having acknowledged this limitation, it might also be noted that a tradeoff between ecological validity and internal validity is characteristic of field-based quasi-experimental research.

One way of addressing the problem is to compare the results of the present study to those of other studies that reported data on key literacy building blocks for full-day control “practice as usual” preschools. The national evaluation of ERF (Jackson et al., 2007) does not provide the statistical information necessary to make such comparisons, but a search of the preschool literature yielded several sources that provide the necessary information (Early et al., 2007; Fischel et al., 2007; Howes et al., 2008; Wasik, Bond, & Hindman, 2006).

Description of the characteristics of the preschools and children that participated in those studies is beyond the scope of the present study, but the proportion of ELLs was higher in the present study than in any of the others. The approach taken to compare preschool learning was to calculate standardized effect sizes (i.e., posttest mean minus pretest mean divided by the pretest standard deviation).

For the comparison of PPVT-III (Dunn & Dunn, 1997), Early et al. (2007) provided the results of seven studies, five of which involved 4-year-olds, yielding a total of eight studies for which PPVT-III scores could be compared. The pre-to-post effect size for PPVT-III in the present study was 0.43. For the comparison groups in the other studies, the effect sizes ranged from 0.15 to 0.26, with a mean of 0.21. Thus, this purely heuristic analysis suggests that the instruction offered by this ERF project produced roughly twice as much vocabulary learning on this standardized metric as a full day of preschool without ERF enhancement.
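The arithmetic behind this heuristic comparison can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the function name `standardized_gain` and the example means and standard deviation passed to it are hypothetical, and only the effect sizes 0.43 and 0.21 are taken from the text.

```python
def standardized_gain(pre_mean, post_mean, pre_sd):
    """Pre-to-post effect size: (posttest mean - pretest mean) / pretest SD."""
    if pre_sd <= 0:
        raise ValueError("pretest standard deviation must be positive")
    return (post_mean - pre_mean) / pre_sd


# Hypothetical scores, used only to show how the formula is applied.
example_effect = standardized_gain(pre_mean=80.0, post_mean=86.0, pre_sd=15.0)

# Effect sizes reported in the text for the PPVT-III.
erf_effect = 0.43              # present study
comparison_mean_effect = 0.21  # mean across the full-day comparison groups

# Ratio underlying the "roughly twice as much" statement.
ratio = erf_effect / comparison_mean_effect
print(round(ratio, 2))  # 2.05
```

Because the ratio compares summary statistics from different samples and settings, it is best read as descriptive rather than as an inferential test.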

Alphabet knowledge data from researcher-developed tests of letter naming were available for three of the studies, yielding effect sizes of 0.63, 1.09, and 1.25 for Howes et al. (2008), Wasik et al. (2006), and Fischel et al. (2007), respectively, compared to 3.62 in the present study. As on the PPVT-III, the ERF children in this study started out at a considerably lower level of performance, identifying only 2.37 letters at pretest compared to a mean of 7.04 for the other three studies. By the end of the year, however, the ERF children knew 22.19 letters compared to an average of 15.28 for the other studies. Thus, the impact of this project on alphabet knowledge is evident even when compared to preschools offering full-day instruction. These comparisons are offered for their heuristic value; additional research is needed.
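The same kind of descriptive arithmetic can be applied to the alphabet knowledge figures. The sketch below uses only the values reported in the paragraph above; the unweighted mean of the three comparison effect sizes is an assumption made for illustration and is not a statistic reported in the text.

```python
# Letter-naming effect sizes reported for the full-day comparison studies.
comparison_effects = {
    "Howes et al. (2008)": 0.63,
    "Wasik et al. (2006)": 1.09,
    "Fischel et al. (2007)": 1.25,
}
erf_effect = 3.62  # present study

# Assumption: a simple unweighted mean across the three studies.
mean_comparison_effect = sum(comparison_effects.values()) / len(comparison_effects)

# Raw gains in letters named, from the pretest and posttest means in the text.
erf_letter_gain = 22.19 - 2.37
comparison_letter_gain = 15.28 - 7.04

print(round(mean_comparison_effect, 2))                    # 0.99
print(round(erf_effect / mean_comparison_effect, 2))       # 3.66
print(round(erf_letter_gain, 2))                           # 19.82
print(round(comparison_letter_gain, 2))                    # 8.24
print(round(erf_letter_gain / comparison_letter_gain, 2))  # 2.41
```

The raw-gain ratio is smaller than the effect-size ratio, which would be expected if the pretest standard deviation in the present study was small, given how few letters the ERF children knew at pretest.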

Other limitations of the study also must be noted. The children in this study came from a single suburban school district, so caution must be used in generalizing the results to other children and other classrooms. The demographic data and description of the sample may aid the reader in assessing the relevance of the results for a population of children of particular interest. The overwhelming majority of the children in the two groups were ELL, but ERF was designed to serve economically disadvantaged children, and Hispanics are the fastest growing ethnic group at the low-SES level. As in all quasi-experimental studies, children were not randomly assigned to treatments, reflecting the real-world constraints on such research. However, the fact that the ERF and contrast groups came from the same school district and had similar demographic characteristics and entering skills suggests that the contrast group was appropriate for comparative purposes. Further, student and teacher characteristics were used as covariates along with pretest scores in the multilevel analyses. Another limitation is that there was only one school per group, but the fact that there were multiple teachers in each group and the inclusion of teachers in the hierarchical analyses may be viewed as helping to alleviate this problem.

An additional limitation of the study is the lack of a direct measure of phonological awareness and other components of early literacy acquisition. Further, personnel constraints of the project precluded the use of testers blind to the children’s instructional group and multiple classroom observers that would have permitted the assessment of inter-rater reliability, but all testers and observers were trained on the instruments they administered and followed the standardized administration procedures.

Additional research is needed in order to provide a fuller understanding of the effectiveness of ERF enrichment as implemented in our classrooms. Such research should include large-scale randomized clinical trials such as that by Gettinger and Stoiber (2007) to overcome the limitations of quasi-experimental studies. In addition, the impact of implementation of programs such as ERF should be evaluated in longitudinal studies that track children through kindergarten and into the primary grades in order to more accurately assess their true effectiveness. Finally, studies that measure all components of language and literacy development and include the evaluation of preschool classroom environments and observations that assess the frequency and quality of SBRR instructional practice in both ERF and control classrooms could provide a more fine-grained analysis of why individual ERF classrooms produce the gains in language and pre-literacy skills that are the mission of the project or fail to deliver on that promise. Gray (2007) did report classroom observation data for both conditions, but there were only three contrast classrooms.

Conclusion and implications

The benefits of high quality preschool programs are manifold. There is strong consensus among experts who have investigated quality early childhood development programs that such programs have substantial payoffs. Specifically, payoffs translate into higher levels of verbal, mathematical, and intellectual achievement; greater success in school, including less grade retention and higher graduation rates; higher employment rates with more earnings; better health outcomes; less dependence on welfare; lower crime rates; and higher government revenues and lower government expenditures (Lynch, 2006).

Results of this study indicate that the ERF-enriched school improved children’s oral language, alphabet knowledge, and print concepts. Although replications are needed, the enhanced vocabulary acquisition found here is important, especially in light of studies showing that vocabulary knowledge of children entering kindergarten may be one of the most robust predictors of first- through fifth-grade standardized outcomes in reading (Juel, 1988; Kurdek & Sinclair, 2001). Taken as a whole, the results of this study and previous research suggest that the instruction provided in programs such as ERF can likely provide low-income and ELL students with the help they need to enter kindergarten on a more even footing, but longitudinal studies that track the impact of ERF preschool enrichment as children progress through school are needed in order to assess its true effectiveness.