Introduction

As educational systems around the world are increasingly affected by the number of children who speak different languages at home than the instructional language of their schools, the importance of understanding potential differential processes involved in emerging literacy for young bilingual students grows as well. Learning to read is a cornerstone of academic achievement that prepares children for their educational futures. Knowledge regarding literacy acquisition for bilingual learners, however, is far from conclusive. The literature offers models and theories of reading almost exclusively based on the cognitive and academic profiles of monolingual readers, and it is unclear if the same base reading skills are as important for bilingual children’s reading comprehension as they are for that of monolingual children. It is important for educators to know if childhood literacy acquisition is affected by speaking two languages at home or by receiving instruction in a language other than the home language (Bialystok, Luk, & Kwan, 2005).

Many educational systems are currently struggling with the question of how best to serve their minority language populations. International investigations indicate that, compared with those in other OECD countries, minority language children in Germany are particularly disadvantaged in their literacy achievement (OECD, 2010). The achievement gap between German and minority language students in Germany is especially pronounced for students with a Turkish language background, even after controlling for social and educational background characteristics (Stanat, Rauch & Segeritz, 2010; Marx & Stanat, 2011). This gap is all the more striking because the majority of these students were born and raised in Germany. Thus, it is important to understand the processes and components involved in the literacy development of Turkish-speaking children in Germany.

The present study aims to enhance our empirical knowledge of bilingual reading development, which is primarily based on bilingual English speakers in the North American educational system. Using data from a 3-year longitudinal study, the present analysis explores the importance of different base reading skills for children’s German reading comprehension for bilingual Turkish-German and monolingual German students. While we assume that the same predictors or base reading skills are relevant for monolingual and bilingual reading development, we aim to investigate to what extent these skills differ between the two groups and how these differences affect their reading growth in primary school.

Predictors of reading

Reading comprehension is a complex process beginning with the perception of a word, the access to a word’s meaning, and finally the production of inferences based on pre-existing knowledge paired with new information extracted from the text (Kintsch, 1988). This process is addressed in the literature through various reading models (Florit & Cain, 2011; Gough & Tunmer, 1986; Lundberg, 2002; Rapp, Folk, & Tainturier, 2001; Näslund & Schneider, 1991). Despite differences regarding the number of and interrelationships between different base reading skills, these models distinguish between two main paths towards reading comprehension: a verbal route and a phonological or code-based route.

The present paper focuses on a simple and parsimonious theoretical model of reading comprehension in the German context adapted from Näslund and Schneider (1991). This model has been empirically tested and shown to have robust effects modeling reading growth in German monolingual emerging readers. The model (see Fig. 1) postulates a verbal strand affecting the development of reading comprehension directly. This strand encompasses verbal ability mostly operationalized through the use of vocabulary measures. The second, code-based strand is portrayed as phonological awareness abilities that directly influence reading comprehension and enable the decoding of a word. These two strands are not independent from one another, as a child’s ability to relate speech sounds to objects and events is necessary for the acquisition of word meanings. The model is thus similar to the simple view of reading (Gough & Tunmer, 1986; see Florit & Cain, 2011) distinguishing a verbal and code-based strand, but with the addition of a phonological awareness component. In the following, we will detail these three relevant base reading skills—phonological awareness, decoding, and vocabulary.

Fig. 1 Theoretical model of early reading comprehension adapted from Näslund and Schneider (1991)

Developing throughout primary school, phonological awareness is defined as the ability to recognize and manipulate the sound structure of speech by detecting and differentiating phonemic units. Phonological awareness has repeatedly been demonstrated to be of utmost importance in early reading acquisition (Goswami & Bryant, 1990; Schneider, 2004; Wagner et al., 1997). It has commonly been shown to be the single most important predictor of reading acquisition in the first language (e.g., Chiappe, Siegel, & Gottardo, 2002; Verhoeven, 2000; Schneider & Näslund, 1999).

Reading single words, or decoding, is commonly understood to be the act of extracting orthographic and phonological information from print without the integration of contextual information. It is a distinct process from reading comprehension in that no semantic meaning is necessarily derived (see Pazzagli, Cornoldi, & Tressoldi, 1993). Since word reading must come before text comprehension, the importance of decoding abilities in reading comprehension performance is clear (e.g., Muter, Hulme, Snowling, & Stevenson, 2004).

The construct of vocabulary encompasses the broad proficiency to express and comprehend the meaning of words and language structures. It demonstrates substantial predictive power in many studies investigating the development of reading comprehension. Vocabulary skills increase in importance as age-appropriate texts become more complex over time (e.g., Cain, Oakhill & Bryant 2004; Schneider, 2004).

Although the body of research investigating this complex process reveals strong similarities in these base reading skills across most alphabetic orthographies, there is compelling evidence that linguistic structures and cultural specificity play an important role (e.g., Mann & Wimmer, 2002; Näslund, 1999) and that it is valuable to investigate the specific processes involved in reading acquisition in different languages individually (see Gottardo & Mueller, 2009). In the following section, we consider the aforementioned base reading skills, highlighting possible differences between monolingual and bilingual children.

Monolingual and bilingual reading skills development

As discussed above, phonological awareness, word decoding and vocabulary are identified in the literature as key building blocks in reading development. These skills develop throughout a child’s life at different rates. Although vocabulary and reading comprehension in general may develop more gradually, phonological awareness and decoding may develop quickly at the beginning of language acquisition and formal education and slow down after the first school years. However, having command of two languages creates differences in the way people experience and produce language, thus causing different patterns of growth (Bialystok, 2002). Indeed, studies indicate likely differences between monolingual and bilingual patterns of development in various aspects of literacy (see Carlo et al., 2004; Fitzgerald, 1995; Proctor, Carlo, August, & Snow, 2005; Verhoeven, 2000).

Several studies encompassing a wide variety of language combinations indicate that bilingual children exposed to more than one phonological system may have heightened levels of phonological awareness (Bruck & Genesee, 1995; Campbell & Sais, 1995; Kang, 2012; Marinova-Todd, Zhao, & Bernhardt, 2010; Oren, 1981). This may, however, depend on the languages involved (Bialystok, Majumder, & Martin, 2003; Bialystok et al., 2005; Yeong & Rickard Liow, 2012). For example, due to the characteristics of the phonological structure of Turkish, Durgunoğlu & Öney (1999) predicted and found that monolingual Turkish-speaking children have particularly well-honed phonological awareness (in comparison to English speaking monolinguals). A cross-sectional study with Mandarin and Cantonese speaking children in China indicated acceleration for bilinguals in the development of phonological awareness (Chen et al., 2004). However, longitudinal studies with mixed-language bilingual groups in North America found no interaction between time and group membership (Jean & Geva, 2009; Kieffer & Vukovic, 2012). These findings suggest that phonological skills may be heightened for some bilingual language combinations but not others.

Research has yet to reach a consensus regarding differences in decoding skills between bilingual and monolingual readers. Some studies found no significant differences between the two groups in decoding skills (Lesaux, Koda, Siegel, & Shanahan, 2006; Lesaux, Rupp, & Siegel, 2007; Hutchinson, Whiteley, Smith, & Connors, 2003; Bellin, 2009) or in the development of those skills (Jean & Geva, 2009). Others (Kieffer & Vukovic, 2012) found language minority students to have significantly lower decoding skills but similar rates of growth compared to their monolingual peers. Differentiating between three difficulty levels of orthographic structures in word decoding, Verhoeven (2000) showed that a heterogeneous group of second language learners lagged behind their Dutch peers as words became more orthographically complex over time, suggesting that some measures of word decoding might prove more difficult for bilinguals due to the orthography of words.

Studies often show that bilingual children typically have command of smaller vocabularies in each given language than monolingual speakers of either of the two languages (e.g., August, Carlo, Dressler, & Snow, 2005; Bialystok, 1988; Merriman & Kutlesic, 1993). This difference seems to persist throughout a child’s school career (Hutchinson et al., 2003; Droop & Verhoeven, 2003; Cobo-Lewis, Pearson, Eilers, & Umbel, 2002; Bialystok & Herman, 1999). However, bilingual and monolingual children appear to develop their vocabularies at similar rates (Jean & Geva, 2009; Kieffer & Vukovic, 2012).

In summary, current German and international research reveals several systematic differences between bilingual and monolingual children in base reading skills. Still, the dearth of longitudinal research on the development of those skills makes it difficult to draw firm conclusions. The literature suggests a bilingual advantage for phonological awareness. There is a lack of evidence for differences in word decoding development between the two groups. And although there seems to be a monolingual advantage in the overall command of vocabulary, vocabulary seems to grow at a similar rate for bilingual and monolingual children. We will discuss the consequences of these differences for reading comprehension of bilingual and monolingual students in the following section.

Models of reading development: predictors for bilingual readers

The variation in base reading skills between monolingual and bilingual children can have major consequences on the development of reading. In this section, we examine how phonological awareness, word decoding and vocabulary predict reading comprehension and argue that, given the difference between bilinguals and monolinguals in these base reading skills, the predictors in models of reading play different roles for these two groups of learners.

Several studies have shown phonological awareness to play a strong predictive role for reading comprehension for all children. According to the phonological strand of reading comprehension (see Fig. 1 and Näslund & Schneider, 1991), phonological awareness may affect reading comprehension directly as well as via decoding. Chiappe, Glaeser, and Ferko (2007) found indications for common mechanisms in the development of literacy skills from the beginning to the end of first grade for Korean-English bilingual and English monolingual children. However, phonological awareness made much stronger contributions to decoding development among the Korean-English bilingual children. In comparing monolingual English speakers and English language learners over the first 4 years of primary school, Jongejan, Verhoeven, and Siegel (2007) found phonological awareness in grade one and two to be an equally strong predictor of decoding in grade three and four for both groups. Investigating a Spanish–English bilingual group from first through the sixth grade, Nakamoto, Lindsey, and Manis (2007) found that initial phonological awareness skills played a strong significant role in predicting reading comprehension. However, more research is needed on differences between bilingual and monolingual readers in the relationship between phonological awareness and passage reading comprehension.

As is the case for monolingual readers, decoding abilities are important for the reading comprehension of bilingual readers, with decreasing importance in later primary school compared with an increasing relevance of vocabulary (Proctor, Carlo, August, & Snow, 2005; Gough & Tunmer, 1986). In an investigation of struggling Spanish–English bilingual readers from 4.5 to 11 years of age, Mancilla-Martinez and Lesaux (2010) found that vocabulary and decoding were significant predictors of reading comprehension, and the initial status of decoding at 4.5 years exerted a greater influence on comprehension outcomes at age 11 than the initial status or rate of change of productive vocabulary. However, the lack of a monolingual group renders comparisons impossible. These results contrast with other studies which found vocabulary to be the primary source of variability in reading comprehension for all readers in later primary school (e.g., Gough & Tunmer, 1986; Vellutino, Tunmer, Jaccard, & Chen, 2007).

Researchers predict that under-developed vocabulary is more likely to compromise reading comprehension among children with diverse language backgrounds (see Lesaux et al., 2007). Indeed, several studies have found vocabulary skills to be a particularly important predictor for bilinguals’ reading comprehension (Proctor et al., 2005). Lesaux, Crosson, Kieffer, and Pierce (2010) examined reading development from fourth to fifth grade for Spanish–English bilingual children in “biliteracy” classes in North America. Structural equation modeling confirmed that English vocabulary skills had a large, significant effect on English reading comprehension, whereas students’ decoding skills, whether in Spanish or English, were not significantly related to English reading comprehension. These findings hint not only at the importance of vocabulary for bilingual children but also at the decrease in the importance of decoding as children reach late elementary grades. However, this study did not examine the role of phonological awareness for reading comprehension, nor did the sample include a monolingual comparison group.

Further investigating differences between bilingual and monolingual children in base reading skills over time, Verhoeven (2000) explored the interactions of receptive vocabulary skills, decoding and reading comprehension in a structural equation model. Comparing early elementary school monolingual Dutch-speaking children with bilingual children who spoke various home languages, he found that creating separate models for the two groups resulted in a much better fit. Verhoeven attributed the large differences in the model to the substantially stronger role of vocabulary for development of reading comprehension within the bilingual group. These results mirror those of Lervåg and Aukrust (2010), who examined the role of decoding and vocabulary skills as longitudinal predictors of reading comprehension in Norwegian monolingual and Urdu-Norwegian bilingual children. By means of two-group latent growth models, they found vocabulary skills to be a stronger predictor of the growth of reading comprehension for the bilingual children. However, the two groups differed greatly in their reading comprehension performance at all measurement points, rendering group comparisons between reading skills and comprehension difficult.

The lack of consensus between aforementioned studies can be attributed to a variety of differences. First, studies differ in the students’ languages involved. Some of them only explore bilingual development without a monolingual comparison group, whereas others investigate heterogeneous language groups of bilingual students without considering the importance of individual language characteristics. Second, there is little consistency in the current body of literature with regard to the terminology and groupings of monolingual and bilingual children. Participants are often grouped into English language learner or language minority groups without clear descriptions of the criteria used for grouping and without actually measuring their language skills in either language. Finally, the noticeable difference in reading comprehension performance for monolingual and bilingual readers in most studies weakens inferences about the relation between base reading skills and reading comprehension for the two groups.

Taken as a whole, research suggests that bilinguals possess stronger phonological awareness while monolinguals present stronger vocabulary skills. There is a lack of evidence for differences in word decoding abilities between the two groups. Frequently, vocabulary skills seem to be more relevant in the prediction of bilinguals’ than monolinguals’ reading comprehension. However, to our knowledge, very few studies have considered the relevant base reading skills (vocabulary, phonological awareness, and decoding) together in a longitudinal design comparing monolingual and bilingual students’ reading comprehension.

The present study

Although there is consensus that bilingualism in itself has no negative consequences for the development of reading skills (e.g., Chiappe & Siegel, 1999; Da Fontoura & Siegel, 1995), it remains unclear what the precise strengths and weaknesses of bilingual and monolingual students are in relation to the development of base reading skills and whether those base reading skills lead to reading comprehension in the same way for both bilingual and monolingual readers. This study attempts to shed light on these questions.

The present study addresses the aforementioned shortcomings by including the key base reading skills (vocabulary, phonological awareness, and decoding), a monolingual and a bilingual student sample with comparable levels of reading comprehension and comparable background characteristics, and an explicitly language-based determination of which participants may be categorized as bilinguals. We also extend the research base by examining a large yet under-investigated language group outside of English-speaking countries that has not served as the primary focus of studies on literacy acquisition so far. The longitudinal development of reading of bilinguals and monolinguals has not been sufficiently investigated in Germany with Turkish-speaking children (Limbird & Stanat, 2006), the largest group of minority language children in the German school system (Diefenbach, 2010). Like the languages examined in the majority of internationally published research on bilingualism, such as English, Spanish, or French, Turkish and German are both alphabetic languages (for more information about the structure of the German language see Hall, 2010; for a description of the Turkish language in this context see Durgunoğlu & Öney, 1999; Öney & Durgunoğlu, 1997).

Table 1 Sample means (and standard deviations) for demographic characteristics and cognitive abilities between German monolinguals and Turkish-German bilinguals at Time 1

The present study utilizes a theoretical model of reading comprehension adapted from Näslund & Schneider (1991; see Fig. 1). This simple and parsimonious model, which was tested with early primary school readers, demonstrated robust effects for longitudinally predicting reading comprehension abilities in German monolingual emerging readers. The present investigation into the reading development of both German monolingual and Turkish-German bilingual children was guided by two overarching research questions: (1) Do Turkish-German bilingual children show patterns of growth similar to those of their monolingual German-speaking peers on measures of base reading skills and reading comprehension between the second and third grade? (2) Does a common model of reading fit the actual development of reading comprehension from second through third grade equally well for a Turkish-German bilingual and a German monolingual population?

Method

Participants

This study is part of a 4-year longitudinal study entitled “The Berlin Longitudinal Study of Reading Competence Development among Primary School Children” (BeLesen) conducted from 2002 to 2006 that encompassed 59 classes in 30 schools located in socio-economically disadvantaged districts in inner-city Berlin. Based on a longitudinal design, the present study utilizes a sub-sample of the larger BeLesen study. We followed a smaller subsample of Turkish-German bilingual children (TB; n = 100) and a comparison group of German monolingual children (GM; n = 69) from the first through the third grade (see sample description in Table 1 and timeline in Table 2). The children came from 14 classes in six different schools. All six schools in which the 14 classes were embedded fell into SES zones with scores of 6 or 7 (1 being a highly advantageous district and 7 being highly disadvantageous) as classified by a local governmental agency. The classes had teachers with many years of teaching experience (M = 22.92 years), similar class sizes, and the classes had substantially larger proportions of minority language students (M = 67 %) than native German speakers.

Table 2 Overview of the measurement timeline

A participant was included in one of the two language groups if she consistently met all of the following criteria: child self-report of home language use across two measurement points, teacher report of the child’s home language at two measurement points, and, for the Turkish-speaking children, verbal Turkish assessments. All children in the study attended German schools from the onset of their education and were proficient enough in German to complete all German language tests. To ensure a base level of bilingualism, children in the Turkish bilingual group were required to demonstrate Turkish language proficiency as measured by a modified version of the Bilingual Verbal Abilities Test (BVAT; Muñoz-Sandoval, Cummins, Alvarado, & Ruef, 1998) by scoring no more than one standard deviation below the average of the reportedly Turkish-speaking sample (n = 151 initially categorized as Turkish bilingual by the student or teacher).
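The proficiency cut-off described above can be illustrated with a minimal sketch. It assumes the criterion is computed from the mean and the sample standard deviation of the screened children's scores; the function name and the toy scores are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def meets_proficiency_cutoff(score, reference_scores):
    """Return True if `score` is no more than one SD below the reference mean.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    cutoff = mean(reference_scores) - stdev(reference_scores)
    return score >= cutoff


# Hypothetical Turkish proficiency scores for the screened children
reference = [10, 12, 14, 16, 18]

# Keep only the children who satisfy the cut-off
eligible = [s for s in reference if meets_proficiency_cutoff(s, reference)]
```

With these toy values the cut-off lies roughly 3.2 points below the mean of 14, so only the lowest score is excluded.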

The monolingual and bilingual groups showed no significant demographic differences (see Table 1). Participants were between the ages of 7.1 and 9.5 years (M = 7.9, SD = 0.4) at the onset of the verbal data collection (Time 1) and had just begun second grade. None of the participants in this study repeated the first, second, or third grade. Analyses of variance and chi-square analyses found no significant differences between the two groups with regard to age, school district SES, sex, or any other background characteristics (see Limbird, 2006). The fact that only 4 % of the bilingual group and 1 % of the monolingual group were born outside of Germany and that the groups were virtually identical on all demographic characteristics indicates that this sample is well-suited to an examination of the participants’ linguistic abilities without any known sociological confounds.

Procedure

Trained master’s-level university students administered written measures of reading and cognition in classroom settings. Turkish-German bilingual graduate students administered the verbal assessments individually, including the measures of German phonological awareness and vocabulary. The written assessments were administered in 6-month intervals (see Table 2) in the students’ classrooms.

Measures

Phonological awareness

To assess phonological awareness, we administered a modified version of the standardized German phonological awareness measure “Basiskompetenzen für Lese-Rechtschreibleistungen” (BAKO 1–4; Stock, Marx, & Schneider, 2003). With the aim of establishing language neutrality, we reduced the BAKO items to pseudoword items wherever possible and modified the structures of the pseudowords to be linguistically possible non-words in both Turkish and German. Four of the seven BAKO subtests were included in this investigation: phoneme identification, elision, word remainder determination, and sound categorization.

Phoneme identification

The BAKO pseudoword segmentation scale consisted of eight items that required the participant to listen to and repeat a non-word, then identify each phoneme individually, placing small cards down to represent each sound. The internal consistencies across the different measurement points were similar between the monolingual and bilingual groups (GM: T1α = .72, T2α = .62; TB: T1α = .81, T2α = .62).

Elision

The children’s ability to modify vowel sounds verbally in a non-word was measured with a modified version of the BAKO vowel replacement subtest. For example, the children were asked to repeat a pseudoword and then replace all /a/ sounds with an /i/ sound. This scale had an internal consistency of .91 (T1) and .92 (T2) for the GM group, and .92 (T1 and T2) for the TB group.

Word remainder determination

The word remainder determination subtest required participants to verbalize a non-word with either the beginning or end phoneme missing (GM: T1α = .81, T2α = .71; TB: T1α = .80, T2α = .72).

Sound categorization

This subtest required participants to listen to a series of four non-words and real words to determine which one began or ended with the “wrong” sound (i.e., which one did not match the others). The scale proved to have similar average internal consistencies at T1 for both groups, although it was weaker for the bilingual group at T2 (GM: T1α = .70, T2α = .71; TB: T1α = .73, T2α = .58).

An aggregate phonological awareness scale was created from the four subscales. This scale consisted of 35 items and had an average internal consistency of α = .92 at Time 1 and α = .89 at Time 2 for the monolingual group and α = .92 at Time 1 and α = .88 at Time 2 for the bilingual group. Factor analyses confirmed the validity of aggregating the sub-scales into a summative scale.
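The internal consistency coefficients reported for these scales are Cronbach's α values. As a minimal sketch of how such a coefficient is obtained from item-level data, the following applies the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name and the toy responses are hypothetical; the study's item data are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a participants-by-items table of scores.

    item_scores: list of rows, one per participant, each a list of k item scores.
    """
    k = len(item_scores[0])
    # Transpose so that each tuple holds all participants' scores on one item
    items = list(zip(*item_scores))
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical right/wrong (1/0) responses of four children on three items
toy_responses = [
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
alpha = cronbach_alpha(toy_responses)
```

Because α grows with the number of items when inter-item correlations are held constant, an aggregate scale of 35 items would be expected to show higher values than its shorter subscales, which matches the pattern reported above.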

Word decoding

To assess word-level decoding speed in German, we administered the Würzburg Silent Reading Test (WLLP; Küspert & Schneider, 1998, 2001). In this timed test administered in groups, participants are presented with a series of written words, each followed by four pictures, from which the participant is instructed to select the one that best represents the written word. Expandable in length (80 items T1, 120 items T2, 140 items T3), it is well suited for measuring growth in longitudinal studies (parallel test form r = .92 for the standardization sample; item level data were not available for this sample).

Vocabulary

We administered a modified, shortened version of the Bilingual Verbal Abilities Test (BVAT; Muñoz-Sandoval et al., 2005) in German to all participants. Because this test taps into the participant’s ability to produce appropriate verbal responses, it provides a measure of expressive vocabulary. The BVAT is a measure of cognitive academic language proficiency assessing school-related language, as opposed to general conversational language. We assessed the children with the picture vocabulary, oral vocabulary synonyms, and oral vocabulary antonyms subtests. For each subtest, items increased in difficulty and were administered until the participant reached her ceiling.

The picture vocabulary subtest required the participants to name small pictures orally (split-half reliability across different measurement points GM: T1r = .84, T2r = .79; TB: T1r = .90, T2r = .85). The synonym measure required participants to respond verbally to a spoken stimulus with a similar word (split-half reliability GM: T1r = .77, T2r = .79; TB: T1r = .73, T2r = .76). The 18-item antonym measure was similar, but the intended response was the opposite of the stimulus word (split-half reliability T2 GM r = .74; TB r = .71). To reduce the chance of ceiling effects in the case of substantial vocabulary development between second and third grade, additional items were added to the synonym scale at the second measurement point (T2) along with the more difficult antonym scale, for a total of 55 items at T2 (compared to 37 items at T1). The subscales were aggregated into a German expressive vocabulary scale for each time of measurement (GM: T1r = .85, T2r = .88; TB: T1r = .89, T2r = .88). Factor analyses confirmed the validity of aggregating the subscales into a summative scale.
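A split-half reliability coefficient of the kind reported for the vocabulary subtests can be sketched as follows. The text does not state how the halves were formed or whether a correction was applied, so the odd/even item split and the Spearman-Brown step-up used here are assumptions, and the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def split_half_reliability(item_scores):
    """Odd/even split-half reliability with Spearman-Brown correction.

    Assumption: items are split by position (1st, 3rd, ... vs. 2nd, 4th, ...).
    """
    first_half = [sum(row[0::2]) for row in item_scores]
    second_half = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(first_half, second_half)
    # Step up the half-test correlation to the full test length
    return 2 * r_half / (1 + r_half)


# Hypothetical right/wrong (1/0) responses of four children on four items
toy_items = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
]
reliability = split_half_reliability(toy_items)
```

An odd/even split is a common choice for tests whose items increase in difficulty, since it keeps the two halves comparable in difficulty.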

Reading comprehension

We administered the Text Comprehension subtest of the well-established ELFE (“Ein Leseverständnistest für Elementarschüler”) German reading comprehension measure (Lenhard & Schneider, 2005) as a group test at Time 2 and Time 3. This untimed subtest aims to test a child’s ability to find information in a text, to infer meaning beyond written sentences, and to draw conclusions about that text. A series of short texts (2–3 sentences) is provided in a test booklet, each followed by one or several questions on the content of the text, with 20 items in total. The ELFE demonstrated higher internal consistency values for the German group (T2α = .89; T3α = .90) than for the Turkish-German group (T2α = .74; T3α = .82).

Cognitive abilities

We administered three subtests of the Culture Fair Intelligence Test (CFT1; Cattell, Weiß, & Osterland, 1997) to measure fundamental non-verbal cognitive skills in the first half of first grade. The 36 items in the three subtests were designed to measure visual processing, classification skills, and detail recognition. Internal consistency for the Turkish-German children in the larger BeLesen sample was comparable to that of the German monolingual children (GM: α = .78; TB: α = .84).

Missing data

Analyses of the missing values showed that there was no systematic loss of participant data over the points of measurement. Of the 100 TB children in the sample at T1, 91 completed the assessments at T3. Of the 69 GM participants, 53 completed the final reading assessment at T3. Participants who were absent for any measure at any point of time demonstrated no signs of significantly diverging performance on any of the other primary measures of interest. Absence or attrition throughout the 24 months of investigation was independent of reading-related skills performance or cognitive skills. All analyses were conducted to account for missing data (for more details see Limbird, 2006).

Results

The purpose of this study was to examine if there were different patterns of growth across base reading skills between TB and GM students (Research Question 1) and to investigate if a common model of reading fit both groups equally (Research Question 2). We used repeated-measures one-way ANCOVAs to investigate the change in base reading skills over time as well as differences between groups with gender and general cognitive abilities as covariates. To address the second research question, we utilized multi-group structural equation modeling (SEM) with maximum likelihood estimation to test the model fit in the bilingual and monolingual groups.

Patterns of growth of base reading skills

To begin investigating the patterns of base reading skills development in the Turkish bilingual and German monolingual groups, we first examined the individual scales and found that all scales were normally distributed with no noticeable ceiling effects. We then conducted a preliminary examination of mean scores at each point of measurement (see Table 3). Between Time 1 and Time 2 both groups showed an increase in their phonological awareness scores with the bilingual group scoring slightly higher than the monolingual group. With regard to word decoding, both groups’ performance improved at each point in time with the bilingual group showing a slight disadvantage. The analysis revealed a larger advantage for the monolingual group in German vocabulary at both Time 1 and Time 2 with a greater variance for both groups at Time 2. On measures of reading comprehension both groups showed a positive trend from Time 2 to Time 3 with the monolingual group demonstrating slightly higher scores on average. The mean at both measurement points showed that both groups on average scored just below the 50th percentile of the national standardized sample. Based on this preliminary examination, we next investigated the development patterns over time and between groups.

Table 3 Sample means (and standard deviations) of Turkish-German bilinguals (TB) and German monolinguals (GM) across the three points of measurement

To investigate the patterns of development for German vocabulary, phonological awareness, and word decoding within the two groups, as well as possible differences in the rates of development between the two groups, we conducted repeated-measures one-way ANCOVAs investigating the factors of both time and group (for graphic representation, see Fig. 2a–d) using gender and cognitive abilities as covariates. For the phonological awareness measure, we found no significant effect of time (Wilks’ Λ = .99, F(1, 143) = 1.08, p = .30) and no significant group differences in slope (F(1, 143) = 3.26, p = .07). Thus, the two groups did not differ in phonological awareness development from the beginning to the end of second grade (see Fig. 2a). For word decoding, the analyses showed a significant linear effect from Time 1 to Time 3 (Wilks’ Λ = .89, F(2, 113) = 6.76, p < .01), with no significant differences between the bilingual and monolingual groups (F(1, 113) = 0.13, p = .72) and no interactions. This finding indicates that both groups improved their decoding skills significantly over time at a similar rate (see Fig. 2b).

Fig. 2 (a–d) z-score growth comparisons between the Turkish bilingual (TB) and the German monolingual (GM) groups for German vocabulary, phonological awareness, word decoding, and reading comprehension

An examination of German vocabulary performance revealed a significant effect of time from Time 1 to Time 2 (Wilks’ Λ = .91, F(1, 142) = 13.76, p < .01). Monolinguals showed on average significantly higher scores on German vocabulary than their bilingual counterparts at both points in time (F(1, 142) = 49.41, p < .01). These results show that despite the significant improvement from the middle to the end of the second grade, the bilinguals’ vocabulary skills lagged substantially behind those of their monolingual peers. Because no interactions were found, this analysis supports the hypothesis that Turkish bilingual children do not differ from German monolingual children in the growth rate of their vocabulary skills (see Fig. 2c).

We next examined reading comprehension performance between Time 2 and Time 3. As with the growth analyses above, we calculated a repeated-measures ANCOVA with group as the between-subjects factor. The main effect of time was not significant (Wilks’ Λ = 1.00, F(1, 116) = 0.31, p = .58). No group differences in average reading comprehension were detected either (F(1, 116) = 1.50, p = .22). Most importantly, the ANCOVA revealed no significant interactions between time and group with regard to reading comprehension (see Fig. 2d). Taken together, the results provide evidence that the bilingual and monolingual children, despite some significant group differences in mean performance, on average do not differ in their patterns of growth in phonological awareness, German vocabulary, word decoding, or reading comprehension in their early primary school years.
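As a reading aid for the multivariate statistics reported in this section, the following minimal Python sketch shows how Wilks’ Λ relates to the accompanying F value and to a multivariate partial eta squared. It assumes each time effect spans a single dimension (s = 1), in which case F = ((1 − Λ)/Λ) × (error df / effect df) and partial η² = 1 − Λ. The function names are illustrative and not part of the original analysis. Because Λ is reported to two decimals only, the recomputed F values will approximate rather than reproduce the published ones.

```python
def wilks_to_f(wilks_lambda: float, df_effect: int, df_error: int) -> float:
    """F statistic implied by Wilks' lambda when the effect has s = 1."""
    if not 0.0 < wilks_lambda <= 1.0:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    return (1.0 - wilks_lambda) / wilks_lambda * df_error / df_effect


def partial_eta_squared(wilks_lambda: float) -> float:
    """Multivariate partial eta squared for s = 1 (1 - lambda)."""
    return 1.0 - wilks_lambda


# (label, lambda, effect df, error df, F as reported in the text)
reported = [
    ("phonological awareness", 0.99, 1, 143, 1.08),
    ("word decoding", 0.89, 2, 113, 6.76),
    ("German vocabulary", 0.91, 1, 142, 13.76),
]

for label, lam, df1, df2, f_reported in reported:
    f_implied = wilks_to_f(lam, df1, df2)
    eta2 = partial_eta_squared(lam)
    # Differences from the reported F stem from lambda being rounded.
    print(f"{label}: implied F = {f_implied:.2f}, "
          f"reported F = {f_reported:.2f}, partial eta^2 = {eta2:.2f}")
```

Read this way, the time effects correspond to roughly 11% (word decoding) and 9% (vocabulary) of the within-subject variance, against about 1% for phonological awareness.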

Model of reading development

In this section, we address our second research question about the extent to which base reading skills affect reading comprehension differently in the TB and GM groups over time. As an initial view of the relationships between the base reading skills and reading comprehension, Table 4 shows the correlations between the proposed predictors and reading comprehension for both groups. For the bilingual group, reading comprehension at Time 2 and Time 3 correlated significantly with the measures of German vocabulary at both Time 1 and Time 2, the phonological awareness measures at both times of measurement, and the decoding measures at both Time 1 and Time 2. For the monolingual group, German vocabulary at Time 1 and Time 2 was moderately associated with reading comprehension, and both measurements of phonological awareness and word decoding were highly related to reading comprehension at both Time 2 and Time 3. One significant difference emerged with regard to the correlation coefficients in the two groups. Applying Fisher’s r-to-z transformation, phonological awareness measured at the end of second grade demonstrated a much stronger correlation with reading comprehension in third grade in the monolingual group (r = .71) than in the bilingual group (r = .38, z (144) = 3.05, p < .001).
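The comparison of two independent correlations via Fisher’s r-to-z transformation can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors’ procedure. The correlations are taken from the text, but the group sizes (53 GM and 91 TB children, the numbers completing the Time 3 assessment) are an assumption, since the exact n entering this comparison is not stated; the resulting z therefore need not match the published value.

```python
import math


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int):
    """Test whether two correlations from independent samples differ.

    Returns the z statistic and its two-tailed p value.
    """
    z1 = math.atanh(r1)  # Fisher r-to-z transformation
    z2 = math.atanh(r2)
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (z1 - z2) / standard_error
    # Two-tailed p value from the standard normal distribution
    p_two_tailed = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_two_tailed


# r values from the text; group sizes are ASSUMED (Time 3 completers).
z_stat, p_value = compare_independent_correlations(0.71, 53, 0.38, 91)
print(f"z = {z_stat:.2f}, two-tailed p = {p_value:.4f}")
```

The statistic is sensitive to the group sizes, so it should be recomputed with the actual pairwise n for each correlation where those are available.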

Table 4 Correlation coefficients for reading comprehension for Turkish Bilinguals (TB) and German Monolinguals (GM) at Time 2 and Time 3

Given our substantive interest in TB and GM group differences in base reading skills and their effect on reading comprehension, we employed structural equation modeling and multi-group analysis in Mplus 6.1 (Muthén & Muthén, 1998–2010). We based the a priori structure on a simplified version of the model of German reading proposed by Näslund and Schneider (1991), utilizing the base reading skills from Time 1 and the measure of reading comprehension from Time 3 (see Fig. 3). In the models, we used the two vocabulary subscales, synonyms and picture vocabulary, as manifest (observed) variables serving as indicators for the latent construct of vocabulary abilities. Similarly, the four subscales of the phonological awareness measure (pseudoword segmentation, vowel replacement, word remainder determination, and sound categorization) made up the latent construct representing phonological awareness. Because over one hundred items measured decoding and because the scale demonstrated substantial reliability as a measurement instrument, we used the aggregated scale of word decoding as a manifest (observed) variable. The same rationale was applied to the manifest (observed) variable of reading comprehension, composed of a twenty-item scale. Following Bollen’s (1998) hierarchy for group comparison of models, we conducted the multi-group analysis in three steps. In the first step, we fit a baseline model that treated both groups as a single sample. In the second step, we fit a multi-group model in which the paths were allowed to differ for the TB and GM groups and compared it to the baseline model. In the third step, we explored whether the group differences in the sample were significant by fitting a series of models in which the different SEM paths were constrained to be equal between the two groups.

In the first two steps, we fit a model that included the entire sample as one group and then compared it with the multi-group model (see Fig. 3). The fit statistics showed an acceptable match to the data, and compared to the baseline model, all the fit measures for the multi-group model provide strong evidence that the model has a significantly better fit when divided into two groups. In the multi-group model, we observed several noticeable differences in the parameters between the groups. The standardized regression weight for vocabulary was significantly related to reading comprehension in the TB group (β = 0.20, p = .03) but not in the GM group (β = 0.05, p = .67). Conversely, the mediating factor decoding was a significant predictor of reading comprehension only for the GM group (β = 0.26, p = .02). And while the relationship between phonological awareness and decoding was stronger for the TB group (β = 0.68, p < .01), the effect of phonological awareness on reading comprehension was stronger for the GM group (β = 0.59, p < .01). The model also explained more of the variance in reading comprehension in the GM group (R² = 0.62) than in the TB group (R² = 0.49). Overall, despite the acceptable match in the form of the model of reading development for both groups, we can conclude that there are noticeable differences between the TB and GM groups.

Fig. 3 SEM multi-group model with base reading skills from Time 1 predicting reading comprehension at Time 3 for Turkish-German bilingual (TB) and German monolingual (GM) children

As a third step, we fitted a series of models that systematically constrained coefficients to be equal in both groups to test whether these differences were significant in the population. Comparing these constrained models to the model in Fig. 3, we used the difference in chi-square statistics to test whether group differences were statistically significant. Since none of the constrained models produced a significant change in the chi-square statistic of overall model fit, we cannot conclude that the group differences observed in this sample hold in the population of students.
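The logic of this nested-model comparison can be sketched in a few lines of Python. The sketch assumes ordinary maximum likelihood chi-square values, for which the plain difference between the constrained and the freely estimated model is itself chi-square distributed with degrees of freedom equal to the number of added constraints. The fit values used below are hypothetical placeholders, since the individual model chi-squares are not reported here, and the helper names are illustrative only.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variate with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite correction series
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Compare a constrained model with the less restricted model it is nested in."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    if delta_df <= 0 or delta_chi2 < 0:
        raise ValueError("the constrained model must be nested in the free model")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# HYPOTHETICAL fit values: one path constrained to be equal across groups.
delta, ddf, p = chi2_difference_test(50.1, 41, 48.3, 40)
print(f"delta chi2({ddf}) = {delta:.2f}, p = {p:.3f}")
```

A nonsignificant difference, as in this made-up case, means the equality constraint does not worsen fit detectably, so the path cannot be said to differ between groups.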

Discussion

This study of bilingualism and literacy acquisition contributes to the understanding of reading development in a large yet under-investigated population of bilingual children in Europe. Years of research in bilingualism have indicated that bilinguals produce and experience language differently than monolinguals; it is still unknown, however, how those linguistic differences affect literacy acquisition (see Bialystok, 2002). Using a parsimonious and thorough theoretical model of reading comprehension, this study is the first to examine whether the base skills in German reading develop differently for bilingual children compared to their monolingual classmates and whether a common model of reading comprehension fits both groups equally. The clear similarities between the two groups on socio-economic characteristics and other literacy-related skills uniquely enabled us to isolate the differences in the students’ linguistic abilities and explore their effects on reading comprehension skills a year later.

As hypothesized in our first research question, the patterns of growth between the two groups showed that in all three base reading skill areas measured over time (phonological awareness, vocabulary, and decoding), growth patterns were essentially identical in the two groups; no interaction effects were found for group and time. Bilingual and monolingual children in this sample not only demonstrated similar growth in reading comprehension abilities from the second to the third grade; their mean performance scores were also very similar. With the exception of an advantage in phonological awareness for the bilingual children early in second grade, and significantly stronger vocabulary skills for the monolingual children in first and second grade, the two groups showed patterns of base reading skill development that were more alike than different. In that regard, the lack of group differences in development patterns in this study reflects findings from North American populations, such as Kieffer and Vukovic (2012).

Although reading comprehension and other reading-related skills (phonological awareness and decoding) developed congruently in the two groups, there were a few noteworthy discrepancies. As others have found (e.g., Bruck & Genesee, 1995; Marinova-Todd et al., 2010; Jean & Geva, 2009), the bilingual children in this sample had somewhat heightened phonological awareness, but those skills developed at a similar rate to the monolingual group. Their phonological advantage is likely due to the metalinguistic benefits of bilingualism documented by Bialystok (e.g., 2002) and others. Also, the equal decoding performance of the bilingual and monolingual children in this sample mirrored findings of studies with English-speaking bilingual children (e.g., Jongejan et al., 2007; Lesaux & Kieffer, 2010; Lesaux et al., 2007). The bilingual students’ lag in vocabulary skills was similar to, but less marked than, the 2-year developmental lag in expressive vocabulary found by Hutchinson et al. (2003) among mixed L2 learners in England as well as the large gap in vocabulary knowledge of minority language students in the Lesaux and Kieffer (2010) study. The smaller discrepancy in vocabulary abilities in this study could be attributed to the high level of similarity in the two groups’ socio-economic conditions and the fact that over 95% of the children in both groups had attended German preschool programs.

The bilingual and monolingual children in this sample showed similar growth in reading comprehension in early elementary school. This parallels the findings of Lesaux et al. (2007) in North America. However, in a similar investigation with slightly older primary school children, Droop and Verhoeven (2003) found differential growth rates, which manifested in increasing differences in reading performance between L1 and L2 Dutch school children over time. The lack of a widening performance gap found here might be explained by the use of relatively concrete language for the early primary school reading measures as well as our overall lower SES inner-city sample (as opposed to the Droop and Verhoeven study, which found the gaps to be most pertinent at higher levels of SES). The overall similarities in the developing base reading skills for the two groups in this sample can likely be attributed to the high levels of resemblance between them with regard to background characteristics such as SES, cognitive abilities, and family and educational background variables. In essence, aside from their home languages, the two groups had almost identical learning experiences and environments, which seem to have resulted in equivalent growth patterns.

The second line of investigation in this study was to determine the extent to which a single model of reading comprehension fit both monolingual and bilingual emerging readers. Structural equation modeling enabled us to examine whether the paths to reading comprehension differ when empirically tested in a theoretical model of reading while including decoding as a mediating variable. Indeed, our findings showed that the theoretical model fit both groups with the same base components of reading. Nonetheless, the model fit was only acceptable when the model was tested separately for the two groups, clearly indicating differences in the way the model operated for the two groups. This pattern echoed Verhoeven’s (2000) finding that creating separate models of reading for minority language and Dutch children produced a much better goodness-of-fit, and it highlights the importance of considering different models of reading for bilingual readers. Although the key variables in the longitudinal models were able to account for a substantial amount of the variance in reading comprehension abilities for both groups, the SEM analyses explained more variance for the German monolingual group. We can therefore assume that further base reading components need to be considered in order to create a more comprehensive model of reading for bilingual emerging readers.

Although the same base components in Grade 2 contributed to predicting reading comprehension 1 year later in Grade 3, our analyses demonstrated that the latent factor for phonological awareness in early Grade 2 exerted a stronger influence on Grade 3 reading comprehension in the monolingual group, while the latent factor for vocabulary demonstrated significant influence on reading comprehension for the bilingual group only. We therefore interpret these results as indicating that phonological awareness plays a larger role in predicting reading comprehension for the monolingual participants than for the bilingual participants. Conversely, vocabulary was a significant predictor of reading comprehension for the bilingual group, but not for the monolingual group. The fact that these model differences were found in the present sample but could not be generalized to the population is most likely a consequence of low statistical power. The relatively small sample size for both groups resulted in a lack of power for creating two separate models and therefore warrants further research with larger sample sizes.

The differential predictive powers of the paths are particularly salient in light of the fact that the two groups performed at similar levels of German reading comprehension. This may be because both groups are at the early stages of reading, in which demands on advanced German vocabulary knowledge are still minimal. Most research with monolingual readers indicates that vocabulary abilities have little effect on early reading, but gain in importance as texts increase in complexity after second grade, whereas more basic skills like phonological awareness and decoding play an important role earlier on in the literacy acquisition process (e.g., Cummins & Swain, 1986; Proctor et al., 2005; Schneider, 2004; Storch & Whitehurst, 2002). Like comparable studies in the Netherlands (Verhoeven, 2000) and in Norway (Lervåg & Aukrust, 2010), our analyses show that this pattern is different for bilingual readers, in that their reading comprehension shows a stronger influence of vocabulary skills. We surmise that the vocabulary demands required by the comprehension task were less taxing for the verbally stronger monolingual children, resulting in a higher dependence on the other base skills (phonological awareness and decoding) and in patterns of literacy often seen in monolinguals in early primary school. The bilingual readers, on the other hand, were more challenged by the vocabulary demands of the comprehension passages and therefore showed different patterns. The bilingual readers’ model in this study more closely resembles that of later primary school readers, as in Storch and Whitehurst (2002), whereas the monolinguals’ patterns of reading development more closely resembled those of early primary school readers. It is unclear if the bilingual readers are using their strong phonological awareness and decoding abilities to reach similar levels of reading performance and are compensating for their 6-month lag in vocabulary skills or if there are additional compensating skills not assessed in this study.

The results of this investigation should be taken as preliminary due to several restrictions of the study design and sample. First, theorists and researchers suggest that vocabulary in the second language becomes increasingly important as decoding is mastered and reading processes shift toward requiring greater levels of inference in context-reduced texts (e.g., Cummins & Swain, 1986; Proctor et al., 2005; Schneider, 2004; Storch & Whitehurst, 2002). Because the children in this sample were investigated only during the earliest stages of literacy, during which reading requires relatively simple vocabulary and contexts, the available data cannot be used to examine the consequences of lower German vocabulary skills for more demanding abstract reading materials. Secondly, although the entire sample of 169 would have been adequate for creating a solid single SEM, the relatively small size of each group did not provide enough statistical power to fully investigate the multi-group nature of the sample. The structural equation models in this paper can therefore only be considered exploratory. Thirdly, in light of the substantially smaller amount of variance explained by the model for the bilingual group, future research should ensure that a broader range of instruments is used over a longer period of time. Measures of grammar, pre-literacy skills such as alphabet knowledge, family reading practices, and non-word decoding should be incorporated in further investigations in order to establish appropriate models of reading for multilingual populations.

As bilingual theorists and researchers have surmised, the present study indicates that linguistic differences in bilingual compared to monolingual children have an effect on their emerging literacy. Indeed, our findings show that bilingual children develop their reading comprehension skills differently than their monolingual peers. Although similar base components play a role in learning to read for both bilingual and monolingual children, the components manifest themselves differently for the two groups, with vocabulary playing a stronger role in predicting reading comprehension for the bilingual group and phonological awareness for the monolingual group.

One of the key implications of this study is the need to focus teachers’ efforts on addressing the vocabulary needs of bilingual children. Early intervention programs such as that evaluated by Lesaux and Siegel (2003) show that initial deficits in L2 learners’ vocabulary skills can be overcome with specific instructional strategies aimed at learning new words in the L2. Studies such as this lend themselves well to further research on the effects of varying types of interventions and how they might support bilingual and monolingual readers’ development differently. Our findings indicate that “one size fits all” reading instruction is not necessarily best for classrooms of bilingual and monolingual readers. The other major implication of this study is a need for researchers to determine if there is, in fact, a unified model of bilingual reading comprehension development that can be applied within the Turkish-German bilingual population or even beyond into other language combinations. As educational systems around the world are increasingly struggling with how to best serve children who speak languages at home other than the school language (e.g., see Stanat & Christensen, 2006), the importance of understanding their unique paths to literacy cannot be overstated. Still, it is important to recognize that although bilingual children may experience special challenges in their reading development, they possess the ability to communicate in two languages early in life. This constitutes a definitive linguistic advantage on a larger scale.