In Canada, approximately 34% of all children under the age of 14 speak a language other than English or French as their first language (L1), a percentage that is expected to increase in the years to come (Statistics Canada, 2018). These children are referred to as English language learners (ELLs) because English is not the primary language spoken at home (Geva & Farnia, 2012; Lesaux, Lipka, & Siegel, 2006b; Lipka & Siegel, 2012). Despite the demographic differences that exist between Canada and the US, consistent findings have been reported from research examining literacy development and skills of ELLs in both countries (August & Shanahan, 2006). Notably, even when ELLs perform similarly to their English L1 (EL1) peers on word-level skills such as decoding and phonological processing, they face challenges in reading comprehension, especially when the text becomes more complex in the upper elementary grades (Farnia & Geva, 2013; Lesaux & Kieffer, 2010).

Poor reading comprehension may be attributed to deficits in oral language, metalinguistic skills, working memory, and higher-level skills. School-aged ELLs come from a variety of language backgrounds in the Canadian context. Chinese is the third most commonly spoken language in Canada after English and French due to the substantial number of Chinese immigrants (Citizenship and Immigration Canada, 2017; Statistics Canada, 2016). Thus, the primary purpose of the present study was to explore sources of reading comprehension difficulties beyond word-level skills in Chinese-speaking ELLs. Sources of reading comprehension difficulties between Chinese ELLs and their monolingual peers were also compared. Selecting a homogeneous group of ELLs allowed us to ensure that many L1 effects were similar across ELL participants.

According to the Simple View of Reading model, reading comprehension is the product of decoding and linguistic comprehension (Gough & Tunmer, 1986). Thus, poor reading comprehension can occur as a result of deficits in decoding, linguistic comprehension, or both. Although word reading difficulties (i.e., dyslexia) are commonly associated with poor reading comprehension, a subset of students experience comprehension difficulties despite adequate decoding ability. These students are often referred to as “poor comprehenders” (e.g., Cain & Oakhill, 2006; Nation & Snowling, 1998). The poor comprehender profile has been identified among both EL1s (e.g., Catts, Adlof, & Weismer, 2006; Tong, Deacon, Kirby, Cain, & Parrila, 2011) and ELLs (e.g., D’Angelo & Chen, 2017; Li & Kirby, 2014; O’Connor, Geva, & Koh, 2019). It is estimated that 5–15% of EL1 school-aged children are poor comprehenders, while the prevalence rate is 10–18% among ELLs. The percentage of this reader group is higher for ELLs due to their weaker English vocabulary skills, which lead to reading comprehension problems (Lesaux, Koda, Siegel, & Shanahan, 2006a).
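
The multiplicative form of the Simple View of Reading can be written compactly as follows. The symbols RC, D, and LC are labels chosen here for illustration; Gough and Tunmer (1986) treat each component as ranging from 0 (no skill) to 1 (perfect skill).

```latex
\[
  RC = D \times LC, \qquad 0 \le D \le 1, \quad 0 \le LC \le 1
\]
```

Because the relation is a product rather than a sum, reading comprehension is low whenever either component is low. A poor comprehender, in these terms, is a reader with adequate D but weak LC.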

Research has demonstrated that poor comprehenders may struggle with oral language, metalinguistic skills, working memory, and/or higher-level processing skills (e.g., Cain & Oakhill, 2006; Catts, Adlof, & Weismer, 2006; Clarke, Henderson, & Truelove, 2010; Nation & Snowling, 1998; Nation & Snowling, 1999; Tong, Deacon, & Cain, 2014). Therefore, these skills were the focus of the present study. Oral language skills refer to constructs such as oral vocabulary and listening comprehension, which use spoken language to express knowledge and provide a foundation for literacy development (Cain & Oakhill, 2007; Foorman, Herrara, Petscher, Mitchell, & Truckenmiller, 2015; Language and Reading Research Consortium, 2015). Metalinguistic awareness refers to the ability to reflect upon language; both morphological awareness and syntactic awareness are considered metalinguistic skills (Gillion, 2018; Lipka & Siegel, 2012). Morphological awareness contributes to reading comprehension by enhancing decoding efficiency and facilitating access to the meaning of morphologically complex words (Kirby et al., 2012). Syntactic awareness is important for reading comprehension because it allows the reader to integrate information by reflecting on grammatical structures and parsing complex sentences (Deacon & Kieffer, 2018). Working memory enables the reader to retain and integrate information and thus is essential for reading comprehension. Finally, higher-level processing skills are needed to monitor comprehension, use conjunctions, and generate inferences (Cain & Oakhill, 2006). These skills help the reader make connections and integrate text segments to construct a situation model for successful reading comprehension (Geva, 2007; Kintsch, 1998; McMaster, Van den Broek, Espin, White, Rapp, Kendeou, ... & Carlson, 2012). According to Catts et al. (2006), poor comprehenders have reading comprehension problems due to deficits in the abovementioned areas.

Poor comprehenders are typically identified between late elementary and middle school. Demands on oral language and higher-level skills increase around this time as children shift from learning to read to reading to learn (Chall, Jacobs, & Baldwin, 1990). With respect to the method of identification, poor comprehenders have traditionally been identified by selecting children who score above a certain cutoff score (e.g., the 40th percentile) on word reading but below a certain cutoff score (e.g., the 25th–35th percentile) on reading comprehension (e.g., Cain & Oakhill, 2006; Herbert, Massey-Garrison, & Geva, 2020; Nation & Snowling, 1998). Because the use of cutoff scores does not capture the variability of a continuous distribution (Branum-Martin, Fletcher, & Stuebing, 2013), researchers have recently adopted a regression technique to identify poor comprehenders (e.g., D’Angelo & Chen, 2017; D’Angelo, Hipfner-Boucher, & Chen, 2014; Li & Kirby, 2014; Tong et al., 2011; Tong et al., 2014). This technique accounts for a number of variables (e.g., age, nonverbal ability, word reading accuracy and fluency) in addition to reading comprehension in the identification of poor comprehenders. The regression method also identifies average and good comprehender groups. Comparing poor comprehenders to average and good comprehenders allows for the examination of relative rather than absolute discrepancies between skills and provides insights into the core deficits of poor comprehension. Therefore, we chose to use the regression technique to identify poor, average, and good comprehenders in the present study.

Reading comprehension difficulties in ELL children

While ELLs develop decoding skills in a manner similar to their EL1 peers, they usually lag behind in vocabulary and reading comprehension (Farnia & Geva, 2013; Lesaux, Koda, et al., 2006a; Lesaux, Lipka, & Siegel, 2006b). This is expected because ELLs have reduced exposure to English. Preliminary evidence suggests that ELL poor comprehenders demonstrate persistent difficulties in oral language, metalinguistic, and cognitive skills such as vocabulary, morphological awareness, syntactic awareness, listening comprehension, and working memory as well as higher-level skills such as inference and reading strategies (e.g., Farnia & Geva, 2019; Geva & Massey-Garrison, 2013; Kieffer & Vukovic, 2012; Lesaux & Kieffer, 2010; Lesaux, Koda, et al., 2006a; Lesaux, Lipka, & Siegel, 2006b; Lipka & Siegel, 2012; O’Connor et al., 2019; Zhang & Shulley, 2017). It is important to note that most of these skills are reliant on and predicted by vocabulary knowledge, which ELLs often lack (Cain, Oakhill, & Lemmon, 2004b). Comparing ELL poor comprehenders to their EL1 counterparts reveals similarities and differences between these two groups and helps researchers and educators design effective interventions for each group. In what follows, we review existing research related to oral language, metalinguistic skills, working memory, and higher-level skills in ELL poor comprehenders.

Oral language

Research on monolingual children has demonstrated that poor comprehenders show deficits in oral language, such as vocabulary knowledge (e.g., Cain & Oakhill, 2011; Catts, Adlof, & Weismer, 2006; Colenbrander, Kohnen, Smith-Lock, & Nickels, 2016; Elwer, Keenan, Olson, Byrne, & Samuelsson, 2013; Ricketts, Bishop, & Nation, 2008) and listening comprehension (Colenbrander et al., 2016). In particular, several longitudinal studies suggest that poor comprehenders experience persistent difficulties in vocabulary over time (e.g., Clarke et al., 2010; Elwer et al., 2013). Poor comprehenders also demonstrate difficulties inferring the meaning of new words from context, compared to good comprehenders (Cain, Oakhill, & Lemmon, 2004b).

ELL poor comprehenders likewise show weaker oral language than ELLs without reading difficulties. In fact, the oral language profiles of ELL and EL1 poor comprehenders tend to be similar (Geva & Massey-Garrison, 2013; Lesaux & Kieffer, 2010; O’Connor et al., 2019). Until now, most studies involving either ELLs or EL1s have focused mainly on the role of vocabulary breadth (i.e., the number of words known) in reading comprehension. Other aspects of oral language, such as vocabulary depth (i.e., how well a word is known), are less investigated (Ouellette, 2006). According to the Lexical Quality Hypothesis, high-quality representations of words are characterized by breadth and depth of vocabulary knowledge, and both aspects are essential to achieving successful reading comprehension (Perfetti, 1985; Perfetti & Hart, 2002). Because poor comprehenders’ deficits in vocabulary are not homogeneous (Colenbrander et al., 2016), additional research is needed to examine the roles of different aspects of vocabulary in poor reading comprehension.

Metalinguistic skills

In addition to oral language, poor comprehenders demonstrate weaknesses in metalinguistic skills, such as morphological awareness (Nation et al., 2004; Nation, Snowling, & Clarke, 2005; Tong et al., 2011; Tong et al., 2014) and syntactic awareness (Adlof & Catts, 2015; Catts et al., 2006; Nation, Cocksey, Taylor, & Bishop, 2010; Nation & Snowling, 2000; Tong et al., 2014). Morphological awareness refers to the ability to understand morphemes, the smallest units of meaning, and to manipulate word structures (Carlisle, 1995; Carlisle, 2000; Kieffer & Lesaux, 2012). Syntactic awareness involves the ability to manipulate and reflect on the grammatical structure of language (Cain, 2007; Lipka & Siegel, 2012). These two aspects of metalinguistic awareness reflect children’s sensitivity to word and sentence structures, respectively (Kirby et al., 2012; Tong et al., 2014). For example, in a study involving English native-speaking children in Grade 5, Tong et al. (2011) found that poor comprehenders performed worse than average comprehenders on morphological awareness. In another study, Tong et al. (2014) found that poor comprehenders performed worse than average comprehenders on syntactic awareness in addition to morphological awareness.

ELL poor comprehenders also demonstrate deficits in metalinguistic skills. They perform more poorly than ELL good comprehenders on morphological and syntactic awareness tasks (e.g., Lipka & Siegel, 2012; Zhang & Shulley, 2017). In addition, ELL poor comprehenders perform more poorly than EL1 poor comprehenders on syntactic awareness (Lesaux, Koda, et al., 2006a; Lesaux, Lipka, & Siegel, 2006b). However, the role of metalinguistic skills in reading comprehension difficulties for ELLs from diverse linguistic backgrounds is still unclear because research is limited. Most existing studies that investigated morphological awareness in ELL poor reading comprehension used cutoff scores to identify poor comprehenders. More research is needed to examine the similarities and differences between ELL and EL1 poor comprehenders on metalinguistic skills with alternate, advanced identification methods.

Working memory

Working memory capacity is required to simultaneously store and process information to integrate text segments for successful reading comprehension. Problems with working memory are documented in poor comprehenders. Specifically, poor comprehenders perform less well than good comprehenders on working memory tasks that involve digits and word lists (Cain, Oakhill, & Bryant, 2004a; Nation, Adams, Bowyer-Crane, & Snowling, 1999; Oakhill, Hartt, & Samols, 2005; Pimperton & Nation, 2014; Swanson, Howard, & Sáez, 2007). Furthermore, ELL poor comprehenders perform worse than EL1 poor comprehenders on working memory for words (e.g., reading span task) due to weaker linguistic knowledge (Lesaux, Koda, et al., 2006a; Lesaux, Lipka, & Siegel, 2006b). However, no differences have been reported between ELL and EL1 poor comprehenders on the task of working memory for digits (e.g., backward digit span) which requires more constrained linguistic knowledge (Geva & Massey-Garrison, 2013). We note that different measures have been used to assess working memory across studies, making it difficult to draw definite conclusions. Thus, there is a need for more research to determine whether there are differences between ELL and EL1 poor comprehender groups on working memory.

Higher-level skills

Reading comprehension draws on skills at different levels. Higher-level skills such as inference, use of conjunctions, and comprehension monitoring are needed to make necessary inferential links and integrate information from different sources during reading (Perfetti, Landi, & Oakhill, 2005). These skills aid the construction of meaning-based representations of the text (Kintsch, 1998; McMaster et al., 2012). Reading comprehension also involves the construction of coherent mental representations (Kintsch, 1998; McMaster et al., 2012). These coherent relations can be signaled by various connectives, including conjunctions such as “and,” “because,” and “although” (Golding, Millis, Hauselt, & Sego, 1995; Halliday & Hasan, 1976). Connectives entail the understanding of logical relations between clauses, promoting integration of information (Graesser, McNamara, Louwerse, & Cai, 2004; Maury & Teisserenc, 2005; Millis & Just, 1994; Sanders & Noordman, 2000). The construction of text meaning also requires inference, a necessary skill for generating opinions or deriving conclusions using known information (Cain & Oakhill, 1999). While processing textual information, an intentional reader monitors his or her understanding to ensure successful comprehension. This comprehension monitoring skill can be assessed by tasks that require children to detect inconsistent information within text (Oakhill et al., 2005).

Existing research has investigated the association between higher-level skills and reading comprehension among young EL1 poor comprehenders by examining how readers generate inferences (Cain, Oakhill, Barnes, & Bryant, 2001; Cain, Oakhill, & Bryant, 2004a), interpret conjunctions and connect words (Cain, Oakhill, & Elbro, 2003; Geva & Ryan, 1985), and monitor comprehension of information (Cain & Oakhill, 2003; Cain & Oakhill, 2006; Oakhill et al., 2005). Together, this body of research has shown that weaker performance on tasks measuring higher-level skills is related to reading comprehension difficulties. However, only a handful of studies have evaluated higher-level skills in ELL poor comprehenders. In a pioneering study, Li and Kirby (2014) found that ELL good comprehenders outperformed ELL average and poor comprehenders on higher-level skills including inference strategies and summary writing, but there were no significant differences between average and poor comprehenders on these skills. They hypothesized that higher-level skills were the defining characteristics of good comprehenders and these skills were related to reading comprehension once students reached a certain threshold of English language proficiency. When readers have well-developed vocabulary and language proficiency, higher-level skills appear to be the main source of variability in reading comprehension in differentiating groups. Foundational oral language skills such as vocabulary and grammatical knowledge underlie higher-level skills such as making inferences and comprehension monitoring, which in turn are necessary for reading comprehension success (Cervetti et al., 2020; Kim, 2017).

Comprehension of conjunctions can enhance reading comprehension and has also been found to predict reading comprehension in ELLs (Crosson & Lesaux, 2013; Crosson, Lesaux, & Martiniello, 2008; Droop & Verhoeven, 2003; Geva, 2007; Geva & Ryan, 1993; Welie, Schoonen, Kuiken, & van den Bergh, 2017). For example, Crosson and Lesaux (2013) reported that the ability to understand conjunctions explained sizeable variance in reading comprehension beyond word reading efficiency and vocabulary breadth in ELLs in Grade 5. As a result, conjunctions may be a source of comprehension difficulties in ELL poor comprehenders. Finally, although comprehension monitoring plays an important role in reading comprehension (e.g., Cain & Oakhill, 2003; Cain & Oakhill, 2006), little is known about the effect of comprehension monitoring in ELL poor comprehenders.

The present study

The current study thus sought to answer two research questions about ELL poor comprehenders: (1) Do ELL poor comprehenders differ in oral language, metalinguistic skills, working memory, and higher-level processing skills from ELL average and good comprehenders? (2) Are there similarities and differences in reading comprehension profiles between ELL and EL1 poor comprehenders? Our participants were L1 speakers of Chinese, a language spoken by a large number of immigrants in English-speaking countries (Crystal, 2009), and EL1 students in the same grades. Students in Grades 4 to 6 were selected for this study because poor comprehenders are typically identified in upper elementary grades.

Our study was designed to address several gaps in previous research. The first research question aimed at identifying sources of comprehension difficulties in ELL poor comprehenders. In addition to oral language, metalinguistic skills, and working memory, we also considered the role of higher-level processing skills, which has been largely ignored by previous studies. Furthermore, the majority of previous studies only compared ELL poor comprehenders to good comprehenders (e.g., Geva & Massey-Garrison, 2013; Lesaux & Kieffer, 2010; Lipka & Siegel, 2012; Zhang & Shulley, 2017). Because of the large gap between the two groups, such comparisons may not reveal key characteristics of poor comprehenders. In the present study, both good and average comprehenders were included as comparison groups to capture the full range of reading comprehension abilities.

The second research question sought to address the similarities and differences between ELL poor comprehenders and their EL1 counterparts. Since inherent differences exist between ELLs and EL1s, it is not clear whether the results of studies involving EL1 poor comprehenders can be generalized to ELL poor comprehenders. For example, Cho, Capin, Roberts, Roberts, and Vaughn (2019) found that vocabulary knowledge and listening comprehension each explained a larger proportion of variance in reading comprehension for ELLs with reading comprehension difficulties than for their EL1 counterparts.

Finally, most of the previous studies included ELLs from diverse language backgrounds with varying levels of English proficiency. Given that L1 characteristics in terms of script and linguistic typology are sources of potential transfer from the L1 to English (Chung, Chen, & Geva, 2018), studies including a mixed sample of ELLs from different L1 backgrounds cannot account for the effects of L1 on children’s performance. Conversely, focusing on a relatively homogeneous group of ELLs allows for the isolation of sources of variance related to reading comprehension skills and enables us to gain a better understanding of the unique characteristics of the group under investigation. Thus, we chose to include ELLs from a single L1 background in the present study.

Method

Participants

Participants were 124 ELLs (60 males and 64 females) who spoke Chinese as their L1 and 79 EL1 children (36 males and 43 females) in Grades 4, 5, and 6. The children were recruited from 10 publicly funded elementary schools in a large and diverse school board in Ontario, Canada. The numbers of participants in Grades 4, 5, and 6 were 55, 37, and 32, respectively, for ELLs and 29, 26, and 24, respectively, for EL1s. English was the language of instruction for all schools. ELL status was established by collecting data from school files, teacher reports, and parent questionnaires. Each year, schools evaluated each ELL’s status based on his or her performance on measures of English language proficiency. We also confirmed the ELL status using teacher interviews. A parent questionnaire of home literacy environment, including languages spoken at home, was used to confirm ELL status. In the parent questionnaire, we also asked parents “at what age did your child start receiving consistent exposure to English” to ensure all ELLs were equivalent in their exposure to English. The ELLs were either born in Canada or had arrived in Canada at least 2 years prior to testing. All children in this group attended Mandarin Chinese heritage language schools on weekends (two hours per week). According to the demographic data obtained, the majority of participating schools were located in neighbourhoods of low to middle socioeconomic status (SES), indicating that the two groups were comparable in SES (Statistics Canada, 2016). At the time of testing, the mean age of ELLs was 126.26 months (SD = 10.6) and that of EL1s was 128.54 months (SD = 10.7). Only children whose parents signed a consent form were included in the study.

Measures

Control measures

Nonverbal reasoning

Two subtests from the Matrix Analogies Test (MAT expanded form; Naglieri, 1989) were used to assess nonverbal reasoning ability: pattern completion and serial reasoning. Each subtest consisted of 16 items of increasing difficulty. Children were asked to complete a figural matrix by choosing the missing piece from five to six possible choices. The Cronbach’s alpha reliability for this task was 0.90 for the ELLs and 0.91 for the EL1s.
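
Cronbach’s alpha is reported for every measure in this section. For readers unfamiliar with the coefficient, the following is a minimal sketch of the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name and the toy item matrix are illustrative assumptions, not the study’s data or analysis code.

```python
from statistics import variance

def cronbach_alpha(scores):
    """scores: one row per child, each row holding that child's item scores."""
    k = len(scores[0])                              # number of items
    items = list(zip(*scores))                      # transpose to one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up toy data: 4 children x 3 dichotomously scored items
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha = cronbach_alpha(toy)   # items agree perfectly, so alpha is at its maximum
```

Higher values indicate that items covary strongly relative to their individual variances, which is why longer, more homogeneous tests tend to yield higher coefficients.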

Word reading accuracy

Word reading accuracy was assessed with the Letter-Word Identification subtest from the Woodcock Johnson-III Tests of Academic Achievement (WJ-III; Woodcock et al., 2001). The child was required to read single words ordered in increasing difficulty in list form. The Cronbach’s alpha reliability for this task was 0.90 for the ELLs and 0.87 for the EL1s.

Word reading fluency

The Sight Word Efficiency test from the Test of Word Reading Efficiency (Torgesen, Wagner, & Rashotte, 1999) was used to assess word reading fluency. This test consisted of 104 real words ordered in increasing difficulty, assessing the number of words accurately read in 45 seconds. The Cronbach’s alpha reliability for this task was 0.87 for the ELLs and 0.86 for the EL1s.

Oral language measures

Vocabulary breadth

The Peabody Picture Vocabulary Test-Fourth Edition (PPVT-IV Form A; Dunn & Dunn, 2007) was used to assess oral vocabulary breadth. There were 228 items within 19 sets. For each item, the examiner said a word and the child was required to point to one of four pictures that best depicted that word. Items were ordered in increasing difficulty, and testing stopped when there were eight or more errors within a set. The Cronbach’s alpha reliability for this task was 0.96 for the ELLs and 0.98 for the EL1s.

Vocabulary depth

A multiple meaning vocabulary test designed by Biemiller and Slonim (2001) was used to measure students’ depth of vocabulary knowledge by focusing on their knowledge of multiple meanings of words. In this task, there were 21 target words with multiple meanings and the children were asked to choose the correct meanings of the target word from four choices. For example, for the target word “ring,” the four choices were as follows: A. sound, B. bell, C. jewellery, D. shoe. The correct answers are A, B, and C. The total score was the number of options correctly chosen. The maximum score was 63. The Cronbach’s alpha reliability was 0.90 for the ELLs and 0.98 for the EL1s.
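
The scoring rule can be sketched as follows. This is an illustration under two assumptions that the description above does not state explicitly: that each of the 21 target words has three correct meanings (which would give the maximum of 63), and that incorrectly chosen options carry no penalty. The names `score_item`, `answer_key`, and `responses` are hypothetical.

```python
def score_item(chosen, key):
    """Count the chosen options that are correct meanings of the target word."""
    return len(set(chosen) & set(key))

# Answer key for the example item from the text; a full test would hold 21 entries.
answer_key = {"ring": {"A", "B", "C"}}

# A hypothetical child selects two correct meanings and one distractor.
responses = {"ring": {"A", "C", "D"}}

# Assumption: distractors (here "D") are simply ignored rather than penalized.
total = sum(score_item(responses.get(word, ()), key)
            for word, key in answer_key.items())
```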

Listening comprehension

The Woodcock-Johnson Listening Comprehension Test (Woodcock, 1998) was used to assess children’s listening comprehension. In this cloze-type task, the child listened to sentences of increasing difficulty and provided an oral response to complete each unfinished sentence. The Cronbach’s alpha reliability of this task was 0.76 for the ELLs and 0.78 for the EL1s.

Metalinguistic measures

Morphological awareness

A derivational morphological awareness task was used to measure children’s morphological awareness. Adapted from Carlisle (2000), this task required the child to produce a derived word to complete a sentence (e.g., target word: farm; sentence: My uncle is a ____ [farmer]). The Cronbach’s alpha reliability was 0.89 for the ELLs and 0.85 for the EL1s.

Syntactic awareness

In this experimental task, children were told that they would hear incorrect sentences and their task would be to correct the sentence. The examiner read each of 16 grammatically incorrect sentences to the child (e.g., I wonder how old is he). After each sentence was read, the examiner asked the child to “fix” the sentence to make it grammatically correct. The Cronbach’s alpha reliability was 0.77 for the ELLs and 0.66 for the EL1s.

Working memory measure

Working memory

The Auditory Working Memory test from the Test of Cognitive Ability, Woodcock Johnson–III (WJ-III; Woodcock et al., 2001) was used to measure children’s working memory. In this test, the child heard words mixed with digits and was asked to repeat the words first, followed by the digits in the correct order. The Cronbach’s alpha reliability of this task was 0.83 for the ELLs and 0.86 for the EL1s.

Higher-level measures

Inference

Adapted from Cromley and Azevedo (2007), this task used five passages chosen from the Gates-MacGinitie Reading Test II (MacGinitie & MacGinitie, 1992; Level D4, Form 4) to assess children’s ability to draw inferences while reading. We developed four multiple-choice questions for each passage that tapped inferences that require making inferential links between adjacent sentences within the text as well as those that incorporate information not in the text, i.e., background knowledge. The Cronbach’s alpha reliability was 0.79 for the ELLs and 0.91 for the EL1s.

Conjunction use

This task was adapted from a measure developed by Geva and Ryan (1985). There were two narrative cloze tasks (the first consisting of 198 words and the second of 267 words). In each narrative, various types of high-frequency conjunctions were deleted (e.g., so, because, and, but). For each blank, the child was required to select the correct conjunction out of four options. For example, “Cities are large places that are divided into neighborhoods. ___, a city can have many neighborhoods. Also, cities have different kinds of neighborhoods. A. So, B. Because, C. But, D. Before.” In each of the two texts, there were 10 conjunctions to add. The Cronbach’s alpha reliability was 0.70 for the ELLs and 0.72 for the EL1s.

Comprehension monitoring

Children read five short stories and were asked to underline the parts that did not make sense (from Cain & Oakhill, 2006). Three of the five stories contained contradictory information, each with two segments that did not make sense. An example of a story that does not make sense is the following passage: “David was making a birthday cake for his friend, Peter. Peter was going to be eleven years old, so David counted out the candles carefully on his fingers as he was putting them on the cake. David put on the same number of candles as he had fingers. The perfect number.” The total score was the number of correctly identified segments that were underlined because “they did not make sense.” The Cronbach’s alpha reliability was 0.86 for the ELLs and 0.89 for the EL1s.

Reading outcome

Reading comprehension

The Gates-MacGinitie Reading Test II (MacGinitie & MacGinitie, 1992; Level D4–D5/6, Form 3), a different form from the one used for the inference task, was used to assess children’s reading comprehension. This standardized measure is composed of 12 short passages. The children read the passages and answered multiple-choice comprehension questions. The reliability for this task was 0.92 for the ELLs and 0.88 for the EL1s.

Procedure

All measures were administered by trained research assistants from March to May in the spring semester. Research assistants were trained in three sessions with each session lasting for two hours. The first two sessions focused on how to administer all measures and the last session was a practice session where research assistants practised administering the measures in pairs and asking questions if needed. Fidelity of administration was monitored through the last training session, observation of on-site administration of the measures, and weekly communication with the testers. Measures of nonverbal ability, vocabulary breadth, listening comprehension, syntactic awareness, working memory, word reading accuracy, and word reading fluency were administered individually, and the remaining measures were administered in two separate group sessions.

Results

Preliminary analyses

Because the norms for the standardized tests are based on EL1 samples, they might not reflect the norms for ELLs. Therefore, raw scores rather than standard scores were used in the analyses (Geva & Farnia, 2012). We examined missing data patterns and found that data were missing completely at random (Little’s MCAR Test, χ² = 171.86, df = 53, p = .08). We then used multiple imputation to deal with missing data. One-way analysis of variance (ANOVA) showed that the ELLs performed more poorly than the EL1s on measures of vocabulary breadth, F(1, 202) = 33.44, p < .001, and reading comprehension, F(1, 202) = 15.29, p < .001, confirming their ELL status.

The ELLs and EL1s were next classified as being poor, average, or good comprehenders on the basis of a regression technique. Regression diagnostics were conducted to ensure that the residuals were within the normal range. The assumption of linearity was met as all standardized residual values were between -2 and 2. The residual plot (QQ-plot) was also checked and the normality assumption was met. Children’s reading comprehension scores were regressed upon their age (in months), nonverbal ability, L1/L2 status, word reading accuracy, and word reading fluency scores, which jointly accounted for 67% of the variance. The children’s actual reading comprehension scores were then plotted against the standardized “predicted values.” We first eliminated the students whose predicted values were one standard deviation below or above the overall mean. This excluded students who had very poor or very good word reading scores because the essential characteristic of poor comprehenders is poor reading comprehension in the presence of adequate word reading skills (e.g., Cain & Oakhill, 1999).
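
The first two steps of this procedure (fitting the regression, then screening on the standardized predicted values) can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: the helper names, the hand-rolled least-squares solver, and the toy data with two predictors instead of five are all assumptions made for the example.

```python
from statistics import mean, stdev

def ols_fit(X, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]            # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(p):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][p] for i in range(p)]

def predict(beta, X):
    """Predicted outcome for each row of predictors."""
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def within_one_sd(predicted):
    """Indices of cases whose predicted value lies within 1 SD of the mean prediction."""
    m, s = mean(predicted), stdev(predicted)
    return [i for i, p in enumerate(predicted) if abs((p - m) / s) <= 1.0]

# Made-up toy data: two hypothetical predictors per child and a comprehension score
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [3.5, 6.5, 6.0, 11.5, 8.5]
beta = ols_fit(X, y)
pred = predict(beta, X)
residuals = [yi - pi for yi, pi in zip(y, pred)]   # observed minus predicted
kept = within_one_sd(pred)                         # cases retained for classification
```

Screening on the predicted values rather than on comprehension itself removes children whose word reading (and other predictors) is unusually weak or strong, so that the residual reflects comprehension relative to comparable word-level skill.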

To define the groups more precisely, and to avoid having students close to the boundaries between groups, confidence intervals around the regression line were used. Previous studies used different confidence intervals, e.g., 80% in Tong et al. (2011), 70% in Li and Kirby (2014), and 65% in Tong et al. (2014). Because of these inconsistent criteria, we conducted a sensitivity analysis using 75% and 70% confidence intervals to determine which intervals would yield robust results. We did not use 80% confidence intervals because they are conservative and would limit the sample size of the poor comprehenders. We also did not consider 65% confidence intervals because adopting these intervals would identify those who may not have severe difficulties as poor comprehenders.

The receiver operating characteristic (ROC) curve was used to test the sensitivity and specificity of the “predicted values” to correctly identify at-risk or no-risk status. Sensitivity refers to how well the tasks detect students who are at risk, while specificity refers to the tasks’ ability to avoid falsely identifying students who are actually not at risk. The area under the curve (AUC) indicates how accurately the set of predictors can classify students as at-risk or not at-risk: values of 0.90 or greater are considered excellent, 0.80-0.89 good, 0.70-0.79 fair, and below 0.70 poor (Compton, Fuchs, Fuchs, & Bryant, 2006). When 75% confidence intervals were used, the sensitivity and specificity were 82% and 77%, respectively, and the AUC was 0.83, p < .01. These intervals identified 13 ELL poor comprehenders and 10 EL1 poor comprehenders. When 70% confidence intervals were used, the sensitivity and specificity were 80% and 74%, respectively, and the AUC was 0.80, p < .01. This time 15 ELL poor comprehenders and 11 EL1 poor comprehenders were identified. Since the sensitivity and AUC values did not decrease much from 75 to 70% confidence intervals and the values at both cutoffs were robust, we decided to adopt the 70% confidence intervals, which generated more poor comprehenders to increase statistical power.
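To make the definitions above concrete, the following is a minimal sketch of how sensitivity, specificity, and AUC can be computed from a set of scores and known risk labels. It is an illustration only and not the procedure used in the study: the function names, the toy scores, the cutoff, and the convention that lower scores signal risk are all assumptions introduced here. The AUC is computed as the proportion of (at-risk, not-at-risk) pairs that the scores order correctly, with ties counted as one half.

```python
def sensitivity_specificity(scores, at_risk, cutoff):
    """Flag a student as at-risk when the score is at or below `cutoff`.

    Returns (sensitivity, specificity). Lower scores indicating risk is an
    assumption made for this illustration.
    """
    pairs = list(zip(scores, at_risk))
    tp = sum(1 for s, r in pairs if r and s <= cutoff)          # at-risk, flagged
    fn = sum(1 for s, r in pairs if r and s > cutoff)           # at-risk, missed
    tn = sum(1 for s, r in pairs if not r and s > cutoff)       # not at-risk, not flagged
    fp = sum(1 for s, r in pairs if not r and s <= cutoff)      # not at-risk, flagged
    return tp / (tp + fn), tn / (tn + fp)


def auc(scores, at_risk):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen at-risk student has a lower
    score than a randomly chosen not-at-risk student (ties count as 0.5).
    """
    pos = [s for s, r in zip(scores, at_risk) if r]
    neg = [s for s, r in zip(scores, at_risk) if not r]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study).
scores = [-1.2, -0.8, -0.5, 0.1, 0.3, 0.4, 0.9, 1.1]
at_risk = [True, True, False, True, False, False, False, False]

sens, spec = sensitivity_specificity(scores, at_risk, cutoff=-0.5)
area = auc(scores, at_risk)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, AUC={area:.2f}")
```

Moving the cutoff trades sensitivity against specificity, whereas the AUC summarizes classification accuracy across all possible cutoffs, which is why it is reported alongside the two rates.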

Students below the lower 70% confidence interval of the regression line were defined as poor comprehenders and those above the upper 70% confidence interval were defined as good comprehenders. Using this procedure, we identified 15 ELL and 11 EL1 poor comprehenders, and 13 ELL and 11 EL1 good comprehenders. Students who scored within the 20% confidence intervals were identified as average comprehenders. Since the majority of the students fell into the category of average comprehenders, in order to match the number in each comprehender group, we selected 14 ELL average comprehenders and 12 EL1 average comprehenders who were matched on age, nonverbal ability, word reading accuracy, and word reading fluency with the poor and good comprehender groups. Those who were not selected were excluded from the analyses. The unselected students also included those who had poor word reading skills and thus would not fall into the poor comprehender category. The proportions of the three comprehender groups were similar for the ELL (12% poor, 11% average, and 11% good) and EL1 (14% poor, 15% average, and 14% good) groups. The distributions of poor, average, and good comprehenders were also similar across the three grade levels in ELLs, ranging from 12.5 to 13.5% for poor comprehenders, 9.4 to 12.7% for average comprehenders, and 9 to 12.5% for good comprehenders.
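The classification steps described above can be sketched as follows. This is a simplified illustration under stated assumptions rather than the exact computation used in the study: the text does not specify how the confidence intervals around the regression line were calculated, so the sketch treats them as symmetric bands of residuals whose half-width is the normal quantile for the chosen level multiplied by the residual standard deviation. The function name, the "unclassified" label for students falling between the inner and outer bands, and the toy data are all hypothetical.

```python
from statistics import NormalDist, mean, stdev


def classify_comprehenders(actual, predicted, outer=0.70, inner=0.20):
    """Label each student from actual and regression-predicted comprehension scores.

    Assumption: a confidence band at level L is approximated as
    +/- z_L * SD(residuals) around the regression line.
    """
    z_outer = NormalDist().inv_cdf((1 + outer) / 2)  # about 1.04 for 70%
    z_inner = NormalDist().inv_cdf((1 + inner) / 2)  # about 0.25 for 20%

    pred_mean, pred_sd = mean(predicted), stdev(predicted)
    residuals = [a - p for a, p in zip(actual, predicted)]
    resid_sd = stdev(residuals)

    labels = []
    for p, r in zip(predicted, residuals):
        if abs(p - pred_mean) > pred_sd:
            # Predicted value more than 1 SD from the mean: very poor or
            # very good word reading, so the student is set aside.
            labels.append("excluded")
        elif r < -z_outer * resid_sd:
            labels.append("poor")          # below the lower outer band
        elif r > z_outer * resid_sd:
            labels.append("good")          # above the upper outer band
        elif abs(r) <= z_inner * resid_sd:
            labels.append("average")       # inside the inner band
        else:
            labels.append("unclassified")  # between the inner and outer bands
    return labels


# Hypothetical toy data (not from the study).
predicted = [10, 20, 20, 20, 20, 20, 30]
actual = [10, 14, 26, 20.5, 23, 19.5, 27]
labels = classify_comprehenders(actual, predicted)
print(labels)
```

Leaving a gap between the inner and outer bands is what keeps students near the group boundaries out of the analysis, in line with the stated goal of defining the groups more precisely.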

The descriptive statistics for age, nonverbal ability, word reading accuracy and fluency, and reading comprehension for the three comprehender groups are summarized in Table 1. The means and standard deviations were calculated separately for the ELL and EL1 groups. A series of ANOVAs was used to confirm that poor, average, and good comprehenders fulfilled the selection criteria and did not differ in terms of age, nonverbal ability, word reading accuracy, and word reading fluency in both the ELL and EL1 groups (see Table 1). However, as expected, there were significant differences in reading comprehension across the three comprehender groups in the EL1 and ELL samples. For the ELL sample, we also compared the initial age of consistent exposure to English to ensure that the three groups were comparable on this variable. Consistent exposure to English was measured with the item “At what age did your child start receiving consistent exposure to English” in a parent questionnaire. The ANOVA showed that the ELL poor, average, and good comprehenders were equivalent on the exposure to English.

Table 1 Means, standard deviations, and ANOVA results for ELL and EL1 poor, average, and good comprehenders on age, nonverbal ability, word reading accuracy, word reading fluency, and reading comprehension

Performance on oral language, metalinguistic skills, working memory, and higher-level skills in the three ELL comprehender groups

Because a key research question was to examine comprehender group differences within ELLs, we conducted MANOVA and ANOVA analyses for the ELLs. For this sample only, three MANOVAs were conducted on oral language, metalinguistic awareness, and higher-level skills, whereas one ANOVA was conducted on working memory. Comprehender group (poor, average, and good) was a between-subjects variable in these analyses (see Table 2).

Table 2 Performances of oral language, metalinguistic, cognitive, and higher-level skills in ELL comprehender groups—descriptive statistics and MANOVA summary table

The first MANOVA compared performance on oral language across the three groups. Vocabulary breadth, vocabulary depth, and listening comprehension were entered as dependent variables. There was an overall statistically significant group effect, Wilks’ λ = .59, F (6, 74) = 2.68, p < .05, ηp2 = .23. Follow-up univariate one-way ANOVAs indicated significant group differences in vocabulary breadth, F (2, 39) = 4.55, p < .05, ηp2 = .19, and listening comprehension, F (2, 39) = 6.48, p < .01, ηp2 = .25, but not in vocabulary depth. Post hoc tests with Benjamini-Hochberg correction confirmed that poor comprehenders had significantly lower scores than average and good comprehenders on vocabulary breadth and listening comprehension, but there were no significant differences between average and good comprehenders on these oral language skills.
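For readers unfamiliar with the Benjamini-Hochberg correction applied to the post hoc tests, the following is a minimal sketch of the standard step-up procedure. The p-values in the example are hypothetical and are not taken from the study, and the function name is introduced here for illustration. The procedure sorts the p-values, finds the largest rank k for which the k-th smallest p-value is at or below (k / m) × α, and declares the k smallest p-values significant.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which tests are significant
    under the Benjamini-Hochberg false discovery rate procedure."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Every test ranked at or below max_k is declared significant.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            significant[idx] = True
    return significant


# Hypothetical p-values for three pairwise comparisons
# (poor vs. average, poor vs. good, average vs. good).
pairwise = benjamini_hochberg([0.004, 0.030, 0.200])
print(pairwise)
```

Because the procedure is step-up, a p-value that misses its own threshold can still be declared significant when a larger p-value meets a later threshold; for example, the hypothetical set 0.04, 0.03, 0.02 is significant in full at α = .05.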

The second MANOVA compared performance on metalinguistic skills across the three groups. The dependent variables were morphological awareness and syntactic awareness. There was a statistically significant overall group effect, Wilks’ λ = .79, F (4, 76) = 2.31, p < .05, ηp2 = .11. Follow-up univariate one-way ANOVAs indicated group differences in morphological awareness, F (2, 39) = 4.78, p < .05, ηp2 = .20, but not in syntactic awareness. Post hoc tests with Benjamini-Hochberg correction showed that poor comprehenders had significantly lower scores than average and good comprehenders on morphological awareness. Furthermore, there were no significant differences between average and good comprehenders on the two metalinguistic skills.

The third MANOVA compared performance on higher-level skills across groups, and the dependent variables were inference, conjunction use, and comprehension monitoring. The effect of group was statistically significant, Wilks’ λ = .50, F (6, 74) = 5.17, p < .001, ηp2 = .30. Follow-up univariate one-way ANOVAs indicated that there was a significant effect on inference, F (2, 39) = 7.73, p < .01, ηp2 = .28, conjunction use, F (2, 39) = 11.42, p < .001, ηp2 = .37, and comprehension monitoring, F (2, 39) = 8.03, p < .01, ηp2 = .29. Post hoc tests with Benjamini-Hochberg correction showed that poor comprehenders had significantly lower scores than good comprehenders on all higher-level skills. There were also significant differences between average comprehenders and good comprehenders on all higher-level skills, favoring good comprehenders.
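The univariate effect sizes reported above can be recovered from the F statistics and their degrees of freedom. The sketch below uses the standard identity for a one-way design, ηp2 = (F × df_effect) / (F × df_effect + df_error), and checks it against the three higher-level skills. The function name is introduced here for illustration, and the sketch is a consistency check on the reported values rather than a description of the software the authors used.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom.

    Follows from eta_p^2 = SS_effect / (SS_effect + SS_error) and
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported for the ELL comprehender groups.
reported = {
    "inference": (7.73, 2, 39),
    "conjunction use": (11.42, 2, 39),
    "comprehension monitoring": (8.03, 2, 39),
}

effect_sizes = {
    skill: round(partial_eta_squared(f, df1, df2), 2)
    for skill, (f, df1, df2) in reported.items()
}
print(effect_sizes)
```

The computed values agree with the reported effect sizes of .28, .37, and .29 to two decimal places.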

An ANOVA was conducted to compare performance on working memory across the three groups. Between-group differences were statistically significant, F (2, 39) = 3.45, p < .05. Post hoc tests with Benjamini-Hochberg correction showed that poor comprehenders had significantly lower scores than good comprehenders on working memory but no significant difference was found between poor and average comprehenders on working memory.

Comparison of oral language, metalinguistic skills, working memory, and higher-level skills between ELLs and EL1s

To compare the performance between ELLs and EL1s on oral language, metalinguistic awareness, higher-level skills, and working memory, MANOVAs and ANOVAs were conducted with language status (ELL, EL1) and comprehender group (poor, average, and good) as between-subjects variables (see Table 3). For each analysis, follow-up post hoc tests were carried out to compare ELLs and EL1s within each of the three comprehender groups. For the first MANOVA, the dependent variables were the oral language skills. There was an overall statistically significant group effect, Wilks’ λ = .54, F (15, 200) = 3.16, p < .001, ηp2 = .19. Follow-up univariate one-way ANOVAs indicated significant group differences in vocabulary breadth, F (5, 70) = 8.76, p < .005, ηp2 = .39, and listening comprehension, F (5, 70) = 6.94, p < .001, ηp2 = .33, but not in vocabulary depth. Post hoc tests with Benjamini-Hochberg correction confirmed that ELL poor comprehenders had significantly lower scores than EL1 poor comprehenders on vocabulary breadth and listening comprehension. On the other hand, no significant differences were detected between ELL and EL1 average comprehenders or between ELL and EL1 good comprehenders on these oral language skills.

Table 3 Comparisons of oral language, cognitive-linguistic, and higher-level skills between ELL and EL1 groups—descriptive statistics of EL1 groups and MANOVA summary table

For the second MANOVA, the dependent variables were metalinguistic skills. Overall, there was a statistically significant group effect, Wilks’ λ = .68, F (10, 138) = 2.92, p < .01, ηp2 = .18. Follow-up univariate one-way ANOVAs indicated significant group differences on morphological awareness, F (5, 70) = 6.01, p < .001, ηp2 = .30, and syntactic awareness, F (5, 70) = 2.51, p < .05, ηp2 = .15. Post hoc tests with Benjamini-Hochberg correction showed that ELL poor comprehenders had significantly lower scores than EL1 poor comprehenders only on morphological awareness. There were no significant differences between ELL and EL1 average comprehenders or between ELL and EL1 good comprehenders on the two metalinguistic skills.

For the third MANOVA, the dependent variables were higher-level skills. There was a statistically significant group effect on all skills, Wilks’ λ = .54, F (15, 200) = 3.18, p < .001, ηp2 = .19. Follow-up univariate one-way ANOVAs indicated significant group differences in inference, F (5, 70) = 6.33, p < .001, ηp2 = .31, conjunction use, F (5, 70) = 3.73, p < .05, ηp2 = .21, and comprehension monitoring, F (5, 70) = 3.59, p < .05, ηp2 = .20. Post hoc tests with Benjamini-Hochberg correction did not show any significant differences between ELL and EL1 poor comprehenders, ELL and EL1 average comprehenders, and ELL and EL1 good comprehenders on these higher-level skills.
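The multivariate effect sizes reported for the three ELL versus EL1 MANOVAs are consistent with a commonly used conversion from Wilks’ λ, namely ηp2 = 1 − λ^(1/s), where s is the smaller of the number of dependent variables and the degrees of freedom of the effect. The sketch below applies this conversion to the reported λ values. That the authors' software used this exact formula is an assumption; the function name and the reading of the design as six groups (five effect degrees of freedom) are inferred from the reported F (5, 70) follow-up tests.

```python
def eta_squared_from_wilks(wilks_lambda, n_dependent, df_effect):
    """Multivariate partial eta squared from Wilks' lambda.

    Uses eta_p^2 = 1 - lambda ** (1 / s), with s = min(n_dependent, df_effect).
    """
    s = min(n_dependent, df_effect)
    return 1 - wilks_lambda ** (1 / s)


# (Wilks' lambda, number of dependent variables, effect df) as reported or
# inferred for the ELL vs. EL1 comparisons; six groups give df_effect = 5.
manovas = {
    "oral language": (0.54, 3, 5),
    "metalinguistic skills": (0.68, 2, 5),
    "higher-level skills": (0.54, 3, 5),
}

multivariate_effects = {
    name: round(eta_squared_from_wilks(lam, p, df), 2)
    for name, (lam, p, df) in manovas.items()
}
print(multivariate_effects)
```

The results match the reported values of .19, .18, and .19. Small discrepancies can arise elsewhere because λ is itself reported to only two decimal places.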

Finally, an ANOVA was calculated to analyze the group differences on working memory. There was no statistically significant difference between groups, F (5, 70) = 2.15, ns. Thus, post hoc tests were not conducted.

Discussion

This study sought to examine group differences in a large set of skills related to reading comprehension in Chinese-speaking ELLs in Grades 4, 5, and 6. EL1s in the same grades were included as a comparison group. Poor comprehenders were children who had average word reading skills but demonstrated reading comprehension difficulties. The results showed significant differences between ELL poor and average comprehenders, as well as between ELL and EL1 poor comprehenders on oral language and metalinguistic skills. Specifically, ELL poor comprehenders performed lower than their EL1 counterparts and ELL average comprehenders on vocabulary breadth, listening comprehension, and morphological awareness. Significant differences were also observed between ELL average and good comprehenders, with the latter group scoring higher on all three higher-level processing skills.

The reading profile of ELL poor comprehenders

When the three ELL comprehender groups were compared, poor comprehenders performed much more poorly than good comprehenders on all tasks except vocabulary depth and syntactic awareness. Importantly, poor comprehenders also performed worse than average comprehenders on two oral language skills, vocabulary breadth and listening comprehension, and on morphological awareness. The Simple View of Reading model states that reading comprehension is supported by both decoding and linguistic comprehension skills (Gough & Tunmer, 1986). Since poor comprehenders have lower than expected comprehension based on their decoding skills, it is reasonable to anticipate that sources of their reading comprehension difficulties come from linguistic comprehension. Our findings suggest that poor comprehenders experience difficulties in both oral language skills such as vocabulary breadth and listening comprehension, and metalinguistic skills such as morphological awareness. Previous research reported lower performance on listening comprehension, vocabulary and morphological awareness in ELL poor comprehenders when they were compared to ELL good comprehenders (Geva & Massey-Garrison, 2013; Lesaux & Kieffer, 2010; O’Connor et al., 2019; Zhang & Shulley, 2017). Our findings extend this body of research by showing that ELL poor comprehenders also perform worse on these skills than average comprehenders. Comparing poor comprehenders to both average and good comprehenders enables us to identify sources of reading comprehension difficulties more accurately.

With respect to vocabulary, the present study found that vocabulary breadth, but not depth, distinguished ELL poor comprehenders from average and good comprehenders. It is possible that ELLs need to acquire a large number of words before vocabulary depth, which involves a more nuanced understanding of vocabulary, starts to play a significant role in text comprehension. In a meta-analysis of comprehension problems for ELL poor comprehenders, Spencer and Wagner (2017) reported a substantial difference in oral language skills including vocabulary breadth between ELLs with and without reading comprehension deficits. Relatedly, vocabulary and listening comprehension have been shown to explain more variance in reading comprehension in ELLs with reading comprehension difficulties than in EL1s with reading comprehension difficulties (Cho et al., 2019). Taken together, the research evidence indicates that vocabulary, listening comprehension, and morphological awareness are potential sources of difficulties for ELL poor comprehenders in the upper elementary grades.

We did not find differences between the three groups of comprehenders on syntactic awareness. This may be attributed to the measure used in the present study as different measures of syntactic awareness may require different processing strategies (Cain, 2007). Syntactic awareness is typically assessed by either a grammatical correction task or a word-order correction task. A grammatical correction task requires children to detect and correct grammatical mistakes, e.g., She swims not (She does not swim). A word-order correction task, on the other hand, asks children to re-arrange sentences presented in a jumbled order, e.g., ran after the boy bus the (The boy ran after the bus). A grammatical correction task was used to assess syntactic awareness in the present study. Since no significant difference was found between the three groups on syntactic awareness measured by a grammatical correction task, future research should explore whether a word-order correction task differentiates the three groups of comprehenders or whether the deficits of poor comprehenders are better explained by factors other than syntactic awareness.

Our results also showed that ELL poor comprehenders performed more poorly than ELL good comprehenders but not ELL average comprehenders on working memory. This finding is consistent with previous research involving EL1 children (Cain, Oakhill, & Bryant, 2004a; Oakhill et al., 2005; Pimperton & Nation, 2014; Swanson et al., 2007). The reader must maintain and manipulate information in memory to comprehend texts. ELL poor comprehenders demonstrate difficulties in vocabulary and listening comprehension. This weakness may restrict their ability to access word meaning from memory and store verbal information in memory, which in turn impairs their reading comprehension (Swanson, Sáez, & Gerber, 2006). Future research should seek to replicate these findings on the role of working memory in ELL poor, average, and good comprehenders.

With respect to higher-level skills, our findings demonstrated that ELL poor comprehenders had lower performance than ELL good comprehenders on inference, conjunction use, and comprehension monitoring skills. Average comprehenders also performed lower than good comprehenders on these skills but no differences were found between average and poor comprehenders. In a study involving middle school ELL poor comprehenders in China, Li and Kirby (2014) reported that inference, reading strategies, and summary writing distinguished between average and good comprehenders, but not between poor and average comprehenders. While both studies focused on Chinese-speaking ELLs, the current study extends Li and Kirby’s (2014) findings to additional higher-level skills such as conjunction use and comprehension monitoring and to a younger population in elementary school. The findings of the two studies are consistent despite differences in educational context (Canada vs. China) and language learning environment (English as an L2 vs. a foreign language), pointing to the potential generalizability of the findings.

To our knowledge, the present study is the first to demonstrate comprehension monitoring deficits in ELL children with poor reading comprehension. Comprehension monitoring requires readers to evaluate their understanding of the text and regulate their reading process (Cain & Oakhill, 2003; Cain & Oakhill, 2006). Our study suggests that comprehension monitoring is a key component of successful reading comprehension, and that deficits in monitoring may contribute to poor comprehension for ELLs. Furthermore, our results regarding the role of conjunction use in reading comprehension corroborate previous work (Crosson et al., 2008; Crosson & Lesaux, 2013; Droop & Verhoeven, 2003; Geva, 2007; Geva & Ryan, 1993; Welie et al., 2017). Since conjunctions connect idea units in discourse and improve the ability to make accurate inferences, knowledge of conjunctions is essential for reading comprehension among L2 learners. Therefore, it appears that ELL poor comprehenders have weaknesses in higher-level comprehension skills and working memory as well as oral language and metalinguistic skills compared to other ELLs. This is consistent with Spencer and Wagner’s (2017) meta-analysis, which showed that ELL poor comprehenders had greater difficulties in reading comprehension than oral language skills. It is perhaps the combination of these deficits that makes reading comprehension a particularly difficult task.

The comparison of ELL and EL1 poor comprehenders

While no differences were found between ELLs and EL1s in the reading profiles of average and good comprehenders, ELL poor comprehenders performed lower than their EL1 counterparts on vocabulary breadth, listening comprehension, and morphological awareness. Research has shown that ELLs take longer to acquire vocabulary and reading comprehension skills than their EL1 counterparts (Farnia & Geva, 2011). Our findings depict a more nuanced picture regarding this delay. We found that ELL average and good comprehenders developed language and literacy skills that approximated those of their EL1 peers, but ELL poor comprehenders performed worse on oral language and metalinguistic skills as compared to both typically developing ELLs and EL1 poor comprehenders. Although previous studies demonstrated that ELLs lagged behind EL1 peers on vocabulary and reading comprehension (Geva & Farnia, 2012; Lesaux & Kieffer, 2010), each language group was treated as a whole in the analyses. By breaking ELLs down into different comprehender subgroups, we found that the general low performance of ELLs as compared to EL1s may be largely attributed to the difficulties experienced by the ELL poor comprehender group. The problems faced by ELL poor comprehenders cannot be simply explained by their reduced English exposure, as ELL average and good comprehenders face similar obstacles in L2 learning. This can be seen from the equivalent levels of consistent exposure to English reported by the three groups. Rather, ELL poor comprehenders experience persistent deficits in developing language and comprehension skills.

No significant difference was found between ELL and EL1 poor comprehenders on working memory in our study. This is not surprising given the mixed findings reported in the literature. Studies in which ELL poor comprehenders performed worse than EL1 poor comprehenders typically adopted working memory tasks that relied heavily on linguistic knowledge, such as a reading span task. A reading span task has different versions. It can require individuals to indicate whether a sentence was “true or false” or ask individuals to judge whether the sentence is semantically and syntactically correct while recalling the final word of the sentence (Conway, Kane, Bunting, Hambrick, Wilhelm, & Engle, 2015). In contrast, no differences were observed between ELL and EL1 poor comprehenders on working memory tasks that required minimal linguistic knowledge, e.g., a backward digit span task (Geva & Massey-Garrison, 2013). An auditory working memory task was adopted in the present study. Because this task required children to repeat a combination of words and digits in the same order presented to them, it tapped limited linguistic knowledge. This may explain why the auditory task did not distinguish the two groups on working memory.

On the other hand, ELL and EL1 poor comprehenders did not differ on higher-level skills such as inference, conjunction use, and comprehension monitoring. As higher-level skills capture abilities beyond basic language skills, our findings indicate that these skills may exert an effect on reading comprehension only after a certain threshold level of language proficiency has been reached. Higher-level skills might also be influenced by experience with text, resulting in a reciprocal relationship between higher-level skills and reading comprehension. Thus, vocabulary and metalinguistic skills, rather than cognitive and higher-level processing skills, are likely to be key sources of weakness in ELL poor comprehenders. The current study suggests that ELL poor comprehenders experience more severe reading comprehension difficulties than EL1 poor comprehenders because they face dual challenges related to their ELL status and their poor comprehender status.

Limitations and future directions

The results of the current study should be interpreted with caution due to the small sample size. We only identified a small number of ELL poor comprehenders and EL1 poor comprehenders. However, our poor comprehenders represented 10–15% of the total sample, which is consistent with the percentages of poor comprehenders reported in previous studies (e.g., Catts et al., 2006). For example, Tong et al. (2014) identified 15 poor comprehenders in their Grade 4 sample and McBride-Chang, Liu, Wong, Wong, and Shu (2012) identified 16 poor readers in Chinese and 16 poor readers in English. That said, future studies with larger sample sizes are necessary to increase the power of the analyses. Additionally, longitudinal studies should be conducted to understand factors that contribute to reading comprehension difficulties over time. As few studies have examined higher-level skills in ELL poor comprehenders, it is particularly important to explore the role of higher-level processing skills in the later grades. Finally, this study focused exclusively on ELLs from Chinese-speaking families. Future research should investigate the generalizability of the findings by targeting ELLs who speak other L1s.

Conclusion

The findings of this study suggest that ELL poor comprehenders have considerable weaknesses in oral language and metalinguistic skills. With respect to identification, a comprehensive battery that includes oral language and metalinguistic skills should be implemented to assess ELLs who experience reading comprehension difficulties. Enhancing higher-level skills is also important for all average comprehenders and poor comprehenders regardless of language status. Because ELL poor comprehenders face dual challenges, it is important to improve the accuracy of identification procedures to separate true reading comprehension difficulties from limited language learning experience among ELLs. In addition to identifying risk status, our findings can also inform instruction and intervention. That is, effective instructional and intervention programs need to be created to enhance oral language and metalinguistic skills to facilitate reading comprehension and reduce reading failure among ELLs.