Introduction

There are several reasons for studying poor reading comprehension in Chinese children learning to read Chinese. Most studies of Chinese children’s reading acquisition and development have focused on phonological sensitivity, with some data on orthographic and morphological processing in reading characters or two-character words (e.g., Ho & Bryant, 1997; McBride-Chang & Ho, 2005; McBride-Chang et al., 2005; McBride-Chang & Kail, 2002; Shu, Chen, Anderson, Wu, & Xuan, 2003; Siok & Fletcher, 2001). There are few studies on reading comprehension in Chinese elementary school students. The recent Progress in International Reading Literacy Study (PIRLS), which included nearly 5,000 representative Hong Kong Grade 4 Chinese students (Tse, W. Y. Lam, Y. H. Lam, & Loh, 2005) among some 150,000 Grade 4 students in 35 countries, was an attempt to assess the level of reading comprehension in Chinese within the constraints of a cross-language and cross-national survey. Although otherwise informative, this study was not specifically designed to examine the cognitive and linguistic factors influencing Chinese text comprehension.

The present study examined how some “lower level” cognitive and linguistic skills might influence “higher level” text comprehension. Text comprehension was defined operationally as encoding and activating relevant information during reading, including information that may not be stated explicitly in the text (Kintsch, 1994; Kintsch & Kintsch, 2005; McKoon & Ratcliff, 1989; Perfetti, Landi, & Oakhill, 2005). The lower level skills examined were verbal working memory, word reading, rapid naming, and segmentation at the onset-rime level. Of these variables, verbal working memory in particular has been shown to influence children’s reading comprehension (Cain, Oakhill, & Bryant, 2000, 2004a; Cain, Oakhill, & Lemmon, 2004b; Oakhill, Cain, & Bryant, 2003). Verbal working memory has also been found to explain the poor reading performance of children with reading disabilities (Bayliss, Jarrold, Baddeley, & Leigh, 2005; de Jong, 1998; Gathercole, Alloway, Willis, & Adams, 2006; Nation, Adams, Bowyer-Crane, & Snowling, 1999; Swanson, 2003; Swanson & Alexander, 1997) and to differentiate subgroups of children with word recognition and reading comprehension deficits (Swanson, Howard, & Sáez, 2006).

Specifically, the present study compared 31 less skilled reading comprehenders (PC) in Chinese with 37 reading comprehension (RC) control children and 23 chronological age (CA) control children, all recruited from two schools in Hong Kong. Less competent or less skilled comprehenders were operationally defined as Grade 5 students scoring at or below approximately the 16th percentile on a fairly comprehensive set of written Chinese language comprehension tasks designed by the individual schools (details are provided in the Method section).

Research framework for components of Chinese text comprehension

Verbal working memory and reading impairment

Verbal working memory tasks for children generally require them to hold increasingly complex verbal information in memory while responding to questions about it (Baddeley, 1986; Daneman & Carpenter, 1980, 1983). The underlying idea is that participants must understand the meaning of each of a group of unrelated sentences so as to answer a comprehension question (the processing aspect) while also recalling the last word of each sentence (the storage aspect). Compared with less skilled comprehenders, skilled comprehenders may allocate more working memory resources to text comprehension than to word recognition (Swanson & Berninger, 1996). Given similar levels of background knowledge, good comprehenders tend to make more integrative inferences than poor comprehenders, whose ability to build mental models of text is constrained by working memory processing capacity (Oakhill, Cain, & Yuill, 1998).

In two studies involving 41 participants, Daneman and Carpenter (1980) reported a correlation of .72 between their reading and listening span task and reading comprehension. The impact of this early investigation has been wide-ranging. A number of studies have shown that measures of reading span make a unique contribution to measures of reading and language comprehension (e.g., Bayliss et al., 2005; Cain et al., 2000, 2004a, b; Daneman & Carpenter, 1983; Seigneuric & Ehrlich, 2005; Seigneuric, Ehrlich, Oakhill, & Yuill, 2000). On the more specific question of working memory deficits in children with reading disabilities, views and findings vary (de Jong, 1998; Gathercole et al., 2006; Nation et al., 1999; Swanson, 2003; Swanson & Alexander, 1997). In a meta-analysis of 77 studies with over 6,000 participants, Daneman and Merikle (1996) confirmed the Daneman and Carpenter (1980) finding that measures of verbal storage and processing are good predictors of the processes that integrate successively encountered propositions in reading or language comprehension.

Poor text comprehenders showed lower memory spans than normally developing readers on tasks that placed heavy demands on the semantic and syntactic systems supported by speech perception, production, and language comprehension (Nation et al., 1999). Similarly, poor comprehenders have been found to show deficits in more general language comprehension, which may be present from the early school grades (Catts, Adlof, & Weismer, 2006). Catts et al. suggested that poor comprehenders’ difficulties in text-level comprehension could be due to working memory difficulties over and above problems in grammar and vocabulary.

Results similar to those reviewed have been obtained from children with dyslexia. Berninger et al. (2006) used multiple developmentally stable markers of dyslexia to examine competing theories of a phonological core deficit versus a working memory deficit in 122 children with dyslexia and their 200 affected biological parents. Structural equation modeling showed that, for the children, first-order factors from phonological, orthographic, and/or morphological word-forms uniquely predicted a large number of reading and spelling outcomes. Furthermore, structural equation modeling of three working memory component factors (a temporal storage unit for phonological word forms, a time-sensitive phonological loop for learning new words, and executive function) found that the most consistent predictor of text reading and writing, for both the children and the adults, was the second-order word form factor. These results led Berninger et al. (2006) to conclude that dyslexia as a developmental disorder is characterized by both a working memory deficit and a phonological deficit, and that these deficits may not be mutually exclusive. Individual analyses further revealed considerable interindividual and intraindividual variation in the temporal orchestration of the working memory components. Swanson and Siegel (2001) expressed a similar view in a focus article, arguing that some individuals with reading disability perform poorly on working memory tasks that place heavy demands on both storage and processing.

The recent studies reviewed above indicate that working memory plays an important role in children’s reading comprehension, and that impairment in working memory can create a bottleneck that contributes to learning difficulties in written and oral discourse.

Chinese pseudowords

The notion and characteristics of Chinese pseudowords are quite different from those in English. Pseudowords in English are pronounceable nonwords (e.g., bave), and pseudoword reading has been shown to correlate highly with real word recognition and reading comprehension (e.g., Rack, Snowling, & Olson, 1992). Children’s skill in reading pseudowords is an indication of their phonological processing ability, which is critical in reading alphabetic orthographies. Would similar logic apply to reading pseudowords in Chinese?

In the first place, a distinction must be made between a Chinese character (zi), a basic orthographic unit or graphic symbol, and a word (ci), which usually consists of two or more characters. Character composition rests on a corpus of about 560 foundational bujians (character components), which subsume about 212 radicals whose constituent phonetic and semantic cues aid reading, spelling, and dictionary look-up (Zhong guo guo jia yu wei [Chinese National Language Committee], 1998). A character almost always corresponds to a morpheme (not a phoneme) in the spoken language, whereas a word is the smallest independent unit of meaning and is often polymorphemic (Leong, 1997). For example, the noun phrase “The People’s Republic of China” consists of seven graphic units or characters (zi) but only three words (ci) in Chinese, denoting “China or Chinese,” “people,” and “republic.”
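The zi/ci distinction can be made concrete with Python strings, in which each Han character is a single Unicode code point. In the sketch below the phrase is written in traditional characters to match Hong Kong usage, and the word segmentation is supplied by hand for illustration; it is not the output of a word segmenter.

```python
# "The People's Republic of China" in traditional characters.
phrase = "中華人民共和國"

# Each Han character (zi) is one code point here, so len() counts characters.
num_zi = len(phrase)

# Hand-supplied segmentation into words (ci): China/Chinese, people, republic.
ci = ["中華", "人民", "共和國"]

# Sanity check: the three words tile the phrase exactly.
assert "".join(ci) == phrase

print(f"{num_zi} characters (zi), {len(ci)} words (ci)")
for word in ci:
    print(word, len(word), "characters")
```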

In the present study, as in a study by Leong, Cheng, and Tan (2005), two-character Chinese pseudowords (strictly, pseudo-cis) were used. In each of these two-character pseudowords, both constituent characters were real, pronounceable characters, but their combination yielded a pronounceable, meaningless Chinese “word” with no semantic link between the two characters. This method of constructing Chinese pseudowords sidesteps the issue of pronounceability, since each constituent character is pronounceable. More importantly, unlike reading pseudowords in English, the focus is on children’s correct oral reading of each of the two characters, singly and in combination, using phonetic, orthographic, morphological, and meaning cues. The mechanism of retrieving all these linguistic cues from long-term memory to address the phonology of the two-character pseudowords may approximate the mechanism of retrieving, integrating, and interpreting information in short Chinese texts.
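The first step of this construction can be sketched as a filter: pair real characters in order and discard any pair listed as a word. The character list and the lexicon below are toy, hypothetical inputs, and this is not the procedure documented by Leong et al. (2005). Because no word list is complete, the filter can only narrow the candidates: in the toy run, 山雨 (“mountain rain”) survives even though it is meaningful, which is why each candidate would still need expert screening for the absence of a semantic link.

```python
from itertools import permutations

def candidate_pseudo_ci(characters, lexicon):
    """Return ordered two-character combinations not found in `lexicon`.

    `characters` should be real, pronounceable characters; `lexicon` is a set
    of known two-character words to exclude. Whether a surviving pair has no
    semantic link between its characters must still be judged by a rater.
    """
    candidates = []
    for first, second in permutations(characters, 2):  # ordered pairs, no repeats
        combo = first + second
        if combo not in lexicon:
            candidates.append(combo)
    return candidates

# Hypothetical toy inputs; not the study's item pool or word list.
chars = ["山", "水", "門", "雨"]
toy_lexicon = {"山水", "雨水", "山門", "水門"}

result = candidate_pseudo_ci(chars, toy_lexicon)
print(result)
```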

Rapid automatized naming

The early notion proposed by Denckla and Rudel (1974) of a “lack of automaticity” as a correlate of reading and its difficulties, and the resulting rapid automatized naming (RAN) test with its practical applications, have been substantiated and refined in a number of recent studies (Compton, Olson, & De Fries, 2002; Denckla & Cutting, 1999; van den Bos, Zijlstra, & Spelberg, 2002). RAN, with its pre-symbolic component (colors and objects) and symbolic component (numbers and letters), taps an efficient and rapid phonological retrieval process, and the underlying process relating RAN performance to word reading is a complex one (Compton et al., 2002; Denckla & Cutting, 1999; van den Bos et al., 2002). These behavioral studies support the view that the “visual–verbal” responses elicited by RAN may be a “biomarker” of what the brain does and needs to do in fluent reading (Denckla, 2005, p. 150).

Onset-rime phonological segmentation

On the question of the involvement of phonology in reading Chinese, it is often erroneously assumed that reading Chinese characters relies primarily on “visual” skills and orthographic analysis, with processing proceeding mainly from graphic symbols to meaning. Although square-shaped Chinese characters, each occupying the same geometric space, are visually more complex than English words, there is little support for the assertion that characters are identified by a direct route from graphic symbols to meaning. Research using different experimental paradigms, such as priming and backward and forward masking, has shown that phonology is an interactive constituent in identifying Chinese characters and is activated early, rapidly, and at the moment orthographic shapes are recognized (Spinks, Liu, Perfetti, & Tan, 2000; Tan & Perfetti, 1998, 1999). Furthermore, according to the Universal Writing System Constraint, all writing systems encode language, and according to the Universal Phonological Principle, the activation of word pronunciation occurs across all writing systems; the effect of phonology is robust when tested with event-related potentials (ERP) and functional magnetic resonance imaging (fMRI) (Perfetti, 2003; Perfetti & Liu, 2005; Perfetti, Liu, & Tan, 2002).

In Chinese, there is evidence that onset-rime deletion of Chinese characters, rather than segmental phonemic awareness, predicts Chinese character and word reading (Siok & Fletcher, 2001). This emphasis on onset-rime deletion, rather than segmental phoneme deletion, is in keeping with the psycholinguistic analysis that Chinese is basically paradigmatic rather than segmental and that its basic unit, the character, is coterminous with the syllable and its onset and rime (Leong, 1997; Wang, 1985). Would these results on the role of onset-rime deletion extend beyond character and word reading to reading comprehension in Chinese? Quite possibly, measures of phonological segmentation at the beginning phase of learning to read Chinese might explain some additional individual variation in reading comprehension in Chinese (McBride-Chang, Bialystok, Chong, & Li, 2004).

In this study, we hypothesized that less skilled comprehenders would perform more poorly than their controls in reading and reading-related tasks and would also show larger individual differences. We further posited that text comprehension in Chinese, as assessed by short written answers to open-ended inferential questions, would be strongly influenced by verbal working memory and, to a lesser extent, by Chinese pseudoword reading, with only a very small contribution from rapid naming and perhaps from onset-rime phonological segmentation (see Fig. 1). As a corollary, we also examined the mediating roles of verbal working memory and pseudoword reading in relation to text comprehension (Baron & Kenny, 1986). The emphasis was on drawing inferences that go beyond explicit statements in the text, and hence on a deeper understanding of text materials (Kintsch, 1994; Kintsch & Kintsch, 2005; Perfetti et al., 2005).
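The Baron and Kenny (1986) logic can be illustrated with standardized regression paths computed from correlations. The sketch below is a generic single-mediator illustration on made-up numbers, not the structural equation model in Fig. 1, and it omits the significance tests (for example, a Sobel test) that the Baron and Kenny steps require. It relies on the identity that, for one mediator and ordinary least squares, the total effect c equals the direct effect c' plus the indirect effect a × b.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mediation_paths(x, m, y):
    """Standardized single-mediator paths for predictor x, mediator m, outcome y.

    c        total effect of x on y (simple regression, equals r_xy)
    a        effect of x on m (equals r_xm)
    b        effect of m on y, controlling for x
    c_prime  direct effect of x on y, controlling for m
    No significance tests are computed here.
    """
    r_xy, r_xm, r_my = pearson_r(x, y), pearson_r(x, m), pearson_r(m, y)
    denom = 1 - r_xm ** 2  # x and m must not be perfectly correlated
    b = (r_my - r_xm * r_xy) / denom
    c_prime = (r_xy - r_xm * r_my) / denom
    return {"c": r_xy, "a": r_xm, "b": b, "c_prime": c_prime, "indirect": r_xm * b}

# Made-up scores for six children (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5, 6]
m = [2, 1, 4, 3, 6, 5]
y = [1, 3, 2, 5, 4, 6]

paths = mediation_paths(x, m, y)
for name, value in paths.items():
    print(f"{name}: {value:.3f}")
```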

Fig. 1 Structural equation model showing standardized effects of verbal working memory, rapid automatized naming (RAN), and onset-rime segmentation on text comprehension and pseudoword reading

Method

Participants

The total sample of 91 students was selected from over 500 Grade 3, 4, and 5 Chinese students taking part in a larger study in two schools in Hong Kong. The target group of 31 less competent or less skilled comprehenders (21 boys and 10 girls) was selected from 191 Grade 5 students in these schools on the basis of low performance on an omnibus battery of internal Chinese language tests, covering mainly reading comprehension, with scores first standardized within each school. The bottom 31 of these Grade 5 students (16%) formed the target or poor comprehension (PC) group (M age = 11.6 years, SD age = 1.2 years). From the 140 Grade 3 students in the same two schools, 37 students (20 boys and 17 girls) were selected at random to form a reading comprehension (RC) control group; these students were judged, on the basis of the internal tests and by their class teachers, to be equivalent in Chinese reading performance to the PC students (M age = 9.0 years, SD age = .4 years). From the same Grade 5 classes, 23 students (12 boys and 11 girls) were randomly selected to form a chronological age (CA) control group (M age = 10.3 years, SD age = .5 years). Because of the method of selecting the students and to accommodate the requests of the schools, there was an overall difference in age among the three groups, F(2,90) = 96.72, p < .001, including a difference between the PC and CA groups. However, on the specially devised text comprehension task (see Tasks and procedure), with 8 passages and 24 open-ended questions, there was no difference in the performance of the PC and RC groups, whereas there was an overall difference among the PC, RC, and CA groups, F(2,90) = 26.76, p < .001.
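The selection rule (standardize within each school, pool the z-scores, take the bottom 16%) can be sketched as follows. The use of the sample standard deviation, Python’s round() as the rounding rule, and the toy scores are all assumptions; the text does not say how ties at the cut-off were handled. With 191 students, round(0.16 × 191) gives the 31 students reported.

```python
import statistics

def within_school_z(scores_by_school):
    """Standardize raw scores separately within each school.

    scores_by_school maps a school label to {student_id: raw_score}.
    Returns a pooled {student_id: z_score}. Uses the sample SD (an assumption).
    """
    z = {}
    for scores in scores_by_school.values():
        values = list(scores.values())
        mean, sd = statistics.mean(values), statistics.stdev(values)
        for student, raw in scores.items():
            z[student] = (raw - mean) / sd
    return z

def select_bottom(z_scores, proportion=0.16):
    """Return the IDs of the lowest-scoring `proportion` of the pooled sample."""
    n_select = round(proportion * len(z_scores))
    ranked = sorted(z_scores, key=z_scores.get)  # lowest z first
    return ranked[:n_select]

# Hypothetical toy data: school B scores higher overall than school A.
toy = {
    "A": {"a1": 10, "a2": 20, "a3": 30, "a4": 40, "a5": 50},
    "B": {"b1": 55, "b2": 60, "b3": 65, "b4": 70, "b5": 75},
}
z = within_school_z(toy)
print(select_bottom(z, proportion=0.2))  # the lowest scorer in each school
print(round(0.16 * 191))  # 31
```

Note that b1 is selected even though its raw score exceeds every score in school A except none below it; standardizing within school is what makes the lowest scorer in each school comparable.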

Tasks and procedure

Text comprehension

Eight short text passages were adapted, modified, and rewritten in traditional (complex) Chinese characters from the most recent (2004) series of Chinese textbooks for Grades 3 to 6 published by the People’s Education Publishing in Beijing. These books were approved by the national textbook committee of China, and the characters and words used were based on statistical analyses of a corpus (Beijing Language Institute, 1984). Using refereed materials from mainland China as the basis for the rewritten passages, rather than commercial textbooks published in Hong Kong, was intended to minimize the effect of prior learning.

Of the eight rewritten text passages, four were narrative, three were expository, and one was a poem by the well-known poet Li Po. These eight short pieces, carefully balanced in syntactic complexity, ranged in length from 6 to 12 or 13 sentences. The contents were familiar to 9- to 11-year-old Chinese children in order to minimize the impact of background knowledge (e.g., passages on Hong Kong, the Great Wall of China, and unattractive but practical peanuts). Based on the genre and structure of the passages, the mainly narrative pieces (passages 1, 2, 3, and 8) constituted the first indicator, text comprehension 1 (TC1), and the mainly expository essays (passages 4, 5, 6, and 7) formed the second indicator, text comprehension 2 (TC2).

The text comprehension task, consisting of the eight passages each followed by three open-ended inferential questions, was administered as a written task to whole classes in 40 min, plus about 8 to 10 min for two short practice examples that explained the aim of drawing inferences from text. The children were told to read each printed passage silently, to concentrate on making inferences in their written answers, and not to worry unduly about their writing and spelling. Credits of 0, 1, 2, or 3 were awarded to each short written answer according to its implausibility, shallowness, or depth. The maximum score for the whole task was 72 (8 passages × 3 questions × a maximum of 3 marks per question).
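The aggregation from question credits to the two indicators and the total can be sketched as below. The data layout (a mapping from passage number to three credits) is an assumption made for illustration; the passage groupings and the 0 to 3 credit range follow the description above.

```python
NARRATIVE = (1, 2, 3, 8)   # passages forming TC1
EXPOSITORY = (4, 5, 6, 7)  # passages forming TC2

def score_text_comprehension(credits):
    """Aggregate question credits into TC1, TC2, and the total.

    credits maps passage number (1-8) to a tuple of three credits, each 0-3.
    """
    for passage in NARRATIVE + EXPOSITORY:
        qs = credits[passage]
        if len(qs) != 3 or any(c not in (0, 1, 2, 3) for c in qs):
            raise ValueError(f"passage {passage}: expected three credits in 0..3")
    tc1 = sum(sum(credits[p]) for p in NARRATIVE)
    tc2 = sum(sum(credits[p]) for p in EXPOSITORY)
    return {"TC1": tc1, "TC2": tc2, "total": tc1 + tc2}

# A hypothetical child who earns full credit on every question.
perfect = {p: (3, 3, 3) for p in range(1, 9)}
print(score_text_comprehension(perfect))  # {'TC1': 36, 'TC2': 36, 'total': 72}
```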

The approach to assessing text comprehension in the present study was characterized by three scoring principles: crediting answers that transformed knowledge rather than merely retelling it verbatim (Bereiter & Scardamalia, 1987), answers that were explanatory rather than merely descriptive, and answers that showed “envisionment” of text-worlds (Langer, 1986). Passage no. 1, “The Pearl of the Orient [Hong Kong],” is shown in the Appendix. The first question, which asked students to select three words or phrases from the text to explain why Hong Kong was called the Pearl of the Orient, was an example of literal inferencing utilizing linguistic features of the text. To secure the maximum score of three, students had to search the text and make the best choices of relevant words or phrases. The second question, which asked students to explain why the lights in Hong Kong were like burning fire and rivers, was an example of coherence inferencing requiring the integration of propositions. The third question, which required students to select the most appealing characteristics of the city and to justify their choices, was an example of elaborative inferencing, in which students had to integrate different concepts and go beyond the information given to attain the top score of three.

To ensure consistency of grading, each set of written protocols was scored independently by two research assistants according to the marking principles explained above. Cronbach’s alpha coefficient for the protocols for the eight passages as a whole was .908, indicating high internal consistency across the passages and the answers to the comprehension questions. The inter-rater reliabilities for the answers to the three questions, considered together, for passages 1 to 8 were .854, .876, .850, .702, .792, .716, .657, and .709, respectively.
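Cronbach’s alpha, reported here and for several of the tasks below, compares the sum of the item variances with the variance of the participants’ total scores. A minimal sketch using the standard formula follows; for the text comprehension task, each “item” would be one passage score (so k = 8). Because the same variance estimator is used in the numerator and denominator, the choice between n and n − 1 does not affect the result. The toy scores are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha.

    items: one list per item (e.g., per passage), each holding one score per
    participant. alpha = k/(k-1) * (1 - sum of item variances / variance of totals).
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(person) for person in zip(*items)]  # each participant's total
    total_var = statistics.variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var = sum(statistics.variance(item) for item in items)
    return k / (k - 1) * (1 - item_var / total_var)

# Two hypothetical items scored for three participants.
print(cronbach_alpha([[1, 2, 3], [1, 3, 2]]))
```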

To supplement the quantitative scores, the written answers provided by all the students were scrutinized for patterns of accuracy, clarity, and completeness that might distinguish the less competent comprehenders from their controls.

Memory span task (MEMSP)

The memory span task (MEMSP), one indicator of verbal working memory for children, was modeled after the sentence span task of Swanson (1992), which follows the principle and format of Daneman and Carpenter (1980). A total of 13 sets of 2, 3, 4, or 5 sentences, all unrelated in meaning, were read aloud by the experimenter to the participants as a group task. Participants were asked to listen to each set of sentences and then to write down a short answer to a comprehension question as well as the last word of each sentence in the set. A translated example of a two-sentence set is: “The sun gives out bright light. I helped mom do a hard job.” The expected answer to the comprehension question “What kind of light does the sun give out?” is “bright,” and the last words are “light, job.” The total testing time for this task was about 25 min, and all answers were scored independently by two assistants. One mark was awarded for each correct answer to a comprehension question and for each last word correctly recalled, giving a maximum score of 60 (13 answers and 47 last words). The inter-rater reliability was .904. A further example of the task is provided in the Appendix.
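The scoring rule can be sketched for a single sentence set as below. Whether recall order mattered is not stated above, so this sketch assumes it did not and credits each target word at most once; the function name and argument layout are hypothetical.

```python
SETS = 13
LAST_WORDS = 47
MAX_SCORE = SETS + LAST_WORDS  # one mark per set answer plus one per last word

def score_span_set(answer_correct, recalled, targets):
    """Score one sentence set of the memory span task.

    answer_correct: whether the comprehension answer was right (1 mark).
    recalled: the last words the child wrote down.
    targets: the actual last word of each sentence in the set.
    Recall order is not checked; this is an assumption, as the scoring
    rule described above does not say whether order mattered.
    """
    remaining = list(targets)
    word_marks = 0
    for word in recalled:
        if word in remaining:
            remaining.remove(word)  # each target word earns credit at most once
            word_marks += 1
    return int(answer_correct) + word_marks

# The translated two-sentence example: answer "bright", last words "light" and "job".
print(score_span_set(True, ["light", "job"], ["light", "job"]))  # 3
print(MAX_SCORE)  # 60
```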

Tongue twister task (TT)

The Chinese tongue twister task was based on the logic and finding that phonological similarity among shadowing items presented auditorily for recall interfered with memory for Chinese characters (Tzeng, Huang, & Wang, 1977). A more recent finding is that automatic phonemic interference arises more from working memory processes and much less from articulatory processes, and that it applies to both Chinese and English word reading (Zhang & Perfetti, 1993). Chinese tongue twisters have the added advantage that phonological similarity is not confounded with visual-orthographic similarity, and this unconfounding should produce a stronger tongue twister effect than in English. Zhang and Perfetti (1993) reported a robust tongue twister effect for visually presented short stories in Chinese: their participants took longer to read tongue twister stories than control stories. The results supported the phonemic nature of the tongue twister effect and the role of phonological code activation in supporting reading comprehension in Chinese.

The tongue twisters were designed with sets of nonsegmental phonemes such as alveolar fricatives (/s/ and /z/), alveolar stops (/t/ and /d/), and bilabial and velar stops (/b/ and /p/, /g/ and /k/). The spoken sentences were modifications of those used by Leong and Tan (2002) with Putonghua-speaking children in Beijing; they were later modified to accommodate the Cantonese speech sounds used by children in Hong Kong (see Leong et al., 2005) and were further refined for the present study. There were eight sets of sentences drawing on fine discriminations among Cantonese speech sounds and their lexical tones. An example of a two-sentence set of tongue twisters was: “Silzi[2]saanlsoeng[5]silsasnlzi[2],saanlzi[2]mun[4]cin[4]sei[3]silzi[2]saanlzi[2]. Sannlzi[2]si[6]sim[4]zi[2], silzi[2]si[6]sek[6]silzi[2].” The translation is: “Lion temple is on top of lion hill; there are four lion [statues] in front of the temple. The temple is a Buddhist temple; lion statues are stone lions.” The general idea of this item was to play on the Cantonese onsets /s/ and /ts/. A further example is shown in the Appendix. Each child was asked to listen to each spoken sentence and to repeat it with the same character or word order and the same lexical tones as spoken. The tongue twisters were scored by the number of characters repeated in the correct order and with the correct tone; the maximum score was 170. Testing time was about 15 min. Cronbach’s alpha coefficient was .808.
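One way to operationalize “characters repeated in the correct order and with the correct tone” is the length of the longest common subsequence (LCS) between the target and the child’s repetition, with each syllable carrying its tone digit so that a tone error counts as a mismatch. This LCS rule is our assumption; the exact rule used in the study is not stated. The target tokens transcribe the final clause of the example item, reading each “l” in the printed romanization as tone 1, which is also an assumption.

```python
def tt_score(target, response):
    """Syllables repeated in the correct order and tone, as an LCS length.

    target, response: lists of Jyutping-style syllables with tone digits
    (e.g., "si1", "zi2"), so a tone error makes two tokens unequal.
    """
    n, m = len(target), len(response)
    # dp[i][j] = LCS length of target[:i] and response[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if target[i] == response[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[n][m]

# Final clause of the example item ("lion statues are stone lions").
target = "si1 zi2 si6 sek6 si1 zi2".split()
print(tt_score(target, target))                              # 6
print(tt_score(target, "si1 zi2 si6 sek3 si1 zi2".split()))  # 5 (tone error)
print(tt_score(target, "si1 zi2 sek6 si6 si1 zi2".split()))  # 5 (order swap)
```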

Chinese pseudoword reading

The Chinese pseudoword reading task consisted of 72 items (sample items in the Appendix), with all characters carefully selected from the same series of textbooks published by People’s Education Publishing in Beijing. Each of the two constituent characters was a real character, but their combination yielded a pronounceable, meaningless Chinese pseudoword. The 72-item task was refined from that used by Leong et al. (2005) and was subdivided into two subtasks, pseudoword 1 (PW1) and pseudoword 2 (PW2), of 36 items each according to level of complexity as indexed by stroke number and printed frequency of the characters. Each child was asked to read aloud, correctly and rapidly, each decontextualized two-character pseudoword or pseudo ci. Total testing time was about 7 min per child. A credit of 1 was given for each character identified and read correctly, so the maximum score was 72 for each subtask and 144 for PW1 and PW2 combined. Cronbach’s alpha coefficient for the total task was .937. Although this was a word reading task, we were able to capture the salient kinds of pronunciation errors, which are discussed briefly in the Discussion section.
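The per-character credit rule and the resulting maxima can be checked with a few lines of arithmetic. The response format (one pair of correct/incorrect flags per item) and the function name are hypothetical.

```python
ITEMS_PER_SUBTASK = 36
MAX_PER_SUBTASK = 2 * ITEMS_PER_SUBTASK  # two characters per item: 72 for PW1 or PW2
MAX_TOTAL = 2 * MAX_PER_SUBTASK          # PW1 + PW2: 144

def score_pseudowords(responses):
    """One credit per character read correctly.

    responses: one (first_char_correct, second_char_correct) pair per item.
    """
    return sum(int(first) + int(second) for first, second in responses)

# Hypothetical responses to three items.
print(score_pseudowords([(True, True), (True, False), (False, False)]))  # 3
```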

Rapid automatized naming

On the basis of the studies discussed earlier and the factor-analytic finding by van den Bos et al. (2002) that the pre-symbolic and symbolic components are separable (see also van den Bos, Zijlstra, & van den Broeck, 2003), the alphanumeric part of RAN (one task for letters of the alphabet and one for Arabic numerals) was administered individually to the participants. The format used was the alternative version (RAN-Alternative) of Compton et al. (2002), because they found that this arrangement explained significantly more variance in word recognition and orthographic processing skills than the traditional arrangement.

Following this logic and using the same items and arrangement as these authors, our RAN-Letter (RANL) task consisted of the high-frequency lowercase letters a, b, d, o, p, and s, and our RAN-Number (RANN) task consisted of the six digits 1, 2, 4, 6, 7, and 9, each presented in random order in 15 rows of 5 items. Letters of the alphabet were used because children in Hong Kong know letter names from kindergarten onward. Each child was asked to read horizontally from left to right across the printed page and to name the letters and numbers as rapidly and as accurately as possible in two separate 30-s sessions. The number of items named correctly within this time limit was taken as the child’s RANL or RANN score. Scoring the number of items named correctly in a fixed time (30 s) simplified administration and avoided the inverse interpretation required when time in seconds is the metric, where higher values indicate poorer performance. The maximum score for each part was 75, and Cronbach’s alpha for the two parts was .738.
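Fixed-time scoring can be sketched as counting correct names until the 30-s limit is reached. The response log format (a timestamp and a named item for each grid position, in reading order) is a hypothetical assumption; in practice the examiner would mark errors on a record sheet. Note that, unlike a time-based score, a higher count here means better performance.

```python
def ran_score(targets, responses, time_limit=30.0):
    """Count items named correctly within the time limit.

    targets: printed items in reading order (15 rows x 5 = 75 per card).
    responses: (seconds_from_start, named_item) pairs in reading order,
    one per grid position; this one-to-one alignment is an assumption.
    """
    correct = 0
    for target, (seconds, named) in zip(targets, responses):
        if seconds > time_limit:
            break  # naming after the limit earns no credit
        if named == target:
            correct += 1
    return correct

# Hypothetical start of a letter card and a child's timed responses.
card = ["a", "b", "d", "o"]
said = [(5.0, "a"), (12.0, "d"), (20.0, "d"), (31.0, "o")]
print(ran_score(card, said))  # 2
```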

Onset-rime phonological segmentation

The speech-sound segmentation construct was measured by two tasks at the syllabic level: deletion of rime (DR) and deletion of onset (DO). For DR, ten items were Chinese characters and ten were based on English words (e.g., /m-ian/, /h-ide/). Similarly, for DO, there were ten Chinese characters and ten English-based words (e.g., /t-ian/, /g-old/). The maximum score was 20 each for DR and DO. Individual children listened to the spoken character or one-syllable word and were asked to delete the end sound (DR) or the beginning sound (DO) and to say what was left. Cronbach’s alpha coefficient for the segmentation task as a whole was .766. Sample items are shown in the Appendix.
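The answer key for the two deletion tasks follows directly from the onset-rime split: deleting the rime leaves the onset, and deleting the onset leaves the rime. The sketch below writes sounds as plain strings, which is a simplification of spoken responses; the sample items are those given above, and the child’s responses are hypothetical.

```python
def expected_response(onset, rime, task):
    """Answer key for one deletion item.

    "DR" (delete the rime, i.e., the end sound) leaves the onset;
    "DO" (delete the onset, i.e., the beginning sound) leaves the rime.
    """
    if task == "DR":
        return onset
    if task == "DO":
        return rime
    raise ValueError("task must be 'DR' or 'DO'")

def score_deletion(items, responses):
    """items: (onset, rime, task) triples; responses: what the child said."""
    return sum(expected_response(*item) == said for item, said in zip(items, responses))

# The sample items from the text, with sounds written as plain strings.
items = [("m", "ian", "DR"), ("h", "ide", "DR"), ("t", "ian", "DO"), ("g", "old", "DO")]
said = ["m", "h", "ian", "gold"]  # hypothetical: the last item is wrong
print(score_deletion(items, said))  # 3
```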

Results

Performance of the students and individual differences

The results are organized to address the hypotheses tested. To compare the performance of the target group of less skilled comprehenders with the two control groups, multivariate analyses of covariance (MANCOVA) and univariate analyses of covariance (ANCOVA), with age as the covariate, were used for the cognate tasks of each construct. Table 1 displays the means and standard deviations of the tasks for the experimental and control groups. The intercorrelations of these tasks after controlling for chronological age for the total group of 91 students are shown in Table 2.

Table 1 Means and standard deviations of all tasks for the less skilled comprehenders, reading comprehension controls, chronological age controls and total group
Table 2 Intercorrelations of tasks after controlling for chronological age for a total group of 91 students

A set of univariate ANCOVAs was conducted with group (PC, RC, CA) as the independent variable, each of the ten tasks (not including the text comprehension total) as the dependent variable, and age as the covariate (see Table 1). With the exception of the deletion of rime and deletion of onset tasks, the three groups differed significantly in their performance. As expected, for the eight tasks with significant overall differences, the less skilled comprehenders (PC) and reading comprehension control (RC) groups were not statistically different from each other, and in most cases both were significantly and substantially weaker than the chronological age control group (CA); the exceptions are noted below.

For the text comprehension construct, a 3 (group) × 2 (text comprehension) MANCOVA with the last factor repeated and age as covariate showed a significant difference for group: F(2,87) = 12.175, p < .001, η² = .221. The ANCOVA for text comprehension 1 (TC1) was significant: F(2,87) = 26.521, p < .001, η² = .379. Pairwise comparisons showed that the target (PC) group and the comprehension control (RC) group were significantly weaker than the CA controls but did not differ from each other, as hypothesized. Similarly, the three groups differed significantly in text comprehension 2 (TC2): ANCOVA, F(2,87) = 14.839, p < .001, η² = .254. Pairwise comparisons showed that the PC and RC groups were again significantly weaker than the CA controls (p < .01 and p < .001, respectively) but did not differ significantly from each other, as hypothesized.

For the pseudoword reading construct, a 3 (group) × 2 (pseudoword) MANCOVA with the last factor repeated and age as covariate showed a significant difference for group: F(2,87) = 5.041, p < .001, η² = .105. The ANCOVA for pseudoword 1 (PW1) was significant: F(2,87) = 9.615, p < .001, η² = .181. Pairwise comparisons showed that the PC and RC groups differed significantly from the CA controls (p = .034 and p < .001, respectively) but not from each other, as expected. The ANCOVA for pseudoword 2 (PW2) was significant: F(2,87) = 8.566, p < .001, η² = .165. Pairwise comparisons showed that only the RC group was significantly weaker than the CA controls (p = .001).

For the verbal working memory construct, a 3 (group) × 2 (verbal working memory) MANCOVA with the last factor repeated and age as covariate showed a significant difference for group: F(2,87) = 5.058, p < .001, η² = .105. The ANCOVA for working memory span was significant: F(2,87) = 9.680, p < .001, η² = .182. Pairwise comparisons showed that the PC and RC groups scored significantly lower than the CA controls (p = .002 and p = .013, respectively) but did not differ from each other, as expected. The ANCOVA for the tongue twister task was significant: F(2,87) = 4.550, p = .013, η² = .095. Pairwise comparisons showed that only the RC group scored significantly lower than the CA controls (p < .05).

For the RAN construct, a 3 (group) × 2 (RAN) MANCOVA with the last factor repeated and age as covariate showed a significant difference for group: F(2,87) = 2.517, p < .05, η² = .055. The ANCOVA for RAN Letter was significant: F(2,87) = 4.826, p = .010, η² = .100, but pairwise comparisons showed no significant difference between any two groups (PC vs. RC, PC vs. CA, or RC vs. CA). The ANCOVA for RAN Number was significant: F(2,87) = 3.192, p = .046, η² = .068, but again pairwise comparisons showed no difference among the three groups.

For the onset-rime segmentation construct, a 3 (group) × 2 (segmentation task) MANCOVA with the last factor repeated and age as covariate showed no significant group effect: F(4,172) = 1.540, p = .193, η² = .035. The ANCOVA for rime deletion was not significant: F(2,87) = 1.213, p = .302, η² = .027, and pairwise comparisons showed no significant differences between the target group (PC) and either the comprehension control (RC) or the chronological age (CA) control group, nor between the RC and CA groups. The ANCOVA for onset deletion was not significant either: F(2,87) = 0.151, p = .860, η² = .003, and pairwise comparisons showed no significant differences among the three groups.
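The effect sizes reported for the univariate ANCOVAs can be cross-checked from the F ratios alone, because partial η² = (F × df_effect) / (F × df_effect + df_error). The minimal Python sketch below assumes that the reported η² values are partial η². It also uses the fact that, with 2 numerator degrees of freedom, the upper-tail p value of an F statistic has the closed form (1 + 2F/df_error)^(−df_error/2), so no statistics library is needed. The function names are illustrative and not part of any package.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


def p_value_f2(f, df_error):
    """Exact upper-tail p value for an F(2, df_error) statistic.

    With 2 numerator df the F survival function reduces to
    (1 + 2F/df_error) ** (-df_error / 2), so no special functions are required.
    """
    return (1.0 + 2.0 * f / df_error) ** (-df_error / 2.0)


# Univariate ANCOVA F ratios as reported in the text, all with df = (2, 87)
reported = [("TC1", 26.521), ("Tongue twister", 4.550), ("RAN number", 3.192), ("Onset deletion", 0.151)]
for label, f in reported:
    eta = partial_eta_squared(f, 2, 87)
    p = p_value_f2(f, 87)
    print(f"{label}: partial eta^2 = {eta:.3f}, p = {p:.3f}")
```

Recomputing η² and p from the reported F ratios in this way is a quick consistency check on tabulated ANOVA/ANCOVA results.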

Inspection of the means and standard deviations in Table 1 shows that even the less skilled comprehenders (PC) performed at the 51% level on TC1 and the 32% level on TC2. Their pseudoword performance was at the 53% and 54% levels for PW1 and PW2, respectively. Their memory span performance was at the 37% level, and their tongue twister performance at the 78% level. Their RAN performance was higher. For rime segmentation it was 58%, and for onset segmentation 39%. These summary results (Table 1) suggest that the tasks were designed to avoid ceiling and floor effects, even for the less skilled comprehenders.

Table 2 shows that, for the group of 91 children and after controlling for chronological age, the two tasks making up each construct correlated moderately to highly with each other. For example, the correlation between the two text comprehension subtasks was .681; between PW1 and PW2, .902; between memory span and tongue twister, .529; between the two RAN tasks, .688; and between the two phonological segmentation tasks, .718. Note, however, that the onset-rime segmentation tasks generally correlated weakly with the other tasks.
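The age-controlled correlations in Table 2 are first-order partial correlations. As a minimal sketch, assuming ordinary Pearson partial correlations, the coefficient can be computed either from the three zero-order correlations or by correlating the residuals of the two tasks after regressing each on age; the two routes give the same value. The data below are simulated for illustration only (they are not the study's data), and the variable and function names are hypothetical.

```python
import numpy as np


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y controlling for z."""
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def partial_r_from_data(x, y, z):
    """Same quantity via residuals: regress x and y on z (with intercept), then correlate residuals."""
    design = np.column_stack([np.ones_like(z), z])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return np.corrcoef(res_x, res_y)[0, 1]


# Simulated scores for 91 hypothetical children: age in months and two cognate tasks
rng = np.random.default_rng(0)
age = rng.uniform(100, 130, size=91)
task_a = 0.05 * age + rng.normal(size=91)
task_b = 0.6 * task_a + 0.03 * age + rng.normal(size=91)

r = np.corrcoef([task_a, task_b, age])  # 3 x 3 zero-order correlation matrix
print(partial_r(r[0, 1], r[0, 2], r[1, 2]))
print(partial_r_from_data(task_a, task_b, age))
```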

The literature suggests that it is informative to make comparisons with both types of control students. Matching the controls on reading comprehension, a more complex skill, rather than only on single-word decoding as is typical, helped to focus the study on text comprehension and kept the results from being biased toward word reading per se. The RC controls provide information on impaired text comprehension processes, and the CA controls show how far the students with reading impairment lag behind good readers of the same chronological age (Manis, Seidenberg, Doi, McBride-Chang, & Petersen, 1996). As the relative sizes of the standard deviations reveal, the scores of the PC group were much more widely spread, and this greater variability accompanied their overall lower mean scores. These and other results point to the heterogeneous nature of the less skilled comprehenders as a group. The generally depressed profile of the PC students accords with the broad findings of Swanson and Alexander (1997) and de Jong (1998).

The correlations in Table 2 show that students who did well on the pseudoword construct also tended to do well in text comprehension, and those who did poorly on pseudoword reading also tended to do poorly in text comprehension. Very few students did well in pseudoword reading but poorly in text comprehension. The relationship between Chinese pseudoword reading and text comprehension needs to be explored further, especially through case studies.

Error analysis of text comprehension and pseudoword reading

The open-ended, inference-level comprehension questions and the written answers have a further advantage over multiple-choice and cloze-type comprehension questions. Because we examined text comprehension through written protocols completed in 40 min, we could use these reports as additional data. Analyses of these written records show certain patterns in the way the less competent comprehenders, including many of the reading age control students, attempted to provide inference-level answers.

One pattern was the verbatim repetition of phrases or sentences taken directly from the text, which remained in full view, even though the students were asked to use their own words. Because each passage and its three questions were clearly visible, memory load was minimized. Typical answers to the passage on Hong Kong were of the form “Hong Kong is good; there are lots of places for enjoyment”. A second pattern was the provision of partial and incomplete answers emphasizing literal meaning. A good example is the passage on “Shutting the Barn Door after the Horse Has Bolted”; the equivalent Chinese idiom could be translated as “Shutting the Pen after Losing the Goat”. Interestingly, almost all the poor comprehenders and many of the reading comprehension controls interpreted “losing” or “lost” literally to mean death, because the Chinese character also carries that meaning. A third pattern was the failure to see analogies, such as the comparison of the Great Wall of China to a meandering, variegated dragon, with all the glory of both. A fourth was the failure to reach deeper levels of comprehension and the inability to understand or use visual and auditory imagery. This lack of sensitivity to imagery showed in students not grasping significant words such as “lone sail”, which conveys sadness and loneliness at seeing a friend sail off, in the poem by Li Po. Likewise, the rather explicit meaning of the passage about the unpretentious, lowly peanuts that grow hidden underground and, unlike brightly colored apples and peaches, do not show off, suggesting practicality and humility, was almost completely lost on the poor comprehenders and many of the reading age controls. Answers to the question of what the father wanted the children to become included “people with achievement”; only a few students saw the real meaning of looking beyond mere appearance, being practical, and not showing off. Similarly, the blind girl “seeing” moonlit waves on the ocean in her mind’s eye on hearing Beethoven play the Moonlight Sonata was almost beyond the imagination of the poor comprehenders and many of the reading age controls.

These summary descriptions suggest that written reports provide rich and insightful data on text comprehension that complement the statistical analyses and deserve further exploration.

Whereas the written answers to the open-ended text comprehension questions provided further data, the oral reading of the Chinese pseudoword task did not offer the same fidelity and richness, even though the assistants carefully wrote down the incorrect readings. From their careful recording and analysis of pronunciation accuracy, including correct lexical tones, we were able to identify some salient patterns. The less competent comprehenders, and also many of the RC controls, used several erroneous strategies. One was to use the phonetic component of each character to deduce the pronunciation of that character. A second was to deduce the pronunciation from a similar-looking heterographic lexical item. A third was to misread a character as one with similar meaning and similar configuration. The fourth was the use of the wrong lexical tone, which would also alter the meaning of the item. We agree with the reviewers that this area should be explored further, especially because each constituent character of a Chinese pseudoword is legal and pronounceable, unlike an English pseudoword, whose pronunciation is affected by orthographic neighborhood density.

Structural equation modeling

The relative strength of the relationships among the constructs and their contribution to individual differences in Chinese text comprehension were examined with structural equation modeling (LISREL version 8.72, Jöreskog & Sörbom, 1996–2001) on the total group of 91 students. One of the observed variables (rime deletion) had a small negative uniqueness, an improper solution that is common in small-sample research (Marsh & Hau, 1999). Following Marsh and Hau’s recommended strategy, the two relevant factor loadings were constrained to be equal; this was justified in this specific case because all other path coefficients in the model were almost unaffected by the additional constraint. The hypothesized model and the standardized path coefficients are shown in Fig. 1. The model fit reasonably well on the fit indices recommended by Marsh, Hau, and Grayson (2005): χ²(26) = 32.04, p = .192, root mean square error of approximation (RMSEA) = .035, non-normed fit index (NNFI) = .987, comparative fit index (CFI) = .993. The results further showed that verbal working memory, RAN, and onset-rime segmentation were moderately to fairly highly intercorrelated (.31 to .70). However, their effects on text comprehension and pseudoword reading varied (beta weights shown in Fig. 1). Verbal working memory had a much stronger effect on text comprehension (beta = .84) than did RAN or onset-rime segmentation, and this effect was considerably stronger than its effect on pseudoword reading (beta = .44). The effect of pseudoword reading on text comprehension was moderate (beta = .27). The direct and indirect effects of verbal working memory and pseudoword reading on text comprehension are summarized by the beta weights in Fig. 1 and are discussed further in the Discussion section.

Relative effects of predictors on text comprehension

To examine the unique and conjoint contributions of each task in greater detail, a set of hierarchical multiple regression analyses was conducted with text comprehension (the sum of the standardized TC1 and TC2 scores) as the criterion variable (J. Cohen & P. Cohen, 1983). Age was controlled by entering it first, and the indicators of verbal working memory, rapid automatized naming, and onset-rime segmentation were then entered separately (see Table 3, Equations 2, 3, and 4), in pairs (Equations 5, 6, and 7), and all together (Equation 8). The additional contribution of pseudoword reading to predicting text comprehension was analyzed in Equation 9. Furthermore, because we were interested in the unique contribution of pseudoword reading over and above the other variables, four analyses (Equations 10 to 13) were carried out with pseudoword reading entered simultaneously with each of the other sets of tasks.
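The logic of entering age first and then adding blocks of predictors can be sketched as follows: fit nested ordinary least squares models, take the increase in R², and test that increase with the usual F-change statistic. This minimal Python/NumPy sketch runs on simulated data (not the study's data), and the variable names (`span`, `twister`, `comp`) are hypothetical stand-ins for the tasks in Table 3.

```python
import numpy as np


def r_squared(y, predictors):
    """R^2 from an OLS fit of y on the given predictor columns plus an intercept."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def f_change(r2_reduced, r2_full, k_added, n, k_full):
    """F test for the R^2 increment when k_added predictors are added (k_full predictors in total)."""
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df2)
    return f, (k_added, df2)


# Simulated, purely illustrative data for 91 hypothetical children
rng = np.random.default_rng(1)
n = 91
age = rng.normal(size=n)
span = rng.normal(size=n)
twister = 0.5 * span + rng.normal(size=n)
comp = 0.2 * age + 0.5 * span + 0.2 * twister + rng.normal(size=n)

r2_step1 = r_squared(comp, [age])                 # Step 1: age only
r2_step2 = r_squared(comp, [age, span, twister])  # Step 2: add the two memory tasks
F, dfs = f_change(r2_step1, r2_step2, k_added=2, n=n, k_full=3)
print(f"R2 step 1 = {r2_step1:.3f}, step 2 = {r2_step2:.3f}, "
      f"delta = {r2_step2 - r2_step1:.3f}, F{dfs} = {F:.2f}")
```

Because the models are nested, R² can only stay the same or rise from step to step, which is why the increment, rather than the raw R², carries the information about a block's added contribution.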

Table 3 Summary of hierarchical multiple regression analysis of variables predicting Chinese text comprehension in the total group of 91 target, reading comprehension and chronological age control students

The results showed that verbal working memory (see Table 3, Equations 2, 5, 6, 8, 9, and 11), especially memory span, predicted text comprehension substantially, with beta weights ranging from .319 to .486. Although the memory span and tongue twister tasks shared some variance when considered conjointly with other tasks, the unique contribution of verbal working memory to text comprehension remained relatively high.

In contrast, rapid automatized naming and onset-rime segmentation made much smaller unique contributions to predicting text comprehension. Entered by itself (Equation 3), the RAN construct, especially RAN-Letter (beta = .322), had a considerable effect on text comprehension. However, in the presence of the onset-rime segmentation tasks (Equation 7), the effect of RAN-Letter was no longer significant (beta = .180), although RAN-Number retained some effect (beta = .274); in the presence of the verbal memory construct (Equation 5), RAN had no significant effect. These values suggest that the effects of RAN on text comprehension could be largely accounted for by the children’s performance in verbal working memory.

Similarly, although onset-rime segmentation, in particular rime deletion (beta = .455, Equation 4), seemed to contribute to text comprehension, its effect dropped to beta = .344 in the presence of the RAN construct (Equation 7) and to beta = .113 in the presence of verbal working memory (Equation 6).

In the regression analysis in which all the tasks were entered together to predict text comprehension (Equation 9), the two indicators of verbal working memory had strong effects, especially memory span (beta = .319) and also tongue twister (beta = .206). Importantly, pseudoword 1 contributed both by itself (beta = .566, Equation 10) and in the presence of other variables (beta = .448 to .511, Equations 9, 11, 12, and 13). Note that the explanatory power of the predictor variables overlapped considerably. Although the additional explanatory power of pseudoword reading seemed relatively small, as shown by the R² increase from 49.8% to 57.8% (Equations 8 and 9), together with verbal working memory it explained a total of 57% of the variance in text comprehension (Equation 11). These results reinforce the broad structure summarized by the structural equation analysis in Fig. 1.

Discussion and conclusion

At the outset, we acknowledge that there are many reasons for comprehension deficits in children. Sources of inference-making deficits include less skilled comprehenders’ processing limitations, which hinder the integration of text information with prior knowledge, and their not knowing when or where to draw inferences (Perfetti et al., 2005). Reading comprehension can also be assessed in different ways, involving experimental and correlational approaches, text genre and structure, response format, and individual differences (Fletcher, 2006; Shuy, McCardle, & Albro, 2006).

The construct of verbal working memory should be strengthened with more complex tasks such as memory updating (Carretti, Cornoldi, De Beni, & Romano, 2005) and complex memory span tasks (Bayliss et al., 2005). In memory updating, participants hear a series of one-digit numbers, in sets whose length usually varies among 3, 5, 7, and 9 items, and are asked to recall serially the last three digits presented. This task requires actively manipulating and adding new information and replacing old or irrelevant information in working memory, and it has been shown to relate to reading comprehension (Carretti et al., 2005). Complex memory span tasks typically include verbal–verbal and verbal–visuospatial processing components, a storage component (e.g., digit span and Corsi span), and a processing efficiency component. These components have been shown to reveal working memory limitations, in terms of both rate and qualitative differences, in children with learning disabilities compared with “typically developing children” (Bayliss et al., 2005). The tasks used in the present study, all specially devised on the basis of theoretical findings and shown to have high reliability and surface validity, could also be refined in other ways.
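The memory-updating procedure described above reduces to a simple rule: whatever the set length, the correct response is the last three digits in their order of presentation. A minimal Python sketch of trial generation and one possible scoring rule follows; the digit range (1 to 9) and the position-by-position scoring are assumptions made for illustration, not necessarily the procedure of Carretti et al. (2005), and the function names are hypothetical.

```python
import random


def make_updating_trial(length, rng):
    """One hypothetical trial: a list of one-digit numbers (assumed range 1-9) of the given length."""
    return [rng.randint(1, 9) for _ in range(length)]


def target_response(digits, keep=3):
    """Correct response: the last `keep` digits, in presentation order."""
    return digits[-keep:]


def score_serial_recall(response, target):
    """Assumed scoring rule: count digits recalled in the correct serial position."""
    return sum(r == t for r, t in zip(response, target))


rng = random.Random(42)
for length in (3, 5, 7, 9):
    trial = make_updating_trial(length, rng)
    print(length, trial, "->", target_response(trial))

print(score_serial_recall([4, 1, 7], [4, 7, 1]))  # only the first digit is in its correct position
```

Longer sets load working memory more heavily because earlier digits must be discarded and replaced as new ones arrive, which is the updating demand the text describes.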

Before discussing the results more fully, we suggest that the present study contributes to understanding children’s reading comprehension and text comprehension difficulties in Chinese in several ways. First, it shows that carefully constructed, relatively short text passages with reasonably high surface validity and well-designed open-ended inferential questions can examine inference-making in reading comprehension with high fidelity. As W. Kintsch and E. Kintsch (2005, p. 88) state, “open-ended responses are more indicative of a person’s real understanding than multi-choice items” unless the latter are very carefully constructed. The related use of written protocols, which combine reading and writing with the passages in full view, is a viable approach to studying reading comprehension (see W. Kintsch & E. Kintsch, 2005; Swanson & Berninger, 1996). Second, using a reading comprehension age group and a chronological age group as controls allows some “causal” possibilities to be ruled out, even though these matched groups share the inherent limitations of multivariate correlational designs for drawing direct causal conclusions. Third, in support of the literature on English reading comprehension, verbal working memory as an active system has a strong effect on Chinese reading comprehension, suggesting that this construct is relatively independent of orthography (Daneman & Carpenter, 1980). Fourth, reading two-character Chinese pseudowords plays a mediating role in reading comprehension, especially for less competent comprehenders. Fifth, rapid naming of letters makes some contribution to Chinese text comprehension by itself, but not in combination with other tasks. Sixth, the contribution of onset-rime segmentation is negligible. We now turn to a detailed discussion of the results.

Contribution of verbal working memory and pseudoword reading

In the literature on reading comprehension in English, there is evidence that working memory plays an important role (Bayliss et al., 2005; Cain et al., 2004a, b; Seigneuric & Ehrlich, 2005; Seigneuric et al., 2000). The storage and manipulation of linguistic material in memory, as assessed by the kind of memory span task used here, likely draws on the same or similar strategies as the processing of text. The present study provides evidence of this (Table 3 and Fig. 1). The finding that the less competent comprehenders had difficulty storing information while performing concurrent processing seems to apply across writing systems, to both alphabetic English and morphosyllabic Chinese.

The structural equation analysis summarized in Fig. 1 provides some answers to the question of mediating functions (see Baron & Kenny, 1986), that is, the direct and indirect effects of verbal working memory on text comprehension (see also Table 3). In the present study (Fig. 1), verbal working memory had a strong direct effect on text comprehension (beta = .84) and an indirect effect mediated through pseudoword reading (component paths beta = .44 and .27). In contrast, RAN had a positive indirect effect mediated through pseudoword reading (component paths beta = .37 and .27) and a much smaller, negative direct effect on text comprehension when other relevant effects were controlled. Together, the structural equation analysis and the hierarchical regression analyses summarized in Table 3 suggest that verbal working memory had a strong direct effect on text comprehension and that this effect was partially mediated through pseudoword reading.
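In a path model with standardized coefficients, an indirect effect is the product of the coefficients along the mediated route, and adding it to the direct effect gives the combined effect carried by those routes. The minimal Python sketch below applies this rule to the coefficients quoted in the text. It assumes that pseudoword reading is the only mediator between verbal working memory and text comprehension; if Fig. 1 contains further paths, the sum computed here is not the model's full total effect. The dictionary keys are shorthand labels introduced only for this sketch.

```python
# Standardized path coefficients as quoted in the text (shorthand labels are hypothetical)
paths = {
    ("WM", "TC"): 0.84,   # verbal working memory -> text comprehension (direct)
    ("WM", "PW"): 0.44,   # verbal working memory -> pseudoword reading
    ("RAN", "PW"): 0.37,  # RAN -> pseudoword reading
    ("PW", "TC"): 0.27,   # pseudoword reading -> text comprehension
}


def indirect_via(paths, source, mediator, outcome):
    """Indirect effect along source -> mediator -> outcome: the product of the two paths."""
    return paths[(source, mediator)] * paths[(mediator, outcome)]


wm_indirect = indirect_via(paths, "WM", "PW", "TC")
wm_direct_plus_indirect = paths[("WM", "TC")] + wm_indirect
ran_indirect = indirect_via(paths, "RAN", "PW", "TC")

print(f"WM: direct = {paths[('WM', 'TC')]:.2f}, indirect via PW = {wm_indirect:.3f}, "
      f"direct + indirect = {wm_direct_plus_indirect:.3f}")
print(f"RAN: indirect via PW = {ran_indirect:.3f}")
```

Under these assumptions the mediated share of the working memory effect is small relative to its direct path, which fits the description of partial rather than full mediation.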

What might explain why Chinese pseudoword reading accounted for important variation in Chinese reading comprehension in the total group of 91 students? At the theoretical level, the interactive constituency model of Perfetti and colleagues (Tan & Perfetti, 1998, 1999) provides the underpinning. This model posits that Chinese word identification results from the convergence of the phonological, orthographic, and semantic forms, and suggests that phonological–orthographic convergence is more rapid and more reliable than orthographic–semantic convergence. The ability to read the Chinese pseudowords designed for this study accurately and rapidly likely resembles what is needed to read text with understanding.

A second plausible reason is that, to identify and read correctly each character constituting a two-character pseudoword, the child has to draw on his or her vocabulary knowledge, which has been shown to affect reading comprehension (Cain et al., 2000, 2004a, b; Seigneuric & Ehrlich, 2005; Seigneuric et al., 2000). Because the two constituent characters bear no semantic relation to each other, correctly reading a two-character pseudoword likely depends on a linear, rather than an interactive, process of reading each character correctly.

A third plausible reason is that the 31 less competent comprehenders (34% of the total group of 91 students) were less efficient at linking their background knowledge with the text and at drawing relations between propositions or ideas. They would rely on their knowledge of characters and words and read on passively, without reflecting much on where to draw inferences from the text. Observations of their pseudoword reading showed that they typically read the two-character pseudowords by the constituent bujian (components) of the individual characters, used the first character to infer the reading of the second, or misread similarly configured characters on the basis of certain constituent parts.

The considerable reliance on character or pseudoword reading by the 31 less skilled comprehenders, who made up a third of the total group, likely also explains the mediating role of pseudoword reading in text comprehension shown in the structural equation analysis (Fig. 1) and in the hierarchical multiple regression analysis (Table 3). These mediating functions of pseudoword reading in relation to verbal working memory may differ in a much larger, unselected sample of students and in other age groups.

Role of phonological segmentation and RAN

The phonological sensitivity tasks made a negligible unique contribution to Chinese text comprehension. This pattern likely reflects the much greater contributions of verbal working memory, the two-character pseudoword reading tasks, and their conjoint effect. It is also likely that onset-rime segmentation plays a role only at the emergent literacy stage, as shown by Siok and Fletcher (2001) in their study of preschool and grade school Chinese children in Beijing. More importantly, our finding that onset-rime segmentation did not affect reading comprehension is in line with the findings of Catts, Fey, Zhang, and Tomblin (1999) with American children and of Demont and Gombert (1996) in a four-year follow-up study of 38 French-speaking preschool children.

The low, although statistically significant, contribution of RAN on its own (RAN letters in particular) is in line with the general findings of a meta-analysis by Swanson, Trainin, Necoechea, and Hammill (2003) of the correlational literature on measures of reading, RAN, phonological sensitivity, and related abilities, based on a large N of 2,257 Caucasian children in 49 independent samples, with corrections for sample size, restriction of range, and attenuation. Swanson et al. found that the correlation between RAN and phonological sensitivity was low (.38) and that RAN and phonological sensitivity correlated moderately with real-word reading (.46 and .48). RAN and phonological sensitivity were less important than measures of spelling and word attack skills and also played a less important role in reading comprehension. It would thus appear that, even across writing systems as disparate as English and Chinese, there are common and general findings on the role of RAN.
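Two of the corrections mentioned for the meta-analysis can be illustrated with textbook formulas: Spearman’s correction for attenuation, r / √(r_xx × r_yy), where r_xx and r_yy are the reliabilities of the two measures, and a sample-size-weighted mean correlation across independent samples. This is a minimal Python sketch with hypothetical inputs; it does not reproduce the specific procedures or data of Swanson et al. (2003), and the correction for range restriction is omitted.

```python
import math


def disattenuate(r_xy, rel_x, rel_y):
    """Spearman's correction for attenuation: observed r divided by sqrt(rel_x * rel_y)."""
    return r_xy / math.sqrt(rel_x * rel_y)


def weighted_mean_r(rs, ns):
    """Sample-size-weighted mean correlation across independent samples."""
    return sum(r * n for r, n in zip(rs, ns)) / sum(ns)


# Hypothetical inputs, for illustration only
print(disattenuate(0.40, 0.80, 0.80))                     # observed r = .40, both reliabilities .80
print(weighted_mean_r([0.30, 0.45, 0.50], [40, 100, 60]))  # three hypothetical samples
```

Correcting for unreliability raises an observed correlation, whereas weighting by sample size keeps small samples from dominating the pooled estimate.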

Summary

Future research should specify, through fine-grained analyses, how various cognitive and linguistic components relate to different facets of reading comprehension in Chinese, and should seek a fuller understanding of the process of inferencing from text at different elementary grade levels. The role of inference awareness training in making links between narrative and expository texts and their meaning for poor comprehenders (e.g., Yuill & Oakhill, 1988) and the general language mechanisms underlying poor reading comprehension (Nation et al., 1999; Stothard & Hulme, 1992) should be investigated further. Moreover, as W. Kintsch and E. Kintsch (2005) and Shuy et al. (2006) have shown, it is important to consider experimental and discourse-level procedures, using latent structural analyses and other means, to assess reading comprehension and to provide diagnostic information on the ability of students with difficulties in comprehending text.