Introduction

Reading in a second language (L2) is a complex process requiring a vast array of intricately interacting sub-processes. The challenge of reading in Chinese is particularly daunting for L2 readers with an alphabetic first language (L1) background because Chinese has a logographic writing system with unique linguistic features that necessitate specific reading processes and skills.

Oral reading fluency (ORF) indicates how well multiple sub-processes and sub-skills, many of which are shaped by language-specific features, are orchestrated during reading. Fluent processing of written material is indispensable for efficient comprehension and is a critical component of reading development. ORF has become a pillar of reading curricula and an effective tool for measuring comprehension and detecting reading difficulty (Kuhn, Schwanenflugel, & Meisinger, 2010), driving major pedagogical decisions. Effective fluency instruction and assessment depend on a sound understanding of the ORF construct and its role in skilled reading. Despite sustained interest in ORF (Berninger et al., 2010; Kuhn, Schwanenflugel, & Meisinger, 2010; Rasinski & Samuels, 2011; Valencia, Smith, Reece, Li, Wixson, & Newman, 2010), the conceptualization of the construct remains incomplete. Most previous studies have investigated the accuracy and speed aspects of ORF, paying considerably less attention to its other aspects.

The importance of ORF extends to L2 reading. Furthermore, because the development of ORF is constrained by language-specific features that probably differ between L1 and L2, understanding L2 ORF could identify the areas of L2 competence that require targeted teaching and practice. The accuracy and speed dimensions of L2 ORF have garnered empirical attention (Crosson & Lesaux, 2010; Jeon, 2012; Jiang, 2016; Jiang, Sawaki, & Sabatini, 2012; McTague, Lems, Butler, & Carmona, 2012), whereas the L2 ORF construct as a whole has yet to be explored (Pey, Min, & Wah, 2014). Prior studies have focused on alphabetic L2s, and little is known about ORF in logographic Chinese. It remains unclear what constitutes ORF in L2 Chinese and how the multiple tiers of ORF are associated with comprehension and learners’ reading difficulty. The present study examined the ORF construct and its relation to comprehension and learner-perceived difficulty among Chinese L2 readers.

Literature review

Oral reading fluency construct

Historically, ORF has been hard to define. Despite this conceptual fluidity, there is consensus that ORF involves reproducing connected text orally with speed and accuracy: it reflects the ability to read rapidly, with ease and accuracy, and with appropriate prosodic expression (Rasinski & Samuels, 2011).

The most thoroughly researched aspects of ORF are accuracy and speed, which are commonly measured by a combined index, the number of words read correctly per minute. This index captures the speed and correctness with which written material is decoded and reproduced as spoken language. The accurate rate index has been used extensively as a proxy for ORF among L1 readers (Arnesen et al., 2016; Morris, Pennell, Perney, & Trathen, 2018; Sabatini, Wang, & O’Reilly, 2018; Wayman, Wallace, Wiley, Ticha, & Espin, 2007) and has increasingly been adopted for L2 ORF (Crosson & Lesaux, 2010; Jeon, 2012; Jiang, 2016; Jiang, Sawaki, & Sabatini, 2012; McTague, Lems, Butler, & Carmona, 2012). Thus far, however, the accurate rate has rarely been used in research on L2 Chinese reading (Lv, 2016).

Miscues, the oral errors produced during text reading (Goodman & Goodman, 1994), constitute a bundle of indices for the accuracy dimension of ORF. Non-target substitutions, omissions, and insertions of additional words reflect deviations from the original text and have been the focus of miscue analysis research. Fluent reading is characterized by few substitutions, omissions, and insertions (Laing, 2002). Moreover, the patterns of substitutions provide a window into ongoing reading processes (Beatty & Care, 2009; Briceño & Klein, 2018; Chang, 2015; Kucer, 2009; Wu & Anderson, 2007; Yan & Wang, 2011). Only a handful of studies (e.g., Wang, 2006) have described Chinese L2 readers’ miscues.

Repetitions reflect readers’ struggles with words/phrases and serve as a speed index of ORF. Frequent repetitions slow reading down, making it choppy and halting. Fluent reading, by contrast, is characterized by smoothness, with few “rough spots” caused by repetitions.

Intrusive pauses within words and inappropriate phrasing of text into fragmented sections disrupt the “prosodic phrasing and contours of the text” (Grabe, 2009, p. 292) and hence capture the chunking/prosody component of ORF. Pauses are measured by objective indices of pause duration, frequency, and position, whereas phrasing is generally evaluated with subjective rating scales (Godde, Bosse, & Bailly, 2020). Researchers have found that fluent readers signal major boundaries with pauses and strategically process texts in meaningful chunks (Álvarez-Cañizo, Suárez-Coalla, & Cuetos, 2018; Kuhn & Schwanenflugel, 2018). Less fluent readers either halt inside meaningful units or separate words in ways that deviate from natural phrasing (Binder, Tighe, Jiang, Kaftanski, Qi, & Ardoin, 2013; Miller & Schwanenflugel, 2006; Valle, Binder, Walsh, Nemier, & Bangs, 2013). It is worth noting that pause misplacement is more symptomatic of decoding difficulty than of chunking difficulty, because it is a compensatory response invoked when word recognition fails (Kuhn, Schwanenflugel, & Meisinger, 2010). Few studies have investigated chunking/prosody in L2 reading (Pey, Min, & Wah, 2014), and even fewer have examined Chinese L2 readers; for this population, the topic has only been touched upon in a brief description of within-word pauses and incorrect segmentations among miscues (Wang, 2006).

Self-correction is an overt manifestation of monitoring processes (Kormos, 1999) and thus taps the monitoring aspect of ORF. As a self-initiated, self-completed repair, self-correction in oral reading occurs when the reader detects an error and executes a correction. Self-correction enhances accuracy, but frequent self-correction renders reading less smooth and coherent. Studies have shown that self-correction decreased as readers became more sophisticated (Kucer, 2017; Share, 1990), because they made fewer errors and most of their miscues did not cause meaning breakdown. At the same time, more proficient readers (McGee, Kim, Nelson, & Fried, 2015; Nguyen, Pickren, Saha, & Cutting, 2020) and readers who progressed faster (D’Agostino, Kelly, & Rodgers, 2019) displayed a greater tendency to self-correct the errors they did make, revealing heightened self-monitoring. L2 readers are less likely to correct themselves, either because their unstable L2 knowledge prevents them from perceiving errors or because their insufficient skills in processing written input leave little cognitive capacity for self-monitoring (Francis, 1999). Chinese L2 reading researchers have observed rare self-correction among novice readers and increasing repair behavior as learners’ proficiency improved (Liu, 1999; Wu & Anderson, 2007).

Finally, unlike silent reading fluency, ORF involves an additional element: verbal output. Reading aloud is a sequential process comprising visual recognition, conversion of orthographic representations into phonological codes, and activation of the corresponding articulatory-motor program for overt production (Timmer & Schiller, 2012). ORF therefore involves phonological recoding, which imposes an additional burden on L2 readers. Because their L2 oral proficiency is less developed, even successful word recognition does not guarantee proper articulation; readers may, for example, fail to sound out phonemes or mispronounce words (Lems, 2006).

Oral reading fluency and comprehension

The vast majority of ORF studies have focused on the accuracy and speed aspects and employed the accurate rate index (number of correctly read words per minute). A strong correlation between ORF and comprehension has been established among young L1 readers (Arnesen et al., 2016; Jenkins, Fuchs, Van den Broek, Espin, & Deno, 2003; Roehrig, Petscher, Nettles, Hudson, & Torgesen, 2008; Sabatini, Wang, & O’Reilly, 2018; Schilling, Carlisle, Scott, & Zeng, 2007; Silberglitt, Burns, Madyun, & Lail, 2006; Wanzek, Roberts, Linan-Thompson, Vaughn, Woodruff, & Murray, 2010) and L2 learners (Crosson & Lesaux, 2010; Jeon, 2012; Jiang, 2016; Jiang, Sawaki, & Sabatini, 2012; McTague, Lems, Butler, & Carmona, 2012).

Miscue analysis research has probed the relationship between different types of miscues and comprehension. Children who omitted words less frequently (Laing, 2002), or who made fewer miscues overall and fewer meaning-changing substitutions, tended to recall more of the text and to recall it more accurately (Beatty & Care, 2009; Kucer, 2009; Wu & Anderson, 2007). A similar pattern has been observed among L2 learners (Wang, 2006). Nevertheless, the small sample sizes of these miscue studies preclude robust statistical inference about the link between miscues and comprehension.

Further research found that prosody, an indicator of efficient chunking, predicts comprehension performance (Arcand et al., 2014; Benjamin & Schwanenflugel, 2010; Calet, Gutiérrez-Palma, & Defior, 2017; Fernandes, Querido, & Verhaeghe, 2018; Groen, Veenendaal, & Verhoeven, 2019; Klauda & Guthrie, 2008; Rasinski, Reutzel, Chard, & Linan-Thompson, 2011). However, prosody adds little explanatory power for comprehension beyond accuracy and speed (Riedel, 2007; Sabatini, Wang, & O’Reilly, 2018; Schwanenflugel, Hamilton, Kuhn, Wisenbaker, & Stahl, 2004). For L2 readers, the role of chunking/prosody in comprehension remains underexplored (Pey, Min, & Wah, 2014).

Monitoring, an awareness of ongoing reading processes, is closely associated with maintaining a coherent textual representation. Monitoring enables readers to check continually whether comprehension is occurring and to address potential problems strategically; it thus plays a key role in integrating the constructed meaning (Kim, Vorstius, & Radach, 2018). The relationship between self-correction, a monitoring behavior, and comprehension has rarely been investigated. Most studies counted and compared self-correction instances across readers at different levels and yielded inconclusive results (Kucer, 2017; Liu, 1999; McGee, Kim, Nelson, & Fried, 2015; Nguyen, Pickren, Saha, & Cutting, 2020; Share, 1990; Wu & Anderson, 2007). In their inquiry into the association between self-correction and comprehension improvement, D’Agostino, Kelly, and Rodgers (2019) reported that self-correction positively predicted progress in comprehension among struggling L1 readers, indicating a facilitative effect.

Generally speaking, the relationship between ORF, especially its accuracy and speed components, and comprehension is robust. The contribution of ORF to comprehension draws theoretical support from Automaticity Theory (DeKeyser, 2001; LaBerge & Samuels, 1974; Segalowitz, 2003; Segalowitz & Segalowitz, 1993). Its key premise is that lower- and higher-level processes compete for limited cognitive capacity, and that automaticity in the former ultimately makes comprehension possible. If too many cognitive resources are consumed by lower-level processes such as word recognition and chunking, few will remain for higher-level processes, and comprehension will break down. By contrast, if readers perform lower-level processes rapidly and without deliberate effort, more conscious attention can be shifted toward the higher-order processes that build comprehension. In other words, reading depends on the automatic execution of myriad lower-level sub-skills to enable higher-level processes.

ORF, with its components of accuracy, speed, chunking/prosody, and monitoring, represents a complex, multi-faceted skill set that entails accessing word meaning, segmenting words into meaningful units, and monitoring the establishment of a coherent representation (Breznitz, 2006). Except for monitoring, ORF is mainly a marker of the efficient orchestration of lower-level processing (Grabe, 2010; Rasinski, Reutzel, Chard, & Linan-Thompson, 2011), which allows adequate cognitive resources to be devoted to higher-level processes of meaning construction. In this sense, ORF is a prerequisite for comprehension.

The critical role of ORF in comprehension makes it a sensitive tool for diagnosing reading problems. The accuracy and speed dimensions of ORF (Fuchs, Fuchs, & Compton, 2004; Hintze & Silberglitt, 2005; Keh, 2016; Parker et al., 2015; Wise, Sevcik, Morris, Lovett, Wolf, & Schwanenflugel, 2010) have been used to identify weak readers and pinpoint difficulties in word recognition. Much less attention, however, has been devoted to reader-perceived difficulty of oral reading and its connection with performance on different aspects of ORF.

Chinese-specific processes in oral reading fluency

As stated above, ORF reflects the efficiency of lower-level linguistic processing, which is constrained by language-specific features. The unique linguistic features of Chinese necessitate specific lower-level processes that cause difficulties for L2 readers and make certain components of ORF more prominent.

As a logographic orthography, Chinese lacks systematic script–sound correspondence (Shen, 2013). Its basic graphic unit, the character, does not provide a direct route to phonological representation. This poses great challenges for L2 learners who are accustomed to assembling word pronunciations from phonemes represented by letters. Indeed, L2 learners frequently report difficulty pronouncing characters/words despite successful lexical access (Hu, 2010). Phonological processing in Chinese is post-lexical: articulating written symbols depends on access to meaning. A reader of Chinese can know the meaning of a character without knowing its sound, whereas a reader of an alphabetic language can pronounce a word without knowing its meaning. Decoding without comprehension, a phenomenon common in alphabetic languages, does not work in Chinese; instead, successful pronunciation indicates successful lexical access. This post-lexical nature of phonological processing makes ORF a more robust measure of lexical access in Chinese than in alphabetic languages.

Another unique feature of Chinese is the lack of spatial demarcation of word boundaries, which necessitates an additional, explicit word segmentation process. As reading proceeds, the array of characters is continuously parsed into meaningful words, so that individual, spatially unmarked words can be specified and serve as the basis for other types of processing (Li, Rayner, & Cave, 2009). Existing word representations are vital for segmentation, because knowing which characters are likely to form a word helps readers identify and separate them within continuous character strings. In this respect, word boundaries arise naturally from successful lexical access. In other circumstances, however, segmentation is carried out in order to recognize words. When encountering unfamiliar words, readers need to determine word boundaries to access weakly represented entries in the mental lexicon. When ambiguity occurs, that is, when a character can combine with adjacent characters in different ways to form different words, correct segmentation is essential for constructing the contextually appropriate meaning (Shen & Jiang, 2013). The unspaced layout of Chinese print makes word segmentation a difficult task for L2 readers whose L1 writing systems mark word boundaries visibly. L2 readers have difficulty isolating meaningful words in running text even at advanced levels (Bassetti, 2005; Lee-Thompson, 2008; Shen, 2008; Yang & Jiang, 2012). Unlike phrase-level chunking in alphabetic languages, word segmentation in Chinese is a fundamental part of word recognition and hence a more prominent component of ORF.
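The segmentation ambiguity described above can be made concrete with a toy enumeration. The sketch below is an illustration only and not a model of how readers segment text. It lists every way a character string can be split into words drawn from a small, hypothetical lexicon. The string 研究生命起源 (“study the origin of life”) is a commonly cited case in which 研究生 (“graduate student”) competes with 研究 (“study”) + 生命 (“life”).

```python
from functools import lru_cache

# Hypothetical toy lexicon, chosen only to reproduce a classic ambiguity.
TOY_LEXICON = frozenset({"研究", "研究生", "生命", "命", "起源"})


def all_segmentations(text, lexicon=TOY_LEXICON):
    """Return every split of `text` into words that all occur in `lexicon`."""
    max_len = max(len(word) for word in lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        # All segmentations (tuples of words) of the suffix text[start:].
        if start == len(text):
            return ((),)
        found = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            word = text[start:end]
            if word in lexicon:
                for rest in split_from(end):
                    found.append((word,) + rest)
        return tuple(found)

    return [list(seg) for seg in split_from(0)]


print(all_segmentations("研究生命起源"))
# [['研究', '生命', '起源'], ['研究生', '命', '起源']]
```

Both splits use only real words, but only the first fits the context. A reader who commits to 研究生 early has to re-segment the string to recover the intended meaning.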

The previous literature on ORF leaves several gaps. First, research on L2 ORF has been scarce and has focused almost exclusively on alphabetic L2s. Little is known about ORF in L2 Chinese, which activates language-specific processes such as post-lexical phonological processing in character/word recognition and explicit word segmentation. Second, most studies of the ORF–comprehension relationship have examined the accuracy and speed aspects, largely neglecting other aspects. Given the crucial role of ORF in reading, it is necessary to explore what the ORF construct comprises and how its components are associated with comprehension and reader-perceived difficulty among Chinese L2 learners.

The present study was designed to fill the gaps by addressing the following research questions:

1. What are the components of the ORF construct among Chinese L2 learners?
2. How are different components of ORF related to comprehension?
3. How are different components of ORF related to learner-perceived difficulty of oral reading?

Methods

Participants

One hundred L2 Chinese learners from four universities in the U.S. and two universities in China participated in this study (53 males and 47 females; age range: 19 to 26 years). Of these, 74 were native English speakers, and the remaining 26 were native speakers of other European languages written in alphabetic scripts. The racial/ethnic breakdown was as follows: 76% white, 20% Asian, 2% Black, and 2% Hispanic. Convenience sampling was used; learners were recruited from First-Year (n = 35), Second-Year (n = 47), and Third-Year (n = 18) Chinese classes. A character recognition test was administered to further determine participants’ proficiency levels.

Instruments

Given that the number of known characters is an important benchmark in Chinese proficiency guidelines, a character recognition test (see “Appendix A”) was designed to gauge participants’ proficiency levels. The test consisted of 50 characters drawn from the 1500 most frequent Chinese characters in the Modern Chinese Character Frequency List (National Working Committee for Language and Characters, 1992), which form a hierarchy of characters that L2 learners across proficiency levels are expected to master. One character was selected from every 30 characters (1500 / 30 = 50). Participants read the characters aloud as quickly as possible and were given about five minutes to complete the test.
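The one-in-every-30 selection is a form of systematic sampling. The sketch below assumes the 1500 characters are stored in a list ordered by frequency rank. The text does not report which position within each block of 30 was used, so `offset` is a hypothetical parameter. Any offset from 0 to 29 yields exactly 50 characters because 1500 is a multiple of 30.

```python
def systematic_sample(ranked_items, step=30, offset=0):
    """Take one item from each consecutive block of `step` items.

    `offset` is the (hypothetical) position used within every block.
    """
    if not 0 <= offset < step:
        raise ValueError("offset must satisfy 0 <= offset < step")
    return ranked_items[offset::step]


# Stand-in for the 1500 most frequent characters: their frequency ranks.
ranks = list(range(1, 1501))
sample = systematic_sample(ranks)
print(len(sample), sample[:3])  # 50 [1, 31, 61]
```

Drawing one item per frequency block spreads the test items evenly across the frequency range. A simple random sample of 50 would not guarantee this.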

The reading comprehension section of an AP Chinese test was used to measure comprehension. The AP Chinese test has been widely used in the U.S. as a designated proficiency assessment for the two-year foreign language requirement at the college level. This section contains six texts of various genres, ranging from 133 to 382 characters in length, and 25 comprehension questions that mainly assess readers’ abilities to understand the main idea and specific details, draw inferences, and interpret the purpose of a text. Participants read the texts silently and answered the comprehension questions; the whole process lasted about 40 min.

The oral reading task was designed to elicit participants’ ORF performance. One paragraph was randomly selected from each of the six texts in the AP test (see “Appendix B”). The six paragraphs ranged from 81 to 190 characters in length, with an average of 130 characters. Participants read the paragraphs aloud at their natural pace, generally spending 10 to 15 min on the task.

A questionnaire (see “Appendix C”) was developed to collect participants’ evaluations of the difficulty of oral reading. Participants rated task difficulty on a 10-point scale, with 1 representing the easiest and 10 the hardest. They were also asked to explain which aspects of oral reading they found difficult.

Data collection

The participants were randomly assigned to one of two groups. One group (n = 50) first took the AP test and the character recognition test and then completed the oral reading task and the questionnaire. The other group (n = 50) followed the reverse order. This counterbalancing was intended to control for practice effects arising from the use of the same materials in both the AP test and the oral reading task. The entire session lasted 60 min.

Scoring

Character recognition test

Each character named correctly was marked as correct; a divergence in tone only was also treated as correct. Incorrect responses, unintelligible responses or pronunciations, and characters that were skipped or indicated as unknown were marked as errors. The accuracy score was the percentage of correctly read characters. Cronbach’s alpha was 0.95, indicating high internal consistency. A second rater scored 15% of the character recognition data (15 participants). Inter-rater reliability was high (r = 0.99, p < .01), and the few coding discrepancies were negotiated and resolved. Following the criteria of previous research (Zhang, 2018), participants with below 60% accuracy were classified as low proficiency, those with 60% to 80% accuracy as intermediate proficiency, and those with above 80% accuracy as high proficiency.
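The scoring and banding logic can be sketched as follows. The sketch assumes each response has already been coded as correct or incorrect under the rules above, with tone-only divergences coded as correct. The function names are hypothetical. Treating exactly 60% and exactly 80% as intermediate is our reading of “60% to 80%.”

```python
def character_accuracy(responses):
    """Percentage of characters named correctly.

    `responses` holds one boolean per test character (True = correct).
    """
    if not responses:
        raise ValueError("no responses to score")
    return 100.0 * sum(responses) / len(responses)


def proficiency_level(accuracy_pct):
    """Map an accuracy percentage to the three bands described in the text."""
    if accuracy_pct < 60:
        return "low"
    if accuracy_pct <= 80:
        return "intermediate"
    return "high"


# A participant who names 16 of the 50 characters correctly.
score = character_accuracy([True] * 16 + [False] * 34)
print(score, proficiency_level(score))  # 32.0 low
```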

AP test

The accuracy score was the percentage of correctly answered items. Cronbach’s alpha was 0.83, indicating relatively high internal consistency.
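The reliability estimates for both tests use Cronbach’s alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below computes it over a participants-by-items score matrix and assumes item-level scores are available (e.g., 0/1 per item). For dichotomous items this value coincides with KR-20. The function name is illustrative.

```python
import numpy as np


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a (participants x items) score matrix."""
    x = np.asarray(item_scores, dtype=float)
    if x.ndim != 2 or x.shape[1] < 2:
        raise ValueError("need a 2-D array with at least two items")
    k = x.shape[1]
    # Sum of per-item variances (same ddof as the total-score variance).
    item_var_sum = x.var(axis=0, ddof=1).sum()
    # Variance of each participant's total score across items.
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1.0 - item_var_sum / total_var)


print(round(cronbach_alpha([[1, 2], [2, 1], [3, 3]]), 3))  # 0.667
```

The same ddof is used in numerator and denominator, so the choice between sample and population variance does not change the result.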

Oral reading

Seven types of reading behaviors were identified (see Table 1). An omission was a failure to produce a character/word in the text. A substitution was a non-target character/word/sound produced in place of the target. A repetition occurred when participants repeated characters/words. A pause was a hesitation of more than two seconds within a word. An insertion was an extra character/word/sound added to the text. A self-correction occurred when participants made an error, noticed it, and corrected it successfully. A segmental problem was an incorrect grouping of characters into non-words, which resulted in unnatural prosody.

Table 1 Seven types of oral reading behaviors

For each participant, the percentages of omissions, substitutions, repetitions, insertions, self-corrections, and segmental problems relative to the total number of characters produced were calculated, as was the number of pauses per minute. Two further indices were derived. The unpruned rate was the number of characters actually produced per minute. The pruned accurate rate was the number of correctly read characters per minute, excluding self-corrections, repetitions, and segmental problems. Altogether, nine ORF indices were derived. A second rater scored 15% of the oral reading data (15 participants). Inter-rater agreement reached 98%, and the few coding discrepancies were negotiated and resolved.
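The nine indices can be computed from per-reading coding totals roughly as follows. This is a sketch under stated assumptions:

- The `CodedReading` fields are hypothetical names for totals a coder would tally.
- `correct_chars_pruned` is assumed to already exclude self-corrected, repeated, and mis-segmented characters.
- The abbreviations follow those used in the factor analysis (UR, PR, OM, SU, RP, IN, P, SP, SC), under our reading of which label denotes which index.

```python
from dataclasses import dataclass


@dataclass
class CodedReading:
    """Hypothetical per-participant coding totals (field names are illustrative)."""
    produced_chars: int        # all characters actually voiced
    correct_chars_pruned: int  # correct characters, excluding SC/RP/SP material
    duration_sec: float
    omissions: int
    substitutions: int
    repetitions: int
    insertions: int
    self_corrections: int
    segmental_problems: int
    pauses: int                # within-word hesitations longer than 2 s


def orf_indices(r: CodedReading) -> dict:
    if r.produced_chars <= 0 or r.duration_sec <= 0:
        raise ValueError("need produced characters and a positive duration")
    minutes = r.duration_sec / 60.0

    def pct(count):
        # Percentage of the total characters produced.
        return 100.0 * count / r.produced_chars

    return {
        "OM": pct(r.omissions),
        "SU": pct(r.substitutions),
        "RP": pct(r.repetitions),
        "IN": pct(r.insertions),
        "SC": pct(r.self_corrections),
        "SP": pct(r.segmental_problems),
        "P": r.pauses / minutes,                  # pauses per minute
        "UR": r.produced_chars / minutes,         # unpruned rate
        "PR": r.correct_chars_pruned / minutes,   # pruned accurate rate
    }


example = CodedReading(200, 150, 120, 4, 10, 6, 2, 1, 3, 5)
print(orf_indices(example)["PR"])  # 75.0
```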

Questionnaire

The participants’ difficulty ratings were recorded as the indicator of learner-perceived difficulty of oral reading. Their comments on the locus of difficulty were collected and analyzed.

Results

RQ1: What are the components of the ORF construct?

Descriptive statistics for the character recognition test, the AP comprehension test, the difficulty rating, and the nine ORF indices are displayed in Table 2.

Table 2 Descriptive Statistics (n = 100)

The mean character recognition accuracy (31.96%) indicates that the participants were at the lower end of the proficiency spectrum (below the 60% threshold). Only 7 of the 100 participants achieved above 60% accuracy.

To address non-linearity, the data were transformed. Percentage scores were transformed with the empirical logit function, and rate scores were log-transformed.
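The two transformations can be sketched as below. The empirical logit shown is the common form ln((y + 0.5) / (n - y + 0.5)), where y is the count of a coded behavior out of n produced characters. The 0.5 adjustment is an assumption, since the text does not report the constant used. The natural log is likewise assumed for the rate scores. Because the text does not say how zero rates were handled, the sketch rejects them rather than guess.

```python
import math


def empirical_logit(successes, trials, adjust=0.5):
    """ln((y + a) / (n - y + a)); the +a keeps 0% and 100% finite."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return math.log((successes + adjust) / (trials - successes + adjust))


def log_rate(rate):
    """Natural-log transform for a strictly positive rate (e.g., chars/min)."""
    if rate <= 0:
        raise ValueError("log transform needs a positive rate")
    return math.log(rate)


print(empirical_logit(5, 10))  # 0.0
```

Unlike a plain logit of the raw proportion, the adjusted form stays finite when a reader makes no errors of a given type (0%). This case is common for sparse behaviors such as self-correction.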

To explore the structure of the ORF construct measured by the nine indices, a factor analysis was run (see Table 3). Factors with eigenvalues greater than one were retained, which yielded a well-defined three-factor solution. The three factors had eigenvalues of 3.67, 1.82, and 1.02 and explained 40.80%, 20.21%, and 11.30% of the total variance, respectively. Together they accounted for 72.31% of the variance. Indicators with factor loadings greater than .45 were used to interpret each factor. Because the six indices that loaded highly on the first factor (UR, PR, OM, SU, RP, IN) represented accuracy and speed, this factor was labeled the accurate rate factor. The second factor was labeled the chunking factor because the two indices that loaded highly on it (P, SP) measured the process of chunking text into meaningful units. The one index that loaded highly on the third factor (SC) taps monitoring, so this factor was labeled the monitoring factor.
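The retention rule and the variance percentages can be illustrated with a short sketch. With standardized indicators, the eigenvalues of a correlation matrix sum to the number of variables. Each factor’s share of total variance is therefore its eigenvalue divided by 9 here. This holds for the initial (principal-component) eigenvalues. Because the extraction method is not reported, treat the arithmetic below as a consistency check only. The check matches the reported percentages up to rounding of the eigenvalues. The function names are illustrative.

```python
import numpy as np


def kaiser_retained(corr):
    """Eigenvalues > 1 of a correlation matrix and their % of total variance."""
    corr = np.asarray(corr, dtype=float)
    eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh is ascending; reverse it
    n_vars = corr.shape[0]  # trace of a correlation matrix = number of variables
    kept = eigvals[eigvals > 1.0]
    return kept, 100.0 * kept / n_vars


def percent_of_variance(eigenvalues, n_vars):
    """Share of total variance for eigenvalues of an n_vars x n_vars correlation matrix."""
    return [100.0 * ev / n_vars for ev in eigenvalues]


# Consistency check on the eigenvalues reported for the nine ORF indices.
print([round(p, 2) for p in percent_of_variance([3.67, 1.82, 1.02], 9)])
# [40.78, 20.22, 11.33]
```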

Table 3 Factor Loadings for Factor Analysis

RQ2: How are different components of ORF related to comprehension?

One index was selected to represent each of the three ORF components/factors: the pruned accurate rate for the accurate rate factor, segmental problems for the chunking factor, and self-correction for the monitoring factor. The pruned accurate rate was selected because it combines accuracy and speed and has been widely used in previous literature. Segmental problems were chosen over pauses because pauses reflect decoding deficiency more than chunking deficiency (Kuhn, Schwanenflugel, & Meisinger, 2010) and are therefore less appropriate for indexing chunking.

Zero-order correlations were computed to establish the direction and strength of the relationships between the three ORF components and comprehension (see Table 4).

Table 4 Correlation Matrix between ORF Components and Comprehension

The students’ comprehension performance had a moderate, significant correlation with accurate rate and a weak albeit significant correlation with chunking. In contrast, the correlation between comprehension and monitoring was negligible.

The accurate rate and chunking components were both significantly related to comprehension. However, given their high intercorrelation, it was unclear whether each would explain unique variance in comprehension. To investigate the individual and collective contributions of the two ORF components to comprehension, hierarchical multiple regression analyses were conducted.

As shown in Table 5, chunking was significant when it was the only predictor in the model, accounting for 5% of the variance in comprehension. However, the standardized beta estimate of chunking became non-significant when accurate rate was entered into the equation. Accurate rate, on the other hand, made an independent, significant contribution to comprehension (p < .01) after controlling for chunking. Together the two components explained 32% of the variance in comprehension.
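The logic of Table 5 asks whether a predictor explains variance in comprehension beyond the one entered before it. This can be sketched with ordinary least squares. The function names are hypothetical and no study data are used. Only R², the R² change, and the F-change statistic for one added predictor are computed. A p-value would additionally require the F distribution with (1, n - 3) degrees of freedom. Entering chunking first and accurate rate second yields the unique contribution of accurate rate; reversing the order yields that of chunking.

```python
import numpy as np


def r_squared(y, *predictors):
    """R^2 of an OLS regression of y on an intercept plus the given predictors."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(p, dtype=float) for p in predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def hierarchical_step(y, first, second):
    """R^2 after step 1, R^2 after step 2, the R^2 change, and F-change."""
    n = len(y)
    r2_step1 = r_squared(y, first)
    r2_step2 = r_squared(y, first, second)
    delta = r2_step2 - r2_step1
    df_resid = n - 2 - 1  # n minus two predictors minus the intercept
    f_change = delta / ((1.0 - r2_step2) / df_resid)  # one predictor added
    return r2_step1, r2_step2, delta, f_change
```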

Table 5 Hierarchical Regression Analysis for ORF Components

RQ3: How are different components of ORF related to learner-perceived difficulty?

Zero-order correlations were computed to investigate the relationships between the three ORF components and learners’ difficulty ratings (see Table 6).

Table 6 Correlation Matrix between ORF Components and Difficulty Rating

Learners’ difficulty ratings had a moderate, significant negative correlation with accurate rate, whereas the correlations between the other two components and difficulty ratings were weak and non-significant.

Table 7 summarizes participants’ remarks on the difficulties of oral reading. Four categories were identified. Character/word recognition was considered the biggest obstacle and was mentioned in as many as 67 statements. Another problem was pronunciation: learners found it hard to pronounce characters/words correctly even when they knew their meanings. Chunking difficulty included segmenting characters into words and parsing complex grammatical structures. Affective difficulty referred to the anxiety caused by reading aloud.

Table 7 Oral reading difficulty comments

Discussion

Components of ORF

According to the factor analysis, Chinese L2 ORF consisted of three components: accurate rate, chunking, and monitoring.

It is widely accepted that ORF primarily involves speed and accuracy (Kuhn, Schwanenflugel, & Meisinger, 2010; Morris, Pennell, Perney, & Trathen, 2018). Fluent readers read smoothly and effortlessly at an appropriate pace, whereas slow reading with many halts and repetitions does not represent fluency. Accuracy is presupposed by the concept of ORF: reading rapidly while producing nonsense words is meaningless. Omissions, substitutions, and insertions reflected readers’ failure to provide the target responses and thus captured accuracy. The unpruned rate captured the overall pace of oral reading, and repetitions showed readers’ struggles in processing characters/words; both represented speed. The pruned accurate rate combined speed and accuracy.

The results also indicated that chunking constitutes an important dimension of ORF. Chunking, the ability to divide text into meaningful units, is one of the fundamental processes in fluent reading (Kuhn & Schwanenflugel, 2018). Without chunking, readers cannot read smoothly with appropriate prosodic expression even if they have mastered word decoding skills. Chunking is particularly crucial in reading visually unspaced Chinese, since it enables readers to group seemingly disjoint characters into larger blocks, namely words, instead of proceeding slowly character by character. In this study, chunking was measured by within-word pauses and segmental problems. The former indicated a failure to group identified individual characters into words, and the latter signified incorrect parsing of characters into meaningless units that do not fit the ongoing text.

Monitoring is another component underlying ORF. Monitoring involves checking the coherence of what has been constructed so far and repairing possible breakdowns and errors (Kim, Vorstius, & Radach, 2018); in oral reading, it is manifested in self-correction.

The relationship between ORF and comprehension

One major finding is that the accurate rate component of ORF uniquely and robustly predicted comprehension: Chinese L2 learners who read aloud accurately and rapidly tended to show better text comprehension. This finding is consistent with evidence that accurate and fluent oral reading is related to enhanced comprehension in alphabetic L1 reading (Arnesen et al., 2016; Jenkins, Fuchs, Van den Broek, Espin, & Deno, 2003; Roehrig, Petscher, Nettles, Hudson, & Torgesen, 2008; Sabatini, Wang, & O’Reilly, 2018; Schilling, Carlisle, Scott, & Zeng, 2007; Silberglitt et al., 2006; Wanzek et al., 2010) and L2 reading (Crosson & Lesaux, 2010; Jeon, 2012; Jiang, 2016; Jiang, Sawaki, & Sabatini, 2012; McTague, Lems, Butler, & Carmona, 2012). The present results extend this connection to a logographic L2 (Chinese), suggesting that the role of the accuracy and speed aspects of ORF in comprehension may be universal across languages.

Another finding is the significant correlation between chunking and comprehension: Chinese L2 readers who made more chunking errors in oral reading performed less well in comprehension. This is in line with prior research showing that chunking/prosody performance is an indicator of comprehension (Arcand et al., 2014; Benjamin & Schwanenflugel, 2010; Calet, Gutiérrez-Palma, & Defior, 2017; Fernandes, Querido, & Verhaeghe, 2018; Groen et al., 2019; Klauda & Guthrie, 2008; Rasinski, Reutzel, Chard, & Linan-Thompson, 2011; Pey, Min, & Wah, 2014). The lack of visually marked word spaces in Chinese makes chunking an explicitly activated process that operates at the word level rather than at the phrase level, as it does in alphabetic languages. Word-level chunking in Chinese contributes to comprehension through lexical access (Shen & Jiang, 2013). On the one hand, word boundaries take shape naturally as readers decipher the meanings of individual words within connected text, so successful word segmentation depends to some extent on accurate and rapid word recognition. On the other hand, when word recognition fails or ambiguity arises, finding the correct word boundary helps group the relevant characters into contextually appropriate words and thereby enables lexical access. Lexical access then provides the input that higher-order processes use to build comprehension. Inaccurate lexical access may cause comprehension difficulty. This difficulty in turn signals that an initial segmentation decision was inappropriate and prompts a re-examination and a new grouping that changes the meaning.

However, chunking did not explain significant additional variance in comprehension after controlling for accurate rate, consistent with previous studies showing that chunking/prosody does not uniquely predict comprehension beyond accurate rate (Riedel, 2007; Sabatini, Wang, & O’Reilly, 2018; Schwanenflugel, Hamilton, Kuhn, Wisenbaker, & Stahl, 2004). Accurate and fluent oral reading requires the synchronization of multiple sub-skills (Breznitz, 2006), including segmenting the text into meaningful chunks and preserving natural prosody. Readers seldom read rapidly and accurately without appropriate prosody. It is therefore plausible that accurate rate absorbed the variance associated with chunking/prosody.

The critical role of the accuracy, speed, and chunking aspects of ORF in comprehension highlights the importance of low-level word recognition and segmentation processes. Low-level processing competes with high-level processing for limited cognitive capacity. Inefficient low-level processing places additional demands on readers’ cognitive resources, leaving few for the high-level processes that generate comprehension. If readers have difficulty identifying and segmenting words, their reading will be slow and laborious, which further disrupts the processes of making and maintaining connections among ideas within a text. Consequently, their understanding of the text’s meaning will be hampered. As low-level processing becomes automatic, more cognitive resources are freed for text-level processing, supporting better comprehension. For L2 readers, whose cognitive resources are already limited, much of that capacity is consumed by low-level processing, leaving little for higher-level processes; comprehension is challenging as a result.

A very weak relationship between self-correction and comprehension was observed, probably because of a floor effect: self-correction seldom occurred, and individual variability was small. The rarity of self-correction parallels findings from prior studies of low-proficiency Chinese L2 readers (Liu, 1999; Wu & Anderson, 2007). L2 readers may not have self-corrected because they either failed to notice their errors or were unable to provide a repair. L2 oral reading is constrained by learners’ unstable language system and insufficiently developed processing skills (Segalowitz, 2010). With cognitive resources consumed by recognizing, segmenting, and pronouncing words, L2 readers may have no attention left to detect and fix errors, even when they sense a breakdown in comprehension (Francis, 1999). Besides lacking the competence to identify and correct certain errors, L2 readers may also be uncertain whether their oral reading is error-free because of their limited metalinguistic awareness (Kormos, 1999). Another possibility is that L2 readers judge for themselves the extent to which an error disrupts meaning, and choose not to correct errors that still allow them to make sense of the text (Kucer, 2017; Share, 1990).

The relationship between ORF and learner-perceived difficulty

Oral reading was viewed as a moderately difficult task (M = 6.47), and the difficulty rating was negatively correlated with accurate rate performance: participants who read texts slowly and laboriously were more likely to consider the task difficult. The accuracy and speed dimension of ORF reflects the efficiency of word recognition, and its relation with learner-perceived difficulty supports the view that problems in word recognition lie at the core of reading difficulties (Stanovich, 2000). It has been widely reported that at-risk L1 readers consistently experience difficulty in word recognition (Parker et al., 2015). Most L2 reading problems likewise lie in the processing of L2-specific linguistic forms such as words (Yamashita, 2001). This interpretation was further supported by participants’ statements that unrecognized characters/words were the main hurdle in oral reading.

The correlation between difficulty rating and chunking performance did not reach significance. It appears that segmenting adjacent character strings into meaningful words did not affect learners’ evaluation of how difficult oral reading was. Nevertheless, a number of participants commented that struggling with chunking added to the burden of oral reading and increased the task’s perceived difficulty.

Another challenge frequently mentioned by participants was difficulty in retrieving pronunciations. It has been posited that comprehension without recoding is more common in L2 reading (Jeon, 2012; Lems, 2006). Knowing a word’s meaning does not guarantee successful recoding when knowledge of its pronunciation is deficient. This is especially true in Chinese, where the lack of reliable script–sound correspondence makes retrieving a word’s pronunciation a challenging task (Hu, 2010).

Conclusion

The current study showed that L2 Chinese ORF consists of accuracy, speed, chunking, and monitoring components. The accuracy, speed, and chunking components were robustly related to comprehension, with accuracy and speed (accurate rate) emerging as the stronger predictor. The accuracy and speed dimension was also an indicator of learner-perceived difficulty of oral reading. These findings support the use of ORF as an effective tool for assessing comprehension and diagnosing reading problems in Chinese L2 reading. The accuracy, speed, and chunking aspects of ORF appear accurate enough to reflect comprehension performance and sensitive enough to detect reading difficulties. Moreover, ORF can be scored quickly and reliably.

Another implication of this research concerns the systematic development of ORF. Given the essential role of ORF in comprehension, ORF training should be incorporated into Chinese L2 reading curricula. Suitable pedagogical practices may include specially designed exercises targeting character/word recognition and segmentation, as well as integrated activities such as reading aloud, extensive reading, and repeated reading.

The present study has several limitations. Most participants were at a lower proficiency level, so it remains an open question how the relationship between ORF and comprehension evolves as learners’ overall language competence develops. Future studies that recruit advanced learners could generate more insight into the interaction between ORF and comprehension at varying proficiency levels. Another worthwhile avenue for future research is to identify the ORF benchmark a reader needs to reach to obtain adequate comprehension.