Introduction

Scientific expository texts are often supported by graphics that describe and explain scientific facts, concepts, and phenomena (Ainsworth, 1999). There are three common types of informational diagrams: representational, organizational, and explanatory (Carney & Levin, 2002). Representational diagrams, which depict what the main text describes with the aid of textual labels and brief descriptions, are common instructional diagrams in scientific texts written for young readers. The content of these diagrams provides engaging and easy-to-understand imagery of the structure or scenario covered by the main text, assisting readers in building a more coherent understanding. As such, readers who decode, organize, and integrate information from both the verbal and graphical elements of a text tend to attain better learning outcomes (Jian, 2017, 2022a, 2023; Jian et al., 2023; Mayer, 2003; Schnotz & Bannert, 2003; Wang & Jian, 2022). The reading behaviors and cognitive demands involved in reading illustrated texts are understandably different from those involved in reading pure text (Slough & McTigue, 2010). The spatial and temporal patterns of cognitive processing during reading can be inferred from eye movements with the use of eye-tracking technology (Rayner, 1998). Specifically, the eye movements of young readers reading multimedia texts have been examined to study the processing of diagrams in the text, as well as cross-references made between the main text and the diagrams (Jian, 2017; Hannus & Hyönä, 1999). However, little is known about the longitudinal development of text-diagram integration in young readers. The present study aimed to examine how text-diagram integrative processing patterns change in the upper elementary grades, during which learners are expected to acquire increasingly complex scientific knowledge from reading.

Integrative processing in multimedia reading

The multimedia effect, or the advantages of reading text with diagrams over text-only reading (Mayer, 2003), has been found to manifest in memory retention, deep comprehension, and inference generation on content (Butcher, 2006; Mayer & Anderson, 1992; Mayer & Gallini, 1990). In addition to the established relationship between multimedia reading and its learning outcomes, Mayer (2009) has also put forward a framework that describes the cognitive process of reading a text with supportive diagrams. The framework is based on the assumption of the dual-coding theory (Paivio, 1990), which states that visual and verbal information are processed in separate channels.

The Cognitive Theory of Multimedia Learning (Mayer, 2009) provides a framework for the mental processing of texts that contain both textual and graphical information, from the perspective of cognitive psychology and the information-processing model of cognition. First, a reader actively selects useful information from the text and the diagram and encodes it separately into verbal and visual working memory, respectively. Subsequently, the information in each channel is separately organized into a coherent mental representation before undergoing integration to form new knowledge. Because diagrams facilitate the building of coherent mental representations, empirical research has generally observed a positive association between integrative processing behavior and learning outcomes among young readers (e.g., Jian, 2017, 2021, 2022b; Jian et al., 2019; Hannus & Hyönä, 1999). However, how and when integrative processing emerges in young readers remains to be understood. Answers to these questions would allow researchers to assess whether text-diagram integration instruction needs to be promoted and at what stage in the development of content-area reading it should take place.

Eye movements in multimedia learning

Eye-tracking technology allows for the moment-by-moment process of reading to be captured and analyzed. Based on the eye-mind hypothesis, eye movements are indicative of the real-time cognitive processes involved in reading or watching (Just & Carpenter, 1980). Computational models such as E–Z Reader (Reichle et al., 1998) and SWIFT (Engbert et al., 2005) also explain how eye movements during reading are shaped by visual, linguistic, and cognitive processes. Furthermore, empirical studies have found that eye movements can reflect and predict general reading performance (Foster et al., 2017; Underwood et al., 1990). Specific indexes, such as decreased total fixation durations, could indicate high word frequency among adults and children (Juhasz & Rayner, 2003; Tiffin-Richards & Schroeder, 2015). The same eye-movement index was also associated with neural activations in the oculomotor and language areas when reading a text (Henderson et al., 2015). Eye movements can also be used to identify strategies to improve reading outcomes (Jian, 2017, 2022a, b, c; Lou et al., 2016).

In considering text-diagram integration, two eye-movement indicators have been adopted: the total fixation duration on the diagrams (Hannus & Hyönä, 1999; Mason et al., 2015a, 2015b) and the saccadic numbers between the text and diagrams (Jian, 2017). The former is commonly measured by the total duration of eye fixations in the diagram region, whereas the latter is gauged by the number of eye-fixation shifts between the text and diagram regions, covering both the temporal and spatial aspects of eye movements (Jian, 2022a, b, c; Lai et al., 2013). Based on these indicators, it was found, for example, that those with high reading abilities showed a higher degree of integrative processing (Jian, 2017; Hannus & Hyönä, 1999); abstract diagrams induced more integrative behaviors than concrete diagrams (Mason et al., 2013a); and integrative processing could be induced by appropriate guidance (Mason et al., 2015a).

Regarding reading in general, previous longitudinal eye-movement studies traced the development trajectories of successful and struggling readers alike in elementary students (e.g., Huestegge et al., 2009; Sperlich et al., 2016). Specifically, the growth of oculomotor skills, such as the precise execution of automatic and voluntary saccades (Klein & Forester, 2001; Kramer et al., 2005), provides a solid foundation for reading development. The development of both oculomotor and linguistic skills is behind the increase in perceptual span (i.e., the area around an eye-fixation where information is inspected) as reading ability progresses (Rayner, 1986). Longitudinal studies on higher-level reading processes have also contributed to the researchers’ understanding of the development of specific linguistic skills (e.g., Joseph & Liversedge, 2013; Tiffin-Richards & Schroeder, 2018). This study aims to expand on the existing findings on text-diagram integrative processing in children and to track its development across a developmental period during which basic oculomotor and linguistic skills are close to maturity, despite reading strategies still being in the formative stage.

Cluster analysis

Cluster analysis has been widely used as a statistical method for classifying individual observations into discrete clusters based on a collection of variables measured on each individual. In K-means clustering, the number of clusters is determined by the researcher prior to the analysis (Lloyd, 1982). This technique is also common in longitudinal studies on learning and reading to examine the multivariate change in individuals over time (e.g., Corpus & Wormington, 2014; Wei et al., 2015). One advantage of using cluster analysis to examine individual changes over traditional within-subject comparisons is that the former technically allows for a different set of measurements or variables on the same construct across time, whereas the latter is much more stringent in this regard. To examine changes in the reading process with longitudinal development, the present study selected one text that was identical throughout the three years and another set of texts that were new to the participants every year to avoid repetition and ensure that the text was appropriate to the grade level. As such, the researchers ran the cluster analysis on each grade level and text set for year-on-year comparisons.

The researchers employed a cluster analysis approach to identify eye-movement patterns adopted by previous research (Jian et al., 2019; Mason et al., 2013b). Mason et al. (2013b) used eye-movement data to identify the three reading types exhibited by fourth graders during text-and-diagram science reading: (1) intermediate integrators; these readers had a lower number of gaze shifts from text to diagram sections or in reverse, as well as a shorter re-fixation on the diagrams while re-reading the text. (2) Low integrators; these readers had the lowest level of integration of text and diagrams, as well as no re-fixations on the diagram while re-reading the text. (3) High integrators; these readers displayed a longer inspection time of the diagrams during the first encounter and more integrative eye transitions from text to diagrams, as well as in reverse. Among the three reading types, close to 50% of the participants were classified as high integrators.

Jian et al. (2019) also used eye-movement data to classify readers’ reading patterns. They found that sixth graders adopted four reading strategies during text-and-diagram science reading: (1) Initial-global-scan; students reading the science text and examining the science diagram for the first time tended to quickly scan the material, then read it carefully, and engage in saccade behavior. (2) Shallow-processing; students spent little time on the text or diagram during their first- and second-pass reading, and they also seldom engaged in saccade behavior. (3) Word-dominated; students spent a long time reading the text during the first-pass reading. (4) Diagram-dominated; students spent considerable time and effort on diagrams during the first-pass reading and outperformed the other three groups in the reading comprehension test. In contrast to the findings of Mason et al. (2013b), Jian et al. (2019) found that 58% of the participants were shallow-processing readers.

In sum, both of the above-mentioned studies reported that using cluster analysis to analyze eye-movement data was a useful way to identify the different reading strategies the readers employed, and the level of text-and-diagram transitions was the main factor in distinguishing among the different strategies. Nonetheless, the percentages of the dominant reading strategy for science text reading used by elementary school students differed between the studies (Jian et al., 2019; Mason et al., 2013b). This conflicting finding deserves to be examined in further detail.

The present study

This study focuses on examining whether elementary school students employ different reading strategies based on various levels of text-diagram integrative processing and whether these reading strategies are maintained or change across the three grade levels. This exploration is conducted using a cluster analysis of relevant eye-movement metrics. Furthermore, the study aims to answer the following research questions:

  • RQ1. Can the cluster analysis of the eye-movement metrics pertaining to text-diagram integration give rise to three discrete reading patterns, which the researchers termed “integrative” (actively engaged in text-diagram integration), “textual” (focused largely on the textual part only), and “shallow” (weak in both text-diagram integration and textual reading)?

  • RQ2. Based on multiple clustering analyses performed across the upper elementary levels, does text-diagram integration become more common as a reading strategy in higher grades?

  • RQ3. At the individual level, among readers who perform text-diagram integration, is the reading strategy maintained across the grade levels? The researchers assume that the reading strategy tends to be stable once it is formulated. However, the actual application of the reading strategy might depend on factors outside the scope of this study.

Methods

Participants

A total of 175 pupils (99 females; initial age: M = 10.26, SD = 0.59) from four primary schools in Taiwan successfully completed the study across three successive years from Grade 4 to Grade 6. Initially, 246 participants were included in this study. However, 16 of them failed to attain the required score on the reading comprehension test and were therefore not invited to the subsequent procedure, resulting in 230 children who successfully completed the procedure in Grade 4. Subsequently, in Grade 5, 38 participants were lost due to attrition, and another six were excluded due to poor calibration for eye-tracking. Lastly, in Grade 6, eight children did not return to the study, and another three were excluded due to poor eye calibration. Consequently, 175 participants remained with a complete set of data for three consecutive years. All participants were native readers of traditional Chinese and had normal or corrected-to-normal vision. Informed consent and assent were obtained from the parents and participants, respectively, each year before beginning the procedure.

Materials

Three scientific expository articles in traditional Chinese were read by the participants at each grade level. One of the articles, hereinafter known as the mantis set, was identical for all three grades; the other two articles differed in content and length across the grade levels. The mantis text consisted of two pages; the other two articles, also known as the main texts, were seven pages long (with different passages) at each grade level. All participants read these passages. However, the analysis in the current study was restricted to only two pages (more details below). These articles were taken directly from local science magazines intended for children and teenagers, namely Young Newton (Taipei: Newton Media) and Young Scientist Monthly (Taipei: Yuan Liou Publishing). The layouts of the passages were minimally altered so that they could be properly presented on a typical computer screen. Each article contained the main text, a heading, subheadings, illustrative diagrams (usually on the right or at the bottom of the page), and captions. The articles were reviewed by a professor specializing in reading and two primary school teachers with a master’s degree in science to ensure that their level of difficulty was suitable for the corresponding grade levels.

From these articles, two sets of reading materials were selected for the clustering analysis. Each set consisted, at each grade level, of one page of expository text on a particular science topic. The cluster analysis focused on a fraction of the original articles to simplify the analysis and to allow for the selection of relatively similar texts across grade levels. The first set of reading materials (known as the “mantis set” or “mantis text”) was identical across the grade levels, and its topic was the bodily features of mantises; the other set (known as “the assortment set”) contained a different passage for each grade, describing the biological characteristics of a dolphin (“dolphin text”), those of an octopus (“octopus text”), and the inner core of planet Earth (“Earth text”) for the respective grades. The selection of texts for the assortment set was based on the following requirements: (1) the text must contain one or two representational diagrams (depicting the structure of an object or organism covered in the main text); (2) it must contain two main paragraphs; (3) each diagram must correspond to the content of a particular paragraph; and (4) the diagram must contain textual labels on specific components of the object depicted. The mantis text also met all the requirements mentioned above. Other pages of reading materials were excluded from the present analysis because of deviations from the above requirements. The text in relation to Earth was slightly different from the rest because it contained a short introductory paragraph (60 characters long) before the two main paragraphs. This paragraph was excluded from the analysis because it had no direct correspondence with the diagram in terms of content. All the pages selected for the present analysis described the overall bodily structure of an animal (or the structure of the Earth for Grade 6), as well as the functions and operations of some of the key components. The unselected pages covered either a broader perspective, a specific feature, or a specific operation.

All the diagrams depicted the animals or objects described in the main text. Some of the labeled parts in these diagrams were mentioned in the main text, portraying the spatial relationships between these parts and others, as well as between the components in focus and the animal or object as a whole, which were otherwise abstract and difficult for mental visualization. As the diagrams were complementary to the main text in facilitating the understanding of spatial relationships among components, readers were expected to integrate information from both sources in the process of reading, rather than treating the visual and verbal sections as dissociated materials that were read separately. It should be noted that the text did not carry any instructions to refer to the diagrams during the course of reading, nor were there any instructions of a similar nature delivered by the experimenter.

Assessments

Reading comprehension was measured in Grade 4 using Ko’s (1999) Chinese reading comprehension screening test, administered as a screening test before the reading procedure. Participants who scored one SD below the mean were not invited to participate in the subsequent procedure. The internal consistency reliability was 0.82, and the test–retest reliability was 0.94.

Apparatuses

The eye movements during the reading task were monitored and recorded by a desktop-mount EyeLink 1000 video-based eye tracker (SR Research Ltd., Mississauga, Ontario, Canada) in Grade 4. In Grades 5 and 6, a head-mounted EyeLink Portable Duo from the same manufacturer, with a sampling rate of 1000 Hz, was used. A chin rest was used to stabilize the head, with approximately 65–70 cm between the participant and the 24-inch monitor screen, which had a resolution of 1920 × 1200 pixels. The screen covered 48° of horizontal and 30° of vertical visual angles.

Procedure

Data collection took place yearly from Grade 4 to Grade 6. In Grade 4, after parental consent and child assent were obtained, the reading comprehension test was first administered in groups, with a time limit of 30 min. In the eye-movement experiment, the participants were seated individually in a classroom, accompanied by a research assistant, and completed the eye tracker calibration and testing, followed by the reading task. The mantis text was read first, followed by the main texts. Participants were told to read the text at their own pace, and for the two main texts, they were informed that they would subsequently be tested on the content. During reading, one page of the text at a time was presented on a computer screen for a natural reading experience. The participants pressed a key on the keyboard for the next page upon completing a page and were not permitted to return to the previous page. This procedure was repeated for the main texts before the participants were released. The procedure for the eye-movement experiment was the same in the following two years.

Data processing

Areas of interest

The areas of interest (AOIs) were defined using the diagrams and main paragraphs in the selected texts. A rectangular boundary was drawn around each diagram and the main paragraph in the EyeLink Data Viewer to identify the fixation points within the area. Depending on the number of diagrams (all texts contained two main paragraphs), each text had either three or four AOIs.

Eye-movement indices

In a previous study (Jian et al., 2019), the researchers examined the strategies of reading scientific texts among Grade 6 students using cluster analysis, with the eye-movement indicators entered into the analysis comprising the first- and the second-pass total fixation durations (TFD) on the text and diagram regions, as well as cross-regional saccade count. This study shifted the focus to the longitudinal development of text-diagram integrative processing and followed a large group of Grade 4 students for three years.

In addition, the selection of eye-movement indicators departed from the conventional metrics of the first- and second-pass TFDs. The first-pass fixation run might capture only a brief episode (approximately 1–2 s) in which the reader intended merely to scan through the text before actually reading the region. The researchers therefore isolated the longest fixation run from all the fixation runs within a specific area and adopted its TFD as a measure that represented engaged processing. The longest fixation run for a specific area may or may not occur at the first entry into the region, which is what the first-pass TFD measure captures. This measure, the longest-run TFD, is more commonly known as the longest period of fixation in the literature (e.g., Bylsma et al., 1995; Shakespeare et al., 2015). In this study, it refers to the total fixation duration of the longest fixation run on a diagram or paragraph AOI. While it has been established that a long TFD in general represents more cognitive effort in processing the information within a specific area (Hannus & Hyönä, 1999; Hegarty & Just, 1993; Miller, 2015), a larger value of the longest-run TFD reflects more sustained cognitive effort in processing the target AOI without the mind wandering or jumping to another AOI for cross-references.

The second set of eye-movement metrics used in the cluster analysis was the remaining TFD. It refers to the total fixation duration of all fixation runs other than the longest run on a specific AOI (i.e., overall TFD minus longest-run TFD). Similar to the longest-run TFD, a larger value of the remaining TFD generally reflects high overall cognitive efforts. The ratio of the longest-run TFD to the remaining TFD could indicate an uninterrupted, prolonged reading of the specific area, the opposite, or anything in between. A high longest-run TFD paired with a low remaining TFD suggests that the reader is devoted to processing the region in one step. In contrast, a low longest-run TFD paired with a high remaining TFD suggests that the reader is devoted to processing the region in separate episodes, likely having made cross-references to other regions. Readers with a high longest-run TFD and a high remaining TFD are likely to have intensively inspected the region, as well as made extensive cross-references to other areas in the text.
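The two duration measures described above can be illustrated with a minimal sketch. It assumes that each fixation has already been assigned to an AOI and that a fixation run is a sequence of consecutive fixations on the same AOI; the fixation sequence, the AOI labels, and the function names are hypothetical and do not come from the study's data or software.

```python
from itertools import groupby

def run_durations(fixations, aoi):
    """Return the summed duration of each uninterrupted fixation run on `aoi`.

    `fixations` is a chronological list of (aoi_label, duration_ms) pairs.
    Assumption: a run ends as soon as a fixation lands on any other label.
    """
    runs = []
    for label, group in groupby(fixations, key=lambda fix: fix[0]):
        if label == aoi:
            runs.append(sum(duration for _, duration in group))
    return runs

def longest_and_remaining_tfd(fixations, aoi):
    """Return (longest-run TFD, remaining TFD) for one AOI, in ms."""
    runs = run_durations(fixations, aoi)
    if not runs:
        return 0, 0
    longest = max(runs)
    # Remaining TFD = overall TFD minus longest-run TFD.
    return longest, sum(runs) - longest

# Hypothetical fixation sequence: (AOI label, duration in ms).
fixations = [
    ("para1", 200), ("para1", 250),
    ("diagram", 300),
    ("para1", 400), ("para1", 350), ("para1", 300),
    ("para2", 220),
    ("diagram", 180), ("diagram", 260),
]

for aoi in ("para1", "para2", "diagram"):
    print(aoi, longest_and_remaining_tfd(fixations, aoi))
```

In this hypothetical sequence, the first paragraph is visited in two runs (450 ms and 1050 ms), so its longest-run TFD is 1050 ms and its remaining TFD is 450 ms.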

The third set of eye-movement indicators entered into the cluster analysis was the number of saccades between text and diagram. It refers to the number of eye-fixation shifts between paragraphs and a specific diagram. A regression specifically refers to an eye-fixation shift sequence that involves a shift away from any one of the paragraphs to a specific diagram, followed by a returning shift from the diagram to any one of the paragraphs. This index reflects attempts to integrate information from both the text and diagrams (Hannus & Hyönä, 1999; Hegarty & Just, 1993; Johnson & Mayer, 2012; Mason et al., 2013b). This indicator constitutes direct evidence of cross-references between the text and diagrams.
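As a minimal sketch of this count, the function below tallies every direct shift between a paragraph AOI and one diagram AOI in a chronological sequence of fixated regions. The AOI labels and the sequence are hypothetical, and the treatment of fixations that fall outside every AOI is an assumption (they are simply not part of the sequence here).

```python
def count_text_diagram_shifts(aoi_sequence, text_aois, diagram_aoi):
    """Count fixation shifts between any paragraph AOI and one diagram AOI.

    Both directions are counted: paragraph -> diagram and diagram -> paragraph.
    Shifts between two paragraphs, or within one AOI, are ignored.
    """
    count = 0
    for previous, current in zip(aoi_sequence, aoi_sequence[1:]):
        text_to_diagram = previous in text_aois and current == diagram_aoi
        diagram_to_text = previous == diagram_aoi and current in text_aois
        if text_to_diagram or diagram_to_text:
            count += 1
    return count

# Hypothetical chronological sequence of fixated AOIs.
sequence = ["para1", "para1", "diagram", "para1",
            "para2", "diagram", "diagram", "para2"]

shifts = count_text_diagram_shifts(sequence, {"para1", "para2"}, "diagram")
print(shifts)  # 4 shifts: two into the diagram and two back to the text
```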

Cluster analysis

The K-means cluster analysis was performed on the selected eye-movement indicators for each of the two sets of reading materials and for each grade using the algorithm by Hartigan and Wong (1979) in R (version 4.1.2). In other words, the two texts at each grade level were clustered separately. For each cluster analysis, the number of clusters was pre-set to three, which presumably covered (i) a group that showed a strong tendency to read the text intensively (i.e., “textual group”), (ii) a group that was inclined to perform cross-references between the text and the diagrams (i.e., “integrative group”), and (iii) a group that fell into neither of the above, which the researchers termed the “shallow group.” Note that, for example, assignment to the textual group did not preclude text-diagram cross-referencing, and the opposite was true for the integrative group. The corresponding tendencies were construed in relative terms.

The set of eye-movement indices entered for clustering showed only slight variations across grade levels, owing to the differing number of diagrams. In Grade 4, as the reading material consisted of two main paragraphs and one diagram, a total of seven eye-movement indices were entered for clustering. These included the longest-run TFD for the first paragraph, the second paragraph, and the diagram, as well as the remaining TFD for the three areas, along with the number of saccades between text and diagram. In Grade 5, while the reading material maintained the same number of main paragraphs as Grade 4, it featured an additional diagram, resulting in a total of four AOIs and ten eye-movement indices for the cluster analysis: four longest-run TFD measures and four remaining TFD measures, with one dedicated to each area, as well as two numbers of saccades between text and diagram measures, one for each diagram. Grade 6, with only one diagram in the reading material, shared an identical set of eye-movement indices as Grade 4. Before clustering, all indices were rescaled. The algorithm was configured with 10 initial random centroids for better data separation.
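The clustering step can be sketched as follows. This is an illustration only: the study used the Hartigan and Wong (1979) algorithm in R, whereas the sketch below uses the simpler Lloyd iteration in plain Python. The z-score rescaling, the data values, and all function names are assumptions made for the example.

```python
import random
from statistics import mean, stdev

def zscore_columns(rows):
    """Rescale every index (column) to mean 0 and SD 1 (assumed rescaling)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [stdev(col) or 1.0 for col in columns]  # guard against zero SD
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(rows, k, rng, max_iter=100):
    """One K-means run (Lloyd iteration) from k randomly chosen observations."""
    centroids = rng.sample(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(row, centroids[j]))
                      for row in rows]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [mean(col) for col in zip(*members)]
    within_ss = sum(sq_dist(row, centroids[lab]) for row, lab in zip(rows, labels))
    return labels, within_ss

def kmeans(rows, k=3, n_start=10, seed=0):
    """Keep the run with the smallest within-cluster sum of squares."""
    rng = random.Random(seed)
    return min((lloyd(rows, k, rng) for _ in range(n_start)),
               key=lambda result: result[1])

# Hypothetical readers: [longest-run TFD on text (ms),
#                        remaining TFD on text (ms),
#                        text-diagram saccade count]
data = [
    [3000, 1000, 1], [3500, 1200, 2], [2800, 900, 1],      # low on everything
    [15000, 2000, 2], [16000, 2500, 1], [14500, 1800, 2],  # long single runs
    [6000, 9000, 10], [6500, 9500, 12], [5800, 8800, 11],  # many cross-references
]

labels, within_ss = kmeans(zscore_columns(data), k=3, n_start=10)
print(labels, round(within_ss, 3))
```

Running several random starts and keeping the best solution matters because a single K-means run can settle in a local optimum, which is presumably why the analysis was configured with 10 initial sets of centroids.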

Results

The cluster analysis on reading strategy patterns

Considering the overall performance of the clustering for the mantis text, the ratio of the between-cluster sum of squares to the total sum of squares, an indicator of the goodness of classification, was 0.31 for Grade 4, 0.32 for Grade 5, and 0.34 for Grade 6. Regarding the assortment set, the same ratio was 0.37 for Grade 4, 0.30 for Grade 5 (with a larger set of indices), and 0.35 for Grade 6.
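For reference, this goodness-of-classification ratio can be computed from the data and the cluster labels alone. The sketch below assumes the usual decomposition (total sum of squares = within-cluster + between-cluster sum of squares); the one-dimensional data and the function names are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    return [sum(col) / len(col) for col in zip(*rows)]

def between_to_total_ss(rows, labels):
    """Ratio of between-cluster to total sum of squares (0 = no structure, 1 = perfect)."""
    grand_mean = centroid(rows)
    total_ss = sum(sq_dist(row, grand_mean) for row in rows)
    within_ss = 0.0
    for label in set(labels):
        members = [row for row, lab in zip(rows, labels) if lab == label]
        center = centroid(members)
        within_ss += sum(sq_dist(row, center) for row in members)
    return (total_ss - within_ss) / total_ss

# Hypothetical one-dimensional example with two tight clusters.
rows = [[0.0], [2.0], [10.0], [12.0]]
labels = [0, 0, 1, 1]

ratio = between_to_total_ss(rows, labels)
print(round(ratio, 4))  # total SS = 104, within SS = 4, ratio = 100/104
```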

As expected, the clustering analysis for all the grade levels in both the mantis and assortment sets returned one cluster with below-mean values on all indices, another cluster with higher longest-run fixation durations for paragraphs and a lower number of saccades between text and diagram, and a third cluster with higher remaining fixation durations for paragraphs and a higher number of saccades between text and diagram. These clusters were labeled as the “shallow,” “textual,” and “integrative” groups, respectively (Table 1). Please refer to Tables 2 and 3 for the eye-movement metrics of each cluster for the mantis set and the assortment set, respectively. Significance tests on the eye-movement indices of the three clusters were performed to examine the characteristics of each group; the results supported the group labels. For further details, please refer to the Appendix.

Table 1 The number of participants by grade level and by reading behaviour group as revealed by K-means clustering on selected eye-movement indicators, for both the mantis text and the assortment set, which consists of a different reading text for each grade level
Table 2 Mean and SD of eye-movement indicators of the three clusters (integrative, textual and shallow) by grade level for the mantis text
Table 3 Mean and SD of eye-movement indicators of the three clusters (integrative, textual and shallow) for the assortment set

Considering the distribution of the students among the three groups (Fig. 1), on the mantis text, a chi-square test for independence on the cross-tabulation of the number of pupils by cluster and grade level showed that the student distribution among the three groups was associated with the grade level, χ²(4) = 13.40, p = 0.01, Cramér’s V = 0.11 (small). The post-hoc pairwise chi-square tests for independence, using the false discovery rate approach to correct for multiple comparisons, revealed a significant difference in student distribution between Grades 5 and 6 only, p = 0.03. The cross-tabulation showed that as pupils were promoted from Grade 5 to Grade 6, the number of pupils in the integrative group and the textual group increased by 63% and 29%, respectively; however, the number of pupils in the shallow group decreased by 23%. Regarding the stability of group membership throughout the three years, a total of 46, 4, and 3 students remained in the shallow, textual, and integrative groups, respectively, from Grade 4 to Grade 6. These students constituted 45.54%, 9.68%, and 9.09% of their respective groups in Grade 4. Overall, 69.89% of students were assigned to different groups over time.

Fig. 1 Changes in group assignment amid grade-level progression for the two text sets

Considering the assortment set, the participant distribution of the three clusters was stable across grade levels. The shallow group remained the largest group, consisting of 108, 112, and 117 (out of a total of 176) children in Grades 4, 5, and 6, respectively. It was followed by the textual group (39, 33, and 27 children) and the integrative group (29, 31, and 32 children). A chi-square test for independence revealed that the association between cluster and grade level was nonsignificant: χ²(4) = 2.70, p = 0.61. However, the stable group sizes did not mean that the same participants stayed in the same groups from one grade to another. Sixty-four out of the 117 pupils assigned to the shallow group in Grade 6 remained in the same cluster throughout the three grade levels; the same was true for eleven out of the 27 children in the textual group, and only one out of the 32 students in the integrative group was consistently assigned to that group. Changes in group assignment occurred over time in 56.25% of participants. The changes in group assignments according to grade level for the two text sets are shown in Fig. 1.
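The chi-square test for independence on the assortment set can be reproduced from the group sizes given above. The sketch below is a plain-Python illustration, not the authors' R code; the function names are hypothetical, and the p-value uses the closed-form survival function that holds only for an even number of degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic, degrees of freedom, and N for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

def cramers_v(chi2, n, n_rows, n_cols):
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

# Rows: Grades 4, 5, 6. Columns: shallow, textual, integrative (counts from the text).
assortment = [
    [108, 39, 29],
    [112, 33, 31],
    [117, 27, 32],
]

chi2, df, n = chi_square_independence(assortment)
p_value = chi2_sf_even_df(chi2, df)
print(round(chi2, 2), df, round(p_value, 2))  # 2.7 4 0.61, matching the reported test
print(round(cramers_v(chi2, n, 3, 3), 2))
```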

The correspondence of the group assignment between the mantis set and the assortment set appeared to fare slightly better than between-grade consistency: 63.07%, 68.75%, and 59.09% of the participants were assigned to the same cluster in Grades 4, 5, and 6, respectively.

As an extension of the cluster analysis, omnibus tests were conducted to examine whether the standardized reading comprehension test scores in Grade 4 differed among the three clusters. The Kruskal–Wallis tests showed no significant differences for any of the three grade levels or either of the two text sets, ps > 0.05, suggesting that the reading strategies were unlikely to be related to the baseline reading comprehension abilities.

Discussion

This study aimed to expand the understanding of the development of text-diagram integrative processing in the reading of illustrated expository science texts among upper elementary students as they transition to the read-to-learn stage. The researchers examined online processing using eye-movement indicators that reflect integrative processing involving the verbal and visual regions of the text. According to Mayer’s (2009) CTML, a cohesive mental model could arise as a result of processing both the text and diagrams that were complementary to each other. A selection of eye-movement metrics that comprised both the temporal and spatial dimensions (the longest-run TFD of the textual and diagram regions, the remaining TFD of the same regions, and the eye-fixation regression to and from the diagram region) was deployed to measure the degree of integrative processing. While the eye-fixation regression between the textual and visual regions is a more direct measure of integrative attempts, the pairing of the longest-run TFD and the remaining TFD of the textual and diagram regions revealed the overall level of cognitive effort, the relative level of engagement in the respective regions, and the extent to which the engagement was interrupted by attempts to perform cross-references between the regions. The cluster analysis was conducted to identify the patterns of integrative processing behaviors.

The researchers’ first research question asked whether the cluster analysis of the eye-movement metrics pertaining to text-diagram integration could give rise to three discrete reading patterns, which the researchers termed integrative, textual, and shallow. The study’s results supported this hypothesis. Specifically, the shallow group tended to show shorter fixation durations for both textual and diagram regions, as well as a low number of fixation transitions between the two regions. This group of readers displayed no strong preference for either text or diagrams. The second type, the textual group, engaged intensively in text reading, characterized by lengthy fixation durations and a high proportion of fixations for the longest fixation run, suggesting a long attention span and processing time within textual regions. The third type, the integrative group, tended to perform cross-references between the textual and diagram regions, showing longer fixation durations for the diagrams, more spread-out fixation durations across fixation runs for the textual regions, and a higher number of fixation transitions between the textual and diagram regions. The researchers’ findings supported this categorization of reading strategies with text-diagram integration, using the selected eye-movement metrics, including the TFDs of the longest fixation run for the textual and diagram regions, the remaining TFDs for each of the regions, and the number of saccadic movements to and from the diagrams. Specifically, the adoption of the longest-run TFD and the remaining TFD for the verbal and visual regions as indicators of the degree of attentional focus and uninterrupted processing for a region, in contrast to frequent cross-references performed between regions, was shown to be a possible alternative to the first-pass and the second-pass TFDs (as in Jian et al., 2019 and Mason et al., 2013b), which could be affected by a quick initial scanning strategy.
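To make the idea of grouping readers by their eye-movement profiles concrete, a small sketch follows. The clustering procedure itself is not restated in this section, so k-means is used here purely as an assumed stand-in; the function names, the three input features, and the six reader profiles are all hypothetical. Each feature is standardized first, so that group membership depends on values relative to the cohort rather than on absolute thresholds.

```python
import math


def zscore_columns(rows):
    """Standardize each column to mean 0 and SD 1 within the cohort."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    sds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def kmeans(points, k=3, iterations=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = []
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return labels


# Hypothetical readers: (text TFD in s, diagram TFD in s, transitions).
readers = [
    (20, 5, 2), (22, 6, 3),       # low on every indicator
    (60, 6, 3), (62, 5, 2),       # heavy text processing only
    (40, 25, 12), (42, 27, 13),   # frequent cross-referencing
]

labels = kmeans(zscore_columns(readers), k=3)
print(labels)
```

The numeric labels returned by such a procedure carry no meaning by themselves; naming a cluster "shallow", "textual", or "integrative" requires inspecting the average profile of its members.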

Instead of using the cluster analysis to divide participants solely by their degree of text-diagram integration based on eye movements during reading, as in Mason et al. (2013b), the present study also considered a reading strategy that involved little integrative processing but heavy textual processing. This group of readers showed strong motivation and intention to actively process the text, but owing to a lack of awareness of the importance of diagrams or inadequate visual literacy (McTigue & Flowers, 2011), they tended to neglect the diagrams during the reading process. The researchers believe that by distinguishing this group from those with low processing in the text and diagrams, the findings on text-diagram integration could be more informative. This is also the crux of the previous study (Jian et al., 2019), in which the researchers identified text- and graphic-oriented readers that informed the analysis of this study. Despite the different study designs and analytical methods, Mason et al. (2013b) identified half of their Grade 4 readers as high in integrative processing, while Jian et al. (2019), whose cluster analysis yielded four clusters, categorized only 10% of the Grade 6 participants as graphic-oriented. Meanwhile, around one-fourth or less of Grade 4–6 readers were in the integrative group in the present study. Overall, upper elementary students who showed relatively satisfactory text-diagram integration remained in the minority, and their proportion was particularly limited when the analysis included a text-oriented group, even though the clustering imposed no objective thresholds or requirements. Nevertheless, the present study conducted a separate cluster analysis on the selected indicators of text-diagram integration for two different text sets each year. The clustering results showed a higher consistency in the group assignment between the texts within each grade than across the grade levels, thereby lending support to the value of the clustering analysis in identifying integrative reading strategies among the readers.

The second research question was whether text-diagram regression as a reading strategy would become more common as students were promoted to upper grades. The cluster analysis revealed that a relatively small proportion of students either focused on the textual sections or showed a tendency to integrate information from the textual and graphical regions. The shallow group comprised the largest number of students across the grade levels and text sets (52–67%), with the textual group (16–25%) and the integrative group (17–28%) trailing behind. The proportion of each cluster did not vary significantly, and there is a lack of evidence to support the hypothesis that integrative processing became more common as the participants’ reading abilities advanced, especially at the upper elementary level. Apparently, the majority of young readers had not fully developed the competency to integrate verbal and visual information in a typical expository science text. Notably, CTML (Mayer, 2003) assumes that the verbal and visual components of a text attract almost equal attention and allow for similar levels of cognitive processing, or at least does not address the issue of different reading strategies. The researchers’ findings suggest that at least one-fourth of young readers across the grade levels approached a multimedia text in much the same way as a text-only passage. They may have been uninterested in the pictures or failed to apply appropriate strategies to multimedia learning, assuming that the best strategy was to focus on the text and carefully read it word-by-word from beginning to end.

The third research question asked whether students who were actively engaged in text-diagram regression in lower grades would continue to remain so when promoted to upper grades. The results showed that affiliation with reading strategy groups was far from stable across grade levels, with no clear pattern of change in the reading strategy over time. Similar patterns of results were observed both for the reading material that remained constant and for the materials that changed across the grade levels, so the shift in reading strategies with regard to text-diagram integration is unlikely to be attributable to the topic or content. Examining the possible factors behind the reading strategy shift is beyond the scope of this study; however, the results point to the fluid development of strategies for reading science texts that contain graphics and diagrams. The young readers appear to have applied reading strategies inconsistently when approaching such a reading genre, even as their read-to-learn abilities reached maturity (Wanzek et al., 2010). It could take a longer time frame (for example, well into secondary education) for more learners to master the skills of text-diagram integration during the course of reading, although the skills might also be lacking even among undergraduates (Cromley et al., 2010).

Overall, the cluster analysis based on the related eye-movement indicators returned three types of reading strategies in relation to multimedia learning: showing satisfactory text-diagram cross-references, focusing on paragraphs only, and neither of the above. Regarding the developmental aspect of integrative strategies, the analysis provided no evidence of a gradual improvement of text-diagram integration from Grade 4 to Grade 6; the reading strategies deployed by young readers were not very stable over time either. The young readers’ use of reading strategies in science expository texts could be subject to motivational and incidental factors.

Conclusion and research contributions

With the objective of tracking the longitudinal development of text-diagram integration in science reading among senior elementary school students, the researchers set out to investigate three research questions. The first question asked whether the cluster analysis on selected eye-movement metrics could classify students into three reading strategy groups. The method did indeed enable the researchers to classify students into “integrative” (active in text-diagram integration), “textual” (engaged in largely the textual part only), and “shallow” (weak in both text-diagram integration and textual reading) groups. The analysis showed that the reading process for illustrated science texts gave rise to these three strategy patterns at each grade level and for each text set. In terms of the eye-movement indicators used for the cluster analysis, in addition to the number of fixation regressions to and from the diagram areas, which is a more direct and typical measure of text-diagram integration, the researchers adopted the longest-run TFD and the remaining TFD for the textual and diagram regions. These indicators measured the relative efforts in processing verbal and visual information, which enabled the researchers to identify students who actively performed cross-references or focused on the text while largely neglecting the diagrams. At the same time, the overall cognitive efforts devoted to the verbal and visual information could also be inferred from the grand total fixation durations for the respective regions (i.e., the combination of the longest-run TFD and the remaining TFD). For the second research question, the a priori speculation that text-diagram integration would become more prevalent as elementary school students progress from Grade 4 to Grade 6 was not supported by the results. Judging from both the output of the cluster analysis and the indicative eye-movement metrics alone, text-diagram cross-references did not show an increasing trend during the upper elementary grades. As for the third research question, the speculation was that those who performed text-diagram integration in Grade 4 would subsequently maintain the reading strategy. The results showed that individual students did not consistently adopt the strategy of text-diagram integration as their reading ability and general knowledge matured. Text-diagram integration is yet to become habitual behavior for readers in upper elementary grades.

This study has several research contributions and pedagogical implications. Regarding reading in general, previous longitudinal eye-movement studies traced the development trajectories of successful and struggling readers alike among elementary students (e.g. Huestegge et al., 2009; Sperlich et al., 2016). Specifically, the growth of oculomotor skills, such as the precise execution of automatic and voluntary saccades (Klein & Forester, 2001; Kramer et al., 2005) provides a solid foundation for reading development, and the development of both oculomotor and linguistic skills is behind the increase in the perceptual span (i.e., the area around an eye-fixation where information is inspected) as the reading ability develops (Rayner, 1986). The longitudinal studies on higher-level reading processes have also contributed to the understanding of the development of specific linguistic skills (e.g., Tiffin-Richards & Schroeder, 2018). This study expanded on the existing findings regarding text-diagram integrative processing in children and tracked its development across a period of development wherein the basic oculomotor and linguistic skills are close to maturity; however, the reading strategies during this period are still in the formative stage. As the first study to investigate the longitudinal development of text-diagram integration, it has identified the need to conduct further research into the reading strategy at a more advanced learning stage, where both reading skills and metacognitive skills become even more proficient.

Regarding pedagogical implications, only a relatively small proportion (i.e., around one-fourth) of the upper elementary students in this study referred to both the relevant textual and pictorial information in the scientific article. The researchers therefore suggest that teachers guide children to perform cross-references between the text and the relevant diagrams, instead of focusing heavily on the text or ignoring the supportive diagrams. Visual literacy should form an important part of reading instruction (Jian, 2017, 2020, 2021, 2022a, c; Wu et al., 2021; Yang, 2017), in order to ensure that young readers understand how common illustrative diagrams, such as representational diagrams, should be approached, and how they can be utilized in conjunction with the main text to build mental representations. The misconception that a good reader attentively follows the main text word-by-word from beginning to end, even for an illustrated text, should be addressed. The ability to employ reading strategies appropriate to the genre, subject matter, and purpose of reading should become a part of content-area instruction (Ainsworth, 2006; Kuo & Jian, 2022).

Limitations

This study has several methodological limitations. Firstly, the findings on the assortment set could be subject to the selection of texts that vary in topic and content complexity, as the present study aimed to examine reading in a more naturalistic setting. Meanwhile, the issue with the mantis set was that the same text was presented repeatedly over time, which might partially explain the overall decline in fixation duration from Grade 4 to Grade 6. In addition, clustering is based on the relative values of input variables among the selected cohort of participants, rather than on reliable thresholds or population-based reference values.