Introduction

Reading comprehension is essential to learning new knowledge in content areas and is sustained by various cognitive, motivational, and contextual factors (Alexander, 2012; Bråten, Ferguson, Anmarkrud, & Strømsø, 2013; Kim, Petscher, & Foorman, 2013; Taboada, Tonks, Wigfield, & Guthrie, 2009). Printed or digital textbooks and websites accessed as information sources contain texts accompanied by various kinds of visual displays to support learning: diagrams, graphs, photographs, charts, maps, etc. Successful comprehension of these materials requires comprehension of multiple external representations, which have potential benefits (Ainsworth, 2006).

The multimedia principle states that comprehension is better when learning from text and pictures, rather than from text alone (Mayer, 2009). Empirical research has documented that texts accompanied by visuals are more effective than non-illustrated texts (e.g., Butcher, 2006; Mason, Pluchino, Tornatora, & Ariasi, 2013c) regardless of the domains of study, whether presentation formats are paper or digital, and whether assessment is for retention or transfer of knowledge (see Butcher, 2014; Eitel & Scheiter, 2014, for recent reviews). In particular, some graphical reading processes are correlated with comprehension measures (Norman, 2012). Research has also shown that students’ metacognitive judgments reflect their belief that they learn better from texts with diagrams than from texts alone, even when visuals are not effective (Serra & Dunlosky, 2010). Students may believe that they comprehend pictures easily as they are processed faster than written texts (Schroeder et al., 2011). Students may also skip over relevant visuals when interacting with a biology text which includes complex diagrams, although they are able to engage in high-level cognitive activity when they do read the diagrams (Cromley, Snyder-Hogan & Luciw-Dubas, 2010a, 2010b).

What underlies the beneficial effects of multimedia instructional materials? Through the current study we aimed to extend previous research providing evidence that what uniquely contributes to the successful comprehension of an illustrated science text is the integrative processing of verbal and graphical information. This takes place during the delayed and more purposeful re-reading of the instructional material. To this aim we used eye-tracking methodology in the context of a lower-secondary school to trace students’ verbal and graphical information processing as revealed by multiple indices of visual behavior while interacting with an illustrated science text.

Multimedia principle and comprehension of text and picture

Two theoretical accounts may explain the potentially beneficial effects of multimedia materials. The first is the cognitive theory of multimedia learning (Mayer, 2009, 2014). According to this theory three essential processes lead to the comprehension of verbal and graphical information: selection, organization and integration. The selection process leads to the extraction of relevant words from the text and relevant elements from the picture. During the organization process the selected material is processed further for comprehension and retention of textual and graphical information. This process results in the construction of a verbal model and a pictorial model. The last process implies connecting these two models with each other and with relevant prior knowledge retrieved from long-term memory to form a coherent mental representation.

The second theoretical account of the potential benefits associated with an illustrated text is the integrated model of text and picture comprehension (Schnotz, 2014; Schnotz & Bannert, 2003). According to this model, dual coding applies to the processing of both texts and images, and the different principles of representation complement each other. For text comprehension, constructive processes based on schemata with both selective and organizational functions lead to a structured propositional representation. A mental model from a mental representation of the text surface structure is also formed. Similar processes occur for picture comprehension starting from the visual perception of the picture and resulting in a mental model and a propositional representation of the content via high-order cognitive processing. The formation of a coherent mental model of an illustrated text relies on structural mapping processes involving the propositional representation and the mental model, in both text and picture comprehension.

According to both theoretical accounts, integration processes are crucial to learning from texts and pictures, once relevant information has been selected and organized. It is worth noting that the integration of verbal and graphical information may concern not only the text segments that correspond precisely to the graphical segments, but also the non-corresponding segments. For example, when a student reads about condensation in a text regarding the water cycle, s/he may need to look at the depiction of evaporation to understand better the difference between the two phenomena, or to connect different but relevant segments of the two (verbal and graphical) representations.

If successful comprehension of an illustrated text implies the integration of verbal and graphical information, it seems particularly relevant to examine when integrative processing occurs and whether it uniquely predicts learning from text over and above individual characteristics. In this regard, eye-movement recording is a useful methodology to trace the time course of information processing and to attain quantitative and objective indices of visual behavior during reading (Rayner, Chace, Slattery & Ashby, 2006).

Processing of text and picture: evidence from eye-tracking data

Eye-tracking methodology has received increasing attention in research on multimedia learning (van Gog & Scheiter, 2010; Mayer, 2010; Hyönä, 2010). Several eye-tracking studies have contributed to unravelling aspects of university students’ text and picture processing (e.g., Eitel, Scheiter, Schüler, Nyström & Holmqvist, 2014; Hegarty & Just, 1993; Johnson & Mayer, 2012; Stalbovs, Eitel & Scheiter, 2013). However, only a few investigations have focused on text and picture processing in younger students. A pioneering study was carried out by Hannus and Hyönä (1999) with 10-year-old students learning biology textbook materials. Eye-fixation data showed that the readers attended only marginally to the graphical representations and that their comprehension was largely driven by the text. High-ability students, however, attended to the pertinent segments of the verbal and visual material for relatively more time (experiment 2).

Recently, Mason, Pluchino and Tornatora (2013b) examined the effects of reading a science text illustrated by either a labelled or an unlabelled picture in 6th graders. It emerged that the former promotes more integrative processing of the verbal and graphical parts of learning material, as revealed by the time spent re-inspecting the picture while re-reading the text and vice versa. In addition, integrative processing correlated with scores for factual knowledge and transfer of knowledge.

Another study focused on the role of a concrete and an abstract picture in illustrating a science text to 11th graders. The concrete picture was a contextualized representation of the scientific concept introduced in the text, where the concept of an inclined plane was depicted in a mountain scenario. The abstract picture was a decontextualized representation as the inclined plane and descending body were depicted schematically without using a realistic scenario. It emerged that the participants processed the verbal information more efficiently and made a greater effort to integrate it with the pictorial information when reading the text accompanied by an abstract, rather than a concrete illustration. Moreover, some indices of integrative processing during the second-pass reading, as revealed by the frequency of transitions (gaze shifts) from text to illustration and vice versa, correlated with learning outcomes (Mason et al., 2013c).

A recent eye-tracking study examined the strategies used by fifth and eighth graders when dealing with texts and pictures. It revealed that they serve different functions associated with different processing strategies. Texts seem to be used for coherence-oriented general processing. Pictures can act as scaffolds for initial mental model construction and then for task-driven selective processing when necessary to update mental models of specific items (Schnotz et al., 2014).

Particularly pertinent to the present investigation is the study carried out by Mason, Tornatora and Pluchino (2013d). Using multiple indices, they identified patterns of eye movements in 4th graders who learned new knowledge from a text and picture on the topic of air. Better learning performances were associated with the pattern characterized by longer total fixation time on the picture, and greater integrative processing of verbal and graphical information. It is worth noting that the authors have distinguished indices of first- and second-pass, but they have considered together both types of index when identifying patterns of visual behavior during reading. Therefore, they did not indicate which processing—immediate, delayed or both—was essential to reading outcomes.

The current study

To add to the existing literature, this open issue was addressed in the current study, examining the immediate and delayed effects of reading processing separately. Theoretically, we took into consideration the strategy proposed by Bartholomé and Bromme (2009) to promote the construction of an integrated coherent representation of text and graphics. Based on the cognitive processes envisioned in the Mayer (2009) and Schnotz and Bannert (2003) theoretical accounts, this strategy includes three steps of text and picture processing in which the latter is conceptually guided by the former. First, readers process the whole text to identify central concepts. Second, readers inspect the picture using text information to direct it in order to identify the visualizations of the central concepts of the text. This step also implies making correspondences between the verbal and graphical representations, shifting from one to the other. Third, readers continue relating the two types of representation and then focus on the verbal parts that are not depicted, since text and pictures can be mapped only partially.

Methodologically, we found eye tracking to be a very useful technique: initial reading or inspection can be separated from later re-processing. In this respect, the first step of the strategy mentioned above implies initial or first-pass reading, the second step implies initial or first-pass inspection and then re-processing or second-pass reading and inspection of verbal and graphical information, which continues during the third step.

The first-pass reading or inspecting is considered to reflect early processing. It is the summed duration of all fixations on a target region before exiting it. The second-pass reading or inspecting is the summed duration of fixations that return to the target region after its first-pass reading. The second-pass reading is considered to reflect delayed processing, which can indicate, on the one hand, the readers’ attempts to resolve comprehension difficulties during reading (Rayner, 2009) and, on the other, a more purposeful reading behavior than the first-pass (Hyönä, Lorch & Kaakinen, 2002; Hyönä & Nurminen, 2006). Indices of second-pass reading can be further categorized on the basis of their destination and origin (see below the section on eye-movement measures). Shedding more light on the integrative processing of verbal and graphical information, especially on whether it is the only type that predicts various forms of learning from an illustrated text, would have theoretical and practical significance.
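The distinction between the two passes can be illustrated with a minimal sketch. It assumes that a reading session is available as an ordered list of fixations, each tagged with the area it landed on and its duration in milliseconds; the function name, the labels (T1, T2, P1) and the scanpath are hypothetical and do not come from the study's software.

```python
from typing import List, Tuple

Fixation = Tuple[str, float]  # (area label, fixation duration in ms)


def pass_durations(fixations: List[Fixation], target: str) -> Tuple[float, float]:
    """Return (first_pass_ms, second_pass_ms) for one target region."""
    first_pass = 0.0
    second_pass = 0.0
    entered = False  # the gaze has been on the target at least once
    exited = False   # the gaze has left the target after first entering it
    for area, duration in fixations:
        if area == target:
            if not exited:
                entered = True
                first_pass += duration   # still the first visit
            else:
                second_pass += duration  # a return to the region
        elif entered:
            exited = True
    return first_pass, second_pass


# Hypothetical scanpath: two text sentences (T1, T2) and one picture area (P1).
scanpath = [("T1", 200.0), ("T1", 180.0), ("T2", 220.0),
            ("P1", 300.0), ("T1", 150.0), ("P1", 250.0), ("T1", 100.0)]
t1_first, t1_second = pass_durations(scanpath, "T1")
```

In this made-up scanpath the two initial fixations on T1 count as first-pass time, whereas the two later returns to T1 count as second-pass time.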

We sought therefore to contribute to understanding which processing of text and graphics is associated with successful learning from science text in lower-secondary school, after controlling for some important individual differences. In this respect, we took into account that a large body of research on the comprehension of informational text has indicated that some individual characteristics affect reading outcomes. In this study we considered two crucial cognitive factors: reading comprehension and prior knowledge, and one motivational factor: self-concept.

Reading comprehension skills, by definition, are expected to be related to learning from text (e.g., Schellings, Aarnoutse & van Leeuwe, 2006). Skilled readers are more likely to comprehend a text at a deeper level, that is, the situation model level.

Another reader characteristic that can be easily conceived as influencing learning from text is prior knowledge of the topic (e.g., Kendeou & van den Broek, 2007; McNamara & Kintsch, 1996; Ozuru, Dempsey, & McNamara, 2009). Readers who bring high relevant knowledge to the reading process are more likely to gain the deepest level of comprehension than low-knowledge readers.

Why reading comprehension and prior knowledge should be considered when investigating learning from text is fairly evident, but the measurement of self-concept may need some clarification. Self-concept is defined as a person’s self-perceptions about her or his competence, which are formed through personal experiences and interpretations of one’s environment (Marsh, 1990). Self-concept involves the totality of one’s self-perceptions as well as the perceptions that one has in relation to specific areas or domains (Schunk & Pajares, 2005). In this study we considered the domain of science (science self-concept) since the instructional material regarded a scientific topic. We took into account reader characteristics in light of the research indicating that a domain-specific self-concept is closely related to performance and achievement in the domain, for example reading (Katzir, Lesaux, & Kim, 2009), science (Mason, Boscolo, Tornatora, & Ronconi, 2013a) and maths (Marsh, Trautwein, Lüdtke, Köller, & Baumert, 2006).

No prior study, as far as we know, has examined the contribution of eye fixations of first- and second-pass reading to various forms of learning independent of cognitive and motivational characteristics.

The following research questions and hypotheses guided the study:

  1. What distinct eye-movement patterns of processing of verbal and graphical information emerge when considering various indices of the immediate first-pass reading and indices of the delayed second-pass reading?

  2. Do only eye-movement patterns of integrative processing of text and graphics during the second-pass reading uniquely predict learning from text after controlling for individual characteristics, such as reading comprehension, prior knowledge, and self-concept?

For research question 1, we expected that during the first encounter with the learning material distinct processing patterns would emerge, differing in fixation times on the central concepts of the text and their visualizations. Specifically, we expected that a more laborious processing pattern due to comprehension difficulties during text reading or picture inspection would result in a longer first-pass fixation time on the verbal and graphical parts of the main concepts. In contrast, we expected that during the second-pass reading distinct patterns of ocular behavior would emerge, characterized by relatively fewer or more transitions (gaze shifts) from the verbal to the graphical representations, and vice versa, and by shorter or longer re-fixation times on the picture while re-reading the text (look-from text to picture fixation time) and re-fixation times on the text while re-inspecting the picture (look-from picture to text fixation time). Look-from fixation times would reflect delayed processing of verbal and graphical information. The more strategic pattern of eye movements would be characterized by longer second-pass integrative processing of text and picture. It is worth noting that transitions from one representation to the other can also occur during the first-pass. However, we expected that only the more purposeful transitions during re-processing would differentiate readers’ ocular behavior during reading and inspecting.

Based on the available literature mentioned above, for research question 2, we hypothesized that only the second-pass integrative patterns of verbal and graphical information would uniquely predict reading outcomes over and above individual characteristics. In particular, we expected the predictability of deeper learning, as reflected in the transfer of knowledge. More than text retention or comprehension of factual knowledge, it would require stronger integration of the two types of information of the instructional material for constructing a high-quality mental representation. Eye-movement patterns of integrative processing as predictors of learning from text would emerge after controlling for cognitive and motivational factors, that is, reading comprehension, prior knowledge, and self-concept, which are all considered to be resources in text comprehension and learning.

Method

Participants

Forty-eight 7th graders were involved initially. They attended a public lower-secondary school in a north-eastern region of Italy and participated on a voluntary basis with parental consent. Because of poor eye calibration in 5 participants, we considered the data of 43 students (22 females), with a mean age of 12.8 years (SD = 8.3 months). All were native-born Italians with Italian as their first language and shared a homogeneous middle-class social background. All had normal or corrected-to-normal vision. Participants were involved in a pre-test and immediate post-test design.

Reading material

The illustrated text read by all participants regarded the food chain. This topic had not been previously presented in science classes attended by the participants. The text comprised 214 words (in Italian) and one picture (Fig. 1) and had been used in a previous study (Mason, Pluchino, & Tornatora, 2015).

Fig. 1

The instructional material with text and picture regarding the food chain. Highlighted parts of the text and picture are the corresponding segments of the verbal and graphical representations. Reprinted from Contemporary Educational Psychology, vol. 41, L. Mason, M. C. Tornatora, and P. Pluchino, Eye-movement modeling of text and picture integration during reading: effects on processing and learning, pp. 172–187. Copyright 2015, with permission from Elsevier

Eye-movement measures

Eye movements were collected using a non-invasive eye tracker (Tobii T120) in the real school context. As an extension of existing research (Mason et al., 2013d), for eye-movement analyses, the text was divided into sentences (areas of interest, AOIs) taking into account whether the information provided was, or was not, visualised in the picture. More specifically, 5 sentences were considered as corresponding AOIs (i.e., areas of interest that contain the same information depicted in the illustration) and 7 sentences were considered as non-corresponding AOIs (i.e., areas of interest that contain information about the food chain that is not depicted in the illustration). The illustration was also divided into corresponding AOIs (areas that visualise text information) and non-corresponding AOIs (areas that do not visualise text information).

In the analysis of eye-movement data, we computed the frequencies of first-pass and second-pass transitions from the corresponding and non-corresponding text segments to the corresponding and non-corresponding picture segments and vice versa. These measures indicate how many times a reader’s gaze shifted from a given area of the verbal representation to a given area of the graphical representation, or from a given area of the latter to a given area of the former, during the first encounter with the reading material and during re-reading or re-inspecting, respectively. Transitions reflect the learner’s attempts to integrate words and pictorial elements (Johnson & Mayer, 2012).
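A transition count of this kind can be sketched as follows. The sketch assumes an ordered sequence of AOI labels, one per fixation, with a hypothetical naming convention in which text AOIs start with "T" and picture AOIs with "P". It counts all gaze shifts and does not separate first-pass from second-pass transitions, which would require the additional bookkeeping on visited AOIs that the study applied.

```python
from collections import Counter
from typing import Dict, List, Tuple


def count_transitions(aoi_sequence: List[str]) -> Dict[Tuple[str, str], int]:
    """Count gaze shifts between consecutive fixations on different AOIs."""
    counts: Counter = Counter()
    for source, destination in zip(aoi_sequence, aoi_sequence[1:]):
        if source != destination:
            counts[(source, destination)] += 1
    return dict(counts)


def is_text(aoi: str) -> bool:
    # Hypothetical labelling: "T..." for text AOIs, "P..." for picture AOIs.
    return aoi.startswith("T")


# Made-up sequence of fixated AOIs.
sequence = ["T1", "T1", "P1", "T1", "T2", "P2", "P1", "T2"]
transitions = count_transitions(sequence)
text_to_picture = sum(n for (s, d), n in transitions.items()
                      if is_text(s) and not is_text(d))
picture_to_text = sum(n for (s, d), n in transitions.items()
                      if not is_text(s) and is_text(d))
```

Shifts within the same representation (e.g., from T1 to T2) are counted by `count_transitions` but excluded from the two cross-representation totals.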

We also focused on the duration of both the first- and second-pass fixation times (in milliseconds). For the first-pass, we considered the fixation time spent on the corresponding and non-corresponding AOIs of the text and picture, summing the duration of all fixations on either type of AOI during the first encounter with the learning material. For the second-pass, we considered the look-from fixation times. Look-from text to picture fixation time was computed for the corresponding and non-corresponding AOIs by summing the duration of all re-fixations that “took off” from a segment (AOI) of the text, either corresponding or non-corresponding, and “landed” on a corresponding segment (AOI) of the picture. Similarly, the look-from picture to text fixation time was computed by summing the durations of all re-fixations that “took off” from a segment of the picture, either corresponding or non-corresponding, and “landed” on a segment of the text, either corresponding or non-corresponding. Look-from measures offer an index of the extent to which a text segment is used as an “anchor” point for processing the picture segments, or a picture segment is used as an “anchor” point for processing text segments, which is essential for integrative processing.
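One possible operationalisation of a look-from fixation time is sketched below. It is an assumption made for illustration, not the study's actual algorithm: a run of consecutive fixations on a landing AOI is counted when the gaze arrives directly from one of the source AOIs and the landing AOI has already been visited earlier (so that the fixations are re-fixations). The function name, labels, and scanpath are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Fixation = Tuple[str, float]  # (AOI label, fixation duration in ms)


def look_from_time(fixations: List[Fixation],
                   sources: Set[str], targets: Set[str]) -> float:
    """Sum re-fixation time on target AOIs reached directly from source AOIs."""
    total = 0.0
    visited: Set[str] = set()
    counting = False
    previous: Optional[str] = None
    for aoi, duration in fixations:
        if aoi != previous:
            # A new run starts: it counts only if the gaze took off from a
            # source AOI and landed on an already visited target AOI.
            counting = previous in sources and aoi in targets and aoi in visited
        if counting:
            total += duration
        visited.add(aoi)
        previous = aoi
    return total


# Made-up scanpath with two text AOIs (T1, T2) and one picture AOI (P1).
scanpath = [("T1", 200.0), ("P1", 300.0), ("T1", 150.0), ("P1", 250.0),
            ("P1", 100.0), ("T2", 120.0), ("P1", 80.0)]
text_to_picture = look_from_time(scanpath, {"T1", "T2"}, {"P1"})
picture_to_text = look_from_time(scanpath, {"P1"}, {"T1", "T2"})
```

Under this assumption the first visit to P1 is excluded because it is not a re-fixation, while the two later returns to P1 launched from the text contribute to the text-to-picture total.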

As mentioned in the theoretical framework, it should be noted that for corresponding verbal and graphical segments we considered the sum of all transitions and looks-from all visualized text AOIs to picture AOIs and vice versa. In other words, when computing the transitions, we computed either a shift from the text AOI “producers” to the picture AOI “producers” or a shift from the text AOI “producers” to the picture AOI “first order consumers” and vice versa. To exemplify, when a student reads in the text about first-order consumers s/he may need to look at the depiction of second-order consumers to better understand the difference between the two orders, or to connect different but relevant segments of the two (verbal and graphical) representations. Therefore, a more global index may better reflect the integrative processing of verbal and graphical information.

All eye-tracking measures were transformed logarithmically because of the great variance in participants’ visual behavior that led to non-normal distributions.
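A minimal sketch of such a transformation is given below. The text does not specify the base of the logarithm or how zero values were handled, so the natural logarithm and an offset of 1 (which keeps zero counts defined) are assumptions made here for illustration only.

```python
import math
from typing import List


def log_transform(values: List[float], offset: float = 1.0) -> List[float]:
    """Natural-log transform; the offset of 1 is an assumption so that 0 maps to 0."""
    return [math.log(value + offset) for value in values]


# Hypothetical fixation times (ms) with a strongly skewed distribution.
raw_times = [0.0, 350.0, 900.0, 12000.0]
transformed = log_transform(raw_times)
```

The transformation compresses the long right tail: the largest made-up value is more than thirty times the second one in raw units, but the two differ by less than a factor of two after the transformation.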

Individual characteristics

Reading comprehension

This was measured using the Italian MT test for seventh grade (Cornoldi & Colpo, 1995). It consists of an expository text and 14 multiple-choice questions. The reliability of this instrument has been reported in the range of .73–.82 (Cronbach’s alpha). In the present study the reliability coefficient was .74.
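For readers less familiar with the coefficient, Cronbach's alpha can be computed from item-level scores as sketched below. The function name and the small data set are hypothetical; the sketch uses the standard formula based on the sum of the item variances and the variance of the total score.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """scores[i][j] is the score of respondent i on item j."""
    n_items = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Made-up answers of four respondents to three right/wrong (1/0) items.
answers = [[1, 1, 1],
           [1, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
alpha = cronbach_alpha(answers)
```

With these made-up answers the item variances sum to half of the total-score variance, which yields an alpha of .75.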

Prior knowledge of the scientific topic

Factual knowledge about the food chain was measured using nine questions, two open-ended and seven multiple choice that also required a justification for the chosen option (α = .73). Answers to the open-ended questions were awarded 0–2 points depending on their correctness and completeness. Answers to the multiple-choice questions were scored 1–2 only when a correct justification was given. Inter-rater reliability for coding the former and the latter, as measured by Cohen’s k, was .86.
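Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. A minimal sketch follows, with hypothetical codes assigned by two raters to ten answers; neither the function name nor the data come from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters coding the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    # Agreement expected if the raters coded independently with these base rates.
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Made-up codes (1 = justification accepted, 0 = rejected) for ten answers.
codes_a = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
codes_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
kappa = cohen_kappa(codes_a, codes_b)
```

Here the raters agree on 8 of 10 answers while chance agreement is .50, so kappa is .60, noticeably lower than the raw agreement of .80.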

Self-concept

Self-concept for the domain of science was measured using six items on a 4-point Likert-type scale (α = .75), already used in a previous study (Mason et al., 2013a). It was taken from the Self-Description Questionnaire (Marsh, 1990). Items were adapted for science (e.g., “I have always done well in science” and “I easily comprehend a text on scientific topics”).

Learning outcomes

Verbal recall

To measure text retention, participants were asked to write all that they remembered from the text, which included twenty-three information units. Recall protocols were coded according to the number of correct information units they reported. The two raters coded the recalls independently and their agreement, as measured by Cohen’s k, was .90.

Graphical recall

For retention assessment, participants were also asked to draw everything they could remember from the picture they observed. Graphical recalls were scored 0–2 depending on their correctness and completeness. The two raters coded the drawings independently and their agreement, as measured by Cohen’s k, was .96.

Factual knowledge

Participants’ text-based factual knowledge about the food chain at post-test was assessed using the same nine questions asked at the pre-test, which were scored in the same way by the two independent raters. Inter-rater reliability, as measured by Cohen’s k, was .93. Cronbach’s reliability coefficient for these questions was .75.

Transfer of knowledge

Participants’ deeper learning from text was measured using a transfer task that reveals the ability to apply the newly learned knowledge. The task included eight questions, four open questions and four multiple-choice questions that also required justification for the chosen option (α = .77). Like questions about factual knowledge, answers to the open-ended questions were awarded 0–2 points depending on their correctness and completeness. Answers to the multiple-choice questions were scored 1–2 only when a correct justification was given. Inter-rater reliability for coding the justifications was .94, as measured by Cohen’s k.

Procedure

Data collection took place in two sessions. In the first, a classroom session, participants were collectively administered the self-concept questionnaire, the pre-test questions, and the reading comprehension test. This collective part took about 50–60 min. The second, an individual session, took place in a quiet room in the school. First, the eye tracker was calibrated for each participant. After calibration, the participant was instructed to read carefully and silently the illustrated text on the computer screen, as s/he would be asked to answer some questions. Participants read the material at their own pace while eye movements were recorded. They then performed the various post-tests. This session took 45–55 min.

Results

Research question 1: identifying eye-movement patterns during the first and second-pass reading

Patterns of first-pass reading

To answer research question 1, we focused first on eye movements during the immediate and more automatic first-pass reading. Comprehension difficulties during text reading usually imply a longer first-pass fixation time (Rayner et al., 2006). We considered eight indices of eye movements: (1) first-pass fixation time on corresponding text segments; (2) first-pass fixation time on non-corresponding text segments; (3) first-pass fixation time on corresponding picture segments; (4) first-pass fixation time on non-corresponding picture segments; (5) first-pass transitions from corresponding text segments to corresponding picture segments; (6) first-pass transitions from non-corresponding text segments to corresponding picture segments; (7) first-pass transitions from corresponding picture segments to corresponding text segments; (8) first-pass transitions from non-corresponding picture segments to corresponding text segments.

A cluster analysis using the Ward method was performed with the eight eye-movement indices as the grouping variables to identify patterns of ocular behavior during the first reading. Ward’s hierarchical procedure is an agglomerative technique that groups data on the basis of their proximity to each other in multivariate space. It is therefore used to identify the underlying structure of data. The more meaningful and parsimonious solution emerging from the cluster analysis was a two-pattern solution. Table 1 reports means and standard deviations of the eye-movement indices for the two patterns according to the order of their identification using the clustering technique.

Table 1 Means and standard deviations of eye-tracking measures as a function of eye-movement patterns of first-pass reading
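The agglomerative logic of Ward's method can be sketched as follows: at each step the two clusters whose merger produces the smallest increase in the within-cluster sum of squares are joined, until the requested number of clusters remains. This is a naive illustration written for clarity, not the statistical package used in the study; the function name and the two-dimensional data are hypothetical (the study clustered eight indices per participant).

```python
from typing import List


def ward_clusters(points: List[List[float]], n_clusters: int) -> List[int]:
    """Agglomerate points with Ward's criterion; return one label per point."""
    # Each cluster is stored as (member indices, centroid).
    clusters = [([i], list(p)) for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                (ma, ca), (mb, cb) = clusters[a], clusters[b]
                dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = len(ma) * len(mb) / (len(ma) + len(mb)) * dist2
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        (ma, ca), (mb, cb) = clusters[a], clusters[b]
        members = ma + mb
        centroid = [(len(ma) * x + len(mb) * y) / len(members)
                    for x, y in zip(ca, cb)]
        clusters[a] = (members, centroid)
        del clusters[b]  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for label, (members, _) in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Made-up (log-transformed) scores of five readers on two indices.
readers = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
labels = ward_clusters(readers, n_clusters=2)
```

With these made-up scores the first three readers end up in one pattern and the last two in the other.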

A MANOVA was carried out to statistically evaluate whether the two patterns differed for all the measures considered in the cluster analysis. It revealed a large main effect of type of cluster, Wilks’ Lambda = .21, F(8, 34) = 15.58, p < .001, ηp² = .78. Univariate tests showed significant differences only for four measures: first-pass fixation time on corresponding text segments, F(1, 41) = 58.13, MSE = 1.29, p < .001, ηp² = .58; first-pass fixation time on corresponding picture segments, F(1, 41) = 5.82, MSE = 1.28, p = .020, ηp² = .12, and on non-corresponding picture segments, F(1, 41) = 13.33, MSE = 1.81, p = .001, ηp² = .24; and first-pass transitions from non-corresponding text segments to corresponding picture segments, F(1, 41) = 5.42, MSE = .05, p = .025, ηp² = .11. Readers characterized by pattern 1 attended more to the text segments with the central concepts and their visualisations, and less to the non-corresponding picture segments, than readers who showed pattern 2 during the first encounter with the learning material. It is worth noting that both patterns of first-pass processing were characterized by very few transitions from the verbal to the graphical representation and vice versa. Pattern 1, in particular, included readers who did not make any gaze shift from text to picture while they were reading the text for the first time.
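The partial eta squared values reported for the univariate tests can be recovered from the F statistic and its degrees of freedom, as in the short sketch below. The function name is hypothetical; the formula is the standard identity linking F to the ratio of effect and error sums of squares.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# First-pass fixation time on corresponding picture segments: F(1, 41) = 5.82.
eta = partial_eta_squared(5.82, 1, 41)
```

For F(1, 41) = 5.82 the function returns approximately .12, in line with the value reported for that test.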

Patterns of second-pass reading

To answer research question 1, we then focused on the delayed and more purposeful second-pass reading or re-processing of verbal and graphical representations. Eight indices of eye movements were used as mentioned above: (1) second-pass transitions and (2) look-from corresponding text segments to corresponding picture segments; (3) second-pass transitions and (4) look-from non-corresponding text segments to corresponding picture segments; (5) second-pass transitions and (6) look-from corresponding picture segments to corresponding text segments; (7) second-pass transitions and (8) look-from non-corresponding picture segments to corresponding text segments.

Another cluster analysis using the Ward method was performed with the eight eye-movement indices as the grouping variables. A two-pattern solution was again the more meaningful and parsimonious solution emerging from the cluster analysis. Table 2 reports means and standard deviations of the eye-movement indices for the two patterns according to the order of their identification using the clustering technique.

Table 2 Means and standard deviations of eye-tracking measures as a function of eye-movement patterns of integrative processing (second-pass reading)

A MANOVA was carried out to statistically evaluate whether the two patterns differed for all the measures considered in the cluster analysis. It revealed a large main effect of type of cluster, Wilks’ Lambda = .16, F(8, 34) = 21.80, p < .001, ηp² = .83. Univariate tests showed significant differences in favour of the pattern of stronger integrative processing for all eight fixation indices: (1) second-pass transitions from corresponding text segments to corresponding picture segments, F(1, 41) = 79.45, MSE = .35, p < .001, ηp² = .66; (2) second-pass transitions from non-corresponding text segments to corresponding picture segments, F(1, 41) = 14.95, MSE = .56, p < .001, ηp² = .27; (3) second-pass transitions from corresponding picture segments to corresponding text segments, F(1, 41) = 64.03, MSE = .49, p < .001, ηp² = .60; (4) second-pass transitions from non-corresponding picture segments to corresponding text segments, F(1, 41) = 12.93, MSE = 11.31, p < .001, ηp² = .30; (5) look-from corresponding text segments to corresponding picture segments, F(1, 41) = 39.49, MSE = 6.88, p < .001, ηp² = .49; (6) look-from non-corresponding text segments to corresponding picture segments, F(1, 41) = 32.90, MSE = 7.55, p < .001, ηp² = .44; (7) look-from corresponding picture segments to corresponding text segments, F(1, 41) = 64.32, MSE = 7.01, p < .001, ηp² = .61; (8) look-from non-corresponding picture segments to corresponding text segments, F(1, 41) = 12.99, MSE = 10.85, p = .001, ηp² = .24.

Research question 2: predicting learning from text by eye-movement patterns of integrative processing

To answer research question 2, we first carried out correlational analyses examining the association of all dependent variables with the eye-movement patterns during the second-pass and first-pass readings. Table 3 displays the correlations between the variables. Regarding the second-pass reading, which is of primary concern in this study, all post-reading measures except text-based factual knowledge correlated positively and significantly with eye-movement patterns of integrative processing. The longer the students’ integrative processing of verbal and graphical information, the better their verbal recall, graphical recall, and transfer of knowledge. In addition, reading comprehension correlated positively with all post-reading measures except verbal recall, whereas prior knowledge correlated positively with all except graphical recall. Self-concept correlated positively with verbal recall. Note, however, that none of the individual characteristics correlated with the eye-movement patterns of integrative processing.

Table 3 Zero-order correlations for all variables (N = 43)

Regarding the eye-movement patterns of the first-pass reading, correlation analyses revealed that they correlated significantly neither with the post-reading measures nor with the individual characteristics.
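Because pattern membership is a two-level variable, a zero-order correlation between it and a continuous outcome is a point-biserial correlation. The sketch below shows the equivalence on synthetic data; the group sizes, the score scale, and the size of the group difference are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical dummy coding: 0 = weaker, 1 = stronger integrative processing.
pattern = np.array([0] * 25 + [1] * 18, dtype=float)

# Synthetic recall scores; the +2.0 shift for pattern 1 is an invented effect.
recall = 5.0 + 2.0 * pattern + rng.normal(scale=1.0, size=pattern.size)

# Zero-order (Pearson) correlation between the dummy variable and the outcome.
r = np.corrcoef(pattern, recall)[0, 1]

# Point-biserial form: (M1 - M0) / SD * sqrt(p * q), with the population SD.
p1 = pattern.mean()
mean_diff = recall[pattern == 1].mean() - recall[pattern == 0].mean()
r_pb = mean_diff * np.sqrt(p1 * (1 - p1)) / recall.std()

print(np.isclose(r, r_pb))  # the two computations agree
```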

Next, to examine whether eye-movement patterns of integrative processing predicted the various outcomes of text reading after controlling for reading comprehension, prior knowledge, and self-concept, we carried out a hierarchical regression analysis for each dependent variable: verbal recall, graphical recall, text-based factual knowledge, and transfer of knowledge. Table 4 reports the scores for all post-reading outcomes.

Table 4 Means and standard deviations of scores for verbal and graphical recalls, factual knowledge, and transfer of knowledge as a function of eye-movement patterns of first- and second-pass reading

For each analysis, reading comprehension, prior knowledge, and self-concept were entered into the equation in the first step. In the second step, the dummy variables for the eye-movement patterns of first- and second-pass reading were entered. Results of the regression analyses are reported separately for each post-reading outcome.
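The two-step procedure can be sketched as follows: fit the control variables alone, then add the two pattern dummies, and test the increase in R² with an F-change statistic. The data and effect sizes below are synthetic, and the helper name is hypothetical; only the sample size (N = 43) and the number of predictors per step follow the text.

```python
import numpy as np


def r_squared(predictors: np.ndarray, outcome: np.ndarray) -> float:
    """R-squared of an ordinary least-squares fit with an intercept."""
    design = np.column_stack([np.ones(len(outcome)), predictors])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    residuals = outcome - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((outcome - outcome.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(1)
n = 43

# Synthetic stand-ins for reading comprehension, prior knowledge, self-concept.
controls = rng.normal(size=(n, 3))
# Synthetic dummy variables for first-pass and second-pass patterns.
patterns = rng.integers(0, 2, size=(n, 2)).astype(float)
# Invented outcome: depends on the first control and the second-pass dummy.
outcome = 0.4 * controls[:, 0] + 0.8 * patterns[:, 1] + rng.normal(size=n)

r2_step1 = r_squared(controls, outcome)                              # step 1
r2_step2 = r_squared(np.column_stack([controls, patterns]), outcome)  # step 2

df_change = patterns.shape[1]                               # 2 added predictors
df_resid = n - controls.shape[1] - patterns.shape[1] - 1    # 43 - 5 - 1 = 37
f_change = ((r2_step2 - r2_step1) / df_change) / ((1.0 - r2_step2) / df_resid)

print(round(r2_step1, 2), round(r2_step2, 2), round(f_change, 2))
```

The residual degrees of freedom in this sketch match the F change(2, 37) tests reported for the individual outcomes.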

Verbal recall

The regression model was significant after entering reading comprehension, prior knowledge, and self-concept in the first step, R² = .19, F(3, 39) = 2.94, p = .045. However, none of these individual variables reached significance as a predictor of verbal recall. The addition of the eye-movement patterns in the second step resulted in a statistically significant increase in the explained variance, R² = .40, F change(2, 37) = 6.59, p = .004. Only the patterns of integrative processing during the second-pass reading (β = .41, p < .01) predicted retention of text information. Table 5(a) summarizes the hierarchical regression analysis for verbal recall.

Table 5 Results of hierarchical regression analyses for variables predicting verbal recall, factual knowledge and transfer

Graphical recall

The regression model was not significant after entering reading comprehension, prior knowledge, and self-concept in the first step, R² = .14, F(3, 39) = 2.13, p = .111, although reading comprehension was a significant predictor of pictorial reproduction (β = .39, p < .05). The addition of the eye-movement patterns in the second step resulted in a statistically significant increase in the explained variance, R² = .28, F change(2, 37) = 3.73, p = .033. Only the patterns of integrative processing during the second-pass reading (β = .32, p < .05) predicted the recall of graphical elements. Reading comprehension was also a predictor (β = .43, p < .05). Table 5(b) summarizes the hierarchical regression analysis for graphical recall.

Text-based factual knowledge

The regression model was significant after entering the three individual factors in the first step, R² = .53, F(3, 39) = 14.54, p < .001. Both reading comprehension and prior knowledge were predictors of the acquisition of factual knowledge (β = .44, p < .01 and β = .40, p < .01, respectively). The addition of eye-movement patterns in the second step did not result in a statistically significant increase in the explained variance, R² = .54, F change < .1. Patterns of integrative processing did not predict this level of illustrated text comprehension. Table 5(c) summarizes the hierarchical regression analysis for factual knowledge.

Transfer of knowledge

The regression model was significant after entering reading comprehension, prior knowledge, and self-concept in the first step, R² = .25, F(3, 39) = 4.31, p = .010. Specifically, reading comprehension was a predictor of the deeper level of learning from text (β = .39, p < .05). The addition of the eye-movement patterns in the second step resulted in a statistically significant increase in the explained variance, R² = .37, F change(2, 37) = 3.59, p = .037. Only the patterns of integrative processing during the second-pass reading (β = .32, p < .05) again predicted learning from illustrated text. Reading comprehension was also a predictor (β = .42, p < .05). Table 5(d) summarizes the hierarchical regression analysis for transfer of knowledge.

Discussion

This study sought to extend current research on the processing of text and graphics that is associated with successful learning from science text in lower-secondary school, in two main ways. First, we distinguished the eye-movement patterns of immediate and more automatic first-pass reading from the eye-movement patterns of delayed and more purposeful second-pass reading. Second, we examined whether the latter uniquely predicted the off-line measures of reading, after controlling for important individual differences, to reveal the link between visual attention and learning from illustrated text more closely.

The first research question asked what distinct eye-movement patterns of processing of verbal and graphical information would emerge when considering various indices of the immediate first-pass reading and the delayed second-pass reading. As concerns the former, two eye-movement patterns were identified through a cluster analysis. Readers differed in the time spent on the visualized text segments and the overall picture during the first encounter with the learning material. As concerns the delayed processing, two patterns of eye movements also emerged. As expected, they differed in the extent to which readers shifted from text to picture and from picture to text, re-read text segments while re-inspecting picture segments, and re-inspected picture segments while re-reading text segments. This re-processing reflects integration of verbal and graphical information, which occurred rarely during the first pass in both patterns. Integrative re-processing has been indicated as more critical than immediate processing in multimedia learning (Mason et al., 2013b, 2013d).

The second research question asked whether only readers’ eye-movement patterns of integrative processing would predict various post-reading outcomes after controlling for the individual characteristics of reading comprehension, prior knowledge, and self-concept. As expected, the results of the regression analyses showed that only eye-movement patterns of integrative processing characterizing the second-pass reading uniquely predicted the verbal and graphical recalls and deeper learning from text in the transfer task, after controlling for individual characteristics. More specifically, verbal recall was predicted only by eye-movement patterns, with none of the individual characteristics emerging as a significant predictor. Graphical recall and transfer of knowledge were predicted by eye-movement patterns over and above reading comprehension. For all post-reading outcomes predicted by these patterns, the longer the students’ integrative processing of text and graphics during the second-pass reading, the higher their performance.

It should be pointed out that only one post-reading performance, the acquisition of text-based factual knowledge, was not predicted by the patterns of integrative processing. It is unclear why this measure—which required comprehension at the level of a locally and globally coherent representation of the propositions introduced in the text—was predicted only by participants’ reading proficiency and what they already knew about the topic. This issue needs further investigation. A possible interpretation is that the questions used to measure factual knowledge did not require particular integration of verbal and graphical elements.

It is worth noting that the eye-movement patterns of first-pass reading did not predict any outcome measure. This means that the immediate and more automatic processing of the instructional material contributed neither to shallower nor to deeper learning from text.

In sum, the study provides further evidence of the multimedia principle (Mayer, 2009; Butcher, 2014), indicating that only the patterns of integrative processing of verbal and graphical information during the second-pass are associated with retention and transfer of knowledge. This outcome extends the findings of previous eye-tracking studies with older (Johnson & Mayer, 2012; Stalbovs et al., 2013) and younger students (Mason et al., 2013d), and to some extent indirectly, also the findings of outcome-oriented studies that designed instruction to sustain learning from text and graphics (Bartholomé & Bromme, 2009; Florax & Ploetzner, 2010; Schlag & Ploetzner, 2011).

Nevertheless, the present study also has limitations that should be taken into consideration when interpreting the findings. As in almost all eye-tracking studies, which are particularly laborious, the sample size is modest, and a larger one would be preferable. In addition, because of technical constraints related to the use of the index of the look-from fixation time, a short text illustrated by one picture presented on only one screen was used. However, we can speculate that if the relevance of integrative processing emerged clearly for limited material, it could be even more critical when considering longer texts accompanied by multiple instructional pictures.

Conclusion and significance

Despite these limitations, the present study has theoretical significance as it not only confirms, but also extends previous investigations, providing evidence that deeper learning from an illustrated text is predicted only by integrative processing of verbal and graphical information in their corresponding and non-corresponding segments. This processing occurs during a delayed, less automatic and more purposeful allocation of visual attention when re-reading text parts while re-inspecting picture parts and vice versa.

The importance of reading behavior after the first encounter with the instructional material also underlines the educational significance of the study. In this regard, two implications can be drawn. First, teachers should be aware that integrative processing is essential, even when brief or simple material is to be learned, so that they can emphasize it to their students (Schroeder et al., 2011).

The second educational implication highlights the need for students to be metacognitively aware that pictures should not be disregarded or processed only superficially. One possible way to increase this metacognitive awareness is to show students replays of their eye movements during reading (Mikkilä-Erdmann, Penttinen, Anto, & Olkinuora, 2008). Modern eye trackers not only provide unique information regarding the perceptual and cognitive processes underlying learning performance, but also make gaze replays available as videos. Low-integrator readers can watch the video of their ocular behavior and reflect upon how they allocated their visual attention to the instructional material. In this way they can be helped to develop or refine the metacognitive awareness that their ability to integrate text and picture makes a difference to learning outcomes.