Introduction

Within the context of multimedia learning, a number of studies show that adding pictures to an instructional text, rather than presenting the text alone, facilitates learning and results in deeper understanding (multimedia effect, Mayer 2009; see also Carney and Levin 2002; Verdi and Kulhavy 2002). The benefit of presenting pictures along with text is explained by the idea that pictures and words are two qualitatively different systems for representing knowledge and can therefore complement each other (Paivio 1986). For example, a picture that explicitly depicts the spatial relations among components of a target system can present information that is difficult to express verbally (or would be lengthier to describe in words; Larkin and Simon 1987). The learner, however, can only make use of such complementary information when he or she establishes a referential connection between corresponding words and pictures. The process of constructing referential connections requires the learner to determine which word-based and image-based representations refer to the same referent.

Theories of multimedia learning (e.g., Mayer 2009; Schnotz and Bannert 2003) posit that establishing referential connections between corresponding text and pictures is a key process in understanding illustrated text and is therefore essential for successful learning. Thus, strategy instructions that encourage learners to construct referential connections should positively affect the students’ understanding. However, empirical evidence is inconsistent. Whereas some studies documented supportive effects, others did not. In the present study, we therefore adopted a more comprehensive approach and examined instructions that either encouraged students to construct referential connections or distracted them from constructing referential connections.

In studies that investigated the role of referential connections, learners were instructed to create referential connections between text and pictures. These studies have produced inconsistent results (Bodemer and Faust 2006; Bodemer et al. 2005; Bodemer et al. 2004; Brünken et al. 2005; Seufert 2003; Seufert et al. 2007). Bodemer et al. (2004) asked students to learn how a tire pump works. In the first condition, students were instructed to mentally integrate words and pictures that were presented in a separated format. In the second condition, students were instructed to externally relate corresponding text and pictures by drag-and-drop operations on a computer screen. Results showed no significant differences between the conditions (see also Brünken et al. 2005, Experiment 2). However, in a second experiment, Bodemer et al. (2004) presented more complex materials, and the group that externally integrated the words and pictures outperformed the group that mentally integrated the words and pictures (see also Bodemer and Faust 2006, Experiment 2). Unfortunately, no baseline for this comparison was available because there was no control group whose participants received no strategy instructions. Therefore, we do not know whether learners who created referential connections would show better performance than learners who were not instructed to create referential connections.

Bodemer and Faust (2006, Experiment 1) addressed this question. They asked students to study a text and a picture that showed how a geothermic pump works. In one condition, students were instructed to mentally integrate the words and pictures; in another condition, students received no strategy instructions. The results showed no differences between these groups. Furthermore, these groups did not differ from a group that was asked to externally integrate the text and pictures by dragging the corresponding labels onto the pictures. These results indicate that students might have tried to create referential connections even though they were not explicitly instructed to do so, which is a possible explanation for failing to show positive effects of the strategy instruction.

Seufert et al. (2007) investigated whether inter-representational hyperlinks would help learners to construct referential connections when studying computer-based materials. These links helped learners to visualize the referential connections between the words and pictures. When a learner clicks on such a hyperlink, arrows point to the corresponding part of the picture and thus help learners to integrate verbal and pictorial representations. Seufert et al. (2007) included a control group that did not receive any help when studying the learning materials. In the first experiment, learning with the hyperlinks improved performance; in the second experiment with a more complex learning environment, it did not (see also Brünken et al. 2005, Experiment 3), and in the third experiment, only learners with higher prior knowledge benefitted from the hyperlinks (Seufert 2003).

In sum, the results provided no clear evidence that the instructions to integrate the text and the corresponding pictures were effective. Also, the aforementioned studies focused on learning with text and pictures and thus did not include a group that learned without pictures. Therefore, no conclusions could be drawn with regard to whether the text-picture combinations would lead to better performance than a text-only condition (multimedia effect).

Studies that compared the effects of text and corresponding pictures (presented in a separated format) to a control group that learned without pictures demonstrated benefits for the text-picture conditions (Butcher 2006; Leopold et al. 2013; Schwamborn et al. 2010). As students in these studies benefitted from the text and corresponding pictures without being instructed to create referential connections, students must have processed not only the text but also the pictures. It is likely that students engaged in a process of creating referential connections between text and picture elements even though no explicit strategy instructions were given. This view is in line with Plötzner et al.'s (2013) assumption that creating referential connections is a strategic process that can be triggered by features of the learning materials (e.g., Florax and Plötzner 2010) even when no explicit strategy instruction is given. In sum, the review of the reported studies suggests that a methodological approach that contrasts instructions encouraging the construction of referential connections with control groups that learn without strategy instruction has limited potential. The main reason is that it cannot be ruled out that students in the control groups created referential connections without being explicitly instructed to do so.

Therefore, we adopted a more comprehensive approach and contrasted experimental conditions in which (a) no strategy instructions were given, (b) learners were encouraged to construct referential connections, and (c) learners’ attention was distracted from forming referential connections. To provide an adequate baseline, a control group received the text without pictures.

We expected the group that learned with illustrated text but no particular strategy to show better performance than the control group (multimedia effect). We further predicted that strategy instructions that distracted students from creating referential connections would affect the multimedia effect: The group that learned with illustrated text but no particular strategy would show better performance than the group that was distracted from constructing referential connections. The group that was asked to construct referential connections should also show better performance than the group that was distracted from creating referential connections.

Experiment 1

We asked high-school students to read a scientific text with the goal of understanding its contents and tested them for their comprehension and transfer performance. We varied the availability of pictures and the instructions about how to relate the text and pictures so that four experimental conditions resulted.

  • The text-picture group was presented with a scientific text with pictures illustrating its contents. Participants in this group did not receive any explicit strategy instructions to create referential connections but were instructed only to read the text and view the pictures to understand the contents.

  • The text-only group was given the same text without any pictures. These participants received no explicit strategy instructions but were asked to read the text to understand its contents.

    These two conditions were included to determine whether the material was suitable for generating a multimedia effect. In this case, the text-picture group was expected to perform better than the text-only group. The next two conditions were included to investigate effects of explicit strategy instructions.

  • The integration group received the text with pictures (same materials as the text-picture group) and was asked to read the text and view the pictures to understand its contents.

    Furthermore, these students were instructed to actively create referential connections. To this end, the participants were asked to identify the important concepts for each paragraph and to write them right by the corresponding parts of the associated picture.

  • The separation group was also given the text with pictures and was asked to read the text and view the pictures to understand its contents. Furthermore, students were asked to identify the important concepts, as were the participants in the integration group. However, in the separation group, participants were not instructed to write these concepts right by the corresponding parts of the associated picture, but were asked to write them beside the picture, each concept below the previous one. Thus, the participants identified important concepts and listed them sequentially but were not required to create referential connections between these concepts and the corresponding pictures.

We expected learners in the text-picture group to outperform the text-only group (Hypothesis 1), thus replicating the multimedia effect. The text-picture and text-only groups differed only with regard to the availability of pictures but not with regard to instructions. A difference in performance would therefore indicate the relevance of the pictures.

The text-picture group was expected to outperform the separation group because students in the separation group were distracted from creating referential connections, whereas the text-picture group was not (Hypothesis 2).

We expected the integration group to perform better than the separation group (Hypothesis 3). Explicitly prompting the construction of referential connections should generate a greater number of accurate representations of referential connections than the separation instructions that suggested bypassing referential connection construction.

Performance differences in line with Hypotheses 2 and 3 would indicate effects of strategy instructions because students in all three experimental conditions received identical learning materials (text with corresponding pictures) and differed only with regard to the particular strategy instructions.

The available research does not allow a clear prediction of whether the integration group would outperform the text-picture group because students may create referential connections without strategy instructions.

Method

Participants and design

One hundred twelve German high-school students (Grade 11) from two schools (one of the schools had a vocational focus) participated in the experiment. The study was conducted during the students’ regular lessons. Within their classes, students were randomly assigned to the experimental groups. Their mean age was 18.3 years (SD = 1.05), and the percentage of female students was 52.7 %. The data of four students were excluded from further analysis because two did not follow the instructions, and the other two did not receive the correct materials. Thus, data from 108 participants were analyzed.

Materials

The learning and testing materials consisted of (a) a science text and instructions prepared according to the experimental conditions, (b) a multiple-choice test for assessing prior knowledge, (c) a multiple-choice test for assessing comprehension after studying the text, (d) a transfer test with open questions requiring deep understanding and the application of knowledge acquired from the text to novel problems, (e) a referential connections test for assessing whether students could represent corresponding concepts pictorially, (f) a self-report questionnaire, and (g) two standardized tests for measuring verbal ability and spatial ability as control variables.

The science text about water molecules (1,370 words) explained six central topics: (a) the chemical structure of water molecules, (b) the dipole character of water molecules, (c) hydrogen bonds, (d) the hydration process, (e) surface tension, and (f) the density anomaly of water. The text consisted of a short introduction and 13 paragraphs with one picture each. Three of the six main topics described processes (e.g., the hydration process) whose explanations spanned more than one paragraph. This text is a shortened version of the text used by Leopold and Leutner (2012) and Leutner et al. (2009). The text-picture group was given the text along with illustrations that were placed above each paragraph (see Fig. 1, upper picture). The integration group was given the same materials; however, the participants were instructed to write important concepts from the paragraph by the parts of the corresponding picture (for an example of a participant’s answer, see Fig. 1, middle picture). The separation group was given the same materials, but the participants were instructed to write important concepts beside each picture (for an example of a participant’s answer, see Fig. 1, lower picture). The text-only group was given the text without any pictures.

Fig. 1

A sample of the learning materials (with participants’ annotations) presented to the text-picture group (upper picture), the integration group (middle picture), and the separation group (lower picture) in Experiment 1. The corresponding text paragraph was presented below each picture

Prior knowledge was assessed by using 10 multiple-choice questions related to important concepts explained in the text (e.g., “Why is the water molecule called a dipole molecule?”) with four response alternatives per question. Of these alternatives, one, two, three, or all four could be correct. To prevent students from gaining points by guessing, the numbers of correct and incorrect responses were balanced across the whole test (20 items were correct and 20 were false). Scores were calculated by awarding 1 point for selecting a correct option and 1 point for not selecting an incorrect option. Therefore, the maximum score was 40 points.
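The scoring rule described above can be illustrated with a minimal sketch. The function names, the option labels, and the two-question answer key are hypothetical and serve only as an illustration; they are not taken from the study materials.

```python
def score_question(selected, correct, options=("a", "b", "c", "d")):
    """Score one question: 1 point for each option that is classified correctly."""
    selected, correct = set(selected), set(correct)
    points = 0
    for option in options:
        if option in correct and option in selected:
            points += 1  # a correct option was selected
        elif option not in correct and option not in selected:
            points += 1  # an incorrect option was left unselected
    return points


def score_test(responses, answer_key):
    """Sum the question scores over the whole test."""
    return sum(score_question(sel, key) for sel, key in zip(responses, answer_key))


# Hypothetical answer key for two questions (illustration only).
answer_key = [{"a", "c"}, {"b"}]

print(score_test([{"a", "c"}, {"b"}], answer_key))        # every option classified correctly
print(score_test([{"a", "b", "c", "d"}] * 2, answer_key))  # ticking every option
```

Under this rule, a test with 20 correct and 20 incorrect options yields 20 of 40 points both for a student who ticks every option and for one who ticks none, which is one way to read the balancing described above.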

Comprehension was assessed by two different tests: a multiple-choice comprehension test and a transfer test with open problem-oriented questions. Correctly answering the comprehension items required linking information from different sentences of the science text and forming a mental model of the structure of water molecules. The item format resembled the format of the prior knowledge test, but the items on the two tests were different. There were 18 questions with four possible responses to each question, resulting in a maximum score of 72 points (72 items; Cronbach’s alpha = .72). The numbers of correct and incorrect responses were balanced across the whole test. A sample question is: “What is the chemical basis of a hydrogen bond?” with answer choices given as: “(a) the polar nature of water molecules, (b) attraction forces between electrons, (c) attraction forces between ions, or (d) the polar covalent bond of the water molecule” (see also Leopold and Leutner 2012).

The transfer test consisted of four open-ended questions that were based on Mayer’s (2009) example to assess deep understanding of the learning contents (Cronbach’s alpha = .70). Learners were asked to explain and solve problems that were not explicitly given in the text. An example is the question: “Seawater in polar areas could be colder than 0 °C without freezing. How would you explain this fact?” (Leopold and Leutner 2012). To score each question, a checklist presented three key ideas that should appear in the answer. Each answer was thus scored with a maximum of 3 points, resulting in a maximum score of 12 points for the whole test. The answers were scored by two raters with an acceptable inter-rater agreement of kappa = .88. Disagreements were settled by consensus.

The referential connections test was constructed to assess whether students had drawn connections between important text concepts and their visual-spatial counterparts. We handed each student a sheet of paper with nine key concepts (e.g., hydrogen bonds, dipole moment of the water molecule) and asked the student to draw a picture that represented the corresponding concept (Cronbach’s alpha = .77). The participants were informed that sketching the important components of the construct and their interrelations would be better than drawing aesthetically appealing pictures. The accuracy of the drawings was analyzed with respect to nine expert reference visualizations (e.g., Hall et al. 1997; Van Meter 2001) adapted from Leopold and Leutner (2012). An accurate drawing was given 2 points, a partly accurate drawing was given 1 point, and an unacceptable drawing received 0 points so that the maximum score of points was 18. The drawings were scored by two raters with high inter-rater agreement (kappa = .98). Disagreements were settled by consensus.

A self-report scale was developed to measure whether students had created referential connections between corresponding words and pictures when studying the text. After reading the science text, students were asked to rate the extent to which they invested effort in creating referential connections by five items using a 4-point scale ranging from 1 (completely agree) to 4 (completely disagree; Cronbach’s alpha = .88). The questionnaire began with the question: “How did you proceed in studying the text?” Sample items are: “I related concepts of the text to corresponding parts of the picture” or “I thought about which piece of information from the text was shown in the picture.” In addition, distractor items were inserted between the referential connections items (“I tried to remember as much information as possible”). Students in the text-only group were given only the distractor items and were not asked to answer the referential connections questions because they received the text without pictures and therefore would not understand the referential connections items.

Verbal ability was measured with the word fluency scale from a standard intelligence test (Heller and Perleth 2000) and used as a control variable. Empirical findings showed that the benefits of adding pictures to instructional text also varied with students’ spatial abilities (Höffler 2010; Mayer and Sims 1994). Therefore, spatial ability was included as an additional control variable and measured with the paper-folding test (Ekstrom et al. 1976).

Procedure

Students from five classes at two different schools were randomly assigned to the four treatment groups within their classes by drawing lots. Then students in each of the four groups were led by their instructor to their room. Afterwards, they were asked to take the pretest to assess their prior knowledge (4 min). Thereafter, the students received the science text that was prepared according to the particular treatment condition. Participants were given 35 min to study the text for comprehension. Each student received an instruction sheet with an example, and the experimenter explained the particular instructions. Students in the text-picture group were instructed to read the text and view the pictures. Students in the integration group were instructed to read the text, to select important concepts from the paragraph, and to write them right by the corresponding components of the appropriate picture. Students in the separation group were instructed to read the text, to select important concepts from the paragraph, and to write them beside the picture. Students in the text-only group received the text without pictures and were instructed to read and understand the text. All students were aware that they would be tested on their understanding. Thereafter, students were given the self-report questionnaire to be completed at their own pace. Then they took the verbal (7 min) and spatial (3 min) ability tests. The participants were then given 10 min to answer the comprehension test, 15 min to answer the transfer test, and finally, 10 min to answer the referential connections test. The participants were informed that they would receive individual feedback on their results if they wished. To this end, students were asked to memorize or jot down an individual password.

Results

Before testing the hypotheses, we analyzed whether the four experimental groups differed in verbal ability, spatial ability, and prior knowledge. No between-groups differences were found (all F-values < 1.07). When these variables were used as covariates in the analyses reported below, none of the effects changed. For simplicity, we report only the ANOVA results. The alpha level for this and the following analyses was set at α = .05.

Transfer and comprehension performance

The means and standard deviations of students’ performance on the transfer and comprehension tests are shown in Table 1. To examine differences in transfer and comprehension scores across the four experimental groups, we calculated ANOVAs with treatment as the between-groups factor. Results revealed an overall effect of the treatment on transfer performance, F(3, 103) = 3.41, MSE = 4.25, p = .020, η² = .09, and on comprehension performance, F(3, 104) = 2.99, MSE = 45.28, p = .034, η² = .08. In line with Hypothesis 1, planned comparisons (Rosenthal and Rosnow 1985) showed that the text-picture group outperformed the text-only group on the transfer test, t(103) = 2.05, p = .022, d = 0.58, and the comprehension test, t(104) = 2.79, p = .003, d = 0.73, demonstrating that our experimental materials yielded a multimedia effect. Also, the text-picture group performed better than the separation group on transfer, t(103) = 3.11, p = .001, d = 0.79, and comprehension, t(104) = 1.91, p = .037, d = 0.47, as predicted by Hypothesis 2. The integration group performed better than the separation group on the transfer test, t(103) = 1.70, p = .047, d = 0.47 (Hypothesis 3), but not on the comprehension test, t(104) = 1.09, p > .05. The integration group did not perform better than the text-picture group on the transfer test, t(103) = 1.33, p > .05, or the comprehension test, t(104) < 1.

Table 1 Experiment 1: means of the dependent variables (standard deviations in parentheses)

Referential connections test and self-report data

An ANOVA revealed an overall effect of the treatment on the referential connections test scores, F(3, 104) = 15.83, MSE = 12.29, p < .001, η² = .31 (see means and standard deviations in Table 2). Planned contrasts showed that the text-picture group obtained higher scores than the text-only group, t(104) = 5.56, p < .001, d = 1.73, a finding that is consistent with Hypothesis 1. Likewise, the text-picture group outperformed the separation group, t(104) = 2.55, p = .006, d = 0.60 (Hypothesis 2). As predicted by Hypothesis 3, the integration group generated a greater number of accurate referential connections than the separation group, t(104) = 3.25, p = .001, d = 0.88. Performance did not differ between the text-picture and integration groups, t(104) < 1.

Table 2 Experiment 2: means of the dependent variables (standard deviations in parentheses)

These results were compatible with our hypothesis that students in both the text-picture and integration groups would form a greater number of accurate referential connections than students who were instructed to write important concepts beside each picture (separation group). The data from the self-report questionnaire pointed in the same direction. Students in the text-picture group, t(76) = 2.25, p = .014, d = 0.56, and the integration group, t(76) = 1.85, p = .035, d = 0.51, reported that they created referential connections more often than the participants in the separation group did. Students in the text-picture group and the integration group did not differ from each other, t(76) < 1.

Exploratory post hoc analyses

The above analyses demonstrated that the groups that performed better on the referential connections test also had higher scores on the comprehension and transfer tests. To explore the consistency of these relations, we computed the correlations between the referential connections test scores and the learning performance measures and conducted two mediator analyses.

Correlations between referential connections test and learning performance

We found strong positive correlations between the accuracy of the construction of referential connections and performance on both outcome tests. The correlations between the referential connections test scores and the learning performance scores were r = .59, p < .001, for the comprehension test, and r = .60, p < .001, for the transfer test. The correlation coefficients did not differ between the four treatment groups: χ² = 2.79, p = .426 for the transfer test and χ² = 6.94, p = .074 for the comprehension test.
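The text does not state which procedure was used to compare the correlations across groups. One common choice, shown here only as a sketch under that assumption, is a homogeneity test based on Fisher's z transformation, which yields a chi-square statistic with k − 1 degrees of freedom for k groups (3 for the four groups here). The function names and the per-group values in the example are hypothetical; only the two χ² values passed to the p-value helper come from the text.

```python
import math


def correlation_homogeneity_q(rs, ns):
    """Fisher-z homogeneity statistic for k independent correlations.

    rs: correlation in each group; ns: sample size of each group.
    Returns (Q, degrees of freedom); Q is approximately chi-square
    distributed with k - 1 df if all population correlations are equal.
    """
    zs = [math.atanh(r) for r in rs]   # Fisher z transformation
    ws = [n - 3 for n in ns]           # inverse variance of each z
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sum(ws)
    q = sum(w * (z - z_bar) ** 2 for w, z in zip(ws, zs))
    return q, len(rs) - 1


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Hypothetical per-group correlations and sample sizes (illustration only).
q, df = correlation_homogeneity_q([0.55, 0.62, 0.48, 0.66], [27, 27, 27, 27])
print(q, df, chi2_sf_df3(q))

# p-values for the two chi-square values reported in the text, assuming df = 3.
print(round(chi2_sf_df3(2.79), 3), round(chi2_sf_df3(6.94), 3))
```

With df = 3, the two reported χ² values give p-values that agree with the reported ones up to rounding of the test statistic.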

Mediator analysis

We further conducted two mediator analyses to test whether the difference in transfer performance between (a) the text-picture group and the separation group and (b) the integration group and the separation group would be mediated by the students’ performance on the referential connections test. One advantage of a mediation analysis is that it can help to elucidate the critical components of an experimental manipulation (MacKinnon et al. 2000). We were interested in determining whether the students’ performance on the referential connection test would be a critical component that would be affected by the different strategy instructions and would in turn affect transfer performance.

Text-picture versus separation group

To test whether the variable group (text-picture group vs. separation group) would have an indirect effect on transfer performance through the mediating variable referential connections performance, we applied the product of coefficients approach proposed by Sobel (1982) and MacKinnon (2008; MacKinnon et al. 2002). The advantage of this approach is that it is based on a quantification of the indirect effect, that is, the product of its constituent paths (Hayes 2009). First, we tested whether group affected the mediating variable referential connections performance (path a): a = 2.40 (α = .29), SE = 1.09, p = .032 (unstandardized regression coefficients are the preferred metric according to Hayes 2013; standardized coefficients are reported in brackets). Second, we tested whether referential connections performance affected transfer performance while controlling for the effect of group (path b): b = .34 (β = .61), SE = .06, p < .001. The product of the two regression parameters, ab = .82 (αβ = 0.18), equals the indirect effect. The Sobel test demonstrated that the indirect effect of group on transfer performance through referential connections performance was significant, zab = 2.06, p = .039. Thus, referential connections performance mediated the effect of group (i.e., separation instructions vs. text-picture with no strategy instructions) on transfer performance.

Integration versus separation group

To test whether group (integration group vs. separation group) would have an indirect effect on transfer performance through the mediating variable “referential connections performance,” we first computed the effect of group on the mediating variable “referential connections performance” (path a): a = 3.13 (α = .41), SE = .98, p = .002. Second, we tested whether “referential connections performance” affected transfer performance while controlling for the effect of group (path b): b = .38 (β = .71), SE = .06, p < .001. The Sobel test demonstrated that the indirect effect (ab = 1.19, αβ = 0.29) was significant, zab = 2.85, p = .004. Thus, referential connections performance mediated the effect of the integration instructions versus separation instructions on transfer performance.
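The two Sobel tests reported above can be retraced from the published path coefficients with a minimal sketch. It assumes the first-order Sobel standard error, sqrt(b²·SE_a² + a²·SE_b²), and a two-tailed p-value from the standard normal distribution; the text does not state which variant of the standard error was used, and the function name is hypothetical.

```python
import math


def sobel_test(a, se_a, b, se_b):
    """Product-of-coefficients (Sobel) test for an indirect effect.

    a, se_a: path from group to mediator and its standard error.
    b, se_b: path from mediator to outcome (controlling for group) and its SE.
    Returns (indirect effect, standard error, z, two-tailed p).
    """
    indirect = a * b
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)  # first-order SE (assumed)
    z = indirect / se_ab
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    return indirect, se_ab, z, p


# Unstandardized coefficients as reported in the text.
print(sobel_test(a=2.40, se_a=1.09, b=0.34, se_b=0.06))  # text-picture vs. separation
print(sobel_test(a=3.13, se_a=0.98, b=0.38, se_b=0.06))  # integration vs. separation
```

Because the inputs are the rounded coefficients from the text, the resulting z and p values match the reported ones only up to rounding.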

Discussion

The pattern of results was consistent with the view that presenting pictures along with expository text can enhance comprehension compared to reading the text without pictures (multimedia effect, Hypothesis 1). The text-picture group performed better than the text-only group on the comprehension test, the transfer test, and the referential connections test. This pattern of results documents that the materials used in this experiment were designed such that comprehension and transfer performance benefitted from processing pictures along with the text.

The separation instruction neutralized the multimedia effect. The separation group did not benefit from the availability of pictures and performed worse than the text-picture and integration groups on transfer test performance (Hypotheses 2 and 3). These results indicate that even slight differences in strategy instructions affect learning from text and pictures: The instructions to write important concepts beside the picture in comparison to instructions to write the concepts by specific parts of the picture had detrimental effects.

By contrast, the explicit instructions to create referential connections did not lead to higher performance on the referential connections test, the comprehension test, or the transfer test compared to the performance of the text-picture group. These results indicate that students in the text-picture group did not need explicit instructions to create referential connections. The participants’ self-reports were consistent with the students’ results on the performance measures: The text-picture and integration groups reported constructing more referential connections than the separation group and did not differ from each other.

Consistent with our assumption that referential connections play an important role in building mental models from text and pictures, performance (most consistently on the transfer tasks) was higher in groups that constructed more accurate referential connections. Furthermore, mediation analyses consistently indicated that strategy instructions that distract students from creating referential connections compared to no explicit strategy instructions or explicit integration instructions affected students’ performance on the referential connections test, which in turn affected transfer performance and the multimedia effect.

Before discussing potential conclusions from these findings, we tested whether the effects of different strategy instructions could be replicated. Therefore, we conducted a second experiment with the same strategy instructions (creating or not creating referential connections) but with a different comprehension strategy.

Experiment 2

The purpose of the second experiment was to examine whether the effects of strategy instructions on comprehension and transfer obtained in Experiment 1 could be replicated with a different comprehension strategy. Instead of asking students to identify important concepts in the text, the participants in the second experiment were instructed to summarize the text’s paragraphs. More specifically, we asked students to write summary sentences related to each text paragraph in their own words. In contrast to the important concepts strategy used in Experiment 1, the summary strategy required the learners not only to select important concepts but also to understand the relevant relations between them and to formulate these relations adequately. Hegarty and Just (1993), for example, provided evidence that students glance at a picture presented beside a text to encode relations between components rather than to focus on individual components only.

The variations in how the summary strategy was implemented were the same as in the first experiment. The text-picture group received no specific instructions. The integration group was instructed to write summary sentences by the parts of the picture that represented the processes described by these sentences (see Fig. 2, middle picture). The separation group was asked to write summary sentences beside the picture, each sentence under the preceding one (see Fig. 2, lower picture). Thus, in contrast to the integration condition, the separation instructions did not require participants to establish a reference between a sentence and the corresponding picture element(s). Although a different strategy was used, its implementation was expected to affect comprehension in the same manner as in the first experiment. Consequently, the text-picture group and the integration group were expected to outperform the separation group with regard to comprehension and transfer (corresponding to Hypotheses 2 and 3 in Experiment 1). A text-only group was not included because, for the given experimental materials, the multimedia effect had already been demonstrated in Experiment 1, and our focus was to test the effects of different strategy instructions on learning from text and pictures.

Fig. 2. A sample of the learning materials presented to the text-picture group (upper picture), the integration group (middle picture), and the separation group (lower picture) in Experiment 2. The corresponding text paragraph was presented below each picture.

Method

Participants and design

Fifty-five German high-school students (Grade 10) participated in the experiment. The study was conducted during a regular chemistry lesson. The participants’ mean age was 15.9 years (SD = 0.45), and the percentage of female students was 36.4 %. Within classes, students were randomly assigned to one of the three experimental groups. Twenty students were in the text-picture group, 18 in the separation group, and 17 in the integration group. As in Experiment 1, the students were informed that they would receive individual feedback if they wished.

Materials

The materials for Experiment 2 were identical to those used in Experiment 1 except that the instructions for the integration and separation groups were adapted to the summary strategy. All groups received the text along with illustrations that were placed above each paragraph.

Procedure

The procedure was similar to that used in Experiment 1. Students in the text-picture group were instructed to read the text and view the pictures to understand the contents (see Fig. 2, upper picture). Students in the integration group were instructed to read the text, formulate summary sentences, and write them by the corresponding components of the picture (see Fig. 2, middle picture). Students in the separation group were instructed to read the text, formulate summary sentences, and write them beside the picture, each one under the previous one (see Fig. 2, lower picture). Afterwards, students filled out the self-report questionnaire and took the paper-folding test, the comprehension test, and the transfer test. Due to time constraints in one of the participating schools, we were unable to collect the data for the referential connections test (the final test). Therefore, we restricted our analyses to the learning performance measures (see Footnote 1).

Results

First, we analyzed whether the three experimental groups differed in verbal ability, spatial ability, and prior knowledge. No between-groups differences were found (all F-values < 1.0).

Transfer and comprehension performance

The means and standard deviations of students’ performance on the transfer and comprehension tests are shown in Table 2. Results showed a significant overall effect of the treatment on transfer performance, F(2, 51) = 3.40, MSE = 4.92, p = .041, η² = .12, but not on comprehension performance, F(2, 52) = 1.97, MSE = 49.09, p = .149, η² = .07.
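As a plausibility check on the effect sizes reported above, eta squared for a one-way between-subjects design can be recovered from the F value and its degrees of freedom as (F × df_effect) / (F × df_effect + df_error). The following Python sketch is not part of the original analyses; the function name eta_squared and the dictionary layout are illustrative assumptions, and only the F values and degrees of freedom are taken from the text.

```python
def eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover eta squared for a one-way ANOVA from F and its degrees of freedom.

    Uses the identity SS_effect / SS_total = F*df_effect / (F*df_effect + df_error),
    which holds when the design has a single between-subjects factor.
    """
    scaled_f = f_value * df_effect
    return scaled_f / (scaled_f + df_error)


# F values and degrees of freedom as reported for Experiment 2
# (the dictionary keys are labels chosen for this sketch).
reported = {
    "transfer": (3.40, 2, 51),
    "comprehension": (1.97, 2, 52),
}

for measure, (f_value, df_effect, df_error) in reported.items():
    print(f"{measure}: eta squared = {eta_squared(f_value, df_effect, df_error):.2f}")
```

Under the single-factor assumption, both recomputed values round to the reported effect sizes (.12 for transfer and .07 for comprehension).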

Contrast analyses showed that the text-picture group outperformed the separation group on the transfer test, t(51) = 2.23, p = .015, d = 0.68, but not on the comprehension test, t(52) = 1.43, p = .082. The integration group outperformed the separation group on transfer, t(51) = 2.32, p = .012, d = 0.78, and comprehension scores, t(52) = 1.92, p = .030, d = 0.72. The integration group did not outperform the text-picture group on the transfer and comprehension tests, both ts(51) < 1. The planned comparisons on the comprehension scores should be interpreted cautiously because the main effect on the comprehension scores was not significant. We nevertheless report these differences because the size of the effect of the treatment on comprehension was substantial.

Self-report data

The treatment had a significant effect on the participants’ self-reports, F(2, 52) = 3.47, MSE = 0.25, p = .039, η² = .12. The results pointed in the same direction as the learning performance scores. Students in the integration and text-picture groups reported creating referential connections more often than the participants in the separation group did, t(52) = 2.14, p = .016, d = 0.70, and t(52) = 2.62, p < .006, d = 0.82, respectively. The text-picture and integration groups did not differ from each other, t(52) < 1 (see Table 2 for means and standard deviations).

Discussion

The results of the second experiment strongly resembled the pattern of results found in the first experiment even though a summarization strategy was used instead of the important concepts strategy. Asking students to form summary sentences and to write them by the corresponding parts of the picture (integration group) was more beneficial in terms of comprehension and transfer performance than writing summary sentences beside the picture (separation group). Furthermore, students who were not explicitly instructed to integrate text and pictures (text-picture group) also outperformed the separation group on transfer performance. Thus, whereas the explicit integration instructions did not increase performance beyond that of the text-picture group, the separation instructions decreased performance. The most probable reason is that the separation group constructed fewer accurate referential connections between text and picture elements, as the self-report data suggest.

One limitation of the results is that the overall effect of the treatment on comprehension performance did not reach statistical significance as it did in Experiment 1. This can be explained by the lower statistical power in Experiment 2 due to the smaller sample. The corresponding effect size (η² = .07), however, resembled the one found in Experiment 1 (η² = .08).

General discussion

The purpose of the present experiments was to investigate the effects of strategy instructions on learning from text and pictures. To this end, we varied strategy instructions to encourage learners to form referential connections between words and corresponding pictures or to distract them from creating referential connections while all other features of the strategy were held constant. First, we tested whether our materials yielded a multimedia effect (i.e., the advantage of presenting text and pictures versus text alone). Our results showed that the multimedia effect was replicated with learning materials that explained the structure of water molecules and their chemical bonds. Second, our results demonstrated that the multimedia effect was affected by strategy instructions to separate text and pictures. Learning performance was impaired when the students were asked to separate important concepts (Experiment 1) or sentences (Experiment 2) from their pictorial referents compared to a condition in which students were asked to integrate important concepts or sentences with their pictorial referents or when they received no specific strategy instructions. In both experiments, the negative effect of the separation instructions was more strongly reflected on transfer tests than on comprehension tests, indicating that these effects are more likely to be revealed on assessments that measure deeper understanding.

The negative effect of the separation condition shows certain similarities to the findings of Schwartz and Kulhavy (1981), who investigated students’ performance on a listening task. The authors asked students to listen to a story about a fictitious island and investigated, among other conditions, the following two: The “map group” saw a map of the island with its features spatially arranged on the map. The “list group” saw a map where features were listed on the right-hand side of the page, one below the other. All students were instructed to relate the contents of the story to the map. The results showed that the map group outperformed the list group on a free-and-cued-recall test. These findings as well as our findings are in line with theoretical models that assign the integration of word-based and image-based representations an important role in multimedia learning and in the development of deep understanding (Bodemer et al. 2004; Kulhavy et al. 1994; Mayer 2009).

Why did the text-picture group perform better than the separation group? In contrast to the separation and integration groups, the text-picture group was not given any specific instructions for how to study the text. The text-picture group was not prompted to process the text in a particular way; only the goal of reading the text and viewing the pictures to understand the contents was communicated. The students in this group seemed to process the text in a manner similar to the one used by students in the integration group. This interpretation is substantiated by similar patterns of results in these two groups. The text-picture and integration groups (a) showed better comprehension and transfer performance than the separation group, (b) reported higher subjective ratings of mental integration than the students in the separation group, and (c) created more referential connections than the separation group (Experiment 1). Furthermore, the students in the integration and text-picture groups did not differ in their self-reported mental integration, accuracy of referential connections, or comprehension and transfer performance. Therefore, the explanation for why the integration group outperformed the separation group can also be applied to answer why the text-picture group outperformed the separation group, with the exception that the text-picture group learned without explicit strategy instructions.

In general, the reported results are in line with other research that has shown that explicit instructions to create referential connections do not necessarily enhance learning and understanding (Bartholomé and Bromme 2009; Bodemer et al. 2004, 2005; Brünken et al. 2005; Seufert 2003; Seufert et al. 2007). The results of Bodemer et al. (2004, Experiment 1), for example, indicated that students who were explicitly instructed to label pictures with corresponding text components on a computer screen did not perform better on a comprehension test than students who were instructed to mentally relate pictures and corresponding text elements. The results of Brünken et al. (2005, Experiment 2) indicated that students did not depend on instructions to mentally relate text and pictures because a group that did not receive any instructions on how to study the learning materials performed as well as a group that was instructed to label pictures with elements from the text. Related results by Bartholomé and Bromme (2009) showed that student performance when learning from text-picture arrangements was unrelated to whether or not prompts to integrate the text and pictures were presented. Whereas these experiments did not directly assess the referential connections created by the participants, performance on a comprehension test and a transfer test in the present research converged with performance on a test that explicitly assessed which referential connections were formed and how accurate they were. Therefore, the relation between the accuracy of referential connections and outcome variables such as comprehension and transfer performance could be directly assessed.

The present research contributes in three respects. First, these experiments are, as far as we know, the only ones that have tested whether fostering and hindering the construction of referential connections affects learning from text and pictures. Our results show that in a situation in which many factors favor the multimedia effect (otherwise the text-picture group would not have performed better than the text-only group in Experiment 1), the multimedia effect could be reduced to zero by a simple and small difference in the instructions. The effect of this small difference demonstrates that although the multimedia effect is a stable phenomenon, it can be reduced when instructions distract the students from creating referential connections. The experimental manipulation that distracted learners from constructing referential connections was unobtrusive. The learners could simply write down the important concepts without being required to establish relations to picture elements. It is likely that by selecting and writing down important concepts, learners were prompted more towards processing the text than towards processing the picture and the concepts’ pictorial referents. This is of theoretical relevance as it indicates the importance of considering the effects of strategy instructions in multimedia learning (Plötzner et al. 2013).

Second, our experiments show that the effects of instructions are not unique to one specific comprehension strategy as they held for two different strategies (selecting important concepts and summarization) when they were implemented in a similar way. The pattern of results was very similar across the two experiments, indicating that the effects depend on the role of referential connections rather than on the particular comprehension strategy.

Third, the two experiments emphasize that whether the multimedia effect is diminished depends on how the strategy is implemented. Both strategies (important concepts and summarization) could be implemented in ways that hindered or did not hinder the multimedia effect. These results represent additional evidence for the idea that the question of which strategy would be superior is often less relevant than the question of how a particular strategy is implemented in the learning process.

Limitations and future directions

The finding that explicit instructions to construct referential connections did not enhance performance beyond that of the text-picture group suggests further questions beyond the scope of the present study. Forming referential connections is not an unconditional process. Previous research has indicated that this process can fail. One factor affecting its success may be the complexity of the material (Bodemer et al. 2004, Experiment 2). A second factor relates to how the learning materials are presented. For example, text and pictures may be presented in a way that makes it easier or harder to relate them (Florax and Plötzner 2010). Explicit instructions to create referential connections may be effective when the learning materials make it harder for learners to create referential connections. Results concerning the spatial contiguity principle (Mayer 2009) and the split attention principle (Ayres and Sweller 2005; Chandler and Sweller 1991, 1992; Mayer et al. 1995) point in this direction.

A third factor may be the learner’s experience with these kinds of learning materials. In the present experiments, homogeneous samples of students with limited age ranges were used. These students may have already gained some experience in how to learn from text and pictures. Therefore, they may have been aware that relating text and pictures to one another improves understanding even when they were not prompted to do so. In other words, although our results demonstrate that externally prompting or instructing readers is not a necessary condition for forming referential connections between text and picture elements, further research is required in order to explore the conditions that support or hinder this process. One way to explore this process is to implement manipulations that vary in the degree to which they distract students from creating referential connections or even prevent students from creating referential connections at all. These designs have the potential to reveal the mechanisms by which referential connections affect the multimedia effect.

Conclusions

The reported results demonstrate the specific role of strategy instructions in multimedia learning. Our results indicate that slight variations in strategy instructions are an important factor that affects the quality of referential connections (Experiment 1) and comprehension and transfer (Experiments 1 and 2). The instruction to write important concepts beside an illustration or right by the respective pictorial counterpart of the illustration evoked a significant difference in the students’ mental representations of the subject matter and affected their learning performance. The practical contribution of these results is to make practitioners aware of whether learning strategies encourage students to process the text by itself or to process the referential content by creating relations between concepts and their pictorial referents.