Introduction

Combinations of texts and pictures are used in many digital and analog learning materials (Mayer 1993). The types of texts and pictures employed, however, strongly differ from one another. This paper focuses primarily on combinations of instructional texts (Große 1974) and pictures that serve a construction function (Weidenmann 1994). According to Weidenmann (1994), pictures with construction function support the construction of knowledge about certain facts or processes. The individual picture elements are usually already known to the learners and help to clarify a process in pictorial form.

Texts are commonly supplemented with pictures with the intent of having a beneficial effect on learning (e.g., Mayer 2001). However, learners do not always process the text-picture combinations appropriately. They often have difficulty understanding the pictures or relating text and picture information to each other (e.g., Brünken et al. 2005; Levie and Lentz 1982; Weidenmann 1991). How can learners be encouraged to process texts and pictures in an appropriate manner?

Current research on learning with pictures in texts focuses primarily on the question of how learning can be improved through the design of the texts and the pictures. Examples of design measures are the color coding of texts and pictures (e.g., Mayer 2005), the labeling of pictures (e.g., Mayer 2005) and the spatial integration of texts and pictures (e.g., Ayres and Sweller 2005). It has been found, however, that such measures do not always guarantee successful learning (e.g., Florax and Ploetzner 2010) and that the design of the material alone does not necessarily lead learners to process the representations actively (e.g., Bartholomé and Bromme 2006; Dean and Kulhavy 1981; Schnotz and Rasch 2005; van Nimwegen et al. 2006). Active processing, however, is usually necessary in order to remember and understand the information presented (Wittrock 1990). Moreover, learners often encounter material that is simply not designed to benefit learning (cf. Mayer 1993).

Although it has been successfully demonstrated that learning strategies for understanding texts are effective (e.g., Artelt 2000; Dansereau et al. 1979; Mandl and Friedrich 2006), to our knowledge there are presently no comprehensive learning strategies that facilitate learning from text-picture combinations. Only isolated techniques for learning from pictures have been proposed (e.g., Peeck 1994b; Seufert 2003; Weidenmann 1988). In this paper, we therefore present a learning strategy that systematically supports learning from texts and pictures.

We begin by identifying the potentials and challenges of learning with texts and pictures. We then describe the available measures that help learners take advantage of these potentials and cope successfully with the challenges they encounter. In order to conceptualize a strategy for learning from text-picture combinations, we examine cognitive models of learning from texts and pictures, strategies for learning from texts, and techniques for learning from pictures. Finally, we describe the strategy we developed and test its effectiveness in two experimental studies. A discussion of the observed results concludes the paper.

Potentials and challenges of learning from texts and pictures

Texts are often supplemented with pictures in order to improve learning. Numerous studies have demonstrated the beneficial effects that combining texts and pictures have on the learners’ retention (e.g., Clark and Paivio 1991; Levin et al. 1987; Levie and Lentz 1982). Paivio’s Dual-coding Theory (1986) best explains these findings. Paivio (1986) assumes that textual and pictorial information are processed differently and accordingly distinguishes between a verbal and a visual cognitive system. Whereas textual information is processed for the most part in the verbal system only, pictorial information is processed in the visual system as well as in the verbal system. According to Paivio (1986), this is because pictorial information is more likely to be additionally verbalized internally than text is to be additionally visualized internally. This dual coding of pictorial information makes the information easier to retain and later recall from memory.

The beneficial effects of pictures in texts on the learners’ comprehension can be explained by the fact that texts and pictures complement each other in their informational content (e.g., Ainsworth 1999, 2006), or that pictures help learners to interpret the text correctly and vice versa (e.g., Ainsworth 1999, 2006). Furthermore, both Engelkamp (1990) and Weidenmann (1988) assume that certain information in pictures is processed almost automatically because of its resemblance to objects in the physical world. This has the advantage that information can be extracted from pictures without error-prone inference processes (see also Larkin and Simon 1987).

Nevertheless, pictures in combination with texts do not always lead to higher learning success (e.g., Peeck 1994a). This is evident in various studies in which learning success fell short of expectations, especially with respect to comprehension (e.g., Levie and Lentz 1982; Weidenmann 1991). Learners frequently have difficulty relating the information given in texts to the information presented in pictures (e.g., Bodemer et al. 2004; Brünken et al. 2005; Mayer 2001; Sweller et al. 1998).

Many learners therefore require support to take appropriate advantage of pictures in texts (Ainsworth et al. 2002; Seufert 2003). Whereas students are taught strategies for learning from texts during their education, they are not taught strategies for learning from pictures (cf. Lieber 2008). This may be because pictures seem easy to comprehend, so that their processing requirements are often underestimated (cf. Salomon 1984).

In order to develop a systematic approach to support learning from pictures in texts, we must first determine which information processes promote understanding when learning from different representations. In the following section, two process models of learning from text-picture combinations are described.

Process models of text-picture combinations

Mayer’s (2001) Theory of Multimedia Learning specifies which cognitive processes are relevant for appropriately processing information from different representations and for integrating this information into a coherent mental model. Mayer (2001) conceives of multimedia material as combinations of spoken or printed texts and static or dynamic pictures. His theory starts from the assumption that human memory is divided into three subsystems: the sensory registers, working memory, and long-term memory (Atkinson and Shiffrin 1971; Baddeley 1986). Working memory plays a pivotal role in processing information. Mayer (2001) formulates three basic assumptions concerning working memory. First, working memory comprises an auditory-verbal and a visual-pictorial channel (Baddeley 1986; Paivio 1986). Second, the capacity of working memory is limited, i.e., only a limited amount of information can be processed simultaneously (Atkinson and Shiffrin 1971; Baddeley 1986). Third, successful learning from different representations requires active processing and integration of the information presented (Wittrock 1990).

Mayer (2001) also regards three types of cognitive processes as important: the selection, the organization, and the integration of information. Furthermore, the model assumes that the processes of recoding verbal into visual information and vice versa are subprocesses of information organization. These processes are referred to as the transformation of information.

According to Mayer (2001), the first step when learning from multimedia is to select relevant words from the available texts and relevant elements from the available pictures. Subsequently, the material needs to be processed further in order to understand the information and retain it in long-term memory. For this purpose, the selected information must be organized. At first, this takes place separately for textual and pictorial information, so that both a verbal model and a pictorial model develop. Thereafter, by means of transformation processes, mental images might be generated from the verbal model and internal verbalizations from the pictorial model. Finally, in order to store the information in long-term memory, the verbal model, the pictorial model, and prior knowledge must be integrated by relating corresponding elements to each other.

Schnotz and Bannert (2003) formulate a similar model. Like Mayer (2001), they assume in their Integrated Model of Text and Picture Comprehension that texts and pictures are processed in different channels. Schnotz and Bannert (2003) emphasize that these channels are permeable to all types of representations, so that information can be processed in both channels. Initially, texts are processed subsemantically and a text-surface representation is constructed. For pictures, graphical entities are identified and distinguished from one another so that visual images can be generated. Through further semantic processing of texts and pictures, the information is integrated into both a mental model and a propositional representation.

The theory of Mayer (2001) and the model of Schnotz and Bannert (2003) both encompass the information processes of selection, organization, transformation, and integration. These four types of cognitive processes are regarded as crucial for learning from texts and pictures. Nevertheless, the question remains of how these processes can be systematically induced in learners. We therefore summarize existing approaches to facilitating learning from texts and pictures in the following section.

Facilitating measures

There are two main approaches available to facilitate learning with text-picture combinations: either the learning material can be “optimized” through the implementation of various design principles, or the learners can be empowered to competently deal with representations through the use of learning strategies. The goal of both approaches is to improve the intake and processing of information.

Current research provides numerous recommendations for designing texts and pictures in such a way that learners can process them appropriately. Headings and segmentation, for instance, help to organize texts, whereas highlighting emphasizes important terms. Pictures should be reduced to the essentials, and information in the pictures should be pointed out by means of markings and labels. Ballstaedt (1997) and Dwyer (1978) provide overviews of such design measures.

Various principles for designing text-picture combinations have been formulated on the basis of research on multimedia learning (for overviews see Clark and Mayer 2008; Mayer 2005). For instance, according to the Split-Attention Principle, text and picture information should be presented in an integrated rather than a spatially separated format (e.g., Ayres and Sweller 2005). An integrated format aims at minimizing unnecessary visual search processes, which in turn frees cognitive capacity for the relevant learning processes. The Spatial Contiguity Principle makes a similar assumption, namely that texts and pictures should be presented spatially close to each other (e.g., Mayer 2005). A further example is the Coherence Principle, which states that texts and pictures should only be added to the learning material if they are relevant to the subject matter (e.g., Mayer 2005). Seductive details should not be included because they divert attention from the pertinent information.

A multitude of further approaches which aim to support coherence formation when learning from texts and pictures are also available. The use of various graphical aids has been investigated to help clarify the relationship between text and pictures. Examples are labeling, in which the individual picture elements are labeled (e.g., Bartholomé 2007; Florax and Ploetzner 2010); color coding, in which identical colors are used in texts and pictures to highlight corresponding information (e.g., Kalyuga et al. 1999); or inter-representational hyperlinks, in which lines or arrows are used to visualize the relationship between text and pictures (e.g., Brünken et al. 2005).

Although implementing such design principles has proven beneficial (cf. Ginns 2006), individual differences remain. For example, learners profit differently from optimized material depending on their prior knowledge (expertise reversal effect; e.g., Kalyuga et al. 2003). Even with optimized material, learners often have difficulty processing the representations successfully (cf. Weidenmann 1989). In addition, in everyday life learners do not always encounter material designed according to the aforementioned principles. The question therefore arises of how learners can be encouraged to process text-picture combinations actively and systematically.

Suggestions on how to induce and promote relevant comprehension processes can be found in research on learning strategies. At present, however, the field is almost completely geared towards learning from texts (e.g., Artelt 2000; Hasselhorn 1987; Leutner and Leopold 2003; Marton and Saljö 1984; Weinstein and Mayer 1986; for overviews see Mandl and Friedrich 1992, 2006). With the exception of a few studies in the past years (e.g., Drewniak 1992; Kombartzky et al. 2010; Lewalter 2003), other representational formats, such as pictures, have received hardly any attention.

Although, to our knowledge, a comprehensive strategy for learning from texts and pictures does not presently exist, the available research provides a good starting point for conceptualizing such a strategy. Both the results of learning strategy research on text comprehension and individual learning techniques for picture comprehension can be drawn upon.

According to Streblow and Schiefele (2006), a learning strategy can be understood as “… a sequence of efficient learning techniques, which are used in a goal-orientated and flexible way, are increasingly automatically processed, but remain consciously applied” (p. 353; translation by the authors). Here, learning techniques denote the individual components of a learning strategy, e.g., underlining text or marking important elements in a picture. Only when a number of learning techniques are coordinated in a goal-oriented way do they constitute a learning strategy. The application of a learning strategy aims at inducing, supporting, and sustaining effective learning processes.

In the early phases of learning strategy research, the main objective was to identify how successful and less successful learners differ from one another in their strategic behavior. Marton and Saljö (1984), for example, have empirically identified two different approaches to learning from texts. They distinguish between a so-called surface level approach, which is characterized by memorizing the material through repetition, and a so-called deep level approach, in which an understanding of the material is attained by elaborating the connections between separate pieces of information. Successful learners mainly employ deep level approaches. Similar results have been reported in other studies (e.g., Pask 1976; Svensson 1977).

Based upon these research results, methods have been developed which make it possible to teach deep level strategies to the learners. Many of these strategies exhibit common underlying conceptual ideas and employ similar learning techniques. In a strategy proposed by Ballstaedt (2006), the learners start by selecting important assertions and underlining them. Next, they divide the text into segments and assign headings. Afterwards, the text is condensed through incremental consolidation and summarization. Other models, such as the PQ4R-Method (Preview, Question, Read, Reflect, Recite, Review) from Thomas and Robinson (1972), employ different learning techniques which aim at inducing cognitive processes similar to those stimulated by Ballstaedt’s method (2006). The PQ4R-Method consists of six learning techniques: (1) Survey the material to get a general overview (Preview), (2) formulate questions about the text (Question), (3) read the text thoroughly while keeping the formulated questions in mind (Read), (4) reflect on the text by relating the information to prior knowledge or formulating examples (Reflect), (5) answer the questions by giving an account of the text in one’s own words (Recite), and (6) try to recall or summarize the information that has been read without looking at the text (Review).

A close analysis of such learning strategies reveals certain commonalities. The first techniques of a learning strategy often aim at obtaining a general overview and at an initial selection of important information. The subsequent techniques encourage the learner to organize the information. Building on this, further techniques stimulate the transformation and integration of the subject matter, e.g., recounting it in one’s own words and relating it to prior knowledge. These approaches clearly parallel the process models described above.

With respect to facilitating learning with pictures, only isolated learning techniques have been proposed so far. For instance, to emphasize the importance of pictures for learning, learners are asked to orient themselves towards the learning material (Salomon 1984; Weidenmann 1989) and to pay attention to the picture (Peeck 1994b). Peeck (1994b) further asks learners to create pictures of their own. Peeck (1994b) as well as Weidenmann (1994) ask learners to answer questions concerning the pictures at hand. Weidenmann (1988) prompts learners to compare different pictures.

Dean and Kulhavy (1981), as well as Brünken et al. (2005) assume that text-picture combinations are better understood when learners have to complete specific tasks such as labeling the pictures or specifying the characteristic properties of the pictures. The research from Drewniak (1992) and Seufert (2003), as well as from Bodemer et al. (2004) focuses on how learners can be encouraged to systematically relate information from texts and pictures to each other.

The following section takes advantage of the previously described findings and combines various learning techniques in order to conceptualize a comprehensive strategy for learning from texts and pictures.

Conceptualizing a learning strategy

As the starting point for the conceptualization of a learning strategy, we drew upon the process categories of selection, organization, integration, and transformation of information as identified in the models of Mayer (2001) and Schnotz and Bannert (2003). These processes are potentially important during every learning phase; a learning strategy should therefore aim to induce them. Furthermore, when they are prompted by a learning strategy, the processes should be sequentially ordered (cf. Ballstaedt 2006; Dansereau et al. 1979; Mayer 2001; Thomas and Robinson 1972). The processes of information selection and organization are often encouraged before the processes of information transformation and integration. When formulating concrete learning techniques, the cognitive processes can be induced by drawing on existing techniques for learning from texts, pictures, and text-picture combinations. Table 1 shows the entire learning strategy, which is composed of six learning techniques.

Table 1 A strategy for supporting learning from text-picture combinations

Initially, the learners should obtain an overview, which helps them to grasp the learning material as a whole and to form learning expectations (cf. Friedrich 1995; Thomas and Robinson 1972). We regard this as a component of the selection process, since the importance of individual pieces of information can only be determined on the basis of the material as a whole. Learners are prompted to mark relevant elements in the picture. In doing so, the frequently used technique of underlining important assertions in the text (e.g., Hasselhorn and Schreblowski 2002) is carried over to the processing of the picture. The learners should subsequently label the picture, thereby relating text and picture to each other. This promotes the organization as well as the integration of the material. Building upon these processes, further integration and transformation processes are encouraged. The learners are asked to summarize the text and pictures in their own words (cf. Ballstaedt 2006) and to make a sketch (cf. van Meter and Garner 2005). In both cases, text and picture information are not only integrated but also transformed into a new knowledge representation. The learners are required to generate both a textual and a pictorial summary of the material in order to encourage dual coding of the relevant information (cf. Paivio 1986).

The learning strategy is formulated abstractly in Table 1. To make the strategy suitable for a specific group of learners, it needs to be specified with concrete instructions. The instructions shown in Table 2 were formulated for sixth-grade students aged between 11 and 13 years.

Table 2 Working instructions that the students in the Strategy Group received during the learning phase

Empirical studies

A pilot study and a main study were carried out in order to test the effectiveness of the conceptualized strategy. The research question and hypothesis, the design, material, and procedure were all identical for both studies.

Research question and hypothesis

The empirical studies focus on whether the use of the previously described learning strategy, when learning from text–picture combinations, results in more successful learning than when such a strategy is not employed. It is expected that the use of the strategy will have a positive impact on learners’ retention and understanding of the material since the learning strategy systematically induces those cognitive processes considered to be relevant for successful learning.

Design

Two groups were investigated in each study: a Strategy Group and a Control Group. Only the Strategy Group was given the learning strategy (cf. Table 2). In order for learners in the Control Group to interact meaningfully with the learning material, they were requested to write a summary of what they had learned.

Material

Learning material

The subject of the learning material was “Dances of the Honeybee” (for an example see Fig. 1). Honeybees perform dances in order to communicate the location of food sources to other bees. Depending on the distance of the food source, the bee performs either a round dance or a waggle dance. During the round dance, the bee flies in a circular pattern inside the beehive; the other bees imitate this pattern and then swarm out. During the waggle dance, the bee flies in a pattern resembling a figure eight; in doing so, it conveys both the distance of the food source and its direction relative to the position of the sun. The texts and pictures were compiled on the basis of material presented in Microsoft Encarta (2002). The material used in the studies consisted of four text-picture combinations, each presented on a separate A4 page with the text above the picture. The texts ranged in length from 24 to 77 words. The relevant information was divided between the texts and pictures so that both representations had to be taken into account in order to understand the material.

Fig. 1 An example page taken from the learning material (picture taken from Microsoft Encarta 2002; screenshot reprinted with kind permission from the Microsoft Corporation)

Learning strategy

The learners in the Strategy Group were requested to follow the instructions described in Table 2 during the learning phase. The instructions were presented to the learners on a worksheet. The instructions were the same for each of the four text–picture combinations; hence, the learners carried out the various learning techniques a total of four times during the learning phase.

Pre- and post-test

The pre-test consisted of eight open questions which assessed the learners’ factual knowledge about the dances of honeybees. The post-test consisted of 24 open questions: eight questions assessing factual knowledge, eight questions assessing conceptual knowledge, and eight questions assessing transfer knowledge (cf. Anderson and Krathwohl 2001; for examples see Table 3). The questions assessing factual knowledge were the same as the questions included in the pre-test.

Table 3 Examples of the three types of knowledge addressed in the post-test

Questions assessing factual knowledge address information that is directly presented in the learning material, either in the texts or in the pictures. These questions provide an indicator of retention. In contrast, questions assessing conceptual knowledge require learners to combine different pieces of information and to draw inferences. These questions are therefore an indicator of comprehension. Questions assessing transfer knowledge provide an indicator of near transfer: learners need to apply the acquired facts and concepts to new problem situations.

The learners’ test performance was scored using a scoring sheet that specified the minimum requirements for a correct answer to each question, i.e., which information a learner had to provide for the answer to count as correct. One point was awarded for each correctly answered question.

Procedure

The participating students were randomly assigned to the Strategy Group or the Control Group. The Strategy Group was given a short introduction (approx. 10 min) to the learning strategy by the investigator. A demonstration of the learning strategy was provided using texts and pictures about the human circulatory system. The material used in the introduction exhibited similar arrangements of texts and pictures as found in the learning material.

The Control Group received a short refresher on writing summaries. It was assumed that the students were already familiar with creating summaries. Both groups then completed the pre-test. To familiarize the students with the topic of the learning material, a short story describing how the dances of the honeybees were discovered was read to both groups.

During the learning phase, the students received the text-picture combinations on the dances of the honeybees. The students in the Strategy Group were given the learning strategy worksheets and were encouraged to carry out each step of the strategy. Whether students were working through the strategy was spot-checked; the quality of their strategy use, however, was not examined here.

The students in the Control Group received a sheet of paper on which they were able to write their summaries after processing the learning material. A learning time of 40 min was set for all learners. The post-test took place directly following the learning phase. The procedure was carried out during class and required a total of 90 min.

Pilot study

In addition to serving as a first trial of the learning effectiveness of the strategy, the pilot study also assessed the usability and understandability of the tests and the learning material. In total, 61 sixth-grade students with a mean age of 12.08 years (SD = 0.46) participated in the pilot study. The study was carried out with two classes from a middle school in south-west Germany. The students from each class were randomly assigned to the Strategy Group or the Control Group.

The Strategy Group (M = 1.90, SD = 0.30) and the Control Group (M = 1.93, SD = 0.25) showed nearly the same performance on the pre-test. Both groups exhibited very little prior knowledge of the subject matter. There were no significant differences between groups with respect to prior knowledge (t(59) = −0.49, ns).

The Strategy Group performed significantly better on the post-test than the Control Group (see Table 4). This applies at the multivariate level (F(3,56) = 10.19, p < 0.01, ηp² = 0.35) as well as at the univariate level with respect to all three types of knowledge: factual knowledge (F(1,58) = 6.82, p < 0.05, ηp² = 0.11), conceptual knowledge (F(1,58) = 10.05, p < 0.01, ηp² = 0.15), and transfer knowledge (F(1,58) = 26.66, p < 0.01, ηp² = 0.32).
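Because partial eta squared is determined by an F ratio and its degrees of freedom, the effect sizes above can be cross-checked against the reported F values. The following Python sketch assumes the standard relation ηp² = F·df_effect / (F·df_effect + df_error); for a two-group design this relation also holds for the multivariate test. The function name `partial_eta_squared` is hypothetical and introduced only for illustration.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F ratios and degrees of freedom as reported for the pilot study
pilot_results = {
    "multivariate": (10.19, 3, 56),
    "factual": (6.82, 1, 58),
    "conceptual": (10.05, 1, 58),
    "transfer": (26.66, 1, 58),
}

for label, (f_value, df_effect, df_error) in pilot_results.items():
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.3f}")
# multivariate: 0.353
# factual: 0.105
# conceptual: 0.148
# transfer: 0.315
```

The recomputed values agree with the reported 0.35, 0.11, 0.15, and 0.32 to within the rounding of the published F statistics.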

Table 4 The means (M) and the standard deviations (SD) on the post-test in the pilot study

Schlag and Ploetzner (2009) describe the results of the pilot study in further detail. The learning materials, tests, and the amount of time allocated for learning all proved to be adequate in the pilot study. The results of the pilot study suggested that the strategy can have a positive influence on learning success. To confirm the effectiveness of the developed learning strategy, an additional study was carried out to replicate these results with a new and larger sample.

Main study

Participants

A total of 133 sixth-grade students took part in the main study: 71 girls and 62 boys, with a mean age of 11.59 years (SD = 0.59). The study was carried out with five classes from two different middle schools in south-west Germany. The Strategy Group consisted of 70 students: 37 girls and 33 boys, with a mean age of 11.55 years (SD = 0.56). The Control Group consisted of 63 students: 34 girls and 29 boys, with a mean age of 11.65 years (SD = 0.60). The students from each class were randomly assigned to the Strategy Group or the Control Group.

Results

On average, the Strategy Group answered 1.84 (SD = 0.44) of the eight pre-test questions correctly, whereas the Control Group answered 1.57 (SD = 0.66) correctly. Although prior knowledge was low in both groups, the difference between the two groups was significant (t(131) = −2.75, p < 0.01, d = 0.48).
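The pre-test comparison can be approximately reproduced from the reported summary statistics. The Python sketch below assumes Student's pooled-variance t-test and Cohen's d based on the df-weighted pooled standard deviation; the text does not state which variants were used, and the helper name `pooled_t_and_d` is hypothetical.

```python
import math


def pooled_t_and_d(m1, sd1, n1, m2, sd2, n2):
    """Student's t (pooled variance) and Cohen's d from group summary statistics."""
    df = n1 + n2 - 2
    # Pooled standard deviation, weighting each group's variance by its degrees of freedom
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    standard_error = pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    t = (m1 - m2) / standard_error
    d = (m1 - m2) / pooled_sd
    return t, df, d


# Pre-test summary statistics reported for the main study
# (Strategy Group first, Control Group second)
t, df, d = pooled_t_and_d(1.84, 0.44, 70, 1.57, 0.66, 63)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")  # t(131) = 2.80, d = 0.49
```

The recomputed values are close to, but not identical with, the reported t(131) = −2.75 and d = 0.48. Because the means and standard deviations are rounded to two decimals, a gap of this size is plausible; the opposite sign only reflects which group is subtracted from which. Dividing by the root mean of the two variances instead of the pooled SD gives d ≈ 0.48.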

Because the post-test consisted of open questions, the students’ answers were scored by two independent raters. Interrater reliability, assessed with the intraclass correlation coefficient (ICC), was ICC(3,k) = 0.95. Discrepancies between the two ratings were resolved jointly by the raters.
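ICC(3,k) denotes the two-way mixed-effects, consistency, average-measures intraclass correlation, which can be computed from a two-way ANOVA of the targets × raters matrix as (MS_targets − MS_error) / MS_targets. The Python sketch below illustrates this definition; the four-student, two-rater matrix is made-up toy data, not ratings from this study, and `icc_3k` is a hypothetical helper name.

```python
def icc_3k(ratings):
    """ICC(3,k): two-way mixed effects, consistency, average of k raters.

    `ratings` is a list of rows, one row per rated target (e.g., a student),
    with one column per rater.
    """
    n = len(ratings)      # number of rated targets
    k = len(ratings[0])   # number of raters
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                  # between-targets mean square
    ms_error = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (ms_rows - ms_error) / ms_rows


# Hypothetical scores from two raters for four students (toy data)
toy_ratings = [[5, 6], [3, 3], [7, 8], [2, 2]]
print(f"ICC(3,k) = {icc_3k(toy_ratings):.3f}")  # ICC(3,k) = 0.986
```

The consistency form ignores systematic differences in rater leniency (the column effect), which suits a design in which the same two raters score every answer.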

The descriptive statistics show better results on the entire post-test for the Strategy Group than for the Control Group (see Table 5). The Strategy Group learned more successfully than the Control Group with respect to factual knowledge, conceptual knowledge, and transfer knowledge.

Table 5 The means (M) and the standard deviations (SD) on the post-test in the main study

The students’ prior knowledge correlated significantly with their factual knowledge (r = 0.22, p < 0.01), but not with the other two types of knowledge or with the overall post-test score. To determine significant differences between the two groups on the post-test, a multivariate analysis of covariance (MANCOVA) was conducted with the factor Group (Strategy Group vs. Control Group) as the independent variable, prior knowledge as a covariate, and the three types of knowledge from the post-test as dependent variables.

Across all types of questions, the analysis did not yield a significant influence of prior knowledge on knowledge acquisition (F(1,130) = 0.74, ns). Students in the Strategy Group attained significantly better results than students in the Control Group, both at the multivariate level (F(1,130) = 24.55, p < 0.01, ηp² = 0.16) and at the univariate levels. The students in the Strategy Group performed significantly better with respect to all three types of knowledge: factual knowledge (F(1,130) = 16.68, p < 0.01, ηp² = 0.11), conceptual knowledge (F(1,130) = 12.56, p < 0.01, ηp² = 0.09), and transfer knowledge (F(1,130) = 9.82, p < 0.01, ηp² = 0.07).

Discussion

In this paper, we presented a strategy for learning from illustrated texts. Its conceptualization was based on current theories and models of multimedia learning, in which four kinds of cognitive processes are in the foreground: the selection, organization, integration, and transformation of information. We assumed that learning would be more successful if a learning strategy could systematically induce these processes.

However, the theories and models of multimedia learning do not directly suggest how the cognitive processes they describe can be systematically activated. A further source of information was therefore needed to conceptualize a strategy for learning from text–picture combinations. We drew upon specific learning techniques that have already been employed successfully in strategies for learning from text, as well as various techniques that have been proposed for learning from pictures. The strategy we constructed on the basis of these sources consists of six learning techniques. These techniques aim to systematize the learners’ processing of textual and pictorial representations and, in particular, to foster the integration of information taken from both representations and from prior knowledge.

The effectiveness of the strategy was evaluated in two empirical studies. Both studies demonstrated that students who were asked to use the learning strategy achieved significantly larger learning gains than students who were not given the strategy. The advantage of the Strategy Group was evident not only on the entire post-test but also for each type of knowledge assessed, i.e., factual, conceptual, and transfer knowledge. The corresponding effect sizes are medium to large.

The studies thus demonstrated the basic effectiveness of the proposed strategy: after a short introduction, students were already able to benefit from a strategy that was new to them. Nevertheless, it remains unclear to what extent and for how long learners can benefit from such a strategy. It seems unlikely that the learners would be able to employ the strategy outside the experimental setting. Rather, intensive training would be required if the learning strategies are to be “used in a goal-orientated and flexible way” and “increasingly automatically processed” (Streblow and Schiefele 2006, p. 353). During such training, learners would have the opportunity to apply the strategy to different learning materials and to internalize it step by step. As a result, they would become increasingly confident and able to apply the strategy to new learning material on their own. Only then will learners be capable of transferring the strategy to new learning contexts as well (cf. Hasselhorn 1987).

A further question is why the learners in both of our studies experienced almost no difficulties in applying the new learning strategy, whereas the learners in other studies did (e.g., Clark 1990; Drewniak 1992). One of our own studies also revealed that university students rarely succeed in making immediate use of new learning strategies (Schlag et al. 2007). In that study, two groups of students were compared: one group learned with a surface-level learning strategy and the other with a deep-level learning strategy similar to the strategy examined in this paper. The analysis of think-aloud protocols showed that neither group adopted the requested and practiced strategy. Instead, the students retained their existing learning habits. Nevertheless, the analysis also revealed that learners who actually used a deep-level approach outperformed learners who actually used a surface-level approach (cf. Marton and Säljö 1984). Current research suggests that prior learning experiences influence how well learners pick up a new strategy and apply it (Hasselhorn and Körkel 1986). It is thus hypothesized that established, well-practiced learning techniques and strategies, which are common among older learners, can impede the acquisition and application of new strategies (cf. Hasselhorn 1987). Additional studies are therefore needed to investigate whether and how a strategy for learning from illustrated texts can be successfully taught to older learners as well.

To draw a conclusion about the general effectiveness of the strategy, it is also necessary to examine how successful learning is when the strategy is applied to different text–picture combinations. Metz and Wichert (2009) demonstrated that the conceptual framework put forward in this paper could be successfully adapted to create a learning strategy for a text–picture combination on a different subject matter, namely a “knight’s castle”. Nevertheless, it would be useful to evaluate the strategy with a wider range of learning materials.

Even though our studies have demonstrated the basic effectiveness of the strategy, we still do not know exactly how the strategy affects the learning process and contributes to learning success. To better understand how the strategy works, individual learning processes need to be taken into account. For this purpose, it would be helpful to record and analyze think-aloud protocols (e.g., Lewalter 2003). The results of these analyses could also help to further improve the strategy or to tailor it to individual differences among learners.